Just got assigned a patent filed back in 2005, "Sampling internet user traffic to improve search results". The problem we were trying to address was how to bootstrap a learning-to-rank system in the absence of user search information. This is a typical problem when you are not the incumbent search engine and have not yet accumulated usage information and user behaviour data (for more information, see Calculating Search Rankings with User Web Traffic Data). How can you compensate for this bootstrap problem?
The key intuition was to use web traffic information collected by way of web proxies that observe user traffic and navigational information, including traffic generated by querying other search engines. Similar traffic can be observed by mining a collection of web logs for the HTTP_Referer header. The methodology was used to improve the freshness, coverage, ranking, and clustering of search engine results and, more generally, may include monitoring web traffic on remote web servers on the communications network.
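To make the web-log idea concrete, here is a minimal Python sketch of mining access logs for referrers that come from other search engines. It is only an illustration, not the method described in the patent: it assumes logs in the common "combined" format, and the `SEARCH_QUERY_PARAMS` table, the function names, and the choice to count (query, landing page) pairs are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical table: search-engine domain -> URL parameter carrying the query.
# A real deployment would need a much longer, maintained list.
SEARCH_QUERY_PARAMS = {
    "google.com": "q",
    "bing.com": "q",
    "search.yahoo.com": "p",
}

# Assumes the "combined" access-log layout:
# ... "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def query_from_referer(referer):
    """Return the normalized search query found in a referrer URL, or None."""
    parsed = urlparse(referer)
    host = parsed.hostname or ""
    for domain, param in SEARCH_QUERY_PARAMS.items():
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parsed.query).get(param)
            if values:
                return values[0].strip().lower()
    return None


def count_query_clicks(log_lines):
    """Count how often each (query, landing page) pair appears in the logs."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        query = query_from_referer(match.group("referer"))
        if query:
            counts[(query, match.group("path"))] += 1
    return counts
```

Each resulting count is a rough signal that users who issued a given query elsewhere ended up on a given page, which is the kind of evidence that could feed a ranking model before any first-party click data exists.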