Tuesday, February 9, 2010

Towards Recency Ranking in Web Search

Academia and search R&D labs are publishing more and more papers about recency ranking. I am pretty excited about that, since I spent the last three years on this topic at both Ask.com and Bing.com.

Towards Recency Ranking in Web Search is a high-quality paper from Yahoo! about recency ranking. The paper makes two main contributions: a query classifier for recency and a ranking model for recent results.

The query classifier builds two models, representing the Content and the Query data at time t, respectively. The models are then compared across different instants of time, and a query is considered recent if its probability of being generated increases between two instants. This approach is interesting. Nevertheless, there are queries that need fresh results even though they are constantly observed (such as "Obama", "Britney Spears", "stock quotation", etc.).
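To make the idea concrete, here is a minimal sketch of comparing a query's likelihood at two instants. It assumes simple smoothed unigram models built from whatever text is observed in each time slot; the paper's actual models are more elaborate, and the names `unigram_model`, `log_prob` and `buzz_score` are my own, not the authors'.

```python
import math
from collections import Counter


def unigram_model(texts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_prob(query, model):
    """Log-probability of the query being generated by the model."""
    return sum(math.log(model[tok]) for tok in query.lower().split())


def buzz_score(query, earlier_texts, later_texts):
    """Positive when the query becomes more likely in the later time slot.

    Illustrative assumption: one model per time slot, same vocabulary.
    """
    vocab = {
        tok
        for t in earlier_texts + later_texts + [query]
        for tok in t.lower().split()
    }
    earlier = unigram_model(earlier_texts, vocab)
    later = unigram_model(later_texts, vocab)
    return log_prob(query, later) - log_prob(query, earlier)
```

A query such as "earthquake" that suddenly shows up in the later slot gets a positive score, while a query observed at a constant rate stays close to zero. That is exactly the limitation mentioned above: a steady query like "stock quotation" would not be flagged, even though it wants fresh results.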

The ranking model aims at learning a ranking function based on four categories of recency-related features: timestamp features, linktime features, webbuzz features and page classification features. The learning algorithm is GBrank. To address the shortage of recency training data, the authors explore several modeling approaches that make use of regular ranking data. In the compositional model, the normal ranking output is used as a training feature, while in the over-weighting model the normal ranking output is used together with recency features and an empirically optimal weight is derived. In the adaptation model, training data from normal ranking is used to learn a regression tree model, which is then fine-tuned with recency ranking data.
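As a rough illustration of the compositional idea only, the sketch below feeds the output of the regular ranker in as one more feature next to the recency features. The field names, the single value per feature category and the hand-picked linear weights are all hypothetical; in the paper the function is learned with GBrank, not set by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecencyFeatures:
    # One illustrative value per feature category (hypothetical encoding).
    timestamp: float   # e.g. page age in days
    linktime: float    # e.g. age of the newest inbound link in days
    webbuzz: float     # e.g. recent update or link burst signal
    page_class: float  # e.g. 1.0 for a news page, 0.0 otherwise


def compositional_features(base_score: float, rf: RecencyFeatures) -> List[float]:
    """Regular ranker output first, followed by the recency features."""
    return [base_score, rf.timestamp, rf.linktime, rf.webbuzz, rf.page_class]


def toy_score(features: List[float], weights: List[float]) -> float:
    """Stand-in for the learned model: a plain weighted sum."""
    return sum(w * f for w, f in zip(weights, features))
```

With weights such as `[1.0, -0.01, 0.0, 0.0, 0.0]`, two documents with the same regular ranking score are separated by age alone, which is the behaviour one would hope the learned model picks up for recency queries.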

The evaluation set consists of 70,131 query-URL pairs collected over a period of four months (February to May 2009) and judged by humans; the evaluation is based on NDCG metrics. One final result is worth mentioning: in the paper, linktime features turn out to be the most important of all the recency features. Quoting the authors: "Thus, recency is competing with popularity, which is usually indicated by link-based features and click-based features. This leads to the interesting topic on how to appropriately deal with the relationship between recency and popularity".
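For readers who have not met NDCG, here is a short sketch of the standard textbook formulation (exponential gain, log2 position discount). This is the common definition, not necessarily the exact variant used in the paper, and the grades are assumed to be integers where higher means better.

```python
import math


def dcg(grades, k):
    """Discounted cumulative gain over the top-k graded results."""
    return sum(
        (2 ** g - 1) / math.log2(i + 2)
        for i, g in enumerate(grades[:k])
    )


def ndcg(grades, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0
```

Calling `ndcg(grades, k)` with the human grades of a result list in ranked order gives a value between 0 and 1, with 1 meaning the best documents were ranked first.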
