"Understanding User's Query Intent with Wikipedia" is a nice paper from Microsoft. It explains how to classify Web queries in order to trigger results from different verticals (such as news, images, video, travel, shopping).
Each category is bootstrapped with a few keywords chosen by editors. These descriptions are then automatically expanded using a random walk over Wikipedia's categories and concepts. The semantic concepts are then extracted using Gabrilovich's ESA (Explicit Semantic Analysis). Results are reported on the Live Search query log for July 2007 (which has just ~2.6M frequent queries). The precision, recall and F1 measures are quite impressive, and this generic solution can compete with the best ad hoc result from the KDD Cup 2005 query classification competition.
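To make the expansion step more concrete, here is a minimal sketch of seed expansion by a random walk with restart over a small graph. It is not the paper's actual algorithm: the toy graph, the `propagate` function, and the `restart` and `steps` parameters are all assumptions made up for illustration. The idea is simply that score flows from the editor-chosen seeds to nearby Wikipedia concepts, and the top-scoring concepts extend the category description.

```python
from collections import defaultdict


def propagate(graph, seeds, restart=0.3, steps=20):
    """Random walk with restart from the seed nodes (illustrative only).

    graph: dict mapping a node to a list of linked nodes (treated as undirected).
    seeds: set of editor-chosen starting nodes.
    Returns a dict mapping every node to its share of the walk's probability mass.
    """
    # Build a symmetric adjacency map so the walk can move in both directions.
    adj = defaultdict(set)
    nodes = set(graph)
    for u, neighbours in graph.items():
        for v in neighbours:
            nodes.add(v)
            adj[u].add(v)
            adj[v].add(u)

    # The walk starts (and restarts) uniformly on the seed nodes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)

    for _ in range(steps):
        # A fraction `restart` of the mass jumps back to the seeds each step.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            # An isolated node keeps its own mass instead of losing it.
            targets = adj[u] or {u}
            share = (1.0 - restart) * score[u] / len(targets)
            for v in targets:
                nxt[v] += share
        score = nxt
    return score


# Hypothetical toy graph of categories and concepts, not real Wikipedia data.
graph = {
    "Travel": ["Hotel", "Airline", "Tourism"],
    "Tourism": ["Hotel", "Travel agency"],
    "Airline": ["Airport"],
    "Shopping": ["Retail", "Coupon"],
}
seeds = {"Travel", "Hotel"}

scores = propagate(graph, seeds)
# Concepts ranked by score; the top ones would extend the "travel" description.
expanded = sorted(scores, key=scores.get, reverse=True)[:5]
```

Concepts that are not connected to the seeds (here the shopping cluster) receive no score at all, which is the property that keeps an expanded category description on topic.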
Query classification is a very important topic for search engines, and leveraging Wikipedia is definitely a good idea (see also my previous posting on Yahoo's query classification).