The paper discusses many offline pre-processing techniques for extracting entities before query time. It proposes a mixed approach based on trie pattern matching and SVM classification. Extracted entities are then ranked by proximity, frequency count, and the relevance of the documents they appear in. For a given query, a vertical result is triggered when 'good enough' entities are retrieved. The paper presents a case study based on a product search vertical and Microsoft Live Search. Entities are extracted from Wikipedia, TREC QA, and IMDB.
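To make the pipeline above more concrete, here is a minimal sketch of the trie matching, ranking, and triggering steps. It is not the paper's implementation: the function names, the scoring formula (document relevance multiplied by an inverse-distance proximity term, summed over occurrences so that frequency counts too), and the threshold value are all assumptions made for illustration. The SVM classification step is left out, and tokenization is plain whitespace splitting.

```python
from collections import defaultdict

_END = None  # sentinel key marking the end of an entity (tokens are always str)


def build_trie(entities):
    """Build a token-level trie from a list of known entity names."""
    root = {}
    for entity in entities:
        node = root
        for token in entity.lower().split():
            node = node.setdefault(token, {})
        node[_END] = entity
    return root


def find_entities(tokens, trie):
    """Return (start_index, entity) for the longest trie match at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                best = (j, node[_END])
        if best:
            matches.append((i, best[1]))
            i = best[0]  # continue after the matched entity
        else:
            i += 1
    return matches


def rank_entities(docs, query_terms, trie):
    """Score entities found in docs, given as (text, relevance) pairs.

    Assumed scoring, not taken from the paper: each occurrence adds
    relevance / (1 + distance to the nearest query term).
    """
    scores = defaultdict(float)
    query_terms = {t.lower() for t in query_terms}
    for text, relevance in docs:
        tokens = text.lower().split()
        query_positions = [i for i, t in enumerate(tokens) if t in query_terms]
        if not query_positions:
            continue  # no query term in this document, so no proximity signal
        for pos, entity in find_entities(tokens, trie):
            distance = min(abs(pos - q) for q in query_positions)
            scores[entity] += relevance / (1.0 + distance)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def should_trigger(ranked, threshold=0.4):
    """Trigger the vertical when the top entity is 'good enough' (hypothetical threshold)."""
    return bool(ranked) and ranked[0][1] >= threshold
```

For example, calling `rank_entities` with a trie built from a few film-related names and the query term `played` ranks the entities closest to that term highest, and `should_trigger` then decides whether to show the vertical result.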
The approach is quite effective and performs well. However, it may show some limits for real-time verticals (such as Twitter or News), where entities are not known a priori.

Antonio, the link to the paper doesn't appear to be working. Would you send me a PDF copy (if you have one)?