Thursday, August 6, 2009

Real Time Query Expansion -- Query Logs vs News

Many search engines offer a related-query suggestion service. For instance, when you search for "Obama", the search engine may suggest the query "Is Obama Muslim?". This happens because both queries have been submitted very frequently by different users within the same search session. In information retrieval, this process is called query expansion. A common approach is to extract correlations between query terms by analyzing query logs.
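The post doesn't say exactly how the log-based correlations are computed. A minimal sketch, assuming the query log has already been grouped into sessions, is to count how often two distinct queries are submitted in the same session and suggest each query's most frequent partners. The session data, the function names `build_cooccurrence` and `suggest`, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often two distinct queries appear in the same session.

    `sessions` is a hypothetical list of sessions, each a list of query strings.
    """
    pair_counts = defaultdict(Counter)
    for queries in sessions:
        # Normalize and deduplicate so a query repeated in one session counts once.
        unique = sorted({q.lower().strip() for q in queries})
        for a, b in combinations(unique, 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return pair_counts


def suggest(pair_counts, query, k=3, min_count=2):
    """Return up to k queries that co-occur with `query` at least `min_count` times."""
    related = pair_counts.get(query.lower().strip(), Counter())
    return [q for q, c in related.most_common() if c >= min_count][:k]


# Made-up sessions, for illustration only.
sessions = [
    ["obama", "obama speech"],
    ["obama", "obama speech", "white house"],
    ["obama", "white house"],
]
pair_counts = build_cooccurrence(sessions)
print(suggest(pair_counts, "Obama"))  # ['obama speech', 'white house']
```

The weakness the post points out is visible here: a query that nobody has paired with anything yet gets no suggestions at all, however newsworthy it is.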

The query-log approach shows its limits when you deal with real-time events. In this case, there may be no time to accumulate past queries, since the events are happening right now. To handle query expansion for real-time search, a new idea is to extract fresh correlations from news events.
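The post doesn't describe the algorithm it uses, so the following is only one plausible sketch of "fresh correlations from news", not the author's method. It assumes entity names have already been extracted from each news article (extraction is out of scope) and weights each co-occurrence by an exponential time decay, so a story published right now counts fully and older stories fade. The article data, `news_correlations`, `fresh_suggestions`, and the 24-hour `half_life_hours` default are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def news_correlations(articles, now, half_life_hours=24.0):
    """Weight entity co-occurrences in news articles by recency.

    `articles` is a hypothetical list of (published_hours, entities) pairs, where
    `published_hours` is a timestamp in hours and `entities` is a list of names
    already extracted from the article text.
    """
    decay = math.log(2) / half_life_hours
    scores = defaultdict(Counter)
    for published, entities in articles:
        age = max(0.0, now - published)
        # 1.0 for a story published now, 0.5 after one half-life, and so on.
        weight = math.exp(-decay * age)
        for a, b in combinations(sorted(set(entities)), 2):
            scores[a][b] += weight
            scores[b][a] += weight
    return scores


def fresh_suggestions(scores, entity, k=3):
    """Return the k entities most strongly (and most recently) linked to `entity`."""
    return [e for e, _ in scores.get(entity, Counter()).most_common(k)]


# Made-up articles, timestamps in hours.
articles = [
    (99.0, ["Sonia Sotomayor", "Judd Gregg", "Supreme Court"]),
    (98.0, ["Sonia Sotomayor", "Supreme Court"]),
    (10.0, ["Sonia Sotomayor", "Bronx"]),
]
scores = news_correlations(articles, now=100.0)
print(fresh_suggestions(scores, "Sonia Sotomayor", k=2))  # ['Supreme Court', 'Judd Gregg']
```

The decay also lets short-lived correlations disappear on their own once the news cycle moves on, without any explicit cleanup.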

For instance, Sonia Sotomayor has just been confirmed to the high court.



Judd Gregg is one of her supporters.



And the algorithm nailed the correlation.



Now compare this with the query suggestions offered by Google, which shows no related queries because the event is too recent.



And compare it with the query suggestions offered by Bing, where the related searches shown are based on query logs.



I believe that leveraging both past query logs and real-time news events can provide a more complete and up-to-date query expansion service, since it combines the best of both worlds.
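The post doesn't say how the two sources would be combined. One simple option, offered here as an assumption rather than the author's design, is a linear blend: normalize each source's related-query scores to the range [0, 1] and mix them with a weight `alpha`. The function name `blend`, the `alpha` parameter, and the example scores are hypothetical.

```python
def blend(log_scores, news_scores, alpha=0.5, k=5):
    """Linearly combine related-query scores from query logs and from news.

    Each map is normalized by its maximum so that raw session counts and
    decayed news weights share a [0, 1] scale. `alpha` is a hypothetical
    mixing weight: 1.0 uses query logs only, 0.0 uses news only.
    """

    def normalize(scores):
        top = max(scores.values(), default=0.0)
        return {q: s / top for q, s in scores.items()} if top > 0 else {}

    logs, news = normalize(log_scores), normalize(news_scores)
    combined = {
        q: alpha * logs.get(q, 0.0) + (1 - alpha) * news.get(q, 0.0)
        for q in logs.keys() | news.keys()
    }
    # Highest combined score first; ties broken alphabetically for stable output.
    return sorted(combined, key=lambda q: (-combined[q], q))[:k]


# Made-up scores for the query "sotomayor".
log_scores = {"sotomayor hearings": 10, "sotomayor biography": 5}
news_scores = {"judd gregg": 1.9, "supreme court": 0.9}
print(blend(log_scores, news_scores))
# ['judd gregg', 'sotomayor hearings', 'sotomayor biography', 'supreme court']
```

Max-normalization is a crude way to make the two scales comparable; `alpha` could also be tuned per category, for example leaning on news for breaking stories and on logs for evergreen queries.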



(PS: Note also that both Bing and Ask show a fresh related video, while Google does not.)

3 comments:

  1. Congratulations to all the people involved! :)

  2. How is the CTR looking? While the algorithm is key, if its suggestions are not what users are looking for, the science behind the magic makes no difference.

  3. It depends on the category and the query.

    For instance, if you submit a query about gossip, you may want to explore related entities.

    Suppose that you search for "Madonna" and there is a rumor that she is having an affair with "Jesus Luz"; then you may want to click on his name to explore that relation. This is an ephemeral correlation that may disappear after a while.

    Another example: suppose that you search for "Microsoft" in business news and it has just closed a deal with "Yahoo!". In this case, you may want to click on that related company to explore that side of the story.

    CTR will differ across categories, but the algorithm captures fresh, real-time correlations.
