Friday, January 18, 2013

Spelling, Spelling, Spelling. Facebook Graph Search and Bing Web Search

Continuing my personal analysis of the integration between Facebook Graph Search and Bing Web Search, here is another example of how you can get the best of both worlds. In this case, I will discuss misspelled words, a common problem in English (see, for instance, this funny article: "Why misspelled names are so common and what journalists are doing to prevent them").

Let's suppose that users start to type {swarzin}: their intent is to get information about the actor, but they are unsure of the correct spelling. As you can see in the image below, Bing's Speller processes tens of millions of data points mined from searches, web pages, clicks and user actions to help you find the right "Schwarzenegger".

Another example is {terramis}, a delicious Italian cake whose correct spelling is {tiramisu}; technically, {terramis} is a misspelled query prefix for the full query {tiramisu}. Other misspelled prefixes are {bufet warr} and {britnay s}, one of the myriad variants of that query. In all of these cases, Bing nails the correct suggestions, and they are integrated into Facebook typeahead search.

Bing has made a significant investment in building a state-of-the-art speller, and those technologies are also leveraged by our team in London to secure the right suggestions. In this context, the additional problem is that we are dealing with misspelled fragments, namely the query prefix: we need to correct the errors and, at the same time, predict which suggestions are the most relevant. And we need to do this in real time, every time a new character is typed!
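To make the problem concrete, here is a minimal toy sketch in Python. It is not how Bing's speller works (its internals are not described in this post); it only illustrates the idea of correcting a misspelled prefix and ranking suggestions at the same time. The names `prefix_edit_distance`, `suggest`, and the `popularity` table with its counts are all hypothetical, and the brute-force scan over every candidate would be far too slow for real typeahead traffic.

```python
def prefix_edit_distance(typed: str, candidate: str) -> int:
    """Smallest Levenshtein distance between `typed` and any prefix of `candidate`."""
    # prev[j] holds the distance between the typed characters seen so far
    # and the first j characters of the candidate.
    prev = list(range(len(candidate) + 1))
    for i, typed_char in enumerate(typed, start=1):
        curr = [i]
        for j, cand_char in enumerate(candidate, start=1):
            cost = 0 if typed_char == cand_char else 1
            curr.append(min(
                prev[j] + 1,         # drop a typed character
                curr[j - 1] + 1,     # add a missing character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    # The user has only typed a prefix, so the candidate may end anywhere.
    return min(prev)


def suggest(typed, popularity, max_edits=2, limit=3):
    """Rank candidates by fewest edits first, then by (hypothetical) popularity."""
    scored = []
    for query, count in popularity.items():
        distance = prefix_edit_distance(typed.lower(), query.lower())
        if distance <= max_edits:
            scored.append((distance, -count, query))
    scored.sort()
    return [query for _, _, query in scored[:limit]]


# Made-up query counts, for illustration only.
popularity = {"tiramisu": 900, "tiramisu recipe": 400, "terrace": 300}
print(suggest("terramis", popularity))
```

With this made-up table, {terramis} is two edits away from a prefix of {tiramisu} (one substitution and one deletion), so the tiramisu queries are kept and ordered by count, while {terrace} falls outside the edit budget. A production system has to reach the same kind of answer without comparing the prefix against every known query on each keystroke.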

Can you imagine how fast the predictions and corrections need to be?

1 comment:

  1. Congratulations to you and your team on the speller implementation. I would expect the predictions and corrections at the speed of light :). I hope it will be a reality in the future.

    The world's data is growing exponentially day by day. There seems to be a lot of demand for data scientists and engineers (link below).

    Apart from core algorithmic and data structures skills, what would be your suggestion to existing or upcoming engineers who want to excel in search and data mining/BigData technologies?