Tuesday, February 5, 2013

Facebook's naive graph search and the need for proper query understanding algorithms

Search engines use multiple data sources to help users formulate their queries. Three of those sources are especially important:
  • The web graph, which includes all the web pages published on the net. Mining the content of those pages and their connections can be very useful for detecting topics/entities and for understanding the correlation among those topics/entities;
  • The query graph, which includes all the queries submitted to a search engine in the past. Mining this graph can be useful for detecting the needs expressed by past users and how those needs change over time;
  • The blog and news graph, which is useful for detecting fresh content.
Facebook is very interesting for search because they have access to other sources of data including:
  •  The friend graph, which keeps track of all the friendship relations for each user;
  •  The content graph, which includes all the postings and the content produced by your friends.
However, it seems that Facebook lacks quality access to the first three types of information. Here are some examples:

Freshness
Today, the most important financial news is Dell's acquisition, and every newspaper and financial web site is reporting it. Bing and Google both have this information in their suggestions, while Facebook is completely missing it. It seems they are not analyzing the news graph (the third source in the list above).


In the example below, the last two suggestions are provided by Bing. Facebook just tries to complete the prefix {microsoft and} with terms such as {London} and {Pisa}, which are popular in my social graph but have NO correlation with Microsoft or with today's news.



Query understanding
If you spend some time analyzing Facebook's suggestions, it's clear that they create synthetic suggestions by trying to complete every query prefix with terms popular in my social graph. This approach is naive because they DON'T seem to understand the meaning of what the user is looking for. Here are some more examples:


Again, "Happy Diwali", "Spotify", "Microsoft" and "London" are popular in my social graph but they have NO correlation with Tom Cruise. Still, Facebook is naively generating those suggestions.
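
The pattern described above can be sketched in a few lines of Python. This is only a guess at the observed behavior, not Facebook's actual implementation: the function name, the term list and the counts are all made up for illustration. The point is that the completion depends only on what is popular in the social graph, never on the prefix itself.

```python
from collections import Counter


def naive_suggestions(prefix, social_terms, k=3):
    """Append the k most popular social-graph terms to the prefix.

    The prefix is never inspected, so the same completions come back
    for any query. This mimics the behavior observed in the post; it is
    an assumption, not a known Facebook algorithm.
    """
    popular = [term for term, _ in Counter(social_terms).most_common(k)]
    return [f"{prefix} {term}" for term in popular]


# Hypothetical terms seen in one user's social graph (made-up data).
social_terms = [
    "London", "London", "London",
    "Happy Diwali", "Happy Diwali",
    "Spotify",
]

print(naive_suggestions("tom cruise and", social_terms))
print(naive_suggestions("microsoft and", social_terms))
```

Both calls produce the same three completions, which is exactly why the suggestions look unrelated to the query.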


A proper search engine understands the right correlations by analyzing the Web graph, the query graph and the news graph, something that Facebook is missing at the moment.
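
As a rough illustration of what mining the query graph could look like, here is a minimal sketch that counts how often known entities co-occur with a given entity in past queries. The query log, the entity list and the function name are hypothetical, and substring matching stands in for real entity detection; production systems use far richer signals than raw co-occurrence counts.

```python
from collections import Counter


def correlated_entities(entity, query_log, known_entities, k=3):
    """Rank known entities by how often they co-occur with `entity`
    in past queries (toy co-occurrence counting, not a real ranker)."""
    entity = entity.lower()
    counts = Counter()
    for query in query_log:
        q = query.lower()
        if entity not in q:
            continue  # only queries mentioning the entity are evidence
        for other in known_entities:
            if other != entity and other in q:
                counts[other] += 1
    return [name for name, _ in counts.most_common(k)]


# Hypothetical query log and entity dictionary (made-up data).
query_log = [
    "tom cruise katie holmes divorce",
    "tom cruise and katie holmes",
    "tom cruise scientology",
    "microsoft london office",
    "happy diwali wishes",
    "tom cruise nicole kidman",
]
known_entities = [
    "tom cruise", "katie holmes", "scientology", "nicole kidman",
    "london", "happy diwali", "microsoft",
]

print(correlated_entities("tom cruise", query_log, known_entities))
```

With this toy log, entities that only appear in unrelated queries, such as "london" or "happy diwali", never show up as completions for "tom cruise", however popular they are elsewhere.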


Many web sites out there clearly say that Tom Cruise is correlated with {scientology}, {katie holmes} (his ex-wife), {suri} (his daughter), {cameron diaz} (since they appeared in films together), and {nicole kidman} (an earlier ex-wife). However, Facebook does not capture the intent of the user's query and does not understand the correlation among those entities. Let's see another example:


The last suggestion is from Bing, while Facebook keeps suggesting the same {Happy Diwali} item, which is popular in my social graph. Again, they lack quality access to the second and third sources of information in the list above.


Bing (and Google) analyse the Web graph, the query graph, the news graph and many other sources, and therefore they detect the right correlations among entities (strangely enough, Google misses the correlation between {Obama} and {Michelle}).


Please note that if you try the above examples you may get different suggestions from Facebook, because they are personalized with terms popular in your own social graph. However, it's easy to reproduce similar naive suggestions once you understand the pattern.


2 comments:

  1. Antonio, working on the project was really painful for anybody with an IR, ML, and data mining background who isn't crazy about empty "graph" and "social" rhetoric :). I think you understand my decision better now.

  2. One thing you are missing is that Facebook Graph Search is designed for searching your social network:

    It doesn't make sense to search your social graph for "Tom Cruise and Katie Holmes divorce" or "Obama gun control", while it makes perfect sense for web search.

    You are comparing two classes of very different products.

    I agree with you, however, that the Graph search suggestions are not very satisfying (yet).
