Sunday, March 13, 2011

From objects to subjects: Next possible scenarios for Social search

We are living in an age of amazing transformation for Web search, one that was very hard to forecast until two or three years ago. In a sense, search is a teenager and it is about to become an adult. A teenager plays with objects, an adult interacts with other subjects.

Search was born circa 1995 with Lycos, an open source project from CMU. Do you remember "pursuit"? It was a mix of Perl scripts and C code which offered (almost) everything a modern engine offers, including crawling, indexing, search, dynamic caption generation, and textual ranking. Well, the Web was kind of small at that time: say, fewer than 10 million pages. So, when the Web became larger, we saw the rise of AltaVista, which added the ability to scale to an index of hundreds of millions of pages. Anyway, the quality of search was still poor because the ranking was purely a textual match.

Everything changed in 1999 when Google started to consider the Web as a graph with connections among pages. PageRank was an idea already used in the academic world, where a paper is important if it is frequently cited by other (important) papers, but the wonderful intuition of Google was to apply the same principle to the Web. In doing so, they created a new type of science. Also, Google had the superb merit of understanding how to make money out of search, when everyone else was considering it a commodity with no real business opportunities. They introduced an auction system and created an ecosystem with users, publishers and advertisers, and a new science (of making money). In the blink of an eye Google became a verb, a monopoly, and an incentive to reduce our inclination to learn and reason ("People don't think, they Google!", said my friend Alessio Signorini).

Nowadays, Bing and Yahoo! have the merit of fighting the monopoly and using massive machine learning to systematically improve the quality of results, which are also more and more visual.

So in 1995 we had almost everything, with the only exceptions being scale (AltaVista), good ranking (Google), money (Google), and machine learning and visual UX (Bing). If you think about it, in 16 years we saw many changes but none was dramatic (maybe the only one was starting to make money, meaning that the search kiddie learned how to stand on her own legs). Why? Well, the Web was still the same. It was a collection of objects (pages, images, videos, news, and so on and so forth). Surely, we moved from, say, a couple of million objects to hundreds of billions of objects (or trillions? ;-) and therefore the data mining problems to solve are fascinating and ever changing, but still we are talking about objects.

Now, two years ago something changed. The Web is no longer just a collection of objects: we are starting to see people, subjects, organized in a social network. Facebook counts more than 600 million subjects, connected in a fascinating structure and very active in making instantaneous comments about the world, sharing objects, and liking or disliking activities carried out by other subjects. Twitter offers a similar perspective.

Think about it: this is a complete change of search paradigm! You no longer search just objects, you can now start to interact with subjects.

Phase I

If you look carefully, this change is already starting to happen. Today, if you search for Web pages (a type of object) on Bing, the results are enriched with information coming from Facebook about the subject who performed an action related to that page (or that object). This action can be either sharing that particular link or liking that particular page. In this case, Bing shows a picture of the person, the subject, who carried out that particular action. Let's call this phase one.

Phase II

What can happen in the future? The only limit is our imagination. If you think in terms of objects and subjects, you understand that, in phase I, search is annotating objects with the subjects performing actions on those objects. But it is still a search about objects. Now what is the next step? How about a search directly mixing objects and subjects? In other words, can I submit a query and get as a result the best subject, the best expert on that query? My friend Tony is my expert for handcrafted lights, and my friend Max doesn't know this. Can Max get Tony as a result of a query for {handcrafted lights}, in addition to the most relevant sites out there? Think about it: this could easily be a new type of science where we learn the expertise of subjects by observing what types of objects they produce and what types of objects they annotate. Once we have learned this information, we can propagate it across the graph of interconnections among subjects. Sometimes, we can learn about expertise of friends that we were not aware of (my friend Ian is an expert in cycling and I was not aware of it). Sometimes we can introduce friends who don't know each other once we discover their expertise. Sometimes we can produce a set of subjects (e.g. persons) as the result of a query.
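To make the idea a bit more concrete, here is a minimal sketch in Python, not a real system. It assumes we can observe (subject, topic, action) triples, it weights producing an object more than annotating one, and it only returns subjects within two hops of the searcher in the friendship graph. The weights, the two-hop limit, the function names, and the toy data (including the "me" and "Stranger" nodes) are all my own assumptions for illustration.

```python
from collections import defaultdict, deque

# Hypothetical weights: producing an object says more about expertise
# than sharing or liking one. The values are arbitrary.
ACTION_WEIGHTS = {"produce": 3.0, "share": 1.0, "like": 0.5}


def learn_expertise(actions):
    """Aggregate (subject, topic, action) triples into per-topic scores."""
    scores = defaultdict(lambda: defaultdict(float))
    for subject, topic, action in actions:
        scores[topic][subject] += ACTION_WEIGHTS[action]
    return scores


def reachable(friends, start, max_hops):
    """Breadth-first search: subjects within max_hops of start (start excluded)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue
        for friend in friends.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    del hops[start]
    return hops


def experts_for(topic, scores, friends, searcher, max_hops=2):
    """Rank nearby subjects by their learned score for the topic."""
    nearby = reachable(friends, searcher, max_hops)
    ranked = [(s, v) for s, v in scores.get(topic, {}).items() if s in nearby]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Toy data: Max and Tony are both my friends, but they do not know each other.
actions = [
    ("Tony", "handcrafted lights", "produce"),
    ("Tony", "handcrafted lights", "share"),
    ("Ian", "cycling", "produce"),
    ("Stranger", "handcrafted lights", "produce"),
]
friends = {
    "Max": ["me"],
    "me": ["Max", "Tony", "Ian"],
    "Tony": ["me"],
    "Ian": ["me"],
}

scores = learn_expertise(actions)
lights_experts = experts_for("handcrafted lights", scores, friends, "Max")
cycling_experts = experts_for("cycling", scores, friends, "Max")
print(lights_experts)
print(cycling_experts)
```

In this toy graph Max reaches Tony through me, so Tony can show up for {handcrafted lights}, while the unconnected "Stranger" is left out. A real engine would blend such subjects with the usual object results.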

Also, we clearly need a robust and consistent model for ranking subjects. We probably have many personas, including producers, early adopters, influencers, and the masses. For each topic, we can follow the temporal evolution of likes and create an evolving ranking.
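One possible way to get an evolving ranking, sketched below under my own assumptions, is to let each like lose weight over time with an exponential decay. The half-life of 30 days, the day-based timestamps, and the subject names are hypothetical; nothing here claims to be how a real engine ranks subjects.

```python
def decayed_score(like_days, now, half_life_days=30.0):
    """Sum of likes received up to 'now', each halved every half_life_days."""
    return sum(
        0.5 ** ((now - day) / half_life_days)
        for day in like_days
        if day <= now  # likes from the future are not visible yet
    )


def rank_subjects(likes_by_subject, now, half_life_days=30.0):
    """Rank subjects for one topic by decayed like count, best first."""
    scored = [
        (subject, decayed_score(days, now, half_life_days))
        for subject, days in likes_by_subject.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy data for a single topic: the day on which each like was received.
likes_by_subject = {
    "producer": [0, 1, 2],          # popular early on
    "early_adopter": [58, 59, 60],  # popular recently
}

ranking_day_2 = rank_subjects(likes_by_subject, now=2)
ranking_day_60 = rank_subjects(likes_by_subject, now=60)
print(ranking_day_2)
print(ranking_day_60)
```

With this data the producer leads on day 2 and the early adopter leads on day 60, so the ranking follows the temporal evolution of likes instead of a lifetime total.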

So, phase II would be about moving from a Web of Objects to a Web of Objects and Subjects.

Is this the evolution of the adolescent search into a more mature subject?

Phase III

Now let's assume that we have a Web of Subjects and Objects. How can this impact the art of making money? Can we sell the expertise of subjects and transform the ad space?

In my view, once you discover expertise by observing public behaviors, you can match different subjects and allow them to compete in selling their expertise with an auction-based mechanism similar to the one adopted for ads.

Sometimes, two matching subjects agree to exchange expertise for free because they are friends or because they enjoy helping each other. Sometimes, when the expertise is more professional and becomes a Service offered by Subjects, several experts can compete to offer their expertise at the best price. In this case, the matchmaker takes a little margin for the cost of this transaction. As an example, my friend Ralf is an expert in non-Bayesian prediction, and my friend Bret may not be aware of it. So Bret may want to talk with Ralf about his expertise.

So, phase III would be about moving from a Web of Objects to a Web of Objects, Subjects, and Services.
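Here is a small sketch of how such a competition could work. I am assuming a reverse second-price rule: the expert asking the lowest price wins and is paid the second-lowest ask, and the matchmaker keeps a fixed fraction. The post only says the mechanism would be similar to the one used for ads, so the pricing rule, the 5% margin, and the names other than Ralf are hypothetical.

```python
def run_expertise_auction(asks, margin=0.05):
    """Pick the cheapest expert; price is the second-lowest ask (assumed rule).

    asks   -- dict mapping expert name to asking price
    margin -- fraction of the price kept by the matchmaker (hypothetical)
    """
    if not asks:
        raise ValueError("at least one expert must submit an ask")
    # Sort by price, then by name so that ties are broken deterministically.
    ranked = sorted(asks.items(), key=lambda item: (item[1], item[0]))
    winner, lowest_ask = ranked[0]
    # With a single expert there is no second ask, so use their own price.
    price = ranked[1][1] if len(ranked) > 1 else lowest_ask
    fee = round(price * margin, 2)
    return {
        "winner": winner,
        "buyer_pays": price,
        "matchmaker_fee": fee,
        "expert_receives": round(price - fee, 2),
    }


# Toy example: three experts compete to advise Bret on prediction.
asks = {"Ralf": 80.0, "expert_b": 100.0, "expert_c": 120.0}
outcome = run_expertise_auction(asks)
print(outcome)
```

The second-price rule is chosen here only because it is a common design in ad auctions; a first-price rule, or a ranking that also weighs the learned expertise score, would be equally plausible variations.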


The Web is evolving from a collection of objects (pages, news, images, videos, and so on and so forth) into a collection of objects and subjects interacting with each other and eventually offering services. In my view, Social search is all about this shift in paradigm.

Now, let's step back for a little while and ask ourselves: is this a generalization of a model seen elsewhere? I just said: "PageRank was an idea already used in the academic world, where a paper is important if it is frequently cited by other (important) papers, but the wonderful intuition of Google was to apply the same principle to the Web".

Think about it: in the academic world a paper, an object, is important if it is frequently cited by other (important) papers or by other (important) authors, the subjects. This is the natural evolution of PageRank. In addition, an author, a subject, is important if other authors frequently cite her. More on this point: we can learn the expertise of an author by observing the papers she publishes, we can match authors even if they don't know each other once we learn about their expertise, we can learn about new expertise of authors we already know, and so on and so forth.
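The analogy can be sketched with a few lines of Python. Below is a plain power-iteration PageRank (damping factor 0.85, a conventional choice) applied twice: once to a toy paper citation graph, the objects, and once to an author graph derived from it, the subjects, where author A links to author B if a paper by A cites a paper by B. The tiny dataset, the names, and the way the author graph is derived are my assumptions for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> iterable of targets."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, ())
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (cites nobody): spread its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Objects: which paper cites which (toy data).
citations = {"p1": ["p3"], "p2": ["p3"], "p3": [], "p4": ["p1", "p3"]}
# Subjects: who wrote each paper (toy data).
authors_of = {"p1": ["alice"], "p2": ["bob"], "p3": ["carol"], "p4": ["bob"]}

# Derive the author graph: A -> B if a paper by A cites a paper by B.
author_links = {a: set() for names in authors_of.values() for a in names}
for paper, cited_papers in citations.items():
    for cited in cited_papers:
        for citing_author in authors_of[paper]:
            for cited_author in authors_of[cited]:
                if citing_author != cited_author:  # ignore self-citations
                    author_links[citing_author].add(cited_author)

paper_rank = pagerank(citations)
author_rank = pagerank(author_links)
top_paper = max(paper_rank, key=paper_rank.get)
top_author = max(author_rank, key=author_rank.get)
print(top_paper, top_author)
```

The same function ranks objects and subjects; only the graph changes. On the Web, pages would play the role of papers, and the people who produce, share, or like them would play the role of authors.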

In short, the paradigm shift exists elsewhere and we just need to apply it to Social Search. Google focused on the objects (the papers, the pages), Social search is about objects and subjects (the authors and their expertise).


The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of Bing, Microsoft. Examples of analysis performed within this article are only examples. Assumptions made within the analysis are not reflective of the position of Bing, Microsoft.

I wrote this article during a 1.5-hour flight while listening to Greatest by Duran Duran.

