## Tuesday, November 29, 2011

### Six degree of separation and Facebook study

Recently Facebook and a group of researches from University of Milan, released a super fascinating study stating that there are 4.3 degrees of separation on Facebook Social graph. The study is inspired by the classical Six degree experiment which has been made popular by several movies and scripts.

One point is not clear to me and maybe someone can help me in understanding better. The classical Six degree experiment states that I can forward the message if I know the target person directly by name. If this is not the case, I can forward the message to someone I believe is closer to the target person. I always understood that there is a non null probability that the man in the middle can drop the message, either because he doesn't know the target or because he is not interested. In my view, this not null probability models my personal interest for the target person and how strong is our relationship. The closer I am, the smaller would be the probability of dropping the message.

On Facebook the fact that I am connected to someone doesn't necessary meaning that I would always send a message to him/her. Maybe, I missed this part in the paper and would love to understand whether the 4.3 degrees of separation is also modelling the fact that a message can be dropped with a non null probability.

Someone can help me in better understanding?

1. My understanding (as usually uneducated :) in the FB case these "degrees" are simply an statistical estimation of the graph diameter and nothing else.

2. I can! Actually, Milgram's experiment was for us only a source of inspiration, but clearly what we did and what Milgram did are quite different things (we try to make this clear straight from the introduction).
If you read carefully Milgram's work, it seems evident (at least, to me) that what he WANTED to do was actually measuring shortest-path distances in an acquaintance graph: let me (at least for the time being) assume that we all agree on what "acquaintance" means and let us think that "acquaintance" in life is the same as in fb.
In this view, what we did is precisely what Milgram wanted to do.

Reading his papers (and the papers that inspired his work, in particular the one by Rapoport and Horvitz), it is clear that his "experiment" was just an attempt to circumvent the problem of not being able to access acquaintance information directly.
This way, he obtained a completely different measure, that I may call "routing distance" and that is, in a sense, much more interesting than simple "shortest-path distance". This is what you are probably thinking of in your note.

The mathematical relation between the two measures is clear: ours is a lower bound for his. The practical implications of the two measures are anyway completely different. The problem with Milgram's experiment in the proper sense is that I don't see a way to replicate it without the intervention of human beings, or without a firm mathematically precise notion of "knowledge" or "information" that we currently miss.

There is still one point that is worth being discussed: the existence non-completed chains.
In our setting, non-completed chains do not exist, at least as long as the graph is connected; in Milgram's experiment most of the chains where uncompleted (more than 2/3). He spends many pages to try to understand why those chains stopped at some point. Well, the bare truth is that we will never know, but the explanation is probably of larger interest to sociologists than it is to me. I can make thousands wild guesses, probably all of them wrong.

3. Paolo, thanks a lot for your detailed explanation. I agree that your measure is the shortest path distance in the acquaintance graph, and this can be also seen as the 'routing distance' in this graph (btw similar to OSPF routing protocol).

Probably my understanding of Milgram's work is just uneducated or driven by the empirical intuition that there is a non zero cost c of carrying the message, or a non zero probability p to drop it. Sometime you don't know who can be a potential target, or a good man in the middle for routing the message, sometime you are too lazy to go to the post office to send the letter, sometime you don't have a stamp, sometime you don't want to disturb your near-friends. In these cases, I would model a reset probability that in case of drop would restart and inject a new message in the system. In other words, there is some measure of Uncertancy that would make the model closer to the intuition.

Think about routing a message in LinkedIn and asking to one of your friend to forward the message to someone you want to contact and they know directly. They may drop the message and you need to start again.

In any case, I understand that this would give a different measure and your measure is actually a lower bound of it.

4. Paolo: Yes, actually this is a realistic model of what probably happens in reality; it would be cool to find some way to experiment how close it is to what actually takes plac

5. This smells like a mix of random walk with restart (reset probability), plus a non uniform jump distribution modelling how close I estimate are my target and the men in the middle (P). The probability of fwd the message can be probably described by using a sigmoid function like in neural networks (the activation intuition is probably the same). Just my 2cents. Btw, Max is the guy for estimating P. I know, I see a ranking problem everywhere