Tuesday, November 29, 2011

Six degree of separation and Facebook study

Recently Facebook and a group of researches from University of Milan, released a super fascinating study stating that there are 4.3 degrees of separation on Facebook Social graph. The study is inspired by the classical Six degree experiment which has been made popular by several movies and scripts. 

One point is not clear to me and maybe someone can help me in understanding better. The classical Six degree experiment states that I can forward the message if I know the target person directly by name. If this is not the case, I can forward the message to someone I believe is closer to the target person. I always understood that there is a non null probability that the man in the middle can drop the message, either because he doesn't know the target or because he is not interested. In my view, this not null probability models my personal interest for the target person and how strong is our relationship. The closer I am, the smaller would be the probability of dropping the message.

On Facebook the fact that I am connected to someone doesn't necessary meaning that I would always send a message to him/her. Maybe, I missed this part in the paper and would love to understand whether the 4.3 degrees of separation is also modelling the fact that a message can be dropped with a non null probability. 

Someone can help me in better understanding?

Sunday, November 27, 2011

Find the k-th element in a list

You don't know the list length. And it should be optimal in time and space.


love this problem, apparently it's easy but you have to find an optimal solution (sublinear and space constant)

Friday, November 25, 2011

Given a string replace any running sequence of character

with the character followed by the number of repetition. This is called run-length-encoding

Thursday, November 24, 2011

Wednesday, November 23, 2011

((X *) 0)+1)

what is this strange thing computing?

Tuesday, November 22, 2011

Monday, November 21, 2011

Green eyes and Blue eyes

In a remote island, people have blue eyes. People with green eyes can sometime appear but they have to leave the island as soon as they realize that they don't have blue eyes. One day a visitor says: "i see that some of you have green eyes". For the sake of simplicity, let's assume that the people can meet just once per day at a public Town Hall. What will happen after the visitor's claim?

Sunday, November 20, 2011

Deleting integers numbers

You have a sequence of N integers and you randomly pick two of them. If they are both odd or both even then you remove them, if they are different you add an additional even number. Can you forecast what would be the final number left?


You need to consider what is the starting N. Is it odd or is it even?

Friday, November 18, 2011

Jumping stairs

There is a stair with N step. You can jump either 1 step or 2 steps at time. How many different ways do you want to reach the top?


N = 1 --> 1
N = 2 ,,  1s+1s, 2s -> 2
N = 3 , 1s+1s+1s, 2s+1s, 1s+2s -> 3
N = 4, 1s+1s+1s+1s, 1s+1s+2s, 1s+2s+1s, 2s+1s+1s, 2s+2s -> 5
N = 5, 1s+1s+1s+1s+1s, 1s+1s+1s+2s, 1s+1s+2s+1s, 1s+2s+1s+1s, 2s+1s+1s+1s, 1s+2s+2s, 2s+1s+2s, 2s+2s+1s = 8

so and why?

Thursday, November 17, 2011

Compute the binomial coefficient

(n k) in an optimal way. What is the complexity?

Cannot just compute it directly because you will overflow. Think about how we optimize Fibonacci computation

Tuesday, November 15, 2011

Given a 0/1 matrix

Transform it in such a way that if M[i][j]=1 then you toggle all the row m[i] and all the column m[j]

Monday, November 14, 2011

Facebook is associating Places to Albums

You know that I am a fan of putting images into locations and that I've shipped a prototype for adding this feature to Facebook. Apparently, Facebook itself is now suggesting to the user to tag albums with geo-locations.

Still I have a request for my Friends working at Facebook. Can you please open the Location API so that one and read and write the album location from an app?

Saturday, November 12, 2011

Editors' Picks

Editors’ picks are a collection of editorially curated sites that support task-based, spiking and seasonal based searches. Shipped by our teams in UK, Poland and US.


Friday, November 11, 2011

Big numbers. sum them

BigInt is a class storing big integers as lists. Implement BigInt and the code to add two BigInts

Thursday, November 10, 2011

Playing with addresses

Detect if a machine is Big Endian or little endian. Detect if the stack will grow up or down

Wednesday, November 9, 2011

Monday, November 7, 2011

Erlang and CSP

When I was at university, I enjoyed studying CSP, ECSP and Occam. How funny to find similar concepts in Erlang -- pretty lovely language, I'm enjoying reading books about it.

Sunday, November 6, 2011

what is this computing?

template<int n> struct funStruct
    enum { val = funStruct::val + funcstruct::val };
template<> struct funStruct<0>
    enum { val = 1 };

template<> struct funStruct<1>
    enum { val = 1 };

Saturday, November 5, 2011

Find the minimum with no comparison operation

Given two integers compare them, with no comparison operator. How many different ways can you suggest?

I have one. others out of there?

Friday, November 4, 2011

Dropping Balls

You are given K balls and a building with N floors. You know that there is a floor X < N such that when you drop the ball it crashes to the ground of the building. Below X it does not crash, Above X it will. What is the minimum number of ball drops you need to determine the point of break.


This is a classical G question and there is a simplified variant.  "There are 100 floors in a building. If you drop a glass ball from a floor down to the ground, it may or may not break. There's this threshold floor where dropping a ball from this floor or the above breaks the ball but dropping a ball from any floor below it won't. Now you are given only two glass balls to locate the threshold floor. How do you find it quickly?"

1st attempt: if you start from binary search then you are going in the wrong direction. K will be O(log N) but the key observation is that K is a constant here and there is the risk that you end your balls with not result. In the above example it's clear that you can consume 2 balls without getting a result with N=100.

2nd attempt: if you have K=1 there is nothing else you can make but a linear search starting from floor 1, and there is the implicit assumption that you can go back to the ground floor collect the surviving ball and redrop it. If you have K=2,  you can start from level 2: if it breaks, you can use the remaining one and go to level k-1 and conclude your test, if it does not break you still have two balls. The key intuition is that you can go up to the K level and if it breaks you can use the remaining one for K-1 attempts. So you can adopt the following approach for selecting the steps

K, K+(K-1), K+(K-1)+(K-2), K + (K-1) + (K-2) + (K-3) -> N = K + (K-1) + (K-2) + .. + 2 +1 = K (K+1)/2

2, 2+1, 2+1+

which gives you a way to get K given N. Now, you can generalize with more than 2 balls

Thursday, November 3, 2011

Circles, Smart Lists are they enough? the case for Topic MessageBoxes

Google+ introduced the idea of sharing your social updates with a dynamically created subset of your friends (e.g your "circles" of friends). This was a kind of evolution of previous Facebook model where the user was allowed to share just with statically predefined groups (e.g, just my direct friends, just my friends of friends, or everyone). Facebook answered to "circle" by presenting "smart lists" which are essentially doing .. exactly the same thing.  Well, sometime I fell like both circles and smart lists are limiting to my ability to share. Let me explain why...

One of my hobby are the "Acquaria" so let's suppose that I want to share an update about this topic. What I need to do? 
  • Is this private? so just for my direct friends? probably not because many of my friends are not sharing this hobby and I don't want to bother them;
  • Is this for my friends of friends? hmm, I would say no because of the same reason;
  • Is this public? certainly it is not, it would increase the noise received by many users and just few of them can see any value in my message;
  • Should I create a circle/smartlist for this topic? Hmm, maybe..  but hey wait! are you saying that I need to create a circle for each topic associated with my status updates. Do you know that creating circles is expensive and boring? I cannot pick all the users i want for each single message -- this is not functional;
What I simply want is to share a message and say that this message is about this topic. If you are interested in the topic you should get the message, if you are not interested in the topic you should not get the message. As simple as this. Plus, you maybe interested in a topic even if you are not (already) my friend  or a friend of my friends. We don't know each other, but you may still see a value in my message and have a good experience.

You are probably following me by now. Suppose that I put my message in a messagebox and there is a label associated to this messagebox describing the topic of the contained messages. I don't have the explicit need of saying who are the recipients of my message I just indicate the messagebox. If you are interested in that topic you just open the messagebox because you already expressed your interest. And I have the freedom of choosing the appropriate messagebox as I like.

There are several interesting problems for this new model based on topic messagesboxes.

First, how do you rank the messages for each messagebox? Well, I envision a combination of personalization techniques and user ranking. If the messages is coming from my friends than this is a strong signal of preference, if a message is coming from a person that I don't know but this person has been already considered authoritative for that topic that this is another strong signal, if a message is coming from a person that I don't know but is authoritative person than this is another strong signal. All those signals can be combined together using some regression model such as Gradient Boosting and can be evaluated according to the interactions generated for each message. Also, I need to define a user ranking based on his previous scores. All of those are interesting ML problems. (yes, Ian this would be your dream i know).

Second, how can I generate the topics associated to the messagesboxes? There are multiple options. I certainly want to have a set of predefined topics (maybe suggested via an autosuggest feature). In addition, I want to give to the user the ability of defining new topics. In any case, I need to rank those topics according to the user past preferences.

Third,  I want to have the freedom to subscribe to specific messageboxes and get updates as soon as new messages arrive. This would be a powerful way to explore the system and get value from the messages even from unknown people. If you think about it, this is also an opportunity of creating new friends.

I believe that you are following me by now. 

So let's repeat the above experiment and let me introduce Maya she has a passion for {romantic movies} and she wants to get as much suggestions as she can about these movies. How can you help her? Circle, Smart lists or Topics?

Wednesday, November 2, 2011

You have 500 bits

At step 0 all the bits are 0
At step 1 you toggle all the bits;
At step 2 you toggle all the even bits;
At step i you toggle all the bits in position k * i, with k an integer number

What are the bits turned on at the end?


take one bit say 26th this is toggled at steps 1, 2, 13, 26
take another one say 81 this is toggled at step 1, 9, 81

can you infer something in common?

Tuesday, November 1, 2011

Innovating the in the search space? is the problem solved?

Search is such a fascinating world. After 15 years for me in this space, there is always the temptation of saying "The problem is solved" or "Already saw that!".  Are you sure?  These days I am asking to myself: "is there a way to innovate and disrupt the space with something completely new?" I have some ideas about that, but before going there let me start from the basics, and show that there is a user need that is currently not satisfied while it should be.

Let's suppose that Bob is searching for the query "Cinema". So, what the search engines are doing nowadays? Well we have a bunch of static ranking functions based on links we found on the web graph, and a bunch of dynamic machine learning techniques which try to improve the experience by leveraging aggregated past users' queries, and we have a bunch of personalization techniques based on queries that Bob submitted in the past. So far so good. Is this enough? Simple answer NO

Why? Because it would be absolutely disruptive to know that Bob is actually 24 years old, that he lives in Baltimore, that he just recently checked in this particular place down town Baltimore, that he has friends that recently saw these films and they liked, that Bob himself read recently this book, that he loves these songs, and so on and so forth. Probably, you are starting to follow me. If not just let me explain myself. 

The fundamental assumption that we are making in search is that a query is a good proxy for representing the user's needs. But that's non necessarily always a correct assumption. Actually, It's the user himself that is a good proxy for the representing the user's need. The user and all the structured data she generates during all her life. Nothing more nothing less. If I know that you are currently in this place, and that you have read (and liked) three weeks ago a book written by Ken Bruen  (say the 2001 fictional crime novel), I would probably return the geographically closest cinema where they are currently representing the film London boulevards, because that's big screen adaptation of the book. You are 24 years old and a lot of users have liked that film in this demographic band.

Probably, you understood the impact of this approach by now. The query "cinema" itself it's not bringing enough context to provide a good experience for that specific user, because the search engine doesn't know that specific user enough. It's the user himself that is bringing this contextual information and it would be great to leverage it.

Now, two observations are in order.

First, current personalization techniques are again making the fundamental assumption that queries are a good proxy for the user. So, they enrich the current query with a context given by queries submitted in the past. This is a good direction to pursue, but it is not as rich to mine as the context that the user can bring with himself due to his past social activities.

Second, a critique that you have with personalization is that you actually don't need it because it's very inexpensive to refine a query if you are not satisfied by the current results. I don't buy this argument because the fundamental goal of a search engine is to help users, and you don't want to submit many queries for getting the desired results. This is particularly true if you are using a mobile device where every single character you type has an higher cost than in the PC environment.

So let's say it again: it's all about the user and her activities.  Those two are the elements of disruption in the next age search space. This is what I think. Now, let's repeat the above experiment and let me introduce Alice. Alice just took her mobile phone and started to write "Gifts". 

You are a next generation search engine. What would be the natural thing to do in order to offer her the best user experience?