Saturday, December 17, 2011


Openlayer offers a programmatic way for accessing bing, google, yahoo and other maps system with one javascript api. I am using it for my next playground project.

Friday, December 16, 2011

Points of Interest

Openstreetmap has a collection of 2.5M points of interest distributed world wide. I am using them for my next playground project.

Wednesday, December 14, 2011

Check if a list of characters is palidrome

It would be easy to compare it if you have an array, some additional difficulties because you have a list. Not that much.

Given a string s and a regular expression r match the regular expression

Supported operators :

  •  ^, match the beginning of the string
  • $, match the end of string
  • * match a sequence of characters
  • [a-z][A-Z] matches the correspondent character
  • Assume that there are no parenthesis
this is an interesting application of recursion

Tuesday, December 13, 2011


Find the maximum sum of a sub-sequence from an positive integer array where any two numbers of sub-sequence are not adjacent to each other in the original sequence

Max(i) = max(Ni + Max(i+2), Max(i + 1)) foreach i

Monday, December 12, 2011

Compute n to the power of m

You can make it in log (m) and that's a nice thing. X^8= (X^2)^4 so i can actually reduce the size by half at each step. Anyway if n is odd there is the need of an additional multiplication X^9=X * (X^8).

Compute (n k)

The solution to this should be recursive and similar to the classical fibonacci one,
given that (n k) = (n-1 k-1) (n-1 k)

Sunday, December 11, 2011

A couple of new features to PicPlacer

Mario added a couple of new features to PicPlacer (I wanted to have more time to work on this myself). Now it's possible to tag a whole album with a location and to search locations.

Saturday, December 10, 2011

A robot is moving in a rectangular board

It can move either down or right and the board is N x M. How many path does the robot have?


Steps are N+M and we can chose N, so ( N+M  N) is the binomial factor we are looking for.

Thursday, December 8, 2011

Design a smart pointer in C++

Solution Draft:

reference counter should be shared among all the instance of the pointers referring to the same object. So it's better to dynamically allocate a pointer.

Tuesday, December 6, 2011

Generate all the valid parenthesis for a string whose length is n


- left parenthesis ( is valid if we have any left
- right parenthesis ) is valid if we have any left and there is no syntax error

An interesting problem is to validate a string made of parenthesis

Monday, December 5, 2011

Generate all the subsets of a given set

This could be tricky unless you realize that given an set of size n there are 2^n subsets. An element can be in or out, so 0/1 and this means that you just need to generate all the binary numbers from 0  to n. The binary representation of each number will tell you whether or not you need to include the element (bit i = 1 means element in).

Sunday, December 4, 2011

Attaching a wall to physical locations: friendships is about sharing common experiences over the time

How many times you go to a museum, see a painting and wanted to express your opinion about it? How many times you go to a new restaurant, have a good dinner and wanted to leave a good comment about the chef? How many times you enjoy your night in that new pub and wanted to say this in public and not just to your friends. How many times you are in a physical place and wanted to know what are the nice places to see in the nearby, and wanted read the opinions of the people who have been there before?

Imagine a mashup between Bing (or Google) Maps and Facebook Wall. You can attach a Wall to any physical location in the map and you can immediately start commenting and leave your opinion in these walls. You go to see a movie and you can get the comments of the people who have been in the cinema before, and leave yours if you like. You just want to find good places for an hangout, well you take your mobile and the map will suggest where to go given the experiences of past people. Anyone, not just your already known friends.

This will be an extension of current Facebook Wall.

This is my current playground project. Reattach Walls to physical locations, and give the opportunity to different people to get in touch just because they share an experience in place and not because they are already friends. They don't need to be there at the same time, they just need to share the place.

Saturday, December 3, 2011

place n queens on a chessboard

typical recursive solution where we tentatively put a queen, if this doesn't violate conditions in column i. Then continue in  submatrix i+1, i+1. Then remove the queen. the important thing is that we accumulate the intermediate result.

Generate all the permutations of a given set.

This can be solved recursively.

{a1} -> a1
{a1, a2} -> {a1, a2}, {a2, a1}
{a1, a2, a3} -> {a1, a2, a3}, {a1, a3, a2}, {a3, a1, a2} , {a2, a1, a3}, {a2, a3, a1}, {a3, a2, a1}

So you can generate f(n) given f(n-1). pretty easy

Friday, December 2, 2011

Mixing reality with other sources: can I have Facebook there?

Wikitude and Layar are two lovely apps for your mobile. Both of them use your physical location and provide information about what is in the nearby including Wikipedia pages, Images from Panoramio, Videos from youtube and many other additional layers.

The only thing that is missing is a layer with the Facebook pages describing closer places or closer shops. I believe that this application can be useful to monetize in the mobile world. If I am in a mobile environment my location is my query. There is no need to submit a query based on keywords to get some information in return. Similarly, there is no need to subscribe a feed. I can just use my location and get what is interesting around me.

Thursday, December 1, 2011

Tuesday, November 29, 2011

Six degree of separation and Facebook study

Recently Facebook and a group of researches from University of Milan, released a super fascinating study stating that there are 4.3 degrees of separation on Facebook Social graph. The study is inspired by the classical Six degree experiment which has been made popular by several movies and scripts. 

One point is not clear to me and maybe someone can help me in understanding better. The classical Six degree experiment states that I can forward the message if I know the target person directly by name. If this is not the case, I can forward the message to someone I believe is closer to the target person. I always understood that there is a non null probability that the man in the middle can drop the message, either because he doesn't know the target or because he is not interested. In my view, this not null probability models my personal interest for the target person and how strong is our relationship. The closer I am, the smaller would be the probability of dropping the message.

On Facebook the fact that I am connected to someone doesn't necessary meaning that I would always send a message to him/her. Maybe, I missed this part in the paper and would love to understand whether the 4.3 degrees of separation is also modelling the fact that a message can be dropped with a non null probability. 

Someone can help me in better understanding?

Sunday, November 27, 2011

Find the k-th element in a list

You don't know the list length. And it should be optimal in time and space.


love this problem, apparently it's easy but you have to find an optimal solution (sublinear and space constant)

Friday, November 25, 2011

Given a string replace any running sequence of character

with the character followed by the number of repetition. This is called run-length-encoding

Thursday, November 24, 2011

Wednesday, November 23, 2011

((X *) 0)+1)

what is this strange thing computing?

Tuesday, November 22, 2011

Monday, November 21, 2011

Green eyes and Blue eyes

In a remote island, people have blue eyes. People with green eyes can sometime appear but they have to leave the island as soon as they realize that they don't have blue eyes. One day a visitor says: "i see that some of you have green eyes". For the sake of simplicity, let's assume that the people can meet just once per day at a public Town Hall. What will happen after the visitor's claim?

Sunday, November 20, 2011

Deleting integers numbers

You have a sequence of N integers and you randomly pick two of them. If they are both odd or both even then you remove them, if they are different you add an additional even number. Can you forecast what would be the final number left?


You need to consider what is the starting N. Is it odd or is it even?

Friday, November 18, 2011

Jumping stairs

There is a stair with N step. You can jump either 1 step or 2 steps at time. How many different ways do you want to reach the top?


N = 1 --> 1
N = 2 ,,  1s+1s, 2s -> 2
N = 3 , 1s+1s+1s, 2s+1s, 1s+2s -> 3
N = 4, 1s+1s+1s+1s, 1s+1s+2s, 1s+2s+1s, 2s+1s+1s, 2s+2s -> 5
N = 5, 1s+1s+1s+1s+1s, 1s+1s+1s+2s, 1s+1s+2s+1s, 1s+2s+1s+1s, 2s+1s+1s+1s, 1s+2s+2s, 2s+1s+2s, 2s+2s+1s = 8

so and why?

Thursday, November 17, 2011

Compute the binomial coefficient

(n k) in an optimal way. What is the complexity?

Cannot just compute it directly because you will overflow. Think about how we optimize Fibonacci computation

Tuesday, November 15, 2011

Given a 0/1 matrix

Transform it in such a way that if M[i][j]=1 then you toggle all the row m[i] and all the column m[j]

Monday, November 14, 2011

Facebook is associating Places to Albums

You know that I am a fan of putting images into locations and that I've shipped a prototype for adding this feature to Facebook. Apparently, Facebook itself is now suggesting to the user to tag albums with geo-locations.

Still I have a request for my Friends working at Facebook. Can you please open the Location API so that one and read and write the album location from an app?

Saturday, November 12, 2011

Editors' Picks

Editors’ picks are a collection of editorially curated sites that support task-based, spiking and seasonal based searches. Shipped by our teams in UK, Poland and US.

Friday, November 11, 2011

Big numbers. sum them

BigInt is a class storing big integers as lists. Implement BigInt and the code to add two BigInts

Thursday, November 10, 2011

Playing with addresses

Detect if a machine is Big Endian or little endian. Detect if the stack will grow up or down

Wednesday, November 9, 2011

Monday, November 7, 2011

Erlang and CSP

When I was at university, I enjoyed studying CSP, ECSP and Occam. How funny to find similar concepts in Erlang -- pretty lovely language, I'm enjoying reading books about it.

Sunday, November 6, 2011

what is this computing?

template<int n> struct funStruct
    enum { val = funStruct::val + funcstruct::val };
template<> struct funStruct<0>
    enum { val = 1 };

template<> struct funStruct<1>
    enum { val = 1 };

Saturday, November 5, 2011

Find the minimum with no comparison operation

Given two integers compare them, with no comparison operator. How many different ways can you suggest?

I have one. others out of there?

Friday, November 4, 2011

Dropping Balls

You are given K balls and a building with N floors. You know that there is a floor X < N such that when you drop the ball it crashes to the ground of the building. Below X it does not crash, Above X it will. What is the minimum number of ball drops you need to determine the point of break.


This is a classical G question and there is a simplified variant.  "There are 100 floors in a building. If you drop a glass ball from a floor down to the ground, it may or may not break. There's this threshold floor where dropping a ball from this floor or the above breaks the ball but dropping a ball from any floor below it won't. Now you are given only two glass balls to locate the threshold floor. How do you find it quickly?"

1st attempt: if you start from binary search then you are going in the wrong direction. K will be O(log N) but the key observation is that K is a constant here and there is the risk that you end your balls with not result. In the above example it's clear that you can consume 2 balls without getting a result with N=100.

2nd attempt: if you have K=1 there is nothing else you can make but a linear search starting from floor 1, and there is the implicit assumption that you can go back to the ground floor collect the surviving ball and redrop it. If you have K=2,  you can start from level 2: if it breaks, you can use the remaining one and go to level k-1 and conclude your test, if it does not break you still have two balls. The key intuition is that you can go up to the K level and if it breaks you can use the remaining one for K-1 attempts. So you can adopt the following approach for selecting the steps

K, K+(K-1), K+(K-1)+(K-2), K + (K-1) + (K-2) + (K-3) -> N = K + (K-1) + (K-2) + .. + 2 +1 = K (K+1)/2

2, 2+1, 2+1+

which gives you a way to get K given N. Now, you can generalize with more than 2 balls

Thursday, November 3, 2011

Circles, Smart Lists are they enough? the case for Topic MessageBoxes

Google+ introduced the idea of sharing your social updates with a dynamically created subset of your friends (e.g your "circles" of friends). This was a kind of evolution of previous Facebook model where the user was allowed to share just with statically predefined groups (e.g, just my direct friends, just my friends of friends, or everyone). Facebook answered to "circle" by presenting "smart lists" which are essentially doing .. exactly the same thing.  Well, sometime I fell like both circles and smart lists are limiting to my ability to share. Let me explain why...

One of my hobby are the "Acquaria" so let's suppose that I want to share an update about this topic. What I need to do? 
  • Is this private? so just for my direct friends? probably not because many of my friends are not sharing this hobby and I don't want to bother them;
  • Is this for my friends of friends? hmm, I would say no because of the same reason;
  • Is this public? certainly it is not, it would increase the noise received by many users and just few of them can see any value in my message;
  • Should I create a circle/smartlist for this topic? Hmm, maybe..  but hey wait! are you saying that I need to create a circle for each topic associated with my status updates. Do you know that creating circles is expensive and boring? I cannot pick all the users i want for each single message -- this is not functional;
What I simply want is to share a message and say that this message is about this topic. If you are interested in the topic you should get the message, if you are not interested in the topic you should not get the message. As simple as this. Plus, you maybe interested in a topic even if you are not (already) my friend  or a friend of my friends. We don't know each other, but you may still see a value in my message and have a good experience.

You are probably following me by now. Suppose that I put my message in a messagebox and there is a label associated to this messagebox describing the topic of the contained messages. I don't have the explicit need of saying who are the recipients of my message I just indicate the messagebox. If you are interested in that topic you just open the messagebox because you already expressed your interest. And I have the freedom of choosing the appropriate messagebox as I like.

There are several interesting problems for this new model based on topic messagesboxes.

First, how do you rank the messages for each messagebox? Well, I envision a combination of personalization techniques and user ranking. If the messages is coming from my friends than this is a strong signal of preference, if a message is coming from a person that I don't know but this person has been already considered authoritative for that topic that this is another strong signal, if a message is coming from a person that I don't know but is authoritative person than this is another strong signal. All those signals can be combined together using some regression model such as Gradient Boosting and can be evaluated according to the interactions generated for each message. Also, I need to define a user ranking based on his previous scores. All of those are interesting ML problems. (yes, Ian this would be your dream i know).

Second, how can I generate the topics associated to the messagesboxes? There are multiple options. I certainly want to have a set of predefined topics (maybe suggested via an autosuggest feature). In addition, I want to give to the user the ability of defining new topics. In any case, I need to rank those topics according to the user past preferences.

Third,  I want to have the freedom to subscribe to specific messageboxes and get updates as soon as new messages arrive. This would be a powerful way to explore the system and get value from the messages even from unknown people. If you think about it, this is also an opportunity of creating new friends.

I believe that you are following me by now. 

So let's repeat the above experiment and let me introduce Maya she has a passion for {romantic movies} and she wants to get as much suggestions as she can about these movies. How can you help her? Circle, Smart lists or Topics?

Wednesday, November 2, 2011

You have 500 bits

At step 0 all the bits are 0
At step 1 you toggle all the bits;
At step 2 you toggle all the even bits;
At step i you toggle all the bits in position k * i, with k an integer number

What are the bits turned on at the end?


take one bit say 26th this is toggled at steps 1, 2, 13, 26
take another one say 81 this is toggled at step 1, 9, 81

can you infer something in common?

Tuesday, November 1, 2011

Innovating the in the search space? is the problem solved?

Search is such a fascinating world. After 15 years for me in this space, there is always the temptation of saying "The problem is solved" or "Already saw that!".  Are you sure?  These days I am asking to myself: "is there a way to innovate and disrupt the space with something completely new?" I have some ideas about that, but before going there let me start from the basics, and show that there is a user need that is currently not satisfied while it should be.

Let's suppose that Bob is searching for the query "Cinema". So, what the search engines are doing nowadays? Well we have a bunch of static ranking functions based on links we found on the web graph, and a bunch of dynamic machine learning techniques which try to improve the experience by leveraging aggregated past users' queries, and we have a bunch of personalization techniques based on queries that Bob submitted in the past. So far so good. Is this enough? Simple answer NO

Why? Because it would be absolutely disruptive to know that Bob is actually 24 years old, that he lives in Baltimore, that he just recently checked in this particular place down town Baltimore, that he has friends that recently saw these films and they liked, that Bob himself read recently this book, that he loves these songs, and so on and so forth. Probably, you are starting to follow me. If not just let me explain myself. 

The fundamental assumption that we are making in search is that a query is a good proxy for representing the user's needs. But that's non necessarily always a correct assumption. Actually, It's the user himself that is a good proxy for the representing the user's need. The user and all the structured data she generates during all her life. Nothing more nothing less. If I know that you are currently in this place, and that you have read (and liked) three weeks ago a book written by Ken Bruen  (say the 2001 fictional crime novel), I would probably return the geographically closest cinema where they are currently representing the film London boulevards, because that's big screen adaptation of the book. You are 24 years old and a lot of users have liked that film in this demographic band.

Probably, you understood the impact of this approach by now. The query "cinema" itself it's not bringing enough context to provide a good experience for that specific user, because the search engine doesn't know that specific user enough. It's the user himself that is bringing this contextual information and it would be great to leverage it.

Now, two observations are in order.

First, current personalization techniques are again making the fundamental assumption that queries are a good proxy for the user. So, they enrich the current query with a context given by queries submitted in the past. This is a good direction to pursue, but it is not as rich to mine as the context that the user can bring with himself due to his past social activities.

Second, a critique that you have with personalization is that you actually don't need it because it's very inexpensive to refine a query if you are not satisfied by the current results. I don't buy this argument because the fundamental goal of a search engine is to help users, and you don't want to submit many queries for getting the desired results. This is particularly true if you are using a mobile device where every single character you type has an higher cost than in the PC environment.

So let's say it again: it's all about the user and her activities.  Those two are the elements of disruption in the next age search space. This is what I think. Now, let's repeat the above experiment and let me introduce Alice. Alice just took her mobile phone and started to write "Gifts". 

You are a next generation search engine. What would be the natural thing to do in order to offer her the best user experience?

Monday, October 31, 2011

Playing with languages, part I

Sometime ago I asked to my friend Charisma why she was studying Italian and she said that she wanted to learn Spanish. I found the answer somehow confusing because the two languages are different. The point is that I missed to understand the main point ;-) You can actually learn and improve a language  if you study different languages.

These days I am enjoying reading  "Seven languages in seven weeks" a book which compares  Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. You can learn things such as Concurrency based on Actors in Ruby, or mixing OO and Functional programming in Scala, higher order programming in Haskel, logic programming in Prolog, having a language based on prototype patterns and messages as in Io, currying and clojures and many other things. Definitively a recommended reading.

The more I read, the more I understand and improve my C++. So Charisma you are right, you can actually study Italian for improving Spanish. It took me a while to understand that.

Sunday, October 30, 2011

Can we use Facebook messages to predict how the stock market is going?

This would be an interesting Machine Learning problem on a massive dataset. Can we use Facebook's messages to predict how the stock and bond market will evolve? Here a couple of examples of the raw dataset.

Web server performance

A web server W is running a service S which provides information I when it receive queries Q. How can you guarantee a time < s for the 95 percentile?

Friday, October 28, 2011

Two colors

Given a undirect graph G=(N, E) a two colour scheme assigns to each node either the colour black or white. A node is diverse if it has the opposite colour of at least half of its neighbours. How can you assign a diverse colour to all the nodes in a graph?

tricky one

Wednesday, October 26, 2011

Assign employees to bosses

You are given a list of employees and a list of potential bosses. Each employee expresses his ranked list of bosses' preferences. Each boss expresses is ranked list of employees' preferences. Match the two classes so that the maximum number of preferences is satisfied.

Tuesday, October 25, 2011

Got assigned a new patent: how to bootstrap a learning to rank system with real traffic information

Just got assigned a patent filed back in 2005, "Sampling internet user traffic to improve search results". The problem we were trying to address was about bootstrapping a learning to rank system in absence of user search information. This is a typical problem you have when you are not the incumbent search engine, and you don't have already accumulated usage information and user behaviour activities (see more information here Calculating Search Rankings with User Web Traffic Data). How can you compensate this bootstrap situation?

The key intuition was to use web traffic information collected by the way web proxies  and observing user traffic and navigational information, including traffic performed querying other search engines. A similar traffic can be observed by minging a collection of web logs for the HTTP_Referer tag,  The methodology was used for improving the freshness, the coverage, the ranking and the clustering of search engine results and, more generically,  may include monitoring web traffic on remote web servers on the communications network

Sunday, October 23, 2011

Friday, October 21, 2011

Find a way to detect whether two dna strings are similar

One way of doing this is to find the longest common subsequence, since we can allow to have to strings that are also having different genes interleaved. So suppose you have {ACGGAGGGAA} and {ACGAAGG} what is the longest common subsequence? {ACGAGG}

Solution: either direct DP or reducing this to Longest non decreasing subsequence problem.

Thursday, October 20, 2011

Find the longest non decreasing subsequence

Solution: s_i = max ( max (s_j + 1), 1) for j < i and A[j] <  A[i]. So take an already available longest non decreasing subsequence and if you find a subsequent A[i] > A[j] then sum 1. Remember that the minimum is 1

Wednesday, October 19, 2011

Sort a huge set of log files in C++ and STL

you are given a huge set of log files sorted by timestamp and you need to merge them. You have unlimited disk, but fixed amount of memory. In case of ties, break it by consider a progressive number for the log files


good to recall how a min-heap is defined in STL, it's just an heap with a custom comparison. An heap is nothing more that a normal std::vector and you build it with std::make_heap, std::push_back, std::push_heap(), std::front(), std::pop_heap(). It's always a bit confusing for me to consider that you need to push_back in the vector and than push_heap, but that's not so strange considering that an heap is built upon the vector itself ;-)

   bool greater (const std::pair & a, const std::pair & b) 

Tuesday, October 18, 2011

Compute the running median and the running average

You are given a very long stream and you need to compute the running median for each group of 1k numbers. Same question for the running median. What is the complexity?

Suppose you know a priori the number of elements in the stream.

given a running average, you can update the group by adding a new element and removing the oldest one. This will give you a solution that is linear in the number of elements in the stream

Similar intuition is for the median, you can use a balanced BST where every node has the number of nodes contained in the subtree rooted at it.

Monday, October 17, 2011

A feature suggestion for facebook: Social calendar

Nothing is more social than our calendar. By definition you invite your colleagues and friends to meet and discuss. So why not integrating a calendar in Facebook, I bet that this would be the 3rd killer app after News and Images.

PS: I love Microsoft, but this idea is simple and needed so I wonder Should I create a startup? ;)

Sunday, October 16, 2011

Winner takes it all

Given a set of 128 players, suppose that x "beats" y has the transitive property. How many games do you need to find the winner? and how many games for finding the second one?

Saturday, October 15, 2011

Friday, October 14, 2011

given a vector v generate all the subsets of v with no repetition

useful for a quick test of your STL knowledge. Then, try the same with repetition

Thursday, October 13, 2011

Social graph: create the links

You are given a snapshot of the social graph G = (N, E, t) at time t. Each node has a set of attributes (a1, a2, ... an). Three problems:

1. Create an edge if two nodes have exactly the same set of attributes
2. Create an edge if two nodes have similar attributes
3. What if a set of weights it's known a-priori for the attributes?

Wednesday, October 12, 2011

Where are my Friends? Cool Ios5 app similar to this experiment.

Apple just released a cool app into the new Ios5 which is based on the location provided by your phone. This is somehow similar to some experiments I made with Facebook location and Bing Maps. In my case, your facebook friends are mapped on a Bing map according to their Facebook location. In any case, mine was just  an experiment and nothing more

This is where my Facebook friends are located in the world

Here my friends in Europe

These are my friends in San Francisco area

And this is a view of a friend of mine in New York City

Here is the Apple's preview

Tuesday, October 11, 2011

Maximum span in an array

Given an array A of numbers, find a pair of numbers A[i] and A[j], such that A[i] < A[j] and j-i is maximized.

solution: There is a trivial O(N^2) solution and a tricky O(N) based on a very simple intuition, where is the minimum?

Saturday, October 8, 2011

Find the missing number

Given a db with about 300Millions social numbers where each number is  9 digits find a number that is missing. You have 2MB of Ram and unlimited disk

300Millions = 300 * 10^6 = 3 * 10^8

Suppose we take 10 counters and "hash" each number considering only the highest order digit, incrementing the counter if a given number has that starting digit.  One missing number would be located by the counter with the lowest value. That is the intuition, we can now use a bitmap which fits 2MB of Ram and apply the same principle.

Thursday, October 6, 2011

Write a function for computing next permutation of a string

This is a little pearl piece of code

void printVector(int * value, int N)
 for (int i = 0; i < N; i++)
  std::cout << value[i] << " " ;

 std::cout << std::endl;

void visit(int * value, int N, int K)
 static int level = -1;

 level++; value[K] = level;    // the level is current i 

 if (level == N)               // a complete permutation
  printVector(value, N);
  for (int i = 0; i < N; i++)  // 0... N-1
   if (value[i] == 0)             // not used before
    visit(value, N, i);

 level--; value[K] = 0;   // nothing else to explore for the level, ready for next round of usage

The above is cooler than the classical recursive solution
A permutation of 1,2,3,4 would be:
1 + permutation_of(2,3,4)
2 + permutation_of(1,3,4)
3 + permutation_of(2,1,4)
4 + permutation_of(2,3,1)

Wednesday, October 5, 2011


Given a list of non overlapping intervals, design a data structure for inserting, finding and deleting them (e.g. [10, 30), [35, 50), [70, 100) ...)

Tuesday, October 4, 2011

Feature suggestion: Do we need a new search engine? Certainly not, but Facebook may offer a new definition of search

Bing, Yahoo and Google are already offering a complete search experience and there is probably no need of having a new search engine. Certainly, we already have a fast growing set of new engines for emerging markets such as Yandex in Russia and Baidu in China, but that's because they offer specific expertise in those markets (even if I would be not surprised of seeing them coming more and more in western countries).

Anyway, there is something new and Facebook has a unique position for offering a new set of search services, which could be integrated within the Facebook's popular notification system. Let's start from some examples:

I search for new software development positions in london. Facebook has all the public postings made by their social users. The data is there but the ranking is very poor. Plus this type of search is buried and you need to click 3 different links for accessing it.

I am looking for a musical in London, first result is totally irrelevant even if they have the location of the people posting, a full access to their internal social graph, a full access to my position in the graph, the number of like within facebook and in any external web site embedding the like button, many forms of temporal information, the links you shared, the title and the caption you changed, user importance, how many time I've interacted with my friends, any picture I have changed from automatic crawling,  and a lot more (wow, not so bad). Many of those information are not necessarily available to other search engines

Let's see other examples

Let's suppose we want to listen some music

Or get some news

I assume that you understood that the data is there, but a quality ranking is not.  You may say that this is nothing new because we already had some interest in the past for real time search, and Bing is already including social likes information.

True. Anyway, Facebook as already a quite popular Notification system -- you know that white number on a red background -- which is updated anytime something new is happening in your social graph. So why not allowing myself to register for a new (and relevant please!) update for a query term that I've searched before? That would be a useful feature to have and a new definition of search where results are pushed to me instead of searching them again and again. Yes, I know I am lazy and I like when software is making something on my behalf because I save time for something else.

Monday, October 3, 2011

Fist lessons for Stanford ML couse

So far, it's pretty good and I'd recommend attending it.

Given two vectors of integer, merge them in place

Given two vectors of integer, merge them in place (e.g. relocate the second vector to the sum of the vectors and merge the vectors in place).

Here is the code

Sunday, October 2, 2011

Given a large collection of social numbers (e.g. 360Millions)

how to efficiently find a single missing number. You have a very large disk available for storing the number but just 2Mb of Ram

Saturday, October 1, 2011

LCS of very long sequences

Compression is always looking for repetition. Suppose you have two very long files stored in disk. how will you find the LCS with a limited amount of RAM?

Wednesday, September 28, 2011

A request for facebook: Let me use my timeline and change the timings

Other quick suggestion: it would be useful to allow a fine tune for 'created_time' for each post. This would be particularly useful for pictures. I may want to upload pictures from my past memories in my timeline so that their creation time should be back in the past. just my 2c

Microsoft Research

Microsoft Research is an immense asset for Microsoft. Really like that several Research guys are a driving force for Bing (including Harry Shum, Paul Viola, Susan Dumais and many others)

Tuesday, September 27, 2011

A request for facebook: open the location API

Facebook recently allowed to tag postings (e.g. status, pictures, and so on and so forth) with locations. Anyway, this information is not available via the OpenGraph API, as discussed here and here. Can we have it?

Monday, September 26, 2011

Discover anagrams in a collection of strings

here the old style c-only version code . Ok that's was a wild journey back in the past, here the more modern version of stl code

A great tool for developing with Facebook

This is useful for exploring the OpenGraph and this is useful for building apps.

Sunday, September 25, 2011

Max sum of a subarray of integers

This is the classical kadane's algorithm and it's always a pearl to study.  A couple of key observations: (1) the empty subarray has sum 0, and this is the minimum. (2) We can maintain a running sum and when we go negative we start again with the empty subarray


max_so_far = max_ending_here  = 0;
for (i  = 0; i < A.size(); i++)
   max_so_far = max (0, max_so_far+A[i]);  /// if we get negative, consider the empty string
   max_ending_here = max (max_so_far, max_ending_here);

Friday, September 23, 2011

Shipping that feature one day before of Facebook

I had the impression that the feature was useful and so we worked on (see my posting
Places where you and your friends have been: Geo-tagging your Social Pictures) and i was not aware that Facebook was working on something similar.

So we shipped one day before the f8 and that day Facebook shipped something similar as you can see from my timeline here below. This is apparently based on my pictures and the tag you have in your mobile phone. It seems that you cannot see the pictures from your friends around a place.

Woahhhhhh, nailed it. One day before

Thursday, September 22, 2011

Mantaining intervals and looking for a number

You are given a number a and a sequence of numeric intervals  ( .., ..) . What data structure would you use for searching the number in the intervals? What if they are overlapping?

Wednesday, September 21, 2011

Places where you and your friends have been: Geo-tagging your Social Pictures

Sharing your pictures with your friends is an important part of your social activities, and a killer application for Facebook. Anyway, I thought that something was missing for that feature. During a recent journey to Ibiza, I said to myself: "Hey it would be cool to see the pictures of my friends who have been in this place before?" So I went to Facebook and was looking for a way to search the places where me and my friends have been.

Surprisingly enough, that feature was not there. I said "uh! let's change that". So I work for Microsoft, but I also like to play with other players' API during some off-work long night programming sessions. I said: "maybe I can create a mash up using Bing Maps and Facebook API. Let's start with something simple and map the face of my friends over a Bing Map". That worked and was fast to implement.

So I involved my old-friend Mario Veri and exposed the idea: "Why don't we let the user tag their own images and show them on a map?", which he liked. I said to him: "we will just spend couple of hours per night but let's make it and let's change what is missing in Facebook". So, we worked for a week or so late at night with the target of having the first PROTOTYPE (a beta!) before the F8 conference. A lot of strawberries (for me),  coffees (for mario) and code written we are now happy to introduce to

An image is worth a thousand words, and an image with the correct geographical context is probably worth more! What can you do with picplacer? Let's see:

You can load your facebook albums, select a specific picture, and associate this picture with a geographical place. runs on the top of giants' shoulders and will search the appropriate information from Facebook and Bing on your behalf.

You can drag and drop the picture so that you will fine tune the position of the image in the map. You can even zoom in and out according to your needs.

You can see what you have shared on facebook and tagged in Picplacer for the whole world or for a specific geographical area

And we are actively working for adding the possibility to show the public pictures that your Social friends have shared for a specific place. But this will probably require another week of work late at night with strawberries and coffees (we already have that information and we know how to use it).

Send more coffees, strawberries and suggestions if you want to see more or if you have an opinion. We are going to add that missing search feature in Facebook and allow you and your friends to search and see the places where you have been.

Antonio & Mario

DISCLAIMER: Antonio works for Bing Microsoft in London and Mario is an ex-student of mine. Picplacer is a mashup that me and Mario have developed just for fun and is not rappresenting any official Microsoft or Facebook initiative. It's something we made just for fun and for our own pleasure to experiment with fast prototyping and making an impact. That's it.

NOTE: you probably have seen something similar with Footprint for Android,  Photos for Iphone, and Panoramio or Flicker on the web. Anyway, those are apps or web sites for mapping pictures on a map but there is no Social aspect there (where are your Social friends?) and they are not running on Facebook the place where you already are sharing your own images. We just wanted to connect the dots...

Tuesday, September 20, 2011

Mobile Social payments, Facebook, Skype and Google Wallet

I already posted about using Social Graphs (either Facebook or Skype) for making mobile payments and transactions. See my postings "A feature suggestion for Skype: turning skype in a cash-cow machine for mobile payments"  and Facebook Social Wallet. The idea is to use Facebook Login (or Skype Login) and transfer Facebook/Skype credits from one account to another in a secure way (see the postings for the details).

Today. Google announced their Wallet based on a chip in your phone where they store the credit card.  Technically, I don't quite understand why you need a brand new phone with a special custom chip on it.

Why this should not work with already available mobile phones (e.g. Android, Iphone, Windows mobile, RIM)?

Why do I need to buy a new phone for accessing the Wallet? As described in the postings, I propose to just use either Skype or Facebook. Both of them are already running on all the mobiles  available on the markets.

Friday, September 16, 2011

compute the k-th smallest element in a BST

what is the optimal complexity in time and space?

here the code

Thursday, September 15, 2011

implement a deque using stacks

This  is a kind of "two snakes" game


snake1      snake2

1 push

2 push

3 push

               3 push
               2 push

               1 push       -> ready to pop

so push in 2nd stack if empty.

Wednesday, September 14, 2011

The beauty of SQRT(N) -- part II

Another good SQRT(N) trick can be applied to Lowest Common Ancestor (LCA) problem.

This problem consists in identifying the lowest common ancestor given two nodes in a tree. One way is to split the tree vertically in SQRT(H) parts, where H is height of the tree. So for each part, we should store the ancestor situated in the part immediately above. That is easy because the ancestor for each border section is the father of the section (e.g. the node at the frontier).

So assume that the node i, has father F[i], we can pre-compute an array P [1..N] as it follows

if (i is at first level) P[i] = 1   // first level
else if (i is at the border of one of the SQRT(H) partitions than P[i] = F[i]) // border
else (P[i] = P [ F[ i] ])  // any other node

So we have SQRT(H) partitions and the pre computation is linear. We use P for reducing the cost of LCA to SQRT(H) as it follows

 while (P[x] != P[y])          // different section
          if (L[x] > L[y])     // up with the longest
             x = P[x];         // up one section    
             y = P[y];       // up one section
      while (x != y)           // same section 
          if (L[x] > L[y])     // up with the longest
             x = F[x];         // take the father
             y = F[y];
      return x;

Tuesday, September 13, 2011

The beauty of SQRT(N)

Many quadratic algorithms can be "changed" into a linear version pre-processing sqrt(N) "points" so that the algorithm will be a linear one on the subset of selected "points". This is a typical trick adopted in clustering and in many other situations. Let's review some examples.


Range minumum queries is the problem of finding the minimum value into an interval. A trivial solution is quadratic and can be computed with dynamic programming, by observing that the minimum for an interval with 1 number is the same number and that we can get the minimum for the interval [i, j] if we know where is the index corresponding to the minimum for the interval [i, j-1].

We can use the above trick and divide the interval in SQRT(N) intervals and for each of them precompute the minimum in O(N), then we can answer the query in SQRT(N) just by checking the overlapping intervals with the query -- those are no more than SQRT(N). So we will have

Monday, September 12, 2011

Machine learning from Stanford

Back in the late 90ies the first ML course @ stanford had less than 20 students. Next fall they will open the course for online students and more than 50K students (including me) have signed up so far. Consider the opportunity of adding yourself to the list.

Sunday, September 11, 2011

Passing a parameter to a jasonp callback

jsonp is a quite popular paradigm for mashup and crosssite scripting. Sometime is useful to pass a parameter to a  jsonp callback. My solution is to encode the parameter in the name of the callback or to dynamically push functions with the parameter embedded in the code. The name of the callback will be a random string.

Do you have other solutions?

Friday, September 9, 2011

balance strings

Implement a function string which given a string s consisting of some parenthesis returns a string s1 balanced and the differences between s and s1 are minimum

here the code

Thursday, September 8, 2011

Facebook Social Wallet

Facebook login became the universal login system. Plus they have a system for buying credits which they use for making transactions such as buying games and other items.

It would be interesting to expand this mechanism into a 'Social Wallet' for Social Payments. If a merchant has a facebook login than it can be recognized it in a trusted way. In addition, it would be possible to generate a unique number for each transaction in a way similar to what Paypal is doing today. In this scenario, there will be a set of merchants selling items in real life and using facebook as a payment system. The payment is nothing more but a facebook credit transfer (again similar to Paypal). Likewise, they can use Facebook to transfer social gifts,  game points, and other types of virtual objects.

The real power of this is that the system will be available for 700millions of users and it would work perfectly for micro-payments. Plus facebook is available on many different devices so it would possible to pay on phones, tablets, and PCs.

Wednesday, September 7, 2011


given a vector of ints rotate it by an integer k

here the code

Tuesday, September 6, 2011

implement strstr

int strstr (char * needle, char * haystack);

Monday, September 5, 2011

Google is shutting down several products and betas

Google is going to shut down many betas. Most notably, Aadwark was a very interesting semantical type of search and Google desktop was a very good piece of software.

  • Aardvark

  • Desktop

  • Fast Flip

  • Google Maps API for Flash

  • Google Pack

  • Google Web Security

  • Image Labeler

  • Notebook

  • Sidewiki

  • Saturday, September 3, 2011

    Compute the LCA for a BST

    I really like this problem because it's an interesting application of recursion and it shows your attitude to code paying attention to the details.

    A BST is a binary tree where the value stored in the left child is less than the value of the current node, which in turn is less than the value of the right child. This would allow to search for the LCA (lowest common anchestor) by leveraging the structure of the tree.

    You visit the current node, if it's value v_current is within the range [v1, v2] and you are looking for v=LCA[n1, n2] then you return the value v  otherwise you move recursively left ( right ) down in the tree according if v_current < n1 < n2 (n1 < n2 < v_current, respectively)

    here the code

    Thursday, September 1, 2011

    implement a getline

    This is a typical situation when you deal with the TCP/IP stack. One underlying layer defines a blocking read which reads into a buffer. You need to use this read for implementing a getLine() which reads until \n. You can have multiple calls of getLine()

    here is the code

    Tuesday, August 30, 2011

    A feature suggestion for facebook: multiple languages in posting

    A suggestion for facebook: Can I post my message in multiple languages and you route it according to the reader? Today everyone who is living in a non-English country has always English speaking friends and so you never know what language to adopt because you want to respect your multiple readers

    In other words, a type of circles based on languages. This time useful

    Monday, August 29, 2011

    Great video about search experiments from Google

    a scientific and rigorous methodology adopted by the industry and academia

    Thursday, August 25, 2011

    Design a messaging system

    What technologies would you adopt? how to make it scalable?

    Tuesday, August 23, 2011

    Find the diameter of a tree

    This is a typical problem of recursion. 

    The diameter of a tree  is the number of nodes on the longest path between two leaves in the tree. So you can compute it as the max among the height for the left subtree, the height of the right subtree and the longest path between two leaves passing through the root. 

    Monday, August 22, 2011

    Find minimum and maximum in an unsorted array in minimum number of comparisons.

    This is certainly linear, but how can you minimize the #of comparison.

    Hint: think about the way we build an heap.

    Managing an airport

    You are the manager of an airport and have a collection of N airplanes arriving at time a(i), leaving at time l(i) and paying p(i) for occupying one of the available K runaways (where K is << N). Write a program for maximizing your gain in terms of money.  Write a program for maximizing the usage of the K runaways.

    Bing News is now in Germany, Spain and Italy too!.

    Bing News is now in Germany, Spain and Italy too!.

    How many different ways do you have for spelling Gaddafi? It's Gaddafi in English and German, but it is Gheddafi in Italian and Gadafi in Spanish.

    Sunday, August 21, 2011

    Find unique number

    You are given an array that contains integers. The integers content is such that every integer occurs 3 times in that array leaving one integer that appears only once. Fastest way to find that single integer


    First thing that I was considering is to use XOR, because x^x = 0 so all the numbers that are replicated 2 times will be deleted and the unique number that is not duplicated will stay there. But here the numbers are duplicated 3 times and x^x^x=x so this would not help. We can remove items that are duplicated a even number of times.

    So we need an alternative way based on logic operations. For each element A[i]

    1. get elements that appear 1 time  ->  ones =^x; (remove)
    2. get elements that appear 2 times -> two |= ones & x; (needs to be before previous remove operation)
    3. get elements that appear 3 times -> three = ones & two
    4. get elements that not appear 3 times  ~three
    5. remove the one not appearing 3 times from one and two -> one &= ~three; two &= ~three

    Thursday, August 18, 2011

    Find the maximum rectangle under a histogram in linear time.

    The problem requires to find the maximum area of a rectangle containing an histogram, given N histog


    If we insert the histogram i, then the rectangle can have height h(i) or larger, because the rectangle must contain the histogram i with height h(i). If we insert the histogram i then we need to look to the left j \in [i-1 .. 0] for adjacent histograms with h(j) > h(i), say those are l(i). If we insert the histogram i then we need to look to the right j \in [i+1 .. n] for adjacent histograms with h(j) > h(i), say those are r(i). These two steps will give us the basis b(i) of rectangle R_i with height h(i) containing the histogram i. In particular b(i) = l(i) + r(i) + 1, and the area for rectangle i will be a(i) = b(i) * h(i). So the final answer would be A = max_i a(i) which can be computed in O(N)

    So the main problem is how to compute l(i) and r(i). If we take O(N) then the final outcome would be quadratic. We need a solution in O(1). Can you make it?

    Wednesday, August 17, 2011

    Numbers and strings

    Two problems;

    1. Suppose you represent every number with a character like 0 –> a, 1 –> b, 2 –> c, ………………24 –> y, 25 –> z Now given a number, give words combinations for it.

    2. Given a number, generate all the strings that can be obtained with that number on a telephone keypad


    Trivial recursion, just pay attention to space allocation.

    Tuesday, August 16, 2011

    Majority element

    This is a classical problem, quite frequently asked in interviews. Given an array find the majority element. How about a stream?


    There is a trivial O(nlogn) solution based on sorting and counting, but you are not using the majority propriety.

    There is another solution based on hashing, but you are not using the majority propriety and this will not work on streams because you cannot bound the hash table.

    The majority element will appear by definition on more than 50% of elements. So you can maintain a variable with the current element and another variable with the current counting. When you see the same element you increment, when you see another element you decrement, when you reach zero then you change the current element. In this way, if there is a majority element you will identify it in the current element variable.

    What happen if there is no majority element? will you have a false positive?

    How to adapt the above principle to a stream?

    There is a bunch of 120 electrical cables laid under the ground

    The cables are 10 KM in length and both ends of each wire are out in open. Label all the cables with minimum number of trips.


    The key intuition here is that you have 2 sources of information. Either the cables are connected or they are disconnected ;-). so you can connect all of them but leave two of disconnected or connect just two of them. if you leave 3 disconnected than you have an ambiguity and you need a round trip for solving this for just 3 cables. That's a lot

    Let's try to leave just 2 connected. You need 1 round trip to identify the pair and so a total of 60 trips. That's a lot.

    Let's try to connect all of them but leave 2 disconnected marking them red and blue. In one trip you find the disconnected pair and can name them #1 and #120 (any number works here provided that you remember it for the next step). All the other cables will be numbered from #2 to #119.  Here you can encode another information just by leveraging the order. You assign to a cable #2 and to the one connected to it #3. Then you assign to another cable #4 and to the one connected to it #5, and so on and so forth (*).

    Now Let's try to connect again all of them and leave 2 disconnected. This time we will split the disconnect pair and connect one cable of them. So assume we connect (#1, #2 ) (#3, #4)    (#118, #119) and leave #120 disconnected (**)

    Return to base. We have red and blue. But we know that one of them must be disconnected and we know that the disconnected one has number #120, while the other one must be #1. That' why we broke the pair because we wanted to solve the ambiguity a (disconnection is not ambiguous when there is just one disconnected). So now we know what is #1 and what is #120. 

    Surprisingly enough we are done! because we know that  the other side has (#1, #2 ) so we know #2  is the one connected to #1 and we can test what is connected to #1.  But when we know the #2 then we know the #3 because of the naming convention adopted here (*), and if we know the #3 then we know the #4 because of the choice taken here (**).

    Talking about encoding information.

    Monday, August 15, 2011

    maximize coins

    There are n coins in a line. Two players take turns to take a coin from one of the ends of the line until there are no more coins left. The player with the larger amount of money wins. Assume that you go first, describe an algorithm to compute the maximum amount of money you can win.


    this looks like a Dynamic Programming problem. Say M(i, j) is the maximum when there are the coins from i to j available. In this case I have 2 options.

    Pick from A[i], so i get value A[i]. The other player takes either A[j] or A[i+1] leaving me the options either for M(i+1, j-1) or M(i+2, j). The other player will try to minimize my gains so he will chose min(M(i+1, j-1), M(i+2, j))

    Pick from A[j], so i get value A[j]. The other player takes either A[i] or A[j-1] leaving me the options either for M(i+1, j-1) or M(i, j-2) and again the other player will try to minimize my choices min(M(i+1, j-1), M(i, j-2))

    I select the two above trying to maximize my ROI, so the total formula would be

    max (A[i]+min(M(i+1, j-1), M(i+2, j)), A[j]+min(M(i+1, j-1), M(i, j-2)))

    Pretty cool isnt'it?

    Standard memoization is suggested

    Sunday, August 14, 2011

    Sliding window on A

    A long array A[] is given to you. There is a sliding window of size w which is moving from the very left of the array to the very right. You can only see the w numbers in the window. Each time the sliding window moves rightwards by one position. Produce an array  B[], B[i] is the maximum value of from A[i] to A[i+w-1]


    - a naive solution is O(n w)
    - an heap based solution could be O( n lg w), but we need to change the heap for maintaining n-w max positions and invalidate when the windows move. too complicated and not sure we can make it

    - we can maintain a queue or better a dequeue while processing the array. The size of the queue Q is w and the key observation is that when we see an element A[j] we can remove each element in Q such that A[j] > Q[z] because we are looking for the max and those elements are not max given the esistence of A[j]. In doing so, we keep the current maximum at front and we remove the other elements from the back of the deque. Each element will enter 1 time in the queue and will exit at most 1 time, so the total complexity is O(n) which is pretty good. For keeping the sliding window we push current Array index so we need to remove from front when out of current window

    1. #include  
    3. void maxSlidingWindow(int A[], int n, int w, int B[])  
    4. {  
    5.    deque<int> Q;  
    7.    //Initilize deque Q for first window  
    8.    for (int i = 0; i < w; i++)  
    9.    {  
    10.        while (!Q.empty() && A[i] >= A[Q.back()])  
    11.            Q.pop_back();  
    12.        Q.push_back(i);  
    13.    }  
    15.    for (int i = w; i < n; i++)  
    16.    {  
    17.        B[i-w] = A[Q.front()];  
    19.        //update Q for new window  
    20.        while (!Q.empty() && A[i] >= A[Q.back()])  
    21.            Q.pop_back();  
    23.        //Pop older element outside window from Q      
    24.        while (!Q.empty() && Q.front() <= i-w)  
    25.            Q.pop_front();  
    27.        //Insert current element in Q  
    28.        Q.push_back(i);  
    29.    }  
    30.    B[n-w] = A[Q.front()];