Sometime ago I asked to my friend Charisma why she was studying Italian and she said that she wanted to learn Spanish. I found the answer somehow confusing because the two languages are different. The point is that I missed to understand the main point ;-) You can actually learn and improve a language if you study different languages.

These days I am enjoying reading "Seven languages in seven weeks" a book which compares Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. You can learn things such as Concurrency based on Actors in Ruby, or mixing OO and Functional programming in Scala, higher order programming in Haskel, logic programming in Prolog, having a language based on prototype patterns and messages as in Io, currying and clojures and many other things. Definitively a recommended reading.

The more I read, the more I understand and improve my C++. So Charisma you are right, you can actually study Italian for improving Spanish. It took me a while to understand that.

Random commentary about C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search

## Monday, October 31, 2011

### Playing with languages, part I

## Sunday, October 30, 2011

### Can we use Facebook messages to predict how the stock market is going?

This would be an interesting Machine Learning problem on a massive dataset. Can we use Facebook's messages to predict how the stock and bond market will evolve? Here a couple of examples of the raw dataset.

### Web server performance

A web server W is running a service S which provides information I when it receive queries Q. How can you guarantee a time < s for the 95 percentile?

## Saturday, October 29, 2011

### Design a system for browsing songs from your friends

Similar to what spotify does.

## Friday, October 28, 2011

### Two colors

Given a undirect graph G=(N, E) a two colour scheme assigns to each node either the colour black or white. A node is diverse if it has the opposite colour of at least half of its neighbours. How can you assign a diverse colour to all the nodes in a graph?

tricky one

tricky one

## Thursday, October 27, 2011

### Design a system for implementing Facebook News feed

suppose you have 10K servers.

## Wednesday, October 26, 2011

### Assign employees to bosses

You are given a list of employees and a list of potential bosses. Each employee expresses his ranked list of bosses' preferences. Each boss expresses is ranked list of employees' preferences. Match the two classes so that the maximum number of preferences is satisfied.

## Tuesday, October 25, 2011

### Got assigned a new patent: how to bootstrap a learning to rank system with real traffic information

Just got assigned a patent filed back in 2005, "Sampling internet user traffic to improve search results". The problem we were trying to address was about bootstrapping a learning to rank system in absence of user search information. This is a typical problem you have when you are not the incumbent search engine, and you don't have

The key intuition was to use web traffic information collected by the way web proxies and observing user traffic and navigational information, including traffic performed querying other search engines. A similar traffic can be observed by minging a collection of web logs for the HTTP_Referer tag, The methodology was used for improving the freshness, the coverage, the ranking and the clustering of search engine results and, more generically, may include monitoring web traffic on remote web servers on the communications network

__already__accumulated usage information and user behaviour activities (see more information here Calculating Search Rankings with User Web Traffic Data). How can you compensate this bootstrap situation?The key intuition was to use web traffic information collected by the way web proxies and observing user traffic and navigational information, including traffic performed querying other search engines. A similar traffic can be observed by minging a collection of web logs for the HTTP_Referer tag, The methodology was used for improving the freshness, the coverage, the ranking and the clustering of search engine results and, more generically, may include monitoring web traffic on remote web servers on the communications network

## Monday, October 24, 2011

### Design a system for electing a skype supernode

## Sunday, October 23, 2011

### Design a system for buying drugs online

### Playing with addresses

Detect if a machine is Big Endian or little endian. Detect if the stack will grow up or down

## Saturday, October 22, 2011

### Reverse a string, word by word

make it inplace

## Friday, October 21, 2011

### Find a way to detect whether two dna strings are similar

One way of doing this is to find the longest common subsequence, since we can allow to have to strings that are also having different genes interleaved. So suppose you have {ACGGAGGGAA} and {ACGAAGG} what is the longest common subsequence? {ACGAGG}

Solution: either direct DP or reducing this to Longest non decreasing subsequence problem.

Solution: either direct DP or reducing this to Longest non decreasing subsequence problem.

## Thursday, October 20, 2011

### Find the longest non decreasing subsequence

**Solution:**s_i = max ( max (s_j + 1), 1) for j < i and A[j] < A[i]. So take an already available longest non decreasing subsequence and if you find a subsequent A[i] > A[j] then sum 1. Remember that the minimum is 1

## Wednesday, October 19, 2011

### Sort a huge set of log files in C++ and STL

you are given a huge set of log files sorted by timestamp and you need to merge them. You have unlimited disk, but fixed amount of memory. In case of ties, break it by consider a progressive number for the log files

good to recall how a min-heap is defined in STL, it's just an heap with a custom comparison. An heap is nothing more that a normal std::vector and you build it with std::make_heap, std::push_back, std::push_heap(), std::front(), std::pop_heap(). It's always a bit confusing for me to consider that you need to push_back in the vector and than push_heap, but that's not so strange considering that an heap is built upon the vector itself ;-)

**Solution:**good to recall how a min-heap is defined in STL, it's just an heap with a custom comparison. An heap is nothing more that a normal std::vector and you build it with std::make_heap, std::push_back, std::push_heap(), std::front(), std::pop_heap(). It's always a bit confusing for me to consider that you need to push_back in the vector and than push_heap, but that's not so strange considering that an heap is built upon the vector itself ;-)

bool greater (const std::pair& a, const std::pair & b)

## Tuesday, October 18, 2011

### Compute the running median and the running average

You are given a very long stream and you need to compute the running median for each group of 1k numbers. Same question for the running median. What is the complexity?

Suppose you know a priori the number of elements in the stream.

given a running average, you can update the group by adding a new element and removing the oldest one. This will give you a solution that is linear in the number of elements in the stream

Similar intuition is for the median, you can use a balanced BST where every node has the number of nodes contained in the subtree rooted at it.

Suppose you know a priori the number of elements in the stream.

**Solution:**given a running average, you can update the group by adding a new element and removing the oldest one. This will give you a solution that is linear in the number of elements in the stream

Similar intuition is for the median, you can use a balanced BST where every node has the number of nodes contained in the subtree rooted at it.

## Monday, October 17, 2011

### A feature suggestion for facebook: Social calendar

Nothing is more social than our calendar. By definition you invite your colleagues and friends to meet and discuss. So why not integrating a calendar in Facebook, I bet that this would be the 3rd killer app after News and Images.

PS: I love Microsoft, but this idea is simple and needed so I wonder Should I create a startup? ;)

PS: I love Microsoft, but this idea is simple and needed so I wonder Should I create a startup? ;)

## Sunday, October 16, 2011

### Winner takes it all

Given a set of 128 players, suppose that x "beats" y has the transitive property. How many games do you need to find the winner? and how many games for finding the second one?

## Saturday, October 15, 2011

### Compute the integer square of a 32bit number

What is the most efficient algorithm?

## Friday, October 14, 2011

###
given a vector v generate all the subsets of v with no repetition

useful for a quick test of your STL knowledge. Then, try the same with repetition

## Thursday, October 13, 2011

### Social graph: create the links

You are given a snapshot of the social graph G = (N, E, t) at time t. Each node has a set of attributes (a1, a2, ... an). Three problems:

1. Create an edge if two nodes have exactly the same set of attributes

2. Create an edge if two nodes have similar attributes

3. What if a set of weights it's known a-priori for the attributes?

1. Create an edge if two nodes have exactly the same set of attributes

2. Create an edge if two nodes have similar attributes

3. What if a set of weights it's known a-priori for the attributes?

## Wednesday, October 12, 2011

### Where are my Friends? Cool Ios5 app similar to this experiment.

Apple just released a cool app into the new Ios5 which is based on the location provided by your phone. This is somehow similar to some experiments I made with Facebook location and Bing Maps. In my case, your facebook friends are mapped on a Bing map according to their Facebook location. In any case, mine was just an experiment and nothing more

http://www.headtagger.com/WhereAreMyFriends/map.html

This is where my Facebook friends are located in the world

These are my friends in San Francisco area

And this is a view of a friend of mine in New York City

http://www.headtagger.com/WhereAreMyFriends/map.html

This is where my Facebook friends are located in the world

Here my friends in Europe

These are my friends in San Francisco area

And this is a view of a friend of mine in New York City

Here is the Apple's preview

## Tuesday, October 11, 2011

### Maximum span in an array

Given an array A of numbers, find a pair of numbers A[i] and A[j], such that A[i] < A[j] and j-i is maximized.

solution: There is a trivial O(N^2) solution and a tricky O(N) based on a very simple intuition, where is the minimum?

## Monday, October 10, 2011

### Computer min & max with divide & conquer strategy

trivial !

## Sunday, October 9, 2011

## Saturday, October 8, 2011

### Find the missing number

Given a db with about 300Millions social numbers where each number is 9 digits find a number that is missing. You have 2MB of Ram and unlimited disk

Solution

300Millions = 300 * 10^6 = 3 * 10^8

Suppose we take 10 counters and "hash" each number considering only the highest order digit, incrementing the counter if a given number has that starting digit. One missing number would be located by the counter with the lowest value. That is the intuition, we can now use a bitmap which fits 2MB of Ram and apply the same principle.

Solution

300Millions = 300 * 10^6 = 3 * 10^8

Suppose we take 10 counters and "hash" each number considering only the highest order digit, incrementing the counter if a given number has that starting digit. One missing number would be located by the counter with the lowest value. That is the intuition, we can now use a bitmap which fits 2MB of Ram and apply the same principle.

## Friday, October 7, 2011

### Given a stream identify the words that appears more than k% times.

## Thursday, October 6, 2011

### Write a function for computing next permutation of a string

This is a little pearl piece of code

void printVector(int * value, int N) { for (int i = 0; i < N; i++) std::cout << value[i] << " " ; std::cout << std::endl; } void visit(int * value, int N, int K) { static int level = -1; level++; value[K] = level; // the level is current i if (level == N) // a complete permutation printVector(value, N); else for (int i = 0; i < N; i++) // 0... N-1 if (value[i] == 0) // not used before visit(value, N, i); level--; value[K] = 0; // nothing else to explore for the level, ready for next round of usage }

The above is cooler than the classical recursive solution

```
A permutation of 1,2,3,4 would be:
1 + permutation_of(2,3,4)
2 + permutation_of(1,3,4)
3 + permutation_of(2,1,4)
4 + permutation_of(2,3,1)
```

## Wednesday, October 5, 2011

### Intervals

Given a list of non overlapping intervals, design a data structure for inserting, finding and deleting them (e.g. [10, 30), [35, 50), [70, 100) ...)

## Tuesday, October 4, 2011

### Feature suggestion: Do we need a new search engine? Certainly not, but Facebook may offer a new definition of search

Bing, Yahoo and Google are already offering a complete search experience and there is probably no need of having a new search engine. Certainly, we already have a fast growing set of new engines for emerging markets such as Yandex in Russia and Baidu in China, but that's because they offer specific expertise in those markets (even if I would be not surprised of seeing them coming more and more in western countries).

Anyway, there is something new and Facebook has a unique position for offering a new set of search services, which could be integrated within the Facebook's popular notification system. Let's start from some examples:

I search for new software development positions in london. Facebook has all the public postings made by their social users. The data is there but the ranking is very poor. Plus this type of search is buried and you need to click 3 different links for accessing it.

I am looking for a musical in London, first result is totally irrelevant even if they have the location of the people posting, a full access to their internal social graph, a full access to my position in the graph, the number of like within facebook and in any external web site embedding the like button, many forms of temporal information, the links you shared, the title and the caption you changed, user importance, how many time I've interacted with my friends, any picture I have changed from automatic crawling, and a lot more (wow, not so bad). Many of those information are not necessarily available to other search engines

Anyway, there is something new and Facebook has a unique position for offering a new set of search services, which could be integrated within the Facebook's popular notification system. Let's start from some examples:

I search for new software development positions in london. Facebook has all the public postings made by their social users. The data is there but the ranking is very poor. Plus this type of search is buried and you need to click 3 different links for accessing it.

I am looking for a musical in London, first result is totally irrelevant even if they have the location of the people posting, a full access to their internal social graph, a full access to my position in the graph, the number of like within facebook and in any external web site embedding the like button, many forms of temporal information, the links you shared, the title and the caption you changed, user importance, how many time I've interacted with my friends, any picture I have changed from automatic crawling, and a lot more (wow, not so bad). Many of those information are not necessarily available to other search engines

Let's see other examples

Let's suppose we want to listen some music

Or get some news

I assume that you understood that the data is there, but a quality ranking is not. You may say that this is nothing new because we already had some interest in the past for real time search, and Bing is already including social likes information.

True. Anyway, Facebook as already a quite popular Notification system -- you know that white number on a red background -- which is updated anytime something new is happening in your social graph. So why not allowing myself to register for a new (and relevant please!) update for a query term that I've searched before? That would be a useful feature to have and a new definition of search where results are pushed to me instead of searching them again and again. Yes, I know I am lazy and I like when software is making something on my behalf because I save time for something else.

## Monday, October 3, 2011

### Fist lessons for Stanford ML couse

### Given two vectors of integer, merge them in place

Given two vectors of integer, merge them in place (e.g. relocate the second vector to the sum of the vectors and merge the vectors in place).

Here is the code

Here is the code

## Sunday, October 2, 2011

### Given a large collection of social numbers (e.g. 360Millions)

how to efficiently find a single missing number. You have a very large disk available for storing the number but just 2Mb of Ram

## Saturday, October 1, 2011

### LCS of very long sequences

Compression is always looking for repetition. Suppose you have two very long files stored in disk. how will you find the LCS with a limited amount of RAM?

Subscribe to:
Posts (Atom)