Antonio Gulli's coding playground: August 2011

Tuesday, August 30, 2011

A feature suggestion for facebook: multiple languages in posting

A suggestion for facebook: Can I post my message in multiple languages and you route it according to the reader? Today everyone who is living in a non-English country has always English speaking friends and so you never know what language to adopt because you want to respect your multiple readers

In other words, a type of circles based on languages. This time useful

Monday, August 29, 2011

Great video about search experiments from Google

a scientific and rigorous methodology adopted by the industry and academia

Friday, August 26, 2011

Design a system to store a distributed social graph

How to make it scalable?

Thursday, August 25, 2011

Design a messaging system

What technologies would you adopt? how to make it scalable?

Wednesday, August 24, 2011

Today, 2 years in Microsoft

Tuesday, August 23, 2011

Find the diameter of a tree

This is a typical problem of recursion.

The diameter of a tree is the number of nodes on the longest path between two leaves in the tree. So you can compute it as the max among the height for the left subtree, the height of the right subtree and the longest path between two leaves passing through the root.

Monday, August 22, 2011

Find minimum and maximum in an unsorted array in minimum number of comparisons.

This is certainly linear, but how can you minimize the #of comparison.

Hint: think about the way we build an heap.

Managing an airport

You are the manager of an airport and have a collection of N airplanes arriving at time a(i), leaving at time l(i) and paying p(i) for occupying one of the available K runaways (where K is << N). Write a program for maximizing your gain in terms of money. Write a program for maximizing the usage of the K runaways.

Bing News is now in Germany, Spain and Italy too!.

Bing News is now in Germany, Spain and Italy too!.

How many different ways do you have for spelling Gaddafi? It's Gaddafi in English and German, but it is Gheddafi in Italian and Gadafi in Spanish.

Sunday, August 21, 2011

Find unique number

You are given an array that contains integers. The integers content is such that every integer occurs 3 times in that array leaving one integer that appears only once. Fastest way to find that single integer

Solution

First thing that I was considering is to use XOR, because x^x = 0 so all the numbers that are replicated 2 times will be deleted and the unique number that is not duplicated will stay there. But here the numbers are duplicated 3 times and x^x^x=x so this would not help. We can remove items that are duplicated a even number of times.

So we need an alternative way based on logic operations. For each element A[i]

1. get elements that appear 1 time -> ones =^x; (remove)
2. get elements that appear 2 times -> two |= ones & x; (needs to be before previous remove operation)
3. get elements that appear 3 times -> three = ones & two
4. get elements that not appear 3 times ~three
5. remove the one not appearing 3 times from one and two -> one &= ~three; two &= ~three

Thursday, August 18, 2011

Find the maximum rectangle under a histogram in linear time.

The problem requires to find the maximum area of a rectangle containing an histogram, given N histog

Solution

If we insert the histogram i, then the rectangle can have height h(i) or larger, because the rectangle must contain the histogram i with height h(i). If we insert the histogram i then we need to look to the left j \in [i-1 .. 0] for adjacent histograms with h(j) > h(i), say those are l(i). If we insert the histogram i then we need to look to the right j \in [i+1 .. n] for adjacent histograms with h(j) > h(i), say those are r(i). These two steps will give us the basis b(i) of rectangle R_i with height h(i) containing the histogram i. In particular b(i) = l(i) + r(i) + 1, and the area for rectangle i will be a(i) = b(i) * h(i). So the final answer would be A = max_i a(i) which can be computed in O(N)

So the main problem is how to compute l(i) and r(i). If we take O(N) then the final outcome would be quadratic. We need a solution in O(1). Can you make it?

Wednesday, August 17, 2011

Numbers and strings

Two problems;

1. Suppose you represent every number with a character like 0 –> a, 1 –> b, 2 –> c, ………………24 –> y, 25 –> z Now given a number, give words combinations for it.

2. Given a number, generate all the strings that can be obtained with that number on a telephone keypad

Solution

Trivial recursion, just pay attention to space allocation.

Tuesday, August 16, 2011

Majority element

This is a classical problem, quite frequently asked in interviews. Given an array find the majority element. How about a stream?

Solution

There is a trivial O(nlogn) solution based on sorting and counting, but you are not using the majority propriety.

There is another solution based on hashing, but you are not using the majority propriety and this will not work on streams because you cannot bound the hash table.

The majority element will appear by definition on more than 50% of elements. So you can maintain a variable with the current element and another variable with the current counting. When you see the same element you increment, when you see another element you decrement, when you reach zero then you change the current element. In this way, if there is a majority element you will identify it in the current element variable.

What happen if there is no majority element? will you have a false positive?

How to adapt the above principle to a stream?

There is a bunch of 120 electrical cables laid under the ground

The cables are 10 KM in length and both ends of each wire are out in open. Label all the cables with minimum number of trips.

Solution.

The key intuition here is that you have 2 sources of information. Either the cables are connected or they are disconnected ;-). so you can connect all of them but leave two of disconnected or connect just two of them. if you leave 3 disconnected than you have an ambiguity and you need a round trip for solving this for just 3 cables. That's a lot

Let's try to leave just 2 connected. You need 1 round trip to identify the pair and so a total of 60 trips. That's a lot.

Let's try to connect all of them but leave 2 disconnected marking them red and blue. In one trip you find the disconnected pair and can name them #1 and #120 (any number works here provided that you remember it for the next step). All the other cables will be numbered from #2 to #119. Here you can encode another information just by leveraging the order. You assign to a cable #2 and to the one connected to it #3. Then you assign to another cable #4 and to the one connected to it #5, and so on and so forth (*).

Now Let's try to connect again all of them and leave 2 disconnected. This time we will split the disconnect pair and connect one cable of them. So assume we connect (#1, #2 ) (#3, #4) (#118, #119) and leave #120 disconnected (**)

Return to base. We have red and blue. But we know that one of them must be disconnected and we know that the disconnected one has number #120, while the other one must be #1. That' why we broke the pair because we wanted to solve the ambiguity a (disconnection is not ambiguous when there is just one disconnected). So now we know what is #1 and what is #120.

Surprisingly enough we are done! because we know that the other side has (#1, #2 ) so we know #2 is the one connected to #1 and we can test what is connected to #1. But when we know the #2 then we know the #3 because of the naming convention adopted here (*), and if we know the #3 then we know the #4 because of the choice taken here (**).

Talking about encoding information.

Monday, August 15, 2011

maximize coins

There are n coins in a line. Two players take turns to take a coin from one of the ends of the line until there are no more coins left. The player with the larger amount of money wins. Assume that you go first, describe an algorithm to compute the maximum amount of money you can win.

Solution:

this looks like a Dynamic Programming problem. Say M(i, j) is the maximum when there are the coins from i to j available. In this case I have 2 options.

Pick from A[i], so i get value A[i]. The other player takes either A[j] or A[i+1] leaving me the options either for M(i+1, j-1) or M(i+2, j). The other player will try to minimize my gains so he will chose min(M(i+1, j-1), M(i+2, j))

Pick from A[j], so i get value A[j]. The other player takes either A[i] or A[j-1] leaving me the options either for M(i+1, j-1) or M(i, j-2) and again the other player will try to minimize my choices min(M(i+1, j-1), M(i, j-2))

I select the two above trying to maximize my ROI, so the total formula would be

max (A[i]+min(M(i+1, j-1), M(i+2, j)), A[j]+min(M(i+1, j-1), M(i, j-2)))

Pretty cool isnt'it?

Standard memoization is suggested

Sunday, August 14, 2011

Sliding window on A

A long array A[] is given to you. There is a sliding window of size w which is moving from the very left of the array to the very right. You can only see the w numbers in the window. Each time the sliding window moves rightwards by one position. Produce an array B[], B[i] is the maximum value of from A[i] to A[i+w-1]

Solution

- a naive solution is O(n w)
- an heap based solution could be O( n lg w), but we need to change the heap for maintaining n-w max positions and invalidate when the windows move. too complicated and not sure we can make it

- we can maintain a queue or better a dequeue while processing the array. The size of the queue Q is w and the key observation is that when we see an element A[j] we can remove each element in Q such that A[j] > Q[z] because we are looking for the max and those elements are not max given the esistence of A[j]. In doing so, we keep the current maximum at front and we remove the other elements from the back of the deque. Each element will enter 1 time in the queue and will exit at most 1 time, so the total complexity is O(n) which is pretty good. For keeping the sliding window we push current Array index so we need to remove from front when out of current window

#include
void maxSlidingWindow(int A[], int n, int w, int B[])
{
deque<int> Q;
//Initilize deque Q for first window
for (int i = 0; i < w; i++)
{
while (!Q.empty() && A[i] >= A[Q.back()])
Q.pop_back();
Q.push_back(i);
}
for (int i = w; i < n; i++)
{
B[i-w] = A[Q.front()];
//update Q for new window
while (!Q.empty() && A[i] >= A[Q.back()])
Q.pop_back();
//Pop older element outside window from Q
while (!Q.empty() && Q.front() <= i-w)
Q.pop_front();
//Insert current element in Q
Q.push_back(i);
}
B[n-w] = A[Q.front()];
}

Saturday, August 13, 2011

Knapsack

You have n types of items, where the i^th item type has an integer size s_i and a real value v_i. You need to ﬁll a knapsack of total capacity C with a selection of items of maximum value.

Friday, August 12, 2011

Pretty educative talk by the way of Google guys

Pretty educative talk by the way of Google guys.

Convert a BST into a circular list

I like this question because it forces you to think about recursion, visit of a tree and how to change pointers

Thursday, August 11, 2011

subtrees

Find if any subtree of a binary tree has sum equal to given value. Subtree may or may not start from root.

Hint: solve the one with substree starting from root first.

Jump or not jump (solution)

Given an array start from the first element and reach the last by jumping. The jump length can be at most the value at the current position in the array. Optimum result is when u reach the goal in minimum number of jumps.
For ex:
Given array A = {2,3,1,1,4}
possible ways to reach the end (index list)
i) 0,2,3,4 (jump 2 to index 2, then jump 1 to index 3 then 1 to index 4)
ii) 0,1,4 (jump 1 to index 1, then jump 3 to index 4)

Solution:

This problem smells like a good candidate for Dynamic Programming. One additional observation is that you need to start processing from right to left for solving the problem. Given the example:

min[4] = 0 (because you are at the end)
min[3] = 1 (because you can just go right of 1 position, more than that will be not allowed)
min[2] = min (min[4], min[3]) here you need to consider all the one that are effectively reachable from current position and this is expressed by current index.

Now how can you get this minimum for a given interval?

a) checking all the positions -> will give you a quadratic algorithm
b) using a tree -> insert and get minimum will give you an O(n log n) algorithm
c) using range queries or Range mimimum queries and this is linear because the minimum for an internal can be obtained in O(1)

Wednesday, August 10, 2011

Passing a subclass object to a function waiting for a superclass object

what will happen?

Find blobs

On a binary matrix mxn a blob is a sequence of contiguos 1 (there are at most 8 potential candidates), enumerate and track all the blobs.

I learned something new "Persistent data structure"

Persistent data structure. Thanks Giuseppe, that's a new meaning for the work {persistent} that I was not aware of.

Reservoir sampling

Reservoir sampling, a good algo to know for randomly selecting k items from a list of N. the key observation here is that the replacement is happening with increasing probability i

array R[k];    // result
integer i, j;

// fill the reservoir array
for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done

A couple of feature suggestions for Facebook News

Dear Facebook,

Here a couple of suggestions from an avid user:

1. Let me order my news feed by people. News feed on Facebook is the modern inbox, so i'd like to sort my incoming messages by sender (e.g. friends) and to associate different friends with different priorities. Nothing new here we have this in all email readers;

2. Let me create additional folders and rules over my news feed. Again, Facebook is the modern inbox, so i'd like to predefine a set of rules for filtering messages and automatically associate them to different folders. Nothing new here: we have this option in all email readers;

3. Let me create mailing lists and send messages to people in those specific lists. Again, Facebook is the modern inbox, and when I send an email I don't want to send it to all the people in my address book but just to a subset of them. If you want to make this feature cool call it "square", or "rectangle", or maybe "circle". Yes, "circle" has a mythological effect which is always a plus. Nothing new here we have this option in all email readers;

4. Let me filter senders. I do not want to get updates from some of my friends, so let me ban them. Yes, I know that this feature is already there but I wanted you to extend it. Even if I ban a friend, sometime I just want to get some of the updates from her because I want to give her a second chance. So please allow me to explore if she is become more catchy.

5. Let me create folders with topics. In addition, to the normal news feeds I'd like to create special folders where you automatically classify incoming messages. I give a couple of examples and you learn from them. For instance, I wanted all the messages linking to a news site in a folder. All the messages about recipes in another and so on and so forth; Please learn from my behavior.

6. Let me searches my past messages. Currently this feature s..ks!

Your loyal User,
Antonio

Tuesday, August 9, 2011

Find the diameter of a BST

Find the diameter of a binary search tree.

Hint: this can be either in the left sub-tree or in the right-subtree or crossing the root.

Monday, August 8, 2011

Can you learn how to optimize two or more loss function at the same time?

A good paper for something very useful

Ranking Specialization for Web Search: A Divide and Conquer Approach Using Topical RankSVM

Authors: Zha H, Bian J, Li X, Li F, Zheng Z

Source: the 19th International World Wide Web Conference

Keywords: Search Sciences

Download:

fp0842-bian.pdf

Sunday, August 7, 2011

what means "dead beef"?

Saturday, August 6, 2011

You need to implement a suggest as you type service

1. what approach would your use?
2. what data structure would you use?
3. suppose you want also to integrate a form of spell correction, what would be your approach?

Friday, August 5, 2011

My first 15 years in Search.

Pisa, 1996 - 2011. In August, 15 years ago a crazy group of few guys in Pisa were thinking about building a search engine. Our first index had 1.8M of web documents. wow! There were less than 10,000 servers world wide:

Arianna became operational on 1996 in web developed by a group IOL in collaboration with the Centre SerRA Università degli studi di Pisa and the laboratory of Pisa Olivetti of Telemedia. Soon the site becomes a point of reference for the web in Italian in perennial competition Virgilo

http://www.microsofttranslator.com/bv.aspx?ref=Internal&from&to=en&a=http%3A%2F%2Fit.wikipedia.org%2Fwiki%2FArianna_%28motore_di_ricerca%29

Thursday, August 4, 2011

My forecast: this will change the Ad display space forever

The ad analytics company is about to release a new service that will tell advertisers who’s seeing their ads--anywhere on the Internet. And Facebook is going to be the engine that powers it.

http://www.fastcompany.com/1771291/could-nielsen-s-new-measurement-system-finally-prove-the-worth-of-online-advertising

your opinion?

Wednesday, August 3, 2011

starting to be interested in Ads space

Good synthetic description of Generalized Second Price and motivation behind it.

Tuesday, August 2, 2011

Something to know: DAAT algorithm

DAAT is a nice IR algorithm, useful to know (page 100)

Monday, August 1, 2011

Computational Ads

I am more and more interested in Ads and this course from Andrei Broder is a pure golden mine.