A suggestion for facebook: Can I post my message in multiple languages and you route it according to the reader? Today everyone who is living in a non-English country has always English speaking friends and so you never know what language to adopt because you want to respect your multiple readers
In other words, a type of circles based on languages. This time useful
Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Tuesday, August 30, 2011
Monday, August 29, 2011
Great video about search experiments from Google
a scientific and rigorous methodology adopted by the industry and academia
Friday, August 26, 2011
Design a system to store a distributed social graph
How to make it scalable?
Thursday, August 25, 2011
Design a messaging system
What technologies would you adopt? how to make it scalable?
Wednesday, August 24, 2011
Tuesday, August 23, 2011
Find the diameter of a tree
This is a typical problem of recursion.
The diameter of a tree is the number of nodes on the longest path between two leaves in the tree. So you can compute it as the max among the height for the left subtree, the height of the right subtree and the longest path between two leaves passing through the root.
The diameter of a tree is the number of nodes on the longest path between two leaves in the tree. So you can compute it as the max among the height for the left subtree, the height of the right subtree and the longest path between two leaves passing through the root.
Monday, August 22, 2011
Find minimum and maximum in an unsorted array in minimum number of comparisons.
This is certainly linear, but how can you minimize the #of comparison.
Hint: think about the way we build an heap.
Hint: think about the way we build an heap.
Managing an airport
You are the manager of an airport and have a collection of N airplanes arriving at time a(i), leaving at time l(i) and paying p(i) for occupying one of the available K runaways (where K is << N). Write a program for maximizing your gain in terms of money. Write a program for maximizing the usage of the K runaways.
Bing News is now in Germany, Spain and Italy too!.
Bing News is now in Germany, Spain and Italy too!.
How many different ways do you have for spelling Gaddafi? It's Gaddafi in English and German, but it is Gheddafi in Italian and Gadafi in Spanish.
How many different ways do you have for spelling Gaddafi? It's Gaddafi in English and German, but it is Gheddafi in Italian and Gadafi in Spanish.
Sunday, August 21, 2011
Find unique number
You are given an array that contains integers. The integers content is such that every integer occurs 3 times in that array leaving one integer that appears only once. Fastest way to find that single integer
Solution
First thing that I was considering is to use XOR, because x^x = 0 so all the numbers that are replicated 2 times will be deleted and the unique number that is not duplicated will stay there. But here the numbers are duplicated 3 times and x^x^x=x so this would not help. We can remove items that are duplicated a even number of times.
So we need an alternative way based on logic operations. For each element A[i]
1. get elements that appear 1 time -> ones =^x; (remove)
2. get elements that appear 2 times -> two |= ones & x; (needs to be before previous remove operation)
3. get elements that appear 3 times -> three = ones & two
4. get elements that not appear 3 times ~three
5. remove the one not appearing 3 times from one and two -> one &= ~three; two &= ~three
Solution
First thing that I was considering is to use XOR, because x^x = 0 so all the numbers that are replicated 2 times will be deleted and the unique number that is not duplicated will stay there. But here the numbers are duplicated 3 times and x^x^x=x so this would not help. We can remove items that are duplicated a even number of times.
So we need an alternative way based on logic operations. For each element A[i]
1. get elements that appear 1 time -> ones =^x; (remove)
2. get elements that appear 2 times -> two |= ones & x; (needs to be before previous remove operation)
3. get elements that appear 3 times -> three = ones & two
4. get elements that not appear 3 times ~three
5. remove the one not appearing 3 times from one and two -> one &= ~three; two &= ~three
Thursday, August 18, 2011
Find the maximum rectangle under a histogram in linear time.
The problem requires to find the maximum area of a rectangle containing an histogram, given N histog
Solution
If we insert the histogram i, then the rectangle can have height h(i) or larger, because the rectangle must contain the histogram i with height h(i). If we insert the histogram i then we need to look to the left j \in [i-1 .. 0] for adjacent histograms with h(j) > h(i), say those are l(i). If we insert the histogram i then we need to look to the right j \in [i+1 .. n] for adjacent histograms with h(j) > h(i), say those are r(i). These two steps will give us the basis b(i) of rectangle R_i with height h(i) containing the histogram i. In particular b(i) = l(i) + r(i) + 1, and the area for rectangle i will be a(i) = b(i) * h(i). So the final answer would be A = max_i a(i) which can be computed in O(N)
So the main problem is how to compute l(i) and r(i). If we take O(N) then the final outcome would be quadratic. We need a solution in O(1). Can you make it?
Solution
If we insert the histogram i, then the rectangle can have height h(i) or larger, because the rectangle must contain the histogram i with height h(i). If we insert the histogram i then we need to look to the left j \in [i-1 .. 0] for adjacent histograms with h(j) > h(i), say those are l(i). If we insert the histogram i then we need to look to the right j \in [i+1 .. n] for adjacent histograms with h(j) > h(i), say those are r(i). These two steps will give us the basis b(i) of rectangle R_i with height h(i) containing the histogram i. In particular b(i) = l(i) + r(i) + 1, and the area for rectangle i will be a(i) = b(i) * h(i). So the final answer would be A = max_i a(i) which can be computed in O(N)
So the main problem is how to compute l(i) and r(i). If we take O(N) then the final outcome would be quadratic. We need a solution in O(1). Can you make it?
Wednesday, August 17, 2011
Numbers and strings
Two problems;
2. Given a number, generate all the strings that can be obtained with that number on a telephone keypad
Solution
Trivial recursion, just pay attention to space allocation.
1. Suppose you represent every number with a character like 0 –> a, 1 –> b, 2 –> c, ………………24 –> y, 25 –> z Now given a number, give words combinations for it.
2. Given a number, generate all the strings that can be obtained with that number on a telephone keypad
Solution
Trivial recursion, just pay attention to space allocation.
Tuesday, August 16, 2011
Majority element
This is a classical problem, quite frequently asked in interviews. Given an array find the majority element. How about a stream?
Solution
There is a trivial O(nlogn) solution based on sorting and counting, but you are not using the majority propriety.
There is another solution based on hashing, but you are not using the majority propriety and this will not work on streams because you cannot bound the hash table.
The majority element will appear by definition on more than 50% of elements. So you can maintain a variable with the current element and another variable with the current counting. When you see the same element you increment, when you see another element you decrement, when you reach zero then you change the current element. In this way, if there is a majority element you will identify it in the current element variable.
What happen if there is no majority element? will you have a false positive?
How to adapt the above principle to a stream?
Solution
There is a trivial O(nlogn) solution based on sorting and counting, but you are not using the majority propriety.
There is another solution based on hashing, but you are not using the majority propriety and this will not work on streams because you cannot bound the hash table.
The majority element will appear by definition on more than 50% of elements. So you can maintain a variable with the current element and another variable with the current counting. When you see the same element you increment, when you see another element you decrement, when you reach zero then you change the current element. In this way, if there is a majority element you will identify it in the current element variable.
What happen if there is no majority element? will you have a false positive?
How to adapt the above principle to a stream?
There is a bunch of 120 electrical cables laid under the ground
The cables are 10 KM in length and both ends of each wire are out in open. Label all the cables with minimum number of trips.
Solution.
The key intuition here is that you have 2 sources of information. Either the cables are connected or they are disconnected ;-). so you can connect all of them but leave two of disconnected or connect just two of them. if you leave 3 disconnected than you have an ambiguity and you need a round trip for solving this for just 3 cables. That's a lot
Let's try to leave just 2 connected. You need 1 round trip to identify the pair and so a total of 60 trips. That's a lot.
Let's try to connect all of them but leave 2 disconnected marking them red and blue. In one trip you find the disconnected pair and can name them #1 and #120 (any number works here provided that you remember it for the next step). All the other cables will be numbered from #2 to #119. Here you can encode another information just by leveraging the order. You assign to a cable #2 and to the one connected to it #3. Then you assign to another cable #4 and to the one connected to it #5, and so on and so forth (*).
Now Let's try to connect again all of them and leave 2 disconnected. This time we will split the disconnect pair and connect one cable of them. So assume we connect (#1, #2 ) (#3, #4) (#118, #119) and leave #120 disconnected (**)
Return to base. We have red and blue. But we know that one of them must be disconnected and we know that the disconnected one has number #120, while the other one must be #1. That' why we broke the pair because we wanted to solve the ambiguity a (disconnection is not ambiguous when there is just one disconnected). So now we know what is #1 and what is #120.
Surprisingly enough we are done! because we know that the other side has (#1, #2 ) so we know #2 is the one connected to #1 and we can test what is connected to #1. But when we know the #2 then we know the #3 because of the naming convention adopted here (*), and if we know the #3 then we know the #4 because of the choice taken here (**).
Talking about encoding information.
Solution.
The key intuition here is that you have 2 sources of information. Either the cables are connected or they are disconnected ;-). so you can connect all of them but leave two of disconnected or connect just two of them. if you leave 3 disconnected than you have an ambiguity and you need a round trip for solving this for just 3 cables. That's a lot
Let's try to leave just 2 connected. You need 1 round trip to identify the pair and so a total of 60 trips. That's a lot.
Let's try to connect all of them but leave 2 disconnected marking them red and blue. In one trip you find the disconnected pair and can name them #1 and #120 (any number works here provided that you remember it for the next step). All the other cables will be numbered from #2 to #119. Here you can encode another information just by leveraging the order. You assign to a cable #2 and to the one connected to it #3. Then you assign to another cable #4 and to the one connected to it #5, and so on and so forth (*).
Now Let's try to connect again all of them and leave 2 disconnected. This time we will split the disconnect pair and connect one cable of them. So assume we connect (#1, #2 ) (#3, #4) (#118, #119) and leave #120 disconnected (**)
Return to base. We have red and blue. But we know that one of them must be disconnected and we know that the disconnected one has number #120, while the other one must be #1. That' why we broke the pair because we wanted to solve the ambiguity a (disconnection is not ambiguous when there is just one disconnected). So now we know what is #1 and what is #120.
Surprisingly enough we are done! because we know that the other side has (#1, #2 ) so we know #2 is the one connected to #1 and we can test what is connected to #1. But when we know the #2 then we know the #3 because of the naming convention adopted here (*), and if we know the #3 then we know the #4 because of the choice taken here (**).
Talking about encoding information.
Monday, August 15, 2011
maximize coins
There are n coins in a line. Two players take turns to take a coin from one of the ends of the line until there are no more coins left. The player with the larger amount of money wins. Assume that you go first, describe an algorithm to compute the maximum amount of money you can win.
Solution:
this looks like a Dynamic Programming problem. Say M(i, j) is the maximum when there are the coins from i to j available. In this case I have 2 options.
Pick from A[i], so i get value A[i]. The other player takes either A[j] or A[i+1] leaving me the options either for M(i+1, j-1) or M(i+2, j). The other player will try to minimize my gains so he will chose min(M(i+1, j-1), M(i+2, j))
Pick from A[j], so i get value A[j]. The other player takes either A[i] or A[j-1] leaving me the options either for M(i+1, j-1) or M(i, j-2) and again the other player will try to minimize my choices min(M(i+1, j-1), M(i, j-2))
I select the two above trying to maximize my ROI, so the total formula would be
max (A[i]+min(M(i+1, j-1), M(i+2, j)), A[j]+min(M(i+1, j-1), M(i, j-2)))
Pretty cool isnt'it?
Standard memoization is suggested
Solution:
this looks like a Dynamic Programming problem. Say M(i, j) is the maximum when there are the coins from i to j available. In this case I have 2 options.
Pick from A[i], so i get value A[i]. The other player takes either A[j] or A[i+1] leaving me the options either for M(i+1, j-1) or M(i+2, j). The other player will try to minimize my gains so he will chose min(M(i+1, j-1), M(i+2, j))
Pick from A[j], so i get value A[j]. The other player takes either A[i] or A[j-1] leaving me the options either for M(i+1, j-1) or M(i, j-2) and again the other player will try to minimize my choices min(M(i+1, j-1), M(i, j-2))
I select the two above trying to maximize my ROI, so the total formula would be
max (A[i]+min(M(i+1, j-1), M(i+2, j)), A[j]+min(M(i+1, j-1), M(i, j-2)))
Pretty cool isnt'it?
Standard memoization is suggested
Sunday, August 14, 2011
Sliding window on A
A long array A[] is given to you. There is a sliding window of size w which is moving from the very left of the array to the very right. You can only see the w numbers in the window. Each time the sliding window moves rightwards by one position. Produce an array B[], B[i] is the maximum value of from A[i] to A[i+w-1]
Solution
- a naive solution is O(n w)
- an heap based solution could be O( n lg w), but we need to change the heap for maintaining n-w max positions and invalidate when the windows move. too complicated and not sure we can make it
- we can maintain a queue or better a dequeue while processing the array. The size of the queue Q is w and the key observation is that when we see an element A[j] we can remove each element in Q such that A[j] > Q[z] because we are looking for the max and those elements are not max given the esistence of A[j]. In doing so, we keep the current maximum at front and we remove the other elements from the back of the deque. Each element will enter 1 time in the queue and will exit at most 1 time, so the total complexity is O(n) which is pretty good. For keeping the sliding window we push current Array index so we need to remove from front when out of current window
Solution
- a naive solution is O(n w)
- an heap based solution could be O( n lg w), but we need to change the heap for maintaining n-w max positions and invalidate when the windows move. too complicated and not sure we can make it
- we can maintain a queue or better a dequeue while processing the array. The size of the queue Q is w and the key observation is that when we see an element A[j] we can remove each element in Q such that A[j] > Q[z] because we are looking for the max and those elements are not max given the esistence of A[j]. In doing so, we keep the current maximum at front and we remove the other elements from the back of the deque. Each element will enter 1 time in the queue and will exit at most 1 time, so the total complexity is O(n) which is pretty good. For keeping the sliding window we push current Array index so we need to remove from front when out of current window
- #include
- void maxSlidingWindow(int A[], int n, int w, int B[])
- {
- deque<int> Q;
- //Initilize deque Q for first window
- for (int i = 0; i < w; i++)
- {
- while (!Q.empty() && A[i] >= A[Q.back()])
- Q.pop_back();
- Q.push_back(i);
- }
- for (int i = w; i < n; i++)
- {
- B[i-w] = A[Q.front()];
- //update Q for new window
- while (!Q.empty() && A[i] >= A[Q.back()])
- Q.pop_back();
- //Pop older element outside window from Q
- while (!Q.empty() && Q.front() <= i-w)
- Q.pop_front();
- //Insert current element in Q
- Q.push_back(i);
- }
- B[n-w] = A[Q.front()];
- }
Saturday, August 13, 2011
Knapsack
You have n types of items, where the ith item type has an integer size si and a real value vi. You need to fill a knapsack of total capacity C with a selection of items of maximum value.
Friday, August 12, 2011
Pretty educative talk by the way of Google guys
Pretty educative talk by the way of Google guys.
Convert a BST into a circular list
I like this question because it forces you to think about recursion, visit of a tree and how to change pointers
Thursday, August 11, 2011
subtrees
Find if any subtree of a binary tree has sum equal to given value. Subtree may or may not start from root.
Hint: solve the one with substree starting from root first.
Hint: solve the one with substree starting from root first.
Jump or not jump (solution)
Given an array start from the first element and reach the last by jumping. The jump length can be at most the value at the current position in the array. Optimum result is when u reach the goal in minimum number of jumps.
For ex:
Given array A = {2,3,1,1,4}
possible ways to reach the end (index list)
i) 0,2,3,4 (jump 2 to index 2, then jump 1 to index 3 then 1 to index 4)
ii) 0,1,4 (jump 1 to index 1, then jump 3 to index 4)
Solution:
This problem smells like a good candidate for Dynamic Programming. One additional observation is that you need to start processing from right to left for solving the problem. Given the example:
min[4] = 0 (because you are at the end)
min[3] = 1 (because you can just go right of 1 position, more than that will be not allowed)
min[2] = min (min[4], min[3]) here you need to consider all the one that are effectively reachable from current position and this is expressed by current index.
Now how can you get this minimum for a given interval?
a) checking all the positions -> will give you a quadratic algorithm
b) using a tree -> insert and get minimum will give you an O(n log n) algorithm
c) using range queries or Range mimimum queries and this is linear because the minimum for an internal can be obtained in O(1)
For ex:
Given array A = {2,3,1,1,4}
possible ways to reach the end (index list)
i) 0,2,3,4 (jump 2 to index 2, then jump 1 to index 3 then 1 to index 4)
ii) 0,1,4 (jump 1 to index 1, then jump 3 to index 4)
Solution:
This problem smells like a good candidate for Dynamic Programming. One additional observation is that you need to start processing from right to left for solving the problem. Given the example:
min[4] = 0 (because you are at the end)
min[3] = 1 (because you can just go right of 1 position, more than that will be not allowed)
min[2] = min (min[4], min[3]) here you need to consider all the one that are effectively reachable from current position and this is expressed by current index.
Now how can you get this minimum for a given interval?
a) checking all the positions -> will give you a quadratic algorithm
b) using a tree -> insert and get minimum will give you an O(n log n) algorithm
c) using range queries or Range mimimum queries and this is linear because the minimum for an internal can be obtained in O(1)
Wednesday, August 10, 2011
Passing a subclass object to a function waiting for a superclass object
what will happen?
Find blobs
On a binary matrix mxn a blob is a sequence of contiguos 1 (there are at most 8 potential candidates), enumerate and track all the blobs.
I learned something new "Persistent data structure"
Persistent data structure. Thanks Giuseppe, that's a new meaning for the work {persistent} that I was not aware of.
Reservoir sampling
Reservoir sampling, a good algo to know for randomly selecting k items from a list of N. the key observation here is that the replacement is happening with increasing probability i
array R[k]; // result integer i, j; // fill the reservoir array for each i in 1 to k do R[i] := S[i] done; // replace elements with gradually decreasing probability for each i in k+1 to length(S) do j := random(1, i); // important: inclusive range if j <= k then R[j] := S[i] fi done
A couple of feature suggestions for Facebook News
Dear Facebook,
Here a couple of suggestions from an avid user:
1. Let me order my news feed by people. News feed on Facebook is the modern inbox, so i'd like to sort my incoming messages by sender (e.g. friends) and to associate different friends with different priorities. Nothing new here we have this in all email readers;
2. Let me create additional folders and rules over my news feed. Again, Facebook is the modern inbox, so i'd like to predefine a set of rules for filtering messages and automatically associate them to different folders. Nothing new here: we have this option in all email readers;
3. Let me create mailing lists and send messages to people in those specific lists. Again, Facebook is the modern inbox, and when I send an email I don't want to send it to all the people in my address book but just to a subset of them. If you want to make this feature cool call it "square", or "rectangle", or maybe "circle". Yes, "circle" has a mythological effect which is always a plus. Nothing new here we have this option in all email readers;
4. Let me filter senders. I do not want to get updates from some of my friends, so let me ban them. Yes, I know that this feature is already there but I wanted you to extend it. Even if I ban a friend, sometime I just want to get some of the updates from her because I want to give her a second chance. So please allow me to explore if she is become more catchy.
5. Let me create folders with topics. In addition, to the normal news feeds I'd like to create special folders where you automatically classify incoming messages. I give a couple of examples and you learn from them. For instance, I wanted all the messages linking to a news site in a folder. All the messages about recipes in another and so on and so forth; Please learn from my behavior.
6. Let me searches my past messages. Currently this feature s..ks!
Your loyal User,
Antonio
Here a couple of suggestions from an avid user:
1. Let me order my news feed by people. News feed on Facebook is the modern inbox, so i'd like to sort my incoming messages by sender (e.g. friends) and to associate different friends with different priorities. Nothing new here we have this in all email readers;
2. Let me create additional folders and rules over my news feed. Again, Facebook is the modern inbox, so i'd like to predefine a set of rules for filtering messages and automatically associate them to different folders. Nothing new here: we have this option in all email readers;
3. Let me create mailing lists and send messages to people in those specific lists. Again, Facebook is the modern inbox, and when I send an email I don't want to send it to all the people in my address book but just to a subset of them. If you want to make this feature cool call it "square", or "rectangle", or maybe "circle". Yes, "circle" has a mythological effect which is always a plus. Nothing new here we have this option in all email readers;
4. Let me filter senders. I do not want to get updates from some of my friends, so let me ban them. Yes, I know that this feature is already there but I wanted you to extend it. Even if I ban a friend, sometime I just want to get some of the updates from her because I want to give her a second chance. So please allow me to explore if she is become more catchy.
5. Let me create folders with topics. In addition, to the normal news feeds I'd like to create special folders where you automatically classify incoming messages. I give a couple of examples and you learn from them. For instance, I wanted all the messages linking to a news site in a folder. All the messages about recipes in another and so on and so forth; Please learn from my behavior.
6. Let me searches my past messages. Currently this feature s..ks!
Your loyal User,
Antonio
Tuesday, August 9, 2011
Find the diameter of a BST
Find the diameter of a binary search tree.
Hint: this can be either in the left sub-tree or in the right-subtree or crossing the root.
Hint: this can be either in the left sub-tree or in the right-subtree or crossing the root.
Monday, August 8, 2011
Can you learn how to optimize two or more loss function at the same time?
A good paper for something very useful
Ranking Specialization for Web Search: A Divide and Conquer Approach Using Topical RankSVM
Source: the 19th International World Wide Web Conference
Keywords: Search Sciences
Download:
Sunday, August 7, 2011
Saturday, August 6, 2011
You need to implement a suggest as you type service
1. what approach would your use?
2. what data structure would you use?
3. suppose you want also to integrate a form of spell correction, what would be your approach?
2. what data structure would you use?
3. suppose you want also to integrate a form of spell correction, what would be your approach?
Friday, August 5, 2011
My first 15 years in Search.
Pisa, 1996 - 2011. In August, 15 years ago a crazy group of few guys in Pisa were thinking about building a search engine. Our first index had 1.8M of web documents. wow! There were less than 10,000 servers world wide:
Arianna became operational on 1996 in web developed by a group IOL in collaboration with the Centre SerRA Università degli studi di Pisa and the laboratory of Pisa Olivetti of Telemedia. Soon the site becomes a point of reference for the web in Italian in perennial competition Virgilo
http://www.microsofttranslator.com/bv.aspx?ref=Internal&from&to=en&a=http%3A%2F%2Fit.wikipedia.org%2Fwiki%2FArianna_%28motore_di_ricerca%29
Arianna became operational on 1996 in web developed by a group IOL in collaboration with the Centre SerRA Università degli studi di Pisa and the laboratory of Pisa Olivetti of Telemedia. Soon the site becomes a point of reference for the web in Italian in perennial competition Virgilo
http://www.microsofttranslator.com/bv.aspx?ref=Internal&from&to=en&a=http%3A%2F%2Fit.wikipedia.org%2Fwiki%2FArianna_%28motore_di_ricerca%29
Thursday, August 4, 2011
My forecast: this will change the Ad display space forever
The ad analytics company is about to release a new service that will tell advertisers who’s seeing their ads--anywhere on the Internet. And Facebook is going to be the engine that powers it.
your opinion?
Wednesday, August 3, 2011
starting to be interested in Ads space
Good synthetic description of Generalized Second Price and motivation behind it.
Tuesday, August 2, 2011
Something to know: DAAT algorithm
DAAT is a nice IR algorithm, useful to know (page 100)
Monday, August 1, 2011
Computational Ads
I am more and more interested in Ads and this course from Andrei Broder is a pure golden mine.
Subscribe to:
Posts (Atom)