Sometime ago I asked to my friend Charisma why she was studying Italian and she said that she wanted to learn Spanish. I found the answer somehow confusing because the two languages are different. The point is that I missed to understand the main point ;-) You can actually learn and improve a language if you study different languages.
These days I am enjoying reading "Seven languages in seven weeks" a book which compares Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. You can learn things such as Concurrency based on Actors in Ruby, or mixing OO and Functional programming in Scala, higher order programming in Haskel, logic programming in Prolog, having a language based on prototype patterns and messages as in Io, currying and clojures and many other things. Definitively a recommended reading.
The more I read, the more I understand and improve my C++. So Charisma you are right, you can actually study Italian for improving Spanish. It took me a while to understand that.
Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Monday, October 31, 2011
Sunday, October 30, 2011
Can we use Facebook messages to predict how the stock market is going?
This would be an interesting Machine Learning problem on a massive dataset. Can we use Facebook's messages to predict how the stock and bond market will evolve? Here a couple of examples of the raw dataset.
Web server performance
A web server W is running a service S which provides information I when it receive queries Q. How can you guarantee a time < s for the 95 percentile?
Saturday, October 29, 2011
Design a system for browsing songs from your friends
Similar to what spotify does.
Friday, October 28, 2011
Two colors
Given a undirect graph G=(N, E) a two colour scheme assigns to each node either the colour black or white. A node is diverse if it has the opposite colour of at least half of its neighbours. How can you assign a diverse colour to all the nodes in a graph?
tricky one
tricky one
Thursday, October 27, 2011
Design a system for implementing Facebook News feed
suppose you have 10K servers.
Wednesday, October 26, 2011
Assign employees to bosses
You are given a list of employees and a list of potential bosses. Each employee expresses his ranked list of bosses' preferences. Each boss expresses is ranked list of employees' preferences. Match the two classes so that the maximum number of preferences is satisfied.
Tuesday, October 25, 2011
Got assigned a new patent: how to bootstrap a learning to rank system with real traffic information
Just got assigned a patent filed back in 2005, "Sampling internet user traffic to improve search results". The problem we were trying to address was about bootstrapping a learning to rank system in absence of user search information. This is a typical problem you have when you are not the incumbent search engine, and you don't have already accumulated usage information and user behaviour activities (see more information here Calculating Search Rankings with User Web Traffic Data). How can you compensate this bootstrap situation?
The key intuition was to use web traffic information collected by the way web proxies and observing user traffic and navigational information, including traffic performed querying other search engines. A similar traffic can be observed by minging a collection of web logs for the HTTP_Referer tag, The methodology was used for improving the freshness, the coverage, the ranking and the clustering of search engine results and, more generically, may include monitoring web traffic on remote web servers on the communications network
The key intuition was to use web traffic information collected by the way web proxies and observing user traffic and navigational information, including traffic performed querying other search engines. A similar traffic can be observed by minging a collection of web logs for the HTTP_Referer tag, The methodology was used for improving the freshness, the coverage, the ranking and the clustering of search engine results and, more generically, may include monitoring web traffic on remote web servers on the communications network
Monday, October 24, 2011
Sunday, October 23, 2011
Playing with addresses
Detect if a machine is Big Endian or little endian. Detect if the stack will grow up or down
Saturday, October 22, 2011
Friday, October 21, 2011
Find a way to detect whether two dna strings are similar
One way of doing this is to find the longest common subsequence, since we can allow to have to strings that are also having different genes interleaved. So suppose you have {ACGGAGGGAA} and {ACGAAGG} what is the longest common subsequence? {ACGAGG}
Solution: either direct DP or reducing this to Longest non decreasing subsequence problem.
Solution: either direct DP or reducing this to Longest non decreasing subsequence problem.
Thursday, October 20, 2011
Find the longest non decreasing subsequence
Solution: s_i = max ( max (s_j + 1), 1) for j < i and A[j] < A[i]. So take an already available longest non decreasing subsequence and if you find a subsequent A[i] > A[j] then sum 1. Remember that the minimum is 1
Wednesday, October 19, 2011
Sort a huge set of log files in C++ and STL
you are given a huge set of log files sorted by timestamp and you need to merge them. You have unlimited disk, but fixed amount of memory. In case of ties, break it by consider a progressive number for the log files
Solution:
good to recall how a min-heap is defined in STL, it's just an heap with a custom comparison. An heap is nothing more that a normal std::vector and you build it with std::make_heap, std::push_back, std::push_heap(), std::front(), std::pop_heap(). It's always a bit confusing for me to consider that you need to push_back in the vector and than push_heap, but that's not so strange considering that an heap is built upon the vector itself ;-)
Solution:
good to recall how a min-heap is defined in STL, it's just an heap with a custom comparison. An heap is nothing more that a normal std::vector and you build it with std::make_heap, std::push_back, std::push_heap(), std::front(), std::pop_heap(). It's always a bit confusing for me to consider that you need to push_back in the vector and than push_heap, but that's not so strange considering that an heap is built upon the vector itself ;-)
bool greater (const std::pair& a, const std::pair & b)
Tuesday, October 18, 2011
Compute the running median and the running average
You are given a very long stream and you need to compute the running median for each group of 1k numbers. Same question for the running median. What is the complexity?
Suppose you know a priori the number of elements in the stream.
Solution:
given a running average, you can update the group by adding a new element and removing the oldest one. This will give you a solution that is linear in the number of elements in the stream
Similar intuition is for the median, you can use a balanced BST where every node has the number of nodes contained in the subtree rooted at it.
Suppose you know a priori the number of elements in the stream.
Solution:
given a running average, you can update the group by adding a new element and removing the oldest one. This will give you a solution that is linear in the number of elements in the stream
Similar intuition is for the median, you can use a balanced BST where every node has the number of nodes contained in the subtree rooted at it.
Monday, October 17, 2011
A feature suggestion for facebook: Social calendar
Nothing is more social than our calendar. By definition you invite your colleagues and friends to meet and discuss. So why not integrating a calendar in Facebook, I bet that this would be the 3rd killer app after News and Images.
PS: I love Microsoft, but this idea is simple and needed so I wonder Should I create a startup? ;)
PS: I love Microsoft, but this idea is simple and needed so I wonder Should I create a startup? ;)
Sunday, October 16, 2011
Winner takes it all
Given a set of 128 players, suppose that x "beats" y has the transitive property. How many games do you need to find the winner? and how many games for finding the second one?
Saturday, October 15, 2011
Compute the integer square of a 32bit number
What is the most efficient algorithm?
Friday, October 14, 2011
given a vector v generate all the subsets of v with no repetition
useful for a quick test of your STL knowledge. Then, try the same with repetition
Thursday, October 13, 2011
Social graph: create the links
You are given a snapshot of the social graph G = (N, E, t) at time t. Each node has a set of attributes (a1, a2, ... an). Three problems:
1. Create an edge if two nodes have exactly the same set of attributes
2. Create an edge if two nodes have similar attributes
3. What if a set of weights it's known a-priori for the attributes?
1. Create an edge if two nodes have exactly the same set of attributes
2. Create an edge if two nodes have similar attributes
3. What if a set of weights it's known a-priori for the attributes?
Wednesday, October 12, 2011
Where are my Friends? Cool Ios5 app similar to this experiment.
Apple just released a cool app into the new Ios5 which is based on the location provided by your phone. This is somehow similar to some experiments I made with Facebook location and Bing Maps. In my case, your facebook friends are mapped on a Bing map according to their Facebook location. In any case, mine was just an experiment and nothing more
http://www.headtagger.com/WhereAreMyFriends/map.html
This is where my Facebook friends are located in the world
These are my friends in San Francisco area
And this is a view of a friend of mine in New York City
http://www.headtagger.com/WhereAreMyFriends/map.html
This is where my Facebook friends are located in the world
Here my friends in Europe
These are my friends in San Francisco area
And this is a view of a friend of mine in New York City
Here is the Apple's preview
Tuesday, October 11, 2011
Maximum span in an array
Given an array A of numbers, find a pair of numbers A[i] and A[j], such that A[i] < A[j] and j-i is maximized.
solution: There is a trivial O(N^2) solution and a tricky O(N) based on a very simple intuition, where is the minimum?
Monday, October 10, 2011
Computer min & max with divide & conquer strategy
trivial !
Sunday, October 9, 2011
Saturday, October 8, 2011
Find the missing number
Given a db with about 300Millions social numbers where each number is 9 digits find a number that is missing. You have 2MB of Ram and unlimited disk
Solution
300Millions = 300 * 10^6 = 3 * 10^8
Suppose we take 10 counters and "hash" each number considering only the highest order digit, incrementing the counter if a given number has that starting digit. One missing number would be located by the counter with the lowest value. That is the intuition, we can now use a bitmap which fits 2MB of Ram and apply the same principle.
Solution
300Millions = 300 * 10^6 = 3 * 10^8
Suppose we take 10 counters and "hash" each number considering only the highest order digit, incrementing the counter if a given number has that starting digit. One missing number would be located by the counter with the lowest value. That is the intuition, we can now use a bitmap which fits 2MB of Ram and apply the same principle.
Friday, October 7, 2011
Thursday, October 6, 2011
Write a function for computing next permutation of a string
This is a little pearl piece of code
void printVector(int * value, int N) { for (int i = 0; i < N; i++) std::cout << value[i] << " " ; std::cout << std::endl; } void visit(int * value, int N, int K) { static int level = -1; level++; value[K] = level; // the level is current i if (level == N) // a complete permutation printVector(value, N); else for (int i = 0; i < N; i++) // 0... N-1 if (value[i] == 0) // not used before visit(value, N, i); level--; value[K] = 0; // nothing else to explore for the level, ready for next round of usage }
The above is cooler than the classical recursive solution
A permutation of 1,2,3,4 would be:
1 + permutation_of(2,3,4)
2 + permutation_of(1,3,4)
3 + permutation_of(2,1,4)
4 + permutation_of(2,3,1)
Wednesday, October 5, 2011
Intervals
Given a list of non overlapping intervals, design a data structure for inserting, finding and deleting them (e.g. [10, 30), [35, 50), [70, 100) ...)
Tuesday, October 4, 2011
Feature suggestion: Do we need a new search engine? Certainly not, but Facebook may offer a new definition of search
Bing, Yahoo and Google are already offering a complete search experience and there is probably no need of having a new search engine. Certainly, we already have a fast growing set of new engines for emerging markets such as Yandex in Russia and Baidu in China, but that's because they offer specific expertise in those markets (even if I would be not surprised of seeing them coming more and more in western countries).
Anyway, there is something new and Facebook has a unique position for offering a new set of search services, which could be integrated within the Facebook's popular notification system. Let's start from some examples:
I search for new software development positions in london. Facebook has all the public postings made by their social users. The data is there but the ranking is very poor. Plus this type of search is buried and you need to click 3 different links for accessing it.
I am looking for a musical in London, first result is totally irrelevant even if they have the location of the people posting, a full access to their internal social graph, a full access to my position in the graph, the number of like within facebook and in any external web site embedding the like button, many forms of temporal information, the links you shared, the title and the caption you changed, user importance, how many time I've interacted with my friends, any picture I have changed from automatic crawling, and a lot more (wow, not so bad). Many of those information are not necessarily available to other search engines
Anyway, there is something new and Facebook has a unique position for offering a new set of search services, which could be integrated within the Facebook's popular notification system. Let's start from some examples:
I search for new software development positions in london. Facebook has all the public postings made by their social users. The data is there but the ranking is very poor. Plus this type of search is buried and you need to click 3 different links for accessing it.
I am looking for a musical in London, first result is totally irrelevant even if they have the location of the people posting, a full access to their internal social graph, a full access to my position in the graph, the number of like within facebook and in any external web site embedding the like button, many forms of temporal information, the links you shared, the title and the caption you changed, user importance, how many time I've interacted with my friends, any picture I have changed from automatic crawling, and a lot more (wow, not so bad). Many of those information are not necessarily available to other search engines
Let's see other examples
Let's suppose we want to listen some music
Or get some news
I assume that you understood that the data is there, but a quality ranking is not. You may say that this is nothing new because we already had some interest in the past for real time search, and Bing is already including social likes information.
True. Anyway, Facebook as already a quite popular Notification system -- you know that white number on a red background -- which is updated anytime something new is happening in your social graph. So why not allowing myself to register for a new (and relevant please!) update for a query term that I've searched before? That would be a useful feature to have and a new definition of search where results are pushed to me instead of searching them again and again. Yes, I know I am lazy and I like when software is making something on my behalf because I save time for something else.
Monday, October 3, 2011
Given two vectors of integer, merge them in place
Given two vectors of integer, merge them in place (e.g. relocate the second vector to the sum of the vectors and merge the vectors in place).
Here is the code
Here is the code
Sunday, October 2, 2011
Given a large collection of social numbers (e.g. 360Millions)
how to efficiently find a single missing number. You have a very large disk available for storing the number but just 2Mb of Ram
Saturday, October 1, 2011
LCS of very long sequences
Compression is always looking for repetition. Suppose you have two very long files stored in disk. how will you find the LCS with a limited amount of RAM?
Subscribe to:
Posts (Atom)