Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Sunday, May 30, 2010
Find the longest increasing subsequence of a given sequence
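One standard way to solve this is the O(n log n) "smallest tail" method. The post does not say which method the author had in mind, so treat this as a minimal sketch. For each length L it keeps the smallest value that can end an increasing run of length L+1, and uses binary search (`bisect_left`) to place each new element. Parent pointers then recover one actual subsequence. The function name is hypothetical, and "increasing" is assumed to mean strictly increasing.

```python
from bisect import bisect_left

def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence in O(n log n) time."""
    tails = []        # tails[L] = index of the smallest tail of a run of length L+1
    tail_values = []  # seq[tails[L]], kept sorted so we can binary search it
    parent = [-1] * len(seq)  # predecessor index of each element in its best run
    for i, x in enumerate(seq):
        L = bisect_left(tail_values, x)  # first tail >= x (strictly increasing)
        if L > 0:
            parent[i] = tails[L - 1]
        if L == len(tails):
            tails.append(i)
            tail_values.append(x)
        else:
            tails[L] = i
            tail_values[L] = x
    # Walk the parent pointers back from the tail of the longest run.
    result = []
    i = tails[-1] if tails else -1
    while i != -1:
        result.append(seq[i])
        i = parent[i]
    return result[::-1]
```

For example, `longest_increasing_subsequence([3, 10, 2, 1, 20])` returns `[3, 10, 20]`. Replacing `bisect_left` with `bisect_right` would give the longest non-decreasing subsequence instead.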
Saturday, May 29, 2010
How to find the largest palindrome
Find the largest palindrome in a string of length n.
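A minimal sketch, assuming "largest palindrome" means the longest palindromic *substring*. It expands around each of the 2n-1 possible centers (every character, and every gap between two characters). That gives O(n^2) time with O(1) extra space; Manacher's algorithm brings this down to O(n), but it is longer to write. The function name is hypothetical.

```python
def longest_palindrome(s):
    """Longest palindromic substring by expanding around every center."""
    best_start, best_len = 0, 0
    for center in range(2 * len(s) - 1):
        # Even centers sit on a character, odd centers sit between two characters.
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        # The loop overshoots by one position on each side.
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]
```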
Friday, May 28, 2010
Apple beats Microsoft
Apple, the maker of Mac computers, the iPhone and the iPad tablet, has overtaken Microsoft as the world's most valuable technology company by stock market value.
Thursday, May 27, 2010
Wednesday, May 26, 2010
Business Model for Facebook
Facebook is one of the most interesting companies out there. They went from a few million users to five hundred million in a couple of years. My forecast is that they will pass one billion by Q2 2012.
In addition, running the business is not as expensive as running search. They need to store user profiles and a lot of images. Video is outsourced to YouTube, and real-time updates are not so expensive (how much history do they maintain?).
Anyway, running a business requires making money. So here is my question: what business model would you suggest for Facebook to make money?
Content ads never worked for search. Could they work for FB? Or what else?
Tuesday, May 25, 2010
Count numbers
How do you count the number of ways a number can be expressed as a sum of 2 or more positive numbers?
For example, if the number is 5, count = 6: 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1.
Note that 2+3 is the same as 3+2.
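Since order does not matter, this is the number of integer partitions of n, minus the single-part partition "n" itself. The sketch below uses the classic coin-change style dynamic program, assuming parts are positive integers. It runs in O(n^2) time and O(n) space. The function name is hypothetical.

```python
def count_sums(n):
    """Number of ways to write n as an unordered sum of 2 or more positive integers."""
    # ways[s] = number of partitions of s using only the parts processed so far
    ways = [0] * (n + 1)
    ways[0] = 1
    for part in range(1, n + 1):
        # Iterating s upwards lets each part be reused any number of times.
        for s in range(part, n + 1):
            ways[s] += ways[s - part]
    # ways[n] also counts the one-term "sum" n itself, which the question excludes.
    return ways[n] - 1
```

Because the outer loop fixes the order in which part sizes are introduced, 2+3 and 3+2 are counted once, as the question requires.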
Monday, May 24, 2010
Detecting the dominant (>50%) symbol in a stream.
You are given a stream that you cannot hold in memory. At each instant you want to determine the dominant symbol observed so far in the stream (i.e., the symbol appearing more than 50% of the time).
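One well-known technique is the Boyer-Moore majority vote. The sketch below applies it in streaming form with O(1) memory; the class and method names are hypothetical. The caveat matters: if a symbol with more than 50% of the occurrences exists, the returned candidate is guaranteed to be that symbol. If no such symbol exists, the candidate is arbitrary. Verifying it would need a second pass, which a pure stream does not allow.

```python
class MajorityTracker:
    """Boyer-Moore majority vote over a stream using O(1) memory."""

    def __init__(self):
        self.candidate = None
        self.count = 0

    def add(self, symbol):
        """Consume one symbol and return the current dominant-symbol candidate."""
        if self.count == 0:
            self.candidate = symbol
            self.count = 1
        elif symbol == self.candidate:
            self.count += 1
        else:
            # Pair this symbol off against one occurrence of the candidate.
            self.count -= 1
        return self.candidate
```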
Sunday, May 23, 2010
Count in a range in O(1)
Given n integers in the range 0 to k, answer any query about how many of the n integers fall into a range [a, b] in O(1) time. Make your own assumptions and build your own indexing data structures.
PS: I used to ask this question during my interviews. Not anymore ;-)
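A minimal sketch of one possible answer. It assumes k is small enough to afford an O(k) array. A counting-sort style pass builds prefix counts in O(n + k), and every query is then a single subtraction. The class name and the clamping of out-of-range queries are illustrative choices, not part of the original question.

```python
class RangeCounter:
    """Counts how many of the indexed values fall in [a, b] in O(1) per query."""

    def __init__(self, values, k):
        counts = [0] * (k + 1)
        for v in values:
            counts[v] += 1
        # prefix[i] = number of values strictly smaller than i
        self.prefix = [0] * (k + 2)
        for i in range(k + 1):
            self.prefix[i + 1] = self.prefix[i] + counts[i]
        self.k = k

    def count(self, a, b):
        # Clamp the query to the indexed domain [0, k].
        a, b = max(a, 0), min(b, self.k)
        if a > b:
            return 0
        return self.prefix[b + 1] - self.prefix[a]
```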
Saturday, May 22, 2010
Sort a dictionary of variable-length words
Assume you have a dictionary of k words. They have variable lengths, but the total number of symbols, counted by juxtaposing all the words, is n. Give an optimal sorting algorithm.
PS: this is a tricky question, with practical implications in information retrieval.
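One textbook answer is an LSD radix sort adapted to variable-length strings, which runs in O(n + σ) time for an alphabet of σ symbols. The post does not say this is the intended solution, so take it as a sketch under stated assumptions.

- Words are processed from the last position to the first.
- Words of length exactly i+1 join the pass for position i at the front, because their empty suffix sorts first.
- Each pass only visits the symbols that actually occur at that position, so short words are never rescanned.

Assumptions: words are Python `bytes` (so `w[i]` is an int below `sigma = 256`), and the function name is hypothetical.

```python
def sort_words(words, sigma=256):
    """Lexicographically sort variable-length byte strings in O(n + sigma)."""
    if not words:
        return []
    max_len = max(len(w) for w in words)
    # by_len[l] = words whose length is exactly l
    by_len = [[] for _ in range(max_len + 1)]
    for w in words:
        by_len[len(w)].append(w)
    # symbols_at[i] = sorted distinct symbols occurring at position i,
    # built with one bucket pass over all (position, symbol) pairs.
    by_symbol = [[] for _ in range(sigma)]
    for w in words:
        for i, c in enumerate(w):
            by_symbol[c].append(i)
    symbols_at = [[] for _ in range(max_len)]
    for c in range(sigma):
        for i in by_symbol[c]:
            if not symbols_at[i] or symbols_at[i][-1] != c:
                symbols_at[i].append(c)
    # ordered = words of length > i, sorted by their suffix starting at i + 1
    ordered = []
    for i in range(max_len - 1, -1, -1):
        # Words of length i + 1 have an empty suffix after position i: smallest.
        ordered = by_len[i + 1] + ordered
        buckets = {}
        for w in ordered:  # stable bucket pass on the symbol at position i
            buckets.setdefault(w[i], []).append(w)
        ordered = [w for c in symbols_at[i] for w in buckets[c]]
    # The empty word (if present) precedes everything else.
    return by_len[0] + ordered
```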
Friday, May 21, 2010
The perfect interview: Make your own assumptions
I am conducting a lot of interviews these days, so I wonder whether there is a way to run the "perfect" interview. Probably not, I would say. How can I understand how good a candidate is in just one hour? Some people are good communicators, others are shy. Some people are good team workers, others are good individual contributors. Some people are good with numbers and symbolic computation, others are good at describing methodologies. Some people may be having a bad day, others may prefer to go elsewhere. And so on... So you want me to understand all these factors in just one hour? I say: no way. Give me a couple of weeks to start with.
Anyway, interviews are important and you need to find a solution. So I follow three golden rules:
1) Many judgments are better than one. The candidate should be evaluated by many independent interviewers in a loop. It is better if the interviewers express no judgment until the loop is closed, to avoid influencing each other;
2) I always ask myself: "Can I work with this candidate? Would (s)he help me solve the problems we face day by day?"
My interviews revolve around problem solving (you read my blog, so you know this), a lot of algorithmic questions ;-), a lot of C++ coding and design patterns. In addition, machine learning, retrieval, and data mining are my areas of expertise, so do expect some questions there. I am not very impressed if you know all the recent academic papers or books. I am much more interested in your intuitions. In fact, my third question is the most important one:
3) "How creative is this candidate? How much can we learn from him or her in the future?"
The most interesting part of the interview is when we discuss hard problems applied to real life and to very large datasets (up to petabytes of data). I describe the problem in one or two sentences and then tell the candidate:
Make your own assumptions.
Thursday, May 20, 2010
Increase the Page Views, Yahoo
Interesting move from Yahoo
Wednesday, May 19, 2010
Suffix Tree with Unicode Support
Interesting piece of code, by way of Zhang.
Tuesday, May 18, 2010
Sort again
We have an N-element array with k distinct keys. Sort this array without using any extra memory.
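A minimal sketch of one way to exploit the small number of distinct keys, reading "no extra memory" as O(1) auxiliary space. Each round does two passes over the unsorted suffix:

1. Find the smallest key still in the suffix.
2. Swap every occurrence of that key to the front of the suffix.

There are exactly k rounds, so the total is O(N*k) time. When k is large, an in-place O(N log N) algorithm such as heapsort is the better choice. The function name is hypothetical.

```python
def sort_k_distinct(a):
    """Sort list a in place using O(1) extra space and O(N*k) time."""
    start, n = 0, len(a)
    while start < n:
        # Pass 1: smallest key remaining in the unsorted suffix a[start:].
        smallest = a[start]
        for i in range(start + 1, n):
            if a[i] < smallest:
                smallest = a[i]
        # Pass 2: move every occurrence of that key to the front of the suffix.
        # Invariant: a[start:i] holds no copy of `smallest`.
        for i in range(start, n):
            if a[i] == smallest:
                a[start], a[i] = a[i], a[start]
                start += 1
    return a
```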
Monday, May 17, 2010
BST
Given a BST of n nodes, find two nodes whose sum is equal to a number k in O(n) time and constant space.
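The usual two-pointer approach walks the keys in order from both ends. Doing that with stacks costs O(h) space, so this sketch reaches truly constant extra space a different way, at the price of **destroying the tree shape**.

1. Right rotations (the "tree to vine" phase of the Day-Stout-Warren rebalancing algorithm) flatten the BST into a sorted list linked through `.right`, in O(n) time.
2. The now-unused `.left` pointers become back pointers.
3. Two pointers then move inward from the smallest and largest keys.

If the tree must survive, a second DSW phase could rebuild a balanced tree afterwards (not shown). The `Node` class and function names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_to_vine(root):
    """Flatten a BST into an in-order list linked by .right, O(n) time, O(1) space."""
    pseudo = Node(None, right=root)
    tail, rest = pseudo, root
    while rest is not None:
        if rest.left is None:
            tail, rest = rest, rest.right
        else:
            # Right-rotate around `rest` to pull its left child up.
            temp = rest.left
            rest.left = temp.right
            temp.right = rest
            rest = temp
            tail.right = temp
    return pseudo.right

def find_pair_with_sum(root, k):
    """Return keys (x, y) of two distinct nodes with x + y == k, or None."""
    head = tree_to_vine(root)
    # After flattening every .left is None, so reuse it as a "previous" pointer.
    prev, node = None, head
    while node is not None:
        node.left = prev
        prev, node = node, node.right
    lo, hi = head, prev  # smallest and largest keys
    while lo is not None and lo is not hi:
        total = lo.key + hi.key
        if total == k:
            return lo.key, hi.key
        if total < k:
            lo = lo.right
        else:
            hi = hi.left
    return None
```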
Sunday, May 16, 2010
Google and the WI-FI Mapping
Interesting post about Google Wi-Fi mapping, by way of Alessio. He suggested that this is due to the need to geolocate mobile users. I hope they do not want to put it to this alternative use found on YouTube.
Saturday, May 15, 2010
YourOpenBook
A new UX on top of the Facebook OpenGraph search API: http://youropenbook.org/
Friday, May 14, 2010
Why is Facebook's "Like" button a real game changer?
These are some points I discussed with a friend of mine over a good coffee.
Facebook "Like" is a real game changer for two different reasons:
1) FB enlarged the base of its data sources. Every time a user pushes the "Like" button on a partner site, they will know.
2) FB enlarged the base of its data sources. Every time a user loads a page on a partner site that includes the "Like" button, they will know, even if the user does not push the button.
We both agreed that 2) is the most important information, because it tells you a lot about real-time traffic.
Thursday, May 13, 2010
Partition a set (a bit harder)
Partition a set of numbers into two sets such that the difference between their sums is minimum and they have an equal number of elements.
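A minimal sketch of one pseudo-polynomial approach, assuming integer inputs. For an odd count, the sketch assumes the two sides differ in size by one. A DP tracks, for every possible count c of chosen elements, the set of sums reachable by picking exactly c of them. Iterating c downwards ensures each element is used at most once. The best answer is the reachable sum s for n//2 elements that minimizes |total - 2s|. Cost is roughly O(n^2 * S), where S is the number of distinct sums, so this only suits moderate value ranges. The function name is hypothetical.

```python
def min_equal_size_partition(nums):
    """Minimum |sum(A) - sum(B)| over splits with |A| = len(nums) // 2."""
    half = len(nums) // 2
    total = sum(nums)
    # reachable[c] = sums obtainable by choosing exactly c elements
    reachable = [set() for _ in range(half + 1)]
    reachable[0].add(0)
    for x in nums:
        # Descending c so that x is added to subsets that do not already contain it.
        for c in range(half, 0, -1):
            reachable[c] |= {s + x for s in reachable[c - 1]}
    return min(abs(total - 2 * s) for s in reachable[half])
```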
Wednesday, May 12, 2010
Tuesday, May 11, 2010
Common substrings
Find the longest common subsequence of N given strings, each having length between 0 and M.
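For two strings this is the classic O(M^2) dynamic program. For arbitrary N the problem is NP-hard, and the direct generalization is a DP over tuples of positions, one index per string. That costs O(N * (M+1)^N), so the sketch below only suits small N. It uses memoized recursion with the standard rule:

- If all current characters match, take that character.
- Otherwise, drop the current character of one string and keep the best result.

The function name is hypothetical, and deep recursion limits it to short strings.

```python
from functools import lru_cache

def lcs_many(strings):
    """Longest common subsequence of N strings; exponential in N."""
    n = len(strings)
    if n == 0:
        return ""

    @lru_cache(maxsize=None)
    def best(idx):
        # idx[j] = current position in strings[j]
        if any(i == len(s) for i, s in zip(idx, strings)):
            return ""
        first = strings[0][idx[0]]
        if all(s[i] == first for i, s in zip(idx, strings)):
            # All current characters agree: taking it is always safe.
            return first + best(tuple(i + 1 for i in idx))
        # Otherwise some string's current character is unused; try skipping each.
        options = (best(idx[:j] + (idx[j] + 1,) + idx[j + 1:]) for j in range(n))
        return max(options, key=len)

    return best((0,) * n)
```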
Monday, May 10, 2010
Evolution of Search
For a long time I thought that search was a mature market, with Google and Microsoft as the only two players left in the fight.
Well, I was wrong. Facebook has a lot of data to search, and they are the only ones who can mine it. Try searching for the volcano situation. Strangely enough, they are not giving much emphasis to this feature. So far...
Sunday, May 9, 2010
Data analysis is the language of this age
Metrics, data, numbers. Every theory must start with a measure.
Saturday, May 8, 2010
Friday, May 7, 2010
Minimum in two lists
You are given two sorted lists of sizes m and n. Give an O(log m + log n) time algorithm for computing the k-th smallest element in the union of the two lists.
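A minimal sketch of one standard approach: binary search on how many of the k smallest elements come from the shorter list. A split taking i elements from `a` and j = k - i from `b` is correct exactly when both of these hold:

- `a[i-1] <= b[j]`
- `b[j-1] <= a[i]`

The k-th smallest is then the larger of the two left neighbours. This takes O(log min(m, n)) time, which is within the requested O(log m + log n). Here k is assumed to be 1-based, inputs are assumed to be Python lists, and the function name is hypothetical.

```python
def kth_smallest(a, b, k):
    """k-th smallest (1-based) element of the union of sorted lists a and b."""
    if len(a) > len(b):
        a, b = b, a  # binary search over the shorter list
    m, n = len(a), len(b)
    if not 1 <= k <= m + n:
        raise ValueError("k out of range")
    lo, hi = max(0, k - n), min(k, m)  # feasible counts taken from a
    while lo <= hi:
        i = (lo + hi) // 2  # take i elements from a ...
        j = k - i           # ... and the remaining j from b
        a_left = a[i - 1] if i > 0 else float("-inf")
        a_right = a[i] if i < m else float("inf")
        b_left = b[j - 1] if j > 0 else float("-inf")
        b_right = b[j] if j < n else float("inf")
        if a_left > b_right:
            hi = i - 1      # took too many from a
        elif b_left > a_right:
            lo = i + 1      # took too few from a
        else:
            return max(a_left, b_left)
    raise AssertionError("unreachable when both inputs are sorted")
```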
Thursday, May 6, 2010
Optimal merge and operator AND in search
Given k sorted lists, merge them in optimal time. Assume that the total number of elements is n. Why is this useful for implementing the AND operator in a search engine?
(This is one of those questions that explain why basic algorithmic knowledge is fundamental.)
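A minimal sketch of the standard heap-based k-way merge. A min-heap holds the current head of each list, so every one of the n output elements costs O(log k), for O(n log k) total. One hedged way to connect this to AND: if each posting list is a sorted list of document IDs without duplicates, a document matches all k terms exactly when its ID appears k times in a row in the merged stream. Real engines typically use smarter intersections (e.g. skipping), which this does not show. The function names are hypothetical.

```python
import heapq

def merge_k(lists):
    """Merge k sorted lists with a min-heap in O(n log k) time."""
    # Heap entries: (value, list index, position within that list).
    heap = [(lst[0], idx, 0) for idx, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(lists[idx]):
            heapq.heappush(heap, (lists[idx][pos + 1], idx, pos + 1))
    return out

def and_query(posting_lists):
    """Doc IDs present in every posting list (sorted, duplicate-free lists)."""
    k = len(posting_lists)
    result = []
    run_value, run_len = None, 0
    for doc in merge_k(posting_lists):
        if doc == run_value:
            run_len += 1
        else:
            run_value, run_len = doc, 1
        if run_len == k:  # seen once in each of the k lists
            result.append(doc)
    return result
```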
Wednesday, May 5, 2010
What direction is the stack growing?
You are working on a machine/compiler and you want to determine whether the stack grows towards higher or lower addresses. What strategy would you use?
Tuesday, May 4, 2010
Bartz in London
I must confess that I like her very much: "I don't need everybody to think I am an asshole. You think it's so much fun answering your questions? If I didn't think there was a good bottle of white wine at the end of it – I probably wouldn't do"
Monday, May 3, 2010
Facebook Searches Double – Words per Search to 3.5
The number of searches conducted on Facebook doubled in the last year, to 650 million searches. The average number of words per search has reached 3.5.
Sunday, May 2, 2010
Google acquired a 3D desktop company
I wrote about my wish to invest in a 3D desktop company. Google acquired one such company, but its product is not true 3D: it simulates 3D in a 2D space.
Saturday, May 1, 2010
OneRiot is indexing public Facebook data
Now, of course, we’re only showing (indeed, only have access to) data that has been shared publicly by Facebook users. A user can restrict the visibility of these Likes on their Facebook profile. However, we’d be sidestepping the issue if we didn’t recognize that some users might be concerned that stuff they have shared on Facebook can now pop up on services like ours. Given that, we are rolling out this feature as a very limited bucket test today to assess users’ reactions and gather feedback. We love the new feature. And if users do too then we’ll roll it out to everyone at an appropriate speed.