Saturday, July 31, 2010

Most user viewed pages.

Design an algorithm to calculate the most user viewed pages.

Friday, July 30, 2010

Balancing learning to rank efficacy and efficiency

Search engines are used to apply dynamic ranking methods on the flight. Therefore efficiency of learning to rank methodologies is as important as efficacy (accuracy). Anyway, very little attention has been paid so far to efficacy.

Learning to efficiently rank introduces a principled and uni ed framework for learning ranking functions that jointly optimize both retrieval e ectiveness and efficiency. The key intuition is that those features having a weight under a threshold can be discarded, after the learning phase. An optimal parameter threshold is learned by direct optimization by using a non-analytical optimization procedure and performing a linear search in the optimization space. Each function evaluation requires measuring both efficiency and eff ectiveness of the current parameter setting . This can be costly for large training sets and large feature sets.

Gossip Engine: main features

Here a collection of features available in the Gossip engine online.

  • Temporal order, just a browsing interface where the stories are sorted by time
  • Cluster order, where the stories are grouped together and sorted by importance
  • Day order, where you can navigate the Top Stories of past days
  • Search where you can search by keywords

Off-topic and really amazing: controlling the enviroment with your mind

Thursday, July 29, 2010

Facebook Q&A and the power of the community as a ranking engine

There are mixed opinions about the debut of Facebook in the Q&A arena. Some people consider them useless.

I believe that here we are observing a complete new shift paradigm for search: a massive amout of users can be ranked according to their contribution for a specific activity. In this case, Q&A productions.

Traditional search engine (Bing and Google) can rank objects such as web pages, but they cannot rank the authors of such objects because they do not know them directly. Instead, Facebook can rank 500m of users whenever they produce an answer by taking into account any single action that each user peformed in Facebook before producing that answer (status updates, searches, link posted, etc).

Communities are the ranking engine with all the rich information produced by them (not just the anynimous links produced). Wow, what a change!

Facebook going Q&A

Today we're introducing Facebook Questions, a beta product that lets you pose questions like these to the Facebook community. With this new application, you can get a broader set of answers and learn valuable information from people knowledgeable on a range of topics.

Since we like to develop products carefully over time with your help, Facebook Questions is available to a limited number of people right now, and we'll be developing it rapidly based on their feedback. We're aiming to bring this product to all of you as quickly as we can.

Wednesday, July 28, 2010

Set intersection in an efficient way?

If you are given a set of 1000 integers in a set A , and 10,000,000 integers in a set B, how would you create a set C that would only contain numbers that are both in A and B?

Tuesday, July 27, 2010

Hats

N people team up and decide on a strategy for playing this game. Then they walk into a room. On entry to the room, each person is given a hat on which one of the first N natural numbers is written. There may be duplicate hat numbers. For example, for N=3, the 3 team members may get hats labeled 2, 1, 2. Each person can see the numbers written on the others' hats, but does not know the number written on his own hat. Every person then simultaneously guesses the number of his own hat. What strategy can the team follow to make sure that at least one person on the team guesses his hat number correctly?

Monday, July 26, 2010

Gossip Engine

I spent sometime to create a new hack engine based on Gossip. Just for my fun, simple and essential. I will discuss about it in the next days.

Sunday, July 25, 2010

Tree serialization

Design an algorithm and write code to serialize a binary tree.

Friday, July 23, 2010

Merge inplace

Given an array a1a2a3a4......anb1b2b3b4.......bnc1c2c3c4........cn
can you merge inplace and obtain
a1b1c1a2b2c2a3b3c3............................anbncn

Basic interview question

Given an array of n elements and an integer k where k≤n. Elemnts a[0].....a[k] and a[k+1].....a[n] are already sorted. Give an algorithm to sort in O(n) time and O(1) space

Thursday, July 22, 2010

Basic interview question

Do not come to me if you don't know how to answer it!!

how to search a number in a 2d matrix which is sorted both row wise and column wise

Wednesday, July 21, 2010

Facebook numbers

A few of the big numbers we deal with:
* 500 million active users
* 100 billion hits per day
* 50 billion photos
* 2 trillion objects cached, with hundreds of millions of requests per second
* 130TB of logs every day

Over the years we've written on this page about a number of the technical solutions we've used to handle these numbers. Today, I'd like to step back and talk about some of the general ways we think about scaling, and some of the principles we use to tackle scaling problems. Like Facebook itself, these principles involve both technology and people. In fact, only a couple of the principles below are entirely technical. At the end of the day it's people who build these systems and run them, and our best tools for scaling them are engineering and operations teams that can handle anything. The scaling statistic I'm most proud of is that we have over 1 million users per engineer, and this number has been steadily increasing.

Tuesday, July 20, 2010

Google image

Personally, I find the text under the image informative. I would love to see more image clustering instead.

"In terms of the consumer announcement, Google is rolling out a new UI and several new features. They include a new, more attractive “tiled” layout, infinite scroll (mimicking a similar feature on Bing), larger thumbnail images and a hover preview that offers more information about the image.

Gone is the text and related information beneath the images, as well as the need to click “next.” Google said that the new scrolling pages can hold up to 1,000 images on them. The company also announced that it sees more than a billion page views per day on Google images."

Sunday, July 18, 2010

Find first duplicate in O(k) for an array of N integers with k keys

In O(k) time and O(k) space is trivial, how about O(k) time and O(1) space?

Friday, July 16, 2010

An overview of some cool applications in Google

Realtime, Music, Mobile, Voice, Location, Picture similarity, Google Square, Answers all by the way of Marissa Mayer. "Search is just 10% sold, the rest is to be discovered". Agree 100% with Google here.

Thursday, July 15, 2010

Wednesday, July 14, 2010

Min difference in an array of integers

Given an array of integers A, find the two numbers a, b in A such that a-b is minimal.

Tuesday, July 13, 2010

Find three numbers in an array

Given an array of integers A, find the largest c such that c=a+b. a, b, c are in A.

Monday, July 12, 2010

A workload to be partitioned

you are given an amount of work load k, which can be partitioned among m workers. Generate all the partitions.

Sunday, July 11, 2010

Facebook data analytics

Get an idea, test with an A/B experiment and launch it. This on the top of Hadoop. See also the interesting numbers about number of users and processing clusters (2.5K nodes, 20K CPUs, 36Pb of data) with map/reduce and Hive (sql-like for map/reduce)



Parentesys, Dyck words and strings

A string S can contain just two symbols X and Y. The number of X symbols in S must be equal to the number of Y symbols. For each prefix of S, the number of X must be less than the number of Y.

What is the intepretation of the problem?

Wednesday, July 7, 2010

Monday, July 5, 2010

Beauty of palidromic search

Longest palidrom in a string in linear time. This posting is beauty and instructive.

Saturday, July 3, 2010

Comparison in external memory

Given a set of 1 Trillion integers on hard disk, find the smallest 1 million among them. You can fit at most 1 million integers in memory at a time.