Friday, December 31, 2010

Divide in two equal parts

You are given an array of positive integers, and you have to find a split point such that the two parts have equal sum

Thursday, December 30, 2010

Longest increasing sub-sequence

This is a well know algorithm, but it is useful to know for interviews

Wednesday, December 29, 2010

Subsequences

Given a sequence of integers, find a continuous subsequence which maximizes the sum of its elements

Tuesday, December 28, 2010

DP bracket problem

Given an increasing sequence of integers s1, ...., sk how many balanced parenthesis strings can you generate?

Monday, December 27, 2010

Count the number of inversions in an array

Given an array A unsorted, count the number of inversions.

Friday, December 24, 2010

Nesting envelopes

Given k different types of envelopes. The dimensions of envelope type i are xi × yi. You can place envelope A inside envelope B if and only if the dimensions A are
strictly smaller than the dimensions of B. how many possible combination you can have?

Thursday, December 23, 2010

Wednesday, December 22, 2010

Maximize the number of items you collect

This is a classical DP problem. A table composed of N*M cells with certain items non zero. Start from (0, 0) and you can go down or right one cell. Design an algorithm to find the maximum number of non zero items you can collect, if you are moving from upper-left corner to bottom-right corner.

What is the complexity and the space needed?

Tuesday, December 21, 2010

Launched News in UK, and FR


Another year, another news product. Optimized local rankers, classifiers and clustering. Oh yes, clustering my dear.

Monday, December 20, 2010

Are you the average of your friend actions?

A lot of people are enthusiastic about the benefits of social signals for selecting and ranking information. My question for you is: do you believe that you are well represented by the average actions of your most active social friends? or do you also have differences and a personal identity?

Sunday, December 19, 2010

Related Searches shipped in 8 countries


What an amazing ride. We shipped 8 countries in less than one year with localized filtering, aggregation, and ranking. Italy is part of the list (yeah!!!)


More pictures here

Saturday, December 18, 2010

Maximize and expression with no parenthesis

Given an expression E of n numbers with *, + operations and no precedence among the operators, identify the set of parenthesis which maximize the results for E.

Friday, December 17, 2010

Autosuggest shipped in 5 countries


What an amazing ride. We shipped 5 countries in less than one year with localized filtering, aggregation, and ranking.

More pictures here

Thursday, December 16, 2010

Encode and Decode a document

A 'Document' is defined as a collection of 'Fields'. Some fields can be optional. A 'Field' can be a first-order type (int, char, float), or an array of first order types.

Write the code to serialize and de-serialize a collection of 'Documents'

Wednesday, December 15, 2010

Hive, SQL like, Facebook and the map join optimization

Hive is the SQL-like processor on the top of Hadoop, the open source Map-Reduce implementation. Facebook recently proposed an interesting optimization for the sort and merge phase adopted by the Reducer: "The motivation of map join is to save the shuffle and reduce stages and do the join work only in the map stage. By doing so, when one of the join tables is small enough to fit into the memory, all the mappers can hold the data in memory and do the join work there. So all the join operations can be finished in the map stage. However there are some scaling problems with this type of map join. When thousands of mappers read the small join table from the Hadoop Distributed File System (HDFS) into memory at the same time, the join table easily becomes the performance bottleneck, causing the mappers to time out during the read operation".

This is the optimization: "Hive-1641 solves this scaling problem. The basic idea of optimization is to create a new MapReduce local task just before the original join MapReduce task. This new task reads the small table data from HDFS to an in-memory hash table. After reading, it serializes the in-memory hash table into a hashtable file. In the next stage, when the MapReduce task is launching, it uploads this hashtable file to the Hadoop distributed cache, which populates these files to each mapper's local disk. So all the mappers can load this persistent hashtable file back into memory and do the join work as before."

So they are doing some pre-computation before the join operation itself for computing an hash table which is then pre-loaded in the memory of the local mapper. The performance gains are amazing since optimized map join is 12 to 26 times faster than the previous one.

I wonder how a fast compression scheme could help here. Paolo?

Tuesday, December 14, 2010

K-th sum for two sorted arrays

You are given 2 arrays A, B sorted in decreasing order of size m and n respectively and a number k, such that 1 <= k <= m*n. You need to fine the kth largest sum(a+b) possible where a \in A, and b \in B.

Hint: i like this problem, a possible solution is by induction and the complexity is O(n+m).

Monday, December 13, 2010

Interesting google talk

Available here, with more information about current G's index compression, spanner project, and many other features.

Sunday, December 12, 2010

Out of band: Red power led on dg834g

My router dg834g decided to die on Sunday. It died in a strage way. When turned on the power led remains red. Then, the wireless router starts the power on-cycle but it goes in an infinite reboot cycle. Anyway, sometime it simply starts correctly.

It turns out that the problem was with power supply unit. I just replaced it with an equivalent one.

Saturday, December 11, 2010

Out of band: vnc connection to server behind a dg834g nat

I wanted to use my dg834g for accessing my servers behind a nat with dynamic ip.

The solution was to use dynamic dns (supported natively by the router), and ultravnc with security encryption. The dynamic ip allows to have a unique name, regardless of the dynamic ip assigned to the router. Then, I needed a way to reach the vnc severs. My solution was to a assign static ip to each server in my lan (instead of relaying on dhcp), and to have different range of vnc ports for each server (from server property page). Finally, i setup a static port forwarding set of rules for those ports (for each service configure a "service" in router's content filtering, and then open the "service" in the router's firewall and send to a specific lan server).

And voila, now my servers are accessible with 2048-bit RSA keys and 256-bit AES keys security encryption.

Friday, December 10, 2010

Out of band: vpn with a dg834g router

I wanted to have a native vpn with my dg834g router. This is the configuration. Then you can adopt a client, such as the greenbow


Facebook data store

I used this in my interviews, now not anymore. Device an infrastructure for storing data in Facebook. There are 500M users and let's assume the average number of friends is 100. Each user can post a message, or can read all the fresh messages produced by her friends.

Thursday, December 9, 2010

Google's Dremel


Greg pointed out this interesting G paper "Dremel: Interactive Analysis of Web-Scale Datasets" (PDF)"

"By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing."

The key idea is very simple, but the implementation is complex: "our goal is to store all values of a given field consecutively to improve retrieval efficiency". This is the so called columnar vs. record-oriented storage.

Then, SQL relational operations are implemented by using the following intuition: "Think of a nested record as a labeled tree, where each label corresponds to a field name. The selection operator prunes away the branches of the tree that do not satisfy the specified conditions." .. "Dremel uses a multi-level serving tree to execute queries (see Figure 7). A root server receives incoming queries, reads metadata from the tables, and routes the queries to
the next level in the serving tree."

Dremel scans quadrillions of records per month.

Wednesday, December 8, 2010

Segments with points

You are given a blue segment S of length n. There are n points p_i (0 <= i < n) distributed uniformly at random. You mark in red all the segments connecting points p_i and p_j what would be the predominant colour in S after that?

Tuesday, December 7, 2010

RTTI

I was using this in my interviews. Now no longer: "What is RTTI?"

Monday, December 6, 2010

Subarray sorted

Given an unsorted array arr[0..n-1] of size n, find the minimum length subarray arr[s..e] such that sorting this subarray makes the whole array sorted.

Sunday, December 5, 2010

Numbers (by the way of Geofffrey Dean)



A nanosecond (ns) is one billionth of a second (10-9 s

Saturday, December 4, 2010

Facebook friend suggestions

I used this in my past interviews: now not any longer. Devise an algorithm for suggesting friends in Facebook.

Friday, December 3, 2010

Queries and time

You are given a log of queries annotated with a time-stamp, where a single entry is . Define a data-structure for efficiently identify the top k most frequent queries in the time interval [t0, t0 + \tau]

Thursday, December 2, 2010

Celebrity twitters

In a n twitter set, a celebrity is defined as someone who is followed by everyone, but who follows no one. You need to identify the celebrities by asking to twitters the question: "do you know person x?" The answer will be yes/no. Find the celebrity by asking only O(n) questions.

Hint: how many information can you infer with one question?

Wednesday, December 1, 2010

Ants in a row

You have a set of ants moving on an horizontal segment of length n. Each ant can randomically move either right or left. When an ant takes a direction, she stays in that direction until she either drops out of the segment or it bumps with another ant. In this case, the ant changes her direction. Can you formalize the system and describe what will happen after t iterations?

(thanks giuseppe, muthu)