## Friday, December 31, 2010

### Divide in two equal parts

You are given an array of positive integers, and you have to find a split point such that the two parts have equal sum

## Thursday, December 30, 2010

### Longest increasing sub-sequence

This is a well know algorithm, but it is useful to know for interviews

## Wednesday, December 29, 2010

### Subsequences

Given a sequence of integers, find a continuous subsequence which maximizes the sum of its elements

## Tuesday, December 28, 2010

### DP bracket problem

Given an increasing sequence of integers s1, ...., sk how many balanced parenthesis strings can you generate?

## Monday, December 27, 2010

### Count the number of inversions in an array

Given an array A unsorted, count the number of inversions.

## Sunday, December 26, 2010

### Reverse a string with O(1) additional space

Old interview question

## Saturday, December 25, 2010

### Implement a sorting algorithm on a list

## Friday, December 24, 2010

### Nesting envelopes

Given k different types of envelopes. The dimensions of envelope type i are xi × yi. You can place envelope A inside envelope B if and only if the dimensions A are

strictly smaller than the dimensions of B. how many possible combination you can have?

strictly smaller than the dimensions of B. how many possible combination you can have?

## Thursday, December 23, 2010

### Modify a stack to implement min/max in O(1)

Modify a stack to implement min/max operations in O(1)

## Wednesday, December 22, 2010

### Maximize the number of items you collect

This is a classical DP problem. A table composed of N*M cells with certain items non zero. Start from (0, 0) and you can go down or right one cell. Design an algorithm to find the maximum number of non zero items you can collect, if you are moving from upper-left corner to bottom-right corner.

What is the complexity and the space needed?

What is the complexity and the space needed?

## Tuesday, December 21, 2010

### Launched News in UK, and FR

Another year, another news product. Optimized local rankers, classifiers and clustering. Oh yes, clustering my dear.

## Monday, December 20, 2010

### Are you the average of your friend actions?

A lot of people are enthusiastic about the benefits of social signals for selecting and ranking information. My question for you is: do you believe that you are well represented by the average actions of your most active social friends? or do you also have differences and a personal identity?

## Sunday, December 19, 2010

### Related Searches shipped in 8 countries

What an amazing ride. We shipped 8 countries in less than one year with localized filtering, aggregation, and ranking. Italy is part of the list (yeah!!!)

More pictures here

## Saturday, December 18, 2010

### Maximize and expression with no parenthesis

Given an expression E of n numbers with *, + operations and no precedence among the operators, identify the set of parenthesis which maximize the results for E.

## Friday, December 17, 2010

### Autosuggest shipped in 5 countries

What an amazing ride. We shipped 5 countries in less than one year with localized filtering, aggregation, and ranking.

More pictures here

## Thursday, December 16, 2010

### Encode and Decode a document

A 'Document' is defined as a collection of 'Fields'. Some fields can be optional. A 'Field' can be a first-order type (int, char, float), or an array of first order types.

Write the code to serialize and de-serialize a collection of 'Documents'

Write the code to serialize and de-serialize a collection of 'Documents'

## Wednesday, December 15, 2010

### Hive, SQL like, Facebook and the map join optimization

Hive is the SQL-like processor on the top of Hadoop, the open source Map-Reduce implementation. Facebook recently proposed an interesting optimization for the sort and merge phase adopted by the Reducer: "The motivation of map join is to save the shuffle and reduce stages and do the join work only in the map stage. By doing so, when one of the join tables is small enough to fit into the memory, all the mappers can hold the data in memory and do the join work there. So all the join operations can be finished in the map stage. However there are some scaling problems with this type of map join. When thousands of mappers read the small join table from the Hadoop Distributed File System (HDFS) into memory at the same time, the join table easily becomes the performance bottleneck, causing the mappers to time out during the read operation".

This is the optimization: "Hive-1641 solves this scaling problem. The basic idea of optimization is to create a new MapReduce local task just before the original join MapReduce task. This new task reads the small table data from HDFS to an in-memory hash table. After reading, it serializes the in-memory hash table into a hashtable file. In the next stage, when the MapReduce task is launching, it uploads this hashtable file to the Hadoop distributed cache, which populates these files to each mapper's local disk. So all the mappers can load this persistent hashtable file back into memory and do the join work as before."

So they are doing some pre-computation before the join operation itself for computing an hash table which is then pre-loaded in the memory of the local mapper. The performance gains are amazing since optimized map join is 12 to 26 times faster than the previous one.

I wonder how a fast compression scheme could help here. Paolo?

This is the optimization: "Hive-1641 solves this scaling problem. The basic idea of optimization is to create a new MapReduce local task just before the original join MapReduce task. This new task reads the small table data from HDFS to an in-memory hash table. After reading, it serializes the in-memory hash table into a hashtable file. In the next stage, when the MapReduce task is launching, it uploads this hashtable file to the Hadoop distributed cache, which populates these files to each mapper's local disk. So all the mappers can load this persistent hashtable file back into memory and do the join work as before."

So they are doing some pre-computation before the join operation itself for computing an hash table which is then pre-loaded in the memory of the local mapper. The performance gains are amazing since optimized map join is 12 to 26 times faster than the previous one.

I wonder how a fast compression scheme could help here. Paolo?

## Tuesday, December 14, 2010

### K-th sum for two sorted arrays

You are given 2 arrays A, B sorted in decreasing order of size m and n respectively and a number k, such that 1 <= k <= m*n. You need to fine the kth largest sum(a+b) possible where a \in A, and b \in B.

Hint: i like this problem, a possible solution is by induction and the complexity is O(n+m).

Hint: i like this problem, a possible solution is by induction and the complexity is O(n+m).

## Monday, December 13, 2010

### Interesting google talk

Available here, with more information about current G's index compression, spanner project, and many other features.

## Sunday, December 12, 2010

### Out of band: Red power led on dg834g

My router dg834g decided to die on Sunday. It died in a strage way. When turned on the power led remains red. Then, the wireless router starts the power on-cycle but it goes in an infinite reboot cycle. Anyway, sometime it simply starts correctly.

It turns out that the problem was with power supply unit. I just replaced it with an equivalent one.

It turns out that the problem was with power supply unit. I just replaced it with an equivalent one.

## Saturday, December 11, 2010

### Out of band: vnc connection to server behind a dg834g nat

I wanted to use my dg834g for accessing my servers behind a nat with dynamic ip.

The solution was to use dynamic dns (supported natively by the router), and ultravnc with security encryption. The dynamic ip allows to have a unique name, regardless of the dynamic ip assigned to the router. Then, I needed a way to reach the vnc severs. My solution was to a assign static ip to each server in my lan (instead of relaying on dhcp), and to have different range of vnc ports for each server (from server property page). Finally, i setup a static port forwarding set of rules for those ports (for each service configure a "service" in router's content filtering, and then open the "service" in the router's firewall and send to a specific lan server).

And voila, now my servers are accessible with 2048-bit RSA keys and 256-bit AES keys security encryption.

The solution was to use dynamic dns (supported natively by the router), and ultravnc with security encryption. The dynamic ip allows to have a unique name, regardless of the dynamic ip assigned to the router. Then, I needed a way to reach the vnc severs. My solution was to a assign static ip to each server in my lan (instead of relaying on dhcp), and to have different range of vnc ports for each server (from server property page). Finally, i setup a static port forwarding set of rules for those ports (for each service configure a "service" in router's content filtering, and then open the "service" in the router's firewall and send to a specific lan server).

And voila, now my servers are accessible with 2048-bit RSA keys and 256-bit AES keys security encryption.

## Friday, December 10, 2010

### Out of band: vpn with a dg834g router

I wanted to have a native vpn with my dg834g router. This is the configuration. Then you can adopt a client, such as the greenbow

### Facebook data store

I used this in my interviews, now not anymore. Device an infrastructure for storing data in Facebook. There are 500M users and let's assume the average number of friends is 100. Each user can post a message, or can read all the fresh messages produced by her friends.

## Thursday, December 9, 2010

### Google's Dremel

Greg pointed out this interesting G paper "Dremel: Interactive Analysis of Web-Scale Datasets" (PDF)"

"By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing."

The key idea is very simple, but the implementation is complex: "our goal is to store all values of a given field consecutively to improve retrieval efficiency". This is the so called columnar vs. record-oriented storage.

Then, SQL relational operations are implemented by using the following intuition: "Think of a nested record as a labeled tree, where each label corresponds to a field name. The selection operator prunes away the branches of the tree that do not satisfy the specified conditions." .. "Dremel uses a multi-level serving tree to execute queries (see Figure 7). A root server receives incoming queries, reads metadata from the tables, and routes the queries to

the next level in the serving tree."

Dremel scans quadrillions of records per month.

## Wednesday, December 8, 2010

### Segments with points

You are given a blue segment S of length n. There are n points p_i (0 <= i < n) distributed uniformly at random. You mark in red all the segments connecting points p_i and p_j what would be the predominant colour in S after that?

## Tuesday, December 7, 2010

### RTTI

I was using this in my interviews. Now no longer: "What is RTTI?"

## Monday, December 6, 2010

### Subarray sorted

Given an unsorted array arr[0..n-1] of size n, find the minimum length subarray arr[s..e] such that sorting this subarray makes the whole array sorted.

## Sunday, December 5, 2010

### Numbers (by the way of Geofffrey Dean)

## Saturday, December 4, 2010

### Facebook friend suggestions

I used this in my past interviews: now not any longer. Devise an algorithm for suggesting friends in Facebook.

## Friday, December 3, 2010

### Queries and time

You are given a log of queries annotated with a time-stamp, where a single entry is

. Define a data-structure for efficiently identify the top k most frequent queries in the time interval [t0, t0 + \tau]

## Thursday, December 2, 2010

### Celebrity twitters

In a n twitter set, a celebrity is defined as someone who is followed by everyone, but who follows no one. You need to identify the celebrities by asking to twitters the question: "do you know person x?" The answer will be yes/no. Find the celebrity by asking only O(n) questions.

Hint: how many information can you infer with one question?

Hint: how many information can you infer with one question?

## Wednesday, December 1, 2010

### Ants in a row

You have a set of ants moving on an horizontal segment of length n. Each ant can randomically move either right or left. When an ant takes a direction, she stays in that direction until she either drops out of the segment or it bumps with another ant. In this case, the ant changes her direction. Can you formalize the system and describe what will happen after t iterations?

(thanks giuseppe, muthu)

(thanks giuseppe, muthu)

Subscribe to:
Posts (Atom)