Can you prove that the below code produces a random permutation, where RANDOM(i, n) produces a random integer in (i, n]

RANDOMIZE-IN-PLACE(A)

n = length[A]

for i =0 ... n-1

do swap (A[i], A[RANDOM(i, n)])

Random commentary about C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search

## Tuesday, March 30, 2010

## Monday, March 29, 2010

### Leaving Yahoo!

I always like creativity. Looking for new opportunities?

## Sunday, March 28, 2010

### Do you run or walk when it is raining?

Suppose you need to traverse a Leicester square in London. It's raining. Do you run or walk if you want to minimize the amount of rain you get?

## Saturday, March 27, 2010

### Number of inversions

Given an array A[0,.... n] of integer an inversion is defined as i < j and A[j] < A[i]. Suppose to randomly permute A. Give an algorithm that computes the number of inversion in A.

## Friday, March 26, 2010

### Approximate intersection of two lists

Search engines mantains a word index made up by a dictionary of words and, for each work a list of documents containing the. For instance

impossible -> 343, 5, 63459, 4, ....., 32

mission -> 3449, 558, ...., 49

Suppose the two lists are very long. For instance, the word 'impossible' can be contained in 100*10^6 documents, while the word mission can be contained in 50*10^6 documents.

1) return the documents containing the words 'impossible mission' (documents must contain both the words)

2) what is the complexity?

3) can you accellerate the computation?

4) how to compute the size of the lists intersection?

5) can you estimate this size in a fast way?

impossible -> 343, 5, 63459, 4, ....., 32

mission -> 3449, 558, ...., 49

Suppose the two lists are very long. For instance, the word 'impossible' can be contained in 100*10^6 documents, while the word mission can be contained in 50*10^6 documents.

1) return the documents containing the words 'impossible mission' (documents must contain both the words)

2) what is the complexity?

3) can you accellerate the computation?

4) how to compute the size of the lists intersection?

5) can you estimate this size in a fast way?

## Thursday, March 25, 2010

### Buying a new car

My ex-Personal Assistant she was always suggesting to buy a new car.

So I am evaluating this strategy. Every week I go to car dealer, I test a couple of cars and, If i find one that I like, a sell my current car c_i for amount of money m_i and a buy a new one c_i+1 for a new amount of money m_i+1. Note that going to the car dealer has a fixed cost say cd.

My strategy is to find the best car ever, but I want to estimate what is the cost for achieving this goal.

So I am evaluating this strategy. Every week I go to car dealer, I test a couple of cars and, If i find one that I like, a sell my current car c_i for amount of money m_i and a buy a new one c_i+1 for a new amount of money m_i+1. Note that going to the car dealer has a fixed cost say cd.

My strategy is to find the best car ever, but I want to estimate what is the cost for achieving this goal.

## Wednesday, March 24, 2010

### Generate a random permutation of an array of intergers

You are given an array of n integers containing numbers from 1 to n (with repetition), generate a random permutation of the array. The distribution should be uniform.

## Tuesday, March 23, 2010

### Tossing coins (ufff again?)

Suppose you flip a fair coin n times. What is the longest streak of consecutive heads that you

expect to see?

expect to see?

## Monday, March 22, 2010

### Points in a plane

given 2N points in a plane. Pair up to obtain N distinct pairs such that total sum of paired distances is minimum. N can be atmost 50.

## Sunday, March 21, 2010

### Least Square Methods: simple form of regression

Sometime simple solutions are the most effective. If you have a series of events e1, ... en, where each ei appears occ(ei) times with probability p(ei), you can represent the events in a Cartesian plan with axis containing the occurrences and the probability, respectively. Then you can imagine a straight line y = m x + b such that the distances from the line to the points in the space is minimized. Surprisingly enough, this method is straightforward has a closed solution and is very effective to predict the behavior of unseen points in the space.

## Saturday, March 20, 2010

### Language detection

Suppose you need to write a module for detecting the language in the text. What approach would you adopt?

## Friday, March 19, 2010

### Spam detection

Suppose you need to classify a set of web pages basing just on the textual content. What approach would you adopt?

## Thursday, March 18, 2010

### Traffic lights and probabilities

A classical probability problem. A traffic light has a mean waiting time of 14 seconds and a variance of 4 second. What is the probability of waiting more than 21 seconds?

P(X>21) = 1-P(X<=21) ; 14 / 4 = 1.5 this is the standardization of the distribution , then..

P(X>21) = 1-P(X<=21) ; 14 / 4 = 1.5 this is the standardization of the distribution , then..

## Wednesday, March 17, 2010

### A ticketing system

There are b available seats in a cinema and n customers. The room is dark. At the beginning all the seats are empty. A customer enters the cinema and randomly selects a seat. If it is available, then it will be assigned to that customer. Otherwise, he will randomly select another seat. How many selections are necessary in average to get all the seat assigned?

PS: if you find this very similar to the hash collision problem, well that is not by chance.

PS: if you find this very similar to the hash collision problem, well that is not by chance.

## Tuesday, March 16, 2010

### Document processing with score and cost function

Suppose you have a stream of document that you are evaluating. Each document has a score function s(i) >= 0, and a cost function c(i) >= 0. if you analyze a document i you have to pay the cost c(i) and you get access to the associated score (i). The goal is to maximize the score, but minimize the cost. What strategy would you adopt?

## Monday, March 15, 2010

### Radomizing an array before linear search

What would be the benefit?

## Sunday, March 14, 2010

### Random search in random array

Suppose you have an array of random integers A[i] i = 0, ... , n-1. Suppose you adopt the following random search strategy for value x. Pick a random index i into A. If A[i] = x, then we terminate; otherwise, continue the search by picking a new random index into A. Note that we may examine a given element more than once.

What is the expected number of indexes analyzed?

What is the expected number of indexes analyzed?

## Saturday, March 13, 2010

### A compressed vector space

Sometime is useful to represent a vector in a compressed form. This is particularly useful when you want to represent a vector of features in machine learning. The simplest data structure is an un-ordered vector of pairs (id, value), ... (id, value). The vector can be sorted on-the-fly for particular operations such as the intersection of two vectors. Here you have the code.

## Friday, March 12, 2010

### Sort different runs of integers

Let S be a sequence of n elements divided into n/k subsequences each of length where all of the elements in any subsequence are larger than all of the elements of a preceding subsequence and smaller than all of the elements of a succeeding subsequence. Can you give an algorithm for sorting S? What is the optimal complexity?

## Thursday, March 11, 2010

### Water jugs

Suppose that you are given n red and n blue water jugs, all of different shapes and sizes. All red jugs hold different amounts of water, as do the blue ones. Moreover, for every red jug, there is a blue jug that holds the same amount of water, and vice versa.

It is your task to find a grouping of the jugs into pairs of red and blue jugs that hold the same amount of water. To do so, you may perform the following operation: pick a pair of jugs in which one is red and one is blue, fill the red jug with water, and then pour the water into the blue jug. This operation will tell you whether the red or the blue jug can hold more water, or if they are of the same volume. Assume that such a comparison takes one time unit. Your goal is to find an algorithm that makes a minimum number of comparisons to determine the grouping. Remember that you may not directly compare two red jugs or two blue jugsWater

It is your task to find a grouping of the jugs into pairs of red and blue jugs that hold the same amount of water. To do so, you may perform the following operation: pick a pair of jugs in which one is red and one is blue, fill the red jug with water, and then pour the water into the blue jug. This operation will tell you whether the red or the blue jug can hold more water, or if they are of the same volume. Assume that such a comparison takes one time unit. Your goal is to find an algorithm that makes a minimum number of comparisons to determine the grouping. Remember that you may not directly compare two red jugs or two blue jugsWater

## Wednesday, March 10, 2010

### List cache-aware

Build a list that maximize the use of cache hierarchies. In modern computer a cache miss is a very negative situation, just like page faults some year ago.

## Tuesday, March 9, 2010

### What autosuggest do you want?

Any feedback is welcome

## Monday, March 8, 2010

### Waledac: malware cloud computing

It has been estimated that Waledac infected hundreds of thousands of computer. Is that a malware cloud?

## Saturday, March 6, 2010

### Find the i-th element in a dynamic set

You are given a dynamic set of integers D. The set can increase/decrease its size. You are requested to provide the operator i-th(D, i) which returns the element with rank i in D.

## Friday, March 5, 2010

## Thursday, March 4, 2010

### Containing objects

An object is defined in terms of triplets of integers . An object Oi can be contained in an object Oj if ai < aj, bi < bj, ci < cj. Given a set of objects S, find the maximum subset of objects that can be contained each other.

Solve the generalized problem where an object is represented in terms of n-ples.

Solve the generalized problem where an object is represented in terms of n-ples.

### Searching a set of strings

Suppose you are given a (large) set S of variable length strings. The set is very large, say 1 billion of strings. The goal is to identify exact string duplicates. How would you solve the problem?

## Wednesday, March 3, 2010

### Hashing with linked chains

Hashing can use a linked list whenever there is a conflict for a key. Here a couple of questions:

1) how would you organize the list? sorted / unsorted?

2) would you need to allocate/disallocate in case of delete and insertion with collisions?

1) how would you organize the list? sorted / unsorted?

2) would you need to allocate/disallocate in case of delete and insertion with collisions?

## Tuesday, March 2, 2010

### Quicksort worst-case

Traditionally quicksort is considered quadratic in the worst case. there is a randomized version which makes it O(n log n). Do you know how to make it O(n log n) worst case without using randomization?

## Monday, March 1, 2010

### Why Google Pushed Buzz Out The Door Before It Was Ready

No more Don't be evil?

Subscribe to:
Posts (Atom)