Can you prove that the below code produces a random permutation, where RANDOM(i, n) produces a random integer in (i, n]
RANDOMIZE-IN-PLACE(A)
n = length[A]
for i =0 ... n-1
do swap (A[i], A[RANDOM(i, n)])
Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Tuesday, March 30, 2010
Monday, March 29, 2010
Leaving Yahoo!
I always like creativity. Looking for new opportunities?
Sunday, March 28, 2010
Do you run or walk when it is raining?
Suppose you need to traverse a Leicester square in London. It's raining. Do you run or walk if you want to minimize the amount of rain you get?
Saturday, March 27, 2010
Number of inversions
Given an array A[0,.... n] of integer an inversion is defined as i < j and A[j] < A[i]. Suppose to randomly permute A. Give an algorithm that computes the number of inversion in A.
Friday, March 26, 2010
Approximate intersection of two lists
Search engines mantains a word index made up by a dictionary of words and, for each work a list of documents containing the. For instance
impossible -> 343, 5, 63459, 4, ....., 32
mission -> 3449, 558, ...., 49
Suppose the two lists are very long. For instance, the word 'impossible' can be contained in 100*10^6 documents, while the word mission can be contained in 50*10^6 documents.
1) return the documents containing the words 'impossible mission' (documents must contain both the words)
2) what is the complexity?
3) can you accellerate the computation?
4) how to compute the size of the lists intersection?
5) can you estimate this size in a fast way?
impossible -> 343, 5, 63459, 4, ....., 32
mission -> 3449, 558, ...., 49
Suppose the two lists are very long. For instance, the word 'impossible' can be contained in 100*10^6 documents, while the word mission can be contained in 50*10^6 documents.
1) return the documents containing the words 'impossible mission' (documents must contain both the words)
2) what is the complexity?
3) can you accellerate the computation?
4) how to compute the size of the lists intersection?
5) can you estimate this size in a fast way?
Thursday, March 25, 2010
Buying a new car
My ex-Personal Assistant she was always suggesting to buy a new car.
So I am evaluating this strategy. Every week I go to car dealer, I test a couple of cars and, If i find one that I like, a sell my current car c_i for amount of money m_i and a buy a new one c_i+1 for a new amount of money m_i+1. Note that going to the car dealer has a fixed cost say cd.
My strategy is to find the best car ever, but I want to estimate what is the cost for achieving this goal.
So I am evaluating this strategy. Every week I go to car dealer, I test a couple of cars and, If i find one that I like, a sell my current car c_i for amount of money m_i and a buy a new one c_i+1 for a new amount of money m_i+1. Note that going to the car dealer has a fixed cost say cd.
My strategy is to find the best car ever, but I want to estimate what is the cost for achieving this goal.
Wednesday, March 24, 2010
Generate a random permutation of an array of intergers
You are given an array of n integers containing numbers from 1 to n (with repetition), generate a random permutation of the array. The distribution should be uniform.
Tuesday, March 23, 2010
Tossing coins (ufff again?)
Suppose you flip a fair coin n times. What is the longest streak of consecutive heads that you
expect to see?
expect to see?
Monday, March 22, 2010
Points in a plane
given 2N points in a plane. Pair up to obtain N distinct pairs such that total sum of paired distances is minimum. N can be atmost 50.
Sunday, March 21, 2010
Least Square Methods: simple form of regression
Sometime simple solutions are the most effective. If you have a series of events e1, ... en, where each ei appears occ(ei) times with probability p(ei), you can represent the events in a Cartesian plan with axis containing the occurrences and the probability, respectively. Then you can imagine a straight line y = m x + b such that the distances from the line to the points in the space is minimized. Surprisingly enough, this method is straightforward has a closed solution and is very effective to predict the behavior of unseen points in the space.
Saturday, March 20, 2010
Language detection
Suppose you need to write a module for detecting the language in the text. What approach would you adopt?
Friday, March 19, 2010
Spam detection
Suppose you need to classify a set of web pages basing just on the textual content. What approach would you adopt?
Thursday, March 18, 2010
Traffic lights and probabilities
A classical probability problem. A traffic light has a mean waiting time of 14 seconds and a variance of 4 second. What is the probability of waiting more than 21 seconds?
P(X>21) = 1-P(X<=21) ; 14 / 4 = 1.5 this is the standardization of the distribution , then..
P(X>21) = 1-P(X<=21) ; 14 / 4 = 1.5 this is the standardization of the distribution , then..
Wednesday, March 17, 2010
A ticketing system
There are b available seats in a cinema and n customers. The room is dark. At the beginning all the seats are empty. A customer enters the cinema and randomly selects a seat. If it is available, then it will be assigned to that customer. Otherwise, he will randomly select another seat. How many selections are necessary in average to get all the seat assigned?
PS: if you find this very similar to the hash collision problem, well that is not by chance.
PS: if you find this very similar to the hash collision problem, well that is not by chance.
Tuesday, March 16, 2010
Document processing with score and cost function
Suppose you have a stream of document that you are evaluating. Each document has a score function s(i) >= 0, and a cost function c(i) >= 0. if you analyze a document i you have to pay the cost c(i) and you get access to the associated score (i). The goal is to maximize the score, but minimize the cost. What strategy would you adopt?
Monday, March 15, 2010
Radomizing an array before linear search
What would be the benefit?
Sunday, March 14, 2010
Random search in random array
Suppose you have an array of random integers A[i] i = 0, ... , n-1. Suppose you adopt the following random search strategy for value x. Pick a random index i into A. If A[i] = x, then we terminate; otherwise, continue the search by picking a new random index into A. Note that we may examine a given element more than once.
What is the expected number of indexes analyzed?
What is the expected number of indexes analyzed?
Saturday, March 13, 2010
A compressed vector space
Sometime is useful to represent a vector in a compressed form. This is particularly useful when you want to represent a vector of features in machine learning. The simplest data structure is an un-ordered vector of pairs (id, value), ... (id, value). The vector can be sorted on-the-fly for particular operations such as the intersection of two vectors. Here you have the code.
Friday, March 12, 2010
Sort different runs of integers
Let S be a sequence of n elements divided into n/k subsequences each of length where all of the elements in any subsequence are larger than all of the elements of a preceding subsequence and smaller than all of the elements of a succeeding subsequence. Can you give an algorithm for sorting S? What is the optimal complexity?
Thursday, March 11, 2010
Water jugs
Suppose that you are given n red and n blue water jugs, all of different shapes and sizes. All red jugs hold different amounts of water, as do the blue ones. Moreover, for every red jug, there is a blue jug that holds the same amount of water, and vice versa.
It is your task to find a grouping of the jugs into pairs of red and blue jugs that hold the same amount of water. To do so, you may perform the following operation: pick a pair of jugs in which one is red and one is blue, fill the red jug with water, and then pour the water into the blue jug. This operation will tell you whether the red or the blue jug can hold more water, or if they are of the same volume. Assume that such a comparison takes one time unit. Your goal is to find an algorithm that makes a minimum number of comparisons to determine the grouping. Remember that you may not directly compare two red jugs or two blue jugsWater
It is your task to find a grouping of the jugs into pairs of red and blue jugs that hold the same amount of water. To do so, you may perform the following operation: pick a pair of jugs in which one is red and one is blue, fill the red jug with water, and then pour the water into the blue jug. This operation will tell you whether the red or the blue jug can hold more water, or if they are of the same volume. Assume that such a comparison takes one time unit. Your goal is to find an algorithm that makes a minimum number of comparisons to determine the grouping. Remember that you may not directly compare two red jugs or two blue jugsWater
Wednesday, March 10, 2010
List cache-aware
Build a list that maximize the use of cache hierarchies. In modern computer a cache miss is a very negative situation, just like page faults some year ago.
Tuesday, March 9, 2010
What autosuggest do you want?
Any feedback is welcome
Monday, March 8, 2010
Waledac: malware cloud computing
It has been estimated that Waledac infected hundreds of thousands of computer. Is that a malware cloud?
Saturday, March 6, 2010
Find the i-th element in a dynamic set
You are given a dynamic set of integers D. The set can increase/decrease its size. You are requested to provide the operator i-th(D, i) which returns the element with rank i in D.
Friday, March 5, 2010
Thursday, March 4, 2010
Containing objects
An object is defined in terms of triplets of integers . An object Oi can be contained in an object Oj if ai < aj, bi < bj, ci < cj. Given a set of objects S, find the maximum subset of objects that can be contained each other.
Solve the generalized problem where an object is represented in terms of n-ples.
Solve the generalized problem where an object is represented in terms of n-ples.
Searching a set of strings
Suppose you are given a (large) set S of variable length strings. The set is very large, say 1 billion of strings. The goal is to identify exact string duplicates. How would you solve the problem?
Wednesday, March 3, 2010
Hashing with linked chains
Hashing can use a linked list whenever there is a conflict for a key. Here a couple of questions:
1) how would you organize the list? sorted / unsorted?
2) would you need to allocate/disallocate in case of delete and insertion with collisions?
1) how would you organize the list? sorted / unsorted?
2) would you need to allocate/disallocate in case of delete and insertion with collisions?
Tuesday, March 2, 2010
Quicksort worst-case
Traditionally quicksort is considered quadratic in the worst case. there is a randomized version which makes it O(n log n). Do you know how to make it O(n log n) worst case without using randomization?
Monday, March 1, 2010
Why Google Pushed Buzz Out The Door Before It Was Ready
No more Don't be evil?
Subscribe to:
Posts (Atom)