## Friday, March 20, 2009

### The beauty of linear k-order selection

One out three thousands papers is a beautiful paper. I believe that "Time bounds for selection" (M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan, J. Comput. System Sci. 7 (1973) 448-461) is an old and beautiful pearl.

It's well know that K-selection is a problem that can be solved in O(N logN) average time by using an approach similar to the one of QuickSort . Anyway, the worst-case is quadratic even when you adopt a randomization step (here you have the C++ code for Quickselect).

The above paper introduces a beautiful variant of classical selection. The key idea is to partition the input data array A in groups of five elements. Then:
1. Within each group G[i], find the median(G[i]) in constant time by direct comparison of the 5 elements; This step is at most linear;
2. Take the median of each group and find M, the median of medians (median(median(G, .... G[floor(n/5)+1]) by a recursive call; This steps takes T(n/5);
3. Use M to partition the input by calling the quickselect algorithm recursively; This step takes at most T(7n/10) -- see below;
At each step of the recursive call, QuickSelect drops either the elements greater than M (call this GTM) or the ones less than M (call this LTM). Let's assume we maintain the elements less than M. How many are they?
• Among the n/5 medians, n/10 are larger then M, being M the median;
• For each median, at least two other elements in A are larger than the median by construction;
• Therefore, GTM is at least 3n/10;
• For symmetry, LTM is at least 3n/10;
As a consequence the recursive call to QuickSelect takes at most T(7n/10);

The total cost is therefore T(n) <= Θ(N) + T(n/5) + T(7n/10)

This recursion equation is linear (it can be proven by substitution method). The intuition is that 1/5 + 7/10 = 9/10 and therefore the geometric series is converging.