Monday, September 3, 2012

Given a stream compute the running average and the running median

1 comment:

  1. Here my approach:

    Keep reading from the stream and store the values until you collect N values (N is decided by the user) into a temp array A. For each element "a" from the stream do:

    1.1) Keep calculating the AVG of the read values so far by summing and dividing by the current number of elements read from the stream.

    1.2) Maintain a MAX-heap and a min-heap and a variable k where top(MAX-heap) < k < top(min-heap). Once you read the next element "a" from the stream you compare it against k and if it is lesser than k you put "a" into MAX-heap else into min-heap. If the size of the two heaps differs by more than 1, then you pop the top elements of the heaps (let's call them M and m), and push M, m and k into the smaller heap. Then you pop from it the top element and assign k to that element. Now, the two heaps have the same size and k is the median.

    As you read the N+k (k>0) value do:

    2.1) print AVG and k as the current running average and median.

    2.2) Maintain the AVG current by subtracting A[i]/N and adding a/N. Then a[i] = a and i = (i + 1) % N (i starts from 0).

    2.3) maintain k as the current median applying the strategy described in 1.2.