Comment on 6 Hours of SEO, Social, Crawling and Authority Webinars in 1 Post
-
thank for share
8 minutes ago
Random commentary about C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
A binary heap with max ordering will do the job. For each encountered symbol, the heap will store its frequency and the most frequent symbol would reside in the heap's head.
ReplyDeleteThe number of symbols can be infinite (e.g. larger than any memory you have in the heap). this one is not a valid solution
ReplyDeleteif its over 50% of the time then on average, the symbol should occur every OTHER position in the stream. i think its pretty easy from there, isn't it? (ps: more than likely i'm completely wrong, i'm a total n00b)
ReplyDeleteI am aware of COUNT SKETCH DS as described in http://www.cs.rutgers.edu/~farach/pubs/FrequentStream.pdf
ReplyDeleteIs there a simple solution
You could maintain a counter (count) of the symbols seen so far. Then you could use an array F of N slots (its size would be N x4bytes or 8 bytes), where N is the size of the dictionary of your symbols (I'm assuming a fixed size dictionary). F[i] will store the number of times the i-th symbol has been seen so far.
ReplyDeleteAt any time you have the frequency of each symbol as F[i]/count and evaluate if there is a dominant symbol (F[i]/count > 0.5).
The problem is also about how to pick up the bigger element in the array F_t/count_t where _t means the configuration of the array F and the value of "count" at time "t".
You could sort it at time t (being N small this should a very fast operation to perform in memory).
Alternatively you could maintain the values of F into an max-heap. At any time you can pik up the most frequent symbol in O(1).
That's all... i'm missing something?