Tuesday, January 13, 2009

Magy!: how can we monitors most frequent queries?

Magy! receives a huge amount of queries everyday. They don't want to store all of them, since many of the queries are unique (submitted just one time by just one user). Anyway, for caching reason they want to understand what are the queries whose frequency exceed s=0.1%. How do you solve the problem? Remember we don't want to store all the queries.

1 comment:

  1. this is one among the few REAL datastream problem that I am aware of. Many of the remaining ones are just toys for theoretical analysis.

    Hint: split the stream into windows and accept some error in estimation.