My proposal for these problems. What is yours?
1. What about using a Randomness extractor starting and considering Twitter search as an (high-entrophy) source. There are applications like this based on RF-noise, Lavalamp, and similar funny sources. I would adopt either MD5 or SHA-1 hash function. This will give all the benefits of the crypto-hash function and bias-reduction.
2. Synching is a bit more hard, since the query is not enough. Time is another important factor: the results you see at time t are potentially different from the ones seen at time t + \eps. The index can change and the load balance mechanism in search can bring you to a different search cluster. I guess you must relay on some proxy which must capture a snapshot of twitter and then you must syncronize on the query and the time. There are a couple of services like this out of there.
No comments:
Post a Comment