Wednesday, July 22, 2009

Delta and Gamma encoding: a great paper from Yahoo

Web index compression is one of the most interesting research fields. In the past I have stressed its similarity with traditional bandwidth reduction.

Compressed Web Indexes is a great theoretical result from Yahoo!. The authors provide a strong bound for delta and gamma encoding: the index size is 1/3 of the document size. Now we have a valid justification for this empirical observation. Another nice side observation is that the term distribution follows a double Pareto law and not just a Zipf law.
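To make the two encodings concrete, here is a minimal sketch, assuming the post means the classic Elias gamma and Elias delta codes applied to the gaps between sorted document IDs in a posting list. The function names (`elias_gamma`, `elias_delta`, `gaps`, `encode_postings`) are my own for illustration and do not come from the paper, and the codes are returned as strings of '0' and '1' for readability rather than as packed bits.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer, as a bit string."""
    if n < 1:
        raise ValueError("Elias codes are defined for integers >= 1")
    binary = bin(n)[2:]                      # binary form, no leading zeros
    # Unary length prefix: (len - 1) zeros, then the binary form itself.
    return "0" * (len(binary) - 1) + binary


def elias_delta(n: int) -> str:
    """Elias delta code of a positive integer, as a bit string."""
    if n < 1:
        raise ValueError("Elias codes are defined for integers >= 1")
    binary = bin(n)[2:]
    # Gamma-code the bit length, then append the binary form
    # without its leading 1 bit (which is implied by the length).
    return elias_gamma(len(binary)) + binary[1:]


def gaps(postings: list[int]) -> list[int]:
    """Turn a strictly increasing list of doc IDs (>= 1) into gaps."""
    result = []
    previous = 0
    for doc_id in postings:
        if doc_id <= previous:
            raise ValueError("doc IDs must be strictly increasing and >= 1")
        result.append(doc_id - previous)
        previous = doc_id
    return result


def encode_postings(postings: list[int], code=elias_gamma) -> str:
    """Encode a posting list by coding each gap with the chosen code."""
    return "".join(code(g) for g in gaps(postings))


if __name__ == "__main__":
    postings = [3, 7, 8, 20]                 # hypothetical posting list
    print(gaps(postings))
    print(encode_postings(postings, elias_gamma))
    print(encode_postings(postings, elias_delta))
```

Small gaps get short codes, which is why frequent terms (with densely packed posting lists) compress well. Gamma is shorter for very small gaps, while delta wins as the gaps grow, since its length prefix is itself gamma-coded.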

Great results on delta encoding, Mr. Yahoo!.
