Thursday, February 25, 2010

Get k tags out of n tags

You have a dataset of N documents, and each document has tags associated with it, drawn from a vocabulary of N tags. For a given k with k << N, how many documents do you need to process before you see k unique tags with high probability?
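One way to get a feel for the question is a minimal sketch under a simplifying assumption that the post does not state: each document carries exactly one tag, drawn independently and uniformly at random from the N tags. Under that assumption this is a partial coupon collector problem. Once i distinct tags have been seen, the next document shows a new tag with probability (N - i)/N, so the wait for it is geometric with mean N/(N - i). Summing over i = 0..k-1 gives the expected count, which for k << N is roughly k + k(k-1)/(2N), i.e. barely more than k. The function names below are hypothetical, and the simulated quantile is only a rough empirical stand-in for a "with high probability" bound.

```python
import random


def expected_docs(n_tags: int, k: int) -> float:
    """Expected number of documents needed to see k distinct tags.

    Assumes (hypothetically) one tag per document, drawn uniformly at random.
    """
    # After i distinct tags, a new one appears with probability (n_tags - i) / n_tags,
    # so the expected wait for the (i+1)-th distinct tag is n_tags / (n_tags - i).
    return sum(n_tags / (n_tags - i) for i in range(k))


def simulate_docs(n_tags: int, k: int, rng: random.Random) -> int:
    """Draw documents until k distinct tags have appeared; return how many were drawn."""
    seen = set()
    docs = 0
    while len(seen) < k:
        seen.add(rng.randrange(n_tags))  # one uniformly random tag per document
        docs += 1
    return docs


def docs_quantile(n_tags: int, k: int, q: float = 0.99,
                  trials: int = 10_000, seed: int = 0) -> int:
    """Empirical q-quantile of the document count over repeated simulations."""
    rng = random.Random(seed)
    samples = sorted(simulate_docs(n_tags, k, rng) for _ in range(trials))
    return samples[min(trials - 1, int(q * trials))]


if __name__ == "__main__":
    n, k = 1_000_000, 1_000
    print("expected:", expected_docs(n, k))
    print("99th percentile:", docs_quantile(n, k, trials=200))
```

With real data, documents usually carry several tags and tag frequencies are far from uniform, so the answer can be much larger than this idealized estimate; the sketch only covers the uniform single-tag case.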

