Wednesday, May 18, 2011

Given a huge list of strings:

1. Find the top 5000 strings (sorted by frequency).
2. Find the duplicates.
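
One possible exact approach, sketched under the assumption that the distinct strings and their counts fit in memory (the post does not say whether they do): count every string once with a hash map, then use a bounded heap to pull out the k most frequent. The function name `top_k_and_duplicates` and the parameter `k` are made up for this illustration.

```python
from collections import Counter
import heapq


def top_k_and_duplicates(strings, k=5000):
    """Return (top_k, duplicates) for an iterable of strings.

    Assumes the distinct strings fit in memory; this is an illustrative
    sketch, not the only way to solve the problem.
    """
    # One pass over the input: string -> number of occurrences.
    counts = Counter(strings)

    # nlargest keeps a heap of at most k entries, so this step costs
    # O(n log k) over n distinct strings rather than a full O(n log n) sort.
    # The result is ordered from most to least frequent.
    top_k = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

    # A duplicate is any string seen more than once.
    duplicates = {s for s, c in counts.items() if c > 1}
    return top_k, duplicates


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "b", "a"]
    top, dups = top_k_and_duplicates(sample, k=2)
    print(top)
    print(sorted(dups))
```

If the list is too large for a single in-memory table, the same counting step can be applied per partition after hashing each string to a bucket, since all copies of a string land in the same bucket.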

2 comments:

  1. Is an approximate solution ok? If so, I would use a spectral Bloom filter.

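
To make the approximate idea concrete, here is a minimal sketch of a spectral-Bloom-style counter: an array of counters, several hash functions per string, and the minimum of the touched counters as the frequency estimate. It covers only the basic minimum-selection scheme. The class name `SpectralBloomSketch`, the default sizes, and the use of salted BLAKE2b as the hash family are assumptions for illustration, not something the comment specifies.

```python
import hashlib


class SpectralBloomSketch:
    """Approximate frequency counter (basic minimum-selection scheme).

    Estimates can be too high because of hash collisions, but they are
    never lower than the true count.
    """

    def __init__(self, num_counters=1 << 20, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive k counter indexes by salting one hash function k ways.
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        # Increment every counter the item hashes to.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        # The smallest touched counter is the least polluted by collisions.
        return min(self.counters[pos] for pos in self._positions(item))


if __name__ == "__main__":
    sketch = SpectralBloomSketch(num_counters=1024, num_hashes=3)
    for s in ["a", "b", "a", "c", "b", "a"]:
        sketch.add(s)
    print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

With a structure like this, a string whose estimate is at least 2 is a candidate duplicate, and candidates for the top 5000 can be tracked in a small heap keyed by estimate. Both answers are approximate, since collisions can inflate counts.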