Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Wednesday, July 28, 2010
Set intersection in an efficient way?
If you are given a set of 1000 integers in a set A , and 10,000,000 integers in a set B, how would you create a set C that would only contain numbers that are both in A and B?
1. sort the numbers of set A 2. sort the numbers of set B 3. perform ordered list intersection in a way similar to the inverted list intersection (i.e. maintain two pointers and compare the items pointed by the pointers. Progress the pointer pointing to the smallest int, or progress both pointers when a common item is found) 4. For more efficiency place skip pointers on the second sorted list every sqrt(10^6) items. For every item in list A, perform a binary search for the correct skip pointer and place pointer B in the skip pointer you've found.
1. sort the numbers of set A
ReplyDelete2. sort the numbers of set B
3. perform ordered list intersection in a way
similar to the inverted list intersection
(i.e. maintain two pointers and compare the
items pointed by the pointers. Progress the
pointer pointing to the smallest int, or
progress both pointers when a common item is
found)
4. For more efficiency place skip pointers
on the second sorted list every sqrt(10^6)
items. For every item in list A, perform
a binary search for the correct skip pointer
and place pointer B in the skip pointer
you've found.