Dictionaries are a very useful data structure. Here I played a bit with inheritance and boost::serialization. A base class Dictionary is derived into a hash-based Dictionary, implemented using either hash_set or vector. The base class is also derived into a compressed Dictionary, which uses prefix compression to avoid storing every string in full. All the classes support serialization. Here is the code.
PS: on my Ubuntu machine, hash_set needs a special #define, otherwise it won't compile. Check the Makefile.
PS: boost::serialization of a polymorphic class is not easy. Check the code to see how the derived classes are serialized.
Saturday, June 28, 2008
Monday, June 16, 2008
Shingling and Text Clustering (Broder's shingles)
Shingling is an elegant text clustering technique: combined with min-hashing, it computes an approximation of the Jaccard similarity between documents in linear time. It is one of my favorite text clustering algorithms.
Here you can find a C++, STL, Boost implementation.