A lightweight LSH package for large document sets.

Components:
- LSH based on weighted Jaccard distance
- feature hashing

Dependencies:
- pyfarmhash
- pymongo (if documents are indexed from a database)
- libtopic-nlp-iit

TODO:
- multithreading: run hashing iterations in parallel
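The two components can be illustrated with a minimal, stdlib-only sketch. It is not this package's API: every function name is hypothetical, `hashlib.blake2b` stands in for farmhash (the package itself depends on pyfarmhash), and the LSH scheme shown is Ioffe's Improved Consistent Weighted Sampling (ICWS). ICWS is one known hashing scheme whose collision probability equals the weighted Jaccard similarity, and the package may use a different one. Documents are feature-hashed into sparse non-negative count vectors, each vector is reduced to a signature of `(feature, t)` samples, and signatures are split into bands so that documents sharing a band bucket become candidate pairs.

```python
import hashlib
import math
import random
from collections import defaultdict


def feature_hash(tokens, n_features=2**18):
    """Hashing trick: map each token to one of n_features buckets and sum
    counts. blake2b is a stand-in for farmhash (assumption). Counts stay
    non-negative (no signed hashing) because weighted Jaccard needs it."""
    vec = defaultdict(float)
    for tok in tokens:
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=8).digest()
        vec[int.from_bytes(digest, "little") % n_features] += 1.0
    return dict(vec)


def weighted_jaccard(x, y):
    """Exact weighted Jaccard similarity sum(min)/sum(max); distance = 1 - this."""
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den else 1.0


def icws_signature(vec, num_hashes=64, seed=0):
    """ICWS (Ioffe 2010): each of num_hashes samples is a (feature, t) pair;
    two vectors produce the same pair with probability equal to their
    weighted Jaccard similarity."""
    sig = []
    for j in range(num_hashes):
        best = None
        for k, weight in vec.items():
            if weight <= 0:
                continue
            # Seeding by (seed, j, k) gives every document the same draws.
            rng = random.Random(f"{seed}:{j}:{k}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(weight) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, k, t)
        sig.append((best[1], best[2]) if best else None)
    return sig


def lsh_candidates(signatures, bands=16):
    """Banding: split each signature into `bands` chunks of equal length
    (leftover samples are ignored); documents sharing a chunk bucket
    become candidate pairs."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog".split(),
        "b": "the quick brown fox jumped over the lazy dog".split(),
        "c": "lorem ipsum dolor sit amet consectetur".split(),
    }
    vecs = {d: feature_hash(toks) for d, toks in docs.items()}
    sigs = {d: icws_signature(v, num_hashes=64) for d, v in vecs.items()}
    print(lsh_candidates(sigs, bands=16))
```

Each sample is computed independently of the others, so the `j` loop in `icws_signature` is the natural unit to parallelize for the multithreading item in the TODO list. More bands (fewer rows per band) favor recall, and fewer bands favor precision.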