JayDigvijay/LSH_Retrieval

LSH_Retrieval

The LSH algorithm consists of 3 parts:

  1. Shingling: Shingling.py performs the shingling and generates the pickle files needed by the later steps.
  2. Minhashing: Minhashing.py generates the signature matrix from the shingles produced by Shingling.py.
  3. Bucketing: LSH.py creates the buckets from the signature matrix.
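
The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in this repository: the function names, the character-level shingles, the seeded BLAKE2 hash, and the values of K, ROWS, and BANDS are all hypothetical, and the sketch keeps everything in memory instead of writing pickle files.

```python
import hashlib
from collections import defaultdict

K = 3                      # shingle length in characters (assumed value)
ROWS = 2                   # r: rows per band (assumed value)
BANDS = 10                 # b: number of bands (assumed value)
NUM_HASHES = ROWS * BANDS  # length of each MinHash signature


def shingles(text, k=K):
    """Stage 1: the set of overlapping k-character shingles of a text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def seeded_hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle; each seed acts as one hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_hashes=NUM_HASHES):
    """Stage 2: one signature column, the minimum hash value per hash function."""
    return [min(seeded_hash(s, seed) for s in shingle_set)
            for seed in range(num_hashes)]


def build_buckets(signatures, rows=ROWS, bands=BANDS):
    """Stage 3: map (band index, band contents) to the documents sharing that band."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    return buckets


# Toy data: "a" and "b" are identical, "c" shares no shingle with them.
docs = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGT",
    "c": "TTTTGGGGCCCC",
}
signatures = {doc_id: minhash_signature(shingles(text))
              for doc_id, text in docs.items()}
buckets = build_buckets(signatures)

# Documents that land in the same bucket for at least one band are candidates.
candidates = {frozenset(ids) for ids in buckets.values() if len(ids) > 1}
print(candidates)
```

Two documents become a candidate pair as soon as one band of their signatures is identical, which is why identical documents always collide and unrelated ones almost never do.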

Once the buckets are built, the documents can be queried with Main_Retrieval.py, which returns the similar documents found through LSH.
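
A query against the buckets might look like the sketch below. It assumes the query has already been shingled and minhashed into a signature of the same length as the indexed ones. The names band_keys, build_index, and query, the toy signatures, and the ranking by matching signature rows are hypothetical choices for illustration and may differ from what Main_Retrieval.py does.

```python
from collections import defaultdict

ROWS, BANDS = 2, 3  # assumed r and b; signatures have ROWS * BANDS entries


def band_keys(signature, rows=ROWS, bands=BANDS):
    """Yield one hashable (band index, band contents) key per band."""
    for band in range(bands):
        yield band, tuple(signature[band * rows:(band + 1) * rows])


def build_index(signatures):
    """Bucket every indexed document under each of its band keys."""
    index = defaultdict(set)
    for doc_id, sig in signatures.items():
        for key in band_keys(sig):
            index[key].add(doc_id)
    return index


def query(index, signatures, query_sig):
    """Return (doc_id, estimated similarity) pairs, most similar first.

    Candidates are documents sharing at least one band with the query.
    The similarity estimate is the fraction of signature rows that agree.
    """
    candidates = set()
    for key in band_keys(query_sig):
        candidates |= index.get(key, set())

    def estimate(doc_id):
        sig = signatures[doc_id]
        return sum(x == y for x, y in zip(sig, query_sig)) / len(query_sig)

    return sorted(((doc_id, estimate(doc_id)) for doc_id in candidates),
                  key=lambda pair: pair[1], reverse=True)


# Toy signatures standing in for the real signature matrix.
signatures = {
    "doc1": [1, 2, 3, 4, 5, 6],
    "doc2": [1, 2, 9, 9, 5, 6],
    "doc3": [7, 7, 7, 7, 7, 7],
}
index = build_index(signatures)
results = query(index, signatures, [1, 2, 3, 4, 5, 6])
print(results)
```

Only the documents that share a bucket with the query are compared, so the cost of a query depends on the number of candidates and not on the size of the data set.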

Running the code

sudo nano Parameters.py

Opens Parameters.py so that you can check or edit the parameters (k, r, b, etc.) and the MinHash function.
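
When choosing r and b it can help to look at the standard banding formula: with b bands of r rows each, two documents with Jaccard similarity s become a candidate pair with probability 1 - (1 - s^r)^b. The sketch below assumes that r and b in Parameters.py carry this conventional meaning (rows per band and number of bands). The function names and the example values are hypothetical.

```python
def candidate_probability(s, r, b):
    """Probability that a pair with Jaccard similarity s shares at least one band."""
    return 1.0 - (1.0 - s ** r) ** b


def approximate_threshold(r, b):
    """Similarity at which the S-curve rises most steeply, roughly (1/b)^(1/r)."""
    return (1.0 / b) ** (1.0 / r)


# Example values only; the actual settings live in Parameters.py.
r, b = 5, 20
print("approximate threshold:", round(approximate_threshold(r, b), 3))
for s in (0.2, 0.4, 0.6, 0.8):
    print(s, round(candidate_probability(s, r, b), 4))
```

Increasing r makes each band harder to match and pushes the threshold up. Increasing b gives a pair more chances to match and pushes it down.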

python3 Shingling.py
python3 Minhashing.py
python3 LSH.py

Runs the three LSH stages, in this order, on the entire data set. Run them once, and again whenever the code, parameters, or data are updated.

python3 Main_Retrieval.py

Starts the retrieval system, which takes a user query and returns the matching sequences.
