simhash
Here are 23 public repositories matching this topic...
semantic-sh is a SimHash implementation that detects and groups similar texts by harnessing word vectors and transformer-based language models (BERT).
Updated Jul 25, 2024 - Python
A fast Python implementation of the SimHash algorithm.
Updated Oct 27, 2021 - Python
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Updated Aug 28, 2023 - Python
SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex
Updated Nov 18, 2022 - Python
Find duplicate text files.
Updated Jan 14, 2025 - Python
Code plagiarism detection system based on SimHash and NiCad.
Updated Dec 22, 2018 - Python
⌨️ User verification based on keystroke dynamics: two-factor authentication technology based on keystrokes
Updated Apr 14, 2025 - Python
An extended version of SimHash that supports fingerprint extraction from documents and images.
Updated Aug 22, 2022 - Python
Analysis of Massive Datasets FER labs
Updated Jun 10, 2022 - Python
🐾 Create a behavioral fingerprint based on your zsh command line history
Updated Aug 14, 2023 - Python