Interact with, analyze, and structure massive text, image, embedding, audio, and video datasets
Updated May 11, 2025 - Python
Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
Interactive code for image similarity using the SIFT algorithm
CLI utility to find near duplicate images and remove all but the best copy.
Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree.
Find similar audio files easily
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
Easily delete your YouTube Music library (and manage playlists)
Advanced Duplicate File Finder for Python
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Advanced similarity and duplicate source code proof of concept for our research efforts.
A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash
An End-to-End Evaluation Framework for Entity Resolution Systems
This Python package identifies duplicate files in a folder of interest.
Find duplicates and similar images in a folder
Find, remove and avoid duplicates with dugu: The Duplicates Guru
A command-line tool that automates the deletion of duplicate files based on their hash or perceptual hash.
A Python module to detect duplicate videos in a directory.
Securely wipe files or folders and clean duplicated files
Uses SSIM and MSE to get rid of duplicates and near duplicates