Conversation


@devsk18 devsk18 commented May 4, 2025

Training

  1. Source images and their variants (rotated, flipped) are converted to hashes using the dHash hashing technique.
  2. Similar images are skipped if their hashes are already present in memory.
  3. The trained data is saved to a pickle file for later use (see the sketch after this list).
  4. Prints the total number of images trained and the time taken to train them.
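
A minimal sketch of how the training pass described above might look, assuming the Pillow/imagehash stack implied by `dhash()`; the output file name `train_hashes.pkl` and the helper names are illustrative, not taken from the PR's actual code.

```python
import pickle
import time
from pathlib import Path

import imagehash
from PIL import Image


def build_variants(img):
    """Yield the source image plus rotated and flipped variants."""
    yield img
    for angle in (90, 180, 270):
        yield img.rotate(angle, expand=True)
    yield img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    yield img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)


def train(image_dir, out_file="train_hashes.pkl"):
    start = time.time()
    hashes = {}  # ImageHash -> source file name; repeated hashes are skipped
    for path in Path(image_dir).glob("*"):
        if not path.is_file():
            continue
        with Image.open(path) as img:
            for variant in build_variants(img):
                h = imagehash.dhash(variant)
                if h in hashes:  # skip images already seen under this hash
                    continue
                hashes[h] = path.name
    with open(out_file, "wb") as f:
        pickle.dump(hashes, f)  # persist trained data for later lookups
    print(f"Trained {len(hashes)} hashes in {time.time() - start:.2f}s")
```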

Testing

  1. The trained data is loaded and split into chunks for faster processing.
  2. Each chunk is passed to a separate process for parallel execution.
  3. A two-pointer approach scans each chunk from both ends, halving the number of loop iterations per process (n/2 instead of n).
  4. An early-exit condition stops a scan as soon as an exact match (zero hash difference) is found.
  5. The minimum hash difference is found and converted into a similarity percentage.
  6. Prints the most similar source file name if a match is found (hash difference below 3.2, roughly 95% similar); see the lookup sketch after this list.
  7. Prints the result and the time taken for the lookup.
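
Under the same assumptions, a sketch of the lookup side: the pickle is loaded, split into chunks, and each chunk is handed to a worker process that scans from both ends (two pointers) with an early exit on an exact match. All names here are illustrative rather than the PR's own.

```python
import pickle
from multiprocessing import Pool

import imagehash
from PIL import Image

MATCH_THRESHOLD = 3.2  # ~95% similarity for a 64-bit hash


def scan_chunk(args):
    """Scan one chunk of (hash, name) pairs from both ends, tracking the smallest difference."""
    query_hash, chunk = args
    best_diff, best_name = None, None
    lo, hi = 0, len(chunk) - 1
    while lo <= hi:
        for h, name in (chunk[lo], chunk[hi]):
            diff = query_hash - h  # Hamming distance between the two hashes
            if best_diff is None or diff < best_diff:
                best_diff, best_name = diff, name
            if diff == 0:  # early exit: exact match found
                return best_diff, best_name
        lo += 1
        hi -= 1
    return best_diff, best_name


def lookup(query_path, pickle_file="train_hashes.pkl", workers=4):
    with open(pickle_file, "rb") as f:
        hashes = pickle.load(f)  # dict: ImageHash -> source file name
    items = list(hashes.items())
    size = max(1, len(items) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    query_hash = imagehash.dhash(Image.open(query_path))
    with Pool(workers) as pool:
        results = pool.map(scan_chunk, [(query_hash, c) for c in chunks])

    best_diff, best_name = min(results, key=lambda r: r[0])
    if best_diff < MATCH_THRESHOLD:
        similarity = (1 - best_diff / query_hash.hash.size) * 100
        print(f"Most similar: {best_name} ({similarity:.1f}% similar)")
    else:
        print("No sufficiently similar image found")
```

On Windows and macOS, the `lookup()` call should sit under an `if __name__ == "__main__":` guard, since `multiprocessing` starts workers by spawning fresh interpreters there.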

NB: For better comparison of results, the hash function can be switched between phash(), dhash() [default], and average_hash() (requires a change in the code at line 42).
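
A hypothetical way to make that choice configurable instead of editing line 42 in place (the constant and function names below are made up for illustration):

```python
import imagehash

# Hypothetical switch: pick one of the supported hash functions here
# instead of editing the hashing call in place.
HASH_FN = imagehash.dhash  # or imagehash.phash / imagehash.average_hash


def hash_image(img):
    return HASH_FN(img)
```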
