Conversation


@devsk18 devsk18 commented May 4, 2025

Training

  1. Source images and their variants (rotated, flipped) are converted to hashes using the dHash hashing technique.
  2. Similar images are skipped if their hashes are already present in memory.
  3. The trained data is saved to a pickle file for later use (see the sketch after this list).
  4. Prints the total number of images trained and the time taken to train them.
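
A minimal sketch of how the training pass described above might look, assuming the Pillow/imagehash stack implied by `dhash()`; the output file name `train_hashes.pkl` and the helper names are illustrative, not taken from the PR's actual code.

```python
import pickle
import time
from pathlib import Path

import imagehash
from PIL import Image


def build_variants(img):
    """Yield the source image plus rotated and flipped variants."""
    yield img
    for angle in (90, 180, 270):
        yield img.rotate(angle, expand=True)
    yield img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    yield img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)


def train(image_dir, out_file="train_hashes.pkl"):
    start = time.time()
    hashes = {}  # ImageHash -> source file name; repeated hashes are skipped
    for path in Path(image_dir).glob("*"):
        if not path.is_file():
            continue
        with Image.open(path) as img:
            for variant in build_variants(img):
                h = imagehash.dhash(variant)
                if h in hashes:  # skip images already seen under this hash
                    continue
                hashes[h] = path.name
    with open(out_file, "wb") as f:
        pickle.dump(hashes, f)  # persist trained data for later lookups
    print(f"Trained {len(hashes)} hashes in {time.time() - start:.2f}s")
```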

Testing

  1. The trained data is loaded and split into chunks for faster processing.
  2. Each chunk is passed to a separate process for parallel execution.
  3. A two-pointer approach scans each chunk from both ends, halving the number of loop iterations per process (n/2 instead of n).
  4. An early-exit condition stops a scan as soon as an exact match (zero hash difference) is found.
  5. The minimum hash difference is found and converted into a similarity percentage.
  6. Prints the most similar source file name if a match is found (hash difference below 3.2, roughly 95% similar); see the lookup sketch after this list.
  7. Prints the result and the time taken for the lookup.
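
Under the same assumptions, a sketch of the lookup side: the pickle is loaded, split into chunks, and each chunk is handed to a worker process that scans from both ends (two pointers) with an early exit on an exact match. All names here are illustrative rather than the PR's own.

```python
import pickle
from multiprocessing import Pool

import imagehash
from PIL import Image

MATCH_THRESHOLD = 3.2  # ~95% similarity for a 64-bit hash


def scan_chunk(args):
    """Scan one chunk of (hash, name) pairs from both ends, tracking the smallest difference."""
    query_hash, chunk = args
    best_diff, best_name = None, None
    lo, hi = 0, len(chunk) - 1
    while lo <= hi:
        for h, name in (chunk[lo], chunk[hi]):
            diff = query_hash - h  # Hamming distance between the two hashes
            if best_diff is None or diff < best_diff:
                best_diff, best_name = diff, name
            if diff == 0:  # early exit: exact match found
                return best_diff, best_name
        lo += 1
        hi -= 1
    return best_diff, best_name


def lookup(query_path, pickle_file="train_hashes.pkl", workers=4):
    with open(pickle_file, "rb") as f:
        hashes = pickle.load(f)  # dict: ImageHash -> source file name
    items = list(hashes.items())
    size = max(1, len(items) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    query_hash = imagehash.dhash(Image.open(query_path))
    with Pool(workers) as pool:
        results = pool.map(scan_chunk, [(query_hash, c) for c in chunks])

    best_diff, best_name = min(results, key=lambda r: r[0])
    if best_diff < MATCH_THRESHOLD:
        similarity = (1 - best_diff / query_hash.hash.size) * 100
        print(f"Most similar: {best_name} ({similarity:.1f}% similar)")
    else:
        print("No sufficiently similar image found")
```

On Windows and macOS, the `lookup()` call should sit under an `if __name__ == "__main__":` guard, since `multiprocessing` starts workers by spawning fresh interpreters there.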

NB: For better comparison of results, the hash function can be switched between phash(), dhash() [default], and average_hash() (requires a change in the code at line 42).
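
A hypothetical way to make that choice configurable instead of editing line 42 in place (the constant and function names below are made up for illustration):

```python
import imagehash

# Hypothetical switch: pick one of the supported hash functions here
# instead of editing the hashing call in place.
HASH_FN = imagehash.dhash  # or imagehash.phash / imagehash.average_hash


def hash_image(img):
    return HASH_FN(img)
```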
