[Bug Report] Phash validation #2149

Open
kermieisinthehouse opened this issue Dec 20, 2021 · 1 comment
Labels
bug Something isn't working

Comments

kermieisinthehouse commented Dec 20, 2021

Describe the bug
We recently started using phashes for matching against StashDB, and this has exposed bugs in the phash generation process. A broken or unparsable file tends to produce one of a small set of common phashes, which is then matched to an unrelated scene in StashDB for which that same broken phash was uploaded. We may not be able to fix the underlying ffmpeg issues, but we can work around them. During phash generation and use, we should validate each phash to make sure it doesn't match a known bad phash (the results of solid color, color bars, etc.), and apply other phash validation rules.

Further investigation is still needed, but I've identified some facts already:
Real phashes should have a (roughly?) even number of 1s and 0s.
Many bad phashes 'look' strange. They have low entropy.
If ffmpeg determines that the video duration is zero, the phash is almost always junk.
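
The three observations above could be turned into a first-pass check along the following lines. This is only a sketch: it assumes a phash is a 64-bit value written as 16 hex digits (as in the examples in this issue), it uses the number of distinct bytes as a crude stand-in for "entropy", and the function names and both thresholds are hypothetical placeholders that would need tuning against real data.

```python
def phash_bytes(phash_hex: str) -> bytes:
    """Decode a 16-hex-digit phash into its 8 raw bytes."""
    raw = bytes.fromhex(phash_hex)
    if len(raw) != 8:
        raise ValueError("expected a 64-bit phash written as 16 hex digits")
    return raw


def set_bit_count(phash_hex: str) -> int:
    """Number of 1 bits in the phash (32 would be a perfect balance)."""
    return sum(bin(b).count("1") for b in phash_bytes(phash_hex))


def distinct_byte_count(phash_hex: str) -> int:
    """Crude entropy proxy (an assumption): how many different byte values occur."""
    return len(set(phash_bytes(phash_hex)))


def is_suspicious(
    phash_hex: str,
    duration_seconds: float,
    max_imbalance: int = 12,      # hypothetical threshold, not tuned
    min_distinct_bytes: int = 4,  # hypothetical threshold, not tuned
) -> bool:
    """Return True if any of the three heuristics flags the phash."""
    if duration_seconds <= 0:
        return True  # zero duration: the phash is almost always junk
    if abs(set_bit_count(phash_hex) - 32) > max_imbalance:
        return True  # far from an even split of 1s and 0s
    if distinct_byte_count(phash_hex) < min_distinct_bytes:
        return True  # very repetitive byte pattern
    return False
```

These heuristics alone are probably not sufficient. For example, 870707030787fefc from the known-bad list has exactly 32 set bits and five distinct bytes, so it passes this sketch with the default thresholds and would still need an explicit list check.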

Known bad phashes so far:
(Note: they may vary in the wild by 1-3 bits, so checks should use a Hamming-distance match rather than exact equality.)

a000000000800080
8080808080808080
870707030787fefc
82070707078ffff8
8055557555575575
805555555d755d55
87070707037ef8fc
8707070303fefcdc
cdcdcdc9c1233332
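
A Hamming-distance check against this list might look roughly like the sketch below. The names `KNOWN_BAD_PHASHES`, `hamming_distance`, and `matches_known_bad` are made up for illustration, and the default tolerance of 3 bits is taken from the "1-3 bits" note above rather than from any measurement.

```python
# Known bad phashes from the list above, parsed as 64-bit integers.
# Any further hashes that get reported can be appended to this tuple.
KNOWN_BAD_PHASHES = tuple(
    int(h, 16)
    for h in (
        "a000000000800080",
        "8080808080808080",
        "870707030787fefc",
        "82070707078ffff8",
        "8055557555575575",
        "805555555d755d55",
        "87070707037ef8fc",
        "8707070303fefcdc",
        "cdcdcdc9c1233332",
    )
)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def matches_known_bad(phash_hex: str, max_distance: int = 3) -> bool:
    """True if the phash is within max_distance bits of any known bad phash."""
    value = int(phash_hex, 16)
    return any(
        hamming_distance(value, bad) <= max_distance
        for bad in KNOWN_BAD_PHASHES
    )
```

With the default tolerance, `matches_known_bad("8080808080808081")` is flagged because it differs from 8080808080808080 by a single bit, while `matches_known_bad("808080808080808f")` is not, since its closest entry is 4 bits away.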
kermieisinthehouse added the help wanted and bug labels and removed the help wanted label on Dec 20, 2021
kermieisinthehouse added this to the Version 0.13.0 milestone on Dec 20, 2021
kermieisinthehouse commented

Also,

f8fcfcfcf8e00100
f8fcfcfcfc700000

@WithoutPants WithoutPants modified the milestones: "Soon", Backlog Dec 6, 2022
github-project-automation bot moved this to To triage in Bug fixing on Feb 13, 2024
WithoutPants moved this from To triage to Backlog in Bug fixing on Feb 13, 2024