[Feature] Mark bad scene Identifies. Automatically learn about conflicting phashes. #6028

@bbappserver

Is your feature request related to a problem? Please describe.
With the current perceptual hash (phash), more than a few collisions seem unavoidable. Because the hash is fuzzy there is never going to be a perfect solution, but it should be possible to stop identify from mucking things up by letting users individually mark prior bad identifies.

Simply asserting "this scene is not " and clearing the assignment from the scene should be adequate for this purpose.

IdentifyFalsePositive(scene_id,stash_id)

Then when identify once again encounters the faulty item it can ask a stashbox for the next item that has that phash, but not the now marked stash-id(s).

This way over time, even if the hashing algorithm is weak, a stashbox can learn that a phash could be one of several scenes, and basically keep offering until the correct one is selected or you run out of candidates, at which point it is time to consider submitting a new one.
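To make the idea concrete, here is a minimal sketch of the bookkeeping described above. It is not Stash or stash-box code: `FalsePositiveStore`, `identify_false_positive`, and `next_candidate` are hypothetical names, the store is an in-memory dict, and the candidate list stands in for whatever a stash-box would return for a phash lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FalsePositiveStore:
    """Hypothetical store of rejected (scene_id, stash_id) pairs."""

    rejected: Dict[int, Set[str]] = field(default_factory=dict)

    def identify_false_positive(self, scene_id: int, stash_id: str) -> None:
        # Record that this scene is NOT the given stash-box entry.
        self.rejected.setdefault(scene_id, set()).add(stash_id)

    def next_candidate(self, scene_id: int, candidates: List[str]) -> Optional[str]:
        # candidates: stash-ids sharing the scene's phash, in the order offered.
        excluded = self.rejected.get(scene_id, set())
        for stash_id in candidates:
            if stash_id not in excluded:
                return stash_id
        # Out of candidates: time to consider submitting a new scene.
        return None


store = FalsePositiveStore()
matches = ["stash-a", "stash-b"]  # made-up stash-ids that collide on one phash
store.identify_false_positive(1, "stash-a")
print(store.next_candidate(1, matches))
```

Each rejection narrows what identify will offer for that scene, so repeated runs walk through the colliding candidates one at a time.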

Yes eventually you probably want a UI to pick particular metadata if more than one candidate comes back, but that's comparatively rare, so just brute forcing it like this is probably good enough as a start.

You could also selectively learn full file hashes, such as MD5 or SHA-256, to help disambiguate. This hybrid approach would allow for auto resolution provided users have identical versions of the target scene, while only doing a full file hash on those files where it is necessary.
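A rough sketch of that hybrid lookup, under stated assumptions: `disambiguate` and the `known_hashes` mapping (learned SHA-256 digest to stash-id) are hypothetical, and only the standard-library `hashlib` calls are real. The full hash is computed only when the phash returns more than one candidate.

```python
import hashlib
from typing import Dict, List, Optional


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large videos are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def disambiguate(
    path: str, candidates: List[str], known_hashes: Dict[str, str]
) -> Optional[str]:
    """Pick a stash-id, doing a full file hash only for ambiguous phashes.

    known_hashes is a hypothetical mapping of learned SHA-256 digest -> stash-id.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # no collision, so no full hash is needed
    match = known_hashes.get(file_sha256(path))
    # Only accept an exact-hash match that is one of the phash candidates.
    return match if match in candidates else None
```

An exact hash only helps when another user has a byte-identical file, so a `None` result here would fall back to offering the remaining candidates one at a time.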
