Currently stash-box supports only a single "source of truth" for scenes/performers/studios, whereas performer data aggregated from various sources (index sites, tubes, social media, studios) may differ, with varying degrees of confidence.
This is a proposal to create an authority file that will:
- Have a list of data sources (sites)
- Have a regularly updated scrape of scenes/performers metadata
- Keep track of metadata as it changes over time
- Normalize metadata (birthdays, locations, scene dates and titles, performer physical attributes)
- Generate periodic snapshots:
  a. Assign confidence values to performer matches across sources; link and de-dup performers
  b. Assign confidence values to metadata and de-dup
  c. Generate an output dump of scenes/performers/studios
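To make the "normalize" and "assign confidence value to metadata and de-dup" steps more concrete, here is a minimal sketch of one possible approach, not a settled design. It assumes each source type gets a hand-picked reliability weight (the `SOURCE_WEIGHTS` values, the `Observation` type, and the `normalize`/`resolve` functions are all hypothetical). Scraped values are normalized so equivalent values compare equal, identical normalized values are merged, and the winning value's confidence is the share of total source weight that supports it.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical per-source-type reliability weights (0..1); real values would need tuning.
SOURCE_WEIGHTS = {"studio": 0.9, "index_site": 0.7, "social": 0.5, "tube": 0.3}
DEFAULT_WEIGHT = 0.1  # assumed weight for unknown source types

# Assumed date formats seen across sites; a real list would be longer.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


@dataclass(frozen=True)
class Observation:
    source: str     # source type, a key into SOURCE_WEIGHTS
    field: str      # e.g. "birthdate"
    raw_value: str  # value exactly as scraped


def normalize(field: str, raw: str) -> str:
    """Normalize a scraped value so equivalent values compare equal."""
    raw = raw.strip()
    if field == "birthdate":
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(raw, fmt).date().isoformat()
            except ValueError:
                continue
        return raw  # unparseable date: keep as-is rather than guess
    return " ".join(raw.split())  # collapse runs of whitespace


def resolve(observations):
    """Return {field: (best_value, confidence)} after merging equal normalized values."""
    scores = defaultdict(lambda: defaultdict(float))  # field -> value -> summed weight
    totals = defaultdict(float)                        # field -> total weight seen
    for obs in observations:
        weight = SOURCE_WEIGHTS.get(obs.source, DEFAULT_WEIGHT)
        value = normalize(obs.field, obs.raw_value)
        scores[obs.field][value] += weight
        totals[obs.field] += weight
    result = {}
    for field, by_value in scores.items():
        value, score = max(by_value.items(), key=lambda kv: kv[1])
        result[field] = (value, score / totals[field])
    return result
```

For example, a studio reporting `1990-03-05`, an index site reporting `05/03/1990`, and a tube site reporting `1991-01-01` would resolve to `1990-03-05` with confidence (0.9 + 0.7) / 1.9 ≈ 0.84. Weighting by source type is only one option; per-site or per-field weights, or weights learned from past agreement, would fit the same shape.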
There is an ongoing discussion about adding this functionality to stash-box itself: https://discord.com/channels/559159668438728723/798641040029777980/894662081830322206
Whether this ends up integrated into stash-box or kept separate, we need to come up with a schema, so I wanted to start this discussion.
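As a possible starting point for the schema discussion, here is a minimal sketch using SQLite. All table and column names are hypothetical and cover only performers. Sources are listed once, each site's performer record is stored separately, every scrape appends rows to an observation table (so changes over time are kept rather than overwritten), and each snapshot writes its cross-source links with a confidence value.

```python
import sqlite3

# Hypothetical minimal schema; names are illustrative only, performers only.
SCHEMA = """
CREATE TABLE source (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    kind     TEXT NOT NULL,   -- e.g. 'studio', 'index_site', 'tube', 'social'
    base_url TEXT
);
CREATE TABLE source_performer (
    id          INTEGER PRIMARY KEY,
    source_id   INTEGER NOT NULL REFERENCES source(id),
    external_id TEXT NOT NULL,  -- the performer's id or slug on that site
    UNIQUE (source_id, external_id)
);
-- Append-only history: each scrape inserts new rows instead of updating old ones.
CREATE TABLE performer_observation (
    id                  INTEGER PRIMARY KEY,
    source_performer_id INTEGER NOT NULL REFERENCES source_performer(id),
    field               TEXT NOT NULL,  -- e.g. 'birthdate', 'height_cm'
    raw_value           TEXT,
    normalized_value    TEXT,
    scraped_at          TEXT NOT NULL   -- ISO-8601 UTC timestamp
);
-- Output of a snapshot run: links source records to one canonical performer.
CREATE TABLE performer_link (
    snapshot_id         INTEGER NOT NULL,
    canonical_id        INTEGER NOT NULL,
    source_performer_id INTEGER NOT NULL REFERENCES source_performer(id),
    confidence          REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    PRIMARY KEY (snapshot_id, source_performer_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def latest_values(conn: sqlite3.Connection, source_performer_id: int) -> dict:
    """Most recent normalized value per field for one source record."""
    rows = conn.execute(
        """
        SELECT o.field, o.normalized_value
        FROM performer_observation AS o
        WHERE o.source_performer_id = ?
          AND o.scraped_at = (
              SELECT MAX(scraped_at) FROM performer_observation
              WHERE source_performer_id = o.source_performer_id
                AND field = o.field)
        """,
        (source_performer_id,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO source (id, name, kind) VALUES (1, 'example-index', 'index_site')")
    conn.execute("INSERT INTO source_performer (id, source_id, external_id) VALUES (1, 1, 'jane-doe')")
    conn.executemany(
        "INSERT INTO performer_observation "
        "(source_performer_id, field, raw_value, normalized_value, scraped_at) "
        "VALUES (1, ?, ?, ?, ?)",
        [
            ("height_cm", "5'6\"", "168", "2021-01-01T00:00:00Z"),
            ("height_cm", "170 cm", "170", "2021-06-01T00:00:00Z"),
        ],
    )
    print(latest_values(conn, 1))
```

Because ISO-8601 UTC timestamps sort lexically, `MAX(scraped_at)` picks the latest scrape, while older rows remain available for tracking how a value changed. Scenes and studios could follow the same source-record / observation / link pattern.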