
Aggregating adult performers metadata - authority file / schema discussion #10

Open
@laurus-lx

Description

Currently stash-box supports only a single "source of truth" for scenes/performers/studios, whereas performer data aggregated from various sources (index sites, tubes, social media, studios) may differ, with varying degrees of confidence.

This is a proposal to create an authority file that will:

  1. Maintain a list of data sources (sites)
  2. Hold a regularly updated scrape of scene/performer metadata
  3. Keep track of metadata as it changes over time
  4. Normalize metadata (birthdays, locations, scene dates and titles, performer physical attributes)
  5. Generate periodic snapshots:
    a. Assign a confidence value to performer matches across sources, then link and de-dup performers
    b. Assign a confidence value to each metadata value and de-dup
    c. Generate an output dump of scenes/performers/studios
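
To make steps 4 and 5b more concrete, here is a minimal Python sketch of normalizing one field (a birthdate) and picking a consensus value with a confidence score. Everything in it is an assumption for illustration: the `SOURCE_WEIGHT` values, the accepted date formats, and the function names are hypothetical and not part of stash-box.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Optional

# Hypothetical trust weight per kind of source; real values would need tuning.
SOURCE_WEIGHT = {"studio": 0.9, "index_site": 0.7, "tube": 0.4}
DEFAULT_WEIGHT = 0.1  # assumed weight for source kinds not listed above

# Assumed set of date formats seen in scraped data.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%d.%m.%Y")


def normalize_birthdate(raw: str) -> Optional[date]:
    """Parse a birthdate in one of the known formats; None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def consensus(observations):
    """Pick the best value from (source_kind, value) pairs.

    Returns (value, confidence), where confidence is the winning value's
    share of the total source weight. None values are ignored.
    """
    scores = defaultdict(float)
    for source_kind, value in observations:
        if value is not None:
            scores[value] += SOURCE_WEIGHT.get(source_kind, DEFAULT_WEIGHT)
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())
```

With this weighting, two sources that agree after normalization can outvote a single more trusted source, and the returned confidence shows how contested the value was. A share-of-weight score is only one possible choice; a real implementation might also factor in how recently each source was scraped.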


There is a discussion about adding this functionality to stash-box itself: https://discord.com/channels/559159668438728723/798641040029777980/894662081830322206

Whether this is integrated into stash-box or kept separate, we need to come up with a schema, so I wanted to start this discussion.
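
As a starting point for that discussion, here is one possible shape for the schema, sketched as Python dataclasses. It is not an agreed design: the class names, field names, and the "store a row only when the value changes" rule are all assumptions. It covers the list of sources and the tracking of metadata over time by storing one observation per source, entity, and field.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Source:
    id: str
    url: str
    kind: str  # e.g. "index_site", "tube", "social", "studio"


@dataclass(frozen=True)
class Observation:
    source_id: str
    entity_type: str  # "performer", "scene" or "studio"
    external_id: str  # the entity's id on the source site
    field_name: str   # e.g. "birthdate", "title"
    value: str
    scraped_at: datetime


@dataclass
class AuthorityFile:
    sources: dict = field(default_factory=dict)
    observations: list = field(default_factory=list)

    def add_source(self, source: Source) -> None:
        self.sources[source.id] = source

    def history(self, source_id, entity_type, external_id, field_name):
        """All stored values of one field from one source, oldest first."""
        key = (source_id, entity_type, external_id, field_name)
        matches = [
            o for o in self.observations
            if (o.source_id, o.entity_type, o.external_id, o.field_name) == key
        ]
        return sorted(matches, key=lambda o: o.scraped_at)

    def record(self, obs: Observation) -> bool:
        """Store obs only if the value changed since the previous scrape.

        Assumes scrapes are recorded in chronological order.
        Returns True if a new row was stored.
        """
        if obs.source_id not in self.sources:
            raise KeyError(f"unknown source: {obs.source_id}")
        past = self.history(
            obs.source_id, obs.entity_type, obs.external_id, obs.field_name
        )
        if past and past[-1].value == obs.value:
            return False
        self.observations.append(obs)
        return True
```

Keeping raw per-source observations separate from the merged output means each periodic snapshot can be regenerated with different confidence rules without re-scraping.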
