
Discovery: Replace/augment crocdb data source #82

@luandev

Problem

crocdb is currently offline, which breaks or degrades the metadata lookup and discovery flows that depend on it.

Goal

Add at least one alternative metadata/discovery source that is:

  • Reliable enough to depend on
  • Low maintenance (limited dev time)
  • Usable without mirroring or hosting any datasets in this repo
  • Swappable or combinable with other sources

Candidate sources

1) Myrient (https://myrient.erista.me/files/)

  • Appears to expose static “Index of …” directory listings (good for discovery).
  • Has an FAQ that mentions third-party downloader tooling and notes that URLs can change as content is reorganized.

2) Vimm’s Vault (https://vimm.net/vault)

  • Curated and easy to browse, but automating it likely requires heavier scraping or browser automation, which may be more fragile.

Research to do

A) Myrient: how to locate files (discovery-only)

  • Map the /files/ taxonomy (e.g., No-Intro / Redump) and which directories best match the project’s platform model.
  • Verify what metadata is available directly in listings (filename, size, date) and how consistent it is across collections.
  • Identify how often URLs break/move and how to make the provider resilient (caching, refresh, retries).
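
To make the listing research concrete, here is a minimal sketch of pulling filename, size, and date out of a static index page. The row markup it expects (a table row with `link`, `size`, and `date` cells) is an assumption that still has to be checked against real pages, and `ListingEntry`, `parseListing`, `cleanCell`, and `safeDecode` are hypothetical names. It only reads listing metadata and does not fetch anything.

```typescript
// One row recovered from a directory listing (hypothetical shape).
interface ListingEntry {
  name: string;          // decoded file or directory name
  href: string;          // raw href exactly as it appears in the listing
  isDirectory: boolean;  // true when the href ends with "/"
  size: string | null;   // size text as displayed; null when empty or "-"
  date: string | null;   // date text as displayed; null when empty or "-"
}

// ASSUMPTION: every row looks like
//   <tr><td class="link"><a href="...">...</a></td>
//       <td class="size">...</td><td class="date">...</td></tr>
// Verify this against live pages before relying on it.
const ROW_PATTERN =
  /<tr>\s*<td class="link"><a href="([^"]+)"[^>]*>[^<]*<\/a><\/td>\s*<td class="size">([^<]*)<\/td>\s*<td class="date">([^<]*)<\/td>\s*<\/tr>/;

// Treat empty cells and "-" placeholders as missing values.
function cleanCell(text: string): string | null {
  const trimmed = text.trim();
  return trimmed === "" || trimmed === "-" ? null : trimmed;
}

// Percent-decode a name, falling back to the raw text if it is malformed.
function safeDecode(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

function parseListing(html: string): ListingEntry[] {
  const rowRe = new RegExp(ROW_PATTERN.source, "g");
  const entries: ListingEntry[] = [];
  let match: RegExpExecArray | null;
  while ((match = rowRe.exec(html)) !== null) {
    const href = match[1];
    // Skip the parent-directory link and any sort-order links.
    if (href === "../" || href.startsWith("?")) continue;
    const isDirectory = href.endsWith("/");
    const rawName = isDirectory ? href.slice(0, -1) : href;
    entries.push({
      name: safeDecode(rawName),
      href,
      isDirectory,
      size: cleanCell(match[2]),
      date: cleanCell(match[3]),
    });
  }
  return entries;
}
```

A regex keeps the sketch dependency-free. If the markup turns out to vary across collections, a real HTML parser would probably be the safer choice.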

Note: This project should avoid implementing “download ROMs” features. Keep this provider focused on indexing & matching only.

B) Third-party libraries / existing projects

  • Check if any maintained npm packages exist for Myrient/Vimm scraping (likely none; may need custom provider).
  • Review existing open-source tools that already parse Myrient listings (even if not npm) to copy patterns safely:
    • myrient-scrape (Python CLI)
    • “Myrient Search Engine” repo ideas (frontend/indexing)

C) Decide “best” for Jacare

Compare Myrient vs Vimm on:

  • Stability of URLs / HTML structure
  • Ease of parsing (static index pages vs dynamic pages)
  • Metadata quality and platform coverage
  • Rate-limiting/robots constraints and ethical access
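
For the robots constraint, a provider could check a path against the site's robots.txt before requesting it. The sketch below is deliberately simplified: it handles only `User-agent`, `Allow`, and `Disallow` lines with plain prefix matching (no wildcards), matches the agent name exactly, and lets the longest matching rule win. `parseRobots`, `isAllowed`, and the `jacare-indexer` agent name are hypothetical, and neither site's actual robots.txt is assumed here.

```typescript
interface RobotsRule {
  allow: boolean; // true for Allow, false for Disallow
  path: string;   // path prefix the rule applies to
}

// Parse robots.txt text into rule lists keyed by lowercase user-agent name.
function parseRobots(text: string): Map<string, RobotsRule[]> {
  const groups = new Map<string, RobotsRule[]>();
  let currentAgents: string[] = [];
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) currentAgents = [];
      const agent = value.toLowerCase();
      currentAgents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      lastWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      lastWasAgent = false;
      if (value === "") continue; // an empty value matches nothing
      for (const agent of currentAgents) {
        const rules = groups.get(agent);
        if (rules) rules.push({ allow: field === "allow", path: value });
      }
    }
    // Other fields (e.g. Sitemap) are ignored in this sketch.
  }
  return groups;
}

// Use the agent's own group if present, otherwise the "*" group.
// The longest matching prefix wins; Allow wins a tie.
function isAllowed(
  groups: Map<string, RobotsRule[]>,
  agent: string,
  path: string
): boolean {
  const rules = groups.get(agent.toLowerCase()) ?? groups.get("*") ?? [];
  let best: RobotsRule | null = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}
```

A maintained robots.txt library would likely be preferable in practice, since real files use wildcards and other fields this sketch skips.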

Proposed approach

  • Implement a SourceProvider interface:
    • listPlatforms()
    • listEntries(platformId)
    • search(query, platformId?)
    • resolve(entryId) → returns canonical metadata only
  • Ship Myrient provider first (simpler index pages), keep Vimm as “experimental”.
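
One possible shape for the interface, as a sketch only. The method names come from the list above; everything else (the `Platform`, `Entry`, and `EntryMetadata` fields, plus the `StaticProvider` and `FallbackProvider` classes) is hypothetical and not taken from the Jacare codebase. Methods return promises on the assumption that real providers will need network access.

```typescript
// Hypothetical data shapes; field names are illustrative only.
interface Platform { id: string; name: string; }
interface Entry { id: string; platformId: string; title: string; }
interface EntryMetadata extends Entry { sizeText?: string; dateText?: string; }

interface SourceProvider {
  readonly name: string;
  listPlatforms(): Promise<Platform[]>;
  listEntries(platformId: string): Promise<Entry[]>;
  search(query: string, platformId?: string): Promise<Entry[]>;
  // Returns canonical metadata only.
  resolve(entryId: string): Promise<EntryMetadata | null>;
}

// In-memory provider: a stand-in for tests or an offline default.
class StaticProvider implements SourceProvider {
  readonly name: string;
  private readonly platforms: Platform[];
  private readonly entries: EntryMetadata[];

  constructor(name: string, platforms: Platform[], entries: EntryMetadata[]) {
    this.name = name;
    this.platforms = platforms;
    this.entries = entries;
  }

  async listPlatforms(): Promise<Platform[]> {
    return this.platforms;
  }

  async listEntries(platformId: string): Promise<Entry[]> {
    return this.entries.filter((e) => e.platformId === platformId);
  }

  async search(query: string, platformId?: string): Promise<Entry[]> {
    const needle = query.toLowerCase();
    return this.entries.filter(
      (e) =>
        (platformId === undefined || e.platformId === platformId) &&
        e.title.toLowerCase().includes(needle)
    );
  }

  async resolve(entryId: string): Promise<EntryMetadata | null> {
    return this.entries.find((e) => e.id === entryId) ?? null;
  }
}

// Tries each provider in order and moves on when one throws or rejects.
class FallbackProvider implements SourceProvider {
  readonly name = "fallback";
  private readonly providers: SourceProvider[];

  constructor(providers: SourceProvider[]) {
    this.providers = providers;
  }

  private async firstSuccess<T>(
    call: (provider: SourceProvider) => Promise<T>
  ): Promise<T> {
    let lastError: unknown = new Error("no providers configured");
    for (const provider of this.providers) {
      try {
        return await call(provider);
      } catch (error) {
        lastError = error; // remember the failure, try the next provider
      }
    }
    throw lastError;
  }

  listPlatforms(): Promise<Platform[]> {
    return this.firstSuccess((p) => p.listPlatforms());
  }
  listEntries(platformId: string): Promise<Entry[]> {
    return this.firstSuccess((p) => p.listEntries(platformId));
  }
  search(query: string, platformId?: string): Promise<Entry[]> {
    return this.firstSuccess((p) => p.search(query, platformId));
  }
  resolve(entryId: string): Promise<EntryMetadata | null> {
    return this.firstSuccess((p) => p.resolve(entryId));
  }
}
```

Wrapping providers this way would let a Myrient provider sit in front of whatever default exists today, so an outage in one source degrades results instead of breaking the flow.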

Acceptance criteria

  • Can retrieve a list of entries for at least one platform/collection from Myrient.
  • Can match an entry to a local file using filename heuristics.
  • Provider is swappable and defaults/fallbacks remain intact if sources go down.
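
The filename-matching criterion could start from something as simple as the sketch below. It assumes that stripping a short extension, dropping parenthesized or bracketed tags such as "(USA)" or "[!]", and ignoring case and punctuation is enough for a first pass. `normalizeTitle`, `baseName`, and `matchLocalFile` are hypothetical names, and the heuristic can mis-handle titles that end in a dot followed by a few characters.

```typescript
// Reduce a file name to a comparable key.
function normalizeTitle(fileName: string): string {
  return fileName
    .replace(/\.[A-Za-z0-9]{1,4}$/, "")     // drop a short trailing extension
    .replace(/\([^)]*\)|\[[^\]]*\]/g, " ")  // drop "(USA)" and "[!]" style tags
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")            // collapse punctuation and spaces
    .trim();
}

// Last path segment, accepting both "/" and "\" separators.
function baseName(path: string): string {
  const parts = path.split(/[\\/]/);
  return parts[parts.length - 1];
}

// Return every remote title whose normalized form equals the local file's.
// Returning all candidates leaves ambiguity handling to the caller.
function matchLocalFile(localPath: string, remoteTitles: string[]): string[] {
  const key = normalizeTitle(baseName(localPath));
  if (key === "") return [];
  return remoteTitles.filter((title) => normalizeTitle(title) === key);
}
```

Exact matching on the normalized key is conservative by design. Fuzzy matching could be layered on later if too many local files go unmatched.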
