Open
Description
Problem
crocdb is currently offline, which breaks or limits metadata lookup and discovery flows.
Goal
Add at least one alternative metadata/discovery source that is:
- Reliable enough to depend on
- Low maintenance (limited dev time)
- Doesn’t require mirroring/hosting any datasets in this repo
- Can be swapped/combined with other sources
Candidate sources
1) Myrient (https://myrient.erista.me/files/)
- Appears to expose static “Index of …” directory listings (good for discovery).
- Has an FAQ that mentions third-party downloader tooling and notes that URLs can change as content is reorganized.
2) Vimm’s Vault (https://vimm.net/vault)
- Curated and friendly to browse, but automation likely needs heavier scraping / browser automation and may be more fragile.
Research to do
A) Myrient: how to locate files (discovery-only)
- Map the /files/ taxonomy (e.g., No-Intro / Redump) and which directories best match the project’s platform model.
- Verify what metadata is available directly in listings (filename, size, date) and how consistent it is across collections.
- Identify how often URLs break/move and how to make the provider resilient (caching, refresh, retries).
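As a starting point for the listing research above, here is a minimal sketch of pulling entry names out of an "Index of …" style page. It assumes each entry is a plain `<a href="...">` link and that directory links end in `/`. The actual Myrient markup has not been verified, so `ListingEntry` and `parseListing` are hypothetical names and the pattern may need adjusting. Size and date are left out on purpose until the column layout is confirmed.

```typescript
// Hypothetical shape for one row of a directory listing.
interface ListingEntry {
  name: string;         // URL-decoded file or directory name
  href: string;         // raw href exactly as it appears in the page
  isDirectory: boolean; // true when the href ends with "/"
}

// Extract relative links from an "Index of ..." style HTML page.
// Assumption: entries are plain <a href="..."> links; real pages may differ.
function parseListing(html: string): ListingEntry[] {
  const entries: ListingEntry[] = [];
  const linkPattern = /<a\s+[^>]*href="([^"]+)"/gi;
  let match: RegExpExecArray | null;

  while ((match = linkPattern.exec(html)) !== null) {
    const href = match[1];

    // Skip sort links ("?C=N;O=D"), parent links, site-absolute paths,
    // and anything with a URL scheme (http:, mailto:, ...).
    const first = href.charAt(0);
    if (first === "?" || first === "/" || href.indexOf("../") === 0) continue;
    if (/^[a-z][a-z0-9+.-]*:/i.test(href)) continue;

    const isDirectory = href.charAt(href.length - 1) === "/";
    const rawName = isDirectory ? href.slice(0, -1) : href;

    let name = rawName;
    try {
      name = decodeURIComponent(rawName);
    } catch (err) {
      // Malformed percent-encoding: keep the raw value rather than fail.
    }

    entries.push({ name, href, isDirectory });
  }
  return entries;
}
```

The function only reads a page that has already been fetched, which keeps it in line with the indexing-only scope and makes it easy to test against saved HTML fixtures.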
Note: This project should avoid implementing “download ROMs” features. Keep this provider focused on indexing & matching only.
B) Third-party libraries / existing projects
- Check if any maintained npm packages exist for Myrient/Vimm scraping (likely none; may need custom provider).
- Review existing open-source tools that already parse Myrient listings (even if not npm) to copy patterns safely:
  - myrient-scrape (Python CLI)
  - “Myrient Search Engine” repo ideas (frontend/indexing)
C) Decide “best” for Jacare
Compare Myrient vs Vimm on:
- Stability of URLs / HTML structure
- Ease of parsing (static index pages vs dynamic pages)
- Metadata quality and platform coverage
- Rate-limiting/robots constraints and ethical access
Proposed approach
- Implement a SourceProvider interface:
  - listPlatforms()
  - listEntries(platformId)
  - search(query, platformId?)
  - resolve(entryId) → returns canonical metadata only
- Ship Myrient provider first (simpler index pages), keep Vimm as “experimental”.
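One possible shape for the proposed interface is sketched below. Only the four method names come from the proposal. The `Platform` and `EntryMetadata` fields, the in-memory `StaticProvider`, and the `FallbackProvider` wrapper are assumptions added for illustration, and the real types should follow Jacare's existing models.

```typescript
// Field names here are illustrative, not taken from the project.
interface Platform { id: string; name: string; }

interface EntryMetadata {
  id: string;
  platformId: string;
  title: string;
  filename: string;
}

interface SourceProvider {
  readonly name: string;
  listPlatforms(): Promise<Platform[]>;
  listEntries(platformId: string): Promise<EntryMetadata[]>;
  search(query: string, platformId?: string): Promise<EntryMetadata[]>;
  // Returns canonical metadata only, or null when the id is unknown.
  resolve(entryId: string): Promise<EntryMetadata | null>;
}

// In-memory provider, useful as a test double or bundled default.
class StaticProvider implements SourceProvider {
  readonly name: string;
  private platforms: Platform[];
  private entries: EntryMetadata[];

  constructor(name: string, platforms: Platform[], entries: EntryMetadata[]) {
    this.name = name;
    this.platforms = platforms;
    this.entries = entries;
  }

  async listPlatforms(): Promise<Platform[]> {
    return this.platforms;
  }

  async listEntries(platformId: string): Promise<EntryMetadata[]> {
    return this.entries.filter((e) => e.platformId === platformId);
  }

  async search(query: string, platformId?: string): Promise<EntryMetadata[]> {
    const needle = query.toLowerCase();
    return this.entries.filter(
      (e) =>
        (platformId === undefined || e.platformId === platformId) &&
        e.title.toLowerCase().indexOf(needle) !== -1
    );
  }

  async resolve(entryId: string): Promise<EntryMetadata | null> {
    for (const e of this.entries) {
      if (e.id === entryId) return e;
    }
    return null;
  }
}

// Tries each provider in order and returns the first result that does not throw.
class FallbackProvider implements SourceProvider {
  readonly name = "fallback";
  private providers: SourceProvider[];

  constructor(providers: SourceProvider[]) {
    this.providers = providers;
  }

  private async firstWorking<T>(call: (p: SourceProvider) => Promise<T>): Promise<T> {
    let lastError: unknown = new Error("no providers configured");
    for (const provider of this.providers) {
      try {
        return await call(provider);
      } catch (err) {
        lastError = err; // remember the failure and try the next source
      }
    }
    throw lastError;
  }

  listPlatforms() { return this.firstWorking((p) => p.listPlatforms()); }
  listEntries(platformId: string) { return this.firstWorking((p) => p.listEntries(platformId)); }
  search(query: string, platformId?: string) { return this.firstWorking((p) => p.search(query, platformId)); }
  resolve(entryId: string) { return this.firstWorking((p) => p.resolve(entryId)); }
}
```

Because `FallbackProvider` implements the same interface, callers do not need to know whether they are talking to one source or a chain. A Myrient provider could sit first in the list, with an experimental Vimm provider or a static default behind it.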
Acceptance criteria
- Can retrieve a list of entries for at least one platform/collection from Myrient.
- Can match an entry to a local file using filename heuristics.
- Provider is swappable and defaults/fallbacks remain intact if sources go down.
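The filename-matching criterion could start from something like the sketch below. It assumes remote names carry parenthesized or bracketed tags such as "(USA)" or "[!]", as in No-Intro style naming, and that comparing the remaining title text is good enough for a first pass. `normalizeName` and `findCandidates` are hypothetical helpers, not existing project functions.

```typescript
// Reduce a filename to a comparable key: drop the extension, drop
// "(...)" and "[...]" tags, lowercase, and collapse punctuation to spaces.
// Caveat: a trailing ".xyz" of 1-5 characters is always treated as an extension.
function normalizeName(filename: string): string {
  return filename
    .replace(/\.[A-Za-z0-9]{1,5}$/, "")     // file extension
    .replace(/\([^)]*\)|\[[^\]]*\]/g, " ")  // region, revision, and flag tags
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")            // punctuation and underscores
    .trim();
}

// Return every remote name whose normalized key equals the local file's key.
// Several regions or revisions can share a key, so this returns all
// candidates and leaves the final choice to the caller.
function findCandidates(localFilename: string, remoteNames: string[]): string[] {
  const key = normalizeName(localFilename);
  if (key === "") return [];
  return remoteNames.filter((remote) => normalizeName(remote) === key);
}
```

Returning a list rather than a single match keeps ambiguity visible. A later step could rank candidates by region preference or file size once it is clear which metadata the listings expose.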