archiver

Content curation, scraping, preservation and Codex integration

Features

Attempts to be a "good citizen" while scraping (respects robots.txt, etc.)
Downloads content from Archive.org via the Archive.org APIs (https://archive.org/developers/index.html)
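
As a rough illustration of the Archive.org side, the sketch below builds the two URLs most downloads need: the Metadata API record for an item and the direct download link for one of its files. The function names are hypothetical and not taken from this project's source, and the sketch assumes file names that need no percent-encoding.

```rust
/// Base URL of the public Archive.org site.
const ARCHIVE_ORG: &str = "https://archive.org";

/// URL of the Metadata API record for an item identifier.
/// (Hypothetical helper name, for illustration only.)
fn metadata_url(identifier: &str) -> String {
    format!("{}/metadata/{}", ARCHIVE_ORG, identifier)
}

/// URL for downloading a single file belonging to an item.
/// Assumption: `file_name` contains no characters that need percent-encoding.
fn download_url(identifier: &str, file_name: &str) -> String {
    format!("{}/download/{}/{}", ARCHIVE_ORG, identifier, file_name)
}
```

A real client would fetch the metadata record first, read the file list from it, and then request each file, pausing between requests to stay polite.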

Organises data first into collections, then into items as subfolders of those collections
Also archives metadata about each collection and item
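
The collection/item layout described above could be sketched as follows. This is a minimal sketch, assuming one folder per collection with one subfolder per item; the `metadata.json` file name and all function names are hypothetical, since the README does not say how metadata is stored.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical metadata file name; the project's real name may differ.
const METADATA_FILE: &str = "metadata.json";

/// Folder holding one collection, directly under the archive root.
fn collection_dir(root: &Path, collection: &str) -> PathBuf {
    root.join(collection)
}

/// Folder holding one item, as a subfolder of its collection.
fn item_dir(root: &Path, collection: &str, item: &str) -> PathBuf {
    collection_dir(root, collection).join(item)
}

/// Assumed location of the collection-level metadata file.
fn collection_metadata_path(root: &Path, collection: &str) -> PathBuf {
    collection_dir(root, collection).join(METADATA_FILE)
}

/// Assumed location of the item-level metadata file.
fn item_metadata_path(root: &Path, collection: &str, item: &str) -> PathBuf {
    item_dir(root, collection, item).join(METADATA_FILE)
}
```

Keeping the path logic in small pure functions like these makes the layout easy to test without touching the file system.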

Uses the Codex APIs to upload content to Codex nodes, and keeps track of which CIDs have which content
Keeps track of which collections and items have been uploaded to Codex
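
One way to picture the upload tracking is an in-memory index from items to the CIDs returned after upload. The sketch below is an assumption about the shape of that bookkeeping, not the project's actual data model: `ItemKey`, `UploadIndex`, and their methods are hypothetical names, and no Codex API call is made here.

```rust
use std::collections::HashMap;

/// Identifies an item within a collection (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ItemKey {
    collection: String,
    item: String,
}

impl ItemKey {
    fn new(collection: &str, item: &str) -> Self {
        ItemKey { collection: collection.to_string(), item: item.to_string() }
    }
}

/// Maps each uploaded item to the CID it was stored under (hypothetical type).
#[derive(Debug, Default)]
struct UploadIndex {
    cids: HashMap<ItemKey, String>,
}

impl UploadIndex {
    /// Record the CID returned for an item after a successful upload.
    fn record(&mut self, collection: &str, item: &str, cid: &str) {
        self.cids.insert(ItemKey::new(collection, item), cid.to_string());
    }

    /// Look up the CID recorded for an item, if any.
    fn cid_for(&self, collection: &str, item: &str) -> Option<&str> {
        self.cids.get(&ItemKey::new(collection, item)).map(String::as_str)
    }

    /// True if a CID has been recorded for this item.
    fn is_item_uploaded(&self, collection: &str, item: &str) -> bool {
        self.cid_for(collection, item).is_some()
    }

    /// True if `items` is non-empty and every listed item has a recorded CID.
    fn is_collection_uploaded(&self, collection: &str, items: &[&str]) -> bool {
        !items.is_empty() && items.iter().all(|item| self.is_item_uploaded(collection, item))
    }
}
```

A persistent version would serialise this map to disk alongside the archived metadata so that interrupted uploads can resume without re-sending items that already have a CID.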
