Wayback Machine API interface & a command-line tool
Updated Feb 26, 2024 - Python
WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
A dockerized, queued, high-fidelity web archiver based on Squidwarc
metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) archives
Decentralized web archiving
Seeder - Czech webarchive curating tool and public site
A tool for detecting viruses and NSFW material in WARC files
A link extractor and archiving tool that uses archive.ph as its archiving service; useful for simple, bare-bones sites.
https://bl.ocks.org/PaladhiDinesh/raw/56e1843c31960ecfe919/ The assignments mainly involve crawling data from websites, web archiving, analyzing the data, and writing reports using Python, LaTeX, and R. Includes studies of the Web's properties, protocols, algorithms, and societal effects.
An archiving utility with an interface for web servers.