Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON do…

Pascal 681 42 Updated Apr 20, 2024

jeffjose / tget

tget is wget for torrents

JavaScript 622 51 Updated Dec 11, 2020

machawk1 / wail

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

Roff 350 35 Updated Oct 4, 2024

internetarchive / heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Java 2,829 763 Updated Nov 7, 2024

steffenfritz / html2warc

Simple script to convert web resources to a single WARC file

Python 18 2 Updated May 11, 2023

leonmeka / sortql-cli

A file management automation tool with SQL-like syntax.

TypeScript 60 2 Updated Mar 26, 2024

Butters3388214

CLI
