⬇️ `abx-dl` [VAPORWARE] (please make this!)

A simple all-in-one CLI tool to auto-detect and download everything available from a URL.
pip install abx-dl
abx-dl 'https://example.com/page/to/download'

Important

❈ NOT IMPLEMENTED YET. Coming someday... read the Plugin Ecosystem Announcement (2024-10).
Release ETA: after ArchiveBox v0.9.0. You should make this! Use https://deepwiki.com/archivebox/abx-pkg to set up dependencies such as yt-dlp, ffmpeg, and chrome, plus a single global event queue with one worker process/actor for each tool.
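
For anyone picking this up, the queue-plus-workers idea could look roughly like the sketch below. It is only an illustration: the extractor names, the `URL_ADDED` event, and the `run` function are hypothetical, threads stand in for real processes/actors, and the "extraction" is a placeholder that records which worker saw which URL.

```python
import queue
import threading

# Hypothetical extractor names; real plugins would wrap tools like yt-dlp or chrome.
EXTRACTORS = ("title", "screenshot", "media")

def run(urls):
    """Fan each URL out to one worker thread per extractor via a single event queue."""
    events = queue.Queue()  # the single global event queue
    inboxes = {name: queue.Queue() for name in EXTRACTORS}
    results = []
    lock = threading.Lock()

    def worker(name):
        # Exactly one worker per extractor, so each tool handles one job at a time.
        while True:
            url = inboxes[name].get()
            if url is None:  # sentinel: shut down
                break
            with lock:
                results.append((name, url))  # placeholder for the real extraction

    threads = [threading.Thread(target=worker, args=(n,)) for n in EXTRACTORS]
    for t in threads:
        t.start()

    for url in urls:
        events.put(("URL_ADDED", url))
    events.put(None)  # sentinel: no more events

    # Dispatcher: drain the global queue and route each event to every worker.
    while (event := events.get()) is not None:
        _kind, url = event
        for name in EXTRACTORS:
            inboxes[name].put(url)

    for name in EXTRACTORS:
        inboxes[name].put(None)
    for t in threads:
        t.join()
    return sorted(results)
```

Keeping one worker per tool means a slow download cannot start a second copy of the same tool, while different tools still run side by side.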

✨ Ever wish you could yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?

abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".

It's useful for scraping, downloading, OSINT, digital preservation, and more.
abx-dl is built to provide a simpler one-shot CLI interface to the ArchiveBox archiving engine (it replaces the old archivebox oneshot command).

🍜 What does it save?

abx-dl --extract=title,favicon,headers,wget,media,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'

abx-dl gets everything by default, or you can pass --extract=... to select specific methods:

HTML, JS, CSS, images, etc. rendered with a headless browser
title, favicon, headers, outlinks, and other metadata
audio, video, subtitles, playlists, comments
snapshot of the page as a PDF, screenshot, and Singlefile HTML
article text, git source code
and much more...
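
A minimal sketch of how the `--extract` value might be turned into a list of methods. The method names are copied from the example command above, and the real plugin list may differ. `parse_extract` and `KNOWN_METHODS` are hypothetical names, and rejecting unknown methods is an assumption rather than documented behavior.

```python
# Method names taken from the --extract example above; the real list may differ.
KNOWN_METHODS = (
    "title", "favicon", "headers", "wget", "media", "singlefile",
    "screenshot", "pdf", "dom", "readability", "git",
)

def parse_extract(value=None):
    """Return the methods to run; None or an empty string means everything."""
    if not value:
        return list(KNOWN_METHODS)
    methods = [m.strip() for m in value.split(",") if m.strip()]
    unknown = [m for m in methods if m not in KNOWN_METHODS]
    if unknown:
        # Assumed behavior: fail loudly instead of silently skipping a typo.
        raise ValueError(f"unknown extract method(s): {', '.join(unknown)}")
    return methods
```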

🧩 How does it work?

Forget about writing janky manual crawling scripts with JS/Python/playwright/puppeteer/bash.

abx-dl renders every URL it is given in a fully featured modern browser using puppeteer. It auto-detects a wide variety of embedded resources using plugins, and extracts discovered content out to raw files (mp4, png, txt, pdf, html, etc.) in the current working directory.

abx-dl collects all of your favorite powerful scraping and downloading tools, including: wget, wget-lua, curl, puppeteer, playwright, singlefile, readability, yt-dlp, forum-dl, and many more through the ABX Plugin Library (shared with ArchiveBox)...

You no longer have to deal with installing and configuring a bunch of tools individually.

⚙️ What options does it provide?

Pass --extract=<methods> to get only what you need, and set other config via env vars / args:

USER_AGENT, CHECK_SSL_VALIDITY, CHROME_USER_DATA_DIR/COOKIES_TXT
TIMEOUT=60, MAX_MEDIA_SIZE=750m, RESOLUTION=1440,2000, ONLY_NEW=True
and more here...

Configuration options apply seamlessly across all methods.
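
The values shown above mix several types: seconds, a size with a suffix, a width,height pair, and a boolean. The sketch below shows one way they might be read from environment variables. It assumes the listed values are defaults, that `TIMEOUT` is in seconds, and that `k`/`m`/`g` suffixes are powers of 1024; `load_config` and `parse_size` are hypothetical helpers.

```python
import os

# Assumed defaults, copied from the option list above.
DEFAULTS = {
    "TIMEOUT": "60",
    "MAX_MEDIA_SIZE": "750m",
    "RESOLUTION": "1440,2000",
    "ONLY_NEW": "True",
}

def parse_size(text):
    """Convert '750m' style sizes to bytes (assumes k/m/g are powers of 1024)."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

def load_config(env=None):
    """Read known options from env (default: os.environ) and convert their types."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    width, height = (int(n) for n in raw["RESOLUTION"].split(","))
    return {
        "TIMEOUT": int(raw["TIMEOUT"]),
        "MAX_MEDIA_SIZE": parse_size(raw["MAX_MEDIA_SIZE"]),
        "RESOLUTION": (width, height),
        "ONLY_NEW": raw["ONLY_NEW"].strip().lower() in ("1", "true", "yes"),
    }
```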

📦 Install `Coming Soon...`

pip install abx-dl
abx-dl install           # optional: install any system packages needed

🔠 Usage

# Basic usage:
abx-dl [--help|--version] [--config|-c] [--extract=methods] [url]
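
Since nothing is implemented yet, here is a rough `argparse` sketch that mirrors the usage line above. It is a starting point under assumptions, not the real CLI: the version string is a placeholder, `-c` is assumed to take repeatable `KEY=VALUE` pairs, and the `install` and `config` subcommands are left out.

```python
import argparse

def build_parser():
    """Build a parser shaped like: abx-dl [--help|--version] [-c KEY=VALUE] [--extract=methods] [url]"""
    parser = argparse.ArgumentParser(prog="abx-dl")
    parser.add_argument("--version", action="version", version="abx-dl 0.0.0 (placeholder)")
    # Assumption: -c/--config may be repeated, one KEY=VALUE pair per use.
    parser.add_argument("-c", "--config", action="append", default=[], metavar="KEY=VALUE")
    # An empty string means "extract everything".
    parser.add_argument("--extract", default="", metavar="METHODS")
    parser.add_argument("url", nargs="?")
    return parser

args = build_parser().parse_args(
    ["-c", "MAX_MEDIA_SIZE=250m", "--extract=title,screenshot", "https://example.com"]
)
```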

Download everything

abx-dl 'https://example.com'
ls ./
# <see All Outputs below>

Download just title + screenshot

abx-dl --extract=title,screenshot 'https://example.com'
ls ./
# index.json  title.txt  screenshot.png

Download title + screenshot + html + media

abx-dl --extract=title,favicon,screenshot,singlefile,media 'https://example.com'
ls ./
# index.json  index.html  title.txt  favicon.ico  screenshot.png  singlefile.html  media/Some_video.mp4

Pass config options

Config can be persisted via file, set via env vars, or passed via CLI args.

# set per-user config in ~/.config/abx-dl/abx-dl.conf
abx-dl config --set CHECK_SSL_VALIDITY=True

# environment variables work too and are equivalent
export CHROME_USER_DATA_DIR=~/.config/abx-dl/personas/Default/chrome_profile

# pass per-run config as CLI args
abx-dl -c MAX_MEDIA_SIZE=250m --extract=title,singlefile,screenshot,media 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
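
The three config sources have to be merged somehow. The sketch below assumes that CLI args override env vars, which override the config file, which overrides built-in defaults. That order is a guess and is not stated anywhere in this README. `resolve_config` is a hypothetical helper built on the standard library's `ChainMap`.

```python
from collections import ChainMap

def resolve_config(cli_args, env, file_config, defaults):
    """Layer the config sources; earlier maps win on lookup (assumed order)."""
    return ChainMap(cli_args, env, file_config, defaults)

config = resolve_config(
    cli_args={"MAX_MEDIA_SIZE": "250m"},
    env={"CHROME_USER_DATA_DIR": "~/.config/abx-dl/personas/Default/chrome_profile"},
    file_config={"CHECK_SSL_VALIDITY": "True", "MAX_MEDIA_SIZE": "500m"},
    defaults={"MAX_MEDIA_SIZE": "750m", "TIMEOUT": "60"},
)
```

Here `config["MAX_MEDIA_SIZE"]` resolves to the CLI value, while `CHECK_SSL_VALIDITY` falls through to the file and `TIMEOUT` to the defaults.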

All Outputs

index.json, index.html
title.txt, title.json, headers.json, favicon.ico
example.com/*.{html,css,js,png...}, warc/ (saved with wget-lua)
screenshot.png, dom.html, output.pdf (rendered with chrome)
media/someVideo.mp4, media/subtitles, ... (downloaded with yt-dlp)
readability/, mercury/, htmltotext.txt (article text/markdown)
git/ (source code)
... and more via plugin library ...

For more advanced use with collections, parallel downloading, a Web UI + REST API, etc.
See: ArchiveBox/ArchiveBox
