A simple, fast downloader with resume and progress. Use the standalone binary or Docker; no Haskell setup required.
- Built for flaky networks: resume via HTTP Range and per-file retries
- Clear progress: percent, file count, bytes, aggregate speed
- Simple inputs: positional URLs or `-f file.txt`
- ⚡ Concurrent downloads with configurable worker pool
- 🔄 Automatic retries
- 📊 Progress line with % complete, per-file size, and aggregate speed
- ⏸️ Resume partial downloads (HTTP Range)
- 📄 Read URLs from file (`-f file.txt`, one URL per line)
- 🎯 Type-safe CLI (optparse-applicative) and robust error reporting
- 🔗 Google Drive support: download public files and folders directly, with automatic URL conversion and preserved original filenames
- Built-in resume and retries with HTTP Range handling (shell `curl` loops often restart from scratch).
- Live progress UI (TUI or quiet summary) across multiple files, not per-process logs.
- Concurrency with backpressure via worker pool instead of ad-hoc xargs/background jobs.
- Consistent summary (success/fail/bytes/time) and exit codes for automation.
- Single static binary or Docker image; no bespoke dependencies or fragile per-URL logic.
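
For contrast, the ad-hoc shell pattern this list refers to might look like the sketch below. It fans URLs out to parallel workers, but each worker is an independent process with no shared progress bar, retry policy, or summary. This is illustrative only; `echo` stands in for a `curl -O -C -` invocation so the sketch runs offline.

```shell
# Ad-hoc fan-out that yank replaces: xargs spawns up to 4 parallel
# workers, one per URL, each with its own output and no shared state.
# (echo stands in for the real download command.)
printf '%s\n' \
  https://example.com/a.zip \
  https://example.com/b.zip |
  xargs -n1 -P4 -I{} echo "fetch {}"
```

Because the workers are independent, retries, resume, and aggregate stats all have to be scripted around this pipeline by hand.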
| Feature | Yank | wget | curl |
|---|---|---|---|
| Concurrent downloads | ✅ Built-in worker pool | ❌ Manual scripting needed | ❌ Manual scripting needed |
| Resume support | ✅ HTTP Range with `--resume` flag | ✅ `-c` flag | ✅ `-C -` flag |
| Progress tracking | ✅ Aggregate multi-file TUI | | |
| Automatic retries | ✅ Configurable per-file | ✅ (`--tries`) | ❌ Manual implementation |
| Google Drive files | ✅ Auto-detect & convert URLs | ❌ No support | ❌ No support |
| Google Drive folders | ✅ Extract & download all files | ❌ No support | ❌ No support |
| URL list from file | ✅ `-f file.txt` | ✅ `-i file.txt` | ❌ Requires xargs |
| Summary report | ✅ Success/fail/bytes/time | ❌ No aggregate stats | ❌ No aggregate stats |
| Feature | Yank | gdown |
|---|---|---|
| Google Drive files | ✅ Auto-detect share links | ✅ Supports file IDs |
| Google Drive folders | ✅ All files, no limit | ⚠️ 50-file limit |
| Folder structure | ✅ Creates named subfolders | ❌ Downloads to current dir |
| Mixed URL types | ✅ Drive + HTTP/HTTPS in one run | ❌ Drive-only |
| Concurrent downloads | ✅ Configurable workers | ❌ Sequential only |
| Resume support | ✅ HTTP Range for all sources | ❌ No resume |
| Progress tracking | ✅ Real-time TUI with speed | |
| Automatic retries | ✅ Configurable per-file | ❌ No retries |
| Multiple URLs | ✅ From CLI or file | |
| Authentication | ❌ Public links only | |
Key advantages of Yank:
- No 50-file limit for Google Drive folders (gdown limitation)
- Mixed sources: Download from Google Drive, GitHub releases, direct URLs in one command
- Production-ready: Concurrent downloads, retries, resume, and comprehensive error handling
- Folder organization: Automatically creates subfolders matching Drive folder names
- Prebuilt archives are published on GitHub Releases (tar.gz/zip per OS/arch).
- After extracting, place the binary on your PATH (e.g., `~/.local/bin` on Linux/macOS or `%USERPROFILE%\bin` on Windows).
Head over to the Releases page and grab the latest Linux archive (usually named `yank-vX.Y.Z-linux-amd64.tar.gz`).
Once you have the file on your machine or server, follow these steps to install it:
```bash
# Unzip the archive
tar -xzvf yank-vX.Y.Z-linux-amd64.tar.gz

# Make the binary executable
chmod +x yank

# Move it to your local bin path so you can run it from anywhere
sudo mv yank /usr/local/bin/
```

Verify the installation:

```bash
# Check that yank is accessible from anywhere
which yank

# Display the version to confirm it's working
yank --version
```

If `which yank` doesn't show `/usr/local/bin/yank`, ensure `/usr/local/bin` is in your `$PATH`:

```bash
echo $PATH
# Should include /usr/local/bin

# If not, add it to your shell profile (~/.bashrc, ~/.zshrc, etc.):
export PATH="/usr/local/bin:$PATH"
```

The most common use case for ML engineers is downloading a list of URLs from a text file. If you have a `datasets.txt` containing links to your CSVs or other types of files:
```text
# Publicly available direct-download CSV, ZIP and GZ files
# Last checked working in early 2026
# All links are from open government, academic or well-known open data sources
# No login / payment required

# ────────────────────────────────────────────────
# Plain .csv files (direct download, no compression)
# ────────────────────────────────────────────────
https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv
# Very small classic dataset: monthly international airline passengers 1958-1960
https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv
# Tiny example file with fake names and addresses
https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv
# Very small: office workers' height, weight, age, etc.
https://img.exim.gov/s3fs-public/dataset/vbhv-d8am/Data.Gov_-_FY25_Q3.csv
# Example export-related data snapshot from export.gov / data.gov
https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
# EPA Smart Location Database - large (hundreds of MB), good for urban / transport analysis

# ────────────────────────────────────────────────
# .zip files (archives, usually containing CSVs inside)
# ────────────────────────────────────────────────
https://edg.epa.gov/EPADataCommons/public/OA/WalkabilityIndex.zip
# EPA National Walkability Index dataset
https://galaxy-zoo-1.s3.amazonaws.com/GalaxyZoo1_DR_table5.csv.zip
# Galaxy Zoo citizen science project - classifications table

# ────────────────────────────────────────────────
# .gz / .gzip files (compressed, usually TSV or CSV inside)
# ────────────────────────────────────────────────
https://datasets.imdbws.com/title.basics.tsv.gz
# IMDb - title basics (movies, series, episodes, etc.) - very popular, ~1 GB uncompressed
https://datasets.imdbws.com/title.ratings.tsv.gz
# IMDb - user ratings and vote counts
https://datasets.imdbws.com/title.akas.tsv.gz
# IMDb - alternate titles / regions / languages
https://galaxy-zoo-1.s3.amazonaws.com/GalaxyZoo1_DR_table2.csv.gz
# Galaxy Zoo - another classifications / demographics table
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE147nnn/GSE147507/suppl/GSE147507_counts_processed_ENSEMBL.txt.gz
# NCBI GEO - single-cell RNA-seq count matrix example (COVID-19 PBMC study)

# ────────────────────────────────────────────────
# Quick test / funny / tiny files
# ────────────────────────────────────────────────
https://people.sc.fsu.edu/~jburkardt/data/csv/cities.csv
# Very small list of world cities with population & country
https://people.sc.fsu.edu/~jburkardt/data/csv/oscar_age_male.csv
# Small: ages of Best Actor Oscar winners
```

Then run:

```bash
yank -c 5 -o downloads -f datasets.txt --no-tui --retries 1
```

Example output:
```text
🚀 Yank v0.2.1 - Concurrent Downloader
📦 Downloading 15 files with concurrency 5
✅ [  6% |  1/15] 18 MB (18 MB/s) Data.Gov_-_FY25_Q3.csv
✅ [ 13% |  2/15] 321 B (18 MB/s) airtravel.csv
✅ [ 20% |  3/15] 328 B (18 MB/s) addresses.csv
✅ [ 26% |  4/15] 849 B (18 MB/s) biostats.csv
✅ [ 33% |  5/15] 3 MB (5 MB/s) GalaxyZoo1_DR_table5.csv.zip
✅ [ 40% |  6/15] 7 MB (5 MB/s) title.ratings.tsv.gz
✅ [ 46% |  7/15] 206 MB (39 MB/s) title.basics.tsv.gz
✅ [ 53% |  8/15] 19 MB (21 MB/s) GalaxyZoo1_DR_table2.csv.gz
❌ [ 60% |  9/15] 0 B (18 MB/s) GSE147507_counts_processed_ENSEMBL.txt.gz
✅ [ 66% | 10/15] 454 MB (50 MB/s) title.akas.tsv.gz
✅ [ 73% | 11/15] 8 KB (47 MB/s) cities.csv
✅ [ 80% | 12/15] 4 KB (44 MB/s) oscar_age_male.csv
❌ [ 86% | 13/15] 0 B (22 MB/s) Individual_Incident_Archive_CSV.zip
✅ [ 93% | 14/15] 192 MB (11 MB/s) EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
✅ [100% | 15/15] 405 MB (8 MB/s) WalkabilityIndex.zip

✨ Download Summary:
✅ Success: 13
❌ Failed: 2
📊 Total: 1 GB
⏱️ Time: 2m 26s

❌ Failed downloads:
  - https://dasil.sites.grinnell.edu/DataRepository/NIBRS/Individual_Incident_Archive_CSV.zip: Max retries exceeded
  - https://ftp.ncbi.nlm.nih.gov/geo/series/GSE147nnn/GSE147507/suppl/GSE147507_counts_processed_ENSEMBL.txt.gz: Max retries exceeded
```
```bash
# TUI is enabled by default; disable via --no-tui
yank -f sample-urls.txt -o downloads -c 4 --resume

# Download real files (Node.js, Git, Terraform, Python, etc.)
yank -f sample-urls.txt -o downloads -c 4 --resume -v

# Or download specific large files directly
yank \
  -o downloads \
  -c 3 \
  https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz \
  https://github.com/git/git/archive/refs/tags/v2.42.0.tar.gz \
  https://github.com/hashicorp/terraform/releases/download/v1.6.0/terraform_1.6.0_linux_amd64.zip

# Download a public Google Drive file (auto-convert link, preserve name)
yank -o downloads \
  'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing'

# Resume interrupted downloads (server must support HTTP Range)
yank -f sample-urls.txt -o downloads -c 2 --resume -v

# Verbose mode for retry logs
yank -f sample-urls.txt -o downloads --resume -c 4 -v

# Help
yank --help
```

Download public files and folders from Google Drive share links:
```bash
# Single file download - automatically extracts file ID and converts to direct download URL
yank -o downloads 'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing'

# Multiple files (mixed URLs and Google Drive links)
yank -o downloads \
  'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing' \
  'https://github.com/user/repo/archive/main.zip'

# Google Drive folder - creates subfolder with the folder's name and downloads all files
yank -o downloads 'https://drive.google.com/drive/folders/1tsBX0MU1UiXrQt3qOqb_ifthheT47AT0?usp=sharing'
# Creates: downloads/Yank-test/customers-10000.csv, downloads/Yank-test/leads-100000.csv, etc.

# Folder with concurrency and resume
yank -o downloads -c 4 --resume 'https://drive.google.com/drive/folders/1tsBX0MU1UiXrQt3qOqb_ifthheT47AT0'

# From file with Google Drive and other URLs
yank -f urls.txt -o downloads --resume -v
```

How it works:
- Yank detects Google Drive share links and automatically extracts the file/folder ID.
- For files: converts to a direct download URL (`google.com/uc?export=download&id=...`).
- For folders:
  - Fetches the folder page and extracts the folder name from the page title
  - Creates a local subfolder with the same name as the Drive folder
  - Extracts embedded file metadata and downloads each file with its original name
  - All files are saved in the named subfolder (e.g., `downloads/Yank-test/file1.csv`)
- Preserves original filenames for files and folders using Google Drive metadata; falls back to response headers when metadata is missing.
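
The file-link conversion described above can be sketched with standard tools. The regex below is illustrative, not yank's actual parser, and `FILE_ID` is the same placeholder used elsewhere in this document:

```shell
# Pull the file ID out of a /file/d/<ID>/ share URL and build the
# direct-download form that Google Drive accepts.
url='https://drive.google.com/file/d/FILE_ID/view?usp=sharing'
id=$(printf '%s' "$url" | sed -E 's#.*/file/d/([^/]+)/.*#\1#')
printf 'https://drive.google.com/uc?export=download&id=%s\n' "$id"
```

Folder links follow the same idea with a `/drive/folders/<ID>` pattern, but resolving a folder additionally requires fetching the folder page for its name and file metadata.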
Supported:
- ✅ Public Google Drive files (any share link with "Anyone with the link can view")
- ✅ Public Google Drive folders: creates a matching subfolder and downloads all files concurrently with resume support
- ✅ Original filenames and folder names preserved from Google Drive metadata
- ✅ Automatic folder structure creation (folder downloads create named subfolders)
- ❌ Private files or folders (requires authentication; not yet supported)

⚠️ Nested folders: currently downloads files only in the top-level folder (not subfolders)
Build the image:

```bash
docker build -t yank:latest .
```

Run with URLs file mounted and output directory:

```bash
mkdir -p downloads
docker run --rm \
  -v $PWD/sample-urls.txt:/urls.txt \
  -v $PWD/downloads:/downloads \
  yank:latest \
  -f /urls.txt -o /downloads -c 4 --resume --no-tui
```

Run with inline URLs:

```bash
mkdir -p downloads
docker run --rm \
  -v $PWD/downloads:/downloads \
  yank:latest \
  -o /downloads -c 3 --resume --no-tui \
  https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz \
  https://github.com/git/git/archive/refs/tags/v2.42.0.tar.gz
```

Notes:
- TUI is disabled in the examples (`--no-tui`) for cleaner container logs; enable it if you run interactively.
- Ensure mounted paths exist and are writable by the container user.
- `URLS...` (positional): list of URLs (optional if using `-f`)
- `-f, --file FILE`: read URLs from file (one per line)
- `-o, --output DIR`: output directory (default `.`)
- `-c, --concurrency N`: worker count (default 4)
- `-r, --retries N`: retries per file (default 3)
- `-v, --verbose`: verbose logging (shows retries)
- `--resume`: resume partial downloads using HTTP Range
When using `-f` or `--file`, the URL file should follow these rules:

```text
# Lines starting with # are treated as comments and ignored
# This is a comment

# One URL per line
https://example.com/file1.zip
https://example.com/file2.tar.gz

# Blank lines are ignored

# Google Drive files and folders are supported
https://drive.google.com/file/d/FILE_ID/view?usp=sharing
https://drive.google.com/drive/folders/FOLDER_ID?usp=sharing

# Comments can describe the downloads
# The following will download Node.js v20
https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz
```

Rules:
- ✅ One URL per line
- ✅ Lines starting with `#` are comments (ignored)
- ✅ Blank lines are ignored
- ✅ Leading/trailing spaces are trimmed
- ✅ Supports HTTP, HTTPS, and Google Drive URLs
- ✅ No special escaping needed

See sample-urls.txt for a complete example.
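
These parsing rules amount to a simple filter. A rough shell equivalent is sketched below (yank's actual parser is Haskell and may differ in details; `parse_urls` and the demo file are hypothetical names for illustration):

```shell
# Trim surrounding whitespace, then drop comment lines and blank lines.
# Everything that survives is treated as a URL.
parse_urls() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -vE '^#|^$'
}

# Demo: a file mixing padded URLs, a comment, and a blank line.
printf '  https://example.com/a.zip \n# a comment\n\nhttps://example.com/b.tar.gz\n' > /tmp/urls-demo.txt
parse_urls /tmp/urls-demo.txt
```

Running this prints only the two URLs, matching the comment, blank-line, and trimming rules above.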
- `--no-tui`: disable the default TUI (progress bars)
- `-q, --quiet`: quiet mode; suppress progress output (summary only)
- Batch download from a list with resume + verbose: scripts/download_list.sh
```text
🚀 Yank v0.2.1 - Concurrent Downloader
📦 Downloading 1 files with concurrency 4
✅ [100% | 1/1] 10 MB (5 MB/s) v2.42.0.tar.gz

✨ Download Summary:
✅ Success: 1
❌ Failed: 0
📊 Total: 10 MB
⏱️ Time: 2s
```
- Resume depends on the server supporting HTTP Range.
- Speed can show 0 B/s for near-instant downloads (elapsed rounds to zero).
- Retries are per-file; if exhausted, the file is marked failed.
- Google Drive folders: Currently downloads only top-level files. Nested subfolders are not recursively downloaded.
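
Since resume only works when the server honors Range requests, you can probe a server yourself before relying on `--resume`. The helper below just inspects header text, e.g. captured with `curl -sI -H 'Range: bytes=0-0' URL`; it is a sketch, not part of yank:

```shell
# A server supports resuming if it advertises "Accept-Ranges: bytes"
# or answers a Range probe with a "206 Partial Content" status line.
supports_range() {
  printf '%s\n' "$1" | grep -qiE '^(accept-ranges:[[:space:]]*bytes|HTTP/[0-9.]+ 206)'
}

# Demo with canned headers such as a Range probe might return:
headers=$'HTTP/1.1 206 Partial Content\nAccept-Ranges: bytes'
supports_range "$headers" && echo "resume possible"
```

Servers that answer the probe with `200 OK` and the full body will restart interrupted downloads from scratch.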
Track real runs in docs/benchmarks.md.
- See docs/contributing.md for full guidelines.
- Fork the repo, create a feature branch, and open a pull request.
- Keep changes focused; add tests or sample runs when relevant.
- Run a local check (binary or Docker) to validate downloads and resume behavior before submitting.
- Prebuilt binaries are published on GitHub Releases for each tagged version.
- CI automatically builds and uploads assets on tagged pushes (`vX.Y.Z`).
- Release notes: draft with docs/release-template.md.
MIT; see LICENSE.
- Usage examples: docs/
- Benchmarks and sample runs: docs/benchmarks.md, docs/sample-run.txt
- Release notes template: docs/release-template.md
- Sample list: `sample-urls.txt`
- Place `yank` on PATH (or use the Docker image).