GitHub - HosseinAbedi/Yank: Yank is a fast CLI downloader that fetches multiple files concurrently from URLs, with progress tracking, resume support, and automatic retries. Read URLs from a file or arguments and parallelize downloads efficiently.

A simple, fast downloader with resume and progress. Use the standalone binary or Docker — no Haskell setup required.

Why Yank

Built for flaky networks: resume via HTTP Range and per-file retries
Clear progress: percent, file count, bytes, aggregate speed
Simple inputs: positional URLs or -f file.txt

Highlights

⚡ Concurrent downloads with configurable worker pool
🔄 Automatic retries
📊 Progress line with % complete, per-file size, and aggregate speed
⏸️ Resume partial downloads (HTTP Range)
📂 Read URLs from file (-f file.txt, one URL per line)
🎯 Type-safe CLI (optparse-applicative) and robust error reporting
🔗 Google Drive support: Download public files and folders directly with automatic URL conversion and preserve original filenames

Why not a shell script?

Built-in resume and retries with HTTP Range handling (shell curl loops often restart from scratch).
Live progress UI (TUI or quiet summary) across multiple files, not per-process logs.
Concurrency with backpressure via worker pool instead of ad-hoc xargs/background jobs.
Consistent summary (success/fail/bytes/time) and exit codes for automation.
Single static binary or Docker image; no bespoke dependencies or fragile per-URL logic.

Comparison with Other Tools

vs. wget/curl

Feature	Yank	wget	curl
Concurrent downloads	✅ Built-in worker pool	❌ Manual scripting needed	❌ Manual scripting needed
Resume support	✅ HTTP Range with `-r` flag	✅ `-c` flag	✅ `-C -` flag
Progress tracking	✅ Aggregate multi-file TUI	⚠️ Per-file only	⚠️ Per-file only
Automatic retries	✅ Configurable per-file	⚠️ Limited (`--tries`)	❌ Manual implementation
Google Drive files	✅ Auto-detect & convert URLs	❌ No support	❌ No support
Google Drive folders	✅ Extract & download all files	❌ No support	❌ No support
URL list from file	✅ `-f file.txt`	✅ `-i file.txt`	❌ Requires xargs
Summary report	✅ Success/fail/bytes/time	❌ No aggregate stats	❌ No aggregate stats

vs. gdown

Feature	Yank	gdown
Google Drive files	✅ Auto-detect share links	✅ Supports file IDs
Google Drive folders	✅ All files, no limit	⚠️ Up to 50 files only
Folder structure	✅ Creates named subfolders	❌ Downloads to current dir
Mixed URL types	✅ Drive + HTTP/HTTPS in one run	❌ Drive-only
Concurrent downloads	✅ Configurable workers	❌ Sequential only
Resume support	✅ HTTP Range for all sources	❌ No resume
Progress tracking	✅ Real-time TUI with speed	⚠️ Basic progress bar
Automatic retries	✅ Configurable per-file	❌ No retries
Multiple URLs	✅ From CLI or file	⚠️ Single file at a time
Authentication	❌ Public links only	⚠️ Limited auth support

Key advantages of Yank:

No 50-file limit for Google Drive folders (gdown limitation)
Mixed sources: Download from Google Drive, GitHub releases, direct URLs in one command
Production-ready: Concurrent downloads, retries, resume, and comprehensive error handling
Folder organization: Automatically creates subfolders matching Drive folder names

Downloads

Prebuilt archives are published on GitHub Releases (tar.gz/zip per OS/arch).
After extracting, place the binary on your PATH (e.g., ~/.local/bin on Linux/macOS or %USERPROFILE%\bin on Windows).

Download and Install the Binary

Head over to the Releases page and grab the latest Linux archive (usually named yank-vX.Y.Z-linux-amd64.tar.gz).

Once you have the file on your machine or server, follow these steps to install it:

# Unzip the archive
tar -xzvf yank-vX.Y.Z-linux-amd64.tar.gz

# Make the binary executable
chmod +x yank

# Move it to your local bin path so you can run it from anywhere
sudo mv yank /usr/local/bin/

Verify the installation:

# Check that yank is accessible from anywhere
which yank

# Display the version to confirm it's working
yank --version

If which yank doesn't show /usr/local/bin/yank, ensure /usr/local/bin is in your $PATH:

echo $PATH
# Should include /usr/local/bin

# If not, add it to your shell profile (~/.bashrc, ~/.zshrc, etc.):
export PATH="/usr/local/bin:$PATH"

Examples

Basic Usage: Parallel Data Pulls

The most common use case for ML engineers is downloading a list of URLs from a text file. If you have a datasets.txt containing links to your CSVs or other types of files:

# Publicly available direct-download CSV, ZIP and GZ files
# Last checked functional around early 2026
# All links are from open government, academic or well-known open data sources
# No login / payment required

# ────────────────────────────────────────────────
# Plain .csv files (direct download, no compression)
# ────────────────────────────────────────────────

https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv
# Very small classic dataset: monthly international airline passengers 1958–1960

https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv
# Tiny example file with fake names and addresses

https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv
# Very small: office workers height, weight, age, etc.

https://img.exim.gov/s3fs-public/dataset/vbhv-d8am/Data.Gov_-_FY25_Q3.csv
# Example export-related data snapshot from export.gov / data.gov

https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
# EPA Smart Location Database – large (~ hundreds of MB), good for urban / transport analysis

# ────────────────────────────────────────────────
# .zip files (archives, usually containing CSVs inside)
# ────────────────────────────────────────────────

https://edg.epa.gov/EPADataCommons/public/OA/WalkabilityIndex.zip
# EPA National Walkability Index dataset

https://galaxy-zoo-1.s3.amazonaws.com/GalaxyZoo1_DR_table5.csv.zip
# Galaxy Zoo citizen science project – classifications table

# ────────────────────────────────────────────────
# .gz / .gzip files (compressed, usually TSV or CSV inside)
# ────────────────────────────────────────────────

https://datasets.imdbws.com/title.basics.tsv.gz
# IMDb – title basics (movies, series, episodes, etc.) – very popular, ~1 GB uncompressed

https://datasets.imdbws.com/title.ratings.tsv.gz
# IMDb – user ratings and vote counts

https://datasets.imdbws.com/title.akas.tsv.gz
# IMDb – alternate titles / regions / languages

https://galaxy-zoo-1.s3.amazonaws.com/GalaxyZoo1_DR_table2.csv.gz
# Galaxy Zoo – another classifications / demographics table

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE147nnn/GSE147507/suppl/GSE147507_counts_processed_ENSEMBL.txt.gz
# NCBI GEO – single-cell RNA-seq count matrix example (COVID-19 PBMC study)

# ────────────────────────────────────────────────
# Quick test / funny / tiny files
# ────────────────────────────────────────────────

https://people.sc.fsu.edu/~jburkardt/data/csv/cities.csv
# Very small list of world cities with population & country

https://people.sc.fsu.edu/~jburkardt/data/csv/oscar_age_male.csv
# Small: ages of Best Actor Oscar winners

Then run:

yank -c 5 -o downloads -f datasets.txt --no-tui --retries 1

Example output:

🚀 Yank v0.2.1 - Concurrent Downloader
📦 Downloading 15 files with concurrency 5

✅ [  6% | 1/15] 18 MB (18 MB/s) Data.Gov_-_FY25_Q3.csv
✅ [ 13% | 2/15] 321 B (18 MB/s) airtravel.csv
✅ [ 20% | 3/15] 328 B (18 MB/s) addresses.csv
✅ [ 26% | 4/15] 849 B (18 MB/s) biostats.csv
✅ [ 33% | 5/15] 3 MB (5 MB/s) GalaxyZoo1_DR_table5.csv.zip
✅ [ 40% | 6/15] 7 MB (5 MB/s) title.ratings.tsv.gz
✅ [ 46% | 7/15] 206 MB (39 MB/s) title.basics.tsv.gz
✅ [ 53% | 8/15] 19 MB (21 MB/s) GalaxyZoo1_DR_table2.csv.gz
❌ [ 60% | 9/15] 0 B (18 MB/s) GSE147507_counts_processed_ENSEMBL.txt.gz
✅ [ 66% | 10/15] 454 MB (50 MB/s) title.akas.tsv.gz
✅ [ 73% | 11/15] 8 KB (47 MB/s) cities.csv
✅ [ 80% | 12/15] 4 KB (44 MB/s) oscar_age_male.csv
❌ [ 86% | 13/15] 0 B (22 MB/s) Individual_Incident_Archive_CSV.zip
✅ [ 93% | 14/15] 192 MB (11 MB/s) EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
✅ [100% | 15/15] 405 MB (8 MB/s) WalkabilityIndex.zip

✨ Download Summary:
   ✅ Success: 13
   ❌ Failed:  2
   📊 Total:   1 GB
   ⏱️  Time:    2m 26s

❌ Failed downloads:
   - https://dasil.sites.grinnell.edu/DataRepository/NIBRS/Individual_Incident_Archive_CSV.zip: Max retries exceeded
   - https://ftp.ncbi.nlm.nih.gov/geo/series/GSE147nnn/GSE147507/suppl/GSE147507_counts_processed_ENSEMBL.txt.gz: Max retries exceeded

Standalone Binary

# TUI is enabled by default; disable via --no-tui
yank -f sample-urls.txt -o downloads -c 4 --resume

# Download real files (Node.js, Git, Terraform, Python, etc.)
yank -f sample-urls.txt -o downloads -c 4 --resume -v

# Or download specific large files directly
yank \
  -o downloads \
  -c 3 \
  https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz \
  https://github.com/git/git/archive/refs/tags/v2.42.0.tar.gz \
  https://github.com/hashicorp/terraform/releases/download/v1.6.0/terraform_1.6.0_linux_amd64.zip

# Download a public Google Drive file (auto-convert link, preserve name)
yank -o downloads \
  'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing'

# Resume interrupted downloads (server must support HTTP Range)
yank -f sample-urls.txt -o downloads -c 2 --resume -v

# Verbose mode for retry logs
yank -f sample-urls.txt -o downloads --resume -c 4 -v

# Help
yank --help

Google Drive

Download public files and folders from Google Drive share links:

# Single file download — automatically extracts file ID and converts to direct download URL
yank -o downloads 'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing'

# Multiple files (mixed URLs and Google Drive links)
yank -o downloads \
  'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing' \
  'https://github.com/user/repo/archive/main.zip'

# Google Drive folder — creates subfolder with folder's name and downloads all files
yank -o downloads 'https://drive.google.com/drive/folders/1tsBX0MU1UiXrQt3qOqb_ifthheT47AT0?usp=sharing'
# Creates: downloads/Yank-test/customers-10000.csv, downloads/Yank-test/leads-100000.csv, etc.

# Folder with concurrency and resume
yank -o downloads -c 4 --resume 'https://drive.google.com/drive/folders/1tsBX0MU1UiXrQt3qOqb_ifthheT47AT0'

# From file with Google Drive and other URLs
yank -f urls.txt -o downloads --resume -v

How it works:

Yank detects Google Drive share links and automatically extracts the file/folder ID.
For files: Converts to direct download URL (google.com/uc?export=download&id=...).
For folders:
- Fetches the folder page and extracts the folder name from the page title
- Creates a local subfolder with the same name as the Drive folder
- Extracts embedded file metadata and downloads each file with its original name
- All files are saved in the named subfolder (e.g., downloads/Yank-test/file1.csv)
Preserves original filenames for files and folders using Google Drive metadata; falls back to response headers when metadata is missing.

Supported:

✅ Public Google Drive files (any share link with "Anyone with the link can view")
✅ Public Google Drive folders — creates matching subfolder and downloads all files concurrently with resume support
✅ Original filenames and folder names preserved from Google Drive metadata
✅ Automatic folder structure creation (folder downloads create named subfolders)
❌ Private files or folders (requires authentication — not yet supported)
⚠️ Nested folders: Currently downloads files only in the top-level folder (not subfolders)

Docker

Build the image:

docker build -t yank:latest .

Run with URLs file mounted and output directory:

mkdir -p downloads
docker run --rm \
  -v $PWD/sample-urls.txt:/urls.txt \
  -v $PWD/downloads:/downloads \
  yank:latest \
  -f /urls.txt -o /downloads -c 4 --resume --no-tui

Run with inline URLs:

mkdir -p downloads
docker run --rm \
  -v $PWD/downloads:/downloads \
  yank:latest \
  -o /downloads -c 3 --resume --no-tui \
  https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz \
  https://github.com/git/git/archive/refs/tags/v2.42.0.tar.gz

Notes:

TUI is disabled in the examples (--no-tui) for cleaner container logs; enable it if you run interactively.
Ensure mounted paths exist and are writable by the container user.

CLI Flags

URLS... positional: list of URLs (optional if using -f)
-f, --file FILE : read URLs from file (one per line)
-o, --output DIR: output directory (default .)
-c, --concurrency N: worker count (default 4)
-r, --retries N : retries per file (default 3)
-v, --verbose : verbose logging (shows retries)
--resume : resume partial downloads using HTTP Range

URL File Format

When using -f or --file, the URL file should follow these rules:

# Lines starting with # are treated as comments and ignored
# This is a comment

# One URL per line
https://example.com/file1.zip
https://example.com/file2.tar.gz

# Blank lines are ignored


# Google Drive files and folders are supported
https://drive.google.com/file/d/FILE_ID/view?usp=sharing
https://drive.google.com/drive/folders/FOLDER_ID?usp=sharing

# Comments can describe the downloads
# The following will download Node.js v20
https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz

Rules:

✅ One URL per line
✅ Lines starting with # are comments (ignored)
✅ Blank lines are ignored
✅ Leading/trailing spaces are trimmed
✅ Supports HTTP, HTTPS, and Google Drive URLs
✅ No special escaping needed

See sample-urls.txt for a complete example.

--no-tui : disable the default TUI (progress bars)
-q, --quiet : quiet mode; suppress progress output (summary only)

CLI Demo

Scripts

Batch download from a list with resume + verbose: scripts/download_list.sh

Example Output

🚀 Yank v0.2.1 - Concurrent Downloader
📦 Downloading 1 files with concurrency 4

✅ [100% | 1/1] 10 MB (5 MB/s) v2.42.0.tar.gz

✨ Download Summary:
   ✅ Success: 1
   ❌ Failed:  0
   📊 Total:   10 MB
   ⏱️  Time:    2s

Known Limitations

Resume depends on the server supporting HTTP Range.
Speed can show 0 B/s for near-instant downloads (elapsed rounds to zero).
Retries are per-file; if exhausted, the file is marked failed.
Google Drive folders: Currently downloads only top-level files. Nested subfolders are not recursively downloaded.

Benchmarks

Track real runs in docs/benchmarks.md.

Contributing

See docs/contributing.md for full guidelines.
Fork the repo, create a feature branch, and open a pull request.
Keep changes focused; add tests or sample runs when relevant.
Run a local check (binary or Docker) to validate downloads and resume behavior before submitting.

Releases

Prebuilt binaries are published on GitHub Releases for each tagged version.
CI automatically builds and uploads assets on tagged pushes vX.Y.Z.
Release notes: draft with docs/release-template.md.

License

MIT; see LICENSE.

Documentation

Usage examples: docs/
Benchmarks and sample runs: docs/benchmarks.md, docs/sample-run.txt
Release notes template: docs/release-template.md

Tips

Sample list: sample-urls.txt
Place yank on PATH (or use the Docker image).

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
.github/workflows		.github/workflows
docs		docs
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
PUBLISHING.md		PUBLISHING.md
README.md		README.md
Setup.hs		Setup.hs
sample-urls.txt		sample-urls.txt
stack.yaml		stack.yaml
stack.yaml.lock		stack.yaml.lock
yank-logo.jpg		yank-logo.jpg
yank.cabal		yank.cabal

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Why Yank

Highlights

Why not a shell script?

Comparison with Other Tools

vs. wget/curl

vs. gdown

Downloads

Download and Install the Binary

Examples

Basic Usage: Parallel Data Pulls

Standalone Binary

Google Drive

Docker

CLI Flags

URL File Format

CLI Demo

Scripts

Example Output

Known Limitations

Benchmarks

Contributing

Releases

License

Documentation

Tips

About

Uh oh!

Releases 2

Packages

Languages

License

HosseinAbedi/Yank

Folders and files

Latest commit

History

Repository files navigation

Why Yank

Highlights

Why not a shell script?

Comparison with Other Tools

vs. wget/curl

vs. gdown

Downloads

Download and Install the Binary

Examples

Basic Usage: Parallel Data Pulls

Standalone Binary

Google Drive

Docker

CLI Flags

URL File Format

CLI Demo

Scripts

Example Output

Known Limitations

Benchmarks

Contributing

Releases

License

Documentation

Tips

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages