Skip to content

Yank is a fast CLI downloader that fetches multiple files concurrently from URLs, with progress tracking, resume support, and automatic retries. Read URLs from a file or arguments and parallelize downloads efficiently.

License

Notifications You must be signed in to change notification settings

HosseinAbedi/Yank

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

39 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Yank Logo


A simple, fast downloader with resume and progress. Use the standalone binary or Docker β€” no Haskell setup required.

Why Yank

  • Built for flaky networks: resume via HTTP Range and per-file retries
  • Clear progress: percent, file count, bytes, aggregate speed
  • Simple inputs: positional URLs or -f file.txt

Highlights

  • ⚑ Concurrent downloads with configurable worker pool
  • πŸ”„ Automatic retries
  • πŸ“Š Progress line with % complete, per-file size, and aggregate speed
  • ⏸️ Resume partial downloads (HTTP Range)
  • πŸ“‚ Read URLs from file (-f file.txt, one URL per line)
  • 🎯 Type-safe CLI (optparse-applicative) and robust error reporting
  • πŸ”— Google Drive support: Download public files and folders directly with automatic URL conversion and preserve original filenames

Why not a shell script?

  • Built-in resume and retries with HTTP Range handling (shell curl loops often restart from scratch).
  • Live progress UI (TUI or quiet summary) across multiple files, not per-process logs.
  • Concurrency with backpressure via worker pool instead of ad-hoc xargs/background jobs.
  • Consistent summary (success/fail/bytes/time) and exit codes for automation.
  • Single static binary or Docker image; no bespoke dependencies or fragile per-URL logic.

Comparison with Other Tools

vs. wget/curl

Feature Yank wget curl
Concurrent downloads βœ… Built-in worker pool ❌ Manual scripting needed ❌ Manual scripting needed
Resume support βœ… HTTP Range with -r flag βœ… -c flag βœ… -C - flag
Progress tracking βœ… Aggregate multi-file TUI ⚠️ Per-file only ⚠️ Per-file only
Automatic retries βœ… Configurable per-file ⚠️ Limited (--tries) ❌ Manual implementation
Google Drive files βœ… Auto-detect & convert URLs ❌ No support ❌ No support
Google Drive folders βœ… Extract & download all files ❌ No support ❌ No support
URL list from file βœ… -f file.txt βœ… -i file.txt ❌ Requires xargs
Summary report βœ… Success/fail/bytes/time ❌ No aggregate stats ❌ No aggregate stats

vs. gdown

Feature Yank gdown
Google Drive files βœ… Auto-detect share links βœ… Supports file IDs
Google Drive folders βœ… All files, no limit ⚠️ Up to 50 files only
Folder structure βœ… Creates named subfolders ❌ Downloads to current dir
Mixed URL types βœ… Drive + HTTP/HTTPS in one run ❌ Drive-only
Concurrent downloads βœ… Configurable workers ❌ Sequential only
Resume support βœ… HTTP Range for all sources ❌ No resume
Progress tracking βœ… Real-time TUI with speed ⚠️ Basic progress bar
Automatic retries βœ… Configurable per-file ❌ No retries
Multiple URLs βœ… From CLI or file ⚠️ Single file at a time
Authentication ❌ Public links only ⚠️ Limited auth support

Key advantages of Yank:

  • No 50-file limit for Google Drive folders (gdown limitation)
  • Mixed sources: Download from Google Drive, GitHub releases, direct URLs in one command
  • Production-ready: Concurrent downloads, retries, resume, and comprehensive error handling
  • Folder organization: Automatically creates subfolders matching Drive folder names

Downloads

  • Prebuilt archives are published on GitHub Releases (tar.gz/zip per OS/arch).
  • After extracting, place the binary on your PATH (e.g., ~/.local/bin on Linux/macOS or %USERPROFILE%\bin on Windows).

Download and Install the Binary

Head over to the Releases page and grab the latest Linux archive (usually named yank-vX.Y.Z-linux-amd64.tar.gz).

Once you have the file on your machine or server, follow these steps to install it:

# Unzip the archive
tar -xzvf yank-vX.Y.Z-linux-amd64.tar.gz

# Make the binary executable
chmod +x yank

# Move it to your local bin path so you can run it from anywhere
sudo mv yank /usr/local/bin/

Verify the installation:

# Check that yank is accessible from anywhere
which yank

# Display the version to confirm it's working
yank --version

If which yank doesn't show /usr/local/bin/yank, ensure /usr/local/bin is in your $PATH:

echo $PATH
# Should include /usr/local/bin

# If not, add it to your shell profile (~/.bashrc, ~/.zshrc, etc.):
export PATH="/usr/local/bin:$PATH"

Examples

Basic Usage: Parallel Data Pulls

The most common use case for ML engineers is downloading a list of URLs from a text file. If you have a datasets.txt containing links to your CSVs or other types of files:

# Publicly available direct-download CSV, ZIP and GZ files
# Last checked functional around early 2026
# All links are from open government, academic or well-known open data sources
# No login / payment required

# ────────────────────────────────────────────────
# Plain .csv files (direct download, no compression)
# ────────────────────────────────────────────────

https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv
# Very small classic dataset: monthly international airline passengers 1958–1960

https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv
# Tiny example file with fake names and addresses

https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv
# Very small: office workers height, weight, age, etc.

https://img.exim.gov/s3fs-public/dataset/vbhv-d8am/Data.Gov_-_FY25_Q3.csv
# Example export-related data snapshot from export.gov / data.gov

https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
# EPA Smart Location Database – large (~ hundreds of MB), good for urban / transport analysis

# ────────────────────────────────────────────────
# .zip files (archives, usually containing CSVs inside)
# ────────────────────────────────────────────────

https://edg.epa.gov/EPADataCommons/public/OA/WalkabilityIndex.zip
# EPA National Walkability Index dataset

https://galaxy-zoo-1.s3.amazonaws.com/GalaxyZoo1_DR_table5.csv.zip
# Galaxy Zoo citizen science project – classifications table

# ────────────────────────────────────────────────
# .gz / .gzip files (compressed, usually TSV or CSV inside)
# ────────────────────────────────────────────────

https://datasets.imdbws.com/title.basics.tsv.gz
# IMDb – title basics (movies, series, episodes, etc.) – very popular, ~1 GB uncompressed

https://datasets.imdbws.com/title.ratings.tsv.gz
# IMDb – user ratings and vote counts

https://datasets.imdbws.com/title.akas.tsv.gz
# IMDb – alternate titles / regions / languages

https://galaxy-zoo-1.s3.amazonaws.com/GalaxyZoo1_DR_table2.csv.gz
# Galaxy Zoo – another classifications / demographics table

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE147nnn/GSE147507/suppl/GSE147507_counts_processed_ENSEMBL.txt.gz
# NCBI GEO – single-cell RNA-seq count matrix example (COVID-19 PBMC study)

# ────────────────────────────────────────────────
# Quick test / funny / tiny files
# ────────────────────────────────────────────────

https://people.sc.fsu.edu/~jburkardt/data/csv/cities.csv
# Very small list of world cities with population & country

https://people.sc.fsu.edu/~jburkardt/data/csv/oscar_age_male.csv
# Small: ages of Best Actor Oscar winners

Then run:

yank -c 5 -o downloads -f datasets.txt --no-tui --retries 1

Example output:

πŸš€ Yank v0.2.1 - Concurrent Downloader
πŸ“¦ Downloading 15 files with concurrency 5

βœ… [  6% | 1/15] 18 MB (18 MB/s) Data.Gov_-_FY25_Q3.csv
βœ… [ 13% | 2/15] 321 B (18 MB/s) airtravel.csv
βœ… [ 20% | 3/15] 328 B (18 MB/s) addresses.csv
βœ… [ 26% | 4/15] 849 B (18 MB/s) biostats.csv
βœ… [ 33% | 5/15] 3 MB (5 MB/s) GalaxyZoo1_DR_table5.csv.zip
βœ… [ 40% | 6/15] 7 MB (5 MB/s) title.ratings.tsv.gz
βœ… [ 46% | 7/15] 206 MB (39 MB/s) title.basics.tsv.gz
βœ… [ 53% | 8/15] 19 MB (21 MB/s) GalaxyZoo1_DR_table2.csv.gz
❌ [ 60% | 9/15] 0 B (18 MB/s) GSE147507_counts_processed_ENSEMBL.txt.gz
βœ… [ 66% | 10/15] 454 MB (50 MB/s) title.akas.tsv.gz
βœ… [ 73% | 11/15] 8 KB (47 MB/s) cities.csv
βœ… [ 80% | 12/15] 4 KB (44 MB/s) oscar_age_male.csv
❌ [ 86% | 13/15] 0 B (22 MB/s) Individual_Incident_Archive_CSV.zip
βœ… [ 93% | 14/15] 192 MB (11 MB/s) EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
βœ… [100% | 15/15] 405 MB (8 MB/s) WalkabilityIndex.zip

✨ Download Summary:
   βœ… Success: 13
   ❌ Failed:  2
   πŸ“Š Total:   1 GB
   ⏱️  Time:    2m 26s

❌ Failed downloads:
   - https://dasil.sites.grinnell.edu/DataRepository/NIBRS/Individual_Incident_Archive_CSV.zip: Max retries exceeded
   - https://ftp.ncbi.nlm.nih.gov/geo/series/GSE147nnn/GSE147507/suppl/GSE147507_counts_processed_ENSEMBL.txt.gz: Max retries exceeded

Standalone Binary

# TUI is enabled by default; disable via --no-tui
yank -f sample-urls.txt -o downloads -c 4 --resume

# Download real files (Node.js, Git, Terraform, Python, etc.)
yank -f sample-urls.txt -o downloads -c 4 --resume -v

# Or download specific large files directly
yank \
  -o downloads \
  -c 3 \
  https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz \
  https://github.com/git/git/archive/refs/tags/v2.42.0.tar.gz \
  https://github.com/hashicorp/terraform/releases/download/v1.6.0/terraform_1.6.0_linux_amd64.zip

# Download a public Google Drive file (auto-convert link, preserve name)
yank -o downloads \
  'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing'

# Resume interrupted downloads (server must support HTTP Range)
yank -f sample-urls.txt -o downloads -c 2 --resume -v

# Verbose mode for retry logs
yank -f sample-urls.txt -o downloads --resume -c 4 -v

# Help
yank --help

Google Drive

Download public files and folders from Google Drive share links:

# Single file download β€” automatically extracts file ID and converts to direct download URL
yank -o downloads 'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing'

# Multiple files (mixed URLs and Google Drive links)
yank -o downloads \
  'https://drive.google.com/file/d/1NPM60Ifw7Nsh-zadY-2HYphCzdcc2o4J/view?usp=sharing' \
  'https://github.com/user/repo/archive/main.zip'

# Google Drive folder β€” creates subfolder with folder's name and downloads all files
yank -o downloads 'https://drive.google.com/drive/folders/1tsBX0MU1UiXrQt3qOqb_ifthheT47AT0?usp=sharing'
# Creates: downloads/Yank-test/customers-10000.csv, downloads/Yank-test/leads-100000.csv, etc.

# Folder with concurrency and resume
yank -o downloads -c 4 --resume 'https://drive.google.com/drive/folders/1tsBX0MU1UiXrQt3qOqb_ifthheT47AT0'

# From file with Google Drive and other URLs
yank -f urls.txt -o downloads --resume -v

How it works:

  • Yank detects Google Drive share links and automatically extracts the file/folder ID.
  • For files: Converts to direct download URL (google.com/uc?export=download&id=...).
  • For folders:
    • Fetches the folder page and extracts the folder name from the page title
    • Creates a local subfolder with the same name as the Drive folder
    • Extracts embedded file metadata and downloads each file with its original name
    • All files are saved in the named subfolder (e.g., downloads/Yank-test/file1.csv)
  • Preserves original filenames for files and folders using Google Drive metadata; falls back to response headers when metadata is missing.

Supported:

  • βœ… Public Google Drive files (any share link with "Anyone with the link can view")
  • βœ… Public Google Drive folders β€” creates matching subfolder and downloads all files concurrently with resume support
  • βœ… Original filenames and folder names preserved from Google Drive metadata
  • βœ… Automatic folder structure creation (folder downloads create named subfolders)
  • ❌ Private files or folders (requires authentication β€” not yet supported)
  • ⚠️ Nested folders: Currently downloads files only in the top-level folder (not subfolders)

Docker

Build the image:

docker build -t yank:latest .

Run with URLs file mounted and output directory:

mkdir -p downloads
docker run --rm \
  -v $PWD/sample-urls.txt:/urls.txt \
  -v $PWD/downloads:/downloads \
  yank:latest \
  -f /urls.txt -o /downloads -c 4 --resume --no-tui

Run with inline URLs:

mkdir -p downloads
docker run --rm \
  -v $PWD/downloads:/downloads \
  yank:latest \
  -o /downloads -c 3 --resume --no-tui \
  https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz \
  https://github.com/git/git/archive/refs/tags/v2.42.0.tar.gz

Notes:

  • TUI is disabled in the examples (--no-tui) for cleaner container logs; enable it if you run interactively.
  • Ensure mounted paths exist and are writable by the container user.

CLI Flags

  • URLS... positional: list of URLs (optional if using -f)
  • -f, --file FILE : read URLs from file (one per line)
  • -o, --output DIR: output directory (default .)
  • -c, --concurrency N: worker count (default 4)
  • -r, --retries N : retries per file (default 3)
  • -v, --verbose : verbose logging (shows retries)
  • --resume : resume partial downloads using HTTP Range

URL File Format

When using -f or --file, the URL file should follow these rules:

# Lines starting with # are treated as comments and ignored
# This is a comment

# One URL per line
https://example.com/file1.zip
https://example.com/file2.tar.gz

# Blank lines are ignored


# Google Drive files and folders are supported
https://drive.google.com/file/d/FILE_ID/view?usp=sharing
https://drive.google.com/drive/folders/FOLDER_ID?usp=sharing

# Comments can describe the downloads
# The following will download Node.js v20
https://nodejs.org/dist/v20.10.0/node-v20.10.0-linux-x64.tar.gz

Rules:

  • βœ… One URL per line
  • βœ… Lines starting with # are comments (ignored)
  • βœ… Blank lines are ignored
  • βœ… Leading/trailing spaces are trimmed
  • βœ… Supports HTTP, HTTPS, and Google Drive URLs
  • βœ… No special escaping needed

See sample-urls.txt for a complete example.

  • --no-tui : disable the default TUI (progress bars)
  • -q, --quiet : quiet mode; suppress progress output (summary only)

CLI Demo

asciicast

Scripts

Example Output

πŸš€ Yank v0.2.1 - Concurrent Downloader
πŸ“¦ Downloading 1 files with concurrency 4

βœ… [100% | 1/1] 10 MB (5 MB/s) v2.42.0.tar.gz

✨ Download Summary:
   βœ… Success: 1
   ❌ Failed:  0
   πŸ“Š Total:   10 MB
   ⏱️  Time:    2s

Known Limitations

  • Resume depends on the server supporting HTTP Range.
  • Speed can show 0 B/s for near-instant downloads (elapsed rounds to zero).
  • Retries are per-file; if exhausted, the file is marked failed.
  • Google Drive folders: Currently downloads only top-level files. Nested subfolders are not recursively downloaded.

Benchmarks

Track real runs in docs/benchmarks.md.

Contributing

  • See docs/contributing.md for full guidelines.
  • Fork the repo, create a feature branch, and open a pull request.
  • Keep changes focused; add tests or sample runs when relevant.
  • Run a local check (binary or Docker) to validate downloads and resume behavior before submitting.

Releases

  • Prebuilt binaries are published on GitHub Releases for each tagged version.
  • CI automatically builds and uploads assets on tagged pushes vX.Y.Z.
  • Release notes: draft with docs/release-template.md.

License

MIT; see LICENSE.

Documentation

Tips

  • Sample list: sample-urls.txt
  • Place yank on PATH (or use the Docker image).

About

Yank is a fast CLI downloader that fetches multiple files concurrently from URLs, with progress tracking, resume support, and automatic retries. Read URLs from a file or arguments and parallelize downloads efficiently.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published