Tool for downloading files from GHArchive.org, with support for recompressing them to Zstandard.

usage: download.py [-h] [-c CONCURRENCY] [-f FROM_] [-o OUTPUT] [-r] [-t TO]

options:
  -h, --help            show this help message and exit
  -c, --concurrency CONCURRENCY
                        Number of concurrent download tasks
  -f, --from FROM_      Only include files since this date
  -o, --output OUTPUT   Output path format including filename
                        [default: %Y-%m/%Y-%m-%d-%-H.json.gz]
  -r, --recompress      Recompress files using Zstandard, reducing sizes by
                        ~60% over Gzip
  -t, --to TO           Only include files up to this date

docker run --rm -v ./gharchive:/data tcrinky/gharchive-downloader \
    --recompress --from 2020-01-01 --to 2021-01-01 --concurrency 4
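
To illustrate how the `--from`, `--to`, and `--output` options could fit together, here is a minimal Python sketch that lists one GH Archive file per hour and expands the default output format into a local path. It is not taken from `download.py`: the `hourly_files` helper and the `BASE_URL` constant are hypothetical names, and treating the `--to` date as an exclusive end is an assumption, since the help text does not say whether that date is included.

```python
from datetime import datetime, timedelta

BASE_URL = "https://data.gharchive.org"  # GH Archive's public hourly dumps
DEFAULT_OUTPUT = "%Y-%m/%Y-%m-%d-%-H.json.gz"  # default from the help text


def hourly_files(start, end, output=DEFAULT_OUTPUT):
    """Yield (url, path) for each hour in [start, end).

    Hypothetical helper; the exclusive end is an assumption, not
    documented behaviour of download.py.
    """
    current = start
    while current < end:
        # GH Archive names files with an unpadded hour, e.g. 2020-01-01-5.json.gz
        name = f"{current:%Y-%m-%d}-{current.hour}.json.gz"
        # %-H is a glibc extension, so substitute the unpadded hour by hand
        # to keep the sketch portable before handing the rest to strftime.
        path = current.strftime(output.replace("%-H", str(current.hour)))
        yield f"{BASE_URL}/{name}", path
        current += timedelta(hours=1)
```

Under these assumptions, `list(hourly_files(datetime(2020, 1, 1), datetime(2020, 1, 2)))` yields 24 pairs, the first mapping `https://data.gharchive.org/2020-01-01-0.json.gz` to `2020-01/2020-01-01-0.json.gz`.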