A GitHub Action and CLI tool that archives YouTube channels by automatically uploading an entire channel to archive.org in a few clicks.
- All you need is an Internet Archive account.
- To use this tool as a GitHub Action, jump to GitHub Action: Getting Started.
- To use this tool as a command line interface (CLI), jump to CLI: Getting Started.
GitHub Action: Getting Started
- Enable the workflows in your fork.
- Add your Archive.org credentials to the repository's Actions secrets:
  - ARCHIVE_USER_EMAIL
  - ARCHIVE_PASSWORD
- Add a list of the channels you want to archive as a CHANNELS secret to the repository's Actions secrets. The CHANNELS secret should be formatted like this example:
  CHANNEL_NAME: CHANNEL_URL
  FOO: FOO_CHANNEL_URL
  FOOBAR: FOOBAR_CHANNEL_URL
  SOME_CHANNEL: SOME_CHANNEL_URL
  Don't add any quotes around the name or the URL, and make sure to keep exactly one space between the colon and the URL.
- Add the database secret(s) to the repository's Actions secrets (see Creating A Backend Database for the two options):
  - If you picked option 1 (MongoDB), add the secret MONGODB_CONNECTION_STRING. Its value is the database connection string.
  - If you picked option 2 (JSONBin), add the secret JSONBIN_KEY. Its value is the MASTER KEY token you copied from JSONBin.
- (Optional) To use command line options other than the defaults, create a secret called CLI_OPTIONS and add the options to it. See CLI: Getting Started for a list of all the available options.
- Run the workflow under Actions manually, or wait for it to run automatically every 6 hours.
That's it!
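Before saving the CHANNELS secret, it can help to check the text locally against the format described in the steps above. The following is a minimal sketch, not the tool's own parser: the function name `parse_channels` and the example URLs are hypothetical, and the rules it enforces (no quotes, one space after the colon) are taken only from the description above.

```python
from typing import Dict


def parse_channels(text: str) -> Dict[str, str]:
    """Parse 'CHANNEL_NAME: CHANNEL_URL' lines into a {name: url} mapping.

    Hypothetical helper for local validation; it is not part of
    internetarchive-youtube.
    """
    channels: Dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if '"' in line or "'" in line:
            raise ValueError(f"line {line_number}: remove the quotes")
        # Split on the first ": " only, because the URL itself contains "://".
        name, separator, url = line.partition(": ")
        if not separator or not name or not url or url != url.strip():
            raise ValueError(
                f"line {line_number}: expected 'NAME: URL' with one space "
                f"after the colon, got {line!r}"
            )
        channels[name] = url
    return channels


# Placeholder URLs, used only for illustration.
example = (
    "FOO: https://www.youtube.com/c/foo\n"
    "FOOBAR: https://www.youtube.com/c/foobar\n"
)
channels = parse_channels(example)
```

A line such as `FOO:https://...` (no space) or `"FOO": "https://..."` (quoted) raises a `ValueError` that names the offending line.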
CLI: Getting Started
- Python>=3.7
Install the package:
pip install internetarchive-youtube
Then log in to internetarchive:
ia configure
- Create a backend database (or JSONBin) to track the overall download/upload progress.
- If you choose MongoDB, export the connection string as an environment variable:
export MONGODB_CONNECTION_STRING="mongodb://username:password@host:port"
# or add it to your shell configuration file (note the `export`, so new shells pass it on to ia-yt):
echo "export MONGODB_CONNECTION_STRING='$MONGODB_CONNECTION_STRING'" >> "$HOME/.$(basename "$SHELL")rc"
source "$HOME/.$(basename "$SHELL")rc"
- If you choose JSONBin, export the master key as an environment variable:
export JSONBIN_KEY=xxxxxxxxxxxxxxxxx
# or add it to your shell configuration file (note the `export`, so new shells pass it on to ia-yt):
echo "export JSONBIN_KEY='$JSONBIN_KEY'" >> "$HOME/.$(basename "$SHELL")rc"
source "$HOME/.$(basename "$SHELL")rc"
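MongoDB connection strings require reserved characters in the username or password (such as `@`, `:` or `/`) to be percent-encoded, which is easy to get wrong when writing the `mongodb://username:password@host:port` value by hand. The sketch below shows one way to assemble it with the standard library; `build_mongodb_uri` and the credentials are hypothetical, and the default port 27017 is MongoDB's usual default rather than anything this tool mandates.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username: str, password: str, host: str, port: int = 27017) -> str:
    """Return a mongodb:// URI with percent-encoded credentials.

    Hypothetical helper; it only formats the string and does not connect.
    """
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}"


# Made-up credentials containing characters that must be encoded.
uri = build_mongodb_uri("archiver", "p@ss:word/1", "localhost")
print(uri)
```

The printed value can then be used as the `MONGODB_CONNECTION_STRING` environment variable or secret.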
usage: ia-yt [-h] [-p PRIORITIZE] [-s SKIP_LIST] [-f] [-t TIMEOUT] [-n] [-a] [-c CHANNELS_FILE] [-S] [-C] [-m] [-T THREADS] [-k] [-i IGNORE_VIDEO_IDS]

options:
  -h, --help            show this help message and exit
  -p PRIORITIZE, --prioritize PRIORITIZE
                        Comma-separated list of channel names to prioritize when processing videos.
  -s SKIP_LIST, --skip-list SKIP_LIST
                        Comma-separated list of channel names to skip.
  -f, --force-refresh   Refresh the database after every video (Can slow down the workflow significantly, but is useful when running multiple concurrent jobs).
  -t TIMEOUT, --timeout TIMEOUT
                        Kill the job after n hours (default: 5.5).
  -n, --no-logs         Don't print any log messages.
  -a, --add-channel     Add a channel interactively to the list of channels to archive.
  -c CHANNELS_FILE, --channels-file CHANNELS_FILE
                        Path to the channels list file to use if the environment variable `CHANNELS` is not set (default: ~/.yt_channels.txt).
  -S, --show-channels   Show the list of channels in the channels file.
  -C, --create-collection
                        Creates/appends to the backend database from the channels list.
  -m, --multithreading  Enables processing multiple videos concurrently.
  -T THREADS, --threads THREADS
                        Number of threads to use when multithreading is enabled. Defaults to the optimal maximum number of workers.
  -k, --keep-failed-uploads
                        Keep the files of failed uploads on the local disk.
  -i IGNORE_VIDEO_IDS, --ignore-video-ids IGNORE_VIDEO_IDS
                        Comma-separated list or a path to a file containing a list of video ids to ignore.
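When `ia-yt` is driven from a script, the comma-separated options listed above are easier to assemble from Python lists. This is a rough sketch that uses only flags shown in the help text; the wrapper `build_ia_yt_command` and its parameter names are hypothetical and not part of the package.

```python
from typing import List, Optional, Sequence


def build_ia_yt_command(
    prioritize: Sequence[str] = (),
    skip: Sequence[str] = (),
    threads: Optional[int] = None,
    timeout_hours: Optional[float] = None,
) -> List[str]:
    """Build an argument list for the ia-yt CLI (hypothetical wrapper)."""
    command = ["ia-yt"]
    if prioritize:
        # --prioritize takes channel names joined by commas
        command += ["--prioritize", ",".join(prioritize)]
    if skip:
        # --skip-list takes channel names joined by commas
        command += ["--skip-list", ",".join(skip)]
    if threads is not None:
        # --threads only applies when --multithreading is enabled
        command += ["--multithreading", "--threads", str(threads)]
    if timeout_hours is not None:
        command += ["--timeout", str(timeout_hours)]
    return command


command = build_ia_yt_command(prioritize=["FOO", "FOOBAR"], skip=["SOME_CHANNEL"], threads=4)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once `ia-yt` is installed and the database environment variable is set.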
Creating A Backend Database
NOTICE: The JSONBin option does not work at the moment because jsonbin.io recently changed its API. Please use MongoDB until the next release.
- Option 1: MongoDB (recommended).
  - Self-hosted (see: Alyetama/quick-MongoDB or dockerhub image).
  - Free cloud database on Atlas.
- Option 2: JSONBin (if you want a quick start).
  - Sign up to JSONBin here.
  - Click on VIEW MASTER KEY, then copy the key.
- Information about the MONGODB_CONNECTION_STRING can be found here.
- Jobs can run for a maximum of 6 hours, so if you're archiving a large channel, the job might be killed before it finishes, but it will resume in a new job at the next scheduled run.
- Instead of raw text, you can pass a file path or a file URL with a list of channels formatted as CHANNEL_NAME: CHANNEL_URL. You can also pass raw text or a file of the channels in JSON format: {"CHANNEL_NAME": "CHANNEL_URL"}.
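The JSON form mentioned in the last note can be validated, and converted to the line format, with a few lines of Python. This is an illustrative sketch only: `channels_from_json` and `to_line_format` are hypothetical names, the URL is a placeholder, and the tool's own loader may behave differently.

```python
import json
from typing import Dict


def channels_from_json(raw: str) -> Dict[str, str]:
    """Load {"CHANNEL_NAME": "CHANNEL_URL"} JSON and check its shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping channel names to URLs")
    for name, url in data.items():
        if not isinstance(url, str) or not url:
            raise ValueError(f"channel {name!r}: URL must be a non-empty string")
    return data


def to_line_format(channels: Dict[str, str]) -> str:
    """Render the mapping as 'CHANNEL_NAME: CHANNEL_URL' lines."""
    return "\n".join(f"{name}: {url}" for name, url in channels.items())


# Placeholder URL, used only for illustration.
loaded = channels_from_json('{"FOO": "https://www.youtube.com/c/foo"}')
print(to_line_format(loaded))
```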