This program backs up a Reddit user's saved posts locally. It currently focuses on saving media content; support for downloading more saved content types is planned for the near future.
- Back up your entire saved history (>1000 items) from a .csv file provided by Reddit
- Download saved text posts and media content (gifs, videos, albums) from a variety of sources
- Extract deleted media using Pushshift, cached Reddit previews, and the Wayback Machine
- Store a record of all saved and unsaved posts locally, skipping already-saved posts in subsequent program executions
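As an illustration of the Wayback Machine lookup mentioned above, here is a minimal sketch that queries the Internet Archive's public availability endpoint for an archived copy of a media URL. The function names (`build_query`, `closest_snapshot`, `find_archived_copy`) are hypothetical and are not taken from this repository; the program's own recovery logic may work differently.

```python
import json
import urllib.parse
import urllib.request

# Public availability endpoint of the Internet Archive.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(media_url):
    """Build the availability query URL for a given media URL."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": media_url})


def closest_snapshot(response):
    """Return the archived snapshot URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(media_url, timeout=10.0):
    """Ask the Wayback Machine for the closest snapshot (needs network access)."""
    with urllib.request.urlopen(build_query(media_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Keeping the response parsing in `closest_snapshot` separate from the network call makes the lookup easy to test without contacting the archive.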
- Create an app on Reddit to obtain an API client ID and secret.
  - Set the name to whatever you want, e.g. `geddit`
  - Set `script` as the API type
  - Set the redirect URI to `http://localhost:8080`
- Obtain an Imgur API key to facilitate downloading Imgur albums.
  - Set the name to `geddit`
  - Set `Anonymous usage without user authorization` as the authorization type
  - Set the email and description fields accordingly
  - Repeat these steps to obtain several more API keys to circumvent Imgur's rate limit (12500 requests per API key)
- Rename the `user_template.json` file to `user.json`, and fill in its fields with their corresponding API information.
- Request your Reddit data as a zip file.
  - Move the `saved_posts.csv` within the zip file into the root directory of the repository.
  - When starting the Docker environment or running the Python program, append `--csv` to the command.
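To show roughly what reading the exported file involves, here is a minimal sketch that collects post IDs from `saved_posts.csv`. It assumes the export has a header row with an `id` column; that column name and the helper `read_saved_ids` are assumptions for illustration, not the repository's actual code.

```python
import csv
from pathlib import Path


def read_saved_ids(csv_path):
    """Return the non-empty values of the assumed `id` column, in file order."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)  # first row is treated as the header
        return [row["id"] for row in reader if row.get("id")]
```

For example, `read_saved_ids("saved_posts.csv")` would be called from the repository root once the file has been moved there.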
- Clone the repository.
- Navigate into the cloned repository and build an image from the Dockerfile.
  `docker build --tag geddit .`
- Start a container from the built image. Replace `$(pwd)` with `%cd%` on Windows.
  Linux/macOS: `docker container run --network host -it -v $(pwd)/data:/geddit/data geddit [--csv]`
  Windows: `docker container run --network host -it -v %cd%/data:/geddit/data geddit [--csv]`
- Clone the repository into a virtual environment.
- Install the required Python packages.
  `pip install -r requirements.txt`
- Download ffmpeg for the video downloader.
- Run the program.
  `python3 -m geddit [--csv]`
- If you're experiencing problems with building the Docker image on Windows, try going into the Docker Engine settings and setting the `buildkit` option to `false`.
- Another problem may arise when running the Docker container, where the praw library throws an OAuthException. This is likely because the Reddit account has 2FA enabled; 2FA needs to be disabled for the program to work.
- Implement post and comment downloading
- Incorporate Wayback Machine API calls for deleted media
- Add scraping to deal with rare instances where the Reddit API JSON does not adhere to regular, predictable formatting
- Add multithreading
- Add progress bar with tqdm
- Add bloom filter to determine whether post is already saved, improve seek time
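For the planned bloom filter, the following is a minimal sketch of the general technique, not a committed design. The class name, bit-array size, and hash count are illustrative assumptions; it derives several bit positions per post ID from salted SHA-256 digests.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A "not in" answer is definitive, so such posts can be downloaded immediately. An "in" answer may be a false positive and would still need to be confirmed against the stored record of saved posts.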