The Central Intelligence Agency has released a fascinating historical record by declassifying (most of) the daily presidential briefings that it delivered during the Kennedy, Johnson, Nixon, and Ford administrations.
The CIA has published these briefings as a collection of several thousand individual PDF files. This repo provides code for downloading these files in bulk and collating them into easier-to-handle monthly collections.
View and download monthly collated PDFs right here on GitHub, or just download the zipfile of the entire repo. You'll find the PDFs in docs/pdfs.
- Requirements
  - Python 2.7 or Python 3.x
  - TQDM for friendly progress bars: pip install tqdm
- The scripts anticipate that they'll be run in a directory with subdirectories called documents and documents/originals. If you clone this repo, they should be there, but if not you'll need to create them.
- Run ./scrape_documents.py to download all individual PDFs to documents/originals. On a reasonably fast Internet connection this will take 10-20 minutes.
- Run ./merge_documents.py to merge the original PDFs into monthly collections in the documents directory.
- If you want to zip the collated documents into annual tarballs, run ./zip_briefings.sh.
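If the documents and documents/originals subdirectories are missing (for example, because you copied the scripts without cloning the repo), something like the following could create them. This is only a minimal sketch and not part of the repo: the function name ensure_briefing_dirs and its base argument are hypothetical, and it avoids Python 3-only options so that it also runs on Python 2.7.

```python
import os


def ensure_briefing_dirs(base="."):
    """Create documents/ and documents/originals/ under `base` if absent.

    Hypothetical helper, not part of the repo. Returns the list of
    directories that were actually created.
    """
    created = []
    for sub in ("documents", os.path.join("documents", "originals")):
        path = os.path.join(base, sub)
        # os.makedirs(..., exist_ok=True) is Python 3 only, so check first
        # to stay compatible with Python 2.7.
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created
```

Calling ensure_briefing_dirs() from the directory that holds the scripts, before running ./scrape_documents.py, should leave the layout the scripts expect. Calling it again does nothing, since existing directories are skipped.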