Retrieving S3 bulk download files

Pat Phongsvirajati edited this page Feb 1, 2018 · 4 revisions

Bulk data downloads

The Commission's FTP server has moved to a new location: https://www.fec.gov/files/bulk-downloads/index.html. The new site uses an S3 bucket to store the bulk data files for the public to access.

Downloading large files from the browser

Currently, bulk data files that are large enough to take longer than 30 seconds to download are failing due to proxy timeouts. A workaround is to hit the S3 bucket directly to download the file, for example: https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2016/indiv16.zip. This has been noted in this bug issue, where we will try to address the problem.
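As a sketch, a direct-download URL like the one above can be assembled from the bucket hostname, the year directory, and the file name. The helper below (`bulk_url` is a hypothetical name) assumes the `indivYY.zip` naming shown for 2016 carries over to other cycles, which you should verify against the bucket listing before relying on it.

```shell
# Hypothetical helper: build the direct S3 URL for a cycle's individual
# contributions file. Assumes the indivYY.zip naming pattern seen for 2016.
bulk_url() {
  year="$1"                 # four-digit year, e.g. 2016
  suffix="${year#??}"       # strip the first two digits, leaving e.g. 16
  echo "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/${year}/indiv${suffix}.zip"
}

# Print the URL; pass it to curl -O or wget to actually download.
bulk_url 2016
```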

Tool for downloading bulk data

To download the files from the command line, there is a tool called the AWS CLI that you should be able to install on your machine. Documentation on how to install it is here: http://docs.aws.amazon.com/cli/latest/userguide/installing.html. This tool allows you to run copy commands from our S3 bucket to your local machine.

How to download using AWS CLI

Once you have the AWS CLI installed, run the commands below. They recursively copy S3 objects to a local directory on your machine. More documentation about the copy command can be found here: http://docs.aws.amazon.com/cli/latest/reference/s3/cp.html

aws --region us-gov-west-1 s3 cp --recursive s3://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74/bulk-downloads/[directory_to_download] --no-sign-request [local_dir_to_copy_files]
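To make the placeholders concrete, here is a sketch that assembles the command for one directory. The `DIR` and `DEST` values are example choices, not part of the page; the script only echoes the command so you can inspect it before running it.

```shell
# Sketch: assemble the aws s3 cp command for a chosen bulk-downloads
# subdirectory. DIR and DEST below are example values you would change.
BUCKET="cg-519a459a-0ea3-42c2-b7bc-fa1143481f74"
DIR="2016"                      # subdirectory under bulk-downloads/
DEST="./bulk-downloads-2016"    # local directory to copy files into
CMD="aws --region us-gov-west-1 s3 cp --recursive s3://${BUCKET}/bulk-downloads/${DIR} --no-sign-request ${DEST}"

# Print the command; run it yourself once the AWS CLI is installed:
echo "$CMD"
```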

You can also add parameters to exclude or include specific file names or types.

For example, the command below downloads only files from the electronic folder whose names begin with "201710", which is essentially all October 2017 filings.

aws --region us-gov-west-1 s3 cp --recursive s3://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74/bulk-downloads/electronic --exclude "*" --include "201710*" --no-sign-request [local_dir_to_copy_files]
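The filters apply in order, with later filters winning: `--exclude "*"` first drops every object, then `--include "201710*"` re-adds the matches. The sketch below mimics that evaluation for a couple of filenames using plain shell glob matching; it is an illustration of the filter logic, not the AWS CLI itself, and `matches` is a hypothetical name.

```shell
# Mimic --exclude "*" --include "201710*" filter evaluation (last match wins)
# using shell case globbing. Illustration only; not the AWS CLI.
matches() {
  name="$1"
  keep="no"                                    # --exclude "*" drops everything
  case "$name" in 201710*) keep="yes";; esac   # --include "201710*" re-adds
  echo "$name: $keep"
}

matches "20171015.zip"   # prints: 20171015.zip: yes
matches "20170915.zip"   # prints: 20170915.zip: no
```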