This Python script takes a CSV file as input; the file's first column must be:
- location: the S3 location of the file
The script then downloads the files in parallel.
If the field "should_extract_hash_and_size" is set to "True" inside the program, the hash and size of each file are calculated and written to an output CSV file.
If the field "should_delete_file_after_calculation" is set to "True", each downloaded file is deleted right after its hash and size have been calculated.
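A minimal sketch of the per-file work described above, assuming "location" is the object key within S3_BUCKET (function and variable names here are illustrative, not the script's actual identifiers):

import hashlib
import os

import boto3

SHOULD_EXTRACT_HASH_AND_SIZE = True
SHOULD_DELETE_FILE_AFTER_CALCULATION = True

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],
    region_name=os.environ["AWS_REGION"],
)

def process_location(location: str, downloads_dir: str) -> dict:
    """Download one object, optionally hash it, optionally delete the local copy."""
    local_path = os.path.join(downloads_dir, os.path.basename(location))
    s3.download_file(os.environ["S3_BUCKET"], location, local_path)

    result = {"location": location}
    if SHOULD_EXTRACT_HASH_AND_SIZE:
        md5 = hashlib.md5()
        with open(local_path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                md5.update(chunk)
        result["md5"] = md5.hexdigest()
        result["size"] = os.path.getsize(local_path)

    if SHOULD_DELETE_FILE_AFTER_CALCULATION:
        os.remove(local_path)
    return result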
Note: if the files were uploaded to S3 in a single part (not "multipart"), it would be possible to extract the MD5 hash and size of each file from its "ETag" without downloading it.
However, that would require additional permissions, including "list-bucket".
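For reference, a sketch of that ETag shortcut (not used by this script; the bucket and key arguments are whatever the caller supplies, and the exact permissions needed depend on the bucket policy):

import os

import boto3

s3 = boto3.client("s3", endpoint_url=os.environ["S3_ENDPOINT"])

def md5_and_size_from_etag(bucket: str, key: str):
    """Return (md5, size) from object metadata; md5 is None for multipart uploads."""
    head = s3.head_object(Bucket=bucket, Key=key)
    etag = head["ETag"].strip('"')
    size = head["ContentLength"]
    if "-" in etag:
        # Multipart upload: the ETag is not a plain MD5 of the content.
        return None, size
    return etag, size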
- Run on a Linux system with Python 3.
- Install the "boto3" dependency:
pip install boto3
- The following environment variables should be set:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION
- S3_ENDPOINT
- S3_BUCKET
- Have the input CSV file ready.
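The input file only needs the "location" column, so it could be read with something like the following sketch (the file name is whatever you pass on the command line):

import csv

def read_locations(csv_filename: str) -> list[str]:
    """Read the object locations from the input CSV's "location" column."""
    with open(csv_filename, newline="") as f:
        return [row["location"] for row in csv.DictReader(f)]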
python3 s3downloader.py <csv_filename> <downloads_dir> <max_files_to_download> <num_of_threads> <num_seconds_between_requests_in_each_thread>
Notes:
- To download all the files, set the "max_files_to_download" argument to zero (0).
- Based on experiments, 4 to 8 threads appear to be optimal on an 8-core CPU when collecting the metadata.
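To tie the pieces together, a rough sketch of the argument handling and threading model described above, reusing the process_location and read_locations helpers sketched earlier (the argument order follows the usage line; everything else is illustrative):

import sys
import time
from concurrent.futures import ThreadPoolExecutor

def main() -> None:
    csv_filename = sys.argv[1]
    downloads_dir = sys.argv[2]
    max_files = int(sys.argv[3])        # 0 means "download everything"
    num_threads = int(sys.argv[4])      # 4 to 8 worked well on an 8-core CPU
    pause_seconds = float(sys.argv[5])  # delay between requests in each thread

    locations = read_locations(csv_filename)
    if max_files > 0:
        locations = locations[:max_files]

    def worker(location: str) -> dict:
        result = process_location(location, downloads_dir)
        time.sleep(pause_seconds)  # throttle each thread between requests
        return result

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for result in pool.map(worker, locations):
            print(result)  # the real script writes these rows to an output CSV file

if __name__ == "__main__":
    main()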