Multi-part uploads using s3boto3.py are corrupt. #449

Closed

tveastman opened this issue Jan 11, 2018 · 1 comment · Fixed by #504
tveastman commented Jan 11, 2018

There is a bug in the multi-part upload mechanism: the entire buffer is uploaded as a 'part' every time the file is written to. I think this is because the temporary file (self.file) is not truncated when _flush_write_buffer() is called.

Once the temporary buffer is larger than AWS_S3_FILE_BUFFER_SIZE, every subsequent write uploads the WHOLE file as a 'part'.
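For context, here is a minimal sketch of the kind of truncation that appears to be missing. The attribute names (self._buffer_file_size, self._write_counter, self._multipart) are assumptions modeled on s3boto3.py's multipart machinery, and the actual fix (which landed in #504) may differ:

```python
# Sketch only, not the code from #504. Assumes the buffer-flush shape
# described above: a spooled temporary file (self.file) whose contents
# are uploaded as one part of an in-progress multipart upload.
def _flush_write_buffer(self):
    if self._buffer_file_size:
        self._write_counter += 1
        self.file.seek(0)
        part = self._multipart.Part(self._write_counter)
        part.upload(Body=self.file.read())
        # Reset the temporary file so the next flush uploads only the
        # data written after this point, not the whole accumulated
        # buffer again.
        self.file.seek(0)
        self.file.truncate()
```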

Here's an example:

pipenv run python demo.py
INFO:root:Size of 'test-file.txt' is 10485760 bytes. (10 megabytes)
INFO:root:Writing to <S3Boto3StorageFile: test-file.txt> in chunks of 1048576 bytes (1 megabyte).
INFO:root:Upload complete, checking size of file in S3 Bucket
INFO:root:Size of 's3://kx-tom-misc-test-bucket/test-file.txt' is 47185920 bytes. (45 megabytes)

I uploaded a 10 megabyte file in 1 megabyte chunks, and ended up with a 45 megabyte file in S3.

Instead of 2 parts of 5 megabytes each, I have parts of 5, 6, 7, 8, 9, and 10 megabytes (5 + 6 + 7 + 8 + 9 + 10 == 45): the first flush fires when the buffer reaches the 5 megabyte threshold, and because the file is never truncated, each subsequent 1 megabyte write flushes the whole accumulated file again as the next part.

Code for demo at: https://gist.github.com/tveastman/9d15076da4f4f0646c9ce4b0006be616
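In outline, the demo does something like the following (a sketch of the reproduction, not the gist's exact code; it assumes Django settings that point at an S3 bucket via the S3Boto3Storage backend, with AWS_S3_FILE_BUFFER_SIZE left at its default):

```python
# Sketch of the reproduction; the authoritative version is the gist above.
import logging

from storages.backends.s3boto3 import S3Boto3Storage

logging.basicConfig(level=logging.INFO)

CHUNK = 1024 * 1024   # write in 1 megabyte chunks
TOTAL = 10 * CHUNK    # 10 megabyte payload
NAME = "test-file.txt"

storage = S3Boto3Storage()

logging.info("Writing %s in chunks of %d bytes (1 megabyte).", NAME, CHUNK)
with storage.open(NAME, "wb") as f:
    for _ in range(TOTAL // CHUNK):
        f.write(b"x" * CHUNK)

logging.info("Upload complete, checking size of file in S3 Bucket")
# With the bug present, this reports roughly 45 MB rather than 10 MB.
logging.info("Size of %s is %d bytes.", NAME, storage.size(NAME))
```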
