
Amazon S3: can't get file chunk without loading whole file into memory #383

Closed
user0007 opened this issue Aug 21, 2017 · 4 comments
user0007 commented Aug 21, 2017

Example:

# settings.py
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

from django.core.files.storage import default_storage
default_storage.open("mykey").read(1024)

This raises a MemoryError for me, because it tries to load the whole ~5 GB file into memory.

This is because the S3Boto3StorageFile._get_file method loads the whole file into memory every time the file is accessed:

  ...
  self._is_dirty = False
  # the entire object body is read into memory in a single call
  self._file.write(self.obj.get()['Body'].read())
  self._file.seek(0)

But boto3 supports reading the file in chunks:

s3.Object('mybucket', 'mykey').get()['Body'].read(1024)

The only workaround is:

default_storage.open("mykey").obj.get()["Body"].read(1024)

but that is not backend-agnostic anymore.
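
For reference, a minimal sketch of streaming an object in fixed-size chunks with plain boto3; the bucket and key names follow the example above, and process_chunk is a hypothetical per-chunk handler:

import boto3

s3 = boto3.resource("s3")
body = s3.Object("mybucket", "mykey").get()["Body"]  # botocore StreamingBody

# read the object 1 MiB at a time instead of all at once
while True:
    chunk = body.read(1024 * 1024)
    if not chunk:
        break
    process_chunk(chunk)  # hypothetical handler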

jeffcjohnson commented Oct 4, 2017

I'm having the same trouble. I'm trying to download a 2.6 GB file, but the open() call tries to read the entire file and hits a timeout.

It seems that get()['Body'] already returns a StreamingBody, and it is saved to a SpooledTemporaryFile. But this happens in a single write, which means the spooled file never gets a chance to switch from BytesIO to a TemporaryFile. See the PR below for the fix.

AWS_S3_MAX_MEMORY_SIZE must be set in settings before S3Boto3Storage is imported.
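
As a rough sketch of that chunked-write approach (not the exact PR code, and the 10 MiB threshold is an assumed value), copying the StreamingBody in bounded chunks lets SpooledTemporaryFile roll over to a disk-backed temp file once max_size is exceeded:

import shutil
from tempfile import SpooledTemporaryFile

import boto3

# settings.py would carry: AWS_S3_MAX_MEMORY_SIZE = 10 * 1024 * 1024
MAX_MEMORY_SIZE = 10 * 1024 * 1024  # assumed 10 MiB rollover threshold

body = boto3.resource("s3").Object("mybucket", "mykey").get()["Body"]

# shutil.copyfileobj reads in bounded chunks, so the spooled file can
# roll over from in-memory BytesIO to a real temp file past max_size
tmp = SpooledTemporaryFile(max_size=MAX_MEMORY_SIZE)
shutil.copyfileobj(body, tmp, length=1024 * 1024)
tmp.seek(0)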

@yunmanger1

@jeffcjohnson nice catch.

I will leave this traceback here for those googling for a solution:

 File "/venv/lib/python2.7/site-packages/storages/backends/s3boto3.py", line 123, in _get_file
    self._file.write(self.obj.get()['Body'].read())
  File "/venv/lib/python2.7/site-packages/botocore/response.py", line 74, in read
    chunk = self._raw_stream.read(amt)
  File "/venv/lib/python2.7/site-packages/botocore/vendored/requests/packages/urllib3/response.py", line 239, in read
    data = self._fp.read()
  File "/usr/lib/python2.7/httplib.py", line 581, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.7/httplib.py", line 693, in _safe_read
    return ''.join(s)
MemoryError

@dummerbd

For those who've come across this issue and don't want to wait for the PR to be merged, check out django-s3-storage; they're already doing this the right way: https://github.com/etianen/django-s3-storage/blob/master/django_s3_storage/storage.py#L216

@scheparev-moberries

We faced a problem of very slow uploads; switching to django-s3-storage helped.

It was probably caused by our S3 bucket not being in the same region as our server.
