
Amazon S3: can't get file chunk without loading whole file into memory #383

Closed
user0007 opened this issue Aug 21, 2017 · 4 comments
user0007 commented Aug 21, 2017

Example:

# settings.py
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

from django.core.files.storage import default_storage
default_storage.open("mykey").read(1024)

This raises a MemoryError for me, because it tries to load the whole ~5 GB file into memory.

This is because the S3Boto3StorageFile._get_file method loads the whole file into memory every time the file is accessed:

  ...
  self._is_dirty = False
  # the entire object body is read into memory in a single call
  self._file.write(self.obj.get()['Body'].read())
  self._file.seek(0)

But boto3 supports reading the file in chunks:

s3.Object('mybucket', 'mykey').get()['Body'].read(1024)

The only workaround is:

default_storage.open("mykey").obj.get()["Body"].read(1024)

but that is not backend-agnostic anymore.
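
For reference, a minimal sketch of streaming an object in fixed-size chunks with plain boto3; the bucket and key names follow the example above, and process_chunk is a hypothetical per-chunk handler:

import boto3

s3 = boto3.resource("s3")
body = s3.Object("mybucket", "mykey").get()["Body"]  # botocore StreamingBody

# read the object 1 MiB at a time instead of all at once
while True:
    chunk = body.read(1024 * 1024)
    if not chunk:
        break
    process_chunk(chunk)  # hypothetical handler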

jeffcjohnson commented Oct 4, 2017

I'm having the same trouble. I'm trying to download a 2.6 GB file, but the open() call tries to read the entire file and hits a timeout.

It seems that get()['Body'] already returns a StreamingBody, and it is saved to a SpooledTemporaryFile. But this happens in a single write, which means the spooled file never gets a chance to switch from BytesIO to a TemporaryFile. See the PR below for the fix.

AWS_S3_MAX_MEMORY_SIZE must be set in settings before S3Boto3Storage is imported.
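
As a rough sketch of that chunked-write approach (not the exact PR code, and the 10 MiB threshold is an assumed value), copying the StreamingBody in bounded chunks lets SpooledTemporaryFile roll over to a disk-backed temp file once max_size is exceeded:

import shutil
from tempfile import SpooledTemporaryFile

import boto3

# settings.py would carry: AWS_S3_MAX_MEMORY_SIZE = 10 * 1024 * 1024
MAX_MEMORY_SIZE = 10 * 1024 * 1024  # assumed 10 MiB rollover threshold

body = boto3.resource("s3").Object("mybucket", "mykey").get()["Body"]

# shutil.copyfileobj reads in bounded chunks, so the spooled file can
# roll over from in-memory BytesIO to a real temp file past max_size
tmp = SpooledTemporaryFile(max_size=MAX_MEMORY_SIZE)
shutil.copyfileobj(body, tmp, length=1024 * 1024)
tmp.seek(0)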

@yunmanger1

@jeffcjohnson nice catch.

I will leave this traceback here for those googling for a solution:

 File "/venv/lib/python2.7/site-packages/storages/backends/s3boto3.py", line 123, in _get_file
    self._file.write(self.obj.get()['Body'].read())
  File "/venv/lib/python2.7/site-packages/botocore/response.py", line 74, in read
    chunk = self._raw_stream.read(amt)
  File "/venv/lib/python2.7/site-packages/botocore/vendored/requests/packages/urllib3/response.py", line 239, in read
    data = self._fp.read()
  File "/usr/lib/python2.7/httplib.py", line 581, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.7/httplib.py", line 693, in _safe_read
    return ''.join(s)
MemoryError

@dummerbd

For those who've come across this issue and don't want to wait for the PR to be merged, check out django-s3-storage; they're already doing this the right way: https://github.com/etianen/django-s3-storage/blob/master/django_s3_storage/storage.py#L216

@scheparev-moberries

We faced a problem of very slow uploads; switching to django-s3-storage helped.

It was probably caused by our S3 bucket not being in the same region as our server.
