Closed
Description
Is it necessary to send GET requests to S3 every time a file is uploaded?
The methods S3FS.setbinfile() and S3FS.setbytes() each send two GET requests before uploading. The first checks whether the parent directory exists (by calling S3FS.isdir()). The second checks that the target path is not a directory (by calling S3FS.getinfo()).
These checks are quite expensive. The effective upload speed increases by at least 3x when they are skipped.
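To make the overhead concrete, here is a rough sketch that only counts calls. CountingStubFS and its methods are hypothetical stand-ins, not the real S3FS class, and each lookup is counted as a single round trip, which is a simplification.

```python
class CountingStubFS:
    """Hypothetical stand-in for S3FS that only counts remote calls."""

    def __init__(self):
        self.lookups = 0  # metadata round trips (isdir / getinfo)
        self.uploads = 0  # upload calls

    def isdir(self, path):
        self.lookups += 1
        return True

    def getinfo(self, path):
        self.lookups += 1
        return {"is_dir": False}

    def upload(self, path, data):
        self.uploads += 1

    def set_checked(self, path, data):
        # Flow with both checks: two lookups, then the upload.
        parent = path.rsplit("/", 1)[0] or "/"
        if not self.isdir(parent):
            raise FileNotFoundError(path)
        if self.getinfo(path)["is_dir"]:
            raise IsADirectoryError(path)
        self.upload(path, data)

    def set_unchecked(self, path, data):
        # Flow without checks: upload only.
        self.upload(path, data)


def count_calls(method_name, n_files):
    """Return (lookups, uploads) after writing n_files small files."""
    fs = CountingStubFS()
    for i in range(n_files):
        getattr(fs, method_name)("/data/file%d.bin" % i, b"x")
    return fs.lookups, fs.uploads


print(count_calls("set_checked", 100))    # (200, 100)
print(count_calls("set_unchecked", 100))  # (0, 100)
```

For many small files the lookups can dominate, which would be consistent with the speedup reported above.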
Here is the existing code:
    def setbinfile(self, path, file):
        _path = self.validatepath(path)
        _key = self._path_to_key(_path)
        if not self.isdir(dirname(path)):
            raise errors.ResourceNotFound(path)
        try:
            info = self.getinfo(path)
            if info.is_dir:
                raise errors.FileExpected(path)
        except errors.ResourceNotFound:
            pass
        with s3errors(path):
            self.client.upload_fileobj(file, self._bucket_name, _key)
Proposed solution:
    def setbinfile(self, path, file):
        _path = self.validatepath(path)
        _key = self._path_to_key(_path)
        with s3errors(path):
            self.client.upload_fileobj(file, self._bucket_name, _key)
Similar changes should be applied to the setbytes() method.
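For setbytes(), the change might look roughly like the sketch below. It is written as a standalone function so it can run without S3: RecordingClient, setbytes_without_checks, and the bucket and key names are all hypothetical, and the stub only imitates the positional upload_fileobj(fileobj, bucket, key) call used in the code above. The real method body may differ.

```python
import io


class RecordingClient:
    """Hypothetical stub imitating the upload_fileobj call of an S3 client."""

    def __init__(self):
        self.uploaded = {}

    def upload_fileobj(self, fileobj, bucket, key):
        # Store the payload in memory instead of sending it to S3.
        self.uploaded[(bucket, key)] = fileobj.read()


def setbytes_without_checks(client, bucket_name, key, contents):
    """Assumed shape of setbytes() once the isdir()/getinfo() checks are gone."""
    if not isinstance(contents, bytes):
        raise TypeError("contents must be bytes")
    # Wrap the bytes in a file-like object and upload directly.
    client.upload_fileobj(io.BytesIO(contents), bucket_name, key)


client = RecordingClient()
setbytes_without_checks(client, "example-bucket", "data/report.bin", b"hello")
```

One trade-off to keep in mind: without the checks, writing to a path whose parent directory does not exist would no longer raise ResourceNotFound, and writing to a directory path would no longer raise FileExpected.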