Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize uploading to S3 #9

Closed
genben opened this issue Oct 17, 2017 · 6 comments
Closed

Optimize uploading to S3 #9

genben opened this issue Oct 17, 2017 · 6 comments

Comments

@genben
Copy link
Contributor

genben commented Oct 17, 2017

Is it necessary to send GET request to S3 every time the file is uploaded?

In methods S3FS.setbinfile() and S3FS.setbytes(), it sends two GET requests. First time, it checks if the the parent directory exists (calls S3FS.isdir()). Second time, it checks if the URL is not a directory (calls S3FS.getinfo()).

These operations are quite expensive. The effective upload speed increases at least x3 times when these methods are not called.

Here is the existing code:

def setbinfile(self, path, file):
    _path = self.validatepath(path)
    _key = self._path_to_key(_path)

    if not self.isdir(dirname(path)):
        raise errors.ResourceNotFound(path)
    try:
        info = self.getinfo(path)
        if info.is_dir:
            raise errors.FileExpected(path)
    except errors.ResourceNotFound:
        pass

    with s3errors(path):
        self.client.upload_fileobj(file, self._bucket_name, _key)

Proposed solution:

def setbinfile(self, path, file):
    _path = self.validatepath(path)
    _key = self._path_to_key(_path)

    with s3errors(path):
        self.client.upload_fileobj(file, self._bucket_name, _key)

And similar changes should be applied to setbytes() method.

@willmcgugan
Copy link
Member

It is unfortunate, but in order to be a compliant implementation (and pass the tests), it has to throw the same set of exceptions as all the other implementations. I'm open to suggestion regarding optimisations. I'd also consider api additions that didn't break old interfaces.

@genben
Copy link
Contributor Author

genben commented Oct 17, 2017

Maybe, just introduce the additional option to S3FS contstructor, which would disable/enable such validation?
Something like validate_dst_path=False (need better name).
For backward compatibility, the validation can be enabled by default.

@willmcgugan
Copy link
Member

That's an idea. How about strict which defaults to True, but would omit expensive validation if set to False.

@genben
Copy link
Contributor Author

genben commented Oct 18, 2017

Yes. This name ("strict") is much better.

@genben
Copy link
Contributor Author

genben commented Oct 18, 2017

I have made the mentioned changes to the forked repository: miarec@756aff3

Probably, the comment could be improved.

@willmcgugan
Copy link
Member

Looks good. Would you like to submit a PR for that? I'll get it in the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants