
S3Store: Add support for creation-defer-length extension #219

Merged: 16 commits into tus:master on Jan 5, 2019

Conversation

@acj (Contributor) commented Nov 19, 2018

Currently, the S3 backend will drop non-final parts that are smaller than MinPartSize to comply with constraints in the AWS S3 service. This is inefficient, and it also prevents S3Store from supporting the creation-defer-length extension.

This PR implements an idea that was proposed in #182: when a part is smaller than MinPartSize, we send it to S3 temporarily as a normal (non-multipart upload) object. On the next PATCH request for the upload, the temporary part is retrieved from S3 and prepended to the request body. This process may be repeated until the part is large enough to be added to the multipart upload.

@Acconut (Member) commented Nov 20, 2018

Wow, thank you very much. I will have a more in-depth look at this soon, but do you know how this affects performance? It would be nice to see if and how much the upload times change with this patch.

@acj (Contributor, Author) commented Nov 20, 2018

I agree. This adds two roundtrips to the S3 API (HeadObject and then GetObject) for each PATCH request, and in the case where there's an incomplete part to fetch, there will also be a DeleteObjects call to S3 -- and possibly a PutObject if there's another incomplete part to store. There should be no additional latency during the transfer itself.

It's likely that some of this workflow can be parallelized to reduce the latency, but I wanted to focus on correctness first.
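One way the extra roundtrips could be overlapped with other per-request setup is a background fetch. This is a sketch under stated assumptions: `headIncompletePart` and `getIncompletePart` are hypothetical stubs standing in for the HeadObject/GetObject calls, not tusd's real API.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stubs standing in for the S3 roundtrips made at the
// start of a PATCH request (HeadObject, then GetObject in the real store).
func headIncompletePart(id string) (size int64, exists bool) { return 3, true }
func getIncompletePart(id string) []byte                     { return []byte("abc") }

// fetchIncompletePartAsync starts the lookup in a goroutine so other
// request setup can proceed, then waits for the result before the body
// is consumed.
func fetchIncompletePartAsync(id string) []byte {
	var (
		wg   sync.WaitGroup
		part []byte
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		if _, ok := headIncompletePart(id); ok {
			part = getIncompletePart(id)
		}
	}()
	// ... other per-request work could run here concurrently ...
	wg.Wait()
	return part
}

func main() {
	fmt.Println(string(fetchIncompletePartAsync("upload-1")))
}
```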

I'll do a bit of profiling and then post back.

@acj (Contributor, Author) commented Nov 21, 2018

Here's a rough breakdown of the latency for the start of a PATCH request. tusd was running on an EC2 instance in us-east-1. I ran the test in the early afternoon (EST) on a weekday.

[image attachment: latency breakdown table]

In most cases, I think we can expect this PR to add <30ms of latency at the start of the request.

This testing turned up an issue with 403 vs. 404 responses from the S3 API, which I've addressed in the last two commits.
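The 403-vs-404 distinction matters because S3 can answer a HEAD/GET for a missing key with either status: typically 404 (NotFound/NoSuchKey) when the caller is allowed to list the bucket, but 403 (AccessDenied) when it is not. A minimal sketch of the classification, assuming plain error-code strings rather than the `awserr.Error` values the real code inspects:

```go
package main

import "fmt"

// partMissing reports whether an S3 error code should be treated as
// "no incomplete part stored". Both the 404-style codes and
// AccessDenied must map to "missing", since S3 may return 403 instead
// of 404 when the caller lacks list permission on the bucket.
func partMissing(code string) bool {
	switch code {
	case "NoSuchKey", "NotFound", "AccessDenied":
		return true
	}
	return false
}

func main() {
	fmt.Println(partMissing("NoSuchKey"))  // true
	fmt.Println(partMissing("Throttling")) // false
}
```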

@Acconut (Member) left a review comment

Thank you very, very much @acj! I am deeply sorry for the inactivity on my side; it was hard to find time for such a big PR. However, everything looks great except for two questions.

I also agree with you that we should ship this simple version before attempting to improve performance.

Review threads on s3store/s3store.go (outdated, resolved)
@acj force-pushed the acj/buffer-incomplete-s3-uploads branch from 6fc2a67 to 5017824 on December 31, 2018 02:39
@acj force-pushed the acj/buffer-incomplete-s3-uploads branch from 5017824 to e267953 on January 4, 2019 22:03
@acj (Contributor, Author) commented Jan 4, 2019

@Acconut I removed the tagging changes and force-pushed. A couple of the test builds are failing, but it looks like an upstream package issue on older Go versions.

@Acconut (Member) commented Jan 5, 2019

Thank you for the quick changes. And don't worry about the CI failures, I will take care of them. They are not related to tusd but a non-critical dependency.

@Acconut Acconut merged commit 33d1253 into tus:master Jan 5, 2019
@acj acj deleted the acj/buffer-incomplete-s3-uploads branch January 5, 2019 12:28
Acconut added a commit that referenced this pull request Jan 28, 2019
Previously, some data would be discarded if the user paused the upload.
Pausing interrupts the connection, which makes Go's net/http return an
io.ErrUnexpectedEOF. Before #219 from @acj, this went unnoticed since
the data was discarded anyway. After this PR, however, every byte can
be saved to S3 even if we don't have enough data for a multipart
upload part.