-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3Store: Add support for creation-defer-length extension #219
Conversation
Wow, thank you very much. I will have a more in-depth look at this soon but do you know how this affects performance? It would be nice to see if and how much the upload times change after this patch. |
I agree. This adds two roundtrips to the S3 API (HeadObject and then GetObject) for each PATCH request, and in the case where there's an incomplete part to fetch, there will also be a DeleteObjects call to S3 -- and possibly a PutObject if there's another incomplete part to store. There should be no additional latency during the transfer itself. It's likely that some of this workflow can be parallelized to reduce the latency, but I wanted to focus on correctness first. I'll do a bit of profiling and then post back. |
Here's a rough breakdown of the latency for the start of a PATCH request. tusd was running on an EC2 instance in us-east-1. I ran the test in the early afternoon (EST) on a weekday. In most cases, I think we can expect this PR to add <30ms of latency at the start of the request. This testing turned up an issue with 403 vs. 404 responses from the S3 API, which I've addressed in the last two commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very, very much @acj! I am deeply sorry for the inactivity on my side, it was hard to find time for such a big PR. However, everything looks great except for two questions.
I also agree with you that we should ship this simple version before attempting to improve performance.
6fc2a67
to
5017824
Compare
If the IAM role doesn't have permission to list the contents of the bucket, then HEAD requests will return 403 for nonexistent objects.
5017824
to
e267953
Compare
@Acconut I removed the tagging changes and force-pushed. A couple of the test builds are failing, but it looks like an upstream package issue on older Go versions. |
Thank you for the quick changes. And don't worry about the CI failures, I will take care of them. They are not related to tusd but a non-critical dependency. |
Previously, some data would be discarded if the user pauseed the upload. Pausing causes the connection to be interrupted which makes Go's net/http return an io.ErrUnexpectedEOF. Before #219 from @acj this would not be noticed since data is discarded anyway. However, after this PR, every byte can now be saved on S3 even if we don't have enough data for a multipart upload part.
Currently, the S3 backend will drop non-final parts that are smaller than
MinPartSize
to comply with constraints in the AWS S3 service. This is inefficient, and it also prevents S3Store from supporting thecreation-defer-length
extension.This PR implements an idea that was proposed in #182: when a part is smaller than
MinPartSize
, we send it to S3 temporarily as a normal (non-multipart upload) object. On the nextPATCH
request for the upload, the temporary part is retrieved from S3 and prepended to the request body. This process may be repeated until the part is large enough to be added to the multipart upload.