bottomless: stream gzip snapshot #585
base: main
Conversation
Force-pushed from 66203ad to 2cfbf01.
Nice work! Thanks for contributing, I left a few nitpicks. I wonder if error handling for multipart uploads isn't already taken care of for us -- e.g. when https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html is configured, incomplete uploads get aborted automatically. The temporary leftover file for storing the last part is also handled on failure, since it will get deleted eventually -- and if you switch to tempfile, it would likely even get deleted as part of its drop routine.
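For illustration, a minimal sketch of the tempfile suggestion, assuming the `tempfile` crate (the helper name is hypothetical, not the PR's code): the scratch file holding the last part is unlinked when the handle is dropped, including on early error returns.

```rust
// Hedged sketch: spill the last multipart chunk into a NamedTempFile so it
// is unlinked automatically when dropped, even if an error path is taken.
use std::io::Write;
use tempfile::NamedTempFile;

fn spill_last_part(data: &[u8]) -> std::io::Result<NamedTempFile> {
    let mut file = NamedTempFile::new()?; // created in the OS temp dir
    file.write_all(data)?; // on failure here, `file` is dropped and unlinked
    Ok(file)
}
```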
Yes, but you are still billed for however long you retain the incomplete parts. I also don't know whether that lifecycle rule is available on other S3-compatible stores. As a best effort we can wrap the part uploads in a …

I decided not to abort the multipart upload on panics; that was harder than I initially thought because … Only catching errors should be enough for now, and we can leave the panic case to the …
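For illustration, that best-effort wrapper might look like the sketch below. It assumes the `aws-sdk-s3` crate and hypothetical helper names; it is not the PR's actual code. Any error from the part uploads triggers an `AbortMultipartUpload`, freeing the storage billed for incomplete parts, while a panic would still skip the abort and be left to the lifecycle rule.

```rust
// Hedged sketch (assumed aws-sdk-s3 API, not the PR's code): wrap the
// fallible part uploads and abort the multipart upload on error.
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::types::{CompletedMultipartUpload, CompletedPart};
use aws_sdk_s3::Client;

async fn upload_snapshot(
    client: &Client,
    bucket: &str,
    key: &str,
    parts: Vec<Vec<u8>>, // in practice this would be a stream of gzip chunks
) -> Result<(), aws_sdk_s3::Error> {
    let mpu = client
        .create_multipart_upload()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;
    let upload_id = mpu.upload_id().expect("S3 returns an upload id").to_owned();

    match upload_parts(client, bucket, key, &upload_id, parts).await {
        Ok(completed) => {
            client
                .complete_multipart_upload()
                .bucket(bucket)
                .key(key)
                .upload_id(&upload_id)
                .multipart_upload(completed)
                .send()
                .await?;
            Ok(())
        }
        Err(err) => {
            // Best effort: stop paying for the incomplete parts. A panic
            // skips this block, which is the case left to the lifecycle rule.
            let _ = client
                .abort_multipart_upload()
                .bucket(bucket)
                .key(key)
                .upload_id(&upload_id)
                .send()
                .await;
            Err(err)
        }
    }
}

async fn upload_parts(
    client: &Client,
    bucket: &str,
    key: &str,
    upload_id: &str,
    parts: Vec<Vec<u8>>,
) -> Result<CompletedMultipartUpload, aws_sdk_s3::Error> {
    let mut completed = CompletedMultipartUpload::builder();
    for (idx, chunk) in parts.into_iter().enumerate() {
        let part_number = (idx + 1) as i32; // S3 part numbers start at 1
        let out = client
            .upload_part()
            .bucket(bucket)
            .key(key)
            .upload_id(upload_id)
            .part_number(part_number)
            .body(ByteStream::from(chunk))
            .send()
            .await?;
        completed = completed.parts(
            CompletedPart::builder()
                .part_number(part_number)
                .set_e_tag(out.e_tag().map(str::to_owned))
                .build(),
        );
    }
    Ok(completed.build())
}
```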
In general this PR should be married with #574 - one covers asynchronous snapshot upload between checkpoints (which is a must-have in order to avoid latency spikes), the other covers multipart upload.
Force-pushed from 0df0ad1 to fedc6ac.
Stream gzip snapshots using S3's multipart upload.

S3 requires `Content-Length` to be set when doing a `PutObject` operation, and since we can't know the size of the gzip stream upfront, we send it in parts to S3 using a fixed-size buffer whose size increases every 16 chunks, up to 100 MiB, ensuring small allocations for small databases.
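For illustration, the sizing policy could look like the sketch below. Only the 16-chunk step and the 100 MiB cap come from the description above; the 5 MiB floor (S3's minimum part size) and the doubling growth factor are assumptions.

```rust
// Hedged sketch of the buffer-growth policy described above; constants and
// growth factor are illustrative, not necessarily what the PR uses.
const INITIAL_PART_SIZE: usize = 5 * 1024 * 1024; // S3's minimum part size
const MAX_PART_SIZE: usize = 100 * 1024 * 1024;   // cap from the PR description
const PARTS_PER_STEP: u32 = 16;

fn part_size_for(part_number: u32) -> usize {
    // Grow once per 16 parts: parts 1..=16 use the initial size, parts
    // 17..=32 use twice that, and so on, never exceeding the cap.
    let step = part_number.saturating_sub(1) / PARTS_PER_STEP;
    INITIAL_PART_SIZE
        .saturating_mul(1usize << step.min(5)) // 5 doublings already exceed the cap
        .min(MAX_PART_SIZE)
}
```

A small database that fits in a few parts never allocates more than the initial buffer, while a large snapshot ramps up to 100 MiB parts, staying under S3's 10,000-part limit.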