This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

storage: refactor UploadWriter and implement part size inflation #600

Open
wants to merge 8 commits into base: master

Conversation

kennytm
Collaborator

@kennytm kennytm commented Nov 16, 2020

What problem does this PR solve?

It was previously found that Dumpling cannot upload files larger than 50 GB to S3. We used multipart upload with a fixed part size of 5 MB, and AWS S3 allows at most 10,000 parts per upload (5 MB × 10,000 = 50 GB), so writing any data beyond 50 GB fails with "Part number must be an integer between 1 and 10000, inclusive".

What is changed and how it works?

Here we implement "part size inflation", which exponentially increases the size of each part as we write more data. Every part is larger than the previous one by 0.0654% (configurable). For small files the part size stays very close to the optimal size of 5 MB, and it grows gradually for later parts. The exponential growth ensures that by the 10,000th part the part size has inflated to about 688 × 5 MB, so a single upload can hold up to 5 TB in total, the maximum object size allowed by S3.
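The arithmetic above can be checked with a short sketch. It assumes the n-th part has size 5 MiB × 1.000654^(n-1); that formula, the constants, and the names partSize and totalCapacity are illustrative and are not taken from the PR's code, which may round or cap sizes differently.

```go
package main

import (
	"fmt"
	"math"
)

const (
	mib          = 1 << 20
	basePartSize = 5 * mib  // size of the first part
	maxParts     = 10000    // S3 limit on parts per multipart upload
	inflation    = 1.000654 // assumed: each part is 0.0654% larger than the previous one
)

// partSize returns the size in bytes of the n-th part (1-based), assuming
// size(n) = basePartSize * inflation^(n-1).
func partSize(n int) int64 {
	return int64(float64(basePartSize) * math.Pow(inflation, float64(n-1)))
}

// totalCapacity sums the sizes of the first n parts.
func totalCapacity(n int) int64 {
	var total int64
	for i := 1; i <= n; i++ {
		total += partSize(i)
	}
	return total
}

func main() {
	// Without inflation every part is 5 MiB, so 10,000 parts cap the file size.
	fixed := int64(basePartSize) * maxParts
	fmt.Printf("fixed-size capacity: %.1f GiB\n", float64(fixed)/(1<<30))

	// With inflation the last part is several hundred times the base size.
	fmt.Printf("last part: %.0f MiB\n", float64(partSize(maxParts))/mib)
	fmt.Printf("inflated capacity: %.2f TiB\n", float64(totalCapacity(maxParts))/(1<<40))
}
```

Under these assumptions the last part stays well below S3's 5 GiB limit for a single part, while the sum over all 10,000 parts slightly exceeds 5 TiB.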

In this PR we also refactored the UploadWriter so that the part size can be accurately controlled:

  1. The functionality of noCompressionBuffer is merged entirely into simpleCompressBuffer by using a no-op compress writer.
  2. uploadChunk is now triggered by the size of the compressed buffer rather than the size of the input data, so every part is exactly 5 MiB on S3 (this also reduces the number of parts).
  3. The options to NewUploadWriter are collected into a struct, since the argument list would otherwise become too long.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change
    • NewUploadWriter's signature has changed entirely.

Side effects

  • Possible performance regression
    • Part size inflation means that, towards the end (around n=4500), we will be trying to upload hundreds of megabytes to S3 as a single part. This is prone to network failures, though there is probably nothing we can do besides retrying.

Related changes

Release Note

  • (Dumpling) now supports writing files larger than 50 GB to AWS S3.

@kennytm kennytm force-pushed the progressively-increase-uploader-capacity branch from 302454a to e4b8b93 Compare November 17, 2020 02:15
@glorv
Collaborator

glorv commented Nov 24, 2020

/run-all-tests

Member

@overvenus overvenus left a comment


LGTM

@ti-srebot ti-srebot added the status/LGT1 LGTM1 label Dec 1, 2020
@overvenus overvenus added this to the v4.0.10 milestone Dec 1, 2020
@ti-chi-bot
Member

@kennytm: PR needs rebase.


@lichunzhu
Contributor

@kennytm please resolve the conflicts

6 participants