the multipart upload performance is not ideal #125

Open
@HenryCaiHaiying

Description

From the code, it doesn't look to me like it's really using S3 multi-threading for multipart upload:

  1. A while loop in S3ClientWrapper#uploadLogFile breaks the original segment file into multiple parts and uploads them one by one through the custom S3OutputStream: https://github.com/aiven/tiered-storage-for-apache-kafka/blob/main/s3/src/main/java/io/aiven/kafka/tiered/storage/s3/S3ClientWrapper.java#L179
  2. In S3OutputStream, the code tries to use S3's multipart upload API to upload the file in multiple chunks. But since the caller (S3ClientWrapper) has already limited each piece to config.s3StorageUploadPartSize, each S3OutputStream ends up doing just 1 upload

S3's multipart upload is meant to let a big file be uploaded to S3 concurrently, with multiple threads uploading parts in parallel. The current code path doesn't seem to use that parallelism.
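
To make the suggestion concrete, here is a minimal sketch of what uploading the parts concurrently could look like. It is not the project's code: `PartUploader` is a hypothetical stand-in for whatever issues the S3 UploadPart request, and the sketch assumes the multipart upload is initiated before this method runs and completed afterwards with the returned ETags in part order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelMultipartSketch {

    /** Hypothetical stand-in for one S3 UploadPart call; returns the part's ETag. */
    interface PartUploader {
        String uploadPart(int partNumber, long offset, long length) throws Exception;
    }

    /**
     * Splits [0, fileSize) into parts of at most partSize bytes and submits
     * every part to a fixed thread pool, so parts are uploaded concurrently
     * instead of one by one. Returns the ETags in part-number order.
     */
    static List<String> uploadInParallel(long fileSize, long partSize, int threads,
                                         PartUploader uploader)
            throws InterruptedException, ExecutionException {
        if (partSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("partSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            int partNumber = 1; // S3 part numbers start at 1
            for (long offset = 0; offset < fileSize; offset += partSize) {
                final int n = partNumber++;
                final long off = offset;
                final long len = Math.min(partSize, fileSize - offset);
                Callable<String> task = () -> uploader.uploadPart(n, off, len);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order, which is part-number order.
            List<String> etags = new ArrayList<>();
            for (Future<String> future : futures) {
                etags.add(future.get());
            }
            return etags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake uploader: a real one would send the byte range to S3.
        List<String> etags = uploadInParallel(25, 10, 4,
                (n, off, len) -> "etag-" + n + ":" + len);
        System.out.println(etags);
    }
}
```

The thread count and part size here are arbitrary; in the project they would presumably come from configuration such as config.s3StorageUploadPartSize.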
