@james-rms
Contributor

@james-rms james-rms commented Dec 3, 2025

Which issue does this PR close?

#563

Rationale for this change

Today, users who attempt to copy an object larger than 5GB in S3 using object_store see this error:

Server returned non-2xx status code: 400 Bad Request: 
<Error><Code>InvalidRequest</Code><Message>
The specified copy source is larger than the maximum allowable size for a copy source: 5368709120
</Message></Error>

Per AWS's docs, the way around this limit is to perform the copy in several parts using a multipart copy. This PR adds that functionality to the AWS client.

It adds two additional configuration parameters:

    /// The size threshold above which copy uses multipart copies under the hood. Defaults to 5GB.
    multipart_copy_threshold: u64,
    /// When using multipart copies, the part size used. Defaults to 5GB.
    multipart_copy_part_size: u64,

The defaults are chosen to minimise surprise: if people are used to copies not requiring several requests, we don't switch to that method until it's absolutely necessary, and when necessary, we use as few parts as possible.
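To make the intended behavior concrete, here is a minimal sketch of how the two parameters could interact. It is not the code in this PR: `plan_copy_parts` and `copy_source_range` are hypothetical helpers written for illustration, and the sketch assumes that a copy at or below the threshold stays a single request. The only S3-specific detail it relies on is that a part copy selects its bytes with an inclusive, zero-based `x-amz-copy-source-range: bytes=first-last` header.

```rust
use std::ops::Range;

/// Hypothetical helper (not the PR's actual code): decide how a copy is split.
///
/// Returns `None` when the object is at or below `threshold`, meaning a single
/// copy request is enough. Otherwise returns the half-open byte ranges of each
/// part, every part being `part_size` bytes except possibly the last.
fn plan_copy_parts(object_size: u64, threshold: u64, part_size: u64) -> Option<Vec<Range<u64>>> {
    assert!(part_size > 0, "part size must be non-zero");
    if object_size <= threshold {
        return None;
    }
    let mut parts = Vec::new();
    let mut start = 0u64;
    while start < object_size {
        // Clamp the final part to the end of the object.
        let end = object_size.min(start.saturating_add(part_size));
        parts.push(start..end);
        start = end;
    }
    Some(parts)
}

/// Hypothetical helper: format a half-open range as the inclusive value
/// expected by the `x-amz-copy-source-range` header.
/// Ranges produced by `plan_copy_parts` are never empty, so `end - 1` is safe.
fn copy_source_range(part: &Range<u64>) -> String {
    format!("bytes={}-{}", part.start, part.end - 1)
}
```

With both parameters left at 5368709120 bytes (the limit quoted in the error above), an object of exactly that size is still copied in one request, while a 12 GiB object would be split into parts of 5 GiB, 5 GiB and 2 GiB.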

What changes are included in this PR?

See above.

Are there any user-facing changes?

Yes - these configuration parameters should be covered by the docstring changes.

TODO

  • Will add tests after maintainers take a quick look at the expected behavior.

@james-rms james-rms force-pushed the jrms/aws-multipart-copy branch 3 times, most recently from 4719ef4 to 09cef9b Compare December 4, 2025 12:11
@james-rms james-rms force-pushed the jrms/aws-multipart-copy branch from 09cef9b to acc8cc4 Compare December 4, 2025 12:14
@james-rms james-rms marked this pull request as ready for review December 4, 2025 12:26
@tustvold
Contributor

tustvold commented Dec 4, 2025

I think this probably warrants a higher-level ticket to discuss how we should support this. As a start, it would be good to understand how other stores, i.e. GCS and Azure, handle this, so that we can develop an abstraction that makes sense.

In particular I wonder if adding this functionality would make more sense as part of the multipart upload functionality? This of course depends on what other stores support.

In general, filing an issue first to get consensus on an approach is a good idea before jumping in on an implementation.

@james-rms
Contributor Author

Great, created #563.
