
Revisit Design of ObjectStore::put_multipart #84

@tustvold

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, streaming uploads are supported by ObjectStore::put_multipart. This returns an AsyncWrite, which provides a push-based interface for writing data.

However, this approach is not without issue:

Describe the solution you'd like

apache/arrow-rs#4971 added a MultipartStore abstraction that more closely mirrors the APIs exposed by object stores, avoiding all of the above issues. If we could devise a way to implement this interface for LocalFileSystem, we could then "promote" it into the ObjectStore trait and deprecate put_multipart. This would give users maximum flexibility, whilst staying in keeping with this crate's objective of closely hewing to the APIs of the stores themselves.
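To make the shape of such an API concrete, here is a minimal synchronous, in-memory sketch of the create / put part / complete / abort flow that object stores expose. The type InMemoryMultipart, the u64 upload ids and the String errors are hypothetical simplifications for illustration; the method names loosely follow MultipartStore, but the real trait is async and its exact signatures differ.

```rust
use std::collections::HashMap;

/// Hypothetical part identifier returned by `put_part`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PartId(usize);

/// Hypothetical, synchronous, in-memory model of a multipart upload API.
/// Real stores are async and can fail in many more ways.
#[derive(Default)]
struct InMemoryMultipart {
    next_id: u64,
    /// upload id -> (part index -> part bytes)
    uploads: HashMap<u64, HashMap<usize, Vec<u8>>>,
    /// completed objects keyed by path
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryMultipart {
    /// Start a new upload and return its id.
    fn create_multipart(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.uploads.insert(id, HashMap::new());
        id
    }

    /// Store one part; parts may arrive in any order.
    fn put_part(&mut self, id: u64, part_idx: usize, data: &[u8]) -> Result<PartId, String> {
        let parts = self.uploads.get_mut(&id).ok_or("unknown upload")?;
        parts.insert(part_idx, data.to_vec());
        Ok(PartId(part_idx))
    }

    /// Concatenate the listed parts, in the order given, into the final object.
    fn complete_multipart(&mut self, path: &str, id: u64, parts: &[PartId]) -> Result<(), String> {
        let uploaded = self.uploads.remove(&id).ok_or("unknown upload")?;
        let mut object = Vec::new();
        for PartId(idx) in parts {
            let bytes = uploaded
                .get(idx)
                .ok_or_else(|| format!("missing part {idx}"))?;
            object.extend_from_slice(bytes);
        }
        self.objects.insert(path.to_string(), object);
        Ok(())
    }

    /// Discard an in-progress upload without creating an object.
    fn abort_multipart(&mut self, id: u64) {
        self.uploads.remove(&id);
    }
}

fn main() {
    let mut store = InMemoryMultipart::default();
    let id = store.create_multipart();
    // Upload part 1 before part 0: ordering is only fixed at completion.
    let p1 = store.put_part(id, 1, b"world").unwrap();
    let p0 = store.put_part(id, 0, b"hello ").unwrap();
    store.complete_multipart("data/greeting.txt", id, &[p0, p1]).unwrap();
    assert_eq!(store.objects["data/greeting.txt"], b"hello world");

    // An aborted upload no longer accepts parts.
    let id2 = store.create_multipart();
    store.abort_multipart(id2);
    assert!(store.put_part(id2, 0, b"late").is_err());
}
```

Because each part is an independent call keyed by its index, a caller is free to upload parts concurrently and decide the final order only at completion, which a single push-based AsyncWrite cannot express.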

The key observation that makes this possible is that we already recommend MultipartStore be used with fixed-size chunks for compatibility with R2. We could therefore require this for LocalFileSystem, which would in turn allow it to support out-of-order / parallel writes, as the file offset of each part can be determined from its part index.
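A minimal sketch of the offset arithmetic, assuming every part except the last is exactly part_size bytes: part i starts at byte i * part_size, so parts can be written in any order. The helpers part_offset and write_part are hypothetical, and a Cursor<Vec<u8>> stands in for the local file; a real LocalFileSystem implementation might instead use positional writes such as std::os::unix::fs::FileExt::write_at.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Byte offset of part `part_idx`, assuming every part except the last
/// is exactly `part_size` bytes long.
fn part_offset(part_idx: usize, part_size: usize) -> u64 {
    (part_idx as u64) * (part_size as u64)
}

/// Hypothetical helper: write one part at its computed offset.
/// Because the offset depends only on the index, parts can arrive in any order.
fn write_part<W: Write + Seek>(
    file: &mut W,
    part_idx: usize,
    part_size: usize,
    data: &[u8],
) -> io::Result<()> {
    // Reject oversized parts. Checking that only the final part is short
    // would happen at completion time, which this sketch omits.
    if data.len() > part_size {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "part larger than the fixed part size",
        ));
    }
    file.seek(SeekFrom::Start(part_offset(part_idx, part_size)))?;
    file.write_all(data)
}

fn main() -> io::Result<()> {
    let part_size = 4;
    // Cursor<Vec<u8>> stands in for the destination file; seeking past the
    // end and then writing zero-fills the gap.
    let mut file: Cursor<Vec<u8>> = Cursor::new(Vec::new());
    // Deliberately out of order: the short last part first.
    write_part(&mut file, 2, part_size, b"ij")?;
    write_part(&mut file, 0, part_size, b"abcd")?;
    write_part(&mut file, 1, part_size, b"efgh")?;
    assert_eq!(file.into_inner(), b"abcdefghij");
    Ok(())
}
```

Without the fixed-size requirement, the offset of part i would depend on the lengths of all earlier parts, so parts could only be placed once every preceding part had arrived.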

apache/arrow-rs#5431 and apache/arrow-rs#4857 added BufWriter and BufReader. These would be retained, both to preserve compatibility with the Tokio ecosystem and to provide a more idiomatic API on top of this lower-level interface.
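As a rough, synchronous illustration of how a buffering layer can sit on top of a part-based API: callers push bytes through std::io::Write, only complete fixed-size parts are handed downstream, and the short tail is emitted on finish. PartWriter and its upload_part callback are hypothetical; the crate's BufWriter is an async, Tokio-based type whose internals may differ.

```rust
use std::io::{self, Write};

/// Hypothetical synchronous adapter: buffers pushed bytes and hands off
/// fixed-size parts to `upload_part` (a stand-in for something like put_part).
struct PartWriter<F: FnMut(usize, Vec<u8>)> {
    part_size: usize,
    buf: Vec<u8>,
    next_idx: usize,
    upload_part: F,
}

impl<F: FnMut(usize, Vec<u8>)> PartWriter<F> {
    fn new(part_size: usize, upload_part: F) -> Self {
        assert!(part_size > 0, "part size must be non-zero");
        Self { part_size, buf: Vec::new(), next_idx: 0, upload_part }
    }

    /// Emit the trailing (possibly short) part and return the number of parts.
    fn finish(mut self) -> usize {
        if !self.buf.is_empty() {
            let last = std::mem::take(&mut self.buf);
            (self.upload_part)(self.next_idx, last);
            self.next_idx += 1;
        }
        self.next_idx
    }
}

impl<F: FnMut(usize, Vec<u8>)> Write for PartWriter<F> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        // Hand off every complete part; anything shorter stays buffered.
        while self.buf.len() >= self.part_size {
            let rest = self.buf.split_off(self.part_size);
            let part = std::mem::replace(&mut self.buf, rest);
            (self.upload_part)(self.next_idx, part);
            self.next_idx += 1;
        }
        Ok(data.len())
    }

    /// Flushing cannot emit a short part without breaking the fixed-size
    /// rule, so it is a no-op here; `finish` handles the tail.
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut parts: Vec<(usize, Vec<u8>)> = Vec::new();
    let mut w = PartWriter::new(4, |idx: usize, bytes: Vec<u8>| parts.push((idx, bytes)));
    w.write_all(b"hello")?;
    w.write_all(b" world")?;
    assert_eq!(w.finish(), 3);
    assert_eq!(
        parts,
        vec![(0, b"hell".to_vec()), (1, b"o wo".to_vec()), (2, b"rld".to_vec())]
    );
    Ok(())
}
```

Since only the tail can be short, every part this layer hands downstream is exactly part_size bytes except the last, which is the property the LocalFileSystem offset scheme relies on.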

Describe alternatives you've considered

I briefly considered a put_stream API; however, this doesn't resolve many of the above issues.

We could also just implement MultipartStore for LocalFileSystem, whilst retaining the current put_multipart. This would allow downstream users to opt in to the lower-level API if they so wish.

We could also modify put_multipart to return something other than an AsyncWrite, possibly something closer to PutPart.

Additional context

Many of the stores also support composing objects from other objects; this might be worth considering as part of this design (see #121).

FYI @wjones127 @Xuanwo @alamb @roeap

Metadata


Labels

enhancement (New feature or request)