[Proposal] OpenAPI for S3 multipart upload #7197
Conversation
Thanks!
Let's talk about the client having to specify the number of parts in advance.
In order to minimize the scope of this feature and understand the benefits before supporting all blockstores, this feature will be scoped to the S3 blockstore and only if the presigned capability is enabled.
Initiation of Upload: The API will allow clients to initiate a multipart upload session, assigning a unique upload ID for subsequent operations. The client will be required to pass the number of parts it requires, and the call will provide a set of URLs to upload each part.
AFAICT S3 multipart does not require the client to specify the number of parts in advance.
This is actually useful in some cases 🤷🏽! For example, say I want to add multipart to lakeFSFS. The Hadoop FS API is basically `create`, `write`, `write`, ..., `close`. And indeed S3A keeps a current block in memory and uploads it as a part whenever it fills. This is actually probably efficient. But it relies on not having to supply the intended number of blocks when starting an upload.
Agreed - any streaming use, or working with an unknown size, is a good fit for multipart, or at least for an "upload manager" that will perform a multipart or single-object upload.
To limit the scope of the implementation, this is an intentional limitation: I didn't want the client to call lakeFS for each part.
When working with a local file whose size is known in advance, the optional list as part of the create (start multipart) call handles it in a single request.
The suggested API doesn't require the server to return any presigned parts, so we can include this capability later without breaking the API.
OK, so perhaps document the next possible step -- it isn't clear?
For instance somewhere...
### Next steps
None of the returned URLs has to be used; it is fine to ask for more than are needed. In future we may add an _additional_ API call to get URLs for uploading more parts. This will allow more "streaming" uses, for instance paralleling how Hadoop S3A uses the S3 MPU API. The current API is compatible with such an addition.
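To make that next step concrete, here is a rough OpenAPI-style sketch of what such an additional call could look like. Everything below is illustrative only: the path suffix, operation ID, and field names (`count`, `presigned_urls`) are assumptions, not part of the current proposal.

```yaml
# Hypothetical future addition -- not part of the current proposal.
# Returns additional presigned URLs for an in-progress multipart upload,
# enabling "streaming" clients that do not know the part count up front.
/repositories/{repository}/branches/{branch}/staging/multipart/{uploadId}/parts:
  post:
    operationId: addMultipartUploadParts   # operation name is an assumption
    parameters:
      - in: query
        name: count              # how many additional presigned part URLs to return
        schema:
          type: integer
          minimum: 1
    responses:
      "200":
        description: additional presigned part URLs for this upload ID
        content:
          application/json:
            schema:
              type: object
              properties:
                presigned_urls:   # field name is an assumption
                  type: array
                  items:
                    type: string
```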
Uploading Parts: Clients can upload individual parts of the file in parallel or in sequence. Each part will be uploaded using the presigned URL provided by the initial call.
Support for Large Files: The API will handle files of substantial size, ensuring that large datasets can be uploaded without issues. Minimum part size will be 5M.
Minimum part size will be 5M.
This sounds suspiciously like the S3 limitation -- an abstraction leak. So if we expose this, it should at least be the minimum part size of {S3, GCS, Azure BlobStore}.
I'll update it so that the minimum size of each part (excluding the last one) depends on the underlying blockstore implementation.
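One way to express that, sketched below, is to keep the numeric limit out of the schema entirely and describe it in prose, leaving enforcement to the server per blockstore. The field name and wording here are illustrative only, not taken from the proposal.

```yaml
# Illustrative only -- the actual schema is defined in the proposal itself.
size_bytes:
  type: integer
  format: int64
  description: >
    Size of the uploaded part in bytes. Every part except the last must
    meet the minimum part size of the underlying blockstore (for example,
    5 MiB on S3). The server validates this; the limit is intentionally
    not encoded in the schema.
```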
**Paths for multipart upload operations**
```yaml=
Suggested change: use `yaml` instead of `yaml=` as the code-fence language.
Thanks - my markdown editor uses `yaml=` to render YAML with line numbers.
locations:
  type: array
  items:
    $ref: "#/components/schemas/StagingLocation"
Obviously if we don't require the client to supply the number of parts in advance, then we cannot do this. We will need another API call to get another presigned URL for an upload ID.
True, in the current spec the number of parts in the request is optional, but it will be required by the server.
And the server response includes StagingLocations as optional, in case we would like to support dynamically requesting part presigned URL(s).
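For clarity, here is a rough sketch of the request/response shape described above - an optional part count in the request and an optional list of staging locations in the response. Apart from `StagingLocation`, the schema and field names are assumptions, not the spec itself.

```yaml
# Illustrative sketch of the create-multipart-upload shapes discussed above.
CreateMultipartUpload:
  type: object
  properties:
    parts:                  # optional in the request; the server may still require it
      type: integer
      description: number of presigned part URLs to return
CreateMultipartUploadResponse:
  type: object
  properties:
    upload_id:
      type: string
    physical_address:
      type: string
    locations:              # optional, so presigned URLs could later be requested dynamically
      type: array
      items:
        $ref: "#/components/schemas/StagingLocation"
```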
parts:
  type: array
  items:
    $ref: "#/components/schemas/UploadPart"
Perhaps worth mentioning in the description that the object is created from these parts in the specified order -- NOT in the order in which the objects were uploaded, or in the order that `createMultipartUpload` returned the parts.
Good point - the parts should be sorted by part number.
Amusingly, the S3 API does not document such a requirement AFAICT? But this is fine.
Yeah, you learn the hard way.
Error Code: InvalidPartOrder
Description: The list of parts was not in ascending order. The parts list must be specified in order by part number.
HTTP Status Code: 400 Bad Request
Documented in https://docs.aws.amazon.com/cli/latest/reference/s3api/complete-multipart-upload.html
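For illustration, a complete-multipart-upload request body would then look something like the sketch below, with parts listed in ascending `part_number` order; the `etag` field name and the sample values are assumptions based on the schema snippets quoted in this review.

```yaml
# Illustrative body for completing the upload.
# Parts must be listed in ascending part_number order -- S3 rejects
# out-of-order lists with InvalidPartOrder (HTTP 400).
parts:
  - part_number: 1
    etag: "a54357aff0632cce46d942af68356b38"
  - part_number: 2
    etag: "0c78aef83f66abc1fa1e8477f296d394"
  - part_number: 3
    etag: "acbd18db4cc2f85cedef654fccc4a4d8"
```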
part_number:
  type: integer
  minimum: 1
  maximum: 1000
S3 allows 10_000 parts. In any case I do not believe that this limitation should be in the OpenAPI spec: if I later decide to change it on lakeFS, old clients should be able to use this higher maximum value.
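Concretely, dropping the hard maximum from the schema could look like the sketch below: the limit lives only in the description, so the server can raise it later without breaking old clients. The description text is illustrative, not the final spec wording.

```yaml
part_number:
  type: integer
  minimum: 1
  description: >
    Part number within the upload. The current server-side limit follows
    S3 (up to 10,000 parts) but is intentionally not encoded as a schema
    maximum, so it can be changed without breaking existing clients.
```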
- **Minimum part size:** Each part must be at least 5MB in size. This is a temporary constraint and isn't currently configurable or discoverable. It will become an option when additional storage options are supported.
- **Initiating the upload:** When starting a multipart upload, you'll need to specify the total number of parts in your request. This reduces the requests for per-part presigned URLs when the client already knows the size.
- **Presigning a part:** Presigning a specific part URL for upload will not be supported. This blocks uploads of unknown size using this API.
- **Limited support:** Multipart uploads are not currently supported.
?
updated
Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>
- Change from 1000 to 10000 parts limit
- Remove the limit from the OpenAPI spec
Replace yaml= to yaml
limited support
Added this one as a limitation; didn't want to open another API which I will not use.
This proposal is on point.
This needs to be part of the API! ✌🏻
I added a few minor comments.
**Paths for multipart upload operations**

```yaml
/repositories/{repository}/branches/{branch}/staging/multipart:
```
If we're planning to support both pre-signed and local credentials eventually, it would make sense to add a `presign` query param here to align with other endpoints. Initially, we can set its default to true and return HTTP status 400 (or 405) if it's set to false. Obviously, this limitation would need to be documented.
Good point!
- Didn't want to open another interface to upload data through lakeFS.
- The requests to create/complete/abort a multipart upload work on an upload ID that represents the operation. The presigned parts are currently returned by the CreateMultipartUpload operation only if you specify the number of parts you would like to get, and it is optional. It means that if, in the future, we would like to support uploading part data through lakeFS, we can still provide this API.
- I will update the spec to rename `parts` to `presigned_parts` to indicate this number represents the number of URLs the call will return.
I don't see the alternative as uploading through lakeFS. I see the non-pre-signed version of this using local credentials on the client, i.e., the client uses the AWS SDK to upload directly to S3, according to a physical address returned from this endpoint.
But I'm also ok with not doing this now and seeing if there's a need there.
Both ways - using presigned and non-presigned URLs - require local credentials to work.
Supporting multipart upload using the API while the data transfer goes through lakeFS is possible, but I would like to avoid it and just provide more ways to give the client direct access to the specific object using the underlying object store.
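If the `presign` query parameter suggested above is added later, a minimal sketch could look like the following, defaulting to `true` and rejecting `false` until a non-presigned mode exists. The parameter wiring and error-code choice are assumptions, not part of the current proposal.

```yaml
# Hypothetical presign query parameter, aligned with other endpoints.
parameters:
  - in: query
    name: presign
    schema:
      type: boolean
      default: true
    description: >
      Return presigned URLs for part uploads. Only true is currently
      supported; false returns HTTP 400 until a non-presigned (local
      credentials) mode is implemented.
```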
Thanks!
Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>
Co-authored-by: eladlachmi <110764839+eladlachmi@users.noreply.github.com>
Thanks. About the multipart upload complete API parts order - I responded in the previous comment, and I'll paste it here too:
Documented in https://docs.aws.amazon.com/cli/latest/reference/s3api/complete-multipart-upload.html
LGTM!
### Support and discovery

Presign support is a capability that lakectl discovers before switching to using presign for upload or download from lakeFS.
Multipart upload support will be part of the storage capabilities: an optional `multipart_upload_support` field that, when set to `true`, means the user can perform multipart uploads using the new API.
Why do we actually need this extra param?
Can't we rely on the block adapter type for now?
It means that lakectl as a client will need to check:
- s3 block adapter
- presigned support enabled
- lakeFS version is above v1.x, which includes this feature
For 1 and 2, the client can get this information from the GetConfig API.
Regarding no. 3, I think it is not necessary; IMHO it's perfectly fine for the server to return 501 and for the client to handle it appropriately.
Adding this field will cause ambiguity, since it is a direct result of `blockstore_type` + `pre_sign`.
I took Elad's suggestion: like a feature flag, it represents the capability. If we would like to support multipart through the lakeFS API, we can.
The second issue, try-and-fail, does not help with client-side code when you want to sync files in parallel while trying to figure out whether you can use this feature - possible, but not clean.
Also, an unsupported-operation status code in the logs vs. a clean 200 OK - I prefer the latter.
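As discussed, the capability would surface as an optional boolean in the storage config returned by GetConfig. A minimal sketch, where all field names except `multipart_upload_support` are assumptions rather than the spec itself:

```yaml
# Illustrative storage-config fragment; only multipart_upload_support is
# named in the proposal, the surrounding fields are assumptions.
StorageConfig:
  type: object
  properties:
    blockstore_type:
      type: string
    pre_sign_support:
      type: boolean
    multipart_upload_support:
      type: boolean
      description: >
        When true, the client may perform multipart uploads using the new API.
```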
Very cool design!
One question regarding the new config param.
Co-authored-by: N-o-Z <ozery.nir@gmail.com>
Co-authored-by: N-o-Z <ozery.nir@gmail.com>
Co-authored-by: N-o-Z <ozery.nir@gmail.com>
Approved, thanks!