fix(dataset): enforce max file size for multipart upload #4146
base: main
Conversation
xuang7
left a comment
Thanks for the PR. LGTM!
```scala
val prefixBytes: Long = partSizeBytesValue * nMinus1
if (prefixBytes > fileSizeBytesValue) {
  throw new WebApplicationException(
    "Upload session is inconsistent (prefixBytes > fileSizeBytes). Re-init the upload.",
```
For error messages, consider simplifying them or using “restart the upload” instead of “re-init the upload” for consistency. It may be more intuitive.
Sounds good, thanks.
I have changed them; can you let me know if they are still too complex?
Thanks
aicam
left a comment
I think the life cycle of upload session records needs a better design; if needed, we can meet to discuss it.
```scala
val fileSizeBytesValue: Long = session.getFileSizeBytes
val partSizeBytesValue: Long = session.getPartSizeBytes

if (fileSizeBytesValue <= 0L) {
```
After a failed upload, the record should be deleted from the database. I suggest moving the error-catching logic into a function so that, if any check fails, the database records are removed; this way you write the recycling logic only once.
- Since this is the endpoint for uploading a part, no record is created in this endpoint at the moment, so no record should be deleted here in case of errors; the current logic relies on all the part rows being created in the init phase.
- Regarding the refactor, do you mean each check that throws an exception should have its own function?
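For illustration, a minimal sketch of the consolidation being suggested, assuming a hypothetical session type and cleanup callback (neither exists under these names in the PR):

```scala
import javax.ws.rs.WebApplicationException

// Hypothetical sketch: run every validation in one place so the shared
// cleanup ("recycling") logic is written exactly once. UploadSessionRow and
// the cleanup callback are stand-ins, not names from the codebase.
trait UploadSessionRow {
  def getFileSizeBytes: Long
  def getPartSizeBytes: Long
}

def validateSessionOrFail(session: UploadSessionRow)(cleanup: () => Unit): Unit = {
  val firstFailure: Option[String] = Seq(
    Option.when(session.getFileSizeBytes <= 0L)("fileSizeBytes must be positive."),
    Option.when(session.getPartSizeBytes <= 0L)("partSizeBytes must be positive.")
  ).flatten.headOption

  firstFailure.foreach { msg =>
    cleanup() // e.g. delete the session rows created during the init phase
    throw new WebApplicationException(msg)
  }
}
```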
```scala
case e: DataAccessException
    if Option(e.getCause)
      .collect { case s: SQLException => s.getSQLState }
      .contains("55P03") =>
```
Replace the 55P03 with a constant or something more meaningful
I suggest we do that in another PR, since that code is not introduced by this one.
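For reference, a minimal sketch of what that follow-up could look like, assuming jOOQ's `DataAccessException` as the surrounding code suggests; the object and constant names are suggestions, not existing code (`55P03` is PostgreSQL's SQLSTATE for `lock_not_available`):

```scala
import java.sql.SQLException
import org.jooq.exception.DataAccessException

object SqlStates {
  // PostgreSQL SQLSTATE 55P03 is lock_not_available (a lock could not be
  // acquired, e.g. under SELECT ... FOR UPDATE NOWAIT).
  val LockNotAvailable: String = "55P03"

  // Hypothetical guard helper so the match arm reads as plain English.
  def isLockNotAvailable(e: DataAccessException): Boolean =
    Option(e.getCause)
      .collect { case s: SQLException => s.getSQLState }
      .contains(LockNotAvailable)
}
```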
```scala
try LakeFSStorageClient.parsePhysicalAddress(physicalAddr)
catch {
  case e: IllegalArgumentException =>
    throw new WebApplicationException(
```
In initMultipartUpload, validateAndNormalizeFilePathOrThrow is used to make sure the physical address is correct. Why do we check again here? This is duplicate logic.
My reasoning was that since file path validation is a very important security measure, the check should live here as well, even if it feels repetitive. Please confirm if you still want it removed.
```scala
did: Integer,
encodedFilePath: String,
numParts: Optional[Integer],
fileSizeBytes: Optional[java.lang.Long],
```
Why does this parameter receive java.lang.Long, which is then converted to Scala on line 1487? I think it's better to receive a Scala Long in the first place.
Agree
Done
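As a side note on why the boxed type shows up at all: with JAX-RS (assuming the `javax.ws.rs` flavor, which the `WebApplicationException` usage above suggests), an absent query parameter arrives as an empty `Optional` of the boxed `java.lang.Long`, so one pattern is to unwrap to a Scala `Long` once at the boundary. A minimal sketch; the helper name is hypothetical:

```scala
import java.util.Optional
import javax.ws.rs.WebApplicationException

// Hypothetical helper: unwrap the boxed Java value once at the JAX-RS
// boundary so the rest of the handler works with Scala's primitive Long.
def requiredLong(name: String, value: Optional[java.lang.Long]): Long =
  if (value.isPresent) value.get.longValue
  else throw new WebApplicationException(s"$name is required.")
```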
What changes were proposed in this PR?
- Enforce the `single_file_upload_max_size_mib` limit for multipart uploads at init by requiring `fileSizeBytes` + `partSizeBytes` and rejecting when the total declared file size exceeds the configured max.
- Add `file_size_bytes` and `part_size_bytes` to `dataset_upload_session`, plus constraints to keep them valid.
- Harden `uploadPart` against size bypasses by computing the expected part size from the stored session metadata and rejecting any request whose `Content-Length` does not exactly match the expected size (including the final part).
- Update the client to send `fileSizeBytes` and `partSizeBytes` when initializing multipart uploads.
- Add a migration script (`sql/updates/18.sql`) to apply the schema change on existing deployments.
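To make the part-size check concrete, here is a minimal sketch (not the PR's exact code) of the arithmetic the description outlines, matching the `prefixBytes = partSizeBytes * nMinus1` computation visible in the diff:

```scala
// Every part except the last must be exactly partSizeBytes; the final part
// carries the remainder. A request is rejected unless Content-Length matches.
def expectedPartSize(fileSizeBytes: Long, partSizeBytes: Long,
                     numParts: Int, partNumber: Int): Long =
  if (partNumber < numParts) partSizeBytes
  else fileSizeBytes - partSizeBytes * (numParts - 1L)

def contentLengthMatches(contentLength: Long, fileSizeBytes: Long,
                         partSizeBytes: Long, numParts: Int, partNumber: Int): Boolean =
  contentLength == expectedPartSize(fileSizeBytes, partSizeBytes, numParts, partNumber)
```

For example, a 10 MiB file uploaded in 4 MiB parts has 3 parts and a 2 MiB final part; any other `Content-Length` on part 3 is rejected.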
Any related issues, documentation, discussions?
Close #4147
How was this PR tested?
Added/updated unit tests for multipart upload validation and malicious cases, including:
- `Content-Length` mismatch rejection (non-numeric / overflow / mismatch)
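A hedged sketch of what one such test could look like, using ScalaTest; the suite name is hypothetical and the helper under test restates the `expectedPartSize` sketch above rather than the PR's actual code:

```scala
import org.scalatest.funsuite.AnyFunSuite

class MultipartSizeCheckSpec extends AnyFunSuite {
  // Mirrors the sketch above: the final part carries the remainder.
  private def expectedPartSize(fileSize: Long, partSize: Long,
                               numParts: Int, part: Int): Long =
    if (part < numParts) partSize else fileSize - partSize * (numParts - 1L)

  test("a padded Content-Length on the final part is rejected") {
    // 10 MiB file, 4 MiB parts => 3 parts, final part is exactly 2 MiB
    val expected = expectedPartSize(10L << 20, 4L << 20, numParts = 3, part = 3)
    assert(expected == (2L << 20))
    assert((4L << 20) != expected) // an oversized Content-Length must not match
  }
}
```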
Was this PR authored or co-authored using generative AI tooling?
Co-authored-by: ChatGPT