@suricactus
Collaborator

The server side currently takes a shortcut when computing the `md5sum` field: it uses the Object Storage (S3) ETag value.

For a file uploaded in a single request, the ETag is the MD5 of the file contents. For a multipart upload, however, the ETag is computed from the concatenation of the MD5s of the uploaded parts.

Say you uploaded a 14MB file and your part size is 5MB. Calculate 3 MD5 checksums, one per part: the checksum of the first 5MB, of the second 5MB, and of the last 4MB. Then take the MD5 of their concatenation.
MD5 checksums are usually shown as hex strings, so make sure you take the MD5 of the concatenated raw 16-byte digests, not of the ASCII or UTF-8 encoded hex strings.
When that's done, append a hyphen and the number of parts to get the ETag.
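
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from this PR: the function name `multipart_etag` and its parameters are hypothetical, it reads the whole file into memory for brevity, and it assumes that a file no larger than one part was uploaded in a single request (so its ETag is the plain MD5) and that the object store does not alter the ETag, for example through server-side encryption.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute an S3-style ETag for `data` split into `part_size`-byte parts.

    Hypothetical helper for illustration only.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")

    # Split the payload into consecutive parts; the last one may be shorter.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]

    # Assumption: a payload that fits in one part was not uploaded as
    # multipart, so its ETag is simply the MD5 of the contents.
    if len(parts) <= 1:
        return hashlib.md5(data).hexdigest()

    # Concatenate the raw 16-byte digests (not their hex strings).
    raw_digests = b"".join(hashlib.md5(part).digest() for part in parts)

    # MD5 of the concatenation, then a hyphen and the number of parts.
    return f"{hashlib.md5(raw_digests).hexdigest()}-{len(parts)}"
```

For the 14MB example with a 5MB part size, `multipart_etag(data, 5 * 1024 * 1024)` yields a 32-character hex string followed by `-3`. The part size has to match the one used for the actual upload, otherwise the result will not match the stored ETag.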

@suricactus merged commit 15b5db6 into master on May 15, 2024
@suricactus deleted the QF-3813-etag branch on May 15, 2024 15:09
@suricactus requested a review from boardend on May 21, 2024 08:19

@boardend left a comment

Looks good to me

resp = self._request("GET", f"files/{project_id}", params=params)
return resp.json()
remote_files = resp.json()
# TODO remove this temporary decoration with `etag` key

Should this TODO just stay as a reminder for whoever works on this next?

Collaborator Author

@suricactus May 21, 2024

Yes. This decoration will become obsolete once the "file metadata in database" feature is implemented.
