[object_store] Support adding MD5 checksum headers to validate object integrity #6914

Open
tzembo opened this issue Dec 26, 2024 · 0 comments · May be fixed by #6915
Labels
enhancement Any new improvement worthy of a entry in the changelog


tzembo commented Dec 26, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I'd like to verify the integrity of uploaded objects (using some kind of checksum) across all three cloud providers (AWS, Azure, and GCP).

Currently, the S3 implementation allows setting an AmazonS3 configuration value that attaches a x-amz-checksum-sha256 header to PUT requests against the store. However:

  • This is only available for AWS, and only with SHA-256.
    • AWS supports other checksum algorithms (MD5, SHA1, CRC32, CRC32C)
    • Azure supports MD5 and CRC64
    • GCP supports MD5 and CRC32C
  • This requires an extra pass over the data to calculate the checksum, even though the user of this library may already have computed it in another context.
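To make the support matrix in the bullets above concrete, here is a small Rust sketch that encodes it and computes which algorithms every provider accepts. The Algo and Provider enums and the supported/common functions are illustrative names, not part of object_store, and the table reflects only what this issue lists.

```rust
use std::collections::BTreeSet;

/// Checksum algorithms named in this issue (illustrative, not object_store's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Algo {
    Md5,
    Sha1,
    Sha256,
    Crc32,
    Crc32c,
    Crc64,
}

#[derive(Debug, Clone, Copy)]
enum Provider {
    Aws,
    Azure,
    Gcp,
}

/// Upload checksum support per provider, as listed in the bullets.
fn supported(p: Provider) -> BTreeSet<Algo> {
    use Algo::*;
    let algos: &[Algo] = match p {
        Provider::Aws => &[Md5, Sha1, Sha256, Crc32, Crc32c],
        Provider::Azure => &[Md5, Crc64],
        Provider::Gcp => &[Md5, Crc32c],
    };
    algos.iter().copied().collect()
}

/// Algorithms accepted by every provider in `providers` (empty set if none given).
fn common(providers: &[Provider]) -> BTreeSet<Algo> {
    providers
        .iter()
        .map(|&p| supported(p))
        .reduce(|acc, s| acc.intersection(&s).copied().collect())
        .unwrap_or_default()
}

fn main() {
    let all = [Provider::Aws, Provider::Azure, Provider::Gcp];
    println!("{:?}", common(&all)); // {Md5}
}
```

The intersection of all three sets is just MD5, which is the basis for starting with MD5 alone.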

Describe the solution you'd like

I propose adding a Checksum attribute variant parameterized by a ChecksumAlgorithm enum. The attribute's value would be the base64-encoded checksum. For now, MD5 can be the only supported algorithm, since all three cloud providers support it via the Content-MD5 header. Its value is the base64 encoding of the 128-bit (16-byte) digest.

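```rust
pub enum Attribute {
    // ...
    /// Provides a checksum used to verify object data integrity
    Checksum(ChecksumAlgorithm),
}

/// Checksum algorithms a store can be asked to verify (MD5 only, for now)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ChecksumAlgorithm {
    MD5,
}
```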
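The sketch below shows one way a store could turn the proposed attribute into the Content-MD5 header while rejecting values that are not shaped like a base64 MD5 digest. It uses self-contained stand-ins for Attribute and ChecksumAlgorithm with String values; object_store's real Attribute has more variants and stores an AttributeValue, and to_headers and is_base64_md5 are hypothetical helpers.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins mirroring the proposal, not object_store's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ChecksumAlgorithm {
    MD5,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Attribute {
    ContentType,
    Checksum(ChecksumAlgorithm),
}

/// Rough shape check: a 16-byte digest in base64 is 24 chars ending in "==".
fn is_base64_md5(value: &str) -> bool {
    let bytes = value.as_bytes();
    bytes.len() == 24
        && value.ends_with("==")
        && bytes[..22]
            .iter()
            .all(|&b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}

/// Translates attributes into (header, value) pairs, sorted for determinism.
fn to_headers(
    attrs: &HashMap<Attribute, String>,
) -> Result<Vec<(&'static str, String)>, String> {
    let mut headers = Vec::new();
    for (attr, value) in attrs {
        let name = match attr {
            Attribute::ContentType => "Content-Type",
            Attribute::Checksum(ChecksumAlgorithm::MD5) => {
                if !is_base64_md5(value) {
                    return Err(format!("invalid base64 MD5 digest: {value}"));
                }
                // Per the issue, all three providers accept Content-MD5.
                "Content-MD5"
            }
        };
        headers.push((name, value.clone()));
    }
    headers.sort();
    Ok(headers)
}

fn main() {
    let mut attrs = HashMap::new();
    attrs.insert(Attribute::ContentType, "text/plain".to_string());
    attrs.insert(
        Attribute::Checksum(ChecksumAlgorithm::MD5),
        "1B2M2Y8AsgTpgAmYHsQnfg==".to_string(),
    );
    for (name, value) in to_headers(&attrs).expect("valid attributes") {
        println!("{name}: {value}");
    }
}
```

Because the caller supplies the digest, the store does not need its own pass over the payload. The shape check is deliberately rough: it catches a common mistake such as passing a 32-character hex digest, but not every malformed value.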

Describe alternatives you've considered

I considered implementing this for more checksum algorithms, but I'm starting with MD5 because it's the only one supported by all three cloud providers. In the future, we could extend this to support additional checksum algorithms (e.g. CRC32C). However:

  • stores that do not support a particular algorithm would need to return an error
  • stores would ideally be able to "report" their supported checksum algorithms (for usability reasons)
  • (AWS) we'd need to figure out how a SHA-256 checksum provided via attributes interacts with the one provided via config
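If more algorithms are added later, the first two points above could be handled with a capability trait along these lines. Everything here is hypothetical: the ChecksumSupport trait, the UnsupportedChecksum error, the extra CRC32C and CRC64 variants, and the GcsLike store are illustrative names, with GcsLike's list mirroring the GCP support stated earlier in this issue. The SHA-256 config interaction in the third point is not addressed.

```rust
use std::fmt;

/// Hypothetical extended enum; the proposal itself only has MD5.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChecksumAlgorithm {
    MD5,
    CRC32C,
    CRC64,
}

/// Error a store would return for an algorithm it cannot verify.
#[derive(Debug, PartialEq, Eq)]
struct UnsupportedChecksum(ChecksumAlgorithm);

impl fmt::Display for UnsupportedChecksum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "checksum algorithm {:?} is not supported by this store", self.0)
    }
}

impl std::error::Error for UnsupportedChecksum {}

/// Hypothetical capability trait: each store reports what it can verify.
trait ChecksumSupport {
    fn supported_checksums(&self) -> &[ChecksumAlgorithm];

    /// Default check a put path could run before sending any bytes.
    fn check_checksum(&self, algo: ChecksumAlgorithm) -> Result<(), UnsupportedChecksum> {
        if self.supported_checksums().contains(&algo) {
            Ok(())
        } else {
            Err(UnsupportedChecksum(algo))
        }
    }
}

/// Illustrative store accepting MD5 and CRC32C.
struct GcsLike;

impl ChecksumSupport for GcsLike {
    fn supported_checksums(&self) -> &[ChecksumAlgorithm] {
        &[ChecksumAlgorithm::MD5, ChecksumAlgorithm::CRC32C]
    }
}

fn main() {
    let store = GcsLike;
    for algo in [ChecksumAlgorithm::MD5, ChecksumAlgorithm::CRC32C, ChecksumAlgorithm::CRC64] {
        match store.check_checksum(algo) {
            Ok(()) => println!("{algo:?}: accepted"),
            Err(e) => println!("{algo:?}: {e}"),
        }
    }
}
```

Putting the check in a default trait method means each store only declares its list, and callers can inspect supported_checksums() up front instead of discovering failures at upload time.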

I considered calling the attribute ContentMD5 but decided to make it a bit more generic to support additional checksums in the future.

Additional context

I can put up a PR for this issue.

tzembo added the enhancement label on Dec 26, 2024
tzembo linked a pull request on Dec 26, 2024 that will close this issue