Skip to content

Conversation

@ripplehang
Copy link
Contributor

@ripplehang ripplehang commented Aug 7, 2024

Rationale for this change

server-side encryption with customer-provided keys is an important security feature for aws s3, it's useful when user want to manager the encryption key themselves, say, they don't want the data to be exposed to the aws system admin, and ensure the object is safe even the ACCESS_KEY and SECRET_KEY is somehow leaked.
Some comparison of S3 encryption options :
https://www.linkedin.com/pulse/delusion-s3-encryption-benefits-ravi-ivaturi/

What changes are included in this PR?

  1. Add the sse_customer_key member for S3Options to support server-side encryption with customer-provided keys (SSE-C keys).

    • The sse_customer_key was expected to be 256 bits (32 bytes) according to aws doc
    • The sse_customer_key was expected to be the raw key rather than base64 encoded value, arrow would calculate the base64 and MD5 on the fly.
    • By default the sse_customer_key is empty, and when the sse_customer_key is empty, there is no impact on the existing workflow. When the sse_customer_key is configured, it would require the aws sdk version to newer than 1.9.201.
  2. Add the tls_ca_file_path, tls_ca_dir_path and tls_verify_certificates members for S3Options.

    • the tls_ca_file_path, tls_ca_dir_path member for S3Options would override the value configured by arrow::fs::FileSystemGlobalOptions.
    • for s3, according to aws sdk doc, the tls_ca_file_path and tls_ca_dir_path only take effect in Linux, in order to support connect to the the storage server like minio with self-signed certificates on non-linux platform, we expose the tls_verify_certificates.
  3. Refine the unit test to start the minio server with self-signed certificate on linux platform, so the unit test could cover the https case on linux, and http case on non-linux platform.

Are these changes tested?

Yes

Are there any user-facing changes?

Only additional members to S3Options.

@github-actions
Copy link

github-actions bot commented Aug 7, 2024

⚠️ GitHub issue #43535 has been automatically assigned in GitHub to PR creator.

@ripplehang
Copy link
Contributor Author

@pitrou @mapleFU @felipecrv can you help to review the PR, Thanks

@pitrou
Copy link
Member

pitrou commented Aug 22, 2024

I took a quick look and here are some assorted thoughts:

  • there are no unit tests
  • only the encryption key needs to be on S3Options, not the MD5 and algorithm string (which can be computed on the fly)

@pitrou
Copy link
Member

pitrou commented Aug 22, 2024

Also, for the other readers wondering, I've found a useful (though possibly outdated?) comparison of S3 encryption options here:
https://www.linkedin.com/pulse/delusion-s3-encryption-benefits-ravi-ivaturi/

@ripplehang
Copy link
Contributor Author

I took a quick look and here are some assorted thoughts:

@pitrou Thanks for your review, for #1, i would try to add ut for the change, for #2, yes, actually ONLY the Key is needed, that's why i only expose one SetSSECKey function, and three Get function, and private the new added datamember.
do you suggest to only add one datamember for S3Options? but if there is only one memeber, we may needs to calculate everywhere?

@ripplehang
Copy link
Contributor Author

@pitrou I've submitted another commit to only expose the ssec-key in S3Options, and then calculate the MD5 every time on the fly, can you help to review whether there are other comments? I would try to add the ut if there is no interface change required, Thanks

@ripplehang ripplehang force-pushed the support_ssec_for_aws_s3 branch 3 times, most recently from bb2e024 to 539fa7c Compare August 27, 2024 06:00
@ripplehang
Copy link
Contributor Author

@pitrou Can you help to review the PR again?Thanks

@ripplehang
Copy link
Contributor Author

@kou @mapleFU can you help to review the change, Thanks

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a careful around this weekend

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind change to Result<std::string>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would try to refine the return value for the function.

Comment on lines 278 to 280
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do this handles error if base64_encoded_key is not valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://github.com/aws/aws-sdk-cpp/blob/ac2da09e6930e3988d1289717e2df5d4b7408f17/src/aws-cpp-sdk-core/source/utils/base64/Base64.cpp#L91-L121, seems this aws util API didn't validate the input properly,but directly calculate the output size and allocate the buffer, so i add the check to ensure every character is the valid character here.
Meanwhile, I also add the related Unit test for the CalculateSSECKeyMD5 to test the different input value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this not equal to 256?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the expect_input_key_size is exposed via S3Options, so it's possible for this value to be set as any length.

@github-actions github-actions bot added awaiting committer review Awaiting committer review awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Aug 30, 2024
@ripplehang ripplehang requested a review from mapleFU August 30, 2024 11:28
@ripplehang ripplehang force-pushed the support_ssec_for_aws_s3 branch 3 times, most recently from de05328 to 8649388 Compare August 30, 2024 12:24
@kou kou changed the title GH-43535:[C++] support the AWS S3 SSE-C encryption GH-43535: [C++] support the AWS S3 SSE-C encryption Aug 31, 2024
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix style by pre-commit run --show-diff-on-failure --color=always --all?

Comment on lines 58 to 61
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ASSERT_RAISES_WITH_MESSAGES() instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to cover the workflow with sse_customer_key nonempty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we always set sse_custom_key, we can't test no sse_custom_key case.
Can we add a test that uses sse_custom_key instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and ideally the SSE-C encryption test would write a file with the encryption key, and check that trying to read it with another key (or with no key at all) fails.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting review Awaiting review and removed awaiting review Awaiting review labels Aug 31, 2024
@kou
Copy link
Member

kou commented Nov 3, 2024

I'll merge this in the next week if nobody objects it.

@pitrou pitrou force-pushed the support_ssec_for_aws_s3 branch from c18343f to 452185c Compare November 3, 2024 16:07
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge if CI is green, thanks for the updates @ripplehang !

@pitrou
Copy link
Member

pitrou commented Nov 3, 2024

@github-actions crossbow submit -g cpp

@github-actions

This comment was marked as outdated.

@pitrou
Copy link
Member

pitrou commented Nov 3, 2024

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Nov 3, 2024

Revision: ce67a27

Submitted crossbow builds: ursacomputing/crossbow @ actions-7d5b001eaa

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou merged commit 00e7c65 into apache:main Nov 3, 2024
38 of 39 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label Nov 3, 2024
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 00e7c65.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants