HADOOP-17922. move to fs.s3a.encryption.algorithm - JCEKS integration (#3466) #3508

Closed
wants to merge 4 commits into from

Conversation

mehakmeet
Contributor

The resolution order of new and deprecated s3a encryption options and secrets is the same
whether they are stored in JCEKS and other Hadoop credential stores or in XML files:
per-bucket settings always take priority over global values,
even when the bucket-level options use the old option names.

Contributed by Mehakmeet Singh and Steve Loughran
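
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the S3A implementation: the class and method names are hypothetical, the map stands in for the merged view of XML files and credential stores, and checking the new name before the old name within the same scope is an assumption (the description only states that bucket-level values beat global ones). The `fs.s3a.bucket.<bucket>.` prefix is the usual S3A per-bucket option form.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the resolution order; not the actual S3A code. */
final class EncryptionOptionLookupSketch {
    static final String NEW_SUFFIX = "encryption.algorithm";
    static final String OLD_SUFFIX = "server-side-encryption-algorithm";

    /**
     * Return the first non-empty value found, checking bucket-level keys
     * (new name, then old name) before global keys (new name, then old name).
     * The new-before-old order inside one scope is an assumption.
     */
    static String resolveAlgorithm(Map<String, String> options, String bucket) {
        String[] candidates = {
            "fs.s3a.bucket." + bucket + "." + NEW_SUFFIX,
            "fs.s3a.bucket." + bucket + "." + OLD_SUFFIX,
            "fs.s3a." + NEW_SUFFIX,
            "fs.s3a." + OLD_SUFFIX,
        };
        for (String key : candidates) {
            String value = options.get(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "";  // nothing configured
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        // global value under the new name
        options.put("fs.s3a.encryption.algorithm", "CSE-KMS");
        // bucket-level value under the old name
        options.put("fs.s3a.bucket.example-bucket.server-side-encryption-algorithm", "SSE-KMS");

        // the bucket-level old name wins over the global new name
        System.out.println(resolveAlgorithm(options, "example-bucket")); // SSE-KMS
        // a bucket without its own setting falls back to the global value
        System.out.println(resolveAlgorithm(options, "other-bucket"));   // CSE-KMS
    }
}
```

The point of the sketch is only that scope (bucket vs. global) is decided before the option name (new vs. deprecated) is considered.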

mehakmeet and others added 4 commits August 10, 2021 16:29
apache#2706)

This (big!) patch adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.

Read the documentation in encryption.md very, very carefully before
use and consider it unstable.

S3-CSE is enabled through the existing configuration option
"fs.s3a.server-side-encryption-algorithm":

fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>

You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.

* Filesystem list/get status operations subtract 16 bytes from the length
  of all files >= 16 bytes long to compensate for the padding which CSE
  adds.
* The SDK always warns that the specific algorithm chosen is
  deprecated. This algorithm is required for ranged GET requests
  (i.e. random IO) to work, so ignore the warning.
* Unencrypted files CANNOT BE READ.
  The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
  written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.
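
As a rough illustration of the length adjustment in the first bullet, the following sketch subtracts the 16 bytes of CSE padding from any stored length of at least 16 bytes and leaves shorter lengths untouched. The class, constant, and method names are hypothetical and are not taken from the patch.

```java
/** Hypothetical helper illustrating the length adjustment; not the S3A code. */
final class CseLengthSketch {
    /** Padding which CSE adds to each object, per the description above. */
    static final long CSE_PADDING_LENGTH = 16;

    /** Length to report for a file whose stored (padded) length is given. */
    static long unpaddedLength(long storedLength) {
        return storedLength >= CSE_PADDING_LENGTH
            ? storedLength - CSE_PADDING_LENGTH
            : storedLength;
    }

    public static void main(String[] args) {
        System.out.println(unpaddedLength(1040)); // 1024
        System.out.println(unpaddedLength(10));   // 10: shorter than the padding, left as-is
    }
}
```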

Contributed by Mehakmeet Singh
…d enabled (apache#3239)

S3A S3Guard tests to skip if S3-CSE are enabled (apache#3263)

    Follow on to
    * HADOOP-13887. Encrypt S3A data client-side with AWS SDK (S3-CSE)

    If the S3A bucket is set up to use S3-CSE encryption, all tests which turn
    on S3Guard are skipped, so they don't raise any exceptions about
    incompatible configurations.

Contributed by Mehakmeet Singh
This migrates the fs.s3a server-side encryption configuration options
to names which cover client-side encryption too.

fs.s3a.server-side-encryption-algorithm becomes fs.s3a.encryption.algorithm
fs.s3a.server-side-encryption.key becomes fs.s3a.encryption.key

The existing keys remain valid; they are simply deprecated and remapped
to the new names. If you want server-side encryption options
to be picked up regardless of Hadoop version, use
the old keys.

(the old key also works for CSE, though as no version of Hadoop
with CSE support has shipped without this remapping, it's less
relevant)
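
The remapping can be pictured with a small stand-alone sketch. It does not use Hadoop's own configuration deprecation machinery; the class and method names are hypothetical, and the tie-break (a value already set under the new name is kept when the old name is also set) is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of old-to-new key remapping; not the Hadoop code. */
final class DeprecatedKeySketch {
    /** Old option name to new option name, as listed in the commit message. */
    static final Map<String, String> DEPRECATED_TO_NEW = new HashMap<>();
    static {
        DEPRECATED_TO_NEW.put("fs.s3a.server-side-encryption-algorithm",
            "fs.s3a.encryption.algorithm");
        DEPRECATED_TO_NEW.put("fs.s3a.server-side-encryption.key",
            "fs.s3a.encryption.key");
    }

    /** Return a copy of the options in which old names are stored under new names. */
    static Map<String, String> remap(Map<String, String> raw) {
        Map<String, String> out = new HashMap<>();
        // pass 1: copy every key that is not deprecated
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (!DEPRECATED_TO_NEW.containsKey(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        // pass 2: old names fill in the new name only if it is still unset (assumption)
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String newKey = DEPRECATED_TO_NEW.get(e.getKey());
            if (newKey != null) {
                out.putIfAbsent(newKey, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("fs.s3a.server-side-encryption-algorithm", "SSE-KMS");
        // the value set under the old name is visible under the new name
        System.out.println(remap(raw).get("fs.s3a.encryption.algorithm")); // SSE-KMS
    }
}
```

A configuration written with only the old keys therefore behaves the same on releases with and without the rename, which is why the old keys are the portable choice.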


Contributed by: Mehakmeet Singh
…apache#3466)

The resolution order of new and deprecated s3a encryption options and secrets is the same
whether they are stored in JCEKS and other Hadoop credential stores or in XML files:
per-bucket settings always take priority over global values,
even when the bucket-level options use the old option names.

Contributed by Mehakmeet Singh and Steve Loughran
@mehakmeet
Contributor Author

Tested on tip of the chain of commits:
Region: ap-south-1
mvn clean verify -Dparallel-tests -DtestsThreadCount=4 -Dscale

CSE:

[INFO] Results: [INFO] [WARNING] Tests run: 586, Failures: 0, Errors: 0, Skipped: 5

[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   ITestS3AMiscOperationCost.testGetContentSummaryRoot:96->AbstractS3ACostTest.verifyMetrics:376->lambda$testGetContentSummaryRoot$1:96->getContentSummary:140 » TestTimedOut
[ERROR]   ITestS3AMiscOperationCost.testGetContentSummaryRoot:96->AbstractS3ACostTest.verifyMetrics:376->lambda$testGetContentSummaryRoot$1:96->getContentSummary:140 » TestTimedOut
[INFO] 
[ERROR] Tests run: 1467, Failures: 0, Errors: 2, Skipped: 637
[ERROR]   ITestS3AFileContextStatistics>FCStatisticsBaseTest.testStatistics:103->verifyWrittenBytes:96 Mismatch in bytes written expected:<512> but was:<698>
[ERROR] Errors: 
[ERROR]   ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRecursiveRootListing:267 » TestTimedOut
[INFO] 
[ERROR] Tests run: 151, Failures: 2, Errors: 1, Skipped: 28

The timeouts are due to my setup/bandwidth; they happen on trunk for me as well.
ITestS3AFileContextStatistics fails because of a bug that needs fixing.

non-CSE

[INFO] Results: [INFO] [WARNING] Tests run: 586, Failures: 0, Errors: 0, Skipped: 5
[INFO] [ERROR] Tests run: 1467, Failures: 0, Errors: 2, Skipped: 467
[INFO] [ERROR] Tests run: 151, Failures: 1, Errors: 1, Skipped: 28

CSE-S3Guard

[INFO] Results: [INFO] [WARNING] Tests run: 586, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results: [INFO] [WARNING] Tests run: 1467, Failures: 0, Errors: 0, Skipped: 1256
[INFO] Results: [INFO] [WARNING] Tests run: 151, Failures: 0, Errors: 0, Skipped: 92

non-CSE-S3Guard

[INFO] Results: [INFO] [WARNING] Tests run: 586, Failures: 0, Errors: 0, Skipped: 5
[INFO] [ERROR] Tests run: 1467, Failures: 0, Errors: 4, Skipped: 388
[INFO] [ERROR] Tests run: 11, Failures: 0, Errors: 3, Skipped: 0

CC: @steveloughran

@steveloughran
Contributor

So this is the full chain of commits? And there have been no changes other than cherry-picking onto branch-3.3?

If so, +1 pending yetus. I can check out the branch and then commit the sequence locally, without having to merge the commits.

@mehakmeet
Contributor Author

Yes, this is the tip branch. The commits, in order, are #3292, #3506, #3507, and then this one. I am not sure how the merge works in this case: would it merge all the PRs after the tip is merged?

#3506 was two Jiras made into one commit, as it's just the CSE-S3Guard related IOE and then the test skips; I thought it better to make that one commit.
In the last commit, there was a mismatch between the lang and lang3 Java StringUtils. I think 3.3 can't have lang, so I replaced those with the lang3 StringUtils.
The rest is the same.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 6m 35s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 2s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 34 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +0 🆗 | mvndep | 12m 19s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 21m 5s | | branch-3.3 passed |
| +1 💚 | compile | 17m 29s | | branch-3.3 passed |
| +1 💚 | checkstyle | 2m 47s | | branch-3.3 passed |
| +1 💚 | mvnsite | 2m 35s | | branch-3.3 passed |
| +1 💚 | javadoc | 2m 29s | | branch-3.3 passed |
| +1 💚 | spotbugs | 3m 45s | | branch-3.3 passed |
| +1 💚 | shadedclient | 23m 59s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 32s | | the patch passed |
| +1 💚 | compile | 16m 36s | | the patch passed |
| -1 ❌ | javac | 16m 36s | /results-compile-javac-root.txt | root generated 3 new + 1953 unchanged - 2 fixed = 1956 total (was 1955) |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 2m 40s | /results-checkstyle-root.txt | root: The patch generated 13 new + 165 unchanged - 39 fixed = 178 total (was 204) |
| +1 💚 | mvnsite | 2m 33s | | the patch passed |
| +1 💚 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 💚 | javadoc | 2m 29s | | the patch passed |
| +1 💚 | spotbugs | 4m 5s | | the patch passed |
| +1 💚 | shadedclient | 23m 56s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 17m 10s | | hadoop-common in the patch passed. |
| +1 💚 | unit | 2m 28s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 170m 45s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3508/1/artifact/out/Dockerfile |
| GITHUB PR | #3508 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml markdownlint |
| uname | Linux 1cc771fa9fd0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / a98a2b1 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3508/1/testReport/ |
| Max. process+thread count | 1251 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3508/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran
Contributor

The merge button on the GitHub UI is "squash and merge", but if I check out your branch I can just cherry-pick the chain of commits on top of branch-3.3.

@mehakmeet
Contributor Author

@steveloughran, did you mean a chain of commits in a single PR? I thought we were going to do a chain of PRs with single commits.
But I think you can still cherry-pick all the commits directly from this branch, as it has all of them. After merging, we can close the other PRs as done.

@mehakmeet
Contributor Author

merged in branch-3.3

@mehakmeet closed this Oct 5, 2021