HADOOP-16794. S3A reverts KMS encryption to the bucket's default KMS … #1875

Open
wants to merge 1 commit into
base: branch-3.2

Conversation

mukund-thakur
Contributor

…key in rename/copy.

Contributed by Mukund Thakur.

This addresses an issue which surfaced with KMS encryption: the wrong
KMS key could be picked up in the S3 COPY operation, so
renamed files, while encrypted, would end up with the
bucket default key.

As well as adding tests in the new suite
ITestS3AEncryptionWithDefaultS3Settings,
AbstractSTestS3AHugeFiles has a new test method to
verify that the encryption settings also work
for large files copied via multipart operations.

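For context on the mechanism (a minimal sketch against the AWS SDK v1 that S3A uses, not the actual Hadoop patch; bucket names, object keys, and the wrapper class are illustrative): an S3 COPY only keeps the client's SSE-KMS key if the request carries it explicitly, otherwise the copied object falls back to the bucket's default key.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;

public class CopyWithKmsKey {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    String kmsKeyId = args[0];  // the SSE-KMS key the client is configured with

    CopyObjectRequest copy =
        new CopyObjectRequest("src-bucket", "src/key", "dst-bucket", "dst/key");
    // Without this, S3 encrypts the copied object with the destination
    // bucket's default KMS key rather than the key the client configured.
    copy.setSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKeyId));
    s3.copyObject(copy);
  }
}
```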
@mukund-thakur
Contributor Author

Tested against my bucket in ap-south-1 with the params -Ds3guard -Ddynamo -Dparallel-tests -DtestsThreadCount=8. All tests pass apart from ITestS3GuardToolDynamoDB.testDynamoDBInitDestroyCycle, which was failing on the base branch-3.2 as well.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 7m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ branch-3.2 Compile Tests _
+1 💚 mvninstall 18m 57s branch-3.2 passed
+1 💚 compile 0m 29s branch-3.2 passed
+1 💚 checkstyle 0m 24s branch-3.2 passed
+1 💚 mvnsite 0m 34s branch-3.2 passed
+1 💚 shadedclient 12m 49s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 25s branch-3.2 passed
+0 🆗 spotbugs 0m 50s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 47s branch-3.2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 35s the patch passed
+1 💚 compile 0m 24s the patch passed
+1 💚 javac 0m 24s the patch passed
-0 ⚠️ checkstyle 0m 16s hadoop-tools/hadoop-aws: The patch generated 7 new + 12 unchanged - 0 fixed = 19 total (was 12)
+1 💚 mvnsite 0m 27s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 9s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 23s the patch passed
+1 💚 findbugs 0m 55s the patch passed
_ Other Tests _
+1 💚 unit 4m 30s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 30s The patch does not generate ASF License warnings.
65m 13s
Subsystem Report/Notes
Docker Client=19.03.7 Server=19.03.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1875/1/artifact/out/Dockerfile
GITHUB PR #1875
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 3a8fb3d0c03d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision branch-3.2 / 369f4f9
Default Java 1.8.0_242
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1875/1/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1875/1/testReport/
Max. process+thread count 452 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1875/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@liuml07 liuml07 (Member) left a comment

+1 with one question. Thanks @mukund-thakur

CC: @steveloughran

return conf;
}

/**
* This removes the encryption settings from the
Member

This does not "remove the encryption settings from the configuration". Could you explain why that is not required in branch-3.2 and/or update the javadoc here?

Contributor Author

Yes, the comment is misleading. I had to fix the merge conflict, and since removeBaseAndBucketOverrides() was not present earlier, I deleted that line thinking it was not required.

Although I can see the same method in S3ATestUtils with different parameters, it is not used in this test.
@steveloughran Do you also think tests might break here if bucket-level configs are set?
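For reference, a hedged sketch of how trunk-era encryption test suites clear these options before applying their own settings; it assumes the trunk signatures of S3ATestUtils.removeBaseAndBucketOverrides() and the Constants names, which may differ on branch-3.2:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.S3ATestUtils;

import static org.apache.hadoop.fs.s3a.Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM;
import static org.apache.hadoop.fs.s3a.Constants.SERVER_SIDE_ENCRYPTION_KEY;

public class EncryptionTestConf {
  static Configuration patchConfiguration(Configuration conf, String bucket) {
    // Drop both the base options and any fs.s3a.bucket.<bucket>.* overrides,
    // so whatever auth-keys.xml declares cannot leak into this test.
    S3ATestUtils.removeBaseAndBucketOverrides(bucket, conf,
        SERVER_SIDE_ENCRYPTION_ALGORITHM,
        SERVER_SIDE_ENCRYPTION_KEY);
    // ...the suite then sets the algorithm/key it actually wants to test.
    return conf;
  }
}
```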

Contributor

it may well

Contributor Author

I tried setting the bucket-level conf fs.s3a.bucket.mthakur-data.server-side-encryption.key, and some tests, e.g. ITestS3AEncryptionSSES3, started failing. I even tried adding the removeBaseAndBucketOverrides call, but it doesn't help.

I tried setting the same bucket-level conf on apache/trunk as well, and the tests succeed there.

The reason for the test failure is:
java.io.IOException: AES256 is enabled but an encryption key was set in fs.s3a.server-side-encryption.key (key of length 76 ending with 8)
    at org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:1440)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:316)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3298)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)

The removeBaseAndBucketOverrides() call essentially removes this parameter; I am not sure how it gets added back in the FileSystem initialisation flow. It is hard to compare and debug the difference between branch-3.2 and trunk as there are many changes. Any pointers in this regard would be greatly appreciated.
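To make the conflict easier to reason about, here is a hedged sketch of how a per-bucket key can collide with an SSE-S3 (AES256) base setting; it assumes S3AUtils.propagateBucketOptions() behaves as on trunk, and the key ARN is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.S3AUtils;

public class PerBucketConflictDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Base setting used by the SSE-S3 test suite.
    conf.set("fs.s3a.server-side-encryption-algorithm", "AES256");
    // Per-bucket key set in auth-keys.xml for SSE-KMS experiments.
    conf.set("fs.s3a.bucket.mthakur-data.server-side-encryption.key",
        "arn:aws:kms:ap-south-1:111122223333:key/EXAMPLE");

    // During S3AFileSystem.initialize() the per-bucket value is copied back
    // onto fs.s3a.server-side-encryption.key, so getEncryptionAlgorithm()
    // sees AES256 plus an encryption key and throws the IOException quoted above.
    Configuration patched = S3AUtils.propagateBucketOptions(conf, "mthakur-data");
    System.out.println(patched.get("fs.s3a.server-side-encryption.key"));
  }
}
```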

Also, could someone please trigger the tests with their own configs and share the results? Thanks.

Member

I can reproduce the test failure. I ran the test with fs.s3a.bucket.BUCKETNAME.server-side-encryption.key set in auth-keys.xml.

I set the S3 bucket's default encryption policy to SSE-KMS, and that does not change the ITestS3AEncryptionSSES3 test failure (as expected). The ITestS3AEncryptionWithDefaultS3Settings test does pass then (again, as expected).

@liuml07 liuml07 (Member) Mar 7, 2020

OK, I did some investigation, and the observations are as follows:

  1. The failing tests are test-only issues; the fix in the copy options should work as expected. Specifically, some encryption tests fail only when bucket-level encryption is set.
  2. The inconsistent encryption algorithm and key in the conf is partly caused by the addConfResource(CONTRACT_XML); call made after creating and patching the conf object, see here.
  3. The call to addConfResource(CONTRACT_XML); should be considered a tricky, unrelated bug, not introduced by this patch. Most likely it is fixed by HADOOP-16626 in trunk; I guess that's why we don't see test failures there. Was HADOOP-16626 enough?
  4. Specifically, there are some discussions about Configuration loading resources "after" the conf object has been patched when setting up the S3 tests. In our test case, if you have the change to call removeBucketOverrides(), and you have the change in S3AContract to not call addConfResource(CONTRACT_XML);, then you can test with this: add multiple testEncryption() and multiple testEncryptionOverRename() test cases in AbstractTestS3AEncryption (example). Guess what: the results are inconsistent, with only one of the identical test cases behaving differently from the rest. How interesting! This seems very related to static resource loading when creating the S3AFileSystem.

I'll stop here since I have no more context for now. I will ask @steveloughran to provide more help. Shall we backport HADOOP-16626 to branch-3.2 and see?
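To make points 2 and 3 above concrete, a minimal demonstration of the Configuration reload behaviour (the resource names are illustrative; CONTRACT_XML in S3AContract resolves to a bundled XML resource):

```java
import org.apache.hadoop.conf.Configuration;

public class ConfReloadDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();          // loads core-site.xml etc.
    conf.unset("fs.s3a.server-side-encryption.key");   // what removeBaseAndBucketOverrides() does

    // S3AContract used to call addConfResource(CONTRACT_XML) *after* the test
    // had patched the conf. addResource() forces a full reload of every
    // resource, so a property that was unset programmatically reappears with
    // its original value from the XML files.
    conf.addResource("contract/s3a.xml");

    System.out.println(conf.get("fs.s3a.server-side-encryption.key"));
  }
}
```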

@liuml07 liuml07 added the labels bug and "fs/s3 changes related to hadoop-aws; submitter must declare test endpoint" on Mar 5, 2020
@liuml07
Member

liuml07 commented Mar 9, 2020

@steveloughran and @mukund-thakur

Per my previous investigation in this review, I have partially backported the changes from HADOOP-16626 into branch-3.2. The backported change fixes the problem of configuration resources being added unexpectedly. The test ITestRestrictedReadAccess was not yet in branch-3.2, so I ignored that file.

Then I tested with removeBaseAndBucketOverrides when patching the encryption configuration, and it seems to work, at least for both ITestS3AEncryptionWithDefaultS3Settings and ITestS3AEncryptionSSES3. The code is in my branch; please see the latest two commits. I think we can run more tests to confirm this.

Question: does this backport make sense as a separate change, or can we merge it with this PR?

@liuml07
Member

liuml07 commented Mar 10, 2020

Again, for the branch-2.10 change, I backported the fix there and did some manual testing. I did not work on the integration test conflicts since they are not yet resolved even for branch-3.2.

  1. Create an S3 bucket and set the default encryption to KMS. Create our own key in KMS and select that for the S3 bucket as default encryption key. All of this was from AWS Web Console for S3.
  2. No client configuration for S3 at all, neither in core-site.xml nor via command-line -D options.
  3. Confirm the bug without this patch
    3.1 hadoop fs -put /etc/hosts s3a://s3guard-mingliang/1 => This is not using our KMS key but the Amazon one
    3.2 hadoop fs -cp s3a://s3guard-mingliang/1 s3a://s3guard-mingliang/1-copy => This is not using our KMS key but the Amazon one
    3.3 hadoop fs -put -d /etc/hosts s3a://s3guard-mingliang/1-d => This is using our KMS key because this is a direct put, without copy operation internally
  4. Confirm the fix with this patch
    4.1 hadoop fs -put /etc/hosts s3a://s3guard-mingliang/2 => This is using our KMS key
    4.2 hadoop fs -cp s3a://s3guard-mingliang/2 s3a://s3guard-mingliang/2-copy => This is using our KMS key
    4.3 hadoop fs -put -d /etc/hosts s3a://s3guard-mingliang/2-d => This is using our KMS key
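One way to check which key an object actually ended up with (not part of the original comment; the bucket and object names reuse those above) is to read the object metadata with the AWS SDK v1:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class CheckObjectKey {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    ObjectMetadata md = s3.getObjectMetadata("s3guard-mingliang", "2-copy");
    // "aws:kms" indicates SSE-KMS; the key id shows whether the custom key or
    // the bucket/account default key was applied to the copied object.
    System.out.println("SSE algorithm: " + md.getSSEAlgorithm());
    System.out.println("KMS key id:    " + md.getSSEAwsKmsKeyId());
  }
}
```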

@mukund-thakur
Contributor Author

Thanks @liuml07 for the investigation. I believe HADOOP-16626 needs to be backported to branch-3.2. We will wait for @steveloughran's comments.

@steveloughran
Contributor

I think the way to do 3.2 support properly is to pretty much backport all the 3.3.x stuff in order.

@liuml07
Member

liuml07 commented Apr 2, 2020

@steveloughran Thanks. I think bulk backporting 3.3 stuff to 3.2 for most if not all hadoop-aws module changes makes sense. I see HADOOP-15619 is already fixed. Is this a good time to do the bulk backport?

Specifically, I think we need to backport this security fix to the 2.10 branch. We backported it locally (without the test code, due to conflicts) and it works well on that branch so far.

@steveloughran
Contributor

Yeah, bulk backport it is. Some of the stuff are big features (delegation tokens) that I'd targeted at 3.3, but they change the code enough that it's easiest to leave them in.

HADOOP-13230 (#1861) is a WiP; a full backport will make cherry-picking that to 3.2 straightforward. I'll have to do a new patch for <3.1. Not because I want those branches to add the ability to retain markers, but because I don't want them to mistake a dir marker for an empty dir. That's also why marker retention is off by default, and the "authoritative" option will restrict it to auth-mode directories, so Hive can do it with its managed directories while leaving the rest alone.

@liuml07
Member

liuml07 commented Apr 6, 2020

Thanks @steveloughran, I can help review #1861. That is a big patch.

For the bulk backport, I don't have enough context on which JIRAs should be left in 3.3, and I may not be able to resolve conflicts. I can help with review and testing though.

@steveloughran
Contributor

Yeah, #1861 is big. I am going to see if I can get the changed getFileStatus code into Hadoop 3.3.0 before it is frozen, so I don't have to worry about cross-version compatibility there. It will still delete directory markers as today, simply doing a LIST for marker & children and skipping HEAD + /.

@steveloughran
Contributor

what's the status of this? Merge or WONTFIX?

@liuml07
Member

liuml07 commented Sep 18, 2020

@steveloughran I think we can merge this into branch-2.10.

This is a very important fix. The failing tests are test-only issues, per my previous debugging comment. The fix itself is reviewed and already in the 3.3.0 release. We backported this fix into our internal light fork six months ago, and customers have not complained since then. For normal testing without per-bucket settings, the tests will pass.

Thoughts?

@liuml07
Member

liuml07 commented Sep 20, 2020

@steveloughran Given that the failing tests are not related to the fix per se, do you think it's fine if I backport only the fix, without including the tests, into branch-2.10? I think it would be nice to include this in the pending 2.10.1 release.

@apache apache deleted a comment from hadoop-yetus Sep 23, 2020
@steveloughran
Contributor

do you think it's fine if I backport the fix only without including tests into branch-2.10? I think it would be nice to include this in new pending release 2.10.1.

fine with me

@liuml07
Member

liuml07 commented Sep 23, 2020

Thanks @steveloughran. I will file a new PR to backport the fix (not necessarily including the test changes) into branch-2.10. Hadoop 2.10.1 has just been released, so this will target 2.10.2.

So, is my other proposal also valid? Specifically, if we plan the effort of bulk backporting the good stuff from 3.3/3.4 back to 3.2, the per-bucket test failures will also be resolved. But I do not see that planned yet. Since this PR fixes a security issue, I think we can merge it into branch-3.2 before that. If so, @mukund-thakur can help us rebase and ideally run another round of tests.

@steveloughran
Contributor

3.2 and 3.3 have diverged so much in s3a. I tried a bit of a backport and gave up.

Now, internally, CDH 7/CDP is built on the 3.1+ codebase, with much of the s3a work backported. But it's not always easy, and it was done as the code was added:

  • mockito changes break tests, even though things compile
  • commons-lang is trouble
  • some things (multipart uploader API) haven't gone back

I'm reluctant to replicate the same problems here.

What would be good:

  • do an AWS SDK update everywhere
  • on older releases, turn off tests we know are flaky/fixed by big changes on later branches
  • backport the FS API changes (openFile() in particular) so that code can compile more broadly against them. That doesn't mean backporting all the speedups behind the scenes, just the API
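For reference, the openFile() builder API mentioned in the last point looks roughly like this on 3.3+ (a hedged sketch; the path and option values are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileDemo {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://example-bucket/data.csv");
    FileSystem fs = path.getFileSystem(new Configuration());
    // Builder options are hints: code compiled against this API still runs on
    // filesystems (or branches) that simply ignore the option.
    try (FSDataInputStream in = fs.openFile(path)
        .opt("fs.s3a.experimental.input.fadvise", "random")
        .build()
        .get()) {
      System.out.println(in.read());
    }
  }
}
```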

Labels: bug, fs/s3 changes related to hadoop-aws; submitter must declare test endpoint