
HADOOP-13887. Support S3 client side encryption (S3-CSE) using AWS-SDK #2706

Merged
merged 10 commits on Jul 27, 2021

Conversation

mehakmeet
Contributor

Tests: mvn -T 32 clean verify -Ddynamo -Dauth -Dscale -Dparallel-tests
Region: ap-south-1

[INFO] Results:
[INFO]
[WARNING] Tests run: 535, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   ITestS3AEncryptionCSEAsymmetric>ITestS3AEncryptionCSE.testEncryption:48->ITestS3AEncryptionCSE.validateEncryptionForFilesize:82->AbstractS3ATestBase.writeThenReadFile:196->AbstractS3ATestBase.writeThenReadFile:209->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Wrong file length of file s3a://mehakmeet-singh-data/fork-0001/test/testEncryption0 status: S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0001/test/testEncryption0; isDirectory=false; length=16; replication=1; blocksize=33554432; modification_time=1613656460000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=91f7a9a65b91c2332c572ea76bdbe822 versionId=null expected:<0> but was:<16>
[ERROR]   ITestS3AEncryptionCSEAsymmetric>ITestS3AEncryptionCSE.testEncryptionOverRename:63->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Wrong file length of file s3a://mehakmeet-singh-data/fork-0001/test/testEncryptionOverRename status: S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0001/test/testEncryptionOverRename; isDirectory=false; length=1040; replication=1; blocksize=33554432; modification_time=1613656458000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=97712202251ecc080bbbf909c9000718 versionId=null expected:<1024> but was:<1040>
[ERROR]   ITestS3AEncryptionCSEKms>ITestS3AEncryptionCSE.testEncryption:48->ITestS3AEncryptionCSE.validateEncryptionForFilesize:82->AbstractS3ATestBase.writeThenReadFile:196->AbstractS3ATestBase.writeThenReadFile:209->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Wrong file length of file s3a://mehakmeet-singh-data/fork-0001/test/testEncryption0 status: S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0001/test/testEncryption0; isDirectory=false; length=16; replication=1; blocksize=33554432; modification_time=1613656468000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=74cb30cc19ee58be3bcd462120c16eb7 versionId=null expected:<0> but was:<16>
[ERROR]   ITestS3AEncryptionCSEKms>ITestS3AEncryptionCSE.testEncryptionOverRename:63->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Wrong file length of file s3a://mehakmeet-singh-data/fork-0001/test/testEncryptionOverRename status: S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0001/test/testEncryptionOverRename; isDirectory=false; length=1040; replication=1; blocksize=33554432; modification_time=1613656465000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=d03375638396ae14794cde1a7e5fd4f8 versionId=null expected:<1024> but was:<1040>
[ERROR]   ITestS3AEncryptionCSESymmetric>ITestS3AEncryptionCSE.testEncryption:48->ITestS3AEncryptionCSE.validateEncryptionForFilesize:82->AbstractS3ATestBase.writeThenReadFile:196->AbstractS3ATestBase.writeThenReadFile:209->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Wrong file length of file s3a://mehakmeet-singh-data/fork-0002/test/testEncryption0 status: S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0002/test/testEncryption0; isDirectory=false; length=16; replication=1; blocksize=33554432; modification_time=1613656453000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=ff01163e592622879cd0cdb7005618ca versionId=null expected:<0> but was:<16>
[ERROR]   ITestS3AEncryptionCSESymmetric>ITestS3AEncryptionCSE.testEncryptionOverRename:63->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Wrong file length of file s3a://mehakmeet-singh-data/fork-0002/test/testEncryptionOverRename status: S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0002/test/testEncryptionOverRename; isDirectory=false; length=1040; replication=1; blocksize=33554432; modification_time=1613656451000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=1802986c25982cdacb58aff624626eeb versionId=null expected:<1024> but was:<1040>

When we call fs.getFileStatus().getLen() to get the content length of an encrypted file, it is not the same as the original (plaintext) length, and hence the tests are breaking.

[ERROR] Tests run: 1427, Failures: 7, Errors: 21, Skipped: 458

Failures other than the S3-CSE tests are config-related.

[ERROR] Errors:
[ERROR]   ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRecursiveRootListing:267 » TestTimedOut
[INFO]
[ERROR] Tests run: 151, Failures: 0, Errors: 1, Skipped: 28

When I remove the checks on file content lengths from the tests, they pass, so the key-wrap algorithm and content-encryption algorithm are being used as intended.

One possible way we explored to tackle the padding issue was to tweak the s3GetFileStatus call to return a FileStatus whose length comes from the "UNENCRYPTED_CONTENT_LENGTH" header in the user metadata, but this would still break in code paths where we don't get these headers. Hence, consistency of content length needs to be maintained.
P.S.: Need more tests to validate whether, even with this tweak, we are still breaking some tests.

CC: @steveloughran
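For context, a minimal sketch of how the plaintext length could be recovered from a HEAD request when that header is present (plain AWS SDK v1 calls, not the actual S3A change; bucket/key are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.Headers;
import com.amazonaws.services.s3.model.ObjectMetadata;

public final class UnencryptedLengthProbe {
  /**
   * Plaintext length of a CSE-encrypted object; falls back to the (padded)
   * content length when the user-metadata header is absent, e.g. for objects
   * written by other clients - which is exactly where this approach breaks.
   */
  static long unencryptedLength(AmazonS3 s3, String bucket, String key) {
    ObjectMetadata md = s3.getObjectMetadata(bucket, key); // HEAD request
    String plaintext = md.getUserMetaDataOf(Headers.UNENCRYPTED_CONTENT_LENGTH);
    return plaintext != null ? Long.parseLong(plaintext) : md.getContentLength();
  }
}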

@mehakmeet
Contributor Author

Added the "UNENCRYPTED_CONTENT_LENGTH" header in getFileStatus calls and replaced the content length with it if client-side encryption (CSE) is enabled. This helped pass the initial set of tests for all methods of CSE, but after adding a "Directory Listing" test to check the content lengths of files of different sizes, it broke:

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   ITestS3AEncryptionCSESymmetric>ITestS3AEncryptionCSE.testDirectoryListingFileLengths:105 [File length isn't same as expected from directory listing]
Expecting:
  <[16, 17, 271, 4111]>
to contain exactly in any order:
  <[0, 1, 255, 4095]>
elements not found:
  <[0, 1, 255, 4095]>
and elements not expected:
  <[16, 17, 271, 4111]>

[INFO]
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0

@steveloughran
Contributor

We've discussed using that header before, but not gone with it because, as you note, LIST operations don't pick up the header. All code which uses the listing to determine object length is going to break.

@mehakmeet
Contributor Author

mehakmeet commented Feb 23, 2021

Also on that note, I noticed something in the aws-sdk: while doing a PUT request we have this in S3CryptoModuleBase.java:

// Record the original, unencrypted content-length so it can be accessed
// later
final long plaintextLength = plaintextLength(request, metadata);
if (plaintextLength >= 0) {
    metadata.addUserMetadata(Headers.UNENCRYPTED_CONTENT_LENGTH,
                             Long.toString(plaintextLength));
    // Put the ciphertext length in the metadata
    metadata.setContentLength(ciphertextLength(plaintextLength));
}
request.setMetadata(metadata);

Every PUT request in AmazonS3EncryptionClientV2 is wrapped like this, but it seems this isn't the issue: in LIST ops we don't use these, right? So, how do we get the file length in LIST calls?

@steveloughran
Contributor

We get the file length back in the list API calls, with the v2 LIST being the one we default to. It returns file name, length, timestamp, etag, storage class and, optionally, owner (which we don't request).

This is what has blocked us before: code which goes from list dir -> fileStatus[]. foreach.open().read().

if the length of a file in listing != that of the file in read, things break. Lots of things.

if we still can't do unpadded CSE, it's still unusable for our workloads
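To make the problem concrete, a rough sketch (plain AWS SDK v1, not S3A internals) of why LIST is the blocker: the v2 list call returns the stored object size from the bucket index, which for CSE objects is the padded ciphertext length, and there is no user metadata available at that point to correct it:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

class ListingLengths {
  static void printLengths(AmazonS3 s3, String bucket, String prefix) {
    ListObjectsV2Result result = s3.listObjectsV2(bucket, prefix);
    for (S3ObjectSummary summary : result.getObjectSummaries()) {
      // getSize() is the stored (ciphertext) length: 16 bytes longer than the
      // plaintext for CSE objects; no per-object metadata is returned here.
      System.out.printf("%s %d%n", summary.getKey(), summary.getSize());
    }
  }
}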

@mehakmeet
Contributor Author

Had merge conflicts so had to force push.
Tests:

[ERROR] Tests run: 1430, Failures: 1, Errors: 34, Skipped: 538

Scale:

[ERROR] Tests run: 151, Failures: 3, Errors: 21, Skipped: 29

Most errors are MultiPart upload related:

com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.

Simply adding 16 (the padding length) to the multipart upload block size won't work: part sizes need to be a multiple of 16, so CSE has that restriction. Also, one more thing to note here is that it treats the last part as an exception, which makes me believe that multipart upload in CSE has to be sequential (or can we upload the starting parts in parallel and then upload the last part?). So, potentially another constraint while uploading which could have performance impacts, apart from the HEAD calls being required while downloading/listing.
@steveloughran

@bogthe
Contributor

bogthe commented May 16, 2021

(quoting @mehakmeet's test results and multipart-upload question from the previous comment)

Hi @mehakmeet, regarding multipart uploads: the last part is always an exception with regular multipart uploads too! You can do parallel uploads and even upload the last part first and it would still work (for regular multipart). My assumption is that for multipart uploads with CSE enabled the same functionality holds (except for the cipher block size, but the minimum part size for regular multipart is 5MB = 5 * 1024 * 1024, which is still a multiple of 16 :D ).
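A tiny sketch of the arithmetic being discussed; the 16-byte cipher block size is taken from the SDK error message above, and the helper itself is hypothetical, not S3A code:

class CsePartSizeCheck {
  private static final int CIPHER_BLOCK_SIZE = 16;

  // Non-final parts of a CSE multipart upload must be block-aligned.
  static boolean validPartSize(long partSize) {
    return partSize % CIPHER_BLOCK_SIZE == 0;
  }

  public static void main(String[] args) {
    System.out.println(validPartSize(5 * 1024 * 1024));     // true: 5MB is a multiple of 16
    System.out.println(validPartSize(5 * 1024 * 1024 + 1)); // false: rejected by the SDK
  }
}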

@mehakmeet
Contributor Author

Hey @bogthe,

The last part is always an exception with regular multi part uploads too! You can do parallel uploads and even upload the last part first and it would still work (for regular multi-part).

Ah, I see. I was also thinking that with CSE it should still be able to find which part was last, since during the part-upload step we provide part numbers and complete them in ascending order. But I ran some tests with CSE enabled and hit these issues:

AbstractContractMultipartUploaderTest#testMultipartUpload()
T1: partSize: 5242880 bytes (5MB) + 1 byte = 5242881 bytes

2021-05-17 02:43:43,998 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) - Starting: Put part 1 (size 5242881) s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:43:44,002 [s3a-transfer-shared-pool1-t2] INFO  s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) - upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: Retried 0: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.
2021-05-17 02:43:44,824 [s3a-transfer-shared-pool1-t2] INFO  s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) - upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: Retried 1: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.
2021-05-17 02:43:46,184 [s3a-transfer-shared-pool1-t2] INFO  s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) - upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: Retried 2: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.
2021-05-17 02:43:50,537 [s3a-transfer-shared-pool1-t2] INFO  s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) - upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: Retried 3: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.
2021-05-17 02:44:00,768 [s3a-transfer-shared-pool1-t2] INFO  s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) - upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: Retried 4: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.

This retries a couple of times and fails with the exception:

org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_ on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.: Invalid part size: part sizes for encrypted multipart uploads must be multiples of the cipher block size (16) with the exception of the last part.

T2: partSize: 5242880 bytes (5MB)

2021-05-17 02:46:22,270 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) - Starting: Put part 1 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:46:22,907 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) - Put part 1 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload: duration 0:00.637s
2021-05-17 02:46:22,910 [JUnit-testMultipartUpload] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1924)) - Duration of Uploaded part 1: 637,220,364 nS
2021-05-17 02:46:22,911 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (AbstractContractMultipartUploaderTest.java:putPart(352)) - Upload bandwidth 7.846579 MB/s
2021-05-17 02:46:22,934 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) - Starting: Put part 2 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:46:23,254 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) - Put part 2 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload: duration 0:00.320s
2021-05-17 02:46:23,254 [JUnit-testMultipartUpload] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1924)) - Duration of Uploaded part 2: 319,980,951 nS
2021-05-17 02:46:23,255 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (AbstractContractMultipartUploaderTest.java:putPart(352)) - Upload bandwidth 15.625930 MB/s
2021-05-17 02:46:23,275 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) - Starting: Put part 3 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:46:23,990 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) - Put part 3 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload: duration 0:00.715s
2021-05-17 02:46:23,990 [JUnit-testMultipartUpload] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1924)) - Duration of Uploaded part 3: 715,353,661 nS
2021-05-17 02:46:23,990 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (AbstractContractMultipartUploaderTest.java:putPart(352)) - Upload bandwidth 6.989550 MB/s
2021-05-17 02:46:23,991 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) - Starting: Complete upload to s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:47:49,055 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) - Complete upload to s3a://mehakmeet-singh-data/test/testMultipartUpload: duration 1:25.064s
2021-05-17 02:47:49,056 [JUnit-testMultipartUpload] INFO  contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) - Starting: Abort upload to s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:47:49,058 [s3a-transfer-shared-pool1-t6] INFO  s3a.S3AFileSystem (S3AFileSystem.java:abortMultipartUpload(4703)) - Aborting multipart upload l0UfFfsZXE8ogO8ojviT6D8iJo3oEM052apJu.txB1b5j1KPD4F8LQWWYHmOru4G1mu.uPPtGZhIYoT0P2S3g.k10ROOP7uXOiX7czPpmXzlA.67xB7YoN2_IczirQDL to test/testMultipartUpload

Eventually fails with the exception:

org.apache.hadoop.fs.s3a.AWSClientIOException: Completing multipart upload on test/testMultipartUpload: com.amazonaws.SdkClientException: Unable to complete an encrypted multipart upload without being told which part was the last.  Without knowing which part was the last, the encrypted data in Amazon S3 is incomplete and corrupt.: Unable to complete an encrypted multipart upload without being told which part was the last.  Without knowing which part was the last, the encrypted data in Amazon S3 is incomplete and corrupt.

Both of these pass without CSE.

So, basically, we have a restriction to use only multiples of 16 as part sizes. The minimum part size is 5MB, and anything that is a multiple of a MB is also a multiple of 16, but we can't set an arbitrary byte count (not a multiple of 16) as the partSize in CSE.
And even after setting it to a multiple of 16, I am seeing the exception regarding the last part. So, is the logic of part numbers not applicable in CSE? Maybe I am missing something here?

CC: @steveloughran

@bogthe
Contributor

bogthe commented May 18, 2021

Did some digging and found some more context:

Also make sure your multipart code sets lastPart to true for CSE, otherwise you'll still get the error you mentioned above.
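For reference, a hedged sketch of what setting lastPart looks like on the SDK v1 request object (the surrounding parameters are placeholders, not the S3A code):

import java.io.File;
import com.amazonaws.services.s3.model.UploadPartRequest;

class CsePartRequests {
  // Build an upload-part request, flagging the final part for the CSE client.
  static UploadPartRequest partRequest(String bucket, String key, String uploadId,
      int partNumber, long partSize, File partFile, boolean isLastPart) {
    return new UploadPartRequest()
        .withBucketName(bucket)
        .withKey(key)
        .withUploadId(uploadId)
        .withPartNumber(partNumber)
        .withPartSize(partSize)
        .withFile(partFile)
        // For CSE: without isLastPart=true on the final part, the encryption
        // client cannot finalize the object and CompleteMultipartUpload fails.
        .withLastPart(isLastPart);
  }
}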

@mehakmeet
Contributor Author

Force pushing due to merge conflicts.

  • Multipart upload:
    - Added serial uploads by setting the "Active upload blocks" to 1 when CSE is enabled.
    - For the last part, while closing the S3ABlockOutputStream, we pass "isCSEEnabled" as the lastPart parameter while uploading, so that isLastPart=true for CSE.
    - There is a boundary condition when the part size equals the last part, or the part size is a multiple of the number of bytes to upload, where we don't have any active blocks while closing the stream. For this, we skip uploading the last block in write() and wait for close(), so that isLastPart=true works every time.
  • Multipart GET:
    - When we upload via multipart in CSE, not all blocks are padded with 16 bytes, hence we don't need to worry about the HEAD call we were thinking of doing while trying to figure out the content length. We can just strip the padded length from the file.
    - The UNENCRYPTED_CONTENT_LENGTH header is also not included during multipart upload in CSE, so we have to subtract 16 bytes in s3aGetFileStatus (see the sketch after this list).
  • Only the multipart upload work is done in S3ABlockOutputStream, so directly calling multipart from S3AFileSystem won't work, hence the hasCapabilities changes.
  • Custom Signer: as @bogthe said earlier, the header x-amz-content-sha256:UNSIGNED-PAYLOAD should only be used for the S3 service. That holds true, and while signing in ITestCustomSigner we were using AWSS3V4Signer even for the AWSKMS service rather than AWSV4Signer, which works for AWSKMS since it doesn't have that header by default.
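A minimal sketch of the length fix-up described above; the method and constant names are illustrative, not the exact S3A code:

class CseLengths {
  private static final int CSE_PADDING_LENGTH = 16;

  // Strip the CSE padding from a listed/HEAD content length when client-side
  // encryption is enabled, so listings and reads agree on the file size.
  static long unpaddedLength(long contentLength, boolean isCSEEnabled) {
    if (isCSEEnabled && contentLength >= CSE_PADDING_LENGTH) {
      return contentLength - CSE_PADDING_LENGTH;
    }
    return contentLength;
  }
}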

Ran tests for both with and without CSE on ap-south-1:
Without CSE:

[INFO] Results:
[INFO] 
[WARNING] Tests run: 1434, Failures: 0, Errors: 0, Skipped: 466

Scale: 

[ERROR] Errors: 
[ERROR]   ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRecursiveRootListing:267 » TestTimedOut
[INFO] 
[ERROR] Tests run: 151, Failures: 0, Errors: 1, Skipped: 28

With CSE:

[INFO] 
[ERROR] Tests run: 1435, Failures: 0, Errors: 17, Skipped: 551

Scale:

[ERROR] Errors: 
[ERROR]   ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRecursiveRootListing:267 » TestTimedOut
[INFO] 
[ERROR] Tests run: 151, Failures: 0, Errors: 1, Skipped: 28

Errors in CSE are:

[ERROR] testMultipartUploadAbort(org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader)  Time elapsed: 1.23 s  <<< ERROR!
java.lang.UnsupportedOperationException: Multi-part uploader not supported for Client side encryption.
	at org.apache.hadoop.fs.s3a.S3AFileSystem.createMultipartUploader(S3AFileSystem.java:5041)

The weird thing is that it skips in the IDE, but for some reason I am not able to make it skip when running via mvn.
This is the code used to skip it in the case of CSE, in the setup() of AbstractContractMultipartUploaderTest.java:

Assume.assumeTrue("Multipart uploader is not supported",
        fs.hasPathCapability(testPath, CommonPathCapabilities.FS_MULTIPART_UPLOADER));

Any feedback as to why this is happening would be really helpful.

@mehakmeet
Contributor Author

Seems like git pull didn't work for me just before pushing; I still have some conflicts. I'll rebase once more by fetching and applying.

@mehakmeet
Contributor Author

CC: @mukund-thakur

@steveloughran
Contributor

Hey, someone just broke things. Sorry.

The weird thing is it skips in IDE, but for some reason, I am not able to make it skip in mvn terminal

What I usually miss out there is that you need to rebuild hadoop-common after changing the tests there; otherwise running the tests in hadoop-aws picks up the old version.

@steveloughran steveloughran left a comment

This is an ambitious piece of work. We always run away from it because of the file-listing length problems,
but this does appear to correct them, provided you are always listing directories containing only encrypted content.

What is going to happen if you list a bucket containing unencrypted data? It's going to be shorter.

It might be interesting to point encrypted and unencrypted FS instances at the same bucket and
see how they behave, e.g. assert that lengths are different, and when the unencrypted data is read
the bytes are different too.

  • What happens when you try and read unencrypted data through an encrypted FS? Again, a test can show this.

We are going to need a documentation page on this, especially one which covers how file lengths are fixed up.

How does seek work in this world? Slowly, I guess, assuming decryption starts from the beginning of the file.
In a multipart file it may only decrypt from the beginning of that block, so seeking within a block should be much more efficient.
That is: the closer your initial read position is to the start of a block, the less data there is to decrypt, and hence the faster the read.

You could probably add an experiment there - not one you could make real assertions on, but we could at least use the nano timer to measure/report on timing differences.

Once we know a bit more about the performance hit of reading/seeking in encrypted data, we could provide more guidance in the docs.
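A rough sketch of such an experiment with plain timers (the contract tests have a NanoTimer utility, but System.nanoTime() captures the idea; the path and positions are placeholders):

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class SeekTiming {
  // Time a seek to pos followed by a single-byte read; prints nanoseconds.
  static void timeSeekAndRead(FileSystem fs, Path path, long pos) throws Exception {
    try (FSDataInputStream in = fs.open(path)) {
      long start = System.nanoTime();
      in.seek(pos);
      in.read();
      System.out.printf("seek+read at %d took %d ns%n", pos, System.nanoTime() - start);
    }
  }
}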

I think I'd like some more "observability" on this, with the FS and output stream toString() values reporting CSE.
We could also think about setting a gauge in their IOStatistics reports, making that a way to check, and ensuring it gets aggregated into other reports (or we set a maximum and rely on max aggregation to always set it to 1 if merging).
Bit of a hack, but... (we never did have a good story for gauge aggregation, did we?)

//ContractTestUtils.verifyFileContents(fs, src, data);
Path dest = path(src.getName() + "-copy");
fs.rename(src, dest);
ContractTestUtils.verifyFileContents(fs, dest, data);
Contributor

Wondering if we should use DurationInfo to log durations of the write, rename, and verify operations? Might be interesting to look at.
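For example, a small sketch of that pattern, assuming the usual org.apache.hadoop.util.DurationInfo try-with-resources usage (the logger and method are placeholders):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.DurationInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RenameTiming {
  private static final Logger LOG = LoggerFactory.getLogger(RenameTiming.class);

  static void timedRename(FileSystem fs, Path src, Path dest) throws Exception {
    // DurationInfo logs the elapsed time of the enclosed block when closed.
    try (DurationInfo d = new DurationInfo(LOG, "rename %s to %s", src, dest)) {
      fs.rename(src, dest);
    }
  }
}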

@@ -524,10 +525,15 @@ public static S3AFileStatus createFileStatus(Path keyPath,
long blockSize,
String owner,
String eTag,
String versionId) {
String versionId,
boolean isCSEEnabled) {
Contributor

do you think it's time we moved to a builder here?

Contributor Author

Would require a few changes; don't know if it's worth it in this patch, or should we do a refactor patch separately?

Contributor

ok, for now keep separate

Contributor

is it worth creating a ticket in JIRA now so we don't forget about it?

Contributor

if you want to do it....


static final FileStatusAcceptor ACCEPT_ALL_BUT_S3N =
new AcceptAllButS3nDirs();

private final ListingOperationCallbacks listingOperationCallbacks;

public Listing(ListingOperationCallbacks listingOperationCallbacks,
StoreContext storeContext) {
StoreContext storeContext, boolean isCSEEnabled) {
Contributor

Move isCSEEnabled to storeContext?

}
}
}

/**
* Start an asynchronous upload of the current block.
* @param isLast true, if part being uploaded is last and client side
Contributor

'and' or 'or'??

// Creating a unique path by adding file length in file name.
Path path = writeThenReadFile(getMethodName() + len, len);
assertEncrypted(path);
rm(getFileSystem(), path, false, false);
Contributor

Do we have to explicitly do this? Won't this be done as a part of test cleanup and teardown?

Comment on lines 2064 to 2067
- CSE is enabled.
- If a getFileStatus call is made, a check to see if any client side encryption algorithm was used.
- contentLength >= 16 bytes.
Contributor

When all three conditions are met, right?

Contributor Author

The 2nd one is specific to the getFileStatus call; in directory listing we won't have the object metadata to verify whether an encryption algorithm is set or not.

@@ -1682,3 +1682,105 @@ com.amazonaws.SdkClientException: Unable to execute HTTP request:

When this happens, try to set `fs.s3a.connection.request.timeout` to a larger value or disable it
completely by setting it to `0`.

### <a name="client-side-encryption"></a> S3 Client Side Encryption
Contributor

Shouldn't we describe solutions to these problems as well?

Contributor Author

These problems are more of a "what you shouldn't do", so it kind of answers itself in the problem statements.

Contributor

The troubleshooting documents need to assume that the person reading them or googling for them has just seen a specific error message and is trying to work out what's wrong/what to do.

This means the phrasing here needs to be tweaked slightly:

  • move these into the "encryption" section (ahead of those which clearly shouldn't be there :)
  • add a ### subtitle for every stack trace
  • with the error text as the title, so it's the first thing people see
  • interpretation after the stack trace
  • and ideally a recommendation, if there's a simple solution (including "don't")

"Instruction file not found for S3 object"

Reading a file fails when the client is configured with S3-CSE

(insert )

The file is not encrypted with S3-CSE.

@mukund-thakur
Contributor

Ran the raw s3 tests without setting up CSE.
Seeing just two failures and I think they are known issues.
[ERROR] Failures:
[ERROR]   ITestS3AMiscOperations.testEmptyFileChecksums:177->Assert.assertEquals:120->Assert.failNotEquals:835->Assert.fail:89 checksums of empty files expected:<etag: "a85fa2b141e24a8c0359c1541d7e2dcd"> but was:<etag: "1d61cbdc5bd4e70a8d93aefad3cf9d0e">
[ERROR]   ITestS3AMiscOperations.testNonEmptyFileChecksumsUnencrypted:253->Assert.assertEquals:120->Assert.failNotEquals:835->Assert.fail:89 checksums expected:<etag: "e9fbb793667d00b7f362aa05ac4ccaeb"> but was:<etag: "05863e693027ce0f49048759006a0a40">

@mehakmeet
Contributor Author

Interesting you see those errors since they should be skipped if CSE is enabled, I'll try to see what's wrong.

your bucket.
- If already created, [view the kms key ID by these steps.](https://docs.aws.amazon.com/kms/latest/developerguide/find-cmk-id-arn.html)
- Set `fs.s3a.cse.method=KMS`.
- Set `fs.s3a.cse.kms.keyId=<KMS_KEY_ID>`.
Contributor

I think the property name is wrong; it should be fs.s3a.cse.kms.key-id. Found while running tests.
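For reference, a minimal sketch of setting these options programmatically, assuming the property names as discussed in this thread (fs.s3a.cse.method and fs.s3a.cse.kms.key-id); the key ARN is a placeholder:

import org.apache.hadoop.conf.Configuration;

class CseConfigExample {
  static Configuration cseKmsConf() {
    Configuration conf = new Configuration();
    // Property names as used by this PR at the time of this review comment.
    conf.set("fs.s3a.cse.method", "KMS");
    conf.set("fs.s3a.cse.kms.key-id", "arn:aws:kms:ap-south-1:111111111111:key/EXAMPLE");
    return conf;
  }
}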

@mukund-thakur
Contributor

Interesting you see those errors since they should be skipped if CSE is enabled, I'll try to see what's wrong.

No, that was without enabling CSE, as said above.

Now I am trying to run the tests after enabling CSE. And the first issue I see is
[ERROR] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.59 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader
[ERROR] testMultipartUpload(org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader)  Time elapsed: 6.382 s  <<< ERROR!
java.lang.UnsupportedOperationException: Multi-part uploader not supported for Client side encryption.
    at org.apache.hadoop.fs.s3a.S3AFileSystem.createMultipartUploader(S3AFileSystem.java:5396)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.createMultipartUploader(S3AFileSystem.java:249)
    at org.apache.hadoop.fs.contract.AbstractContractMultipartUploaderTest.setup(AbstractContractMultipartUploaderTest.java:92)
    at org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader.setup(ITestS3AContractMultipartUploader.java:115)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

This should have been skipped rather than failed.

I will update on how the rest goes.

@mehakmeet
Contributor Author

This should have been skipped rather than failed.

I think I saw it a while ago too after making the change; what worked for me was a clean build from the root of the project rather than just the hadoop-aws module, and then running the tests.
Refer to Steve's comment: #2706 (comment)

@steveloughran steveloughran left a comment

looking good; primarily reviewed docs. Tests all good with s3guard off.

I need to set up a client through the instructions next... I see mukund has commented there.


@@ -2029,3 +2029,51 @@ signer for all services.
For a specific service, the service specific signer is looked up first.
If that is not specified, the common signer is looked up. If this is
not specified as well, SDK settings are used.

### <a name="cse"></a> Amazon S3 Client Side Encryption
Contributor

move to encryption.md


### <a name="cse"></a> Amazon S3 Client Side Encryption
Amazon S3 Client Side Encryption(CSE), uses `AmazonS3EncryptionClientV2.java
` AmazonS3 client. The encryption and decryption is done in AWS SDK side. Atm
Contributor

"atm?"

Recommend an intro, features and limitations section:

A key reason this feature (HADOOP-13887) has been unavailable for a long time is that the AWS S3 client pads uploaded objects with a 16 byte footer. This meant that files were shorter when being read than when listed through any of the list API calls/getFileStatus(), which broke many applications, including anything seeking near the end of a file to read a footer, as ORC and Parquet do.

There is now a workaround: compensate for the footer in listings when CSE is enabled.

  • when listing files and directories, 16 bytes are subtracted from the length of all non-empty objects.
  • directory markers MAY be longer than 0 bytes.

This "appears" to work; secondly it does in the testing to date. However, the length of files when listed through the S3A client is now going to be shorter than the length of files listed with other clients -including S3A clients where S3-CSE has not been enabled.

Features

  • Supports client side encryption with keys managed in AWS KMS
  • encryption settings propagated into jobs through any issued delegation tokens
  • encryption information stored as headers in the uploaded object

Limitations

  • performance will be reduced. All encrypt/decrypt is now being done on the client.
  • writing files may be slower, as only a single block can be encrypted and uploaded at a time
  • Multipart Uploader API disabled

+ client,
client instanceof InconsistentAmazonS3Client);
inconsistentClient = (InconsistentAmazonS3Client) client;
if(client instanceof InconsistentAmazonS3Client) {
Contributor

nit: add space after if

@@ -242,6 +242,7 @@ protected void logTimePerIOP(String operation,

@Test
public void testTimeToOpenAndReadWholeFileBlocks() throws Throwable {
skipIfClientSideEncryption();
Contributor

why are we turning this off? Performance? seek() perf?

Contributor Author

It uses Landsat data that isn't encrypted, which causes failures.

Contributor

ok

@@ -43,9 +52,12 @@
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.s3a.statistics.impl.AwsStatisticsCollector;
import org.apache.hadoop.fs.store.LogExactlyOnce;
import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
Contributor

Can you move it up to L44? Yes, it is a PITA.

Contributor Author

ah, missed it :/

Contributor

losing battle here. I've got it wrong in some of the audit work and then it complicates backporting

*
* @return new AmazonS3 client.
*/
protected AmazonS3 buildAmazonS3EncryptionClient(
Contributor

InconsistentS3Client doesn't do this; I can see tests skip it. Should we have that client throw some Unsupported Exception here?

@mukund-thakur
Contributor

I set up the CSE tests with S3Guard off. All good except one test that fails consistently:

[ERROR] testStatistics(org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextStatistics) Time elapsed: 4.229 s <<< FAILURE!
java.lang.AssertionError: Mismatch in bytes written expected:<658> but was:<698>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextStatistics.verifyWrittenBytes(ITestS3AFileContextStatistics.java:89)
at org.apache.hadoop.fs.FCStatisticsBaseTest.testStatistics(FCStatisticsBaseTest.java:103)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestS3AFileContextStatistics>FCStatisticsBaseTest.testStatistics:103->verifyWrittenBytes:89 Mismatch in bytes written expected:<658> but was:<698>

@steveloughran
Contributor

steveloughran commented Jul 14, 2021

Bad news I'm afraid: I've got an extra bit of work for this patch.

The encryption option and key must be included in delegation tokens, so that the encryption settings I have on my client are picked up and propagated to all workers in a launched job.

We already do this for all the server-side encryption options, by creating and serializing an EncryptionSecrets instance in the token.

This class already supports passing S3-CSE information, so what is needed is to hook this up for CSE as well as the SSE calls:

  • setEncryptionSecrets in FileSystem.initialize needs to build the secrets from the client side options, if active
  • bindAWSClient needs to set client side encryption options from any DT.

you may want to add code in EncryptionSecretOperations to help - EncryptionSecrets MUST NOT use any AWS SDK call to avoid classloading issues.

This makes me think that the CSE binding/setup code needs to be merged with the SSE stuff a bit more (a rough sketch of point 1 follows the list below):

  1. The encryption secrets built up in S3AFS.initialize() MUST use the client side secrets if set, so they are picked up by DTs
  2. FileSystem.setEncryptionSecrets() should set the isCSEEnabled flag.
  3. CSE-only ITest to call S3AFS.getEncryptionSecrets() & verify that all is good
  4. See if ITestSessionDelegationInFileystem can pick up and propagate CSE options.
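A rough sketch of point 1, assuming the existing EncryptionSecrets(S3AEncryptionMethods, String) constructor and hypothetical helpers for reading the client-side settings; the final patch may differ:

  // Sketch, inside S3AFileSystem.initialize():
  // if client-side encryption is configured, build the delegation-token secrets
  // from the client-side algorithm and key so launched jobs inherit them.
  private void bindCseSecrets() throws IOException {
    S3AEncryptionMethods algorithm = getCseAlgorithm();   // hypothetical helper: e.g. CSE-KMS
    String kmsKeyId = getCseKey();                        // hypothetical helper: fs.s3a.cse.kms.key-id
    setEncryptionSecrets(new EncryptionSecrets(algorithm, kmsKeyId));
  }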

@steveloughran steveloughran left a comment

Needs to move to S3AUtils.lookupPassword() to look up the encryption option/secret. This supports secrets hosted in JCEKS files, including per-bucket options within a file.
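For context, a minimal sketch of a JCEKS-aware lookup using Configuration.getPassword, which is what the credential-provider support rests on (the helper itself is illustrative, not the S3AUtils code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class SecretLookup {
  // Look up a secret that may live either in the plain configuration or in a
  // JCEKS store referenced by hadoop.security.credential.provider.path.
  static String lookupSecret(Configuration conf, String key, String defVal)
      throws IOException {
    char[] pass = conf.getPassword(key);  // checks credential providers first
    return pass != null ? new String(pass) : defVal;
  }
}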

@steveloughran
Contributor

I'm seeing

2021-07-14 12:54:09,519 [main] WARN  s3.AmazonS3EncryptionClientV2 (AmazonS3EncryptionClientV2.java:warnOnLegacyCryptoMode(409)) - The S3 Encryption Client is configured to read encrypted data with legacy encryption modes through the CryptoMode setting. If you don't have objects encrypted with these legacy modes, you should disable support for them to enhance security. See https://docs.aws.amazon.com/general/latest/gr/aws_sdk_cryptography.html
2021-07-14 12:54:09,525 [main] WARN  s3.AmazonS3EncryptionClientV2 (AmazonS3EncryptionClientV2.java:warnOnRangeGetsEnabled(401)) - The S3 Encryption Client is configured to support range get requests. Range gets do not provide authenticated encryption properties even when used with an authenticated mode (AES-GCM). See https://docs.aws.amazon.com/general/latest/gr/aws_sdk_cryptography.html

I'm assuming this is legit because we want those range requests. We need to document this in encryption.md; in troubleshooting say "yes"

@mehakmeet
Contributor Author

Just realized I need to slightly tweak ITestSessionDelegationInFileystem.java for it to run CSE-KMS or SSE-KMS. Will add one more commit for that.

@bogthe bogthe left a comment

This is a great patch 👏 thanks for making it happen! Added a few minor comments below. Will check out and run this PR locally now.


* CSE enabled and disabled FS respectively.
*/
@Test
public void testEncryptionEnabledAndDisabledFS() throws Exception {
Contributor

This is awesome

@steveloughran
Contributor

catching up on this; letting you go through bogdan's comments first.

What was the conclusion on the ITestS3AFileContextStatistics failure? Side effect of the padding?

@mehakmeet
Contributor Author

What was the conclusion on the ITestS3AFileContextStatistics failure? Side effect of the padding?

Seems like there are a few different ways to set the KMS key, other than just using the Key ID:

  • Key ID.
  • Key ARN.
  • Alias ARN.
  • Alias name.

These would all result in a different number of bytes being sent to the KMS service for the key-generation op, resulting in a different bytesWritten value.
This test was always kind of a workaround, with some constant byte counts; now I have added the value of the key ID that is set, so that it works for the different methods, but other request params like "EncryptionContext" and "KeySpec" are still added as constant bytes.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 27 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 29s Maven dependency ordering for branch
+1 💚 mvninstall 22m 39s trunk passed
+1 💚 compile 22m 31s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 19m 12s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 4m 1s trunk passed
+1 💚 mvnsite 2m 23s trunk passed
+1 💚 javadoc 1m 30s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 15s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 38s trunk passed
+1 💚 shadedclient 17m 1s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 17m 22s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 1m 29s the patch passed
+1 💚 compile 22m 3s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javac 22m 3s /results-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 1908 unchanged - 0 fixed = 1909 total (was 1908)
+1 💚 compile 19m 4s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
-1 ❌ javac 19m 4s /results-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 1 new + 1784 unchanged - 0 fixed = 1785 total (was 1784)
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 27 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-1 ❌ blanks 0m 0s /blanks-tabs.txt The patch 136 line(s) with tabs.
-0 ⚠️ checkstyle 3m 55s /results-checkstyle-root.txt root: The patch generated 6 new + 114 unchanged - 5 fixed = 120 total (was 119)
+1 💚 mvnsite 2m 21s the patch passed
+1 💚 javadoc 1m 30s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javadoc 0m 38s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 2 new + 63 unchanged - 0 fixed = 65 total (was 63)
+1 💚 spotbugs 4m 0s the patch passed
+1 💚 shadedclient 17m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 57s hadoop-common in the patch passed.
+1 💚 unit 2m 25s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
205m 53s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/9/artifact/out/Dockerfile
GITHUB PR #2706
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint
uname Linux 96056f4ad5cd 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e3d5922
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/9/testReport/
Max. process+thread count 2250 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/9/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran left a comment

minor text review comments only

@steveloughran steveloughran left a comment

Did a review of the code too. Afraid you need to restore the original constant in Constants.java, as that's @Public - we don't want to break anything which has imported/used the class.


@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 21 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 11m 27s Maven dependency ordering for branch
+1 💚 mvninstall 21m 30s trunk passed
+1 💚 compile 21m 13s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 18m 22s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 48s trunk passed
+1 💚 mvnsite 2m 35s trunk passed
+1 💚 javadoc 1m 51s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 30s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 45s trunk passed
+1 💚 shadedclient 14m 33s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 14m 57s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 32s the patch passed
+1 💚 compile 20m 25s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javac 20m 25s /results-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 1908 unchanged - 0 fixed = 1909 total (was 1908)
+1 💚 compile 18m 28s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
-1 ❌ javac 18m 28s /results-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 1 new + 1784 unchanged - 0 fixed = 1785 total (was 1784)
-1 ❌ blanks 0m 0s /blanks-tabs.txt The patch 136 line(s) with tabs.
-0 ⚠️ checkstyle 3m 53s /results-checkstyle-root.txt root: The patch generated 1 new + 57 unchanged - 0 fixed = 58 total (was 57)
+1 💚 mvnsite 2m 33s the patch passed
+1 💚 javadoc 1m 47s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 33s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 9s the patch passed
+1 💚 shadedclient 14m 54s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 12s hadoop-common in the patch passed.
+1 💚 unit 2m 17s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 52s The patch does not generate ASF License warnings.
196m 57s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/10/artifact/out/Dockerfile
GITHUB PR #2706
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint
uname Linux 3ece512640c8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 274a30e
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/10/testReport/
Max. process+thread count 2837 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/10/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 21 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 46s Maven dependency ordering for branch
+1 💚 mvninstall 21m 9s trunk passed
+1 💚 compile 22m 33s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 21m 5s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 47s trunk passed
+1 💚 mvnsite 2m 38s trunk passed
+1 💚 javadoc 1m 47s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 30s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 3s trunk passed
+1 💚 shadedclient 17m 29s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 17m 55s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 1m 41s the patch passed
+1 💚 compile 23m 4s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javac 23m 4s /results-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 1908 unchanged - 0 fixed = 1909 total (was 1908)
+1 💚 compile 19m 55s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
-1 ❌ javac 19m 55s /results-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 1784 unchanged - 0 fixed = 1785 total (was 1784)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 47s the patch passed
+1 💚 mvnsite 2m 37s the patch passed
+1 💚 javadoc 1m 51s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 30s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 26s the patch passed
+1 💚 shadedclient 16m 56s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 45s hadoop-common in the patch passed.
+1 💚 unit 2m 25s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 59s The patch does not generate ASF License warnings.
212m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/11/artifact/out/Dockerfile
GITHUB PR #2706
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint
uname Linux 495e149c810a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1a05322
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/11/testReport/
Max. process+thread count 1266 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2706/11/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.


@steveloughran steveloughran left a comment


+1.
The yetus failure is "Unchanged files with check annotations". It's unrelated, so I'm going to ignore it; it will still need to be fixed (by disabling the check for those files).

@steveloughran steveloughran merged commit f813554 into apache:trunk Jul 27, 2021
@steveloughran

Merged to trunk! Please cherry-pick and test against 3.3 and I'll merge it in there.

mehakmeet added a commit to mehakmeet/hadoop that referenced this pull request Aug 11, 2021
apache#2706)

This (big!) patch adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.

Read the documentation in encryption.md very, very carefully before
use and consider it unstable.

S3-CSE is enabled in the existing configuration option
"fs.s3a.server-side-encryption-algorithm":

fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>

You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.

* Filesystem list/get status operations subtract 16 bytes from the length
  of all files >= 16 bytes long to compensate for the padding which CSE
  adds.
* The SDK always warns about the specific algorithm chosen being
  deprecated. It is critical to use this algorithm for ranged
  GET requests to work (i.e. random IO). Ignore.
* Unencrypted files CANNOT BE READ.
  The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
  written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.

Contributed by Mehakmeet Singh
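
To make the configuration described in the commit message concrete, here is a minimal Java sketch (an editor illustration, not code from the patch); the bucket name, path, and the <KMS_KEY_ID> placeholder are assumptions:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3CseKmsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // CSE reuses the existing server-side encryption option names.
    conf.set("fs.s3a.server-side-encryption-algorithm", "CSE-KMS");
    conf.set("fs.s3a.server-side-encryption.key", "<KMS_KEY_ID>");

    // "my-encrypted-bucket" and the path below are placeholders.
    FileSystem fs = FileSystem.get(new URI("s3a://my-encrypted-bucket/"), conf);
    FileStatus status = fs.getFileStatus(new Path("/data/example.txt"));

    // With CSE active, the reported length already has the 16-byte
    // padding subtracted, as described in the commit message above.
    System.out.println("length = " + status.getLen());
  }
}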
@@ -420,7 +420,12 @@ private Constants() {
"fs.s3a.multipart.purge.age";
public static final long DEFAULT_PURGE_EXISTING_MULTIPART_AGE = 86400;

// s3 server-side encryption, see S3AEncryptionMethods for valid options
/**
* s3 server-side encryption or s3 client side encryption method, see


Hi, apologies if this has been discussed before, but wouldn't it be better to define the client-side encryption in a property called fs.s3a.client-side-encryption-algorithm? Or perhaps deprecate the current property and move to just fs.s3a.encryption-algorithm?


Having two was too complex. As for a single name and deprecating the other: yes, that's a great idea! We haven't shipped yet, so if you can add a patch for that it'd be welcome.

(FWIW, in my local test setups I will stay on the old setting so tests on older branches still work. Changing a config name is tricky, even with the deprecation mechanism.)
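
To illustrate the deprecation mechanism just mentioned, here is a rough sketch (editor illustration only) using Hadoop's Configuration.addDeprecation; the unified names fs.s3a.encryption.algorithm and fs.s3a.encryption.key are hypothetical here, since no rename ships in this PR:

import org.apache.hadoop.conf.Configuration;

public class EncryptionOptionDeprecation {
  static {
    // Reads and writes of the old key are transparently redirected to the
    // new key; a deprecation warning is logged when the old key is used.
    Configuration.addDeprecation(
        "fs.s3a.server-side-encryption-algorithm",
        "fs.s3a.encryption.algorithm");
    Configuration.addDeprecation(
        "fs.s3a.server-side-encryption.key",
        "fs.s3a.encryption.key");
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("fs.s3a.server-side-encryption-algorithm", "CSE-KMS");
    // Both the old and the hypothetical new name now resolve to CSE-KMS.
    System.out.println(conf.get("fs.s3a.encryption.algorithm"));
  }
}

This keeps old configs working while steering users towards a single name.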

asfgit pushed a commit that referenced this pull request Oct 5, 2021
#2706)

(Commit message identical to the one quoted above.)

Change-Id: Ie1a27a036a39db66a67e9c6d33bc78d54ea708a0
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
apache#2706)


(Commit message identical to the one quoted above.)