HADOOP-13887. Support S3 client side encryption (S3-CSE) using AWS-SDK #2706
Conversation
Added the "UNENCRYPTED_CONTENT_LENGTH" header in getFileStatus calls and replaced the content length with it if client-side encryption (CSE) is enabled. This helped pass the initial set of tests for all methods of CSE, but after adding a "Directory Listing" test to check the content lengths of files of different sizes, it broke:
we've discussed using that header before, but didn't go with it because, as you note, LIST operations don't pick up the header. All code which uses the list to determine object length is going to break.
Also on that note, I found something in the aws-sdk: while doing a put request, we have this in S3CryptoModuleBase.java:
Every put request in AmazonS3EncryptionClientV2 is wrapped like this. But it seems this isn't the issue; in LIST ops we don't use these, right? So how do we get the file length in LIST calls?
we get the file length back in the list API calls, with the v2 LIST being the one we default to. It returns file name, length, timestamp, etag, storage class and optionally owner (we don't request it). This is what has blocked us before: code which goes from list dir -> fileStatus[].foreach.open().read(). If the length of a file in the listing != that of the file in read, things break. Lots of things. If we still can't do unpadded CSE, it's still unusable for our workloads.
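Roughly, the pattern that breaks looks like this (a minimal sketch; the method and variable names are illustrative, not code from the PR):

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

void readEverything(FileSystem fs, Path dir) throws Exception {
  for (FileStatus st : fs.listStatus(dir)) {
    // Length comes from the LIST call: for CSE objects this includes
    // the 16-byte padding, so it is longer than the decrypted stream.
    byte[] buf = new byte[(int) st.getLen()];
    try (FSDataInputStream in = fs.open(st.getPath())) {
      in.readFully(0, buf);  // stream ends 16 bytes early -> EOFException
    }
  }
}
```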
Had merge conflicts so had to force push.
Scale:
Most errors are MultiPart upload related:
Simply adding 16 (the padding length) to the multipart upload block size won't work. Part sizes need to be a multiple of 16, so CSE has that restriction. Also, one more thing to note: it assumes the last part to be an exception, which makes me believe that multipart upload in CSE has to be sequential (or can we upload the starting parts in parallel and then upload the last part?). So, potentially another constraint while uploading which could have performance impacts here, apart from the HEAD calls required while downloading/listing.
Hi @mehakmeet, regarding multipart uploads: the last part is always an exception with regular multipart uploads too! You can do parallel uploads and even upload the last part first and it would still work (for regular multipart). My assumption is that for multipart uploads with CSE enabled the same functionality holds (except for the cipher block size, but the minimum part size for regular multipart is 5MB = 5 * 1024 * 1024 bytes, which is still a multiple of 16 :D ).
Hey @bogthe,
Ah, I see. I was also thinking that with CSE it should still be possible to find which part was last, since during the part-upload step we provide part numbers and complete the upload in ascending order. But I ran some tests with CSE enabled and hit these issues: AbstractContractMultipartUploaderTest#testMultipartUpload()
This retries a couple of times and fails with the exception:
T2: partSize: 5242880 bytes (5MB)
Eventually fails with the exception:
Both of these pass without CSE. So, basically, we have a restriction to use only multiples of 16 as part sizes: even though the minimum part size is 5MB, and anything which is a whole number of MB is a multiple of 16, we can't set arbitrary byte counts (not multiples of 16) as the partSize in CSE. CC: @steveloughran
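A minimal guard for the constraint described above might look like this (a sketch; the constant name is an assumption, but the 16-byte cipher block size is the one discussed in this thread):

```java
// AES cipher block size: CSE part sizes must align to this.
private static final int CIPHER_BLOCK_SIZE = 16;

static void checkPartSize(long partSize) {
  if (partSize % CIPHER_BLOCK_SIZE != 0) {
    throw new IllegalArgumentException(
        "Part size " + partSize + " must be a multiple of "
            + CIPHER_BLOCK_SIZE + " bytes when S3-CSE is enabled");
  }
}
```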
Did some digging and found some more context.
Also make sure your multipart code sets
Force pushing due to merge conflicts.
Ran tests both with and without CSE on ap-south-1:
With CSE:
Errors in CSE are:
The weird thing is that it skips in the IDE, but for some reason I am not able to make it skip when running mvn from the terminal.
Any feedback as to why this is happening would be really helpful.
Seems like git pull didn't work for me just before pushing; I still have some conflicts. I'll rebase once more by fetching and applying.
CC: @mukund-thakur
Hey, someone just broke things. Sorry.
What I usually miss there is that you need to rebuild hadoop-common after changing the tests there; running the tests in hadoop-aws picks up the old version.
This is an ambitious piece of work. We always ran away from it because of the file listing length problems, but this does appear to correct them, provided you are always listing directories containing only encrypted content.
What is going to happen if you list a bucket containing unencrypted data? It's going to be reported as shorter.
It might be interesting to point encrypted and unencrypted FS instances at the same filesystem and see how they behave, e.g. assert that the lengths are different, and that when the unencrypted data is read the bytes are different too.
- What happens when you try to read unencrypted data through an encrypted FS? Again, a test can show this.
We are going to need a documentation page on this, especially one which covers how file lengths are fixed up.
How does seek work in this world? Slowly, I guess, assuming it starts from the beginning of the file.
In a multipart file it may only need to decrypt from the beginning of that block, so seeking within a block should be much more efficient.
That is: the closer your initial read position is to the start of a block, the less data to decrypt, and hence the faster the read.
You could probably add an experiment there - not one you could make real assertions on, but we could at least use the nano timer to measure/report timing differences; see the sketch below.
Once we know a bit more about the performance hit of reading/seeking in encrypted data, we could provide more guidance in the docs.
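A rough shape for that experiment (a sketch only; `fs`, `path` and `fileLen` are assumed to exist, and `ContractTestUtils.NanoTimer` could replace the raw `System.nanoTime()` calls):

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

void timeReads(FileSystem fs, Path path, long fileLen) throws Exception {
  try (FSDataInputStream in = fs.open(path)) {
    // Probe block-aligned and unaligned offsets to compare decryption cost.
    for (long pos : new long[] {0, 15, 16, 4096, fileLen - 16}) {
      long start = System.nanoTime();
      in.seek(pos);
      in.read();
      System.out.printf("read at %d took %d ns%n",
          pos, System.nanoTime() - start);
    }
  }
}
```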
I think I'd like some more "observability" on this, with the FS and output stream toString() values reporting CSE.
We could also think about setting a gauge in their IOStatistics reports, making that a way to check, and ensuring it gets aggregated into other reports (or we set a maximum and rely on max aggregation to always set it to 1 when merging).
Bit of a hack, but... (we never did have a good story for gauge aggregation, did we?)
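The toString() half of that could be as simple as the following (a sketch; the field name `isCSEEnabled` and the surrounding fields are assumptions, not the actual S3AFileSystem code):

```java
@Override
public String toString() {
  final StringBuilder sb = new StringBuilder("S3AFileSystem{");
  sb.append("uri=").append(uri);
  // Surface client-side encryption state so logs and bug reports show it.
  sb.append(", clientSideEncryption=").append(isCSEEnabled);
  // ... existing fields elided ...
  return sb.append('}').toString();
}
```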
//ContractTestUtils.verifyFileContents(fs, src, data);
Path dest = path(src.getName() + "-copy");
fs.rename(src, dest);
ContractTestUtils.verifyFileContents(fs, dest, data);
Wondering if we should use DurationInfo to log the durations of the write, rename and verify operations? Might be interesting to look at, something like the sketch below.
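What that might look like (a sketch, assuming `LOG`, `fs`, `src`, `dest` and `data` are the test's existing fields):

```java
import org.apache.hadoop.fs.contract.ContractTestUtils;
import org.apache.hadoop.util.DurationInfo;

// try-with-resources: DurationInfo logs the elapsed time on close.
try (DurationInfo d = new DurationInfo(LOG, "rename %s -> %s", src, dest)) {
  fs.rename(src, dest);
}
try (DurationInfo d = new DurationInfo(LOG, "verify %s", dest)) {
  ContractTestUtils.verifyFileContents(fs, dest, data);
}
```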
@@ -524,10 +525,15 @@ public static S3AFileStatus createFileStatus(Path keyPath,
       long blockSize,
       String owner,
       String eTag,
-      String versionId) {
+      String versionId,
+      boolean isCSEEnabled) {
do you think it's time we moved to a builder here?
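Hypothetically, such a builder might look like this (all names here are invented for illustration; none of them exist in the S3A code today):

```java
// Hypothetical builder replacing the long createFileStatus parameter list.
S3AFileStatus status = new S3AFileStatusBuilder()
    .withPath(keyPath)
    .withLength(size)
    .withBlockSize(blockSize)
    .withOwner(owner)
    .withETag(eTag)
    .withVersionId(versionId)
    .withCSEEnabled(isCSEEnabled)
    .build();
```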
Would require a few changes; I don't know if it's worth doing in this patch, or should we do a separate refactoring patch?
ok, for now keep separate
is it worth creating a ticket in JIRA now so we don't forget about it?
if you want to do it....
…ent length header for fileStatus calls
static final FileStatusAcceptor ACCEPT_ALL_BUT_S3N =
    new AcceptAllButS3nDirs();

private final ListingOperationCallbacks listingOperationCallbacks;

public Listing(ListingOperationCallbacks listingOperationCallbacks,
    StoreContext storeContext, boolean isCSEEnabled) {
Move isCSEEnabled to storeContext?
    }
  }
}

/**
 * Start an asynchronous upload of the current block.
 * @param isLast true, if part being uploaded is last and client side
'and', or should that be 'or'?
// Creating a unique path by adding file length in file name.
Path path = writeThenReadFile(getMethodName() + len, len);
assertEncrypted(path);
rm(getFileSystem(), path, false, false);
Do we have to do this explicitly? Won't it be done as part of test cleanup and teardown?
- CSE is enabled.
- If a getFileStatus call is made, a check to see if any client side
  encryption algorithm was used.
- contentLength >= 16 bytes.
When all three conditions are met, right?
The 2nd one is specific to the getFileStatus call; in a directory listing we won't have the object metadata to verify whether an encryption algorithm is set or not.
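Putting the described conditions together, the length fixup is roughly (a sketch; `CSE_PADDING_LENGTH` is the 16-byte pad the thread discusses):

```java
// In directory listings the metadata check is skipped, since only the
// LIST result is available; the padding is subtracted unconditionally.
long reportedLength = contentLength;
if (isCSEEnabled && contentLength >= CSE_PADDING_LENGTH) {
  reportedLength = contentLength - CSE_PADDING_LENGTH;
}
```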
@@ -1682,3 +1682,105 @@ com.amazonaws.SdkClientException: Unable to execute HTTP request:

When this happens, try to set `fs.s3a.connection.request.timeout` to a larger value or disable it
completely by setting it to `0`.

### <a name="client-side-encryption"></a> S3 Client Side Encryption
Shouldn't we describe solutions to these problems as well?
These problems are more of a "what you shouldn't do", so the problem statements kind of answer themselves.
The troubleshooting documents need to assume that the person reading them, or googling for them, has just seen a specific error message and is trying to work out what's wrong/what to do. This means the phrasing here needs to be tweaked slightly:
- move these into the "encryption" section (ahead of those which clearly shouldn't be there :)
- add a ### subtitle for every stack trace
- with the error text as the title, so it's the first thing people see
- interpretation after the stack trace
- and ideally a recommendation, if there's a simple solution (including "don't")
"Instruction file not found for S3 object"
Reading a file fails when the client is configured with S3-CSE
(insert )
The file is not encrypted with S3-CSE.
Ran the raw s3 tests without setting up CSE.
Interesting that you see those errors, since they should be skipped if CSE is enabled; I'll try to see what's wrong.
your bucket.
- If already created, [view the kms key ID by these steps.](https://docs.aws.amazon.com/kms/latest/developerguide/find-cmk-id-arn.html)
- Set `fs.s3a.cse.method=KMS`.
- Set `fs.s3a.cse.kms.keyId=<KMS_KEY_ID>`.
I think the property name is wrong; it should be fs.s3a.cse.kms.key-id.
Found while running tests.
No, that was without enabling CSE, as said above. Now I am trying to run the tests after enabling CSE, and the first issue I see is that this should have been skipped rather than failed. I will update on how the rest goes.
I think I saw it a while ago too after making the change; what worked for me was a clean build from the root of the project rather than the hadoop-aws module, and then running the tests.
looking good; primarily reviewed docs. Tests all good with s3guard off.
I need to set up a client through the instructions next...I see mukund has commented there
@@ -2029,3 +2029,51 @@ signer for all services.

For a specific service, the service specific signer is looked up first.
If that is not specified, the common signer is looked up. If this is
not specified as well, SDK settings are used.

### <a name="cse"></a> Amazon S3 Client Side Encryption
move to encryption.md
### <a name="cse"></a> Amazon S3 Client Side Encryption
Amazon S3 Client Side Encryption(CSE), uses `AmazonS3EncryptionClientV2.java`
AmazonS3 client. The encryption and decryption is done in AWS SDK side. Atm
"atm?"
Recommend an intro, features and limitations section:
A key reason this feature (HADOOP-13887) has been unavailable for a long time is that the AWS S3 client pads uploaded objects with a 16 byte footer. This meant that files were shorter when being read than when listed through any of the list API calls/getFileStatus(), which broke many applications, including anything seeking near the end of a file to read a footer, as ORC and Parquet do.
There is now a workaround: compensate for the footer in listings when CSE is enabled.
- when listing files and directories, 16 bytes are subtracted from the length of all non-empty objects.
- directory markers MAY be longer than 0 bytes long.
This "appears" to work; certainly it does in the testing to date. However, the length of files when listed through the S3A client is now going to be shorter than the length of files listed with other clients - including S3A clients where S3-CSE has not been enabled.
Features
- Supports client side encryption with keys managed in AWS KMS
- encryption settings propagated into jobs through any issued delegation tokens
- encryption information stored as headers in the uploaded object

Limitations
- performance will be reduced. All encrypt/decrypt is now being done on the client.
- writing files may be slower, as only a single block can be encrypted and uploaded at a time
- Multipart Uploader API disabled
+        client,
         client instanceof InconsistentAmazonS3Client);
     inconsistentClient = (InconsistentAmazonS3Client) client;
     if(client instanceof InconsistentAmazonS3Client) {
nit: add space after if
@@ -242,6 +242,7 @@ protected void logTimePerIOP(String operation,

  @Test
  public void testTimeToOpenAndReadWholeFileBlocks() throws Throwable {
    skipIfClientSideEncryption();
why are we turning this off? Performance? seek() perf?
It uses Landsat data that isn't encrypted, which causes failures.
ok
@@ -43,9 +52,12 @@
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.s3a.statistics.impl.AwsStatisticsCollector;
import org.apache.hadoop.fs.store.LogExactlyOnce;
import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
can you move it up to L44. Yes, it is a PITA.
ah, missed it :/
losing battle here. I've got it wrong in some of the audit work, and then it complicates backporting.
 *
 * @return new AmazonS3 client.
 */
protected AmazonS3 buildAmazonS3EncryptionClient(
InconsistentAmazonS3Client doesn't do this; I can see tests skip it. Should we have that client throw some UnsupportedOperationException here?
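One way to implement the suggestion (a sketch; the parameter list is elided and would need to match the real factory method quoted above):

```java
import com.amazonaws.services.s3.AmazonS3;

// Fail fast: the inconsistent test client has no CSE support.
protected AmazonS3 buildAmazonS3EncryptionClient(/* ... same params ... */) {
  throw new UnsupportedOperationException(
      "S3-CSE is not supported by InconsistentAmazonS3Client");
}
```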
I set up the CSE tests with S3guard off. All good except one test that always fails:
Bad news, I'm afraid: I've got an extra bit of work for this patch. The encryption option and key must be included in delegation tokens, so that the encryption settings I have on my client are picked up and propagated to all workers in a launched job. We already do this for all the server-side encryption options, by creating and serializing an EncryptionSecrets instance in the token. This class already supports passing S3-CSE information, so what is needed is to hook this up with CSE as well as the SSE calls.
You may want to add code in EncryptionSecretOperations to help - EncryptionSecrets MUST NOT use any AWS SDK call, to avoid classloading issues. This makes me think that the CSE binding/setup code needs to be merged with the SSE stuff a bit more.
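A hedged sketch of the kind of helper being requested; the accessor names on EncryptionSecrets are assumptions rather than a confirmed API, but note it deliberately touches no AWS SDK classes:

```java
import java.util.Optional;
import org.apache.hadoop.fs.s3a.S3AEncryptionMethods;
import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;

// Extract the CSE-KMS key from token-carried secrets, if present.
static Optional<String> getCSEKMSKey(EncryptionSecrets secrets) {
  if (secrets.getEncryptionMethod() == S3AEncryptionMethods.CSE_KMS
      && secrets.hasEncryptionKey()) {
    return Optional.of(secrets.getEncryptionKey());
  }
  return Optional.empty();
}
```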
This needs to move to S3AUtils.lookupPassword() to look up the encryption option/secret. That supports JCEKS-file-hosted secrets, including per-bucket options within a file.
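For context, this is the mechanism lookupPassword builds on: Configuration.getPassword consults credential providers (such as a JCEKS file) before falling back to the plain config entry, so secrets need not sit in XML. A minimal sketch, with an assumed provider path:

```java
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Assumed example path to a JCEKS keystore holding the secret.
conf.set("hadoop.security.credential.provider.path",
    "jceks://file/etc/hadoop/s3a.jceks");
char[] secret = conf.getPassword("fs.s3a.server-side-encryption.key");
String key = (secret == null) ? null : String.valueOf(secret);
```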
I'm seeing
I'm assuming this is legit, because we want those range requests. We need to document this in encryption.md; in the troubleshooting doc, say "yes".
Just realized I need to slightly tweak
…t, CSE-KMS if provided)
This is a great patch 👏 thanks for making it happen! Added a few minor comments below. Will check out and run this PR locally now.
 * CSE enabled and disabled FS respectively.
 */
@Test
public void testEncryptionEnabledAndDisabledFS() throws Exception {
This is awesome
Catching up on this; letting you go through Bogdan's comments first. What was the conclusion on the ITestS3AFileContextStatistics failure? A side effect of the padding?
Seems like there are a few different ways to set the KMS key ID, other than just using the key ID itself.
These would all result in a different number of bytes being sent to the KMS service for the key generation op, resulting in different bytesWritten values.
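For reference, AWS KMS accepts several identifier forms of different lengths (the example values below are the illustrative ones from the AWS KMS documentation):
- key ID: 1234abcd-12ab-34cd-56ef-1234567890ab
- key ARN: arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
- alias name: alias/ExampleAlias
- alias ARN: arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias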
💔 -1 overall
This message was automatically generated.
minor text review comments only
Did a review of the code too. Afraid you need to restore the original constant in Constants.java, as that's @Public - we don't want to break anything which has imported/used the class.
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
+1.
The yetus failure is "Unchanged files with check annotations". Unrelated, and I'm going to ignore it. It will need to be fixed (by disabling the check for those files).
Merged to trunk! Please cherry-pick and test against 3.3 and I'll merge it in there.
apache#2706)

This (big!) patch adds support for client side encryption in AWS S3, with keys managed by AWS-KMS.

Read the documentation in encryption.md very, very carefully before use and consider it unstable.

S3-CSE is enabled in the existing configuration option "fs.s3a.server-side-encryption-algorithm":

fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>

You cannot enable CSE and SSE in the same client, although you can still enable a default SSE option in the S3 console.

* Filesystem list/get status operations subtract 16 bytes from the length of all files >= 16 bytes long to compensate for the padding which CSE adds.
* The SDK always warns about the specific algorithm chosen being deprecated. It is critical to use this algorithm for ranged GET requests to work (i.e. random IO). Ignore.
* Unencrypted files CANNOT BE READ. The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.

Contributed by Mehakmeet Singh
@@ -420,7 +420,12 @@ private Constants() {
       "fs.s3a.multipart.purge.age";
   public static final long DEFAULT_PURGE_EXISTING_MULTIPART_AGE = 86400;

-  // s3 server-side encryption, see S3AEncryptionMethods for valid options
+  /**
+   * s3 server-side encryption or s3 client side encryption method, see
Hi, apologies if this has been discussed before, but wouldn't it be better to define the client-side encryption in a property called fs.s3a.client-side-encryption-algorithm? Or perhaps deprecate the current property and move to just fs.s3a.encryption-algorithm?
Having two was too complex. As for a single name and deprecating the other - yes, that's a great idea! We haven't shipped yet, so if you can add a patch there, it'd be welcome.
(FWIW, in my local test setups I will stay on the old setting so tests on older branches still work. Changing a config name is tricky, even with the deprecation mechanism.)
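The deprecation mechanism referred to is Hadoop's Configuration key-deprecation support; a sketch, assuming `fs.s3a.encryption.algorithm` were the new name eventually chosen:

```java
import org.apache.hadoop.conf.Configuration;

static {
  // Reads of the old key are transparently redirected to the new one,
  // with a deprecation warning logged once.
  Configuration.addDeprecation(
      "fs.s3a.server-side-encryption-algorithm",
      "fs.s3a.encryption.algorithm");
}
```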
Tests: mvn -T 32 clean verify -Ddynamo -Dauth -Dscale -Dparallel-tests
Region: ap-south-1
When we do fs.getFileStatus().getLen() to get the content length of the encrypted file, it is not the same as the original length, and hence the tests are breaking.
Failures other than the S3-CSE tests are config related.
When I remove the checks for file content lengths from the tests, the tests run successfully; hence the key-wrap algorithm and content-encryption algorithm are being used as intended.
One possible way we explored to tackle the padding issue was to tweak the s3GetFileStatus call to return a FileStatus with the "UNENCRYPTED_CONTENT_LENGTH" header which comes in the user metadata, but this would still break wherever we don't get these headers to do our tasks. Hence, consistency of content length needs to be maintained.
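For illustration, reading that header looks roughly like this (a sketch; `s3`, `bucket` and `key` are assumed, and the header name is the one the v1 SDK crypto client stores in user metadata as Headers.UNENCRYPTED_CONTENT_LENGTH):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

// HEAD request: the unencrypted length travels as user metadata,
// which LIST responses do not carry - hence the inconsistency.
ObjectMetadata md = s3.getObjectMetadata(bucket, key);
String unpadded = md.getUserMetaDataOf("x-amz-unencrypted-content-length");
long realLength = (unpadded != null)
    ? Long.parseLong(unpadded)
    : md.getContentLength();
```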
P.S.: Need more tests to validate that, even with this tweaking, we are still breaking some tests.
CC: @steveloughran