-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index Level Encryption #8791
Index Level Encryption #8791
Conversation
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Encryption PR Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crypto should be built as a reusable and extensible plugin. There are many use cases apart from local encryption which requires data encryption. There is a consensus here to build this as a reusable plugin and add a SPI layer to have users define their own key providers. Common plugin can be reused at repository or for local encryption. Also, in this PR you have used default AES mode which is ECB. It doesn't provide integrity protection. Also, I am not sure if it is right to store kms password locally as an index setting. Why not let the heavy lifting be done by AWS encryption SDK which uses stable encryption modes like GCM, generates IVs, manages data key generations and also allows plugging different key providers. You can reuse abstractions of the linked PRs to build index level encryption. |
@vikasvb90 The encryption algorithm used is actually AES/CTR/NoPadding with 256-bit keys. AES CTR does not guarantee data integrity (like GCM) but it does support random IO and provides the necessary level of data confidentiality. There is planned support for remote KMS vendors and potentially a cluster local OpenSearch KMS for on-prem scenarios. The kms alias and password are part of a stand-in, local key management store. Ideally the kms password would be stored as a secure setting but secure settings are AFAIK node-level settings and not index level settings. Thanks for the links to the PRs; I will have a look |
- Properly releases system resourcess in CryptoDirectory Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Gradle Check (Jenkins) Run Completed with:
|
As I suggested earlier, I think we can converge and you can use the crypto abstractions present in the linked PRs above to encrypt local indices. Let me know if something is not working out for you. @Bukhtawar @andrross thoughts? |
@vikasvb90 I agree with your points above. @asonje Have you looked at implementing this as an optional plugin using the IndexStorePlugin interface? A quick glance at the code looks like the extension points already exist to do what you need to do here. I feel that a plugin might be the better architecture for incubating and iterating on a feature like this. |
@andrross you are right the CryptoDirectory could be implemented using the IndexStorePlugin interface; I will investigate that. @vikasvb90 Regarding key management; It makes sense to me to reuse any pluggable KMS layer that exists. I do have concerns about the AES GCM algorithm being the only algorithm offered in your scheme. I started out using AES GCM but chose to use AES CTR mode because it easily allows for random access on decryption. Also, with a zero padding, the length of the cipher text is the same as the length of the plain text. This feature is necessary because the files being encrypted are owned and written by Lucene; the CryptoDirectory receives and encrypts a stream of opaque bytes before writing to the filesystem. Among those bytes is some metadata (headers, footers etc). If the file size or structure is modified, checksums (among other things) will fail. I suppose it is possible with GCM to store the authentication tag and IV separately from the encrypted frame, however with large files the performance overhead of managing all that metadata will start to add up. |
This PR ##8465 has algorithm hard coded because there is only one algorithm which seems to provide sufficient security of content. I can easily make this configurable and you can consider using the abstraction to implement your algo. Although I am not sure if CTR can be considered as a safe to use algorithm because there are security concerns in using it. Consider a case where a malicious actor replaces some encrypted content in the file with data which was encrypted with the same key. There is no way of identifying this change in CTR. There might be some performance overhead in GCM but there is no measure of how significant it is. Just to add, the metadata you are referring to is not encrypted and is constant for a given size of content. For each frame of size lets say 4KB there are additonal 32 bytes, a file header of size 300-1KB is added (I don't know the number but this is also constant), and 8 additionaly bytes to mark end of file. So, size of metadata isn't much. Your other concerns of necessity of using CTR don't seem valid. We are already working with Lucene files to ensure data integrity by reusing the stored checksum during encryption of content during transfers to remote stores. Similarly, we have block fetch features which require partial decryption and multi-part uploads which require partial encryption. We can reuse the same abstractions in index encryption paths as well. You can look at Also, there is already an OS level encryption mechanism available via |
This PR is stalled because it has been open for 30 days with no activity. |
This draft PR has been stalled for a few weeks. Please re-open if you are still working on it. |
Description
This pull request adds index level encryption features to OpenSearch based on the issue #3469. Each OpenSearch index is individually encrypted based on user provided encryption keys. A new cryptofs store type
index.store.type
is introduced which instantiates a CryptoDirectory (a hybrid directory) that encrypts and decrypts files as they are written and read respectivelyRelated Issues
#3469
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.