Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index Level Encryption #8791

Closed
wants to merge 6 commits into from
Closed

Index Level Encryption #8791

wants to merge 6 commits into from

Conversation

asonje
Copy link

@asonje asonje commented Jul 20, 2023

Description

This pull request adds index level encryption features to OpenSearch based on the issue #3469. Each OpenSearch index is individually encrypted based on user provided encryption keys. A new cryptofs store type index.store.type is introduced which instantiates a CryptoDirectory (a hybrid directory) that encrypts and decrypts files as they are written and read respectively

Related Issues

#3469

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Encryption PR

Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @asonje for the changes. We have similar works in progress and would like us to align to a common crypto interface.

@vikasvb90
Copy link
Contributor

vikasvb90 commented Jul 20, 2023

Crypto should be built as a reusable and extensible plugin. There are many use cases apart from local encryption which requires data encryption. There is a consensus here to build this as a reusable plugin and add a SPI layer to have users define their own key providers. Common plugin can be reused at repository or for local encryption.

Also, in this PR you have used default AES mode which is ECB. It doesn't provide integrity protection. Also, I am not sure if it is right to store kms password locally as an index setting. Why not let the heavy lifting be done by AWS encryption SDK which uses stable encryption modes like GCM, generates IVs, manages data key generations and also allows plugging different key providers.
We already have some work done. This is the PR where crypto is built as an extensible plugin which uses the common lib for actually encrypting or decrypting content.

You can reuse abstractions of the linked PRs to build index level encryption.

@asonje
Copy link
Author

asonje commented Jul 20, 2023

@vikasvb90 The encryption algorithm used is actually AES/CTR/NoPadding with 256-bit keys. AES CTR does not guarantee data integrity (like GCM) but it does support random IO and provides the necessary level of data confidentiality.

There is planned support for remote KMS vendors and potentially a cluster local OpenSearch KMS for on-prem scenarios. The kms alias and password are part of a stand-in, local key management store. Ideally the kms password would be stored as a secure setting but secure settings are AFAIK node-level settings and not index level settings.

Thanks for the links to the PRs; I will have a look

- Properly releases system resourcess in CryptoDirectory

Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90
Copy link
Contributor

@asonje

  1. Both local key management and remote key management can be handled by a single abstraction layer which can be a pluggable layer. I don't think we should couple this in core.
  2. Again, I don't think there is a need of re-inventing the wheel and writing our own encryption code when AWS encryption sdk which is already widely used by community offers the same with no regression, encryption standards, key management support and performance guarantees.
  3. This should be built as a pluggable and reusable layer to allow multiple use cases to benefit transparently from the feature.

As I suggested earlier, I think we can converge and you can use the crypto abstractions present in the linked PRs above to encrypt local indices. Let me know if something is not working out for you.

@Bukhtawar @andrross thoughts?

@andrross
Copy link
Member

@vikasvb90 I agree with your points above.

@asonje Have you looked at implementing this as an optional plugin using the IndexStorePlugin interface? A quick glance at the code looks like the extension points already exist to do what you need to do here. I feel that a plugin might be the better architecture for incubating and iterating on a feature like this.

@asonje
Copy link
Author

asonje commented Jul 28, 2023

@andrross you are right the CryptoDirectory could be implemented using the IndexStorePlugin interface; I will investigate that.

@vikasvb90 Regarding key management; It makes sense to me to reuse any pluggable KMS layer that exists. I do have concerns about the AES GCM algorithm being the only algorithm offered in your scheme. I started out using AES GCM but chose to use AES CTR mode because it easily allows for random access on decryption. Also, with a zero padding, the length of the cipher text is the same as the length of the plain text. This feature is necessary because the files being encrypted are owned and written by Lucene; the CryptoDirectory receives and encrypts a stream of opaque bytes before writing to the filesystem. Among those bytes is some metadata (headers, footers etc). If the file size or structure is modified, checksums (among other things) will fail. I suppose it is possible with GCM to store the authentication tag and IV separately from the encrypted frame, however with large files the performance overhead of managing all that metadata will start to add up.

@reta @dblock @sarthakaggarwal97

@vikasvb90
Copy link
Contributor

vikasvb90 commented Aug 17, 2023

I do have concerns about the AES GCM algorithm being the only algorithm offered in your scheme.

This PR ##8465 has algorithm hard coded because there is only one algorithm which seems to provide sufficient security of content. I can easily make this configurable and you can consider using the abstraction to implement your algo. Although I am not sure if CTR can be considered as a safe to use algorithm because there are security concerns in using it. Consider a case where a malicious actor replaces some encrypted content in the file with data which was encrypted with the same key. There is no way of identifying this change in CTR.

There might be some performance overhead in GCM but there is no measure of how significant it is. Just to add, the metadata you are referring to is not encrypted and is constant for a given size of content. For each frame of size lets say 4KB there are additonal 32 bytes, a file header of size 300-1KB is added (I don't know the number but this is also constant), and 8 additionaly bytes to mark end of file. So, size of metadata isn't much.

Your other concerns of necessity of using CTR don't seem valid. We are already working with Lucene files to ensure data integrity by reusing the stored checksum during encryption of content during transfers to remote stores. Similarly, we have block fetch features which require partial decryption and multi-part uploads which require partial encryption. We can reuse the same abstractions in index encryption paths as well. You can look at CryptoProvider interface in PR ##8466

Also, there is already an OS level encryption mechanism available via dm-crypt to enforce local encryption. Can you please explain the purpose of this feature and what is different from what is already available?

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Sep 30, 2023
@kotwanikunal
Copy link
Member

This draft PR has been stalled for a few weeks. Please re-open if you are still working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants