On The Fly Encryption Feature Proposal #3469
Comments
Thanks for putting this proposal together @aysee! I'm a big fan of expanding the encryption options within OpenSearch to make it more flexible. There are a couple of things that come to mind we might want to think more about from an experience perspective (@setiah would want your thoughts on this too).
Thank you @elfisher for your feedback and questions!

Key management
A simple answer is that we can make an abstraction layer for key store communication: define an SPI, make it pluggable and configurable. If we define basic operations like data encryption key generation and data key decryption, we can support multiple key stores. However, the devil is in the details, and multiple key store support may be harder to achieve in reality. Anyway, it's a very valid point, but I propose to take baby steps here and start with something we know, keeping extensibility and backward compatibility in mind. It would also be nice if you could provide more details about these local key stores.

Security UI
There are multiple use cases here. Marking indices as encrypted is one use case. Having a dedicated UI that displays encrypted indices with the corresponding key configuration is a different use case. Do you think this functionality should be a part of the Feature Request, or should it be built later on top of it? Interesting point regarding OpenSearch audit logs. What events do you have in mind? Re-indexing an encrypted index into a plain text index would be suspicious for sure. Anything else, like encrypted index creation?

Key/config propagation
We propose that all the master key details be stored in the index settings. The index itself will know the master key ID, where the key resides, and the key management service or key store type. When OpenSearch creates an index shard, it can use this configuration to derive a data key from the master key. Each shard will have its own data key. There is no need to have the same data key across shards, and that would be hard to do when shards are on different nodes. |
How will this feature interact with snapshots? Specifically, should one be able to take a snapshot without decrypting? |
The idea is good
It affects the size of index data on disk and in memory as well, since encrypted data is worse than non-encrypted. For snapshot encryption we already introduced a plugin here: https://github.com/aiven/encrypted-repository-opensearch and it was added here: opensearch-project/project-website#812 as a community plugin |
@dblock @willyborankin thank you for your questions and feedback. I'm replying to both of you because there is a certain overlap in Snapshot related functionality that both of you brought up. Snapshots Shard merge Change/roll a master key
An Initialization Vector (IV) must not be used to encrypt more than 64 GB of data. Encrypting more data with the same IV makes the key vulnerable. We propose to have a separate IV for a segment, not a shard. This means we might have problems with segment files that are larger than 64 GB. There are multiple ways to fix it:
Besides that, our proposal should have no issues with such big shards.
We don't cover master key rotation yet, so this should not be an issue.
Encryption adds almost no overhead on the persisted data. The overhead will be: data key or keys, IV per file, custom Lucene headers and footers per file. |
I agree they need to be independent. Some customers could use encrypted file systems and store encrypted snapshots in clouds, using their own keys or built-in functionality provided by clouds.
Thank you for your explanation, now it is clear.
Got it.
I asked this question specifically because of a problem I thought existed with the merge procedure.
Got it. |
This is an interesting proposal @aysee. What is the current status? I'd be happy to contribute towards the implementation if needed. |
I have been working on an implementation of this feature based on the proposed design and recommendations so far. It is almost ready and I expect to create a PR in a few weeks. |
@asonje How's this looking? Do you want to link a draft PR for others to take a look and help on? |
Yes @wbeckler , I am working on some internal validation and will be ready with a PR soon. |
PR #8791 largely follows the design outlined here. It adds a new cryptofs store type.

The encryption algorithm chosen is AES/CTR/NoPadding with 256-bit keys. AES CTR supports random IO and provides the necessary level of data confidentiality. However, unlike GCM, it does not guarantee data integrity. The crypto provider can be configured via a setting.

The index owner provides credentials to a key management store, which provides a master key data key pair. Each shard has a unique data key, which is encrypted (by the master key) and stored on disk.

Multiple key management store vendors can and should be supported, including an OpenSearch cluster-wide KMS service. |
Looks like the draft PR was closed due to inactivity. Are we still tracking this change for any future release? |
Considerable time has passed since this issue was first opened, and the state of the OpenSearch architecture has changed. Several questions were asked and answered in a single 'big' comment channel. @aysee could you see about updating the description of the issue to capture these updates so it's clear what the intention is for this feature? I'd recommend augmenting the existing plan around the following areas:
Feature Proposal
This document is a proposal for an On The Fly encryption feature that allows OpenSearch to encrypt search indices at the Directory level using different encryption keys per index.
Why we need it
Enterprise customers require additional controls over data they store in multi-tenanted cloud services. Data encryption with a customer-provided key is one of the features these customers are asking for. This feature allows customers to manage their own master key and then give a cloud service access to encrypt or decrypt the customer’s data with derived data keys. A customer can revoke the master key in case of a security incident, making their data non-decryptable.
This feature enables better data isolation in a multi-tenanted service, allows for a better audit trail, and adds security.
OpenSearch does not provide a fine-grained multi-tenanted encryption solution yet. Encryption is either enabled for the whole cluster or for a data node, or is fully disabled. When we use a search index per tenant, there is no way to configure encryption per index. Having a separate OpenSearch cluster per tenant is too expensive.
Proposal
The proposal is to implement a new Lucene Directory that will encrypt or decrypt shard data on the fly. We can use the existing settings.store.type configuration to enable encryption when we create an index, for example by setting it to cryptofs. In this case cryptofs becomes a new Store Type. OpenSearch will use CryptoDirectory for this specific store type.
Potentially, we can implement CryptoDirectory as a simple FilterDirectory to leverage existing Index Input and Output classes; however, this approach won’t allow us to leverage buffered reads and writes. Lucene issues frequent single-byte read and write calls, so it’s better to read from and write into an encrypted buffer instead of decrypting and encrypting single bytes every time.
We propose to override Lucene IndexInput and IndexOutput with new encrypting implementations to leverage the existing IO buffer optimization. CryptoDirectory will extend FSDirectory and will instantiate the overridden versions of these inputs and outputs.
Also, the Index Input and Output classes provide access to the underlying IO streams, which allows us to leverage existing optimized stream encryption libraries.
Encryption
The concrete encryption algorithm can be made configurable, but it’s critical to use no-padding algorithms to keep Lucene’s random IO access support.
The concrete crypto provider will also be configurable. Crypto providers like Amazon Corretto, SunJCE, or Bouncy Castle come with their own tradeoffs. A consumer of this On The Fly encryption feature should be able to make a decision based on their specific performance, FIPS compliance, or runtime environment requirements.
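To illustrate why a no-padding counter mode keeps random access possible, here is a minimal Java sketch. It assumes AES/CTR/NoPadding from the standard JCE and a 16-byte base IV per file; the class and method names (`CtrRandomAccess`, `transformAt`, `addToCounter`) are hypothetical and are not taken from any implementation. The idea is to advance the counter by `offset / 16` blocks and discard the first `offset % 16` keystream bytes, so a slice in the middle of a file can be decrypted without touching the bytes before it.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class CtrRandomAccess {
    private static final int BLOCK = 16; // AES block size in bytes

    // Encrypts or decrypts `data` located at byte `offset` of the logical file.
    // In CTR mode both directions are the same XOR with the keystream.
    static byte[] transformAt(byte[] key, byte[] baseIv, long offset, byte[] data) {
        try {
            long blockIndex = offset / BLOCK;
            int skip = (int) (offset % BLOCK);
            // Start the counter at the block that contains `offset`.
            byte[] iv = addToCounter(baseIv, blockIndex);
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            // Prepend `skip` dummy bytes so the keystream lines up with `offset`.
            byte[] padded = new byte[skip + data.length];
            System.arraycopy(data, 0, padded, skip, data.length);
            byte[] out = cipher.doFinal(padded);
            return Arrays.copyOfRange(out, skip, out.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Adds `blocks` to the IV, treated as a 128-bit big-endian counter (wraps on overflow).
    static byte[] addToCounter(byte[] iv, long blocks) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blocks)).toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }
}
```

Calling `transformAt(key, iv, 0, plain)` encrypts a whole buffer, and `transformAt(key, iv, 37, slice)` decrypts only the bytes starting at offset 37. A padded block mode would not allow this, because the ciphertext length and block boundaries would no longer map one-to-one onto plaintext offsets.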
Key management
Each index shard will require one or multiple data keys to encrypt data. We can start with only one data key per shard to simplify key management. But this solution can evolve; for example, OpenSearch can generate new data keys according to time-based or usage-based criteria.
All shard data keys will be derived from one master key defined at the index level. When OpenSearch creates a new index, CryptoDirectoryFactory will reach out to a Key Management Service (KMS) to generate a data key pair. The encrypted version of the data key can be persisted in a key file inside the shard data folder itself. Any encryption or decryption operation will require a plain text version of the key, so CryptoDirectory will need to make a call to the KMS to decrypt the encrypted data key. It will cache this plain text key in a short-lived cache for performance reasons.
The KMS is configured in the index settings when we create an index.
This configuration can support multiple KMS vendors if required.
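The data key flow described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the `MasterKeyProvider` interface, `DataKeyPair`, and `InMemoryKeyProvider` are hypothetical names, and the in-memory provider stands in for a real KMS by wrapping data keys with AES/GCM under a locally generated master key. A real vendor integration would call the vendor's API instead.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical SPI: the two basic operations a key store would need to expose.
interface MasterKeyProvider {
    DataKeyPair generateDataKey();
    byte[] decryptDataKey(byte[] encryptedKey);
}

// Plain text data key plus the same key encrypted under the master key.
final class DataKeyPair {
    final byte[] plaintext;
    final byte[] encrypted;

    DataKeyPair(byte[] plaintext, byte[] encrypted) {
        this.plaintext = plaintext;
        this.encrypted = encrypted;
    }
}

// Stand-in for a KMS, for illustration only. The master key never leaves this class.
final class InMemoryKeyProvider implements MasterKeyProvider {
    private static final int NONCE_LEN = 12;
    private final SecretKey masterKey = newMasterKey();
    private final SecureRandom random = new SecureRandom();
    private volatile boolean revoked;

    void revoke() { revoked = true; }
    void restore() { revoked = false; }

    public DataKeyPair generateDataKey() {
        checkAccess();
        byte[] dataKey = new byte[32]; // 256-bit data key
        random.nextBytes(dataKey);
        byte[] nonce = new byte[NONCE_LEN];
        random.nextBytes(nonce);
        byte[] wrapped = gcm(Cipher.ENCRYPT_MODE, nonce, dataKey);
        // Persisted form: nonce followed by ciphertext and tag.
        byte[] encrypted = new byte[nonce.length + wrapped.length];
        System.arraycopy(nonce, 0, encrypted, 0, nonce.length);
        System.arraycopy(wrapped, 0, encrypted, nonce.length, wrapped.length);
        return new DataKeyPair(dataKey, encrypted);
    }

    public byte[] decryptDataKey(byte[] encryptedKey) {
        checkAccess();
        byte[] nonce = Arrays.copyOfRange(encryptedKey, 0, NONCE_LEN);
        byte[] wrapped = Arrays.copyOfRange(encryptedKey, NONCE_LEN, encryptedKey.length);
        return gcm(Cipher.DECRYPT_MODE, nonce, wrapped);
    }

    private void checkAccess() {
        if (revoked) throw new IllegalStateException("master key access revoked");
    }

    private byte[] gcm(int mode, byte[] nonce, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, masterKey, new GCMParameterSpec(128, nonce));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static SecretKey newMasterKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch only `DataKeyPair.encrypted` would be written to the key file in the shard folder; the plain text key would be held in memory and re-obtained through `decryptDataKey` when needed.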
Key revocation and restoration
When a customer revokes access to a master key, OpenSearch cannot decrypt encrypted data keys anymore. It will be able to decrypt encrypted data with a cached plain text version of a key until the key cache expires, but after that any requests will start failing. OpenSearch will require a special error code to convey this error to consumers.
Any background operations like merge or refresh will also start failing. They will require special handling to avoid data corruption.
Key restoration will require no specific logic. Once the customer restores key access, OpenSearch can immediately use the master key to decrypt data keys again.
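The revocation behavior described above follows directly from a short-lived key cache, which can be sketched in Java as below. This is a minimal illustration, not a proposed implementation: `ExpiringKeyCache` is a hypothetical name, the loader stands in for a KMS decrypt call, and the clock is injected so that expiry can be shown without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical short-lived cache for a plain text data key.
final class ExpiringKeyCache {
    private final Supplier<byte[]> loader;      // stands in for a KMS decrypt call
    private final LongSupplier clockMillis;     // injected clock, e.g. System::currentTimeMillis
    private final long ttlMillis;
    private byte[] cached;
    private long expiresAt;

    ExpiringKeyCache(Supplier<byte[]> loader, long ttlMillis, LongSupplier clockMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    synchronized byte[] get() {
        long now = clockMillis.getAsLong();
        if (cached == null || now >= expiresAt) {
            cached = null;          // drop the stale key before calling out
            cached = loader.get();  // throws if key access has been revoked
            expiresAt = now + ttlMillis;
        }
        return cached;
    }
}
```

While the entry is fresh, `get()` never calls the loader, so a revoked master key goes unnoticed until the TTL elapses. After expiry the loader is called again and its failure surfaces to the caller. Once access is restored, the next `get()` succeeds without any extra logic.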
Key rotation and re-encryption
This proposal does not cover managed key rotation and re-encryption. OpenSearch re-indexing satisfies both of these requirements during the initial implementation phase.
Audit trail
Customers will be interested in monitoring how OpenSearch uses their encryption keys. Any KMS requests will be logged automatically on the customer’s KMS side. However, when OpenSearch uses these data keys to encrypt or decrypt data, no logs will be produced.
Performance
Encryption comes with a performance cost. The actual performance degradation will depend on the request type and on the encryption algorithm. For example, according to our initial performance benchmarking, the overhead on ingestion and simple queries is less than on complex queries with functions and aggregates.
Concrete acceptable performance degradation numbers are still TBD.
Shipment options
We would like this feature to be available in managed AWS OpenSearch service. We can either ship this feature as a community plugin or implement it inside OpenSearch itself.