[Design Proposal] Client Side Encryption in OpenSearch #6633

Closed
vikasvb90 opened this issue Mar 11, 2023 · 4 comments
Labels: discuss, feature

vikasvb90 (Contributor) commented Mar 11, 2023

This doc proposes low-level design details for providing client side encryption in OpenSearch. For more information about the feature goals, please refer to the Feature Doc.

High Level Details

The following are the high-level steps involved in encryption and decryption:

  1. Encryption requires a cipher key to encrypt the raw data. This key is called the data key.
  2. The data key, along with its encrypted version, is usually generated by a secret key provider store such as AWS KMS.
  3. The generated data key is then used to encrypt the raw data. The encrypted content, along with the encrypted data key, is then stored on disk or in a remote store.
  4. During decryption, the encrypted data key is first decrypted by making an authorized call to the key provider store. The decrypted key is then used to decrypt the content.
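The four steps above follow the usual envelope-encryption pattern. The Java sketch below walks through them with the JDK's javax.crypto APIs only. It is a minimal illustration, not the proposed implementation: `LocalKeyStore`, `DataKeyPair`, `Envelope`, and `EnvelopeCrypto` are hypothetical names, the in-memory master key stands in for a remote key store such as AWS KMS, and AES-GCM for content plus AES key wrap (RFC 3394) for the data key are assumed algorithm choices.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical stand-in for a key provider store such as AWS KMS. A real store keeps the
// master key remotely; here it is held in memory purely for illustration.
final class LocalKeyStore {
    private final SecretKey masterKey;

    LocalKeyStore() throws GeneralSecurityException {
        masterKey = newAesKey();
    }

    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    // Step 2: generate a data key together with its encrypted (wrapped) version.
    DataKeyPair generateDataKey() throws GeneralSecurityException {
        SecretKey dataKey = newAesKey();
        Cipher wrapper = Cipher.getInstance("AESWrap");
        wrapper.init(Cipher.WRAP_MODE, masterKey);
        return new DataKeyPair(dataKey, wrapper.wrap(dataKey));
    }

    // Step 4: the "authorized call" that decrypts an encrypted data key.
    SecretKey decryptDataKey(byte[] encryptedKey) throws GeneralSecurityException {
        Cipher unwrapper = Cipher.getInstance("AESWrap");
        unwrapper.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) unwrapper.unwrap(encryptedKey, "AES", Cipher.SECRET_KEY);
    }
}

final class DataKeyPair {
    final SecretKey plaintextKey;
    final byte[] encryptedKey;

    DataKeyPair(SecretKey plaintextKey, byte[] encryptedKey) {
        this.plaintextKey = plaintextKey;
        this.encryptedKey = encryptedKey;
    }
}

// What gets persisted (step 3): the encrypted data key stored next to the encrypted content.
final class Envelope {
    final byte[] encryptedKey, iv, ciphertext;

    Envelope(byte[] encryptedKey, byte[] iv, byte[] ciphertext) {
        this.encryptedKey = encryptedKey;
        this.iv = iv;
        this.ciphertext = ciphertext;
    }
}

final class EnvelopeCrypto {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Step 3: encrypt raw data with the data key; only the encrypted key is kept afterwards.
    static Envelope encrypt(LocalKeyStore store, byte[] raw) throws GeneralSecurityException {
        DataKeyPair pair = store.generateDataKey();
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, pair.plaintextKey, new GCMParameterSpec(128, iv));
        return new Envelope(pair.encryptedKey, iv, cipher.doFinal(raw));
    }

    // Step 4: decrypt the data key first, then use it to decrypt the content.
    static byte[] decrypt(LocalKeyStore store, Envelope env) throws GeneralSecurityException {
        SecretKey dataKey = store.decryptDataKey(env.encryptedKey);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, env.iv));
        return cipher.doFinal(env.ciphertext);
    }
}
```

The plaintext data key never needs to be persisted; anyone holding only the `Envelope` must go back to the key store to read the content.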

Crypto support can therefore be designed in three parts:

  1. A set of abstractions exposed through a new plugin interface in OpenSearch. OpenSearch can use these abstractions for its crypto use cases, and underlying modules or plugins can implement them to provide the actual support.
  2. A new module in a separate GitHub repository that provides a concrete implementation of the exposed interfaces. This module will perform the actual encryption and decryption using cipher libraries while leaving key handling abstract, with extension hooks for plugging in a key provider.
  3. Lastly, separate plugins that provide key provider implementations for different types of key stores.
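One way to picture the three-part split is as Java types: an abstraction that OpenSearch code depends on, a module that implements it, and a key-provider extension point that per-key-store plugins fill in. This is a sketch only; all names (`CryptoSupport`, `CryptoModule`, `KeyProvider`, `KeyProviderRegistry`, `DataKey`) are hypothetical, and cipher operations are omitted to keep the focus on the layering.

```java
import java.security.GeneralSecurityException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A data key and its encrypted form, as returned by a key store.
final class DataKey {
    final byte[] plaintext;
    final byte[] encrypted;

    DataKey(byte[] plaintext, byte[] encrypted) {
        this.plaintext = plaintext;
        this.encrypted = encrypted;
    }
}

// Part 3 extension hook: one implementation per key store type (e.g. a KMS-backed plugin).
interface KeyProvider {
    String type();
    DataKey generateDataKey() throws GeneralSecurityException;
    byte[] decryptDataKey(byte[] encryptedKey) throws GeneralSecurityException;
}

// Part 1: the abstraction OpenSearch code would depend on; no key store details leak in.
interface CryptoSupport {
    DataKey newDataKey() throws GeneralSecurityException;
    byte[] recoverDataKey(byte[] encryptedKey) throws GeneralSecurityException;
}

// Where key-provider plugins register themselves, keyed by key store type.
final class KeyProviderRegistry {
    private final Map<String, KeyProvider> providers = new ConcurrentHashMap<>();

    void register(KeyProvider provider) {
        if (providers.putIfAbsent(provider.type(), provider) != null) {
            throw new IllegalArgumentException("duplicate key provider type: " + provider.type());
        }
    }

    KeyProvider get(String type) {
        KeyProvider provider = providers.get(type);
        if (provider == null) {
            throw new IllegalArgumentException("no key provider registered for type: " + type);
        }
        return provider;
    }
}

// Part 2: the crypto module. It would own the cipher logic and delegates all
// key material handling to whichever provider is configured.
final class CryptoModule implements CryptoSupport {
    private final KeyProvider keyProvider;

    CryptoModule(KeyProviderRegistry registry, String providerType) {
        this.keyProvider = registry.get(providerType);
    }

    @Override
    public DataKey newDataKey() throws GeneralSecurityException {
        return keyProvider.generateDataKey();
    }

    @Override
    public byte[] recoverDataKey(byte[] encryptedKey) throws GeneralSecurityException {
        return keyProvider.decryptDataKey(encryptedKey);
    }
}
```

A KMS plugin would register a `KeyProvider` with its own type string; `CryptoModule` never needs to know which store sits behind it.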

Architecture

[Architecture diagram (image not available)]

Low Level Details

The plugin should operate on InputStream(s): it should wrap each provided InputStream with its own stream that encrypts or decrypts the read buffer. This allows the caller to further decorate the stream for any other processing needed on the read buffer.
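Wrapping rather than buffering is the decorator pattern that `java.io` already uses. A minimal sketch, assuming AES-GCM and a caller-supplied data key and 12-byte IV: the plugin side returns a JDK `CipherInputStream`, and the caller stacks its own decorator on top, here a `CheckedInputStream` that computes a CRC32 over the encrypted bytes. `EncryptingStreams` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class EncryptingStreams {
    // Plugin side: wraps the caller's stream so bytes are encrypted as they are read.
    // Nothing is buffered up front, and the plugin knows nothing about what the bytes are.
    static InputStream encrypting(InputStream raw, SecretKey dataKey, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        return new CipherInputStream(raw, cipher);
    }

    // Caller side: decorates the encrypting stream further, here checksumming the
    // encrypted bytes while they are read (e.g. to verify an upload).
    static long readAndChecksum(InputStream raw, SecretKey dataKey, byte[] iv)
            throws GeneralSecurityException, IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new CheckedInputStream(encrypting(raw, dataKey, iv), crc)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // A real caller would forward buf to its destination here.
            }
        }
        return crc.getValue();
    }
}
```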

The plugin should not depend on the type of content being encrypted or decrypted and should function independently of it. No context of segments, translogs, or Lucene directories should be present within the plugin.

We propose the following capabilities for the plugin:

  1. Encryption of an entire file referenced by a stream.
  2. Decryption of an entire file referenced by a stream.
  3. Estimation of the total length of the encrypted content.
  4. Encryption of a particular part of a file referenced by a stream.
  5. Estimation of the sizes of the individual parts of the encrypted or decrypted content of the file being streamed.
  6. A check for whether encryption of a portion of a file is supported.
  7. Decryption of a particular part of a file referenced by a stream.
  8. Initialization of the encryption and decryption contexts, since both operations require some setup work before content can be processed.

Encryption of entire file referenced by a stream

This is a mandatory capability: the plugin must provide an encrypting stream supplier that supplies the raw input stream wrapped in an encrypting input stream.

Decryption of entire file referenced by a stream

This is a mandatory capability: the plugin must provide a decrypting stream supplier that supplies the raw input stream wrapped in a decrypting input stream.
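A stream supplier, rather than a single stream, lets a caller re-open the content, for example when a remote upload is retried. The sketch below is one possible shape for both suppliers, assuming AES-GCM and a key and IV obtained during context initialization; `CryptoStreamSuppliers` is a hypothetical name. Re-reading with the same key and IV is safe here only because the supplier always re-encrypts identical content; new content must get a new IV.

```java
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.function.Supplier;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class CryptoStreamSuppliers {
    private CryptoStreamSuppliers() {}

    // Each get() opens a fresh raw stream and wraps it with a freshly initialized Cipher,
    // so every stream handed out starts from the beginning of the content.
    static Supplier<InputStream> encrypting(Supplier<InputStream> raw, SecretKey key, byte[] iv) {
        return () -> new CipherInputStream(raw.get(), newCipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    static Supplier<InputStream> decrypting(Supplier<InputStream> encrypted, SecretKey key, byte[] iv) {
        return () -> new CipherInputStream(encrypted.get(), newCipher(Cipher.DECRYPT_MODE, key, iv));
    }

    // A new Cipher instance per stream: Cipher objects are stateful and not thread-safe.
    private static Cipher newCipher(int mode, SecretKey key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher initialization failed", e);
        }
    }
}
```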

Estimation of total length of the encrypted content

In some cases, such as remote data transfers, the length of the content must be known before the content is processed. To support such cases, the plugin should support pre-computing the length of the encrypted stream.
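With a stream cipher mode such as AES-GCM, the encrypted length follows directly from the plaintext length, so it can be computed before any content is read. A sketch under assumed choices (a 12-byte IV stored as a header in front of the ciphertext, and a 16-byte GCM tag); `EncryptedLength` is a hypothetical name, and the JDK's `Cipher.getOutputSize` is used only as a cross-check.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class EncryptedLength {
    static final int IV_BYTES = 12;   // assumed header: the IV is stored before the ciphertext
    static final int TAG_BYTES = 16;  // 128-bit GCM authentication tag appended by the cipher

    // GCM adds no padding, so the result depends only on the plaintext length.
    static long estimate(long plaintextLength) {
        return IV_BYTES + plaintextLength + TAG_BYTES;
    }

    // Cross-check against the JCE's own answer for a freshly initialized cipher.
    static long fromCipher(SecretKey key, byte[] iv, int plaintextLength) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BYTES * 8, iv));
        return IV_BYTES + cipher.getOutputSize(plaintextLength);
    }
}
```

Under these assumptions the encrypted stream is always exactly 28 bytes longer than the plaintext, whatever the file size.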

Encryption/Decryption of a particular part of a file referenced by a stream

In some cases, only a portion of the content needs to be encrypted or decrypted. The plugin can provide the capability to support such cases. These can be optional features, and the plugin can expose methods that indicate whether they are supported.
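Partial processing works if every part is encrypted as an independent unit with its own IV: any part can then be encrypted, or later decrypted, without touching the others, and parts can be handed to different threads. A minimal sketch, assuming AES-GCM with the part number bound as additional authenticated data (AAD); `PartCrypto` and the IV-prefixed part layout are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class PartCrypto {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;

    // Encrypts bytes [offset, offset + length) of the file as an independent unit.
    // Output layout (assumed): IV || ciphertext || tag. Binding the part number as AAD
    // makes decryption fail if a part is swapped with another part of the same file.
    static byte[] encryptPart(byte[] file, int offset, int length, int partNumber, SecretKey key)
            throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV per part: an IV must never be reused with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        cipher.updateAAD(ByteBuffer.allocate(4).putInt(partNumber).array());
        byte[] body = cipher.doFinal(file, offset, length);
        return ByteBuffer.allocate(IV_BYTES + body.length).put(iv).put(body).array();
    }

    // Decrypts a single part without reading any other part of the file.
    static byte[] decryptPart(byte[] part, int partNumber, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, Arrays.copyOfRange(part, 0, IV_BYTES)));
        cipher.updateAAD(ByteBuffer.allocate(4).putInt(partNumber).array());
        return cipher.doFinal(part, IV_BYTES, part.length - IV_BYTES);
    }
}
```

Random 96-bit IVs keep the collision risk negligible as long as one data key encrypts far fewer than 2^32 parts; a counter-derived IV per part is an alternative design.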

Estimation of sizes of different parts of the encrypted or decrypted content of the file being streamed

To support encryption and decryption of partial content, determining the size of the content to be encrypted or decrypted becomes essential in some scenarios. If the plugin supports processing partial content, this becomes a mandatory feature of the plugin.
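When each encrypted part carries a fixed overhead, part sizes in both directions can be computed up front. Assuming each encrypted part is laid out as a 12-byte IV, the ciphertext, and a 16-byte GCM tag, a sketch with the hypothetical class `PartSizes`:

```java
final class PartSizes {
    static final int IV_BYTES = 12;   // assumed per-part header
    static final int TAG_BYTES = 16;  // GCM tag appended to each part

    // Plaintext sizes of the parts a file of contentLength bytes is split into.
    static long[] plainPartSizes(long contentLength, long partSize) {
        if (partSize <= 0) throw new IllegalArgumentException("partSize must be positive");
        int count = (int) ((contentLength + partSize - 1) / partSize);
        long[] sizes = new long[count];
        for (int i = 0; i < count; i++) {
            sizes[i] = Math.min(partSize, contentLength - (long) i * partSize);
        }
        return sizes;
    }

    // Size of a part after encryption, known before the part is read.
    static long encryptedPartSize(long plainPartSize) {
        return IV_BYTES + plainPartSize + TAG_BYTES;
    }

    // Inverse mapping, e.g. for sizing buffers when decrypting a single part.
    static long decryptedPartSize(long encryptedPartSize) {
        long size = encryptedPartSize - IV_BYTES - TAG_BYTES;
        if (size < 0) throw new IllegalArgumentException("not a valid encrypted part: " + encryptedPartSize);
        return size;
    }

    // Total encrypted length of a file processed part by part.
    static long totalEncryptedSize(long contentLength, long partSize) {
        long total = 0;
        for (long size : plainPartSizes(contentLength, partSize)) {
            total += encryptedPartSize(size);
        }
        return total;
    }
}
```

For example, a 10-byte file split into 4-byte parts gives plaintext parts of 4, 4 and 2 bytes and encrypted parts of 32, 32 and 30 bytes.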

Performance Results

We ran performance tests on proof-of-concept (POC) code that encrypts and uploads multiple parts of a file in parallel, and obtained the following results:

Note: The observations below are taken from prolonged runs (>15 min) of repeated transfers of a file.
Instance type: m5.2xlarge

Without encryption

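| Threads | CPU (%) | File size (GB) | Latency (sec) |
|---|---|---|---|
| 1 | 7 | 4.5 | 110 |
| 3 | 8 | 4.5 | 100 |
| 5 | 12 | 4.5 | 60 |
| 10 | 20 | 4.5 | 30 |
| 20 | 35 | 4.5 | 17 |
| 1 | 3 | 1.1 | 71 |
| 3 | 8 | 1.1 | 25 |
| 5 | 12 | 1.1 | 15 |
| 10 | 21 | 1.1 | 8 |
| 20 | 32 | 1.1 | 5 |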

With encryption

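| Threads | CPU (%) | File size (GB) | Latency (sec) |
|---|---|---|---|
| 1 | 5 | 4.5 | 300 |
| 3 | 13 | 4.5 | 103 |
| 5 | 21 | 4.5 | 60 |
| 10 | 35 | 4.5 | 34 |
| 20 | 44 | 4.5 | 25 |
| 1 | 5 | 1.1 | 78 |
| 3 | 13 | 1.1 | 26 |
| 5 | 20 | 1.1 | 16 |
| 10 | 33 | 1.1 | 9 |
| 20 | 40 | 1.1 | 7 |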

Observations
The following can be deduced from the tables:

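  1. There is a clear gain in supporting encryption of a portion of a file. In scenarios that need this capability, such as remote transfer of multiple parts of a file in parallel, performance improves significantly: with encryption, the 4.5 GB file took 300 sec with 1 thread, 60 sec with 5 threads, and 25 sec with 20 threads.
  2. For 3 to 10 threads, encryption adds little latency (at most 4 sec in these runs). Outside that range the overhead is noticeable: with a single thread, latency for the 4.5 GB file rose from 110 sec to 300 sec, and with 20 threads it rose from 17 sec to 25 sec.
  3. CPU usage, on the other hand, increases as expected. For remote transfer with encryption, CPU usage was up to about 75% higher than for the transfer alone (for example, 35% vs. 20% at 10 threads for the 4.5 GB file).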
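Observations 2 and 3 can be checked directly against the tables by computing the relative increase for each row. This small Java program does that arithmetic on the numbers copied from the two tables; `EncryptionOverhead` is only an illustrative name.

```java
final class EncryptionOverhead {
    // {threads, cpuPlain, cpuEncrypted, latencyPlain, latencyEncrypted}, from the tables above.
    static final int[][] ROWS_4_5_GB = {
        {1, 7, 5, 110, 300}, {3, 8, 13, 100, 103}, {5, 12, 21, 60, 60},
        {10, 20, 35, 30, 34}, {20, 35, 44, 17, 25}
    };
    static final int[][] ROWS_1_1_GB = {
        {1, 3, 5, 71, 78}, {3, 8, 13, 25, 26}, {5, 12, 20, 15, 16},
        {10, 21, 33, 8, 9}, {20, 32, 40, 5, 7}
    };

    // Relative change from the unencrypted to the encrypted run, in percent.
    static double percentIncrease(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    static void print(String label, int[][] rows) {
        for (int[] r : rows) {
            System.out.printf("%s, %2d threads: CPU %+.0f%%, latency %+.0f%%%n",
                label, r[0], percentIncrease(r[1], r[2]), percentIncrease(r[3], r[4]));
        }
    }

    public static void main(String[] args) {
        print("4.5 GB", ROWS_4_5_GB);
        print("1.1 GB", ROWS_1_1_GB);
    }
}
```

For example, the 10-thread run on the 4.5 GB file shows a 75% increase in CPU and about a 13% increase in latency.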
vikasvb90 added the enhancement and untriaged labels on Mar 11, 2023
vikasvb90 (Contributor, Author)

Tagging @elfisher @muralikpbhat @reta @mch2 @dreamer-89 @andrross @Bukhtawar @sachinpkale @itiyamas @dblock @shwetathareja @saratvemulapalli @ashking94 for feedback. Please tag anyone else who can review this.

vikasvb90 self-assigned this on Mar 16, 2023
vikasvb90 added the feature and discuss labels and removed the enhancement label on Mar 16, 2023
dblock (Member) commented Mar 20, 2023

This looks sound. Could you please provide a bit of information on 1) key rotation, 2) using multiple keys, 3) interactions with snapshots?

vikasvb90 (Contributor, Author)

@dblock I intentionally left this out because I am thinking of exposing the key provider as an extension of the core plugin (the third point under crypto support in High Level Details). With this approach, any type of key provider can be plugged in, such as a KMS-key-based provider with a 7-day key rotation policy. The responsibility of this plugin then becomes solely to encrypt or decrypt content given a data key.
That said, we can also provide a sample KMS-based extension plugin with a configured key rotation period for reference.
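Because rotation lives entirely in the key provider, one way a provider could support it is to record which master key version wrapped each data key, so data keys wrapped before a rotation stay decryptable afterwards. Managed stores such as AWS KMS keep earlier key material after rotation for the same reason. The sketch below mimics that locally; `RotatingKeyProvider` and its 4-byte version prefix are hypothetical, and the in-memory master keys stand in for a real key store.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical provider: every encrypted data key is prefixed with the master key
// version that wrapped it, so rotating never strands previously encrypted data.
final class RotatingKeyProvider {
    private final Map<Integer, SecretKey> masterKeys = new ConcurrentHashMap<>();
    private volatile int currentVersion = 0;

    RotatingKeyProvider() throws GeneralSecurityException {
        rotate();
    }

    // Called on the provider's rotation schedule (e.g. every 7 days); old versions are kept.
    synchronized void rotate() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        int next = currentVersion + 1;
        masterKeys.put(next, gen.generateKey());
        currentVersion = next; // publish the version only after its key is in the map
    }

    int currentVersion() {
        return currentVersion;
    }

    // New data keys are always wrapped with the latest master key version.
    byte[] encryptDataKey(SecretKey dataKey) throws GeneralSecurityException {
        int version = currentVersion;
        Cipher wrapper = Cipher.getInstance("AESWrap");
        wrapper.init(Cipher.WRAP_MODE, masterKeys.get(version));
        byte[] wrapped = wrapper.wrap(dataKey);
        return ByteBuffer.allocate(4 + wrapped.length).putInt(version).put(wrapped).array();
    }

    // Existing data keys are unwrapped with whichever version originally wrapped them.
    SecretKey decryptDataKey(byte[] encryptedKey) throws GeneralSecurityException {
        ByteBuffer buf = ByteBuffer.wrap(encryptedKey);
        SecretKey master = masterKeys.get(buf.getInt());
        if (master == null) {
            throw new GeneralSecurityException("unknown master key version");
        }
        byte[] wrapped = new byte[buf.remaining()];
        buf.get(wrapped);
        Cipher unwrapper = Cipher.getInstance("AESWrap");
        unwrapper.init(Cipher.UNWRAP_MODE, master);
        return (SecretKey) unwrapper.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }
}
```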

dblock (Member) commented Apr 3, 2023

@vikasvb90 Makes a lot of sense. Do we have a proposal for a KMS provider yet?
