Encrypt Query Language (EQL) is a set of abstractions for transmitting, storing & interacting with encrypted data and indexes in PostgreSQL.
EQL provides a data format for transmitting and storing encrypted data & indexes, and database types & functions to interact with the encrypted material.
EQL enables encryption in use, without significant changes to your application code. A variety of searchable encryption techniques are available, including:
- Matching - Equality or partial matches
- Ordering - comparison operations using order revealing encryption
- Uniqueness - enforcing unique constraints
Encryption in use is the practice of keeping data encrypted even while it's being processed or queried in the database. Unlike traditional encryption methods that secure data only at rest (on disk) or in transit (over the network), encryption in use keeps the data encrypted while operations are being performed on the data. This provides an additional layer of security against unauthorized access — an adversary needs access to the encrypted data and encryption keys.
While encryption at rest and in transit are essential, they don't protect data when the database server itself is compromised. Encryption in use mitigates this risk by ensuring that:
- Data remains secure: Even if the database server is breached, the data remains encrypted and unreadable without the proper keys.
- Compliance controls are stronger: When you need stronger data security controls than what SOC2/SOC3 or ISO27001 mandate, encryption in use helps you meet those stringent requirements.
CipherStash Proxy is a transparent proxy that sits between your application and your PostgreSQL database. It intercepts SQL queries and handles the encryption and decryption of data on-the-fly. This enables encryption in use without significant changes to your application code.
- Intercepts queries: CipherStash Proxy captures SQL statements from the client application.
- Encrypts data: For write operations, it encrypts the plaintext data before sending it to the database.
- Decrypts data: For read operations, it decrypts the encrypted data retrieved from the database before returning it to the client.
- Maintains searchability: Ensures that the encrypted data is searchable and retrievable without sacrificing performance or application functionality.
- Manages encryption keys: Securely handles encryption keys required for encrypting and decrypting data.
EQL uses CipherStash Proxy to mediate access to your PostgreSQL database and provide low-latency encryption & decryption.
At a high level:
- encrypted data is stored as
jsonb
- references to the column in sql statements are wrapped in a helper function
- Cipherstash Proxy transparently encrypts and indexes data
- Database client sends
plaintext
data encoded asjsonb
- CipherStash Proxy encrypts the
plaintext
and encodes theciphertext
value and associated indexes into thejsonb
payload - The data is written to the encrypted column
- Wrap references to the encrypted column in the appropriate EQL function
- CipherStash Proxy encrypts the
plaintext
- PostgreSQL executes the SQL statement
- CipherStash Proxy decrypts any returned
ciphertext
data and returns to client
Before you get started, it's important to understand some of the key components of EQL.
Encrypted columns are defined using the cs_encrypted_v1
domain type, which extends the jsonb
type with additional constraints to ensure data integrity.
Example table definition:
CREATE TABLE users (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email_encrypted cs_encrypted_v1
);
EQL provides specialized functions to interact with encrypted data:
cs_ciphertext_v1(val JSONB)
: Extracts the ciphertext for decryption by CipherStash Proxy.cs_match_v1(val JSONB)
: Enables basic full-text search.cs_unique_v1(val JSONB)
: Retrieves the unique index for enforcing uniqueness.cs_ore_v1(val JSONB)
: Retrieves the Order-Revealing Encryption index for range queries.
These Functions expect a jsonb
value that conforms to the storage schema.
cs_add_index(table_name text, column_name text, index_name text, cast_as text, opts jsonb)
Parameter | Description | Notes |
---|---|---|
table_name | Name of target table | Required |
column_name | Name of target column | Required |
index_name | The index kind | Required. |
cast_as | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to text |
opts | Index options | Optional for match indexes (see below) |
Supported types:
- text
- int
- small_int
- big_int
- boolean
- date
A match index enables full text search across one or more text fields in queries.
The default Match index options are:
{
"k": 6,
"m": 2048,
"include_original": true,
"tokenizer": {
"kind": "ngram",
"token_length": 3
}
"token_filters": {
"kind": "downcase"
}
}
tokenFilters
: a list of filters to apply to normalise tokens before indexing.tokenizer
: determines how input text is split into tokens.m
: The size of the backing bloom filter in bits. Defaults to2048
.k
: The maximum number of bits set in the bloom filter per term. Defaults to6
.
Token Filters
There are currently only two token filters available downcase
and upcase
. These are used to normalise the text before indexing and are also applied to query terms. An empty array can also be passed to tokenFilters
if no normalisation of terms is required.
Tokenizer
There are two tokenizer
s provided: standard
and ngram
.
The standard
simply splits text into tokens using this regular expression: /[ ,;:!]/
.
The ngram
tokenizer splits the text into n-grams and accepts a configuration object that allows you to specify the tokenLength
.
m and k
k
and m
are optional fields for configuring bloom filters that back full text search.
m
is the size of the bloom filter in bits. filterSize
must be a power of 2 between 32
and 65536
and defaults to 2048
.
k
is the number of hash functions to use per term.
This determines the maximum number of bits that will be set in the bloom filter per term.
k
must be an integer from 3
to 16
and defaults to 6
.
Caveats around n-gram tokenization
While using n-grams as a tokenization method allows greater flexibility when doing arbitrary substring matches, it is important to bear in mind the limitations of this approach.
Specifically, searching for strings shorter than the tokenLength
parameter will not generally work.
If you're using n-gram as a token filter, then a token that is already shorter than the tokenLength
parameter will be kept as-is when indexed, and so a search for that short token will match that record.
However, if that same short string only appears as a part of a larger token, then it will not match that record.
In general, therefore, you should try to ensure that the string you search for is at least as long as the tokenLength
of the index, except in the specific case where you know that there are shorter tokens to match, and you are explicitly OK with not returning records that have that short string as part of a larger token.
_cs_modify_index_v1(table_name text, column_name text, index_name text, cast_as text, opts jsonb)
Modifies an existing index configuration.
Accepts the same parameters as cs_add_index
cs_remove_index_v1(table_name text, column_name text, index_name text)
Removes an index configuration from the column.
These Functions expect a jsonb
value that conforms to the storage schema, and are used to perform search operations.
cs_ciphertext_v1(val jsonb)
Extracts the ciphertext from the jsonb
value.
Ciphertext values are transparently decrypted in transit by Cipherstash Proxy.
cs_match_v1(val jsonb)
Extracts a match index from the jsonb
value.
Returns null
if no match index is present.
cs_unique_v1(val jsonb)
Extracts a unique index from the jsonb
value.
Returns null
if no unique index is present.
cs_ore_v1(val jsonb)
Extracts an ore index from the jsonb
value.
Returns null
if no ore index is present.
Encrypted data is stored as jsonb
with a specific schema:
-
Plaintext Payload (Client Side):
{ "v": 1, "k": "pt", "p": "plaintext value", "e": { "t": "table_name", "c": "column_name" } }
-
Encrypted Payload (Database Side):
{ "v": 1, "k": "ct", "c": "ciphertext value", "e": { "t": "table_name", "c": "column_name" } }
The format is defined as a JSON Schema.
It should never be necessary to directly interact with the stored jsonb
.
Cipherstash proxy handles the encoding, and EQL provides the functions.
Field | Name | Description |
---|---|---|
s | Schema version | JSON Schema version of this json document. |
v | Version | The configuration version that generated this stored value. |
k | Kind | The kind of the data (plaintext/pt, ciphertext/ct, encrypting/et). |
i.t | Table identifier | Name of the table containing encrypted column. |
i.c | Column identifier | Name of the encrypted column. |
p | Plaintext | Plaintext value sent by database client. Required if kind is plaintext/pt or encrypting/et. |
c | Ciphertext | Ciphertext value. Encrypted by proxy. Required if kind is plaintext/pt or encrypting/et. |
m.1 | Match index | Ciphertext index value. Encrypted by proxy. |
o.1 | ORE index | Ciphertext index value. Encrypted by proxy. |
u.1 | Uniqueindex | Ciphertext index value. Encrypted by proxy. |
We have created a few langague specific packages to help you interact with the payloads:
- @cipherstash/eql: This is a TypeScript implementation of EQL.
The following guide assumes you have the prerequisites installed and running, and are running the SQL statements through your CipherStash Proxy instance.
- PostgreSQL 14+
- Cipherstash Proxy
- Cipherstash Encrypt
- It's important to have your dataset configured for encryption before you start using EQL.
- You can use the
cipherstash/dataset.yml
file in thecipherstash
directory as a starting point.
EQL relies on Cipherstash Proxy and Cipherstash Encrypt for low-latency encryption & decryption. We plan to support direct language integration in the future.
Note: You will need to copy the
cipherstash/cipherstash-proxy.toml.example
file tocipherstash/cipherstash-proxy.toml
and update the values to match your environment before running the script.
In order to use EQL, you must first install the EQL extension in your PostgreSQL database.
You can do this by running the following command, which will execute the SQL from the src/install.sql
file:
Update the database credentials based on your environment.
psql -U postgres -d postgres -f src/install.sql
Create a table with encrypted columns.
For this example, we'll use the users
table, with a plaintext email
column and an encrypted email_encrypted
column.
CREATE TABLE IF NOT EXISTS "users" (
"id" serial PRIMARY KEY NOT NULL,
"email" varchar,
"email_encrypted" "cs_encrypted_v1"
);
When inserting data into the encrypted column, you must wrap the plaintext in the appropriate EQL payload.
INSERT INTO users (email_encrypted) VALUES ('{"v":1,"k":"pt","p":"test@test.com","i":{"t":"users","c":"email_encrypted"}}');
For reference, the EQL payload is defined as a jsonb
with a specific schema:
{
"v": 1,
"k": "pt",
"p": "test@test.com",
"i": {
"t": "users",
"c": "email_encrypted"
}
}
When querying data, you must wrap the encrypted column in the appropriate EQL payload.
SELECT email_encrypted FROM users WHERE cs_match_v1(email_encrypted) @> cs_match_v1('{"v":1,"k":"pt","p":"test@test.com","i":{"t":"users","c":"email_encrypted"}}');
For reference, the EQL payload is defined as a jsonb
with a specific schema:
{
"v": 1,
"k": "ct",
"c": "test@test.com",
"i": {
"t": "users",
"c": "email_encrypted"
}
}
In progress...
TODO: Do we need this?
-- Alter tables from the configuration
cs_create_encrypted_columns_v1()
-- Explicit alter table
ALTER TABLE users ADD column email_encrypted cs_encrypted_v1;
EQL supports three types of indexes:
- match
- ore (order revealing encryption)
- unique
Indexes are managed using EQL functions and can be baked into an existing database migration process.
-- Add an ore index to users.name
cs_add_index('users', 'name', 'ore');
-- Remove an ore index from users.name
cs_remove_index('users', 'name', 'ore');
Adding the index to your configuration does not encrypt the data.
The encryption process needs to update every row in the target table. Depending on the size of the target table, this process can be long-running.
{{LINK TO MIGRATOR DETAILS HERE}}
.... more to come