EQL docs edits #75

Merged
merged 5 commits into from
Dec 2, 2024
24 changes: 9 additions & 15 deletions docs/README.md
@@ -1,25 +1,19 @@
# EQL Documentation
# EQL documentation

This directory contains the documentation for the Encrypt Query Language (EQL).

## Concepts
## About

The following concepts are available:
- [Postgres data security with CipherStash](concepts/WHY.md)

- [Why we built EQL](concepts/WHY.md)
## How-to guides

## Reference
- [Getting started](tutorials/GETTINGSTARTED.md)
- [Using CipherStash Proxy](tutorials/PROXY.md)

The following reference guides are available:
## Reference

- [EQL index configuration](reference/INDEX.md)
- [JSONB and JSON support](reference/JSON.md)
- [Migrating plaintext data](reference/MIGRATOR.md)
- [EQL with JSON and JSONB](reference/JSON.md)
- [CipherStash Migrator](reference/MIGRATOR.md)
- [EQL payload data format](reference/PAYLOAD.md)

## Tutorials

The following tutorials are available:

- [Getting started](tutorials/GETTINGSTARTED.md)
- [Using CipherStash Proxy](tutorials/PROXY.md)
39 changes: 16 additions & 23 deletions docs/concepts/WHY.md
@@ -1,16 +1,13 @@
# Postgres data security with CipherStash

This article gives a high-level overview of CipherStash's encryption in use solution, including the CipherStash Proxy and the Encrypt Query Language (EQL).
This page gives a high-level overview of CipherStash's encryption in use solution, including CipherStash Proxy and the Encrypt Query Language (EQL). It's designed for developers and engineers who need to implement robust data security in PostgreSQL without sacrificing performance or usability.

It is designed for developers and engineers who need to implement robust data security in PostgreSQL without sacrificing performance or usability.

## Table of Contents
## On this page

1. [Encryption in use](#encryption-in-use)
- [What is encryption in use?](#what-is-encryption-in-use)
- [Why use encryption in use?](#why-use-encryption-in-use)
2. [CipherStash Proxy](#cipherstash-proxy)
- [Proxy overview](#proxy-overview)
- [How it works](#how-it-works)
3. [Encrypt Query Language (EQL)](#encrypt-query-language-eql)
4. [Best practices](#best-practices)
@@ -20,7 +17,8 @@ It is designed for developers and engineers who need to implement robust data se

## Encryption in use

EQL enables encryption in use, without significant changes to your application code.
CipherStash's encryption in use solution, comprising CipherStash Proxy and EQL, provides a practical way to enhance data security in Postgres databases.
EQL enables encryption in use without significant changes to your application code.
A variety of searchable encryption techniques are available, including:

- **Matching** - Equality or partial matches
@@ -44,8 +42,6 @@ Encryption in use mitigates this risk by ensuring that:

## CipherStash Proxy

### Proxy overview

CipherStash Proxy is a transparent proxy that sits between your application and your PostgreSQL database.
It intercepts SQL queries and handles the encryption and decryption of data on-the-fly.
This enables encryption in use without significant changes to your application code.
@@ -63,19 +59,19 @@ This enables encryption in use without significant changes to your application c
Encrypt Query Language (EQL) is a set of PostgreSQL functions and data types provided by CipherStash to work with encrypted data and indexes.
EQL allows you to perform queries on encrypted data without decrypting it, supporting operations like equality checks, range queries, and unique constraints.

To get started, view the [Getting Started](https://github.com/cipherstash/encrypt-query-language/blob/main/GETTINGSTARTED.md) guide.
To get started, read the [Getting started](https://github.com/cipherstash/encrypt-query-language/blob/main/GETTINGSTARTED.md) guide.

## Best Practices
## Best practices

- **Leverage CipherStash Proxy**: Use CipherStash Proxy to handle encryption/decryption transparently.
- **Utilize EQL functions**: Always use EQL functions when interacting with encrypted data.
- **Define constraints**: Apply database constraints to maintain data integrity.
- **Secure key management**: Ensure encryption keys are securely managed and stored.
- **Monitor performance**: Keep an eye on query performance and optimize as needed.
- **Use CipherStash Proxy** to handle encryption/decryption transparently.
- **Use EQL functions** when interacting with encrypted data.
- **Define database constraints** to maintain data integrity.
- **Secure key management** of encryption keys.
- **Monitor query performance** and optimize as needed.

## Advanced Topics
## Advanced topics

### Integrating without CipehrStash Proxy
### Integrating without CipherStash Proxy

> The SDK approach is currently in development, but if you're interested in contributing, please start a discussion [here](https://github.com/cipherstash/encrypt-query-language/discussions).

@@ -88,11 +84,8 @@ For advanced users who prefer to handle encryption within their application:

**Note**: This approach increases complexity and is recommended only if CipherStash Proxy does not meet specific requirements.

## Conclusion

CipherStash's encryption in use solution, comprising CipherStash Proxy and EQL, provides a practical way to enhance data security in Postgres databases.
By keeping data encrypted even during processing, you minimize the risk of data breaches and comply with stringent security standards without significant changes to your application logic.
## Getting started

To get started, see the [Getting Started](https://github.com/cipherstash/encrypt-query-language/blob/main/GETTINGSTARTED.md) guide.
To get started using CipherStash's encryption in use solution, see the [Getting Started](https://github.com/cipherstash/encrypt-query-language/blob/main/GETTINGSTARTED.md) guide.

**Contact Support:** For further assistance, raise an issue [here](https://github.com/cipherstash/encrypt-query-language/issues).
For further help, raise an issue [here](https://github.com/cipherstash/encrypt-query-language/issues).
18 changes: 9 additions & 9 deletions docs/reference/INDEX.md
@@ -1,7 +1,7 @@
# EQL index configuration

The following functions allow you to configure indexes for encrypted columns.
All these functions modify the `cs_configuration_v1` table in your database, and is added during the EQL installation.
All these functions modify the `cs_configuration_v1` table in your database, and are added during the EQL installation.

> **IMPORTANT:** When you modify or add an index, you must re-encrypt data that's already been stored in the database.
The CipherStash encryption solution will encrypt the data based on the current state of the configuration.
@@ -24,7 +24,7 @@ SELECT cs_add_index_v1(
| ------------- | -------------------------------------------------- | ------------------------------------------------------------------------ |
| `table_name` | Name of target table | Required |
| `column_name` | Name of target column | Required |
| `index_name` | The index kind | Required. |
| `index_name` | The index kind | Required |
| `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` |
| `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) |

@@ -44,7 +44,7 @@ Supported types:

A match index enables full text search across one or more text fields in queries.

The default Match index options are:
The default match index options are:

```json
{
@@ -93,21 +93,21 @@ Specifically, searching for strings _shorter_ than the `tokenLength` parameter w

If you're using n-gram as a token filter, then a token that is already shorter than the `tokenLength` parameter will be kept as-is when indexed, and so a search for that short token will match that record.
However, if that same short string only appears as a part of a larger token, then it will not match that record.
In general, therefore, you should try to ensure that the string you search for is at least as long as the `tokenLength` of the index, except in the specific case where you know that there are shorter tokens to match, _and_ you are explicitly OK with not returning records that have that short string as part of a larger token.
Try to ensure that the string you search for is at least as long as the `tokenLength` of the index, except in the specific case where you know that there are shorter tokens to match, _and_ you are explicitly OK with not returning records that have that short string as part of a larger token.
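The short-token behavior described above can be illustrated with a minimal plaintext sketch. It is not the EQL implementation: the function names are hypothetical, tokens are split on whitespace and lowercased as a simplifying assumption, and a real match index stores encrypted terms rather than plaintext n-grams.

```python
from __future__ import annotations


def ngrams(token: str, token_length: int = 3) -> set[str]:
    """Split a token into n-grams; a token no longer than token_length is kept as-is."""
    if len(token) <= token_length:
        return {token}
    return {token[i:i + token_length] for i in range(len(token) - token_length + 1)}


def index_terms(text: str, token_length: int = 3) -> set[str]:
    """Collect the n-grams of every whitespace-separated token (assumed tokenizer)."""
    terms: set[str] = set()
    for token in text.lower().split():
        terms |= ngrams(token, token_length)
    return terms


def matches(query: str, record: str, token_length: int = 3) -> bool:
    """A record matches when every query term is among the record's index terms."""
    query_terms = index_terms(query, token_length)
    return bool(query_terms) and query_terms <= index_terms(record, token_length)


# "ab" is indexed as-is when it is a whole token...
short_token_match = matches("ab", "ab cd")
# ...but not when it only appears inside a longer token such as "abcdef".
inside_larger_token = matches("ab", "abcdef")
# A query at least as long as token_length matches inside longer tokens.
full_length_match = matches("bcd", "abcdef")
```

With a `token_length` of 3, `short_token_match` and `full_length_match` are true and `inside_larger_token` is false, which is why the search string should usually be at least `tokenLength` characters long.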

#### Options for ste_vec indexes (`opts`)

An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).

An ste_vec index requires one piece of configuration: the `context` (a string) which is passed as an info string to a MAC (Message Authentication Code).
This ensures that all of the encrypted values are unique to that context.
It is generally recommended to use the table and column name as a the context (e.g. `users/name`).
We recommend that you use the table and column name as the context (e.g. `users/name`).

Within a dataset, encrypted columns indexed using an `ste_vec` that use different contexts cannot be compared.
Within a dataset, encrypted columns indexed using an `ste_vec` that use different contexts can't be compared.
Containment queries that manage to mix index terms from multiple columns will never return a positive result.
This is by design.
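The effect of the context can be sketched with a plain HMAC. This is only an illustration under stated assumptions: the key, the way the context is combined with the value, and the `index_term` helper are all hypothetical, and none of it describes how CipherStash actually derives ste_vec terms.

```python
import hashlib
import hmac


def index_term(key: bytes, context: str, value: str) -> str:
    """Return a MAC over the value, bound to the context (assumed construction)."""
    # A zero byte separates context and value so that different splits of the
    # same characters cannot produce the same message.
    message = context.encode() + b"\x00" + value.encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


demo_key = b"demo-key-not-for-production"  # hypothetical key for the sketch

same_context_a = index_term(demo_key, "users/name", "alice")
same_context_b = index_term(demo_key, "users/name", "alice")
other_context = index_term(demo_key, "orders/customer", "alice")
```

The two `users/name` terms are identical, so they can be compared. The `orders/customer` term differs even though the plaintext is the same, which is why terms from columns with different contexts never match.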

The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node.
The index is generated from a JSONB document by first flattening the structure of the document so that a hash can be generated for each unique path prefix to a node.

The complete set of JSON types is supported by the indexer.
Null values are ignored by the indexer.
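As a rough sketch of the flattening step, the helper below lists the unique path prefixes for a document and skips null values. The path notation (`$`, `.key`, `[*]`) and the function names are assumptions made for the illustration; the actual indexer then derives a hash per path, which this sketch leaves out.

```python
from __future__ import annotations

from typing import Any, Iterator


def walk_paths(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the path to every non-null node in a parsed JSON document."""
    if node is None:
        return  # null values are ignored
    yield path
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk_paths(child, f"{path}.{key}")
    elif isinstance(node, list):
        for child in node:
            # Array elements share one wildcard path in this sketch.
            yield from walk_paths(child, f"{path}[*]")


def unique_path_prefixes(document: Any) -> list[str]:
    """Return each distinct path once, in sorted order."""
    return sorted(set(walk_paths(document)))


example_document = {
    "account": {
        "email": "alice@example.com",
        "roles": ["admin", "owner"],
        "nickname": None,
    }
}
prefixes = unique_path_prefixes(example_document)
```

For `example_document`, the result contains `$`, `$.account`, `$.account.email`, `$.account.roles`, and `$.account.roles[*]`. The `nickname` key does not appear because its value is null.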
@@ -182,7 +182,7 @@ The hashes would be generated for all prefixes of the full path to the leaf node

Query terms are processed in the same manner as the input document.

A query prior to encrypting & indexing looks like a structurally similar subset of the encrypted document, for example:
A query prior to encrypting and indexing looks like a structurally similar subset of the encrypted document. For example:

```json
{
@@ -238,4 +238,4 @@ SELECT cs_remove_index_v1(
column_name text,
index_name text
);
```
```
63 changes: 30 additions & 33 deletions docs/reference/JSON.md
@@ -2,20 +2,19 @@

EQL supports encrypting, decrypting, and searching JSON and JSONB objects.

## Table of contents

- [Configuring the Index](#configuring-the-index)
- [Inserting JSON Data](#inserting-json-data)
- [Reading JSON Data](#reading-json-data)
- [Querying JSONB Data with EQL](#querying-jsonb-data-with-eql)
- [Containment Queries (`cs_ste_vec_v1`)](#containment-queries-cs_ste_vec_v1)
- [Field Extraction (`cs_ste_vec_value_v1`)](#field-extraction-cs_ste_vec_value_v1)
- [Field Comparison (`cs_ste_vec_term_v1`)](#field-comparison-cs_ste_vec_term_v1)
- [Grouping Data](#grouping-data)
- [Reference](#reference)
- [EQL Functions for JSONB and `ste_vec`](#eql-functions-for-jsonb-and-ste_vec)
- [EJSON Paths](#ejson-paths)
- [Native PostgreSQL JSON(B) Compared to EQL](#native-postgresql-jsonb-compared-to-eql)
## On this page

- [Configuring the index](#configuring-the-index)
- [Inserting JSON data](#inserting-json-data)
- [Reading JSON data](#reading-json-data)
- [Querying JSONB data with EQL](#querying-jsonb-data-with-eql)
- [Containment queries (`cs_ste_vec_v1`)](#containment-queries-cs_ste_vec_v1)
- [Field extraction (`cs_ste_vec_value_v1`)](#field-extraction-cs_ste_vec_value_v1)
- [Field comparison (`cs_ste_vec_term_v1`)](#field-comparison-cs_ste_vec_term_v1)
- [Grouping data](#grouping-data)
- [EQL functions for JSONB and `ste_vec`](#eql-functions-for-jsonb-and-ste_vec)
- [EJSON paths](#ejson-paths)
- [Native PostgreSQL JSON(B) compared to EQL](#native-postgresql-jsonb-compared-to-eql)
- [`json ->> text` → `text` and `json -> text` → `jsonb`/`json`](#json--text--text-and-json---text--jsonbjson)
- [Decryption Example](#decryption-example)
- [Comparison Example](#comparison-example)
@@ -116,15 +115,15 @@ Data is returned as:
}
```

## Querying JSONB Data with EQL
## Querying JSONB data with EQL

EQL provides specialized functions to interact with encrypted JSONB data, supporting operations like containment queries, field extraction, and comparisons.

### Containment Queries (`cs_ste_vec_v1`)
### Containment queries (`cs_ste_vec_v1`)

Retrieve the Structured Encryption Vector for JSONB containment queries.

**Example: Containment Query**
**Example: Containment query**

Suppose we have the following encrypted JSONB data:

@@ -138,7 +137,7 @@ Suppose we have the following encrypted JSONB data:

We can query records that contain a specific structure.

**SQL Query:**
**SQL query:**

```sql
SELECT * FROM examples
@@ -162,11 +161,11 @@ WHERE jsonb_column @> '{"top":{"nested":["a"]}}';

**Note:** The `@>` operator checks if the left JSONB value contains the right JSONB value.
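To make the containment rule concrete, here is a simplified plaintext model of it. The sample record is an assumption made for the sketch, the `contains` helper is hypothetical, and the model leaves out PostgreSQL's special case where a top-level array can contain a primitive value. EQL evaluates the same kind of check over encrypted terms rather than plaintext.

```python
from typing import Any


def contains(left: Any, right: Any) -> bool:
    """Simplified model of `left @> right` for parsed JSON values."""
    if isinstance(right, dict):
        # Every key/value pair on the right must be contained on the left.
        return isinstance(left, dict) and all(
            key in left and contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every element on the right must be contained in some left element;
        # order and duplicates do not matter.
        return isinstance(left, list) and all(
            any(contains(item, wanted) for item in left) for wanted in right
        )
    if isinstance(left, (dict, list)):
        return False
    if isinstance(left, bool) or isinstance(right, bool):
        # Keep JSON true/false distinct from the numbers 1 and 0.
        return left is right
    return left == right


record = {"top": {"nested": ["a", "b", "c"]}}  # assumed sample data

positive = contains(record, {"top": {"nested": ["a"]}})
negative = contains(record, {"top": {"nested": ["d"]}})
```

Here `positive` is true because `"a"` appears in the `"nested"` array, and `negative` is false because `"d"` does not.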

**Negative Example:**
**Negative example:**

If we query for a value that does not exist in the data:

**SQL Query:**
**SQL query:**

```sql
SELECT * FROM examples
@@ -183,7 +182,7 @@ WHERE cs_ste_vec_v1(encrypted_json) @> cs_ste_vec_v1(

This query would return no results, as the value `"d"` is not present in the `"nested"` array.

### Field Extraction (`cs_ste_vec_value_v1`)
### Field extraction (`cs_ste_vec_value_v1`)

Extract a field from an encrypted JSONB object.

@@ -201,7 +200,7 @@ Suppose we have the following encrypted JSONB data:

We can extract the value of the `"top"` key.

**SQL Query:**
**SQL query:**

```sql
SELECT cs_ste_vec_value_v1(encrypted_json,
@@ -231,7 +230,7 @@ FROM examples;
}
```

### Field Comparison (`cs_ste_vec_term_v1`)
### Field comparison (`cs_ste_vec_term_v1`)

Select rows based on a field value in an encrypted JSONB object.

@@ -247,7 +246,7 @@ Suppose we have encrypted JSONB data with a numeric field:

We can query records where the `"num"` field is greater than `2`.

**SQL Query:**
**SQL query:**

```sql
SELECT * FROM examples
@@ -277,7 +276,7 @@ SELECT * FROM examples
WHERE (jsonb_column->>'num')::int > 2;
```

### Grouping Data
### Grouping data

Use `cs_ste_vec_term_v1` along with `cs_grouped_value_v1` to group by a field in an encrypted JSONB column.

@@ -296,7 +295,7 @@ Suppose we have records with a `"color"` field:

We can group the data by the `"color"` field and count occurrences.

**SQL Query:**
**SQL query:**

```sql
SELECT cs_grouped_value_v1(cs_ste_vec_value_v1(encrypted_json,
@@ -336,24 +335,22 @@ GROUP BY jsonb_column->>'color';
| green | 2 |
| red | 1 |

## Reference
## EQL functions for JSONB and `ste_vec`

### EQL Functions for JSONB and `ste_vec`

- **Index Management**
- **Index management**

- `cs_add_index_v1(table_name text, column_name text, 'ste_vec', 'jsonb', opts jsonb)`: Adds an `ste_vec` index configuration.
- `opts` must include the `"context"` key.

- **Query Functions**
- **Query functions**

- `cs_ste_vec_v1(val jsonb)`: Retrieves the STE vector for JSONB containment queries.
- `cs_ste_vec_term_v1(val jsonb, epath jsonb)`: Retrieves the encrypted term associated with an encrypted JSON path.
- `cs_ste_vec_value_v1(val jsonb, epath jsonb)`: Retrieves the decrypted value associated with an encrypted JSON path.
- `cs_ste_vec_terms_v1(val jsonb, epath jsonb)`: Retrieves an array of encrypted terms for elements in an array at the given JSON path (used for comparisons).
- `cs_grouped_value_v1(val jsonb)`: Used with `ste_vec` indexes for grouping.

### EJSON Paths
## EJSON paths

EQL uses an extended JSONPath syntax called EJSONPath for specifying paths in JSONB data.

@@ -363,7 +360,7 @@ EQL uses an extended JSONPath syntax called EJSONPath for specifying paths in JS
- Wildcards are supported: `$.some_array_field[*]`
- Array indexing is **not** supported: `$.some_array_field[0]`

**Example Paths:**
**Example paths:**

- `$.top.nested` selects the `"nested"` key within the `"top"` object.
- `$.array[*]` selects all elements in the `"array"` array.
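The wildcard and indexing rules above can be checked with a small sketch. The `is_supported_path` helper and its grammar are assumptions for illustration (keys limited to letters, digits, and underscores); the real EJSONPath syntax may accept more than this.

```python
import re

# Assumed grammar: "$" followed by any number of ".key" or "[*]" segments.
_SUPPORTED_PATH = re.compile(r"\$(?:\.[A-Za-z_][A-Za-z0-9_]*|\[\*\])*")


def is_supported_path(path: str) -> bool:
    """Return True when the path uses only key access and wildcards."""
    return _SUPPORTED_PATH.fullmatch(path) is not None


nested_key = is_supported_path("$.top.nested")
wildcard = is_supported_path("$.array[*]")
array_index = is_supported_path("$.some_array_field[0]")
```

The first two paths are accepted. The third is rejected because it uses a numeric array index instead of the `[*]` wildcard.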
6 changes: 2 additions & 4 deletions docs/reference/MIGRATOR.md
@@ -1,17 +1,15 @@
# CipherStash Migrator

The CipherStash Migrator is a tool that can be used to migrate plaintext data in a database to its encrypted equivalent.
CipherStash Migrator is a tool that can be used to migrate plaintext data in a database to its encrypted equivalent.
It works inside the CipherStash Proxy Docker container and can handle different data types such as text, JSONB, integers, booleans, floats, and dates.
By specifying the relevant columns in your table, the migrator will seamlessly encrypt the existing data and store it in designated encrypted columns.
By specifying the relevant columns in your table, CipherStash Migrator will seamlessly encrypt the existing data and store it in designated encrypted columns.

## Prerequisites

- [CipherStash Proxy](PROXY.md)
- [Have set up EQL in your database](GETTINGSTARTED.md)
- Ensure that the columns where data will be migrated already exist.

Here’s a draft for the technical usage documentation for the CipherStash Migrator tool:

## Usage

The CipherStash Migrator allows you to specify key-value pairs where the key is the plaintext column, and the value is the corresponding encrypted column.