Work on general purpose JSONB support #14

Merged (1 commit) on Oct 15, 2024.


**`JSON.md`** (132 additions, 0 deletions)

`@@ -0,0 +1,132 @@`

# JSON Encrypted Indexing

> [!NOTE]
> This section is under construction


JSONB objects can be encrypted in EQL using either a Structured Encryption Vector (`ste_vec`)
or a Structured Encryption Map (`ste_map`). In the encrypted form, each selector maps to either a term or a ciphertext:

```
{
  <selector>: <term | ciphertext>
}
```


## Simplified JSON Path

CipherStash EQL supports a simplified JSONPath syntax as follows:

| Expression | Description |
|------------|-------------|
| `$` | The root object or array. |
| `.property` | Selects the specified property in a parent object. |
| `[n]` | Selects the n-th element from an array. Indexes are 0-based. |
| `[*]` | Matches any array element. |

### Examples

Given the following JSON:

```json
{
  "firstName": "John",
  "lastName": "doe",
  "scores": [1, 2, 3]
}
```

* `$.firstName` returns `["John"]`

> **Review comment (Contributor):** Why is the returned value in an array?
>
> **Reply (Contributor Author):** Just to keep the same behaviour as standard JSONPath. I used https://jsonpath.com/

* `$.scores` returns `[[1, 2, 3]]`
* `$[0]` returns nothing (`[]`), because the root is an object rather than an array
* `$.scores[0]` returns `[1]`
* `$.scores[*]` returns `[1, 2, 3]`
* `$.` returns the entire object
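The rules above can be sketched in Ruby as follows. This is a minimal illustration, not EQL's implementation: `simple_json_path` is a hypothetical helper that works on plaintext JSON parsed with the standard `json` library and always returns an array of matches, as in the examples. It assumes that `$.` returns the entire object wrapped in an array (`[doc]`), following that same array convention.

```ruby
require "json"

# Hypothetical helper (not part of EQL): evaluates a Simplified JSON Path
# against plaintext JSON and returns an array of matches, like standard JSONPath.
# Supports only `$`, `.property`, `[n]` and `[*]`.
def simple_json_path(doc, path)
  rest = path.delete_prefix("$")
  unless path.start_with?("$") && rest.match?(/\A(?:\.[^.\[\]]*|\[(?:\*|\d+)\])*\z/)
    raise ArgumentError, "unsupported path: #{path}"
  end

  matches = [doc]
  rest.scan(/\.([^.\[\]]*)|\[(\*|\d+)\]/) do |prop, index|
    matches = matches.flat_map do |node|
      if prop
        # A bare "." (as in "$.") selects the current node itself.
        next [node] if prop.empty?
        node.is_a?(Hash) && node.key?(prop) ? [node[prop]] : []
      elsif index == "*"
        # "[*]" expands to every element of an array.
        node.is_a?(Array) ? node : []
      else
        # "[n]" selects one element (0-based) if it exists.
        i = index.to_i
        node.is_a?(Array) && i < node.size ? [node[i]] : []
      end
    end
  end
  matches
end

doc = JSON.parse('{"firstName": "John", "lastName": "doe", "scores": [1, 2, 3]}')
p simple_json_path(doc, "$.scores[*]") # prints [1, 2, 3]
```

Because every step maps a list of matches to a new list, `[*]` naturally fans out into several results and a failed lookup yields `[]` instead of raising.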

### Path Segments

A Simplified JSON Path can be tokenized into segments where each segment is one of:

* `.`
* A property
* `[*]`
* `[n]`

Below are some paths along with their segment tokenizations:

* `$.firstName` -> `[".", "firstName"]`
* `$.scores[0]` -> `[".", "scores", "[0]"]`
* `$.` -> `["."]`
* `$` -> `["."]`
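A minimal Ruby sketch of this tokenization follows, using a hypothetical `path_segments` helper. It reproduces the four examples above. The examples do not specify how nested properties such as `$.a.b` tokenize, so the sketch *assumes* that each later `.name` contributes only `name`.

```ruby
# Hypothetical helper: splits a Simplified JSON Path into segments.
# The root ("$" or "$.") becomes the "." segment, each ".name" contributes
# "name", and array accessors are kept verbatim as "[n]" or "[*]".
# ASSUMPTION: "$.a.b" tokenizes to [".", "a", "b"]; the examples above do not
# cover nested properties.
def path_segments(path)
  raise ArgumentError, "path must start with $" unless path.start_with?("$")

  segments = ["."]
  path.delete_prefix("$").scan(/\.([^.\[\]]*)|(\[(?:\*|\d+)\])/) do |prop, accessor|
    if accessor
      segments << accessor # "[0]", "[*]", ...
    elsif !prop.empty?
      segments << prop     # a property name; the bare "." of "$." adds nothing
    end
  end
  segments
end

["$.firstName", "$.scores[0]", "$.", "$"].each do |path|
  puts "#{path} -> #{path_segments(path).inspect}"
end
```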


## Selectors

A selector is an encrypted (MAC-based) representation of the Simplified JSON Path to a leaf node in the JSON tree (*including* the leaf node itself),
along with the type of value it selects (i.e., a `term` or a `ciphertext`).

Given:

* An `INFO` string representing the storage context (e.g., the table and column name)
* A `TYPE`: either `T` (term) or `C` (ciphertext)
* A sub-type, `t`, of *exactly* 1 byte (set to 0 for the default sub-type)
* A path `P` made up of segments `P(0)..P(N)`
* `len(x)`, the length of `x` in bytes
* A secure, keyed Message Authentication Code function, `MAC` (such as keyed BLAKE3 or HMAC with SHA2-512)
* A length parameter `L`, such that `TRUNCATE(x, L)` truncates `x` to `L` bytes
* `+`, meaning string concatenation

The selector is defined as:

```
TRUNCATE(MAC(<TYPE> + <t> + <INFO> + len(<INFO>) + {P(0) + len(P(0))} + ... + {P(N) + len(P(N))}), L)
```

### Examples

* `INFO`: `customers/attrs`
* `TYPE`: `T`
* `t`: `0`
* `L`: `16`

Given the input:

```json
{
  "firstName": "John",
  "lastName": "doe",
  "scores": []
}
```

The selector `S1` for the path `$.firstName` is:

```
S1 = TRUNCATE(MAC("T" + 0 + "customers/attrs" + 15 + "." + 1 + "firstName" + 9), 16)
```

The selector `S2` for the path `$.scores[*]` is:

```
S2 = TRUNCATE(MAC("T" + 0 + "customers/attrs" + 15 + "." + 1 + "scores" + 6 + "[*]" + 3), 16)
```
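The selector construction can be sketched in Ruby with the standard `openssl` library. The definition above leaves several details open, so this sketch makes assumptions that are also labeled in the code: `MAC` is HMAC-SHA2-512 under a caller-supplied key, the sub-type is packed as a single byte, each `len(...)` is encoded as an 8-byte big-endian integer, and `selector` is a hypothetical helper name. The key is generated randomly for illustration only.

```ruby
require "openssl"

# Hypothetical helper sketching the selector formula above. ASSUMPTIONS:
# MAC is HMAC-SHA2-512 under `key`; `subtype` is packed as one byte; each
# len(...) is the byte length packed as an 8-byte big-endian integer.
def selector(key, type, subtype, info, segments, out_len)
  msg = "".b # build the MAC input as raw bytes
  msg << type.b << [subtype].pack("C")
  msg << info.b << [info.bytesize].pack("Q>")
  segments.each { |seg| msg << seg.b << [seg.bytesize].pack("Q>") }
  # TRUNCATE(..., L): keep the first `out_len` bytes of the 64-byte tag.
  OpenSSL::HMAC.digest("SHA512", key, msg).byteslice(0, out_len)
end

key = OpenSSL::Random.random_bytes(32) # hypothetical key, for illustration only
s1 = selector(key, "T", 0, "customers/attrs", [".", "firstName"], 16)
s2 = selector(key, "T", 0, "customers/attrs", [".", "scores", "[*]"], 16)
puts s1.unpack1("H*"), s2.unpack1("H*")
```

Appending each length after its value keeps the MAC input unambiguous: without it, the segments `["ab", "c"]` and `["a", "bc"]` would concatenate to the same bytes and produce the same selector.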


## Terms

For arrays, we could select a specific position:
```
$.scores[0]
```

Or, if position is not important, any element:
```
$.scores[*]
```

**`README.md`** (92 additions, 15 deletions)

`@@ -222,17 +222,94 @@ CREATE TABLE users (`

### EQL functions

EQL provides specialized functions to interact with encrypted data.
These functions expect an encrypted domain type (which is effectively just JSONB).

- **`cs_ciphertext_v1(val JSONB)`**: Extracts the ciphertext for decryption by CipherStash Proxy.
- **`cs_match_v1(val JSONB)`**: Enables basic full-text search.
- **`cs_unique_v1(val JSONB)`**: Retrieves the unique index for enforcing uniqueness.
- **`cs_ore_v1(val JSONB)`**: Retrieves the Order-Revealing Encryption index for range queries.
- **`cs_ste_vec_v1(val JSONB)`**: Retrieves the Structured Encryption Vector for containment queries.


#### `cs_ste_vec_v1(val JSONB)`

Retrieves the Structured Encryption Vector for containment queries.

**Example:**

```rb
# Serialize a JSONB value bound to the users table column
term = User::ENCRYPTED_JSONB.serialize({field: "value"})
User.where("cs_ste_vec_v1(attrs) @> cs_ste_vec_v1(?)", term)
```

Which will execute on the server as:

```sql
SELECT * FROM users WHERE cs_ste_vec_v1(attrs) @> '53T8dtvW4HhofDp9BJnUkw';
```

It is the EQL equivalent of the following plaintext query:

```sql
SELECT * FROM users WHERE attrs @> '{"field": "value"}';
```

> **Review comment on lines +240 to +256 (Contributor):** I really like this framing.
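For reference, the plaintext `@>` query that `cs_ste_vec_v1` containment is equivalent to follows PostgreSQL's jsonb containment rules. The Ruby sketch below uses a hypothetical `jsonb_contains?` helper to model those rules on parsed JSON. It describes only the plaintext semantics, not how EQL evaluates the encrypted vector, and it omits PostgreSQL's special case that a top-level array contains a bare primitive (for example `'["a"]' @> '"a"'`).

```ruby
# Hypothetical helper: PostgreSQL-style jsonb containment (`left @> right`)
# on parsed JSON values. Objects must contain every key of `right` with a
# contained value; every element of a `right` array must be contained by some
# element of the `left` array (order and duplicates ignored); scalars must be equal.
def jsonb_contains?(left, right)
  case right
  when Hash
    left.is_a?(Hash) &&
      right.all? { |k, v| left.key?(k) && jsonb_contains?(left[k], v) }
  when Array
    left.is_a?(Array) &&
      right.all? { |r| left.any? { |l| jsonb_contains?(l, r) } }
  else
    left == right
  end
end

p jsonb_contains?({"field" => "value", "other" => 1}, {"field" => "value"}) # prints true
```

Containment must match structure as well as data: `{"foo": {"bar": "baz"}}` does not contain `{"bar": "baz"}`, because the key `bar` is not at the top level.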


#### `cs_ste_term_v1(val JSONB, epath TEXT)`

> **Review comment (@CDThomas, Contributor, Oct 15, 2024):** Could we make `epath` here JSON/JSONB as well? The motivation is that the JSON value would include all the info needed for generating the MAC, so that Proxy doesn't need to look at anything other than the param or literal used for the argument to sort out the table, column, etc.
>
> I think that we could aim to target text in SM2, but using JSON here, similar to how encryption already works for other EQL functions, would simplify the Proxy (MLP) logic.
>
> **Review comment (Contributor):** Also, what's the return type of `cs_ste_term_v1`? `ore_64_8_v1`?
>
> **Reply (Contributor Author):** Yeah, I think for now that makes the most sense. I considered adding a variant of `cs_ste_term_v1` which also takes a term type so we can handle other types of index terms, but for now this should be enough. Does that sound reasonable?

Retrieves the encrypted index term associated with the encrypted JSON path, `epath`.

This is useful for sorting or filtering on integers in encrypted JSON objects.

**Example:**

```rb
# Serialize the JSON path to query and the integer to compare against
path = EJSON_PATH.serialize("$.login_count")
term = User::ENCRYPTED_INT.serialize(100)
User.where("cs_ste_term_v1(attrs, ?) > cs_ore_64_8_v1(?)", path, term)
```

Which will execute on the server as:

```sql
SELECT * FROM users WHERE cs_ste_term_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') > 'QAJ3HezijfTHaKrhdKxUEg';
```

It is the EQL equivalent of the following plaintext query:

```sql
SELECT * FROM users WHERE (attrs->>'login_count')::int > 100;
```

#### `cs_ste_value_v1(val JSONB, epath TEXT)`

Retrieves the encrypted *value* associated with the encrypted JSON path, `epath`.

**Example:**

```rb
# Serialize the JSON path whose value should be returned
path = EJSON_PATH.serialize("$.login_count")
User.find_by_sql(["SELECT cs_ste_value_v1(attrs, ?) FROM users", path])
```

Which will execute on the server as:

```sql
SELECT cs_ste_value_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') FROM users;
```

It is the EQL equivalent of the following plaintext query:

```sql
SELECT attrs->'login_count' FROM users;
```


### Index functions

These functions expect a `jsonb` value that conforms to the storage schema.

#### cs_add_index

`@@ -242,22 +319,22 @@ cs_add_index(table_name text, column_name text, index_name text, cast_as text, o`

| Parameter | Description | Notes
| ------------- | -------------------------------------------------- | ------------------------------------
| `table_name` | Name of target table | Required
| `column_name` | Name of target column | Required
| `index_name` | The index kind | Required
| `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text`
| `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below)

##### cast_as

Supported types:
- `text`
- `int`
- `small_int`
- `big_int`
- `boolean`
- `date`
- `jsonb`

##### match opts
