Work on general purpose JSONB support #14
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
# JSON Encrypted Indexing | ||
|
||
> [!NOTE] | ||
> This section is under construction | ||
|
||
|
||
JSONB objects can be encrypted in EQL using a Structured Encryption Vector (`ste_vec`) | |
or a Structured Encryption Map (`ste_map`). | |
|
||
```json | ||
{ | ||
<selector>: <term | ciphertext> | ||
} | ||
``` | ||
|
||
|
||
## Simplified JSON Path | ||
|
||
CipherStash EQL supports a simplified JSONPath syntax as follows: | ||
|
||
| Expression | Description | | ||
|------------|-------------| | ||
| `$` | The root object or array. | | ||
| `.property` | Selects the specified property in a parent object. | | ||
| `[n]` | Selects the n-th element from an array. Indexes are 0-based. | | ||
| `[*]` | Matches any array element. | | |
|
||
### Examples | ||
|
||
Given the following JSON: | ||
|
||
```json | ||
{ | ||
"firstName": "John", | ||
"lastName": "doe", | ||
"scores": [1, 2, 3] | ||
} | ||
``` | ||
|
||
* `$.firstName` returns `[John]` | |
* `$.scores` returns `[[1, 2, 3]]` | |
* `$[0]` returns nothing | |
* `$.scores[0]` returns `[1]` | |
* `$.scores[*]` returns `[1, 2, 3]` | |
* `$.` returns the entire object | |
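
The behaviour listed above can be sketched in a few lines of Ruby. This is only an illustration of the path semantics on plaintext JSON, not part of EQL: `select_path` is a hypothetical helper name, and the choice to wrap every result in an array (including the root) is an assumption that follows standard JSONPath.

```ruby
require "json"

# Hypothetical helper: evaluate a Simplified JSON Path against parsed JSON.
# Matches are always returned inside an array, as in standard JSONPath.
def select_path(doc, path)
  raise ArgumentError, "path must start with $" unless path.start_with?("$")

  # Split everything after "$" into property names, [n] steps and [*] steps.
  steps = path[1..].scan(/\[\*\]|\[\d+\]|[^.\[\]]+/)

  # Start from the root and narrow the set of matched nodes one step at a time.
  steps.reduce([doc]) do |nodes, step|
    nodes.flat_map do |node|
      case step
      when "[*]"
        node.is_a?(Array) ? node : []
      when /\A\[(\d+)\]\z/
        index = Regexp.last_match(1).to_i
        node.is_a?(Array) && index < node.length ? [node[index]] : []
      else
        node.is_a?(Hash) && node.key?(step) ? [node[step]] : []
      end
    end
  end
end

doc = JSON.parse('{"firstName": "John", "lastName": "doe", "scores": [1, 2, 3]}')
p select_path(doc, '$.firstName')  # ["John"]
p select_path(doc, '$.scores[*]')  # [1, 2, 3]
p select_path(doc, '$[0]')         # [] (the root is an object, not an array)
```

A step that does not apply to a node (an index on an object, a property on an array) contributes no matches rather than raising, which is why `$[0]` yields an empty result here.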
|
||
### Path Segments | ||
|
||
A Simplified JSON Path can be tokenized into segments where each segment is one of: | ||
|
||
* `.` | ||
* A property | ||
* `[*]` | ||
* `[n]` | ||
|
||
Below are some paths along with their segment tokenizations: | ||
|
||
* `$.firstName` -> `[".", "firstName"]` | ||
* `$.scores[0]` -> `[".", "scores", "[0]"]` | ||
* `$.` -> `["."]` | ||
* `$` -> `["."]` | ||
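
A minimal tokenizer that reproduces the tokenizations above might look like the following. `tokenize_path` is a hypothetical name, and the handling of nested properties (each nested property becomes its own segment, with no extra `.` segment between them) is an assumption, since the examples above only cover top-level properties.

```ruby
# Hypothetical helper: split a Simplified JSON Path into segments.
def tokenize_path(path)
  raise ArgumentError, "path must start with $" unless path.start_with?("$")

  # The root is always represented by a single "." segment, so drop the
  # "." that may follow "$" to avoid counting it twice.
  segments = ["."]
  rest = path[1..].delete_prefix(".")

  until rest.empty?
    if (m = rest.match(/\A\[(\*|\d+)\]/))
      segments << m[0]            # "[n]" or "[*]", kept with its brackets
    elsif (m = rest.match(/\A\.?([^.\[\]]+)/))
      segments << m[1]            # a property name, without the leading "."
    else
      raise ArgumentError, "unexpected input at #{rest.inspect}"
    end
    rest = m.post_match
  end

  segments
end

p tokenize_path('$.scores[0]')
```

Both `$` and `$.` reduce to the single root segment, which matches the last two entries in the list above.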
|
||
|
||
## Selectors | ||
|
||
A selector represents an encryption of a Simplified JSON Path for a leaf node in the JSON tree (*including* the leaf node itself), | |
along with information about what type it selects (i.e. a `term` or a `ciphertext`). | ||
|
||
Given: | ||
|
||
* An `INFO` string representing storage context (e.g. the table and column name) | |
* A `TYPE` - either `T` (term) or `C` (ciphertext) | |
* A sub-type, `t`, of *exactly* 1 byte (set to 0 for the default sub-type) | |
* A path `P` made up of segments `P(0)..P(N)` | |
* The length (in bytes) of `x`, given by `len(x)` | |
* A secure Message Authentication Code function, `MAC` (such as keyed Blake3 or HMAC-SHA2-512) | |
* A length parameter `L` which, when passed to `TRUNCATE(x, L)`, truncates `x` to `L` bytes | |
* `+` means string concatenation | ||
|
||
The selector is defined as: | ||
|
||
``` | ||
TRUNCATE(MAC(<TYPE> + <t> + <INFO> + len(<INFO>) + {P(0) + len(P(0))} + ... + {P(N) + len(P(N))}), L) | |
``` | ||
|
||
## Examples | ||
|
||
* `INFO`: `customers/attrs` | ||
* `TYPE`: `T` | ||
* `t` : `0` | ||
* `L`: `16` | ||
|
||
A given input: | ||
|
||
```json | ||
{ | |
"firstName": "John", | |
"lastName": "doe", | |
"scores": [] | |
} | |
``` | ||
|
||
The selector, `S1` for the path `$.firstName` is: | ||
|
||
``` | ||
S1 = TRUNCATE(MAC("T" + 0 + "customers/attrs" + 15 + "." + 1 + "firstName" + 9), 16) | ||
``` | ||
|
||
The selector, `S2` for the path `$.scores[*]` is: | ||
|
||
``` | ||
S2 = TRUNCATE(MAC("T" + 0 + "customers/attrs" + 15 + "." + 1 + "scores" + 6 + "[*]" + 3), 16) | ||
``` | ||
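
As a rough sketch, the two selectors above could be computed as follows. Several details here are assumptions rather than part of the definition: HMAC-SHA512 stands in for `MAC`, lengths are encoded as 4-byte big-endian integers, the key is a placeholder, and `selector` and `encode_length` are hypothetical helper names.

```ruby
require "openssl"

# Hypothetical helper: encode the byte length of a string as a 4-byte
# big-endian integer. The width and byte order are assumptions.
def encode_length(value)
  [value.bytesize].pack("N")
end

# Hypothetical helper: build the MAC input in the order used by the examples
# above (TYPE, sub-type, INFO + length, then each segment + length) and
# truncate the MAC output to `length` bytes.
def selector(key:, type:, sub_type:, info:, segments:, length:)
  message = String.new(encoding: Encoding::BINARY)
  message << type.b
  message << [sub_type].pack("C") # the sub-type is exactly 1 byte
  message << info.b << encode_length(info)
  segments.each do |segment|
    message << segment.b << encode_length(segment)
  end

  # HMAC-SHA512 is used here as one possible MAC (an assumption).
  OpenSSL::HMAC.digest("SHA512", key, message).byteslice(0, length)
end

key = "k" * 32 # placeholder key, for illustration only
common = { key: key, type: "T", sub_type: 0, info: "customers/attrs", length: 16 }

s1 = selector(**common, segments: [".", "firstName"])
s2 = selector(**common, segments: [".", "scores", "[*]"])

puts s1.unpack1("H*")
puts s2.unpack1("H*")
```

Appending the length of each component keeps the MAC input unambiguous: two different paths cannot concatenate to the same byte string.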
|
||
|
||
## Terms | ||
|
||
|
||
|
||
|
||
|
||
|
||
For arrays we could do: | ||
``` | ||
$.scores[0] | ||
``` | ||
|
||
Or (if position is not important) | ||
``` | ||
$.scores[] | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -222,17 +222,94 @@ CREATE TABLE users ( | |
|
||
### EQL functions | ||
|
||
EQL provides specialized functions to interact with encrypted data: | ||
EQL provides specialized functions to interact with encrypted data. | ||
These functions expect an encrypted domain type (which is effectively just JSONB). | |
|
||
- **`cs_ciphertext_v1(val JSONB)`**: Extracts the ciphertext for decryption by CipherStash Proxy. | ||
- **`cs_match_v1(val JSONB)`**: Enables basic full-text search. | ||
- **`cs_unique_v1(val JSONB)`**: Retrieves the unique index for enforcing uniqueness. | ||
- **`cs_ore_v1(val JSONB)`**: Retrieves the Order-Revealing Encryption index for range queries. | ||
- **`cs_ste_vec_v1(val JSONB)`**: Retrieves the Structured Encryption Vector for containment queries. | ||
|
||
|
||
#### `cs_ste_vec_v1(val JSONB)` | ||
|
||
Retrieves the Structured Encryption Vector for containment queries. | ||
|
||
**Example:** | ||
|
||
```rb | ||
# Serialize a JSONB value bound to the users table column | ||
term = User::ENCRYPTED_JSONB.serialize({field: "value"}) | ||
User.where("cs_ste_vec_v1(attrs) @> cs_ste_vec_v1(?)", term) | ||
``` | ||
|
||
Which will execute on the server as: | ||
|
||
```sql | ||
SELECT * FROM users WHERE cs_ste_vec_v1(attrs) @> '53T8dtvW4HhofDp9BJnUkw'; | ||
``` | ||
|
||
And is the EQL equivalent of the following plaintext query. | ||
|
||
```sql | ||
SELECT * FROM users WHERE attrs @> '{"field": "value"}'; | |
``` | ||
Comment on lines +240 to +256: I really like this framing
||
|
||
#### `cs_ste_term_v1(val JSONB, epath TEXT)` | ||
Could we make I think that we could aim to target text in SM2, but using JSON here similar to how encryption already works for other EQL functions would simplify the Proxy (MLP) logic.
Also, what's the return type of
Yeah, I think for now that makes the most sense. I considered adding a variant of
||
|
||
Retrieves the encrypted index term associated with the encrypted JSON path, `epath`. | ||
|
||
This is useful for sorting or filtering on integers in encrypted JSON objects. | ||
|
||
**Example:** | ||
|
||
```rb | ||
# Serialize the JSON path and the comparison value bound to the users table column | |
path = EJSON_PATH.serialize("$.login_count") | ||
term = User::ENCRYPTED_INT.serialize(100) | ||
User.where("cs_ste_term_v1(attrs, ?) > cs_ore_64_8_v1(?)", path, term) | ||
``` | ||
|
||
Which will execute on the server as: | ||
|
||
```sql | ||
SELECT * FROM users WHERE cs_ste_term_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') > 'QAJ3HezijfTHaKrhdKxUEg'; | ||
``` | ||
|
||
And is the EQL equivalent of the following plaintext query. | ||
|
||
```sql | ||
SELECT * FROM users WHERE (attrs->>'login_count')::int > 100; | |
``` | ||
|
||
#### `cs_ste_value_v1(val JSONB, epath TEXT)` | ||
|
||
Retrieves the encrypted *value* associated with the encrypted JSON path, `epath`. | ||
|
||
**Example:** | ||
|
||
```rb | ||
# Serialize the JSON path bound to the users table column | |
path = EJSON_PATH.serialize("$.login_count") | ||
User.find_by_sql(["SELECT cs_ste_value_v1(attrs, ?) FROM users", path]) | ||
``` | ||
|
||
Which will execute on the server as: | ||
|
||
```sql | ||
SELECT cs_ste_value_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') FROM users; | ||
``` | ||
|
||
And is the EQL equivalent of the following plaintext query. | ||
|
||
```sql | ||
SELECT attrs->'login_count' FROM users; | ||
``` | ||
|
||
|
||
### Index functions | ||
|
||
These Functions expect a `jsonb` value that conforms to the storage schema. | ||
These functions expect a `jsonb` value that conforms to the storage schema. | ||
|
||
#### cs_add_index | ||
|
||
|
@@ -242,22 +319,22 @@ cs_add_index(table_name text, column_name text, index_name text, cast_as text, o | |
|
||
| Parameter | Description | Notes | ||
| ------------- | -------------------------------------------------- | ------------------------------------ | ||
| table_name | Name of target table | Required | ||
| column_name | Name of target column | Required | ||
| index_name | The index kind | Required. | ||
| cast_as | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` | ||
| opts | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) | ||
| `table_name` | Name of target table | Required | ||
| `column_name` | Name of target column | Required | ||
| `index_name` | The index kind | Required. | ||
| `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` | ||
| `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) | ||
|
||
##### cast_as | ||
|
||
Supported types: | ||
- text | ||
- int | ||
- small_int | ||
- big_int | ||
- boolean | ||
- date | ||
- jsonb | ||
- `text` | ||
- `int` | ||
- `small_int` | ||
- `big_int` | ||
- `boolean` | ||
- `date` | ||
- `jsonb` | ||
|
||
##### match opts | ||
|
||
|
Why is the returned value in an array?
Just to keep the same behaviour as standard JSONPath.

I used https://jsonpath.com/