feat: JSON indexing for EQL V2 #263

freshtonic · 2025-06-13T06:00:29Z

Adds EQL types to `eql-mapper` for representing JSON indexing terms, and presents them to the proxy as the result of a successful type check.

The new JSON indexing types required adding associated type support to the type system and Unifier.

For more detail, see the commit messages in this PR.

TODO: proxy glue code (partially implemented but the proxy package currently does not build).

Acknowledgment

By submitting this pull request, I confirm that CipherStash can use, modify, copy, and redistribute this contribution, under the terms of CipherStash's choice.

The type system requires associated types to represent EQL terms that are used only with particular operators and functions. The kinds of encrypted AST literals (`Value` nodes) that can be communicated as the result of an `eql-mapper` type check will be used by Proxy to inform *how* those terms are to be encrypted.

This commit also introduces `EqlTrait`. In `eql-mapper`, `EqlTrait` is a Rust enum, but each variant represents the name of a notional trait that EQL types can implement. These are the currently defined EQL traits: `Eq`, `Ord`, `TokenMatch`, `JsonLike` and `Contain`.

Associated types belong to an `EqlTrait` but are implemented on the types that implement that trait (just like in Rust). For example, `EqlTrait::JsonLike` has `Accessor` and `Path` associated types. An `Accessor` is the type of the expression on the right-hand side of `->` or `->>`. A `Path` is the type of the second argument to `jsonb_path_query`. An associated type is fundamentally related to its parent type in that it shares its configuration, but it cannot be used in a position where the parent type is mandated.

Lastly, `Type::Var` now has "type bounds". Type bounds are represented as an `EqlTraits` value (note the pluralisation), which is just a set of boolean flags: one per `EqlTrait`.
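
To make the `EqlTraits` representation concrete, here is a minimal sketch of a flag set with one boolean per `EqlTrait`. The field names, the `with` and `has` helpers and `associated_type_names` are hypothetical and are not taken from the `eql-mapper` source; only the trait names and the `Accessor`/`Path` associated types come from the commit message.

```rust
/// The notional traits named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EqlTrait {
    Eq,
    Ord,
    TokenMatch,
    JsonLike,
    Contain,
}

/// Type bounds: one boolean flag per `EqlTrait`.
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct EqlTraits {
    pub eq: bool,
    pub ord: bool,
    pub token_match: bool,
    pub json_like: bool,
    pub contain: bool,
}

impl EqlTraits {
    /// Returns a copy of `self` with the flag for `t` switched on.
    pub fn with(mut self, t: EqlTrait) -> Self {
        match t {
            EqlTrait::Eq => self.eq = true,
            EqlTrait::Ord => self.ord = true,
            EqlTrait::TokenMatch => self.token_match = true,
            EqlTrait::JsonLike => self.json_like = true,
            EqlTrait::Contain => self.contain = true,
        }
        self
    }

    /// Reports whether the flag for `t` is set.
    pub fn has(self, t: EqlTrait) -> bool {
        match t {
            EqlTrait::Eq => self.eq,
            EqlTrait::Ord => self.ord,
            EqlTrait::TokenMatch => self.token_match,
            EqlTrait::JsonLike => self.json_like,
            EqlTrait::Contain => self.contain,
        }
    }
}

/// Associated type names per trait (hypothetical helper). The commit message
/// only names those of `JsonLike`; the other traits are left empty here
/// purely for illustration.
pub fn associated_type_names(t: EqlTrait) -> &'static [&'static str] {
    match t {
        EqlTrait::JsonLike => &["Accessor", "Path"],
        _ => &[],
    }
}
```

With this shape, a bound such as "`Eq` and `JsonLike`" would be built as `EqlTraits::default().with(EqlTrait::Eq).with(EqlTrait::JsonLike)`.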

The current implementation of function type checking in `eql-mapper` manually instantiates and unifies types for a handful of specific SQL functions (`MIN`, `MAX`, `COUNT`) and operators (`=`, `<`, `<=`, `>=`, `>`). The list of functions and operators is about to grow significantly, and implementing type checking for all of them by manually instantiating and unifying type variables would be tedious and error prone.

This commit introduces `TypeDecl`, `TypeEnv` and `InstantiatedTypeEnv`. A `TypeDecl` is a type declaration for the `eql-mapper` type system. A `TypeDecl` is purely symbolic and cannot be used by the `Unifier`. What a `TypeDecl` provides is a consistent notation for declaring types and their bounds.

Multiple type declarations can be put into a `TypeEnv`: a `TypeEnv` is a scope for some related type declarations. Type declarations can reference other type declarations in the same `TypeEnv` via type variables (`TVar`s).

A `TypeEnv` can be *instantiated*, returning an `InstantiatedTypeEnv`. In an `InstantiatedTypeEnv` all of the `TypeDecl`s have been converted into `Arc<Type>` values ready to be used by the `Unifier`. Additionally, the `InstantiatedTypeEnv` allows retrieval of its types via a `TVar`: it maintains the same association of `TVar` to type as the original `TypeEnv`.
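
The relationship between the three types can be sketched roughly as follows. This is a heavily simplified illustration, not the `eql-mapper` implementation: the `TypeDecl` and `Type` variants, the `add`, `instantiate` and `get` methods, and the cycle check are all assumptions made for the example. The point it shows is that declarations stay symbolic until the whole environment is instantiated, and that lookups by `TVar` keep working afterwards.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// A type variable naming a declaration inside a `TypeEnv`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TVar(pub String);

/// Purely symbolic declaration (variants are hypothetical).
#[derive(Debug, Clone)]
pub enum TypeDecl {
    Native,
    /// An array whose element type is another declaration in the same env.
    Array(TVar),
}

/// Instantiated type, as a unifier would consume it (hypothetical variants).
#[derive(Debug, PartialEq)]
pub enum Type {
    Native,
    Array(Arc<Type>),
}

#[derive(Default)]
pub struct TypeEnv {
    decls: HashMap<TVar, TypeDecl>,
}

pub struct InstantiatedTypeEnv {
    types: HashMap<TVar, Arc<Type>>,
}

impl TypeEnv {
    pub fn add(&mut self, tvar: &str, decl: TypeDecl) {
        self.decls.insert(TVar(tvar.to_string()), decl);
    }

    /// Converts every declaration into an `Arc<Type>`, keyed by the same `TVar`.
    pub fn instantiate(&self) -> Result<InstantiatedTypeEnv, String> {
        let mut types = HashMap::new();
        for tvar in self.decls.keys() {
            self.instantiate_one(tvar, &mut types, &mut Vec::new())?;
        }
        Ok(InstantiatedTypeEnv { types })
    }

    fn instantiate_one(
        &self,
        tvar: &TVar,
        done: &mut HashMap<TVar, Arc<Type>>,
        visiting: &mut Vec<TVar>,
    ) -> Result<Arc<Type>, String> {
        // Reuse an already instantiated type so references share one `Arc`.
        if let Some(ty) = done.get(tvar) {
            return Ok(ty.clone());
        }
        if visiting.contains(tvar) {
            return Err(format!("cyclic declaration: {}", tvar.0));
        }
        let decl = self
            .decls
            .get(tvar)
            .ok_or_else(|| format!("unknown type variable: {}", tvar.0))?;
        visiting.push(tvar.clone());
        let ty = match decl {
            TypeDecl::Native => Arc::new(Type::Native),
            TypeDecl::Array(elem) => {
                Arc::new(Type::Array(self.instantiate_one(elem, done, visiting)?))
            }
        };
        visiting.pop();
        done.insert(tvar.clone(), ty.clone());
        Ok(ty)
    }
}

impl InstantiatedTypeEnv {
    /// Looks up the instantiated type for the `TVar` used in the original env.
    pub fn get(&self, tvar: &str) -> Option<Arc<Type>> {
        self.types.get(&TVar(tvar.to_string())).cloned()
    }
}
```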

Added crate `eql-mapper-macros`. The crate exposes macros for building `TypeDecl` and `Arc<Type>` values without boilerplate. These macros are great for testing: previously, defining complex types like projections required a lot of code to build the `Type` enums. Now those type definitions are expressed in a DSL.

An associated type represents "deferred" unification. This is a bit more involved than unifying other types. The reason is that the `impl_ty` (parent type) of the associated type can be a type variable bounded by the appropriate `EqlTrait` that declares the associated type. At the time the `Type::Associated(AssociateType)` is instantiated by the `TypeInferencer` we cannot guarantee that the parent type has been resolved, so we cannot yet resolve the concrete associated type. So you can think of the associated type as an "obligation" (that's the actual term used in type theory).

Whenever unification tries to unify a type with an associated type obligation, it first checks whether the parent type of the associated type has been resolved and, if so, resolves the associated type. If the parent type has not been resolved, then the *unresolved* associated type is unified with the other type as a new `Type::Associated`, deferring resolution until later.

`Type::Var` now also supports bounds (as an `EqlTraits` struct). Two type variables always unify, even if they have different bounds. The result is a new type variable with merged bounds. Whenever a concrete type is unified with a type variable, the effective bounds of the concrete type MUST contain all of the bounds of the type variable (concrete types always have *implied* effective bounds). The concrete type is allowed to define additional bounds in excess of those required by the type variable.
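
The bounds rules above (variables always unify and merge their bounds; a concrete type must satisfy every bound of the variable) can be sketched as below. This deliberately leaves out associated type obligations and substitutions. The `Type` variants, `union`, `contains_all`, `unify` and `satisfy` are hypothetical names for illustration, and treating two concrete types as unifiable only when equal is an assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct EqlTraits {
    pub eq: bool,
    pub ord: bool,
    pub token_match: bool,
    pub json_like: bool,
    pub contain: bool,
}

impl EqlTraits {
    /// Merged bounds: a flag is set if it is set on either side.
    pub fn union(self, other: Self) -> Self {
        EqlTraits {
            eq: self.eq || other.eq,
            ord: self.ord || other.ord,
            token_match: self.token_match || other.token_match,
            json_like: self.json_like || other.json_like,
            contain: self.contain || other.contain,
        }
    }

    /// True when every flag set in `required` is also set in `self`.
    pub fn contains_all(self, required: Self) -> bool {
        self.union(required) == self
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Var { id: u32, bounds: EqlTraits },
    Concrete { name: &'static str, effective_bounds: EqlTraits },
}

/// Checks a variable's bounds against a concrete type's effective bounds.
fn satisfy(required: EqlTraits, effective: EqlTraits, concrete: &Type) -> Result<Type, String> {
    if effective.contains_all(required) {
        // Extra bounds on the concrete type are allowed.
        Ok(concrete.clone())
    } else {
        Err("concrete type does not satisfy the variable's bounds".to_string())
    }
}

/// Unifies two types; `fresh_id` names the variable created for var/var unification.
pub fn unify(lhs: &Type, rhs: &Type, fresh_id: u32) -> Result<Type, String> {
    match (lhs, rhs) {
        // Two variables always unify; the result carries the merged bounds.
        (Type::Var { bounds: a, .. }, Type::Var { bounds: b, .. }) => Ok(Type::Var {
            id: fresh_id,
            bounds: a.union(*b),
        }),
        (Type::Var { bounds, .. }, Type::Concrete { effective_bounds, .. }) => {
            satisfy(*bounds, *effective_bounds, rhs)
        }
        (Type::Concrete { effective_bounds, .. }, Type::Var { bounds, .. }) => {
            satisfy(*bounds, *effective_bounds, lhs)
        }
        (Type::Concrete { .. }, Type::Concrete { .. }) => {
            if lhs == rhs {
                Ok(lhs.clone())
            } else {
                Err("mismatched concrete types".to_string())
            }
        }
    }
}
```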

This enables correct typing of `jsonb_array_elements` and `json_array_elements_text`. `SETOF` is used as the return type for functions that return multiple rows.

A `SETOF` type is similar to a projection type except for a few minor differences:

- In a projection, columns can be anonymous or aliased, and aliases do not even have to be unique.
- In a `SETOF` with multiple columns, every column must have a name and that name must be unique. The exception is a `SETOF` consisting of a single column, which does not have to alias the column.
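
The naming rules in the list above can be expressed as two small checks. This is only an illustration of the stated rules: `Column`, `is_valid_projection` and `is_valid_setof` are hypothetical names, not `eql-mapper` APIs, and the behaviour for a `SETOF` with zero columns is not specified by the commit message (this sketch happens to accept it).

```rust
use std::collections::HashSet;

/// A column with an optional alias (hypothetical type for this sketch).
pub struct Column {
    pub alias: Option<String>,
}

/// Projections accept anonymous columns and duplicate aliases.
pub fn is_valid_projection(_columns: &[Column]) -> bool {
    true
}

/// A multi-column SETOF needs a unique name on every column;
/// a single-column SETOF may leave its column unaliased.
pub fn is_valid_setof(columns: &[Column]) -> bool {
    if columns.len() == 1 {
        return true;
    }
    let mut seen = HashSet::new();
    columns.iter().all(|col| match &col.alias {
        // `insert` returns false when the name was already present.
        Some(name) => seen.insert(name.as_str()),
        None => false,
    })
}
```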

`Projection` is now defined simply as

```rust
struct Projection(Vec<ProjectionColumn>);
```

This simplifies type unification because the `Projection::Empty` case no longer has to be handled as a special case. Projections are now auto-flattened during conversion to the exported `eql_mapper::Type` representation.
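
As a rough illustration of what flattening could look like with the `Vec`-based definition, the sketch below inlines nested projections into their parent. It assumes that "flattening" means splicing the columns of a nested projection into the outer one; the `Ty` enum, the `ProjectionColumn` fields and the `flatten` method are hypothetical and do not reflect the real `eql-mapper` types. Note that an empty projection needs no special handling: it is just an empty `Vec`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Native,
    Eql,
    Projection(Projection),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ProjectionColumn {
    pub ty: Ty,
    pub alias: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Projection(pub Vec<ProjectionColumn>);

impl Projection {
    /// Returns a projection in which nested projections are spliced in place.
    pub fn flatten(&self) -> Projection {
        let mut columns = Vec::new();
        for col in &self.0 {
            match &col.ty {
                // A nested projection contributes its own (flattened) columns.
                Ty::Projection(inner) => columns.extend(inner.flatten().0),
                _ => columns.push(col.clone()),
            }
        }
        Projection(columns)
    }
}
```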

freshtonic marked this pull request as draft June 13, 2025 06:02

freshtonic requested a review from tobyhede June 13, 2025 06:07

freshtonic force-pushed the feat/json-indexing-eql-v2 branch from 790df49 to f8ab910 Compare June 17, 2025 01:26

tobyhede force-pushed the feat/json-indexing-eql-v2 branch from 0c01019 to ee91aaa Compare June 23, 2025 03:54

freshtonic and others added 26 commits June 26, 2025 15:05

feat: add ability to put type bounds on EQL columns in schema macro

abaa2e7

feat: SQL operator and function definitions that support EQL types

43e86b0

Using the macros defined earlier in this PR, define the SQL binary operators and SQL functions that must support EQL types.

feat: infer function types using declared SQL/EQL functions

0867d69

feat: infer binary operator types using declared SQL/EQL operators

762a96f

chore: various refactorings

1f3a451

- extract "resolve_type" functions into a ResolveType trait
- delete old SQL function and operator macros
- some cosmetic renaming

chore: make schema delta functionality aware of bounds on EQL column types

0d96253

fix: assorted fixups (due to out of sequence rebasing)

f7084a7

fix: add select_jsonb_path_query

c9100d2

Author: Toby Hede <toby@cipherstash.com>

WIP: get proxy to use new types

402a810

WIP: just enough to get the proxy to compile against the new EQL types

63c7682

fix: add all EQLTraits to EQL col

874f3a3

Add test for jsonb_path_query inference

af16330

feat: jsonb_path_query

6e7312b

docs: RustDoc on Type

296456a

fix: SQL function renaming fails when return type is Native

7f8c54b

The rename logic now checks if any argument type OR return type is EQL.
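
The check described here amounts to something like the following. This is a sketch under assumptions: `Ty`, `FunctionSig` and `involves_eql` are hypothetical names, and the real rename logic works on `eql-mapper` types rather than this two-variant enum.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Native,
    Eql,
}

/// A resolved function signature (hypothetical shape).
pub struct FunctionSig {
    pub args: Vec<Ty>,
    pub ret: Ty,
}

/// True when any argument type OR the return type is EQL, so a function
/// with an EQL argument and a native return type is still considered.
pub fn involves_eql(sig: &FunctionSig) -> bool {
    sig.ret == Ty::Eql || sig.args.iter().any(|arg| *arg == Ty::Eql)
}
```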

chore: eql-mapper rustdoc

902efae

more docs

a03a397

refactor: remove Type::Constructor variant

8079f6a

`Type::Value` now represents any SQL expression and an artificial dichotomy has been removed which simplifies a bunch of code.

update to eql-2.0.6

4528d8b

fix: update proxy integration

54cac15

tobyhede and others added 8 commits June 26, 2025 15:16

feat: jsonb_path_query

be957c5

fix(eql-mapper): broken test in eql-mapper-macros

d8dc263

Also added a README.

ref(eql-mapper): remove unused provenance mod

57b70ee

chore: put cipherstash-client in Cargo workspace

8b62144

This means cipherstash-proxy & cipherstash-proxy-integration are now sharing the same dependency version.

ref(eql-mapper): rust doc and trivial refactorings

bbdfe05

chore: clippy

3aa47ea

chore: fmt

77cde08

fix: bad conflict resolutions during rebase

ccdec51

freshtonic force-pushed the feat/json-indexing-eql-v2 branch from 3e661a1 to ccdec51 Compare June 26, 2025 05:33
