Skip to content

feat: JSON indexing for EQL V2 #263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 34 commits into
base: main
Choose a base branch
from
Draft

Conversation

freshtonic
Copy link
Contributor

@freshtonic freshtonic commented Jun 13, 2025

Support additional EQL types for representing JSON indexing terms in the eql-mapper and presentation to the proxy as the result of a successful type check.

The new JSON indexing types required adding associated type support to the type system and Unifier.

For more detail, see the commit messages in this PR.

TODO: proxy glue code (partially implemented but the proxy package currently does not build).

Acknowledgment

By submitting this pull request, I confirm that CipherStash can use, modify, copy, and redistribute this contribution, under the terms of CipherStash's choice.

@freshtonic freshtonic marked this pull request as draft June 13, 2025 06:02
@freshtonic freshtonic requested a review from tobyhede June 13, 2025 06:07
@freshtonic freshtonic force-pushed the feat/json-indexing-eql-v2 branch from 790df49 to f8ab910 Compare June 17, 2025 01:26
@tobyhede tobyhede force-pushed the feat/json-indexing-eql-v2 branch from 0c01019 to ee91aaa Compare June 23, 2025 03:54
freshtonic and others added 26 commits June 26, 2025 15:05
The type system requires associated types to represent EQL terms used
only with particular operators and functions. The kinds of encrypted AST
literals (`Value` nodes) values that can be communicated as a result of
an `eql-mapper` type check will be used by Proxy to inform *how* those
terms are to be encrypted.

This commit also introduces `EqlTrait`. In `eql-mapper` `EqlTrait` is
Rust enum but each variant respresents the name of a notional trait that
EQL types can implement.

These are the currently defined EQL traits: `Eq`, `Ord`, `TokenMatch`,
`JsonLike` & `Contain`.

Associated types belong to an `EqlTrait` but are implemented on types
that implement that trait (just like in Rust).

For example the `EqlTrait::JsonLike` has `Accessor` and `Path`
associated types.

An `Accessor` is the type of the expression the right hand side of `->`
or `->>`.

A `Path` is the type of the second argument to `jsonb_query_path`.

An associated type is fundamentally related to its parent type in that
it shares configuration - but cannot be used in a position where the
parent type is mandated.

Lastly, `Type::Var` now has "type bounds". Type bounds are represented
as an `EqlTraits` value (not the pluralisation) which is just a set of
boolean flags: one per `EqlTrait`.
The current implementation of function type checking in `eql-mapper`
manually instantiates and unifies types for a handful of specific SQL
functions (`MIN`, `MAX`, `COUNT`) and operators (`=`, `<`, `<=`, `>=`,
`>`).

The list of functions and operators is about to grow significantly and
implementing type checking by hand for all of those by manually
instantiating and unifying type variables is going to be tedious and
error prone.

This commit introduces `TypeDecl`, `TypeEnv` and `InstantiatedTypeEnv`.

A `TypeDecl` is a type declaration for the `eql-mapper` type system. A
`TypeDecl` is purely symbollic and cannot be used by the `Unifier`.

What a `TypeDecl` does is achieve a consistent notation for declaring
types and their bounds.

Multiple type declarations can be put into a `TypeEnv` - a `TypeEnv` is
a scope for some related type declarations. Type declarations can
reference other type declarations in the same type env via type
variables `TVar`s.

A `TypeEnv` can be *instantiated*, returning an `InstantiatedTypeEnv`.
In an `InstantiatedTypeEnv` all of the `TypeDecls` have been converted
into `Arc<Type>` values ready to be used by the `Unifier`.

Additionally, the `InstantiatedTypeEnv` allows retrieval of its types
via a `TVar` - it maintains the same associated of `TVar` to type as the
original `TypeEnv`.
Added crate `eql-mapper-macros`.

The crate exposes functions for building `TypeDecl` and `Arc<Type>`
values without boilerplate.

These macros are great for testing - previously, defining complex types
like projections required a lot of code to build the `Type` enums.

Now those type definitions are backed up by a DSL.
An associated type represents "deferred" unification.

This is a bit more involved than unifying other types. The reason is
that the `impl_ty` (parent type) of the associated type can be a type
variable bounded by the appropriate `EqlTrait` that declares the
associated type.

At the time the `Type::Associated(AssociateType)` is instantiated by the
`TypeInferencer` we cannot guarantee that the parent type has been
resolved - so we can resolve the conrete associated type.

So you can think of the associated type as an "obligation" (that's the
actual term used in type theory).

Whenever unification tries to unify a type with an associated type
obligation, it first checks if the parent type of the associated type
has been resolved and then resolves the associated type.

If the parent type has not been resolved, then the *unresolved*
associated type is unified with the other type as a new
`Type::Associated`, deferring resolution until later.

`Type::Var` now also supports bounds (as an `EqlTraits` struct). Two
type variables always unify: even they have different bounds. The
result is a new type variable with merged bounds.

Whenever a concrete type is unified with a type variable, the effective
bounds of the concrete type MUST contain all of the bounds of
the type variable (concrete types always have *implied* effective
bounds). Thr concrete type is allowed to define additional bounds in
excess of those required by the type variable.
Using the macros defined earlier in this PR, define the SQL binary
operators and SQL functions that must support EQL types.
- extract "resolve_type" functions into into a ResolveType trait
- delete old SQL function and operator macros
- some cosmetic renaming
Author: Toby Hede <toby@cipherstash.com>
This enables correct typing of `jsonb_array_elements` and
`json_array_elements_text`.

`SETOF` is used as the return type for functions that return multiple
rows. A `SETOF` type is similar to a projection type except for a few
minor differences:

- In a projection, columns can be anonymous or aliased and aliases do
  not even have to be unique.
- In a SETOF with multiple columns all columns must have a name and
  that name must be unique (with the exception a SETOF consisting of
  a single column does not have to alias the column).
The rename logic now checks if any argument type OR return type is EQL.
`Type::Value` now represents any SQL expression and an artificial
dichotomy has been removed which simplifies a bunch of code.
`Projection` is now defined simply as

```rust
struct Projection(Vec<ProjectionColumn>);
```

This simplifies type unification because no longer does the
`Projection::Empty` case have to be handled as a special case.

The projections are now auto-flattened during conversion to the exported
`eql_mapper::Type` representation.
@freshtonic freshtonic force-pushed the feat/json-indexing-eql-v2 branch from 3e661a1 to ccdec51 Compare June 26, 2025 05:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants