Skip to content

Map EQL columns in ORDER BY clause #122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 0 commits into from
Feb 27, 2025
Merged

Map EQL columns in ORDER BY clause #122

merged 0 commits into from
Feb 27, 2025

Conversation

CDThomas
Copy link
Contributor

@CDThomas CDThomas commented Feb 24, 2025

This change is the Proxy side of supporting ORDER BY on EQL columns. The approach taken here is to wrap EQL columns in ORDER BY with the EQL function for ORE indexes.

For example, the statement:

SELECT id FROM encrypted ORDER BY encrypted_text;

Maps to:

SELECT id FROM encrypted ORDER BY cs_ore_64_8_v1(encrypted_text);

See the docs for new code and inline PR comments for more details on the implementation.

Related changes in the EQL repo:

@CDThomas
Copy link
Contributor Author

CDThomas commented Feb 25, 2025

CI is failing on the new test, but should pass once cipherstash/encrypt-query-language#90 lands. Passes for me locally with EQL built from the linked PR branch.

Edit: the EQL changes have been released and CI is passing now.

convert_control_flow(self.inferencer.borrow_mut().enter(node))?;

ControlFlow::Continue(())
}

fn exit<N: Visitable>(&mut self, node: &'ast N) -> ControlFlow<Break<Self::Error>> {
convert_control_flow(self.inferencer.borrow_mut().exit(node))?;
convert_control_flow(self.eql_function_tracker.borrow_mut().exit(node))?;
Copy link
Contributor Author

@CDThomas CDThomas Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want exit to be called for the function tracker after the inferencer. The function tracker relies on the inferencer to determine which nodes contain EQL values.

Note that all types aren't necessarily resolved at this point (specifically for literals), but that's currently OK because we're interested in types for identifiers (which are resolved by this point).

#[derive(thiserror::Error, PartialEq, Eq, Debug)]
pub enum EqlFunctionTrackerError {}

impl<'ast> Visitor<'ast> for EqlFunctionTracker<'ast> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is a new visitor that we use in the type checker for determining which nodes need to be wrapped in EQL function calls (currently only EQL columns used in ORDER BY).

//
// For example (assuming that `encrypted_col` is an identifier for an EQL column) transform
// `encrypted_col` to `cs_ore_64_8_v1(encrypted_col)`.
ast::Expr::Identifier(_) | ast::Expr::CompoundIdentifier(_) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic in the transformer that's responsible for actually wrapping nodes in EQL function calls (based on output of the new visitor in the type checker).

@CDThomas CDThomas changed the title Wrap EQL columns in ORDER BY in EQL ORE function Map EQL columns in ORDER BY clause Feb 27, 2025
@CDThomas CDThomas marked this pull request as ready for review February 27, 2025 01:04
Copy link
Contributor

@auxesis auxesis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the expertise to review the code, but I do have a usability question:

What documentation about this functionality should we include in Proxy's README?

Copy link
Contributor

@tobyhede tobyhede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work. Super clear PR, A+ would review again.
Bike shedding level commentary should not hold up merge.

/// Takes `eql_function_tracker`, consumes it, and returns a `HashSet` of keys for nodes
/// that the type checker has marked for wrapping with EQL function calls.
fn nodes_to_wrap(&self) -> HashSet<NodeKey<'ast>> {
self.eql_function_tracker.take().into_to_wrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor/nit into_to_wrap is not quite the same semantic as regular into
Name may not quite be right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clippy will warn here if I remove the into_ prefix on into_to_wrap since the method consumes self.

Agreed that the naming feels a bit off though (even if into_ is the correct prefix). Maybe it would be more clear if I rename the method to into_nodes_to_wrap and return a NodesToWrap struct instead of a plain HashSet? I think that would more clearly show that the method consumes self and returns a different data structure.

A couple of examples of the into_ convention for methods that consume self in the std lib are into_iter (on various types) and Vec::into_flattened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to merge and consider refactoring to add in the NodesToWrap struct in a follow-up PR.

@CDThomas
Copy link
Contributor Author

I don't have the expertise to review the code, but I do have a usability question:

What documentation about this functionality should we include in Proxy's README?

@auxesis I think that Proxy is generally missing docs on index types and the queries that they support at the moment. I think that ORDER BY should be mentioned as a feature that ORE indexes support (and we should include some example configuration for an ORE index and a query that uses ORDER BY on an column with an ORE index).

I'd like to handle that in a separate PR so that I can keep this PR focused on internal changes for ORDER BY support.

@CDThomas CDThomas merged this pull request into main Feb 27, 2025
1 check passed
@CDThomas CDThomas deleted the feat/map-order-by branch February 27, 2025 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants