ES|QL: add local optimizations for constant_keyword #127549

Merged

Conversation

luigidellaquila
Contributor

Adding local planning optimizations for constant field values (e.g. constant_keyword).

The rule tries to get the field's value at planning time and replaces the field with a literal, which avoids field extraction and can trigger further optimizations.
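
As a rough illustration of the idea (not the code in this PR), the sketch below substitutes a literal for a field whose value is known at planning time and then folds the resulting comparison. The `Expr`, `Field`, `Literal`, and `Equals` types and both methods are hypothetical stand-ins, not ES|QL classes.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified expression tree; these are NOT the ES|QL classes.
class ConstantFieldSketch {
    interface Expr {}
    record Field(String name) implements Expr {}
    record Literal(Object value) implements Expr {}
    record Equals(Expr left, Expr right) implements Expr {}

    // Replace every field that has a known constant value with a literal.
    static Expr substitute(Expr e, Map<String, String> constants) {
        if (e instanceof Field f && constants.containsKey(f.name())) {
            return new Literal(constants.get(f.name()));
        }
        if (e instanceof Equals eq) {
            return new Equals(substitute(eq.left(), constants), substitute(eq.right(), constants));
        }
        return e;
    }

    // Fold a comparison between two literals into a boolean literal.
    static Expr fold(Expr e) {
        if (e instanceof Equals eq) {
            Expr l = fold(eq.left());
            Expr r = fold(eq.right());
            if (l instanceof Literal a && r instanceof Literal b) {
                return new Literal(Objects.equals(a.value(), b.value()));
            }
            return new Equals(l, r);
        }
        return e;
    }

    public static void main(String[] args) {
        // WHERE f1 == "abc", on an index where f1 is known to be the constant "abc"
        Expr filter = new Equals(new Field("f1"), new Literal("abc"));
        System.out.println(fold(substitute(filter, Map.of("f1", "abc"))));
    }
}
```

Once the field reference is gone, the filter no longer needs the field to be extracted, and a generic folding step can simplify it further.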

@luigidellaquila luigidellaquila added >non-issue auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 labels Apr 30, 2025
@luigidellaquila
Contributor Author

After discussing it with @alex-spies, we decided to simplify it a bit (at the cost of some minor optimizations) and diverge a bit from what we do in ReplaceMissingFieldWithNull.
In particular, we now avoid reusing NameIDs, which is potentially dangerous in the long term.

@luigidellaquila luigidellaquila marked this pull request as ready for review May 2, 2025 06:59
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 2, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan
Contributor

astefan commented May 2, 2025

With the following test data, things do not seem to work as they do on main:

test1

{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "f1": {
        "type": "keyword"
      }
    }
  }
}

test2

{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "f1": {
        "type": "long"
      }
    }
  }
}

test3

{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "f1": {
        "type": "constant_keyword"
      }
    }
  }
}
bulk data

{"index":{"_index":"test1","_id":1}}
{"f1":"foo"}
{"index":{"_index":"test2","_id":1}}
{"f1":123}
{"index":{"_index":"test3","_id":1}}
{"bar":"baz"}
{"index":{"_index":"test3","_id":2}}
{"f1":"abc"}
{"index":{"_index":"test3","_id":3}}
{"f1":"foo"}

query

FROM test*,-test2 METADATA _index, _id | where f1 == "abc"

Results in

      bar      |  bar.keyword  |      f1       |    _index     |      _id      
---------------+---------------+---------------+---------------+---------------
null           |null           |foo            |test1          |1              
baz            |baz            |abc            |test3          |1              
null           |null           |abc            |test3          |2

@astefan astefan self-requested a review May 2, 2025 10:13
@elasticsearchmachine
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.

@luigidellaquila
Contributor Author

My bad, a missing else.
Before moving forward I'll add more tests, especially with mixed types (e.g. text and constant_keyword).

Member

@nik9000 nik9000 left a comment

I'm good with it. It'd be nice if there were an easier way to get the constant value, but this'll do. If everyone else is good, I'm good.

Object objVal = vals.size() == 1 ? vals.get(0) : null;
// we are considering only string values for now, since this can return "strange" things,
// see IndexModeFieldType
thisVal = objVal instanceof String ? (String) objVal : null;
Member

I believe we tend to use BytesRefs encoded as utf-8.
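
A minimal sketch of what that could look like, assuming the constant arrives as a list of candidate values as in the excerpt above. The class and method names are hypothetical, and a plain `byte[]` stands in for Lucene's `BytesRef` to keep the example dependency-free.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper; not part of the PR.
class ConstantValueSketch {
    // Returns the value only when there is exactly one candidate and it is a String,
    // mirroring the "only string values for now" restriction in the excerpt.
    static String singleString(List<?> vals) {
        Object v = vals.size() == 1 ? vals.get(0) : null;
        return v instanceof String s ? s : null;
    }

    // Encodes the string as UTF-8 bytes (the byte[] is a stand-in for a BytesRef).
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String constant = singleString(List.of("abc"));
        System.out.println(constant + " -> " + toUtf8(constant).length + " bytes");
    }
}
```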

@astefan
Contributor

astefan commented May 7, 2025

My bad, a missing else. Before moving forward I'll add more tests, especially with mixed types (e.g. text and constant_keyword)

No worries.
Next time, the reviewers' job would be easier and faster if the PR were ready for review with the tests you mention above already included. It is much better for both parties if reviewers do not need to double-check the validity of the solution with additional tests of their own. Thank you :-).


private LogicalPlan replaceAttributes(LogicalPlan plan, Map<Attribute, Expression> attrToValue) {
// This is slightly different from ReplaceMissingFieldWithNull.
// It's on purpose: reusing NameIDs is dangerous, and we have no evidence that adding an EVAL will actually lead to
Contributor

Please provide a more explanatory comment. I am looking for an explanation of why "reusing NameIDs is dangerous".

Contributor

Also, having an EVAL do the value "replacement" is not about performance, but about making use of other mechanisms that already exist in the optimizer to naturally "move"/"flow" the EVAL through the node tree (like constant folding, moving literals to the right-hand side of boolean expressions, etc.).

Contributor

@astefan astefan left a comment

Thinking about this some more... this rule and ReplaceMissingFieldWithNull are, in essence, doing the same thing.

The only essential difference between them is
localLogicalOptimizerContext.searchStats().exists(f.fieldName())

and

localLogicalOptimizerContext.searchStats().constantValue(attribute.name())

Meaning, the "constant" is either null or is a value coming from a constant_keyword field. Try to see if you can abstract away this logic and either:

  • have two rules that use common code and the only thing different is the searchStats() check
  • have only one rule that does the null and constant check at the same time.
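
The second option could be sketched roughly as follows. The `exists` and `constantValue` method names come from the two calls quoted above; the `Stats` interface, their return types, the `Replacement` record, and `decide` are assumptions made for illustration and are not the actual optimizer API.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a single check covering both the "missing" and "constant" cases.
class UnifiedRuleSketch {
    // Stand-in for the search stats object; return types are assumed.
    interface Stats {
        boolean exists(String field);
        String constantValue(String field); // assumed to return null when the field is not constant
    }

    // replace == true means "swap the field for a literal holding `value`" (value may be null).
    record Replacement(boolean replace, String value) {}

    static Replacement decide(String field, Stats stats) {
        if (!stats.exists(field)) {
            return new Replacement(true, null);      // missing field: replace with a null literal
        }
        String constant = stats.constantValue(field);
        if (constant != null) {
            return new Replacement(true, constant);  // constant field: replace with its value
        }
        return new Replacement(false, null);         // regular field: keep it as is
    }

    // Small in-memory Stats implementation, for demonstration only.
    static Stats statsOf(Set<String> existing, Map<String, String> constants) {
        return new Stats() {
            public boolean exists(String field) { return existing.contains(field); }
            public String constantValue(String field) { return constants.get(field); }
        };
    }

    public static void main(String[] args) {
        Stats stats = statsOf(Set.of("f1", "f2"), Map.of("f1", "abc"));
        for (String field : new String[] { "f1", "f2", "gone" }) {
            System.out.println(field + ": " + decide(field, stats));
        }
    }
}
```

A single pass like this visits each field once, at the cost of mixing two concerns in one method.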

@luigidellaquila
Contributor Author

Thanks for the feedback @astefan, I unified the two rules.
TBH I was a bit dubious, mostly for two reasons: one is that all the comments in the old code seem to discourage using that pattern and suggest possible future refactorings; the other is that the old rule had negative logic (check whether a field needs to be retained, otherwise replace it with null), while the new one had positive logic (check whether a field is constant, and if so replace it).
Anyway, having a single rule will reduce the cost of optimization (we traverse the tree only once), and future refactorings will involve only a single class, so I'm good with this.
Please have a look when you have a chance.

Contributor

@astefan astefan left a comment

I am OK with the idea behind the changes, but I believe the resulting code is not so nice.
For example, the transformExpressionsOnlyUp here does not clearly show the two (distinct) use cases, i.e. what shouldBeRetained has to do with constants (the conditional links them together, but one has nothing to do with the other). I've tried being more explicit by splitting the conditional and adding comments, but I still don't feel good about it. The changes do show that the logic is split and that things were just merged; the code is not natural to follow.

@astefan astefan self-requested a review May 19, 2025 15:34
@luigidellaquila
Contributor Author

luigidellaquila commented May 22, 2025

Thanks @astefan, I included your suggestions.
I agree with you that the code is not very natural and that it practically does two things at once; I like the idea of having a single rule just because it's more efficient.
IMHO the real problem is this part; the rule was already doing two things at once, i.e. replacing fields with constants in EVALs and WHEREs, and adding a new EVAL + PROJECT that masks fields using the same NameIDs.
A way to make things simpler (and potentially safer) is to just remove the latter, but that is something we could do in the future, if needed.

Contributor

@astefan astefan left a comment

LGTM

@luigidellaquila luigidellaquila merged commit 445c3ea into elastic:main May 22, 2025
17 of 18 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127549

Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
4 participants