ES|QL: add local optimizations for constant_keyword #127549

Open
wants to merge 9 commits into
base: main
from

Conversation

luigidellaquila
Contributor

Adding local planning optimizations for constant field values (e.g. constant_keyword).

The rule tries to get the field's value at planning time and replaces the field with a literal, which avoids field extraction and allows further optimizations to be triggered.
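
A minimal sketch of the idea described above, not the PR's actual code. The expression model (`Expr`, `FieldRef`, `Literal`, `Equals`) and the class `ConstantFieldSketch` are hypothetical stand-ins for illustration; they are not Elasticsearch classes. It shows a field with a known constant value being swapped for a literal, after which a comparison between two literals can be folded.

```java
import java.util.Map;

class ConstantFieldSketch {
    // Replace every field reference that has a known constant value with a literal.
    static Expr replaceConstants(Expr e, Map<String, String> constants) {
        if (e instanceof FieldRef f && constants.containsKey(f.name())) {
            return new Literal(constants.get(f.name()));
        }
        if (e instanceof Equals eq) {
            return new Equals(replaceConstants(eq.left(), constants), replaceConstants(eq.right(), constants));
        }
        return e;
    }

    // Fold a comparison between two literals into a boolean literal.
    static Expr fold(Expr e) {
        if (e instanceof Equals eq && eq.left() instanceof Literal l && eq.right() instanceof Literal r) {
            return new Literal(l.value().equals(r.value()));
        }
        return e;
    }

    public static void main(String[] args) {
        // Models the filter: WHERE f1 == "abc", where f1 is constant "abc" on this shard.
        Expr filter = new Equals(new FieldRef("f1"), new Literal("abc"));
        Expr replaced = replaceConstants(filter, Map.of("f1", "abc"));
        System.out.println(fold(replaced));
    }
}

// Hypothetical, simplified expression model (assumption, for illustration only).
interface Expr {}
record FieldRef(String name) implements Expr {}
record Literal(Object value) implements Expr {}
record Equals(Expr left, Expr right) implements Expr {}
```

Once the filter folds to a boolean literal, no value of f1 has to be extracted from the index for that comparison.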

@luigidellaquila luigidellaquila added >non-issue auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 labels Apr 30, 2025
@luigidellaquila
Contributor Author

After discussing it with @alex-spies, we decided to simplify it a bit (at the cost of some minor optimizations) and diverge a bit from what we do in ReplaceMissingFieldWithNull.
In particular, we now avoid reusing NameIDs, which is potentially dangerous in the long term.

@luigidellaquila luigidellaquila marked this pull request as ready for review May 2, 2025 06:59
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 2, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan
Contributor

astefan commented May 2, 2025

With the following test data, things do not seem to work as they do on main:

test1

{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "f1": {
        "type": "keyword"
      }
    }
  }
}

test2

{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "f1": {
        "type": "long"
      }
    }
  }
}

test3

{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "f1": {
        "type": "constant_keyword"
      }
    }
  }
}

Bulk request:

{"index":{"_index":"test1","_id":1}}
{"f1":"foo"}
{"index":{"_index":"test2","_id":1}}
{"f1":123}
{"index":{"_index":"test3","_id":1}}
{"bar":"baz"}
{"index":{"_index":"test3","_id":2}}
{"f1":"abc"}
{"index":{"_index":"test3","_id":3}}
{"f1":"foo"}

Query:

FROM test*,-test2 METADATA _index, _id | where f1 == "abc"

Results in

      bar      |  bar.keyword  |      f1       |    _index     |      _id      
---------------+---------------+---------------+---------------+---------------
null           |null           |foo            |test1          |1              
baz            |baz            |abc            |test3          |1              
null           |null           |abc            |test3          |2

@astefan astefan self-requested a review May 2, 2025 10:13
@elasticsearchmachine
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.

@luigidellaquila
Contributor Author

My bad, a missing else.
Before moving forward I'll add more tests, especially with mixed types (e.g. text and constant_keyword).

…ptimizations' into esql/constant_keyword_optimizations
Member

@nik9000 nik9000 left a comment

I'm good with it. It'd be nice if there were an easier way to get the constant value, but this'll do. If everyone else is good, I'm good.

Object objVal = vals.size() == 1 ? vals.get(0) : null;
// we are considering only string values for now, since this can return "strange" things,
// see IndexModeFieldType
thisVal = objVal instanceof String ? (String) objVal : null;
Member

I believe we tend to use BytesRefs encoded as utf-8.
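
A hedged sketch of what accepting both representations could look like, not the PR's code. The class `ConstantValueSketch` and the method `singleConstant` are hypothetical, and a plain `byte[]` stands in for Lucene's `BytesRef` so that the example runs with the standard library only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

class ConstantValueSketch {
    // Returns the single constant value as a String, or null when there is
    // not exactly one value or the value has an unsupported type.
    static String singleConstant(List<?> vals) {
        if (vals.size() != 1) {
            return null;
        }
        Object v = vals.get(0);
        if (v instanceof String s) {
            return s;
        }
        // Assumption: byte[] is a stand-in for a UTF-8 encoded BytesRef.
        if (v instanceof byte[] utf8) {
            return new String(utf8, StandardCharsets.UTF_8);
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] encoded = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(singleConstant(List.of("abc")));
        System.out.println(singleConstant(Collections.singletonList(encoded)));
        System.out.println(singleConstant(List.of("a", "b")));
    }
}
```

Any other value type still yields null, which matches the conservative "strings only for now" intent of the snippet under review.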

@astefan
Contributor

astefan commented May 7, 2025

My bad, a missing else. Before moving forward I'll add more tests, especially with mixed types (e.g. text and constant_keyword).

No worries.
Next time, the reviewers' job would be easier and quicker if the PR were ready for review with the tests you mention above already included. It is much better for both parties if reviewers do not need to double-check the validity of the solution with additional tests of their own. Thank you :-).


private LogicalPlan replaceAttributes(LogicalPlan plan, Map<Attribute, Expression> attrToValue) {
// This is slightly different from ReplaceMissingFieldWithNull.
// It's on purpose: reusing NameIDs is dangerous, and we have no evidence that adding an EVAL will actually lead to
Contributor

Please provide a more explanatory comment. I am looking for an explanation of why "reusing NameIDs is dangerous".

Contributor

Also, having an EVAL do the value "replacement" is not about performance. It is about making use of other mechanisms that already exist in the optimizer to naturally "move"/"flow" the EVAL through the node tree (like constant folding, moving literals to the right-hand side of boolean expressions, etc.).

Contributor

@astefan astefan left a comment

Thinking about this some more... this rule and ReplaceMissingFieldWithNull are, in essence, doing the same thing.

The only essential difference between them is the check they perform:

localLogicalOptimizerContext.searchStats().exists(f.fieldName())

and

localLogicalOptimizerContext.searchStats().constantValue(attribute.name())

Meaning, the "constant" is either null or a value coming from a constant_keyword field. Try to see if you can abstract away this logic and either:

  • have two rules that share common code and differ only in the searchStats() check, or
  • have only one rule that does the null and constant checks at the same time.
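
One possible shape for the two options above, as a rough sketch under stated assumptions rather than a proposal for the actual classes. `Stats` is a hypothetical stand-in for `searchStats()`, and `Replacement`, `missingFieldToNull`, `constantToLiteral` and `combined` are illustrative names that do not exist in Elasticsearch.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

class SharedReplacementSketch {
    // Hypothetical stand-in for the statistics available at local planning time.
    interface Stats {
        boolean exists(String field);
        String constantValue(String field); // null when the field is not constant
    }

    // The literal to substitute for a field; the literal itself may be null.
    record Replacement(Object literal) {}

    // Option 1, first rule: a field that does not exist becomes a null literal.
    static Function<String, Optional<Replacement>> missingFieldToNull(Stats stats) {
        return field -> stats.exists(field) ? Optional.empty() : Optional.of(new Replacement(null));
    }

    // Option 1, second rule: a field with a constant value becomes that literal.
    static Function<String, Optional<Replacement>> constantToLiteral(Stats stats) {
        return field -> Optional.ofNullable(stats.constantValue(field)).map(Replacement::new);
    }

    // Option 2: a single rule that does the null check first, then the constant check.
    static Function<String, Optional<Replacement>> combined(Stats stats) {
        return field -> {
            Optional<Replacement> missing = missingFieldToNull(stats).apply(field);
            return missing.isPresent() ? missing : constantToLiteral(stats).apply(field);
        };
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("f1", "f2");
        Map<String, String> constants = Map.of("f1", "abc");
        Stats stats = new Stats() {
            public boolean exists(String field) { return existing.contains(field); }
            public String constantValue(String field) { return constants.get(field); }
        };
        Function<String, Optional<Replacement>> rule = combined(stats);
        for (String field : List.of("f1", "f2", "gone")) {
            System.out.println(field + " -> " + rule.apply(field));
        }
    }
}
```

In this sketch the code that rewrites the plan would be shared, and only the function that derives a replacement from the stats differs between the rules.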

Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
4 participants