Skip to content

ESQL: Allow pruning columns added by InlineJoin #131204

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Jul 14, 2025

This updates PruneColumns rule to allow optimizing away the columns added by InlineJoin, dropping unnecessary agg'ing.

bpintea added 2 commits July 14, 2025 14:38
This updates PruneColumns rule to allow optimizing away the columns
added by InlineJoin, dropping unnecessary agg'ing.
@elasticsearchmachine elasticsearchmachine added v9.2.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jul 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's looking good, but the refactoring can be made a bit more digestible with more comments and smarter method names.
I didn't check the correctness of the optimizer rule unit tests, only looked at the queries they are using.

I have noticed that, after chasing another bug I discovered, that it's fixing another behavior as well.

In #124715, it was described as FROM kibana_sample_data_logs | EVAL timestamp=DATE_TRUNC(5 minute, @timestamp) | INLINESTATS results = count(*) by timestamp | keep results, timestamp. For the dataset used in csv-spec files:

FROM employees | WHERE still_hired == false | EVAL sal = salary/1000 | INLINESTATS total = SUM(sal), count=COUNT(*) BY gender | keep emp_no, still_hired, total, count | SORT emp_no

FROM employees | EVAL sal = salary/1000 | INLINESTATS total = SUM(sal), count=COUNT(*) BY gender | keep emp_no, still_hired, total, count | WHERE still_hired == false

FROM employees | EVAL sal = salary/1000 | INLINESTATS total = SUM(sal) BY gender | keep emp_no, still_hired, total

row salary = 12300, emp_no = 5, gender = "F" | eval sal = salary/1000 | inlinestats sum(sal) by gender | keep emp_no

^ here the presence of that evaluation before inlinestats is at least one of the triggers. We do have some queries with eval before inlinestats and with keep after it which succeed. Those above do not. I recommend adding some flavors of these to the csv-spec file.

// track inlinestats' own aggregation output (right-hand side of the join) so that any other plan on the left-hand side of the
// inline join won't have its columns pruned due to the lack of "visibility" into the right hand side output/Attributes
var inlineJoinRightOutput = new ArrayList<Attribute>();
return pruneColumns(plan, plan.outputSet().asBuilder(), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a harder to follow code, but with some comments it can be made easier to digest.
For one, I would two methods; one LogicalPlan pruneColumns(LogicalPlan plan, AttributeSet.Builder used) and use it here. This first method would call pruneColumns(plan, plan.outputSet().asBuilder(), false).
The second method would be pruneInlineJoinColumns(LogicalPlan plan, AttributeSet.Builder used) and call pruneColumns(plan, plan.outputSet().asBuilder(), true).

p = aggregate.with(aggregate.groupings(), remaining);
}
} else {
if (inlineJoin && aggregate.groupings().containsAll(remaining)) { // not expecting high groups cardinality
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment tbh // not expecting high groups cardinality


var remaining = pruneUnusedAndAddReferences(aggregate.aggregates(), used);

if (remaining != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you checked here if (remaining == null) return p;? to have a smaller if branch?

if (inlineJoin && aggregate.groupings().containsAll(remaining)) { // not expecting high groups cardinality
// It's an INLINEJOIN and all remaining attributes are groupings, which are already part of the IJ output (from the
// left-hand side).
if (aggregate.child() instanceof StubRelation stub) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this condition will need an update when we start supporting filters on aggs... if that guess is valid, I think a TODO is in order here. If you will add that TODO, for some (all, most I hope) I added // TODO Inlinestats:.... to find them quicker later.

return p;
}

private static LogicalPlan pruneColumns(Project project, AttributeSet.Builder used) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the meat of the PR, right?

p = remaining.isEmpty() || remaining.stream().allMatch(FieldAttribute.class::isInstance)
? emptyLocalRelation(project)
: new Project(project.source(), project.child(), remaining);
} else if (project.output().stream().allMatch(FieldAttribute.class::isInstance)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we interested in this aspect? (the fact that the projection is made of fields only)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants