Skip to content

ES|QL: Support STATS after FORK #128745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ioanatia
Copy link
Contributor

@ioanatia ioanatia commented Jun 2, 2025

related: #121950

It was reported that using STATS immediately after FORK results in an error.

Example:

FROM employees | FORK (EVAL x = 1) ( EVAL y = 1) | stats max(salary)

ends up with

"stack_trace": "org.elasticsearch.ElasticsearchException$1: Cannot invoke \"org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()\" because the return value of \"org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.esql.core.expression.NameId)\" is null

The problem has to do with how we set the aggregatorMode in AbstractPhysicalOperationProviders.javagroupingPhysicalOperation. We just assume that if the AggregateExec follows a ExchangeSourceExec, we should set the aggregatorMode to INTERMEDIARY because we are on the data node reduce step.
With the introduction of FORK and #126389 this is no longer always true, so we explicitly check if the ExchangeSourceExec is indeed between aggregations.

@ioanatia ioanatia added >non-issue :Analytics/ES|QL AKA ESQL :Search Relevance/Search Catch all for Search Relevance Team:Search - Relevance The Search organization Search Relevance team labels Jun 2, 2025
@ioanatia ioanatia marked this pull request as ready for review June 2, 2025 13:21
Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@astefan astefan added >bug and removed >non-issue labels Jun 2, 2025
@astefan astefan requested review from fang-xing-esql and removed request for astefan June 2, 2025 15:04
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Jun 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @ioanatia, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine removed the Team:Search - Relevance The Search organization Search Relevance team label Jun 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@ioanatia
Copy link
Contributor Author

ioanatia commented Jun 2, 2025

I am putting back the non-issue label, because we don't want a release note for this particular PR.
FORK is still in snapshot and release notes are used in user facing docs.
I don't think we need to document that we fixed a bug for a feature that was not released yet.

if (aggregatorMode == AggregatorMode.INITIAL && aggregateExec.child() instanceof ExchangeSourceExec) {
if (aggregatorMode == AggregatorMode.INITIAL
&& aggregateExec.child() instanceof ExchangeSourceExec exchangeSourceExec
&& exchangeSourceExec.isIntermediateAgg()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks fine to me, but I'd like to get another pair of eyes from @alex-spies to double check if the aggregatorMode is set properly, as this was introduced by Alex originally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason why we added this check is because a query like:

FROM test
| FORK
    ( WHERE content:"fox" ) // sub plan 1
    ( WHERE content:"dog" ) // sub plan 2
| SORT _fork
| KEEP _fork, id, content

will now be split into multiple sub plans using ExchangeSourceExec and ExchangeSinkExec:

fork_planning

and so we need to explicitly check with what type of ExchangeSourceExec we are dealing with.

@ioanatia ioanatia mentioned this pull request Jun 2, 2025
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue :Search Relevance/Search Catch all for Search Relevance Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants