ES|QL: Make Fork n-ary #126074

Merged · 2 commits merged into elastic:main · Apr 3, 2025
Conversation

@ioanatia ioanatia (Contributor) commented Apr 1, 2025

One piece of feedback we got from #121948 was that Fork is not an n-ary plan, which further complicates the implementation with special treatment of Fork in the analyzer/optimizer/verifier.

We change the planning of Fork by making it n-ary, while also solving some of the outstanding issues (a sketch of the n-ary plan shape follows the list below):

  • we fix the pre-analysis step where we gather the used field names; the following special handling of FORK is not needed anymore:

if (parsed.anyMatch(plan -> plan instanceof Fork)) {
    return result.withFieldNames(IndexResolver.ALL_FIELDS);
}

  • for semantic search, we require rewriting the query builder for the match function on the coordinator. Initially we added special handling of Fork in QueryBuilderResolver; this is now removed since it is no longer necessary. The functionality is covered by CSV tests and continues to work.

  • we remove most of the special handling of Fork in the analyzer; what remains is AddImplicitForkLimit, which is needed to add the default limit to each sub-plan.

  • we remove the special handling of Fork in the verifier; this will help when we want to support more than just where/limit/sort in the Fork branches. Without this change we would need to add more special handling of Fork to do proper validation.
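
For illustration, here is a minimal sketch of the structural difference between a unary plan node and an n-ary one. The types below are simplified stand-ins, not the actual ES|QL plan classes:

import java.util.List;

// Simplified stand-in for a logical plan node (illustration only).
interface LogicalPlan {
    List<LogicalPlan> children();
}

// A unary node wraps exactly one child, e.g. a limit over its input.
record Limit(int limit, LogicalPlan child) implements LogicalPlan {
    @Override
    public List<LogicalPlan> children() {
        return List.of(child);
    }
}

// An n-ary Fork exposes all of its branches as direct children, so the
// generic tree walks used by the analyzer/optimizer/verifier visit every
// branch without Fork-specific special casing.
record Fork(List<LogicalPlan> branches) implements LogicalPlan {
    @Override
    public List<LogicalPlan> children() {
        return branches;
    }
}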

@ioanatia ioanatia added >non-issue :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0 labels Apr 1, 2025
    var mappedChild = new FragmentExec(child);
    physicalChildren.add(mappedChild);
}
-return new MergeExec(merge.source(), physicalChildren, merge.output());
+return new MergeExec(fork.source(), physicalChildren, fork.output());
ioanatia (Contributor, Author):

I have kept the name MergeExec because, from an execution standpoint, we are merging results from multiple sub-plans.
When we add support for multiple forks (fork after fork), we would likely need a ForkExec too.

Contributor:

That makes sense to me. ++

List<Enrich> unresolvedEnriches = new ArrayList<>();
List<TableInfo> lookupIndices = new ArrayList<>();

plan.forEachUp(UnresolvedRelation.class, p -> {
List<TableInfo> list = p.indexMode() == IndexMode.LOOKUP ? lookupIndices : indices;
list.add(new TableInfo(p.indexPattern()));
Collection<TableInfo> collection = p.indexMode() == IndexMode.LOOKUP ? lookupIndices : indices;
ioanatia (Contributor, Author):

I needed to make this change, and the ones in TableInfo, because otherwise queries using FORK would fail with the following error:

// TODO we plan to support joins in the future when possible, but for now we'll just fail early if we see one
if (indices.size() > 1) {
    // Note: JOINs are not supported but we detect them when
    listener.onFailure(new MappingException("Queries with multiple indices are not supported"));
} else if (indices.size() == 1) {

Note that this only ensures that we collect distinct values for indices, not lookupIndices, meaning there should be no effect on the lookup join functionality.
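
As a toy illustration of why the distinctness matters (the record below is a hypothetical stand-in, not the real TableInfo): each FORK branch contributes a relation over the same index pattern, so a plain List ends up with one entry per branch and trips the indices.size() > 1 check above, while a Set keeps a single distinct entry.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistinctIndicesSketch {
    // Hypothetical stand-in for the index information gathered per relation.
    record TableInfo(String indexPattern) {}

    public static void main(String[] args) {
        // Two FORK branches, both reading the same index.
        List<TableInfo> perBranch = List.of(
            new TableInfo("sample_data"),  // fork branch 1
            new TableInfo("sample_data")   // fork branch 2
        );

        Set<TableInfo> indices = new LinkedHashSet<>(perBranch);
        System.out.println(indices.size()); // 1, so the single-index path is taken
    }
}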

Contributor:

This makes sense to me. I don't remember exactly, but I ran into this in the past and then didn't need it for a separate reason.

@ioanatia ioanatia marked this pull request as ready for review April 1, 2025 21:07
@ioanatia ioanatia requested a review from ChrisHegarty April 1, 2025 21:07
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Apr 1, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin costin requested review from bpintea and removed request for costin April 2, 2025 14:49
@ChrisHegarty ChrisHegarty (Contributor) left a comment

This is a good change and addresses a number of follow-up tasks that arose during the initial FORK implementation. LGTM

@@ -1458,23 +1418,6 @@ private static Expression castStringLiteral(Expression from, DataType target) {
}
}

private static class ImplicitForkCasting extends ParameterizedRule<LogicalPlan, LogicalPlan, AnalyzerContext> {
Contributor:

Thanks for removing these XXForkXX rules, I was hoping that we could get to this point when I added them originally.

@bpintea bpintea (Contributor) left a comment

LGTM

@ioanatia ioanatia merged commit bd0a75d into elastic:main Apr 3, 2025
17 checks passed
@ioanatia ioanatia deleted the fork_n_ary_alt branch April 3, 2025 15:09
@fang-xing-esql fang-xing-esql (Member) commented Apr 3, 2025

Apologies for the late comment. Since fork is still under snapshot, it is fine to address some potential issues as follow-ups. We will need to be aware of the limitations of full text functions when generating the children of forks; here is an example:

curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' -d '
{
  "query": "from sample_data | limit 1 | fork (where message:\"Disconnected\") (where message:\"Connected\")"
}
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "verification_exception",
        "reason" : "Found 2 problems\nline 1:36: [:] operator cannot be used after LIMIT\nline 1:67: [:] operator cannot be used after LIMIT"
      }
    ],
    "type" : "verification_exception",
    "reason" : "Found 2 problems\nline 1:36: [:] operator cannot be used after LIMIT\nline 1:67: [:] operator cannot be used after LIMIT"
  },
  "status" : 400
}

Its plan looks like this:
[2025-04-03T12:53:57,109][DEBUG][o.e.x.e.a.Analyzer       ] [runTask-0] Tree transformation took 1ms
Fork[]                                        ! Limit[10000[INTEGER],false]
|_Eval[[fork1[KEYWORD] AS _fork]]             ! \_Fork[]
| \_Filter[:(?message,Disconnected[KEYWORD])] !   |_Limit[10000[INTEGER],false]
|   \_Limit[1[INTEGER],false]                 !   | \_Eval[[fork1[KEYWORD] AS _fork]]
|     \_UnresolvedRelation[sample_data]       !   |   \_Filter[:(message{f}#624,Disconnected[KEYWORD])]
\_Eval[[fork2[KEYWORD] AS _fork]]             !   |     \_Limit[1[INTEGER],false]
  \_Filter[:(?message,Connected[KEYWORD])]    !   |       \_EsRelation[sample_data][@timestamp{f}#618, client.ip{f}#620, event.duration..]
    \_Limit[1[INTEGER],false]                 !   \_Limit[10000[INTEGER],false]
      \_UnresolvedRelation[sample_data]       !     \_Eval[[fork2[KEYWORD] AS _fork]]
                                              !       \_Filter[:(message{f}#631,Connected[KEYWORD])]
                                              !         \_Limit[1[INTEGER],false]
                                              !           \_EsRelation[sample_data][@timestamp{f}#625, client.ip{f}#627, event.duration..]

Not all commands are allowed before a full text function; that is how we ran into this issue. Do we want to push fork before the limit, or just error out?
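
For comparison, a variant of the query in which no limit precedes the full text filter is the intended FORK shape and should pass verification (a hedged example, assuming the same sample_data index and a snapshot build with FORK enabled; it is not equivalent to the query above, since the limit now applies after the fork):

from sample_data
| fork (where message:"Disconnected") (where message:"Connected")
| limit 1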

@ioanatia ioanatia (Contributor, Author) commented Apr 4, 2025

We should just error out in this case. I think the current behaviour is the one we should expect.
Before we take FORK out of snapshot we will check whether we missed any extra validations.
Also, as a side note, we should check whether there is anything else we can improve on the full text validation side; in the long run it is not manageable to keep adjusting the validation for full text functions every time we add a new command.

Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
5 participants