ES|QL: Make Fork n-ary #126074

Merged · 2 commits merged into elastic:main · Apr 3, 2025
Conversation

@ioanatia ioanatia (Contributor) commented Apr 1, 2025

One piece of feedback we got from #121948 was that Fork is not an n-ary plan, which further complicates the implementation with special treatment of Fork in the analyzer/optimizer/verifier.

We change the planning of Fork by making it n-ary, while also solving some of the outstanding issues (a sketch of the n-ary plan shape follows the list below):

  • we fix the pre-analysis step where we gather the used field names; the following special handling of FORK is not needed anymore:

if (parsed.anyMatch(plan -> plan instanceof Fork)) {
    return result.withFieldNames(IndexResolver.ALL_FIELDS);
}

  • for semantic search, we require rewriting the query builder for the match function on the coordinator. Initially we added special handling of Fork in QueryBuilderResolver; this is now removed since it is no longer necessary. The functionality is covered by CSV tests and continues to work.

  • we remove most of the special handling of Fork in the analyzer; what remains is AddImplicitForkLimit, which is needed to add the default limit to each sub-plan.

  • we remove the special handling of Fork in the verifier; this will help when we want to support more than just where/limit/sort in the Fork branches. Without this change we would need to add more special handling of Fork to do proper validation.
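
For illustration, here is a minimal sketch of the structural difference between a unary plan node and an n-ary one. The types below are simplified stand-ins, not the actual ES|QL plan classes:

import java.util.List;

// Simplified stand-in for a logical plan node (illustration only).
interface LogicalPlan {
    List<LogicalPlan> children();
}

// A unary node wraps exactly one child, e.g. a limit over its input.
record Limit(int limit, LogicalPlan child) implements LogicalPlan {
    @Override
    public List<LogicalPlan> children() {
        return List.of(child);
    }
}

// An n-ary Fork exposes all of its branches as direct children, so the
// generic tree walks used by the analyzer/optimizer/verifier visit every
// branch without Fork-specific special casing.
record Fork(List<LogicalPlan> branches) implements LogicalPlan {
    @Override
    public List<LogicalPlan> children() {
        return branches;
    }
}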

@ioanatia ioanatia added >non-issue :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0 labels Apr 1, 2025
    var mappedChild = new FragmentExec(child);
    physicalChildren.add(mappedChild);
}
-return new MergeExec(merge.source(), physicalChildren, merge.output());
+return new MergeExec(fork.source(), physicalChildren, fork.output());
ioanatia (Contributor, Author):

I have kept the name MergeExec because, from an execution standpoint, we are merging results from multiple sub-plans.
When we add support for multiple forks (fork after fork), we would likely need a ForkExec too.

Contributor:

That makes sense to me. ++

List<Enrich> unresolvedEnriches = new ArrayList<>();
List<TableInfo> lookupIndices = new ArrayList<>();

plan.forEachUp(UnresolvedRelation.class, p -> {
List<TableInfo> list = p.indexMode() == IndexMode.LOOKUP ? lookupIndices : indices;
list.add(new TableInfo(p.indexPattern()));
Collection<TableInfo> collection = p.indexMode() == IndexMode.LOOKUP ? lookupIndices : indices;
ioanatia (Contributor, Author):

I needed to make this change, and the ones in TableInfo, because otherwise queries using FORK would fail with the following error:

// TODO we plan to support joins in the future when possible, but for now we'll just fail early if we see one
if (indices.size() > 1) {
    // Note: JOINs are not supported but we detect them when
    listener.onFailure(new MappingException("Queries with multiple indices are not supported"));
} else if (indices.size() == 1) {

Note that this only ensures that we collect distinct values for indices, not lookupIndices, meaning there should be no effect on the lookup join functionality.
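
As a toy illustration of why the distinctness matters (the record below is a hypothetical stand-in, not the real TableInfo): each FORK branch contributes a relation over the same index pattern, so a plain List ends up with one entry per branch and trips the indices.size() > 1 check above, while a Set keeps a single distinct entry.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistinctIndicesSketch {
    // Hypothetical stand-in for the index information gathered per relation.
    record TableInfo(String indexPattern) {}

    public static void main(String[] args) {
        // Two FORK branches, both reading the same index.
        List<TableInfo> perBranch = List.of(
            new TableInfo("sample_data"),  // fork branch 1
            new TableInfo("sample_data")   // fork branch 2
        );

        Set<TableInfo> indices = new LinkedHashSet<>(perBranch);
        System.out.println(indices.size()); // 1, so the single-index path is taken
    }
}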

Contributor:

This makes sense to me. I don't remember exactly, but I ran into this in the past and then didn't need it for a separate reason.

@ioanatia ioanatia marked this pull request as ready for review April 1, 2025 21:07
@ioanatia ioanatia requested a review from ChrisHegarty April 1, 2025 21:07
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Apr 1, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin costin requested review from bpintea and removed request for costin April 2, 2025 14:49
@ChrisHegarty ChrisHegarty (Contributor) left a comment

This is a good change and addresses a number of follow-up tasks that arose during the initial FORK implementation. LGTM

@@ -1458,23 +1418,6 @@ private static Expression castStringLiteral(Expression from, DataType target) {
}
}

private static class ImplicitForkCasting extends ParameterizedRule<LogicalPlan, LogicalPlan, AnalyzerContext> {
Contributor:

Thanks for removing these XXForkXX rules, I was hoping that we could get to this point when I added them originally.

@bpintea bpintea (Contributor) left a comment

LGTM

@ioanatia ioanatia merged commit bd0a75d into elastic:main Apr 3, 2025
17 checks passed
@ioanatia ioanatia deleted the fork_n_ary_alt branch April 3, 2025 15:09
@fang-xing-esql fang-xing-esql (Member) commented Apr 3, 2025

Apologies for the late comment. Since fork is still under snapshot, it is fine to address some potential issues as follow-ups. We will need to be aware of the limitations of full text functions when generating the children of forks; here is an example:

curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' -d '
{
  "query": "from sample_data | limit 1 | fork (where message:\"Disconnected\") (where message:\"Connected\")"
}
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "verification_exception",
        "reason" : "Found 2 problems\nline 1:36: [:] operator cannot be used after LIMIT\nline 1:67: [:] operator cannot be used after LIMIT"
      }
    ],
    "type" : "verification_exception",
    "reason" : "Found 2 problems\nline 1:36: [:] operator cannot be used after LIMIT\nline 1:67: [:] operator cannot be used after LIMIT"
  },
  "status" : 400
}

Its plan looks like this:
[2025-04-03T12:53:57,109][DEBUG][o.e.x.e.a.Analyzer       ] [runTask-0] Tree transformation took 1ms
Fork[]                                        ! Limit[10000[INTEGER],false]
|_Eval[[fork1[KEYWORD] AS _fork]]             ! \_Fork[]
| \_Filter[:(?message,Disconnected[KEYWORD])] !   |_Limit[10000[INTEGER],false]
|   \_Limit[1[INTEGER],false]                 !   | \_Eval[[fork1[KEYWORD] AS _fork]]
|     \_UnresolvedRelation[sample_data]       !   |   \_Filter[:(message{f}#624,Disconnected[KEYWORD])]
\_Eval[[fork2[KEYWORD] AS _fork]]             !   |     \_Limit[1[INTEGER],false]
  \_Filter[:(?message,Connected[KEYWORD])]    !   |       \_EsRelation[sample_data][@timestamp{f}#618, client.ip{f}#620, event.duration..]
    \_Limit[1[INTEGER],false]                 !   \_Limit[10000[INTEGER],false]
      \_UnresolvedRelation[sample_data]       !     \_Eval[[fork2[KEYWORD] AS _fork]]
                                              !       \_Filter[:(message{f}#631,Connected[KEYWORD])]
                                              !         \_Limit[1[INTEGER],false]
                                              !           \_EsRelation[sample_data][@timestamp{f}#625, client.ip{f}#627, event.duration..]

Not all commands are allowed before a full text function; that is how we ran into this issue. Do we want to push fork before the limit, or just error out?
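
For comparison, a variant of the query in which no limit precedes the full text filter is the intended FORK shape and should pass verification (a hedged example, assuming the same sample_data index and a snapshot build with FORK enabled; it is not equivalent to the query above, since the limit now applies after the fork):

from sample_data
| fork (where message:"Disconnected") (where message:"Connected")
| limit 1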

@ioanatia ioanatia (Contributor, Author) commented Apr 4, 2025

We should just error out in this case. I think the current behaviour is the one we should expect.
Before we take FORK out of snapshot we will check whether we missed any extra validations.
Also, as a side note, we should check whether there is anything else we can improve on the full text validation side; in the long run it is not manageable to keep adjusting the validation for full text functions every time we add a new command.

Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
5 participants