[feature](nereids)support correlated scalar subquery without scalar agg #39471
starocean999 merged 8 commits into apache:master
TPC-H: Total hot run time: 38175 ms

TPC-H: Total hot run time: 37897 ms
TPC-DS: Total hot run time: 195603 ms
ClickBench: Total hot run time: 31.51 s

TPC-H: Total hot run time: 38022 ms
TPC-DS: Total hot run time: 196557 ms
ClickBench: Total hot run time: 31.69 s
| if (child instanceof LogicalProject) { | ||
| // keep NoneMovableFunction for later use | ||
| for (NamedExpression output : ((LogicalProject<?>) child).getOutputs()) { | ||
| if (output.containsType(NoneMovableFunction.class)) { | ||
| childRequiredSlotBuilder.add(output.toSlot()); | ||
| } | ||
| } | ||
| } |
I think you should change this interface instead: org.apache.doris.nereids.trees.plans.logical.LogicalProject#pruneOutputs
| // unnest correlated scalar subquery may add count(*) and any_value() to project list | ||
| // then there may be more than one expr, so we add all project exprs here |
Should we do more checks here?
|
|
||
| public ScalarSubquery(LogicalPlan subquery) { | ||
| super(Objects.requireNonNull(subquery, "subquery can not be null")); | ||
| this(Objects.requireNonNull(subquery, "subquery can not be null"), ImmutableList.of()); |
| this(Objects.requireNonNull(subquery, "subquery can not be null"), ImmutableList.of()); | |
| this(subquery, ImmutableList.of()); |
| } else { | ||
| return false; | ||
| } | ||
| } else if (plan instanceof LogicalJoin) { |
Why does only the join case return false directly? Could you add a comment to explain it? What about other operators, such as sort, limit, window, or generate?
| // check if the query has top level scalar agg | ||
| // if the correlated subquery doesn't have top level scalar agg | ||
| // we need create one in subquery unnesting step | ||
| private static boolean findTopLevelScalarAgg(Plan plan) { |
Add an FE unit test for this static function.
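To make the question in this thread concrete, here is a minimal sketch of the kind of check the code comment describes, written against a toy plan tree. Everything in it is an assumption for illustration: `Node`, `Kind`, and `hasTopLevelScalarAgg` are hypothetical names, not Doris classes, and which operators the real `findTopLevelScalarAgg` looks through may differ from what is shown here.

```java
import java.util.Arrays;
import java.util.List;

// Toy plan tree; every name here is hypothetical and not a Doris class.
class TopLevelScalarAggSketch {
    enum Kind { PROJECT, FILTER, AGGREGATE, JOIN, SCAN }

    static final class Node {
        final Kind kind;
        final int groupByKeys; // only meaningful for AGGREGATE nodes
        final List<Node> children;

        Node(Kind kind, int groupByKeys, Node... children) {
            this.kind = kind;
            this.groupByKeys = groupByKeys;
            this.children = Arrays.asList(children);
        }
    }

    // Returns true when the first aggregate reached from the root has no group-by keys.
    static boolean hasTopLevelScalarAgg(Node plan) {
        if (plan.kind == Kind.AGGREGATE) {
            // An aggregate without group-by keys yields exactly one row: a scalar agg.
            return plan.groupByKeys == 0;
        } else if (plan.kind == Kind.JOIN) {
            // Assumption: an aggregate below a join is not treated as "top level".
            return false;
        }
        // Assumption: other operators are looked through; leaves fall out of the loop.
        for (Node child : plan.children) {
            if (hasTopLevelScalarAgg(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node scan = new Node(Kind.SCAN, 0);
        Node scalarAgg = new Node(Kind.AGGREGATE, 0, scan);
        System.out.println(hasTopLevelScalarAgg(new Node(Kind.PROJECT, 0, scalarAgg)));
        System.out.println(hasTopLevelScalarAgg(new Node(Kind.JOIN, 0, scalarAgg, scan)));
    }
}
```

A unit test for the real function could follow the same shape: build small plans with and without a group-by-free aggregate on top and assert on the boolean result.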
| Map<SubqueryExpr, Optional<MarkJoinSlotReference>> subqueryToMarkJoinSlot, | ||
| CascadesContext ctx, Optional<Expression> conjunct, | ||
| boolean isProject, boolean singleSubquery, boolean isMarkJoinSlotNotNull) { | ||
| private Pair<LogicalPlan, Optional<Expression>> addApply(SubqueryExpr subquery, |
Add an FE unit test for this function.
| // left child | ||
| projects.addAll(childPlan.getOutput()); | ||
| // markJoinSlotReference | ||
| projects.addAll(markJoinSlot.isPresent() ? ImmutableList.of(markJoinSlot.get()) : ImmutableList.of()); |
| projects.addAll(markJoinSlot.isPresent() ? ImmutableList.of(markJoinSlot.get()) : ImmutableList.of()); | |
| markJoinSlot.map(projects::add); |
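For readers comparing the two forms in this suggestion, the sketch below shows the same "append the value only if it is present" step using `Optional.ifPresent`, which states the side effect directly instead of discarding the result of `map`. It is an illustration under assumptions: plain strings stand in for `NamedExpression`, an `ArrayList` stands in for the `ImmutableList.Builder`, and `buildProjects` is a hypothetical helper, not code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the project-list construction discussed above.
class OptionalAddSketch {
    static List<String> buildProjects(List<String> childOutput, Optional<String> markJoinSlot) {
        // Start with the left child's output.
        List<String> projects = new ArrayList<>(childOutput);
        // Append the mark join slot only when it is present.
        markJoinSlot.ifPresent(projects::add);
        return projects;
    }

    public static void main(String[] args) {
        List<String> childOutput = Arrays.asList("k1", "k2");
        System.out.println(buildProjects(childOutput, Optional.of("markSlot")));
        System.out.println(buildProjects(childOutput, Optional.empty()));
    }
}
```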
| .build(); | ||
|
|
||
| return new LogicalProject(projects, newApply); | ||
| ImmutableList.Builder<NamedExpression> projects = ImmutableList.builder(); |
| ImmutableList.Builder<NamedExpression> projects = ImmutableList.builder(); | |
| ImmutableList.Builder<NamedExpression> projects = ImmutableList.builderWithExpectedSize(childPlan.getOutput().size() + 3); |
| Optional<MarkJoinSlotReference> markJoinSlot = subqueryToMarkJoinSlot.get(subquery); | ||
| boolean needAddScalarSubqueryOutputToProjects = isConjunctContainsScalarSubqueryOutput( | ||
| subquery, conjunct, isProject, singleSubquery); | ||
| boolean useNewSubquery = false; |
| boolean useNewSubquery = false; | |
| boolean needRuntimeAssertCount = false; |
| LogicalAggregate<Plan> aggregate = new LogicalAggregate<>(ImmutableList.of(), | ||
| ImmutableList.of(countAlias, anyValueAlias), subquery.getQueryPlan()); |
Will the group by be added later?
TPC-H: Total hot run time: 37919 ms
TPC-DS: Total hot run time: 192664 ms
ClickBench: Total hot run time: 32.23 s

TPC-H: Total hot run time: 38117 ms
TPC-DS: Total hot run time: 187253 ms
ClickBench: Total hot run time: 31.8 s

TPC-H: Total hot run time: 37897 ms
TPC-DS: Total hot run time: 192558 ms
ClickBench: Total hot run time: 32.06 s

PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.

TPC-H: Total hot run time: 41344 ms
TPC-DS: Total hot run time: 195561 ms
ClickBench: Total hot run time: 32.15 s
When multiple subqueries exist in a query block
```sql
select case
when t1.k1=1 then (select count(*) from t2 where t1.k2=t2.k2) -- count(*)#7
when t1.k1=2 then (select count(*) from t3 where t1.k2=t3.k2) -- count(*)#12
else 0 end as kk
from t1
order by kk
```
binding of the aggregate function output may fail and throw an exception
```
org.apache.doris.nereids.exceptions.AnalysisException: Input slot(s) not in child's output: count(*)#7
```
because only the last aggregate function slot is replaced (`#12` is replaced, but `#7` is not).
This PR fixes subquery unnesting failing to find the aggregate slot, a bug introduced by
#39471
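The difference between replacing only the last slot and replacing every slot can be sketched as follows. This is a simplified model under stated assumptions, not the Doris implementation: slots are plain strings, the replacement names `new#7` and `new#12` are invented for the example, and `replaceLastOnly` and `replaceAll` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of slot replacement; slot names are plain strings.
class SlotReplaceSketch {
    // Mirrors the described bug: only the most recently recorded replacement is applied.
    static List<String> replaceLastOnly(List<String> slots, LinkedHashMap<String, String> replacements) {
        String lastKey = null;
        for (String key : replacements.keySet()) {
            lastKey = key; // ends up holding the last inserted key
        }
        List<String> out = new ArrayList<>();
        for (String slot : slots) {
            out.add(slot.equals(lastKey) ? replacements.get(slot) : slot);
        }
        return out;
    }

    // Mirrors the described fix: every slot with a recorded replacement is rewritten.
    static List<String> replaceAll(List<String> slots, Map<String, String> replacements) {
        List<String> out = new ArrayList<>();
        for (String slot : slots) {
            out.add(replacements.getOrDefault(slot, slot));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> slots = Arrays.asList("count(*)#7", "count(*)#12");
        LinkedHashMap<String, String> replacements = new LinkedHashMap<>();
        replacements.put("count(*)#7", "new#7");   // invented replacement name
        replacements.put("count(*)#12", "new#12"); // invented replacement name
        System.out.println(replaceLastOnly(slots, replacements)); // prints [count(*)#7, new#12]
        System.out.println(replaceAll(slots, replacements));      // prints [new#7, new#12]
    }
}
```

In the first result `count(*)#7` survives unreplaced, which corresponds to the slot named in the exception message.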
Support correlated scalar subqueries without a scalar agg, such as:
select t1.c1 from t1 where t1.c2 > (select t2.c2 from t2 where t1.c1 = t2.c1);
After this PR, Nereids produces a correct plan for the SQL above.
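The review threads mention that unnesting may add `count(*)` and `any_value()` over the subquery, and a reviewer suggests the flag name `needRuntimeAssertCount`. The sketch below emulates in plain Java what that combination would mean for the query above. It is an interpretation, not the planner's code: all names are hypothetical, rows are `{c1, c2}` integer pairs, and rejecting a count above one at run time is an assumption drawn from the flag name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical emulation; no name below comes from the Doris code base.
class ScalarSubqueryUnnestSketch {
    // Holds count(*) and any_value(t2.c2) for one correlation key.
    static final class Agg {
        final long count;
        final Integer anyValue; // null when no inner row matched

        Agg(long count, Integer anyValue) {
            this.count = count;
            this.anyValue = anyValue;
        }
    }

    // Emulates: select count(*), any_value(t2.c2) from t2 where t2.c1 = key
    static Agg aggregateFor(int key, List<int[]> t2) {
        long count = 0;
        Integer anyValue = null;
        for (int[] row : t2) {
            if (row[0] == key) {
                count++;
                if (anyValue == null) {
                    anyValue = row[1];
                }
            }
        }
        return new Agg(count, anyValue);
    }

    // Emulates: select t1.c1 from t1 where t1.c2 > (select t2.c2 from t2 where t1.c1 = t2.c1)
    static List<Integer> evaluate(List<int[]> t1, List<int[]> t2) {
        List<Integer> result = new ArrayList<>();
        for (int[] outer : t1) {
            Agg agg = aggregateFor(outer[0], t2);
            if (agg.count > 1) {
                // Assumed runtime check: a scalar subquery must not return several rows.
                throw new IllegalStateException("scalar subquery returned more than one row for c1=" + outer[0]);
            }
            // With no matching row the subquery is NULL, so the comparison filters the row out.
            if (agg.anyValue != null && outer[1] > agg.anyValue) {
                result.add(outer[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> t1 = Arrays.asList(new int[]{1, 10}, new int[]{2, 5}, new int[]{3, 7});
        List<int[]> t2 = Arrays.asList(new int[]{1, 4}, new int[]{2, 9});
        System.out.println(evaluate(t1, t2)); // prints [1]
    }
}
```

Only the outer row with `c1 = 1` qualifies: its single matching inner value 4 is below 10, the row with `c1 = 2` fails the comparison, and the row with `c1 = 3` has no match.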