planner: fix correlated aggregates which should be evaluated in outer query #21431

Merged on Dec 18, 2020 (38 commits)

@dyzsr dyzsr commented Dec 2, 2020

What problem does this PR solve?

Issue Number: close #18350, close #17748

Problem Summary:

  • Some correlated aggregates (those that belong to the outer query) should be evaluated in the outer query instead of in the sub-query.
# These aggregates should be evaluated in outer query
select (select count(a)) from t;
select (select count(n.a) from t limit 1) from t n;
select (select 1 from t where count(n.a) > 1 limit 1) from t n;
select (select 1 from t order by count(n.a) limit 1) from t n;
select (select 1 from t having count(n.a) > 1 limit 1) from t n;
select (select cnt from (select count(a) as cnt) n) from t;

# These aggregates should be evaluated in sub-query
select (select count(a + n.a) from t) from t n;
select (select count(a) from t) from t n;
  • Nested aggregates are not handled.
select (select sum(count(a))) from t; # report "invalid group function use"
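
The two groups of examples above suggest a simple rule: an aggregate is evaluated at the innermost query level that supplies one of its argument columns. The Python sketch below is a toy model of that rule, not TiDB's planner code (which is written in Go). The names `Col`, `evaluation_level` and `is_correlated` are hypothetical, and the treatment of aggregates without column arguments is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Col:
    """A column reference tagged with the query level that supplies it."""
    name: str
    level: int  # 0 = outermost query, 1 = its sub-query, and so on


def evaluation_level(arg_cols: List[Col], current_level: int) -> int:
    """Return the query level at which an aggregate should be evaluated.

    Assumed rule: the innermost (largest) level among the argument columns.
    An aggregate with no column arguments stays at the level where it appears.
    """
    if not arg_cols:
        return current_level
    return max(col.level for col in arg_cols)


def is_correlated(arg_cols: List[Col], current_level: int) -> bool:
    """True if the aggregate belongs to a query outside the one it is written in."""
    return evaluation_level(arg_cols, current_level) < current_level


# select (select count(n.a) from t limit 1) from t n;
# n.a comes from level 0, the aggregate is written at level 1.
print(is_correlated([Col("n.a", 0)], current_level=1))

# select (select count(a + n.a) from t) from t n;
# a comes from the sub-query's own table, so the aggregate stays inside.
print(is_correlated([Col("t.a", 1), Col("n.a", 0)], current_level=1))
```

Under this model, `count(n.a)` is correlated and `count(a + n.a)` is not, which matches the two groups listed above.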

What is changed and how it works?

What's Changed:

  • Add a procedure resolveCorrelatedAggregates, which runs before buildAggregation, to collect correlated aggregates from sub-queries.
  • Add a cache for ResultSetNode to avoid rebuilding plans.
  • Add correlatedAggMap so that sub-queries can build correct plans.

How it Works:

  1. For example, we have
create table t (a int, b int);
  2. Start from the query:
select (select count(n.a) from t limit 1) from t n;
  3. Inside b.resolveCorrelatedAggregates, count(n.a) is recognized as a correlated aggregate because column n.a comes from the outer schema (from table n instead of t). So count(n.a) is collected and returned to the outer buildSelect, and the query now looks like:
select (select count(n.a) from t limit 1), count(n.a) as `sel_subq_agg_1` from t n;
  4. Inside b.buildAggregation, a correlated column for sel_subq_agg_1 is created and added to b.correlatedAggMap. Then b.buildProjection builds the plan for the sub-query (select count(n.a) from t limit 1).

  5. The sub-query recognizes count(n.a) as a correlated aggregate and skips it while building its aggregate plan. Then, inside b.buildProjection of the sub-query, count(n.a) is rewritten as a correlated column that refers to sel_subq_agg_1, and the query now looks like:

select (select CorrelatedColumn{`sel_subq_agg_1`} from t limit 1), count(n.a) as `sel_subq_agg_1` from t n;
  6. Finally, we get the correct query plan.
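
The steps above can be sketched as a two-pass rewrite over a very small query model. This is an illustrative Python toy, not the Go code in planner/core/logical_plan_builder.go: the classes `Agg`, `CorrelatedColumn`, `SubQuery` and `Query` and the function `rewrite_subqueries` are hypothetical, and `resolve_correlated_aggregates` only borrows its name from the procedure described in this PR. The model assumes each aggregate has a single qualified column argument and ignores LIMIT.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Agg:
    func: str    # e.g. "count"
    table: str   # alias qualifying the argument column, e.g. "n"
    column: str  # e.g. "a"


@dataclass(frozen=True)
class CorrelatedColumn:
    alias: str   # name of the outer aggregate this column reads


@dataclass
class SubQuery:
    select: list             # Agg or CorrelatedColumn items
    tables: Tuple[str, ...]  # aliases in the sub-query's own FROM clause


@dataclass
class Query:
    select: list             # SubQuery items or (Agg, alias) pairs
    tables: Tuple[str, ...]  # aliases in the outer FROM clause


def resolve_correlated_aggregates(query: Query) -> Dict[Agg, str]:
    """Pass 1: collect aggregates whose argument comes only from the outer
    query and append them to the outer select list under a generated alias."""
    agg_map: Dict[Agg, str] = {}
    for item in list(query.select):
        if not isinstance(item, SubQuery):
            continue
        for expr in item.select:
            from_outer = (
                isinstance(expr, Agg)
                and expr.table in query.tables
                and expr.table not in item.tables
            )
            if from_outer and expr not in agg_map:
                alias = f"sel_subq_agg_{len(agg_map) + 1}"
                agg_map[expr] = alias
                query.select.append((expr, alias))
    return agg_map


def rewrite_subqueries(query: Query, agg_map: Dict[Agg, str]) -> None:
    """Pass 2: inside each sub-query, replace a collected aggregate with a
    correlated column that points at the outer aggregate's alias."""
    for item in query.select:
        if isinstance(item, SubQuery):
            item.select = [
                CorrelatedColumn(agg_map[e]) if e in agg_map else e
                for e in item.select
            ]


# select (select count(n.a) from t limit 1) from t n;
q = Query(select=[SubQuery(select=[Agg("count", "n", "a")], tables=("t",))],
          tables=("n",))
mapping = resolve_correlated_aggregates(q)
rewrite_subqueries(q, mapping)
print(q.select)
```

After both passes, the sub-query's select list holds a correlated column for `sel_subq_agg_1`, and the outer select list has gained `count(n.a)` under that alias, mirroring the rewritten query shown in the walkthrough.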

Related changes

  • Need to cherry-pick to the release branch 4.0

Check List

Tests

  • Unit test
  • Integration test

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • Fix correlated aggregates which should be evaluated in outer query instead of in subqueries.
  • Support nested aggregates in subqueries (correlated aggregates inside a normal aggregate).

@dyzsr dyzsr requested a review from a team as a code owner December 2, 2020 06:56
@dyzsr dyzsr requested review from winoros and removed request for a team December 2, 2020 06:56
@dyzsr dyzsr marked this pull request as draft December 2, 2020 06:56
@ichn-hu ichn-hu mentioned this pull request Dec 2, 2020
@github-actions github-actions bot added the sig/execution SIG execution label Dec 4, 2020
@ti-srebot ti-srebot added the sig/planner SIG: Planner label Dec 8, 2020
Contributor

@lzmhhh123 lzmhhh123 left a comment

Rest LGTM

planner/core/logical_plan_builder.go (outdated, resolved)
planner/core/logical_plan_builder.go (resolved)
@dyzsr
Contributor Author

dyzsr commented Dec 18, 2020

/run-all-tests tidb-test=pr/1124

Contributor

@lzmhhh123 lzmhhh123 left a comment

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Dec 18, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Dec 18, 2020
@lzmhhh123
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 18, 2020
@ti-srebot
Contributor

Your auto merge job has been accepted, waiting for:

  • 21811
  • 21854
  • 21844

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit f687ebd into pingcap:master Dec 18, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Dec 18, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor

cherry pick to release-4.0 in PR #21877

Labels
sig/planner SIG: Planner status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Projects
None yet
6 participants