planner: fix correlated aggregates which should be evaluated in outer query (#21431) #21877

Merged: 13 commits, Jan 28, 2021


ti-srebot

cherry-pick #21431 to release-4.0


What problem does this PR solve?

Issue Number: close #18350, close #17748

Problem Summary:

  • Some correlated aggregates (those belonging to the outer query) should be evaluated in the outer query instead of in the subquery.
# These aggregates should be evaluated in outer query
select (select count(a)) from t;
select (select count(n.a) from t limit 1) from t n;
select (select 1 from t where count(n.a) > 1 limit 1) from t n;
select (select 1 from t order by count(n.a) limit 1) from t n;
select (select 1 from t having count(n.a) > 1 limit 1) from t n;
select (select cnt from (select count(a) as cnt) n) from t;

# These aggregates should be evaluated in sub-query
select (select count(a + n.a) from t) from t n;
select (select count(a) from t) from t n;
  • Nested aggregates are not handled.
select (select sum(count(a))) from t; # reports "invalid group function use"
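
The outer/sub-query split in the examples above can be sketched as a small scope-resolution rule. This is an illustrative toy model, not the planner's code: `Scope` and `owning_scope` are hypothetical names, and the rule assumed here is that each column resolves to the innermost query level that defines it, and the aggregate is evaluated at the innermost level among those its columns resolve to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """One query level and the column names visible from its FROM clause.

    Hypothetical model for illustration only.
    """
    name: str
    columns: frozenset


def owning_scope(agg_columns, scopes):
    """Return the name of the query level that should evaluate an aggregate.

    `scopes` is ordered innermost first. Each column resolves to the
    innermost scope that defines it; the aggregate belongs to the
    innermost scope among those its columns resolve to.
    """
    depth = None
    for col in agg_columns:
        for i, scope in enumerate(scopes):
            if col in scope.columns:
                depth = i if depth is None else min(depth, i)
                break
        else:
            raise LookupError(f"unknown column {col!r}")
    if depth is None:
        # Assumption: an aggregate without column references stays local.
        depth = 0
    return scopes[depth].name


# select (select count(n.a) from t limit 1) from t n;
sub_t = Scope("subquery", frozenset({"a", "b", "t.a", "t.b"}))
outer_n = Scope("outer", frozenset({"a", "b", "n.a", "n.b"}))
print(owning_scope(["n.a"], [sub_t, outer_n]))       # count(n.a)
print(owning_scope(["a", "n.a"], [sub_t, outer_n]))  # count(a + n.a)
```

Under this model count(n.a) lands in the outer query, while count(a + n.a) and count(a) stay in the sub-query because the unqualified a resolves to the inner table t first.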

What is changed and how it works?

What's Changed:

  • Add a procedure resolveCorrelatedAggregates before buildAggregation to collect correlated aggregates from sub-queries.
  • Add a cache for ResultSetNode to avoid rebuilding plans.
  • Add correlatedAggMap so that sub-queries build correct plans.

How it Works:

  1. Suppose we have the table:
create table t (a int, b int);
  2. Start from the query:
select (select count(n.a) from t limit 1) from t n;
  3. Inside b.resolveCorrelatedAggregates, count(n.a) is recognized as a correlated aggregate because column n.a comes from the outer schema (from table n instead of t). So count(n.a) is collected and returned to the outer buildSelect, and the query now looks like:
select (select count(n.a) from t limit 1), count(n.a) as `sel_subq_agg_1` from t n;
  4. Inside b.buildAggregation, a correlated column for sel_subq_agg_1 is created and added to b.correlatedAggMap. Then b.buildProjection builds the plan for the sub-query (select count(n.a) from t limit 1).
  5. The sub-query recognizes count(n.a) as a correlated aggregate and skips it while building its aggregate plan. Then, inside b.buildProjection of the sub-query, count(n.a) is rewritten as a correlated column referring to sel_subq_agg_1, and the query now looks like:
select (select CorrelatedColumn{`sel_subq_agg_1`} from t limit 1), count(n.a) as `sel_subq_agg_1` from t n;
  6. Finally we get the correct query plan.
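
The rewrite walked through above can be mimicked on a toy AST. The sketch below is not the TiDB implementation: the classes `Agg`, `CorrelatedColumn`, `Aliased`, `Select` and the functions `rewrite` and `render` are hypothetical, and it assumes every aggregate argument is a single qualified column such as n.a. Only the names resolve_correlated_aggregates and sel_subq_agg_N echo the PR description.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Agg:
    func: str
    arg: str  # assumed to be one qualified column, e.g. "n.a"


@dataclass(frozen=True)
class CorrelatedColumn:
    name: str


@dataclass(frozen=True)
class Aliased:
    expr: object
    name: str


@dataclass(frozen=True)
class Select:
    fields: tuple
    table: str
    alias: str = ""
    limit: Optional[int] = None


def resolve_correlated_aggregates(outer):
    """Map each sub-query aggregate over an outer column to an auxiliary name."""
    outer_name = outer.alias or outer.table
    agg_map = {}
    for f in outer.fields:
        if isinstance(f, Select):
            for e in f.fields:
                correlated = isinstance(e, Agg) and e.arg.split(".")[0] == outer_name
                if correlated and e not in agg_map:
                    agg_map[e] = f"sel_subq_agg_{len(agg_map) + 1}"
    return agg_map


def rewrite(outer):
    """Hoist correlated aggregates to the outer select list and
    replace them inside sub-queries with correlated columns."""
    agg_map = resolve_correlated_aggregates(outer)
    fields = []
    for f in outer.fields:
        if isinstance(f, Select):
            inner = tuple(
                CorrelatedColumn(agg_map[e]) if e in agg_map else e
                for e in f.fields
            )
            f = replace(f, fields=inner)
        fields.append(f)
    fields.extend(Aliased(agg, name) for agg, name in agg_map.items())
    return replace(outer, fields=tuple(fields))


def render(node):
    """Print the toy AST back as SQL-like text."""
    if isinstance(node, str):
        return node
    if isinstance(node, Agg):
        return f"{node.func}({node.arg})"
    if isinstance(node, CorrelatedColumn):
        return "CorrelatedColumn{`" + node.name + "`}"
    if isinstance(node, Aliased):
        return f"{render(node.expr)} as `{node.name}`"
    parts = ", ".join(
        f"({render(f)})" if isinstance(f, Select) else render(f)
        for f in node.fields
    )
    sql = f"select {parts} from {node.table}"
    if node.alias:
        sql += f" {node.alias}"
    if node.limit is not None:
        sql += f" limit {node.limit}"
    return sql


query = Select((Select((Agg("count", "n.a"),), "t", limit=1),), "t", alias="n")
print(render(rewrite(query)))
```

The sketch collapses the collection and rewrite steps into one pass; in the PR description they happen in separate builder stages, with the auxiliary field added before the sub-query's projection is built.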

Related changes

  • Need to cherry-pick to the release branch 4.0

Check List

Tests

  • Unit test
  • Integration test

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • Fix correlated aggregates that should be evaluated in the outer query instead of in subqueries.
  • Support nested aggregates in subqueries (correlated aggregates inside normal aggregates).

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot

/run-all-tests

@ti-srebot

@dyzsr you're already a collaborator in bot's repo.

@dyzsr

dyzsr commented Jan 2, 2021

/run-all-tests tidb-test=pr/1147

@jebter jebter modified the milestones: 4.0.0, v4.0.10 Jan 7, 2021
@jebter jebter modified the milestones: v4.0.10, v4.0.11 Jan 18, 2021
@dyzsr

dyzsr commented Jan 20, 2021

/run-all-tests tidb-test=pr/1147

@dyzsr

dyzsr commented Jan 20, 2021

/run-integration-copr-test tidb-test=pr/1147

@qw4990

qw4990 commented Jan 27, 2021

/run-all-tests

@qw4990

qw4990 commented Jan 28, 2021

/run-all-tests

@dyzsr

dyzsr commented Jan 28, 2021

/run-common-test tidb-test=pr/1147
/run-integration-common-test tidb-test=pr/1147

@dyzsr

dyzsr commented Jan 28, 2021

/run-integration-common-test tidb-test=pr/1147


@qw4990 qw4990 left a comment

Rest LGTM

@dyzsr

dyzsr commented Jan 28, 2021

/run-all-tests

@qw4990

qw4990 commented Jan 28, 2021

/run-all-tests

@qw4990

qw4990 commented Jan 28, 2021

/run-mybatis-test


@qw4990 qw4990 left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 28, 2021
@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jan 28, 2021
@qw4990 qw4990 merged commit 80d420d into pingcap:release-4.0 Jan 28, 2021
@winoros winoros deleted the release-4.0-f687ebd91ce0 branch January 28, 2021 05:25
Labels
sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug. type/4.0-cherry-pick
5 participants