Skip to content

planner should support skew distinct aggregate rewrite #36169

Closed
@fixdb

Description

Enhancement

For the following kind of query:

select S_NATIONKEY as s, 
  count(S_SUPPKEY), 
  count(distinct S_NAME) 
from supplier
group by s;

If the group key is highly skewed and the distinct key has large number of distinct values (a.k.a. high cardinality), the query execution will be slow.

We should be able to rewrite the above query to the following query to avoid skew:

select S_NATIONKEY as s, 
  sum(cnt_suppkey), 
  count(S_NAME) 
from (
    select S_NATIONKEY, S_NAME, count(S_SUPPKEY) as cnt_suppkey
    from supplier
    group by S_NATIONKEY, S_NAME
) as T
group by s;

Metadata

Assignees

Labels

type/enhancementThe issue or PR belongs to an enhancement.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions