[Enhancement] window function optimization, support rank (5) #6120

Merged
merged 7 commits into from
May 28, 2022

@liuyehcf liuyehcf commented May 15, 2022

What type of PR is this:

  • bug
  • feature
  • enhancement
  • others

Which issues does this PR fix:

Fixes #5885

Enhancement

select * from (
    select *, rank() over (partition by v2 order by v3) as rk from t0
) sub_t0
where rk < 5;

For ranking window functions (rank, dense_rank, and row_number), if the query has a related predicate such as rk < 5, it can be optimized by inserting a PartitionTopN operator before the Sort operator of the window function.
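The reasoning behind this rewrite can be checked with a small model. The sketch below is not StarRocks code: it assumes rows are plain dicts, that t0 has a column v1 in addition to v2 and v3, and that the helper names (rank_rows, prefilter, and so on) are hypothetical. It shows that discarding rows whose rank is already too large, before the window function runs, leaves the final result unchanged, because a row's rank depends only on rows that sort before it.

```python
from collections import defaultdict


def rank_rows(rows, part_key, order_key):
    """Return (row, rank) pairs using SQL RANK() semantics within each partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[part_key]].append(row)
    ranked = []
    for part in parts.values():
        part.sort(key=lambda r: r[order_key])
        rank = 0
        prev = object()  # sentinel that never equals a real value
        for position, row in enumerate(part, start=1):
            if row[order_key] != prev:
                rank = position  # ties share the rank of the first tied row
                prev = row[order_key]
            ranked.append((row, rank))
    return ranked


def query_naive(rows, bound):
    """Rank everything, then apply the predicate rk < bound."""
    return sorted((r["v1"], rk) for r, rk in rank_rows(rows, "v2", "v3") if rk < bound)


def prefilter(rows, bound):
    """Stand-in for PartitionTopN: keep rows with rank < bound, in no particular order."""
    return [r for r, rk in rank_rows(rows, "v2", "v3") if rk < bound]


def query_optimized(rows, bound):
    """Pre-filter first, then run the unchanged window function and predicate."""
    survivors = prefilter(rows, bound)
    return sorted((r["v1"], rk) for r, rk in rank_rows(survivors, "v2", "v3") if rk < bound)
```

Both functions return the same rows for any input, but the optimized path hands far fewer rows to the sort that feeds the window function.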

The main purpose of PartitionTopN is to filter data; its output remains unordered. It consists of three components:

  • partitioner: divides each input chunk based on the partition exprs
  • sorter (topn): each partition has its own sorter instance and is sorted independently
  • gather: fetches chunks from all sorters into one data stream, so the data is still unordered after gathering. Note that gather is only a logical concept, not an actual component


                                   ┌────► topn─────┐
                                   │               │
 (unordered)                       │               │                  (unordered)
 inputChunks ───────► partitioner ─┼────► topn ────┼─► gather ─────► outputChunks
                                   │               │
                                   │               │
                                   └────► topn ────┘
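The data flow in the diagram can be modeled roughly as follows. This is only an illustrative sketch under simplifying assumptions: chunks are Python lists of tuples, each sorter keeps a plain row-count limit (row_number-style), and the names TopNSorter and partition_topn are hypothetical rather than taken from the StarRocks executor.

```python
import heapq
from collections import defaultdict


class TopNSorter:
    """Keeps only the `limit` smallest rows (by `order_key`) seen so far."""

    def __init__(self, limit, order_key):
        self.limit = limit
        self.order_key = order_key
        self.rows = []

    def add(self, rows):
        # Merge the new rows with the current survivors and trim back to `limit`.
        self.rows = heapq.nsmallest(self.limit, self.rows + rows, key=self.order_key)


def partition_topn(chunks, partition_key, order_key, limit):
    sorters = {}
    for chunk in chunks:
        # partitioner: split the chunk by the partition expression
        split = defaultdict(list)
        for row in chunk:
            split[partition_key(row)].append(row)
        # sorter (topn): one independent sorter per partition
        for key, rows in split.items():
            if key not in sorters:
                sorters[key] = TopNSorter(limit, order_key)
            sorters[key].add(rows)
    # gather: concatenate the survivors; no ordering across partitions is implied
    output = []
    for sorter in sorters.values():
        output.extend(sorter.rows)
    return output
```

Callers should treat the returned list as unordered, which is why a Sort operator is still needed afterwards for the window function.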

The implementation differs slightly between the optimizer and the executor:

  • In the optimizer, for simplicity, we do not define a new pair of {Logical/Physical}PartitionTopNOperator. Instead, we reuse the existing {Logical/Physical}TopNOperator and add a new field, partitionByExprs, to record the partition-by information. We also need to pay attention to the following:
    • Make sure that no sort property is derived from PartitionTopN
    • Make sure that ExchangeNode does not set a limit in the PartitionTopN case
  • In the executor, we define a new pair of LocalPartitionTopN{Sink/Source}Operator and choose the implementation based on the field partitionByExprs:
    • If partitionByExprs is unset or empty, the original pair PartitionSortSinkOperator/LocalMergeSortSourceOperator is used
    • If partitionByExprs is not empty, the pair LocalPartitionTopN{Sink/Source}Operator is used
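The rules above can be summarized in a small model. The class below is a hypothetical sketch, not the actual optimizer or executor code: TopNPlan and its method names are invented for illustration, and only the operator names in the returned tuples come from this PR. The comment on the exchange limit reflects one plausible reading of why the limit must not be set (the limit applies per partition, so the total row count can legitimately exceed it).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TopNPlan:
    """Hypothetical stand-in for a TopN operator that may carry partition exprs."""

    limit: int
    order_by: List[str]
    partition_by_exprs: List[str] = field(default_factory=list)

    def is_partition_topn(self) -> bool:
        return bool(self.partition_by_exprs)

    def provides_sort_order(self) -> bool:
        # PartitionTopN only filters, so parents must not assume sorted output.
        return not self.is_partition_topn()

    def exchange_limit(self) -> Optional[int]:
        # Assumption: the limit is per partition, so one limit on the exchange
        # could drop rows that other partitions still need.
        return None if self.is_partition_topn() else self.limit

    def executor_operators(self) -> Tuple[str, str]:
        if self.is_partition_topn():
            return ("LocalPartitionTopNSinkOperator", "LocalPartitionTopNSourceOperator")
        return ("PartitionSortSinkOperator", "LocalMergeSortSourceOperator")
```

For example, `TopNPlan(limit=4, order_by=["v3"], partition_by_exprs=["v2"])` reports no sort order, no exchange limit, and the LocalPartitionTopN operator pair.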

Tasks

  • Support component partitioner
    • only support one partition expr right now
  • Support PartitionTopN
  • Support row_number
  • Support rank (this PR)
    • by supporting TopN limit by rank
  • Support dense_rank
    • by supporting TopN limit by dense_rank
  • Support multi partition exprs
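The difference between limiting by row_number, rank, and dense_rank within one partition can be illustrated with a short sketch. It assumes the input keys are already sorted and uses a hypothetical function limit_by; the three type names mirror the window functions discussed in this PR, not necessarily the exact enum constants.

```python
def limit_by(sorted_keys, limit, topn_type):
    """Keep the prefix of `sorted_keys` whose row_number/rank/dense_rank is <= limit."""
    kept = []
    row_number = rank = dense_rank = 0
    prev = object()  # sentinel that never equals a real key
    for key in sorted_keys:
        row_number += 1
        if key != prev:
            rank = row_number   # RANK jumps to the current position after ties
            dense_rank += 1     # DENSE_RANK counts distinct keys
            prev = key
        current = {
            "ROW_NUMBER": row_number,
            "RANK": rank,
            "DENSE_RANK": dense_rank,
        }[topn_type]
        if current > limit:
            break  # all three counters are non-decreasing, so we can stop early
        kept.append(key)
    return kept
```

With keys `[10, 10, 20, 20, 30, 40]` and a limit of 3, ROW_NUMBER keeps three rows, RANK keeps four (the tie at rank 3 is retained), and DENSE_RANK keeps five. This is why a rank or dense_rank limit may hold more than `limit` rows per partition.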

Performance improvement of this PR

Test info

  • TPC-DS, 100 GB
  • 3 BE nodes, each with 64 cores and 128 GB of memory

test sql

-- query 67
select  *
from (select i_category
            ,i_class
            ,i_brand
            ,i_product_name
            ,d_year
            ,d_qoy
            ,d_moy
            ,s_store_id
            ,sumsales
            ,rank() over (partition by i_category order by sumsales desc) rk
      from (select i_category
                  ,i_class
                  ,i_brand
                  ,i_product_name
                  ,d_year
                  ,d_qoy
                  ,d_moy
                  ,s_store_id
                  ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
            from store_sales
                ,date_dim
                ,store
                ,item
            where  ss_sold_date_sk=d_date_sk
               and ss_item_sk=i_item_sk
               and ss_store_sk = s_store_sk
               and d_month_seq between 1200 and 1200+11
            group by  rollup(i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,s_store_id))dw1) dw2
where rk <= 100
order by i_category
        ,i_class
        ,i_brand
        ,i_product_name
        ,d_year
        ,d_qoy
        ,d_moy
        ,s_store_id
        ,sumsales
        ,rk
limit 100;

test result

| before | after |
| --- | --- |
| 20s | 9.5s |

@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_rankn branch 4 times, most recently from 4d8a682 to 61c6691 Compare May 20, 2022 07:42
@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_rankn branch from 61c6691 to 1ab5b49 Compare May 23, 2022 09:10
@liuyehcf liuyehcf changed the title [Enhancement][WIP] window function optimization, support rank (4) [Enhancement][WIP] window function optimization, support rank (5) May 23, 2022
@liuyehcf liuyehcf changed the title [Enhancement][WIP] window function optimization, support rank (5) [Enhancement] window function optimization, support rank (5) May 23, 2022
@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_rankn branch 5 times, most recently from ce724dc to c83c56e Compare May 25, 2022 02:13
@@ -23,45 +24,55 @@ public class LogicalTopNOperator extends LogicalOperator {
private final List<Ordering> orderByElements;
private final long offset;
private final SortPhase sortPhase;
private final TopNType topNType;
Contributor

I think this name is confusing, because ROW_NUMBER, RANK, and DENSE_RANK are actually window functions, not TopNType.

Contributor Author

Actually, it truly affects the behavior of TopN. I considered LimitType and TopNType, and finally chose TopNType. Do you have any naming suggestions?

@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_rankn branch from bfa3a7b to ec3af72 Compare May 26, 2022 03:09
@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_rankn branch from 55d4418 to f139873 Compare May 27, 2022 02:49
@liuyehcf liuyehcf requested a review from murphyatwork May 27, 2022 03:00
@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_rankn branch from f139873 to bab17bb Compare May 27, 2022 03:15
@liuyehcf liuyehcf requested a review from Youngwb May 27, 2022 03:16
@wanpengfei-git
Collaborator

[FE PR Coverage check]

😍 pass : 50 / 54 (92.59%)

file detail

| path | covered line | new line | coverage |
| --- | --- | --- | --- |
| 🔵 com/starrocks/sql/optimizer/operator/TopNType.java | 9 | 13 | 69.23% |
| 🔵 com/starrocks/sql/optimizer/operator/logical/LogicalTopNOperator.java | 11 | 11 | 100.00% |
| 🔵 com/starrocks/sql/optimizer/rule/transformation/PushDownPredicateWindowRankRule.java | 5 | 5 | 100.00% |
| 🔵 com/starrocks/sql/optimizer/rule/implementation/TopNImplementationRule.java | 1 | 1 | 100.00% |
| 🔵 com/starrocks/sql/optimizer/base/SortProperty.java | 1 | 1 | 100.00% |
| 🔵 com/starrocks/sql/plan/PlanFragmentBuilder.java | 8 | 8 | 100.00% |
| 🔵 com/starrocks/planner/SortNode.java | 7 | 7 | 100.00% |
| 🔵 com/starrocks/sql/optimizer/rewrite/ExchangeSortToMergeRule.java | 3 | 3 | 100.00% |
| 🔵 com/starrocks/planner/ExchangeNode.java | 2 | 2 | 100.00% |
| 🔵 com/starrocks/sql/optimizer/operator/physical/PhysicalTopNOperator.java | 2 | 2 | 100.00% |
| 🔵 com/starrocks/sql/optimizer/rewrite/AddDecodeNodeForDictStringRule.java | 1 | 1 | 100.00% |

@liuyehcf liuyehcf requested a review from kangkaisen May 27, 2022 11:05
@Seaven Seaven merged commit b70ad45 into StarRocks:main May 28, 2022
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request Jun 22, 2022
@liuyehcf liuyehcf deleted the analytic_rank_optimization_rankn branch July 21, 2022 13:05
jaogoy pushed a commit to jaogoy/starrocks that referenced this pull request Nov 15, 2023
Signed-off-by: amber-create <yangyanping@starrocks.com>
jaogoy pushed a commit to jaogoy/starrocks that referenced this pull request Nov 15, 2023
Signed-off-by: amber-create <yangyanping@starrocks.com>
(cherry picked from commit 4a5e6e3)

Co-authored-by: amber-create <48005258@qq.com>
Development

Successfully merging this pull request may close these issues.

[Enhancement] Rank window function optimization
6 participants