[Enhancement] rank window function optimization, add partition topn (2) #6118


liuyehcf
Contributor

@liuyehcf liuyehcf commented May 15, 2022

What type of PR is this:

  • bug
  • feature
  • enhancement
  • others

Which issues of this PR fixes :

Fixes #5885

Enhancement

select * from (
    select *, rank() over (partition by v2 order by v3) as rk from t0
) sub_t0
where rk < 5;

For the rank window functions (rank, dense_rank, and row_number), if the query has a related predicate such as rk < 5, it can be optimized by inserting a PartitionTopN operator before the Sort operator of the window function.

The main purpose of PartitionTopN is to filter data; its output remains unordered. It consists of three components:

  • partitioner: divides the input chunks based on the partition exprs
  • sorter (topn): each partition has its own sorter instance and is sorted independently
  • gather: fetches chunks from all sorters into one data stream, so the data is still unordered after gathering. Moreover, gather is only a logical concept, not an actual component


                                   ┌────► topn─────┐
                                   │               │
 (unordered)                       │               │                  (unordered)
 inputChunks ───────► partitioner ─┼────► topn ────┼─► gather ─────► outputChunks
                                   │               │
                                   │               │
                                   └────► topn ────┘
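The data flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Row` struct, the `ByV3` comparator, and the function `partition_topn` are hypothetical names, the sketch works on single rows rather than chunks, and it only models a row_number-style limit (ties at the boundary are dropped arbitrarily).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical row layout: v2 is the partition key, v3 is the sort key.
struct Row {
    int v2;
    int v3;
};

// Orders the heap so that the row with the largest v3 sits on top.
struct ByV3 {
    bool operator()(const Row& a, const Row& b) const { return a.v3 < b.v3; }
};

// Keeps at most `limit` rows with the smallest v3 in each partition and
// concatenates the survivors. The output order is not specified.
std::vector<Row> partition_topn(const std::vector<Row>& input, std::size_t limit) {
    using Heap = std::priority_queue<Row, std::vector<Row>, ByV3>;

    // partitioner: route each row to the sorter (here a bounded heap) of its partition.
    std::unordered_map<int, Heap> sorters;
    for (const Row& row : input) {
        Heap& heap = sorters[row.v2];
        heap.push(row);
        if (heap.size() > limit) {
            heap.pop();  // topn: evict the row with the largest v3 in this partition
        }
    }

    // gather: drain every sorter into one stream without any global ordering.
    std::vector<Row> output;
    for (auto& entry : sorters) {
        Heap& heap = entry.second;
        while (!heap.empty()) {
            output.push_back(heap.top());
            heap.pop();
        }
    }
    return output;
}
```

Because the result is only a filtered subset, the window function's own Sort operator still has to run afterwards; the benefit is that it sorts far fewer rows.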

The implementations in the optimizer and the executor differ slightly:

  • In the optimizer, for simplicity, we do not define a new pair of {Logical/Physical}PartitionTopNOperator but reuse the existing {Logical/Physical}TopNOperator, adding a new field partitionByExprs to record the partition-by information. In addition, we need to pay attention to the following:
    • Make sure that no sort property is derived from PartitionTopN
    • Make sure that ExchangeNode does not set a limit when the operator is a PartitionTopN
  • In the executor, we define a new pair of LocalPartitionTopN{Sink/Source}Operator, and choose the implementation based on the field partitionByExprs
    • If partitionByExprs is unset or empty, the original pair of PartitionSortSinkOperator/LocalMergeSortSourceOperator is used
    • If partitionByExprs is not empty, the pair of LocalPartitionTopN{Sink/Source}Operator is used
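The executor-side choice described above amounts to a single branch on partitionByExprs. The sketch below only illustrates that decision: `choose_topn_operators` is a hypothetical helper that returns operator names as strings, whereas the real pipeline builder constructs operator factories.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns the (sink, source) operator names selected for a TopN node.
// Hypothetical helper; partition exprs are represented as plain strings here.
std::pair<std::string, std::string> choose_topn_operators(
        const std::vector<std::string>& partition_by_exprs) {
    if (partition_by_exprs.empty()) {
        // Plain TopN: keep the original sort sink / merge-sort source pair.
        return {"PartitionSortSinkOperator", "LocalMergeSortSourceOperator"};
    }
    // PartitionTopN: use the new pair introduced by this PR.
    return {"LocalPartitionTopNSinkOperator", "LocalPartitionTopNSourceOperator"};
}
```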

Tasks

  • Support component partitioner
    • only support one partition expr right now
  • Support PartitionTopN (this PR)
  • Support row_number
  • Support rank
    • by supporting TopN limit by rank
  • Support dense_rank
    • by supporting TopN limit by dense_rank
  • Support multi partition exprs
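The separate tasks for row_number, rank, and dense_rank exist because the same limit keeps a different number of rows under each function when the sort key has ties. The sketch below illustrates that difference using standard SQL ranking semantics; `RankType` and `rows_to_keep` are hypothetical names, not identifiers from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class RankType { RowNumber, Rank, DenseRank };

// Given one partition's sort keys in ascending order, returns how many
// leading rows have a ranking value <= limit under the chosen function.
std::size_t rows_to_keep(const std::vector<int>& sorted_keys, RankType type,
                         std::size_t limit) {
    std::size_t rank = 0;        // rank(): 1 + number of rows before the current peer group
    std::size_t dense_rank = 0;  // dense_rank(): number of distinct keys seen so far
    for (std::size_t i = 0; i < sorted_keys.size(); ++i) {
        const bool new_peer_group = (i == 0) || sorted_keys[i] != sorted_keys[i - 1];
        if (new_peer_group) {
            rank = i + 1;
            ++dense_rank;
        }
        std::size_t value = i + 1;  // row_number()
        if (type == RankType::Rank) value = rank;
        if (type == RankType::DenseRank) value = dense_rank;
        if (value > limit) {
            return i;  // ranking values never decrease, so later rows fail too
        }
    }
    return sorted_keys.size();
}
```

For the keys 10, 10, 20, 20, 30 and a limit of 3, row_number keeps 3 rows, rank keeps 4 (ranks 1, 1, 3, 3), and dense_rank keeps all 5. A TopN that limits by rank or dense_rank therefore cannot simply stop after a fixed row count.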

@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_local_partition_topn branch 2 times, most recently from a76b8f4 to 6d3ff7c Compare May 16, 2022 09:26
@wanpengfei-git
Collaborator

[FE PR Coverage check]

😍 pass : 0 / 0 (0%)

@liuyehcf liuyehcf force-pushed the analytic_rank_optimization_local_partition_topn branch from 6d3ff7c to cc67b12 Compare May 16, 2022 09:43
@liuyehcf liuyehcf changed the title [Enhancement][WIP] window function optimization, add partition topn (2) [Enhancement] window function optimization, add partition topn (2) May 17, 2022
}
#define HASH_MAP_METHOD(NAME) \
else if (_hash_map_variant.type == PartitionHashMapVariant::Type::NAME) { \
TRY_CATCH_BAD_ALLOC(fetch_chunks_from_null_key_value<typename decltype(_hash_map_variant.NAME)::element_type>( \
Contributor


There is a cost to moving and comparing items in a chunk. With NullFirst, when there are many Null values, placing them in front may avoid moving some items of the Column in the TopN stage.

Contributor


Can be optimized later

@trueeyu trueeyu merged commit 1fe7f1b into StarRocks:main May 18, 2022
@liuyehcf liuyehcf changed the title [Enhancement] window function optimization, add partition topn (2) [Enhancement] rank window function optimization, add partition topn (2) May 19, 2022
jaogoy pushed a commit to jaogoy/starrocks that referenced this pull request Nov 15, 2023
* Add doc for covar/corr etc funtion
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
---------

Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Co-authored-by: evelyn.zhaojie <everlyn.zhaojie@gmail.com>
Development

Successfully merging this pull request may close these issues.

[Enhancement] Rank window function optimization
4 participants