[Enhancement] rank window function optimization, add partition topn (2) #6118
[FE PR Coverage check] 😍 pass: 0 / 0 (0%)
}
#define HASH_MAP_METHOD(NAME) \
    else if (_hash_map_variant.type == PartitionHashMapVariant::Type::NAME) { \
        TRY_CATCH_BAD_ALLOC(fetch_chunks_from_null_key_value<typename decltype(_hash_map_variant.NAME)::element_type>( \
There is a cost to moving and comparing items in a chunk. With NullFirst, when there are many Null values, placing them at the front may avoid moving some of the Column's items in the TopN stage.
Can be optimized later
What type of PR is this: Enhancement
Which issues does this PR fix: Fixes #5885
For rank window functions, including rank, dense_rank, and row_number, if the query has a related predicate (for example rk < 5), it can be optimized by inserting a PartitionTopN operator before the Sort operator of the window function.

The main purpose of PartitionTopN is to filter data, and its output still remains unordered. It consists of three components:
- partitioner: divides the input chunk based on the partition exprs
- sorter(topn): each partition has its own sorter instance and is sorted independently
- gather: fetches chunks from all sorters into one data stream, so the data is still unordered after gathering. Moreover, gather is only a logical concept, not an actual component

The implementation on the optimizer and the executor is a little different:
- Optimizer: no separate {Logical/Physical}PartitionTopNOperator is added; the existing {Logical/Physical}TopNOperator is reused, with a new field partitionByExprs to record the partition-by information. Besides, we need to pay attention to the following things:
- Executor: LocalPartitionTopN{Sink/Source}Operator, and we may use a different implementation based on the field partitionByExprs
  - If partitionByExprs is unset or empty, the original pair of PartitionSortSinkOperator/LocalMergeSortSourceOperator is used
  - If partitionByExprs is not empty, the pair of LocalPartitionTopN{Sink/Source}Operator is used

Tasks
- row_number
- rank
- dense_rank
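The three PartitionTopN components described above (partitioner, per-partition sorter, gather) can be sketched roughly as follows. This is only an illustration and not the code in this PR: the Row struct, the ByOrderValue comparator, and the partition_topn function are hypothetical names, a single string key stands in for the partition exprs, and only row_number-style limits are handled (rank and dense_rank would additionally need to keep rows that tie with the last kept row).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row: one partition-by key and one order-by value.
struct Row {
    std::string partition_key;
    int order_value;
};

// Orders the heap so that the largest order_value sits on top.
struct ByOrderValue {
    bool operator()(const Row& a, const Row& b) const {
        return a.order_value < b.order_value;
    }
};

// Keeps at most `limit` rows per partition (smallest order_value first).
// The result is a filter only: rows come back in no particular order.
std::vector<Row> partition_topn(const std::vector<Row>& input, std::size_t limit) {
    using Heap = std::priority_queue<Row, std::vector<Row>, ByOrderValue>;

    // partitioner: route every row to the sorter of its partition.
    std::unordered_map<std::string, Heap> sorters;
    for (const Row& row : input) {
        Heap& heap = sorters[row.partition_key];

        // sorter(topn): a bounded max-heap holding the `limit` smallest rows.
        if (heap.size() < limit) {
            heap.push(row);
        } else if (limit > 0 && row.order_value < heap.top().order_value) {
            heap.pop();      // evict the current largest kept row
            heap.push(row);
        }
        assert(heap.size() <= limit);
    }

    // gather: concatenate the surviving rows of all sorters into one stream.
    std::vector<Row> output;
    for (auto& entry : sorters) {
        Heap& heap = entry.second;
        while (!heap.empty()) {
            output.push_back(heap.top());
            heap.pop();
        }
    }
    return output;
}
```

Because the gathered rows are unordered, a downstream sort is still required before the window function is evaluated, which matches the placement of PartitionTopN before the Sort operator described above.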