Apply expr predicates on zone map filtering #912

Merged
merged 1 commit into from
Oct 29, 2021

dirtysalt

ref: #803

dirtysalt commented Oct 28, 2021

Running a performance benchmark on the following SQL:

-- output string column, predicate on sort key
-- Q01
select max(length(s_name)) from lineorder_flat where (lo_orderkey * 2) > 1000000000;
-- Q02
select max(length(s_name)) from lineorder_flat where (lo_orderkey * 2) > 10000000;
-- Q03
select max(length(s_name)) from lineorder_flat where (lo_orderkey * 2) > 100000;

-- output int column, predicate on sort key
-- Q04
select max(lo_ordtotalprice) from lineorder_flat where (lo_orderkey * 2) > 1000000000;
-- Q05
select max(lo_ordtotalprice) from lineorder_flat where (lo_orderkey * 2) > 10000000;
-- Q06
select max(lo_ordtotalprice) from lineorder_flat where (lo_orderkey * 2) > 100000;

-- output string column, predicate on non-sort key
-- Q07
select max(length(s_name)) from lineorder_flat where (lo_partkey * 2) > 1000000;
-- Q08
select max(length(s_name)) from lineorder_flat where (lo_partkey * 2) > 10000;
-- Q09
select max(length(s_name)) from lineorder_flat where (lo_partkey * 2) > 100;

-- output int column, predicate on non-sort key
-- Q10
select max(lo_ordtotalprice) from lineorder_flat where (lo_partkey * 2) > 1000000;
-- Q11
select max(lo_ordtotalprice) from lineorder_flat where (lo_partkey * 2) > 10000;
-- Q12
select max(lo_ordtotalprice) from lineorder_flat where (lo_partkey * 2) > 100;

This PR has almost no negative impact on performance: only Q07 (1.056) and Q12 (1.008) are slightly slower than master, and every other query is faster.

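| Query | Master | PR | PR/Master |
|-------|--------|------|-----------|
| Q01 | 1078 | 409 | 0.379 |
| Q02 | 2915 | 2698 | 0.926 |
| Q03 | 2882 | 2625 | 0.911 |
| Q04 | 315 | 149 | 0.473 |
| Q05 | 1092 | 824 | 0.755 |
| Q06 | 1121 | 1030 | 0.919 |
| Q07 | 1762 | 1860 | 1.056 |
| Q08 | 2778 | 2650 | 0.954 |
| Q09 | 2951 | 2654 | 0.899 |
| Q10 | 919 | 732 | 0.797 |
| Q11 | 1100 | 1029 | 0.935 |
| Q12 | 947 | 955 | 1.008 |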
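
For anyone who wants to re-check the PR/Master column, here is a small sketch that recomputes the ratios from the Master and PR timings above. The timings are copied from the table (the PR does not state their unit); the names `timings`, `ratios`, and `slower` are only for illustration.

```python
# Timings copied from the results table as (master, pr); unit not stated in the PR.
timings = {
    "Q01": (1078, 409), "Q02": (2915, 2698), "Q03": (2882, 2625),
    "Q04": (315, 149), "Q05": (1092, 824), "Q06": (1121, 1030),
    "Q07": (1762, 1860), "Q08": (2778, 2650), "Q09": (2951, 2654),
    "Q10": (919, 732), "Q11": (1100, 1029), "Q12": (947, 955),
}

# A ratio below 1.0 means the PR build was faster than master for that query.
ratios = {q: round(pr / master, 3) for q, (master, pr) in timings.items()}

# Queries where the PR build was slower than master.
slower = sorted(q for q, r in ratios.items() if r > 1.0)

for q in sorted(ratios):
    print(q, ratios[q])
print("slower than master:", slower)
```

The largest gains are on the most selective sort-key predicates (Q01 and Q04), which is where zone map filtering can skip the most data.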

Looking into the profile, we can see that a lot of rows (or files) have been filtered out by the zone map. This table has 600M rows in total.

  • ZoneMapIndexFilterRows: 59.850752M
  • PredFilterRows: 1.552054M (1552054)
  • RawRowsRead: 101.562894M (101562894)
  • Some segment files have been skipped without reading actual data
SCAN:(Active: 3s207ms[3207096049ns], % non-child: 100.00%)
                 - CachedPagesNum: 16.482K (16482)
                 - CompressedBytesRead: 636.30 MB
                 - CreateSegmentIter: 10.747ms
                 - IOTime: 218.583ms
                 - LateMaterialize: 50.360ms
                 - PushdownPredicates: 1
                 - RawRowsRead: 101.562894M (101562894)
                 - SegmentInit: 132.827ms
                   - BitmapIndexFilter: 0ns
                   - BitmapIndexFilterRows: 0
                   - BloomFilterFilterRows: 0
                   - ShortKeyFilterRows: 0
                   - ZoneMapIndexFilterRows: 59.850752M (59850752)
                 - SegmentRead: 2s999ms
                   - BlockFetch: 2s749ms
                   - BlockFetchCount: 25.001K (25001)
                   - BlockSeek: 13.775ms
                   - BlockSeekCount: 192
                   - ChunkCopy: 1.145ms
                   - DecompressT: 857.7ms
                   - DelVecFilterRows: 0
                   - IndexLoad: 0ns
                   - PredFilter: 212.947ms
                   - PredFilterRows: 1.552054M (1552054)
                 - TotalPagesNum: 40.089K (40089)
                 - UncompressedBytesRead: 1.20 GB
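
To illustrate the idea behind `ZoneMapIndexFilterRows`, here is a minimal sketch of pruning zones with an expression predicate such as `(lo_orderkey * 2) > 10000000`. It is not the code in this PR. It assumes the predicate is a threshold comparison on an expression that is monotonic in the column, so that checking the zone's min and max is enough to decide whether any row could match. `Zone`, `zone_may_match`, and `prune` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Zone:
    """Hypothetical zone map entry: min/max of one column over a run of rows."""
    min_value: int
    max_value: int
    num_rows: int


def zone_may_match(zone: Zone, predicate: Callable[[int], bool]) -> bool:
    # Assumption: the predicate is a threshold comparison on an expression
    # that is monotonic in the column. Then, if any value in [min, max]
    # satisfies it, one of the two endpoints does as well.
    return predicate(zone.min_value) or predicate(zone.max_value)


def prune(zones: List[Zone], predicate: Callable[[int], bool]) -> Tuple[List[Zone], int]:
    """Return the zones that still need to be read and the number of rows skipped."""
    kept = [z for z in zones if zone_may_match(z, predicate)]
    filtered_rows = sum(z.num_rows for z in zones) - sum(z.num_rows for z in kept)
    return kept, filtered_rows


# Made-up zones on a sorted key column, 100 rows each.
zones = [
    Zone(1, 100, 100),
    Zone(101, 5_000_000, 100),
    Zone(5_000_001, 9_000_000, 100),
]

# Same shape as the Q02 predicate: (lo_orderkey * 2) > 10000000
kept, filtered_rows = prune(zones, lambda key: key * 2 > 10_000_000)
print(len(kept), filtered_rows)
```

Rows in the kept zones still go through the normal predicate evaluation, which corresponds to the separate `PredFilterRows` counter in the profile. On a sort key the zones have narrow, non-overlapping ranges, so a selective predicate can discard most of them; on a non-sort key the ranges tend to overlap and fewer zones can be skipped.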

dirtysalt removed the request for review from DeepThinker666 on October 29, 2021 02:34
dirtysalt

related to: #870

Seaven merged commit 92661d5 into StarRocks:main on Oct 29, 2021
dirtysalt deleted the expr-pred-zonemap2 branch on October 29, 2021 02:44
caneGuy pushed a commit to caneGuy/starrocks that referenced this pull request Mar 28, 2023
* Update DELETE and loading into Primary Key tables

* Update PrimaryKeyLoad.md