Introduce resultSize in IndexedTable #7420

Jackie-Jiang · 2021-09-09T22:35:55Z

Introduce resultSize (number of records kept after calling finish()) in IndexedTable which can be different from the trimSize.
On the broker side, we only need to keep limit records in the final result because the groups are already fully merged. This can significantly reduce the cost of final record sorting.
Also separate the logic for queries with/without order-by and move the common methods into the IndexedTable to reduce the duplicate code.

codecov-commenter · 2021-09-09T23:28:10Z

Codecov Report

Merging #7420 (ced0dad) into master (5976922) will decrease coverage by 1.26%.
The diff coverage is 85.00%.

@@             Coverage Diff              @@
##             master    #7420      +/-   ##
============================================
- Coverage     71.87%   70.61%   -1.27%     
  Complexity     3347     3347              
============================================
  Files          1517     1517              
  Lines         75039    75040       +1     
  Branches      10921    10930       +9     
============================================
- Hits          53932    52987     -945     
- Misses        17482    18430     +948     
+ Partials       3625     3623       -2

Flag	Coverage Δ
integration1	`?`
integration2	`29.09% <76.66%> (+<0.01%)`	⬆️
unittests1	`69.69% <85.00%> (-0.02%)`	⬇️
unittests2	`14.53% <0.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
.../java/org/apache/pinot/core/util/GroupByUtils.java	`100.00% <ø> (ø)`
.../pinot/core/data/table/ConcurrentIndexedTable.java	`78.26% <73.68%> (-3.56%)`	⬇️
...not/core/query/reduce/GroupByDataTableReducer.java	`89.36% <83.33%> (+0.07%)`	⬆️
...ache/pinot/core/data/table/SimpleIndexedTable.java	`80.00% <85.71%> (-12.31%)`	⬇️
...org/apache/pinot/core/data/table/IndexedTable.java	`84.74% <91.66%> (+5.25%)`	⬆️
...re/data/table/UnboundedConcurrentIndexedTable.java	`100.00% <100.00%> (+73.91%)`	⬆️
...perator/combine/GroupByOrderByCombineOperator.java	`82.71% <100.00%> (ø)`
...pinot/minion/exception/TaskCancelledException.java	`0.00% <0.00%> (-100.00%)`	⬇️
...nverttorawindex/ConvertToRawIndexTaskExecutor.java	`0.00% <0.00%> (-100.00%)`	⬇️
...ore/startree/executor/StarTreeGroupByExecutor.java	`0.00% <0.00%> (-86.67%)`	⬇️
... and 101 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5976922...ced0dad. Read the comment docs.

richardstartin · 2021-09-10T08:54:36Z

This can significantly reduce the cost of final record sorting.

How is this evaluated? Is it repeatable?

Jackie-Jiang · 2021-09-10T17:45:57Z

@richardstartin This is repeatable for all group-by order-by queries without having clause (having clause has other issues, and I just put a TODO and keep the current behavior). For example, currently when we do SELECT ... GROUP BY ... ORDER BY ... LIMIT 10, on the broker side we will get and sort the top 5000 (trim size) records using heap sort. With the change, we only sort on the top 10 records with the same algorithm, and smaller heap is guaranteed to give better performance.

jackjlli · 2021-09-10T20:56:48Z

pinot-core/src/main/java/org/apache/pinot/core/data/table/IndexedTable.java

      _hasOrderBy = false;
      _tableResizer = null;
-      _trimSize = trimSize;
-      _trimThreshold = trimSize;
+      _trimSize = Integer.MAX_VALUE;


For group by without order by cases, why not use the value from limit clause as the trimSize?

There is no trim required for cases without order-by, so trim size and trim threshold doesn't really matter

Jackie-Jiang requested a review from xiangfu0 September 9, 2021 22:35

Introduce resultSize in IndexedTable

ced0dad

Jackie-Jiang force-pushed the optimize_indexed_table branch from b504c02 to ced0dad Compare September 9, 2021 23:47

jackjlli reviewed Sep 10, 2021

View reviewed changes

xiangfu0 approved these changes Sep 13, 2021

View reviewed changes

Jackie-Jiang merged commit f3068bc into apache:master Sep 13, 2021

Jackie-Jiang deleted the optimize_indexed_table branch September 13, 2021 23:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce resultSize in IndexedTable #7420

Introduce resultSize in IndexedTable #7420

Jackie-Jiang commented Sep 9, 2021

codecov-commenter commented Sep 9, 2021 •

edited

Loading

richardstartin commented Sep 10, 2021

Jackie-Jiang commented Sep 10, 2021

jackjlli Sep 10, 2021

Jackie-Jiang Sep 10, 2021

Introduce resultSize in IndexedTable #7420

Introduce resultSize in IndexedTable #7420

Conversation

Jackie-Jiang commented Sep 9, 2021

codecov-commenter commented Sep 9, 2021 • edited Loading

Codecov Report

richardstartin commented Sep 10, 2021

Jackie-Jiang commented Sep 10, 2021

jackjlli Sep 10, 2021

Choose a reason for hiding this comment

Jackie-Jiang Sep 10, 2021

Choose a reason for hiding this comment

codecov-commenter commented Sep 9, 2021 •

edited

Loading