Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY#8927
Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY#8927Jackie-Jiang merged 25 commits intoapache:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8927 +/- ##
============================================
- Coverage 70.13% 63.68% -6.45%
+ Complexity 4741 4719 -22
============================================
Files 1831 1784 -47
Lines 96382 94815 -1567
Branches 14408 14371 -37
============================================
- Hits 67594 60381 -7213
- Misses 24125 30101 +5976
+ Partials 4663 4333 -330
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Jackie-Jiang
left a comment
There was a problem hiding this comment.
In high level, we should not decide whether to enable query null handling based on the whether there is null value vector. This flag should be passed from the broker as a query option, and we can introduce another table config to enable it. This can ensure all the segments are executed using the same mode.
pinot-core/src/main/java/org/apache/pinot/core/common/RowBasedBlockValueFetcher.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/combine/GroupByOrderByCombineOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/SelectionOrderByOperator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/core/operator/streaming/StreamingSelectionOnlyCombineOperator.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/TransformOperator.java
Outdated
Show resolved
Hide resolved
.../org/apache/pinot/core/operator/combine/MinMaxValueBasedSelectionOrderByCombineOperator.java
Outdated
Show resolved
Hide resolved
| _dataBlockCache = dataBlockCache; | ||
| _column = column; | ||
| _dataSource = dataSource; | ||
| NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector(); |
There was a problem hiding this comment.
Data source should never be null. The computation should be lazy, and the operator should decide whether to read it based on if null handling is enabled
There was a problem hiding this comment.
Updated to do computation lazily, and removed null check.
There was a problem hiding this comment.
Moved the logic to getNullBitmap() method. Please note that this method is called only when nullHandlingEnabled is set to true.
Jackie-Jiang
left a comment
There was a problem hiding this comment.
There are plenty of comments, but the PR is in good shape overall. Good job!
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
Show resolved
Hide resolved
...a/org/apache/pinot/core/query/distinct/raw/RawStringSingleColumnDistinctOrderByExecutor.java
Outdated
Show resolved
Hide resolved
...java/org/apache/pinot/core/query/distinct/raw/RawStringSingleColumnDistinctOnlyExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Mostly good.
cc @siddharthteotia to also take a look since this change might have performance impact
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpecUtils.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
Outdated
Show resolved
Hide resolved
...g/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
...g/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
...g/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorService.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java
Outdated
Show resolved
Hide resolved
|
Related to #8697 |
| for (Object[] row : rows) { | ||
| for (int i = 0; i < numColumns; i++) { | ||
| Object columnValue = row[i]; | ||
| if (columnValue == null) { | ||
| row[i] = colDefaultNullValues[i]; | ||
| nullBitmaps[i].add(rowId); | ||
| } | ||
| } | ||
| rowId++; |
There was a problem hiding this comment.
can we merge this part with the main loop below? any specific reason we had to loop through the rows one more time?
Proper NULL handling in:
For Single-valued columns of all data types with:
Unit tests:
Updated
BigDecimalQueriesTestto work with null values.Added
BooleanNullEnabledNoDictQueriesTest.Added AllNullQueriesTest where all values are null.
Added NullEnabledQueriesTest where some values are nulls.