Keeps nullness attributes of merged in comparison column values#10704
Keeps nullness attributes of merged in comparison column values#10704Jackie-Jiang merged 8 commits intoapache:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10704 +/- ##
============================================
+ Coverage 70.25% 70.28% +0.03%
+ Complexity 6429 5681 -748
============================================
Files 2112 2164 +52
Lines 113996 116347 +2351
Branches 17219 17597 +378
============================================
+ Hits 80084 81776 +1692
- Misses 28312 28857 +545
- Partials 5600 5714 +114
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 286 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
I feel this is incorrect.
When loading sealed segment, ComparisonColumns might have multiple values set, in which case we need to compare multiple values to determine which value is newer.
So the fix should be: when comparisonIndex is -1, loop over all values and check if it has any value larger than the value in another ComparisonColumns. There is no need to modify the value within the current ComparisonColumns because it should already have values set.
|
|
||
| public static class MultiComparisonColumnReader implements UpsertUtils.ComparisonColumnReader { | ||
| private final PinotSegmentColumnReader[] _comparisonColumnReaders; | ||
| private final NullValueVectorReader[] _comparisonColumnNullReaders; |
There was a problem hiding this comment.
This is not needed. You may use PinotSegmentColumnReader.isNull() instead
One case I'm struggling to figure out how to support is when altering from one set of comparison columns to a new set. In such a case, I believe there would be a need to modify the value within the current ComparisonColumns (ex. adding 1 new column in addition to a set of existing comparison columns, or migrating from a single comparison column to a set of columns). |
Are you referring to modifying the comparison columns after table is created? I don't think that is allowed even with single comparison column with partial upsert because we should not change the history of the records. |
|
Ya that makes sense, otherwise it would literally be changing the semantics of which docs should have been persisted in the first place. I do want to allow for support of adding a brand new column to the schema + comparisonColumns list though, which I agree should work. When reloading sealed segments, is processing done in global order of |
… comparison columns enabled
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ComparisonColumns.java
Show resolved
Hide resolved
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/UpsertUtils.java
Show resolved
Hide resolved
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ComparisonColumns.java
Outdated
Show resolved
Hide resolved
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ComparisonColumns.java
Outdated
Show resolved
Hide resolved
|
Sealed segments are loaded in parallel without any global order (we don't really have global ordered doc ids) |
Keeps nullness attributes of merged comparison column values to avoid
IndexOutOfBoundsexception on segment read.