Preserve non-null comparison column values during segment commit #11027
KKcorps wants to merge 1 commit into apache:master
Codecov Report
@@ Coverage Diff @@
## master #11027 +/- ##
==========================================
- Coverage 0.11% 0.11% -0.01%
==========================================
Files 2197 2199 +2
Lines 118596 118744 +148
Branches 17980 18017 +37
==========================================
Hits 137 137
- Misses 118439 118587 +148
Partials 20 20
... and 5 files with indirect coverage changes
The motivation behind wanting to keep nullness encoded is being able to easily discern between a defaultValue representation of null vs. an actual null, so that comparing two defaultValues would not result in a comparison result of

I don't feel that this is a required "guard" anymore, though; it may have been needed at one point, but the various algorithms for multiple comparison column upsert have changed a lot throughout implementation. We guard against any newly ingested record having all null comparison columns [1], so by the time we reach

All that said, if we don't encode nullness then we lose the ability to perform valuable queries like "show me all records that have null

[1] https://github.com/egalpin/pinot/blob/68fdfa4a12926c7ce45dadc6a86d973ce5ff3669/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L597 (this method has moved, I think as of #10703)
Picked the solution in #11044
This PR completes the bug fix introduced with #10704.
With the fix introduced previously, although we merge the comparison column values, we still mark the original columns as null in the bitmap so that it can help during restart.
However, during the segment build, we simply filter out all the columns which have null set in the bitmap and thus we lose all these values.
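The failure mode described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Pinot's actual code: the class `SegmentBuildSketch`, the `Row` holder, the methods `buildDroppingNulls` and `buildPreservingComparisonValues`, and the use of a `Set<String>` in place of the real null-value bitmap are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SegmentBuildSketch {
    // Hypothetical stand-in for a row: column values plus the columns flagged null in the bitmap.
    static final class Row {
        final Map<String, Object> values;
        final Set<String> nullColumns;

        Row(Map<String, Object> values, Set<String> nullColumns) {
            this.values = values;
            this.nullColumns = nullColumns;
        }
    }

    // Mirrors the reported behavior: every column flagged null is filtered out at build time,
    // even when it carries a merged, non-null comparison value.
    static Map<String, Object> buildDroppingNulls(Row row) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : row.values.entrySet()) {
            if (!row.nullColumns.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // One possible remedy (an assumption, not the PR's exact change): keep a comparison
    // column's value whenever it is non-null, regardless of the null flag.
    static Map<String, Object> buildPreservingComparisonValues(Row row, List<String> comparisonColumns) {
        Map<String, Object> out = buildDroppingNulls(row);
        for (String column : comparisonColumns) {
            Object value = row.values.get(column);
            if (value != null) {
                out.put(column, value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Second record after merging: mtime is its own value, mtime_2 was merged in
        // from the earlier record but is still flagged null.
        Map<String, Object> values = new HashMap<>();
        values.put("event_id", "E12345");
        values.put("mtime", 1687341314100L);
        values.put("mtime_2", 1687341314322L);
        Row row = new Row(values, Set.of("mtime_2"));

        System.out.println(buildDroppingNulls(row).get("mtime_2")); // prints null
        System.out.println(
            buildPreservingComparisonValues(row, List.of("mtime", "mtime_2")).get("mtime_2")); // prints 1687341314322
    }
}
```

The sketch keeps the null flag itself untouched, so the information that the column was originally null is still available; only the value filtering at build time differs between the two methods.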
Reproducing the bug
Create a partial upsert table with two comparison columns, mtime and mtime_2.
Publish the following two records to the table:
{ "rsvp_count": 23, "venue_name": "Venue A", "event_id": "E12345", "event_time": 1688372645422, "group_city": "San Francisco", "group_country": "USA", "group_id": 1234567890, "group_name": "OpenAI enthusiasts", "group_lat": 37.7749, "group_lon": -122.4194, "mtime_2": 1687341314322 }
{ "rsvp_count": 23, "venue_name": "Venue A", "event_id": "E12345", "event_time": 1688372645422, "group_city": "San Francisco", "group_country": "USA", "group_id": 1234567890, "group_name": "OpenAI enthusiasts", "group_lat": 37.7749, "group_lon": -122.4194, "mtime": 1687341314100 }
Verify that the records show up correctly in the table, then do either a reload or a force commit.
You will see that no records show up in the table now. If you use skipUpsert(true), though, you will be able to see everything.