Handle segments outside retention time gracefully during creation#9403
Handle segments outside retention time gracefully during creation#9403KKcorps wants to merge 2 commits intoapache:masterfrom
Conversation
|
For now, only moved the code from previous PR (#9355) .
The new flag |
...in/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
Outdated
Show resolved
Hide resolved
d34254f to
2c2bcce
Compare
Codecov Report
@@ Coverage Diff @@
## master #9403 +/- ##
=============================================
- Coverage 69.77% 36.97% -32.80%
+ Complexity 5092 194 -4898
=============================================
Files 1890 1889 -1
Lines 100654 100870 +216
Branches 15327 15330 +3
=============================================
- Hits 70231 37301 -32930
- Misses 25445 60442 +34997
+ Partials 4978 3127 -1851
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
I feel this won't work properly for the following reasons:
- The start/end time doesn't match the actual data, so the pruning logic will prune the segment unexpected
- User might mistakenly push an old segment, and we don't want to keep the segment around
I feel the purpose for this PR is to prevent user putting wrong time format/unit in the schema. If we detect that the time value does not match the time format/unit, or the converted timestamp is not in valid range, we can put start/end time as null, and also store the raw min/max time in the ZK metadata. Essentially the idea in #9413
|
Yeah, I will go ahead with the idea of putting nulls in the metadata. This logic was created before we decided on |
When user chooses to skip the time value check in table config, we need to gracefully handle the
segment startTime and endTime.
If the startTime and endTime for the segment are simply not within the valid time range, we set them to
current timestamps.
If they are within the valid time range, we check if the endTime falls outside retention time range.
In case it falls outside the time range, we replace endTime with currentTime so that the segments don't
get instantly deleted by retention manager after getting uploaded.