Add minMaxInvalid flag to avoid unnecessary needPreprocess #9238
Merged
npawar merged 2 commits into apache:master on Aug 31, 2022
Codecov Report
@@ Coverage Diff @@
## master #9238 +/- ##
=============================================
+ Coverage 28.60% 67.10% +38.49%
- Complexity 53 4825 +4772
=============================================
Files 1844 1385 -459
Lines 98570 72251 -26319
Branches 15004 11578 -3426
=============================================
+ Hits 28200 48483 +20283
+ Misses 67667 20239 -47428
- Partials 2703 3529 +826
Flags with carried forward coverage won't be shown.
klsince approved these changes on Aug 28, 2022
jackjlli approved these changes on Aug 31, 2022
jackjlli (Member) left a comment:
LGTM. Thanks for fixing this!
Segment build: If the min/max value is invalid (based on definitions we have set, such as the presence of `,` or an empty value), we skip writing the min/max value to the segment's `metadata.properties`.

Segment load: If either min or max is not available in the segment's `metadata.properties`, the `ColumnMetadata` for that column returns null for both `getMin` and `getMax`. If both values are null but `ColumnMinMaxGeneratorMode != NONE`, we mark the segment as "needPreprocess" and process it during segment load (either when it is ingested or during server restart). As part of that preprocessing, however, we again skip persisting the min/max value, because it is still invalid, and it will always be invalid for a segment with the same CRC.

As a result, in this combination (an invalid min or max value), we always incur the preprocessing cost during restart/load, even though the actual preprocessing is a no-op. The cost may be small in Pinot with `LocalSegmentFSDirectory`, but it can become very expensive in other implementations, where initializing the `SegmentDirectory` may involve more expensive operations. So we want to avoid entering the `else` branch of this code when it is unnecessary. From `BaseTableDataManager`:

This PR adds a flag in `metadata.properties` to detect the invalid min/max case, so that we can avoid returning `needPreprocess=true` when preprocessing would be a no-op.
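To make the build/load interaction concrete, here is a minimal Java sketch of the flag-based check. It is not Pinot's implementation: the class, the methods, the property keys, the two-value generator-mode enum, and the validity rule (empty or containing `,`) are hypothetical simplifications modeled on the description above, with a plain `java.util.Properties` standing in for `metadata.properties`.

```java
import java.util.Properties;

// Hypothetical sketch of the minMaxInvalid idea; none of these names are Pinot's actual API.
public class MinMaxInvalidSketch {

  // Hypothetical, simplified stand-in for ColumnMinMaxGeneratorMode.
  enum MinMaxGeneratorMode { NONE, ALL }

  // Hypothetical property keys; the real metadata.properties keys may differ.
  static String minKey(String col) { return "column." + col + ".minValue"; }
  static String maxKey(String col) { return "column." + col + ".maxValue"; }
  static String invalidKey(String col) { return "column." + col + ".minMaxValueInvalid"; }

  // Assumed validity rule, following the examples in the description: empty or containing ','.
  static boolean isValidPropertyValue(String value) {
    return !value.isEmpty() && value.indexOf(',') == -1;
  }

  // Segment build: persist each valid bound, and record the flag if either bound was skipped.
  static void writeMinMax(Properties props, String col, String min, String max) {
    boolean minValid = isValidPropertyValue(min);
    boolean maxValid = isValidPropertyValue(max);
    if (minValid) {
      props.setProperty(minKey(col), min);
    }
    if (maxValid) {
      props.setProperty(maxKey(col), max);
    }
    if (!minValid || !maxValid) {
      props.setProperty(invalidKey(col), "true");
    }
  }

  // Mirrors ColumnMetadata treating min and max as null when either one is absent.
  static boolean minMaxMissing(Properties props, String col) {
    return props.getProperty(minKey(col)) == null || props.getProperty(maxKey(col)) == null;
  }

  // Behavior before the fix: any missing min/max with mode != NONE triggers preprocessing.
  static boolean needPreprocessWithoutFlag(Properties props, String col, MinMaxGeneratorMode mode) {
    return mode != MinMaxGeneratorMode.NONE && minMaxMissing(props, col);
  }

  // Behavior with the flag: skip preprocessing when min/max are already known to be invalid,
  // since regenerating them would yield the same invalid values and persist nothing.
  static boolean needPreprocess(Properties props, String col, MinMaxGeneratorMode mode) {
    // Boolean.parseBoolean(null) is false, so segments without the flag keep the old behavior.
    boolean knownInvalid = Boolean.parseBoolean(props.getProperty(invalidKey(col)));
    return needPreprocessWithoutFlag(props, col, mode) && !knownInvalid;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    writeMinMax(props, "city", "Austin", "York"); // both bounds valid
    writeMinMax(props, "tags", "a,b", "z");       // min contains ',', so it is not persisted

    // "legacy" has neither min/max nor the flag, e.g. min/max were never generated for it.
    for (String col : new String[] {"city", "tags", "legacy"}) {
      System.out.println(col + ": before=" + needPreprocessWithoutFlag(props, col, MinMaxGeneratorMode.ALL)
          + ", after=" + needPreprocess(props, col, MinMaxGeneratorMode.ALL));
    }
  }
}
```

Running `main` prints `city: before=false, after=false`, `tags: before=true, after=false`, and `legacy: before=true, after=true`. Only the column whose min/max is already known to be invalid stops triggering preprocessing; a column that simply lacks min/max still gets it generated on load.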