Skip to content

Handle segments outside retention time gracefully during creation#9403

Closed
KKcorps wants to merge 2 commits intoapache:masterfrom
KKcorps:segment_time_validation
Closed

Handle segments outside retention time gracefully during creation#9403
KKcorps wants to merge 2 commits intoapache:masterfrom
KKcorps:segment_time_validation

Conversation

@KKcorps
Copy link
Contributor

@KKcorps KKcorps commented Sep 15, 2022

When user chooses to skip the time value check in table config, we need to gracefully handle the
segment startTime and endTime.
If the startTime and endTime for the segment are simply not within the valid time range, we set them to
current timestamps.
If they are within the valid time range, we check if the endTime falls outside retention time range.
In case it falls outside the time range, we replace endTime with currentTime so that the segments don't
get instantly deleted by retention manager after getting uploaded.

@KKcorps
Copy link
Contributor Author

KKcorps commented Sep 15, 2022

For now, only moved the code from previous PR (#9355) .
Last comments by @Jackie-Jiang

We don't want to change the start/end time within the segment metadata, instead we want to enhance the time management code to gracefully handle the segments with invalid time values.
Suggest using a separate PR for the time management change, and only keep the sample value check in this PR.

The new flag skipSegmentTimeCheck: true allows ingestion to not fail because of timestamp issues. As far as updating the timestamps in metadata is concerned, we need to find a way for users to specify the correct time-formats. The issue is we don't allow backward-incompatible change in the schema so if an user fixes the segment metadata timestamp via some new API, we won't be able to reflect those changes in the schema. This inconsistency may break a lot of things.

@KKcorps KKcorps force-pushed the segment_time_validation branch from d34254f to 2c2bcce Compare September 20, 2022 07:37
@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2022

Codecov Report

Merging #9403 (2c2bcce) into master (61fc919) will decrease coverage by 32.79%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master    #9403       +/-   ##
=============================================
- Coverage     69.77%   36.97%   -32.80%     
+ Complexity     5092      194     -4898     
=============================================
  Files          1890     1889        -1     
  Lines        100654   100870      +216     
  Branches      15327    15330        +3     
=============================================
- Hits          70231    37301    -32930     
- Misses        25445    60442    +34997     
+ Partials       4978     3127     -1851     
Flag Coverage Δ
integration1 25.99% <0.00%> (-0.09%) ⬇️
integration2 24.74% <0.00%> (-0.05%) ⬇️
unittests1 ?
unittests2 15.35% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ment/creator/impl/SegmentColumnarIndexCreator.java 0.00% <0.00%> (-78.94%) ⬇️
...in/java/org/apache/pinot/spi/utils/BytesUtils.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/spi/utils/BooleanUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/user/RoleType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/data/MetricFieldSpec.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/common/tier/TierFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/spi/config/table/TableType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1024 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this won't work properly for the following reasons:

  1. The start/end time doesn't match the actual data, so the pruning logic will prune the segment unexpected
  2. User might mistakenly push an old segment, and we don't want to keep the segment around

I feel the purpose for this PR is to prevent user putting wrong time format/unit in the schema. If we detect that the time value does not match the time format/unit, or the converted timestamp is not in valid range, we can put start/end time as null, and also store the raw min/max time in the ZK metadata. Essentially the idea in #9413

@KKcorps
Copy link
Contributor Author

KKcorps commented Sep 22, 2022

Yeah, I will go ahead with the idea of putting nulls in the metadata. This logic was created before we decided on fixAPI. Closing this PR.

@KKcorps KKcorps closed this Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants