Retry on segment cannot build (#15234)
Conversation
Codecov Report

```
@@             Coverage Diff              @@
##             master   #15234      +/-   ##
============================================
+ Coverage     61.75%   63.65%     +1.90%
- Complexity      207     1459      +1252
============================================
  Files          2436     2782       +346
  Lines        133233   156961     +23728
  Branches      20636    24078      +3442
============================================
+ Hits          82274    99910     +17636
- Misses        44911    49534      +4623
- Partials      6048      7517      +1469
```
Resolved review comments on ...re/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java
```java
    prevSegmentZKMetadata.getStatus());

int prevTargetNumRows = prevSegmentZKMetadata.getSizeThresholdToFlushSegment();
int newNumRows = Math.min(prevNumRows / 2, prevTargetNumRows / 2);
```
Is `prevNumRows` strictly smaller than `prevTargetNumRows`?
Not always. The end-criteria check runs between consuming each "batch" of messages, so it is possible that after consuming a batch, `numRowsIndexed` goes slightly over the threshold.
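A minimal sketch of the halving logic from the diff above (the class and method names here are illustrative, not actual Pinot code): because the end criteria is only evaluated between batches, `prevNumRows` may be on either side of `prevTargetNumRows`, so both values are halved and the smaller half is kept.

```java
// Illustrative sketch, not actual Pinot code: compute the new row threshold
// after a failed segment build.
public class SegmentSizeHalver {
    static int halveThreshold(int prevNumRows, int prevTargetNumRows) {
        // prevNumRows can slightly exceed prevTargetNumRows because the end
        // criteria is only checked between message batches, so halve both
        // values and take the smaller result.
        return Math.min(prevNumRows / 2, prevTargetNumRows / 2);
    }

    public static void main(String[] args) {
        // Consumption overshot a 100k-row target by one batch of 2k rows.
        System.out.println(halveThreshold(102_000, 100_000)); // prints 50000
    }
}
```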
Resolved review comments on ...rc/test/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManagerTest.java
Resolved review comments on .../main/java/org/apache/pinot/controller/helix/core/realtime/BlockingSegmentCompletionFSM.java
Resolved review comments on ...java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManagerTest.java
Resolved review comments on ...er/src/main/java/org/apache/pinot/controller/api/resources/LLCSegmentCompletionHandlers.java
I have a follow-up PR for early commit in #15120. However, the threshold is not easy to set; the concerns are listed in that PR. Regarding pauseless consumption, this issue could also arise there and be even more severe (no way to recover). In my opinion, we should introduce a similar mechanism in the
Resolved review comments on ...re/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java
Resolved review comments on ...re/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java
Resolved review comments on pinot-common/src/main/java/org/apache/pinot/common/protocols/SegmentCompletionProtocol.java
Resolved review comments on ...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Resolved review comments on ...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
```java
}

/**
 * An instance is reporting that it cannot build segment due to non-recoverable error, usually due to size too large.
```
When one server fails to build the segment and the controller updates the segment size target to half of the previous value, how will the other replica servers handle that change? Will they continue to build? There are two cases: (1) if they hit the same build failure, will they also reset their target? (2) What happens if they build the segment successfully?
When one replica's segment build fails and the other has not committed yet (whether it will eventually succeed or fail), the controller reduces the size threshold and resets the segment. This forces the replicas to destroy the mutable segment and re-ingest.
When one replica's build fails but the other has already succeeded and committed, the controller drops the request and lets the successful replica complete its commit. The failed replica's copy is eventually replaced by the one built by the successful replica.
Also updated the PR description with the same.
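The two cases above can be sketched as a small decision helper. This is a hypothetical sketch, not the actual controller code; `CannotBuildDecision`, `ReplicaState`, and `shouldReset` are illustrative names.

```java
// Hypothetical sketch of the controller's decision when a server reports
// that it cannot build a segment.
public class CannotBuildDecision {
    enum ReplicaState { CONSUMING, BUILD_FAILED, COMMITTED }

    /** True if the controller should halve the size threshold and reset the segment. */
    static boolean shouldReset(ReplicaState otherReplica) {
        // If the other replica has already committed, drop the request: the
        // failed replica will eventually replace its copy with the committed one.
        return otherReplica != ReplicaState.COMMITTED;
    }

    public static void main(String[] args) {
        System.out.println(shouldReset(ReplicaState.CONSUMING)); // true  -> reset and re-ingest
        System.out.println(shouldReset(ReplicaState.COMMITTED)); // false -> ignore the request
    }
}
```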
For pauseless consumption, it will be hard to re-ingest the segment with a different size threshold because the start/end offsets are already determined. We need to detect the issue earlier, i.e. before the segment build.
True. The current limitation is on the segment sequence number. Once the segment offsets are decided, when we re-ingest it, it is hard to split it into two because the next consecutive sequence number is already taken. Probably some tricks need to be introduced there.
Labels: ingestion, feature

PR description:

When mutable segments get transferred to immutable segments, there are a few precondition checks. Failures of those checks raise `IllegalStateException`, set `_state` to error, and then stop there forever. This is because Pinot assumes such build failures are transient and would eventually be fixed by other replicas. However, in most cases such failures are deterministic and would fail on all replicas.

This PR introduces a new `SegmentCannotBuildRequest` between server and controller. Once the `Preconditions` check fails and no other replica has been able to build so far, the controller reduces the segment size threshold by half and resets the segment.

Calling out a few corner scenarios, e.g. a segment with 2 replicas A and B:

- A sends `SegmentCannotBuildRequest` first: a segment reset is sent immediately to both A and B. B would not try to build, even if it might have succeeded.
- A has already committed when B sends `SegmentCannotBuildRequest`: B's request would be ignored, and eventually B would get the copy from A or from deep store.

The ingestion-stop issue would no longer happen. The only side effect is that users might see some of the latest data now but then temporarily not be able to see it later (as we have reset the latest immutable segments).
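The server-side flow described above might be sketched like this. All names here (`CannotBuildFlow`, `onBuildResult`, `sendSegmentCannotBuildRequest`) are illustrative assumptions, not the actual Pinot classes or protocol methods.

```java
// Hypothetical sketch of the new server -> controller protocol described in
// this PR; not actual Pinot APIs.
public class CannotBuildFlow {
    interface Controller {
        /** Returns true if the segment was reset (i.e. no other replica committed yet). */
        boolean sendSegmentCannotBuildRequest(String segmentName);
    }

    static String onBuildResult(String segmentName, boolean preconditionsHold, Controller controller) {
        if (preconditionsHold) {
            return "BUILT";
        }
        // Instead of moving to ERROR state forever, report to the controller.
        if (controller.sendSegmentCannotBuildRequest(segmentName)) {
            // Controller halved the size threshold and reset the segment:
            // destroy the mutable segment and re-ingest.
            return "REINGEST_WITH_HALVED_THRESHOLD";
        }
        // Another replica already committed: download its copy instead.
        return "DOWNLOAD_FROM_REPLICA";
    }

    public static void main(String[] args) {
        Controller noReplicaCommitted = segment -> true;
        System.out.println(onBuildResult("mytable__0__12__t", false, noReplicaCommitted));
    }
}
```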