Segment 'Online' but not properly sealed (no segment file created) #13149

Open · opened May 14, 2024 · 1 comment

cbalci (Contributor) commented May 14, 2024

We observed this interesting behavior on a Realtime table. For a single segment, something apparently went wrong at commit time, so the server was unable to commit the segment to disk and remove the files under 'consumers'. However, ingestion continued without issue, and the server has been consuming and sealing segments just fine since then.

At this point this segment:

  • Shows 'Online' in ZK metadata (not 'Consuming')
  • Server has three preallocated files under consumers/, which are still memory mapped
  • No compressed (.tar.gz) segment file on disk
  • Some error logs from RealtimeSegmentValidationManager (controller), which attempted to upload the segment files to deep store but failed (unfortunately, the segment build-time logs from the server are lost due to retention); a way to confirm this state from the controller side is sketched below
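
For reference, the combination above (sealed/ONLINE in ZK but no deep-store copy) can be spotted from outside the server. This is a minimal sketch assuming the controller exposes segment metadata at GET /segments/{tableName}/{segmentName}/metadata and that the metadata carries the segment.realtime.status and segment.download.url keys; the exact endpoint path and key names may differ across Pinot versions, so verify against your controller before relying on it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: flag a segment whose ZK metadata says the commit finished (status DONE / ONLINE)
// but which has no deep-store download URL -- the "limbo" state described above.
// Endpoint path and metadata key names are assumptions; check them against your Pinot version.
public class SegmentLimboCheck {
  public static void main(String[] args) throws Exception {
    String controller = "http://localhost:9000";          // controller base URL (assumption)
    String table = "table_X_REALTIME";                    // hypothetical table name
    String segment = "table_X__0__42__20240514T0000Z";    // hypothetical segment name

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(controller + "/segments/" + table + "/" + segment + "/metadata"))
        .GET()
        .build();
    String body = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body();

    // Crude string checks to keep the sketch dependency-free; use a JSON parser in practice.
    boolean commitMarkedDone = body.contains("\"segment.realtime.status\":\"DONE\"");
    boolean hasDownloadUrl = body.contains("\"segment.download.url\":\"http");
    if (commitMarkedDone && !hasDownloadUrl) {
      System.out.println("Segment " + segment + " looks sealed in ZK but has no deep-store copy");
    }
  }
}
```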

My two questions are:

  • How is it possible for a segment to get stuck in this limbo state where it finished consumption, but wasn't properly sealed?
  • Given the preallocated 'consumer' files, is there a practical way to rerun the commit routine and create the segment files after the fact? (A rough outline of what that would involve is sketched below.)
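
To make the second question concrete, these are roughly the steps the sealing path would have to replay. All four helpers in the sketch (buildImmutableSegment, tarSegment, uploadToDeepStore, updateZkMetadata) are hypothetical placeholders, not real Pinot APIs; this only spells out the shape of the procedure being asked about, not a supported or tested way to do it.

```java
import java.io.File;

// Outline of the commit/sealing steps that would have to be replayed after the fact.
// Every helper below is a hypothetical placeholder standing in for whatever the server
// does internally at commit time; this is a dry-run outline, not a tested procedure.
public class ManualSealSketch {
  public static void main(String[] args) {
    File consumerDir = new File("/path/to/consumers/table_X__0__42");  // the mmap'd consumer files
    File workDir = new File("/tmp/manual-seal");

    // 1. Rebuild an immutable segment from the still-mmap'd consumer data.
    File segmentDir = buildImmutableSegment(consumerDir, workDir);

    // 2. Compress it into the .tar.gz that was never produced.
    File segmentTar = tarSegment(segmentDir);

    // 3. Push the tar to deep store (the upload that originally failed).
    String downloadUrl = uploadToDeepStore(segmentTar);

    // 4. Point the segment's ZK metadata at the uploaded copy.
    updateZkMetadata("table_X_REALTIME", segmentDir.getName(), downloadUrl);
  }

  // Placeholders only; a real implementation would have to reuse Pinot's internal converters.
  static File buildImmutableSegment(File consumerDir, File workDir) {
    System.out.println("would build an immutable segment from " + consumerDir + " under " + workDir);
    return new File(workDir, consumerDir.getName());
  }

  static File tarSegment(File segmentDir) {
    System.out.println("would tar " + segmentDir + " into " + segmentDir + ".tar.gz");
    return new File(segmentDir.getPath() + ".tar.gz");
  }

  static String uploadToDeepStore(File segmentTar) {
    System.out.println("would upload " + segmentTar + " to deep store");
    return "s3://deep-store/" + segmentTar.getName();
  }

  static void updateZkMetadata(String table, String segment, String downloadUrl) {
    System.out.println("would set download URL for " + table + "/" + segment + " to " + downloadUrl);
  }
}
```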

cc @chenboat @eaugene

cbalci changed the title from "Segment 'Online' but not fully committed (no segment file created)" to "Segment 'Online' but not properly sealed (no segment file created)" on May 14, 2024
eaugene (Contributor) commented May 14, 2024

Thanks, @cbalci for drafting this message.

We have faced this issue twice since last month.

For the first instance, I did the debugging a couple of weeks back. Below is the series of events that led to the undesirable segment state (reconstructed from the log dump).

Setup

  • Controller (C)
  • Servers S1 and S2, both consuming the same segment (X) from Kafka
  • Pinot version: 0.12

Series of Events in Increasing Order of Time

  • T1 (C): Segment X has started consuming.
  • T2 (C): Committing segment X, with S1 as the winner.
  • T3 (S1): Segment tar built.
  • T4 (S1): Failed to upload to deep store.
  • T5 (C): Segment metadata is updated and the ideal state is set for this segment.
  • T6 (S2): SegmentOnlineOfflineStateModel.onBecomeOnlineFromConsuming() got called. It internally tried to catch up to the offset of segment X but failed due to a timeout, so it tried to download the segment from a peer.
  • T7 (S2): Failed to download segment X after retries ("Failure in getting online servers for segment table_X"). The peer download has an exponential retry policy, but it could not succeed because it could not find an ONLINE copy in the external view: the controller had so far only set the ideal state (at T2), and the transition had not yet reached the external view. The segment goes into ERROR state on this node (added to the error cache), but it still has mmap files in the consumer dir. Log: "Caught exception in state transition CONSUMING -> ONLINE for table: table_X, segment: X". There is a finally clause that is meant to release the segment, but it only executes once there is no remaining reference on the segment, and a reference can still be held by an in-flight query.
  • T8 (C): Reload-segment call from the controller to both servers.
  • T9 (S2): The reload succeeds as there is a local copy ("Reloading (force committing) consuming segment: X in table: table_X"). This force commit only sets RealtimeSegmentDataManager._forceCommitMessageReceived to true. The segment is still in ERROR on this instance, and up to this point the mmap files are still in the consumer dir.
  • T10 (C): Reset segment X on S2.
  • T11 (S2): "Skipping adding existing segment: X for table: table_X with data manager class: RealtimeSegmentDataManager". But this changes the state to ONLINE: SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline() is called. Since the segment is now ONLINE in the external view, this gave the false impression that it had been successfully reloaded from a peer and was being served from there. In reality the segment was still being served from the mmap files, because _segmentDataManagerMap still held a mutable segment for it.

From the code, the only transitions in which _segmentDataManagerMap removes a segment are CONSUMING -> OFFLINE, CONSUMING -> DROPPED, ONLINE -> OFFLINE, and ONLINE -> DROPPED. In our case the only transitions that ran were SegmentOnlineOfflineStateModel.onBecomeOfflineFromError() and SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(), and the ERROR -> OFFLINE transition only logs:

_logger.info("Resetting the state for segment:{} from ERROR to OFFLINE", message.getPartitionName());

If we had dropped segment X from _segmentDataManagerMap, the subsequent reset call would have peer-downloaded the segment.
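
To make the reference-counting behavior at T7 concrete: segment release on the server is gated on a reference count, so the cleanup in the finally clause is a no-op while anything (for example an in-flight query) still holds the segment. The class below is a simplified, self-contained illustration of that pattern; it is not Pinot's actual SegmentDataManager code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified illustration of reference-counted segment release: the cleanup in the
// finally clause only frees the segment (unmapping its consumer files) once no query
// holds a reference anymore. This mirrors the behavior described at T7; it is not
// Pinot's real SegmentDataManager implementation.
public class RefCountedSegment {
  // Starts at 1: the reference held by the server's data manager map.
  private final AtomicInteger _refCount = new AtomicInteger(1);
  private volatile boolean _destroyed = false;

  // A query acquires the segment before reading from it; fails if already released.
  public boolean acquire() {
    while (true) {
      int current = _refCount.get();
      if (current <= 0) {
        return false;
      }
      if (_refCount.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  // Called both by queries when they finish and by the state transition's finally clause.
  public void release() {
    if (_refCount.decrementAndGet() == 0) {
      _destroyed = true;
      System.out.println("last reference dropped: unmapping consumer files, freeing resources");
    }
  }

  public static void main(String[] args) {
    RefCountedSegment segment = new RefCountedSegment();
    segment.acquire();   // an in-flight query grabs a reference (refCount 1 -> 2)
    segment.release();   // finally clause in the failed transition (refCount 2 -> 1): nothing freed
    System.out.println("freed after the finally clause? " + segment._destroyed);  // false
    segment.release();   // the query finishes (refCount 1 -> 0): resources are actually freed
  }
}
```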

This is how the segment ended up in such an undesirable state on S2.

We ended up losing the segment entirely, as the S1 container was replaced before it ever uploaded the segment to deep store, and the S2 container was restarted (which cleared the in-memory _segmentDataManagerMap).

We are aware that the deep store upload retry could have prevented the data loss.

Some improvements found during this debugging

  • To prevent segments from ending up in such a state, we can flush the segment from _segmentDataManagerMap in the SegmentOnlineOfflineStateModel.onBecomeOfflineFromError() transition (a rough sketch of this follows the list).
  • To let Pinot keep serving queries from the mmap files of the sealed segment, could we persist the _segmentDataManagerMap (possibly to ZK), so that even if the node is restarted we can still serve?
  • Not sure whether converting the mmap consumer files into an immutable segment is currently possible in Pinot; looking to see if we can have this.
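
For the first item, the shape of the change would be roughly as follows. The InstanceDataManager interface and its offloadSegment method below are stand-ins (the real change would go into pinot-server's SegmentOnlineOfflineStateModel and whatever offload/remove hook the instance data manager exposes), so treat this as a sketch of the proposal rather than a patch.

```java
// Sketch of improvement 1: on the ERROR -> OFFLINE transition, flush the segment out of the
// server's data manager map instead of only logging. Class and method names mirror the ones
// quoted above, but the InstanceDataManager interface here is a stand-in for the server's
// real data manager.
public class OfflineFromErrorSketch {

  interface InstanceDataManager {  // stand-in for the server's instance data manager
    void offloadSegment(String tableNameWithType, String segmentName);
  }

  private final InstanceDataManager _instanceDataManager;

  OfflineFromErrorSketch(InstanceDataManager instanceDataManager) {
    _instanceDataManager = instanceDataManager;
  }

  // Equivalent of SegmentOnlineOfflineStateModel.onBecomeOfflineFromError(message, context),
  // with the Helix Message reduced to the two fields used here.
  public void onBecomeOfflineFromError(String tableNameWithType, String segmentName) {
    System.out.printf("Resetting the state for segment:%s from ERROR to OFFLINE%n", segmentName);

    // Proposed addition: drop the (possibly still mutable, mmap-backed) segment so that a later
    // reset / ONLINE transition has to fetch a sealed copy from a peer or from deep store
    // instead of silently reusing the stale entry in _segmentDataManagerMap.
    _instanceDataManager.offloadSegment(tableNameWithType, segmentName);
  }

  public static void main(String[] args) {
    InstanceDataManager dataManager =
        (table, segment) -> System.out.println("offloading " + segment + " of " + table);
    new OfflineFromErrorSketch(dataManager).onBecomeOfflineFromError("table_X_REALTIME", "X");
  }
}
```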

Looking forward to hearing whether others have faced a similar kind of issue, and open to suggestions on improvements and additional guardrails we can set up to prevent such occurrences.
