[ingesters] Send heartbeat during wall replay #4847
Merged
What this PR does:
Ingesters can become unhealthy during startup if the WAL replay takes longer than the `heartbeat_timeout`. This happens because the lifecycle service is started only after the ingesters open all TSDBs:
cortex/pkg/ingester/ingester.go
Lines 708 to 719 in b855a25
This adds an option to the `lifecycle` service to configure whether the ingester should AutoJoin the ring (flip its state to active). If the option is set to false, the ingester will only join the ring after the "Join" method is called.
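The idea can be illustrated with a minimal Go sketch. This is not the Cortex implementation: the `Lifecycler` type, `Start`, `replayWAL`, and the two-state model are hypothetical simplifications, assumed here only to show heartbeats running while the instance stays out of the active state until `Join` is called.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// State is a hypothetical, simplified ring state (not the Cortex type).
type State int

const (
	Pending State = iota
	Active
)

func (s State) String() string {
	if s == Active {
		return "ACTIVE"
	}
	return "PENDING"
}

// Lifecycler is a hypothetical stand-in for the lifecycle service.
type Lifecycler struct {
	mu         sync.Mutex
	state      State
	heartbeats int
	stop       chan struct{}
	done       chan struct{}
}

// Start begins heartbeating every period. With autoJoin set, the instance
// becomes Active immediately; otherwise it stays Pending until Join is called.
func Start(autoJoin bool, period time.Duration) *Lifecycler {
	l := &Lifecycler{stop: make(chan struct{}), done: make(chan struct{})}
	if autoJoin {
		l.state = Active
	}
	go l.loop(period)
	return l
}

// loop records one heartbeat per tick until Stop is called.
func (l *Lifecycler) loop(period time.Duration) {
	defer close(l.done)
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			l.mu.Lock()
			l.heartbeats++
			l.mu.Unlock()
		case <-l.stop:
			return
		}
	}
}

// Join flips the state to Active; heartbeats are unaffected.
func (l *Lifecycler) Join() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.state = Active
}

// State returns the current state.
func (l *Lifecycler) State() State {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.state
}

// Heartbeats returns how many heartbeats have been recorded so far.
func (l *Lifecycler) Heartbeats() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.heartbeats
}

// Stop ends the heartbeat loop and waits for it to exit. Call it once.
func (l *Lifecycler) Stop() {
	close(l.stop)
	<-l.done
}

// replayWAL is a placeholder for a slow WAL replay; it only sleeps.
func replayWAL(d time.Duration) { time.Sleep(d) }

func main() {
	l := Start(false, 10*time.Millisecond) // autoJoin disabled
	defer l.Stop()
	replayWAL(50 * time.Millisecond) // heartbeats continue during the replay
	fmt.Println(l.State(), "heartbeats so far:", l.Heartbeats())
	l.Join() // join the ring only once the replay is done
	fmt.Println(l.State())
}
```

In this sketch the heartbeat loop and the state transition are decoupled, so a slow replay delays only the switch to active and not the heartbeats.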
This change allows us to start the `lifecycle` before executing the WAL replay (so the heartbeat starts) but only flip its state to active after all TSDBs are replayed.
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`