[ingesters] Send heartbeat during wall replay #4847

Merged: 4 commits merged into cortexproject:master on Aug 30, 2022

@alanprot (Member) commented on Aug 30, 2022

What this PR does:
Ingesters can become unhealthy during startup if the WAL replay takes longer than the heartbeat_timeout.
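
To make the failure mode concrete, here is a minimal sketch of a timeout-based health check. The names `isHealthy` and `replayScenario`, and the durations used (5 minute replay, 5 second heartbeat period, 1 minute timeout), are assumptions chosen for illustration; the actual check in the ring code may differ in detail.

```go
package main

import (
	"fmt"
	"time"
)

// isHealthy is a simplified, hypothetical health check: an instance counts as
// healthy only if its last heartbeat is no older than heartbeatTimeout.
func isHealthy(lastHeartbeat, now time.Time, heartbeatTimeout time.Duration) bool {
	return now.Sub(lastHeartbeat) <= heartbeatTimeout
}

// replayScenario reports whether an instance is still considered healthy at
// the end of a long replay, with or without heartbeats sent during the replay.
// All durations below are illustrative assumptions.
func replayScenario(heartbeatDuringReplay bool) bool {
	const (
		replayDuration   = 5 * time.Minute
		heartbeatPeriod  = 5 * time.Second
		heartbeatTimeout = time.Minute
	)
	start := time.Date(2022, 8, 30, 12, 0, 0, 0, time.UTC)
	now := start.Add(replayDuration)

	// Without heartbeats during replay, the last one predates the replay.
	last := start
	if heartbeatDuringReplay {
		// Otherwise the last heartbeat is the most recent tick at or before now.
		last = start.Add(replayDuration.Truncate(heartbeatPeriod))
	}
	return isHealthy(last, now, heartbeatTimeout)
}

func main() {
	fmt.Println(replayScenario(false)) // false: replay outlasted the timeout
	fmt.Println(replayScenario(true))  // true: heartbeats kept the entry fresh
}
```

In this sketch the instance is reported unhealthy only because nothing refreshed its heartbeat timestamp while the replay was running, not because anything was wrong with the replay itself.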

Ingesters stop heartbeating during replay because the lifecycler service is started only after the ingester has opened all TSDBs:

    func (i *Ingester) starting(ctx context.Context) error {
        if err := i.openExistingTSDB(ctx); err != nil {
            // Try to rollback and close opened TSDBs before halting the ingester.
            i.closeAllTSDB()
            return errors.Wrap(err, "opening existing TSDBs")
        }
        // Important: we want to keep lifecycler running until we ask it to stop, so we need to give it independent context
        if err := i.lifecycler.StartAsync(context.Background()); err != nil {
            return errors.Wrap(err, "failed to start lifecycler")
        }

This PR adds an option to the lifecycler service that controls whether the ingester should AutoJoin the ring (flip its state to active).

If the option is set to false, the ingester joins the ring only after the "Join" method is called.

This change allows us to start the lifecycler before executing the WAL replay (so heartbeats are sent during the replay) and to flip the ingester's state to active only after all TSDBs have been replayed.
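
The reordered startup can be sketched with a toy lifecycler. This is not the Cortex implementation: `toyLifecycler`, `startLifecycler`, `simulateReplay`, `startIngester`, the "PENDING" state name, and the tick period are all hypothetical and exist only to illustrate the ordering described above (heartbeats first, replay second, join last).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyLifecycler is a hypothetical stand-in for the lifecycler service.
type toyLifecycler struct {
	mu         sync.Mutex
	state      string
	heartbeats int
	stop       chan struct{}
	done       chan struct{}
}

// startLifecycler begins heartbeating immediately. With autoJoin=false the
// instance stays in the assumed "PENDING" state until Join is called.
func startLifecycler(autoJoin bool, period time.Duration) *toyLifecycler {
	l := &toyLifecycler{state: "PENDING", stop: make(chan struct{}), done: make(chan struct{})}
	if autoJoin {
		l.state = "ACTIVE"
	}
	go func() {
		defer close(l.done)
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				l.mu.Lock()
				l.heartbeats++ // stands in for updating the ring entry
				l.mu.Unlock()
			case <-l.stop:
				return
			}
		}
	}()
	return l
}

// Join flips the instance to active.
func (l *toyLifecycler) Join() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.state = "ACTIVE"
}

// Stop ends the heartbeat loop and waits for it to exit.
func (l *toyLifecycler) Stop() {
	close(l.stop)
	<-l.done
}

func (l *toyLifecycler) snapshot() (string, int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.state, l.heartbeats
}

// simulateReplay stands in for a slow WAL replay.
func simulateReplay() { time.Sleep(50 * time.Millisecond) }

// startIngester mirrors the new ordering: start the lifecycler without
// auto-join, replay, then join the ring.
func startIngester(replay func()) (stateAfterReplay string, beats int, finalState string) {
	l := startLifecycler(false, 2*time.Millisecond)
	defer l.Stop()
	replay()
	stateAfterReplay, beats = l.snapshot()
	l.Join()
	finalState, _ = l.snapshot()
	return stateAfterReplay, beats, finalState
}

func main() {
	state, beats, final := startIngester(simulateReplay)
	fmt.Println(state, beats > 0, final)
}
```

The point of the sketch is that heartbeating and ring membership are decoupled: the heartbeat loop runs for the whole replay, while the state change to active is an explicit step taken only once the replay has returned.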

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Alan Protasio <approtas@amazon.com>
@alanprot marked this pull request as ready for review on August 30, 2022 18:20
@alanprot merged commit 9941124 into cortexproject:master on Aug 30, 2022
@danielblando mentioned this pull request on May 15, 2023 (3 tasks)
2 participants