[ingester] Ingester service state and lifecycler ring state not synchronized #8097

Open

Background

The ingester runs as a BasicService and moves to the services.Running state after the starting() function completes.

As part of its starting() function, the ingester starts a ring.Lifecycler. Once started, the lifecycler auto-joins the ring, and moves the ingester's ring state to ring.ACTIVE as soon as it can.

The Problem

  • Once an ingester's ring state is ring.ACTIVE it becomes available for read requests.
  • When the ingester service is not in the services.Running state, the ingester rejects read requests.

Because of the above, starting the Lifecycler essentially starts a timer on the ingester service reaching the services.Running state. If the ingester's starting() function is still executing when the ring state becomes ring.ACTIVE, the ingester starts receiving read requests but rejects them all with the error "ingester is unavailable (current state: Starting)".
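The mismatch can be reproduced with a small standalone model. This is only a sketch: toyIngester, routable() and read() are hypothetical stand-ins and not the real ingester or ring types. It only mirrors the two conditions listed above: routing depends on the ring state, while the read path checks the service state.

```go
package main

import "fmt"

// Simplified stand-ins for the service and ring states named in this issue.
type serviceState string
type ringState string

const (
	starting serviceState = "Starting"
	running  serviceState = "Running"

	pending ringState = "PENDING"
	active  ringState = "ACTIVE"
)

// toyIngester is a hypothetical model, not the real ingester type.
type toyIngester struct {
	service serviceState
	ring    ringState
}

// routable reports whether read requests would be sent to this instance.
func (i *toyIngester) routable() bool { return i.ring == active }

// read mimics a state check that rejects reads unless the service is Running.
func (i *toyIngester) read() error {
	if i.service != running {
		return fmt.Errorf("ingester is unavailable (current state: %s)", i.service)
	}
	return nil
}

func main() {
	ing := &toyIngester{service: starting, ring: pending}

	// The lifecycler auto-joins while starting() is still executing.
	ing.ring = active
	fmt.Println("routable:", ing.routable(), "read error:", ing.read())

	// Only once starting() returns does the service reach Running.
	ing.service = running
	fmt.Println("routable:", ing.routable(), "read error:", ing.read())
}
```

In the first state the instance is routable but every read fails, which is the window described above.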

This isn't much of an issue if a single ingester enters this state, since reads can still complete using the other zones to achieve quorum. However, when ingesters are scaled up horizontally, instances are added to all zones at the same time. If instances in multiple zones are rejecting reads while in the services.Starting state, quorum can't be achieved, and we suffer a read outage.
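To make the quorum argument concrete, here is a minimal sketch. The numbers are assumptions for illustration and are not stated in this issue: three zones, with a read succeeding when at least two zones answer. readSucceeds is a hypothetical helper, not part of any real package.

```go
package main

import "fmt"

// readSucceeds reports whether enough zones answered the read.
// zoneOK[i] is true when every instance queried in zone i accepted the read.
func readSucceeds(zoneOK []bool, minZones int) bool {
	healthy := 0
	for _, ok := range zoneOK {
		if ok {
			healthy++
		}
	}
	return healthy >= minZones
}

func main() {
	const minZones = 2 // assumed quorum for three zones

	// A single new ingester in one zone is still Starting.
	fmt.Println(readSucceeds([]bool{false, true, true}, minZones))

	// Scale-up added Starting ingesters to every zone at once.
	fmt.Println(readSucceeds([]bool{false, false, false}, minZones))
}
```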

Solution

Ideally, moving the ring state to ring.ACTIVE should be the last thing done in the ingester's starting() function (or the first thing done in its running() function) -- no other code should run in between those two events.
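A rough sketch of that ordering, assuming a lifecycler whose switch to ACTIVE can be triggered explicitly. manualRing, activate() and this toyIngester are hypothetical and do not correspond to an existing API; the point is only that all initialization runs before the ring state changes.

```go
package main

import "fmt"

// manualRing is a hypothetical lifecycler stand-in whose switch to ACTIVE
// is triggered explicitly instead of by auto-join.
type manualRing struct{ state string }

func (r *manualRing) activate() { r.state = "ACTIVE" }

// toyIngester is a hypothetical model, not the real ingester type.
type toyIngester struct {
	ring  *manualRing
	state string
}

// starting runs every initialization step first and activates the ring last,
// so no other code runs between becoming ACTIVE and becoming Running.
func (i *toyIngester) starting(initSteps ...func() error) error {
	i.state = "Starting"
	for _, step := range initSteps {
		if err := step(); err != nil {
			// The ring never became ACTIVE, so no reads were routed here.
			return err
		}
	}
	i.ring.activate()
	i.state = "Running"
	return nil
}

func main() {
	ing := &toyIngester{ring: &manualRing{state: "PENDING"}}
	err := ing.starting(func() error {
		fmt.Println("during init, ring state:", ing.ring.state)
		return nil
	})
	fmt.Println("error:", err, "ring:", ing.ring.state, "service:", ing.state)
}
```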

Unfortunately the existing ring.Lifecycler used by the ingester doesn't offer much control over when the switch to ring.ACTIVE occurs, since it auto-joins the ring.
