Rulers should be able to retry when there is a failure to write to ingesters #5938

Open
@rapphil

Is your feature request related to a problem? Please describe.

When rules are evaluated, the ruler writes the Vector resulting from the rule evaluation to the ingesters using a gRPC client. If the ingesters are undergoing a transient error, the data is lost and the following message appears in the logs, which can be associated with different types of gRPC errors:

Rule sample appending failed

As far as I can tell, Cortex does not configure the gRPC clients with a retry policy, and hence failed gRPC requests are never retried.

Describe the solution you'd like
I'd like it to be possible to configure rulers to retry failed gRPC requests to ingesters, so that they can recover from transient errors gracefully without losing any data. This could potentially be implemented for other components as well.

Additional context

Labels: component/rules, stale
