Compactor exposes error metric when failing over to another instance #16388

Open
@jan-kantert

Description

Describe the bug
When the loki-backend instance that runs the compactor shuts down and comes back up again, we see the failure count increase in the loki_boltdb_shipper_compact_tables_operation_total metric.

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki (3.3.2) in scalable mode
  2. Restart the pod running the compactor
  3. Observe delta(loki_boltdb_shipper_compact_tables_operation_total{status="failure"}[5m]) via Prometheus (or query the metric endpoint)
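
For step 3 without Prometheus, the failure counter can be read from the text returned by the metric endpoint. A minimal sketch, assuming the text is in the standard Prometheus exposition format and that the metric carries a status label as in the query above. The helper name failure_count and the SAMPLE text are made up for illustration and are not real output from a Loki instance.

```python
import re

METRIC = "loki_boltdb_shipper_compact_tables_operation_total"

# Matches "name{labels} value" or "name value" in the Prometheus text format.
SAMPLE_LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)')


def failure_count(exposition_text: str) -> float:
    """Sum every sample of METRIC whose labels include status="failure"."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if not match or match.group(1) != METRIC:
            continue
        labels = match.group(2) or ""
        if 'status="failure"' in labels:
            total += float(match.group(3))
    return total


# Hypothetical sample text, NOT captured from a real instance.
SAMPLE = """\
# TYPE loki_boltdb_shipper_compact_tables_operation_total counter
loki_boltdb_shipper_compact_tables_operation_total{status="failure"} 1
loki_boltdb_shipper_compact_tables_operation_total{status="success"} 42
"""

if __name__ == "__main__":
    print(failure_count(SAMPLE))
```

Reading the value once before and once after restarting the pod shows whether the failover incremented the counter.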

Expected behavior
When the loki-backend instance that runs the compactor restarts, we expect a graceful failover of the compactor. We expect loki_boltdb_shipper_compact_tables_operation_total to not count any failure in that case.
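
To illustrate the expectation, here is a rough sketch of the kind of classification we have in mind. It is not Loki's code (Loki is written in Go), and every name in it is hypothetical: CompactionCanceled stands in for a "context canceled" error raised because the instance was told to stop compacting, and record_compaction_result stands in for whatever updates the status label of the counter.

```python
from collections import Counter
from typing import Optional


class CompactionCanceled(Exception):
    """Hypothetical stand-in for a 'context canceled' error during handoff."""


def record_compaction_result(counts: Counter, error: Optional[Exception]) -> None:
    """Update per-status counts for one compaction run."""
    if error is None:
        counts["success"] += 1
    elif isinstance(error, CompactionCanceled):
        # The run was interrupted because the compactor is moving to another
        # instance. Track it separately instead of counting it as a failure.
        counts["canceled"] += 1
    else:
        counts["failure"] += 1


if __name__ == "__main__":
    counts = Counter()
    record_compaction_result(counts, None)
    record_compaction_result(counts, CompactionCanceled("context canceled"))
    record_compaction_result(counts, RuntimeError("failed to list tables"))
    print(dict(counts))
```

Whether a canceled run should get its own status or simply not be recorded is a design choice for the maintainers. The point is only that a cancellation caused by failover would not land under status="failure".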

Environment:

  • Infrastructure: Kubernetes 1.30
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
Log lines when this happens:

loki-backend-2 - info: finished compacting table 
loki-backend-2 - info: compacting table 
loki-backend-2 - info: finished compacting table 
loki-backend-1 - info: this instance has been chosen to run the compactor, starting compactor 
loki-backend-1 - info: waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor 
loki-backend-2 - info: compactor exiting 
loki-backend-2 - info: waiting until compactor is JOINING in the ring 
loki-backend-2 - info: compactor is JOINING in the ring 
loki-backend-2 - info: waiting until compactor is ACTIVE in the ring 
loki-backend-2 - info: compactor is ACTIVE in the ring 
loki-backend-1 - info: this instance should no longer run the compactor, stopping compactor 
loki-backend-1 - info: compactor stopped 
loki-backend-1 - error: failed to run compaction - failed to list tables: RequestCanceled: request context canceled
caused by: context canceled
loki-backend-2 - info: this instance has been chosen to run the compactor, starting compactor 
loki-backend-1 - info: compactor started 
loki-backend-2 - info: waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor 
