Metrics show append failures but logs show none #5803

Open
@jakubgs

Describe the bug
I'm seeing a lot of failures reported by the cortex_distributor_ingester_append_failures_total metric on one of our distributors:

cortex_distributor_ingester_append_failures_total{ingester="10.10.0.211:9095",status="4xx",type="samples"} 318
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.211:9095",status="5xx",type="metadata"} 14
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.211:9095",status="5xx",type="samples"} 1670
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.212:9095",status="4xx",type="samples"} 248
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.212:9095",status="5xx",type="metadata"} 13
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.212:9095",status="5xx",type="samples"} 1991
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.213:9095",status="4xx",type="samples"} 68
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.213:9095",status="5xx",type="samples"} 23041
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.214:9095",status="4xx",type="samples"} 128
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.214:9095",status="5xx",type="metadata"} 44
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.214:9095",status="5xx",type="samples"} 5642
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.218:9095",status="4xx",type="samples"} 97
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.218:9095",status="5xx",type="samples"} 36903
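To see which failure class dominates, the counters above can be summed per status and type. This is only a rough sketch for reading the dump, not anything Cortex ships: the helper name `sum_by` and the two regexes are made up for illustration, and the parser only handles simple `name{labels} value` lines like the ones pasted here.

```python
import re
from collections import defaultdict

# Minimal matcher for lines of the form: name{k="v",...} value
# (an assumption: no escaped quotes or braces inside label values).
LINE_RE = re.compile(r'^[a-zA-Z_:][a-zA-Z0-9_:]*\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def sum_by(lines, *keys):
    """Sum counter values grouped by the given label names (hypothetical helper)."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unparseable lines
        labels = dict(LABEL_RE.findall(m.group("labels")))
        totals[tuple(labels.get(k, "") for k in keys)] += float(m.group("value"))
    return dict(totals)

# The counters exactly as pasted in this report.
SAMPLE = """
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.211:9095",status="4xx",type="samples"} 318
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.211:9095",status="5xx",type="metadata"} 14
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.211:9095",status="5xx",type="samples"} 1670
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.212:9095",status="4xx",type="samples"} 248
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.212:9095",status="5xx",type="metadata"} 13
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.212:9095",status="5xx",type="samples"} 1991
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.213:9095",status="4xx",type="samples"} 68
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.213:9095",status="5xx",type="samples"} 23041
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.214:9095",status="4xx",type="samples"} 128
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.214:9095",status="5xx",type="metadata"} 44
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.214:9095",status="5xx",type="samples"} 5642
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.218:9095",status="4xx",type="samples"} 97
cortex_distributor_ingester_append_failures_total{ingester="10.10.0.218:9095",status="5xx",type="samples"} 36903
"""

if __name__ == "__main__":
    for (status, kind), total in sorted(sum_by(SAMPLE.splitlines(), "status", "type").items()):
        print(status, kind, int(total))
```

For this dump the 5xx sample failures add up to 69247, against 859 for 4xx samples and 71 for 5xx metadata, and most of the 5xx count comes from 10.10.0.213 and 10.10.0.218.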

These errors can be seen in the graph:

[image: graph of the append failures]

But when I log in to the affected distributor and ingester, I cannot find any errors logged, even at debug level:

ts=2024-03-06T09:53:16.116085832Z caller=grpc_logging.go:46 level=debug method=/cortex.Ingester/Push duration=4.272868ms msg="gRPC (success)"
ts=2024-03-06T09:53:16.133388069Z caller=grpc_logging.go:46 level=debug method=/cortex.Ingester/Push duration=5.270535ms msg="gRPC (success)"
ts=2024-03-06T09:53:16.146464557Z caller=grpc_logging.go:46 level=debug method=/cortex.Ingester/Push duration=4.954335ms msg="gRPC (success)"
ts=2024-03-06T09:53:16.157419974Z caller=grpc_logging.go:46 level=debug method=/cortex.Ingester/Push duration=4.418061ms msg="gRPC (success)"
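One way to check a larger log capture for anything other than these success lines is to parse the key=value fields and filter. The sketch below is an illustration only: `parse_logfmt` and `non_success` are hypothetical helper names, the only message string taken from the logs above is "gRPC (success)", and treating `warn` and `error` as the interesting level values is an assumption.

```python
import shlex

def parse_logfmt(line):
    """Split a key=value log line into a dict (hypothetical helper).

    shlex handles the double-quoted msg="..." value; lines with
    unbalanced quotes are returned as an empty dict.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def non_success(lines):
    """Yield parsed lines that are not plain 'gRPC (success)' entries."""
    for line in lines:
        fields = parse_logfmt(line)
        if not fields:
            continue
        # Assumption: failures would show a different msg or a warn/error level.
        if fields.get("msg") != "gRPC (success)" or fields.get("level") in ("warn", "error"):
            yield fields

# Two of the lines pasted above, used as sample input.
LOG_SAMPLE = [
    'ts=2024-03-06T09:53:16.116085832Z caller=grpc_logging.go:46 level=debug method=/cortex.Ingester/Push duration=4.272868ms msg="gRPC (success)"',
    'ts=2024-03-06T09:53:16.133388069Z caller=grpc_logging.go:46 level=debug method=/cortex.Ingester/Push duration=5.270535ms msg="gRPC (success)"',
]

if __name__ == "__main__":
    hits = list(non_success(LOG_SAMPLE))
    print(len(hits), "non-success lines")
```

Run against the lines shown here it finds nothing, which matches what I see when reading the logs by eye.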

To Reproduce
I have no idea.

Expected behavior
Either the Distributor or the Ingester should log errors so that the issue can actually be debugged.

Environment:
Prometheus 2.50.1 sending to Cortex 1.16.0.
