Periodic consistency check errors during queries with tenant federation enabled #5365

Open
@blovett

Describe the bug

We periodically see queries fail with "consistency check failed" errors when tenant federation is enabled. The blocks named in the error do not belong to the tenant that holds the queried data.

When this happens, we get messages like this in the query frontend logs:

[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=debug ts=2023-05-26T17:46:54.654382213Z caller=results_cache.go:374 traceID=5a06f1b765aa8748 msg="handle miss" start=1683676800000 spanID=692181cda7283b7f
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:54.759380243Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=0 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:54.901137753Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=1 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:54.954194856Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=2 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:55.018502766Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=3 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=error ts=2023-05-26T17:46:55.401956233Z caller=retry.go:79 traceID=5a06f1b765aa8748 msg="error processing request" try=4 err="rpc error: code = Code(500) desc = {\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\"}"
[pod/v1-cortex-query-frontend-cfc6c9d69-t8w9j/query-frontend] level=warn ts=2023-05-26T17:46:55.402152993Z caller=logging.go:86 traceID=5a06f1b765aa8748 msg="GET /prometheus/api/v1/query_range?query=customer:rts_BWbits:sum&start=1683676800&end=1683763200&step=300 (500) 747.968778ms Response: \"{\\\"status\\\":\\\"error\\\",\\\"errorType\\\":\\\"internal\\\",\\\"error\\\":\\\"expanding series: error querying tenant_id fake: consistency check failed because some blocks were not queried: 01H02N308MZ51VVG77WAZXWHRR\\\"}\" ws: false; Accept: */*; Connection: close; User-Agent: curl/7.88.1; X-Scope-Orgid: rts|fake; "

A successful query, by contrast, looks like this:

[pod/v1-cortex-query-frontend-cfc6c9d69-wh2ph/query-frontend] level=debug ts=2023-05-26T17:48:52.834813942Z caller=results_cache.go:374 org_id=rts traceID=20a4ac98a88c3d61 msg="handle miss" start=1683676800000 spanID=53ca2a15ef6f8322
[pod/v1-cortex-query-frontend-cfc6c9d69-wh2ph/query-frontend] level=debug ts=2023-05-26T17:48:52.939481535Z caller=logging.go:76 traceID=20a4ac98a88c3d61 msg="GET /prometheus/api/v1/query_range?query=customer:rts_BWbits:sum&start=1683676800&end=1683763200&step=300 (200) 105.020042ms"
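For anyone triaging similar logs, here is a minimal sketch (not part of Cortex) that pulls the tenant and block IDs out of a consistency check error line and decodes the timestamp embedded in each block ID. It assumes the error text has the exact shape shown above and that block IDs are standard ULIDs; the function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
ULID_RE = re.compile(r"\b[0-9A-HJKMNP-TV-Z]{26}\b")
ERROR_RE = re.compile(
    r"error querying tenant_id (?P<tenant>[^:\s]+): "
    r"consistency check failed because some blocks were not queried: (?P<rest>.*)"
)


def parse_consistency_error(line):
    """Return (tenant, [block_ids]) for a consistency check error line, else None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return match.group("tenant"), ULID_RE.findall(match.group("rest"))


def ulid_time(block_id):
    """Decode the millisecond timestamp held in the first 10 characters of a ULID."""
    millis = 0
    for char in block_id[:10]:
        millis = millis * 32 + CROCKFORD.index(char)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

Run against the error lines above, this gives tenant fake and block 01H02N308MZ51VVG77WAZXWHRR. The ULID timestamp of that block decodes to 2023-05-10 11:00:07 UTC, which falls inside the queried range (start=1683676800, end=1683763200).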

To Reproduce

Steps to reproduce the behavior:

  1. Start Cortex 1.14.1
  2. Perform a federated query across multiple tenants (the failing request above used X-Scope-Orgid: rts|fake)
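The request from the failing log entry can be rebuilt roughly as follows. This is a sketch only: the base URL is a hypothetical placeholder and the helper names are made up for illustration, while the path, query parameters, and tenant IDs are taken from the logs above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical address; replace with your own query-frontend endpoint.
BASE_URL = "http://cortex-query-frontend.example:8080"


def build_federated_query(base_url, tenants, query, start, end, step):
    """Build the URL and headers for a query_range request across several tenants."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    url = f"{base_url}/prometheus/api/v1/query_range?{params}"
    # Tenant federation takes multiple tenant IDs separated by "|".
    headers = {"X-Scope-OrgID": "|".join(tenants)}
    return url, headers


def run_query(url, headers, timeout=30):
    """Send the request and return the raw response body."""
    with urlopen(Request(url, headers=headers), timeout=timeout) as response:
        return response.read().decode("utf-8")


url, headers = build_federated_query(
    BASE_URL, ["rts", "fake"], "customer:rts_BWbits:sum", 1683676800, 1683763200, 300
)
```

Calling run_query(url, headers) repeatedly should eventually hit the intermittent failure; note that urlopen raises urllib.error.HTTPError on the 500 response rather than returning the error body.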

Expected behavior

I'd expect the federated query to succeed every time, rather than intermittently failing with a consistency check error.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm

Additional Context

Store-gateway logs: https://gist.github.com/blovett/84b08f2608f3cccf2cf4865c485720db
The query-frontend logs are included above. If there are other logs that would help troubleshoot this, please let me know.
