
Querier error "expanding series: consistency check failed because some blocks were not queried" #4431

Closed

Description

@saidben0

We keep seeing the error below in our querier microservice every time we try to visualize our Cortex metrics in Grafana; Grafana surfaces the same error.

"expanding series: consistency check failed because some blocks were not queried"

Grafana seems unable to query the metrics data that Cortex pushed into Azure blob storage, even though the blocks the querier/Grafana complains about are present in our Azure storage account.
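
For reference, the failing request can be reproduced outside Grafana by hitting the gateway directly. This is only a sketch: the path, tenant, query and time range are copied from the querier log further down, and depending on the nginx auth setup basic-auth credentials may be needed as well.

# reproduce the failing range query against the gateway, passing the tenant explicitly
curl -G -s \
  -H 'X-Scope-OrgID: my-org' \
  --data-urlencode 'query=kube_pod_container_resource_requests_cpu_cores{prometheus_from="v2-ch4-non-prod"}' \
  --data-urlencode 'start=1629285080' \
  --data-urlencode 'end=1629306680' \
  --data-urlencode 'step=20' \
  'http://mycortex.com/api/prom/api/v1/query_range'

For us this returns the same 500 "consistency check failed" body that shows up in the querier log below.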

We are deploying Cortex 0.6.0 using the Helm chart; our answers.yaml is below.

store_gateway:
  replicas: 1
  extraArgs:
    log.level: debug
alertmanager:
  replicas: 1
  extraArgs:
    log.level: debug
distributor:
  extraArgs:
    log.level: debug
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 512Mi
tags:
  blocks-storage-memcached: true
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
  hosts:
    - host: mycortex.com
      paths:
        - /

nginx:
  config:
    auth_orgs:
      - my-org
    client_max_body_size: 10M

query_frontend:
  config:
    max_send_msg_size: 36777216

config:
  alertmanager:
    external_url: /api/prom/alertmanager
  api:
    prometheus_http_prefix: /prometheus
  auth_enabled: true  
  ruler_storage:
    backend: azure
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  alertmanager_storage:
    backend: azure
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  blocks_storage:
    backend: azure
    tsdb:
      dir: /data/tsdb
    bucket_store:
      sync_dir: /data/tsdb-sync
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  chunk_store:
    chunk_cache_config:
      memcached:
        expiration: 1h
      memcached_client:
        timeout: 1s
  distributor:
    pool:
      health_check_ingesters: true
    shard_by_all_labels: true
  frontend:
    log_queries_longer_than: 10s
  ingester:
    lifecycler:
      final_sleep: 0s
      join_after: 0s
      num_tokens: 512
      ring:
        kvstore:
          consul:
            consistent_reads: true
            host: consul-cortex-headless:8500
            http_client_timeout: 20s
          prefix: collectors/
          store: consul
        replication_factor: 3
    max_transfer_retries: 0
  ingester_client:
    grpc_client_config:
      max_recv_msg_size: 104857600
      max_send_msg_size: 104857600
  limits:
    max_series_per_metric: 200000
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
  memberlist:
    join_members: []
  querier:
    active_query_tracker_dir: /data/cortex/querier
    query_ingesters_within: 12h
    store_gateway_addresses: cortex-store-gateway-headless:9095
  query_range:
    align_queries_with_step: true
    cache_results: true
    results_cache:
      cache:
        memcached:
          expiration: 1h
        memcached_client:
          timeout: 1s
    split_queries_by_interval: 24h
  ruler:
    enable_alertmanager_discovery: false
  schema:
    configs: []
  server:
    grpc_listen_port: 9095
    grpc_server_max_concurrent_streams: 1000 
    grpc_server_max_recv_msg_size: 104857600 
    grpc_server_max_send_msg_size: 104857600
    http_listen_port: 8080
  storage:
    engine: blocks
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0s
memcached:
  enabled: true
memcached-index-read:
  enabled: true
memcached-index-write:
  enabled: true
memcached-frontend:
  enabled: true

querier logs

level=warn ts=2021-08-18T17:11:39.988658423Z caller=logging.go:71 traceID=259d69c469ec3ec2 msg="GET /api/prom/api/v1/query_range?end=1629306680&query=kube_pod_container_resource_requests_cpu_cores+%7Bprometheus_from%3D%22v2-ch4-non-prod%22%7D&start=1629285080&step=20 (500) 63.563488ms Response: \"{\\\"status\\\":\\\"error\\\",\\\"errorType\\\":\\\"internal\\\",\\\"error\\\":\\\"expanding series: consistency check failed because some blocks were not queried: 01FDCWEWBWQ1YQPZWMPR03BTM3 01FDCWEFND98TSC29JVJTJ8V4H 01FDCNKEQXS77AD4KKQYCKGEW2 01FDCNJRDCEG0JF11R43XHQ339 01FDCNK53WA5VW164XFZ85BH5K\\\"}\" ws: false; X-Scope-Orgid: my-org; uber-trace-id: 259d69c469ec3ec2:1b45f2f3216f3712:1c59b792da14ca61:0; " 
ts=2021-08-18T17:11:39.9941494Z caller=spanlogger.go:79 org_id=my-org traceID=259d69c469ec3ec2 method=blocksStoreQuerier.selectSorted level=warn msg="unable to get store-gateway clients while retrying to fetch missing blocks" err="no store-gateway instance left after filtering out excluded instances for block 01FDCWEWBWQ1YQPZWMPR03BTM3"
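
Since the querier reports that no store-gateway instance is left for those blocks, a quick way to look at the store-gateway side is its bucket-store metrics. This is a sketch: the pod name is illustrative, and the metric names are my assumption based on Cortex's bucket-store metrics, so they may differ per version.

# port-forward the single store-gateway replica and look at block load / failure counters
kubectl port-forward pod/cortex-store-gateway-0 8080:8080 &
curl -s http://localhost:8080/metrics | grep -E 'cortex_bucket_store_blocks_loaded|cortex_bucket_store_block_load'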

Looks like it is related to the store-gateway: it hits "no space left on device" when it tries to create the index header (store-gateway log below).

level=warn ts=2021-08-18T17:17:17.61930065Z caller=bucket.go:553 org_id=my-org msg="loading block failed" elapsed=1.937918531s id=01FDBK8GVWQV6B39G1TGXGQDRY err="create index header reader: write index header: 2 errors: copy symbols: write /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: no space left on device; close binary writer for /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: write /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: no space left on device"
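
To confirm it is disk exhaustion, the volume backing the bucket_store sync_dir (/data/tsdb-sync in the config above) can be checked directly; the pod name here is illustrative.

# free space on the volume that holds the synced index headers
kubectl exec cortex-store-gateway-0 -- df -h /data/tsdb-sync
# how much the synced index headers for our tenant are actually using
kubectl exec cortex-store-gateway-0 -- du -sh /data/tsdb-sync/my-org

So the fix is presumably to give the store-gateway a larger (persistent) volume for /data/tsdb-sync so it can build the index headers, after which those blocks should become queryable and the consistency check should stop failing.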
