Closed
Description
We keep seeing this error in our querier microservice each time we try to visualize our Cortex metrics in Grafana; Grafana shows the same error as well.
"expanding series: consistency check failed because some blocks were not queried"
Grafana seems unable to query the metrics data that Cortex pushed into Azure blob storage, even though I can find the blocks that the querier/Grafana complains about in our Azure storage account.
We are deploying Cortex 0.6.0 using the Helm chart; our answers.yaml is below:
store_gateway:
  replicas: 1
  extraArgs:
    log.level: debug
alertmanager:
  replicas: 1
  extraArgs:
    log.level: debug
distributor:
  extraArgs:
    log.level: debug
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 512Mi
tags:
  blocks-storage-memcached: true
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
  hosts:
    - host: mycortex.com
      paths:
        - /
nginx:
  config:
    auth_orgs:
      - my-org
    client_max_body_size: 10M
query_frontend:
  config:
    max_send_msg_size: 36777216
config:
  alertmanager:
    external_url: /api/prom/alertmanager
  api:
    prometheus_http_prefix: /prometheus
  auth_enabled: true
  ruler_storage:
    backend: azure
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  alertmanager_storage:
    backend: azure
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  blocks_storage:
    backend: azure
    tsdb:
      dir: /data/tsdb
    bucket_store:
      sync_dir: /data/tsdb-sync
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  chunk_store:
    chunk_cache_config:
      memcached:
        expiration: 1h
      memcached_client:
        timeout: 1s
  distributor:
    pool:
      health_check_ingesters: true
    shard_by_all_labels: true
  frontend:
    log_queries_longer_than: 10s
  ingester:
    lifecycler:
      final_sleep: 0s
      join_after: 0s
      num_tokens: 512
      ring:
        kvstore:
          consul:
            consistent_reads: true
            host: consul-cortex-headless:8500
            http_client_timeout: 20s
          prefix: collectors/
          store: consul
        replication_factor: 3
    max_transfer_retries: 0
  ingester_client:
    grpc_client_config:
      max_recv_msg_size: 104857600
      max_send_msg_size: 104857600
  limits:
    max_series_per_metric: 200000
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
  memberlist:
    join_members: []
  querier:
    active_query_tracker_dir: /data/cortex/querier
    query_ingesters_within: 12h
    store_gateway_addresses: cortex-store-gateway-headless:9095
  query_range:
    align_queries_with_step: true
    cache_results: true
    results_cache:
      cache:
        memcached:
          expiration: 1h
        memcached_client:
          timeout: 1s
    split_queries_by_interval: 24h
  ruler:
    enable_alertmanager_discovery: false
  schema:
    configs: []
  server:
    grpc_listen_port: 9095
    grpc_server_max_concurrent_streams: 1000
    grpc_server_max_recv_msg_size: 104857600
    grpc_server_max_send_msg_size: 104857600
    http_listen_port: 8080
  storage:
    engine: blocks
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0s
memcached:
  enabled: true
memcached-index-read:
  enabled: true
memcached-index-write:
  enabled: true
memcached-frontend:
  enabled: true
Querier logs:
level=warn ts=2021-08-18T17:11:39.988658423Z caller=logging.go:71 traceID=259d69c469ec3ec2 msg="GET /api/prom/api/v1/query_range?end=1629306680&query=kube_pod_container_resource_requests_cpu_cores+%7Bprometheus_from%3D%22v2-ch4-non-prod%22%7D&start=1629285080&step=20 (500) 63.563488ms Response: \"{\\\"status\\\":\\\"error\\\",\\\"errorType\\\":\\\"internal\\\",\\\"error\\\":\\\"expanding series: consistency check failed because some blocks were not queried: 01FDCWEWBWQ1YQPZWMPR03BTM3 01FDCWEFND98TSC29JVJTJ8V4H 01FDCNKEQXS77AD4KKQYCKGEW2 01FDCNJRDCEG0JF11R43XHQ339 01FDCNK53WA5VW164XFZ85BH5K\\\"}\" ws: false; X-Scope-Orgid: my-org; uber-trace-id: 259d69c469ec3ec2:1b45f2f3216f3712:1c59b792da14ca61:0; "
ts=2021-08-18T17:11:39.9941494Z caller=spanlogger.go:79 org_id=my-org traceID=259d69c469ec3ec2 method=blocksStoreQuerier.selectSorted level=warn msg="unable to get store-gateway clients while retrying to fetch missing blocks" err="no store-gateway instance left after filtering out excluded instances for block 01FDCWEWBWQ1YQPZWMPR03BTM3"
It looks like this might be related to the store-gateway: it runs out of disk ("no space left on device") when it tries to create the index-header.
level=warn ts=2021-08-18T17:17:17.61930065Z caller=bucket.go:553 org_id=my-org msg="loading block failed" elapsed=1.937918531s id=01FDBK8GVWQV6B39G1TGXGQDRY err="create index header reader: write index header: 2 errors: copy symbols: write /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: no space left on device; close binary writer for /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: write /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: no space left on device"
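Per the values above, the store-gateway writes its index-headers under blocks_storage.bucket_store.sync_dir (/data/tsdb-sync), so whatever volume backs /data in the store-gateway pod has to be large enough for every tenant's blocks; once it fills up, blocks fail to load and the querier's consistency check fails exactly like this. A minimal sketch of the kind of override that could give it more room, assuming the chart exposes a persistentVolume block for the store-gateway (key names may differ between chart versions, so check the chart's default values.yaml):

store_gateway:
  replicas: 1
  extraArgs:
    log.level: debug
  # Assumed chart keys -- verify against your chart version's defaults.
  persistentVolume:
    enabled: true   # back /data with a PVC instead of a small local volume
    size: 20Gi      # sized for all tenants' index-headers in /data/tsdb-sync
    # storageClass: managed-premium   # optionally pin an Azure disk class

With enough disk the store-gateway should finish loading the blocks it currently fails on (e.g. 01FDBK8GVWQV6B39G1TGXGQDRY), and the querier's consistency check should pass again after the next bucket sync.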