Description
Thanos, Prometheus and Golang version used:
thanos, version 0.15.0 (branch: HEAD, revision: fbd14b4)
build user: circleci@b18149728583
build date: 20200907-09:47:14
go version: go1.14.2
Object Storage Provider:
Azure Blob
What happened:
The compactor is halted during compaction due to an "out-of-order chunks" error.
What you expected to happen:
To find a way to repair the faulty block.
I ran the tools bucket verify --repair command, hoping it would repair the block, but nothing happened.
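For reference, the attempt was roughly the following invocation (the bucket config path is a placeholder for my Azure objstore config, and scoping the run with --id to the block flagged in the error below is an assumption on my part; exact flags may differ between Thanos versions):

thanos tools bucket verify \
  --objstore.config-file=/etc/thanos/azure-bucket.yaml \
  --repair \
  --id=01EPQ00MQW0Z7FNN4N6PPDV8EH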
Looking at the docs, there is no indication of how to fix this error; even if it has to be done manually, there is no link to any procedure or tool that would help with that.
Only overlapping blocks are mentioned in the docs.
Looking at previous issues, the only possibly relevant one I found is #267, but it does not contain any guidance on how to fix this either.
How to reproduce it (as minimally and precisely as possible):
Not sure what went wrong here; I can provide a link to the faulty block if needed.
Full logs to relevant components:
level=info ts=2020-11-12T14:58:41.326584314Z caller=http.go:57 service=http/server component=compact msg="listening for requests and metrics" address=0.0.0.0:10902
level=info ts=2020-11-12T14:58:41.429273647Z caller=compact.go:944 msg="start sync of metas"
level=info ts=2020-11-12T14:58:41.869377033Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=542.828221ms cached=96 returned=96 partial=0
level=info ts=2020-11-12T14:58:41.910604952Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=37.305665ms cached=96 returned=96 partial=0
level=info ts=2020-11-12T14:58:42.547706875Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.118392926s cached=96 returned=66 partial=0
level=info ts=2020-11-12T14:58:42.548690014Z caller=compact.go:949 msg="start of GC"
level=info ts=2020-11-12T14:58:42.548906222Z caller=compact.go:961 msg="start of compactions"
level=info ts=2020-11-12T14:58:42.862073622Z caller=compact.go:694 group="0@{cluster=\"prod1a\", prometheus=\"monitoring/prometheus-operator-prometheus\", prometheus_replica=\"prometheus-prometheus-operator-prometheus-0\"}" groupKey=0@7015077072790822706 msg="compaction available and planned; downloading blocks" plan="[/data/compact/0@7015077072790822706/01EPPAK8WB40STY1HXQMHAMM86 /data/compact/0@7015077072790822706/01EPPHF048Z2C1S63N0585APJC /data/compact/0@7015077072790822706/01EPPRAQC9XZNZBJRR4AQDKHC1 /data/compact/0@7015077072790822706/01EPQ00MQW0Z7FNN4N6PPDV8EH]"
level=info ts=2020-11-12T14:59:41.92372104Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=49.874159ms cached=96 returned=96 partial=0
level=info ts=2020-11-12T15:00:42.156560757Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=278.060821ms cached=96 returned=96 partial=0
level=error ts=2020-11-12T15:00:46.381674304Z caller=compact.go:377 msg="critical error detected; halting" err="compaction: group 0@7015077072790822706: block with not healthy index found /data/compact/0@7015077072790822706/01EPQ00MQW0Z7FNN4N6PPDV8EH; Compaction level 1; Labels: map[cluster:prod1a prometheus:monitoring/prometheus-operator-prometheus prometheus_replica:prometheus-prometheus-operator-prometheus-0]: 3/2844960 series have an average of 1.000 out-of-order chunks: 0.333 of these are exact duplicates (in terms of data and time range)"
Anything else we need to know: