
How to fix Compactor out-of-order chunks? #3442

@lorenzo-biava

Description


Thanos, Prometheus and Golang version used:

thanos, version 0.15.0 (branch: HEAD, revision: fbd14b4)
build user: circleci@b18149728583
build date: 20200907-09:47:14
go version: go1.14.2

Object Storage Provider:

Azure Blob

What happened:

The compactor halted during compaction due to an "out-of-order chunks" error.

What you expected to happen:

Find a way to fix the block.
I ran the tools bucket verify --repair command, hoping to get something repaired, but nothing happened (see the invocation sketch below).
Looking at the docs, there is no indication of how to fix this error; even if it has to be done manually, I see no link to any procedure or tool to help with that.
Only overlaps are mentioned in the docs.
Looking at previous issues, the only possibly relevant one I've found is #267, but it does not contain any guidance on how to fix it.
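
For reference, this is roughly how I invoked the repair. A sketch only: bucket.yaml and backup-bucket.yaml are placeholders for my Azure objstore config files, and the exact flag set should be double-checked against thanos tools bucket verify --help for 0.15.0.

# Verify all blocks in the bucket and attempt automatic repair.
# bucket.yaml / backup-bucket.yaml are placeholder objstore configs, not the real file names.
thanos tools bucket verify \
  --objstore.config-file=bucket.yaml \
  --objstore-backup.config-file=backup-bucket.yaml \
  --repair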

How to reproduce it (as minimally and precisely as possible):

Not sure what went wrong there; I can provide a link to the faulty block if needed.

Full logs to relevant components:

Logs

level=info ts=2020-11-12T14:58:41.326584314Z caller=http.go:57 service=http/server component=compact msg="listening for requests and metrics" address=0.0.0.0:10902
level=info ts=2020-11-12T14:58:41.429273647Z caller=compact.go:944 msg="start sync of metas"
level=info ts=2020-11-12T14:58:41.869377033Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=542.828221ms cached=96 returned=96 partial=0
level=info ts=2020-11-12T14:58:41.910604952Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=37.305665ms cached=96 returned=96 partial=0
level=info ts=2020-11-12T14:58:42.547706875Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.118392926s cached=96 returned=66 partial=0
level=info ts=2020-11-12T14:58:42.548690014Z caller=compact.go:949 msg="start of GC"
level=info ts=2020-11-12T14:58:42.548906222Z caller=compact.go:961 msg="start of compactions"
level=info ts=2020-11-12T14:58:42.862073622Z caller=compact.go:694 group="0@{cluster=\"prod1a\", prometheus=\"monitoring/prometheus-operator-prometheus\", prometheus_replica=\"prometheus-prometheus-operator-prometheus-0\"}" groupKey=0@7015077072790822706 msg="compaction available and planned; downloading blocks" plan="[/data/compact/0@7015077072790822706/01EPPAK8WB40STY1HXQMHAMM86 /data/compact/0@7015077072790822706/01EPPHF048Z2C1S63N0585APJC /data/compact/0@7015077072790822706/01EPPRAQC9XZNZBJRR4AQDKHC1 /data/compact/0@7015077072790822706/01EPQ00MQW0Z7FNN4N6PPDV8EH]"
level=info ts=2020-11-12T14:59:41.92372104Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=49.874159ms cached=96 returned=96 partial=0
level=info ts=2020-11-12T15:00:42.156560757Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=278.060821ms cached=96 returned=96 partial=0
level=error ts=2020-11-12T15:00:46.381674304Z caller=compact.go:377 msg="critical error detected; halting" err="compaction: group 0@7015077072790822706: block with not healthy index found /data/compact/0@7015077072790822706/01EPQ00MQW0Z7FNN4N6PPDV8EH; Compaction level 1; Labels: map[cluster:prod1a prometheus:monitoring/prometheus-operator-prometheus prometheus_replica:prometheus-prometheus-operator-prometheus-0]: 3/2844960 series have an average of 1.000 out-of-order chunks: 0.333 of these are exact duplicates (in terms of data and time range)"
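
Since the halt error names a single block (01EPQ00MQW0Z7FNN4N6PPDV8EH), I assume the verifier could also be pointed at just that block via --id, something like the following (untested, same placeholder configs as above):

# Restrict verification/repair to the block named in the halt error.
thanos tools bucket verify \
  --objstore.config-file=bucket.yaml \
  --objstore-backup.config-file=backup-bucket.yaml \
  --repair \
  --id=01EPQ00MQW0Z7FNN4N6PPDV8EH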

Anything else we need to know:
