Thanos compactor: endless restart loop when 'block missing index file' #5363

Open
hawran opened this issue May 13, 2022 · 4 comments


hawran commented May 13, 2022

Thanos, Prometheus and Golang version used:
thanos: 0.25.2
prometheus: 2.35.0
golang: 1.17.5

Object Storage Provider:
Thanos/s3

What happened:
This issue reopens #1199, so I will not elaborate further: the compactor enters an endless restart loop when a block is missing its index file.

What you expected to happen:
No restart loop.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Logs

level=info ts=2022-05-10T21:30:12.151102784Z caller=http.go:84 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 0@16580404985295954394: gather index issues for block data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C: open index file: try lock file: open data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C/index: no such file or directory"

level=warn ts=2022-05-10T21:30:12.150996864Z caller=intrumentation.go:67 msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 0@16580404985295954394: gather index issues for block data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C: open index file: try lock file: open data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C/index: no such file or directory"

level=error ts=2022-05-10T21:30:12.152881416Z caller=main.go:158 err="group 0@16580404985295954394: gather index issues for block data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C: open index file: try lock file: open data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C/index: no such file or directory\ncompaction\nmain.runCompact.func7\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:431\nmain.runCompact.func8.1\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:485\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\tgithub.com/thanos-io/thanos/pkg/runutil/runutil.go:75\nmain.runCompact.func8\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:484\ngithub.com/oklog/run.(*Group).Run.func1\n\tgithub.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\truntime/asm_amd64.s:1581\nerror executing compaction\nmain.runCompact.func8.1\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:512\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\tgithub.com/thanos-io/thanos/pkg/runutil/runutil.go:75\nmain.runCompact.func8\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:484\ngithub.com/oklog/run.(*Group).Run.func1\n\tgithub.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\truntime/asm_amd64.s:1581\ncompact command failed\nmain.main\n\tgithub.com/thanos-io/thanos/cmd/thanos/main.go:158\nruntime.main\n\truntime/proc.go:255\nruntime.goexit\n\truntime/asm_amd64.s:1581"

level=info ts=2022-05-10T21:30:12.152562807Z caller=intrumentation.go:81 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 0@16580404985295954394: gather index issues for block data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C: open index file: try lock file: open data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C/index: no such file or directory"

level=info ts=2022-05-10T21:30:12.152458354Z caller=http.go:103 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 0@16580404985295954394: gather index issues for block data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C: open index file: try lock file: open data/compact/0@16580404985295954394/01G2Q0W559A6HC3F2FBMX2H70C/index: no such file or directory"
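
To find the affected block in logs like these, the error line can be matched mechanically. The following is a minimal sketch, not part of Thanos: the function name `find_missing_index` is hypothetical, and the regular expression assumes the `data/compact/<group>/<ULID>/index` layout seen in the lines above.

```python
import re

# Matches the path reported by the failing "open .../index" call.
# The ULID character class is Crockford base32 (no I, L, O, U).
MISSING_INDEX = re.compile(
    r"open (?P<dir>\S+?/(?P<ulid>[0-9A-HJKMNP-TV-Z]{26}))/index: "
    r"no such file or directory"
)


def find_missing_index(log_line):
    """Return (block_ulid, local_block_dir) if the line reports a missing index, else None."""
    match = MISSING_INDEX.search(log_line)
    if match is None:
        return None
    return match.group("ulid"), match.group("dir")
```

Feeding each compactor log line through this function yields the block ULID to inspect in the bucket and the local directory the compactor expected to read.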

Anything else we need to know:

@zaeemarshad

Seeing the same issue with Thanos 0.26, but with GCS. Unsure how it got into this state.


stale bot commented Sep 21, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Sep 21, 2022

hawran commented May 15, 2023

Hi, we're still experiencing this problem:

level=info ts=2023-05-15T08:05:11.744989977Z caller=compact.go:1005 group="0@{cluster_name=\"SERVER\", locality=\"LOC\", replica=\"prometheus-SERVER-0\"}" groupKey=0@4226472687758518923 msg="compaction available and planned; downloading blocks" plan="[01H0DT4J1AR2AS4YBS8JZC6ZPD (min time: 1684080000116, max time: 1684087200000) 01H0E1099D2JMJ3MW4XKXXRG4F (min time: 1684087200116, max time: 1684094400000) 01H0E7W0HJ9PAZVYG97KP6WARX (min time: 1684094400116, max time: 1684101600000) 01H0EEQQSJ7DNFCCM35MV6PH31 (min time: 1684101600116, max time: 1684108800000)]"
level=warn ts=2023-05-15T08:05:14.632819592Z caller=intrumentation.go:67 msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 0@4226472687758518923: gather index issues for block data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31: open index file: try lock file: open data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31/index: no such file or directory"
level=info ts=2023-05-15T08:05:14.63288286Z caller=http.go:91 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 0@4226472687758518923: gather index issues for block data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31: open index file: try lock file: open data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31/index: no such file or directory"
level=info ts=2023-05-15T08:05:14.633208803Z caller=http.go:110 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 0@4226472687758518923: gather index issues for block data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31: open index file: try lock file: open data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31/index: no such file or directory"
level=info ts=2023-05-15T08:05:14.633299157Z caller=intrumentation.go:81 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 0@4226472687758518923: gather index issues for block data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31: open index file: try lock file: open data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31/index: no such file or directory"
level=error ts=2023-05-15T08:05:14.633586832Z caller=main.go:161 err="group 0@4226472687758518923: gather index issues for block data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31: open index file: try lock file: open data/compact/0@4226472687758518923/01H0EEQQSJ7DNFCCM35MV6PH31/index: no such file or directory\ncompaction\nmain.runCompact.func7\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:423\nmain.runCompact.func8.1\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:477\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\tgithub.com/thanos-io/thanos/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:476\ngithub.com/oklog/run.(*Group).Run.func1\n\tgithub.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\truntime/asm_amd64.s:1594\nerror executing compaction\nmain.runCompact.func8.1\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:504\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\tgithub.com/thanos-io/thanos/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\tgithub.com/thanos-io/thanos/cmd/thanos/compact.go:476\ngithub.com/oklog/run.(*Group).Run.func1\n\tgithub.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\truntime/asm_amd64.s:1594\ncompact command failed\nmain.main\n\tgithub.com/thanos-io/thanos/cmd/thanos/main.go:161\nruntime.main\n\truntime/proc.go:250\nruntime.goexit\n\truntime/asm_amd64.s:1594"

thanos: 0.31.0
prometheus: 2.43.1
golang: 1.19.8

Why doesn't the compactor simply halt in this case, setting thanos_compact_halted = 1, so that we can raise an alert instead of restarting endlessly?
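
For context, this is roughly what the alerting side could look like if the compactor did halt rather than exit. It is a minimal sketch, assuming the metrics have already been fetched as Prometheus text-format output with no trailing timestamps; the function name `compactor_halted` is hypothetical.

```python
def compactor_halted(metrics_text):
    """Return True if thanos_compact_halted is 1 in Prometheus text-format metrics."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # Assumes "name[{labels}] value" with no timestamp after the value.
        name, _, value = line.rpartition(" ")
        if name.split("{", 1)[0] == "thanos_compact_halted":
            return float(value) == 1.0
    return False
```

With the restart loop described here, the process exits before the gauge can be scraped at 1, so a check like this never fires.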

stale bot removed the stale label May 15, 2023
@muhammadn

Apparently the block is never downloaded to the local container: after downloading meta.json, the compactor cancels the context and does nothing else.

I ran some debugging to see what the error was, and this came out (note the "context canceled"):

ts=2023-07-21T05:47:09.356685387Z caller=block.go:56 group="0@{cluster=\"test\", replica=\"rhb"}" groupKey=0@10090106770770758544 ZAIHAN="ERROR DOWNLOADING FILES" error="get file 01H5SK5PZ8BZ7CN11QV6EX9SF0/meta.json: context canceled"
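
A download that is canceled partway can leave a local block directory with only some of its files, which would match the missing `index` in the errors above. A quick way to confirm this on disk is to check for the entries a TSDB block normally contains (`meta.json`, `index`, and a non-empty `chunks/` directory). This is a minimal sketch; the function name `missing_block_files` is hypothetical and not part of Thanos.

```python
from pathlib import Path


def missing_block_files(block_dir):
    """List the expected TSDB block entries that are absent from block_dir."""
    block = Path(block_dir)
    missing = []
    if not (block / "meta.json").is_file():
        missing.append("meta.json")
    if not (block / "index").is_file():
        missing.append("index")
    chunks = block / "chunks"
    # An empty chunks/ directory is treated as missing too.
    if not chunks.is_dir() or not any(chunks.iterdir()):
        missing.append("chunks")
    return missing
```

Running it against the local block directory named in the error (for example, a path under `data/compact/`) shows which files never arrived.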
