Thanos, Prometheus and Golang version used: 0.32.2
Object Storage Provider: Azure
What happened: Compactor fails to downsample a block with the following error:
ts=2023-09-04T08:56:35.688668329Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF to window 3600000: downsample aggregate block, series: 1899: invalid size"
What you expected to happen: The block can be downsampled from 5m to 60m resolution
How to reproduce it (as minimally and precisely as possible): This is quite complicated. I will try and describe what happened as precisely as possible:
1. Run Thanos for more than a year without deduplication in the compactors. This leaves many individual streams in the bucket, each with its own `receive_replica` external label and the same `tenant_id` external label.
2. Enable compactor deduplication by setting `--deduplication.replica-label=receive_replica`.
3. Enable the penalty algorithm (`--deduplication.func=penalty`), because compaction of already downsampled blocks was failing, presumably because such blocks violate the assumptions of the one-to-one dedupe algorithm.
4. Leave it running for 2 days. Everything seemed to be working.
5. Downsampling of the first 5m block that was formed from already downsampled blocks fails with the given error.
Full logs to relevant components:
Logs
ts=2023-09-04T08:56:33.626230473Z caller=compact.go:1419 level=info msg="start of GC"
ts=2023-09-04T08:56:33.626417909Z caller=compact.go:1442 level=info msg="start of compactions"
ts=2023-09-04T08:56:33.626660602Z caller=compact.go:1478 level=info msg="compaction iterations done"
ts=2023-09-04T08:56:33.626689005Z caller=compact.go:434 level=info msg="start first pass of downsampling"
ts=2023-09-04T08:56:33.729098296Z caller=fetcher.go:487 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=102.795866ms duration_ms=102 cached=146 returned=35 partial=0
ts=2023-09-04T08:56:33.829870428Z caller=fetcher.go:487 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=100.653163ms duration_ms=100 cached=146 returned=35 partial=0
ts=2023-09-04T08:56:35.016484486Z caller=downsample.go:362 level=info msg="downloaded block" id=01H9FB4YMTT0BNKVRVQKHNRXDF duration=1.185885634s duration_ms=1185
ts=2023-09-04T08:56:35.688252109Z caller=streamed_block_writer.go:178 level=info msg="finalized downsampled block" mint=1692230401392 maxt=1693440000000 ulid=01H9FPGWFVHQE2CXF8G5DHCFNP resolution=3600000
ts=2023-09-04T08:56:35.688668329Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF to window 3600000: downsample aggregate block, series: 1899: invalid size"
ts=2023-09-04T08:56:35.688691523Z caller=http.go:91 level=info service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF to window 3600000: downsample aggregate block, series: 1899: invalid size"
ts=2023-09-04T08:56:35.688735665Z caller=http.go:110 level=info service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF to window 3600000: downsample aggregate block, series: 1899: invalid size"
ts=2023-09-04T08:56:35.688751953Z caller=intrumentation.go:81 level=info msg="changing probe status" status=not-healthy reason="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF to window 3600000: downsample aggregate block, series: 1899: invalid size"
ts=2023-09-04T08:56:35.688861491Z caller=main.go:161 level=error err="downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF to window 3600000: downsample aggregate block, series: 1899: invalid size\nfirst pass of downsampling failed\nmain.runCompact.func7\n\t/app/cmd/thanos/compact.go:445\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:481\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:480\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nerror executing compaction\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:508\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:480\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\ncompact command failed\nmain.main\n\t/app/cmd/thanos/main.go:161\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"
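For anyone triaging similar failures, the block ID, target window, and series number can be pulled out of the `reason` (or `err`) field of these log lines. The following is a minimal Python sketch, not part of Thanos: the helper names `parse_logfmt` and `downsample_failure` are hypothetical, and the regular expression assumes the error text has exactly the shape shown in the logs above.

```python
import re
import shlex

# Sample line copied from the compactor logs above.
LINE = (
    'ts=2023-09-04T08:56:35.688668329Z caller=intrumentation.go:67 level=warn '
    'msg="changing probe status" status=not-ready '
    'reason="error executing compaction: first pass of downsampling failed: '
    'downsampling to 60 min: downsample block 01H9FB4YMTT0BNKVRVQKHNRXDF '
    'to window 3600000: downsample aggregate block, series: 1899: invalid size"'
)

# Assumption: the error message always follows the wording seen in this issue.
FAILURE = re.compile(
    r"downsample block (?P<block>[0-9A-Z]{26}) to window (?P<window_ms>\d+): "
    r".*series: (?P<series>\d+): (?P<cause>[^\\\n]+)"
)


def parse_logfmt(line):
    """Split a key=value log line into a dict (hypothetical helper)."""
    fields = {}
    # shlex keeps a double-quoted value together and strips the quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def downsample_failure(line):
    """Return details of a downsampling failure, or None if the line has none."""
    fields = parse_logfmt(line)
    text = fields.get("reason") or fields.get("err") or ""
    match = FAILURE.search(text)
    if match is None:
        return None
    return {
        "block": match.group("block"),
        "window_ms": int(match.group("window_ms")),
        "series": int(match.group("series")),
        "cause": match.group("cause").strip(),
    }


if __name__ == "__main__":
    print(downsample_failure(LINE))
```

Running this over a full compactor log and collecting the non-`None` results gives a quick list of the blocks that need inspection.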
Anything else we need to know: