Compactor: error executing compaction: invalid size #6380

Open
oleksiytsyban opened this issue May 18, 2023 · 2 comments

oleksiytsyban commented May 18, 2023

Thanos, Prometheus and Golang version used:
Deployed via Helm.
docker.io/bitnami/thanos:0.31.0-scratch-r1
quay.io/prometheus/prometheus:v2.43.0-stringlabels

Object Storage Provider: S3

What happened: The compactor crashes while attempting to compact a recently created block. This began soon after enabling vertical compaction.
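For context, vertical compaction on the compactor is turned on with flags roughly like the following (a sketch only; the replica label and paths are illustrative, not taken from this deployment):

# Sketch of the compactor flags involved in vertical compaction
# (illustrative values; not the exact Helm-rendered args of this deployment).
thanos compact \
  --data-dir=/data \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --compact.enable-vertical-compaction \
  --deduplication.replica-label=prometheus_replica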
The block details:

01H0R4SZD0FRV0WYF8DJT2FWHC block:

Start Time: April 28, 2023 6:00 PM
End Time: May 10, 2023 6:00 PM
Duration: 12 days
Series: 10637711
Samples: 3787156705
Chunks: 50901130
Resolution: 300000 (5m)
Level: 6
Source: compactor

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

level=info ts=2023-05-18T19:25:13.725920093Z caller=compact.go:460 msg="compact blocks" count=6 mint=1682726400001 maxt=1683763200000 ulid=01H0R4SZD0FRV0WYF8DJT2FWHC sources="[01H0PSNM9MRHAV20CKQFK6MSPZ 01H0PWZW4K5X4YWQJJ8MJRSP41 01H0PXN7H0J09CVQC9J793CSAW 01H0PYC0QQ4A7M4PWCXW3CW7X3 01H0PZ1790GD3AGQ4HGXRB639B 01H0PZPS6N1FPDZW086CPF7CEY]" duration=6m22.770877172s
level=info ts=2023-05-18T19:25:13.874672888Z caller=compact.go:1097 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="compacted blocks" new=01H0R4SZD0FRV0WYF8DJT2FWHC blocks="[/data/compact/300000@10792396202523153392/01H0PSNM9MRHAV20CKQFK6MSPZ /data/compact/300000@10792396202523153392/01H0PWZW4K5X4YWQJJ8MJRSP41 /data/compact/300000@10792396202523153392/01H0PXN7H0J09CVQC9J793CSAW /data/compact/300000@10792396202523153392/01H0PYC0QQ4A7M4PWCXW3CW7X3 /data/compact/300000@10792396202523153392/01H0PZ1790GD3AGQ4HGXRB639B /data/compact/300000@10792396202523153392/01H0PZPS6N1FPDZW086CPF7CEY]" duration=6m22.919629574s duration_ms=382919 overlapping_blocks=false

level=info ts=2023-05-18T19:28:16.8484575Z caller=compact.go:1169 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="marking compacted block for deletion" old_block=01H0PWZW4K5X4YWQJJ8MJRSP41
level=warn ts=2023-05-18T19:28:16.865935134Z caller=block.go:185 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01H0PWZW4K5X4YWQJJ8MJRSP41/deletion-mark.json already exists in bucket"
level=info ts=2023-05-18T19:28:16.904473356Z caller=compact.go:1169 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="marking compacted block for deletion" old_block=01H0PXN7H0J09CVQC9J793CSAW
level=warn ts=2023-05-18T19:28:16.91392193Z caller=block.go:185 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01H0PXN7H0J09CVQC9J793CSAW/deletion-mark.json already exists in bucket"
level=info ts=2023-05-18T19:28:16.951007978Z caller=compact.go:1169 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="marking compacted block for deletion" old_block=01H0PYC0QQ4A7M4PWCXW3CW7X3
level=warn ts=2023-05-18T19:28:16.961776405Z caller=block.go:185 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01H0PYC0QQ4A7M4PWCXW3CW7X3/deletion-mark.json already exists in bucket"
level=info ts=2023-05-18T19:28:16.998395388Z caller=compact.go:1169 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="marking compacted block for deletion" old_block=01H0PZ1790GD3AGQ4HGXRB639B
level=warn ts=2023-05-18T19:28:17.009230178Z caller=block.go:185 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01H0PZ1790GD3AGQ4HGXRB639B/deletion-mark.json already exists in bucket"
level=info ts=2023-05-18T19:28:17.060040144Z caller=compact.go:1169 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="marking compacted block for deletion" old_block=01H0PZPS6N1FPDZW086CPF7CEY
level=warn ts=2023-05-18T19:28:17.07085462Z caller=block.go:185 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01H0PZPS6N1FPDZW086CPF7CEY/deletion-mark.json already exists in bucket"
level=info ts=2023-05-18T19:28:17.070882193Z caller=compact.go:1156 group="300000@{cluster_location="dca06", cluster_name="cops-eks01-dca06", environment="ai-prod", group_name="dca06-c01-acc00", object_name="dca06-c01-acc00", prometheus="monitoring/cprc-kube-prometheus-stack-prometheus", rc_layer="c01", source_id="ai-dca06"}" groupKey=300000@10792396202523153392 msg="finished compacting blocks" result_block=01H0R4SZD0FRV0WYF8DJT2FWHC source_blocks="[/data/compact/300000@10792396202523153392/01H0PSNM9MRHAV20CKQFK6MSPZ /data/compact/300000@10792396202523153392/01H0PWZW4K5X4YWQJJ8MJRSP41 /data/compact/300000@10792396202523153392/01H0PXN7H0J09CVQC9J793CSAW /data/compact/300000@10792396202523153392/01H0PYC0QQ4A7M4PWCXW3CW7X3 /data/compact/300000@10792396202523153392/01H0PZ1790GD3AGQ4HGXRB639B /data/compact/300000@10792396202523153392/01H0PZPS6N1FPDZW086CPF7CEY]" duration=16m2.001473681s duration_ms=962001

level=info ts=2023-05-18T19:33:12.148405863Z caller=downsample.go:356 msg="downloaded block" id=01H0R4SZD0FRV0WYF8DJT2FWHC duration=4m50.253160486s duration_ms=290253
level=info ts=2023-05-18T19:33:16.084442318Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.888808102s duration_ms=1888 cached=1871 returned=1871 partial=0
level=info ts=2023-05-18T19:33:58.249950384Z caller=streamed_block_writer.go:178 msg="finalized downsampled block" mint=1682726400001 maxt=1683763200000 ulid=01H0R5NM2NVS95B1J5YCT0Q0ER resolution=3600000
level=warn ts=2023-05-18T19:33:58.257884095Z caller=intrumentation.go:67 msg="changing probe status" status=not-ready reason="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H0R4SZD0FRV0WYF8DJT2FWHC to window 3600000: downsample aggregate block, series: 1536060: invalid size"
level=info ts=2023-05-18T19:33:58.257924645Z caller=http.go:91 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H0R4SZD0FRV0WYF8DJT2FWHC to window 3600000: downsample aggregate block, series: 1536060: invalid size"
level=info ts=2023-05-18T19:33:58.258041902Z caller=http.go:110 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H0R4SZD0FRV0WYF8DJT2FWHC to window 3600000: downsample aggregate block, series: 1536060: invalid size"
level=info ts=2023-05-18T19:33:58.258060949Z caller=intrumentation.go:81 msg="changing probe status" status=not-healthy reason="error executing compaction: first pass of downsampling failed: downsampling to 60 min: downsample block 01H0R4SZD0FRV0WYF8DJT2FWHC to window 3600000: downsample aggregate block, series: 1536060: invalid size"
level=error ts=2023-05-18T19:33:58.258171883Z caller=main.go:161 err="downsampling to 60 min: downsample block 01H0R4SZD0FRV0WYF8DJT2FWHC to window 3600000: downsample aggregate block, series: 1536060: invalid size\nfirst pass of downsampling failed\nmain.runCompact.func7\n\t/app/cmd/thanos/compact.go:441\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:477\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:476\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nerror executing compaction\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:504\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:476\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\ncompact command failed\nmain.main\n\t/app/cmd/thanos/main.go:161\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:159
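One way to stop the compactor crash-looping on this block while the bug is investigated is to mark the block so downsampling skips it. This is only a sketch: it relies on the no-downsample marker, which may require a Thanos release newer than 0.31.0, and the object store config path and details text below are illustrative:

# Illustrative workaround sketch: mark the block so the compactor skips
# downsampling it (needs a Thanos version that supports no-downsample-mark.json).
thanos tools bucket mark \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --id=01H0R4SZD0FRV0WYF8DJT2FWHC \
  --marker=no-downsample-mark.json \
  --details="downsampling fails with 'invalid size' (#6380)"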

Anything else we need to know:

yeya24 (Contributor) commented May 26, 2023

Hi @oleksiytsyban, thanks for the report. Did you enable native histograms with your Prometheus? Just want to double-check whether this was caused by something new.
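For reference, native histograms are an experimental Prometheus feature and are only active if Prometheus was started with the corresponding feature flag, roughly:

# Native histograms are off unless explicitly enabled (Prometheus >= 2.40);
# the config path below is illustrative.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --enable-feature=native-histograms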

oleksiytsyban (Author) replied:

> Hi @oleksiytsyban, thanks for the report. Did you enable native histograms with your Prometheus?

Hi. No, we did not.
