sidecar: Do not crash when Object Storage is not accessible #7585

Open
ahurtaud opened this issue Aug 2, 2024 · 2 comments

Comments

@ahurtaud
Contributor

ahurtaud commented Aug 2, 2024

Is your proposal related to a problem?

This is also related to the objstore project.

We had a network outage that made our storage endpoint unreachable (a DNS failure).
When the sidecar restarted, it went into a crash loop with:

ts=2024-08-02T08:22:02.642324362Z caller=main.go:145 level=error err="
Get \"https://<redacted>.privatelink.blob.core.windows.net/<container>?restype=container\": dial tcp: lookup <redacted>.privatelink.blob.core.windows.net on xx.xx.xx.xx:53: no such host\ncreate AZURE client\ngithub.com/thanos-io/objstore/client.NewBucket
	/go/pkg/mod/github.com/thanos-io/objstore@v0.0.0-20240309075357-e8336a5fd5f3/client/factory.go:90\nmain.runSidecar
	/app/cmd/thanos/sidecar.go:327\nmain.registerSidecar.func1
	/app/cmd/thanos/sidecar.go:104\nmain.main
	/app/cmd/thanos/main.go:143\nruntime.main
	/usr/local/go/src/runtime/proc.go:267\nruntime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650\npreparing sidecar command failed\nmain.main
	/app/cmd/thanos/main.go:145\nruntime.main
	/usr/local/go/src/runtime/proc.go:267\nruntime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650"

Since we use object storage for long-term metrics only, we would like the sidecar to keep serving the Prometheus read path instead of crashing.

Describe the solution you'd like

Could this error become a warning? We would then alert on a failure metric instead of having the sidecar crash.
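
To make the request concrete, here is a minimal sketch of the "warn instead of crash" behaviour. It is not the actual Thanos code: the `sidecar` type, `startSidecar`, the `newBucket` callback and the `bucketInitFailures` field are all hypothetical names, and the plain integer only stands in for a real metric one could alert on.

```go
package main

import (
	"errors"
	"log"
)

// errNoSuchHost imitates the DNS failure from the log above (illustrative only).
var errNoSuchHost = errors.New("dial tcp: lookup: no such host")

// sidecar is a hypothetical, stripped-down model of the process state.
type sidecar struct {
	uploadsEnabled     bool
	bucketInitFailures int // stand-in for a failure metric to alert on
}

// startSidecar treats a failed object storage setup as a warning instead of
// a fatal error, so the read path can still be served.
func startSidecar(newBucket func() error) *sidecar {
	sc := &sidecar{}
	if err := newBucket(); err != nil {
		// Log and count the failure, but do not abort startup.
		log.Printf("level=warn msg=\"object storage unavailable, uploads disabled\" err=%q", err)
		sc.bucketInitFailures++
		return sc
	}
	sc.uploadsEnabled = true
	return sc
}
```

With this shape, `startSidecar(func() error { return errNoSuchHost })` returns a sidecar with uploads disabled and one recorded failure, rather than terminating the process.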

Additional context

Thanos v0.35.0
ObjStore Azure

@yeya24
Contributor

yeya24 commented Aug 3, 2024

I think it is a valid issue. Help wanted.

amaury-d added a commit to amaury-d/thanos that referenced this issue Sep 6, 2024
…nos-io#7585

The goal is to allow the sidecar to start and continue serving the Prometheus read path if objstore is not available at startup.
Bucket creation will be attempted again on the next upload.

This commit adds a new counter to alert on bucket initialization failures:
`thanos_sidecar_upload_failures_total{reason="bucket_initialization"}`

Signed-off-by: Amaury Decrême <amaury.decreme@gmail.com>
amaury-d added a commit to amaury-d/thanos that referenced this issue Sep 9, 2024
…nos-io#7585

The goal is to allow the sidecar to start and continue serving the Prometheus read path if objstore is not available at startup.
Bucket creation will be attempted again on the next upload.

This commit adds a new counter to alert on bucket initialization failures:
`thanos_sidecar_upload_failures_total{reason="bucket_initialization"}`

Signed-off-by: Amaury Decrême <amaury.decreme@gmail.com>
amaury-d added a commit to amaury-d/thanos that referenced this issue Sep 10, 2024
…nos-io#7585

This commit allows sidecar to continue to serve prometheus read path if objstore is not available at startup.
Bucket creation will be attempted again on next upload.

This commit brings a new metric to alert in case of bucket initialization crash: thanos_sidecar_shipper_up

Signed-off-by: Amaury Decrême <amaury.decreme@gmail.com>
@amaury-d

amaury-d commented Sep 19, 2024

After a discussion with @MichaHoffmann, we came to realise that sidecar crashing can be useful for some users that rely on it to "detect" when something is wrong (like an uninitialised S3 bucket).

While it was suggested to add a metric to alert on the situation, such situations could go unnoticed.

I suggest letting the sidecar crash by default and adding an option that allows it to keep serving the Prometheus read path even if the objstore is not working.

amaury-d added a commit to amaury-d/thanos that referenced this issue Sep 24, 2024
…7585

Signed-off-by: Amaury Decrême <amaury.decreme@gmail.com>
amaury-d added a commit to amaury-d/thanos that referenced this issue Sep 24, 2024
This commit allows sidecar to continue to serve prometheus read path if objstore is not available at startup.
Bucket creation will be attempted again on next upload.

This option is disabled by default and can be enabled by passing
argument "--shipper.retry-init".

This commit also brings a new metric to alert in case of bucket initialization crash: thanos_sidecar_shipper_up

Signed-off-by: Amaury Decrême <amaury.decreme@gmail.com>
amaury-d added a commit to amaury-d/thanos that referenced this issue Sep 26, 2024
This commit allows sidecar to continue to serve prometheus read path if objstore is not available at startup.
Bucket creation will be attempted again on next upload.

This option is disabled by default and can be enabled by passing
argument "--shipper.retry-init".

This commit also brings a new metric to alert in case of bucket initialization crash: thanos_sidecar_shipper_up

Signed-off-by: Amaury Decrême <amaury.decreme@gmail.com>
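
The behaviour described in these commit messages can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the referenced commits: `Bucket`, `memBucket`, `lazyShipper`, `newShipper` and `flakyFactory` are hypothetical names, the `retryInit` parameter only mirrors the idea behind `--shipper.retry-init`, and the `up` field imitates a gauge such as `thanos_sidecar_shipper_up` without using a metrics library.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Bucket is a hypothetical stand-in for an object storage client.
type Bucket interface {
	Upload(name string) error
}

// memBucket is an in-memory Bucket used only for illustration.
type memBucket struct{ objects []string }

func (m *memBucket) Upload(name string) error {
	m.objects = append(m.objects, name)
	return nil
}

// lazyShipper keeps the bucket factory so that a failed initialization
// can be retried on the next upload.
type lazyShipper struct {
	mu        sync.Mutex
	newBucket func() (Bucket, error)
	bucket    Bucket
	up        int // gauge-like: 1 once the bucket is initialized, else 0
}

// newShipper tries to create the bucket once. With retryInit=false a failure
// is returned to the caller (crash by default); with retryInit=true it is
// tolerated and retried on later uploads.
func newShipper(newBucket func() (Bucket, error), retryInit bool) (*lazyShipper, error) {
	s := &lazyShipper{newBucket: newBucket}
	s.mu.Lock()
	err := s.ensureBucket()
	s.mu.Unlock()
	if err != nil && !retryInit {
		return nil, err
	}
	return s, nil
}

// ensureBucket creates the bucket if needed and updates up. Callers hold mu.
func (s *lazyShipper) ensureBucket() error {
	if s.bucket != nil {
		return nil
	}
	b, err := s.newBucket()
	if err != nil {
		s.up = 0
		return fmt.Errorf("bucket initialization: %w", err)
	}
	s.bucket, s.up = b, 1
	return nil
}

// Sync retries bucket creation if it failed earlier, then uploads the block.
func (s *lazyShipper) Sync(block string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err := s.ensureBucket(); err != nil {
		return err
	}
	return s.bucket.Upload(block)
}

// Up reports the gauge-like state.
func (s *lazyShipper) Up() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.up
}

// flakyFactory fails the first `failures` calls, then succeeds.
func flakyFactory(failures int) func() (Bucket, error) {
	calls := 0
	return func() (Bucket, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("no such host")
		}
		return &memBucket{}, nil
	}
}
```

For example, `newShipper(flakyFactory(1), true)` returns a shipper whose `Up()` is 0, and the first `Sync` call then initializes the bucket and flips it to 1. The same factory with `retryInit` set to `false` returns the initialization error, which keeps the crash-by-default behaviour for users who rely on it to detect misconfiguration.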