receive: Thanos receive hangs on SIGINT #959

Closed
FUSAKLA opened this issue Mar 22, 2019 · 3 comments

@FUSAKLA
Member

FUSAKLA commented Mar 22, 2019

Thanos, Prometheus and Golang version used
Thanos 3.2.0 (8e0e4dc master)

What happened
I started thanos receive and it came up fine, but when I hit Ctrl+C to send SIGINT to the process, it catches the signal and logs that it is exiting, then hangs forever. The only way I managed to kill it was by sending SIGKILL.

What you expected to happen
Thanos receive exits cleanly.

How to reproduce it (as minimally and precisely as possible):

./thanos receive
...
# hit ctrl + c

Full logs of relevant components

$ ./thanos receive --log.level=debug
level=warn ts=2019-03-22T05:38:21.199649677Z caller=receive.go:70 component=receive msg="setting up receive; the Thanos receive component is EXPERIMENTAL, it may break significantly without notice"
level=debug ts=2019-03-22T05:38:21.199763843Z caller=receive.go:108 component=receive msg="setting up endpoint readiness"
level=debug ts=2019-03-22T05:38:21.199797018Z caller=receive.go:135 component=receive msg="setting up tsdb"
level=debug ts=2019-03-22T05:38:21.199831265Z caller=receive.go:168 component=receive msg="setting up metric http listen-group"
level=debug ts=2019-03-22T05:38:21.199987382Z caller=receive.go:173 component=receive msg="setting up grpc server"
level=debug ts=2019-03-22T05:38:21.200026412Z caller=receive.go:213 component=receive msg="setting up receive http handler"
level=info ts=2019-03-22T05:38:21.200056478Z caller=receive.go:228 component=receive msg="starting receiver"
level=info ts=2019-03-22T05:38:21.200219897Z caller=receive.go:141 component=receive msg="starting TSDB ..."
level=info ts=2019-03-22T05:38:21.200804198Z caller=main.go:309 component=receive msg="Listening for metrics" address=0.0.0.0:10902
level=info ts=2019-03-22T05:38:21.20094471Z caller=handler.go:137 component=receive component=receive-handler msg="Start listening for connections" address=0.0.0.0:19291
level=info ts=2019-03-22T05:38:21.212942345Z caller=receive.go:151 component=receive msg="tsdb started"
level=info ts=2019-03-22T05:38:21.215812628Z caller=receive.go:125 component=receive msg="server is ready to receive web requests."
level=info ts=2019-03-22T05:38:21.21597191Z caller=main.go:257 component=receive msg="disabled TLS, key and cert must be set to enable"
level=info ts=2019-03-22T05:38:21.216062437Z caller=receive.go:201 component=receive msg="listening for StoreAPI gRPC" address=0.0.0.0:10901


^Clevel=info ts=2019-03-22T05:46:59.560571416Z caller=main.go:193 msg="caught signal. Exiting." signal=interrupt
level=warn ts=2019-03-22T05:46:59.561985668Z caller=runutil.go:107 component=receive msg="detected close error" err="store gRPC listener: close tcp [::]:10901: use of closed network connection"
Killed

Environment:

  • OS:
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel:
Linux fusakla 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I can also reproduce it in the upstream debian:stretch Docker image.

@alexdepalex

I'm having the same issue when stopping compact from systemd. SIGTERM is caught, but the process doesn't exit and just hangs. Systemd times out after 120 seconds and I need SIGKILL to get rid of the process, which I'd rather avoid. I'm running thanos-0.3.2.linux-amd64 on SLES12SP2.

Mar 28 15:08:39 node1 thanos[30792]: level=info ts=2019-03-28T14:08:39.722946127Z caller=main.go:192 msg="caught signal. Exiting." signal=terminated
Mar 28 15:10:39 node1 systemd[1]: thanos-compact.service: State 'stop-sigterm' timed out. Skipping SIGKILL.

@FUSAKLA
Member Author

FUSAKLA commented Mar 28, 2019

I'm fixing this as a side effect in PR #656.

@squat
Member

squat commented Sep 30, 2019

This was fixed by #1231.

@brancz brancz closed this as completed Sep 30, 2019
5 participants