Storage node doesn't stop at the shutdown stage #1270
Comments
Okay, it was easier to reproduce than I thought: https://gist.github.com/alexvanin/f856d22de55e9e0f1b683bfbfb33e46b
I have an issue with this approach: it is not obvious how we can stop So if we want to keep graceful shutdown, I suggest having a timeout here.
done := make(chan struct{})
go func() {
	srv.GracefulStop()
	close(done)
}()
select {
case <-done:
case <-time.After(30 * time.Second):
	srv.Stop()
}
The problem is that we can't abort the stream on timeout. This approach also introduces overhead: a goroutine, a timer, and a select per message. So
Haven't heard about app termination issues since then. We can close it until new issues appear. /cc @fyrchik
GracefulStop() may be blocked until all server-side streams are finished. There is no control over such streams yet, so application may be frozen in shutdown stage. Naive solution is to add timeout for GracefulStop(). At this point healthy connection will be finished and unhealthy connections will be terminated by Stop(). Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Expected Behavior
When the neo-go node closes the connection with the storage node, the application changes its state to SHUTTING_DOWN and then the process exits.
Current Behavior
The application freezes at the SHUTTING_DOWN stage and the process never exits.
It seems that the gRPC server for the NeoFS API endpoint cannot be closed. We see the "stopping gRPC server..." log message, but there is no "gRPC server stopped successfully" message.
In the goroutine dump we see that grpc.GracefulStop() has been blocked on the mutex, because it waits for all remaining RPC methods to finish.
This never happens, though, because there are goroutines frozen in Send().
We did introduce timeouts for requests, but there is no timeout for sending responses. This timeout cannot be implemented on the neofs-api-go side straight away, because we use the gRPC generated code directly to send server responses.
Possible Solution
Call grpc.GracefulStop() with a hardcoded timeout so that the app closes anyway after a while.
Steps to Reproduce (for bugs)
Sometimes it is reproduced in the main chain when remote neo-go node closes the connection.
Your Environment
goroutine-dump.txt