-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added logs for reasons causing connection and transport close #5840
Conversation
9382de1
to
cfba279
Compare
internal/transport/controlbuf.go
Outdated
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) { | |||
select { | |||
case <-c.ch: | |||
case <-c.done: | |||
if logger.V(logLevel) { | |||
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ErrConnClosing
significant here? If not, can we instead return a meaningful error and capture all the exit reasons via a single log after loopy returns (which IIUC is already there)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like ErrConnClosing is used as a conditional/invariant anywhere in the system. Shoving this message/information into Error and logging at loopyWriter level sgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/transports/transport's/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done @easwars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm blind, and ErrConnClosing is used as a conditional in many many places at the control buf layer. Leaving this as is for now, unless you have a better suggestion.
internal/transport/controlbuf.go
Outdated
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) { | |||
select { | |||
case <-c.ch: | |||
case <-c.done: | |||
if logger.V(logLevel) { | |||
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/transports/transport's/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments y'all!
internal/transport/controlbuf.go
Outdated
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) { | |||
select { | |||
case <-c.ch: | |||
case <-c.done: | |||
if logger.V(logLevel) { | |||
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done @easwars.
internal/transport/controlbuf.go
Outdated
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) { | |||
select { | |||
case <-c.ch: | |||
case <-c.done: | |||
if logger.V(logLevel) { | |||
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transports context expired.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm blind, and ErrConnClosing is used as a conditional in many many places at the control buf layer. Leaving this as is for now, unless you have a better suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These might be good reads for this PR:
https://google.github.io/styleguide/go/best-practices#error-handling
https://google.github.io/styleguide/go/best-practices#logging-errors
clientconn.go
Outdated
@@ -1274,6 +1274,9 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne | |||
|
|||
newTr, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, addr, copts, onGoAway, onClose) | |||
if err != nil { | |||
if logger.V(2) { | |||
logger.Errorf("NewClientTransport failed to connect to %s. Err: %v", addr, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that it is better to avoid filler words like "failed" etc in error strings and it leads to better/concise error strings. This could be written as:
logger.Errorf("NewClientTransport(%q): %v", addr, err)
or if you prefer ,ore human readable names over function names
logger.Errorf("Creating new client transport to %q: %v", addr, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like human readable for the high level concept rather than the specific implementation detail (and that exported func just forwards call to internal method). Switched to second option. Thanks for suggestion.
internal/transport/controlbuf.go
Outdated
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) { | |||
select { | |||
case <-c.ch: | |||
case <-c.done: | |||
if logger.V(logLevel) { | |||
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transport's context expired.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this is appropriate at INFO
level or is WARNING
more appropriate?
See: https://github.com/grpc/grpc-go/blob/master/Documentation/log_levels.md
Same for other logs in this file/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, ErrConnClosing was being returned to the run() goroutine of loopy, which would trigger the exit with an info log. However, I am changing this to plumb data into ErrConnClosing (or a certain type), and will log the error message plumbed in (i.e. I will put the message from this into the error) because ErrConnClosing has nothing in it currently. See https://github.com/grpc/grpc-go/blob/master/internal/transport/controlbuf.go#L536 for how it currently works on master.
internal/transport/handler_server.go
Outdated
func (ht *serverHandlerTransport) Close() { | ||
func (ht *serverHandlerTransport) Close(err error) { | ||
if logger.V(logLevel) { | ||
logger.Infof("closing serverHandlerTransport due to err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. A concise error message could look something like
logger.Infof("Closing server handler transport: %v", err)
Also, note that log messages should start with a capitalized string unlike error messages. Applies here and everywhere else in this PR (unless ofcourse where it starts with the package name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to concise and capitalized beginning. NEED TO GET TO OTHER ONES WHICH REQUIRE CAPITALIZED STRING.
internal/transport/handler_server.go
Outdated
@@ -236,7 +239,7 @@ func (ht *serverHandlerTransport) WriteStatus(s *Stream, st *status.Status) erro | |||
}) | |||
} | |||
} | |||
ht.Close() | |||
ht.Close(fmt.Errorf("finished writing status")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finished writing status
.
Why is this an error?
Is this error propagated to the user? Or is this used by another component, and if so how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this felt a bit weird to me too. As part of this PR, Doug wanted a general point we log errors that close transport to be in the Close() method on the Client Transport. To avoid my multiple code additions on the server transport at each Close() call (which didn't take an error server side) we decided to log the error/transport is closing in the Close() call server side, which required adding err to API. This was passed simply so that the transport could have a reason to explain why it closed. I'll switch this to errors.New...and if you or Doug prefers this or another way (even sending a nil error) I can switch it.
internal/transport/handler_server.go
Outdated
@@ -346,7 +349,7 @@ func (ht *serverHandlerTransport) HandleStreams(startStream func(*Stream), trace | |||
case <-ht.req.Context().Done(): | |||
} | |||
cancel() | |||
ht.Close() | |||
ht.Close(fmt.Errorf("request is done processing")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Why is the fact that you are done processing with the request an error message. What value does it provide and to whom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same response as previous comment.
internal/transport/http2_client.go
Outdated
t.Close(connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout")) | ||
err := connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout") | ||
if logger.V(logLevel) { | ||
logger.Errorf("transport: closing transport due to error in keepalive handling: %v.", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you actually look at how this log statement shows up in practice? Should you be avoiding some duplicate information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I was supposed to delete this, I switched all these logs to log in Close with the data in the error passed in at beginning of Close for consolidation/consistency/debugability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments/pass :D!
internal/transport/http2_client.go
Outdated
@@ -447,7 +453,7 @@ func newHTTP2Client(connectCtx, ctx context.Context, addr resolver.Address, opts | |||
err := t.loopy.run() | |||
if err != nil { | |||
if logger.V(logLevel) { | |||
logger.Errorf("transport: loopyWriter.run returning. Err: %v", err) | |||
logger.Errorf("transport: loopyWriter.run returned. Err: %v. Closing connection.", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loopyWriter returned? loopyWriter exited? loopyWriter finished? I went with third option. Line switched to: logger.Errorf("transport: loopyWriter exited. Closing connection. Err: %v", err).
internal/transport/http2_client.go
Outdated
t.Close(connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout")) | ||
err := connectionErrorf(true, nil, "keepalive ping failed to receive ACK within timeout") | ||
if logger.V(logLevel) { | ||
logger.Errorf("transport: closing transport due to error in keepalive handling: %v.", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I was supposed to delete this, I switched all these logs to log in Close with the data in the error passed in at beginning of Close for consolidation/consistency/debugability.
clientconn.go
Outdated
@@ -1274,6 +1274,9 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne | |||
|
|||
newTr, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, addr, copts, onGoAway, onClose) | |||
if err != nil { | |||
if logger.V(2) { | |||
logger.Errorf("NewClientTransport failed to connect to %s. Err: %v", addr, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like human readable for the high level concept rather than the specific implementation detail (and that exported func just forwards call to internal method). Switched to second option. Thanks for suggestion.
internal/transport/http2_client.go
Outdated
@@ -951,6 +957,9 @@ func (t *http2Client) Close(err error) { | |||
t.mu.Unlock() | |||
return | |||
} | |||
if logger.V(logLevel) { | |||
logger.Infof("closing transport due to err: %v, will close the connection.", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to: logger.Infof("Closing transport, will close the connection. Err: %v", err). See above comment for rationale of "Err:".
internal/transport/controlbuf.go
Outdated
@@ -416,6 +416,9 @@ func (c *controlBuffer) get(block bool) (interface{}, error) { | |||
select { | |||
case <-c.ch: | |||
case <-c.done: | |||
if logger.V(logLevel) { | |||
logger.Infof("transport: trigger loopy to exit (with a subsequent conn closure) because transport's context expired.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, ErrConnClosing was being returned to the run() goroutine of loopy, which would trigger the exit with an info log. However, I am changing this to plumb data into ErrConnClosing (or a certain type), and will log the error message plumbed in (i.e. I will put the message from this into the error) because ErrConnClosing has nothing in it currently. See https://github.com/grpc/grpc-go/blob/master/internal/transport/controlbuf.go#L536 for how it currently works on master.
… switched some errors from ErrConnClosing to errors with information inside
Do we need a release note for logs? |
internal/transport/http2_client.go
Outdated
@@ -951,6 +954,9 @@ func (t *http2Client) Close(err error) { | |||
t.mu.Unlock() | |||
return | |||
} | |||
if logger.V(logLevel) { | |||
logger.Infof("Closing transport, will close the connection. Err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would end up as something like:
Closing transport, will close the connection. Err: finished serving streams for the server transport.
Isn't this enough:
Closing transport: finished serving streams for the server transport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point of this PR was to see when conn closes. Are we trying to say that since transport Close closes the connection the reader of the log should know that? I feel like it's fine to mention that it closes Conn, just like we log conn Closure when loopy returns (we don't just lump this conn.Close() operation in the log that loopyWriter returns).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing transport, will close the connection: %v
internal/transport/http2_server.go
Outdated
@@ -333,7 +334,7 @@ func NewServerTransport(conn net.Conn, config *ServerConfig) (_ ServerTransport, | |||
t.loopy.ssGoAwayHandler = t.outgoingGoAwayHandler | |||
if err := t.loopy.run(); err != nil { | |||
if logger.V(logLevel) { | |||
logger.Errorf("transport: loopyWriter.run returning. Err: %v", err) | |||
logger.Infof("transport: loopyWriter exited. Closing connection. Err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we use the term transport
to mean a connection. If you want to distinguish between the transport
type and the underlying net.Conn
, maybe you can be a more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked about these in chat. These are logically separate things (the transport and conn) and it's important to have a distinction, because this closes the connection, not the transport. I think Doug had suggestion to switch this to net.Conn or conn so maybe I'll chose one of those.
internal/transport/http2_server.go
Outdated
t.mu.Lock() | ||
if t.state == closing { | ||
t.mu.Unlock() | ||
return | ||
} | ||
if logger.V(logLevel) { | ||
logger.Infof("Closing transport, will close the connection. Err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about the log message being too verbose. Concise messages (without compromising on the information to be conveyed) will help greatly during debugging. So, please do consider how these logs will show up. Run a test or two and see how these show up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Infof("Closing transport, will close the connection: %v", err). See http2_client comment for my reasoning.
Please also fix the PR title |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments everyone :D!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments!
@@ -236,7 +239,7 @@ func (ht *serverHandlerTransport) WriteStatus(s *Stream, st *status.Status) erro | |||
}) | |||
} | |||
} | |||
ht.Close() | |||
ht.Close(errors.New("finished writing status")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now spamming our logs, is it actually useful?
A flaky keepalive test is trying to read and write to a closed network connection. In some of the cases, the server was receiving too many pings, which led to one of the root causes of flakiness and the subsequent fix, but I couldn't figure out why the network connection was closing for any other case. Discussing this, the go team realized that the transport layer logs with respect to transport and connection closure were very sparse, and populating logs with information as to why transports and connections were triggered to close could help a user debug.
RELEASE NOTES: