New and improved kestrel.connection.duration tags: error.type and protocol error #56164
Description
Background and Motivation
See #53358. We want error.type to provide a low-cardinality reason for why a connection was closed in error scenarios.
Proposed API
kestrel.connection.duration is a histogram. A timer starts when a connection starts, and the timer stops and its value is recorded when the connection ends. Because the measurement is taken at the moment the connection ends, there is an opportunity to record information about why the connection ended.
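The recording pattern described above can be modeled in a few lines. This is a minimal, language-neutral sketch in Python, not Kestrel's .NET implementation: DurationHistogram and TrackedConnection are hypothetical names, and the in-memory list stands in for a real histogram instrument. It shows why end-of-connection recording makes a close reason available: any tags collected during the connection's lifetime are attached when the duration is recorded.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DurationHistogram:
    """Hypothetical in-memory stand-in for a histogram instrument."""

    records: list = field(default_factory=list)

    def record(self, value_seconds: float, tags: dict) -> None:
        # Keep each measurement together with a snapshot of its tags.
        self.records.append((value_seconds, dict(tags)))


class TrackedConnection:
    """Hypothetical connection: the timer starts on open and is recorded on end."""

    def __init__(self, histogram: DurationHistogram,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._histogram = histogram
        self._clock = clock
        self._start = clock()  # timer starts when the connection starts
        self._ended = False
        self.tags: dict = {}   # filled in while the connection is alive

    def end(self) -> None:
        # Record exactly once: the elapsed time plus whatever tags were
        # collected, which is what makes "why did it end?" available here.
        if self._ended:
            return
        self._ended = True
        self._histogram.record(self._clock() - self._start, self.tags)


# Usage with a fake clock so the duration is deterministic.
ticks = iter([10.0, 12.5])
histogram = DurationHistogram()
conn = TrackedConnection(histogram, clock=lambda: next(ticks))
conn.tags["error.type"] = "app_shutdown"
conn.end()
print(histogram.records)  # [(2.5, {'error.type': 'app_shutdown'})]
```

The injectable clock is a design choice for the sketch only; it keeps the example deterministic without changing the start/stop semantics.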
I propose two changes:
- Record the protocol error code for HTTP/2 and HTTP/3 on the connection duration metric.
  - The tag is named http.connection.protocol_error_code. An issue to discuss it on the OTEL semantic conventions is open: Add http.connection.protocol_error_code to connection duration metric, open-telemetry/semantic-conventions#1135. If the tag is unlikely to be standardized, it could be given a kestrel. prefix.
  - The protocol error code is an unsigned integer.
  - The tag is omitted if the code is NO_ERROR (HTTP/2) or H3_NO_ERROR (HTTP/3).
  - The tag is present even if the server didn't actually send the error code to the client. For example, the transport for an HTTP connection closes unexpectedly and the server ends the connection with an error. The server internally records that the connection ended with a specific error code, even though it never has the opportunity to send it to the client.
- Kestrel keeps track of why a connection closes and sets the error.type tag to the close reason.
  - error.type isn't set for non-error reasons, e.g. the transport closing.
  - The error.type reason will mostly come from the HTTP layer, but the HTTPS middleware can set a reason if the connection fails the TLS handshake.
  - If there is already an error.type value, it takes priority over the close reason (basically, the first value set on error.type is the one that's used).
  - The connection end reasons set on error.type follow the standard OTEL naming convention for enum values: snake case, e.g. app_shutdown.
  - Errors that don't fall into one of the known connection end reasons have an error.type value of _OTHER (similar to other OTEL enums that have a fixed set of values).
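The tagging rules above can be sketched as a small collector. This is a hedged Python model, not Kestrel's code: ConnectionEndTags and its methods are hypothetical, and the reason names other than app_shutdown (invalid_frame, connection_reset, transport_closed) are invented for illustration. The no-error wire values are NO_ERROR = 0x0 from RFC 9113 and H3_NO_ERROR = 0x0100 from RFC 9114. The proposal doesn't say whether a later protocol error code replaces an earlier one; this sketch assumes it does.

```python
from typing import Optional

# "No error" wire values: NO_ERROR (RFC 9113) and H3_NO_ERROR (RFC 9114).
# Only HTTP/2 and HTTP/3 carry a protocol error code.
NO_ERROR_CODES = {"HTTP/2": 0x0, "HTTP/3": 0x0100}

# Hypothetical set of known error reasons; only app_shutdown is from the proposal.
KNOWN_ERROR_REASONS = {"app_shutdown", "invalid_frame", "connection_reset"}


class ConnectionEndTags:
    """Hypothetical collector that applies the proposed tagging rules."""

    def __init__(self) -> None:
        self._error_type: Optional[str] = None
        self._protocol_error_code: Optional[int] = None

    def set_error_type(self, reason: str) -> None:
        # First value wins: later reasons never overwrite an existing error.type.
        if self._error_type is not None:
            return
        # Unknown reasons collapse to _OTHER to keep cardinality low.
        self._error_type = reason if reason in KNOWN_ERROR_REASONS else "_OTHER"

    def on_connection_end(self, reason: str, is_error: bool) -> None:
        # Non-error reasons (e.g. the transport closing) never set error.type.
        if is_error:
            self.set_error_type(reason)

    def set_protocol_error_code(self, protocol: str, code: int) -> None:
        # Recorded even if the code was never sent to the client.
        # Assumption: a later code replaces an earlier one.
        if code < 0:
            raise ValueError("protocol error codes are unsigned")
        is_no_error = code == NO_ERROR_CODES[protocol]
        self._protocol_error_code = None if is_no_error else code

    def to_tags(self) -> dict:
        tags = {}
        if self._error_type is not None:
            tags["error.type"] = self._error_type
        if self._protocol_error_code is not None:
            tags["http.connection.protocol_error_code"] = self._protocol_error_code
        return tags


tags = ConnectionEndTags()
tags.set_protocol_error_code("HTTP/2", 0x1)  # PROTOCOL_ERROR
tags.on_connection_end("invalid_frame", is_error=True)
tags.on_connection_end("connection_reset", is_error=True)  # ignored: first wins
print(tags.to_tags())
# {'error.type': 'invalid_frame', 'http.connection.protocol_error_code': 1}
```

A clean HTTP/3 shutdown (H3_NO_ERROR plus a non-error close reason) produces no tags at all, which matches the rule that both tags are only present in error scenarios.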
Usage Examples
Someone wants to monitor incoming HTTP connections to the server.
- Enable OTEL metrics collection for the Kestrel meter name.
- Export telemetry to a telemetry store.
- Query kestrel.connection.duration to see error.type values.
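The query step depends on the telemetry store, so the following is only a store-agnostic Python sketch of what such a query computes. It assumes a hypothetical export format of (duration_seconds, tags) pairs; summarize_error_types and the "(none)" label are hypothetical names. It counts recorded connections per error.type value and reports the fraction of connections that ended with an error.

```python
from collections import Counter

NO_ERROR_LABEL = "(none)"  # hypothetical label for connections without error.type


def summarize_error_types(points):
    """Count connections per error.type value and compute the error ratio.

    `points` is a hypothetical export format: (duration_seconds, tags) pairs.
    """
    counts = Counter(tags.get("error.type", NO_ERROR_LABEL) for _, tags in points)
    total = sum(counts.values())
    errors = total - counts[NO_ERROR_LABEL]
    ratio = errors / total if total else 0.0
    return counts, ratio


points = [
    (12.0, {}),
    (0.4, {"error.type": "app_shutdown"}),
    (0.7, {"error.type": "app_shutdown"}),
    (3.1, {"error.type": "_OTHER", "http.connection.protocol_error_code": 1}),
]
counts, ratio = summarize_error_types(points)
print(counts.most_common())  # [('app_shutdown', 2), ('(none)', 1), ('_OTHER', 1)]
print(ratio)                 # 0.75
```

Because error.type is low-cardinality, a grouping like this stays small no matter how many connections are recorded.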
Alternative Designs
The connection end reason could instead always be recorded in its own tag, e.g. kestrel.connection.end_reason. In that case, it could include non-error values. However, there are few non-error connection end reasons, and error reasons would also be set on error.type. It doesn't seem valuable to me to have both tags when people are most likely focused on connection errors.
kestrel.connection.end_reason could be added in the future if there is demand for it.
Risks
Kestrel's tags could diverge from future OTEL semantic conventions around connections, so we need to ensure that they match.