Record span-ending exceptions as span attributes instead of span event or log #4429

@alexmojaki

Somewhat continuing from #4333 (comment)

My understanding is that:

  • The plan is to get rid of (or at least deprecate, which is close) span events and this is unlikely to change.
  • Since exceptions on spans are recorded as span events, the current plan for that is to instead record them as logs that are children of the span.

My proposal is that when a span ends with an exception, the usual attributes exception.type/message/stacktrace should be recorded directly on the span instead of creating a child log. Ways this could look:

  • Allow passing an exception object to Span.End
  • Change Span.RecordException to record attributes on the span instead of creating an event. This assumes that Span.RecordException is only called when the span is ending, which IIUC is now the recommendation.
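
A minimal sketch of the first option, using a toy `SketchSpan` class rather than any real SDK type. The `exception` parameter on `end` is hypothetical and only illustrates the proposal; the attribute names are the existing exception.* ones.

```python
import traceback
from typing import Any, Dict, Optional


class SketchSpan:
    """Toy stand-in for an SDK span (assumption, not the OpenTelemetry API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, Any] = {}
        self.ended = False

    def end(self, exception: Optional[BaseException] = None) -> None:
        # Hypothetical signature: ending the span with the exception that ended it.
        if exception is not None:
            exc_type = type(exception)
            module = exc_type.__module__
            qualname = exc_type.__qualname__
            # Leave the module prefix off for builtin exception types.
            type_name = qualname if module == "builtins" else f"{module}.{qualname}"
            self.attributes["exception.type"] = type_name
            self.attributes["exception.message"] = str(exception)
            self.attributes["exception.stacktrace"] = "".join(
                traceback.format_exception(exc_type, exception, exception.__traceback__)
            )
        self.ended = True


def failing_operation() -> None:
    raise ValueError("bad input")


span = SketchSpan("parse-request")
try:
    failing_operation()
except ValueError as exc:
    # No child log and no span event: the data lives on the span itself.
    span.end(exception=exc)
```

The second option would look the same from the user's side, except that the attributes would be set inside `Span.RecordException` rather than `Span.End`.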

Reasons to use exception.* span attributes instead of a child log:

  1. It's not clear to me how Span.RecordException will work in the SDK if it has to create a log. Will creating a tracer provider require passing a logging provider?
  2. Backends won't need to support logs to support working with exceptions on spans.
  3. Users won't have to join multiple records in a query to e.g. find spans which match some predicate and had an exception.
  4. Very simple to implement, document, and understand.
  5. When a single exception bubbles through multiple nested spans, by default each one would create a child log and would have no way of knowing that this would be redundant. This results in the tree of spans and logs looking very ugly:
span1:
    span2:
        span3:
            exception-log3
        exception-log2
    exception-log1
  6. https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-exceptions recommends recording some information directly on the span, but:
    1. It doesn't recommend recording exceptions that are handled, especially because the exception message is meant to be recorded as the status description, which requires setting the status code to error. However, it's useful to record exceptions that ended the span even if they're expected to be handled and thus the span should not be marked as an error. This is even acknowledged in the proposal to record exceptions as logs:
      Note: some frameworks use exceptions as a communication mechanism when a request fails. For example,
      Spring users can throw a [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)
      exception to return an unsuccessful status code. Such exceptions represent errors already handled by the application code.
      Application code, in this case, is expected to log this at the appropriate severity.
      General-purpose instrumentation MAY record such errors, but at a severity not higher than `Warn`.
    2. error.type and the status description are very generic places to put information, so in some situations they would be better used by something other than the exception. For example, an HTTP client for a specific service might raise a generic APIError which is recorded as exception.type on the span, while the more specific error code contained within could be used for error.type. In general, error.type and the status description shouldn't be overridden when already present, but that shouldn't mean that the precise technical exception data gets discarded.
    3. There's currently no way to know if error.type represents the type of an exception that ended a span, or something else. The same goes for the span status description.
    4. This isn't clear to users. For example, a user may see the error.type attribute on a span and expect to find an attribute like error.message next to it. If they don't know where to look, they have no way of knowing whether the exception message is anywhere in the span data.
  7. There's currently no apparent way to know whether or not a span ended with an exception. As mentioned above, error.type and the span status are ambiguous. The presence of a child log with an exception (which is nontrivial to check) isn't conclusive either, because the exception could have been handled and the span ended normally. exception.* attributes can solve this.
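
To illustrate the querying and ambiguity points in the list above, here is a rough sketch over made-up span records (plain dicts, not any backend's data model). The span names, status codes, and the `card_declined` error code are invented for the example; `ended_with_exception` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Invented sample data: three exported spans represented as plain dicts.
spans: List[Dict[str, Any]] = [
    {"name": "GET /users", "attributes": {"http.response.status_code": 200}},
    {
        "name": "POST /charge",
        "attributes": {
            "http.response.status_code": 402,
            # Service-specific error code kept in the generic slot...
            "error.type": "card_declined",
            # ...while the technical exception data is kept alongside it.
            "exception.type": "APIError",
            "exception.message": "card was declined",
        },
    },
    {
        "name": "POST /charge",
        # An error status without any exception: error.type alone is ambiguous.
        "attributes": {"http.response.status_code": 500, "error.type": "500"},
    },
]


def ended_with_exception(span: Dict[str, Any]) -> bool:
    """Under the proposal, the presence of exception.type answers this directly."""
    return "exception.type" in span["attributes"]


# A single-record predicate: no join against a separate log table is needed.
matches = [
    s for s in spans
    if s["name"] == "POST /charge" and ended_with_exception(s)
]
```

With child logs, the same question would need a lookup from each span to its logs, plus some way to tell whether a logged exception actually ended the span.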

Responses to possible objections:

  • What about exceptions that are handled within the span and don't end it?
    • I think the plan to use logs in that situation is fine. I see it as different because:
      • If users want to log such exceptions then the implication is that they're interesting events in their own right and would likely be worth logging even without the parent span, and OTel doesn't seem to have a good alternative recommendation for that. A span-ending exception is much more like a property of the span itself.
      • If there are 1000 such exceptions, putting them all inside the span somehow is likely to be a problem.
      • The timestamp of a handled exception is potentially much more interesting, since it's not necessarily at the end of the span.
  • What about exceptions that aren't errors, i.e. are expected to be handled outside the span? Logs can have a severity to indicate this.
    • An attribute like exception.severity can deal with this.
  • Stacktraces can be very big.
    • If the size of a stacktrace is problematic in a span attribute, it will probably be problematic in a log too.
  • Logs can be configured to e.g. not record certain exceptions.
    • Generically reusable wrapper tracer providers can be created to make this kind of configuration easy.
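
A rough sketch of how the last two responses might fit together, again with toy classes rather than real SDK types. `FilteringSpan`, `should_record`, and `severity_for` are hypothetical names, and exception.severity is only the attribute suggested above, not an existing convention. A real wrapper tracer provider would hand out such wrapped spans; that plumbing is omitted here.

```python
from typing import Any, Callable, Dict, Optional


class SketchSpan:
    """Toy span whose end() accepts the span-ending exception (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, Any] = {}

    def end(self, exception: Optional[BaseException] = None) -> None:
        if exception is not None:
            self.attributes["exception.type"] = type(exception).__qualname__
            self.attributes["exception.message"] = str(exception)


class FilteringSpan:
    """Wrapper that decides whether and how an exception is recorded."""

    def __init__(
        self,
        inner: SketchSpan,
        should_record: Callable[[BaseException], bool],
        severity_for: Callable[[BaseException], str],
    ) -> None:
        self._inner = inner
        self._should_record = should_record
        self._severity_for = severity_for

    @property
    def attributes(self) -> Dict[str, Any]:
        return self._inner.attributes

    def end(self, exception: Optional[BaseException] = None) -> None:
        if exception is None or not self._should_record(exception):
            # Filtered out: end the span without any exception.* attributes.
            self._inner.end()
            return
        self._inner.end(exception=exception)
        self._inner.attributes["exception.severity"] = self._severity_for(exception)


class NotFound(Exception):
    """Example of an exception that is expected to be handled by the caller."""


def should_record(exc: BaseException) -> bool:
    return not isinstance(exc, KeyboardInterrupt)


def severity_for(exc: BaseException) -> str:
    return "WARN" if isinstance(exc, NotFound) else "ERROR"


handled = FilteringSpan(SketchSpan("lookup"), should_record, severity_for)
handled.end(exception=NotFound("no such user"))

skipped = FilteringSpan(SketchSpan("shutdown"), should_record, severity_for)
skipped.end(exception=KeyboardInterrupt())
```

Here the `lookup` span keeps its exception data at a lower severity without being marked as an error, while the `shutdown` span records nothing about the exception.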
