Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/googlecloud] Correctly translate exception-events from OpenTelemetry to Google Trace format #12885

Closed
Jaskaranbir opened this issue Aug 1, 2022 · 17 comments

Comments

@Jaskaranbir
Copy link

The OpenTelemtry spec provides info on exception-handling that is based adding exception-info as Event: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/exceptions.md

Currently, the exception-events passthrough as-is without any extra processing, which leads to following representation in Trace UI:
image
Notice that the stacktrace above is incomplete and has been character-limited by GCP, which significantly reduces its usefulness.

Meanwhile, referencing the following screenshot from: https://cloud.google.com/trace/docs/trace-labels
image
The above screenshot includes a specially formatted section for displaying call-stack, which is extremely helpful.

So it would be great if the exceptions exported by OpenTelemetry could be formatted to Trace's conventions to allow displaying them in correct format as achieved by using native Google.Diagnostic libraries. This seems to require a rather tricky stack-trace parsing (especially considering different languages will have their own stack-trace representation), but is there a possibility to develop some conventions to improve the feasibility of this (such as expecting special tags within spans/traces with segments of pre-parsed stack-trace)?

Just in case, .NET Core is my primary development language here and it seems to have fairly under-developed libraries for exporting Otel data to GCP, which is one of the major reasons for using OpenTelemetry Collector.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 5, 2022

Pinging code owners: @aabmass @dashpole @jsuereth @punya @tbarker25 @damemi. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth TylerHelmuth added the priority:p3 Lowest label Aug 5, 2022
@dashpole
Copy link
Contributor

dashpole commented Aug 5, 2022

The description of /stacktrace in the docs you reference looks like it only applies to the v1 API, while we use the v2 API:
JSON-formatted stack trace. Stack traces aren't indexed for search.This label is displayed in the Call Stack table.The /stacktrace label is only used by the V1 API.For the V2 API, the span contains a stackTrace field.

So it looks like we would actually want to support setting Span.StackTrace based on the attribute. Otherwise, this seems like it would be nice to have, but it looks like a hard problem to solve. We will probably keep it on the backlog for now, but we appreciate the feedback.

@Jaskaranbir
Copy link
Author

Thanks for the response @dashpole. I see that StackTrace constitutes the StackFrame here: https://github.com/googleapis/go-genproto/blob/01dd62135a58/googleapis/devtools/cloudtrace/v2/trace.pb.go#L1226

With my limited context, is it still as complex if the client can be expected to handle transforming raw stacktrace into stackframes object above, and pass the structured response via attribute? Even a rudimentary implementation of this would be quite useful.

@dashpole
Copy link
Contributor

dashpole commented Aug 8, 2022

I don't know if that would be a great contract for the exporter to have. It would be fairly difficult for clients to implement correctly, so I would prefer having something in the exporter. I'm wondering if starting with something as simple as "split the stacktrace into one attribute for each line" (e.g. exception.stacktrace.0, exception.stacktrace.1) would make this useable while we figure out how we want to handle this...

@Jaskaranbir
Copy link
Author

Jaskaranbir commented Aug 9, 2022

So it'll display in the TraceUI as multiple tags for now? That sounds good enough as starting point to me.

@dashpole
Copy link
Contributor

dashpole commented Aug 9, 2022

A teammate pointed me to https://cloud.google.com/trace/docs/quotas, which says we are limited to 32 attributes per-span. I'm worried that the multiple-attribute approach would hit that limit ~instantly, and just end up causing other attributes to be dropped...

@Jaskaranbir
Copy link
Author

Jaskaranbir commented Aug 9, 2022

The limit does make sense.

Referring back to a previous comment about having a defined contract for stacktrace, would it be an option to have provisions, based on some configurable options of the exporter, for allowing users to either format their own stacktrace or allowing them to pass a raw stacktrace and attempting to format it?

Since stacktrace format isnt standardized across languages and tools, I think the latter of the above options would be quite difficult to achieve. Although we can still have a best-attempt basis implementation that tries to format a raw stacktrace. But allowing users to define their own call-stack/stacktrace formats will be easier to implement and more flexible. Then in the future, if the latter option works out, the former option (of expecting a pre-structure stacktrace) can be deprecated.

Note: The auto-formatting raw stacktrace doesnt have to be implemented right now. I am really just referring to structuring the code in a way that allows moving towards that goal in future, but start with easier implementation instead that still allows using Trace to its ideal extent.

@dashpole
Copy link
Contributor

dashpole commented Aug 9, 2022

I could totally see that working for an SDK exporter (e.g. provide a string -> StackTrace function), but that sounds hard to do in the collector via yaml config.

@Jaskaranbir
Copy link
Author

I was thinking about an implementation where the formatted stacktrace is structured and added as a tag containing JSON data. The JSON structure could even be what V1 API uses for stacktraces:

    "stackTrace": {
      "stackFrames": {
        "frame": [
          {
            "functionName": {
              "value": "serverMethodTrace [as func]"
            },
            "fileName": {
              "value":
              "/usr/src/app/node_modules/@google-cloud/trace-agent/build/src/plugins/plugin-grpc.js"
            },
            "lineNumber": "249",
            "columnNumber": "28"
          },
          {
            "functionName": {
              "value": "anonymous function"
            },
            "fileName": {
              "value": "/usr/src/app/node_modules/grpc/src/server.js"
            },
            "lineNumber": "592",
            "columnNumber": "13"
          }
        ]
      }
    }

So the exporter receives a defined tag with this JSON structure and uses it to build the V2-API call-stack. On the configuration side for the exporter, there's option to either parse the data from this tag as pre-formatted JSON-stacktrace or the raw-stacktrace (for now, it can only support one option, that of pre-formatted stacktrace).

What are your thoughts on something like this?

@dashpole
Copy link
Contributor

It seems really hard for users to implement... I'll have to chew on it a bit.

Just for fun, can you try adding this in your exporter config:

trace:
    attribute_mappings:
    - key: exception.stacktrace
      replacement: /stacktrace

To see if the /stacktrace field still works in the v2 API?

@Jaskaranbir
Copy link
Author

Sorry, was busy in last few days. Though I think some major languages already seem to include stacktrace parsers to an extent. For example, Go, Ruby, Dotnet, JavaScript (albit with external packages, but there are lots available). It'll just be about mapping an object to another.

And considering this is an alternative implementation that can be deprecated anytime and is easy enough to implement, this seems good enough as an optional drop-in until better implementation.

Will test out the /stacktrace field with V2 API when I get chance (likely sometime this weekend), thanks for the idea!

@Jaskaranbir
Copy link
Author

Jaskaranbir commented Aug 29, 2022

Finally had the chance to try this out. Though adding the exception.stacktrace attribute didnt quite work, it just adds a /stacktrace label in UI with clipped stacktrace text (same result as issue description).

@dashpole any further thoughts on the intermediate workaround (#12885 (comment))

@dashpole
Copy link
Contributor

Not right now. If we did go that route it is likely something we would have to keep around (we wouldn't risk breaking users by deprecating it), so i'd like to give it more thought before moving forward.

@Jaskaranbir
Copy link
Author

Sounds good, keeping my fingers crossed for this 🤞

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 10, 2022
@dashpole dashpole removed the Stale label Nov 10, 2022
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 10, 2023
@github-actions
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants