Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demonstrate span-event-to-event, event-to-span-event bridging #6650

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jack-berg
Copy link
Member

A demonstration of how bridging between span events and events can work today. This is meant to illustrate what's possible without any changes, and to illustrate problems.

Some things to note:

  • Implemented LogRecordProcessor to bridge event API events to span events
  • Implemented SpanProcessor to bridge span events to event API events
  • event.name has stricter requirements than span event names. event.name can be easily mapped to span event name, but converting span event name to event.name requiring some prefixing to conform.
  • The event API makes liberal use of log record Any types for value, and puts key/value pairs of arbitrary complexity in the body. Right now, the only thing we can do with an event payload is put a string encoding of it in the span event body. For this to be acceptable, we need to spec out semantics for that string encoding (probably just JSON). A better option would be to extend span event with an Any value field matching log record body.
  • Not all events can be recorded as span events. This is actually a good thing, since limiting events only to tracing scenarios is overly restrictive. But it means we need to check that there is an active span being recorded before bridging the events. When events aren't bridged, the user will want to know why. I solved this with a counter instrument tracking the processing results.
  • Need conventions about how to bridge top level event fields like severity, severity_text.

Check out a few examples:

cc @open-telemetry/semconv-event-approvers

@jack-berg jack-berg requested a review from a team August 16, 2024 16:11
@jack-berg jack-berg marked this pull request as draft August 16, 2024 16:21
Copy link

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 83.72093% with 14 lines in your changes missing coverage. Please review.

Project coverage is 90.05%. Comparing base (ad120a5) to head (ea9c442).

Files Patch % Lines
...opentelemetry/sdk/logs/EventToSpanEventBridge.java 79.24% 7 Missing and 4 partials ⚠️
...pentelemetry/sdk/trace/SpanEventToEventBridge.java 90.90% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6650      +/-   ##
============================================
- Coverage     90.05%   90.05%   -0.01%     
- Complexity     6275     6295      +20     
============================================
  Files           697      699       +2     
  Lines         18944    19030      +86     
  Branches       1858     1865       +7     
============================================
+ Hits          17060    17137      +77     
- Misses         1307     1314       +7     
- Partials        577      579       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jack-berg!

So, this looks like doing bridging as data is recorded, which has advantages for those who can control the instrumentation sdk vs attempting to do this at the collector tier. If you can add this LogRecordProcessor first, we can adjust data.

Quick q and sorry if I should know this. Is there already code here to basically skip other processors when we've handled the event? e.g. not also record the same as a log event.

processedCounter.add(1, SPAN_NOT_RECORDING);
return;
}
currentSpan.addEvent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, just speaking aloud. This works because the event api is involved directly (e.g. not intermediated through a queue or otherwise). So, it has access to whatever the current span is, which would match the trace and span id in the event, if someone added the span context.

In other words, we could if we wanted to, assert logRecordData.getSpanContext() is the same as the current span context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly. LogRecordProcessors are called synchronously from the point where the log was emitted in the application. This means its important for them to be fast and non-blocking, but grants access to the current context, including the active span.

@jack-berg
Copy link
Member Author

Quick q and sorry if I should know this. Is there already code here to basically skip other processors when we've handled the event? e.g. not also record the same as a log event.

No there isn't code for that and I think its a gap in the log / event SDK design.

Currently, there are two sources of data for the log SDK:

  1. Log appenders for log4j, logback, etc bridge log records recorded via their respective APIs. These logs are probably already logged to console, files or some other local location. The user is probably configuring the log bridge to export logs via OTLP to a network location. Fair enough. Makes sense.
  2. Event API records data directly from instrumentation. These events SHOULD probably go to logback, log4j, etc so they can be logged to console, files or some other local location. But this tooling doesn't exist today, which means that event API is only good for exporting to a network location like OTLP. For bridging from event API -> log sdk -> logback, log4j, etc to be successful, we need to be able to avoid cycles. Need to differentiate between log records recorded via bridge (which should not be bridged back to logback, log4j, etc) and those from event API (which should go back to logback, log4j, etc).

Couldn't find a spec issue for this so opened: open-telemetry/opentelemetry-specification#4189

@codefromthecrypt
Copy link
Contributor

thanks for the help, @jack-berg. the advice was spot-on and appreciate your opening an issue on the gap!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants