You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
The problem I've encountered is the opencensus-shim, when used with the javaagent, misbehaves and doesn't work the same when the javaagent is not present (I'd go so far as to say: it breaks).
tl;dr: due to the otel javaagent's internals and resulting requirements for properly bridging application and agent Contexts, the very specific ApplicationSpanImpl is required to affect changes in the stored Context. This is something integrations like opencensus-shim do not handle correctly, but otherwise correctly conform to the otel interfaces and API requirements. The result is unexpected -- and on its face, unexplainable -- breakage when the javaagent is used.
I've created an instrumentation that addresses this problem, and even wires up a "baseline" test without the java agent to prove out the situation. However, the solution is nothing that pleases me as it requires the use of reflection directly, and since the java agent works just fine as-is, this felt like more of an "enhancement" than a bug...but I'm torn:
What I outline below is the direct result of the otel instrumentation code placing an implicit, undocumented (afaict anyway) requirement that a specific type of io.opentelemetry.trace.Spanbe passed back to it for normal operation (details in Additional Context), in a world where the interfaces would -- and in most cases I've encountered do -- suffice. Add to that, there's nothing patently wrong with the way integrations like opencensus-shim behave when they conform to the interfaces and work as expected when the javaagent isn't present.
And while that's certainly what I've come to see as a good indicator of "instrumentation required here", in this case it's instrumenting the instrumented API (opencensus-shim) and its associated wrappers themselves. There's no interface to determine if anything is being wrapped and there's no requirement for the Span interface, for example -- in this case OpenTelemetrySpanImpl, specifically -- to delegate all its otel interface methods to the wrapped Span. And in this specific scenario, that would avert this issue entirely if it did so.
So what's the answer? Do we:
instrument the code as I've done (or in some other way), messy as it sometimes can be, ever struggling against moving targets without interfaces to adequately inform the instrumentors on how/when to properly delegate/unwrap, etc.?
OR, impose/implement the change in the upstream (in this case, opencensus-shim) to delegate all the methods to the wrapped object despite the sufficient conformance to the documented API?
I suppose the latter could have accompanying doc authored for the case and would be preferable when it's under direct control, but is that reasonably always going to be the case?
And, thinking further: I haven't looked too closely at opentracing-shim, but at a glance it looks like it may also fall victim to this problem as well? It's unfortunately a fair bet, and it makes me wonder: are there more out there or otherwise waiting to happen?
Describe the solution you'd like
If I had to pick, on balance, I'd go with option B above: implement the delegation changes in the upstream (in this case, specifically OpenTelemetrySpanImpl in opencensus-shim) and accompany it with documentation/guidance instructing how and when to delegate the calls as part of the standard API.
But, I'm here because I'm definitely interested in what anyone else thinks about this problem, if they've encountered it, and how to solve it. (Or heck: am I completely off my rocker here and this isn't a problem?)
Describe alternatives you've considered
I considered instrumentation of course, as I wrote one to at least solve the acute problem, but I honestly struggled writing it (with an undesirable result no less) for a few reasons, the sum of which weighs it heavily out of favor in this case for me personally:
writing wrappers for OpenTelemetrySpanImpl, for example, proved difficult (impossible?) because the interface shares that of the javaagent classes io.opentelemetry... and not application.io.opentelemetry...: bridging that gap did not seem like a straight line to me even after looking at the opentelemetry instrumentations
creating method overrides would have been necessary here (for OpenTelemetrySpanImpl certainly) and isn't something the byte buddy Advice method transformers can do, and getting the heavier-weight transformer to work proved challenging (the muzzle guide also dissuaded me heavily from this approach, as well)
the final, shortest, simplest result I could achieve using Advice relies on reflection directly and is itself brittle
Additional context
The is a detailed accounting of the problem and the path one needs to follow to see it (from a cold start like I did).
Click here to expand...it's rather long
I tracked the issue down to the following, detailed for posterity and newcomers like myself:
As part of javaagent instrumentation, all context management is brokered by AgentContextStorage and in turn all context manipulation is brokered by AgentContextWrapper
When performing actions such as Span#storeInContext, the call tree follows through whichever Context instance and, barring some intense overriding of methods, ultimately lands in AgentContextWrapper#with
When that happens, the incoming Span is subjected to this test, and in the case of opencensus-shim it fails but results in no errors or other meaningful output
It fails because in opencensus-shim, the incoming Span isn't an ApplicationSpan, but an OpenTelemtrySpanImpl which wraps the ApplicationSpan which was returned from calls to the various otel-instrumented API methods
In my specific case, I created an Otel Span and attempted to use it as the parent of an opencensus span (implicitly occurring via using the GCP Spanner client + opencensus-shim on my classpath):
I created an otel span and made it current
I then called the spanner client methods to start executing a read against a table which constructs its own spans
At this point, opencensus-shim starts building a new Span to track its work, using io.opencensus.trace.Tracer#spanBuilder which uses the current Context as the parent:
it correctly pulled the current otel Context out of the instrumented AgentContextStorage as was expected (via OpenTelemetryContextManager), and wrapped it in an oc-shim OpenTelemetryCtx
through the call chain to obtain an opencensusSpan, this OpenTelemetryCtx was then used to obtain an OpenTelemetrySpanImpl via call to SpanConverter, which is done here
⚠️ this technically fails due to the aforementioned, but it works because the current context happens to be the parent: a near miss
the OpenTelemetrySpanImpl is then constructed, and all is well so far: the span and trace ids are accurate, and the parent is assigned from Context#current()
This newly-created Span is then attempted to be scoped and made current via Tracer#withSpan, and this is where the breakdown occurs:
as we work down the Tracer#withSpan call tree, we encounter ContextHandleUtils#withValue inside the ScopeInSpan class which in turn calls the opencensus-shim's OpenTelemetryContextManager#withValue method
The result: the current Span is never assigned as current and calls to Context#current continue to return the parent context created outside and before the calls to the opencensus-shim realm.
The effect: upon ending spans and closing scopes, it's possible -- and indeed happened in my case -- that the otel Span sitting atop the opencensus trace tree is prematurely closed. Numerous knock-on effects happen at that point too, of course, and are interesting but altogether moot: with that top Span closed, anything up to nearly everything suddenly becomes invalid or inaccurate.
As part of writing a new instrumentation for an as-yet-unsupported framework (finagle), I was testing against some GCP clients I've worked with and the local emulators (in my case, Spanner) and discovered opencensus-shim simply doesn't work as expected when the otel javaagent is loaded and running.
The text was updated successfully, but these errors were encountered:
I think so, yes. For posterity, the related issue is open-telemetry/opentelemetry-java#4900 (in opentelemetry-java, not this project, opentelemetry-java-instrumentation).
Is your feature request related to a problem? Please describe.
The problem I've encountered is the opencensus-shim, when used with the javaagent, misbehaves and doesn't work the same when the javaagent is not present (I'd go so far as to say: it breaks).
tl;dr: due to the otel javaagent's internals and resulting requirements for properly bridging application and agent
Context
s, the very specificApplicationSpanImpl
is required to affect changes in the storedContext
. This is something integrations likeopencensus-shim
do not handle correctly, but otherwise correctly conform to the otel interfaces and API requirements. The result is unexpected -- and on its face, unexplainable -- breakage when the javaagent is used.I've created an instrumentation that addresses this problem, and even wires up a "baseline" test without the java agent to prove out the situation. However, the solution is nothing that pleases me as it requires the use of reflection directly, and since the java agent works just fine as-is, this felt like more of an "enhancement" than a bug...but I'm torn:
What I outline below is the direct result of the otel instrumentation code placing an implicit, undocumented (afaict anyway) requirement that a specific type of
io.opentelemetry.trace.Span
be passed back to it for normal operation (details in Additional Context), in a world where the interfaces would -- and in most cases I've encountered do -- suffice. Add to that, there's nothing patently wrong with the way integrations likeopencensus-shim
behave when they conform to the interfaces and work as expected when the javaagent isn't present.And while that's certainly what I've come to see as a good indicator of "instrumentation required here", in this case it's instrumenting the instrumented API (
opencensus-shim
) and its associated wrappers themselves. There's no interface to determine if anything is being wrapped and there's no requirement for the Span interface, for example -- in this caseOpenTelemetrySpanImpl
, specifically -- to delegate all its otel interface methods to the wrappedSpan
. And in this specific scenario, that would avert this issue entirely if it did so.So what's the answer? Do we:
opencensus-shim
) to delegate all the methods to the wrapped object despite the sufficient conformance to the documented API?I suppose the latter could have accompanying doc authored for the case and would be preferable when it's under direct control, but is that reasonably always going to be the case?
And, thinking further: I haven't looked too closely at
opentracing-shim
, but at a glance it looks like it may also fall victim to this problem as well? It's unfortunately a fair bet, and it makes me wonder: are there more out there or otherwise waiting to happen?Describe the solution you'd like
If I had to pick, on balance, I'd go with option B above: implement the delegation changes in the upstream (in this case, specifically
OpenTelemetrySpanImpl
inopencensus-shim
) and accompany it with documentation/guidance instructing how and when to delegate the calls as part of the standard API.But, I'm here because I'm definitely interested in what anyone else thinks about this problem, if they've encountered it, and how to solve it. (Or heck: am I completely off my rocker here and this isn't a problem?)
Describe alternatives you've considered
I considered instrumentation of course, as I wrote one to at least solve the acute problem, but I honestly struggled writing it (with an undesirable result no less) for a few reasons, the sum of which weighs it heavily out of favor in this case for me personally:
OpenTelemetrySpanImpl
, for example, proved difficult (impossible?) because the interface shares that of the javaagent classesio.opentelemetry...
and notapplication.io.opentelemetry...
: bridging that gap did not seem like a straight line to me even after looking at the opentelemetry instrumentationsAdditional context
The is a detailed accounting of the problem and the path one needs to follow to see it (from a cold start like I did).
Click here to expand...it's rather long
I tracked the issue down to the following, detailed for posterity and newcomers like myself:
Span#storeInContext
, the call tree follows through whicheverContext
instance and, barring some intense overriding of methods, ultimately lands in AgentContextWrapper#withSpan
is subjected to this test, and in the case ofopencensus-shim
it fails but results in no errors or other meaningful outputIt fails because in
opencensus-shim
, the incomingSpan
isn't anApplicationSpan
, but an OpenTelemtrySpanImpl which wraps theApplicationSpan
which was returned from calls to the various otel-instrumented API methodsIn my specific case, I created an Otel Span and attempted to use it as the parent of an opencensus span (implicitly occurring via using the GCP Spanner client + opencensus-shim on my classpath):
opencensus-shim
starts building a new Span to track its work, usingio.opencensus.trace.Tracer#spanBuilder
which uses the currentContext
as the parent:Context
out of the instrumentedAgentContextStorage
as was expected (via OpenTelemetryContextManager), and wrapped it in an oc-shimOpenTelemetryCtx
opencensus
Span
, thisOpenTelemetryCtx
was then used to obtain anOpenTelemetrySpanImpl
via call toSpanConverter
, which is done hereOpenTelemetrySpanBuilderImpl
where the parent is attempted to be assigned to the current context, in effect creating that parent relationship and creating an identity for the current spanOpenTelemetrySpanImpl
is then constructed, and all is well so far: the span and trace ids are accurate, and the parent is assigned fromContext#current()
Tracer#withSpan
, and this is where the breakdown occurs:Tracer#withSpan
call tree, we encounterContextHandleUtils#withValue
inside theScopeInSpan
class which in turn calls theopencensus-shim
'sOpenTelemetryContextManager#withValue
methodThe result: the current
Span
is never assigned as current and calls toContext#current
continue to return the parent context created outside and before the calls to theopencensus-shim
realm.The effect: upon ending spans and closing scopes, it's possible -- and indeed happened in my case -- that the otel Span sitting atop the opencensus trace tree is prematurely closed. Numerous knock-on effects happen at that point too, of course, and are interesting but altogether moot: with that top Span closed, anything up to nearly everything suddenly becomes invalid or inaccurate.
As part of writing a new instrumentation for an as-yet-unsupported framework (finagle), I was testing against some GCP clients I've worked with and the local emulators (in my case, Spanner) and discovered opencensus-shim simply doesn't work as expected when the otel javaagent is loaded and running.
The text was updated successfully, but these errors were encountered: