Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix context propagation in Executor#execute() for non-capturing lambdas #9179

Merged

Conversation

mateuszrzeszutek
Copy link
Member

Fixes #9175

This is an interesting issue that only occurs when using a non-capturing lambda (or any other singleton/cached Runnable instance), and only when Executor#execute() is used (since all other ExecutorService methods typically wrap the passed task in a new FutureTask and delegate to execute() anyway).

Because a singleton Runnable instance was passed, the whole machinery in ExecutorAdviceHelper and TaskAdviceHelper that handles a stateful virtual field completely broke when trying to correctly update the PropagatedContext, and sometimes overwrote the propagated context with a null.

While this is a surgical fix that handles precisely the issue posted by the user, I believe we should rethink our executors instrumentation as a whole: I think instead of the complex and buggy state management we have right now we should switch to decoration whenever it's possible. While this will add a little bit of extra memory overhead, I think it's much safer that way.

@mateuszrzeszutek mateuszrzeszutek requested a review from a team August 10, 2023 10:48
@laurit
Copy link
Contributor

laurit commented Aug 10, 2023

While this is a surgical fix that handles precisely the issue posted by the user, I believe we should rethink our executors instrumentation as a whole: I think instead of the complex and buggy state management we have right now we should switch to decoration whenever it's possible. While this will add a little bit of extra memory overhead, I think it's much safer that way.

I think there was something that failed with the decorator approach. Some slick or other scala tests. If I remember correctly the issue was that it is possible to supply a custom work queue to ThreadPoolExecutor and they had used something that blew up on the wrapper types. I also believe that we should give using wrappers another go, perhaps we can detect when there is a custom queue and just not instrument these executors by default.

…etry/javaagent/bootstrap/executors/ContextPropagatingRunnable.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
// We wrap only lambdas' anonymous classes and if given object has not already been wrapped.
// Anonymous classes have '/' in class name which is not allowed in 'normal' classes.
// note: it is always safe to decorate lambdas since downstream code cannot be expecting a specific runnable implementation anyways
return task.getClass().getName().contains("/") && !(task instanceof ContextPropagatingRunnable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can handle the "lambda" situation, but how about the singleton task?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not fix that. We're planning to redesign the executors instrumentation and fix that issue across the board; but that might take a while and will not be included in the next release (which is in ~ 5 days).

@trask trask merged commit 32c5d4c into open-telemetry:main Aug 11, 2023
44 checks passed
breedx-splk pushed a commit to breedx-splk/opentelemetry-java-instrumentation that referenced this pull request Aug 15, 2023
…as (open-telemetry#9179)

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
@mateuszrzeszutek mateuszrzeszutek deleted the fix-non-capturing-lambda-execute branch August 28, 2023 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug or not?]Confusion logback traceid print
5 participants