Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!(active_job): Use ActiveSupport instead of patches #677

Merged

Conversation

arielvalentin
Copy link
Collaborator

@arielvalentin arielvalentin commented Sep 28, 2023

This PR changes the ActiveJob instrumentation to use ActiveSupport::Notifications instead of using Monkey Patches.

What's new?

  1. The instrumentation now generates new internal spans for most of the ActiveJob notifications like discard and retry_stopped.
  2. ActiveJob specific attributes are now namespaced under the messaging.active_job.*

Example Parent-Child propagation

image image

Example Link Propagation

image image

See #218

@arielvalentin
Copy link
Collaborator Author

Now that I have rewritten these to use AS Notifications I have been made aware of a few idiosyncratic things in rails 6 that makes it difficult to maintain parity with 7.

In Rails 6, perform_start is invoked prior to perform https://github.com/rails/rails/blob/v6.1.7.6/activejob/lib/active_job/instrumentation.rb#L13

The callbacks are called before the instrumentation events are invoked so they are not included in the timing
https://github.com/rails/rails/blob/v6.1.7.6/activejob/lib/active_job/execution.rb#L47

This is not ideal but in order to maintain correctness with 6.1 I will change this instrumentation to work with Rails 7+ and leave the original implementation in place for Rails 6.1.

Hopefully we have more luck with other framework components.

@arielvalentin arielvalentin force-pushed the active-job-instrumentation-redux branch 2 times, most recently from 3be71a3 to 4deba54 Compare October 11, 2023 01:45
@arielvalentin arielvalentin changed the title refactor(active_job): Use ActiveSupport instead of patches feat!(active_job): Use ActiveSupport instead of patches Oct 11, 2023
@arielvalentin arielvalentin force-pushed the active-job-instrumentation-redux branch from 4deba54 to 6f51c7c Compare October 11, 2023 02:01
@arielvalentin arielvalentin self-assigned this Oct 11, 2023
@arielvalentin arielvalentin marked this pull request as ready for review October 11, 2023 02:13
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this pull request Oct 11, 2023
While working on open-telemetry#677,
I ran into an issue where the base instrumentation tracer was not being automatically upgraded
after the SDK initialization was complete.

I am not sure why this was implemented this way but I figured other instrumentations would need a reference to a proxy tracer in the future.

open-telemetry/opentelemetry-ruby#671
Copy link
Contributor

@kaylareopelle kaylareopelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still getting familiar with this code, so I will continue reviewing, but here's some low-level first-pass feedback.


appraise 'activejob-7.0' do
gem 'activejob', '~> 7.0.0'
%w[6.0.0 6.1.0 7.0.0 7.1.0].each do |version|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you leave 6.0 here on purpose, even though it's EOL'd?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keeping that's been merged thus far. The 6.0 was in a separate PR so I wanted to keep this the same until landing the other PR

instrumentation/active_job/README.md Outdated Show resolved Hide resolved
instrumentation/active_job/README.md Outdated Show resolved Hide resolved
@@ -11,6 +11,39 @@ module OpenTelemetry
module Instrumentation
# Contains the OpenTelemetry instrumentation for the ActiveJob gem
module ActiveJob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a lot of duplication between this file and the Rack instrumentation file. It might be a good future candidate for a helper gem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

I was of thinking of making a generic "ingress" span Context that would be instrumentation agnostic that way any child span is able to access the ingress span and enrich it.

So instead of saying Rack.current_span or ActiveJob.current_span it would be Ingress.current_span.

Comment on lines 41 to 49
tracer = OpenTelemetry.tracer_provider.tracer(ActiveJob.name, ActiveJob::VERSION)
mapper = Mappers::Attribute.new
config = ActiveJob::Instrumentation.instance.config
parent_span_provider = OpenTelemetry::Instrumentation::ActiveJob

# TODO, use delegation instead of inheritance
default_handler = Handlers::Default.new(tracer, parent_span_provider, mapper, config)
enqueue_handler = Handlers::Enqueue.new(tracer, parent_span_provider, mapper, config)
perform_handler = Handlers::Perform.new(tracer, parent_span_provider, mapper, config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our general pattern in instrumentation is to always reach for the tracer from the instrumentation when we need it rather than caching it in various helpers. I think the deviation from that code pattern here is the trigger for #688

Copy link
Collaborator Author

@arielvalentin arielvalentin Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir! This was intentional on my part in the hopes of starting a new trend 😄

I wanted to use dependency injection for everything that the handlers needed instead of using a service locator or a mix of the two.

Though we are hashing out some details in #688, do you think having this asymmetry is problematic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fbogsany did you see my comment about this in the other issue?

Is this a deal breaker?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the results of the conversation in #688, I have changed the handlers to use the service locator pattern and look up the tracer from the Instrumentation singleton to creating spans.

Copy link
Contributor

@bensheldon bensheldon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR, it looks correct. I'm excited to see more usage of Active Support Instrumentation. I'm assuming that once a few more components are instrumented, the common behavior will be extracted out into helpers and eventually slim this down.

btw, I'm the author of GoodJob; I also submitted the initial Active Job instrumentation for dd-trace-rb so I'm comparing and contrasting my experience implementing that to this code.

Of things I looked at:

  • I doublechecked the instrumentation names against my own lists, to make sure they were comprehensive.
  • I read through all the version-specific errata and couldn't think of any other weirdness (all the real weird weirdness I know of is in Rails <= 5.2, which isn't supported anyway)
  • I read through the tests and thought they poked at it in a good realistic / integrated fashion.

@simi
Copy link
Contributor

simi commented Oct 30, 2023

@bensheldon is there anything GJ specific to instrument in separate instrumentation?

@bensheldon
Copy link
Contributor

is there anything GJ specific to instrument in separate instrumentation?

I don't believe so, or anything that exists currently in GoodJob. The *.good_job events are all enumerated in this file.

Thinking about backend queue adapters generally there's usually an operation or two before perform.active_job that does the actual dequeuing operation. For example in GoodJob, within an execution thread there is a database query to the database to dequeue a job record, and then that record is passed into Active Job which then deserializes and performs it. Relatedly, I'm not sure if GlobalID deserialization happens as part of perform.active_job or before it. (GoodJob doesn't currently have instrumentation hooks at that foundational level... though it should!)

@arielvalentin
Copy link
Collaborator Author

Up to date screenshots:

2023-11-21_12-56-48 2023-11-21_12-55-17 2023-11-21_12-54-39

Copy link
Member

@robbkidd robbkidd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Message for you, sir.

@arielvalentin arielvalentin merged commit fb31945 into open-telemetry:main Nov 22, 2023
47 checks passed
@arielvalentin arielvalentin deleted the active-job-instrumentation-redux branch November 22, 2023 03:01
# Removes Event Handler Subscriptions for ActiveJob notifications
# @note this method is not thread-safe and should not be used in a multi-threaded context
def unsubscribe
@subscriptions&.each { |subscriber| ActiveSupport::Notifications.unsubscribe(subscriber) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be ::ActiveSupport::Notifications, otherwise it throws:

uninitialized constant OpenTelemetry::Instrumentation::ActiveSupport::Notifications

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, would you be able to provide us with a PR to address this problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +15 to +19
@span_name_formatter = if @config[:span_naming] == :job_class
->(job) { "#{job.class.name} publish" }
else
->(job) { "#{job.queue_name} publish" }
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing a regression in our codebase because the handler no longer honours dynamic configuration.

i.e. Changing OpenTelemetry::Instrumentation::ActiveJob::Instrumentation.instance after this is registered no longer affects the naming.

Arguably, we probably shouldn't rely on dynamic configs, but I wanted to point it out regardless

Same applies to Perform

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to provide us with a PR to fix this issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#
# @param [Span] span the span to activate
# @yield [span, context] yields span and a context containing the span to the block.
def with_span(span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is never called and that the context is never set.

This is causing link propagation to fail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any additional details you would like to share in an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go! #1131

I was working on that, but it took me longer than expected to write up the issue. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants