[PROF-9263] Add experimental support for profiling code hotspots when used with opentelemetry ruby gem #3510

ivoanjo · 2024-03-06T15:06:04Z

What does this PR do?

This PR adds experimental support for getting profiling code hotspots data (including endpoint profiling) when profiling processes being traced using the opentelemetry ruby gem directly.

Note that this differs from the recommended way of using opentelemetry with the ddtrace library, which is to follow the instructions from https://docs.datadoghq.com/tracing/trace_collection/custom_instrumentation/otel_instrumentation/ruby/ .

The key difference is -- this PR makes code hotspots work even for setups that opt to not use require 'datadog/opentelemetry' (which is the recommended and easier way).

The approach taken here is similar to #2342 and #3466: we peek inside the implementation of the opentelemetry gem to extract the information we need (namely the span id, local root span id, trace type, and trace endpoint). This approach is potentially brittle, which is why the code is written very defensively, with the aim of never breaking the application (or profiling) if something is off -- it just won't collect code hotspots.
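To illustrate the defensive style described above, here is a minimal Ruby sketch. It is not the implementation in this PR (that lives in the native extension, in C). `FakeSpan`, `FakeSpanContext`, and `safe_trace_identifiers` are hypothetical names used only for this example. The idea is to check every step and return nil instead of raising if anything looks off.

```ruby
# Hypothetical stand-ins for the objects a tracer would expose.
# They are NOT the real opentelemetry classes.
FakeSpanContext = Struct.new(:hex_span_id)
FakeSpan = Struct.new(:context, :name)

# Returns a hash with the span id, or nil if anything is unexpected.
# Never raises: a failure only means "no code hotspots for this sample".
def safe_trace_identifiers(span)
  return nil unless span.respond_to?(:context)

  context = span.context
  return nil unless context.respond_to?(:hex_span_id)

  hex = context.hex_span_id
  # Assumption for this sketch: a span id is 8 bytes, i.e. 16 hex characters.
  return nil unless hex.is_a?(String) && hex.match?(/\A\h{16}\z/)

  span_id = hex.to_i(16)
  # Assumption for this sketch: an all-zero id is treated as invalid.
  return nil if span_id.zero?

  { span_id: span_id }
rescue StandardError
  # Last line of defense: swallow unexpected errors rather than break the app.
  nil
end
```

Passing in an object with an unexpected shape (for example `Object.new`) returns nil, so the caller can skip code hotspots for that sample and carry on.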

Motivation:

We have a customer interested in running this setup, so hopefully they'll be able to test this PR and validate if it works for them.

Furthermore, I'm hoping to see if the opentelemetry Ruby folks would be open to tweaking their APIs to be friendlier to tools such as the profiler, but for now I opted for getting our hands dirty.

Additional Notes:

I'm opening this PR as a draft until we can get feedback from the customer and see if this works for them.

How to test the change?

On top of the added test coverage, I was able to see code hotspots working for the following sinatra example app:

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'

  gem 'rackup'
  gem 'dogstatsd-ruby'
  gem 'datadog', git: 'https://github.com/datadog/dd-trace-rb', branch: 'ivoanjo/prof-9263-otlp-ruby-code-hotspots'
  gem 'sinatra'
  gem 'opentelemetry-api'
  gem 'opentelemetry-sdk'
  gem 'opentelemetry-instrumentation-sinatra'
  gem 'opentelemetry-exporter-otlp'
  gem 'pry'
end

require 'sinatra/base'
require 'opentelemetry/sdk'
require 'pry'

Datadog.configure do |c|
  c.service = 'ivoanjo-testing-opentelemetry-test'
  c.profiling.enabled = true
end

# Configure OpenTelemetry
OpenTelemetry::SDK.configure do |c|
  c.service_name = 'ivoanjo-testing-opentelemetry-test'
  c.use 'OpenTelemetry::Instrumentation::Sinatra'
end

class MyApp < Sinatra::Base
  get '/' do
    OpenTelemetry::Trace.current_span.add_attributes({'runtime-id' => Datadog::Core::Environment::Identity.id})
    sleep 1
    'Hello, OpenTelemetry!'
  end
end

MyApp.run!

After doing a few requests, here's how this looks: (screenshot not included)

AlexJF

LGTM! And so safe you should be designing baby furniture 😄

AlexJF · 2024-03-06T15:31:03Z

ext/datadog_profiling_native_extension/collectors_thread_context.c

@@ -734,6 +765,11 @@ static void trigger_sample_for_thread(
  struct trace_identifiers trace_identifiers_result = {.valid = false, .trace_endpoint = Qnil};
  trace_identifiers_for(state, thread, &trace_identifiers_result);

+  if (!trace_identifiers_result.valid) {


Would it be worth (and possible) doing a bit of extra work at the start to arrive at a sticky decision here, short-circuiting the constantly failing trace_identifiers_for (with all its rb_var_get calls) if ddtrace is not used for tracing at all?

Or is the thinking that we want to support situations where there's a mix of ddtrace and pure-ot traces and/or the ability to change between one and the other dynamically (e.g. via a feature flag)?

I think your suggestion makes sense.

My intent here in checking both is that the profiler may start quite early in the app lifecycle, so we may not know which one is going to be used yet.

Or is the thinking that we want to support situations where there's a mix of ddtrace and pure-ot traces and/or the ability to change between one and the other dynamically (e.g. via a feature flag)?

I'm not sure mixing is even possible at this point, since the ddtrace otel support monkey patches itself pretty deep into opentelemetry (which is why I needed to contort a bit to be able to test both).

For that reason, and after our last discussion, I think it makes sense to stop checking opentelemetry once we see data coming from ddtrace traces.

The reverse is harder to figure out, actually. It would be weird, but not impossible, for an app that started with opentelemetry to then switch over to ddtrace.

TL;DR: I'll wait for feedback from our customer on how this is working before acting on this comment, just in case we end up going in a completely different direction BUT I'll definitely come back to it before marking the PR as non-draft.
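For illustration only, the "stop checking opentelemetry once we see data coming from ddtrace" idea could be sketched in Ruby as below. This is not code from this PR: the real sampling code is in the native extension, and `TraceIdentifierSource` and its two lookup callables are hypothetical names invented for this example.

```ruby
# Sketch of a sticky decision: try ddtrace first; once ddtrace has produced
# identifiers, stop consulting the opentelemetry lookup for good.
class TraceIdentifierSource
  # Both lookups are callables that take a thread and return a hash of
  # identifiers, or nil when no active trace was found (assumed contract).
  def initialize(ddtrace_lookup:, otel_lookup:)
    @ddtrace_lookup = ddtrace_lookup
    @otel_lookup = otel_lookup
    @otel_enabled = true
  end

  def identifiers_for(thread)
    result = @ddtrace_lookup.call(thread)
    if result
      # ddtrace is clearly in use, so make the decision sticky.
      @otel_enabled = false
      return result
    end

    @otel_enabled ? @otel_lookup.call(thread) : nil
  end

  def otel_enabled?
    @otel_enabled
  end
end
```

The sketch deliberately handles only one direction. As noted above, the reverse (an app that starts with opentelemetry and later switches to ddtrace) is harder to decide, so here the opentelemetry lookup never disables the ddtrace one.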

ivoanjo · 2024-07-30T15:13:55Z

We're waiting for a bit more feedback on whether this is the right approach before going forward. If there's not a lot of movement in a few months, we can close this PR.

ivoanjo · 2024-09-25T15:40:02Z

Update: I've rebased this PR on top of the v2.3.0 stable release to help testing.

codecov-commenter · 2024-09-25T15:55:16Z

Codecov Report

Attention: Patch coverage is 99.20000% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.84%. Comparing base (c5ab063) to head (6b67c03).
Report is 346 commits behind head on master.

Files with missing lines	Patch %	Lines
...atadog/profiling/collectors/thread_context_spec.rb	99.09%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3510      +/-   ##
==========================================
- Coverage   97.86%   97.84%   -0.03%     
==========================================
  Files        1271     1271              
  Lines       75976    76097     +121     
  Branches     3739     3746       +7     
==========================================
+ Hits        74356    74455      +99     
- Misses       1620     1642      +22


pr-commenter · 2024-09-25T16:14:23Z

Benchmarks

Benchmark execution time: 2024-10-01 15:28:53

Comparing candidate commit 6b67c03 in PR branch ivoanjo/prof-9263-otlp-ruby-code-hotspots with baseline commit c5ab063 in branch master.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 22 metrics, 2 unstable metrics.

scenario:profiler - sample timeline=false

🟩 throughput [+0.743op/s; +0.764op/s] or [+11.563%; +11.905%]

ivoanjo · 2024-10-09T15:13:02Z

This PR is going to be superseded by #3984. I'm going to leave it open for a while longer to give customers testing this feature time to migrate to a release containing the newer version of this PR.

ivoanjo · 2024-10-11T12:58:39Z

Erghh our automation was a bit too automated -- I've restored the branch, and removed this from the 2.4.0 milestone (since it was #3984 that got merged).

ivoanjo requested review from a team as code owners March 6, 2024 15:06

ivoanjo marked this pull request as draft March 6, 2024 15:06

github-actions bot added the profiling Involves Datadog profiling label Mar 6, 2024

AlexJF reviewed Mar 6, 2024

ivoanjo mentioned this pull request Jul 9, 2024

Add support for tagging profiles with opentelemetry trace identifiers #1568

Closed

ivoanjo added do-not-merge/WIP Not ready for merge otel OpenTelemetry-related changes labels Jul 30, 2024

ivoanjo added 9 commits September 25, 2024 15:58

[PROF-9263] First working version of getting trace identifiers from otel gem

c579077

Things missing:
* Specs conflict with ddtrace otel specs (need to poke at appraisals)
* Missing endpoint support

Refactor otel_span_context_from to also return span

538aed9

While we don't need the actual span object to read the span ids, we will need it to read the endpoint names.

Add support for collecting trace endpoint from otel spans

1c8a9a7

Add coverage for trace with invalid span

fc26465

Allow specs for otel sdk without ddtrace to co-exist with the ddtrace specs

a6164bd

I'm... unhappy about this, but couldn't think of anything better that wouldn't involve refactoring the ddtrace tracing otel support and that seems even worse.

Fix specs breaking on Ruby 2.3 due to missing String#unpack1

5dcec3f

Sigh old rubies...

Add appraisal gemfiles/lockfiles for opentelemetry_otlp configuration

468cb13

Remove unneeded Rubocop config

dd865ee

Apply standardrb code style fixes

19646f2
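
For context on the Ruby 2.3 fix in the commit list above: String#unpack1 was added in Ruby 2.4, so on older Rubies the equivalent is `unpack(...).first`. Below is a minimal sketch, assuming an 8-byte big-endian binary id. `decode_binary_id` is a hypothetical helper for illustration, not code from this PR.

```ruby
# Hypothetical helper: decode an 8-byte big-endian binary id into an Integer,
# working on Rubies both with and without String#unpack1.
def decode_binary_id(bytes)
  if bytes.respond_to?(:unpack1)
    bytes.unpack1("Q>")      # Ruby 2.4+
  else
    bytes.unpack("Q>").first # Ruby 2.3 and older
  end
end

# Build an 8-byte binary string from its hex form and decode it.
binary_id = ["00000000000000ff"].pack("H*")
decoded = decode_binary_id(binary_id)
```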

ivoanjo force-pushed the ivoanjo/prof-9263-otlp-ruby-code-hotspots branch from 796ea12 to 19646f2 Compare September 25, 2024 15:38

Support running in situations where Gem.loaded_specs is not available

6b67c03

ivoanjo mentioned this pull request Oct 9, 2024

[PROF-10679] Add preview support for correlating profiling with otel ruby gem #3984

Merged

ivoanjo closed this pull request by merging all changes into master in 89d23d6 Oct 10, 2024

ivoanjo deleted the ivoanjo/prof-9263-otlp-ruby-code-hotspots branch October 10, 2024 08:32

github-actions bot added this to the 2.4.0 milestone Oct 10, 2024

ivoanjo restored the ivoanjo/prof-9263-otlp-ruby-code-hotspots branch October 11, 2024 12:58

ivoanjo removed this from the 2.4.0 milestone Oct 11, 2024
