-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-4198] Gather CPU time in profiler without monkey patching Thread for all supported Ruby versions (2.1 to 3.x) #1740
Merged
ivoanjo
merged 14 commits into
master
from
ivoanjo/prof-4198-native-cpu-time-on-older-rubies
Nov 5, 2021
Merged
Changes from 13 commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
2d6e9c5
Enable gathering thread CPU time using native extension for legacy Ru…
ivoanjo a9f9354
Introduce NativeExtension#cpu_time_ns_for helper
ivoanjo 87c9676
Sample cpu time in Stack sampler using native extension directly
ivoanjo 2ce9dc4
Copy Process::Waiter comments and regression tests over from cthread.rb
ivoanjo 50ad4f5
Remove all references to the CPU extension from the Setup task
ivoanjo 1106b71
Remove all references to the CPU time extension aka Thread monkey patch
ivoanjo 09dd419
Add info banner when native extension build is being skipped
ivoanjo e85f017
Drop ffi dependency
ivoanjo 9a77c61
Skip building the profiling native extension on TruffleRuby
ivoanjo eb8ff20
Skip native extension build testing for Windows+ruby-head
ivoanjo b47bdbb
Workaround for Ruby 2.1 not having a public thread_native.h header
ivoanjo e1c4fa3
Add missing test assertions
ivoanjo 0f7eeae
Use nicer helper to detect Linux
ivoanjo 4725cc6
Pin version of `debase-ruby_core_source` used, as discussed during PR…
ivoanjo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
# typed: true | ||
|
||
require 'ddtrace/profiling/native_extension' | ||
require 'ddtrace/profiling/backtrace_location' | ||
require 'ddtrace/profiling/events/stack' | ||
require 'ddtrace/utils/only_once' | ||
|
@@ -32,7 +34,8 @@ class Stack < Worker # rubocop:disable Metrics/ClassLength | |
:trace_identifiers_helper, | ||
:ignore_thread, | ||
:max_time_usage_pct, | ||
:thread_api | ||
:thread_api, | ||
:cpu_time_provider | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like this abstraction. |
||
|
||
def initialize( | ||
recorder, | ||
|
@@ -42,6 +45,7 @@ def initialize( | |
max_time_usage_pct: DEFAULT_MAX_TIME_USAGE_PCT, | ||
max_threads_sampled: DEFAULT_MAX_THREADS_SAMPLED, | ||
thread_api: Thread, | ||
cpu_time_provider: Datadog::Profiling::NativeExtension, | ||
fork_policy: Workers::Async::Thread::FORK_POLICY_RESTART, # Restart in forks by default | ||
interval: MIN_INTERVAL, | ||
enabled: true | ||
|
@@ -53,6 +57,8 @@ def initialize( | |
@max_time_usage_pct = max_time_usage_pct | ||
@max_threads_sampled = max_threads_sampled | ||
@thread_api = thread_api | ||
# Only set the provider if it's able to work in the current Ruby/OS combo | ||
@cpu_time_provider = cpu_time_provider unless cpu_time_provider.cpu_time_ns_for(thread_api.current).nil? | ||
|
||
# Workers::Async::Thread settings | ||
self.fork_policy = fork_policy | ||
|
@@ -63,8 +69,6 @@ def initialize( | |
# Workers::Polling settings | ||
self.enabled = enabled | ||
|
||
@warn_about_missing_cpu_time_instrumentation_only_once = Datadog::Utils::OnlyOnce.new | ||
|
||
# Cache this proc, since it's pretty expensive to keep recreating it | ||
@build_backtrace_location = method(:build_backtrace_location).to_proc | ||
# Cache this buffer, since it's pretty expensive to keep accessing it | ||
|
@@ -167,17 +171,10 @@ def collect_thread_event(thread, current_wall_time_ns) | |
end | ||
|
||
def get_cpu_time_interval!(thread) | ||
# Return if we can't get the current CPU time | ||
unless thread.respond_to?(:cpu_time_instrumentation_installed?) && thread.cpu_time_instrumentation_installed? | ||
warn_about_missing_cpu_time_instrumentation(thread) | ||
return | ||
end | ||
return unless cpu_time_provider | ||
|
||
current_cpu_time_ns = thread.cpu_time(:nanosecond) | ||
current_cpu_time_ns = cpu_time_provider.cpu_time_ns_for(thread) | ||
|
||
# NOTE: This can still be nil even when all of the checks above passed because of a race: there's a bit of | ||
# initialization that needs to be done by the thread itself, and it's possible for us to try to sample | ||
# *before* the thread had time to finish the initialization | ||
return unless current_cpu_time_ns | ||
|
||
get_elapsed_since_last_sample_and_set_value(thread, THREAD_LAST_CPU_TIME_KEY, current_cpu_time_ns) | ||
|
@@ -226,33 +223,6 @@ def build_backtrace_location(_id, base_label, lineno, path) | |
|
||
private | ||
|
||
def warn_about_missing_cpu_time_instrumentation(thread) | ||
@warn_about_missing_cpu_time_instrumentation_only_once.run do | ||
# Is the profiler thread instrumented? If it is, then we know instrumentation is available, but seems to be | ||
# missing on this thread we just found. | ||
# | ||
# As far as we know, it can be missing due to one the following: | ||
# | ||
# a) The thread was started before we installed our instrumentation. | ||
# In this case, the fix is to make sure ddtrace gets loaded before any other parts of the application. | ||
# | ||
# b) The thread was started using the Ruby native APIs (e.g. from a C extension such as ffi). | ||
# Known cases right now that trigger this are the ethon/typhoeus gems. | ||
# We currently have no solution for this case; these threads will always be missing our CPU instrumentation. | ||
# | ||
# c) The thread was started with `Thread.start`/`Thread.fork` and hasn't yet enabled the instrumentation. | ||
# When threads are started using these APIs, there's a small time window during which the thread has started | ||
# but our code to apply the instrumentation hasn't run yet; in these cases it's just a matter of allowing | ||
# it to run and our instrumentation to be applied. | ||
# | ||
if thread_api.current.respond_to?(:cpu_time) && thread_api.current.cpu_time | ||
Datadog.logger.debug( | ||
"Thread ('#{thread}') is missing profiling instrumentation; other threads should be unaffected" | ||
) | ||
end | ||
end | ||
end | ||
|
||
# If the profiler is started for a while, stopped and then restarted OR whenever the process forks, we need to | ||
# clean up any leftover per-thread counters, so that the first sample after starting doesn't end up with: | ||
# | ||
|
@@ -274,9 +244,18 @@ def reset_cpu_time_tracking | |
end | ||
|
||
def get_elapsed_since_last_sample_and_set_value(thread, key, current_value) | ||
# See cthread.rb for more details, but this is a workaround for https://bugs.ruby-lang.org/issues/17807 ; | ||
# using all thread_variable related methods on these instances also triggers a crash and for now we just | ||
# skip it for the affected Rubies | ||
# Process::Waiter crash workaround: | ||
# | ||
# This is a workaround for a Ruby VM segfault (usually something like | ||
# "[BUG] Segmentation fault at 0x0000000000000008") in the affected Ruby versions. | ||
# See https://bugs.ruby-lang.org/issues/17807 for details. | ||
# | ||
# In those Ruby versions, there's a very special subclass of `Thread` called `Process::Waiter` that causes VM | ||
# crashes whenever something tries to read its instance or thread variables. This subclass of thread only | ||
# shows up when the `Process.detach` API gets used. | ||
# In the specs you'll find crash regression tests that include a way of reproducing it. | ||
# | ||
# As workaround for now we just skip it for the affected Rubies | ||
return 0 if @needs_process_waiter_workaround && thread.is_a?(::Process::Waiter) | ||
|
||
last_value = thread.thread_variable_get(key) || current_value | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for users with versions of Ruby not in the list, for example 2.5.6p201? https://github.com/ruby-debug/debase-ruby_core_source/tree/master/lib/debase/ruby_core_source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It falls back to the closest lower version:
Currently they have the first version of every Ruby release (all the .0). For some versions, they're missing the latest point release (they have 2.2.9 but not 2.2.10), but these core VM apis seem to rarely shift, and we also test with the latest versions in our CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major concern here is would this dependency in anyway prevent users who use tracing on supported versions, platforms and engines from installing & using tracing, if they were to upgrade
ddtrace
and use this dependency?Sounds like there's some kind of fallback mechanism; is this a sufficient guarantee against the prior question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've replied in context in #1740 (comment) since I was already touching on this subject.