-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-4904] New Stack collector is feature-complete #2000
[PROF-4904] New Stack collector is feature-complete #2000
Conversation
Ooops mistakenly forgot to open this as a draft! |
31a46cf
to
5814cf1
Compare
1fb0dea
to
bb89cce
Compare
5814cf1
to
9d35f1e
Compare
… in next PR) Making the stack collector work on older Rubies is quite a lot of work, which I'd like to submit as a separate PR. To avoid our CI breaking, let's disable a few profiler things, which we will re-enable in <#2000>. Note that *NONE* of the components being disabled or messed with are enabled for customers -- they're still not on in any situation -- so this strategy of disabling stuff is just an enabler for incremental development.
bb89cce
to
20c1c4b
Compare
// * If we don't move it into a different thread, does releasing the GVL on a Ruby thread mean that we're introducing | ||
// a new thread switch point where there previously was none? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my first thought when I was reading this method and thinking of moving out of the GVL. Of course, profiling the code performance here is will tell us, but I do agree that it's very possible that the complexity introduced might negate any performance gains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Part of the awkwardness of releasing the GVL or even sending the sample to a different thread that operates without the GVL is that we'd need to duplicate a lot of the Ruby-level stuff, because it requires the GVL being held to access it. That's annoying because it would introduce an extra copy; tracking of a bunch of heap-allocated stuff, etc.
There's been some discussion with the other profiler libraries about possibly changing libddprof/libdatadog to expose its string table. I've been thinking it could be used to "solve" this issue -- e.g. we could copy all the Ruby strings into the string table (with the GVL being held), avoiding any extra copies, and then we'd have a lot less data to send to a background thread.
So yeah, definitely I'll be keeping an eye on this one.
// To give customers visibility into these threads, rather than reporting an empty stack, we replace the empty stack | ||
// with one containing a placeholder frame, so that these threads are properly represented in the UX. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice usability feature for users. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! We already had this for the old profiler as well: #1719 :)
if (max_frames < 5) rb_raise(rb_eArgError, "Invalid max_frames: value must be >= 5"); | ||
if (max_frames > MAX_FRAMES_LIMIT) rb_raise(rb_eArgError, "Invalid max_frames: value must be <= " MAX_FRAMES_LIMIT_AS_STRING); | ||
|
||
sampling_buffer* buffer = xcalloc(1, sizeof(sampling_buffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the "checked memory allocation" variants (e.g. xcalloc
) recommended for the Ruby VM? I'm assuming so and the Ruby VM has a proper global error handler configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually thanks for calling me out on this!
xcalloc
is indeed provided by Ruby -- it's aliased to ruby_xcalloc
:
* Allocates a storage instance. It is largely the same as system malloc(),
* except:
*
* - It raises Ruby exceptions instead of returning NULL, and
* - In case of `ENOMEM` it tries to GC to make some room.
The part I had missed thus far is this one:
* @note It doesn't return NULL.
Because Ruby will call ruby_memerror
aka try to raise our friend the NoMemoryError
exception in case of NULL
. So I've been adding more NULL
-checks than needed. I'll do a pass an clean-up any such unneeded checks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 68e4bc5
if (max_frames > MAX_FRAMES_LIMIT) rb_raise(rb_eArgError, "Invalid max_frames: value must be <= " MAX_FRAMES_LIMIT_AS_STRING); | ||
|
||
sampling_buffer* buffer = xcalloc(1, sizeof(sampling_buffer)); | ||
if (buffer == NULL) rb_memerror(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the comment above, if we are calling the x
variant of calloc
, will this code be reachable with a buffer == NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're 💯% right, will remove :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 68e4bc5
// The only case where any of the underlying arrays are NULL is when initial allocation failed; otherwise they | ||
// can be assumed to be not-null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than one system I've worked with in the past used this pattern of free if not NULL
. I think it's a fair safety pattern, if a bit defensive. 👍
ptrdiff_t stack_depth_for(VALUE thread) { | ||
#ifndef USE_THREAD_INSTEAD_OF_EXECUTION_CONTEXT // Modern Rubies | ||
const rb_execution_context_t *ec = thread_struct_from_object(thread)->ec; | ||
#else // Ruby < 2.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to say "thank you" for documenting what parts of our code can be dropped in the future when we deprecate support for specific versions of Ruby.
There are self evident sections if RUBY_VERSION < '2.3'
and those are great, but I want to highlight the importance of documenting the non-self evident ones. So yeah, thank you again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm looking forward to dropping support for 2.1 older Rubies on this part of the code.
I also plan to submit everything I can upstream, so that hopefully at some point in the 3.x series, we will no longer need private_vm_api_access.c
for those Rubies :)
const static ddprof_ffi_ValueType enabled_value_types[] = {CPU_TIME_VALUE, CPU_SAMPLES_VALUE, WALL_TIME_VALUE}; | ||
static const ddprof_ffi_ValueType enabled_value_types[] = {CPU_TIME_VALUE, CPU_SAMPLES_VALUE, WALL_TIME_VALUE}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the most important change right here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include IbhModuleD | ||
end | ||
|
||
$ibh_a_proc = proc { IbhClassC.new.hello } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Globals! How about slapping all these $
bad boys into a module in this file, like IbhGlobal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear there, my suggestion will likely accomplish the same level of code isolation, but I just find it jarring to see these $
around me. It makes me slightly confused when I glance a random $
while reading code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ce1c971
…reverted in next PR)" This reverts commit 2334216. The commit being reverted was only added so that we could merge the previous PR and still have CI happy and green.
Stack allocation for arrays is... weird and complicated so I've extracted all of the buffers needed for sampling into a dynamically-allocated struct. This is also the first step towards making the maximum stack depth configurable from Ruby code.
A frame representing a method implemented using native code needs to be handled in a special way: * we need to use a different API to get the method name * we need to use the path and line number of its direct/indirect Ruby caller (if we want to match the Ruby backtrace APIs at least) To implement this, I further extended our custom `rb_profile_frames` to support flagging a method as being a Ruby frame or not. In terms of memory we could probably pack things a bit more (e.g. not use a space-wasting array of bool) but I'll leave such optimizations for later. Finally, the `rb_profile_frame_method_name` method only returns the actual method name for a native method on Ruby 3.0+. To support older Rubies, a further commit will introduce a few more copy-pasted backported code.
…bies Ruby 3.0 finally added support for showing CFUNC frames (frames for methods written using native code) in stack traces gathered via `rb_profile_frames` (ruby/ruby#3299). To access this information on older Rubies, beyond using our custom `ddtrace_rb_profile_frames`, we also need to backport the Ruby 3.0+ version of `rb_profile_frame_method_name`.
It turns out the awful hack we had thus far to change the `extconf.h` file is not actually needed, and can be simplified.
It's a good idea for `private_vm_api_access.c` to include `private_vm_api_access.h` so that the compiler can complain at us if they got out of sync. We previously were not doing it because `private_vm_api_access.c` includes the `RUBY_MJIT_HEADER` which can conflict with other Ruby headers that were being included in `private_vm_api_access.h`. To fix this, I've introduced an extra define -- `PRIVATE_VM_API_ACCESS_SKIP_RUBY_INCLUDES` which is used in `private_vm_api_access.h` to decide if the Ruby includes should or should not be included. (I also did a minor cleanup to introduce some indentation on the preprocessor definitions to make them more readable, especially when nested.)
We always specify `lines`, so there's no need to carry code designed to make that argument optional.
A few compatibility fixes to support older Rubies: * Use `RHASH_SIZE` instead of `rb_hash_size_num` * Fix a few warnings regarding integer sizes * Import `iseq.h` internal header, it's needed on older Rubies to support our custom `rb_profile_frames` * Import `rb_vm_frame_method_entry`, it's needed on older Rubies to support our custom `rb_profile_frames` (this included importing a few more things needed to support that function)
The 3.20 series dropped support for 2.4 but somehow rubygems/bundler still try to install it.
A few compatibility fixes to support older Rubies: * Add missing header to be able to use bool * Use `rb_thread_t` instead of `rb_execution_context_t`. The latter was extracted from the former starting on Ruby 2.5, so on older Rubies we use the former directly.
A lot of macros in `rb_vm_frame_method_entry` did not exist back in 2.3 so I decided to import the Ruby 2.3 version of `rb_vm_frame_method_entry`.
…elow The `rb_profile_frames` function changed quite a bit between Ruby 2.2 and 2.3. Since the change was quite complex I opted not to try to extend support to Ruby 2.2 and below using the same custom function, and instead I started anew from the Ruby 2.2 version of the function, applying some of the same fixes that we have for the modern version. This gets us closer to a green profiler test suite on 2.2 and 2.1 (there's at least one other compatibility fix needed elsewhere which will come in a later commit).
…elow While I hesistate to invest too much on supporting older Rubies, this seems like a worthy trade-off, because otherwise we'd be stuck creating alternative versions for every test in `stack_spec.rb`.
This is becoming a common-enough operation that it makes sense to have a nice wrapper for it.
This is similar to the one we had in `Datadog::Profiling::Encoding::Profile::Protobuf`, but for the new libddprof-based `StackRecorder`. Unfortunately we don't get as many debug stats with libddprof as we had with the previous code; at some point we may want to take a stab at exposing that info in libddprof.
Whenever something is dynamically allocated, we must ensure that either no exceptions can be thrown or that we clean it up before throwing an exception.
The `TypedData_Get_Struct` API already checks that a Ruby object has the correct types, so there's no need to do it twice.
Make sure to clean up the `serialized_profile` before calling into `ruby_time_from`, in case any exceptions are raised.
With everything being cached, it's kinda hard to interactively experiment inside e.g. a pry session.
…g frame from eval/instance_eval See comments in `private_vm_api_acccess.c` for the gritty details. The TL;DR is that we needed to add yet another change to our custom `rb_profile_frames` in order to match what the Ruby reference stack trace API returns.
…r specs This diff looks weirder because of the whitespace changes; I recommend looking at it with withespace being ignored.
No further changes needed to support this, but I want to add the extra coverage for weird things so that in the future we avoid any regressions.
…it for testing the stack collector (As the creator and sole copyright owner of that file in the `backtracie` gem, taken from <https://github.com/ivoanjo/backtracie/blob/1f6116ff5b07e87fe3e5be95a22c6a8e1210de8b/spec/unit/interesting_backtrace_helper.rb> I am relicensing the file and its copyrights to belong to Datadog similarly to the rest of the dd-trace-rb codebase). This file is used as a single mega-complex test for the stack sampling code, and is quite useful to spot regressions.
What we're doing in these tests is inherently a race: we want the thread to be stopped at the `sleep`, but have no way to atomically signal the `ready_queue` and go to `sleep`, and thus we kinda pray that Ruby just won't schedule us away between both operations. This seems to work so far, but to avoid creating any flaky tests, let's try to signal readyness as close as possible to calling `sleep`. (Note that if this ever gets us into trouble and ends up creating test flakiness, maybe we could implement a native method that signals the queue AND calls sleep and ensures the Global VM Lock is not released.)
As pointed out by Marco in <#2019 (comment)> it's possible that when memory allocation fails, we can be in such a state that casually raising an exception may not work at all. Instead, let's use a function that Ruby exposes for this exact situation -- `rb_memerror()` -- which is built to withstand such memory issues.
As pointed out by Marco during review (<#2000 (comment)>) the `xcalloc` function already tests for `NULL` and thus we don't need to repeat the test. To make it even more obvious that we're using the Ruby-provided functions for managing memory, I've decided to directly call `ruby_xcalloc` and `ruby_xfree` instead of their shorted aliases.
Not that it makes any difference, because they are still globals.
98fe521
to
ce1c971
Compare
One last rebase to solve a last-minute conflict with #1966 and this PR is now on top of master! Just need to chase down that CI failure relating to max_frames before merging :) |
The `frames_omitted_message` CAN be stack allocated but MUST live as long as the call to `record_sample`. Bizarrely enough, the previous approach accidentally worked on macOS, which is why I had not spotted it before -- it worked, and tests passed, so surely it was ok... right? Nope!
CI is beautiful green, merging away! |
As pointed out by Marco during review (<#2000 (comment)>) the `xcalloc` function already tests for `NULL` and thus we don't need to repeat the test. To make it even more obvious that we're using the Ruby-provided functions for managing memory, I've decided to directly call `ruby_xcalloc` and `ruby_xfree` instead of their shorted aliases.
The new stack collector introduced in #1966 gets feature-complete in this PR.
By feature-complete I mean:
rb_profile_frames
got quite a lot of fixes to exactly match the Ruby reference stack trace api (Thread#backtrace_locations
)I don't quite like how long and how complex this PR got. Thanks for putting up with it 😅 , and feel free to challenge me if this should be further broken down. Reviewing it commit-by-commit may make more sense, helping to show what bits were added why.
Note: This PR builds on top of #1966.
Note 2: I'm aware that specs for "Datadog::Profiling::Collectors::Stack when sampling a thread with a stack that is deeper than the configured max_frames" are failing. I missed this previously as it did not happen on macOS (works fine on macOS). I'll push a fix for it ASAP, but otherwise this is ready for reviewing.