Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-4904] First implementation of new Stack collector based on rb_profile_frames API #1966

Merged
merged 13 commits into from
May 23, 2022

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Apr 5, 2022

This PR adds the new Stack collector, which samples using the rb_profile_frames API.

Rather than using the rb_profile_frames API directly, it copy pastes it from the Ruby VM sources into the private_vm_api_access.c file.

I claim license-wise this is OK since upstream Ruby allows the source code to be used under the terms of the BSD-2-Clause license, which I include in the file with the copied snippets as well as in the LICENSE-3rdparty.csv file as usual. I'll also run this through the internal opensource group, in case they have any notes/concerns.

Code-wise, here's a quick description of rb_profile_frames and why we need it:

What is rb_profile_frames?

rb_profile_frames is a Ruby VM debug API added for use by profilers for sampling the stack trace of a Ruby thread.
Its main other user is the stackprof profiler: https://github.com/tmm1/stackprof .

Why do we need a custom version of rb_profile_frames?

There are a few reasons:

  1. To backport improved behavior to older Rubies. Prior to Ruby 3.0 (vm_backtrace.c: let rb_profile_frames show cfunc frames ruby/ruby#3299), rb_profile_frames skipped CFUNC frames, aka frames that are implemented with native code, and thus the resulting stacks were quite incomplete as a big part of the Ruby standard library is implemented with native code.
  2. To extend this function to work with any thread. The upstream rb_profile_frames function only targets the current
    thread, and to support wall-clock profiling we require sampling other threads. This is only safe because of the
    Global VM Lock. (We don't yet support sampling Ractors beyond the main one; we'll need to find a way to do it
    safely first.)
  3. To get more information out of the Ruby VM. The Ruby VM has a lot more information than is exposed through
    rb_profile_frames, and by making our own copy of this function we can extract more of this information.
    See for backtracie gem (https://github.com/ivoanjo/backtracie) for an exploration of what can potentially be done.
  4. Because we haven't yet submitted patches to upstream Ruby. As with any changes on the private_vm_api_access.c,
    our medium/long-term plan is to contribute upstream changes and make it so that we don't need any of this
    on modern Rubies.

This PR depends on #1960. It also isn't the full story -- I tried to limit the scope of this PR to avoid it growing too much, since "copy pasting code from the Ruby codebase" was already a quite important event by itself.

Copy link
Member

@jeremy-lq jeremy-lq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open source aspects LGTM

@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector branch from 9078e11 to 4be8f4f Compare May 18, 2022 09:27
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-continued branch from 31a46cf to 5814cf1 Compare May 18, 2022 09:35
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector branch 2 times, most recently from 898b439 to 6be9535 Compare May 19, 2022 10:42
ivoanjo added 11 commits May 19, 2022 14:26
The Ruby sources usually use `rb_thread_t` so this brings us closer
to upstream.
The `rb_profile_frames` method can be used to get stack traces for
a profiler, but it has a number of limitations and caveats which we'll
need to bridge, and thus we need to ship our own copy of it.

This is OK since upstream Ruby allows the source code to be used under
the terms of the BSD-2-Clause license, which I include in the file
with the copied snippets. I'll also run this through the internal
opensource group, in case they have any notes/concerns.
(The extraction originally came from `http_transport.c`, added in
 #1923 but after a rebase only the addition of the new header remained)
This header always needs to be included when including
`private_vm_api_access.h` so it makes more sense to have it there.
These will be needed in the `collectors_stack.c`.
This first version has only a very limited test; this will be
extended as we fix `rb_profile_frames` sharp edges in later commits.
Now that we have a component capable of writing stacks into the
`StackRecorder` -- the `Collectors::Stack`, we can use it to write
a stack and confirm that values and labels are correctly propagated.
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-continued branch from 5814cf1 to 9d35f1e Compare May 19, 2022 13:39
… in next PR)

Making the stack collector work on older Rubies is quite a lot of work,
which I'd like to submit as a separate PR.

To avoid our CI breaking, let's disable a few profiler things,
which we will re-enable in
<#2000>.

Note that *NONE* of the components being disabled or messed
with are enabled for customers -- they're still not on in any
situation -- so this strategy of disabling stuff is just an
enabler for incremental development.
}

void sample(VALUE thread, VALUE recorder_instance, ddprof_ffi_Slice_i64 metric_values, ddprof_ffi_Slice_label labels) {
const int max_frames = 400; // FIXME: Should be configurable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is made configurable in a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a performance argument to be made to keeping this a compile time constant, but I'd probably think it's unlikely to be worth the rigidness of this being unconfigurable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, yeah, especially because here I'm still doing stack allocation for the needed data. In the next PR (#2000) I move this over to being heap-allocated as stack allocation failure modes are... weird.

I do plan to benchmark and profile the new profiler code once it's all wired, to catch any obvious inefficiencies that may have been left :)

@ivoanjo ivoanjo marked this pull request as ready for review May 19, 2022 16:23
@ivoanjo ivoanjo requested a review from a team May 19, 2022 16:23
Comment on lines 45 to 46
// The sources below are modified versions of code extracted from the Ruby project.
// The Ruby project copyright and license follow:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there modifications to the source code we copied here?
If so, could you leave some inline comments around the code lines that were modified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully addressed in 4b7ee20:

  • Added a note to document that every function has a list of changes (most of them are Modifications: none)
  • Added an inline comment in ddtrace_rb_profile_frames to point at what changed from the upstream version (added thread argument)

Base automatically changed from ivoanjo/prof-4904-native-stack-collector to master May 23, 2022 08:31
@ivoanjo ivoanjo merged commit e55a123 into master May 23, 2022
@ivoanjo ivoanjo deleted the ivoanjo/prof-4904-native-stack-collector-continued branch May 23, 2022 09:31
@github-actions github-actions bot added this to the 1.1.0 milestone May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants