New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[PROF-4904] New Stack collector is feature-complete #2000

Merged

ivoanjo merged 58 commits into master from ivoanjo/prof-4904-native-stack-collector-part3

May 24, 2022

Member

ivoanjo commented May 4, 2022 •

edited

Loading

The new stack collector introduced in #1966 gets feature-complete in this PR.

By feature-complete I mean:

There is extensive test coverage with many corner cases being checked
Our custom rb_profile_frames got quite a lot of fixes to exactly match the Ruby reference stack trace api (Thread#backtrace_locations)
It works for all supported Rubies, including 2.1 and 2.2 which were a though nut to crack
It matches the behavior of the old profiling codepaths, e.g. it has placeholders for threads with empty stacks that are running in native code and for threads with too deep stacks
etc...

I don't quite like how long and how complex this PR got. Thanks for putting up with it 😅 , and feel free to challenge me if this should be further broken down. Reviewing it commit-by-commit may make more sense, helping to show what bits were added why.

Note: This PR builds on top of #1966.
Note 2: I'm aware that specs for "Datadog::Profiling::Collectors::Stack when sampling a thread with a stack that is deeper than the configured max_frames" are failing. I missed this previously as it did not happen on macOS (works fine on macOS). I'll push a fix for it ASAP, but otherwise this is ready for reviewing.

ivoanjo requested a review from a team

May 4, 2022 10:53

ivoanjo changed the title ~~[PROF-4904] New Stack collector feature-complete~~ [PROF-4904] New Stack collector is feature-complete

ivoanjo marked this pull request as draft

May 5, 2022 07:31

Member Author

ivoanjo commented May 5, 2022

Ooops mistakenly forgot to open this as a draft!

ivoanjo mentioned this pull request

[PROF-5473] Release global VM lock during profile serialization #2007

Merged

ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-continued branch from 31a46cf to 5814cf1 Compare

May 18, 2022 09:35

ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-part3 branch from 1fb0dea to bb89cce Compare

May 18, 2022 09:53

ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-continued branch from 5814cf1 to 9d35f1e Compare

May 19, 2022 13:39

ivoanjo added a commit that referenced this pull request


          Temporarily disable Stack collector on older Rubies (will be reverted…

… in next PR)

Making the stack collector work on older Rubies is quite a lot of work,
which I'd like to submit as a separate PR.

To avoid our CI breaking, let's disable a few profiler things,
which we will re-enable in
<#2000>.

Note that *NONE* of the components being disabled or messed
with are enabled for customers -- they're still not on in any
situation -- so this strategy of disabling stuff is just an
enabler for incremental development.

ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-part3 branch from bb89cce to 20c1c4b Compare

May 20, 2022 10:14

ivoanjo marked this pull request as ready for review

May 20, 2022 11:17

marcotc approved these changes

View reviewed changes

ext/ddtrace_profiling_native_extension/collectors_stack.c

Comment on lines +111 to +112

		// * If we don't move it into a different thread, does releasing the GVL on a Ruby thread mean that we're introducing
		// a new thread switch point where there previously was none?

Member

marcotc May 20, 2022

This was my first thought when I was reading this method and thinking of moving out of the GVL. Of course, profiling the code performance here is will tell us, but I do agree that it's very possible that the complexity introduced might negate any performance gains.

Member Author

ivoanjo May 23, 2022 •

edited

Loading

Indeed. Part of the awkwardness of releasing the GVL or even sending the sample to a different thread that operates without the GVL is that we'd need to duplicate a lot of the Ruby-level stuff, because it requires the GVL being held to access it. That's annoying because it would introduce an extra copy; tracking of a bunch of heap-allocated stuff, etc.

There's been some discussion with the other profiler libraries about possibly changing libddprof/libdatadog to expose its string table. I've been thinking it could be used to "solve" this issue -- e.g. we could copy all the Ruby strings into the string table (with the GVL being held), avoiding any extra copies, and then we'd have a lot less data to send to a background thread.

So yeah, definitely I'll be keeping an eye on this one.

ext/ddtrace_profiling_native_extension/collectors_stack.c

Comment on lines +223 to +224

		// To give customers visibility into these threads, rather than reporting an empty stack, we replace the empty stack
		// with one containing a placeholder frame, so that these threads are properly represented in the UX.

Member

marcotc May 20, 2022

This is a very nice usability feature for users. 👍

Member Author

ivoanjo May 23, 2022

Yup! We already had this for the old profiler as well: #1719 :)

ext/ddtrace_profiling_native_extension/collectors_stack.c Outdated

+                if (max_frames < 5) rb_raise(rb_eArgError, "Invalid max_frames: value must be >= 5");
+                if (max_frames > MAX_FRAMES_LIMIT) rb_raise(rb_eArgError, "Invalid max_frames: value must be <= " MAX_FRAMES_LIMIT_AS_STRING);
+                sampling_buffer* buffer = xcalloc(1, sizeof(sampling_buffer));

Member

marcotc May 20, 2022

Are the "checked memory allocation" variants (e.g. xcalloc) recommended for the Ruby VM? I'm assuming so and the Ruby VM has a proper global error handler configured.

Member Author

ivoanjo May 23, 2022

Actually thanks for calling me out on this!

xcalloc is indeed provided by Ruby -- it's aliased to ruby_xcalloc:

 * Allocates a  storage instance.  It is  largely the same as  system malloc(),
 * except:
 *
 *   - It raises Ruby exceptions instead of returning NULL, and
 *   - In case of `ENOMEM` it tries to GC to make some room.

https://github.com/ruby/ruby/blob/0293f8ca8e1fbe8e4202c2d211bb11f4e0ffa5fd/include/ruby/internal/xmalloc.h#L53-L147

The part I had missed thus far is this one:

 * @note       It doesn't return NULL.

Because Ruby will call ruby_memerror aka try to raise our friend the NoMemoryError exception in case of NULL. So I've been adding more NULL-checks than needed. I'll do a pass an clean-up any such unneeded checks :)

Member Author

ivoanjo May 23, 2022

Fixed in 68e4bc5

ext/ddtrace_profiling_native_extension/collectors_stack.c Outdated

+                if (max_frames > MAX_FRAMES_LIMIT) rb_raise(rb_eArgError, "Invalid max_frames: value must be <= " MAX_FRAMES_LIMIT_AS_STRING);
+                sampling_buffer* buffer = xcalloc(1, sizeof(sampling_buffer));
+                if (buffer == NULL) rb_memerror();

Member

marcotc May 20, 2022

Following the comment above, if we are calling the x variant of calloc, will this code be reachable with a buffer == NULL?

Member Author

ivoanjo May 23, 2022

You're 💯% right, will remove :)

Member Author

ivoanjo May 23, 2022

Fixed in 68e4bc5

ext/ddtrace_profiling_native_extension/collectors_stack.c Outdated

Comment on lines 276 to 277

		// The only case where any of the underlying arrays are NULL is when initial allocation failed; otherwise they
		// can be assumed to be not-null.

Member

marcotc May 20, 2022

More than one system I've worked with in the past used this pattern of free if not NULL. I think it's a fair safety pattern, if a bit defensive. 👍

ext/ddtrace_profiling_native_extension/private_vm_api_access.c

+              ptrdiff_t stack_depth_for(VALUE thread) {
+                #ifndef USE_THREAD_INSTEAD_OF_EXECUTION_CONTEXT // Modern Rubies
+                  const rb_execution_context_t *ec = thread_struct_from_object(thread)->ec;
+                #else // Ruby < 2.5

Member

marcotc May 20, 2022

I just want to say "thank you" for documenting what parts of our code can be dropped in the future when we deprecate support for specific versions of Ruby.
There are self evident sections if RUBY_VERSION < '2.3' and those are great, but I want to highlight the importance of documenting the non-self evident ones. So yeah, thank you again!

Member Author

ivoanjo May 23, 2022

Yeah, I'm looking forward to dropping support for 2.1 older Rubies on this part of the code.

I also plan to submit everything I can upstream, so that hopefully at some point in the 3.x series, we will no longer need private_vm_api_access.c for those Rubies :)

ext/ddtrace_profiling_native_extension/stack_recorder.h

		const static ddprof_ffi_ValueType enabled_value_types[] = {CPU_TIME_VALUE, CPU_SAMPLES_VALUE, WALL_TIME_VALUE};
		static const ddprof_ffi_ValueType enabled_value_types[] = {CPU_TIME_VALUE, CPU_SAMPLES_VALUE, WALL_TIME_VALUE};

Member

marcotc May 20, 2022

Found the most important change right here!

Member Author

ivoanjo May 23, 2022

spec/datadog/profiling/collectors/interesting_backtrace_helper.rb Outdated

+                include IbhModuleD
+              end
+              $ibh_a_proc = proc { IbhClassC.new.hello }

Member

marcotc May 20, 2022

Globals! How about slapping all these $ bad boys into a module in this file, like IbhGlobal?

Member

marcotc May 20, 2022

To be clear there, my suggestion will likely accomplish the same level of code isolation, but I just find it jarring to see these $ around me. It makes me slightly confused when I glance a random $ while reading code here.

Member Author

ivoanjo May 23, 2022

Fixed in ce1c971

ivoanjo mentioned this pull request

[PROF-4904] First implementation of new Stack collector based on rb_profile_frames API #1966

Merged

Base automatically changed from ivoanjo/prof-4904-native-stack-collector-continued to master

May 23, 2022 09:31

ivoanjo added 16 commits

May 23, 2022 10:46


          Revert "Temporarily disable Stack collector on older Rubies (will be …

7439c22

…reverted in next PR)"

This reverts commit 2334216.

The commit being reverted was only added so that we could merge the
previous PR and still have CI happy and green.


          Minor cleanups to file structure

e5884ce


          Use heap memory for sample buffer space

9ff3305

Stack allocation for arrays is... weird and complicated so I've
extracted all of the buffers needed for sampling into a
dynamically-allocated struct.

This is also the first step towards making the maximum stack depth
configurable from Ruby code.


          Handle sampling stacks with methods implemented using native code

879a1f0

A frame representing a method implemented using native code needs to
be handled in a special way:

* we need to use a different API to get the method name
* we need to use the path and line number of its direct/indirect Ruby
  caller (if we want to match the Ruby backtrace APIs at least)

To implement this, I further extended our custom `rb_profile_frames`
to support flagging a method as being a Ruby frame or not.

In terms of memory we could probably pack things a bit more (e.g.
not use a space-wasting array of bool) but I'll leave such
optimizations for later.

Finally, the `rb_profile_frame_method_name` method only returns
the actual method name for a native method on Ruby 3.0+.

To support older Rubies, a further commit will introduce a few more
copy-pasted backported code.


          Gather the name of a method implemented using native code on older Ru…

8aef07b

…bies

Ruby 3.0 finally added support for showing CFUNC frames (frames for
methods written using native code) in stack traces gathered via
`rb_profile_frames` (ruby/ruby#3299).

To access this information on older Rubies, beyond using our custom
`ddtrace_rb_profile_frames`, we also need to backport the Ruby 3.0+
version of `rb_profile_frame_method_name`.


          Simplify definition of RUBY_MJIT_HEADER

6d3e108

It turns out the awful hack we had thus far to change the `extconf.h`
file is not actually needed, and can be simplified.


          Enable including private_vm_api_access.h in the .c without issues

227b78a

It's a good idea for `private_vm_api_access.c` to include
`private_vm_api_access.h` so that the compiler can complain at us if
they got out of sync.

We previously were not doing it because `private_vm_api_access.c`
includes the `RUBY_MJIT_HEADER` which can conflict with other Ruby
headers that were being included in `private_vm_api_access.h`.

To fix this, I've introduced an extra define --
`PRIVATE_VM_API_ACCESS_SKIP_RUBY_INCLUDES` which is used in
`private_vm_api_access.h` to decide if the Ruby includes should or
should not be included.

(I also did a minor cleanup to introduce some indentation on the
preprocessor definitions to make them more readable, especially
when nested.)


          Remove not-useful test from ddtrace_rb_profile_frames

f47f98b

We always specify `lines`, so there's no need to carry code
designed to make that argument optional.


          Fix support for Ruby 2.5

645a4bd

A few compatibility fixes to support older Rubies:
* Use `RHASH_SIZE` instead of `rb_hash_size_num`
* Fix a few warnings regarding integer sizes
* Import `iseq.h` internal header, it's needed on older Rubies
  to support our custom `rb_profile_frames`
* Import `rb_vm_frame_method_entry`, it's needed on older Rubies
  to support our custom `rb_profile_frames` (this included importing
  a few more things needed to support that function)


          Also exclude Ruby 2.4 from installing modern google-protobuf

f5be542

The 3.20 series dropped support for 2.4 but somehow rubygems/bundler
still try to install it.


          Fix support for Ruby 2.4

93646e0

A few compatibility fixes to support older Rubies:
* Add missing header to be able to use bool
* Use `rb_thread_t` instead of `rb_execution_context_t`. The latter
  was extracted from the former starting on Ruby 2.5, so on older
  Rubies we use the former directly.


          Replace non-existent VM_FRAME_RUBYFRAME_P on Ruby < 2.4

2830d35


          Fix support for Ruby 2.3

fb88f95

A lot of macros in `rb_vm_frame_method_entry` did not exist back in
2.3 so I decided to import the Ruby 2.3 version of
`rb_vm_frame_method_entry`.


          Minor comment fixes and cleanups

db4a03b


          Add alternative rb_profile_frames implementation for Ruby 2.2 and b…

0ffd58c

…elow

The `rb_profile_frames` function changed quite a bit between Ruby 2.2
and 2.3.

Since the change was quite complex I opted not to try to extend support
to Ruby 2.2 and below using the same custom function, and instead I
started anew from the Ruby 2.2 version of the function, applying some
of the same fixes that we have for the modern version.

This gets us closer to a green profiler test suite on 2.2 and 2.1
(there's at least one other compatibility fix needed elsewhere which
will come in a later commit).


          Add support for sampling frames with native methods on Ruby 2.2 and b…

6b3cea7

…elow

While I hesistate to invest too much on supporting older Rubies, this
seems like a worthy trade-off, because otherwise we'd be stuck
creating alternative versions for every test in `stack_spec.rb`.

ivoanjo added 22 commits

May 23, 2022 10:49


          Extract helper for creating Ruby string from libddprof Vec_u8

e9c61ce

This is becoming a common-enough operation that it makes sense to
have a nice wrapper for it.


          Fix Rubocop complaints

598fab0


          Minor: Fix typo

16b7992


          Add debug log message during profile encoding

7e18799

This is similar to the one we had in
`Datadog::Profiling::Encoding::Profile::Protobuf`, but for the new
libddprof-based `StackRecorder`.

Unfortunately we don't get as many debug stats with libddprof as we
had with the previous code; at some point we may want to take a stab
at exposing that info in libddprof.


          Fix potential memory leaks in error code paths for Stack collector

e4be125

Whenever something is dynamically allocated, we must ensure that either
no exceptions can be thrown or that we clean it up before throwing
an exception.


          Remove redundant object type checks

8f56e79

The `TypedData_Get_Struct` API already checks that a Ruby object has
the correct types, so there's no need to do it twice.


          Fix potential memory leak in error code path

28f84e0

Make sure to clean up the `serialized_profile` before calling into
`ruby_time_from`, in case any exceptions are raised.


          Add note about handling dynamic allocations

6d3850c


          Improve test coverage for stack collector

c6bd380


          Refactor spec for easier debugging

b625b24

With everything being cached, it's kinda hard to interactively
experiment inside e.g. a pry session.


          Add spec and fix mismatch with reference stack trace API when samplin…

e896694

…g frame from eval/instance_eval

See comments in `private_vm_api_acccess.c` for the gritty details.
The TL;DR is that we needed to add yet another change to our custom
`rb_profile_frames` in order to match what the Ruby reference
stack trace API returns.


          Avoid repetition when setting up background thread for stack collecto…

4a40e79

…r specs

This diff looks weirder because of the whitespace changes; I recommend
looking at it with withespace being ignored.


          Fix rubocop

2716fbe


          Add spec for eval with custom file/line

626a53e

No further changes needed to support this, but I want to add the
extra coverage for weird things so that in the future we avoid any
regressions.


          Import interesting_backtrace_helper.rb from backtracie gem and use …

7e1f891

…it for testing the stack collector

(As the creator and sole copyright owner of that file in the
`backtracie` gem, taken from
<https://github.com/ivoanjo/backtracie/blob/1f6116ff5b07e87fe3e5be95a22c6a8e1210de8b/spec/unit/interesting_backtrace_helper.rb>
I am relicensing the file and its copyrights to belong to Datadog
similarly to the rest of the dd-trace-rb codebase).

This file is used as a single mega-complex test for the stack
sampling code, and is quite useful to spot regressions.


          Add a few more weird code patterns to the interesting backtrace helper

5839db4


          Minor: Fix typo

ed39e37


          Move ready signal closer to sleep

6f5905e

What we're doing in these tests is inherently a race: we want the
thread to be stopped at the `sleep`, but have no way to atomically
signal the `ready_queue` and go to `sleep`, and thus we kinda
pray that Ruby just won't schedule us away between both operations.

This seems to work so far, but to avoid creating any flaky tests,
let's try to signal readyness as close as possible to calling `sleep`.

(Note that if this ever gets us into trouble and ends up creating
test flakiness, maybe we could implement a native method that
signals the queue AND calls sleep and ensures the Global VM
Lock is not released.)


          Use rb_memerror() when handing out of memory issues

85aa0b4

As pointed out by Marco in
<#2019 (comment)>
it's possible that when memory allocation fails, we can be in such
a state that casually raising an exception may not work at all.

Instead, let's use a function that Ruby exposes for this exact
situation -- `rb_memerror()` -- which is built to withstand such
memory issues.


          Fix "'static' is not at beginning of declaration" compiler warning

d10f903


          Remove explicit checks for null for return of xcalloc

68e4bc5

As pointed out by Marco during review
(<#2000 (comment)>)
the `xcalloc` function already tests for `NULL` and thus we don't need
to repeat the test.

To make it even more obvious that we're using the Ruby-provided
functions for managing memory, I've decided to directly call
`ruby_xcalloc` and `ruby_xfree` instead of their shorted aliases.


          Move globals "inside" a module to be easier to follow

ce1c971

Not that it makes any difference, because they are still globals.

ivoanjo force-pushed the ivoanjo/prof-4904-native-stack-collector-part3 branch from 98fe521 to ce1c971 Compare

May 23, 2022 09:49

Member Author

ivoanjo commented May 23, 2022

One last rebase to solve a last-minute conflict with #1966 and this PR is now on top of master!

Just need to chase down that CI failure relating to max_frames before merging :)


          Fix incorrect lifetime of stack-allocated char*

230ef27

The `frames_omitted_message` CAN be stack allocated but MUST live
as long as the call to `record_sample`.

Bizarrely enough, the previous approach accidentally worked on macOS,
which is why I had not spotted it before -- it worked, and tests
passed, so surely it was ok... right? Nope!

Member Author

ivoanjo commented May 24, 2022

CI is beautiful green, merging away!

ivoanjo merged commit 4e9061e into master

ivoanjo deleted the ivoanjo/prof-4904-native-stack-collector-part3 branch

May 24, 2022 09:41

github-actions bot added this to the 1.1.0 milestone

ivoanjo added a commit that referenced this pull request


          Remove unneeded NULL check

c6da78b

As pointed out by Marco during review
(<#2000 (comment)>)
the `xcalloc` function already tests for `NULL` and thus we don't need
to repeat the test.

To make it even more obvious that we're using the Ruby-provided
functions for managing memory, I've decided to directly call
`ruby_xcalloc` and `ruby_xfree` instead of their shorted aliases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet