[NO-TICKET] Check extension dir as well when loading profiler #3582
**What does this PR do?**

This PR tweaks our existing logic for loading the profiling native extension to try to find the file in two directories:

1. The lib/ directory
2. The Ruby extensions directory

Previously we would only check 1.

**Motivation:**

I'll admit I don't 100% understand when one or the other folder gets used.

For instance, on a fresh install of `ddtrace` on my machine I get

```
$ bundle exec gem contents ddtrace | grep so$
.rvm/gems/ruby-3.1.4/gems/ddtrace-1.21.1/lib/datadog_profiling_loader.3.1.4_x86_64-linux.so
.rvm/gems/ruby-3.1.4/gems/ddtrace-1.21.1/lib/datadog_profiling_native_extension.3.1.4_x86_64-linux.so
```

(this is 1., above)

but also...

```
$ find `bundle exec ruby -e "require 'ddtrace'; puts Gem::loaded_specs['ddtrace'].extension_dir"` | grep so$
.rvm/gems/ruby-3.1.4/extensions/x86_64-linux/3.1.0/ddtrace-1.21.1/datadog_profiling_native_extension.3.1.4_x86_64-linux.so
.rvm/gems/ruby-3.1.4/extensions/x86_64-linux/3.1.0/ddtrace-1.21.1/datadog_profiling_loader.3.1.4_x86_64-linux.so
```

(this is 2., above)

(Aside: These are not leftovers -- if I uninstall the gem, the files get removed, and get recreated when I install them.)

So, on my machine, the files get installed to two different folders.

And for most of our customers, I'm pretty sure they get installed into 1. because otherwise the profiler won't work.

But one customer reported the following error:

```
Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load ddtrace_profiling_native_extension.3.2.2_x86_64-linux due to /var/app/current/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.20.0/lib/datadog/profiling/../../ddtrace_profiling_native_extension.3.2.2_x86_64-linux.so: cannot open shared object file: No such file or directory' at '/var/app/current/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.20.0/lib/datadog/profiling/load_native_extension.rb:26:in `<top (required)>''
```

Now my first intuition was to suspect -- something is wrong, the native extension didn't get built. But, actually, that's not the case, and here's a very important detail that took me a while to fully understand: the error message says that the **ddtrace_profiling_native_extension.3.2.2_x86_64-linux.so** was not found, not the **datadog_profiling_loader.3.2.2_x86_64-linux.so**.

This distinction is important, because it's the loader that attempts to load the extension. So if the loader didn't fail first, it means the loader **was found**; but the extension **wasn't**.

With a bit more input from the customer, and some testing on my side, it looks like Ruby puts both folders above (1. and 2.) in the `$LOAD_PATH`, so when using `require` to load a file, both get checked.

And for this one customer, the extension only existed in folder 2., not 1. (even after reinstalling the gem).

Since we use a regular `require` to load the loader, it was still found, regardless of where it was. BUT because for the extension we try to load by using a specific path, and that path only checked lib (1.), then the extension failed to load.

This is why the customer saw an error message that pointed to the loader working, but the extension not being found.

Now because our previous logic of only checking folder 1. worked for all other customers we have, I'm not really sure why this one customer had the extension only in folder 2., but not in 1. But regardless, having either folder in use is something that Ruby does, so I think it's reasonable for us to emulate that logic.

**Additional Notes:**

If you're curious why we are reimplementing Ruby logic for loading extensions, check the comments on `datadog_profiling_loader.c`.

TL;DR is, we want to load the library with specific system flags, and Ruby doesn't provide a way to do so, so we need to basically duplicate a bunch of Ruby's library loading logic ourselves to be able to emulate Ruby's behavior but change the system flags.

**How to test the change?**

I was able to reproduce the issue locally by installing ddtrace, and then manually deleting the files in 1., leaving them only in 2.

Without this change, I saw exactly the same error message as the customer did; with this change, profiling works.

On top of the added specs, I manually tested this fix on Ruby versions 2.3, 3.1 and 3.3.
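To make the two-directory lookup concrete, here is a minimal sketch of the idea, not the code from this PR. The helper name `find_extension_file` and the file name used in the demo are hypothetical; the only thing taken from the description above is that the candidates are the gem's lib/ folder and the folder reported by `Gem.loaded_specs[...].extension_dir`.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of the gem): returns the full path of the
# first candidate directory that contains file_name, or nil if none does.
# In the gem, the candidates would presumably be the lib/ folder (1.) and
# Gem.loaded_specs['ddtrace'].extension_dir (2.), in that order.
def find_extension_file(file_name, candidate_dirs)
  candidate_dirs
    .compact                                  # a missing gem spec could yield nil
    .map { |dir| File.join(dir, file_name) }
    .find { |path| File.exist?(path) }
end

# Demo using throwaway folders that mimic the customer's layout:
# the .so exists only in the extensions directory.
Dir.mktmpdir do |root|
  lib_dir = File.join(root, 'lib')
  extension_dir = File.join(root, 'extensions')
  FileUtils.mkdir_p([lib_dir, extension_dir])

  file_name = "example_native_extension.#{RUBY_VERSION}_#{RUBY_PLATFORM}.so"
  FileUtils.touch(File.join(extension_dir, file_name))

  puts find_extension_file(file_name, [lib_dir, extension_dir])
end
```

Checking lib/ first keeps the previous behaviour for every install where that already worked, and the extensions directory only acts as a fallback.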
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #3582 +/- ##
==========================================
- Coverage 98.24% 98.24% -0.01%
==========================================
Files 1254 1255 +1
Lines 74355 74402 +47
Branches 3529 3536 +7
==========================================
+ Hits 73052 73098 +46
- Misses 1303 1304 +1
Love these weird behaviours! LGTM!
…ension dir

**What does this PR do?**

This PR is a follow-up to #3582. In that PR, we fixed loading the profiling native extension so that it could be loaded from the Ruby extensions directory (see the original PR for more details).

It turns out this was not enough! Specifically, the customer reported that they saw the following error

> Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling
> native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.2.2_x86_64-linux
> due to libdatadog_profiling.so: cannot open shared object file: No such file or directory

Specifically, what this message tells us is that we're finding the profiling native extension BUT it's failing to load BECAUSE the dynamic loader is not able to find its `libdatadog_profiling.so` dependency.

From debugging the issue with the customer, I suspect that what we're seeing here is a repeat of #2067 / #2125, that is, the paths where the profiler is compiled are changed at deployment, and so we also need to adjust the relative rpath to account for this.

I haven't yet confirmed with the customer that this is their issue, BUT I was able to reproduce the exact problem if I moved the installation of the library in the way I mention above (see "how to test the change", below).

**Motivation:**

Fix this weird corner case that made the profiler not load.

**Additional Notes:**

This is a really really weird corner case, so I'm happy to further describe what the issue is if my description above + the comments in the code are still too cryptic to understand.

**How to test the change?**

I've added test code for the helper, but actually validating the whole rpath thing is a bit annoying. Here's how I triggered the issue myself, and then used it to validate the fix:

```
# Build fixed gem into folder, will be used later
$ bundle exec rake build
datadog 2.0.0.rc1 built to pkg/datadog-2.0.0.rc1.gem.

# Open a clean Ruby docker installation
$ docker run --network=host -ti -v `pwd`:/working ruby:3.2.2-bookworm /bin/bash

# I've created a minimal test gemfile ahead of time
/working/rpathtest# cat gems.rb
source 'https://rubygems.org'
gem 'datadog'

# Tell bundler to install the gem into a folder
/working/rpathtest# bundle config set --local path 'vendor/bundle'
/working/rpathtest# bundle install

# Confirm profiler works:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Now let's simulate the native extension being loaded from the
# extensions directory:
/working/rpathtest# find | grep \.so$ | grep datadog
./vendor/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/datadog_profiling_native_extension.3.2.2_x86_64-linux.so
./vendor/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/datadog_profiling_loader.3.2.2_x86_64-linux.so
./vendor/bundle/ruby/3.2.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
./vendor/bundle/ruby/3.2.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_native_extension.3.2.2_x86_64-linux.so
./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_loader.3.2.2_x86_64-linux.so
/working/rpathtest# rm ./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_native_extension.3.2.2_x86_64-linux.so ./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_loader.3.2.2_x86_64-linux.so

# Confirm profiler still works:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Now let's simulate the folders being moved (the issue being fixed):
/working/rpathtest# cat /usr/local/bundle/config
---
BUNDLE_PATH: "vendor/bundle"

# Update this to vendor2...
/working/rpathtest# cat /usr/local/bundle/config
---
BUNDLE_PATH: "vendor2/bundle"

# and move the folder
/working/rpathtest# mv vendor/ vendor2

# Now we've triggered the exact same error message as reported by the
# customer
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
W, [2024-06-05T15:51:12.488843 #517] WARN -- datadog: [datadog] Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.2.2_x86_64-linux due to libdatadog_profiling.so: cannot open shared object file: No such file or directory' at '/working/rpathtest/vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog/profiling/load_native_extension.rb:41:in `<top (required)>''

# Now let's test the fix. Let's start by recreating the issue:

# Put the fixed version into the bundler cache...
/working/rpathtest# cp /working/pkg/datadog-2.0.0.rc1.gem vendor2/bundle/ruby/3.2.0/cache/datadog-2.0.0.rc1.gem

# force bundler to reinstall...
/working/rpathtest# rm -rf vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/
/working/rpathtest# bundle install

# Force gem to be loaded from extension directory
/working/rpathtest# rm ./vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_native_extension.3.2.2_x86_64-linux.so ./vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_loader.3.2.2_x86_64-linux.so

# Confirm it works:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Let's now change the vendor folder again:
/working/rpathtest# cat /usr/local/bundle/config
---
BUNDLE_PATH: "vendor3/bundle"
/working/rpathtest# mv vendor2/ vendor3

# And it now doesn't fail:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# And extra confirmation that the relative paths are working:
/working/rpathtest# ldd ./vendor3/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/datadog_profiling_native_extension.3.2.2_x86_64-linux.so
  libdatadog_profiling.so => /working/rpathtest/./vendor3/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/../../../../gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so (0x00007ff127c00000)
```
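For readers who want to see where the `../../../../gems/...` part of the `ldd` output comes from, here is a rough sketch of deriving a relative rpath from the two folders involved. This is not the helper added in the PR: the name `relative_rpath` is hypothetical, and prefixing the result with the dynamic linker's `$ORIGIN` token is an assumption about how the relative rpath gets expressed on Linux.

```ruby
require 'pathname'

# Hypothetical helper: builds an rpath entry that points from the folder
# holding the native extension to the folder holding libdatadog_profiling.so,
# without baking in any absolute path.
# Assumption: the entry is expressed with $ORIGIN, which the Linux dynamic
# loader expands to the directory of the object being loaded.
def relative_rpath(extension_dir, libdatadog_lib_dir)
  relative = Pathname.new(libdatadog_lib_dir)
                     .relative_path_from(Pathname.new(extension_dir))
  "$ORIGIN/#{relative}"
end

# Paths taken from the ldd output in the transcript.
extension_dir = '/working/rpathtest/vendor3/bundle/ruby/3.2.0/extensions/' \
                'x86_64-linux/3.2.0/datadog-2.0.0.rc1'
libdatadog_lib_dir = '/working/rpathtest/vendor3/bundle/ruby/3.2.0/gems/' \
                     'libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/' \
                     'x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib'

puts relative_rpath(extension_dir, libdatadog_lib_dir)
```

Because the entry only depends on how the two folders sit relative to each other, renaming `vendor2` to `vendor3` leaves it valid, which is the property the transcript checks.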
@@ -18,7 +18,20 @@
 end

 extension_name = "datadog_profiling_native_extension.#{RUBY_VERSION}_#{RUBY_PLATFORM}"
-full_file_path = "#{__dir__}/../../#{extension_name}.#{RbConfig::CONFIG['DLEXT']}"
+file_name = "#{extension_name}.#{RbConfig::CONFIG['DLEXT']}"
+full_file_path = "#{__dir__}/../../#{file_name}"
Should this not use `File.join` for best results?
Indeed... The profiler really only targets Linux/Unix right now, so a number of linux/unixisms have crept in.
The native extension loader that's getting called using this path, for instance, uses `dlopen`, so that's another thing that wouldn't work.
So yeah, I'm not sure it's worth trying to piecemeal support all OSs for the profiler in preparation for maybe supporting them in the future, I think it's easier to evaluate the whole thing once we want that (e.g. for instance not using the `load_native_extension.rb` on Windows at all).
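For reference, a small sketch comparing the interpolated path from the diff with a `File.join` version. The directory and file name are made-up example values rather than anything read from a real install; the point is only that both spellings build the same string, since `File.join` joins with `/`.

```ruby
# Example values only: a made-up __dir__ and the file name from the error above.
dir = '/gems/ddtrace-1.20.0/lib/datadog/profiling'
file_name = 'datadog_profiling_native_extension.3.2.2_x86_64-linux.so'

interpolated = "#{dir}/../../#{file_name}"           # what the diff does
joined = File.join(dir, '..', '..', file_name)       # the reviewer's suggestion

puts interpolated == joined

# Either way the "../.." segments stay in the string; File.expand_path
# collapses them if a normalized path is wanted for logging.
puts File.expand_path(joined)
```

So the choice between the two is mostly stylistic for this line; it would not, by itself, address the `dlopen` point raised above.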
…d relative rpath is needed

**What does this PR do?**

This PR adds a new test case that validates that DataDog/dd-trace-rb#3582 and DataDog/dd-trace-rb#3683 keep working fine.

**Motivation:**

As described in DataDog/dd-trace-rb#3683, this is a somewhat annoying thing to test, but important to avoid regressing.

**Additional Notes:**

You can actually see the evolution of both of those fixes in this test. E.g. here's dd-trace-rb 1.21.0 (prior to DataDog/dd-trace-rb#3582):

```
W, [2024-06-12T09:34:08.759519 #7] WARN -- ddtrace: [ddtrace] (/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/core/configuration/components.rb:115:in `startup!') Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux due to /app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/profiling/../../datadog_profiling_native_extension.3.3.2_x86_64-linux.so: cannot open shared object file: No such file or directory' at '/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/profiling/load_native_extension.rb:26:in `<top (required)>''
--- FAIL: TestScenarios/scenarios/ruby_extension_dir_and_rpath (14.86s)
```

in this version, we failed because we couldn't load the native extension.

Then here's dd-trace-rb 1.23.1 (without DataDog/dd-trace-rb#3683) and if we don't move the `vendor` folder (but still delete the so from the lib folder):

```
--- PASS: TestScenarios/scenarios/ruby_extension_dir_and_rpath (18.96s)
```

...but if we additionally move the vendor folder (aka what this PR does in the Dockerfile):

```
W, [2024-06-12T09:37:33.517188 #6] WARN -- ddtrace: [ddtrace] (/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.23.1/lib/datadog/core/configuration/components.rb:116:in `startup!') Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux due to libdatadog_profiling.so: cannot open shared object file: No such file or directory' at '/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.23.1/lib/datadog/profiling/load_native_extension.rb:39:in `<top (required)>''
--- FAIL: TestScenarios/scenarios/ruby_extension_dir_and_rpath (3.25s)
```

Notice it fails BUT the error is now different from the one above -- the error is relating to loading `libdatadog_profiling.so`, not `datadog_profiling_native_extension.3.3.2_x86_64-linux.so`.

And with the change in DataDog/dd-trace-rb#3683 (which will be in 1.23.2):

```
--- PASS: TestScenarios/scenarios/ruby_extension_dir_and_rpath (9.60s)
```

**NOTE**: For this test, unlike other Ruby tests we have, we're pulling in the latest **released** gem version (e.g. with `gem 'datadog'` on the `gems.rb` file), not the latest from git (as we do for other Ruby tests). This is because gems get installed in different paths when bundler downloads them directly from git, and we want to validate the path when a stable version is installed.

This also means that this PR will show up as failed until the latest datadog release (which will be 2.2.0) gets released. (Or 1.23.2, but I left the test setup to test the latest 2.x releases, not the 1.x ones, although I used 1.x on my tests above to show the evolution of the issue).
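Since the two failures above look almost identical at a glance, here is a small sketch of telling them apart from the message text. The helper `classify_load_failure` and its return symbols are hypothetical (nothing like it exists in the gem or in the test suite), and the regular expression assumes the message layout shown in the logs above.

```ruby
# Hypothetical helper: inspects a "Failure to load X due to Y: cannot open
# shared object file" message and reports whether Y is the extension itself
# (the #3582 case) or one of its dependencies (the #3683 case).
def classify_load_failure(message)
  pattern = /Failure to load (?<extension>\S+) due to (?<missing>\S+): cannot open shared object file/
  match = pattern.match(message)
  return :unknown unless match

  if File.basename(match[:missing]).start_with?(match[:extension])
    :extension_not_found
  else
    :dependency_not_found
  end
end

before_3582 = 'RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux ' \
              'due to /app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/profiling/../../' \
              'datadog_profiling_native_extension.3.3.2_x86_64-linux.so: cannot open shared object file: ' \
              'No such file or directory'
before_3683 = 'RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux ' \
              'due to libdatadog_profiling.so: cannot open shared object file: No such file or directory'

puts classify_load_failure(before_3582)
puts classify_load_failure(before_3683)
```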