Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEBUG-2334 Dynamic Instrumentation code tracker component #3942

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

p-datadog
Copy link
Contributor

What does this PR do?

Adds the code tracker component. This is responsible for tracking the mapping from source file path to RubyVM::InstructionSequence object used for setting targeted trace points.

Motivation:
Efficient instrumentation of lines.

Additional Notes:
There will be further functionality added to CodeTracker later to instrument loaded code (requires the instrumentation component that hasn't been PR'ed yet).

How to test the change?
Unit tests at this time.

Unsure? Have a question? Request a review!

@p-datadog p-datadog requested a review from a team as a code owner September 24, 2024 13:51
# path separator to use.
if path.length > suffix.length && (
path[path.length - suffix.length - 1] == "/" ||
suffix[0] == "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Suggested change
suffix[0] == "/"
suffix.first == "/"
Improve readability with first (...read more)

This rule encourages the use of first and last methods over array indexing to access the first and last elements of an array, respectively. The primary reason behind this rule is to improve code readability. Using first and last makes it immediately clear that you are accessing the first or last element of the array, which might not be immediately obvious with array indexing, especially for developers who are new to Ruby.

The use of these methods also helps to make your code more idiomatic, which is a crucial aspect of writing effective Ruby code. Idiomatic code is easier to read, understand, and maintain. It also tends to be more efficient, as idioms often reflect patterns that are optimized for the language.

To adhere to this rule, replace the use of array indexing with first or last methods when you want to access the first and last elements of an array. For instance, instead of arr[0] use arr.first and instead of arr[-1] use arr.last. However, note that this rule should be applied only when reading values. When modifying the first or last elements, array indexing should still be used. For example, arr[0] = 'new_value' and arr[-1] = 'new_value'.

View in Datadog  Leave us feedback  Documentation


path = tracker.send(:registry).each.to_a.first.first
# The path in the registry should be absolute.
expect(path[0]).to eq "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Suggested change
expect(path[0]).to eq "/"
expect(path.first).to eq "/"
Improve readability with first (...read more)

This rule encourages the use of first and last methods over array indexing to access the first and last elements of an array, respectively. The primary reason behind this rule is to improve code readability. Using first and last makes it immediately clear that you are accessing the first or last element of the array, which might not be immediately obvious with array indexing, especially for developers who are new to Ruby.

The use of these methods also helps to make your code more idiomatic, which is a crucial aspect of writing effective Ruby code. Idiomatic code is easier to read, understand, and maintain. It also tends to be more efficient, as idioms often reflect patterns that are optimized for the language.

To adhere to this rule, replace the use of array indexing with first or last methods when you want to access the first and last elements of an array. For instance, instead of arr[0] use arr.first and instead of arr[-1] use arr.last. However, note that this rule should be applied only when reading values. When modifying the first or last elements, array indexing should still be used. For example, arr[0] = 'new_value' and arr[-1] = 'new_value'.

View in Datadog  Leave us feedback  Documentation

@pr-commenter
Copy link

pr-commenter bot commented Sep 24, 2024

Benchmarks

Benchmark execution time: 2024-09-27 19:09:03

Comparing candidate commit ff0d80c in PR branch di-code-tracker with baseline commit 69ac3b8 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

# path separator to use.
if path.length > suffix.length && (
path[path.length - suffix.length - 1] == "/" ||
suffix[0] == "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Suggested change
suffix[0] == "/"
suffix.first == "/"
Improve readability with first (...read more)

This rule encourages the use of first and last methods over array indexing to access the first and last elements of an array, respectively. The primary reason behind this rule is to improve code readability. Using first and last makes it immediately clear that you are accessing the first or last element of the array, which might not be immediately obvious with array indexing, especially for developers who are new to Ruby.

The use of these methods also helps to make your code more idiomatic, which is a crucial aspect of writing effective Ruby code. Idiomatic code is easier to read, understand, and maintain. It also tends to be more efficient, as idioms often reflect patterns that are optimized for the language.

To adhere to this rule, replace the use of array indexing with first or last methods when you want to access the first and last elements of an array. For instance, instead of arr[0] use arr.first and instead of arr[-1] use arr.last. However, note that this rule should be applied only when reading values. When modifying the first or last elements, array indexing should still be used. For example, arr[0] = 'new_value' and arr[-1] = 'new_value'.

View in Datadog  Leave us feedback  Documentation

lib/datadog/di/code_tracker.rb Outdated Show resolved Hide resolved
lib/datadog/di/code_tracker.rb Outdated Show resolved Hide resolved
@registry_lock = Hash.new
end

def start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this method is not called at a high frequency (which it doesn't seem like it is), I believe to make it truly tread safe we have to wrap the whole execution in a mutex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have a lock taken for all reads and writes of instance variables and I believe I account for e.g. state changing during constructor execution. Can you elaborate on what situation you think is not properly covered by existing locking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Marco has a great point -- the current code is racy but seems correct -- e.g. multiple threads may see active? == false, and then create the tracepoint,. but then "walk it back" inside the lock.

But initialization is probably not performance-sensitive, so rather than creating a bunch of code that needs to be reasoned about every time we need to touch it, initializing the whole thing under a lock seems to be both safer and not particularly slower?

@trace_point_lock = Mutex.new
@registry_lock = Hash.new
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding, even if a very simple, method level documentation to the start method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i will type up a comment.

spec/datadog/di/code_tracker_spec.rb Show resolved Hide resolved
Copy link

@Strech Strech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a small feedback, but nothing critical to say

Comment on lines +8 to +12
describe ".new" do
it "creates an instance" do
expect(tracker).to be_a(described_class)
end
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for asking, but what a point of this test? Technically we are testing here that Ruby is going to instantiate a class if we call new, but do we really challenge it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test only exercises the constructor. If you object to it I can remove it.

spec/datadog/di/code_tracker_spec.rb Outdated Show resolved Hide resolved
spec/datadog/di/code_tracker_spec.rb Outdated Show resolved Hide resolved
lib/datadog/di/code_tracker.rb Outdated Show resolved Hide resolved
@registry_lock = Hash.new
end

def start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Marco has a great point -- the current code is racy but seems correct -- e.g. multiple threads may see active? == false, and then create the tracepoint,. but then "walk it back" inside the lock.

But initialization is probably not performance-sensitive, so rather than creating a bunch of code that needs to be reasoned about every time we need to touch it, initializing the whole thing under a lock seems to be both safer and not particularly slower?

Comment on lines 35 to 49
compiled_trace_point = TracePoint.trace(:script_compiled) do |tp|
# Useful attributes of the trace point object here:
# .instruction_sequence
# .method_id
# .path (refers to the code location that called the require/eval/etc.,
# not where the loaded code is; use .path on the instruction sequence
# to obtain the location of the compiled code)
# .eval_script
#
# For now just map the path to the instruction sequence.
path = tp.instruction_sequence.path
registry_lock.synchronize do
registry[path] = tp.instruction_sequence
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does script_compiled get emitted for each individual method in a file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should be emitted once per file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, in that case, how do we know the correct iseq to target, if 1 file has N iseqs? (Or I may be misunderstanding how this works?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one iseq per file.

# disable our trace point and do nothing.
if @compiled_trace_point
# Disable the local variable, leave instance variable as it is.
compiled_trace_point.disable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it me or was the tracepoint was not enabled before we disable it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TracePoint.trace enables the trace point. .new does not enable. I agree it can be confusing at first glance.

Comment on lines +110 to +112
if path.length > suffix.length && (
path[path.length - suffix.length - 1] == "/" ||
suffix[0] == "/"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask – can we improve split the if to improve the readability and maybe use variable to give a name to our intents? We anyway is inside the mutex section and performance is probably not highly impacted by extra branching? WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you are asking. The logic of this code fragment is explained in the comment above. What are you suggesting to split exactly?

p-datadog and others added 10 commits September 25, 2024 10:56
Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>
Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>
Co-authored-by: datadog-datadog-prod-us1[bot] <88084959+datadog-datadog-prod-us1[bot]@users.noreply.github.com>
Co-authored-by: datadog-datadog-prod-us1[bot] <88084959+datadog-datadog-prod-us1[bot]@users.noreply.github.com>
Co-authored-by: Sergey Fedorov <oni.strech@gmail.com>
Comment on lines +110 to +115
if path.length > suffix.length && (
path[path.length - suffix.length - 1] == "/" ||
suffix[0] == "/"
) && path.end_with?(suffix)
inexact << iseq
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like this. Not sure where do we stand in "comments vs code", but maybe this may be it?

Suggested change
if path.length > suffix.length && (
path[path.length - suffix.length - 1] == "/" ||
suffix[0] == "/"
) && path.end_with?(suffix)
inexact << iseq
end
if path.length > suffix.length
previous_char = path[path.length - suffix.length - 1]
inexact << iseq if previous_char == "/" && path.end_with?(suffix)
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like you lost the suffix[0] == "/" condition along the way and I personally find your proposed version more difficult to follow because it lacks symmetry where there is symmetry in the logic and inverts cause and effect with the postfix if, thus also putting the action in the middle of the conditions, but I'm happy to commit a rendering you are happy with as long as it is equivalent functionally to what is currently in the diff.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 97.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 97.87%. Comparing base (74363cc) to head (ff0d80c).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
lib/datadog/di/code_tracker.rb 91.89% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3942      +/-   ##
==========================================
+ Coverage   97.85%   97.87%   +0.01%     
==========================================
  Files        1305     1311       +6     
  Lines       78224    78324     +100     
  Branches     3887     3893       +6     
==========================================
+ Hits        76549    76658     +109     
+ Misses       1675     1666       -9     
Flag Coverage Δ
97.87% <97.14%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants