Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-9342] Introduce profiler workaround for Ruby Dir interruption bug #3720
[PROF-9342] Introduce profiler workaround for Ruby Dir interruption bug #3720
Changes from 16 commits
b03696e
aeb2db6
88f1164
3c51234
1721270
42c40b2
25a96a1
57e64bf
0e80c77
577fdb0
43a3e5e
9152470
b30ca79
d6eebe9
30601a2
9ad82fe
8099c94
ba0d6b5
a6366f9
3501237
533eeca
259e134
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be more functionally named? ("what" instead of "how")
WDYT of
dir_signal_masking
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with this name because I know it will show up in the profiling flamegraph for customers.
My thinking is that it's clearer if you see "oh, why is there a datadog thing here? ah, it's monkey patching my uses of dir". Signal masking makes me think a bit more along hte lines of "what is this signal masking thing? what is masking anyway?".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, interesting! I was thinking that if I saw that in a stack trace or something I would wonder "why is Datadog monkey patching dir stuff at all?". In a way "monkey patching" is how but not what/why. In a way it's also redundant with
ext
, the sort of conventional way to say "this folder contains things that extend stuff".dir_signal_fix
maybe?But at this stage that's becoming bikeshedding and certainly not a blocker. Feel free to move forward.
Adding links to upstream issues in the comment at the top of the file would help too I think (I reread it and with only the Datadog issue link it makes it like this is a Datadog issue when it's really an upstream issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the joke about naming things being hard applies here >_>. I suspect/fear just relying on
ext/
may be too subtle (e.g. when you're looking a flamegraph), but am open to changing if folks get confused about the current name.+1 I've added a link to the upstream ticket in 259e134.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be in an
ensure
as well?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I don't think so...? It's ok to deliver signals during exception handling, so if an exception gets raised in the client code, it should be fine for the profiler to continue operating from them on, since no more directory calls will be made by Ruby.
Or do you mean making sure that
_native_hold_interruptions
never raises or something similar?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know, hence why I ask ;)
I'm basing myself on the following outline:
On the super happy path the execution order is
[A+] [B-] [B+] [A-]
.But if
yield
raises it becomes[A+] [B-] [A-]
, which in my mind triggers some "consistency" alarms and makes me wonder about internal details:_native_resume_signals
is called twice in a row? is it a NOOP? is it doing work twice (e.g invoking a syscall) but idempotent?yield
(because of that[B+]
) and before method exit ([A-]
) so I presume that work is insuper
, but ifyield
raises there may be work to protect too in a hypotheticalensure
block inside thatsuper
too? Or is[B+]
entirely superfluous even in the happyyield
case?Conversely, imagine an abstraction like so:
And the code becoming:
The above is not even a suggestion it's merely to materialise and compare the kind of structural "consistency".
It could simply instead just need a comment to prevent the mind tripwire or help someone who would refactor this code in the future for some reason.
But really, I'd say it's fine to merge if you're confident and my concerns are unwarranted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for explaining!
So, in parts:
yield
raises. Specifically:We can call
_native_resume_signals
many times in a row.As you suspected, it leads to
pthread_sigmask
which I believe on Linux will lead to thert_sigprocmask
system call being called.I've included a benchmark in the PR of what the cost of this is, and here's a run I did:
...so I think we're fine if we end up calling it a few more times than strictly needed.
The root of the issue in the Ruby VM happens because in the affected APIs Ruby silently interprets an interruption as "end of folder/empty folder". When an exception is being raised, the iteration will stop anyway, so there's no longer a concern of the interruption causing a semantic mismatch (at least that I can see).
I've pushed 259e134 to add this note to the code as in hindsight I agree it was not very clear before (and hopefully this clarifies it)