Skip to content

linux: workaround to avoid deadlock inside dl_iterate_phdr in glibc #57035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 14, 2025

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jan 13, 2025

Extend the fix for #43578 on Darwin to also cover the same bug in Glibc (and just assume other libc have the same bug). We cannot use the same atfork trick, since the atfork implementation of this in Glibc makes this unsafe to use this after fork, just like Darwin (though for different basic concurrency mistakes in each of their respective codes).

Fix #57017

@vtjnash vtjnash added profiler backport 1.11 Change should be backported to release-1.11 labels Jan 13, 2025
Extend the fix for #43578 on Darwin to also cover the same bug in Glibc
(and just assume other libc have the same bug). We cannot use the same
atfork trick, since the atfork implementation of this in Glibc makes
this unsafe to use this after fork, just like Darwin (though for
different basic concurrency mistakes in each of their respective codes).

Fix #57017
@vtjnash vtjnash merged commit 1f05953 into master Jan 14, 2025
5 of 7 checks passed
@vtjnash vtjnash deleted the jn/57017 branch January 14, 2025 15:33
@giordano
Copy link
Member

Sounds like this introduced a deadlock elsewhere: #57058. FreeBSD tests timed out also on this branch: https://buildkite.com/julialang/julia-master/builds/43702#019461a2-569c-45b2-b720-8acf560eb139

@ararslan ararslan removed the backport 1.11 Change should be backported to release-1.11 label Jan 16, 2025
@ararslan
Copy link
Member

Removing the backport label as this shouldn't be backported as-is given that it introduces a regression

ararslan added a commit that referenced this pull request Jan 18, 2025
This is Jameson's proposed amendment to the changes made in #57035 that
introduced a deadlock on FreeBSD, amusingly in service of fixing a
deadlock on Linux.

On Linux and (non-macOS) BSD, instead of acquiring and releasing a lock
on the profiler in `jl_with_stackwalk_lock`, we're just blocking the
thread from receiving the profiler's `SIGUSR2` signal at all.

This should fix #57058; I don't get the deadlock locally on FreeBSD with
this change, but it's AArch64 instead of x86-64 like on CI, so let's see
if this also makes CI happy. If so, closes #57059.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@StevenWhitaker StevenWhitaker mentioned this pull request Jan 20, 2025
@ararslan ararslan added the backport 1.11 Change should be backported to release-1.11 label Jan 31, 2025
@ararslan
Copy link
Member

Re-added the backport label, should only be backported in conjunction with #57089.

vtjnash added a commit that referenced this pull request Feb 3, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120
vtjnash added a commit that referenced this pull request Feb 3, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120
vtjnash added a commit that referenced this pull request Feb 4, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120

(cherry picked from commit 2f0a523)
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120

(cherry picked from commit 2f0a523)
@KristofferC KristofferC mentioned this pull request Mar 11, 2025
71 tasks
@KristofferC KristofferC mentioned this pull request Apr 25, 2025
71 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.11 Change should be backported to release-1.11 profiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Profiling hangs
3 participants