Skip to content

Conversation

@vtjnash
Copy link
Member

@vtjnash vtjnash commented Jan 13, 2025

Extend the fix for #43578 on Darwin to also cover the same bug in Glibc (and just assume other libc have the same bug). We cannot use the same atfork trick, since the atfork implementation of this in Glibc makes this unsafe to use this after fork, just like Darwin (though for different basic concurrency mistakes in each of their respective codes).

Fix #57017

@vtjnash vtjnash added profiler backport 1.11 Change should be backported to release-1.11 labels Jan 13, 2025
Extend the fix for #43578 on Darwin to also cover the same bug in Glibc
(and just assume other libc have the same bug). We cannot use the same
atfork trick, since the atfork implementation of this in Glibc makes
this unsafe to use this after fork, just like Darwin (though for
different basic concurrency mistakes in each of their respective codes).

Fix #57017
@vtjnash vtjnash merged commit 1f05953 into master Jan 14, 2025
5 of 7 checks passed
@vtjnash vtjnash deleted the jn/57017 branch January 14, 2025 15:33
@giordano
Copy link
Member

Sounds like this introduced a deadlock elsewhere: #57058. FreeBSD tests timed out also on this branch: https://buildkite.com/julialang/julia-master/builds/43702#019461a2-569c-45b2-b720-8acf560eb139

@ararslan ararslan removed the backport 1.11 Change should be backported to release-1.11 label Jan 16, 2025
@ararslan
Copy link
Member

Removing the backport label as this shouldn't be backported as-is given that it introduces a regression

ararslan added a commit that referenced this pull request Jan 18, 2025
This is Jameson's proposed amendment to the changes made in #57035 that
introduced a deadlock on FreeBSD, amusingly in service of fixing a
deadlock on Linux.

On Linux and (non-macOS) BSD, instead of acquiring and releasing a lock
on the profiler in `jl_with_stackwalk_lock`, we're just blocking the
thread from receiving the profiler's `SIGUSR2` signal at all.

This should fix #57058; I don't get the deadlock locally on FreeBSD with
this change, but it's AArch64 instead of x86-64 like on CI, so let's see
if this also makes CI happy. If so, closes #57059.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@StevenWhitaker StevenWhitaker mentioned this pull request Jan 20, 2025
@ararslan ararslan added the backport 1.11 Change should be backported to release-1.11 label Jan 31, 2025
@ararslan
Copy link
Member

Re-added the backport label, should only be backported in conjunction with #57089.

vtjnash added a commit that referenced this pull request Feb 3, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120
vtjnash added a commit that referenced this pull request Feb 3, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120
vtjnash added a commit that referenced this pull request Feb 4, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120

(cherry picked from commit 2f0a523)
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
Restores #57035, undo #57089 for non-FreeBSD. While I suggested doing
this change for all platforms, I forgot that means non-FreeBSD platforms
become vulnerable again to the very deadlock problems that #57035 was
required to prevent. That fix seems to not be viable on FreeBSD due to
known libc implementation problems on that platform. However, upon
closer inspection of the questionable design implementation decisions
they seem to have made here, the platform is likely not currently
vulnerable to this libunwind bug in the first place:
https://github.com/lattera/freebsd/blob/master/libexec/rtld-elf/rtld_lock.c#L120

(cherry picked from commit 2f0a523)
@KristofferC KristofferC mentioned this pull request Mar 11, 2025
71 tasks
@KristofferC KristofferC mentioned this pull request Apr 25, 2025
71 tasks
@KristofferC KristofferC mentioned this pull request Aug 19, 2025
65 tasks
DilumAluthge added a commit that referenced this pull request Sep 5, 2025
Backported PRs:
- [x] #54840 <!-- Add boundscheck in speccache_eq to avoid OOB access
due to data race -->
- [x] #42080 <!-- recommend explicit `using Foo: Foo, ...` in package
code (was: "using considered harmful") -->
- [x] #58127 <!-- [DOC] Update installation docs: /downloads/ =>
/install/ -->
- [x] #58202 <!-- [release-1.11] malloc: use jl_get_current_task to fix
null check -->
- [x] #58584 <!-- Make `Ptr` values static-show w/ type-information -->
- [x] #58637 <!-- Make late gc lower handle insertelement of alloca use.
-->
- [x] #58837 <!-- fix null comparisons for non-standard address spaces
-->
- [x] #57826 <!-- Add a `similar` method for `Type{<:CodeUnits}` -->
- [x] #58293 <!-- fix trailing indices stackoverflow in reinterpreted
array -->
- [x] #58887 <!-- Pkg: Allow configuring can_fancyprint(io::IO) using
IOContext -->
- [x] #58937 <!-- Fix nthreadpools size in JLOptions -->
- [x] #58978 <!-- Fix precompilepkgs warn loaded setting -->
- [x] #58998 <!-- Bugfix: Use Base.aligned_sizeof instead of sizeof in
Mmap.mmap -->
- [x] #59120 <!-- Fix memory order typo in "src/julia_atomics.h" -->
- [x] #59170 <!-- Clarify and enhance confusing precompile test -->

Need manual backport:
- [ ] #56329 <!-- loading: clean up more concurrency issues -->
- [ ] #56956 <!-- Add "mea culpa" to foreign module assignment error.
-->
- [ ] #57035 <!-- linux: workaround to avoid deadlock inside
dl_iterate_phdr in glibc -->
- [ ] #57089 <!-- Block thread from receiving profile signal with
stackwalk lock -->
- [ ] #57249 <!-- restore non-freebsd-unix fix for profiling -->
- [ ] #58011 <!-- Remove try-finally scope from `@time_imports`
`@trace_compile` `@trace_dispatch` -->
- [ ] #58062 <!-- remove unnecessary edge from `exp_impl` to `pow` -->
- [ ] #58157 <!-- add showing a string to REPL precompile workload -->
- [ ] #58209 <!-- Specialize `one` for the `SizedArray` test helper -->
- [ ] #58108 <!-- Base.get_extension & Dates.format made public -->
- [ ] #58356 <!-- codegen: remove readonly from abstract type calling
convention -->
- [ ] #58415 <!-- [REPL] more reliable extension loading -->
- [ ] #58510 <!-- Don't filter `Core` methods from newly-inferred list
-->
- [ ] #58110 <!-- relax dispatch for the `IteratorSize` method for
`Generator` -->
- [ ] #58965 <!-- Fix `hygienic-scope`s in inner macro expansions -->
- [ ] #58971 <!-- Fix alignment of failed precompile jobs on CI -->
- [ ] #59066 <!-- build: Also pass -fno-strict-aliasing for C++ -->

Contains multiple commits, manual intervention needed:
- [ ] #55877 <!-- fix FileWatching designs and add workaround for a stat
bug on Apple -->
- [ ] #56755 <!-- docs: fix scope type of a `struct` to hard -->
- [ ] #57809 <!-- Fix fptrunc Float64 -> Float16 rounding through
Float32 -->
- [ ] #57398 <!-- Make remaining float intrinsics require float
arguments -->
- [ ] #56351 <!-- Fix `--project=@script` when outside script directory
-->
- [ ] #57129 <!-- clarify that time_ns is monotonic -->
- [ ] #58134 <!-- Note annotated string API is experimental in Julia
1.11 in HISTORY.md -->
- [ ] #58401 <!-- check that hashing of types does not foreigncall
(`jl_type_hash` is concrete evaluated) -->
- [ ] #58435 <!-- Fix layout flags for types that have oddly sized
primitive type fields -->
- [ ] #58483 <!-- Fix tbaa usage when storing into heap allocated
immutable structs -->
- [ ] #58512 <!-- Make more types jl_static_show readably -->
- [ ] #58012 <!-- Re-enable tab completion of kwargs for large method
tables -->
- [ ] #58683 <!-- Add 0 predecessor to entry basic block and handle it
in inlining -->
- [ ] #59112 <!-- Add builtin function name to add methods error -->

Non-merged PRs with backport label:
- [ ] #59329 <!-- aotcompile: destroy LLVM context after serializing
combined module -->
- [ ] #58848 <!-- Set array size only when safe to do so -->
- [ ] #58535 <!-- gf.c: include const-return methods in
`--trace-compile` -->
- [ ] #58038 <!-- strings/cstring: `transcode`: prevent Windows sysimage
invalidation -->
- [ ] #57604 <!-- `@nospecialize` for `string_index_err` -->
- [ ] #57366 <!-- Use ptrdiff_t sized offsets for gvars_offsets to allow
large sysimages -->
- [ ] #56890 <!-- Enable getting non-boxed LLVM type from Julia Type -->
- [ ] #56823 <!-- Make version of opaque closure constructor in world
-->
- [ ] #55958 <!-- also redirect JL_STDERR etc. when redirecting to
devnull -->
- [ ] #55956 <!-- Make threadcall gc safe -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->

---------

Co-authored-by: Kiran Pamnany <kpamnany@users.noreply.github.com>
Co-authored-by: adienes <51664769+adienes@users.noreply.github.com>
Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
Co-authored-by: Keno Fischer <keno@juliacomputing.com>
Co-authored-by: Simeon David Schaub <simeon@schaub.rocks>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Fons van der Plas <fonsvdplas@gmail.com>
Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
Co-authored-by: JonasIsensee <jonas.isensee@web.de>
Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Co-authored-by: DilumAluthgeBot <43731525+DilumAluthgeBot@users.noreply.github.com>
Co-authored-by: DilumAluthge <5619885+DilumAluthge@users.noreply.github.com>
@DilumAluthge DilumAluthge mentioned this pull request Sep 9, 2025
59 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 1.11 Change should be backported to release-1.11 profiler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Profiling hangs

3 participants