Description
Hi,
There is a bug open in Debian related to gimp 2.10.2 and openblas 3.2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514.
Depending on the machine and environment used, gimp can hang at startup because of a deadlock inside glibc.
I'm forwarding what I wrote for the Debian bug tracker:
Using gdb to find where it hung (gimp-gdb.txt) shows threads waiting on
a lock while doing thread-local related work, while the main thread is in
the process of dl_close-ing openblas and waiting for those threads to exit
using pthread_join.
It seems that the lock used in tls_get_addr_tail [0] is the same as
the one locked by _dl_close [1].
A recursive lock is used, but here it does not help because the threads
calling tls_get_addr_tail and _dl_close are not the same.
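To illustrate why the recursive lock does not help, here is a minimal sketch in C. It is not glibc's code: `loader_lock`, `worker` and `close_while_joining` are hypothetical names, and a plain recursive pthread mutex stands in for the loader lock. One thread takes the lock and joins a second thread while still holding it, and the second thread needs the same lock. The worker uses a timed lock (an assumption made only so the demo terminates) and gets ETIMEDOUT where the real code would block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the loader's recursive lock (not glibc's object). */
static pthread_mutex_t loader_lock;

/* Models a thread doing TLS setup: it needs the lock held by the closer.
   The timed lock lets the demo report the blockage instead of hanging. */
static void *worker(void *arg)
{
    int *result = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000 * 1000;      /* wait at most ~200 ms */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    *result = pthread_mutex_timedlock(&loader_lock, &deadline);
    if (*result == 0)
        pthread_mutex_unlock(&loader_lock);
    return NULL;
}

/* Models the closing thread: takes the lock, then joins the worker while
   still holding it. Returns what the worker's lock attempt returned. */
static int close_while_joining(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int worker_result = -1;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&loader_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&loader_lock);

    /* Re-locking from the owning thread succeeds; that is all that
       "recursive" provides. */
    rc = pthread_mutex_lock(&loader_lock);
    assert(rc == 0);
    pthread_mutex_unlock(&loader_lock);

    rc = pthread_create(&tid, NULL, worker, &worker_result);
    assert(rc == 0);

    /* With an untimed lock in worker(), this join would never return. */
    pthread_join(tid, NULL);

    pthread_mutex_unlock(&loader_lock);
    pthread_mutex_destroy(&loader_lock);
    return worker_result;
}
```

Calling `close_while_joining()` from a small `main` (built with `-pthread`) should return ETIMEDOUT: the owning thread can re-enter the lock, but any other thread still has to wait for it.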
This deadlock may not happen every time; in my case, the openblas threads
are still initializing when _dl_close is called.
Given this, I think the offending commit in openblas is bf40f80 [2],
which adds TLS variables to avoid locking. But many changes have been
made since then.
One related bug report is [3], which seems to indicate that lock
handling inside glibc is not easy.
There was an attempt to fix deadlocks between tls_get_addr and a
_dl_close of a module whose finalizer joins with that thread [4].
So I see these possible solutions:
- Add a Breaks between gimp and openblas
- Disable TLS in the openblas build (if possible, but this would cause a
performance loss for users that use openblas without gimp)
- Patch glibc to not deadlock (but this does not seem easy to do at all)
[0] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-tls.c#L761
[1] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-close.c#L812
[2] bf40f80#diff-31f8d4e8863583d95bf2f9529f83844e
[4] https://sourceware.org/ml/libc-alpha/2015-06/msg00062.html