Mutex issues when two versions of libprotobuf are linked to two Python libraries separately

**What version of protobuf and what language are you using?**
Version: v3.25.1 (in `tink` 1.9) and v3.21.3 (in `pyarrow` 20.0.0)
Language: C++ and Python

**What operating system (Linux, Windows, ...) and version?**

macOS

**What runtime / compiler are you using (e.g., python version or gcc version)**

Python 3.10.17

**What did you do?**

1. Install the latest `tink` (1.11.0) and `pyarrow` (20.0.0) into the Python environment
2. Start a Python interpreter
3. Import one of `tink` or `pyarrow`
4. Import the other one

If `tink` is imported first, importing `pyarrow` leads to a deadlock because the mutex is invalid.
Otherwise, the program crashes immediately.

**What did you expect to see**

Importing both libraries should be safe ;)

**What did you see instead?**

Crashing:

```
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
```

or hanging:

```
[mutex.cc : 453] RAW: Lock blocking 0x156747c38   @
```

I did a first (and fairly deep) investigation. TL;DR: on macOS, the mutex lock appears to pick up the wrong `google::protobuf::internal::ShutdownData::get()::data` object. Older protobuf versions used an internal mutex implementation (a thin wrapper around `std::mutex`), while newer versions use `absl::Mutex`, so the two libraries disagree about what kind of mutex that object contains.

The stack traces are in https://github.com/tink-crypto/tink-py/issues/25 and https://github.com/apache/arrow/issues/40088.

Here is my latest finding:

I set a breakpoint on `google::protobuf::internal::OnShutdownRun` and then imported `pyarrow` first. The disassembly of this function in `libarrow` is as follows:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
    frame #0: 0x000000010732a0f4 libarrow.2000.dylib`google::protobuf::internal::OnShutdownRun(void (*)(void const*), void const*)
libarrow.2000.dylib`google::protobuf::internal::OnShutdownRun:
    0x10732a0f4 <+0>:  stp    x26, x25, [sp, #-0x50]!
    0x10732a0f8 <+4>:  stp    x24, x23, [sp, #0x10]
    0x10732a0fc <+8>:  stp    x22, x21, [sp, #0x20]
    0x10732a100 <+12>: stp    x20, x19, [sp, #0x30]
    0x10732a104 <+16>: stp    x29, x30, [sp, #0x40]
    0x10732a108 <+20>: add    x29, sp, #0x40
    0x10732a10c <+24>: mov    x20, x1
    0x10732a110 <+28>: mov    x21, x0
    0x10732a114 <+32>: adrp   x8, 1794
->  0x10732a118 <+36>: ldr    x8, [x8, #0x340]
    0x10732a11c <+40>: ldaprb w8, [x8]
    0x10732a120 <+44>: adrp   x19, 1796
    0x10732a124 <+48>: ldr    x19, [x19, #0x50]
    0x10732a128 <+52>: tbz    w8, #0x0, 0x10732a238 ; <+324>
    0x10732a12c <+56>: ldr    x22, [x19]
    0x10732a130 <+60>: add    x19, x22, #0x18
    0x10732a134 <+64>: mov    x0, x19
    0x10732a138 <+68>: bl     0x107531dd0    ; symbol stub for: std::__1::mutex::lock()
    0x10732a13c <+72>: ldp    x23, x8, [x22, #0x8]
    0x10732a140 <+76>: cmp    x23, x8
    0x10732a144 <+80>: b.hs   0x10732a158    ; <+100>
    0x10732a148 <+84>: stp    x21, x20, [x23]
    0x10732a14c <+88>: add    x8, x23, #0x10 
```

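The marked instruction loads the address of the guard variable that protects the function-local static `data` in `ShutdownData::get()`; the following `ldaprb` reads that guard to check whether the singleton has already been initialized. After the load executes, the register contains: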

```
x8 = 0x0000000107b2b8c8  guard variable for google::protobuf::internal::ShutdownData::get()::data
```

Note that the instruction at `0x10732a138 <+68>` calls `std::__1::mutex::lock()` from the C++ standard library directly. That is fine here, because the singleton created by this protobuf version holds a `std::mutex` member.
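For readers less used to arm64 assembly, here is a rough C++ reading of the listing above. It is a hypothetical model, not protobuf's source: `ShutdownDataModel`, `GetModel` and `OnShutdownRunModel` are made-up names, and the layout (a vector of `(function, argument)` pairs at offset `0x0`, the mutex at `0x18`) is only inferred from the offsets in the disassembly. The function-local static is hand-expanded so the guard check (`ldaprb` + `tbz`) is visible.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical model of protobuf's shutdown singleton. The layout is inferred
// from the listing: the vector of (function, argument) pairs starts at 0x0
// and the mutex at 0x18 on a 64-bit target.
struct ShutdownDataModel {
  std::vector<std::pair<void (*)(const void*), const void*>> functions;
  std::mutex mutex;  // the std::mutex flavor seen in libarrow
};

// Hand-expanded `static auto* data = new ShutdownDataModel;`. Real compilers
// call __cxa_guard_acquire/__cxa_guard_release on the slow path instead.
std::atomic<std::uint8_t> g_guard{0};  // "guard variable for ...::data"
ShutdownDataModel* g_data = nullptr;   // "...::data"
std::mutex g_init_mutex;               // stands in for the guard slow path

ShutdownDataModel* GetModel() {
  // ldaprb w8, [x8] ; tbz w8, #0x0, <slow path>
  if ((g_guard.load(std::memory_order_acquire) & 1) == 0) {
    std::lock_guard<std::mutex> init_lock(g_init_mutex);
    if ((g_guard.load(std::memory_order_relaxed) & 1) == 0) {
      g_data = new ShutdownDataModel;  // intentionally never freed
      g_guard.store(1, std::memory_order_release);
    }
  }
  return g_data;  // ldr x22, [x19]
}

void OnShutdownRunModel(void (*f)(const void*), const void* arg) {
  assert(f != nullptr);
  ShutdownDataModel* data = GetModel();
  // add x19, x22, #0x18 ; bl std::__1::mutex::lock()
  std::lock_guard<std::mutex> lock(data->mutex);
  // ldp x23, x8, [x22, #0x8] loads the vector's end and capacity pointers;
  // b.hs takes the grow path when full, otherwise stp stores (f, arg) at end.
  data->functions.emplace_back(f, arg);
}
```

Read this way, all the caller takes from the shared `data` pointer is "whatever lives at offset `0x18`", which is why the type of that member matters.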

I let it continue running and then imported `tink` (whose newer protobuf uses `absl::Mutex`).

```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x0000000103f5b104 tink_bindings.cpython-310-darwin.so`google::protobuf::internal::OnShutdownRun(void (*)(void const*), void const*)
tink_bindings.cpython-310-darwin.so`google::protobuf::internal::OnShutdownRun:
    0x103f5b104 <+0>:  stp    x28, x27, [sp, #-0x60]!
    0x103f5b108 <+4>:  stp    x26, x25, [sp, #0x10]
    0x103f5b10c <+8>:  stp    x24, x23, [sp, #0x20]
    0x103f5b110 <+12>: stp    x22, x21, [sp, #0x30]
    0x103f5b114 <+16>: stp    x20, x19, [sp, #0x40]
    0x103f5b118 <+20>: stp    x29, x30, [sp, #0x50]
    0x103f5b11c <+24>: add    x29, sp, #0x50
    0x103f5b120 <+28>: mov    x20, x1
    0x103f5b124 <+32>: mov    x21, x0
    0x103f5b128 <+36>: adrp   x8, 501
->  0x103f5b12c <+40>: ldr    x8, [x8, #0xe8]
    0x103f5b130 <+44>: ldaprb w8, [x8]
    0x103f5b134 <+48>: adrp   x19, 502
    0x103f5b138 <+52>: ldr    x19, [x19, #0xaa0]
    0x103f5b13c <+56>: tbz    w8, #0x0, 0x103f5b218 ; <+276>
    0x103f5b140 <+60>: ldr    x22, [x19]
    0x103f5b144 <+64>: add    x19, x22, #0x18
    0x103f5b148 <+68>: mov    x0, x19
->  0x103f5b14c <+72>: bl     0x104026a88    ; absl::lts_20240722::Mutex::Lock()
    0x103f5b150 <+76>: ldp    x9, x8, [x22, #0x8]
    0x103f5b154 <+80>: cmp    x9, x8
    0x103f5b158 <+84>: b.hs   0x103f5b16c    ; <+104>
```

Reading the register at the first arrow gives the same address as before:

```
x8 = 0x0000000107b2b8c8  guard variable for google::protobuf::internal::ShutdownData::get()::data
```

This object was already created while importing `pyarrow`, so its mutex member is a `std::mutex`.

However, at the second arrow the code calls `absl::lts_20240722::Mutex::Lock()` on that member, which expects an `absl::Mutex`. Locking a `std::mutex` as if it were an `absl::Mutex` can crash or hang the program.
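The type confusion can be illustrated with two hypothetical versions of the same singleton, one per bundled protobuf. The namespaces `pb_v21`/`pb_v25` and the struct `AbslLikeMutex` are made up for this sketch; `AbslLikeMutex` merely assumes, like `absl::Mutex`, a single pointer-sized lock word and is not the real class. Both layouts place the mutex at the same offset, so each library computes the same address from the shared `data` pointer, but they disagree about what is stored there.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

using ShutdownFunctions =
    std::vector<std::pair<void (*)(const void*), const void*>>;

// Hypothetical stand-in for absl::Mutex: a single pointer-sized lock word.
struct AbslLikeMutex {
  std::atomic<std::intptr_t> word{0};
};

namespace pb_v21 {  // protobuf bundled with pyarrow: std::mutex wrapper
struct ShutdownData {
  ShutdownFunctions functions;
  std::mutex mutex;
};
}  // namespace pb_v21

namespace pb_v25 {  // protobuf bundled with tink: absl::Mutex
struct ShutdownData {
  ShutdownFunctions functions;
  AbslLikeMutex mutex;
};
}  // namespace pb_v25

// Byte offset of the `mutex` member (0x18 on 64-bit targets), i.e. the
// constant added by `add x19, x22, #0x18` in both listings.
template <typename Data>
std::ptrdiff_t MutexOffset() {
  Data data;
  return reinterpret_cast<const char*>(&data.mutex) -
         reinterpret_cast<const char*>(&data);
}
```

On macOS `std::mutex` wraps a 64-byte `pthread_mutex_t`, so the two versions agree only on where the mutex starts, not on what it is. (Darwin's `pthread_mutex_lock` also rejects a mutex whose leading signature word is not set, which would fit the `mutex lock failed: Invalid argument` message, but that link is an unverified guess.)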

---------------

The open question is why both libraries resolve to the same address, even though they are two separately loaded libraries (which should be loaded with `RTLD_LOCAL` by default).

Maybe something in the build configuration prevents each library from getting its own copy of this data.
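One way to test the shared-address hypothesis from inside the process is `dladdr`, which reports which loaded image contains a given address. `OwningImage` below is a hypothetical diagnostic helper, not part of protobuf, tink or Arrow; the idea would be to pass it the guard-variable address that each library's `OnShutdownRun` resolves and check whether both map to `libarrow.2000.dylib` rather than to the tink extension module. In lldb, `image lookup --address <addr>` answers the same question interactively.

```cpp
#include <dlfcn.h>

#include <string>

// Hypothetical diagnostic helper: returns the path of the loaded image
// (executable or shared library) that contains `addr`, or an empty string
// if dladdr cannot attribute the address to any image.
std::string OwningImage(const void* addr) {
  Dl_info info{};
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
    return "";
  }
  return info.dli_fname;
}
```

On Linux with glibc older than 2.34 this needs `-ldl` at link time; on macOS `dladdr` is part of libSystem.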

**Anything else we should know about your project / environment**

My previous investigations:
- https://github.com/tink-crypto/tink-py/issues/25#issuecomment-2857960794
- https://github.com/apache/arrow/issues/40088#issuecomment-2865062539

Mutex issues when two versions of libprotobuf are linked to two Python libraries separately #21686
