Description
Reproduction code is uploaded to this repo.
Scenario:
- Use Rust to compile a staticlib with cxx build, targeting aarch64-linux-android
- Link the staticlib (*.a) into a .so compiled from C++
- Use dlopen/dlclose to load this .so and call a symbol from it
- Run the binary on Android (which uses bionic libc)
The program then crashes with a segmentation fault:
Hello from Rust!
dude loop 125
Hello from C++!
Hello from Rust!
dude loop 126
Hello from C++!
Hello from Rust!
dude loop 127
Hello from C++!
Segmentation fault
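For readers who do not open the repo, the dlopen -> call -> dlclose loop behind the output above can be sketched roughly as follows. This is only an illustration and not the actual reproduction code: the function name `run_dlopen_cycles`, the `entry_fn` signature, and the symbol name passed in are assumptions.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical signature of the exported entry point; the real library's
 * symbol and signature may differ. */
typedef void (*entry_fn)(void);

/* Runs up to `loops` dlopen -> call -> dlclose cycles on `lib_path`.
 * Returns the number of cycles that completed. */
int run_dlopen_cycles(const char *lib_path, const char *symbol, int loops) {
    int done = 0;
    for (int i = 0; i < loops; i++) {
        void *handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            break;
        }
        /* POSIX permits converting the dlsym result to a function pointer. */
        entry_fn entry = (entry_fn)dlsym(handle, symbol);
        if (entry == NULL) {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            dlclose(handle);
            break;
        }
        printf("loop %d\n", i);
        entry();          /* first call into Rust may create a thread-local key */
        dlclose(handle);  /* the library is unloaded; any leaked key stays allocated */
        done++;
    }
    return done;
}
```

On the device this would be called with something like the library path from the logcat output (`/data/local/libdude.so`) and the loop count given on the command line; on older glibc systems the program may need to be linked with `-ldl`.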
Info from logcat:
Cmdline: ./test_dlopen 128
pid: 14405, tid: 14405, name: test_dlopen >>> ./test_dlopen <<<
uid: 0
tagged_addr_ctrl: 0000000000000001
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fc28c7ff8
Cause: stack pointer is in a non-existent map; likely due to stack overflow.
#00 pc 00000000000c4cdc /data/local/libdude.so (std::sys_common::thread_local_key::StaticKey::lazy_init::h713657dd8d2d4621+36)
#01 pc 000000000009216c /data/local/libdude.so (core::ops::function::FnOnce::call_once::h726b8069cd1e002c+80)
#02 pc 00000000000b7868 /data/local/libdude.so (std::panicking::rust_panic_with_hook::h2748add3cd52cde1+84)
#03 pc 00000000000b77dc /data/local/libdude.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::hf339e6c238ee80b2+144)
#04 pc 00000000000b51c4 /data/local/libdude.so (std::sys_common::backtrace::__rust_end_short_backtrace::hf829d410f7587982+8)
#05 pc 00000000000b7550 /data/local/libdude.so (rust_begin_unwind+48)
#06 pc 00000000000d94d4 /data/local/libdude.so (core::panicking::panic_fmt::h955ec3f09bb74715+40)
#07 pc 00000000000d992c /data/local/libdude.so (core::panicking::assert_failed_inner::hd795eb67b74b452d+276)
#08 pc 0000000000094cb4 /data/local/libdude.so (core::panicking::assert_failed::h2f68f007dd54e097+44)
#09 pc 00000000000c4d74 /data/local/libdude.so (std::sys_common::thread_local_key::StaticKey::lazy_init::h713657dd8d2d4621+188)
#10 pc 000000000009216c /data/local/libdude.so (core::ops::function::FnOnce::call_once::h726b8069cd1e002c+80)
#11 pc 00000000000b7868 /data/local/libdude.so (std::panicking::rust_panic_with_hook::h2748add3cd52cde1+84)
#12 pc 00000000000b77dc /data/local/libdude.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::hf339e6c238ee80b2+144)
Based on my analysis, this may be caused by std::sys_common::thread_local_key::StaticKey::lazy_init, which belongs to the Rust runtime. The first time a Rust function in the dynamic library is called, a thread-local variable may be created through this method.
This behavior is the same as the corresponding routine in the C/C++ runtime. See emutls.c in llvm-project's compiler-rt.
The process is as follows:
__emutls_get_address =>
emutls_get_index =>
pthread_once(&once, emutls_init) =>
emutls_init =>
abort
Simply put:
1. After dlopen, a thread-local variable is created with pthread_key_create the first time a Rust function is called, but the matching pthread_key_delete is not called on dlclose.
2. Each dlopen -> call -> dlclose loop occupies one more slot in the key map until BIONIC_PTHREAD_KEY_COUNT is reached.
3. Then the abort happens.
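The exhaustion described in these steps can be illustrated without Rust or dlopen at all. The sketch below is a simplified model, not the code path Rust actually takes: `keys_until_exhaustion` and `cycles_with_delete` are hypothetical helper names. The first keeps creating keys without deleting them until pthread_key_create fails; the second pairs every create with a pthread_key_delete and should never run out.

```c
#include <pthread.h>
#include <stdlib.h>

/* Creates keys without deleting them, mimicking a library that leaks one
 * key per dlopen/dlclose cycle. Returns how many keys could be created
 * before pthread_key_create failed (or `limit` if it never failed), or -1
 * if the bookkeeping array could not be allocated. The keys are deleted
 * before returning so the process is left clean. */
int keys_until_exhaustion(int limit) {
    pthread_key_t *keys = malloc(sizeof(pthread_key_t) * (size_t)limit);
    if (keys == NULL) {
        return -1;
    }
    int created = 0;
    while (created < limit) {
        /* A non-zero return (typically EAGAIN) means no free slot is left. */
        if (pthread_key_create(&keys[created], NULL) != 0) {
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_key_delete(keys[i]);
    }
    free(keys);
    return created;
}

/* Creates and immediately deletes a key on every cycle, so the slot is
 * released each time. Returns the number of successful cycles. */
int cycles_with_delete(int cycles) {
    int ok = 0;
    for (int i = 0; i < cycles; i++) {
        pthread_key_t key;
        if (pthread_key_create(&key, NULL) != 0) {
            break;
        }
        pthread_key_delete(key);
        ok++;
    }
    return ok;
}
```

Calling `keys_until_exhaustion` with a large limit reports the per-process key budget of the libc in use; the crash after loop 127 in the log above suggests that budget is small on bionic. `cycles_with_delete` shows why a matching pthread_key_delete on unload would avoid the problem.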
But the same reproduction ran happily on an x86 Linux machine.
We hacked on libc (both bionic for Android and glibc 2.35 on my Ubuntu 22.04) to investigate.
We found that on Android (which uses bionic on an arm64 CPU), a thread-local variable is created with pthread_key_create (code in bionic libc) the first time a Rust function is called after dlopen.
But on Linux (which uses glibc on an x86 CPU), pthread_key_create is not called. (Maybe glibc uses another mechanism to manage thread-local variables.)
So, my final questions are:
- Is my scenario reasonable: dlopen a wrapped Rust cdylib, call a function from it, then dlclose it, repeated n times?
- Is there any way to release the thread-local variable created by Rust on dlclose?