
Frequent dlopen/dlclose from C++ of a *.so compiled with Rust causes a crash #134820

Open
@Rust401


The reproduction code has been uploaded to this repo.

Scenario:

  1. Use Rust to compile a staticlib with cxx build, targeting aarch64-linux-android.
  2. Link the staticlib (*.a) into a .so compiled from C++.
  3. Use dlopen/dlclose to load this .so and call a symbol in it.
  4. Run the binary on Android (which uses bionic libc).

After a number of iterations, the program hits a segmentation fault:

Hello from Rust!
dude loop 125
Hello from C++!
Hello from Rust!
dude loop 126
Hello from C++!
Hello from Rust!
dude loop 127
Hello from C++!
Segmentation fault
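
The loop from the scenario can be sketched roughly as follows. This is a minimal illustration, not the code from the reproduction repo: the helper name `run_dlopen_loop`, the library path, and the exported symbol name are all hypothetical placeholders.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstdio>

// Hypothetical signature of the function exported by the wrapper .so.
using entry_fn = void (*)();

// Repeatedly dlopen the library, call one symbol, and dlclose it again.
// Returns the number of iterations that completed successfully.
int run_dlopen_loop(const char* lib_path, const char* symbol, int iterations) {
    assert(lib_path != nullptr && symbol != nullptr);
    int completed = 0;
    for (int i = 0; i < iterations; ++i) {
        void* handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
        if (handle == nullptr) {
            std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
            break;
        }
        auto fn = reinterpret_cast<entry_fn>(dlsym(handle, symbol));
        if (fn == nullptr) {
            std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
            dlclose(handle);
            break;
        }
        fn();  // first call into Rust code after each dlopen
        std::printf("dude loop %d\n", i);
        dlclose(handle);
        ++completed;
    }
    return completed;
}
```

A driver would call something like `run_dlopen_loop("/data/local/libdude.so", "dude_entry", 128)`, where `dude_entry` stands in for whatever symbol the wrapper actually exports.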

Info from logcat:

Cmdline: ./test_dlopen 128
pid: 14405, tid: 14405, name: test_dlopen  >>> ./test_dlopen <<<
uid: 0
tagged_addr_ctrl: 0000000000000001
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fc28c7ff8
Cause: stack pointer is in a non-existent map; likely due to stack overflow.
#00 pc 00000000000c4cdc  /data/local/libdude.so (std::sys_common::thread_local_key::StaticKey::lazy_init::h713657dd8d2d4621+36)
#01 pc 000000000009216c  /data/local/libdude.so (core::ops::function::FnOnce::call_once::h726b8069cd1e002c+80)
#02 pc 00000000000b7868  /data/local/libdude.so (std::panicking::rust_panic_with_hook::h2748add3cd52cde1+84)
#03 pc 00000000000b77dc  /data/local/libdude.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::hf339e6c238ee80b2+144)
#04 pc 00000000000b51c4  /data/local/libdude.so (std::sys_common::backtrace::__rust_end_short_backtrace::hf829d410f7587982+8)
#05 pc 00000000000b7550  /data/local/libdude.so (rust_begin_unwind+48)
#06 pc 00000000000d94d4  /data/local/libdude.so (core::panicking::panic_fmt::h955ec3f09bb74715+40)
#07 pc 00000000000d992c  /data/local/libdude.so (core::panicking::assert_failed_inner::hd795eb67b74b452d+276)
#08 pc 0000000000094cb4  /data/local/libdude.so (core::panicking::assert_failed::h2f68f007dd54e097+44)
#09 pc 00000000000c4d74  /data/local/libdude.so (std::sys_common::thread_local_key::StaticKey::lazy_init::h713657dd8d2d4621+188)
#10 pc 000000000009216c  /data/local/libdude.so (core::ops::function::FnOnce::call_once::h726b8069cd1e002c+80)
#11 pc 00000000000b7868  /data/local/libdude.so (std::panicking::rust_panic_with_hook::h2748add3cd52cde1+84)
#12 pc 00000000000b77dc  /data/local/libdude.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::hf339e6c238ee80b2+144)

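Based on my analysis, this may be caused by std::sys_common::thread_local_key::StaticKey::lazy_init, which belongs to the Rust runtime. The first time a Rust function in the dynamic library is called, a thread-local variable may be created through this method. The backtrace above also shows the failed assertion in lazy_init panicking, and the panic handler calling lazy_init again, which recurses until the stack overflows.

This behavior is the same as the corresponding routine in the C/C++ runtime. See emutls.c in llvm-project's compiler-rt.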

The process is as follows:

__emutls_get_address =>
    emutls_get_index =>
        pthread_once(&once, emutls_init) =>
            emutls_init =>
                abort (when pthread_key_create fails)
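
The call chain above follows a common lazy-initialization pattern: a process-wide key is created once through pthread_once, and the program aborts if key creation fails. Below is a simplified sketch of that pattern, not the actual code from emutls.c; the names `g_once`, `g_key`, `init_key`, and `get_thread_slot` are made up for illustration.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdlib>

static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static pthread_key_t g_key;

// Destructor run by the pthread runtime for each exiting thread's value.
static void free_slot(void* p) { std::free(p); }

// Runs exactly once per process; aborts if no key slot is available.
static void init_key() {
    if (pthread_key_create(&g_key, free_slot) != 0) {
        std::abort();
    }
}

// Returns this thread's storage, allocating it on first use.
int* get_thread_slot() {
    pthread_once(&g_once, init_key);
    void* p = pthread_getspecific(g_key);
    if (p == nullptr) {
        p = std::calloc(1, sizeof(int));
        if (p == nullptr || pthread_setspecific(g_key, p) != 0) {
            std::abort();
        }
    }
    return static_cast<int*>(p);
}
```

Note that nothing in this pattern ever calls pthread_key_delete. If the library holding `g_once` and `g_key` is unloaded and loaded again, the once-flag is fresh and a new key is created while the old one stays allocated.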

Simply put:
1. After dlopen, the first call to a Rust function creates a thread-local variable using pthread_key_create, but the matching pthread_key_delete is not called on dlclose.
2. Each dlopen -> call -> dlclose loop occupies one more slot in the key map, until BIONIC_PTHREAD_KEY_COUNT is reached.
3. Then the abort happens.
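
The exhaustion in step 2 can be observed in isolation, without dlopen or Rust. The sketch below creates keys without deleting them until pthread_key_create fails, then cleans up. The function name `count_keys_until_exhaustion` and the safety cap of 100000 are arbitrary choices for this illustration, and the count it returns depends on the libc and on how many keys the process already holds.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Creates keys without deleting them until pthread_key_create fails,
// then deletes them all. Returns how many keys could be created.
int count_keys_until_exhaustion() {
    std::vector<pthread_key_t> keys;
    pthread_key_t key;
    // The cap only guards against a libc with no practical key limit.
    while (keys.size() < 100000 && pthread_key_create(&key, nullptr) == 0) {
        keys.push_back(key);
    }
    int created = static_cast<int>(keys.size());
    for (pthread_key_t k : keys) {
        pthread_key_delete(k);  // releases the slot for reuse
    }
    return created;
}
```

Because the keys are deleted at the end, the function can be called repeatedly and reports the same limit each time. A leaking dlopen/dlclose loop behaves like the `while` loop alone, without the cleanup.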

However, the same code runs happily on an x86 Linux machine.
We hacked libc (both bionic for Android and glibc 2.35 on my Ubuntu 22.04) to investigate.
We found that on Android (which uses bionic, on an arm64 CPU), a thread-local variable is created with pthread_key_create (code in bionic libc) the first time a Rust function is called after dlopen.
On Linux (which uses glibc, on an x86 CPU), pthread_key_create is not called (maybe glibc uses another mechanism to manage thread-local variables).

So, my final questions are:

  1. Is my scenario reasonable? It dlopens a wrapped Rust cdylib, uses a function from it, and then dlcloses it, repeated for n loops.
  2. Is there any way to release the thread-local variable created by Rust on dlclose?
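
To make question 2 concrete: for a key that the library itself owns, creation and deletion can be paired with RAII, as in the sketch below. This only illustrates the missing create/delete pairing. It does not reach the key that Rust's std creates internally, and the class name `ScopedKey` and helper `cycles_succeed` are hypothetical. It also assumes that static destructors in a shared library run when dlclose actually unloads it.

```cpp
#include <pthread.h>
#include <cassert>

// Owns one pthread key: created in the constructor, deleted in the destructor.
class ScopedKey {
public:
    ScopedKey() : valid_(pthread_key_create(&key_, nullptr) == 0) {}
    ~ScopedKey() {
        // pthread_key_delete does not run destructors for stored values;
        // any per-thread data must be released by the owner beforehand.
        if (valid_) pthread_key_delete(key_);
    }
    ScopedKey(const ScopedKey&) = delete;
    ScopedKey& operator=(const ScopedKey&) = delete;

    bool valid() const { return valid_; }
    pthread_key_t get() const {
        assert(valid_);
        return key_;
    }

private:
    pthread_key_t key_{};
    bool valid_;
};

// Simulates n load/unload cycles: each iteration creates and releases a key.
bool cycles_succeed(int n) {
    for (int i = 0; i < n; ++i) {
        ScopedKey key;
        if (!key.valid()) return false;
    }
    return true;
}
```

With the pairing in place, the number of cycles is no longer bounded by the key limit, since each iteration returns its slot before the next one starts.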


Labels:

    A-dynamic-library (Area: Dynamic/Shared Libraries)
    A-linkage (Area: linking into static, shared libraries and binaries)
    C-discussion (Category: Discussion or questions that doesn't represent real issues.)
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
    T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
