Fix: Prevent reallocation of TLS during thread exit on iOS #5575

xuezhulian · 2025-11-24T12:02:33Z

Problem:
The previous implementation used pthread_key_create(&terminationKey, onThreadExitCallback) to listen for thread exit and execute a cleanup callback.
The core issue was that this onThreadExitCallback function accessed a statically declared thread_local variable: THREAD_LOCAL_VARIABLE RuntimeState* runtimeState = kInvalidRuntime;.
Due to the order of operations during thread termination, the cleanup for our custom terminationKey was executed after the thread_local variable destructor. This meant that by the time our callback was called, the TLV for runtimeState had already been destroyed.
Consequently, accessing the destroyed runtimeState inside the callback triggered a new allocation for it. Because the thread's TSD cleanup loop was already complete, this newly allocated memory was never freed, leading to a memory leak.

Solution:
The fix is to change the callback registration mechanism to align with the C++ runtime's intended process for thread_local cleanup.
Instead of creating a custom key, the new implementation uses _tlv_atexit(&onThreadExitCallback, destructorRecord) to register the thread exit callback.
This works because _tlv_atexit indirectly registers our callback with the main cleanup list managed by dyld. During process initialization, dyld creates a system-level key, _terminatorsKey, and associates it with a master cleanup function, finalizeListTLV. The _tlv_atexit function essentially adds our callback to the list that finalizeListTLV will process.
Crucially, the execution of finalizeListTLV is guaranteed to happen before the individual thread_local variables like runtimeState are destroyed.
As a result, when onThreadExitCallback now accesses runtimeState, the variable is still valid, which prevents the TLV reallocation and resolves the memory leak.

xuezhulian · 2025-11-27T02:45:08Z

The _tlv_get_addr function, which is called whenever a thread_local variable is accessed, operates on the __DATA,__thread_data section. Accessing this section may trigger a page fault, resulting in blocking disk I/O. This becomes critical during thread termination.
We have observed the following deadlock scenario in production:

A Main GC event is triggered. As part of its process, the GC thread needs all Kotlin threads to reach a safepoint before it can proceed.
At the same time, the main thread is already suspended, waiting for the GC to complete.
Concurrently, another Kotlin thread is in the process of exiting and is executing __pthread_tsd_cleanup.
Within this cleanup routine, our previous logic inadvertently accesses a thread_local variable. This access triggers a page fault, causing the exiting thread to be suspended by the kernel while it waits for disk I/O.
This creates a deadlock:
○ The Main GC thread is blocked, waiting for the exiting Kotlin thread to reach a safepoint.
○ The exiting Kotlin thread is blocked by the kernel, waiting for a page fault to be resolved.
○ The main thread remains suspended, indirectly blocked by the exiting thread's stall.
This entire sequence prevents the main thread from responding, ultimately leading to a watchdog timeout and termination of the application.

haitaka · 2025-12-11T14:36:54Z

Hi, @xuezhulian! Thank you for your pull request!

I feel a bit uncertain about using the _tlv_atexit function, which does not seem to be part of a well-documented API. You wrote that this mechanism aligns with the C++ runtime's cleanup process. Do you think it would be possible to achieve the same result using C++ language features instead?
Moreover, it would be nice if we could resolve the TLS restoration issue for all target platforms and not only for iOS.

xuezhulian · 2025-12-15T03:18:32Z

@haitaka Thanks for the reply!

Using standard C++ features on iOS works fine.

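#include <memory>

// Declared before use so that std::unique_ptr<ThreadExit> sees a complete type.
struct ThreadExit {
    int data = 0;
    ~ThreadExit() {
        // Thread-exit cleanup goes here.
    }
};

thread_local std::unique_ptr<ThreadExit> threadExit;

// Executed inside a function, on each thread that needs the exit hook:
threadExit = std::make_unique<ThreadExit>();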

Initially, I tried to trigger runtime deinitialization indirectly through the destructor of a C++ TLS variable. However, debugging revealed that on iOS, C++ TLS variables use _tlv_atexit to register their thread-destruction callbacks. Since _tlv_atexit is a public API (its full implementation can be found in the dyld library), I ended up calling _tlv_atexit directly instead of going through a C++ TLS variable.

My primary focus is iOS development, so I am uncertain if this approach is applicable to other platforms.

haitaka

@xuezhulian

thread_local std::unique_ptr<ThreadExit> threadExit;

That's exactly what I had in mind! We even have this idea somewhere in our backlog. Let's go this way.

Let's try replacing the troublesome onThreadExit call with the setup of a thread_local cleaner that calls Kotlin_deinitRuntimeCallback on destruction, ensuring that runtimeState is still alive and accessible. If everything works out, we will resolve an old, annoying issue.
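
A minimal sketch of what such a cleaner might look like, under stated assumptions: RuntimeState here is a placeholder struct, deinitRuntime is a hypothetical stand-in for Kotlin_deinitRuntimeCallback, and ThreadExitCleaner, initRuntimeOnThisThread, and runWorkerAndCheck are names invented for this example. It is not the runtime's real code; it only shows a thread_local object whose destructor reads a trivially destructible thread_local pointer at thread exit.

```cpp
#include <thread>

// Placeholder for the runtime's per-thread state (hypothetical type).
struct RuntimeState { int id; };

// Trivially destructible pointer, mirroring the thread_local runtimeState.
thread_local RuntimeState* runtimeState = nullptr;

// Hypothetical stand-in for Kotlin_deinitRuntimeCallback: records whether
// the state was still reachable at thread exit, then releases it.
bool sawLiveState = false;
void deinitRuntime() {
    sawLiveState = (runtimeState != nullptr);
    delete runtimeState;
    runtimeState = nullptr;
}

// Cleaner whose destructor runs as part of the C++ thread_local cleanup.
struct ThreadExitCleaner {
    bool armed = false;
    ~ThreadExitCleaner() {
        if (armed) deinitRuntime();
    }
};
thread_local ThreadExitCleaner cleaner;

void initRuntimeOnThisThread(int id) {
    runtimeState = new RuntimeState{id};
    cleaner.armed = true;  // first use constructs the cleaner on this thread
}

// Starts a thread, lets it exit, and reports what the cleaner observed.
bool runWorkerAndCheck() {
    sawLiveState = false;
    std::thread worker([] { initRuntimeOnThisThread(42); });
    worker.join();  // returns after the worker's thread_local destructors ran
    return sawLiveState;
}
```

The armed flag keeps the destructor from calling into the runtime on threads that never initialized it. Whether this ordering holds relative to other cleanup on every target platform would still need to be verified there.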

xuezhulian requested a review from a team as a code owner November 24, 2025 12:02

xuezhulian requested a review from anton-bannykh November 24, 2025 12:02

xuezhulian force-pushed the master branch 11 times, most recently from 10e75ee to 3dc5c7c Compare November 24, 2025 14:19

Fix: Prevent reallocation of TLS during thread exit on iOS

8672440

xuezhulian force-pushed the master branch from 3dc5c7c to 8672440 Compare November 24, 2025 15:29

sbogolepov added the Native label Nov 24, 2025

sbogolepov requested a review from haitaka November 26, 2025 13:44

haitaka requested changes Dec 16, 2025
