Skip to content

UndispatchedCoroutine leaks CoroutineContext via thread local when using custom ContinuationInterceptor #4296

Closed
@pyricau

Description

@pyricau

Bug description

To fix #2930 (thread locals using ThreadContextElement not cleaned up), #3252 introduce a little hack (source) in UndispatchedCoroutine which immediately saves the previous thread context to UndispatchedCoroutine.threadStateToRecover within the UndispatchedCoroutine constructor, only when the ContinuationInterceptor is not a CoroutineDispatcher:

internal actual class UndispatchedCoroutine<in T>actual constructor () : ScopeCoroutine<T>() {
    init {
        if (uCont.context[ContinuationInterceptor] !is CoroutineDispatcher) {
            val values = updateThreadContext(context, null)
            restoreThreadContext(context, values)
            saveThreadContext(context, values)
        }
    }

If the continuation doesn't suspend, then UndispatchedCoroutine.afterResume() isn't invoked and UndispatchedCoroutine.threadStateToRecover.remove() is not called.

One thing of note, the javadoc for UndispatchedCoroutine.threadStateToRecover claims this:

     * Each access to Java's [ThreadLocal] leaves a footprint in the corresponding Thread's `ThreadLocalMap`
     * that is cleared automatically as soon as the associated thread-local (-> UndispatchedCoroutine) is garbage collected.

Unfortunately, that's not exactly correct. When the ThreadLocal is garbage collected, its value actually stays in the ThreadLocalMap, until either the thread itself gets garbage collected, or the ThreadLocal map cleans up stale entries, which only happens when using thread locals some more.

So when the right conditions apply, the CoroutineContext instance ends up being held in memory long after it's no longer needed.

To reproduce this, you need:

  • A CoroutineContext that contains a ContinuationInterceptor that is not a CoroutineDispatcher (to trigger the UndispatchedCoroutine constructor hack)
  • Call withContext(), changing the CoroutineContext but without changing the ContinuationInterceptor (to trigger the withContext() fast path 2 that starts the coroutine on the same ContinuationInterceptor, wrapping it with UndispatchedCoroutine.

Here's a repro: #4295

package kotlinx.coroutines

import kotlinx.coroutines.testing.*
import org.junit.Test
import java.lang.ref.*
import kotlin.coroutines.*
import kotlin.test.*

class CustomContinuationInterceptorTest : TestBase() {

    fun `CoroutineDispatcher not suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(Dispatchers.Default, suspend = false)

    @Test
    fun `CoroutineDispatcher suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(Dispatchers.Default, suspend = true)


    @Test
    fun `CustomContinuationInterceptor suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomContinuationInterceptor(Dispatchers.Default),
            suspend = true
        )

    // This is the one test that fails
    @Test
    fun `CustomContinuationInterceptor not suspending leaks CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomContinuationInterceptor(Dispatchers.Default),
            suspend = false
        )

    @Test
    fun `CustomNeverEqualContinuationInterceptor suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomNeverEqualContinuationInterceptor(Dispatchers.Default),
            suspend = true
        )

    @Test
    fun `CustomNeverEqualContinuationInterceptor not suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomNeverEqualContinuationInterceptor(Dispatchers.Default),
            suspend = false
        )


    private fun ensureCoroutineContextGCed(coroutineContext: CoroutineContext, suspend: Boolean) {
        runTest {
            lateinit var ref: WeakReference<CoroutineName>
            val job = GlobalScope.launch(coroutineContext) {
                val coroutineName = CoroutineName("Yo")
                ref = WeakReference(coroutineName)
                withContext(coroutineName) {
                    if (suspend) {
                        delay(1)
                    }
                }
            }
            job.join()

            System.gc()
            assertNull(ref.get())
        }
    }

}

class CustomContinuationInterceptor(private val delegate: ContinuationInterceptor) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
        return delegate.interceptContinuation(continuation)
    }
}

class CustomNeverEqualContinuationInterceptor(private val delegate: ContinuationInterceptor) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
        return delegate.interceptContinuation(continuation)
    }

    override fun equals(other: Any?) = false
}

Fix

I see a few options:

Workaround

Override equals in any custom ContinuationInterceptor implementation and have it return always false, to skip the fast way that's creating the UndispatchedCoroutine creation.

How we found this

I ran into some Compose related memory leaks in Square's UI tests back in February, asked the Compose team if they knew anything about that. They didn't. Then yesterday folks from the Compose team reached out because they're enabling leak detection in their test suite and were running into the exact same leak. We worked together on analyzing a heap dump and debugging and eventually root cause it to this bug.

The Compose team is looking into whether they can start using CoroutineDispatcher implementations directly in Compose tests (for other reasons as well, e.g. see this issue)

Here's the leaktrace as reported by leakcanary:

====================================
HEAP ANALYSIS RESULT
====================================
1 APPLICATION LEAKS

References underlined with "~~~" are likely causes.
Learn more at https://squ.re/leaks.

44298 bytes retained by leaking objects
Signature: c4797f2a268bbe8b50072affe05b2f82ce29f5ed
┬───
│ GC Root: Thread object
│
├─ java.lang.Thread instance
│    Leaking: NO (the main thread always runs)
│    Thread name: 'main'
│    ↓ Thread.threadLocals
│             ~~~~~~~~~~~~
├─ java.lang.ThreadLocal$ThreadLocalMap instance
│    Leaking: UNKNOWN
│    Retaining 446.6 kB in 9221 objects
│    ↓ ThreadLocal$ThreadLocalMap.table
│                                 ~~~~~
├─ java.lang.ThreadLocal$ThreadLocalMap$Entry[] array
│    Leaking: UNKNOWN
│    Retaining 446.6 kB in 9220 objects
│    ↓ ThreadLocal$ThreadLocalMap$Entry[2]
│                                      ~~~
├─ java.lang.ThreadLocal$ThreadLocalMap$Entry instance
│    Leaking: UNKNOWN
│    Retaining 444.0 kB in 9136 objects
│    ↓ ThreadLocal$ThreadLocalMap$Entry.value
│                                       ~~~~~
├─ kotlin.Pair instance
│    Leaking: UNKNOWN
│    Retaining 444.0 kB in 9135 objects
│    ↓ Pair.first
│           ~~~~~
├─ kotlin.coroutines.CombinedContext instance
│    Leaking: UNKNOWN
│    Retaining 444.0 kB in 9134 objects
│    ↓ CombinedContext.left
│                      ~~~~
├─ kotlin.coroutines.CombinedContext instance
│    Leaking: UNKNOWN
│    Retaining 442.9 kB in 9112 objects
│    ↓ CombinedContext.left
│                      ~~~~
├─ kotlin.coroutines.CombinedContext instance
│    Leaking: UNKNOWN
│    Retaining 442.9 kB in 9111 objects
│    ↓ CombinedContext.element
│                      ~~~~~~~
├─ kotlinx.coroutines.internal.ScopeCoroutine instance
│    Leaking: UNKNOWN
│    Retaining 441.0 kB in 9040 objects
│    ↓ ScopeCoroutine.uCont
│                     ~~~~~
├─ androidx.compose.foundation.gestures.DefaultScrollableState$scroll$2 instance
│    Leaking: UNKNOWN
│    Retaining 441.0 kB in 9038 objects
│    Anonymous subclass of kotlin.coroutines.jvm.internal.SuspendLambda
│    ↓ DefaultScrollableState$scroll$2.$block
│                                      ~~~~~~
├─ androidx.compose.foundation.gestures.ScrollingLogic$scroll$2 instance
│    Leaking: UNKNOWN
│    Retaining 104 B in 2 objects
│    Anonymous subclass of kotlin.coroutines.jvm.internal.SuspendLambda
│    ↓ ScrollingLogic$scroll$2.this$0
│                              ~~~~~~
├─ androidx.compose.foundation.gestures.ScrollingLogic instance
│    Leaking: UNKNOWN
│    Retaining 796 B in 33 objects
│    ↓ ScrollingLogic.overscrollEffect
│                     ~~~~~~~~~~~~~~~~
├─ androidx.compose.foundation.AndroidEdgeEffectOverscrollEffect instance
│    Leaking: UNKNOWN
│    Retaining 484 B in 12 objects
│    ↓ AndroidEdgeEffectOverscrollEffect.edgeEffectWrapper
│                                        ~~~~~~~~~~~~~~~~~
├─ androidx.compose.foundation.EdgeEffectWrapper instance
│    Leaking: UNKNOWN
│    Retaining 56 B in 1 objects
│    context instance of androidx.activity.ComponentActivity with mDestroyed = true
│    ↓ EdgeEffectWrapper.context
│                        ~~~~~~~
╰→ androidx.activity.ComponentActivity instance
     Leaking: YES (ObjectWatcher was watching this because androidx.activity.ComponentActivity received Activity#onDestroy() callback and Activity#mDestroyed is true)
     Retaining 44.3 kB in 797 objects
     key = 6bdc725d-e124-437c-895d-437bf03213af
     watchDurationMillis = 5182
     retainedDurationMillis = 174
     mApplication instance of android.app.Application
     mBase instance of android.app.ContextImpl

For anyone with access to the ASG slack, here's the investigation thread: https://androidstudygroup.slack.com/archives/CJH03QASH/p1733875296139909?thread_ts=1708204699.328019&cid=CJH03QASH

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions