Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested runBlocking deadlocks when a task depends on the nested runBlocking itself #3982

Open
dovchinnikov opened this issue Dec 11, 2023 · 5 comments
Labels

Comments

@dovchinnikov
Copy link
Contributor

dovchinnikov commented Dec 11, 2023

What do we have now?

The following deadlocks because nested runBlocking tries to process tasks from the outer runBlocking, and one of the tasks actually depends on the completion of the nested runBlocking:

import kotlinx.coroutines.*

class Container(private val scope: CoroutineScope) {

  private val instances = HashMap<Class<*>, Deferred<Any>>() // basically a map

  suspend fun <T> instance(key: Class<out T>): T {
    return instances
      .computeIfAbsent(key, ::instanceDeferred)
      .await() as T
  }

  private fun instanceDeferred(clazz: Class<*>): Deferred<Any> = scope.async {
    val constructor = clazz.getConstructor(Container::class.java)
    constructor.isAccessible = true
    constructor.newInstance(this@Container)
  }
}

class MyService(container: Container) {
  init {
    runBlocking {
      // this can ask container for an unrelated service,
      // or in this case it's enough for it to be empty
    }
  }
}

class MyService2(container: Container) {
  val service1 = runBlocking {
    container.instance(MyService::class.java)
  }
}

fun main() {
  runBlocking {
    val container = Container(this@runBlocking)
    launch {
      println(container.instance(MyService::class.java))
    }
    launch {
      println(container.instance(MyService2::class.java).service1)
    }
  }
}

What should be instead?

runBlocking should not steal tasks from outer runBlocking.

Alternatively, please provide an API to make runBlocking not steal tasks. It may be a boolean, or a special coroutine context element. Currently we have to use internals.

Why?

Because otherwise it deadlocks.

Notes

Originally, the "stealing" was added to fix a very specific use-case #860.

I believe that if nested runBlocking "steals" tasks while being inside a limited dispatcher, then this stealing logic should apply to all limited dispatchers, i.e. runBlocking on a thread inside Dispatchers.Default should steal tasks from Dispatchers.Default (#3439). If that's not possible for any reason, then nested runBlocking should not steal tasks from the outer runBlocking for the very same reason whatever it might be.

@dovchinnikov dovchinnikov changed the title Nested runBlocking deadlock when a task Nested runBlocking deadlock when a task depends on the nested runBlocking itself Dec 11, 2023
@dovchinnikov dovchinnikov changed the title Nested runBlocking deadlock when a task depends on the nested runBlocking itself Nested runBlocking deadlocks when a task depends on the nested runBlocking itself Dec 11, 2023
@dkhalanskyjb
Copy link
Collaborator

A simplified reproducer:

fun main() {
    runBlocking {
        val job = launch {
            runBlocking {
            }
        }
        launch {
            runBlocking {
                job.join()
            }
        }
    }
}

@dkhalanskyjb
Copy link
Collaborator

Right now, the code I show above hangs.

If we disable work stealing and let the thread do its thing inside runBlocking, we encounter the opposite problem: the following code will hang.

fun main() {
    runBlocking {
        val job = launch {
            yield() // new thing
            runBlocking {
            }
        }
        launch {
            runBlocking {
                job.join()
            }
        }
    }
}

Right now, this code passes just fine: the runBlocking on the bottom gets entered, runs the other tasks to completion, joins the job and exits. If it couldn't steal tasks, then the runBlocking on the bottom would just wait forever until someone completed the job.

In fact, in your case, just swapping the launch blocks at the bottom is enough to "fix" the problem, but this "fix" would stop working if we disabled work stealing.

What I'm getting from this is, with runBlocking, you can't avoid topologically sorting your code by the time an inner runBlocking is entered using the "depends-on" relation: whether work stealing is enabled or not simply reverses the direction of the dependency arrows.

@dovchinnikov
Copy link
Contributor Author

Basically, yes.

@dovchinnikov
Copy link
Contributor Author

The important thing here is a conceptual problem: should runBlocking steal tasks from a limited dispatcher, a thread of which is blocked by the said runBlocking? I believe that if runBlocking steals from outer runBlocking, then runBlocking should steal from Dispatchers.Default if it happens on a thread which holds the CPU-token, and, in general, runBlocking should steal from any limited dispatcher.

@ninja-
Copy link

ninja- commented May 12, 2024

I did hit this edge case in one of my projects and it was difficult to debug based on just thread and coroutine dumps.

Some coroutines were just stuck in CREATED state forever.
Task A is a run-once where results are derived from a mix of synchronous and asynchronous code, including runBlocking code.
Tasks B, C, D synchronously await task A promise using runBlocking.

Run this with a dispatcher using 4 threads and you have a deadlock.
On a larger scale, deadlock gets random which makes it hard to debug.

Unfortunately, the main reason runBlocking is needed is to provide asynchronous resources within class initializers which are not suspending.

Solution for me was to remove thread limit, then there is always a new thread available to unpause the coroutine and for other threads to unblock as a result.
I then also slowed down adding new tasks to the system to not overwhelm the unlimited thread pool.

Alternatively, the design for task- stealing executor is quite simple and maybe will be in Kotlin some day?
1 Create a dispatcher class which adds callback objects to a queue
2 In every runBlocking call, start a background task that consumes callbacks from the shared queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants