runBlocking should let go of CPU token before parking the thread (#3983)
> Then nested …

No, not really; it's something beyond the control of the library.
Let me describe an issue we've faced in IJ. Consider a platform API interface that is supposed to be implemented in third-party plugins. The platform calls its method directly. Now, the client decides to use coroutines inside its implementation:
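(The snippet itself is not in this transcript; below is a minimal sketch of what it could look like, reusing the `CoolExtension` name from the migration example later in the thread. The `MyExtension` class and `computeStuffSuspend` helper are hypothetical.)

```kotlin
import kotlinx.coroutines.runBlocking

interface CoolExtension {
    fun computeStuff(): Any
}

// The client bridges its coroutine-based code into the blocking API
// via runBlocking:
class MyExtension : CoolExtension {
    override fun computeStuff(): Any = runBlocking {
        computeStuffSuspend() // hypothetical suspending implementation detail
    }
}

suspend fun computeStuffSuspend(): Any = "stuff"
```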
After some time, the platform API evolves, and a suspending entry point is added:
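(Again a sketch: assuming the new suspending entry point ships with a default implementation that falls back to the old blocking method, which is what the migration example later in the thread also implies.)

```kotlin
interface CoolExtension {
    fun computeStuff(): Any

    // New suspending entry point; the default implementation delegates to
    // the old blocking method, so existing implementations keep compiling.
    suspend fun computeStuffS(): Any = computeStuff()
}
```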
The platform now calls the suspending entry point.

Possible solutions: …

In particular, this means we cannot guarantee that at most X = CPU-count threads are active at any given moment, which defeats the purpose of `Dispatchers.Default`.
First, an important thing about making `runBlocking` release the CPU token: imagine that a thread of `Dispatchers.Default` … In a single-threaded environment, the problem of liveness is not as pronounced: it's par for the course that a single long-running task prevents progress everywhere else. Here, it's really strange when a single blocked task prevents unrelated tasks from making progress.
After reading everything in #3982, #3439, and this issue several times, I understand the proposed solutions, but I still have no idea what the problem even is, how many problems there are, or how the solutions map to the actual problems (and not to simplified reproducers). Here's one question that has a chance of helping us: imagine a thread pool that freely spawned new threads up to some large limit whenever needed. Would that solve your problems? If you think this question misses the mark entirely, could you explain the big picture with a clear breakdown of the design constraints we have to fit into? If not, I doubt I can say anything useful, and you'll probably have to proceed without me.
Let me elaborate on this first. Two things to consider: …

The problem: …

In general, we have to call …
How is this situation different from user code, say, accessing the Internet? That will also block the thread that you provide it with. If the third-party implementation is some arbitrary code, then either you can require that it does nothing blocking there (for example, by validating the implementation with BlockHound), or you can't expect it to be a good citizen, and it will eventually eat the thread that you give it.
This approach would solve the starvation problem, because eventually there would be enough threads to handle all the tasks. But with this approach there is no guarantee that the parallelism is equal to the CPU count: after a bunch of threads are spawned, there will be a window where all of them are unblocked and all of them process the global CPU-bound queue together.
A blocking operation that blocks on IO just blocks. On the other hand, `runBlocking` does not just block: it spins an event loop and executes the coroutines scheduled on it.
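(A small self-contained illustration of that difference; nothing here is from the original thread, just standard `runBlocking` behavior.)

```kotlin
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// runBlocking doesn't merely park the calling thread: it runs an event loop
// on it, so coroutines dispatched to that loop execute on this very thread.
fun main() = runBlocking {
    launch { // no dispatcher given: goes to runBlocking's own event loop
        println("child runs on ${Thread.currentThread().name}")
    }
    println("parent runs on ${Thread.currentThread().name}")
}
```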
Ok, but let's consider this: …

It would be okay if a suspending API just appeared, giving me a choice to migrate to it at a convenient time; but instead, my current implementation becomes broken because I've used `runBlocking`. This issue is about the evolution of existing APIs; it's about calling code which already uses `runBlocking`.
That's another big thing I don't understand, yes. Do you actually need that guarantee? Quoting your initial message:

> It's better to spawn an extra thread … than to have a starvation deadlock.

I get a strong impression that you're okay with utilizing extra threads to resolve deadlocks. It can realistically take several seconds or more; if you're okay with spawning extra threads, this would be a good time to do that, no?
I think I see this point, thank you.
And then this blocking API with no constraints starts to be unconditionally run on `Dispatchers.Default`. Here's another possible way to perform this migration:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

interface CoolExtension {
    fun computeStuff(): Any

    suspend fun computeStuffS(): Any {
        throw SuspendImplNotProvided()
    }
}

internal class SuspendImplNotProvided : Exception()

suspend fun CoolExtension.doComputeStuff(): Any =
    try {
        withContext(Dispatchers.Default) {
            computeStuffS()
        }
    } catch (e: SuspendImplNotProvided) {
        withContext(Dispatchers.IO) {
            computeStuff() // can potentially block, so use the IO dispatcher
        }
    }
```
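For illustration, the platform side would then always go through the wrapper; `platformSide` below is a hypothetical caller, not something from the thread:

```kotlin
// Hypothetical caller: the platform never invokes computeStuff() or
// computeStuffS() directly, only the wrapper, which picks the dispatcher.
suspend fun platformSide(extension: CoolExtension) {
    val result = extension.doComputeStuff()
    println("extension produced: $result")
}
```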
Well, yes! Isn't that the whole thing …

Yes, extra threads (and risking OOM) are better than total deadlock.

Correct, it would be a good time. But, again, we don't have control over the implementation of JVM IO.

This is what we'd want: run everything on `Dispatchers.Default` …
So, basically, this: …

Listed as Possible Solution 1.
Exactly. This is why, even if we change what happens in kotlinx-coroutines, you still won't have control over what happens inside blocking JVM IO.
Oh, ok, I misunderstood that point, then. Yes, I like this solution the most: until the author of the extension provides a suspending implementation, the blocking one keeps running on `Dispatchers.IO`.

> … and …

Looks like the crucial point. Do I understand correctly that you are prepared to give up some of the parallelism by allowing some of the threads with CPU tokens to block, in exchange for improving the ratio of useful work to total work? I can imagine wanting to do this if the goal is to reduce energy consumption, for example.
With regular IO, which we don't have control over, this is indeed a black box. I'd argue that `runBlocking` is not a black box, though …
I don't really understand the proposition. How would you detect when to spawn a new thread?
Not really. I argue that some situations can be detected; e.g., I believe there is only one user-callable function in the library which blocks: `runBlocking`.
I hate to bring it up here, but this is where Loom is expected to shine. Basically, a coroutine library implementation on top of Loom should spawn a new virtual thread per coroutine and use blocking JVM API calls to park/join, effectively delegating the scheduling to the VM. I wonder if any work was done in this direction. This would also cover IO being a black box: the VM will detect it and unmount the virtual thread before mounting another.
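For what it's worth, a dispatcher backed by virtual threads can already be assembled from public APIs. A sketch assuming JDK 21+; this is not the hypothetical Loom-based coroutine implementation itself, just the readily available building blocks:

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Each task gets a fresh virtual thread, so a blocking JVM call unmounts
// the virtual thread instead of occupying a carrier (platform) thread.
fun main() {
    Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        val loomDispatcher = executor.asCoroutineDispatcher()
        runBlocking {
            repeat(1_000) {
                launch(loomDispatcher) {
                    Thread.sleep(10) // blocking; the VM parks only the virtual thread
                }
            }
        }
    }
}
```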
Linked commits:

- …h a CPU permit And reacquire CPU permit after runBlocking finishes. This should resolve Dispatchers.Default starvation in cases where runBlocking is used to run suspend functions from a non-suspend execution context. Kotlin#3983 / IJPL-721
- …irect interaction with Worker Kotlin#3983 / IJPL-721
- PermitTransfer is extracted to be used both in CoroutineScheduler and in LimitedDispatcher. BlockingDispatchAware interface is introduced for LimitedDispatcher.Worker to be accounted for by CoroutineScheduler. Kotlin#3983 / IJPL-721
@dovchinnikov, does #4084 (comment) mean that there is no longer an intention to pursue this approach?
We have another idea for how to deal with thread starvation, though it may hurt performance in some cases. I'll publish another MR when it's ready.
What do we have now?

`runBlocking` parks the thread holding the CPU token if it happens on a thread of `Dispatchers.Default`.

What should be instead?

`runBlocking` should let go of the CPU token before parking, and "re-acquire" the token before un-parking (it should be un-parked in a state where the token is already held by it).
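To make the proposal concrete, here's a hedged illustration with hypothetical names; a plain `Semaphore` stands in for the scheduler's CPU tokens, and this is not the actual `CoroutineScheduler` code:

```kotlin
import java.util.concurrent.Semaphore
import java.util.concurrent.locks.LockSupport

// cpuPermits models the scheduler's CPU tokens: one per core.
val cpuPermits = Semaphore(Runtime.getRuntime().availableProcessors())

// What runBlocking's parking could look like under the proposal.
fun parkUntil(isDone: () -> Boolean) {
    cpuPermits.release()                     // let go of the CPU token first
    try {
        while (!isDone()) {
            LockSupport.parkNanos(1_000_000) // park without starving other workers
        }
    } finally {
        cpuPermits.acquire()                 // re-acquire before resuming CPU work
    }
}
```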
Why?

The current solution is just "don't use `runBlocking`", followed by "at least don't use `runBlocking` inside `Dispatchers.Default`", which is not a solution: it does not work in real-life scenarios, especially with large mixed codebases like IJ. In IJ we've tried to leverage #3439, but the approach is stillborn because it causes #3982 on a multi-threaded scale, and we haven't even started to tackle the thread locals which leak from the outer thread into inner tasks.

In other similar scenarios (FJ's managed block), another thread is spawned to compensate for the blocked one. On the JVM this is the most viable approach to this day. It's better to spawn an extra thread, which might do some other work later on or just die after a timeout, and to risk OOME, than to have a starvation deadlock.