Inconsistent behaviour when child coroutine attaches to the parent during "completing" -> "completed" transition #3893

@qwwdfsad

Steps to reproduce:

// Add this test to JobChildStressTest

@Test
fun testFailingChildIsAddedWhenJobFinalizesItsState() {
    // All exceptions should get aggregated here
    repeat(N_ITERATIONS) {
        runBlocking {
            val rogueJob = AtomicReference<Job?>()
            println(it)
            val deferred = CompletableDeferred<Unit>()
            launch(pool + deferred) {
                deferred.complete(Unit) // Transition deferred into "completing" state waiting for current child
                // **Asynchronously** submit task that launches a child so it races with completion
                pool.executor.execute {
                    rogueJob.set(launch(pool + deferred) {
                        println("isCancelled: " + coroutineContext.job.isCancelled)
                        throw TestException()
                    })
                }
            }

            deferred.join()
            // Read the reference once so the null check and the isActive check see the same job
            val rogue = rogueJob.get()
            if (rogue != null && rogue.isActive) {
                println("Rogue job with parent " + rogue.parent + " and children list: " + rogue.parent?.children?.toList())
            }
        }
    }
}

What happens here:

  • The deferred is in the "completing" state, waiting for the ChildCompletion handler of the first launch (1) to finalize its state
  • ChildCompletion invokes continueCompleting
  • In parallel, the second launch (2) attaches itself to the deferred

Depending on how the race resolves, there are three possible outcomes:

  1. Happy path: 2 successfully attaches to the parent, 1 detects that in continueCompleting, and the parent starts waiting for 2. This situation is indistinguishable from the deferred having two children.

  2. Unhappy path #1: 1 detects that there are no children and invokes finalizeFinishingState.
    Then 2 attaches itself to the parent. finalizeFinishingState reaches completeStateFinalization -> notifyCompletion and cancels the child, which might already have been running for some time.
    This behaviour is observable and counter-intuitive, because nothing actually failed or was cancelled explicitly.
    Also, if 2 fails with an exception, the exception is reported to the global exception handler.

  3. Unhappy path #2: the same as above, but 2 attaches itself to the parent after the parent has completely finalized its state.
    As a result, we have a completed deferred with no children and an active, non-cancelled coroutine whose parent points to the deferred.
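
For contrast with the racy outcomes above, the happy path can be reproduced without any race by attaching the second child while the first one is still suspended. This is only a sketch under assumptions: `happyPathSnapshot`, `HappyPathSnapshot` and the `gate` deferred are hypothetical names introduced for illustration, and the code relies on the documented `Job` state flags rather than on the internals mentioned above.

```kotlin
import kotlinx.coroutines.*

// Hypothetical holder for the observations made by the sketch below.
data class HappyPathSnapshot(
    val completeReturned: Boolean,
    val activeWhileCompleting: Boolean,
    val completedWhileCompleting: Boolean,
    val childCount: Int,
    val completedAfterJoin: Boolean,
    val cancelledAfterJoin: Boolean
)

fun happyPathSnapshot(): HappyPathSnapshot = runBlocking {
    val deferred = CompletableDeferred<Unit>()
    // Hypothetical helper: keeps both children suspended until we open the gate.
    val gate = CompletableDeferred<Unit>()

    // Child (1): attached to the deferred as soon as launch returns.
    launch(deferred) { gate.await() }

    // The deferred now has an incomplete child, so it moves to "completing"
    // and waits for that child instead of finishing immediately.
    val completeReturned = deferred.complete(Unit)
    val activeWhileCompleting = deferred.isActive
    val completedWhileCompleting = deferred.isCompleted

    // Child (2): attached while the parent is still completing (no race here).
    launch(deferred) { gate.await() }
    val childCount = deferred.children.count()

    gate.complete(Unit)  // let both children finish
    deferred.join()      // the deferred completes only after both children

    HappyPathSnapshot(
        completeReturned, activeWhileCompleting, completedWhileCompleting,
        childCount, deferred.isCompleted, deferred.isCancelled
    )
}

fun main() {
    println(happyPathSnapshot())
}
```

In this sequential variant the parent simply waits for both children, which is what outcome 1 looks like from the outside. Outcomes 2 and 3 need the attach to overlap with the parent's finalization, which this sketch deliberately avoids.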

Note that outcome 2 (unhappy path #1) roughly emulates the existing behaviour "an attempt to attach as a child to an already completed job immediately cancels the attaching job".
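
The "attach to an already completed job" behaviour referenced in the note can be observed without a race. The following is a minimal sketch, assuming a recent kotlinx.coroutines version; `attachToCompletedParent` and `AttachResult` are hypothetical names used only for this illustration.

```kotlin
import kotlinx.coroutines.*

// Hypothetical holder for the observations made by the sketch below.
data class AttachResult(
    val bodyRan: Boolean,
    val childCancelled: Boolean,
    val parentCancelled: Boolean
)

fun attachToCompletedParent(): AttachResult = runBlocking {
    val parent = CompletableDeferred<Unit>()
    // No children are attached, so the parent completes right away.
    parent.complete(Unit)

    var bodyRan = false
    // Attaching to the already completed parent cancels the new child
    // before its body gets a chance to start.
    val child = launch(parent) { bodyRan = true }
    child.join()

    AttachResult(bodyRan, child.isCancelled, parent.isCancelled)
}

fun main() {
    println(attachToCompletedParent())
}
```

Here the child is cancelled even though the parent itself completed normally and is not cancelled. That is the same observable effect as in unhappy path #1, except that there the child may already be running when the cancellation arrives.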
