test: extend actor-system-terminate phase timeout in InputStreamSourceTest for JDK 25 nightly #2994

Open
He-Pin wants to merge 1 commit into apache:main from He-Pin:fix/jdk25-input-stream-source-tck-shutdown

He-Pin commented May 25, 2026

Motivation

The JDK 25 nightly build aborts the stream TCK with:

[CoordinatedShutdown(pekko://InputStreamSourceTest)] Coordinated shutdown phase [actor-system-terminate] timed out after 10000 milliseconds
java.lang.RuntimeException: Failed to stop [InputStreamSourceTest] within [40000 milliseconds]

The `printTree` dump shows two `flow-X-0-take` ActorGraphInterpreter
children stuck mid-termination under `StreamSupervisor-0`.

`InputStreamSourceTest` feeds a CPU-busy `InputStream` whose `read()`
always returns a fresh byte without blocking or yielding, so each
`onPull` runs up to `chunkSize` synchronous `read()` calls. The JDK 25
nightly forces `pekko.test.stream-dispatcher.fork-join-executor.virtualize=on`
(see `.github/workflows/nightly-builds.yml`), which is the very
dispatcher the test pins via `ActorAttributes.dispatcher(...)`. On a
virtualized dispatcher, cancellation propagation through `take(elements)`
is slow enough that the default 10 s `actor-system-terminate` phase
timeout fires before the lingering flow actors finish terminating, even
though the outer `ActorSystemLifecycle.shutdownTimeout` is already
scaled to 40 s by `pekko.test.timefactor` (#2885).
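
To make the "CPU-busy `InputStream`" point concrete, here is a minimal sketch of a stream with that shape. `BusyInputStream`, its `calls` counter, and the `chunkSize` value of 8192 are hypothetical names and values chosen for illustration; this is not the test's actual source. It assumes the buffer is filled through the bulk `read(byte[])` inherited from `java.io.InputStream`, which calls `read()` once per byte.

```scala
import java.io.InputStream

// Hypothetical stand-in for the kind of stream described above (not the
// test's real code): read() never blocks, never yields, and never returns -1.
final class BusyInputStream extends InputStream {
  // Number of single-byte read() calls made so far.
  var calls: Long = 0L

  // Always hands back a byte value immediately.
  override def read(): Int = {
    calls += 1
    (calls & 0xFF).toInt
  }
}

object BusyInputStreamDemo {
  def main(args: Array[String]): Unit = {
    val chunkSize = 8192 // assumed value, for illustration only
    val in = new BusyInputStream
    // The inherited read(byte[]) fills the buffer by calling read() once per
    // byte, so a single bulk read performs chunkSize synchronous calls.
    val n = in.read(new Array[Byte](chunkSize))
    println(s"bytes read: $n, read() calls: ${in.calls}")
  }
}
```

Because such a stream never signals end-of-stream or waits on I/O, every pull does a full buffer's worth of CPU work before the thread is released.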

Modification

Override `additionalConfig` in `InputStreamSourceTest` to bump
`pekko.coordinated-shutdown.phases.actor-system-terminate.timeout` to
30 s, mirroring the pattern already used in `MixedProtocolClusterSpec`
for the same JDK 25 virtualized failure mode. The override layers on top
of `PekkoPublisherVerification.additionalConfig` via `withFallback` so
the existing buffer-size settings are preserved.

This is the smallest viable fix: it does not change production code,
does not alter the test's semantics, and does not relax any other
TCK timing.
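
The layering described above can be sketched roughly as follows. This is an illustration, not the diff itself: `BaseVerification` and the `example.buffer-size` key are hypothetical placeholders standing in for `PekkoPublisherVerification` and its real settings, and the snippet assumes the Typesafe Config library (`com.typesafe.config`) is on the classpath.

```scala
import com.typesafe.config.{ Config, ConfigFactory }

// Hypothetical stand-in for the base class; the real settings in
// PekkoPublisherVerification.additionalConfig are not reproduced here.
trait BaseVerification {
  def additionalConfig: Config =
    ConfigFactory.parseString("example.buffer-size = 16") // placeholder key
}

class InputStreamSourceSketch extends BaseVerification {
  // Keys defined on the receiver win; withFallback only supplies keys that
  // are missing, so the base settings survive next to the longer timeout.
  override def additionalConfig: Config =
    ConfigFactory
      .parseString(
        "pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 30 s")
      .withFallback(super.additionalConfig)
}
```

Calling `withFallback` on the new fragment, rather than on the base config, is what gives the override precedence if both define the same key.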

Result

The phase has enough headroom to drain in-flight cancellation traffic on
virtualized dispatchers before the outer shutdown await fires. The other
TCK tests are untouched and keep their default 10 s phase timeout.
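
The timing budget behind this can be checked with a few lines of arithmetic. The figures are the ones quoted in this description (time factor 4, 40 s outer timeout, 10 s default phase timeout, 30 s new phase timeout). The object and value names are hypothetical, and the unscaled 10 s base is inferred from 40 s divided by 4 rather than read from the code.

```scala
import scala.concurrent.duration._

object ShutdownBudget {
  val timeFactor: Long = 4L                  // -Dpekko.test.timefactor=4
  val outerShutdownTimeout = 40.seconds      // from the "Failed to stop" message
  val defaultPhaseTimeout = 10.seconds       // actor-system-terminate default
  val extendedPhaseTimeout = 30.seconds      // value set by this change

  // Inferred, not read from code: the outer timeout before scaling.
  val unscaledOuter: FiniteDuration = outerShutdownTimeout / timeFactor

  // Time left between the extended phase timeout and the outer await.
  val headroom: FiniteDuration = outerShutdownTimeout - extendedPhaseTimeout
}
```

The outer await scales with the time factor while the phase timeout does not, which is why the 10 s phase could fire well inside a 40 s budget; at 30 s the phase still ends before the outer await.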

Tests

Locally on Oracle OpenJDK 25.0.2 (arm64) with the same flags as the
`jdk-nightly-build` job in `nightly-builds.yml`:

sbt \
  -Dpekko.test.timefactor=4 \
  -Dpekko.actor.testkit.typed.timefactor=4 \
  -Dpekko.test.stream-dispatcher.fork-join-executor.virtualize=on \
  -Dpekko.test.stream-dispatcher.fork-join-executor.minimum-runnable=8 \
  -Dpekko.actor.default-dispatcher.fork-join-executor.virtualize=on \
  -Dpekko.actor.default-dispatcher.fork-join-executor.minimum-runnable=8 \
  -Dpekko.actor.internal-dispatcher.fork-join-executor.virtualize=on \
  -Dpekko.actor.internal-dispatcher.fork-join-executor.minimum-runnable=8 \
  "project stream-tests-tck" \
  "testOnly org.apache.pekko.stream.tck.InputStreamSourceTest"

Result: 26 passing / 0 failing / 12 canceled (TCK optional multi-subscriber specs).

scalafmt was run against the edited file.

References

…eTest for JDK 25 virtualized nightly

Motivation:
JDK 25 nightly runs abort the stream TCK with `Failed to stop
[InputStreamSourceTest] within [40000 milliseconds]` after the
CoordinatedShutdown `actor-system-terminate` phase times out at its
default 10 seconds. The dump shows two `flow-X-0-take` ActorGraphInterpreter
children stuck mid-termination under the StreamSupervisor.

The test feeds a CPU-busy `InputStream` whose `read()` always returns a
fresh byte without blocking or yielding, so each `onPull` runs up to
`chunkSize` synchronous `read()` calls. The nightly JDK 25 build forces
`pekko.test.stream-dispatcher.fork-join-executor.virtualize=on`, which is
the very dispatcher the test pins via `ActorAttributes.dispatcher(...)`.
On a virtualized dispatcher this combination slows cancellation
propagation through `take(elements)` enough that the 10 second phase
timeout fires before the lingering flow actors finish terminating, even
though the outer `ActorSystemLifecycle.shutdownTimeout` is already scaled
to 40 seconds by `pekko.test.timefactor`.

Modification:
Override `additionalConfig` in `InputStreamSourceTest` to extend
`pekko.coordinated-shutdown.phases.actor-system-terminate.timeout` to
30 seconds, mirroring the pattern already used in
`MixedProtocolClusterSpec` for the same JDK 25 virtualized failure mode.
The override layers on top of `PekkoPublisherVerification.additionalConfig`
via `withFallback` so existing buffer-size settings are preserved.

Result:
The phase has enough headroom to drain in-flight cancellation traffic on
virtualized dispatchers before the outer shutdown await fires. Verified
locally on JDK 25 (Oracle OpenJDK 25.0.2) with the same virtualize/timefactor
flags as `nightly-builds.yml`: `sbt "project stream-tests-tck"
"testOnly org.apache.pekko.stream.tck.InputStreamSourceTest"` reports
26 passing / 0 failing / 12 canceled (TCK optional multi-subscriber
specs).

References:
nightly-builds.yml `jdk-nightly-build` matrix entry javaVersion=25