test: extend actor-system-terminate phase timeout in InputStreamSourceTest for JDK 25 nightly #2994
Open
He-Pin wants to merge 1 commit into
…eTest for JDK 25 virtualized nightly

Motivation:
JDK 25 nightly runs abort the stream TCK with `Failed to stop [InputStreamSourceTest] within [40000 milliseconds]` after the CoordinatedShutdown `actor-system-terminate` phase times out at its default 10 seconds. The dump shows two `flow-X-0-take` ActorGraphInterpreter children stuck mid-termination under the StreamSupervisor.

The test feeds a CPU-busy `InputStream` whose `read()` always returns a fresh byte without blocking or yielding, so each `onPull` runs up to `chunkSize` synchronous `read()` calls. The nightly JDK 25 build forces `pekko.test.stream-dispatcher.fork-join-executor.virtualize=on`, which is the very dispatcher the test pins via `ActorAttributes.dispatcher(...)`. On a virtualized dispatcher this combination slows cancellation propagation through `take(elements)` enough that the 10 second phase timeout fires before the lingering flow actors finish terminating, even though the outer `ActorSystemLifecycle.shutdownTimeout` is already scaled to 40 seconds by `pekko.test.timefactor`.

Modification:
Override `additionalConfig` in `InputStreamSourceTest` to extend `pekko.coordinated-shutdown.phases.actor-system-terminate.timeout` to 30 seconds, mirroring the pattern already used in `MixedProtocolClusterSpec` for the same JDK 25 virtualized failure mode. The override layers on top of `PekkoPublisherVerification.additionalConfig` via `withFallback` so existing buffer-size settings are preserved.

Result:
The phase has enough headroom to drain in-flight cancellation traffic on virtualized dispatchers before the outer shutdown await fires. Verified locally on JDK 25 (Oracle OpenJDK 25.0.2) with the same virtualize/timefactor flags as `nightly-builds.yml`: `sbt "project stream-tests-tck" "testOnly org.apache.pekko.stream.tck.InputStreamSourceTest"` reports 26 passing / 0 failing / 12 canceled (TCK optional multi-subscriber specs).

References:
nightly-builds.yml `jdk-nightly-build` matrix entry javaVersion=25
Motivation
The JDK 25 nightly build aborts the stream TCK with `Failed to stop [InputStreamSourceTest] within [40000 milliseconds]`.
The `printTree` dump shows two `flow-X-0-take` `ActorGraphInterpreter` children stuck mid-termination under `StreamSupervisor-0`. `InputStreamSourceTest` feeds a CPU-busy `InputStream` whose `read()` always returns a fresh byte without blocking or yielding, so each `onPull` runs up to `chunkSize` synchronous `read()` calls. The JDK 25 nightly forces `pekko.test.stream-dispatcher.fork-join-executor.virtualize=on` (see `.github/workflows/nightly-builds.yml`), which is the very dispatcher the test pins via `ActorAttributes.dispatcher(...)`. On a virtualized dispatcher, cancellation propagation through `take(elements)` is slow enough that the default 10 s `actor-system-terminate` phase timeout fires before the lingering flow actors finish terminating, even though the outer `ActorSystemLifecycle.shutdownTimeout` is already scaled to 40 s by `pekko.test.timefactor` (#2885).
Modification
Override `additionalConfig` in `InputStreamSourceTest` to bump `pekko.coordinated-shutdown.phases.actor-system-terminate.timeout` to 30 s, mirroring the pattern already used in `MixedProtocolClusterSpec` for the same JDK 25 virtualized failure mode. The override layers on top of `PekkoPublisherVerification.additionalConfig` via `withFallback` so the existing buffer-size settings are preserved.
This is the smallest viable fix: it does not change production code, does not alter the test's semantics, and does not relax any other TCK timing.
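The layering described above can be illustrated with a minimal sketch that uses only the Typesafe Config API. It is not the actual diff: `baseConfig` is a hypothetical stand-in for `PekkoPublisherVerification.additionalConfig`, and the buffer-size key and its value are assumptions chosen for illustration. Only the `actor-system-terminate` timeout path and the use of `withFallback` come from the description.

```scala
import java.util.concurrent.TimeUnit

import com.typesafe.config.{ Config, ConfigFactory }

object ShutdownTimeoutOverrideSketch {
  // Hypothetical stand-in for PekkoPublisherVerification.additionalConfig.
  // The key and value here are assumptions, not copied from the Pekko sources.
  val baseConfig: Config = ConfigFactory.parseString(
    "pekko.stream.materializer.max-input-buffer-size = 512")

  val timeoutPath = "pekko.coordinated-shutdown.phases.actor-system-terminate.timeout"

  // Settings parsed here take precedence; any path that is not set
  // in this string is resolved from baseConfig through withFallback.
  val overridden: Config = ConfigFactory
    .parseString(s"$timeoutPath = 30 s")
    .withFallback(baseConfig)

  def main(args: Array[String]): Unit = {
    // The extended phase timeout, read back in seconds.
    println(overridden.getDuration(timeoutPath, TimeUnit.SECONDS))
    // The inherited buffer-size setting is still visible after the merge.
    println(overridden.getInt("pekko.stream.materializer.max-input-buffer-size"))
  }
}
```

Because `withFallback` only fills in paths the override leaves unset, the new timeout wins while the inherited settings remain readable from the merged config.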
Result
The phase has enough headroom to drain in-flight cancellation traffic on
virtualized dispatchers before the outer shutdown await fires. The other
TCK tests are untouched and keep their default 10 s phase timeout.
Tests
Locally on `Oracle OpenJDK 25.0.2` (arm64) with the same flags as `nightly-builds.yml` `jdk-nightly-build`, running `sbt "project stream-tests-tck" "testOnly org.apache.pekko.stream.tck.InputStreamSourceTest"`.
Result: 26 passing / 0 failing / 12 canceled (TCK optional multi-subscriber specs).
`scalafmt` was run against the edited file.
References
- `.github/workflows/nightly-builds.yml` `jdk-nightly-build` matrix javaVersion=25
- `pekko.test.timefactor`
- `cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala:40`: same pattern for JDK 25 virtualized failures