@Aaronontheweb (Member) commented Aug 19, 2025

Changes

fixes #7770 - but this is just a prototype
supersedes, closes #7771

This commit resolves a deadlock that occurs when running tests in parallel, where the initial TestActor creation gets stuck during async initialization with CallingThreadDispatcher.

The root cause was that SystemActorOf hardcodes async=true initialization, creating a RepointableActorRef that requires processing a Supervise system message. With CallingThreadDispatcher, this creates a circular dependency:
- TestKit constructor blocks waiting for TestActor initialization
- CallingThreadDispatcher only runs on the calling thread
- The calling thread is blocked, so the Supervise message never gets processed

The solution bypasses SystemActorOf and directly calls AttachChild with async=false, enabling true synchronous initialization while preserving full system integration including supervision tree and mailbox configuration.
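
The circular wait described above can be modeled without Akka.NET at all. The following is a minimal sketch under stated assumptions: `ToyCallingThreadQueue` and `DeadlockSketch` are hypothetical names invented for illustration, not Akka.NET types, and the queued callback merely stands in for the Supervise system message. A timeout replaces the real deadlock so the sketch terminates.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Toy stand-in for a dispatcher that only runs work on the thread that drains it.
// This is NOT Akka.NET's CallingThreadDispatcher, only a model of the same constraint.
public sealed class ToyCallingThreadQueue
{
    private readonly Queue<Action> _pending = new Queue<Action>();

    public void Enqueue(Action work) => _pending.Enqueue(work);

    // Queued work only runs when the owning thread calls Drain().
    public void Drain()
    {
        while (_pending.Count > 0)
            _pending.Dequeue().Invoke();
    }
}

public static class DeadlockSketch
{
    // Async-style start: queue the initialization, then block waiting for it.
    // Nothing drains the queue while the caller waits, so the wait can only time out.
    public static bool StartAsyncThenBlock(TimeSpan timeout)
    {
        var queue = new ToyCallingThreadQueue();
        using var initialized = new ManualResetEventSlim(false);
        queue.Enqueue(() => initialized.Set()); // stands in for the Supervise system message
        return initialized.Wait(timeout);       // the blocked caller is the only thread that could run it
    }

    // Sync-style start: initialization runs inline, so there is nothing to wait for.
    public static bool StartSynchronously()
    {
        using var initialized = new ManualResetEventSlim(false);
        initialized.Set();
        return initialized.IsSet;
    }

    public static void Main()
    {
        Console.WriteLine(StartAsyncThenBlock(TimeSpan.FromMilliseconds(100))); // False
        Console.WriteLine(StartSynchronously());                                // True
    }
}
```

In this model the async-style start never completes because the only thread able to run the queued work is the one waiting on it, which is the shape of the problem the synchronous start avoids.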

This maintains compatibility with CallingThreadDispatcher for deterministic testing while eliminating startup deadlocks in parallel test scenarios.

Resolves an issue where TestProbe child actor creation and implicit sender functionality would fail due to incomplete TestActor initialization.
@Aaronontheweb force-pushed the spike-synchronous-testactor-start branch from fedcd76 to c152a8f on August 20, 2025 00:56
- Use AttachChild with isSystemService=true to exempt TestActor from serialization verification
- Resolves 700+ test failures caused by UnboundedChannelWriter serialization errors
@Aaronontheweb force-pushed the spike-synchronous-testactor-start branch from c152a8f to 50bf0db on August 20, 2025 01:07
Resolves deadlock that occurs when TestKit instances are created in parallel
and actors try to interact with TestActor during initialization. The issue
was caused by CallingThreadDispatcher creating RepointableActorRef which
requires async initialization, leading to deadlocks.

Changes:
- Add AttachChildWithAsync internal method to ActorCell to control sync/async actor creation
- Modify TestKitBase to create TestActor synchronously (LocalActorRef) instead of async (RepointableActorRef)
- Update Xunit/Xunit2 TestKits to create logger actors synchronously
- Replace Ask with Tell for logger initialization to avoid synchronous wait deadlocks
- Add InternalsVisibleTo for Xunit TestKits to access internal Akka methods
- Maintain LoggerInitialized response for protocol compatibility (has IDeadLetterSuppression)

Fixes akkadotnet#7770
@Aaronontheweb added the akka-testkit (Akka.NET Testkit issues) and perf labels Aug 20, 2025
@Aaronontheweb (Member, Author) left a comment

Detailed my changes


// Create logger actor synchronously to avoid deadlock during parallel test execution
// Use AttachChildWithAsync with isAsync:false to create LocalActorRef instead of RepointableActorRef
var logger = systemImpl.Provider.SystemGuardian.Cell.AttachChildWithAsync(

This uses the new AttachChildWithAsync method, which allows a top-level actor to start synchronously. It should really only be used during testing, but it means we don't have to wait for the logger to reply back; it should "just work".

This eliminates one source of deadlock and contention pressure when running lots of tests in parallel.

loggerTask.ConfigureAwait(false).GetAwaiter().GetResult();
// Create logger actor synchronously to avoid deadlock during parallel test execution
// Use AttachChildWithAsync with isAsync:false to create LocalActorRef instead of RepointableActorRef
var logger = systemImpl.Provider.SystemGuardian.Cell.AttachChildWithAsync(

Same category of fix as before.

[assembly: System.Runtime.CompilerServices.InternalsVisibleToAttribute("Akka.Streams.Tests")]
[assembly: System.Runtime.CompilerServices.InternalsVisibleToAttribute("Akka.TestKit")]
[assembly: System.Runtime.CompilerServices.InternalsVisibleToAttribute("Akka.TestKit.Tests")]
[assembly: System.Runtime.CompilerServices.InternalsVisibleToAttribute("Akka.TestKit.Xunit")]

Needed to make the xUnit testkits friend assemblies in order to access AttachChildWithAsync.


namespace Akka.TestKit.Tests.TestActorRefTests
{
public class ParallelTestActorDeadlockSpec

Repurposed @Arkatufus's reproduction from akkadotnet/Akka.Hosting#643 and got it to pass with these changes. Both the logger and the test initialization changes were required.


var systemImpl = system.AsInstanceOf<ActorSystemImpl>();
// Use the new AttachChildWithAsync method to create TestActor synchronously
var testActor = systemImpl.Provider.SystemGuardian.Cell.AttachChildWithAsync(

Key fix: start the actor with this method so it's automatically initialized with an ActorCell - no need to spin and check afterwards

/// <returns>A reference to the initialized child actor.</returns>
internal IActorRef AttachChildWithAsync(Props props, bool isSystemService, bool isAsync, string? name = null)
{
return MakeChild(props, name == null ? GetRandomActorName() : CheckName(name), isAsync, isSystemService);

The key fix - the ability to attach a system service actor and have it start synchronously.

This, very likely, screws with actor supervision if there are any crashes at startup, hence why this API is internal and should not be used outside of this one narrow case.

@Aaronontheweb marked this pull request as ready for review on August 20, 2025 21:46
@Aaronontheweb changed the title from "Spike: synchronous TestActor start" to "TestKit: synchronous TestActor start" on Aug 20, 2025
The test had a race condition where the PingerActor sends 'ping' to TestActor
during PreStart, but the test was expecting 'test-message' first. This could
cause ExpectMsgAsync to receive the wrong message and fail.

Fixed by properly expecting the 'ping' message first before sending and
expecting the 'test-message'.
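
The ordering problem can be illustrated with a plain FIFO queue. This is only a sketch under stated assumptions: `MessageOrderSketch`, `MailboxAfterPreStart`, and `ExpectNext` are hypothetical names, the queue is a stand-in for the TestActor's mailbox rather than the real TestKit API, and it models only the case where 'ping' has already arrived before the test sends its own message.

```csharp
using System;
using System.Collections.Generic;

public static class MessageOrderSketch
{
    // Hypothetical model of the mailbox right after the pinger's PreStart has run.
    public static Queue<string> MailboxAfterPreStart()
    {
        var mailbox = new Queue<string>();
        mailbox.Enqueue("ping"); // sent by the pinger during PreStart
        return mailbox;
    }

    // Hypothetical stand-in for an expect-style assertion:
    // consume the next message and compare it with the expected one.
    public static bool ExpectNext(Queue<string> mailbox, string expected) =>
        mailbox.Count > 0 && mailbox.Dequeue() == expected;

    public static void Main()
    {
        // Original order: send "test-message" and expect it immediately.
        // The first message consumed is "ping", so the expectation fails.
        var racy = MailboxAfterPreStart();
        racy.Enqueue("test-message");
        Console.WriteLine(ExpectNext(racy, "test-message")); // False

        // Fixed order: expect "ping" first, then send and expect "test-message".
        var ordered = MailboxAfterPreStart();
        Console.WriteLine(ExpectNext(ordered, "ping"));         // True
        ordered.Enqueue("test-message");
        Console.WriteLine(ExpectNext(ordered, "test-message")); // True
    }
}
```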
@Arkatufus (Contributor) left a comment

LGTM

@Aaronontheweb merged commit bf44386 into akkadotnet:dev on Aug 21, 2025
8 of 11 checks passed
@Aaronontheweb deleted the spike-synchronous-testactor-start branch on August 21, 2025 19:12
Arkatufus added a commit to Arkatufus/akka.net that referenced this pull request Aug 21, 2025
* Force synchronous start for `TestActor`

fix akkadotnet#7770

* separate creation of implicit, default `TestActor` from additional ones

* force `TestActor` to start via CTD tweak instead

* don't wait for `TestActor` to start

* Revert "don't wait for `TestActor` to start"

This reverts commit bdd77f9.

* run default `TestActor` without `CallingThreadDispatcher`

* fix TestKit deadlock during parallel test execution

This commit resolves a deadlock that occurs when running tests in parallel, where the initial TestActor creation gets stuck during async initialization with CallingThreadDispatcher.

The root cause was that SystemActorOf hardcodes async=true initialization, creating a RepointableActorRef that requires processing a Supervise system message. With CallingThreadDispatcher, this creates a circular dependency:
- TestKit constructor blocks waiting for TestActor initialization
- CallingThreadDispatcher only runs on the calling thread
- The calling thread is blocked, so Supervise message never gets processed

The solution bypasses SystemActorOf and directly calls AttachChild with async=false, enabling true synchronous initialization while preserving full system integration including supervision tree and mailbox configuration.

This maintains compatibility with CallingThreadDispatcher for deterministic testing while eliminating startup deadlocks in parallel test scenarios.

Resolves issue where TestProbe child actor creation and implicit sender functionality would fail due to incomplete TestActor initialization.

* Fix TestKit serialization issue

- Use AttachChild with isSystemService=true to exempt TestActor from serialization verification
- Resolves 700+ test failures caused by UnboundedChannelWriter serialization errors

* still working on synchronous `TestActor` startup

* Fix TestKit deadlock during parallel test execution

Resolves deadlock that occurs when TestKit instances are created in parallel
and actors try to interact with TestActor during initialization. The issue
was caused by CallingThreadDispatcher creating RepointableActorRef which
requires async initialization, leading to deadlocks.

Changes:
- Add AttachChildWithAsync internal method to ActorCell to control sync/async actor creation
- Modify TestKitBase to create TestActor synchronously (LocalActorRef) instead of async (RepointableActorRef)
- Update Xunit/Xunit2 TestKits to create logger actors synchronously
- Replace Ask with Tell for logger initialization to avoid synchronous wait deadlocks
- Add InternalsVisibleTo for Xunit TestKits to access internal Akka methods
- Maintain LoggerInitialized response for protocol compatibility (has IDeadLetterSuppression)

Fixes akkadotnet#7770

* added API approvals

* remove `EnsureTestActorReady` method

* API approvals

* ensure  calls can't get contaminated with  references

* fix API approvals

* Fix race condition in ParallelTestActorDeadlockSpec

The test had a race condition where the PingerActor sends 'ping' to TestActor
during PreStart, but the test was expecting 'test-message' first. This could
cause ExpectMsgAsync to receive the wrong message and fail.

Fixed by properly expecting the 'ping' message first before sending and
expecting the 'test-message'.
Aaronontheweb pushed a commit that referenced this pull request Aug 21, 2025
Successfully merging this pull request may close these issues.

TestKit deadlock during parallel test execution with TestActor initialization