Skip to content

Conversation

Arkatufus
Copy link
Contributor

@Arkatufus Arkatufus commented Sep 24, 2025

Fixes #7838

Enforce ActorSystem shutdown on TLS handshake failures

Summary

  • Any TLS handshake failure now triggers CoordinatedShutdown with a clear reason (exit code 79), preventing a node from remaining "half‑healthy" via reverse connections.
  • Tests cover server/client TLS failures and verify the misconfigured side shuts down.

Changes

  • Transport (DotNetty)
    • Handle TlsHandshakeCompletionEvent failures in TcpTransport and run CoordinatedShutdown using a new TlsHandshakeFailureReason (exit code 79). Logs include channel endpoints and id.

New shutdown reason

  • Akka.Remote.Transport.DotNetty.TlsHandshakeFailureReason (exit code 79): used by TcpTransport and EndpointManager to terminate the misconfigured system.

Tests (Akka.Remote.Tests)

  • Tls_handshake_failure_should_be_logged_and_shutdown_server
  • Server_side_tls_handshake_failure_should_shutdown_server
  • Client_side_tls_handshake_failure_should_shutdown_client

Rationale

  • Akka.NET remote is peer‑to‑peer: any TLS misconfiguration in either role must cause the association to fail hard. Shutting down the misconfigured ActorSystem avoids split‑brain‑like symptoms where nodes can still send outbound but fail inbound.

Impact / Migration

  • Behavior change: nodes with TLS handshake failures now terminate (exit code 79). Ensure certificates are valid and accessible.
  • No public API changes; wire protocol unaffected.

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions and some definite must-do changes.

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean, very simple. Well done.

Assert.Contains("TLS handshake failed", msg, StringComparison.OrdinalIgnoreCase);

// Server should shutdown due to TLS failure
await AwaitAssertAsync(async () =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


namespace Akka.Remote.Transport.DotNetty
{
internal sealed class TlsHandshakeFailureReason : CoordinatedShutdown.Reason
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

$"TLS handshake failed on channel [{context.Channel.LocalAddress}->{context.Channel.RemoteAddress}](Id={context.Channel.Id})"));
// Shutdown the ActorSystem on TLS handshake failure
var cs = CoordinatedShutdown.Get(Transport.System);
cs.Run(new TlsHandshakeFailureReason($"TLS handshake failed on channel [{context.Channel.LocalAddress}->{context.Channel.RemoteAddress}](Id={context.Channel.Id})"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Aaronontheweb Aaronontheweb merged commit 1ec6f9e into akkadotnet:v1.5 Sep 24, 2025
6 of 11 checks passed
Arkatufus added a commit to Arkatufus/akka.net that referenced this pull request Sep 24, 2025
* Fix TLS handshake error handling

* Simplify PR

* Simplify PR, remove new DisassociateInfo

* Clean whitespace noise

* cleanup, remove TlsHandshakeErrorAssociation

(cherry picked from commit 1ec6f9e)
Aaronontheweb pushed a commit that referenced this pull request Sep 24, 2025
* Fix TLS handshake error handling

* Simplify PR

* Simplify PR, remove new DisassociateInfo

* Clean whitespace noise

* cleanup, remove TlsHandshakeErrorAssociation

(cherry picked from commit 1ec6f9e)
@Arkatufus Arkatufus added this to the 1.5.51 milestone Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants