
OF-3176: Offload blocking Netty handlers to dedicated executor #3143

Open
guusdk wants to merge 18 commits into igniterealtime:main from guusdk:OF-3176_Netty-processing-blocking-events

Conversation

@guusdk (Member) commented Feb 25, 2026

Introduce a shared blocking handler executor for Netty pipelines to ensure that potentially blocking or long-running operations do not execute on EventLoop threads.

This change separates responsibilities between:

  • acceptor (boss) EventLoopGroup
  • non-blocking I/O worker EventLoopGroup
  • dedicated executor for blocking pipeline handlers

The new executor is shared across all channels created by a NettyConnectionAcceptor and is used for handlers that may perform authentication, routing, persistence, or other blocking work.

This improves throughput and stability under load by preventing EventLoop starvation and aligns Openfire’s Netty usage with Netty best practices.

The connection configuration’s "max thread pool size" is now applied to the dedicated blocking handler executor. Netty EventLoopGroups (acceptor and I/O worker) now use Netty’s default thread sizing, as they are reserved exclusively for non-blocking socket I/O and protocol framing.
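The separation described above can be sketched with plain `java.util.concurrent` primitives (an illustrative stand-in, not Openfire code): the single-threaded executor plays the role of a Netty worker EventLoop that must never block, while a dedicated pool absorbs the blocking work. In Netty itself the same effect is achieved by passing the blocking `EventExecutorGroup` when adding the handler to the pipeline.

```java
import java.util.concurrent.*;

// Stdlib sketch of the threading split (names are illustrative, not Openfire's):
// ioLoop stands in for a Netty worker EventLoop, blockingExecutor for the
// dedicated blocking-handler pool. In Netty the equivalent wiring is
// pipeline.addLast(blockingExecutorGroup, "handler", handler).
public class OffloadSketch {
    static ExecutorService pool(int n) {
        return Executors.newFixedThreadPool(n, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // let the JVM exit without explicit shutdown
            return t;
        });
    }

    static final ExecutorService ioLoop = pool(1);           // non-blocking I/O
    static final ExecutorService blockingExecutor = pool(4); // auth, routing, persistence

    // Simulates a pipeline event: framing happens on the I/O thread, and the
    // potentially blocking work is handed off so the I/O thread stays free.
    static Future<String> onFrameReceived(String frame) throws Exception {
        return ioLoop.submit(() ->
            blockingExecutor.submit(() -> "processed:" + frame) // offload
        ).get(); // returns the offloaded task's future
    }

    public static void main(String[] args) throws Exception {
        System.out.println(onFrameReceived("stanza").get()); // prints "processed:stanza"
    }
}
```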

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a dedicated EventExecutorGroup for potentially blocking Netty pipeline handlers, to keep Netty EventLoop threads focused on non-blocking I/O and reduce EventLoop starvation under load.

Changes:

  • Add a shared blocking-handler executor to NettyConnectionAcceptor and run core “business logic” handlers on that executor.
  • Extend plugin handler injection to receive an executor, enabling plugins to offload their handlers from EventLoop threads.
  • Update outbound S2S session initialization to also offload its business logic handler to a dedicated executor, plus documentation updates describing the new threading model.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • xmppserver/src/main/java/org/jivesoftware/openfire/spi/NettyServerInitializer.java: Offloads the core business logic handler to a provided blocking executor; passes the executor to plugin handler factories.
  • xmppserver/src/main/java/org/jivesoftware/openfire/spi/NettyConnectionAcceptor.java: Introduces and manages the lifecycle of a shared blocking handler executor; updates bootstrap wiring and handler factory calls.
  • xmppserver/src/main/java/org/jivesoftware/openfire/nio/NettySessionInitializer.java: Offloads outbound S2S business logic to a new executor; introduces a per-instance executor and thread-naming changes.
  • xmppserver/src/main/java/org/jivesoftware/openfire/nio/NettyChannelHandlerFactory.java: Changes the plugin extension interface to accept an executor for handler execution.
  • documentation/internal-networking.html: Documents the new dedicated executor concept alongside the boss/worker groups.


@guusdk force-pushed the OF-3176_Netty-processing-blocking-events branch 2 times, most recently from 783be64 to e4c2680 (February 26, 2026 16:28)
@guusdk (Member, Author) commented Feb 27, 2026

Does this solution void the fixes for https://igniterealtime.atlassian.net/browse/OF-3180 in #3140 ?

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.



@guusdk force-pushed the OF-3176_Netty-processing-blocking-events branch 2 times, most recently from 22232ad to 999f6ca (March 3, 2026 15:02)
@akrherz requested a review from Copilot (March 3, 2026 18:48)
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

xmppserver/src/main/java/org/jivesoftware/openfire/spi/NettyServerInitializer.java:1

  • The channelActive handler that starts DirectTLS + enables autoRead is added after the business logic handler (which is offloaded to blockingHandlerExecutor). This means DirectTLS startup (and autoRead enabling) is sequenced behind the offloaded handler’s channelActive, which can delay TLS initiation or even prevent it under load (and can also allow earlier handlers to observe channelActive before TLS is started). Add this channelActive handler earlier in the pipeline (e.g., addFirst, or addBefore the business logic handler) so TLS startup and autoRead enabling happen before any potentially-blocking/offloaded handler can run.


guusdk added 8 commits March 3, 2026 20:26
Introduce a shared blocking handler executor for Netty pipelines to ensure that potentially blocking or long-running operations do not execute on EventLoop threads.

This change separates responsibilities between:
- acceptor (boss) EventLoopGroup
- non-blocking I/O worker EventLoopGroup
- dedicated executor for blocking pipeline handlers

The new executor is shared across all channels created by a NettyConnectionAcceptor and is used for handlers that may perform authentication, routing, persistence, or other blocking work.

This improves throughput and stability under load by preventing EventLoop starvation and aligns Openfire’s Netty usage with Netty best practices.

The connection configuration’s "max thread pool size" is now applied to the dedicated blocking handler executor. Netty EventLoopGroups (acceptor and I/O worker) now use Netty’s default thread sizing, as they are reserved exclusively for non-blocking socket I/O and protocol framing.
…ing Netty handlers

A recent change switched server-side (NettyServerInitializer) and outbound client (NettySessionInitializer) pipelines to use a separate EventExecutorGroup for blocking handlers, to prevent long-running operations from exhausting the Netty EventLoop (OF-3176).

This introduced a race condition: if data arrived immediately after a channel became active, it could reach the pipeline before the handler was fully registered on the executor, leaving it queued until the executor ran or the acceptor was stopped. Occasional, flaky failures of the LocalIncomingServerSessionTest unit tests were a symptom of this race condition.

Fix:
- Disable autoRead on the channel initially.
- Re-enable autoRead only after the channel is fully registered.

This restores reliable processing of incoming data on both server and outbound sessions while still preventing Netty EventLoop starvation from blocking operations.
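The fix described above can be modelled with stdlib primitives (an illustrative sketch, not Openfire code): "autoRead" is a gate that stays closed until the handler is registered, so early data queues instead of reaching a half-built pipeline.

```java
import java.util.concurrent.*;

// Stdlib model of the autoRead fix (illustrative): reads are gated on handler
// registration, mirroring "disable autoRead in initChannel, re-enable it once
// the handler is fully registered on the executor".
public class AutoReadGate {
    final CountDownLatch handlerRegistered = new CountDownLatch(1);
    final BlockingQueue<String> inbound = new LinkedBlockingQueue<>();
    volatile boolean autoRead = false; // channel starts with autoRead disabled

    void onHandlerAdded() {            // runs once the executor registers the handler
        autoRead = true;               // re-enable reads only now
        handlerRegistered.countDown();
    }

    String read() throws InterruptedException {
        handlerRegistered.await();     // data is only dispatched after registration
        return inbound.take();
    }

    public static void main(String[] args) throws Exception {
        AutoReadGate ch = new AutoReadGate();
        ch.inbound.add("early-data");  // data arrives before the handler is ready
        new Thread(ch::onHandlerAdded).start();
        System.out.println(ch.read()); // delivered only once registration completed
    }
}
```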
…eturns

NettyConnectionAcceptor.start() previously returned as soon as the server socket was bound. While this guarantees that the port is open, it does not ensure that the Netty boss EventLoop has completed acceptor initialization.

After the recent change to offload blocking handlers to a separate EventExecutorGroup, this timing window became visible in tests: clients connecting immediately after startup could race ahead of acceptor readiness, leading to intermittent failures.

Fix:
- Add an explicit readiness barrier by scheduling a task on the boss EventLoop and waiting for it to execute before returning from start().
- This guarantees that the server channel is bound, registered, and fully initialized on the Netty event loop before accepting connections.

This change provides deterministic startup semantics for both production use and tests, without affecting per-connection pipeline initialization or runtime behavior.
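The readiness barrier reduces to a simple idiom, sketched here with a stdlib single-threaded executor standing in for the boss EventLoop (illustrative names, not Openfire code): because a single-threaded loop runs tasks in submission order, waiting for a no-op guarantees all earlier-queued initialization has run.

```java
import java.util.concurrent.*;

// Stdlib sketch of the startup readiness barrier (illustrative): submit a
// no-op to the boss loop and wait for it, which guarantees every task queued
// earlier -- including acceptor initialization -- has already executed.
public class StartupBarrier {
    static String startAndAwaitReady() throws Exception {
        ExecutorService bossLoop = Executors.newSingleThreadExecutor();
        final String[] lastStep = {"none"};
        bossLoop.execute(() -> lastStep[0] = "acceptor-initialised"); // queued init work
        bossLoop.submit(() -> {}).get(); // barrier: returns only after init ran
        bossLoop.shutdown();
        return lastStep[0];              // visible via the happens-before of get()
    }

    public static void main(String[] args) throws Exception {
        System.out.println(startAndAwaitReady()); // prints "acceptor-initialised"
    }
}
```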
…eter

Replace raw StanzaHandler usage with a generic type parameter to ensure type safety and remove the need for casting in subclasses.
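A minimal sketch of that change (class and method names are illustrative, loosely mirroring the PR): parameterising the initializer over its handler type lets subclasses work with the concrete handler directly, where the raw-typed version forced a cast.

```java
// Illustrative generics sketch, not Openfire code.
interface StanzaHandler { String name(); }

abstract class SessionInitializer<T extends StanzaHandler> {
    abstract T createHandler();
    T handler() { return createHandler(); } // typed as T: no cast at call sites
}

class ServerStanzaHandler implements StanzaHandler {
    public String name() { return "server"; }
    String serverOnly() { return "s2s"; }   // subtype-specific method
}

public class GenericsSketch extends SessionInitializer<ServerStanzaHandler> {
    ServerStanzaHandler createHandler() { return new ServerStanzaHandler(); }

    public static void main(String[] args) {
        // With the raw type, reaching serverOnly() required a cast; with the
        // type parameter it is directly available.
        System.out.println(new GenericsSketch().handler().serverOnly()); // prints "s2s"
    }
}
```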
…tion

Ensure callers can reliably wait for RespondingServerStanzaHandler creation by exposing a future that completes from handlerAdded(). This prevents intermittent failures caused by accessing the handler before it is attached to the Netty pipeline, without introducing blocking or timing-based workarounds.
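The pattern can be sketched with a CompletableFuture (stdlib stand-in, not the actual Openfire class): the future completes from handlerAdded(), so callers block on it rather than polling or sleeping.

```java
import java.util.concurrent.*;

// Stdlib sketch (illustrative) of exposing handler creation as a future that
// completes from handlerAdded(), so callers can reliably await the handler.
public class HandlerFutureSketch {
    final CompletableFuture<String> stanzaHandlerFuture = new CompletableFuture<>();

    void handlerAdded() { // invoked when the pipeline attaches this handler
        stanzaHandlerFuture.complete("RespondingServerStanzaHandler");
    }

    public static void main(String[] args) throws Exception {
        HandlerFutureSketch h = new HandlerFutureSketch();
        ExecutorService loop = Executors.newSingleThreadExecutor();
        loop.execute(h::handlerAdded);  // attachment happens asynchronously
        // Caller blocks until the handler exists -- no sleeps, no races.
        System.out.println(h.stanzaHandlerFuture.get(5, TimeUnit.SECONDS));
        loop.shutdown();
    }
}
```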
…ry per review feedback

Keep the original addNewHandlerTo(ChannelPipeline) method and add an executor-aware overload as default, avoiding breaking existing plugin implementations.
The original javadoc was copied from a very similar code structure that was added to NettyConnectionAcceptor.
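The backwards-compatible shape of that interface can be sketched as follows (Pipeline is a simplified stand-in type, not Netty's ChannelPipeline): the executor-aware overload is a default method that delegates to the original single-argument form, so legacy plugin implementations keep compiling and working.

```java
import java.util.concurrent.Executor;

// Illustrative sketch of the compatible interface extension; Pipeline is a
// stand-in, and the factory mirrors NettyChannelHandlerFactory only loosely.
public class FactorySketch {
    interface Pipeline { void add(String handler); }

    interface NettyChannelHandlerFactory {
        void addNewHandlerTo(Pipeline pipeline); // original method, still abstract

        // New executor-aware overload: default falls back to the old behaviour,
        // so plugins that predate the executor keep working unchanged.
        default void addNewHandlerTo(Pipeline pipeline, Executor executor) {
            addNewHandlerTo(pipeline);
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // A "legacy" plugin only implements the one-argument method...
        NettyChannelHandlerFactory legacy = p -> p.add("legacyHandler");
        // ...yet core code can uniformly call the two-argument form.
        legacy.addNewHandlerTo(log::append, Runnable::run);
        System.out.println(log); // prints "legacyHandler"
    }
}
```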
…g on InterruptedException

Previously, when start() was interrupted, closeMainChannel() was called but the thread's interrupted status was cleared. This could hide the interruption from higher-level code. Now, the interrupt flag is restored via Thread.currentThread().interrupt(), ensuring that the thread interruption is correctly propagated without changing existing stop/start behavior.
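This is the standard interrupt-restoration idiom; a minimal runnable sketch (illustrative, with Thread.sleep standing in for awaiting the channel bind):

```java
// Illustrative sketch of the fix: catching InterruptedException clears the
// thread's interrupted status, so it must be restored for callers to see it.
public class InterruptSketch {
    static boolean startWithCleanup() {
        try {
            Thread.sleep(60_000);              // stands in for awaiting the bind
        } catch (InterruptedException e) {
            // closeMainChannel() would run here; then restore the flag:
            Thread.currentThread().interrupt();
        }
        // isInterrupted() does not clear the flag, unlike Thread.interrupted()
        return Thread.currentThread().isInterrupted();
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() ->
            System.out.println("interruption visible: " + startWithCleanup()));
        worker.start();
        worker.interrupt();
        worker.join(); // prints "interruption visible: true"
    }
}
```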
@guusdk force-pushed the OF-3176_Netty-processing-blocking-events branch 2 times, most recently from 9f2e573 to 4a21fee (March 3, 2026 19:28)
guusdk added 5 commits March 3, 2026 20:44
Previously, each NettySessionInitializer instance created its own NioEventLoopGroup and DefaultEventExecutorGroup, meaning every outbound S2S connection owned its own thread pools for its entire lifetime. This caused unbounded thread growth under load.

Introduce static shared thread pools (sharedIoWorkerGroup and sharedBlockingHandlerExecutor) in NettySessionInitializer, managed via new startSharedResources() and stopSharedResources() static methods. All outbound S2S connections now share a single pair of pools.
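The lifecycle described here can be sketched with stdlib executors (illustrative; the real class manages a NioEventLoopGroup and DefaultEventExecutorGroup): shared static pools with explicit start/stop methods, borrowed by every connection instead of created per connection.

```java
import java.util.concurrent.*;

// Stdlib sketch (illustrative) of the shared static pools with explicit
// lifecycle methods, keeping the total thread count bounded under load.
public class SharedPools {
    private static ExecutorService sharedIoWorkerGroup;
    private static ExecutorService sharedBlockingHandlerExecutor;

    static synchronized void startSharedResources() {
        if (sharedIoWorkerGroup == null) {
            sharedIoWorkerGroup = Executors.newFixedThreadPool(2);
            sharedBlockingHandlerExecutor = Executors.newFixedThreadPool(4);
        }
    }

    static synchronized void stopSharedResources() {
        if (sharedIoWorkerGroup != null) {
            sharedIoWorkerGroup.shutdown();
            sharedBlockingHandlerExecutor.shutdown();
            sharedIoWorkerGroup = null;
            sharedBlockingHandlerExecutor = null;
        }
    }

    // Every outbound connection borrows the shared executor rather than
    // owning its own pair of pools for its entire lifetime.
    static Future<String> connect(String domain) {
        return sharedBlockingHandlerExecutor.submit(() -> "connected:" + domain);
    }

    public static void main(String[] args) throws Exception {
        startSharedResources();
        System.out.println(connect("example.org").get()); // prints "connected:example.org"
        stopSharedResources();
    }
}
```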
…ctive for safe async handling

Previously, DirectTLS startup was performed in initChannel or scheduled during channelRegistered, which could result in race conditions where incoming data arrived before the TLS handshake was initiated, potentially exposing plaintext. This change moves the TLS startup logic to channelActive, ensuring that the channel is fully connected before beginning the handshake. Auto-read is now enabled only after TLS has been successfully started, preventing premature reads.

Additionally, references to the SocketChannel (ch) within handlers have been replaced with ctx.channel() for consistency and to avoid subtle bugs. The old channelRegistered auto-read scheduling has been removed, simplifying the pipeline and making the initialization logic safer for both NettySessionInitializer and NettyServerInitializer.
… to prevent half-initialized channels

Previously, any exception thrown during handlerAdded (for example, in createNettyConnection or createStanzaHandler) could propagate out of the method, potentially leaving the Netty channel in a partially initialized state with missing attributes such as CONNECTION or HANDLER. This commit wraps the handlerAdded logic in a try-catch block, logs any initialization failures, safely closes the channel to avoid inconsistent state, and completes the stanzaHandlerFuture exceptionally if needed. This ensures that pipeline initialization is robust and that channels never remain half-initialized due to unexpected errors.
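The error-handling shape of that commit can be sketched as follows (illustrative stand-ins for the channel and the handler-creation step, not Openfire code): failures are caught, the channel is closed, and the future completes exceptionally so waiters are notified.

```java
import java.util.concurrent.*;

// Illustrative sketch: handlerAdded catches initialization failures, closes
// the (simulated) channel, and completes the future exceptionally, so no
// channel is left half-initialized and no caller waits forever.
public class SafeHandlerAdded {
    final CompletableFuture<String> stanzaHandlerFuture = new CompletableFuture<>();
    boolean channelClosed = false;
    final boolean failInit; // simulates createStanzaHandler throwing

    SafeHandlerAdded(boolean failInit) { this.failInit = failInit; }

    void handlerAdded() {
        try {
            if (failInit) throw new IllegalStateException("createStanzaHandler failed");
            stanzaHandlerFuture.complete("handler");
        } catch (Exception e) {
            channelClosed = true;                         // close: never leak a half-built channel
            stanzaHandlerFuture.completeExceptionally(e); // propagate to waiters
        }
    }

    public static void main(String[] args) {
        SafeHandlerAdded bad = new SafeHandlerAdded(true);
        bad.handlerAdded();
        System.out.println("closed=" + bad.channelClosed
            + " failed=" + bad.stanzaHandlerFuture.isCompletedExceptionally());
        // prints "closed=true failed=true"
    }
}
```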
…nately

This commit adds a (configurable) timeout to the startup of the main channel.
…operties

Replaced inline Netty channel options in NettyConnectionAcceptor with dedicated SystemProperty constants, superseding the older JiveGlobals approach.

The new approach is more type-safe and self-documenting.

Added a couple of new properties for values that were up until now hard-coded.
@guusdk force-pushed the OF-3176_Netty-processing-blocking-events branch from 4a21fee to fa0772c (March 3, 2026 19:45)
…ndler sees channelActive

In both NettyServerInitializer and NettySessionInitializer, the handler responsible for starting DirectTLS and enabling autoRead was appended to the end of the pipeline via a trailing addLast() call. Since channelActive propagates head-to-tail, this meant the business logic handler received channelActive before TLS was initiated and before autoRead was enabled.

Move the tlsAndAutoReadHandler to immediately before businessLogicHandler in both pipelines, ensuring TLS startup and autoRead are sequenced correctly relative to business logic processing.

Also give businessLogicHandler an explicit pipeline name for clarity.
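A minimal model of why the position matters (illustrative, not Netty internals): inbound events such as channelActive visit handlers head-to-tail, so appending the TLS handler with addLast() made it run after the business logic handler, while inserting it before restores the intended sequence.

```java
import java.util.*;

// Minimal pipeline-order model (illustrative): events visit handlers in list
// order, so addLast() vs addBefore() changes which handler sees channelActive first.
public class PipelineOrder {
    static List<String> fireChannelActive(List<String> pipeline) {
        return new ArrayList<>(pipeline); // propagation order == handler order
    }

    public static void main(String[] args) {
        List<String> pipeline = new ArrayList<>(List.of("framer", "businessLogicHandler"));

        pipeline.add("tlsAndAutoReadHandler"); // old: addLast -> TLS sequenced too late
        System.out.println(fireChannelActive(pipeline));

        pipeline.remove("tlsAndAutoReadHandler"); // fix: insert before business logic
        pipeline.add(pipeline.indexOf("businessLogicHandler"), "tlsAndAutoReadHandler");
        System.out.println(fireChannelActive(pipeline)); // TLS now precedes business logic
    }
}
```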
guusdk added 4 commits March 3, 2026 21:22
…ter handshake completes

Previously, tlsAndAutoReadHandler unconditionally called ctx.fireChannelActive() after starting DirectTLS, causing businessLogicHandler to receive channelActive before the TLS handshake had completed.

After the prior fix that suppressed this premature propagation, a secondary issue emerged: userEventTriggered fired ctx.fireChannelActive() from businessLogicHandler's own context, meaning the event propagated only to handlers after it in the pipeline — so businessLogicHandler itself never received channelActive at all for DirectTLS connections.

Fix by:
- Changing ctx.fireChannelActive() to ctx.pipeline().fireChannelActive() in NettyConnectionHandler.userEventTriggered, so channelActive restarts from the head and reaches all handlers including businessLogicHandler
- Guarding startTLS in tlsAndAutoReadHandler with connection.isEncrypted() to prevent a second TLS initialisation when channelActive re-propagates from the head through that handler

The result is that for DirectTLS connections, businessLogicHandler receives channelActive exactly once, after the handshake has successfully completed.
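The crux of the fix is the difference in propagation scope, which a tiny model makes concrete (illustrative, not Netty internals): firing from a handler's own context reaches only handlers after it, while firing from the pipeline restarts at the head and reaches every handler.

```java
import java.util.*;

// Minimal propagation model (illustrative): ctx.fireChannelActive() forwards
// only to handlers *after* the current one; pipeline.fireChannelActive()
// restarts from the head, reaching all handlers.
public class FireSemantics {
    static List<String> visited(List<String> handlers, int startIndex) {
        return handlers.subList(startIndex, handlers.size());
    }

    public static void main(String[] args) {
        List<String> handlers = List.of("tlsAndAutoReadHandler", "businessLogicHandler");
        int self = handlers.indexOf("businessLogicHandler");

        // ctx.fireChannelActive() from businessLogicHandler: propagates only
        // downstream, so businessLogicHandler itself never sees the event.
        System.out.println(visited(handlers, self + 1)); // []

        // ctx.pipeline().fireChannelActive(): restarts from the head, so every
        // handler -- including businessLogicHandler -- receives channelActive.
        System.out.println(visited(handlers, 0));
    }
}
```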
… ready

Enabling autoRead immediately after firing channelActive created a race condition where incoming data could be dispatched to the pipeline before the business logic handler had finished initialising on the blockingHandlerExecutor, causing sessions to occasionally fail to be established.

Fix by gating autoRead on stanzaHandlerFuture, which completes once the handler has been fully initialised in handlerAdded. The setAutoRead(true) call is submitted back to the EventLoop via eventLoop().execute() to ensure it runs on the correct thread.

Applies to both inbound (NettyServerInitializer) and outbound (NettySessionInitializer) pipelines.
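The gating described above can be sketched with a CompletableFuture and a single-threaded executor standing in for the EventLoop (illustrative, not Openfire code): the autoRead flip is chained onto the handler future and marshalled back onto the loop thread.

```java
import java.util.concurrent.*;

// Stdlib sketch (illustrative) of gating autoRead on stanzaHandlerFuture:
// reads are enabled only after handler initialization completes, and the flip
// is submitted back to the event loop, mirroring
// channel.eventLoop().execute(() -> config().setAutoRead(true)).
public class GatedAutoRead {
    final CompletableFuture<Void> stanzaHandlerFuture = new CompletableFuture<>();
    final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    volatile boolean autoRead = false;

    void wireUp() {
        // When the handler is ready, hop back to the event loop to flip autoRead.
        stanzaHandlerFuture.thenRun(() -> eventLoop.execute(() -> autoRead = true));
    }

    public static void main(String[] args) throws Exception {
        GatedAutoRead ch = new GatedAutoRead();
        ch.wireUp();
        System.out.println("before init: autoRead=" + ch.autoRead); // false
        ch.stanzaHandlerFuture.complete(null);  // handlerAdded finished
        ch.eventLoop.submit(() -> {}).get();    // drain the loop for the demo
        System.out.println("after init:  autoRead=" + ch.autoRead); // true
        ch.eventLoop.shutdown();
    }
}
```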
This is mainly done to perform cache-busting for the Docker-image build inside CI.
The aioxmpp test suite contains a test (TestBookmarkClient.test__set_bookmarks_failure) that asserts a specific TypeError message when assigning a non-iterable to an extended slice. The expected message "can only assign an iterable" appears to have changed to "must assign iterable to extended slice" in Python 3.11.15, causing the test to fail.

Pin the Python version used in the aioxmpp CI job to 3.11.14 until the issue is resolved upstream in the aioxmpp test suite.