[improve][pip] PIP-434: Expose Netty channel configuration WRITE_BUFFER_WATER_MARK to pulsar conf and pause receive requests when channel is unwritable #24510
Conversation
## In Scope
- Expose the Netty channel configuration WRITE_BUFFER_WATER_MARK in the Pulsar configuration.
- Stop receiving requests once the Netty channel is unwritable; users can use the new configuration to control the threshold that limits the maximum bytes pending write (a minimal Netty sketch follows below).
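A minimal sketch, assuming the configured values are read from broker.conf (the constants and method below are placeholders, not the actual Pulsar code), of how the exposed settings would map onto Netty's `WRITE_BUFFER_WATER_MARK` channel option:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

public class WaterMarkConfigSketch {
    // Placeholder values standing in for the new broker.conf settings.
    static final int LOW_WATER_MARK_BYTES = 32 * 1024;
    static final int HIGH_WATER_MARK_BYTES = 64 * 1024;

    static void applyWaterMark(ServerBootstrap bootstrap) {
        // WRITE_BUFFER_WATER_MARK controls when Channel.isWritable() flips:
        // it becomes false once pending outbound bytes exceed the high mark,
        // and true again once they drop below the low mark.
        bootstrap.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
                new WriteBufferWaterMark(LOW_WATER_MARK_BYTES, HIGH_WATER_MARK_BYTES));
    }
}
```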
Besides configuring the high water mark and the low water mark, there should be a way to disable the action (`ServerCnxThrottleTracker`'s `incrementThrottleCount`/`decrementThrottleCount`) that is executed in the `channelWritabilityChanged` callback method.
Added a new parameter: `pulsarChannelPauseReceivingRequestsIfUnwritable=false`.
The new parameter resolves this. It would also be useful to describe (somewhere in the PIP document) the implementation detail of handling the action in `channelWritabilityChanged` and the usage of `ServerCnxThrottleTracker` in this action; see the sketch after this comment.
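A minimal sketch of what that handling could look like, assuming it lives in a Netty inbound handler. `ThrottleCounter` stands in for `ServerCnxThrottleTracker` (only its `incrementThrottleCount`/`decrementThrottleCount` methods are taken from the discussion), and the boolean flag mirrors `pulsarChannelPauseReceivingRequestsIfUnwritable`; the exact wiring inside `ServerCnx` is an assumption, not the actual implementation:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

class WritabilityThrottleHandler extends ChannelInboundHandlerAdapter {
    // Stand-in for ServerCnxThrottleTracker; only the two methods named above.
    interface ThrottleCounter {
        void incrementThrottleCount();
        void decrementThrottleCount();
    }

    private final ThrottleCounter throttleTracker;
    private final boolean pauseReceivingRequestsIfUnwritable;

    WritabilityThrottleHandler(ThrottleCounter throttleTracker,
                               boolean pauseReceivingRequestsIfUnwritable) {
        this.throttleTracker = throttleTracker;
        this.pauseReceivingRequestsIfUnwritable = pauseReceivingRequestsIfUnwritable;
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        if (pauseReceivingRequestsIfUnwritable) {
            if (!ctx.channel().isWritable()) {
                // Pending outbound bytes exceeded the high water mark: pause reads.
                throttleTracker.incrementThrottleCount();
            } else {
                // Pending outbound bytes fell below the low water mark: resume reads.
                throttleTracker.decrementThrottleCount();
            }
        }
        super.channelWritabilityChanged(ctx);
    }
}
```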
> Our clusters are configured with numerous namespaces, each containing approximately 8,000 to 10,000 topics. Our consumer applications are quite large, with each consumer using a regular expression (regex) pattern to subscribe to all topics within a namespace.
>
> The problem manifests particularly during consumer application restarts. When a consumer restarts, it issues a getTopicsOfNamespace request. Due to the sheer number of topics, the response size is extremely large. This massive response overwhelms the socket output buffer, causing it to fill up rapidly. Consequently, the broker's responses get backlogged in memory, eventually leading to the broker's OOM and subsequent restart loop.
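For context, the usage pattern described above corresponds to a pattern-based subscription, which makes the client issue getTopicsOfNamespace for the whole namespace. A minimal sketch with placeholder tenant, namespace, subscription name, and broker address:

```java
import java.util.regex.Pattern;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class RegexSubscribeExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.example.com:6650") // placeholder address
                .build();

        // Subscribing by pattern makes the client look up the topics of the
        // namespace; with ~10,000 topics per namespace the lookup response is
        // very large, which is the scenario described above.
        Consumer<byte[]> consumer = client.newConsumer(Schema.BYTES)
                .topicsPattern(Pattern.compile("persistent://my-tenant/my-namespace/.*"))
                .subscriptionName("my-subscription")
                .subscribe();
    }
}
```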
Throttling incoming requests for individual channels (connections) isn't a very effective way to solve this problem. However, I do support adding a way to configure WRITE_BUFFER_WATER_MARK parameters for the Netty channel.
There will be a negative performance impact in cases where the write buffer exceeds the high water mark for other reasons, because there will be less pipelining (parallel processing) of incoming requests when incoming requests are throttled whenever the channel isn't writable. For example, this will negatively impact the latency of publish (send) requests while consumers are dispatching messages (and the dispatched messages push the write buffer over the high water mark).
I have explained this in the mailing list thread, but the arguments seem to have been ignored there, so let's handle the argument here.
> I have explained this in the mailing list thread, but the arguments seem to have been ignored there, so let's handle the argument here.

It was not ignored. I have answered it in the mailing list thread:
> Once the channel is not writable, all requests that are sent to this channel can not receive a reply anymore because the response can not be written out. The results are the same; the clients receive replies delayed. To improve performance, users should consider using more channels; in other words, they can set a bigger `connectionsPerBroker` or separate clients.
> For example, this will negatively impact the latency of publish (send) requests while consumers are dispatching messages (and the messages in the write buffer cause the size of the write buffer to exceed the high water mark).

Yes, it will. However, once the channel is not writable, the producer cannot receive a send response, which also delays message publishing. And we have exposed the 3 configurations to users; they can adjust them if needed. On the client side, raising `connectionsPerBroker` is sketched below.
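A minimal sketch of the suggested client-side tuning; the broker address and the value 4 are placeholders:

```java
import org.apache.pulsar.client.api.PulsarClient;

public class ConnectionsPerBrokerExample {
    public static void main(String[] args) throws Exception {
        // connectionsPerBroker defaults to 1; raising it spreads producers and
        // consumers across more channels, so one unwritable channel does not
        // stall unrelated requests.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.example.com:6650") // placeholder address
                .connectionsPerBroker(4)
                .build();

        // ... create producers/consumers as usual ...
        client.close();
    }
}
```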
> Yes, it will. However, once the channel is not writable, the producer cannot receive a send response, which also delays message publishing.

Yes, I agree that from the client's perspective it would be delayed, since it wouldn't receive a response immediately. However, there's an important difference here. When incoming send requests continue to be processed, the messages still get persisted and processed by the broker, so pipelining is happening. Allowing pipelining is important for Pulsar's performance, where it is common that the same connection is shared for multiple purposes (consumers, producers, etc.).

> And we have exposed the 3 configurations to users; they can adjust them if needed.

Sure, users can tune the parameters, and that makes sense. However, the default values and the configuration examples should be useful out of the box. The default high water mark should be relatively high so that it doesn't impact use cases where consumers and producers share the same connection (see the sketch below for Netty's built-in defaults).
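To put "relatively high" in context, a minimal sketch showing Netty's built-in water mark defaults (32 KiB low / 64 KiB high) and how much headroom a channel reports before it becomes unwritable; `inspect` is a hypothetical helper, not part of any Pulsar API:

```java
import io.netty.channel.Channel;
import io.netty.channel.WriteBufferWaterMark;

public class WaterMarkInspection {
    public static void main(String[] args) {
        // Netty's built-in defaults: 32 KiB low / 64 KiB high water mark.
        WriteBufferWaterMark defaults = WriteBufferWaterMark.DEFAULT;
        System.out.println("default low=" + defaults.low() + " high=" + defaults.high());
    }

    static void inspect(Channel channel) {
        // bytesBeforeUnwritable() reports how many more bytes can be queued
        // before isWritable() flips to false; with a small high water mark a
        // single large dispatch can pause the whole connection.
        System.out.println("writable=" + channel.isWritable()
                + " bytesBeforeUnwritable=" + channel.bytesBeforeUnwritable());
    }
}
```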