[improve][pip] PIP-434: Expose Netty channel configuration WRITE_BUFFER_WATER_MARK to pulsar conf and pause receive requests when channel is unwritable #24510

Open: poorbarcode wants to merge 4 commits into master

Contributor

@poorbarcode commented Jul 15, 2025

Motivation & Modifications

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode self-assigned this Jul 15, 2025
@github-actions github-actions bot added PIP doc-required Your PR changes impact docs and you will update later. labels Jul 15, 2025
Comment on lines +21 to +23
## In Scope
- Expose Netty channel configuration WRITE_BUFFER_WATER_MARK to pulsar conf
- Stops receive requests continuously once the Netty channel is unwritable, users can use the new config to control the threshold that limits the max bytes that are pending write.
Member

Besides configuring the high water mark and the low water mark, there should be a way to disable the action (ServerCnxThrottleTracker's incrementThrottleCount/decrementThrottleCount) that is executed in the channelWritabilityChanged callback method.

Contributor Author

Added a new param pulsarChannelPauseReceivingRequestsIfUnwritable=false

Member

The new parameter resolves this. It would also be useful to describe, somewhere in the PIP document, how the action in channelWritabilityChanged is handled and how ServerCnxThrottleTracker is used in it.
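
For readers following this thread, here is a rough sketch of how a counter-based pause of reads could be tied to writability changes. Only the names incrementThrottleCount, decrementThrottleCount, channelWritabilityChanged and pulsarChannelPauseReceivingRequestsIfUnwritable come from this conversation. The class, its fields and its logic are hypothetical, use no Netty or Pulsar types, and should not be read as the actual ServerCnxThrottleTracker implementation.

```java
/**
 * Hypothetical, simplified model of pausing reads while a channel is unwritable.
 * This is an illustration only, not Pulsar's ServerCnxThrottleTracker.
 */
final class ThrottleCountSketch {
    // Assumed to mirror the pulsarChannelPauseReceivingRequestsIfUnwritable setting.
    private final boolean pauseReceivingIfUnwritable;
    private int throttleCount;            // number of active reasons to pause reading
    private boolean autoRead = true;      // stands in for the channel's auto-read flag
    private boolean pausedByWritability;  // true while this sketch holds one count

    ThrottleCountSketch(boolean pauseReceivingIfUnwritable) {
        this.pauseReceivingIfUnwritable = pauseReceivingIfUnwritable;
    }

    /** The first active throttle stops reading from the connection. */
    void incrementThrottleCount() {
        if (throttleCount++ == 0) {
            autoRead = false;
        }
    }

    /** Reading resumes only when the last active throttle is released. */
    void decrementThrottleCount() {
        if (throttleCount == 0) {
            throw new IllegalStateException("throttle count is already zero");
        }
        if (--throttleCount == 0) {
            autoRead = true;
        }
    }

    /** Called when the channel flips between writable and unwritable. */
    void channelWritabilityChanged(boolean writable) {
        if (!pauseReceivingIfUnwritable) {
            return; // feature disabled: writability never pauses reads
        }
        if (!writable && !pausedByWritability) {
            pausedByWritability = true;
            incrementThrottleCount();
        } else if (writable && pausedByWritability) {
            pausedByWritability = false;
            decrementThrottleCount();
        }
    }

    boolean isAutoRead() {
        return autoRead;
    }

    public static void main(String[] args) {
        ThrottleCountSketch cnx = new ThrottleCountSketch(true);
        cnx.incrementThrottleCount();          // some other throttle is already active
        cnx.channelWritabilityChanged(false);  // write buffer went over the high water mark
        cnx.channelWritabilityChanged(true);   // write buffer drained below the low water mark
        System.out.println(cnx.isAutoRead());  // still paused by the other throttle
        cnx.decrementThrottleCount();
        System.out.println(cnx.isAutoRead());  // reading resumes
    }
}
```

The counter is what lets several independent throttles share one auto-read flag: a writability change releases only its own count, so reads stay paused while any other throttle is still active.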

Comment on lines +11 to +13
> Our clusters are configured with numerous namespaces, each containing approximately 8,000 to 10,000 topics. Our consumer applications are quite large, with each consumer using a regular expression (regex) pattern to subscribe to all topics within a namespace.

> The problem manifests particularly during consumer application restarts. When a consumer restarts, it issues a getTopicsOfNamespace request. Due to the sheer number of topics, the response size is extremely large. This massive response overwhelms the socket output buffer, causing it to fill up rapidly. Consequently, the broker's responses get backlogged in memory, eventually leading to the broker's OOM and subsequent restart loop.
Member

@lhotari Jul 15, 2025

Throttling incoming requests for individual channels (connections) isn't a very effective way to solve this problem. However, I do support adding a way to configure WRITE_BUFFER_WATER_MARK parameters for the Netty channel.

There will be a negative impact on performance in cases where the write buffer size exceeds the high water mark for other reasons. If incoming requests are throttled whenever the channel isn't writable, there will be less pipelining (parallel processing) of incoming requests. For example, this will negatively impact the latency of publish (send) requests while consumers are dispatching messages (and the messages in the write buffer cause the size of the write buffer to exceed the high water mark).

I have explained this in the mailing list thread, but the arguments seem to have been ignored there, so let's handle the argument here.
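
As background for the water mark discussion, the sketch below models the hysteresis that Netty applies to a channel's pending outbound bytes: the channel becomes unwritable once pending bytes exceed the high water mark and becomes writable again only after they drop below the low water mark. The class is a hypothetical stand-alone model, not Netty code, and the 32 KiB / 64 KiB values are used on the assumption that they match Netty's defaults.

```java
/** Simplified, hypothetical model of write-buffer water marks (not Netty's actual code). */
final class WaterMarkModel {
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean writable = true;

    WaterMarkModel(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || highWaterMark < lowWaterMark) {
            throw new IllegalArgumentException("require 0 <= low <= high");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /** Bytes queued for writing: unwritable once pending bytes exceed the high mark. */
    void enqueue(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > highWaterMark) {
            writable = false;
        }
    }

    /** Bytes flushed to the socket: writable again once pending bytes fall below the low mark. */
    void flushed(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < lowWaterMark) {
            writable = true;
        }
    }

    boolean isWritable() {
        return writable;
    }

    public static void main(String[] args) {
        WaterMarkModel channel = new WaterMarkModel(32 * 1024, 64 * 1024);
        channel.enqueue(70 * 1024);                // 70 KiB pending, above the high mark
        System.out.println(channel.isWritable());  // false
        channel.flushed(30 * 1024);                // 40 KiB pending, between the two marks
        System.out.println(channel.isWritable());  // false
        channel.flushed(20 * 1024);                // 20 KiB pending, below the low mark
        System.out.println(channel.isWritable());  // true
    }
}
```

The gap between the two marks is what keeps the channel from flapping between states. It also means that, with a low high water mark, a connection shared by producers and consumers can stay unwritable for as long as dispatched messages keep the buffer above the low mark.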

Contributor Author

@poorbarcode Jul 16, 2025

> I have explained this in the mailing list thread, but the arguments seem to have been ignored there, so let's handle the argument here.

It was not ignored. I have answered it in the mailing list thread:

> Once the channel is not writable, all requests that are sent to this channel can not receive a reply anymore because the response can not be written out. The results are the same; the clients receive replies delayed. To improve performance, users should consider using more channels; in other words, they can set a bigger connectionsPerBroker or separate clients.


> For example, this will negatively impact the latency of publish (send) requests while consumers are dispatching messages (and the messages in the write buffer cause the size of the write buffer to exceed the high water mark).

Yes, it will. However, once the channel is not writable, the producer cannot receive a send response, which also delays message publishing. And we have exposed the 3 configurations to users, they can adjust them if needed

Member

> Yes, it will. However, once the channel is not writable, the producer cannot receive a send response, which also delays message publishing.

Yes, I agree that from the client's perspective it would be delayed, since it wouldn't receive a response immediately. However, there's an important difference here. When incoming send requests continue to be processed, the messages still get persisted and processed by the broker, so pipelining is happening. Allowing pipelining is important for Pulsar's performance, since it is common for the same connection to be shared for multiple purposes (consumers, producers, etc.).

> And we have exposed the 3 configurations to users, they can adjust them if needed

Sure, users can tune the parameters and that makes sense. However, the default values should be sensible and the configuration examples should be useful as written. The default high water mark should be relatively high so that it doesn't impact use cases where consumers and producers share the same connection.

@poorbarcode poorbarcode requested a review from lhotari July 16, 2025 02:43
Labels
doc-required Your PR changes impact docs and you will update later. PIP
2 participants