Skip to content

Commit

Permalink
Use scheduleWithFixedDelay instead of scheduleAtFixedRate for jav…
Browse files Browse the repository at this point in the history
…a producer batch timer (apache#14125)

Fixes apache#11100 

### Motivation

We believe that the use of `scheduleAtFixedRate` in the java producer's batch timer can result in unnecessarily high thread usage, which can become especially problematic for applications that start many producers. 

### Modifications

Replaced the use of `scheduleAtFixedRate` with `scheduleWithFixedDelay`, which is the same behavior as previously in 2.6.x. The producer's parameter `batchingMaxPublishDelay` implies the use of the "delay" method instead of "rate" method as well.

### Verifying this change

- [x] Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as existing pulsar client producer tests.

Testing of the performance regression can be demonstrated by using [this](https://github.com/klevy-toast/dropwizard-pulsar-test) artifact and comparing a recent release of pulsar client with a manually built SNAPSHOT version with this change:

#### Version 2.7.1 CPU & thread behavior

- While sending messages
<img width="1632" alt="image" src="https://user-images.githubusercontent.com/42187013/152588959-8ee4beb9-70f3-4ad8-9132-240d4498dda5.png">
- While running idle producers
<img width="1613" alt="image" src="https://user-images.githubusercontent.com/42187013/152589079-b45fce49-757a-4bfd-8ddd-c438774ecf41.png">
- 30 second profile while sending messages
<img width="1295" alt="image" src="https://user-images.githubusercontent.com/42187013/152589222-54732bf3-44d7-40b8-8c6b-03b54ba01090.png">

#### Version 2.10.0-SNAPSHOT CPU & thread behavior
- While sending messages
<img width="1615" alt="image" src="https://user-images.githubusercontent.com/42187013/152589391-ae243e7a-5f1f-40b7-a77c-7e3d12a84c8e.png">
- While running idle producers
<img width="1603" alt="image" src="https://user-images.githubusercontent.com/42187013/152589436-784d9c56-043e-41fa-95e8-6a721e0adc78.png">
- 30 second profile while sending messages
<img width="1289" alt="image" src="https://user-images.githubusercontent.com/42187013/152589619-f274545d-b9f9-48e8-8b02-e226c6dec59e.png">

These samples show fewer threads running with this change compared to 2.7.1, less time spend in `batchMessageAndSend`, and overall lower CPU usage -- note that this testing was done on a laptop machine, and we have observed, along with [other users](apache#11100 (comment)) that this CPU regression can be much worse with more producers, smaller batch intervals, and on deployed cloud applications.

### Does this pull request potentially affect one of the following parts:

*If `yes` was chosen, please highlight the changes*

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API: (no)
  - The schema: (no)
  - The default values of configurations: (no)
  - The wire protocol: (no)
  - The rest endpoints: (no)
  - The admin cli options: (no)
  - Anything that affects deployment: (no)

### Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs? 

- [x] `no-need-doc` 
  
The general behavior of the batch timer feature should not be changing
  • Loading branch information
klevy-toast authored and nicklixinyang committed Apr 20, 2022
1 parent 5c2df20 commit c15e6ca
Showing 1 changed file with 1 addition and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -1575,7 +1575,7 @@ public void connectionOpened(final ClientCnx cnx) {
if (!producerCreatedFuture.isDone() && isBatchMessagingEnabled()) {
// schedule the first batch message task
batchTimerTask = cnx.ctx().executor()
.scheduleAtFixedRate(catchingAndLoggingThrowables(() -> {
.scheduleWithFixedDelay(catchingAndLoggingThrowables(() -> {
if (log.isTraceEnabled()) {
log.trace(
"[{}] [{}] Batching the messages from the batch container from "
Expand Down

0 comments on commit c15e6ca

Please sign in to comment.