
Disable backlog quota check by default. #4320

Merged
merged 5 commits into apache:master on May 29, 2019

Conversation

codelipenghui
Contributor

Motivation

Previously, we used `producer_request_hold` as the default backlog quota retention policy. But `producer_request_hold` is unfriendly to online business, and even exposes online business to significant risk. When users ask us what they should pay attention to when applying Pulsar in an online environment, changing the default backlog quota retention policy is often the first thing mentioned; we ran into this problem early on at zhaopin.com as well.

So, I propose changing the default backlog quota retention policy to `consumer_backlog_eviction`. Our practice shows that this is the right choice for most situations, and most importantly, this option allows users to avoid taking on greater risk.

This change will affect how our existing users use Pulsar, so I want to discuss it together and make some trade-offs. If necessary, I can start an email thread.

Modifications

Change the broker-level default backlog quota retention policy to `consumer_backlog_eviction`.
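As originally proposed (before the PR was retitled), the modification amounts to a one-line change in `conf/broker.conf`. A sketch, assuming the broker's existing `backlogQuotaDefaultRetentionPolicy` key:

```
# Default backlog quota retention policy, applied when no namespace-level
# policy is set. The original proposal was to change this from
# producer_request_hold to consumer_backlog_eviction.
backlogQuotaDefaultRetentionPolicy=consumer_backlog_eviction
```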

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (yes)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

Member

@sijie sijie left a comment


LGTM. IMO - I would prefer always writable for an event streaming system.

Contributor

@merlimat merlimat left a comment


All the defaults we have are based on not deleting data unless explicitly asked to do so.

@codelipenghui
Contributor Author

@merlimat

If I haven't misunderstood, this only guarantees that data which has already been written is not lost. If we look at the user's entire system, Pulsar is an important part of it, serving real-time streaming data. If it becomes unwritable, that is also data loss for the user.

The core of the problem is what the user should do when the backlog exceeds the threshold: delete old data, or reject new data? For streaming data platforms, I think rejecting new data is basically unacceptable. Users can use the retention mechanism to decide how long data is retained.
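The retention mechanism mentioned here is configurable per namespace through the admin CLI. A hedged sketch (the tenant/namespace name and limits below are placeholders, not from this PR):

```shell
# Retain acknowledged messages for 7 days or up to 10 GB per topic,
# whichever limit is hit first (namespace name is illustrative).
bin/pulsar-admin namespaces set-retention my-tenant/my-namespace \
  --time 7d --size 10G
```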

As for the backlog, it is per subscriber: a single subscriber's backlog can cause all new data to be rejected.

If the default configuration is inclined toward not losing data, rejecting new data should also be considered.

If we do not consider resource limitations, not limiting the backlog by default can satisfy both sides. My suggestion is to make the configuration with the least loss to the user the default.

@srkukarni
Contributor

Echoing @sijie's thoughts, I believe that streaming systems usually prefer a writable system. However, at the same time, I believe queuing systems prefer not losing any transactions. Since Pulsar supports both, this is a tricky issue.

@merlimat
Contributor

@codelipenghui

If I haven't misunderstood, this only guarantees that data which has already been written is not lost. If we look at the user's entire system, Pulsar is an important part of it, serving real-time streaming data. If it becomes unwritable, that is also data loss for the user.

Of course, if the system is not writable, the application will be impacted, though the main difference is that what was "acked" before is not lost.

Applications can apply fallback strategies when a downstream system is unavailable, from the simplest (fail user request) to more complex (degraded functioning mode), but they won't be able to know if "acked" data was dropped by the system. There would be no immediate way to communicate that to users, or to apply other strategies.

The core of the problem is what the user should do when the backlog exceeds the threshold: delete old data, or reject new data? For streaming data platforms, I think rejecting new data is basically unacceptable. Users can use the retention mechanism to decide how long data is retained.

The default option can be set in broker.conf and overridden per namespace, to adjust to needs.
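A per-namespace override of the broker default might look like the following sketch (namespace name and limit are illustrative, not from this PR):

```shell
# Override the broker-wide default for one namespace: evict the oldest
# backlog entries once a subscription's backlog exceeds 2 GB.
bin/pulsar-admin namespaces set-backlog-quota my-tenant/my-namespace \
  --limit 2G --policy consumer_backlog_eviction
```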

As for the backlog, it is per subscriber: a single subscriber's backlog can cause all new data to be rejected.

Without information around what's the importance of the subscriptions, it's difficult to generalize the judgement of which subscription should retain data.

Subscriptions are a way to explicitly retain data in the system.

If the default configuration is inclined toward not losing data, rejecting new data should also be considered.

That's what the default setting is.

@srkukarni

I believe that streaming systems usually prefer a writable system. However, at the same time, I believe queuing systems prefer not losing any transactions. Since Pulsar supports both, this is a tricky issue.

That also comes with the dichotomy between Consumer vs Reader.
In streaming mode, to me, a Reader makes more sense in that it does imply data retention, and therefore it would never incur a backlog quota violation.

@codelipenghui

If we do not consider resource limitations, not limiting the backlog by default can satisfy both sides. My suggestion is to make the configuration with the least loss to the user the default.

Yes, the default quota setting is very arbitrary (and very low). I'd be fine with disabling the backlog quota check by default. It would eliminate many "gotchas", though it doesn't change the reality that disk space is finite (even if it's most likely > 10 GB), and applications need to reason about what to do when the limit is reached.

@sijie
Member

sijie commented May 25, 2019

I'd be fine to disable backlog quota check by default.

I think this is a good tradeoff for satisfying both sides. @codelipenghui ?

@codelipenghui
Contributor Author

@sijie yes, agree.

@sijie
Member

sijie commented May 27, 2019

@codelipenghui can you make the change based on Matteo's comment?

@sijie sijie added area/config doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. labels May 27, 2019
@codelipenghui
Contributor Author

@sijie I've already disabled the backlog quota check by default.

@codelipenghui codelipenghui changed the title Change default backlog quota retention policy to consumer_backlog_eviction Disable backlog quota check by default. May 28, 2019
@codelipenghui
Contributor Author

run Java8 Tests
run Integration Tests

conf/broker.conf Outdated
@@ -68,13 +68,13 @@ zooKeeperOperationTimeoutSeconds=30
brokerShutdownTimeoutMs=60000

# Enable backlog quota check. Enforces action on topic when the quota is reached
-backlogQuotaCheckEnabled=true
+backlogQuotaCheckEnabled=false
Contributor


This will also disable custom quota checks on namespaces. I think we should instead have a way to set `backlogQuotaDefaultLimitGB` to -1 to disable the quota check by default. That would require a small adjustment to the quota-check code.
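Under that suggestion, the defaults would keep the check running (so namespace-level quotas still apply) but disable the broker-wide default limit. A sketch of the resulting `conf/broker.conf`, assuming `-1` is interpreted as "no default quota":

```
# Keep the quota-check loop enabled so namespace-level quotas still apply
backlogQuotaCheckEnabled=true
# A negative default limit disables the broker-wide default quota
backlogQuotaDefaultLimitGB=-1
```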

@codelipenghui
Contributor Author

@merlimat I've already addressed your comment. PTAL.

@merlimat merlimat added this to the 2.4.0 milestone May 29, 2019
@merlimat merlimat merged commit 84abb6c into apache:master May 29, 2019
@codelipenghui codelipenghui deleted the config_broker_backlog_quota branch May 30, 2019 10:25
sijie pushed a commit that referenced this pull request Jan 18, 2020
### Motivation
Some parameters are added in the `broker.conf` and `standalone.conf` files. However, those parameters are not updated in the docs.
See the following PRs for details: #4150, #4066, #4197, #3819, #4261, #4273, #4320.

### Modifications
Add those parameter info, and sync docs with the code.

The descriptions are mostly left unchanged, for two reasons:
1. Keep doc content consistent with code. We need to update the descriptions for those parameters in the code first, and then sync them to the docs.
2. A generator will be adopted to produce this content automatically in the near future.
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
(Same commit message as above.)