
(TiCDC) back pressure mechanism #10298

Open
Tracked by #10343
zhangjinpeng87 opened this issue Dec 13, 2023 · 0 comments
Labels
type/enhancement The issue or PR belongs to an enhancement. type/feature Issues about a new feature

zhangjinpeng87 commented Dec 13, 2023

Note: this is a long-term goal.

Scenario Description

TiDB -> TiCDC -> MySQL/TiDB/Kafka/External Storage

Consider the following cases:

  1. The downstream system is experiencing slowness or is short of resources (the provisioned resources are not enough and need to be scaled up/out).
  2. The upstream TiDB encounters a throughput peak (5x or 10x normal throughput) for a while (10 minutes to half an hour), but the downstream sinking speed cannot catch up with that throughput.

TiCDC's current behavior is to fetch all upstream data changes as soon as possible and cache them inside TiCDC. If the sinking speed cannot catch up with the speed at which data changes are produced, TiCDC consumes a lot of resources (memory plus sorter disk) to cache the newly produced changes, and those changes pile up more and more. This is a serious stability risk for TiCDC: it may cause OOM and disk-out-of-space issues, and when sorter compaction cannot keep up with the upstream change rate, TiCDC slows down further.

Back Pressure Mechanism

TiCDC should pull/fetch new data changes according to the capability of the downstream consumer: if the downstream system is experiencing temporary slowness, TiCDC slows down its fetch speed accordingly. TiCDC then does not need to hold so many data changes, which results in predictable resource consumption and improves TiCDC's stability.
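The core idea above — letting a slow sink throttle the fetch speed instead of caching unboundedly — can be sketched with a bounded buffer between the fetcher and the sink. This is a hypothetical illustration, not TiCDC's actual implementation; the class and method names are made up for this sketch.

```python
import queue
import threading
import time


class BackPressureFetcher:
    """Sketch: fetch upstream changes only as fast as the downstream sink
    drains them, by blocking on a bounded buffer.

    Hypothetical design for illustration, not TiCDC's real code.
    """

    def __init__(self, capacity=4):
        # Bounded queue: this is the back pressure point. A full buffer
        # blocks the fetcher instead of letting changes pile up in memory.
        self.buf = queue.Queue(maxsize=capacity)

    def fetch_loop(self, changes):
        for change in changes:
            # put() blocks when the buffer is full, so a slow sink
            # naturally slows the fetch speed.
            self.buf.put(change)
        self.buf.put(None)  # sentinel: no more changes

    def sink_loop(self, apply, delay=0.0):
        applied = []
        while True:
            change = self.buf.get()
            if change is None:
                break
            time.sleep(delay)  # simulate a slow downstream
            applied.append(apply(change))
        return applied
```

For example, running the fetcher in one thread while a slow sink drains the buffer keeps resident changes bounded by `capacity`, regardless of how many changes the upstream produces.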

Preparations

The upstream TiDB/TiKV should be able to hold incremental data changes that have not yet been consumed. Compared with using the current TiKV MVCC mechanism to store these incremental changes, introducing a txn/redo log in upstream TiDB to store time-ordered data changes would make such a back pressure mechanism easier to achieve for TiCDC.
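One way to picture the upstream holding unconsumed changes is an append-only log that consumers pull from by commit timestamp, at their own pace. This is a minimal sketch of that model; `TxnLog` and its methods are invented for illustration and do not correspond to TiKV's actual storage.

```python
class TxnLog:
    """Sketch of an upstream log retaining unconsumed incremental changes
    so a consumer (e.g. TiCDC) can pull at its own pace.

    Hypothetical model for illustration only.
    """

    def __init__(self):
        self.entries = []  # append-only, ordered by commit ts

    def append(self, commit_ts, change):
        self.entries.append((commit_ts, change))

    def pull(self, from_ts, limit):
        # Return up to `limit` changes with commit_ts > from_ts.
        # Anything not yet pulled simply stays in the log, so the
        # consumer slowing down costs nothing on the consumer side.
        return [e for e in self.entries if e[0] > from_ts][:limit]
```

Because `pull` is read-only and keyed by timestamp, a consumer that falls behind just resumes from its last acknowledged timestamp; the retention burden moves to the upstream log rather than the consumer's memory and sorter disk.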

Replication Lag Monitoring

Both before and after this back pressure mechanism is introduced, the replication lag is expected to increase when there is peak throughput or downstream slowness. Users should be able to monitor the replication lag and take actions such as scaling the downstream up/out to resolve the issue.
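The monitoring described above can be sketched as a lag metric plus a simple alerting rule. The lag definition follows the usual CDC convention (current time minus the checkpoint timestamp up to which everything is synced); the `should_scale_out` rule and its parameters are hypothetical, shown only to illustrate "monitor and take action".

```python
def replication_lag_seconds(checkpoint_ts, now_ts):
    """Replication lag: gap between the current time and the checkpoint
    (all changes up to checkpoint_ts are already synced downstream)."""
    return max(0.0, now_ts - checkpoint_ts)


def should_scale_out(lag_history, threshold_s, sustained_n):
    """Hypothetical alerting rule: flag when lag has stayed above the
    threshold for `sustained_n` consecutive samples, so a short spike
    does not trigger scaling."""
    recent = lag_history[-sustained_n:]
    return len(recent) == sustained_n and all(l > threshold_s for l in recent)
```

Under back pressure the same lag signal still rises during downstream slowness, so this kind of rule keeps working; what changes is that TiCDC's own resource usage stays bounded while the user reacts.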

@zhangjinpeng87 zhangjinpeng87 added type/enhancement The issue or PR belongs to an enhancement. type/feature Issues about a new feature labels Dec 13, 2023