Remove cyclic replication feature (pingcap#6668)
* remove cyclic replication feature

* remove cyclic replication feature

* Update ticdc-overview.md

* Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>

* Update ticdc-overview.md

* Update ticdc/ticdc-overview.md

Co-authored-by: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
Co-authored-by: xixirangrang <hfxsd@hotmail.com>
4 people authored Nov 17, 2021
1 parent eddd634 commit efca7c7
Showing 5 changed files with 39 additions and 179 deletions.
1 change: 0 additions & 1 deletion experimental-features.md
@@ -39,7 +39,6 @@ This document introduces the experimental features of TiDB in different versions
## Data sharing and subscription

+ [Integrate TiCDC with Kafka Connect (Confluent Platform)](/ticdc/integrate-confluent-using-ticdc.md) (Introduced in v5.0)
+ [The cyclic replication feature of TiCDC](/ticdc/manage-ticdc.md#cyclic-replication) (Introduced in v5.0)

## Storage

Binary file removed media/cdc-cyclic-replication.png
18 changes: 0 additions & 18 deletions releases/release-5.0.0.md
@@ -397,24 +397,6 @@ To support the business requirements of streaming TiDB data to other systems, th

The Kafka connectors protocol provided by the Confluent platform is widely used in the community, and it supports transferring data to either relational or non-relational databases in different protocols. By integrating TiCDC to Kafka Connect of the Confluent platform, TiDB extends the ability to stream TiDB data to other heterogeneous databases or systems.

### Support cyclic replication between TiDB clusters using TiCDC (**experimental feature**)

[User document](/ticdc/manage-ticdc.md#cyclic-replication), [#471](https://github.com/pingcap/ticdc/issues/471)

Because geographical distance causes communication latency and related issues, the following scenario exists: to support local business, users deploy multiple TiDB clusters in different geographical areas; then, to accomplish tasks such as analytics and settlements, they replicate data across the TiDB clusters or aggregate the replicated data into a central TiDB hub.

TiCDC supports replicating data across multiple independent TiDB clusters. For example, TiDB cluster A, cluster B, and cluster C all have a table named `test.user_data` and write data into this table respectively. With the cyclic replication feature, the data written into `test.user_data` in one cluster can be replicated to the other two clusters, so that the `test.user_data` tables in the three clusters stay consistent with each other.

This feature can be used in the following scenarios:

+ Multiple TiDB clusters back up data for each other, so the application can switch to a normal TiDB cluster in the event of a disaster.
+ Multiple geo-distributed TiDB clusters are deployed to support the local business, and the data in the same tables needs to be replicated between TiDB clusters.

This feature has the following limitations and constraints:

+ Writing data to tables with auto-increment IDs in different clusters is not supported, because the replicated writes would overwrite each other, resulting in data loss.
+ Writing the same data to the same table in different clusters is not supported, for the same reason: the replicated writes would overwrite each other, resulting in data loss.

## Diagnostics

[User document](/sql-statements/sql-statement-explain.md#explain)
131 changes: 3 additions & 128 deletions ticdc/manage-ticdc.md
@@ -140,7 +140,7 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:1
```shell
Create changefeed successfully!
ID: simple-replication-task
Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
```

- `--changefeed-id`: The ID of the replication task. The format must match the `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$` regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID.
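
For illustration, a minimal sketch of pinning the ID explicitly rather than relying on the generated UUID (the PD address, sink URI, and the ID `my-task-1` are placeholders, not part of this diff):

```shell
# Sketch: "my-task-1" matches ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$,
# so TiCDC uses it as the changefeed ID instead of generating a UUID.
cdc cli changefeed create \
    --pd=http://10.0.10.25:2379 \
    --sink-uri="mysql://root:123456@127.0.0.1:3306/" \
    --changefeed-id="my-task-1"
```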
@@ -381,13 +381,6 @@ cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id=simple-repl
"dispatchers": null,
"protocol": "default"
},
"cyclic-replication": {
"enable": false,
"replica-id": 0,
"filter-replica-ids": null,
"id-buckets": 0,
"sync-ddl": false
},
"scheduler": {
"type": "table-number",
"polling-time": -1
@@ -409,8 +402,7 @@ cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id=simple-repl
"status": {
"tables": {
"47": {
"start-ts": 419036036249681921,
"mark-table-id": 0
"start-ts": 419036036249681921
}
},
"operation": null,
@@ -528,8 +520,7 @@ Currently, you can modify the following configuration items:
"status": {
"tables": {
"56": { # ID of the replication table, corresponding to tidb_table_id of a table in TiDB
"start-ts": 417474117955485702,
"mark-table-id": 0 # ID of mark tables in the cyclic replication, corresponding to tidb_table_id of mark tables in TiDB
"start-ts": 417474117955485702
}
},
"operation": null,
@@ -546,7 +537,6 @@ Currently, you can modify the following configuration items:
In the command above:
- `status.tables`: Each key number represents the ID of the replication table, corresponding to `tidb_table_id` of a table in TiDB.
- `mark-table-id`: The ID of mark tables in the cyclic replication, corresponding to `tidb_table_id` of mark tables in TiDB.
- `resolved-ts`: The largest TSO among the sorted data in the current processor.
- `checkpoint-ts`: The largest TSO that has been successfully written to the downstream in the current processor.
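
As a sketch of where these fields surface (the changefeed ID and capture ID below are placeholders), a single processor's status can be read with the `processor query` subcommand:

```shell
# Sketch: query one processor to read its resolved-ts and checkpoint-ts.
cdc cli processor query \
    --pd=http://10.0.10.25:2379 \
    --changefeed-id=simple-replication-task \
    --capture-id=b293999a-4168-4988-a4f4-35d9589b226b
```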
@@ -592,128 +582,13 @@ dispatchers = [
# Currently four protocols are supported: default, canal, avro, and maxwell. The default protocol is TiCDC Open Protocol.
protocol = "default"
[cyclic-replication]
# Whether to enable cyclic replication.
enable = false
# The replica ID of the current TiCDC.
replica-id = 1
# The replica ID to be filtered.
filter-replica-ids = [2,3]
# Whether to replicate DDL statements.
sync-ddl = true
```
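
A task configuration file like the one above takes effect only when passed at task creation; a minimal sketch, assuming the file is saved as `changefeed.toml` (a placeholder path):

```shell
# Sketch: create a changefeed that loads the TOML task configuration above.
cdc cli changefeed create \
    --pd=http://10.0.10.25:2379 \
    --sink-uri="mysql://root:123456@127.0.0.1:3306/" \
    --config changefeed.toml
```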
### Notes for compatibility
* In TiCDC v4.0.0, `ignore-txn-commit-ts` is removed and `ignore-txn-start-ts` is added, which uses start_ts to filter transactions.
* In TiCDC v4.0.2, `db-dbs`/`db-tables`/`ignore-dbs`/`ignore-tables` are removed and `rules` is added, which uses new filter rules for databases and tables. For detailed filter syntax, see [Table Filter](/table-filter.md).
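
For illustration, a minimal sketch of the `rules`-based filter in a task configuration file (the schema and table names are hypothetical):

```toml
[filter]
# Hypothetical rules: replicate every table in the test schema except test.tmp.
# See the table filter syntax linked above for the full rule grammar.
rules = ['test.*', '!test.tmp']
```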
## Cyclic replication
> **Warning:**
>
> Currently (v4.0.2), cyclic replication is still an experimental feature. It is **NOT** recommended to use it in the production environment.
The cyclic replication feature supports replicating data across multiple independent TiDB clusters. For example, TiDB cluster A, cluster B, and cluster C all have a table named `test.user_data` and write data into this table respectively. With the cyclic replication feature, the data written into `test.user_data` in one cluster can be replicated to the other two clusters, so that the `test.user_data` tables in the three clusters stay consistent with each other.
### Usage example
Enable cyclic replication in the three clusters A, B, and C. Two TiCDC clusters are used for the replication from cluster A to cluster B. Among the three clusters, DDL statements enter cluster A first.
![TiCDC cyclic replication](/media/cdc-cyclic-replication.png)
To use the cyclic replication feature, you need to configure the following parameters for the replication task when creating it.
+ `--cyclic-replica-id`: Specifies the data source ID of the upstream cluster, under which its writes are marked. Each cluster ID must be unique.
+ `--cyclic-filter-replica-ids`: Specifies the data source ID to be filtered, which is usually the downstream cluster ID.
+ `--cyclic-sync-ddl`: Determines whether to replicate DDL statements to the downstream.
To create a cyclic replication task, take the following steps:
1. [Enable the TiCDC component](/ticdc/deploy-ticdc.md) in TiDB cluster A, cluster B, and cluster C.
{{< copyable "shell-regular" >}}
```shell
# Enables TiCDC in cluster A.
cdc server \
--pd="http://${PD_A_HOST}:${PD_A_PORT}" \
--log-file=ticdc_1.log \
--addr=0.0.0.0:8301 \
--advertise-addr=127.0.0.1:8301
# Enables TiCDC in cluster B.
cdc server \
--pd="http://${PD_B_HOST}:${PD_B_PORT}" \
--log-file=ticdc_2.log \
--addr=0.0.0.0:8301 \
--advertise-addr=127.0.0.1:8301
# Enables TiCDC in cluster C.
cdc server \
--pd="http://${PD_C_HOST}:${PD_C_PORT}" \
--log-file=ticdc_3.log \
--addr=0.0.0.0:8301 \
--advertise-addr=127.0.0.1:8301
```
2. Create the mark tables used for the cyclic replication in cluster A, cluster B, and cluster C.
{{< copyable "shell-regular" >}}
```shell
# Creates mark tables in cluster A.
cdc cli changefeed cyclic create-marktables \
--cyclic-upstream-dsn="root@tcp(${TIDB_A_HOST}:${TIDB_A_PORT})/" \
--pd="http://${PD_A_HOST}:${PD_A_PORT}"
# Creates mark tables in cluster B.
cdc cli changefeed cyclic create-marktables \
--cyclic-upstream-dsn="root@tcp(${TIDB_B_HOST}:${TIDB_B_PORT})/" \
--pd="http://${PD_B_HOST}:${PD_B_PORT}"
# Creates mark tables in cluster C.
cdc cli changefeed cyclic create-marktables \
--cyclic-upstream-dsn="root@tcp(${TIDB_C_HOST}:${TIDB_C_PORT})/" \
--pd="http://${PD_C_HOST}:${PD_C_PORT}"
```
3. Create the cyclic replication task in cluster A, cluster B, and cluster C.
{{< copyable "shell-regular" >}}
```shell
# Creates the cyclic replication task in cluster A.
cdc cli changefeed create \
--sink-uri="mysql://root@${TiDB_B_HOST}/" \
--pd="http://${PD_A_HOST}:${PD_A_PORT}" \
--cyclic-replica-id 1 \
--cyclic-filter-replica-ids 2 \
--cyclic-sync-ddl true
# Creates the cyclic replication task in cluster B.
cdc cli changefeed create \
--sink-uri="mysql://root@${TiDB_C_HOST}/" \
--pd="http://${PD_B_HOST}:${PD_B_PORT}" \
--cyclic-replica-id 2 \
--cyclic-filter-replica-ids 3 \
--cyclic-sync-ddl true
# Creates the cyclic replication task in cluster C.
cdc cli changefeed create \
--sink-uri="mysql://root@${TiDB_A_HOST}/" \
--pd="http://${PD_C_HOST}:${PD_C_PORT}" \
--cyclic-replica-id 3 \
--cyclic-filter-replica-ids 1 \
--cyclic-sync-ddl false
```
### Usage notes
+ Before creating the cyclic replication task, you must execute `cdc cli changefeed cyclic create-marktables` to create the mark tables for the cyclic replication.
+ The name of the table with cyclic replication enabled must match the `^[a-zA-Z0-9_]+$` regular expression.
+ Before creating the cyclic replication task, the tables for the task must be created.
+ After cyclic replication is enabled, you cannot create new tables that will be replicated by the cyclic replication task.
+ To avoid causing errors, do not execute DDL statements such as `ADD COLUMN`/`DROP COLUMN` when data is written into multiple clusters at the same time.
+ To perform online DDL operations, ensure the following requirements are met:
- The application is compatible with the table schema before and after executing the DDL operations.
- The TiCDC components of multiple clusters form a one-way DDL replication chain, which is not cyclic. For example, in the example above, only the TiCDC component of cluster C disables `sync-ddl`.
- DDL operations must be performed on the cluster that is the starting point of the one-way DDL replication chain, such as cluster A in the example above.
## Output the historical value of a Row Changed Event <span class="version-mark">New in v4.0.5</span>
> **Warning:**
