update sync_diff_inspector: sync_diff_inspector v2.0 #6774

Merged
merged 10 commits into from
Nov 16, 2021
update sync_diff_inspector: sync_diff_inspector v2.0
Liuxiaozhen12 committed Nov 10, 2021
commit 0e3a0fbf00e206025519cd71eada66978631f314
5 changes: 3 additions & 2 deletions TOC.md
@@ -196,7 +196,7 @@
+ [Overview](/tidb-lightning/tidb-lightning-overview.md)
+ [Tutorial](/get-started-with-tidb-lightning.md)
+ [Deploy](/tidb-lightning/deploy-tidb-lightning.md)
+ [Precheck](/tidb-lightning/tidb-lightning-prechecks.md)
+ [Precheck](/tidb-lightning/tidb-lightning-prechecks.md)
+ [Configure](/tidb-lightning/tidb-lightning-configuration.md)
+ Key Features
+ [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md)
@@ -221,8 +221,9 @@
+ sync-diff-inspector
+ [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md)
+ [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md)
+ [Data Check in Sharding Scenarios](/sync-diff-inspector/shard-diff.md)
+ [Data Check in the Sharding Scenario](/sync-diff-inspector/shard-diff.md)
+ [Data Check for TiDB Upstream/Downstream Clusters](/sync-diff-inspector/upstream-downstream-diff.md)
+ [Data Check in the DM Replication Scenario](/sync-diff-inspector/dm-diff.md)
+ TiSpark
+ [Quick Start](/get-started-with-tispark.md)
+ [User Guide](/tispark-overview.md)
40 changes: 40 additions & 0 deletions sync-diff-inspector/dm-diff.md
@@ -0,0 +1,40 @@
---
title: Data Check in the DM Replication Scenario
summary: Learn how to set a specific `task-name` configuration from `DM-master` to perform a data check.
---

# Data Check in the DM Replication Scenario

When using replication tools such as [TiDB Data Migration](https://docs.pingcap.com/tidb-data-migration/stable/overview), you need to check the data consistency before and after the replication process. To perform the data check, you can have sync-diff-inspector obtain the configuration of a specific `task-name` from `DM-master`.

The following is a simple configuration example. To learn the complete configuration, refer to [Sync-diff-inspector User Guide](/sync-diff-inspector/sync-diff-inspector-overview.md).

```toml
# Diff Configuration.

######################### Global config #########################

# The number of goroutines created to check data. The number of connections between upstream and downstream databases is slightly greater than this value.
check-thread-count = 4

# If enabled, SQL statements are exported to fix inconsistent tables.
export-fix-sql = true

# Only compares the table structure instead of the data.
check-struct-only = false

# The address of DM-master, in the format "http://127.0.0.1:8261".
dm-addr = "http://127.0.0.1:8261"

# Specifies the `task-name` of DM.
dm-task = "test"

######################### Task config #########################
[task]
output-dir = "./output"

# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.'
# Review comment (Collaborator), suggested change:
#   before: # The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.'
#   after:  # The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.'
target-check-tables = ["hb_test.*"]
```

This example sets `dm-task = "test"`, so sync-diff-inspector checks all the tables of the `hb_test` schema under the "test" task. It automatically obtains the pattern-matching rules that map schemas between the upstream and downstream databases, and uses them to verify the data consistency after DM replication.
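
As a hedged variation on the example above, the check can be narrowed to individual tables instead of a whole schema. This is only a sketch: it assumes `target-check-tables` accepts fully qualified `schema.table` entries alongside wildcard patterns, and the table names `hb_test.orders` and `hb_test.users` are hypothetical. The global options not shown here are assumed to stay as in the example above.

```toml
# Address of DM-master and the DM task to verify (same values as in the example above).
dm-addr = "http://127.0.0.1:8261"
dm-task = "test"

[task]
output-dir = "./output"

# Hypothetical table names: check only these two downstream tables
# instead of every table in the hb_test schema.
target-check-tables = ["hb_test.orders", "hb_test.users"]
```

Listing tables explicitly keeps the check short when only a few tables in the task are of interest.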
92 changes: 46 additions & 46 deletions sync-diff-inspector/route-diff.md
@@ -6,58 +6,58 @@ aliases: ['/docs/dev/sync-diff-inspector/route-diff/','/docs/dev/reference/tools

# Data Check for Tables with Different Schema or Table Names

When using replication tools such as TiDB Data Migration, you can set `route-rules` to replicate data to a specified table in the downstream. sync-diff-inspector enables you to verify tables with different schema names or table names.
When using replication tools such as [TiDB Data Migration](https://docs.pingcap.com/tidb-data-migration/stable/overview), you can set `route-rules` to replicate data to a specified table in the downstream. sync-diff-inspector enables you to verify tables with different schema names or table names by setting `rules`.

Below is a simple example.
The following is a simple configuration example. To learn the complete configuration, refer to [Sync-diff-inspector User Guide](/sync-diff-inspector/sync-diff-inspector-overview.md).

```toml
######################### Tables config #########################

# Configure the tables of the target database that need to be checked
[[check-tables]]
# The name of the schema in the target database
schema = "test_2"

# The table that needs to be checked
tables = ["t_2"]

# Configuration example of comparing two tables with different schema names and table names
[[table-config]]
# The name of the schema in the target database
schema = "test_2"

# The name of the target table
table = "t_2"

# Configuration of the source data
[[table-config.source-tables]]
# The instance ID of the source schema
instance-id = "source-1"
# The name of the source schema
schema = "test_1"
# The name of the source table
table = "t_1"
######################### Datasource config #########################
[data-sources.mysql1]
host = "127.0.0.1"
port = 3306
user = "root"
password = ""
route-rules = ["rule1"]

[data-sources.tidb0]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
########################### Routes ###########################
[routes.rule1]
schema-pattern = "test_1" # Matches the schema name of the data source. Supports the wildcards "*" and "?"
table-pattern = "t_1" # Matches the table name of the data source. Supports the wildcards "*" and "?"
target-schema = "test_2" # The name of the schema in the target database
target-table = "t_2" # The name of the target table
```

This configuration can be used to check `test_2.t_2` in the downstream and `test_1.t_1` in the `source-1` instance.
This configuration can be used to check `test_2.t_2` in the downstream and `test_1.t_1` in the `mysql1` instance.

To check a large number of tables with different schema names or table names, you can simplify the configuration by setting the mapping relationship by using `table-rule`. You can configure the mapping relationship of either schema or table, or of both. For example, all the tables in the upstream `test_1` database are replicated to the downstream `test_2` database, which can be checked through the following configuration:
To check a large number of tables with different schema names or table names, you can simplify the configuration by using `rules` to set the mapping relationship. You can configure the mapping relationship of either schema or table, or of both. For example, all the tables in the upstream `test_1` database are replicated to the downstream `test_2` database, which can be checked through the following configuration:

```toml
######################### Tables config #########################

# Configures the tables of the target database that need to be checked
[[check-tables]]
# The name of the schema in the target database
schema = "test_2"

# Check all the tables
tables = ["~^"]

[[table-rules]]
# schema-pattern and table-pattern support the wildcards "*" and "?"
schema-pattern = "test_1"
#table-pattern = ""
target-schema = "test_2"
#target-table = ""
######################### Datasource config #########################
[data-sources.mysql1]
host = "127.0.0.1"
port = 3306
user = "root"
password = ""
route-rules = ["rule1"]

[data-sources.tidb0]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
########################### Routes ###########################
[routes.rule1]
schema-pattern = "test_1" # # Matches the schema name of the data source. Supports the wildcards "*" and "?"
# Review comment (Collaborator), suggested change:
#   before: schema-pattern = "test_1" # # Matches the schema name of the data source. Supports the wildcards "*" and "?"
#   after:  schema-pattern = "test_1" # Matches the schema name of the data source. Supports the wildcards "*" and "?"
table-pattern = "*" # Matches the table name of the data source. Supports the wildcards "*" and "?"
target-schema = "test_2" # The name of the schema in the target database
target-table = "t_2" # The name of the target table
```
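
The text above says that the mapping can be configured for the schema only, for the table only, or for both. A minimal sketch of the schema-only case follows. It assumes that a v2.0 `routes` rule may omit `table-pattern` and `target-table`, in the same way the earlier `table-rules` format allowed them to be left out, and that table names then stay unchanged. The rule name `rule_schema` is hypothetical.

```toml
[data-sources.mysql1]
host = "127.0.0.1"
port = 3306
user = "root"
password = ""
route-rules = ["rule_schema"]  # hypothetical rule name

[routes.rule_schema]
schema-pattern = "test_1"  # Matches the schema name of the data source
target-schema = "test_2"   # The name of the schema in the target database
# table-pattern and target-table are omitted on purpose (assumption):
# each upstream table test_1.<name> is expected to be compared with test_2.<name>.
```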

## Note

If `t_2` exists in the upstream database, the downstream database also compares this table.

Collaborator

I don't quite understand this sentence. Does it mean that the data of all tables in the upstream test_1 is checked, and that if one of those tables happens to be named t_2, the same as target-table, it is checked as well? If so, the explanation seems somewhat redundant: * matches all tables, so a table with the same name is of course checked too. Consider rewording it as follows:
If there is a table with the same name as the target table (t_2 in the above case) existing in the upstream database, the downstream database also compares this table.

Contributor Author

That is my understanding as well. @Leavrth PTAL

Contributor

Something is missing here. It should say that if test_2.t_2 exists in the upstream database, it is checked as well.

Liuxiaozhen12 marked this conversation as resolved.