update sync_diff_inspector: sync_diff_inspector v2.0 #6774
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
--- | ||
title: Data Check in the DM Replication Scenario | ||
summary: Learn how to set a specific `task-name` configuration from `DM-master` to perform a data check. | |
--- | ||
|
||
# Data Check in the DM Replication Scenario | ||
|
||
When using replication tools such as [TiDB Data Migration](https://docs.pingcap.com/tidb-data-migration/stable/overview), you need to check the data consistency before and after the replication process. You can set a specific `task-name` configuration from `DM-master` to perform a data check. | ||
|
||
The following is a simple configuration example. For the complete configuration, see [Sync-diff-inspector User Guide](/sync-diff-inspector/sync-diff-inspector-overview.md). | |
|
||
```toml | ||
# Diff Configuration. | ||
|
||
######################### Global config ######################### | ||
|
||
# The number of goroutines created to check data. The number of connections to the upstream and downstream databases is slightly greater than this value. | |
check-thread-count = 4 | ||
|
||
# If enabled, SQL statements are exported to fix inconsistent tables. | |
export-fix-sql = true | ||
|
||
# Only compares the table structure instead of the data. | ||
check-struct-only = false | ||
|
||
# The address of DM-master, in the format "http://127.0.0.1:8261". | |
dm-addr = "http://127.0.0.1:8261" | ||
|
||
# Specifies the `task-name` of DM. | ||
dm-task = "test" | ||
|
||
######################### Task config ######################### | ||
[task] | ||
output-dir = "./output" | ||
|
||
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.' | ||
target-check-tables = ["hb_test.*"] | ||
``` | ||
|
||
In this example, `dm-task = "test"` makes sync-diff-inspector check all tables in the `hb_test` schema that belong to the `test` task. It automatically obtains the regular-expression matching rules for schemas between the upstream and downstream databases, and uses them to verify data consistency after DM replication. |
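As a rough illustration of what `target-check-tables = ["hb_test.*"]` selects, the sketch below matches `schema.table` names against a wildcard pattern. It assumes shell-style wildcard semantics (via Python's `fnmatch`) and is not sync-diff-inspector's actual filter implementation. The function name `select_tables` and the sample table names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_tables(patterns, tables):
    """Return the fully qualified table names that match any pattern.

    Assumption: each pattern is a shell-style wildcard over "schema.table".
    """
    return [t for t in tables if any(fnmatchcase(t, p) for p in patterns)]


# Hypothetical downstream tables, for illustration only.
downstream = ["hb_test.orders", "hb_test.users", "other_db.orders"]
selected = select_tables(["hb_test.*"], downstream)
print(selected)
```

Only the tables under `hb_test` are selected here; tables in other schemas are left out of the check.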
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -6,58 +6,58 @@ aliases: ['/docs/dev/sync-diff-inspector/route-diff/','/docs/dev/reference/tools | |||||
|
||||||
# Data Check for Tables with Different Schema or Table Names | ||||||
|
||||||
When using replication tools such as TiDB Data Migration, you can set `route-rules` to replicate data to a specified table in the downstream. sync-diff-inspector enables you to verify tables with different schema names or table names. | ||||||
When using replication tools such as [TiDB Data Migration](https://docs.pingcap.com/tidb-data-migration/stable/overview), you can set `route-rules` to replicate data to a specified table in the downstream. sync-diff-inspector enables you to verify tables with different schema names or table names by setting `rules`. | ||||||
|
||||||
Below is a simple example. | ||||||
The following is a simple configuration example. For the complete configuration, see [Sync-diff-inspector User Guide](/sync-diff-inspector/sync-diff-inspector-overview.md). | ||||||
|
||||||
```toml | ||||||
######################### Tables config ######################### | ||||||
|
||||||
# Configure the tables of the target database that need to be checked | ||||||
[[check-tables]] | ||||||
# The name of the schema in the target database | ||||||
schema = "test_2" | ||||||
|
||||||
# The table that needs to be checked | ||||||
tables = ["t_2"] | ||||||
|
||||||
# Configuration example of comparing two tables with different schema names and table names | ||||||
[[table-config]] | ||||||
# The name of the schema in the target database | ||||||
schema = "test_2" | ||||||
|
||||||
# The name of the target table | ||||||
table = "t_2" | ||||||
|
||||||
# Configuration of the source data | ||||||
[[table-config.source-tables]] | ||||||
# The instance ID of the source schema | ||||||
instance-id = "source-1" | ||||||
# The name of the source schema | ||||||
schema = "test_1" | ||||||
# The name of the source table | ||||||
table = "t_1" | ||||||
######################### Datasource config ######################### | ||||||
[data-sources.mysql1] | ||||||
host = "127.0.0.1" | ||||||
port = 3306 | ||||||
user = "root" | ||||||
password = "" | ||||||
route-rules = ["rule1"] | ||||||
|
||||||
[data-sources.tidb0] | ||||||
host = "127.0.0.1" | ||||||
port = 4000 | ||||||
user = "root" | ||||||
password = "" | ||||||
########################### Routes ########################### | ||||||
[routes.rule1] | ||||||
schema-pattern = "test_1" # Matches the schema name of the data source. Supports the wildcards "*" and "?" | ||||||
table-pattern = "t_1" # Matches the table name of the data source. Supports the wildcards "*" and "?" | ||||||
target-schema = "test_2" # The name of the schema in the target database | ||||||
target-table = "t_2" # The name of the target table | ||||||
``` | ||||||
|
||||||
This configuration can be used to check `test_2.t_2` in the downstream and `test_1.t_1` in the `source-1` instance. | ||||||
This configuration can be used to check `test_2.t_2` in the downstream and `test_1.t_1` in the `mysql1` instance. | ||||||
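To make the mapping concrete, here is a minimal sketch of how a rule such as `rule1` could translate a source table name into its target name. It assumes the `*` and `?` wildcards behave like shell-style patterns (Python's `fnmatch`) and is not the router that sync-diff-inspector actually uses. The `route` function and the `t_9` table are hypothetical.

```python
from fnmatch import fnmatchcase


def route(rule, schema, table):
    """Map a source (schema, table) pair to its target pair.

    If the rule does not match, the pair is returned unchanged.
    Assumption: patterns are shell-style wildcards ("*" and "?").
    """
    schema_ok = fnmatchcase(schema, rule["schema-pattern"])
    table_ok = fnmatchcase(table, rule.get("table-pattern", "*"))
    if schema_ok and table_ok:
        return rule.get("target-schema", schema), rule.get("target-table", table)
    return schema, table


# Mirrors the keys of [routes.rule1] in the configuration.
rule1 = {
    "schema-pattern": "test_1",
    "table-pattern": "t_1",
    "target-schema": "test_2",
    "target-table": "t_2",
}

mapped = route(rule1, "test_1", "t_1")     # matches the rule
unmatched = route(rule1, "test_1", "t_9")  # hypothetical table, no match
print(mapped, unmatched)
```

Under these assumptions, `test_1.t_1` from `mysql1` is paired with `test_2.t_2` downstream, while a table the rule does not match keeps its own name.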
|
||||||
To check a large number of tables with different schema names or table names, you can simplify the configuration by setting the mapping relationship by using `table-rule`. You can configure the mapping relationship of either schema or table, or of both. For example, all the tables in the upstream `test_1` database are replicated to the downstream `test_2` database, which can be checked through the following configuration: | ||||||
To check a large number of tables with different schema names or table names, you can simplify the configuration by setting the mapping relationship by using `rules`. You can configure the mapping relationship of either schema or table, or of both. For example, all the tables in the upstream `test_1` database are replicated to the downstream `test_2` database, which can be checked through the following configuration: | ||||||
|
||||||
```toml | ||||||
######################### Tables config ######################### | ||||||
|
||||||
# Configures the tables of the target database that need to be checked | ||||||
[[check-tables]] | ||||||
# The name of the schema in the target database | ||||||
schema = "test_2" | ||||||
|
||||||
# Check all the tables | ||||||
tables = ["~^"] | ||||||
|
||||||
[[table-rules]] | ||||||
# schema-pattern and table-pattern support the wildcards "*" and "?" | ||||||
schema-pattern = "test_1" | ||||||
#table-pattern = "" | ||||||
target-schema = "test_2" | ||||||
#target-table = "" | ||||||
######################### Datasource config ######################### | ||||||
[data-sources.mysql1] | ||||||
host = "127.0.0.1" | ||||||
port = 3306 | ||||||
user = "root" | ||||||
password = "" | ||||||
route-rules = ["rule1"] | ||||||
|
||||||
[data-sources.tidb0] | ||||||
host = "127.0.0.1" | ||||||
port = 4000 | ||||||
user = "root" | ||||||
password = "" | ||||||
########################### Routes ########################### | ||||||
[routes.rule1] | ||||||
schema-pattern = "test_1" # Matches the schema name of the data source. Supports the wildcards "*" and "?" | ||||||
|
||||||
table-pattern = "*" # Matches the table name of the data source. Supports the wildcards "*" and "?" | ||||||
target-schema = "test_2" # The name of the schema in the target database | ||||||
target-table = "t_2" # The name of the target table | ||||||
``` | ||||||
|
||||||
## Note | ||||||
|
||||||
If `t_2` exists in the upstream database, the downstream database also compares this table. | ||||||
Review comment: I don't quite follow this sentence. Does it mean that the data of all tables in the upstream `test_1` is checked, and that if one of them happens to be named `t_2`, the same as `target-table`, it is checked too? That explanation seems somewhat redundant: `*` matches every table, so a table with the same name is of course checked as well. Consider rewording it as follows:

Review comment: That is how I understand it too. @Leavrth PTAL

Review comment: Something is missing here. It should be: if the upstream database contains
Liuxiaozhen12 marked this conversation as resolved.