update sync_diff_inspector: sync_diff_inspector v2.0 #6774
/verify |
sync-diff-inspector/dm-diff.md
Outdated
[task] | ||
output-dir = "./output" | ||
|
||
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.' |
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.' | |
# The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.' |
sync-diff-inspector/route-diff.md
Outdated
password = "" | ||
########################### Routes ########################### | ||
[routes.rule1] | ||
schema-pattern = "test_1" # # Matches the schema name of the data source. Supports the wildcards "*" and "?" |
schema-pattern = "test_1" # # Matches the schema name of the data source. Supports the wildcards "*" and "?" | |
schema-pattern = "test_1" # Matches the schema name of the data source. Supports the wildcards "*" and "?" |
sync-diff-inspector/route-diff.md
Outdated
|
||
## Note | ||
|
||
If `t_2` exists in the upstream database, the downstream database also compares this table. |
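I don't quite understand this sentence. Does it mean that the data in all tables of the upstream `test_1` is checked, and if one of those tables happens to be named `t_2`, the same as the target table, it is checked as well? If so, the note seems redundant: `*` matches all tables, so a table with the same name is of course checked too. Consider rewording it as follows:
If there is a table with the same name as the target table (`t_2` in the above case) existing in the upstream database, the downstream database also compares this table.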
That is my understanding as well. @Leavrth PTAL
Something is missing here. It should say that if `test_2`.`t_2` exists in the upstream database, it is also checked.
sync-diff-inspector/shard-diff.md
Outdated
|
||
You can use `table-config` to configure `table-0`, set `is-sharding=true` and configure the upstream table information in `table-config.source-tables`. This configuration method requires setting all sharded tables, which is suitable for scenarios where the number of upstream sharded tables is small and the naming rules of sharded tables do not have a pattern as shown below. | ||
You can use `Datasource config` to configure `table-0`, set corresponding `rules` and configure the tables that have the mapping relationship between the upstream and downstream databases. This configuration method requires setting all sharded tables, which is suitable for scenarios where the number of upstream sharded tables is small and the naming rules of sharded tables do not have a pattern as shown below. |
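Please add a few sentences explaining why the relationship in the figure cannot use route rules; I could not follow it. I also suggest adding another diagram for a case where route rules can be used. The contrast would help users understand.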
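For contrast, a shard layout whose names do follow a pattern could be covered by one route rule instead of listing every table. The fragment below is a minimal sketch, assuming hypothetical upstream shards named `test_1`.`t_1`, `test_2`.`t_2`, and so on, merged into a hypothetical downstream table `test`.`t`. The key names are the ones quoted elsewhere in this PR, and the wildcard is the `*` that route-diff.md says `schema-pattern` supports. It shows only the rule itself, not how the rule is attached to a data source.

```toml
[routes.rule1]
schema-pattern = "test_*"  # matches test_1, test_2, ... (hypothetical shard schemas)
table-pattern = "t_*"      # matches t_1, t_2, ... (hypothetical shard tables)
target-schema = "test"     # hypothetical downstream schema
target-table = "t"         # hypothetical downstream table
```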
sync-diff-inspector/shard-diff.md
Outdated
user = "root" | ||
password = "123456" | ||
instance-id = "target-1" | ||
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.' |
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.' | |
# The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.' |
|
||
- sync-diff-inspector consumes a certain amount of server resources when checking data. Avoid using sync-diff-inspector to check data during peak business hours. | ||
- TiDB uses the `utf8_bin` collation. If you need to compare the data in MySQL with that in TiDB, pay attention to the collation configuration of MySQL tables. If the primary key or unique key is the `varchar` type and the collation configuration in MySQL differs from that in TiDB, then the final check result might be incorrect because of the collation issue. You need to add collation to the sync-diff-inspector configuration file. | ||
- sync-diff-inspector divides data into chunks first according to TiDB statistics and you need to guarantee the accuracy of the statistics. You can manually run the `analyze table {table_name}` command when the TiDB server's *workload is light*. | ||
- Pay special attention to `table-rules`. If you configure `schema-pattern="test1"` and `target-schema="test2"`, the `test1` schema in the source database and the `test2` schema in the target database are compared. If the source database has a `test2` schema, this `test2` schema is also compared with the `test2` schema in the target database. | ||
- The generated `fix.sql` is only used as a reference for repairing data, and you need to confirm it before executing these SQL statements to repair data. | ||
- Pay special attention to `table-rules`. If you configure `schema-pattern="test1"`, `table-pattern = "t_1"`, `target-schema="test2"` and `target-table = "t_2"`, the `test1`.`t_1` schema in the source database and the `test2`.`t_2` schema in the target database are compared. Sharding is enabled by default in sync-diff-inspector, so if the source database has a `test2`.`t_2` table, the `test1`.`t_1` table and `test2`.`t_2` table in the source database serving as sharding are compared with the `test2`.`t_2` table in the target database. |
What impact does this issue have on users? If there is a risk, add a note that states it explicitly.
@Leavrth PTAL~
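In general, when there are no rules, each upstream table is matched with the downstream table of the same name, as if every table had a default rule that matches it to itself. Adding rules does not remove this default matching rule. So, as described above, if the user wants to match test1.t_1 -> test2.t_2, the default match test2.t_2 -> test2.t_2 still exists, and the result differs depending on whether test2.t_2 exists in the upstream.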
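The interaction between an explicit rule and the default same-name match can be sketched as a config fragment. This is a minimal illustration that reuses the key names and values quoted in the diff above. The rule name `rule1` is borrowed from the route-diff.md example, and the comments paraphrase the explanation in this thread rather than verified tool output.

```toml
[routes.rule1]
schema-pattern = "test1"  # upstream schema to match
table-pattern = "t_1"     # upstream table to match
target-schema = "test2"   # downstream schema it maps to
target-table = "t_2"      # downstream table it maps to

# Explicit match from rule1:      upstream test1.t_1 -> downstream test2.t_2
# Default same-name match (kept): upstream test2.t_2 -> downstream test2.t_2
# If test2.t_2 also exists upstream, both upstream tables are treated as
# shards and compared together against downstream test2.t_2.
```

So the check result depends on whether `test2`.`t_2` exists in the upstream, even though `rule1` never mentions it.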
LGTM
Co-authored-by: xixirangrang <hfxsd@hotmail.com>
- `Databases config`: Configures the instances of the upstream and downstream databases. | ||
- `Tables config`: Special configurations for specific tables, including specified ranges, columns to be ignored and so on (optional). | ||
- `Routes`: Rules for upstream multiple schema names to regularly match downstream single schema names (optional). |
`regularly match`: would plain `match` be better here?
|
||
Below is the description of a complete configuration file: | ||
|
||
``` toml | ||
# Diff Configuration. | ||
|
||
######################### Global config ######################### | ||
# The number of goroutines created to check data. The number of connections between upstream and downstream databases are slightly greater than this value. |
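The original text says "the number of connections to the upstream and downstream databases is slightly greater than this value". It means that the number of connections between sync-diff-inspector and the upstream database, and the number between sync-diff-inspector and the downstream database, are each slightly greater than this value.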
check-thread-count = 4 | ||
######################### Datasource config ######################### | ||
[data-sources] | ||
[data-sources.mysql1] # mysql1 is the only ID for the database instance. It is used in the following `task.source-instances/task.target-instance` files. |
`task.source-instances`/`task.target-instance`: these should not be called files. They are the `source-instances` and `target-instance` items in the task config below.
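A minimal sketch of how the two task items refer back to data source IDs. The instance name `tidb0` is an assumption for illustration, and host and port settings are omitted. Only `mysql1`, `source-instances`, `target-instance`, and `output-dir` come from the text under review.

```toml
[data-sources]
[data-sources.mysql1]  # "mysql1" is the ID of this database instance
    user = "root"
    password = ""

[data-sources.tidb0]   # hypothetical downstream instance ID
    user = "root"
    password = ""

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]  # upstream data source IDs defined above
    target-instance = "tidb0"      # downstream data source ID defined above
```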
@Leavrth: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
/remove-status LGT1 |
/merge |
This pull request has been accepted and is ready to merge. Commit hash: 06ee509
|
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created: #6900. |
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?