update sync_diff_inspector: sync_diff_inspector v2.0 #6774

Merged
merged 10 commits into from
Nov 16, 2021

Contributor

@Liuxiaozhen12 Liuxiaozhen12 commented Nov 10, 2021

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v5.3 (TiDB 5.3 versions)
  • v5.2 (TiDB 5.2 versions)
  • v5.1 (TiDB 5.1 versions)
  • v5.0 (TiDB 5.0 versions)
  • v4.0 (TiDB 4.0 versions)
  • v3.1 (TiDB 3.1 versions)
  • v3.0 (TiDB 3.0 versions)
  • v2.1 (TiDB 2.1 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@Liuxiaozhen12 Liuxiaozhen12 added translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. area/migrate Indicates that the Issue or PR belongs to the area of TiDB migration tools. v5.3 This PR/issue applies to TiDB v5.3. labels Nov 10, 2021
@ti-chi-bot
Member

ti-chi-bot commented Nov 10, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • hfxsd

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 10, 2021
@Liuxiaozhen12
Contributor Author

/verify

[task]
output-dir = "./output"

# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.'
Collaborator

Suggested change
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.'
# The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.'

password = ""
########################### Routes ###########################
[routes.rule1]
schema-pattern = "test_1" # # Matches the schema name of the data source. Supports the wildcards "*" and "?"
Collaborator

Suggested change
schema-pattern = "test_1" # # Matches the schema name of the data source. Supports the wildcards "*" and "?"
schema-pattern = "test_1" # Matches the schema name of the data source. Supports the wildcards "*" and "?"


## Note

If `t_2` exists in the upstream database, the downstream database also compares this table.
Collaborator

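I don't quite understand this sentence. Does it mean that the data of all tables in the upstream `test_1` schema is checked, and that if one of them happens to be named `t_2`, the same name as the target table, it is checked as well? If so, the note seems redundant: `*` already matches all tables, so a table with the same name is of course checked too. Consider rewording it as follows:
If there is a table with the same name as the target table (`t_2` in the above case) existing in the upstream database, the downstream database also compares this table.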

Contributor Author

That is my understanding as well. @Leavrth PTAL

Contributor

Something is missing here. It should say that if `test_2.t_2` exists in the upstream database, it is also checked.


You can use `table-config` to configure `table-0`, set `is-sharding=true` and configure the upstream table information in `table-config.source-tables`. This configuration method requires setting all sharded tables, which is suitable for scenarios where the number of upstream sharded tables is small and the naming rules of sharded tables do not have a pattern as shown below.
You can use `Datasource config` to configure `table-0`, set corresponding `rules` and configure the tables that have the mapping relationship between the upstream and downstream databases. This configuration method requires setting all sharded tables, which is suitable for scenarios where the number of upstream sharded tables is small and the naming rules of sharded tables do not have a pattern as shown below.
Collaborator

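Please add a few sentences explaining why the relationship shown in the figure cannot use route rules; I could not follow it. I also suggest drawing another example figure for a case where route rules can be used. A side-by-side comparison would make this easier for users to understand.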

user = "root"
password = "123456"
instance-id = "target-1"
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.'
Collaborator

Suggested change
# The tables of downstream databases to be compared. Each table needs to contain schema name and table name, separated by '.'
# The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.'


- sync-diff-inspector consumes a certain amount of server resources when checking data. Avoid using sync-diff-inspector to check data during peak business hours.
- TiDB uses the `utf8_bin` collation. If you need to compare the data in MySQL with that in TiDB, pay attention to the collation configuration of MySQL tables. If the primary key or unique key is the `varchar` type and the collation configuration in MySQL differs from that in TiDB, then the final check result might be incorrect because of the collation issue. You need to add collation to the sync-diff-inspector configuration file.
- sync-diff-inspector divides data into chunks first according to TiDB statistics and you need to guarantee the accuracy of the statistics. You can manually run the `analyze table {table_name}` command when the TiDB server's *workload is light*.
- Pay special attention to `table-rules`. If you configure `schema-pattern="test1"` and `target-schema="test2"`, the `test1` schema in the source database and the `test2` schema in the target database are compared. If the source database has a `test2` schema, this `test2` schema is also compared with the `test2` schema in the target database.
- The generated `fix.sql` is only used as a reference for repairing data, and you need to confirm it before executing these SQL statements to repair data.
- Pay special attention to `table-rules`. If you configure `schema-pattern="test1"`, `table-pattern = "t_1"`, `target-schema="test2"` and `target-table = "t_2"`, the `test1`.`t_1` schema in the source database and the `test2`.`t_2` schema in the target database are compared. Sharding is enabled by default in sync-diff-inspector, so if the source database has a `test2`.`t_2` table, the `test1`.`t_1` table and `test2`.`t_2` table in the source database serving as sharding are compared with the `test2`.`t_2` table in the target database.
Collaborator

What impact does this have on users when it happens? If there is a risk, consider adding a note that states it explicitly.

Contributor Author

@Leavrth PTAL~

Contributor

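Generally, if no rules are configured, each upstream table is matched with the downstream table of the same name, which is equivalent to every table having a default rule that matches itself. Adding rules does not remove this default matching rule. In the case above, if the user wants to match `test1.t_1` -> `test2.t_2`, the default match `test2.t_2` -> `test2.t_2` still exists, so the result differs depending on whether `test2.t_2` exists in the upstream database.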
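
The matching behavior described in this comment can be sketched as follows. This is only an illustration under stated assumptions, not sync-diff-inspector's actual code: the `Rule` type and the `sources_for_target` function are hypothetical names, and Python's `fnmatchcase` stands in for the tool's `*` and `?` wildcard matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical route rule: maps matching upstream tables to one target table."""
    schema_pattern: str
    table_pattern: str
    target_schema: str
    target_table: str


def sources_for_target(upstream, rules, target):
    """Return the upstream (schema, table) pairs compared against `target`.

    Models the behavior described above: every table keeps an implicit
    same-name rule, and explicit rules are added on top of it.
    """
    matched = []
    for schema, table in upstream:
        # Implicit default rule: an upstream table matches the same-named target.
        is_default = (schema, table) == target
        # Explicit rules: wildcard patterns route a table to the rule's target.
        is_routed = any(
            fnmatchcase(schema, r.schema_pattern)
            and fnmatchcase(table, r.table_pattern)
            and (r.target_schema, r.target_table) == target
            for r in rules
        )
        if is_default or is_routed:
            matched.append((schema, table))
    return matched


rule = Rule("test1", "t_1", "test2", "t_2")
target = ("test2", "t_2")

# Upstream without test2.t_2: only the routed table is compared.
print(sources_for_target([("test1", "t_1")], [rule], target))
# Upstream with test2.t_2: both tables are treated as shards of the target.
print(sources_for_target([("test1", "t_1"), ("test2", "t_2")], [rule], target))
```

In this model the second call returns two upstream tables for one target, which is the case the note in the document is meant to warn about.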

Collaborator

@hfxsd hfxsd left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 11, 2021
- `Databases config`: Configures the instances of the upstream and downstream databases.
- `Tables config`: Special configurations for specific tables, including specified ranges, columns to be ignored and so on (optional).
- `Routes`: Rules for upstream multiple schema names to regularly match downstream single schema names (optional).
Contributor

"regularly match": would it be better to just use "match"?


Below is the description of a complete configuration file:

``` toml
# Diff Configuration.

######################### Global config #########################
# The number of goroutines created to check data. The number of connections between upstream and downstream databases are slightly greater than this value.
Contributor

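The original text says that the number of connections to the upstream and downstream databases is slightly greater than this value. This means that the number of connections between sync-diff-inspector and the upstream database, and likewise the number of connections to the downstream database, is each slightly greater than this value.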

check-thread-count = 4
######################### Datasource config #########################
[data-sources]
[data-sources.mysql1] # mysql1 is the only ID for the database instance. It is used in the following `task.source-instances/task.target-instance` files.
Contributor

  1. `task.source-instances/task.target-instance`
  2. These should not be "files". They are the two items `source-instances` and `target-instance` in the task config section below.


## Note

If `t_2` exists in the upstream database, the downstream database also compares this table.
Contributor

Something is missing here. It should say that if `test_2.t_2` exists in the upstream database, it is also checked.

@ti-chi-bot
Member

@Leavrth: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@Liuxiaozhen12
Contributor Author

/remove-status LGT1
/status LGT2

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 16, 2021
@Liuxiaozhen12
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 06ee509

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 16, 2021
@ti-chi-bot ti-chi-bot merged commit fb5cc29 into pingcap:master Nov 16, 2021
@Liuxiaozhen12 Liuxiaozhen12 deleted the diff22 branch November 22, 2021 09:14
ti-chi-bot pushed a commit to ti-chi-bot/docs that referenced this pull request Nov 22, 2021
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #6900.

Labels
area/migrate Indicates that the Issue or PR belongs to the area of TiDB migration tools. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. v5.3 This PR/issue applies to TiDB v5.3.

4 participants