Refactor data migration #7480

Merged
merged 95 commits into from
Dec 31, 2021

sunzhaoyang
Contributor

@sunzhaoyang sunzhaoyang commented Nov 11, 2021

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions (in Chinese).

  • master (the latest development version)
  • v5.3 (TiDB 5.3 versions)
  • v5.2 (TiDB 5.2 versions)
  • v5.1 (TiDB 5.1 versions)
  • v5.0 (TiDB 5.0 versions)
  • v4.0 (TiDB 4.0 versions)
  • v3.1 (TiDB 3.1 versions)
  • v3.0 (TiDB 3.0 versions)
  • v2.1 (TiDB 2.1 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot
Member

ti-chi-bot commented Nov 11, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • GMHDBJD
  • hfxsd

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. label Nov 11, 2021
@CLAassistant

CLAassistant commented Nov 11, 2021

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot requested a review from TomShawn November 11, 2021 12:35
@ti-chi-bot ti-chi-bot added missing-translation-status This PR does not have translation status info. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 11, 2021
@TomShawn
Contributor

TomShawn commented Nov 12, 2021

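@sunzhaoyang In the English docs repo, this was merged into the refactor-data-migration branch. Should it be merged into the master branch in the Chinese docs-cn repo? Also, please sign the Contributor License Agreement. Thanks!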

@TomShawn TomShawn added refactor-migration-docs translation/doing This PR’s assignee is translating this PR. and removed missing-translation-status This PR does not have translation status info. labels Nov 12, 2021
@sunzhaoyang
Contributor Author

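> @sunzhaoyang In the English docs repo, this was merged into the refactor-data-migration branch. Should it be merged into the master branch in the Chinese docs-cn repo? Also, please sign the Contributor License Agreement. Thanks!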

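Yes. The English refactor-data-migration branch was used for an early preview and is no longer needed. This PR contains the intended content, which we want to merge into master.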


## Migrate Aurora MySQL to TiDB

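This document describes how to migrate data from Aurora to a TiDB cluster deployed on AWS. Data migration can be carried out in two steps, full migration and incremental migration; choose the steps that fit your business needs. Considering the case where Aurora and TiDB are deployed in different regions, the solution also introduces best practices for migrating data from different regions.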
Contributor

Suggested change
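This document describes how to migrate data from Aurora to a TiDB cluster deployed on AWS. Data migration can be carried out in two steps, full migration and incremental migration; choose the steps that fit your business needs. Considering the case where Aurora and TiDB are deployed in different regions, the solution also introduces best practices for migrating data from different regions
This document describes how to migrate data from Aurora to a TiDB cluster deployed on AWS. Data migration consists of two steps, full migration and incremental migration; choose the steps that fit your business needs. For the case where Aurora and TiDB are deployed in different regions, the solution also includes best practices for migrating data between different regions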

“Migrate Data from Aurora to TiDB” doesn't seem to mention regions. Is the point that S3 is cheap across regions? (Quick, help me review for my SAA exam.)

Contributor Author

The cross-region part hasn't been written yet....

sunzhaoyang and others added 4 commits November 18, 2021 09:54
Co-authored-by: lance6716 <lance6716@gmail.com>
Co-authored-by: lance6716 <lance6716@gmail.com>

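During incremental data migration, you can use the binlog event filter feature described in [How to Filter Binlog Events](/data-migration/migrate-with-binlog-event-filter.md) to filter out certain types of binlog events, for example, to stop migrating `DELETE` events downstream for archiving or auditing purposes. However, binlog event filter cannot decide at a finer granularity whether the `DELETE` event of a particular row should be filtered.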

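To solve this problem, starting from v2.0.5, DM supports using`binlog value filter`to filter migrated data in the incremental replication phase. In the `ROW`-format binlog that DM supports, each binlog event carries the values of all columns. You can configure SQL expressions based on these values. If an expression evaluates to `TRUE` for a row change, DM does not migrate that row change downstream.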
Contributor
Suggested change
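To solve this problem, starting from v2.0.5, DM supports using`binlog value filter`to filter migrated data in the incremental replication phase. In the `ROW`-format binlog that DM supports, each binlog event carries the values of all columns. You can configure SQL expressions based on these values. If an expression evaluates to `TRUE` for a row change, DM does not migrate that row change downstream.
To solve this problem, starting from v2.0.5, DM supports using `binlog value filter` to filter migrated data in the incremental replication phase. In the `ROW`-format binlog that DM supports, each binlog event carries the values of all columns. You can configure SQL expressions based on these values. If an expression evaluates to `TRUE` for a row change, DM does not migrate that row change downstream.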
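The difference between the two kinds of filtering can be sketched in Python. This is a minimal model under stated assumptions, not DM's implementation: `RowChange`, `event_type_filter`, `value_filter`, and the `status` column are hypothetical, and a Python callable stands in for the SQL expression that DM evaluates against the column values carried by each `ROW`-format event.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

Values = Dict[str, object]


@dataclass
class RowChange:
    """Hypothetical model of one row change in a ROW-format binlog event."""
    event_type: str  # "INSERT", "UPDATE" or "DELETE"
    values: Values   # ROW format carries the value of every column


def event_type_filter(changes: Iterable[RowChange], ignored: Set[str]) -> List[RowChange]:
    """Coarse filtering: drop every change whose event type is ignored."""
    return [c for c in changes if c.event_type not in ignored]


def value_filter(changes: Iterable[RowChange], event_type: str,
                 expr: Callable[[Values], bool]) -> List[RowChange]:
    """Row-level filtering: drop a change of `event_type` only when expr is True.

    `expr` is a hypothetical stand-in for the SQL expression configured in DM.
    """
    return [c for c in changes
            if not (c.event_type == event_type and expr(c.values))]


CHANGES = [
    RowChange("DELETE", {"id": 1, "status": "archived"}),
    RowChange("DELETE", {"id": 2, "status": "active"}),
    RowChange("INSERT", {"id": 3, "status": "active"}),
]

# Dropping every DELETE event keeps only the INSERT.
print([c.values["id"] for c in event_type_filter(CHANGES, {"DELETE"})])  # [3]
# Dropping only the DELETEs of archived rows keeps the other DELETE.
print([c.values["id"] for c in value_filter(
    CHANGES, "DELETE", lambda v: v["status"] == "archived")])  # [2, 3]
```

Only the change whose values satisfy the expression is withheld; filtering by event type alone cannot tell the two `DELETE` events apart.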



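Similar to [How to Filter Binlog Events](/data-migration/migrate-with-binlog-event-filter.md), expression filtering needs to be configured in the data migration task configuration file. See the configuration sample below. For the complete configuration and the meaning of each item, refer to [DM Complete Configuration File Example](https://docs.pingcap.com/zh/tidb-data-migration/stable/task-configuration-file-full#完整配置文件示例):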
Contributor
Suggested change
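Similar to [How to Filter Binlog Events](/data-migration/migrate-with-binlog-event-filter.md), expression filtering needs to be configured in the data migration task configuration file. See the configuration sample below. For the complete configuration and the meaning of each item, refer to [DM Complete Configuration File Example](https://docs.pingcap.com/zh/tidb-data-migration/stable/task-configuration-file-full#完整配置文件示例)
Similar to [How to Filter Binlog Events](/data-migration/migrate-with-binlog-event-filter.md), you need to configure `binlog value filter` in the data migration task configuration file. See the configuration sample below. For the complete configuration and the meaning of each item, refer to [DM Complete Configuration File Example](https://docs.pingcap.com/zh/tidb-data-migration/stable/task-configuration-file-full#完整配置文件示例)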
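To reason about how a per-table rule plays out, here is a similar Python sketch. Every name in it is hypothetical: the `ExprRule` fields only mirror the idea of binding an expression to a schema, table, and event type, and they do not reproduce DM's actual option names, which are documented in the DM configuration reference linked in this section. The schema `expr_filter` and the row data are made up. Assuming a rule that skips inserted rows whose `c` is even (the SQL expression `c % 2 = 0`), only rows with an odd `c` reach the downstream `tbl`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


@dataclass
class ExprRule:
    """Hypothetical stand-in for one expression-filter rule in a task config."""
    schema: str
    table: str
    event_type: str
    skip_if: Callable[[Row], bool]  # plays the role of the SQL expression


def replicate(schema: str, table: str, event_type: str,
              rows: Iterable[Row], rules: List[ExprRule]) -> List[Row]:
    """Return the rows that would be applied to the downstream table."""
    matching = [r for r in rules
                if (r.schema, r.table, r.event_type) == (schema, table, event_type)]
    return [row for row in rows
            if not any(rule.skip_if(row) for rule in matching)]


# Hypothetical rule equivalent to the SQL expression `c % 2 = 0` on INSERT.
RULES = [ExprRule("expr_filter", "tbl", "INSERT", lambda row: row["c"] % 2 == 0)]

# Hypothetical upstream inserts into expr_filter.tbl.
UPSTREAM = [{"id": i, "c": i} for i in range(1, 6)]

DOWNSTREAM = replicate("expr_filter", "tbl", "INSERT", UPSTREAM, RULES)
print([row["c"] for row in DOWNSTREAM])  # [1, 3, 5]
```

In this model, rules are matched by schema, table, and event type, so the same rows written to another table, or arriving as a different event type, pass through unfiltered.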


Then query the `tbl` table downstream. You can see that only the rows whose `c` value is odd have been migrated downstream:

Contributor

Suggested change
`` ``` ``
`` ```sql ``

TomShawn and others added 2 commits December 30, 2021 16:10
Co-authored-by: Enwei <jinenwei@pingcap.com>
summary: Introduces how to filter DML events using SQL expressions
---

# How to Filter DML Using SQL Expressions
Contributor
Suggested change
# How to Filter DML Using SQL Expressions
# How to Filter DML Using SQL Expressions

en-jin19 and others added 4 commits December 30, 2021 11:22
Signed-off-by: Ran <huangran@pingcap.com>
Signed-off-by: Ran <huangran@pingcap.com>
Signed-off-by: Ran <huangran@pingcap.com>
TOC.md Outdated
- [Migrate from SQL Files to TiDB](/migrate-from-mysql-dumpling-files.md)
- [Replicate Incremental Data from a TiDB Cluster to Another Cluster](/incremental-replication-between-clusters.md)
- Data Migration Scenarios
- [Migrate Data from Aurora to TiDB](/data-migration/migrate-aurora-to-tidb.md)
Collaborator
Does this require adding a new data-migration folder? Could it be easily confused with the existing dm folder?

Contributor Author
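Yes, it's a newly added directory. The dm directory only contains docs for the DM tool itself, while data-migration contains docs for data migration scenarios, covering not only DM but also other tools.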

Contributor
OK, having taken a look, I don't think a new folder is necessary. Most existing folders correspond to a specific component.

Signed-off-by: Ran <huangran@pingcap.com>
Signed-off-by: Ran <huangran@pingcap.com>
Collaborator

@hfxsd hfxsd left a comment
LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 31, 2021
@ran-huang
Contributor

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 2536d7d

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 31, 2021
@ti-chi-bot ti-chi-bot merged commit e18f9b3 into pingcap:master Dec 31, 2021
@ran-huang ran-huang added translation/done This PR has been translated from English into Chinese and updated to pingcap/docs-cn in a PR. and removed translation/doing This PR’s assignee is translating this PR. labels Dec 31, 2021
Labels
first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. refactor-migration-docs size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. translation/done This PR has been translated from English into Chinese and updated to pingcap/docs-cn in a PR.