diff --git a/README.md b/README.md index 8cc2d601798d5..91ed6b11679a2 100644 --- a/README.md +++ b/README.md @@ -123,7 +123,10 @@ - [mydumper](tools/mydumper.md) - [Loader](tools/loader.md) + Data Migration - - [Overview](tools/data-migration-overview.md) + + Overview + - [Architecture](tools/data-migration-overview.md#architecture) + - [Features](tools/data-migration-overview.md#data-synchronization-introduction) + - [Restrictions](tools/data-migration-overview.md#usage-restrictions) - [Deploy](tools/data-migration-deployment.md) - [Synchronize Data](tools/data-migration-practice.md) + Configure diff --git a/tools/data-migration-deployment.md b/tools/data-migration-deployment.md index b15096ec72dea..6f38538c6383b 100644 --- a/tools/data-migration-deployment.md +++ b/tools/data-migration-deployment.md @@ -106,7 +106,7 @@ Make sure you have logged in to the Control Machine using the `root` user accoun 2. Run the following command to download DM-Ansible. ```bash - $ wget http://download.pingcap.org/dm-ansible.tar.gz + $ wget http://download.pingcap.org/dm-ansible-latest.tar.gz ``` ## Step 4: Install Ansible and its dependencies on the Control Machine @@ -118,7 +118,8 @@ It is required to use `pip` to install Ansible and its dependencies, otherwise a 1. Install Ansible and the dependencies on the Control Machine: ```bash - $ tar -xzvf dm-ansible.tar.gz + $ tar -xzvf dm-ansible-latest.tar.gz + $ mv dm-ansible-latest dm-ansible $ cd /home/tidb/dm-ansible $ sudo pip install -r ./requirements.txt ``` @@ -193,7 +194,7 @@ You can choose one of the following two types of cluster topology according to y | node3 | 172.16.10.73 | DM-worker2 | ```ini -## DM modules +## DM modules. [dm_master_servers] dm_master ansible_host=172.16.10.71 @@ -202,7 +203,7 @@ dm_worker1 ansible_host=172.16.10.72 server_id=101 mysql_host=172.16.10.81 mysql dm_worker2 ansible_host=172.16.10.73 server_id=102 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 -## Monitoring modules +## Monitoring modules. [prometheus_servers] prometheus ansible_host=172.16.10.71 @@ -212,7 +213,7 @@ grafana ansible_host=172.16.10.71 [alertmanager_servers] alertmanager ansible_host=172.16.10.71 -## Global variables +## Global variables. [all:vars] cluster_name = test-cluster @@ -234,21 +235,21 @@ grafana_admin_password = "admin" | node2 | 172.16.10.72 | DM-worker1-1, DM-worker1-2 | | node3 | 172.16.10.73 | DM-worker2-1, DM-worker2-2 | -When you edit the `inventory.ini` file, pay attention to distinguish between the following variables: `server_id`, `deploy_dir`, `dm_worker_port`, and `dm_worker_status_port`. +When you edit the `inventory.ini` file, pay attention to distinguish between the following variables: `server_id`, `deploy_dir`, and `dm_worker_port`. ```ini -## DM modules +## DM modules. 
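+## Note (added for clarity): in the [dm_worker_servers] entries below, keep server_id globally unique in the MySQL cluster, and give each DM-worker on the same machine its own deploy_dir and dm_worker_port.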
[dm_master_servers] dm_master ansible_host=172.16.10.71 [dm_worker_servers] -dm_worker1_1 ansible_host=172.16.10.72 server_id=101 deploy_dir=/data1/dm_worker dm_worker_port=10081 dm_worker_status_port=10082 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 -dm_worker1_2 ansible_host=172.16.10.72 server_id=102 deploy_dir=/data2/dm_worker dm_worker_port=10083 dm_worker_status_port=10084 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 +dm_worker1_1 ansible_host=172.16.10.72 server_id=101 deploy_dir=/data1/dm_worker dm_worker_port=8262 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 +dm_worker1_2 ansible_host=172.16.10.72 server_id=102 deploy_dir=/data2/dm_worker dm_worker_port=8263 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 -dm_worker2_1 ansible_host=172.16.10.73 server_id=103 deploy_dir=/data1/dm_worker dm_worker_port=10081 dm_worker_status_port=10082 mysql_host=172.16.10.83 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 -dm_worker2_2 ansible_host=172.16.10.73 server_id=104 deploy_dir=/data2/dm_worker dm_worker_port=10083 dm_worker_status_port=10084 mysql_host=172.16.10.84 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 +dm_worker2_1 ansible_host=172.16.10.73 server_id=103 deploy_dir=/data1/dm_worker dm_worker_port=8262 mysql_host=172.16.10.83 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 +dm_worker2_2 ansible_host=172.16.10.73 server_id=104 deploy_dir=/data2/dm_worker dm_worker_port=8263 mysql_host=172.16.10.84 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 -## Monitoring modules +## Monitoring modules. [prometheus_servers] prometheus ansible_host=172.16.10.71 @@ -258,7 +259,7 @@ grafana ansible_host=172.16.10.71 [alertmanager_servers] alertmanager ansible_host=172.16.10.71 -## Global variables +## Global variables. [all:vars] cluster_name = test-cluster @@ -283,7 +284,7 @@ Edit the `deploy_dir` variable to configure the deployment directory. The global variable is set to `/home/tidb/deploy` by default, and it applies to all services. If the data disk is mounted on the `/data1` directory, you can set it to `/data1/dm`. For example: ```ini -## Global variables +## Global variables. [all:vars] deploy_dir = /data1/dm ``` @@ -307,12 +308,15 @@ dm-master ansible_host=172.16.10.71 deploy_dir=/data1/deploy | Variable name | Description | | ------------- | ------- | -| server_id | DM-worker connects to MySQL as a slave. This variable is the server_id of the slave. Keep it globally unique in the MySQL cluster, and the value range is 0 ~ 4294967295. | +| source_id | DM-worker binds to a unique database instance or a replication group with the master-slave architecture. When the master and slave switch, you only need to update `mysql_host` or `mysql_port` and do not need to update the `source_id`. | +| server_id | DM-worker connects to MySQL as a slave. This variable is the `server_id` of the slave. Keep it globally unique in the MySQL cluster, and the value range is 0 ~ 4294967295. | | mysql_host | The upstream MySQL host. | | mysql_user | The upstream MySQL username; default "root". | | mysql_password | The upstream MySQL user password. You need to encrypt the password using the `dmctl` tool. 
See [Encrypt the upstream MySQL user password using dmctl](#encrypt-the-upstream-mysql-user-password-using-dmctl). |
 | mysql_port | The upstream MySQL port; default 3306. |
-| enable_gtid | Whether to use GTID for DM-worker to pull the binlog. It supports the MySQL (and MariaDB) GTID. The prerequisite is that the upstream MySQL has enabled the GTID mode. |
+| enable_gtid | Whether DM-worker uses GTID to pull the binlog. The prerequisite is that the upstream MySQL has enabled the GTID mode. |
+| relay_binlog_name | The name of the binlog file from which DM-worker starts to pull the binlog. Used only when there is no valid local relay log. |
+| relay_binlog_gtid | The GTID from which DM-worker starts to pull the binlog. Used only when there is no valid local relay log and `enable_gtid` is true. |
 | flavor | "flavor" indicates the release type of MySQL. For the official version, Percona, and cloud MySQL, fill in "mysql"; for MariaDB, fill in "mariadb". It is "mysql" by default. |
 
 ### Encrypt the upstream MySQL user password using dmctl
@@ -325,6 +329,36 @@
 $ ./dmctl -encrypt 123456
 VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=
 ```
 
+### Configure the relay log synchronization position
+
+When you start DM-worker for the first time, you need to configure `relay_binlog_name` to specify the position where DM-worker starts to pull the corresponding upstream MySQL or MariaDB binlog.
+
+```ini
+[dm_worker_servers]
+dm-worker1 ansible_host=172.16.10.72 source_id="mysql-replica-01" server_id=101 relay_binlog_name="binlog.000011" mysql_host=172.16.10.72 mysql_user=root mysql_port=3306
+
+dm-worker2 ansible_host=172.16.10.73 source_id="mysql-replica-02" server_id=102 relay_binlog_name="binlog.000002" mysql_host=172.16.10.73 mysql_user=root mysql_port=3306
+```
+
+> **Note:** If `relay_binlog_name` is not set, DM-worker pulls the binlog starting from the earliest existing binlog file of the upstream MySQL or MariaDB. In this case, it can take a long period of time to pull the latest binlog for the data synchronization task.
+
+### Enable the relay log GTID synchronization mode
+
+In a DM cluster, the relay log processing unit of DM-worker communicates with the upstream MySQL or MariaDB to pull its binlog to the local file system.
+
+You can enable the relay log GTID synchronization mode by configuring the following items. Currently, DM supports MySQL GTID and MariaDB GTID.
+
+- `enable_gtid`: to enable the relay log GTID synchronization mode to deal with scenarios like the master-slave switch
+- `relay_binlog_gtid`: to specify the position where DM-worker starts to pull the corresponding upstream MySQL or MariaDB binlog
+
+```ini
+[dm_worker_servers]
+dm-worker1 ansible_host=172.16.10.72 source_id="mysql-replica-01" server_id=101 enable_gtid=true relay_binlog_gtid="aae3683d-f77b-11e7-9e3b-02a495f8993c:1-282967971,cc97fa93-f5cf-11e7-ae19-02915c68ee2e:1-284361339" mysql_host=172.16.10.72 mysql_user=root mysql_port=3306
+
+dm-worker2 ansible_host=172.16.10.73 source_id="mysql-replica-02" server_id=102 relay_binlog_name=binlog.000002 mysql_host=172.16.10.73 mysql_user=root mysql_port=3306
+```
+
 ## Step 9: Deploy the DM cluster
 
 When `ansible-playbook` runs Playbook, the default concurrency is 5. If there are many target machines to deploy, you can add the `-f` parameter to specify the concurrency, such as `ansible-playbook deploy.yml -f 10`.
@@ -334,8 +368,6 @@
 The following example uses `tidb` as the user who runs the service.
 
 1.
Edit the `dm-ansible/inventory.ini` file to make sure `ansible_user = tidb`. ```ini - ## Connection - # ssh via normal user ansible_user = tidb ``` @@ -383,20 +415,18 @@ This operation stops all the components in the entire DM cluster in order, which | Component | Port variable | Default port | Description | | :-- | :-- | :-- | :-- | -| DM-master | `dm_master_port` | 11080 | DM-master service communication port | -| DM-master | `dm_master_status_port` | 11081 | DM-master status port | -| DM-worker | `dm_worker_port` | 10081 | DM-worker service communication port | -| DM-worker | `dm_worker_status_port` | 10082 | DM-worker status port | +| DM-master | `dm_master_port` | 8261 | DM-master service communication port | +| DM-worker | `dm_worker_port` | 8262 | DM-worker service communication port | | Prometheus | `prometheus_port` | 9090 | Prometheus service communication port | | Grafana | `grafana_port` | 3000 | The port for the external service of web monitoring service and client (browser) access | | Alertmanager | `alertmanager_port` | 9093 | Alertmanager service communication port | ### Customize ports -Go to the `inventory.ini` file and add related host variable of the corresponding service port after the service IP: +Edit the `inventory.ini` file and add the related host variable of the corresponding service port after the service IP: -``` -dm_master ansible_host=172.16.10.71 dm_master_port=12080 dm_master_status_port=12081 +```ini +dm_master ansible_host=172.16.10.71 dm_master_port=18261 ``` ### Update DM-Ansible @@ -412,8 +442,9 @@ dm_master ansible_host=172.16.10.71 dm_master_port=12080 dm_master_status_port=1 ``` $ cd /home/tidb - $ wget http://download.pingcap.org/dm-ansible.tar.gz - $ tar -xzvf dm-ansible.tar.gz + $ wget http://download.pingcap.org/dm-ansible-latest.tar.gz + $ tar -xzvf dm-ansible-latest.tar.gz + $ mv dm-ansible-latest dm-ansible ``` 3. Migrate the `inventory.ini` configuration file. diff --git a/tools/data-migration-manage-task.md b/tools/data-migration-manage-task.md index e8646ed0bd7da..e33fbe720f3c9 100644 --- a/tools/data-migration-manage-task.md +++ b/tools/data-migration-manage-task.md @@ -17,11 +17,14 @@ This section shows the basic usage of dmctl commands. ```bash $ ./dmctl --help Usage of dmctl: - -V prints version and exit # Prints the version information. - -encrypt string # Encrypts the database password according to the encryption method provided by DM; used in DM configuration files. -​ encrypt plaintext to ciphertext - -master-addr string # dm-master access address. dmctl interacts with dm-master to complete task management operations. -​ master API server addr + # Prints the version information. + -V prints version and exit + # Encrypts the database password according to the encryption method provided by DM; used in DM configuration files. + -encrypt string + encrypt plaintext to ciphertext + # The dm-master access address. dmctl interacts with dm-master to complete task management operations. + -master-addr string + master API server addr ``` ### Database password encryption @@ -36,7 +39,8 @@ VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU= ### Task management overview ```bash -$ ./dmctl -master-addr 172.16.30.14 # Enters the command line mode to interact with DM-master +# Enters the command line mode to interact with DM-master. 
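+# The flag takes the DM-master address; elsewhere in these docs it is written in host:port form, for example 172.16.10.71:8261.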
+$ ./dmctl -master-addr 172.16.30.14 Welcome to dmctl Release Version: v1.0.0-100-g2bef6f8b Git Commit Hash: 2bef6f8beda34c0dff57377005c71589b48aa3c5 @@ -91,10 +95,7 @@ This section describes how to use the task management commands to execute the fo ### Create the data synchronization task -You can use the task management command to create the data synchronization task. When you create the data management task, DM checks the privilege of upstream database instances and the table schema. For the table schemas of all sharded tables in the sharding data synchronization task, DM executes the following two checks: - -- Whether the auto-increment and unique column exists in the table, whether the corresponding `partition id` type of column mapping rule exists, and whether a conflict exists -- Whether the upstream and downstream table schemas to be synchronized are consistent +You can use the task management command to create the data synchronization task. Data Migration [prechecks the corresponding privileges and configuration automatically](#precheck-the-upstream-mysql-instance-configuration) while starting the data synchronization. ```bash » help start-task @@ -142,6 +143,54 @@ start-task [ -w "172.16.30.15:10081"] ./task.yaml } ``` +## Precheck the upstream MySQL instance configuration + +To detect possible errors of data synchronization configuration in advance, DM provides the precheck feature. You can use the `check-task` command to precheck whether the upstream MySQL instance configuration satisfies the DM requirements. + +The user of the upstream and downstream databases must have the corresponding read and write privileges. DM checks the following privileges and configuration automatically while starting the data synchronization task: + ++ MySQL binlog configuration + + - Whether the binlog is enabled (DM requires that the binlog must be enabled) + - Whether `binlog_format=ROW` (DM only supports the binlog synchronization in the ROW format) + - Whether `binlog_row_image=FULL` (DM only supports `binlog_row_image=FULL`) + ++ The privileges of the upstream MySQL instance user + + The MySQL user in DM configuration needs to have the following privileges at least: + + - REPLICATION SLAVE + - REPLICATION CLIENT + - RELOAD + - SELECT + ++ The compatibility of the upstream MySQL table schema + + TiDB differs from MySQL in compatibility in the following aspects: + + - Does not support the foreign key + - [Character set compatibility differs](../sql/character-set-support.md) + ++ The consistency check on the upstream MySQL multiple-instance shards + + + The schema consistency of all sharded tables + + - Column size + - Column name + - Column position + - Column type + - Primary key + - Unique index + + + The conflict of the auto increment primary keys in the sharded tables + + - The check fails in the following two conditions: + + - The auto increment primary key exists in the sharded tables and its column type *is not* bigint. + - The auto increment primary key exists in the sharded tables and its column type *is* bigint, but column mapping *is not* configured. + + - The check succeeds in other conditions except the two above. + ### Check the data synchronization task status You can use the task management command to check the status of the data synchronization task. 
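+For example, the following sketch assumes the interactive dmctl console shown above and a task named `test` (the name used in the practice guide); the exact output fields can differ between versions:
+
+```bash
+# Start dmctl against the DM-master, then query a task by name.
+./dmctl -master-addr 172.16.10.71:8261
+» query-status test
+```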
@@ -499,43 +548,6 @@ update-task [-w "127.0.0.1:10181"] ./task.yaml } ``` -## Check the upstream MySQL instance configuration - -To check whether the upstream MySQL instance configuration satisfies the DM requirements, use the `check-task` command. - -The user of the upstream and downstream databases must have the corresponding read and write privileges. Data Migration checks the following privileges automatically while starting the data synchronization task: - -+ MySQL binlog configuration - - - Whether the binlog is enabled (DM requires that the binlog must be enabled) - - Whether `binlog_format=ROW` (DM only supports the binlog synchronization in the ROW format) - - Whether `binlog_row_image=FULL` (DM only supports `binlog_row_image=FULL`) - -+ The privileges of the upstream MySQL instance user - - The MySQL user in DM configuration needs to have the following privileges at least: - - - REPLICATION SLAVE - - REPLICATION CLIENT - - RELOAD - - SELECT - -+ The compatibility of the upstream MySQL table schema - - TiDB differs from MySQL in compatibility in the following aspects: - - - Does not support the foreign key - - [Character set compatibility differs](../sql/character-set-support.md) - -+ The consistency check on the upstream MySQL multiple-instance shards - - - The consistency of the table schema - - - Column name, type - - Index - - - Whether the auto increment primary key that conflicts during merging exists - ## Manage the DDL locks See [Troubleshooting Sharding DDL Locks](../tools/troubleshooting-sharding-ddl-locks.md). diff --git a/tools/data-migration-overview.md b/tools/data-migration-overview.md index 009be9efe6ead..d05cebca2b10d 100644 --- a/tools/data-migration-overview.md +++ b/tools/data-migration-overview.md @@ -44,28 +44,55 @@ dmctl is the command line tool used to control the DM cluster. - Handling the errors during data synchronization tasks - Verifying the configuration correctness of data synchronization tasks -## Data synchronization introduction +## Data synchronization features -This section describes the data synchronization feature provided by Data Migration in detail. +This section describes the data synchronization features provided by the Data Migration tool. + +### Schema and table routing + +The [schema and table routing](../tools/dm-data-synchronization-features.md#table-routing) feature means that DM can synchronize a certain table of the upstream MySQL or MariaDB instance to the specified table in the downstream, which can be used to merge or synchronize the sharding data. ### Black and white lists synchronization at the schema and table levels -The black and white lists filtering rule of the upstream database instances is similar to MySQL replication-rules-db/tables, which can be used to filter or only synchronize all operations of some databases or some tables. +The [black and white lists filtering rule](../tools/dm-data-synchronization-features.md#black-and-white-table-lists) of the upstream database instance tables is similar to MySQL `replication-rules-db`/`replication-rules-table`, which can be used to filter or only synchronize all operations of some databases or some tables. ### Binlog event filtering -Binlog event filtering is a more fine-grained filtering rule than the black and white lists filtering rule at the schema and table levels. You can use statements like `INSERT` and `TRUNCATE TABLE` to specify the Binlog events of the database(s) or table(s) that you need to synchronize or filter out. 
+[Binlog event filtering](../tools/dm-data-synchronization-features.md#binlog-event-filtering) is a more fine-grained filtering rule than the black and white lists filtering rule. You can use statements like `INSERT` or `TRUNCATE TABLE` to specify the binlog events of `schema/table` that you need to synchronize or filter out.
 
 ### Column mapping
 
-Column mapping is used to resolve the conflicts occurred when the sharding auto-increment primary key IDs are merged for sharded tables. The value of the auto-increment primary key ID can be modified according to the instance-id, which is configured by the user, and the schema/table ID.
+The [column mapping](../tools/dm-data-synchronization-features.md#column-mapping) feature means that the table column value can be modified according to the built-in expression specified by the user, which can be used to resolve the conflicts of the sharding auto-increment primary key IDs.
 
 ### Sharding support
 
-DM supports merging the original sharded instances and tables into TiDB, with some restrictions.
+DM supports merging the original sharded instances and tables into TiDB, but with [some restrictions](../tools/dm-sharding-solution.md#sharding-ddl-usage-restrictions).
+
+## Usage restrictions
+
+Before using the DM tool, note the following restrictions:
+
++ Database version
+
+    - 5.5 < MySQL version < 5.8
+    - MariaDB version >= 10.1.2
+
+    Data Migration [prechecks the corresponding privileges and configuration automatically](../tools/data-migration-manage-task.md#precheck-the-upstream-mysql-instance-configuration) while starting the data synchronization task using dmctl.
+
++ DDL syntax
+
+    - Currently, TiDB is not compatible with all the DDL statements that MySQL supports. Because DM uses the TiDB parser to process DDL statements, it only supports the DDL syntax supported by the TiDB parser. For details, see [the DDL statements supported by TiDB](../sql/ddl.md).
+
+    - DM reports an error when it encounters an incompatible DDL statement. To solve this error, you need to manually handle it using dmctl, either skipping this DDL statement or replacing it with a specified DDL statement(s). For details, see [Skip or replace abnormal SQL statements](../tools/data-migration-troubleshooting.md#skip-or-replace-abnormal-sql-statements).
+
++ Sharding
+
+    - Conflicts between sharded tables can be resolved using [column mapping](../tools/dm-data-synchronization-features.md#column-mapping) only when the conflicts are limited to the auto increment primary key columns and those columns are of the bigint type. Otherwise, data synchronization is not supported, because conflicting data can overwrite each other and cause data loss.
+
+    - For other sharding restrictions, see [Sharding DDL usage restrictions](../tools/dm-sharding-solution.md#sharding-ddl-usage-restrictions).
 
-### Incompatible DDL handling
++ Operations
 
-Currently, TiDB is not compatible with all the DDL statements that MySQL supports. See [the DDL statements supported by TiDB](../sql/ddl.md).
+    - After DM-worker is restarted, the data synchronization task cannot be automatically restored. You need to manually run `start-task`. For details, see [Manage the Data Synchronization Task](../tools/data-migration-manage-task.md).
 
-DM reports an error when it encounters an incompatible DDL statement. To solve this error, you need to manually handle it using dmctl, either skipping this DDL statement or replacing it with a specified DDL statement(s).
For details, see [Skip or replace abnormal SQL statements](../tools/data-migration-troubleshooting.md#skip-or-replace-abnormal-sql-statements). + - After DM-worker or DM-master is restarted, the DDL lock synchronization cannot be automatically restored in some conditions. You need to manually handle it. For details, see [Troubleshooting Sharding DDL Locks](../tools/troubleshooting-sharding-ddl-locks.md). \ No newline at end of file diff --git a/tools/data-migration-practice.md b/tools/data-migration-practice.md index c005fd8dab634..9d29c9b4f1774 100644 --- a/tools/data-migration-practice.md +++ b/tools/data-migration-practice.md @@ -14,7 +14,7 @@ It is recommended to deploy the DM cluster using DM-Ansible. For detailed deploy > **Notes**: > -> - For database related passwords in all the DM configuration files, use the passwords encrypted by `dmctl`. If a database password is empty, it is unnecessary to encrypt it. See [Encrypt the upstream MySQL user password using dmctl](../tools/data-migration-deployment.md#encrypt-the-upstream-mysql-user-password-using-dmctl). +> - For database passwords in all the DM configuration files, use the passwords encrypted by `dmctl`. If a database password is empty, it is unnecessary to encrypt it. See [Encrypt the upstream MySQL user password using dmctl](../tools/data-migration-deployment.md#encrypt-the-upstream-mysql-user-password-using-dmctl). > - The user of the upstream and downstream databases must have the corresponding read and write privileges. ## Step 2: Check the cluster information @@ -23,15 +23,15 @@ After the DM cluster is deployed using DM-Ansible, the configuration information - The configuration information of related components in the DM cluster: - | Component | IP | Service port | + | Component | Host | Port | |------| ---- | ---- | - | DM-worker | 172.16.10.72 | 10081 | - | DM-worker | 172.16.10.73 | 10081 | - | DM-master | 172.16.10.71 | 11080 | + | dm_worker1 | 172.16.10.72 | 8262 | + | dm_worker2 | 172.16.10.73 | 8262 | + | dm_master | 172.16.10.71 | 8261 | - The information of upstream and downstream database instances: - | Database instance | IP | Port | Username | Encrypted password | + | Database instance | Host | Port | Username | Encrypted password | | -------- | --- | --- | --- | --- | | Upstream MySQL-1 | 172.16.10.81 | 3306 | root | VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU= | | Upstream MySQL-2 | 172.16.10.82 | 3306 | root | VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU= | @@ -43,19 +43,19 @@ After the DM cluster is deployed using DM-Ansible, the configuration information # Master configuration. [[deploy]] - mysql-instance = "172.16.10.81:3306" - dm-worker = "172.16.10.72:10081" + source-id = "mysql-replica-01" + dm-worker = "172.16.10.72:8262" [[deploy]] - mysql-instance = "172.16.10.82:3306" - dm-worker = "172.16.10.73:10081" + source-id = "mysql-replica-02" + dm-worker = "172.16.10.73:8262" ``` ## Step 3: Configure the data synchronization task The following example assumes that you need to synchronize all the `test_table` table data in the `test_db` database of both the upstream MySQL-1 and MySQL-2 instances, to the downstream `test_table` table in the `test_db` database of TiDB, in the full data plus incremental data mode. 
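+Before you edit the task configuration, it is worth manually confirming that both upstream instances satisfy the binlog requirements enforced by the precheck. The following is a minimal sketch; the hosts are the example hosts from the tables above, so adjust them to your environment:
+
+```bash
+# Run against each upstream instance (172.16.10.81 and 172.16.10.82 in this example).
+mysql -h 172.16.10.81 -P 3306 -u root -p \
+    -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'binlog_row_image');"
+# Expected per the precheck requirements: log_bin=ON, binlog_format=ROW, binlog_row_image=FULL.
+```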
-You can refer to the `task.yaml.example` task configuration template in `{ansible deploy}/conf`, and then copy, edit, and generate the `task.yaml` task configuration file as below: +Copy the `{ansible deploy}/conf/task.yaml.example` file and edit it to generate the `task.yaml` task configuration file as below: ```yaml # The task name. You need to use a different name for each of the multiple tasks that @@ -63,9 +63,6 @@ You can refer to the `task.yaml.example` task configuration template in `{ansibl name: "test" # The full data plus incremental data (all) synchronization mode task-mode: "all" -# Disables the heartbeat synchronization delay calculation -disable-heartbeat: true - # The downstream TiDB configuration information target-database: host: "172.16.10.83" @@ -73,18 +70,11 @@ target-database: user: "root" password: "" -# All the upstream MySQL instances that the current task needs to use +# Configuration of all the upstream MySQL instances required by the current data synchronization task mysql-instances: - - config: # MySQL-1 configuration - host: "172.16.10.81" - port: 3306 - user: "root" - # The ciphertext generated by a certain encryption of the plaintext `123456`. - # The ciphertext generated by each encryption is different. - password: "VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=" - # The instance ID of MySQL-1, corresponding to the `mysql-instance` in "dm-master.toml" - instance-id: "172.16.10.81:3306" + # The ID of upstream instances or the replication group. You can refer to the configuration of `source_id` in the "inventory.ini" file or in the "dm-master.toml" file. + source-id: "mysql-replica-01" # The configuration item name of the black and white lists of the name of the # database/table to be synchronized, used to quote the global black and white # lists configuration that is set in the global black-white-list map below. @@ -93,12 +83,7 @@ mysql-instances: mydumper-config-name: "global" - - config: # MySQL-2 configuration - host: "172.16.10.82" - port: 3306 - user: "root" - password: "VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=" - instance-id: "172.16.10.82:3306" + source-id: "mysql-replica-02" black-white-list: "global" mydumper-config-name: "global" @@ -106,15 +91,15 @@ mysql-instances: # configuration item name. black-white-list: global: - do-tables: # The upstream tables to be synchronized - - db-name: "test_db" # The database name of the table to be synchronized - tbl-name: "test_table" # The name of the table to be synchronized + do-tables: # The white list of upstream tables to be synchronized. + - db-name: "test_db" # The database name of the table to be synchronized. + tbl-name: "test_table" # The name of the table to be synchronized. # mydumper global configuration. Each instance can quote it by the configuration item name. mydumpers: global: - mydumper-path: "./bin/mydumper" # The file path of the mydumper binary - extra-args: "-B test_db -T test_table" # Only dumps the "test_table" table of the "test_db" database + mydumper-path: "./bin/mydumper" # The file path of the mydumper binary. + extra-args: "-B test_db -T test_table" # Only dumps the "test_table" table of the "test_db" database. It can configure any mydumper argument. ``` ## Step 4: Start the data synchronization task @@ -124,14 +109,14 @@ mydumpers: 2. Run the following command to start dmctl. ```bash - ./dmctl --master-addr 172.16.10.71:11080 + ./dmctl --master-addr 172.16.10.71:8261 ``` 3. Run the following command to start the data synchronization tasks. 
```bash # `task.yaml` is the configuration file that is edited above. - start-task task.yaml + start-task ./task.yaml ``` - If the above command returns the following result, it indicates the task is successfully started. @@ -143,19 +128,19 @@ mydumpers: "workers": [ { "result": true, - "worker": "172.16.10.72:10081", + "worker": "172.16.10.72:8262", "msg": "" }, { "result": true, - "worker": "172.16.10.73:10081", + "worker": "172.16.10.73:8262", "msg": "" } ] } ``` - - If the above command returns other information, you can edit the configuration according to the prompt, and then run the `start-task task.yaml` command to restart the task. + - If you fail to start the data synchronization task, modify the configuration according to the returned prompt and then run the `start-task task.yaml` command to restart the task. ## Step 5: Check the data synchronization task diff --git a/tools/data-migration-troubleshooting.md b/tools/data-migration-troubleshooting.md index 87154a45ecc19..b009977410ff6 100644 --- a/tools/data-migration-troubleshooting.md +++ b/tools/data-migration-troubleshooting.md @@ -28,7 +28,7 @@ However, you need to reset the data synchronization task in some cases. For deta For database related passwords in all the DM configuration files, use the passwords encrypted by `dmctl`. If a database password is empty, it is unnecessary to encrypt it. For how to encrypt the plaintext password, see [Encrypt the upstream MySQL user password using dmctl](../tools/data-migration-deployment.md#encrypt-the-upstream-mysql-user-password-using-dmctl). -In addition, the user of the upstream and downstream databases must have the corresponding read and write privileges. Data Migration also [checks the corresponding privileges automatically](../tools/data-migration-manage-task.md#check-the-upstream-mysql-instance-configuration) while starting the data synchronization task. +In addition, the user of the upstream and downstream databases must have the corresponding read and write privileges. Data Migration also [prechecks the corresponding privileges automatically](../tools/data-migration-manage-task.md#precheck-the-upstream-mysql-instance-configuration) while starting the data synchronization task. ### Incompatible DDL statements diff --git a/tools/dm-configuration-file-overview.md b/tools/dm-configuration-file-overview.md index 5d642a706b484..ea2645bdc9482 100644 --- a/tools/dm-configuration-file-overview.md +++ b/tools/dm-configuration-file-overview.md @@ -34,9 +34,7 @@ You can perform the following steps to create a data synchronization task based This section shows description of some important concepts. -| Concept | Description | Configuration File | -| ------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| `instance-id` | Specifies a MySQL/MariaDB instance (if you deploy DM using DM-ansible, `host:port` is used to construct this ID) | `mysql-instance` of `dm-master.toml`;
`instance-id` of `task.yaml` | -| DM-worker ID | Specifies a DM-worker (from the `worker-addr` parameter of `dm-worker.toml`) | `worker-addr` of `dm-worker.toml`;
the `-worker`/`-w` flag of the dmctl command line | - -> **Note:** You must keep `mysql-instance` and DM-worker at a one-to-one relationship. +| Concept | Description | Configuration File | +| :------ | :--------- | :------------- | +| `source-id` | Uniquely identifies a MySQL or MariaDB instance, or a replication group with the master-slave structure. | `source_id` of `inventory.ini`;
`source-id` of `dm-master.toml`;
`source-id` of `task.yaml` | +| DM-worker ID | Uniquely identifies a DM-worker (by the `worker-addr` parameter of `dm-worker.toml`) | `worker-addr` of `dm-worker.toml`;
the `-worker`/`-w` flag of the dmctl command line |
diff --git a/tools/dm-data-synchronization-features.md b/tools/dm-data-synchronization-features.md
new file mode 100644
index 0000000000000..be031eb17a05b
--- /dev/null
+++ b/tools/dm-data-synchronization-features.md
@@ -0,0 +1,454 @@
+---
+title: Data Synchronization Features
+summary: Learn about the data synchronization features provided by the Data Migration tool.
+category: tools
+---
+
+# Data Synchronization Features
+
+This document describes the data synchronization features provided by the Data Migration tool and explains the configuration of corresponding parameters.
+
+## Table routing
+
+The table routing feature enables DM to synchronize a certain table of the upstream MySQL or MariaDB instance to the specified table in the downstream.
+
+> **Note:**
+>
+> - Configuring multiple different routing rules for a single table is not supported.
+> - The match rule of schema needs to be configured separately, which is used to synchronize `create/drop schema xx`, as shown in `rule-2` of the [parameter configuration](#parameter-configuration).
+
+### Parameter configuration
+
+```yaml
+routes:
+  rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    target-schema: "test"
+    target-table: "t"
+  rule-2:
+    schema-pattern: "test_*"
+    target-schema: "test"
+```
+
+### Parameter explanation
+
+DM synchronizes the upstream MySQL or MariaDB instance table that matches the [`schema-pattern`/`table-pattern` rule provided by Table selector](../tools/dm-table-selector.md) to the downstream `target-schema`/`target-table`.
+
+### Usage examples
+
+This section shows the usage examples in different scenarios.
+
+#### Merge sharded schemas and tables
+
+Assume a scenario of sharded schemas and tables where you want to synchronize the `test_{1,2,3...}`.`t_{1,2,3...}` tables in two upstream MySQL instances to the `test`.`t` table in the downstream TiDB instance.
+
+To synchronize the upstream instances to the downstream `test`.`t`, you must create two routing rules:
+
+- `rule-1` is used to synchronize DML or DDL statements of the table that matches `schema-pattern: "test_*"` and `table-pattern: "t_*"` to the downstream `test`.`t`.
+- `rule-2` is used to synchronize DDL statements of the schema that matches `schema-pattern: "test_*"`, such as `create/drop schema xx`.
+
+> **Note:**
+>
+> - If the downstream `schema: test` already exists and will not be deleted, you can omit `rule-2`.
+> - If the downstream `schema: test` does not exist and only `rule-1` is configured, then it reports the `schema test doesn't exist` error during synchronization.
+
+```yaml
+  rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    target-schema: "test"
+    target-table: "t"
+  rule-2:
+    schema-pattern: "test_*"
+    target-schema: "test"
+```
+
+#### Merge sharded schemas
+
+Assume a scenario of sharded schemas where you want to synchronize the `test_{1,2,3...}`.`t_{1,2,3...}` tables in the two upstream MySQL instances to the `test`.`t_{1,2,3...}` tables in the downstream TiDB instance.
+
+To synchronize the upstream schemas to the downstream `test`.`t_[1,2,3]`, you only need to create one routing rule.
+
+```yaml
+  rule-1:
+    schema-pattern: "test_*"
+    target-schema: "test"
+```
+
+#### Incorrect table routing
+
+Assume that the following two routing rules are configured. Because `test_1_bak`.`t_1_bak` matches both `rule-1` and `rule-2`, an error is reported: the table routing configuration violates the limitation that only one routing rule can apply to a single table.
+
+```yaml
+  rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    target-schema: "test"
+    target-table: "t"
+  rule-2:
+    schema-pattern: "test_1_bak"
+    table-pattern: "t_1_bak"
+    target-schema: "test"
+    target-table: "t_bak"
+```
+
+## Black and white table lists
+
+The black and white lists filtering rule of the upstream database instance tables is similar to MySQL `replication-rules-db`/`replication-rules-table`, which can be used to filter or only synchronize all operations of some databases or some tables.
+
+### Parameter configuration
+
+```yaml
+black-white-list:
+  rule-1:
+    do-dbs: ["~^test.*"]         # Starting with "~" indicates it is a regular expression.
+    ignore-dbs: ["mysql"]
+    do-tables:
+    - db-name: "~^test.*"
+      tbl-name: "~^t.*"
+    - db-name: "test"
+      tbl-name: "t"
+    ignore-tables:
+    - db-name: "test"
+      tbl-name: "log"
+```
+
+### Parameter explanation
+
+- `do-dbs`: white lists of the schemas to be synchronized
+- `ignore-dbs`: black lists of the schemas to be synchronized
+- `do-tables`: white lists of the tables to be synchronized
+- `ignore-tables`: black lists of the tables to be synchronized
+- In black and white lists, starting with the "~" character indicates it is a [regular expression](https://golang.org/pkg/regexp/syntax/#hdr-Syntax).
+
+### Filtering process
+
+The filtering process is as follows:
+
+1. Filter at the schema level:
+
+    - If `do-dbs` is not empty, judge whether a matched schema exists in `do-dbs`.
+
+        - If yes, continue to filter at the table level.
+        - If not, filter `test`.`t`.
+
+    - If `do-dbs` is empty and `ignore-dbs` is not empty, judge whether a matched schema exists in `ignore-dbs`.
+
+        - If yes, filter `test`.`t`.
+        - If not, continue to filter at the table level.
+
+    - If both `do-dbs` and `ignore-dbs` are empty, continue to filter at the table level.
+
+2. Filter at the table level:
+
+    1. If `do-tables` is not empty, judge whether a matched table exists in `do-tables`.
+
+        - If yes, synchronize `test`.`t`.
+        - If not, filter `test`.`t`.
+
+    2. If `ignore-tables` is not empty, judge whether a matched table exists in `ignore-tables`.
+
+        - If yes, filter `test`.`t`.
+        - If not, synchronize `test`.`t`.
+
+    3. If both `do-tables` and `ignore-tables` are empty, synchronize `test`.`t`.
+
+> **Note:** To judge whether the schema `test` is filtered, you only need to filter at the schema level.
+
+### Usage example
+
+Assume that the upstream MySQL instances include the following tables:
+
+```
+`logs`.`messages_2016`
+`logs`.`messages_2017`
+`logs`.`messages_2018`
+`forum`.`users`
+`forum`.`messages`
+`forum_backup_2016`.`messages`
+`forum_backup_2017`.`messages`
+`forum_backup_2018`.`messages`
+```
+
+The configuration is as follows:
+
+```yaml
+black-white-list:
+  bw-rule:
+    do-dbs: ["forum_backup_2018", "forum"]
+    ignore-dbs: ["~^forum_backup_"]
+    do-tables:
+    - db-name: "logs"
+      tbl-name: "~_2018$"
+    - db-name: "~^forum.*"
+      tbl-name: "messages"
+    ignore-tables:
+    - db-name: "~.*"
+      tbl-name: "^messages.*"
+```
+
+After using the `bw-rule` rule:
+
+| Table | Whether to filter | Why filter |
+|:----|:----|:--------------|
+| `logs`.`messages_2016` | Yes | The schema `logs` fails to match any `do-dbs`. |
+| `logs`.`messages_2017` | Yes | The schema `logs` fails to match any `do-dbs`. |
+| `logs`.`messages_2018` | Yes | The schema `logs` fails to match any `do-dbs`. |
+| `forum_backup_2016`.`messages` | Yes | The schema `forum_backup_2016` fails to match any `do-dbs`. |
+| `forum_backup_2017`.`messages` | Yes | The schema `forum_backup_2017` fails to match any `do-dbs`. |
+| `forum`.`users` | Yes | 1. The schema `forum` matches `do-dbs` and continues to filter at the table level.<br> 2. The schema and table fail to match any of `do-tables` and `ignore-tables`, and `do-tables` is not empty. |
+| `forum`.`messages` | No | 1. The schema `forum` matches `do-dbs` and continues to filter at the table level.<br> 2. The table `messages` is in the `db-name: "~^forum.*",tbl-name: "messages"` of `do-tables`. |
+| `forum_backup_2018`.`messages` | No | 1. The schema `forum_backup_2018` matches `do-dbs` and continues to filter at the table level.<br> 2. The schema and table match the `db-name: "~^forum.*",tbl-name: "messages"` of `do-tables`. |
+
+## Binlog event filtering
+
+Binlog event filtering is a more fine-grained filtering rule than the black and white lists filtering rule. You can use statements like `INSERT` or `TRUNCATE TABLE` to specify the binlog events of `schema/table` that you need to synchronize or filter out.
+
+> **Note:** If the same table matches multiple rules, these rules are applied in order and the black list has priority over the white list. This means that if both the `Ignore` and `Do` rules are applied to a single table, the `Ignore` rule takes effect.
+
+### Parameter configuration
+
+```yaml
+filters:
+  rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    events: ["truncate table", "drop table"]
+    sql-pattern: ["^DROP\\s+PROCEDURE", "^CREATE\\s+PROCEDURE"]
+    action: Ignore
+```
+
+### Parameter explanation
+
+- [`schema-pattern`/`table-pattern`](../tools/dm-table-selector.md): the binlog events or DDL SQL statements of upstream MySQL or MariaDB instance tables that match `schema-pattern`/`table-pattern` are filtered by the rules below.
+
+- `events`: the binlog event array.
+
+    | Events | Type | Description |
+    | --------------- | ---- | ----------------------------- |
+    | `all` | | Includes all the events below |
+    | `all dml` | | Includes all DML events below |
+    | `all ddl` | | Includes all DDL events below |
+    | `none` | | Includes none of the events below |
+    | `none ddl` | | Includes none of the DDL events below |
+    | `none dml` | | Includes none of the DML events below |
+    | `insert` | DML | The `INSERT` DML event |
+    | `update` | DML | The `UPDATE` DML event |
+    | `delete` | DML | The `DELETE` DML event |
+    | `create database` | DDL | The `CREATE DATABASE` DDL event |
+    | `drop database` | DDL | The `DROP DATABASE` DDL event |
+    | `create table` | DDL | The `CREATE TABLE` DDL event |
+    | `create index` | DDL | The `CREATE INDEX` DDL event |
+    | `drop table` | DDL | The `DROP TABLE` DDL event |
+    | `truncate table` | DDL | The `TRUNCATE TABLE` DDL event |
+    | `rename table` | DDL | The `RENAME TABLE` DDL event |
+    | `drop index` | DDL | The `DROP INDEX` DDL event |
+    | `alter table` | DDL | The `ALTER TABLE` DDL event |
+
+- `sql-pattern`: it is used to filter specified DDL SQL statements. The matching rule supports using a regular expression. For example, `"^DROP\\s+PROCEDURE"`.
+
+- `action`: the string (`Do`/`Ignore`). Based on the following rules, it judges whether to filter. If either of the two rules is satisfied, the binlog is filtered; otherwise, the binlog is not filtered.
+
+    - `Do`: the white list. The binlog is filtered in either of the following two conditions:
+        - The type of the event is not in the `events` list of the rule.
+        - The SQL statement of the event cannot be matched by the `sql-pattern` of the rule.
+    - `Ignore`: the black list. The binlog is filtered in either of the following two conditions:
+        - The type of the event is in the `events` list of the rule.
+        - The SQL statement of the event can be matched by the `sql-pattern` of the rule.
+
+### Usage examples
+
+This section shows the usage examples in the scenario of sharding (sharded schemas and tables).
+
+#### Filter all sharding deletion operations
+
+To filter out all deletion operations, configure the following two filtering rules:
+
+- `filter-table-rule` filters out the `truncate table`, `drop table`, and `delete` operations of all tables that match the `test_*`.`t_*` pattern.
+- `filter-schema-rule` filters out the `drop database` operation of all schemas that match the `test_*` pattern.
+
+```yaml
+filters:
+  filter-table-rule:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    events: ["truncate table", "drop table", "delete"]
+    action: Ignore
+  filter-schema-rule:
+    schema-pattern: "test_*"
+    events: ["drop database"]
+    action: Ignore
+```
+
+#### Only synchronize sharding DML statements
+
+To only synchronize sharding DML statements, configure the following two filtering rules:
+
+- `do-table-rule` only synchronizes the `create table`, `insert`, `update` and `delete` statements of all tables that match the `test_*`.`t_*` pattern.
+- `do-schema-rule` only synchronizes the `create database` statement of all schemas that match the `test_*` pattern.
+
+> **Note:** The reason why the `create database/table` statement is synchronized is that you can synchronize DML statements only after the schema and table are created.
+
+```yaml
+filters:
+  do-table-rule:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    events: ["create table", "all dml"]
+    action: Do
+  do-schema-rule:
+    schema-pattern: "test_*"
+    events: ["create database"]
+    action: Do
+```
+
+#### Filter out the SQL statements that TiDB does not support
+
+To filter out the `PROCEDURE` statements that TiDB does not support, configure the following `filter-procedure-rule`:
+
+```yaml
+filters:
+  filter-procedure-rule:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    sql-pattern: ["^DROP\\s+PROCEDURE", "^CREATE\\s+PROCEDURE"]
+    action: Ignore
+```
+
+`filter-procedure-rule` filters out the `^CREATE\\s+PROCEDURE` and `^DROP\\s+PROCEDURE` statements of all tables that match the `test_*`.`t_*` pattern.
+
+## Column mapping
+
+The column mapping feature supports modifying the value of table columns. You can execute different modification operations on the specified column according to different expressions. Currently, only the built-in expressions provided by DM are supported.
+
+> **Note:**
+>
+> - It does not support modifying the column type or the table schema.
+> - It does not support configuring multiple different column mapping rules for the same table.
+
+### Parameter configuration
+
+```yaml
+column-mappings:
+  rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    expression: "partition id"
+    source-column: "id"
+    target-column: "id"
+    arguments: ["1", "test_", "t_"]
+  rule-2:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    expression: "partition id"
+    source-column: "id"
+    target-column: "id"
+    arguments: ["2", "test_", "t_"]
+```
+
+### Parameter explanation
+
+- [`schema-pattern`/`table-pattern`](../tools/dm-table-selector.md): to execute column value modifying operations on the upstream MySQL or MariaDB instance tables that match the `schema-pattern`/`table-pattern` filtering rule.
+- `source-column`, `target-column`: to modify the value of the `source-column` column according to the specified `expression` and assign the new value to `target-column`.
+- `expression`: the expression used to modify data. Currently, only the `partition id` built-in expression is supported.
+
+#### The `partition id` expression
+
+`partition id` is used to resolve the conflicts of auto-increment primary keys of sharded tables.
+
+**`partition id` restrictions**
+
+Note the following restrictions:
+
+- The `partition id` expression only supports the bigint type of auto-increment primary key.
+- The schema name format must be `the schema prefix + number (the schema ID)`. For example, it supports `s_1`, but does not support `s_a`.
+- The table name format must be `the table prefix + number (the table ID)`.
+- Restrictions on sharding size:
+    - It supports 16 MySQL or MariaDB instances at most (0 <= instance ID <= 15).
+    - Each instance supports 128 schemas at most (0 <= schema ID <= 127).
+    - Each schema of each instance supports 256 tables at most (0 <= table ID <= 255).
+    - The ID range of the auto-increment primary key is "0 <= ID <= 17592186044415".
+    - The `{instance ID, schema ID, table ID}` group must be unique.
+- Currently, the `partition id` expression is a customized feature. If you want to modify this feature, contact the corresponding developers.
+
+**`partition id` arguments configuration**
+
+Configure the following three arguments in order:
+
+- `instance_id`: the ID of the upstream sharded MySQL or MariaDB instance (0 <= instance ID <= 15)
+- The schema prefix: used to parse the schema name and get the `schema ID`
+- The table prefix: used to parse the table name and get the `table ID`
+
+**`partition id` expression rules**
+
+`partition id` fills the higher-order bits of the auto-increment primary key ID with the argument values and computes an int64 (MySQL bigint) type of value. The specific rules are as follows:
+
+- The int64 bits are laid out as `[1: 1 bit] [2: 4 bits] [3: 7 bits] [4: 8 bits] [5: 44 bits]`:
+    - `1`: the sign bit, reserved
+    - `2`: the instance ID, 4 bits by default
+    - `3`: the schema ID, 7 bits by default
+    - `4`: the table ID, 8 bits by default
+    - `5`: the auto-increment primary key ID, 44 bits by default
+
+### Usage example
+
+Assume a sharding scenario where all tables have the auto-increment primary key and you want to synchronize the `test_{1,2,3...}`.`t_{1,2,3...}` tables in two upstream MySQL instances to the `test`.`t` table in the downstream TiDB instance.
+
+Configure the following two rules:
+
+```yaml
+column-mappings:
+  rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    expression: "partition id"
+    source-column: "id"
+    target-column: "id"
+    arguments: ["1", "test_", "t_"]
+  rule-2:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    expression: "partition id"
+    source-column: "id"
+    target-column: "id"
+    arguments: ["2", "test_", "t_"]
+```
+
+- The ID of the row in the MySQL instance 1 table `test_1`.`t_1` is converted from `1` to `1 << (64-1-4) | 1 << (64-1-4-7) | 1 << 44 | 1 = 580981944116838401`.
+- The ID of the row in the MySQL instance 2 table `test_1`.`t_2` is converted from `2` to `2 << (64-1-4) | 1 << (64-1-4-7) | 2 << 44 | 2 = 1157460288606306306`.
+
+## Synchronization delay monitoring
+
+The heartbeat feature supports calculating the real-time synchronization delay between each synchronization task and MySQL or MariaDB based on real synchronization data.
+
+> **Note:**
+>
+> - The estimation accuracy of the synchronization delay is at the second level.
+> - The heartbeat related binlog is not synchronized to the downstream; it is discarded after the synchronization delay is calculated.
+
+### System privileges
+
+If the heartbeat feature is enabled, the upstream MySQL or MariaDB instances must provide the following privileges:
+
+- SELECT
+- INSERT
+- CREATE (databases, tables)
+
+### Parameter configuration
+
+In the task configuration file, enable the heartbeat feature:
+
+```
+enable-heartbeat: true
+```
+
+### Principles introduction
+
+- DM-worker creates the `dm_heartbeat` schema (currently unconfigurable) in the corresponding upstream MySQL or MariaDB.
+- DM-worker creates the `heartbeat` table (currently unconfigurable) in the corresponding upstream MySQL or MariaDB.
+- DM-worker uses the `REPLACE` statement to update the current `TS_master` timestamp every second (currently unconfigurable) in the corresponding upstream MySQL or MariaDB `dm_heartbeat`.`heartbeat` table.
+- DM-worker updates the `TS_slave_task` synchronization time after each synchronization task obtains the `dm_heartbeat`.`heartbeat` binlog.
+- DM-worker queries the current `TS_master` timestamp in the corresponding upstream MySQL or MariaDB `dm_heartbeat`.`heartbeat` table every 10 seconds, and calculates `task_lag` = `TS_master` - `TS_slave_task` for each task.
+
+See the `replicate lag` in the [binlog replication](../tools/dm-monitor.md#binlog-replication) processing unit of DM monitoring metrics.
\ No newline at end of file
diff --git a/tools/dm-table-selector.md b/tools/dm-table-selector.md
new file mode 100644
index 0000000000000..6717df815b2f7
--- /dev/null
+++ b/tools/dm-table-selector.md
@@ -0,0 +1,44 @@
+---
+title: Table Selector
+summary: Learn about Table Selector used by the table routing, binlog event filtering, and column mapping rule of Data Migration.
+category: tools
+---
+
+# Table Selector
+
+Table selector provides a match rule based on [wildcard characters](https://en.wikipedia.org/wiki/Wildcard_character) for schemas/tables. To match a specified table, configure `schema-pattern`/`table-pattern`.
+
+## Wildcard characters
+
+Table selector uses the following two wildcard characters in `schema-pattern`/`table-pattern`:
+
++ The asterisk character (`*`, also called "star")
+
+    - `*` matches zero or more characters. For example, `doc*` matches `doc` and `document` but not `dodo`.
+    - `*` can only be placed at the end of the word. For example, `doc*` is supported, while `do*c` is not supported.
+
++ The question mark (`?`)
+
+    `?` matches exactly one character except the empty character.
+
+## Match rules
+
+- `schema-pattern` cannot be empty.
+- `table-pattern` can be empty. When you configure it as empty, only `schema` is matched according to `schema-pattern`.
+- When `table-pattern` is not empty, `schema` is matched according to `schema-pattern` and `table` is matched according to `table-pattern`. The match result is obtained only when both `schema` and `table` are successfully matched.
+
+## Usage examples
+
+- Matching all schemas and tables that have a `schema_` prefix in the schema name:
+
+    ```yaml
+    schema-pattern: "schema_*"
+    table-pattern: ""
+    ```
+
+- Matching all tables that have a `schema_` prefix in the schema name and a `table_` prefix in the table name:
+
+    ```yaml
+    schema-pattern: "schema_*"
+    table-pattern: "table_*"
+    ```
\ No newline at end of file
diff --git a/tools/dm-task-config-argument-description.md b/tools/dm-task-config-argument-description.md
deleted file mode 100644
index 5543bb5fbeecb..0000000000000
--- a/tools/dm-task-config-argument-description.md
+++ /dev/null
@@ -1,202 +0,0 @@
----
-title: Data Migration Task Configuration Options
-summary: This document introduces the configuration options that apply to Data Migration tasks.
-category: tools
----
-
-# Data Migration Task Configuration Options
-
-This document introduces the configuration options that apply to Data Migration tasks.
- -## `task-mode` - -- String (`full`/`incremental`/`all`) -- The task mode of data migration to be executed -- Default value: `all` - - - `full`: Only makes a full backup of the upstream database and then restores it to the downstream database. - - `incremental`: Only synchronizes the incremental data of the upstream database to the downstream database using the binlog. - - `all`: `full` + `incremental`. Makes a full backup of the upstream database, imports the full data to the downstream database, and then uses the binlog to make an incremental synchronization to the downstream database starting from the exported position during the full backup process (binlog position/GTID). - -## Routing rule - -``` -# `schema-pattern`/`table-pattern` uses the wildcard matching rule -schema level: - schema-pattern: "test_*" - target-schema: "test" - -table level: - schema-pattern: "test_*" - table-pattern: "t_*" - target-schema: "test" - target-table: "t" -``` - -Description: Synchronizes the upstream table data that matches `schema-pattern`/`table-pattern` to the downstream `target-schema`/`target-table`. You can set the routing rule at the schema/table level. - -Taking the above code block as an example: - -- Schema level: Synchronizes all the upstream tables that match the `test_*` schema to the downstream `test` schema. - - For example, `schema: test_1 - tables [a, b, c]` => `schema:test - tables [a, b, c]` - -- Table level: Synchronizes the `t_*` matched upstream tables with `test_*` matched schema to the downstream `schema:test table:t` table. - -> **Notes:** -> -> - The `table level` rule has a higher priority than the `schema level` rule. -> - You can set one routing rule at most at one level. - - -## Black and white lists filtering rule - -``` -instance: - do-dbs: ["~^test.*", "do"] # Starts with "~", indicating it is a regular expression - ignore-dbs: ["mysql", "ignored"] - do-tables: - - db-name: "~^test.*" - tbl-name: "~^t.*" - - db-name: "do" - tbl-name: "do" - ignore-tables: - - db-name: "do" - tbl-name: "do" -``` - -The black and white lists filtering rule of the upstream database instances is similar to MySQL `replication-rules-db`/`replication-rules-table`. - -The filter process is as follows: - -1. Filter at the schema level: - - - If `do-dbs` is not empty, judge whether a matched schema exists in `do-dbs`. - - - If yes, continue to filter at the table level. - - If not, ignore it and exit. - - - If `do-dbs` is empty, and `ignore-dbs` is not empty, judge whether a matched schema exits in `ignore-dbs`. - - - If yes, ignore it and exit. - - If not, continue to filter at the table level. - - - If both `do-dbs` and `ignore-dbs` are empty, continue to filter at the table level. - -2. Filter at the table level: - - 1. If `do-tables` is not empty, judge whether a matched rule exists in `do-tables`. - - - If yes, exit and execute the statement. - - If not, continue to the next step. - - 2. If `ignore tables` is not empty, judge whether a matched rule exists in `ignore-tables`. - - - If yes, ignore it and exit. - - If not, continue to the next step. - - 3. If `do-tables` is not empty, ignore it and exit. Otherwise, exit and execute the statement. - -## Filtering rules of binlog events - -``` -# table level -user-filter-1: - schema-pattern: "test_*" # `schema-pattern`/`table-pattern` uses the wildcard matching rule. 
-    table-pattern: "t_*"
-    events: ["truncate table", "drop table"]
-    sql-pattern: ["^DROP\\s+PROCEDURE", "^CREATE\\s+PROCEDURE"]
-    action: Ignore
-
-# schema level
-user-filter-2:
-    schema-pattern: "test_*"
-    events: ["All DML"]
-    action: Do
-```
-
-Description: Configures the filtering rules for binlog events and DDL SQL statements of the upstream tables that match `schema-pattern`/`table-pattern`.
-
-- `events`: the binlog event array
-
-    | Events            | Type | Description                           |
-    | ----------------- | ---- | ------------------------------------- |
-    | `all`             |      | Includes all the events below         |
-    | `all dml`         |      | Includes all DML events below         |
-    | `all ddl`         |      | Includes all DDL events below         |
-    | `none`            |      | Includes none of the events below     |
-    | `none ddl`        |      | Includes none of the DDL events below |
-    | `none dml`        |      | Includes none of the DML events below |
-    | `insert`          | DML  | The `INSERT` DML event                |
-    | `update`          | DML  | The `UPDATE` DML event                |
-    | `delete`          | DML  | The `DELETE` DML event                |
-    | `create database` | DDL  | The `CREATE DATABASE` DDL event       |
-    | `drop database`   | DDL  | The `DROP DATABASE` DDL event         |
-    | `create table`    | DDL  | The `CREATE TABLE` DDL event          |
-    | `create index`    | DDL  | The `CREATE INDEX` DDL event          |
-    | `drop table`      | DDL  | The `DROP TABLE` DDL event            |
-    | `truncate table`  | DDL  | The `TRUNCATE TABLE` DDL event        |
-    | `rename table`    | DDL  | The `RENAME TABLE` DDL event          |
-    | `drop index`      | DDL  | The `DROP INDEX` DDL event            |
-    | `alter table`     | DDL  | The `ALTER TABLE` DDL event           |
-
-- `sql-pattern`
-
-    - Filters a specific DDL SQL statement.
-    - The matching rule supports using an regular expression, for example, `"^DROP\\s+PROCEDURE"`.
-
-> **Note:** If `sql-pattern` is empty, no filtering operation is performed. For the filtering rules, see the `action` description.
-
-- `action`
-
-    - String (`Do`/`Ignore`)
-    - For rules that match `schema-pattern`/`table-pattern`, judge whether the DDL statement is in the events of the rule or `sql-pattern`.
-
-        - Black list: If `action = Ignore`, execute `Ignore`; otherwise execute `Do`.
-        - White list: If `action = Ignore`, execute `Do`; otherwise execute `Ignore`.
-
-## Column mapping rule
-
-```
-instance-1:
-    schema-pattern: "test_*"    # `schema-pattern`/`table-pattern` uses the wildcard matching rule
-    table-pattern: "t_*"
-    expression: "partition id"
-    source-column: "id"
-    target-column: "id"
-    arguments: ["1", "test_", "t_"]
-instance-2:
-    schema-pattern: "test_*"
-    table-pattern: "t_*"
-    expression: "partition id"
-    source-column: "id"
-    target-column: "id"
-    arguments: ["2", "test_", "t_"]
-```
-
-Description: the rules for mapping the columns of `schema-pattern`/`table-pattern` matched tables in upstream database instances. It is used to resolve the conflicts of auto-increment primary keys of sharded tables.
-
-- `source-column`, `target-column`: Uses `expression` to compute the data of `source-column` as the data of `target-column`.
-
-- `expression`: The expression used to convert the column data. Currently, only the following built-in expression is supported:
-
-    - `partition id`
-
-        - You need to set `arguments` to `[instance_id, prefix of schema, prefix of table]`.
-        - schema name = arguments[1] + schema ID(suffix of schema); schema ID == suffix of schema
-        - table name = argument[2] + table ID(suffix of table); table ID == suffix of table
-        - If argument[0] == "", the partition ID takes up 0 bit in the figure below; otherwise, it takes up 4 bits (by default)
-        - If argument[1] == "", the schema ID takes up 0 bit in the figure below; otherwise, it takes up 7 bits (by default)
-        - If argument[2] == "", the table ID takes up 0 bit in the figure below; otherwise, it takes up 8 bits (by default)
-        - The origin ID is the value of the auto-increment ID column of a row in the table
-
-    ![partition ID](../media/partition-id.png)
-
-    - Restrictions:
-
-        - It is only applicable to the bigint column.
-        - The instance ID value should be (>= 0, <= 15) (4 bits by default)
-        - The schema ID value should be (>= 0, <= 127) (7 bits by default)
-        - The table ID value should be (>= 0, <= 255) (8 bits by default)
-        - The origin ID value should be (>= 0, <= 17592186044415) (44 bits by default)
diff --git a/tools/dm-task-configuration-file-intro.md b/tools/dm-task-configuration-file-intro.md
index 43915e7f2c493..f819de2eb7498 100644
--- a/tools/dm-task-configuration-file-intro.md
+++ b/tools/dm-task-configuration-file-intro.md
@@ -9,102 +9,113 @@ category: tools

This document introduces the task configuration file of Data Migration -- [`task.yaml`](https://github.com/pingcap/tidb-tools/blob/docs/docs/dm/zh_CN/configuration/task.yaml), including [Global configuration](#global-configuration) and [Instance configuration](#instance-configuration).

-For description of configuration items, see [Data Migration Task Configuration Options](../tools/dm-task-config-argument-description.md).
+For the feature and configuration of each configuration item, see [Data Synchronization Features](../tools/dm-data-synchronization-features.md).

## Important concepts

-For description of important concepts including `instance-id` and the DM-worker ID, see [Important concepts](../tools/dm-configuration-file-overview.md#important-concepts). 
+For description of important concepts including `instance-id` and the DM-worker ID, see [Important concepts](../tools/dm-configuration-file-overview.md#important-concepts).

## Global configuration

-### Basic information configuration
+### Basic configuration

-```
+```yaml
name: test                      # The name of the task. Should be globally unique.
task-mode: all                  # The task mode. Can be set to `full`/`incremental`/`all`.
-is-sharding: true # Whether it is a sharding task
-meta-schema: "dm_meta" # The downstream database that stores the `meta` information
-remove-meta: false # Whether to remove the `meta` information (`checkpoint` and `onlineddl`) before starting the
-                   # synchronization task
+is-sharding: true               # Whether it is a sharding task.
+meta-schema: "dm_meta"          # The downstream database that stores the `meta` information.
+remove-meta: false              # Whether to remove the `meta` information (`checkpoint` and `onlineddl`) before starting the synchronization task.
+enable-heartbeat: false         # Whether to enable the heartbeat feature.

-target-database: # Configuration of the downstream database instance
+target-database:                # Configuration of the downstream database instance.
  host: "192.168.0.1"
  port: 4000
  user: "root"
  password: ""
```

-For more details of `task-mode`, see [Task configuration argument description](../tools/dm-task-config-argument-description.md).
+`task-mode`
+
+- Description: the task mode, which specifies which stages of the data synchronization task to execute.
+- Value: string (`full`, `incremental`, or `all`), `all` by default.
+    - `full`: Only makes a full backup of the upstream database and then imports the full data to the downstream database.
+    - `incremental`: Only synchronizes the incremental data of the upstream database to the downstream database using the binlog.
+    - `all`: `full` + `incremental`. Makes a full backup of the upstream database, imports the full data to the downstream database, and then uses the binlog to make an incremental synchronization to the downstream database starting from the binlog position/GTID exported during the full backup.

### Feature configuration set

Global configuration includes the following feature configuration set.

-```
-routes: # The routing mapping rule set between the upstream and downstream tables
-    user-route-rules-schema: # `schema-pattern`/`table-pattern` uses the wildcard matching rule.
-        schema-pattern: "test_*"
+```yaml
+# The routing mapping rule set between the upstream and downstream tables.
+routes:
+  route-rule-1:
+    schema-pattern: "test_*"
    table-pattern: "t_*"
    target-schema: "test"
    target-table: "t"
-
-filters: # The binlog event filter rule set of the matched table of the upstream
-         # database instance
-    user-filter-1:
-        schema-pattern: "test_*"
-        table-pattern: "t_*"
-        events: ["truncate table", "drop table"]
-        action: Ignore
-
-black-white-list: # The filter rule set of the black white list of the matched table of the
-                  # upstream database instance
-    instance:
-        do-dbs: ["~^test.*", "do"]
-        ignore-dbs: ["mysql", "ignored"]
-        do-tables:
-        - db-name: "~^test.*"
-          tbl-name: "~^t.*"
-
-column-mappings: # The column mapping rule set of the matched table of the upstream database
-                 # instance
-    instance-1:
-        schema-pattern: "test_*"
-        table-pattern: "t_*"
-        expression: "partition id"
-        source-column: "id"
-        target-column: "id"
-        arguments: ["1", "test_", "t_"]
-
-mydumpers: # Configuration arguments of running mydumper
+  route-rule-2:
+    schema-pattern: "test_*"
+    target-schema: "test"
+
+# The binlog event filter rule set of the matched tables of the upstream database instance.
+filters:
+  filter-rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    events: ["truncate table", "drop table"]
+    action: Ignore
+
+# The black and white lists filter rule set of the matched tables of the upstream database instance.
+black-white-list:
+  bw-rule-1:
+    do-dbs: ["~^test.*", "do"]
+    ignore-dbs: ["mysql", "ignored"]
+    do-tables:
+    - db-name: "~^test.*"
+      tbl-name: "~^t.*"
+
+# The column mapping rule set of the matched tables of the upstream database instance.
+column-mappings:
+  cm-rule-1:
+    schema-pattern: "test_*"
+    table-pattern: "t_*"
+    expression: "partition id"
+    source-column: "id"
+    target-column: "id"
+    arguments: ["1", "test_", "t_"]
+
+# Configuration arguments of running mydumper.
+mydumpers:
  global:
-    mydumper-path: "./mydumper" # The mydumper binary file path. It is generated by the Ansible deployment
-                                # application automatically and needs no configuration.
-    threads: 16                 # The number of the threads mydumper dumps from the upstream database instance
-    chunk-filesize: 64          # The size of the file mydumper generates
+    # The mydumper binary file path. It is generated by the Ansible deployment application automatically and needs no configuration.
+    mydumper-path: "./mydumper"
+    # The number of threads mydumper uses to dump data from the upstream database instance.
+    threads: 16
+    # The size of each file generated by mydumper.
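+    # Note: mydumper's `--chunk-filesize` option takes this size in MB.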
+    chunk-filesize: 64
    skip-tz-utc: true
    extra-args: "-B test -T t1,t2 --no-locks"

-loaders: # Configuration arguments of running Loader
+# Configuration arguments of running Loader.
+loaders:
  global:
-    pool-size: 16 # The number of threads that execute mydumper SQL files concurrently in Loader
-    dir: "./dumped_data" # The directory output by mydumper that Loader reads. Directories for
-                         # different tasks of the same instance must be different. (mydumper outputs the
-                         # SQL file based on the directory)
+    # The number of threads that execute mydumper SQL files concurrently in Loader.
+    pool-size: 16
+    # The directory output by mydumper that Loader reads. Directories for different tasks of the same instance must be different. (mydumper writes its SQL files to this directory.)
+    dir: "./dumped_data"

-syncers: # Configuration arguments of running Syncer
+# Configuration arguments of running Syncer.
+syncers:
  global:
-    worker-count: 16 # The number of threads that synchronize binlog events concurrently in Syncer
-    batch: 1000 # The number of SQL statements in a transaction batch that Syncer
-                # synchronizes to the downstream database
-    max-retry: 100 # The retry times of the transactions with an error that Syncer synchronizes
-                   # to the downstream database (only for DML operations)
+    # The number of threads that synchronize binlog events concurrently in Syncer.
+    worker-count: 16
+    # The number of SQL statements in a transaction batch that Syncer synchronizes to the downstream database.
+    batch: 1000
+    # The number of retries when a transaction that Syncer synchronizes to the downstream database encounters an error (only for DML operations).
+    max-retry: 100
```

-References:
-
-- For details of `user-filter-1`, see [Filtering rules of binlog events](../tools/dm-task-config-argument-description.md#filtering-rules-of-binlog-events).
-- For details of `instance`, see [Black and white lists filtering rule](../tools/dm-task-config-argument-description.md#black-and-white-lists-filtering-rule).
-- For details of `instance-1` of `column-mappings`, see [Column mapping rule](../tools/dm-task-config-argument-description.md#column-mapping-rule).
-
## Instance configuration

This part defines the subtask of data synchronization. DM supports synchronizing data from one or multiple MySQL instances to the same instance.
@@ -112,63 +123,25 @@ This part defines the subtask of data synchronization. DM supports synchronizing

```
mysql-instances:
-    - config: # The upstream database configuration corresponding to `instance-id`
-        host: "192.168.199.118"
-        port: 4306
-        user: "root"
-        password: "1234" # Requires the password encrypted by dmctl
-        instance-id: "instance118-4306" # The MySQL instance ID. It corresponds to the upstream MySQL instance. It is
-                                        # not allowed to set it to an ID of a MySQL instance that is not within the
-                                        # DM-master cluster topology.
-
-        meta: # The position where the binlog synchronization starts when the checkpoint of
-              # the downstream database does not exist. If the checkpoint exits, this
-              # configuration does not work.
+    source-id: "mysql-replica-01"   # The ID of the upstream instance or the replication group. It can be configured by referring to the `source_id` in the `inventory.ini` file or the `source-id` in the `dm-master.toml` file.
+    meta:                           # The position where the binlog synchronization starts when the downstream database checkpoint does not exist. If the checkpoint exists, the checkpoint is used.
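+        # For example, you can take `binlog-name`/`binlog-pos` from the `File`/`Position` columns of `SHOW MASTER STATUS` on the upstream MySQL.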
        binlog-name: binlog-00001
        binlog-pos: 4

-        route-rules: ["user-route-rules-schema", "user-route-rules"] # Routing rules selected from `routes` above
-        filter-rules: ["user-filter-1", "user-filter-2"] # Filter rules selected from `filters` above
-        column-mapping-rules: ["instance-1"] # Column mapping rules selected from `column-mappings` above
-        black-white-list: "instance" # The black white list item selected from `black-white-list` above
+    route-rules: ["route-rule-1", "route-rule-2"]  # The names of the routing rules applied to the matched tables of the upstream database instance.
+    filter-rules: ["filter-rule-1"]                # The names of the binlog event filtering rules applied to the matched tables of the upstream database instance.
+    column-mapping-rules: ["cm-rule-1"]            # The names of the column mapping rules applied to the matched tables of the upstream database instance.
+    black-white-list: "bw-rule-1"                  # The name of the black and white lists filtering rule applied to the matched tables of the upstream database instance.

-        mydumper-config-name: "global" # The mydumper configuration name. You cannot set it
-                                       # and `mydumper` at the same time.
-        loader-config-name: "global" # The Loader configuration name. You cannot set it and
-                                     # `loader` at the same time.
-        syncer-config-name: "global" # The Syncer configuration name. You cannot set it and
-                                     # `syncer` at the same time.
+    mydumper-config-name: "global"                 # The mydumper configuration name.
+    loader-config-name: "global"                   # The Loader configuration name.
+    syncer-config-name: "global"                   # The Syncer configuration name.

-    - config:
-        host: "192.168.199.118"
-        port: 5306
-        user: "root"
-        password: "1234"
-        instance-id: "instance118-5306"
-
-        mydumper: # The mydumper configuration. You cannot set it and
-                  # `mydumper-config-name` at the same time.
-            mydumper-path: "./mydumper" # The mydumper binary file path. It is generated by
-                                        # Ansible deployment application and needs no
-                                        # configuration.
-            threads: 4
-            chunk-filesize: 8
-            skip-tz-utc: true
-            extra-args: "-B test -T t1,t2"
-
-        loader: # The Loader configuration. You cannot set it and
-                # `loader-config-name` at the same time.
-            pool-size: 32 # The number of threads that execute mydumper SQL
-                          # files concurrently in Loader
-            dir: "./dumped_data"
-
-        syncer: # The Syncer configuration. You cannot set it and
-                # `syncer-config-name` at the same time.
-            worker-count: 32 # The number of threads that synchronize binlog events
-                             # concurrently in Syncer
-            batch: 2000
-            max-retry: 200
+    source-id: "mysql-replica-02"                  # The ID of the upstream instance or the replication group. It can be configured by referring to the `source_id` in the `inventory.ini` file or the `source-id` in the `dm-master.toml` file.
+    mydumper-config-name: "global"                 # The mydumper configuration name.
+    loader-config-name: "global"                   # The Loader configuration name.
+    syncer-config-name: "global"                   # The Syncer configuration name.
```

For the configuration details of the above options, see the corresponding part in [Feature configuration set](#feature-configuration-set), as shown in the following table.
diff --git a/tools/troubleshooting-sharding-ddl-locks.md b/tools/troubleshooting-sharding-ddl-locks.md
index c4a06a7cdb0a2..b5a8ad8737bc1 100644
--- a/tools/troubleshooting-sharding-ddl-locks.md
+++ b/tools/troubleshooting-sharding-ddl-locks.md
@@ -64,7 +64,7 @@ After the DM-worker restarts and runs `start-task`, it retries to synchronize the sharding DDL

No bad impact.

After you have manually broken the lock, the subsequent sharding DDL statements can be automatically synchronized normally.

-## Condition three: DM-master restarts
+## Condition three: the DM-master restarts

After a DM-worker sends the sharding DDL information to DM-master, this DM-worker will hang up, wait for the message from DM-master, and then decide whether to execute or skip this DDL statement.
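If such a pending lock needs manual handling, you can inspect and remove it from dmctl. The session below is an illustrative sketch only, not part of this PR: the DM-master address, task name, and lock ID are placeholders, and the exact flags and output depend on your DM version.

```bash
# Connect the dmctl console to the DM-master (the address is a placeholder).
./dmctl -master-addr 172.16.10.71:8261

# In the interactive console, list the pending sharding DDL locks,
# optionally filtered by task name.
show-ddl-locks test

# Remove a lock that is confirmed to be stale. The lock ID below is
# hypothetical; copy the real one from the `show-ddl-locks` output.
unlock-ddl-lock "test-`test`.`t_target`"
```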