| title | summary | category |
|---|---|---|
| Data Migration Cluster Operations | This document introduces the DM cluster operations and considerations when you administer a DM cluster using DM-Ansible. | tools |
This document introduces the DM cluster operations and considerations when you administer a DM cluster using DM-Ansible.
Run the following command to start all the components (including DM-master, DM-worker and the monitoring component) of the whole DM cluster:
$ ansible-playbook start.yml
Run the following command to stop all the components (including DM-master, DM-worker and the monitoring component) of the whole DM cluster:
$ ansible-playbook stop.yml
You need to restart the DM cluster components in the following cases:
- You want to upgrade the component version.
- A serious bug occurs and you have to restart the component for temporary recovery.
- The machine that the DM cluster is located in is restarted for certain reasons.
This section describes the considerations that you need to know when you restart DM components.
In the process of full data loading:
For the SQL files during full data import, DM uses the downstream database to record the checkpoint information. When DM-worker is restarted, it checks the checkpoint information, and you can use the `start-task` command to recover the data synchronization task automatically.
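For example, after the restarted DM-worker comes back up, you can resume the task from an interactive dmctl session. The following is a minimal sketch: the task configuration file name `task.yaml` is a placeholder for your own task configuration file, and the `»` line is entered at the dmctl prompt:

$ sh dmctl.sh
» start-task ./task.yaml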
In the process of incremental data synchronization:
For the binlog during incremental data import, DM uses the downstream database to record the checkpoint information, and enables the safe mode within the first 5 minutes after the synchronization task is started or recovered.
- Sharding DDL statements synchronization is not enabled

  If the sharding DDL statements synchronization is not enabled in the task running on DM-worker, when DM-worker is restarted, it checks the checkpoint information and you can use the `start-task` command to recover the data synchronization task automatically.

- Sharding DDL statements synchronization is enabled

  - When DM is synchronizing the sharding DDL statements, if DM-worker successfully executes (or skips) the sharding DDL binlog event, then the checkpoints of all tables related to the sharding DDL in the DM-worker are updated to the position after the binlog event corresponding to the DDL statement.

  - When DM-worker is restarted before or after synchronizing sharding DDL statements, it checks the checkpoint information and you can use the `start-task` command to recover the data synchronization task automatically.

  - When DM-worker is restarted during the process of synchronizing sharding DDL statements, the issue might occur that the DM-worker owner has executed the DDL statement and successfully changed the downstream database table schema, while other DM-worker instances are restarted but fail to skip the DDL statement and update the checkpoints.

    At this time, DM tries again to synchronize these DDL statements that are not skipped. However, the restarted DM-worker instances are blocked at the position of the binlog event corresponding to the DDL statement, because the DM-worker instance that is not restarted has already executed past this DDL binlog event.

    To resolve this issue, follow the steps described in Troubleshooting Sharding DDL Locks.
The information maintained by DM-master includes the following two major types, and this data is not persisted when you restart DM-master:
- The corresponding relationship between the task and DM-worker
- The sharding DDL lock related information
When DM-master is restarted, it automatically requests the task information from each DM-worker instance and rebuilds the corresponding relationship between the task and DM-worker. However, DM-worker does not resend the sharding DDL information at this time, so the sharding DDL lock synchronization might fail to complete automatically because the lock information is lost.
To resolve this issue, follow the steps described in Troubleshooting Sharding DDL Locks.
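In both of the restart scenarios above, the manual recovery referenced in Troubleshooting Sharding DDL Locks is driven from dmctl. The following is a minimal sketch; the task name `test` is a hypothetical placeholder, the lock ID comes from the `show-ddl-locks` output, and which command and arguments to use depend on the specific case described in that document:

$ sh dmctl.sh
» show-ddl-locks test          # list the unresolved sharding DDL locks of the task
» unlock-ddl-lock <lock-ID>    # or break-ddl-lock, depending on the case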
The dmctl component is stateless. You can restart it at any time you like.
Note: Try to avoid restarting DM-worker during the process of synchronizing sharding DDL statements.
To restart the DM-worker component, you can use either of the following two approaches:
- Perform a rolling update on DM-worker:

  $ ansible-playbook rolling_update.yml --tags=dm-worker

- Stop DM-worker first and then restart it:

  $ ansible-playbook stop.yml --tags=dm-worker
  $ ansible-playbook start.yml --tags=dm-worker
Note: Try to avoid restarting DM-master during the process of synchronizing sharding DDL statements.
To restart the DM-master component, you can use either of the following two approaches:
- Perform a rolling update on DM-master:

  $ ansible-playbook rolling_update.yml --tags=dm-master

- Stop DM-master first and then restart it:

  $ ansible-playbook stop.yml --tags=dm-master
  $ ansible-playbook start.yml --tags=dm-master
To stop and restart dmctl, use the following command, instead of using DM-Ansible:
$ exit # Stops the running dmctl
$ sh dmctl.sh # Restart dmctl
To upgrade the DM cluster components, perform the following steps:

1. Download the DM binary file.

   1. Delete the existing file in the `downloads` directory.

      $ cd /home/tidb/dm-ansible
      $ rm -rf downloads

   2. Use Playbook to download the latest DM binary file and replace the existing binary in the `/home/tidb/dm-ansible/resource/bin/` directory with it automatically.

      $ ansible-playbook local_prepare.yml

2. Use Ansible to perform the rolling update.

   - Perform a rolling update on the DM-worker instance:

     $ ansible-playbook rolling_update.yml --tags=dm-worker

   - Perform a rolling update on the DM-master instance:

     $ ansible-playbook rolling_update.yml --tags=dm-master

   - Upgrade dmctl:

     $ ansible-playbook rolling_update.yml --tags=dmctl

   - Perform a rolling update on DM-worker, DM-master, and dmctl:

     $ ansible-playbook rolling_update.yml
Assuming that you want to add a DM-worker instance on the `172.16.10.74` machine and the alias of the instance is `dm_worker3`, perform the following steps:
1. Configure the SSH mutual trust and sudo rules on the Control Machine.

   1. Refer to Configure the SSH mutual trust and sudo rules on the Control Machine, log in to the Control Machine using the `tidb` user account, and add `172.16.10.74` to the `[servers]` section of the `hosts.ini` file.

      $ cd /home/tidb/dm-ansible
      $ vi hosts.ini
      [servers]
      172.16.10.74

      [all:vars]
      username = tidb

   2. Run the following command and enter the `root` user password for deploying `172.16.10.74` according to the prompt.

      $ ansible-playbook -i hosts.ini create_users.yml -u root -k

      This step creates a `tidb` user on the `172.16.10.74` machine, and configures the sudo rules and the SSH mutual trust between the Control Machine and the `172.16.10.74` machine.
2. Edit the `inventory.ini` file and add the new DM-worker instance `dm_worker3`.

   [dm_worker_servers]
   dm_worker1 ansible_host=172.16.10.72 server_id=101 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
   dm_worker2 ansible_host=172.16.10.73 server_id=102 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
   dm_worker3 ansible_host=172.16.10.74 server_id=103 mysql_host=172.16.10.83 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
3. Deploy the new DM-worker instance.

   $ ansible-playbook deploy.yml --tags=dm-worker -l dm_worker3

4. Start the new DM-worker instance.

   $ ansible-playbook start.yml --tags=dm-worker -l dm_worker3

5. Configure and restart the DM-master service.

   $ ansible-playbook rolling_update.yml --tags=dm-master

6. Configure and restart the Prometheus service.

   $ ansible-playbook rolling_update_monitor.yml --tags=prometheus
Assuming that you want to remove the `dm_worker3` instance, perform the following steps:
1. Stop the DM-worker instance that you need to remove.

   $ ansible-playbook stop.yml --tags=dm-worker -l dm_worker3

2. Edit the `inventory.ini` file and comment or delete the line where the `dm_worker3` instance exists.

   [dm_worker_servers]
   dm_worker1 ansible_host=172.16.10.72 server_id=101 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
   dm_worker2 ansible_host=172.16.10.73 server_id=102 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
   # dm_worker3 ansible_host=172.16.10.74 server_id=103 mysql_host=172.16.10.83 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306 # Comment or delete this line

3. Configure and restart the DM-master service.

   $ ansible-playbook rolling_update.yml --tags=dm-master

4. Configure and restart the Prometheus service.

   $ ansible-playbook rolling_update_monitor.yml --tags=prometheus
Assuming that the `172.16.10.71` machine needs to be maintained or this machine breaks down, and you need to migrate the DM-master instance from `172.16.10.71` to `172.16.10.80`, perform the following steps:
1. Configure the SSH mutual trust and sudo rules on the Control Machine.

   1. Refer to Configure the SSH mutual trust and sudo rules on the Control Machine, log in to the Control Machine using the `tidb` user account, and add `172.16.10.80` to the `[servers]` section of the `hosts.ini` file.

      $ cd /home/tidb/dm-ansible
      $ vi hosts.ini
      [servers]
      172.16.10.80

      [all:vars]
      username = tidb

   2. Run the following command and enter the `root` user password for deploying `172.16.10.80` according to the prompt.

      $ ansible-playbook -i hosts.ini create_users.yml -u root -k

      This step creates the `tidb` user account on `172.16.10.80`, and configures the sudo rules and the SSH mutual trust between the Control Machine and the `172.16.10.80` machine.
2. Stop the DM-master instance that you need to replace.

   Note: If the `172.16.10.71` machine breaks down and you cannot log in via SSH, ignore this step.

   $ ansible-playbook stop.yml --tags=dm-master
3. Edit the `inventory.ini` file, comment or delete the line where the DM-master instance that you want to replace exists, and add the information of the new DM-master instance.

   [dm_master_servers]
   # dm_master ansible_host=172.16.10.71
   dm_master ansible_host=172.16.10.80
4. Deploy the new DM-master instance.

   $ ansible-playbook deploy.yml --tags=dm-master

5. Start the new DM-master instance.

   $ ansible-playbook start.yml --tags=dm-master

6. Update the dmctl configuration file.

   $ ansible-playbook rolling_update.yml --tags=dmctl
Assuming that the `172.16.10.72` machine needs to be maintained or this machine breaks down, and you need to migrate `dm_worker1` from `172.16.10.72` to `172.16.10.75`, perform the following steps:
1. Configure the SSH mutual trust and sudo rules on the Control Machine.

   1. Refer to Configure the SSH mutual trust and sudo rules on the Control Machine, log in to the Control Machine using the `tidb` user account, and add `172.16.10.75` to the `[servers]` section of the `hosts.ini` file.

      $ cd /home/tidb/dm-ansible
      $ vi hosts.ini
      [servers]
      172.16.10.75

      [all:vars]
      username = tidb

   2. Run the following command and enter the `root` user password for deploying `172.16.10.75` according to the prompt.

      $ ansible-playbook -i hosts.ini create_users.yml -u root -k

      This step creates the `tidb` user account on `172.16.10.75`, and configures the sudo rules and the SSH mutual trust between the Control Machine and the `172.16.10.75` machine.
2. Stop the DM-worker instance that you need to replace.

   Note: If the `172.16.10.72` machine breaks down and you cannot log in via SSH, ignore this step.

   $ ansible-playbook stop.yml --tags=dm-worker -l dm_worker1
3. Edit the `inventory.ini` file, comment or delete the line where the `dm_worker1` instance (`172.16.10.72`) that you want to replace exists, and add the information of the new `dm_worker1` instance (`172.16.10.75`).

   [dm_worker_servers]
   dm_worker1 ansible_host=172.16.10.75 server_id=101 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
   # dm_worker1 ansible_host=172.16.10.72 server_id=101 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
   dm_worker2 ansible_host=172.16.10.73 server_id=102 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
4. Deploy the new DM-worker instance.

   $ ansible-playbook deploy.yml --tags=dm-worker -l dm_worker1

5. Start the new DM-worker instance.

   $ ansible-playbook start.yml --tags=dm-worker -l dm_worker1

6. Configure and restart the DM-master service.

   $ ansible-playbook rolling_update.yml --tags=dm-master

7. Configure and restart the Prometheus service.

   $ ansible-playbook rolling_update_monitor.yml --tags=prometheus
This section describes how to switch between master and slave instances using dmctl in two conditions.
When the upstream master and slave instances are behind a virtual IP, perform the following steps:

1. Use `query-status` to make sure that relay has caught up with the master instance before the switch (`relayCatchUpMaster`).
2. Use `pause-relay` to pause relay.
3. Use `pause-task` to pause all running tasks.
4. The upstream master and slave instances behind the virtual IP execute the switch operation.
5. Use `switch-relay-master` to tell relay to execute the master-slave switch.
6. Use `resume-relay` to make relay resume reading binlog from the new master instance.
7. Use `resume-task` to resume the previous synchronization task.
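The dmctl commands above are issued in order from an interactive dmctl session. The following is a minimal sketch, assuming a task named `test` and a DM-worker address of `172.16.10.72:8262` (both are placeholders; the exact flags may differ between dmctl versions, so use `help <command>` in dmctl to confirm):

$ sh dmctl.sh
» query-status test
» pause-relay -w 172.16.10.72:8262
» pause-task test
# ... perform the master-slave switch on the upstream instances behind the virtual IP ...
» switch-relay-master -w 172.16.10.72:8262
» resume-relay -w 172.16.10.72:8262
» resume-task test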
When there is no virtual IP and you need to change the address of the upstream MySQL instance that DM-worker connects to, perform the following steps:

1. Use `query-status` to make sure that relay has caught up with the master instance before the switch (`relayCatchUpMaster`).
2. Use `stop-task` to stop all running tasks.
3. Modify the DM-worker configuration, and use DM-Ansible to perform a rolling update on DM-worker.
4. Update the `mysql-instances` / `config` configurations in `task.yaml`.
5. Use `start-task` to restart the synchronization task.
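Sketched at the command level, this procedure roughly looks as follows; `»` lines are entered in dmctl and `$` lines in the shell, the task name `test` and file paths are placeholders, and the DM-worker upstream connection information is assumed to be maintained in `inventory.ini` as in the earlier examples:

» query-status test
» stop-task test
$ vi /home/tidb/dm-ansible/inventory.ini    # point mysql_host of the DM-worker to the new upstream address
$ ansible-playbook rolling_update.yml --tags=dm-worker
» start-task ./task.yaml                    # after updating the mysql-instances / config section of task.yaml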