tools, readme: add and update 10 DM docs (pingcap#839)
* tools, readme: add and update 10 DM docs

Via: pingcap/tidb-tools#145, pingcap/tidb-tools#151

* tools: address comments

* tools: address more comments and fix the red color

* tools: put Table Selector to a separate doc

* tools: address comments, update code comments and fix format
lilin90 authored Jan 22, 2019
1 parent 9428399 commit 3b6a3d7
Showing 12 changed files with 773 additions and 448 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -123,7 +123,10 @@
- [mydumper](tools/mydumper.md)
- [Loader](tools/loader.md)
+ Data Migration
- [Overview](tools/data-migration-overview.md)
+ Overview
- [Architecture](tools/data-migration-overview.md#architecture)
- [Features](tools/data-migration-overview.md#data-synchronization-introduction)
- [Restrictions](tools/data-migration-overview.md#usage-restrictions)
- [Deploy](tools/data-migration-deployment.md)
- [Synchronize Data](tools/data-migration-practice.md)
+ Configure
85 changes: 58 additions & 27 deletions tools/data-migration-deployment.md
@@ -106,7 +106,7 @@ Make sure you have logged in to the Control Machine using the `root` user accoun
2. Run the following command to download DM-Ansible.
```bash
-$ wget http://download.pingcap.org/dm-ansible.tar.gz
+$ wget http://download.pingcap.org/dm-ansible-latest.tar.gz
```
## Step 4: Install Ansible and its dependencies on the Control Machine
@@ -118,7 +118,8 @@ It is required to use `pip` to install Ansible and its dependencies, otherwise a
1. Install Ansible and the dependencies on the Control Machine:
```bash
-$ tar -xzvf dm-ansible.tar.gz
+$ tar -xzvf dm-ansible-latest.tar.gz
+$ mv dm-ansible-latest dm-ansible
$ cd /home/tidb/dm-ansible
$ sudo pip install -r ./requirements.txt
```
@@ -193,7 +194,7 @@ You can choose one of the following two types of cluster topology according to y
| node3 | 172.16.10.73 | DM-worker2 |

```ini
-## DM modules
+## DM modules.
[dm_master_servers]
dm_master ansible_host=172.16.10.71

@@ -202,7 +203,7 @@ dm_worker1 ansible_host=172.16.10.72 server_id=101 mysql_host=172.16.10.81 mysql

dm_worker2 ansible_host=172.16.10.73 server_id=102 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306

-## Monitoring modules
+## Monitoring modules.
[prometheus_servers]
prometheus ansible_host=172.16.10.71

@@ -212,7 +213,7 @@ grafana ansible_host=172.16.10.71
[alertmanager_servers]
alertmanager ansible_host=172.16.10.71

-## Global variables
+## Global variables.
[all:vars]
cluster_name = test-cluster

@@ -234,21 +235,21 @@ grafana_admin_password = "admin"
| node2 | 172.16.10.72 | DM-worker1-1, DM-worker1-2 |
| node3 | 172.16.10.73 | DM-worker2-1, DM-worker2-2 |

-When you edit the `inventory.ini` file, pay attention to distinguish between the following variables: `server_id`, `deploy_dir`, `dm_worker_port`, and `dm_worker_status_port`.
+When you edit the `inventory.ini` file, pay attention to distinguish between the following variables: `server_id`, `deploy_dir`, and `dm_worker_port`.

```ini
-## DM modules
+## DM modules.
[dm_master_servers]
dm_master ansible_host=172.16.10.71

[dm_worker_servers]
-dm_worker1_1 ansible_host=172.16.10.72 server_id=101 deploy_dir=/data1/dm_worker dm_worker_port=10081 dm_worker_status_port=10082 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
-dm_worker1_2 ansible_host=172.16.10.72 server_id=102 deploy_dir=/data2/dm_worker dm_worker_port=10083 dm_worker_status_port=10084 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
+dm_worker1_1 ansible_host=172.16.10.72 server_id=101 deploy_dir=/data1/dm_worker dm_worker_port=8262 mysql_host=172.16.10.81 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
+dm_worker1_2 ansible_host=172.16.10.72 server_id=102 deploy_dir=/data2/dm_worker dm_worker_port=8263 mysql_host=172.16.10.82 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306

-dm_worker2_1 ansible_host=172.16.10.73 server_id=103 deploy_dir=/data1/dm_worker dm_worker_port=10081 dm_worker_status_port=10082 mysql_host=172.16.10.83 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
-dm_worker2_2 ansible_host=172.16.10.73 server_id=104 deploy_dir=/data2/dm_worker dm_worker_port=10083 dm_worker_status_port=10084 mysql_host=172.16.10.84 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
+dm_worker2_1 ansible_host=172.16.10.73 server_id=103 deploy_dir=/data1/dm_worker dm_worker_port=8262 mysql_host=172.16.10.83 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306
+dm_worker2_2 ansible_host=172.16.10.73 server_id=104 deploy_dir=/data2/dm_worker dm_worker_port=8263 mysql_host=172.16.10.84 mysql_user=root mysql_password='VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=' mysql_port=3306

-## Monitoring modules
+## Monitoring modules.
[prometheus_servers]
prometheus ansible_host=172.16.10.71

@@ -258,7 +259,7 @@ grafana ansible_host=172.16.10.71
[alertmanager_servers]
alertmanager ansible_host=172.16.10.71

-## Global variables
+## Global variables.
[all:vars]
cluster_name = test-cluster

@@ -283,7 +284,7 @@ Edit the `deploy_dir` variable to configure the deployment directory.
The global variable is set to `/home/tidb/deploy` by default, and it applies to all services. If the data disk is mounted on the `/data1` directory, you can set it to `/data1/dm`. For example:

```ini
-## Global variables
+## Global variables.
[all:vars]
deploy_dir = /data1/dm
```
@@ -307,12 +308,15 @@ dm-master ansible_host=172.16.10.71 deploy_dir=/data1/deploy

| Variable name | Description |
| ------------- | ------- |
-| server_id | DM-worker connects to MySQL as a slave. This variable is the server_id of the slave. Keep it globally unique in the MySQL cluster, and the value range is 0 ~ 4294967295. |
+| source_id | DM-worker binds to a unique database instance or a replication group with the master-slave architecture. When the master and slave switch, you only need to update `mysql_host` or `mysql_port` and do not need to update the `source_id`. |
+| server_id | DM-worker connects to MySQL as a slave. This variable is the `server_id` of the slave. Keep it globally unique in the MySQL cluster, and the value range is 0 ~ 4294967295. |
| mysql_host | The upstream MySQL host. |
| mysql_user | The upstream MySQL username; default "root". |
| mysql_password | The upstream MySQL user password. You need to encrypt the password using the `dmctl` tool. See [Encrypt the upstream MySQL user password using dmctl](#encrypt-the-upstream-mysql-user-password-using-dmctl). |
| mysql_port | The upstream MySQL port; default 3306. |
-| enable_gtid | Whether to use GTID for DM-worker to pull the binlog. It supports the MySQL (and MariaDB) GTID. The prerequisite is that the upstream MySQL has enabled the GTID mode. |
+| enable_gtid | Whether DM-worker uses GTID to pull the binlog. The prerequisite is that the upstream MySQL has enabled the GTID mode. |
+| relay_binlog_name | The binlog file from which DM-worker starts to pull the binlog. Used only when the local machine has no valid relay log. |
+| relay_binlog_gtid | The GTID set from which DM-worker starts to pull the binlog. Used only when the local machine has no valid relay log and `enable_gtid` is true. |
+| flavor | The release type of MySQL. For the official version, Percona, and cloud MySQL, fill in "mysql"; for MariaDB, fill in "mariadb". It is "mysql" by default. |

### Encrypt the upstream MySQL user password using dmctl
@@ -325,6 +329,36 @@ $ ./dmctl -encrypt 123456
VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=
```

### Configure the relay log synchronization position

When you start DM-worker for the first time, you need to configure `relay_binlog_name` to specify the position where DM-worker starts to pull the corresponding upstream MySQL or MariaDB binlog.

```ini
[dm_worker_servers]
dm-worker1 ansible_host=172.16.10.72 source_id="mysql-replica-01" server_id=101 relay_binlog_name="binlog.000011" mysql_host=172.16.10.72 mysql_user=root mysql_port=3306

dm-worker2 ansible_host=172.16.10.73 source_id="mysql-replica-02" server_id=102 relay_binlog_name="binlog.000002" mysql_host=172.16.10.73 mysql_user=root mysql_port=3306
```
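
The inventory entries above must stay mutually consistent: each DM-worker needs a globally unique `server_id`, and `relay_binlog_name` should name an actual binlog file such as `binlog.000011`. A small sanity-check sketch of that rule in Python (a hypothetical helper for illustration, not part of DM-Ansible):

```python
import re

def check_worker_vars(workers):
    """Validate a parsed list of DM-worker host-variable dicts.

    Returns a list of human-readable problems (an empty list means OK).
    """
    problems = []
    seen_ids = set()
    for w in workers:
        sid = w.get("server_id")
        if sid in seen_ids:
            problems.append("duplicate server_id: %s" % sid)
        seen_ids.add(sid)
        name = w.get("relay_binlog_name")
        # Binlog file names look like "<base name>.<numeric suffix>".
        if name and not re.fullmatch(r"[\w.-]+\.\d+", name):
            problems.append("suspicious relay_binlog_name: %s" % name)
    return problems

workers = [
    {"server_id": 101, "relay_binlog_name": "binlog.000011"},
    {"server_id": 102, "relay_binlog_name": "binlog.000002"},
]
print(check_worker_vars(workers))  # []
```

A real deployment would parse these dicts out of `inventory.ini`; the point here is only the uniqueness and naming constraints.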

> **Note:** If `relay_binlog_name` is not set, DM-worker pulls the binlog starting from the earliest existing binlog file of the upstream MySQL or MariaDB. In this case, it can take a long period of time to pull the latest binlog for the data synchronization task.

### Enable the relay log GTID synchronization mode

In a DM cluster, the relay log processing unit of DM-worker communicates with the upstream MySQL or MariaDB to pull its binlog to the local file system.

You can enable the relay log GTID synchronization mode by configuring the following items. Currently, DM supports MySQL GTID and MariaDB GTID.

- `enable_gtid`: enables the relay log GTID synchronization mode, to handle scenarios such as the master-slave switch
- `relay_binlog_gtid`: specifies the GTID set from which DM-worker starts to pull the corresponding upstream MySQL or MariaDB binlog

```ini
[dm_worker_servers]
dm-worker1 ansible_host=172.16.10.72 source_id="mysql-replica-01" server_id=101 enable_gtid=true relay_binlog_gtid="aae3683d-f77b-11e7-9e3b-02a495f8993c:1-282967971,cc97fa93-f5cf-11e7-ae19-02915c68ee2e:1-284361339" mysql_host=172.16.10.72 mysql_user=root mysql_port=3306

dm-worker2 ansible_host=172.16.10.73 source_id="mysql-replica-02" server_id=102 relay_binlog_name=binlog.000002 mysql_host=172.16.10.73 mysql_user=root mysql_port=3306
```
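
The `relay_binlog_gtid` value is a standard MySQL GTID set: comma-separated `server_uuid:interval` entries. To make its structure concrete, here is a sketch of a parser for well-formed GTID sets (illustrative only, not DM code):

```python
def parse_gtid_set(gtid_set):
    """Parse a GTID set like 'uuid:1-5,uuid:10-20' into {uuid: [(start, end), ...]}."""
    result = {}
    for part in gtid_set.replace("\n", "").split(","):
        uuid, _, intervals = part.partition(":")
        # A source UUID may carry several colon-separated intervals.
        for interval in intervals.split(":"):
            start, _, end = interval.partition("-")
            result.setdefault(uuid, []).append((int(start), int(end or start)))
    return result

s = ("aae3683d-f77b-11e7-9e3b-02a495f8993c:1-282967971,"
     "cc97fa93-f5cf-11e7-ae19-02915c68ee2e:1-284361339")
print(parse_gtid_set(s))  # two source UUIDs, one transaction interval each
```

Each UUID identifies an upstream server that generated transactions, and each interval is a contiguous range of transaction IDs already contained in the relay log.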

## Step 9: Deploy the DM cluster

When `ansible-playbook` runs a Playbook, the default concurrency is 5. If there are many deployment target machines, you can add the `-f` parameter to increase the concurrency, for example, `ansible-playbook deploy.yml -f 10`.
@@ -334,8 +368,6 @@ The following example uses `tidb` as the user who runs the service.
1. Edit the `dm-ansible/inventory.ini` file to make sure `ansible_user = tidb`.

```ini
-## Connection
-# ssh via normal user
ansible_user = tidb
```

@@ -383,20 +415,18 @@ This operation stops all the components in the entire DM cluster in order, which

| Component | Port variable | Default port | Description |
| :-- | :-- | :-- | :-- |
-| DM-master | `dm_master_port` | 11080 | DM-master service communication port |
-| DM-master | `dm_master_status_port` | 11081 | DM-master status port |
-| DM-worker | `dm_worker_port` | 10081 | DM-worker service communication port |
-| DM-worker | `dm_worker_status_port` | 10082 | DM-worker status port |
+| DM-master | `dm_master_port` | 8261 | DM-master service communication port |
+| DM-worker | `dm_worker_port` | 8262 | DM-worker service communication port |
| Prometheus | `prometheus_port` | 9090 | Prometheus service communication port |
| Grafana | `grafana_port` | 3000 | The port for the external service of web monitoring service and client (browser) access |
| Alertmanager | `alertmanager_port` | 9093 | Alertmanager service communication port |
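
When several DM-workers are deployed on one machine (as in the second topology above), every `(host, port)` pair must be unique. A quick illustrative conflict check (a hypothetical helper, not shipped with DM-Ansible):

```python
from collections import defaultdict

def find_port_conflicts(hosts):
    """hosts: list of (ansible_host, port) pairs.

    Returns {(host, port): count} for every pair used more than once.
    """
    counts = defaultdict(int)
    for host, port in hosts:
        counts[(host, port)] += 1
    return {k: v for k, v in counts.items() if v > 1}

deployment = [
    ("172.16.10.72", 8262), ("172.16.10.72", 8263),  # two workers, distinct ports: OK
    ("172.16.10.73", 8262), ("172.16.10.73", 8262),  # same port on one host: conflict
]
print(find_port_conflicts(deployment))  # {('172.16.10.73', 8262): 2}
```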

### Customize ports

-Go to the `inventory.ini` file and add related host variable of the corresponding service port after the service IP:
+Edit the `inventory.ini` file and add the related host variable of the corresponding service port after the service IP:

```ini
-dm_master ansible_host=172.16.10.71 dm_master_port=12080 dm_master_status_port=12081
+dm_master ansible_host=172.16.10.71 dm_master_port=18261
```

### Update DM-Ansible
@@ -412,8 +442,9 @@
```
$ cd /home/tidb
-$ wget http://download.pingcap.org/dm-ansible.tar.gz
-$ tar -xzvf dm-ansible.tar.gz
+$ wget http://download.pingcap.org/dm-ansible-latest.tar.gz
+$ tar -xzvf dm-ansible-latest.tar.gz
+$ mv dm-ansible-latest dm-ansible
```
3. Migrate the `inventory.ini` configuration file.
106 changes: 59 additions & 47 deletions tools/data-migration-manage-task.md
@@ -17,11 +17,14 @@ This section shows the basic usage of dmctl commands.
```bash
$ ./dmctl --help
Usage of dmctl:
-  -V prints version and exit # Prints the version information.
-  -encrypt string # Encrypts the database password according to the encryption method provided by DM; used in DM configuration files.
-        encrypt plaintext to ciphertext
-  -master-addr string # dm-master access address. dmctl interacts with dm-master to complete task management operations.
-        master API server addr
+  # Prints the version information.
+  -V prints version and exit
+  # Encrypts the database password according to the encryption method provided by DM; used in DM configuration files.
+  -encrypt string
+        encrypt plaintext to ciphertext
+  # The dm-master access address. dmctl interacts with dm-master to complete task management operations.
+  -master-addr string
+        master API server addr
```

### Database password encryption
@@ -36,7 +39,8 @@ VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=
### Task management overview

```bash
-$ ./dmctl -master-addr 172.16.30.14 # Enters the command line mode to interact with DM-master
+# Enters the command line mode to interact with DM-master.
+$ ./dmctl -master-addr 172.16.30.14
Welcome to dmctl
Release Version: v1.0.0-100-g2bef6f8b
Git Commit Hash: 2bef6f8beda34c0dff57377005c71589b48aa3c5
@@ -91,10 +95,7 @@ This section describes how to use the task management commands to execute the fo

### Create the data synchronization task

-You can use the task management command to create the data synchronization task. When you create the data management task, DM checks the privilege of upstream database instances and the table schema. For the table schemas of all sharded tables in the sharding data synchronization task, DM executes the following two checks:
-
-- Whether the auto-increment and unique column exists in the table, whether the corresponding `partition id` type of column mapping rule exists, and whether a conflict exists
-- Whether the upstream and downstream table schemas to be synchronized are consistent
+You can use the task management command to create the data synchronization task. Data Migration [prechecks the corresponding privileges and configuration automatically](#precheck-the-upstream-mysql-instance-configuration) while starting the data synchronization.

```bash
» help start-task
@@ -142,6 +143,54 @@ start-task [ -w "172.16.30.15:10081"] ./task.yaml
}
```
## Precheck the upstream MySQL instance configuration

To detect possible errors of data synchronization configuration in advance, DM provides the precheck feature. You can use the `check-task` command to precheck whether the upstream MySQL instance configuration satisfies the DM requirements.

The user of the upstream and downstream databases must have the corresponding read and write privileges. DM checks the following privileges and configuration automatically while starting the data synchronization task:
+ MySQL binlog configuration

    - Whether the binlog is enabled (DM requires that the binlog must be enabled)
    - Whether `binlog_format=ROW` (DM only supports the binlog synchronization in the ROW format)
    - Whether `binlog_row_image=FULL` (DM only supports `binlog_row_image=FULL`)

+ The privileges of the upstream MySQL instance user

    The MySQL user in DM configuration needs to have the following privileges at least:

    - REPLICATION SLAVE
    - REPLICATION CLIENT
    - RELOAD
    - SELECT

+ The compatibility of the upstream MySQL table schema

    TiDB differs from MySQL in compatibility in the following aspects:

    - Does not support the foreign key
    - [Character set compatibility differs](../sql/character-set-support.md)

+ The consistency check on the upstream MySQL multiple-instance shards

    - The schema consistency of all sharded tables

        - Column size
        - Column name
        - Column position
        - Column type
        - Primary key
        - Unique index

    - The conflict of the auto increment primary keys in the sharded tables

        - The check fails in the following two conditions:

            - The auto increment primary key exists in the sharded tables and its column type *is not* bigint.
            - The auto increment primary key exists in the sharded tables and its column type *is* bigint, but column mapping *is not* configured.

        - The check succeeds in other conditions except the two above.
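
The auto-increment primary key rule boils down to a small decision function. Here is a sketch of that logic in Python (illustrative only, not DM's actual implementation):

```python
def autoincrement_pk_check(has_auto_inc_pk, column_type, column_mapping_configured):
    """Mirror the two failure conditions of the sharded-table precheck."""
    if not has_auto_inc_pk:
        # No auto-increment primary key: nothing to conflict on.
        return "pass"
    if column_type != "bigint":
        return "fail: auto-increment primary key is not bigint"
    if not column_mapping_configured:
        return "fail: bigint auto-increment primary key without column mapping"
    return "pass"

print(autoincrement_pk_check(True, "int", False))     # fails: not bigint
print(autoincrement_pk_check(True, "bigint", False))  # fails: no column mapping
print(autoincrement_pk_check(True, "bigint", True))   # passes
```

Column mapping is what disambiguates the auto-increment values generated independently by each shard, which is why a bigint key without it still fails the check.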
### Check the data synchronization task status

You can use the task management command to check the status of the data synchronization task.
@@ -499,43 +548,6 @@ update-task [-w "127.0.0.1:10181"] ./task.yaml
}
```
## Check the upstream MySQL instance configuration

To check whether the upstream MySQL instance configuration satisfies the DM requirements, use the `check-task` command.

The user of the upstream and downstream databases must have the corresponding read and write privileges. Data Migration checks the following privileges automatically while starting the data synchronization task:
+ MySQL binlog configuration

    - Whether the binlog is enabled (DM requires that the binlog must be enabled)
    - Whether `binlog_format=ROW` (DM only supports the binlog synchronization in the ROW format)
    - Whether `binlog_row_image=FULL` (DM only supports `binlog_row_image=FULL`)

+ The privileges of the upstream MySQL instance user

    The MySQL user in DM configuration needs to have the following privileges at least:

    - REPLICATION SLAVE
    - REPLICATION CLIENT
    - RELOAD
    - SELECT

+ The compatibility of the upstream MySQL table schema

    TiDB differs from MySQL in compatibility in the following aspects:

    - Does not support the foreign key
    - [Character set compatibility differs](../sql/character-set-support.md)

+ The consistency check on the upstream MySQL multiple-instance shards

    - The consistency of the table schema

        - Column name, type
        - Index

    - Whether the auto increment primary key that conflicts during merging exists
## Manage the DDL locks

See [Troubleshooting Sharding DDL Locks](../tools/troubleshooting-sharding-ddl-locks.md).
