title: Data Migration Task Configuration File
summary: This document introduces the task configuration file of Data Migration.
category: tools

Data Migration Task Configuration File

This document introduces the task configuration file of Data Migration, task.yaml, which consists of the global configuration and the instance configuration.

For the function and configuration of each configuration item, see Data Synchronization Features.

Important concepts

For a description of important concepts, including instance-id and the DM-worker ID, see Important concepts.

Configuration order

  1. Edit the global configuration.
  2. Edit the instance configuration based on the global configuration.
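
The two steps can be sketched as the abbreviated fragment below. It reuses only keys and values that appear in the sections that follow, and it is not a complete task file: the point is only that the global part defines a named rule (here route-rule-1) and the instance part then refers to that rule by name.

```yaml
# Step 1: global configuration. Defines basic settings and named rule sets.
name: test
task-mode: all

target-database:
    host: "192.168.0.1"
    port: 4000
    user: "root"
    password: ""

routes:
    route-rule-1:
        schema-pattern: "test_*"
        table-pattern: "t_*"
        target-schema: "test"
        target-table: "t"

# Step 2: instance configuration. Refers to the names defined in step 1.
mysql-instances:
    -
        source-id: "mysql-replica-01"
        route-rules: ["route-rule-1"]
```

Editing the global part first means that every name listed under mysql-instances already exists when the instance part is written.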

Global configuration

Basic configuration

name: test                      # The name of the task. Should be globally unique.
task-mode: all                  # The task mode. Can be set to `full`/`incremental`/`all`.
is-sharding: true               # Whether it is a sharding task.
meta-schema: "dm_meta"          # The downstream database that stores the `meta` information.
remove-meta: false              # Whether to remove the `meta` information (`checkpoint` and `onlineddl`) before starting the synchronization task.
enable-heartbeat: false         # Whether to enable the heartbeat feature.

target-database:                # Configuration of the downstream database instance.
    host: "192.168.0.1"
    port: 4000
    user: "root"
    password: ""

task-mode

  • Description: the task mode, which specifies the data synchronization task to be executed.
  • Value: string (full, incremental, or all); all by default.
    • full: Only makes a full backup of the upstream database and then imports the full data to the downstream database.
    • incremental: Only synchronizes the incremental data of the upstream database to the downstream database using the binlog.
    • all: full + incremental. Makes a full backup of the upstream database, imports the full data to the downstream database, and then uses the binlog to synchronize incremental data to the downstream database, starting from the position (binlog position/GTID) exported during the full backup.
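
As an illustration of a non-default mode, the fragment below sketches an incremental-only task. It uses only keys shown elsewhere in this document. Pairing incremental with an explicit meta start position is an assumption based on the description of meta in Instance configuration, and the task name test-incremental is hypothetical.

```yaml
name: test-incremental          # Hypothetical task name.
task-mode: incremental          # Skip the full backup and import; use the binlog only.

target-database:
    host: "192.168.0.1"
    port: 4000
    user: "root"
    password: ""

mysql-instances:
    -
        source-id: "mysql-replica-01"
        meta:                   # Start position used when no downstream checkpoint exists.
            binlog-name: binlog-00001
            binlog-pos: 4
        syncer-config-name: "global"
```

If a checkpoint already exists in the downstream database, the checkpoint takes precedence over meta, as described in Instance configuration.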

Feature configuration set

Global configuration includes the following feature configuration set.

# The routing mapping rule set between the upstream and downstream tables.
routes:
    route-rule-1:
        schema-pattern: "test_*"
        table-pattern: "t_*"
        target-schema: "test"
        target-table: "t"
    route-rule-2:
        schema-pattern: "test_*"
        target-schema: "test"

# The binlog event filter rule set of the matched table of the upstream database instance.
filters:
    filter-rule-1:
        schema-pattern: "test_*"
        table-pattern: "t_*"
        events: ["truncate table", "drop table"]
        action: Ignore
    filter-rule-2:
        schema-pattern: "test_*"
        # Only execute all the DML events in the `test_*` schema.
        events: ["All DML"]
        action: Do

# The black and white list filter rule set for the matched tables of the upstream database instance.
black-white-list:
    bw-rule-1:
        do-dbs: ["~^test.*", "user"]
        ignore-dbs: ["mysql", "account"]
        do-tables:
        - db-name: "~^test.*"
          tbl-name: "~^t.*"
        - db-name: "user"
          tbl-name: "information"
        ignore-tables:
        - db-name: "user"
          tbl-name: "log"

# The column mapping rule set of the matched table of the upstream database instance.
column-mappings:
    cm-rule-1:
        schema-pattern: "test_*"
        table-pattern: "t_*"
        expression: "partition id"
        source-column: "id"
        target-column: "id"
        arguments: ["1", "test_", "t_"]
    cm-rule-2:
        schema-pattern: "test_*"
        table-pattern: "t_*"
        expression: "partition id"
        source-column: "id"
        target-column: "id"
        arguments: ["2", "test_", "t_"]

# Configuration arguments of running mydumper.
mydumpers:
    global:
        # The mydumper binary file path. It is generated by the Ansible deployment application automatically and needs no configuration.
        mydumper-path: "./mydumper"
        # The number of threads mydumper uses to dump data from the upstream database instance.
        threads: 16
        # The size of each file generated by mydumper (in MB).
        chunk-filesize: 64
        skip-tz-utc: true
        extra-args: "-B test -T t1,t2 --no-locks"

# Configuration arguments of running Loader.
loaders:
    global:
        # The number of threads that execute mydumper SQL files concurrently in Loader.
        pool-size: 16
        # The directory output by mydumper that Loader reads. Directories for different tasks of the same instance must be different. (mydumper outputs the SQL file based on the directory)
        dir: "./dumped_data"

# Configuration arguments of running Syncer.
syncers:
    global:
        # The number of threads that synchronize binlog events concurrently in Syncer.
        worker-count: 16
        # The number of SQL statements in a transaction batch that Syncer synchronizes to the downstream database.
        batch: 1000
        # The maximum number of retries for a transaction that fails while Syncer synchronizes it to the downstream database (DML operations only).
        max-retry: 100

Instance configuration

This part defines the subtasks of data synchronization. DM supports synchronizing data from one or multiple upstream MySQL instances to the same downstream instance.

mysql-instances:
    -
        source-id: "mysql-replica-01"                                      # The ID of the upstream instance or replication group. It can be configured by referring to the `source_id` in the `inventory.ini` file or the `source-id` in the `dm-master.toml` file.
        meta:                                                              # The position where the binlog synchronization starts when the downstream database checkpoint does not exist. If the checkpoint exists, the checkpoint is used.
            binlog-name: binlog-00001
            binlog-pos: 4

        route-rules: ["route-rule-1", "route-rule-2"]                      # The names of the routing rules that map matched tables of the upstream database instance to the downstream database.
        filter-rules: ["filter-rule-1"]                                    # The names of the binlog event filter rules for matched tables of the upstream database instance.
        column-mapping-rules: ["cm-rule-1"]                                # The names of the column mapping rules for matched tables of the upstream database instance.
        black-white-list:  "bw-rule-1"                                     # The name of the black and white list filter rule for matched tables of the upstream database instance.

        mydumper-config-name: "global"                                     # The mydumper configuration name.
        loader-config-name: "global"                                       # The Loader configuration name.
        syncer-config-name: "global"                                       # The Syncer configuration name.

    -
        source-id: "mysql-replica-02"                                      # The ID of the upstream instance or replication group. It can be configured by referring to the `source_id` in the `inventory.ini` file or the `source-id` in the `dm-master.toml` file.
        mydumper-config-name: "global"                                     # The mydumper configuration name.
        loader-config-name: "global"                                       # The Loader configuration name.
        syncer-config-name: "global"                                       # The Syncer configuration name.

For the configuration details of the above options, see the corresponding part in Feature configuration set, as shown in the following table.

| Option               | Corresponding part |
| :------------------- | :----------------- |
| route-rules          | routes             |
| filter-rules         | filters            |
| column-mapping-rules | column-mappings    |
| black-white-list     | black-white-list   |
| mydumper-config-name | mydumpers          |
| loader-config-name   | loaders            |
| syncer-config-name   | syncers            |
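
To make the name-based mapping concrete, the fragment below sketches two upstream instances that pick different mydumper settings. The second entry name, low-concurrency, and its thread count are hypothetical; the sketch assumes that entries under mydumpers can carry names other than global, in the same way that routes and filters hold several named rules.

```yaml
mydumpers:
    global:
        mydumper-path: "./mydumper"
        threads: 16
        chunk-filesize: 64
    low-concurrency:            # Hypothetical entry name, not from the original example.
        mydumper-path: "./mydumper"
        threads: 4              # Hypothetical value for illustration only.
        chunk-filesize: 64

mysql-instances:
    -
        source-id: "mysql-replica-01"
        mydumper-config-name: "global"            # Resolves to mydumpers.global.
    -
        source-id: "mysql-replica-02"
        mydumper-config-name: "low-concurrency"   # Resolves to mydumpers.low-concurrency.
```

The same lookup applies to the other options in the table: each value in the instance configuration is the name of an entry in the corresponding global section.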