Commit 2d7b5f0

kennytm authored and lilin90 committed
tools/tidb-lightning: document backend and that system DBs are filtered (#1620)
1 parent 67603b4 commit 2d7b5f0

File tree

17 files changed

+731
-12
lines changed


dev/reference/tools/download.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ If you want to download the latest version of [TiDB Lightning](/dev/reference/to

 | Package name | OS | Architecture | SHA256 checksum |
 |:---|:---|:---|:---|
-| [tidb-toolkit-latest-linux-amd64.tar.gz](http://download.pingcap.org/tidb-toolkit-latest-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-latest-linux-amd64.sha256](http://download.pingcap.org/tidb-toolkit-latest-linux-amd64.sha256) |
+| [tidb-toolkit-latest-linux-amd64.tar.gz](https://download.pingcap.org/tidb-toolkit-latest-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-latest-linux-amd64.sha256](https://download.pingcap.org/tidb-toolkit-latest-linux-amd64.sha256) |

 ## DM (Data Migration)

dev/reference/tools/tidb-lightning/config.md

Lines changed: 9 additions & 1 deletion
@@ -89,8 +89,15 @@ driver = "file"
 #keep-after-success = false

 [tikv-importer]
-# The listening address of tikv-importer. Change it to the actual address.
+# Delivery back end, can be "importer" or "tidb".
+# backend = "importer"
+# The listening address of tikv-importer when the back end is "importer". Change it to the actual address.
 addr = "172.16.31.10:8287"
+# Action to take when trying to insert a duplicated entry in the "tidb" back end.
+# - replace: new entry replaces existing entry
+# - ignore: keep existing entry, ignore new entry
+# - error: report error and quit the program
+#on-duplicate = "replace"

 [mydumper]
 # Block size for file reading. Keep it longer than the longest string of

@@ -288,6 +295,7 @@ min-available-ratio = 0.05
 | -V | Prints program version | |
 | -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` |
 | -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
+| --backend *backend* | [Delivery back end](/dev/reference/tools/tidb-lightning/tidb-backend.md) (`importer` or `tidb`) | `tikv-importer.backend` |
 | --log-file *file* | Log file path | `lightning.log-file` |
 | --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
 | --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
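Putting the new settings together, a minimal `[tikv-importer]` section selecting the "TiDB" back end might look like the following sketch (based only on the options documented in this commit; comments restate their documented meaning):

```toml
[tikv-importer]
# Use the "tidb" delivery back end: data is converted to SQL INSERT
# statements and executed on TiDB, so tikv-importer is not needed
# and `addr` can be omitted.
backend = "tidb"
# Resolve unique key conflicts with existing rows by replacing them
# (REPLACE INTO semantics); "ignore" and "error" are the alternatives.
on-duplicate = "replace"
```

The same choice can be made on the command line with `--backend tidb`, which maps to `tikv-importer.backend` as the options table shows.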

dev/reference/tools/tidb-lightning/deployment.md

Lines changed: 3 additions & 1 deletion
@@ -6,7 +6,9 @@ category: reference

 # TiDB Lightning Deployment

-This document describes the hardware requirements of TiDB Lightning on separate deployment and mixed deployment, and how to deploy it using Ansible or manually.
+This document describes the hardware requirements of TiDB Lightning using the default "Importer" back end, and how to deploy it using Ansible or manually.
+
+If you wish to use the "TiDB" back end, also read [TiDB Lightning "TiDB" Back End](/dev/reference/tools/tidb-lightning/tidb-backend.md) for the changes to the deployment steps.

 ## Notes

dev/reference/tools/tidb-lightning/overview.md

Lines changed: 2 additions & 0 deletions
@@ -40,3 +40,5 @@ The complete import process is as follows:

    The auto-increment ID of a table is computed by the estimated *upper bound* of the number of rows, which is proportional to the total file size of the data files of the table. Therefore, the final auto-increment ID is often much larger than the actual number of rows. This is expected since in TiDB auto-increment is [not necessarily allocated sequentially](/dev/reference/mysql-compatibility.md#auto-increment-id).

 7. Finally, `tidb-lightning` switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.
+
+TiDB Lightning also supports using "TiDB" instead of "Importer" as the back end. In this configuration, `tidb-lightning` transforms data into SQL `INSERT` statements and directly executes them on the target cluster, similar to Loader. See [TiDB Lightning "TiDB" Back End](/dev/reference/tools/tidb-lightning/tidb-backend.md) for details.

dev/reference/tools/tidb-lightning/table-filter.md

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,10 @@ ignore-dbs = ["pattern4", "pattern5"]

 The pattern can either be a simple name, or a regular expression in [Go dialect](https://golang.org/pkg/regexp/syntax/#hdr-syntax) if it starts with a `~` character.

+> **Note:**
+>
+> The system databases `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `mysql` and `sys` are always black-listed regardless of the table filter settings.
+
 ## Filtering tables

 ```toml
dev/reference/tools/tidb-lightning/tidb-backend.md

Lines changed: 223 additions & 0 deletions

@@ -0,0 +1,223 @@
---
title: TiDB Lightning "TiDB" Back End
summary: Choose how to write data into the TiDB cluster.
category: reference
---

# TiDB Lightning "TiDB" Back End

TiDB Lightning supports two back ends: "Importer" and "TiDB". The back end determines how `tidb-lightning` delivers data into the target cluster.

The "Importer" back end (default) requires `tidb-lightning` to first encode the SQL or CSV data into KV pairs, and relies on the external `tikv-importer` program to sort these KV pairs and ingest them directly into the TiKV nodes.

The "TiDB" back end requires `tidb-lightning` to encode the data into SQL `INSERT` statements, and executes these statements directly on the TiDB node.

| Back end | "Importer" | "TiDB" |
|:---|:---|:---|
| Speed | Fast (~300 GB/hr) | Slow (~50 GB/hr) |
| Resource usage | High | Low |
| ACID respected while importing | No | Yes |
| Target tables | Must be empty | Can be populated |

## Deployment for "TiDB" back end

When using the "TiDB" back end, you no longer need `tikv-importer`. Compared with the [standard deployment procedure](/dev/reference/tools/tidb-lightning/deployment.md), the "TiDB" back end deployment has the following two differences:

* Steps involving `tikv-importer` can all be skipped.
* The configuration must be changed to indicate that the "TiDB" back end is used.

### Ansible deployment

1. The `[importer_server]` section in `inventory.ini` can be left blank.

    ```ini
    ...

    [importer_server]
    # keep empty

    [lightning_server]
    192.168.20.10

    ...
    ```

2. The `tikv_importer_port` setting in `group_vars/all.yml` is ignored, and the file `group_vars/importer_server.yml` does not need to be changed. But you need to edit `conf/tidb-lightning.yml` and change the `backend` setting to `tidb`.

    ```yaml
    ...
    tikv_importer:
      backend: "tidb"   # <-- change this
    ...
    ```

3. Bootstrap and deploy the cluster as usual.

4. Mount the data source for TiDB Lightning as usual.

5. Start `tidb-lightning` as usual.

### Manual deployment

You do not need to download and configure `tikv-importer`.

Before running `tidb-lightning`, add the following lines into the configuration file:

```toml
[tikv-importer]
backend = "tidb"
```

Alternatively, supply the `--backend tidb` argument when executing `tidb-lightning`.

## Conflict resolution

The "TiDB" back end supports importing into an already-populated table. However, the new data might cause a unique key conflict with the old data. You can control how to resolve the conflict by using this task configuration.

```toml
[tikv-importer]
backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"
```

| Setting | Behavior on conflict | Equivalent SQL statement |
|:---|:---|:---|
| replace | New entries replace old ones | `REPLACE INTO ...` |
| ignore | Keep old entries and ignore new ones | `INSERT IGNORE INTO ...` |
| error | Abort import | `INSERT INTO ...` |

## Migrating from Loader to TiDB Lightning "TiDB" back end

TiDB Lightning using the "TiDB" back end can completely replace the functions of [Loader](/dev/reference/tools/loader.md). The following list shows how to translate Loader configurations into [TiDB Lightning configurations](/dev/reference/tools/tidb-lightning/config.md).

<table>
<thead><tr><th>Loader</th><th>TiDB Lightning</th></tr></thead>
<tbody>
<tr><td>

```toml
# logging
log-level = "info"
log-file = "loader.log"

# Prometheus
status-addr = ":8272"

# concurrency
pool-size = 16
```

</td><td>

```toml
[lightning]
# logging
level = "info"
file = "tidb-lightning.log"

# Prometheus
pprof-port = 8289

# concurrency (better left as default)
#region-concurrency = 16
```

</td></tr>
<tr><td>

```toml
# checkpoint database
checkpoint-schema = "tidb_loader"
```

</td><td>

```toml
[checkpoint]
# checkpoint storage
enable = true
schema = "tidb_lightning_checkpoint"
# by default the checkpoint is stored in
# a local file, which is more efficient.
# but you could still choose to store the
# checkpoints in the target database with
# this setting:
#driver = "mysql"
```

</td></tr>
<tr><td>

```toml
```

</td><td>

```toml
[tikv-importer]
# use the "TiDB" back end
backend = "tidb"
```

</td></tr>
<tr><td>

```toml
# data source directory
dir = "/data/export/"
```

</td><td>

```toml
[mydumper]
# data source directory
data-source-dir = "/data/export"
```

</td></tr>
<tr><td>

```toml
[db]
# TiDB connection parameters
host = "127.0.0.1"
port = 4000

user = "root"
password = ""

#sql-mode = ""
```

</td><td>

```toml
[tidb]
# TiDB connection parameters
host = "127.0.0.1"
port = 4000
status-port = 10080 # <- this is required
user = "root"
password = ""

#sql-mode = ""
```

</td></tr>
</tbody>
</table>
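Assembled from the translation table above, a complete Lightning task configuration equivalent to the sample Loader configuration would look roughly like this sketch (using the same illustrative paths and addresses as the table):

```toml
[lightning]
level = "info"
file = "tidb-lightning.log"
pprof-port = 8289

[checkpoint]
enable = true
schema = "tidb_lightning_checkpoint"

[tikv-importer]
# replaces Loader's direct-SQL delivery
backend = "tidb"

[mydumper]
data-source-dir = "/data/export"

[tidb]
host = "127.0.0.1"
port = 4000
status-port = 10080  # required, unlike Loader's [db] section
user = "root"
password = ""
```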

v2.1/reference/tools/download.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ In addition, the Kafka version of TiDB Binlog is also provided.

 | Package name | OS | Architecture | SHA256 checksum |
 |:---|:---|:---|:---|
-| [tidb-v2.1.16-linux-amd64.tar.gz](http://download.pingcap.org/tidb-v2.1.16-linux-amd64.tar.gz) (TiDB Binlog, TiDB Lightning) | Linux | amd64 |[tidb-v2.1.16-linux-amd64.sha256](http://download.pingcap.org/tidb-v2.1.16-linux-amd64.sha256)|
+| [tidb-v2.1.17-linux-amd64.tar.gz](https://download.pingcap.org/tidb-v2.1.17-linux-amd64.tar.gz) (TiDB Binlog, TiDB Lightning) | Linux | amd64 |[tidb-v2.1.17-linux-amd64.sha256](https://download.pingcap.org/tidb-v2.1.17-linux-amd64.sha256)|
 | [tidb-binlog-kafka-linux-amd64.tar.gz](http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.tar.gz) (the Kafka version of TiDB Binlog) | Linux | amd64 |[tidb-binlog-kafka-linux-amd64.sha256](http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.sha256)|

 ## DM (Data Migration)
## DM (Data Migration)

v3.0/reference/tools/download.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ If you want to download the 3.0 version of [TiDB Lightning](/v3.0/reference/tool

 | Package name | OS | Architecture | SHA256 checksum |
 |:---|:---|:---|:---|
-| [tidb-toolkit-v3.0.3-linux-amd64.tar.gz](http://download.pingcap.org/tidb-toolkit-v3.0.3-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-v3.0.3-linux-amd64.sha256](http://download.pingcap.org/tidb-toolkit-v3.0.3-linux-amd64.sha256) |
+| [tidb-toolkit-v3.0.5-linux-amd64.tar.gz](https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-v3.0.5-linux-amd64.sha256](https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.sha256) |

 ## DM (Data Migration)
3232

v3.0/reference/tools/tidb-lightning/deployment.md

Lines changed: 11 additions & 2 deletions
@@ -6,7 +6,9 @@ category: reference

 # TiDB Lightning Deployment

-This document describes the hardware requirements of TiDB Lightning on separate deployment and mixed deployment, and how to deploy it using Ansible or manually.
+This document describes the hardware requirements of TiDB Lightning using the default "Importer" back end, and how to deploy it using Ansible or manually.
+
+If you wish to use the "TiDB" back end, also read [TiDB Lightning "TiDB" Back End](/v3.0/reference/tools/tidb-lightning/tidb-backend.md) for the changes to the deployment steps.

 ## Notes

@@ -343,8 +345,15 @@ Follow the link to download the TiDB Lightning package (choose the same version
 # keep-after-success = false

 [tikv-importer]
-# The listening address of tikv-importer. Change it to the actual address.
+# Delivery back end, can be "importer" or "tidb".
+# backend = "importer"
+# The listening address of tikv-importer when the back end is "importer". Change it to the actual address.
 addr = "172.16.31.10:8287"
+# Action to take when trying to insert a duplicated entry in the "tidb" back end.
+# - replace: new entry replaces existing entry
+# - ignore: keep existing entry, ignore new entry
+# - error: report error and quit the program
+# on-duplicate = "replace"

 [mydumper]
 # Block size for file reading. Keep it longer than the longest string of

v3.0/reference/tools/tidb-lightning/overview.md

Lines changed: 5 additions & 1 deletion
@@ -36,6 +36,10 @@ The complete import process is as follows:

    There are two kinds of engine files: *data engines* and *index engines*, each corresponding to two kinds of KV pairs: the row data and secondary indices. Normally, the row data are entirely sorted in the data source, while the secondary indices are out of order. Because of this, the data engines are uploaded as soon as a batch is completed, while the index engines are imported only after all batches of the entire table are encoded.

-6. After all engines associated to a table are imported, `tidb-lightning` performs a checksum comparison between the local data source and those calculated from the cluster, to ensure there is no data corruption in the process, and tells TiDB to `ANALYZE` all imported tables, to prepare for optimal query planning.
+6. After all engines associated to a table are imported, `tidb-lightning` performs a checksum comparison between the local data source and those calculated from the cluster, to ensure there is no data corruption in the process; tells TiDB to `ANALYZE` all imported tables, to prepare for optimal query planning; and adjusts the `AUTO_INCREMENT` value so future insertions will not cause conflicts.
+
+   The auto-increment ID of a table is computed by the estimated *upper bound* of the number of rows, which is proportional to the total file size of the data files of the table. Therefore, the final auto-increment ID is often much larger than the actual number of rows. This is expected since in TiDB auto-increment is [not necessarily allocated sequentially](/v3.0/reference/mysql-compatibility.md#auto-increment-id).

 7. Finally, `tidb-lightning` switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.
+
+TiDB Lightning also supports using "TiDB" instead of "Importer" as the back end. In this configuration, `tidb-lightning` transforms data into SQL `INSERT` statements and directly executes them on the target cluster, similar to Loader. See [TiDB Lightning "TiDB" Back End](/v3.0/reference/tools/tidb-lightning/tidb-backend.md) for details.
