diff --git a/TOC.md b/TOC.md index 45aa62eda58d6..294dfb33cad2f 100644 --- a/TOC.md +++ b/TOC.md @@ -159,7 +159,7 @@ + [BR Tool Overview](/br/backup-and-restore-tool.md) + [Use BR Command-line for Backup and Restoration](/br/backup-and-restore-tool.md) + [BR Use Cases](/br/backup-and-restore-use-cases.md) - + [BR Storages](/br/backup-and-restore-storages.md) + + [External Storages](/br/backup-and-restore-storages.md) + [BR FAQ](/br/backup-and-restore-faq.md) + TiDB Binlog + [Overview](/tidb-binlog/tidb-binlog-overview.md) diff --git a/br/backup-and-restore-storages.md b/br/backup-and-restore-storages.md index 4f38a7a34f47b..ee36c690df549 100644 --- a/br/backup-and-restore-storages.md +++ b/br/backup-and-restore-storages.md @@ -1,11 +1,11 @@ --- -title: BR Storages -summary: Describes the storage URL format used in BR. +title: External Storages +summary: Describes the storage URL format used in BR, TiDB Lightning, and Dumpling. --- -# BR Storages +# External Storages -BR supports reading and writing data on the local filesystem, as well as on Amazon S3 and Google Cloud Storage. These are distinguished by the URL scheme in the `--storage` parameter passed into BR. +Backup & Restore (BR), TiDB Lighting, and Dumpling support reading and writing data on the local filesystem and on Amazon S3. BR also supports reading and writing data on the Google Cloud Storage (GCS). These are distinguished by the URL scheme in the `--storage` parameter passed into BR, in the `-d` parameter passed into TiDB Lightning, and in the `--output` (`-o`) parameter passed into Dumpling. ## Schemes @@ -18,19 +18,40 @@ The following services are supported: | Google Cloud Storage (GCS) | gcs, gs | `gcs://bucket-name/prefix/of/dest/` | | Write to nowhere (for benchmarking only) | noop | `noop://` | -## Parameters +## URL parameters Cloud storages such as S3 and GCS sometimes require additional configuration for connection. You can specify parameters for such configuration. For example: -{{< copyable "shell-regular" >}} ++ Use Dumpling to export data to S3: -```shell -./br backup full -u 127.0.0.1:2379 -s 's3://bucket-name/prefix?region=us-west-2' -``` + {{< copyable "shell-regular" >}} + + ```bash + ./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \ + -o 's3://my-bucket/sql-backup?region=us-west-2' + ``` + ++ Use TiDB Lightning to import data from S3: + + {{< copyable "shell-regular" >}} + + ```bash + ./tidb-lightning --tidb-port=4000 --pd-urls=127.0.0.1:2379 --backend=local --sorted-kv-dir=/tmp/sorted-kvs \ + -d 's3://my-bucket/sql-backup?region=us-west-2' + ``` + ++ Use BR to back up data to GCS: -### S3 parameters + {{< copyable "shell-regular" >}} -| Parameter | Description | + ```bash + ./br backup full -u 127.0.0.1:2379 \ + -s 'gcs://bucket-name/prefix' + ``` + +### S3 URL parameters + +| URL parameter | Description | |----------:|---------| | `access-key` | The access key | | `secret-access-key` | The secret access key | @@ -45,30 +66,64 @@ Cloud storages such as S3 and GCS sometimes require additional configuration for > **Note:** > -> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. BR tries to infer these keys from the environment in the following order: +> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. The migration tools try to infer these keys from the environment in the following order: 1. 
`$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables 2. `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables -3. Shared credentials file on the BR node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable -4. Shared credentials file on the BR node at `~/.aws/credentials` +3. Shared credentials file on the tool node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable +4. Shared credentials file on the tool node at `~/.aws/credentials` 5. Current IAM role of the Amazon EC2 container 6. Current IAM role of the Amazon ECS task -### GCS parameters +### GCS URL parameters -| Parameter | Description | +| URL parameter | Description | |----------:|---------| -| `credentials-file` | The path to the credentials JSON file on the TiDB node | +| `credentials-file` | The path to the credentials JSON file on the tool node | | `storage-class` | Storage class of the uploaded objects (for example, `STANDARD`, `COLDLINE`) | | `predefined-acl` | Predefined ACL of the uploaded objects (for example, `private`, `project-private`) | -When `credentials-file` is not specified, BR will try to infer the credentials from the environment, in the following order: +When `credentials-file` is not specified, the migration tool will try to infer the credentials from the environment, in the following order: -1. Content of the file on the BR node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable -2. Content of the file on the BR node at `~/.config/gcloud/application_default_credentials.json` +1. Content of the file on the tool node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable +2. Content of the file on the tool node at `~/.config/gcloud/application_default_credentials.json` 3. When running in GCE or GAE, the credentials fetched from the metadata server. -## Sending credentials to TiKV +## Command-line parameters + +In addition to the URL parameters, BR and Dumpling also support specifying these configurations using command-line parameters. For example: + +{{< copyable "shell-regular" >}} + +```bash +./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \ + -o 's3://my-bucket/sql-backup' \ + --s3.region 'us-west-2' +``` + +If you have specified URL parameters and command-line parameters at the same time, the URL parameters are overwritten by the command-line parameters. + +### S3 command-line parameters + +| Command-line parameter | Description | +|----------:|------| +| `--s3.region` | Amazon S3's service region, which defaults to `us-east-1`. | +| `--s3.endpoint` | The URL of custom endpoint for S3-compatible services. For example, `https://s3.example.com/`. | +| `--s3.storage-class` | The storage class of the upload object. For example, `STANDARD` and `STANDARD_IA`. | +| `--s3.sse` | The server-side encryption algorithm used to encrypt the upload. The value options are empty, `AES256` and `aws:kms`. | +| `--s3.sse-kms-key-id` | If `--s3.sse` is configured as `aws:kms`, this parameter is used to specify the KMS ID. | +| `--s3.acl` | The canned ACL of the upload object. For example, `private` and `authenticated-read`. | +| `--s3.provider` | The type of the S3-compatible service. The supported types are `aws`, `alibaba`, `ceph`, `netease` and `other`. | + +### GCS command-line parameters + +| Command-line parameter | Description | +|----------:|---------| +| `--gcs.credentials-file` | The path of the JSON-formatted credential on the tool node. 
| +| `--gcs.storage-class` | The storage type of the upload object, such as `STANDARD` and `COLDLINE`. | +| `--gcs.predefined-acl` | The pre-defined ACL of the upload object, such as `private` and `project-private`. | + +## BR sending credentials to TiKV By default, when using S3 and GCS destinations, BR will send the credentials to every TiKV nodes to reduce setup complexity. @@ -76,6 +131,16 @@ However, this is unsuitable on cloud environment, where every node has their own {{< copyable "shell-regular" >}} -```shell +```bash ./br backup full -c=0 -u pd-service:2379 -s 's3://bucket-name/prefix' ``` + +When using SQL statements to [back up](/sql-statements/sql-statement-backup.md) and [restore](/sql-statements/sql-statement-restore.md) data, you can add the `SEND_CREDENTIALS_TO_TIKV = FALSE` option: + +{{< copyable "sql" >}} + +```sql +BACKUP DATABASE * TO 's3://bucket-name/prefix' SEND_CREDENTIALS_TO_TIKV = FALSE; +``` + +This option is not supported in TiDB Lightning and Dumpling, because the two applications are currently standalone. diff --git a/br/backup-and-restore-tool.md b/br/backup-and-restore-tool.md index f8724bf9a598d..f3a3e0c9aa4b2 100644 --- a/br/backup-and-restore-tool.md +++ b/br/backup-and-restore-tool.md @@ -164,7 +164,7 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d > **Note:** > -> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [BR Storages](/br/backup-and-restore-storages.md) document. +> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [External Storages](/br/backup-and-restore-storages.md#url-parameters) document. - [Back up Data to S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br) - [Restore Data from S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/restore-from-aws-s3-using-br) @@ -178,4 +178,4 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d - [Use BR Command-line](/br/use-br-command-line-tool.md) - [BR Use Cases](/br/backup-and-restore-use-cases.md) - [BR FAQ](/br/backup-and-restore-faq.md) -- [BR Storages](/br/backup-and-restore-storages.md) +- [External Storages](/br/backup-and-restore-storages.md) diff --git a/dumpling-overview.md b/dumpling-overview.md index 6f145e69ac92e..b958c78dd3802 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -165,7 +165,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey} export AWS_SECRET_ACCESS_KEY=${SecretKey} ``` -Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see the configuration of [BR storages](/br/backup-and-restore-storages.md), which is consistent with the Dumpling configuration. +Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see the configuration of [External storages](/br/backup-and-restore-storages.md). When you back up data using Dumpling, explicitly specify the `--s3.region` parameter, which means the region of the S3 storage: @@ -317,7 +317,7 @@ After your operation is completed, set the GC time back (the default value is `1 SET GLOBAL tidb_gc_life_time = '10m'; ``` -Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-backends.md). +Finally, all the exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-backends.md). 
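For example, a minimal sketch of such an import using the local backend (the TiDB port, PD address, sorted-KV directory, and source path below are placeholders; adjust them to your deployment):

{{< copyable "shell-regular" >}}

```bash
./tidb-lightning --tidb-port=4000 --pd-urls=127.0.0.1:2379 --backend=local --sorted-kv-dir=/tmp/sorted-kvs \
    -d 's3://my-bucket/sql-backup?region=us-west-2'
```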
## Option list of Dumpling @@ -341,7 +341,7 @@ Finally, all the exported data can be imported back to TiDB using [Lightning](/t | `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes | | `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. | | `--filetype` | Exported file type (csv/sql) | "sql" | -| `-o` or `--output` | Exported file path | "./export-${time}" | +| `-o` or `--output` | The path of exported local files or [the URL of the external storage](/br/backup-and-restore-storages.md) | "./export-${time}" | | `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. | | `--consistency` | flush: use FTWRL before the dump
snapshot: dump the TiDB data of a specific snapshot of a TSO <br/> lock: execute `lock tables read` on all tables to be dumped <br/> none: dump without adding locks, which cannot guarantee consistency <br/>
auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" | | `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` | diff --git a/faq/migration-tidb-faq.md b/faq/migration-tidb-faq.md index 48fdcdd53beae..a23f768423357 100644 --- a/faq/migration-tidb-faq.md +++ b/faq/migration-tidb-faq.md @@ -123,5 +123,5 @@ If the amount of data that needs to be deleted at a time is very large, this loo ### How to improve the data loading speed in TiDB? -- The [Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data). +- The [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data). - Data loading in TiDB is related to the status of disks and the whole cluster. When loading data, pay attention to metrics like the disk usage rate of the host, TiClient Error, Backoff, Thread CPU and so on. You can analyze the bottlenecks using these metrics. diff --git a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md index 179562c94b4ed..63947a8694bd8 100644 --- a/sql-statements/sql-statement-backup.md +++ b/sql-statements/sql-statement-backup.md @@ -97,7 +97,7 @@ BACKUP DATABASE * TO 'local:///mnt/backup/full/'; Note that the system tables (`mysql.*`, `INFORMATION_SCHEMA.*`, `PERFORMANCE_SCHEMA.*`, …) will not be included into the backup. -### Remote destinations +### External storages BR supports backing up data to S3 or GCS: @@ -107,7 +107,7 @@ BR supports backing up data to S3 or GCS: BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?region=us-west-2&access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}'; ``` -The URL syntax is further explained in [BR storages](/br/backup-and-restore-storages.md). +The URL syntax is further explained in [External Storages](/br/backup-and-restore-storages.md). 
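Because access keys embedded in the URL are logged in plain text, a sketch of the same backup that omits them and instead relies on credentials inferred from the environment (for example, the `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables, or an IAM role on the nodes) looks like this:

{{< copyable "sql" >}}

```sql
BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?region=us-west-2';
```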
When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md index 5d657c4f6a3bb..10968985844f5 100644 --- a/sql-statements/sql-statement-restore.md +++ b/sql-statements/sql-statement-restore.md @@ -88,7 +88,7 @@ RESTORE DATABASE `test` FROM 'local:///mnt/backup/2020/04/'; RESTORE TABLE `test`.`sbtest01`, `test`.`sbtest02` FROM 'local:///mnt/backup/2020/04/'; ``` -### Remote destinations +### External storages BR supports restoring data from S3 or GCS: @@ -98,7 +98,7 @@ BR supports restoring data from S3 or GCS: RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/?region=us-west-2'; ``` -The URL syntax is further explained in [BR storages](/br/backup-and-restore-storages.md). +The URL syntax is further explained in [External Storages](/br/backup-and-restore-storages.md). When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`: diff --git a/table-filter.md b/table-filter.md index e69d76af8fbeb..774c3c91705c5 100644 --- a/table-filter.md +++ b/table-filter.md @@ -35,7 +35,7 @@ Table filters can be applied to the tools using multiple `-f` or `--filter` comm # ^~~~~~~~~~~~~~~~~~~~~~~ ``` -* [Lightning](/tidb-lightning/tidb-lightning-overview.md): +* [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md): {{< copyable "shell-regular" >}} @@ -48,7 +48,7 @@ Table filters can be applied to the tools using multiple `-f` or `--filter` comm Table filters in TOML files are specified as [array of strings](https://toml.io/en/v1.0.0-rc.1#section-15). The following lists the example usage in each tool. -* Lightning: +* TiDB Lightning: ```toml [mydumper] diff --git a/tidb-lightning/monitor-tidb-lightning.md b/tidb-lightning/monitor-tidb-lightning.md index f0536ca823844..2872c2da32cc1 100644 --- a/tidb-lightning/monitor-tidb-lightning.md +++ b/tidb-lightning/monitor-tidb-lightning.md @@ -53,7 +53,7 @@ When you [deploy a TiDB cluster using TiUP](/production-deployment-using-tiup.md | Panel | Series | Description | |:-----|:-----|:-----| -| Import speed | write from lightning | Speed of sending KVs from TiDB Lightning to TiKV Importer, which depends on each table's complexity | +| Import speed | write from TiDB Lightning | Speed of sending KVs from TiDB Lightning to TiKV Importer, which depends on each table's complexity | | Import speed | upload to tikv | Total upload speed from TiKV Importer to all TiKV replicas | | Chunk process duration | | Average time needed to completely encode one single data file | @@ -76,7 +76,7 @@ Sometimes the import speed will drop to zero allowing other parts to catch up. T | Panel | Description | |:-----|:-----| | Memory usage | Amount of memory occupied by each service | -| Number of Lightning Goroutines | Number of running goroutines used by TiDB Lightning | +| Number of TiDB Lightning Goroutines | Number of running goroutines used by TiDB Lightning | | CPU% | Number of logical CPU cores utilized by each service | ### Row 4: Quota @@ -162,11 +162,11 @@ Metrics provided by `tikv-importer` are listed under the namespace `tikv_import_ - **`tikv_import_write_chunk_bytes`** (Histogram) - Bucketed histogram for the uncompressed size of a block of KV pairs received from Lightning. + Bucketed histogram for the uncompressed size of a block of KV pairs received from TiDB Lightning. 
- **`tikv_import_write_chunk_duration`** (Histogram) - Bucketed histogram for the time needed to receive a block of KV pairs from Lightning. + Bucketed histogram for the time needed to receive a block of KV pairs from TiDB Lightning. - **`tikv_import_upload_chunk_bytes`** (Histogram) diff --git a/tidb-lightning/tidb-lightning-checkpoints.md b/tidb-lightning/tidb-lightning-checkpoints.md index c9ea08d9faf95..d7dfba0140cc9 100644 --- a/tidb-lightning/tidb-lightning-checkpoints.md +++ b/tidb-lightning/tidb-lightning-checkpoints.md @@ -5,7 +5,7 @@ summary: Use checkpoints to avoid redoing the previously completed tasks before # TiDB Lightning Checkpoints -Importing a large database usually takes hours or days, and if such long running processes spuriously crashes, it can be very time-wasting to redo the previously completed tasks. To solve this, Lightning uses *checkpoints* to store the import progress, so that `tidb-lightning` continues importing from where it lefts off after restarting. +Importing a large database usually takes hours or days, and if such long running processes spuriously crashes, it can be very time-wasting to redo the previously completed tasks. To solve this, TiDB Lightning uses *checkpoints* to store the import progress, so that `tidb-lightning` continues importing from where it lefts off after restarting. This document describes how to enable, configure, store, and control *checkpoints*. @@ -14,8 +14,8 @@ This document describes how to enable, configure, store, and control *checkpoint ```toml [checkpoint] # Whether to enable checkpoints. -# While importing data, Lightning records which tables have been imported, so -# even if Lightning or some other component crashes, you can start from a known +# While importing data, TiDB Lightning records which tables have been imported, so +# even if TiDB Lightning or some other component crashes, you can start from a known # good state instead of redoing everything. enable = true diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index fd67377c27495..3e322e75dcc7c 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -74,7 +74,7 @@ io-concurrency = 5 [checkpoint] # Whether to enable checkpoints. # While importing data, TiDB Lightning records which tables have been imported, so -# even if Lightning or another component crashes, you can start from a known +# even if TiDB Lightning or another component crashes, you can start from a known # good state instead of redoing everything. enable = true # The schema name (database name) to store the checkpoints. @@ -84,7 +84,7 @@ schema = "tidb_lightning_checkpoint" # - mysql: store into a remote MySQL-compatible database driver = "file" # The data source name (DSN) indicating the location of the checkpoint storage. -# For the "file" driver, the DSN is a path. If the path is not specified, Lightning would +# For the "file" driver, the DSN is a path. If the path is not specified, TiDB Lightning would # default to "/tmp/CHECKPOINT_SCHEMA.pb". # For the "mysql" driver, the DSN is a URL in the form of "USER:PASS@tcp(HOST:PORT)/". # If the URL is not specified, the TiDB server from the [tidb] section is used to @@ -131,7 +131,7 @@ read-block-size = 65536 # Byte (default = 64 KB) # The engine file needs to be imported sequentially. Due to parallel processing, # multiple data engines will be imported at nearly the same time, and this -# creates a queue and wastes resources. 
Therefore, Lightning slightly +# creates a queue and wastes resources. Therefore, TiDB Lightning slightly # increases the size of the first few batches to properly distribute # resources. The scale up factor is controlled by this parameter, which # expresses the ratio of duration between the "import" and "write" steps @@ -142,7 +142,7 @@ read-block-size = 65536 # Byte (default = 64 KB) # This value should be in the range (0 <= batch-import-ratio < 1). batch-import-ratio = 0.75 -# Local source data directory. +# Local source data directory or the URL of the external storage. data-source-dir = "/data/my_database" # If no-schema is set to true, tidb-lightning assumes that the table skeletons # already exist on the target TiDB cluster, and will not execute the `CREATE @@ -150,10 +150,9 @@ data-source-dir = "/data/my_database" no-schema = false # The character set of the schema files, containing CREATE TABLE statements; # only supports one of: -# - utf8mb4: the schema files must be encoded as UTF-8, otherwise Lightning -# will emit errors -# - gb18030: the schema files must be encoded as GB-18030, otherwise -# Lightning will emit errors +# - utf8mb4: the schema files must be encoded as UTF-8; otherwise, an error is reported. +# - gb18030: the schema files must be encoded as GB-18030; otherwise, +# an error is reported # - auto: (default) automatically detects whether the schema is UTF-8 or # GB-18030. An error is reported if the encoding is neither. # - binary: do not try to decode the schema files @@ -165,13 +164,13 @@ character-set = "auto" # Implications of strict-format = true are: # * in CSV, every value cannot contain literal new lines (U+000A and U+000D, or \r and \n) even # when quoted, which means new lines are strictly used to separate rows. -# Strict format allows Lightning to quickly locate split positions of a large file for parallel +# Strict format allows TiDB Lightning to quickly locate split positions of a large file for parallel # processing. However, if the input data is not strict, it may split a valid data in half and # corrupt the result. # The default value is false for safety over speed. strict-format = false -# If strict-format is true, Lightning will split large CSV files into multiple chunks to process in +# If strict-format is true, TiDB Lightning will split large CSV files into multiple chunks to process in # parallel. max-region-size is the maximum size of each chunk after splitting. # max-region-size = 268_435_456 # Byte (default = 256 MB) @@ -265,7 +264,7 @@ analyze = true # Configures the background periodic actions. # Supported units: h (hour), m (minute), s (second). [cron] -# Duration between which Lightning automatically refreshes the import mode +# Duration between which TiDB Lightning automatically refreshes the import mode # status. Should be shorter than the corresponding TiKV setting. switch-mode = "5m" # Duration between which an import progress is printed to the log. @@ -359,7 +358,7 @@ min-available-ratio = 0.05 |:----|:----|:----| | --config *file* | Reads global configuration from *file*. If not specified, the default configuration would be used. 
| | | -V | Prints program version | | -| -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` | +| -d *directory* | Directory or [external storage URL](/br/backup-and-restore-storages.md) of the data dump to read from | `mydumper.data-source-dir` | | -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` | | -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` | | --backend *backend* | [Delivery backend](/tidb-lightning/tidb-lightning-backends.md) (`importer`, `local`, or `tidb`) | `tikv-importer.backend` | @@ -380,7 +379,7 @@ min-available-ratio = 0.05 | --ca *file* | CA certificate path for TLS connection | `security.ca-path` | | --cert *file* | Certificate path for TLS connection | `security.cert-path` | | --key *file* | Private key path for TLS connection | `security.key-path` | -| --server-mode | Start Lightning in server mode | `lightning.server-mode` | +| --server-mode | Start TiDB Lightning in server mode | `lightning.server-mode` | If a command line parameter and the corresponding setting in the configuration file are both provided, the command line parameter will be used. For example, running `./tidb-lightning -L debug --config cfg.toml` would always set the log level to "debug" regardless of the content of `cfg.toml`. diff --git a/tidb-lightning/tidb-lightning-faq.md b/tidb-lightning/tidb-lightning-faq.md index f862ed9ba70c9..7fd4ad9031b29 100644 --- a/tidb-lightning/tidb-lightning-faq.md +++ b/tidb-lightning/tidb-lightning-faq.md @@ -6,7 +6,7 @@ aliases: ['/tidb/v5.0/troubleshoot-tidb-lightning'] # TiDB Lightning FAQs -## What is the minimum TiDB/TiKV/PD cluster version supported by Lightning? +## What is the minimum TiDB/TiKV/PD cluster version supported by TiDB Lightning? The version of TiDB Lightning should be the same as the cluster. If you use the Local-backend mode, the earliest available version is 4.0.0. If you use the Importer-backend mode or the TiDB-backend mode, the earliest available version is 2.0.9, but it is recommended to use the 3.0 stable version. @@ -80,7 +80,7 @@ ADMIN CHECKSUM TABLE `schema`.`table`; 1 row in set (0.01 sec) ``` -## What kind of data source format is supported by Lightning? +## What kind of data source format is supported by TiDB Lightning? TiDB Lightning only supports the SQL dump generated by [Dumpling](/dumpling-overview.md) or [CSV files](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md) stored in the local file system. @@ -166,7 +166,7 @@ With the default settings of 3 replicas, the space requirement of the target TiK ## Can TiKV Importer be restarted while TiDB Lightning is running? -No. Importer stores some information of engines in memory. If `tikv-importer` is restarted, `tidb-lightning` will be stopped due to lost connection. At this point, you need to [destroy the failed checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) as those Importer-specific information is lost. You can restart Lightning afterwards. +No. TiKV Importer stores some information of engines in memory. If `tikv-importer` is restarted, `tidb-lightning` will be stopped due to lost connection. At this point, you need to [destroy the failed checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) as those TiKV Importer-specific information is lost. You can restart TiDB Lightning afterwards. 
See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb-lightning) for the correct sequence. @@ -188,7 +188,7 @@ See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb ## Why does TiDB Lightning report the `could not find first pair, this shouldn't happen` error? -This error occurs possibly because the number of files opened by TiDB Lightning exceeds the system limit when TiDB Lightning reads the sorted local files. In the Linux system, you can use the `ulimit -n` command to confirm whether the value of this system limit is too small. It is recommended that you adjust this value to `1000000` (`ulimit -n 1000000`) during TiDB Lightning import. +This error occurs possibly because the number of files opened by TiDB Lightning exceeds the system limit when TiDB Lightning reads the sorted local files. In the Linux system, you can use the `ulimit -n` command to confirm whether the value of this system limit is too small. It is recommended that you adjust this value to `1000000` (`ulimit -n 1000000`) during the import. ## Import speed is too slow @@ -241,7 +241,7 @@ Try the latest version! Maybe there is new speed improvement. 1. Delete the corrupted data using `tidb-lightning-ctl`, and restart TiDB Lightning to import the affected tables again. {{< copyable "shell-regular" >}} - + ```sh tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all ``` @@ -252,7 +252,7 @@ Try the latest version! Maybe there is new speed improvement. ## `Checkpoint for … has invalid status:` (error code) -**Cause**: [Checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md) is enabled, and TiDB Lightning or TiKV Importer has previously abnormally exited. To prevent accidental data corruption, Lightning will not start until the error is addressed. +**Cause**: [Checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md) is enabled, and TiDB Lightning or TiKV Importer has previously abnormally exited. To prevent accidental data corruption, TiDB Lightning will not start until the error is addressed. The error code is an integer smaller than 25, with possible values of 0, 3, 6, 9, 12, 14, 15, 17, 18, 20, and 21. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later step the exit occurs at. @@ -278,7 +278,7 @@ See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#chec 2. Decrease the value of `table-concurrency` + `index-concurrency` so it is less than `max-open-engines`. -3. Restart `tikv-importer` to forcefully remove all engine files (default to `./data.import/`). This also removes all partially imported tables, which requires Lightning to clear the outdated checkpoints. +3. Restart `tikv-importer` to forcefully remove all engine files (default to `./data.import/`). This also removes all partially imported tables, which requires TiDB Lightning to clear the outdated checkpoints. ```sh tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all @@ -302,9 +302,9 @@ See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#chec **Solutions**: -1. Ensure Lightning and the source database are using the same time zone. +1. Ensure TiDB Lightning and the source database are using the same time zone. - When executing Lightning directly, the time zone can be forced using the `$TZ` environment variable. + When executing TiDB Lightning directly, the time zone can be forced using the `$TZ` environment variable. 
```sh # Manual deployment, and force Asia/Shanghai. diff --git a/tidb-lightning/tidb-lightning-glossary.md b/tidb-lightning/tidb-lightning-glossary.md index 8c5288826eb84..72c7d1ccb45f3 100644 --- a/tidb-lightning/tidb-lightning-glossary.md +++ b/tidb-lightning/tidb-lightning-glossary.md @@ -57,7 +57,7 @@ See also the [FAQs](/tidb-lightning/tidb-lightning-faq.md#checksum-failed-checks A continuous range of source data, normally equivalent to a single file in the data source. -When a file is too large, Lightning may split a file into multiple chunks. +When a file is too large, TiDB Lightning might split a file into multiple chunks. ### Compaction diff --git a/tidb-lightning/tidb-lightning-overview.md b/tidb-lightning/tidb-lightning-overview.md index 9a7d22b2b4e49..3beb6816185ec 100644 --- a/tidb-lightning/tidb-lightning-overview.md +++ b/tidb-lightning/tidb-lightning-overview.md @@ -7,11 +7,16 @@ summary: Learn about Lightning and the whole architecture. [TiDB Lightning](https://github.com/pingcap/tidb-lightning) is a tool used for fast full import of large amounts of data into a TiDB cluster. You can download TiDB Lightning from [here](/download-ecosystem-tools.md#tidb-lightning). -Currently, TiDB Lightning supports reading SQL dump exported via Dumpling or CSV data source. You can use it in the following two scenarios: +Currently, TiDB Lightning can mainly be used in the following two scenarios: - Importing **large amounts** of **new** data **quickly** - Restore all backup data +Currently, TiDB Lightning supports: + +- The data source of the [Dumpling](/dumpling-overview.md), CSV or [Amazon Aurora Parquet](/migrate-from-aurora-using-lightning.md) exported formats. +- Reading data from a local disk or from the Amazon S3 storage. For details, see [External Storages](/br/backup-and-restore-storages.md). + ## TiDB Lightning architecture ![Architecture of TiDB Lightning tool set](/media/tidb-lightning-architecture.png) diff --git a/tidb-troubleshooting-map.md b/tidb-troubleshooting-map.md index 7070a35a85f61..0f0df6cfdec63 100644 --- a/tidb-troubleshooting-map.md +++ b/tidb-troubleshooting-map.md @@ -525,7 +525,7 @@ Check the specific cause for busy by viewing the monitor **Grafana** -> **TiKV** - 6.3.4 `Checkpoint for … has invalid status:(error code)` - - Cause: Checkpoint is enabled, and Lightning/Importer has previously abnormally exited. To prevent accidental data corruption, Lightning will not start until the error is addressed. The error code is an integer less than 25, with possible values as `0, 3, 6, 9, 12, 14, 15, 17, 18, 20 and 21`. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later the exit occurs. + - Cause: Checkpoint is enabled, and Lightning/Importer has previously abnormally exited. To prevent accidental data corruption, TiDB Lightning will not start until the error is addressed. The error code is an integer less than 25, with possible values as `0, 3, 6, 9, 12, 14, 15, 17, 18, 20 and 21`. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later the exit occurs. - Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#checkpoint-for--has-invalid-status-error-code).
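For reference, a sketch of the usual recovery for this error state (assuming the partially imported tables can safely be re-imported after the failed checkpoints are destroyed):

```sh
tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all
```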