*: generalize and link to the external storage docs from Lightning (#…
ti-srebot authored Apr 13, 2021
1 parent c2377e1 commit 4f52d6f
Showing 15 changed files with 136 additions and 67 deletions.
2 changes: 1 addition & 1 deletion TOC.md
@@ -159,7 +159,7 @@
+ [BR Tool Overview](/br/backup-and-restore-tool.md)
+ [Use BR Command-line for Backup and Restoration](/br/backup-and-restore-tool.md)
+ [BR Use Cases](/br/backup-and-restore-use-cases.md)
+ [BR Storages](/br/backup-and-restore-storages.md)
+ [External Storages](/br/backup-and-restore-storages.md)
+ [BR FAQ](/br/backup-and-restore-faq.md)
+ TiDB Binlog
+ [Overview](/tidb-binlog/tidb-binlog-overview.md)
109 changes: 87 additions & 22 deletions br/backup-and-restore-storages.md
@@ -1,11 +1,11 @@
---
title: BR Storages
summary: Describes the storage URL format used in BR.
title: External Storages
summary: Describes the storage URL format used in BR, TiDB Lightning, and Dumpling.
---

# BR Storages
# External Storages

BR supports reading and writing data on the local filesystem, as well as on Amazon S3 and Google Cloud Storage. These are distinguished by the URL scheme in the `--storage` parameter passed into BR.
Backup & Restore (BR), TiDB Lighting, and Dumpling support reading and writing data on the local filesystem and on Amazon S3. BR also supports reading and writing data on the Google Cloud Storage (GCS). These are distinguished by the URL scheme in the `--storage` parameter passed into BR, in the `-d` parameter passed into TiDB Lightning, and in the `--output` (`-o`) parameter passed into Dumpling.

## Schemes

@@ -18,19 +18,40 @@ The following services are supported:
| Google Cloud Storage (GCS) | gcs, gs | `gcs://bucket-name/prefix/of/dest/` |
| Write to nowhere (for benchmarking only) | noop | `noop://` |
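
For instance, a minimal sketch of a BR backup using the `local://` scheme (the path `/mnt/backup/full/` is only an illustration; with local storage each TiKV node writes its own backup files to that path on its own filesystem):

{{< copyable "shell-regular" >}}

```bash
# Back up the whole cluster to a local path on each node.
./br backup full -u 127.0.0.1:2379 -s 'local:///mnt/backup/full/'
```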

## Parameters
## URL parameters

Cloud storages such as S3 and GCS sometimes require additional configuration for connection. You can specify parameters for such configuration. For example:

{{< copyable "shell-regular" >}}
+ Use Dumpling to export data to S3:

```shell
./br backup full -u 127.0.0.1:2379 -s 's3://bucket-name/prefix?region=us-west-2'
```
{{< copyable "shell-regular" >}}

```bash
./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup?region=us-west-2'
```

+ Use TiDB Lightning to import data from S3:

{{< copyable "shell-regular" >}}

```bash
./tidb-lightning --tidb-port=4000 --pd-urls=127.0.0.1:2379 --backend=local --sorted-kv-dir=/tmp/sorted-kvs \
-d 's3://my-bucket/sql-backup?region=us-west-2'
```

+ Use BR to back up data to GCS:

### S3 parameters
{{< copyable "shell-regular" >}}

| Parameter | Description |
```bash
./br backup full -u 127.0.0.1:2379 \
-s 'gcs://bucket-name/prefix'
```

### S3 URL parameters

| URL parameter | Description |
|----------:|---------|
| `access-key` | The access key |
| `secret-access-key` | The secret access key |
@@ -45,37 +66,81 @@ Cloud storages such as S3 and GCS sometimes require additional configuration for

> **Note:**
>
> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. BR tries to infer these keys from the environment in the following order:
> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. The migration tools try to infer these keys from the environment in the following order:

1. `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables
2. `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables
3. Shared credentials file on the BR node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
4. Shared credentials file on the BR node at `~/.aws/credentials`
3. Shared credentials file on the tool node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
4. Shared credentials file on the tool node at `~/.aws/credentials`
5. Current IAM role of the Amazon EC2 container
6. Current IAM role of the Amazon ECS task
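
For example, a sketch of supplying the keys through environment variables (option 1 above) instead of in the storage URL; the placeholder values are illustrative:

{{< copyable "shell-regular" >}}

```bash
# The tools read the credentials from the environment,
# so the URL can omit access-key and secret-access-key.
export AWS_ACCESS_KEY_ID=${AccessKey}
export AWS_SECRET_ACCESS_KEY=${SecretKey}

./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup?region=us-west-2'
```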

### GCS parameters
### GCS URL parameters

| Parameter | Description |
| URL parameter | Description |
|----------:|---------|
| `credentials-file` | The path to the credentials JSON file on the TiDB node |
| `credentials-file` | The path to the credentials JSON file on the tool node |
| `storage-class` | Storage class of the uploaded objects (for example, `STANDARD`, `COLDLINE`) |
| `predefined-acl` | Predefined ACL of the uploaded objects (for example, `private`, `project-private`) |
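
For example, a sketch of passing these parameters in a GCS URL with BR (the bucket, prefix, and key file path are placeholders):

{{< copyable "shell-regular" >}}

```bash
./br backup full -u 127.0.0.1:2379 \
-s 'gcs://bucket-name/prefix?credentials-file=/path/to/key.json&storage-class=COLDLINE'
```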

When `credentials-file` is not specified, BR will try to infer the credentials from the environment, in the following order:
When `credentials-file` is not specified, the migration tool will try to infer the credentials from the environment, in the following order:

1. Content of the file on the BR node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
2. Content of the file on the BR node at `~/.config/gcloud/application_default_credentials.json`
1. Content of the file on the tool node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
2. Content of the file on the tool node at `~/.config/gcloud/application_default_credentials.json`
3. When running in GCE or GAE, the credentials fetched from the metadata server.
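
As a sketch, the same idea for GCS: point the tool at a service account key file through the environment (option 1 above) so that `credentials-file` can be omitted from the URL; the key file path is illustrative:

{{< copyable "shell-regular" >}}

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

./br backup full -u 127.0.0.1:2379 \
-s 'gcs://bucket-name/prefix'
```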

## Sending credentials to TiKV
## Command-line parameters

In addition to the URL parameters, BR and Dumpling also support specifying these configurations using command-line parameters. For example:

{{< copyable "shell-regular" >}}

```bash
./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup' \
--s3.region 'us-west-2'
```

If you have specified URL parameters and command-line parameters at the same time, the URL parameters are overwritten by the command-line parameters.
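
For example, in the following sketch (the bucket name and regions are illustrative), the export goes to the `eu-central-1` region because `--s3.region` overrides the `region` given in the URL:

{{< copyable "shell-regular" >}}

```bash
# --s3.region takes precedence over the region URL parameter.
./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup?region=us-west-2' \
--s3.region 'eu-central-1'
```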

### S3 command-line parameters

| Command-line parameter | Description |
|----------:|------|
| `--s3.region` | Amazon S3's service region, which defaults to `us-east-1`. |
| `--s3.endpoint` | The URL of the custom endpoint for S3-compatible services. For example, `https://s3.example.com/`. |
| `--s3.storage-class` | The storage class of the uploaded object. For example, `STANDARD` and `STANDARD_IA`. |
| `--s3.sse` | The server-side encryption algorithm used to encrypt the upload. The value options are empty, `AES256`, and `aws:kms`. |
| `--s3.sse-kms-key-id` | If `--s3.sse` is configured as `aws:kms`, this parameter is used to specify the KMS ID. |
| `--s3.acl` | The canned ACL of the uploaded object. For example, `private` and `authenticated-read`. |
| `--s3.provider` | The type of the S3-compatible service. The supported types are `aws`, `alibaba`, `ceph`, `netease`, and `other`. |
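
The following is a sketch of pointing BR at an S3-compatible service through these flags (the endpoint URL, bucket, and provider value are placeholders to adapt to your deployment):

{{< copyable "shell-regular" >}}

```bash
./br backup full -u 127.0.0.1:2379 \
-s 's3://bucket-name/prefix' \
--s3.endpoint 'https://s3.example.com/' \
--s3.provider 'other'
```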

### GCS command-line parameters

| Command-line parameter | Description |
|----------:|---------|
| `--gcs.credentials-file` | The path of the JSON-formatted credentials file on the tool node. |
| `--gcs.storage-class` | The storage class of the uploaded object, such as `STANDARD` and `COLDLINE`. |
| `--gcs.predefined-acl` | The predefined ACL of the uploaded object, such as `private` and `project-private`. |
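
Similarly, a sketch of the GCS flags with BR (the key file path is illustrative):

{{< copyable "shell-regular" >}}

```bash
./br backup full -u 127.0.0.1:2379 \
-s 'gcs://bucket-name/prefix' \
--gcs.credentials-file '/path/to/service-account-key.json' \
--gcs.storage-class 'COLDLINE'
```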

## BR sending credentials to TiKV

By default, when using S3 and GCS destinations, BR will send the credentials to every TiKV node to reduce setup complexity.

However, this is unsuitable in cloud environments, where every node has its own role and permissions. In such cases, you need to disable sending credentials by using `--send-credentials-to-tikv=false` (or the short form `-c=0`):

{{< copyable "shell-regular" >}}

```shell
```bash
./br backup full -c=0 -u pd-service:2379 -s 's3://bucket-name/prefix'
```

When using SQL statements to [back up](/sql-statements/sql-statement-backup.md) and [restore](/sql-statements/sql-statement-restore.md) data, you can add the `SEND_CREDENTIALS_TO_TIKV = FALSE` option:

{{< copyable "sql" >}}

```sql
BACKUP DATABASE * TO 's3://bucket-name/prefix' SEND_CREDENTIALS_TO_TIKV = FALSE;
```

This option is not supported in TiDB Lightning and Dumpling, because the two applications are currently standalone.
4 changes: 2 additions & 2 deletions br/backup-and-restore-tool.md
@@ -164,7 +164,7 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d

> **Note:**
>
> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [BR Storages](/br/backup-and-restore-storages.md) document.
> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [External Storages](/br/backup-and-restore-storages.md#url-parameters) document.
- [Back up Data to S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br)
- [Restore Data from S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/restore-from-aws-s3-using-br)
@@ -178,4 +178,4 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d
- [Use BR Command-line](/br/use-br-command-line-tool.md)
- [BR Use Cases](/br/backup-and-restore-use-cases.md)
- [BR FAQ](/br/backup-and-restore-faq.md)
- [BR Storages](/br/backup-and-restore-storages.md)
- [External Storages](/br/backup-and-restore-storages.md)
6 changes: 3 additions & 3 deletions dumpling-overview.md
@@ -165,7 +165,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey}
export AWS_SECRET_ACCESS_KEY=${SecretKey}
```
Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see the configuration of [BR storages](/br/backup-and-restore-storages.md), which is consistent with the Dumpling configuration.
Dumpling also supports reading credential files from `~/.aws/credentials`. For more information about the Dumpling storage configuration, see [External Storages](/br/backup-and-restore-storages.md).
When you back up data using Dumpling, explicitly specify the `--s3.region` parameter, which means the region of the S3 storage:
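
A sketch of such a command (it mirrors the earlier Dumpling example; the bucket name is illustrative):

{{< copyable "shell-regular" >}}

```bash
./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup' \
--s3.region 'us-west-2'
```
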
@@ -317,7 +317,7 @@ After your operation is completed, set the GC time back (the default value is `1
SET GLOBAL tidb_gc_life_time = '10m';
```

Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-backends.md).
Finally, all the exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-backends.md).

## Option list of Dumpling

@@ -341,7 +341,7 @@ Finally, all the exported data can be imported back to TiDB using [Lightning](/t
| `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes |
| `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. |
| `--filetype` | Exported file type (csv/sql) | "sql" |
| `-o` or `--output` | Exported file path | "./export-${time}" |
| `-o` or `--output` | The path of exported local files or [the URL of the external storage](/br/backup-and-restore-storages.md) | "./export-${time}" |
| `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. |
| `--consistency` | flush: use FTWRL before the dump <br/> snapshot: dump the TiDB data of a specific snapshot of a TSO <br/> lock: execute `lock tables read` on all tables to be dumped <br/> none: dump without adding locks, which cannot guarantee consistency <br/> auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" |
| `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` |
2 changes: 1 addition & 1 deletion faq/migration-tidb-faq.md
@@ -123,5 +123,5 @@ If the amount of data that needs to be deleted at a time is very large, this loo
### How to improve the data loading speed in TiDB?
- The [Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data).
- The [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data).
- Data loading in TiDB is related to the status of disks and the whole cluster. When loading data, pay attention to metrics like the disk usage rate of the host, TiClient Error, Backoff, Thread CPU and so on. You can analyze the bottlenecks using these metrics.
4 changes: 2 additions & 2 deletions sql-statements/sql-statement-backup.md
@@ -97,7 +97,7 @@ BACKUP DATABASE * TO 'local:///mnt/backup/full/';

Note that the system tables (`mysql.*`, `INFORMATION_SCHEMA.*`, `PERFORMANCE_SCHEMA.*`, …) will not be included in the backup.

### Remote destinations
### External storages

BR supports backing up data to S3 or GCS:

@@ -107,7 +107,7 @@ BR supports backing up data to S3 or GCS:
BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?region=us-west-2&access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}';
```

The URL syntax is further explained in [BR storages](/br/backup-and-restore-storages.md).
The URL syntax is further explained in [External Storages](/br/backup-and-restore-storages.md).

When running in a cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`:

4 changes: 2 additions & 2 deletions sql-statements/sql-statement-restore.md
@@ -88,7 +88,7 @@ RESTORE DATABASE `test` FROM 'local:///mnt/backup/2020/04/';
RESTORE TABLE `test`.`sbtest01`, `test`.`sbtest02` FROM 'local:///mnt/backup/2020/04/';
```

### Remote destinations
### External storages

BR supports restoring data from S3 or GCS:

@@ -98,7 +98,7 @@ BR supports restoring data from S3 or GCS:
RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/?region=us-west-2';
```

The URL syntax is further explained in [BR storages](/br/backup-and-restore-storages.md).
The URL syntax is further explained in [External Storages](/br/backup-and-restore-storages.md).

When running in a cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`:

4 changes: 2 additions & 2 deletions table-filter.md
@@ -35,7 +35,7 @@ Table filters can be applied to the tools using multiple `-f` or `--filter` comm
# ^~~~~~~~~~~~~~~~~~~~~~~
```

* [Lightning](/tidb-lightning/tidb-lightning-overview.md):
* [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md):

{{< copyable "shell-regular" >}}

@@ -48,7 +48,7 @@

Table filters in TOML files are specified as [array of strings](https://toml.io/en/v1.0.0-rc.1#section-15). The following lists the example usage in each tool.

* Lightning:
* TiDB Lightning:

```toml
[mydumper]
8 changes: 4 additions & 4 deletions tidb-lightning/monitor-tidb-lightning.md
@@ -53,7 +53,7 @@ When you [deploy a TiDB cluster using TiUP](/production-deployment-using-tiup.md
| Panel | Series | Description |
|:-----|:-----|:-----|
| Import speed | write from lightning | Speed of sending KVs from TiDB Lightning to TiKV Importer, which depends on each table's complexity |
| Import speed | write from TiDB Lightning | Speed of sending KVs from TiDB Lightning to TiKV Importer, which depends on each table's complexity |
| Import speed | upload to tikv | Total upload speed from TiKV Importer to all TiKV replicas |
| Chunk process duration | | Average time needed to completely encode one single data file |
@@ -76,7 +76,7 @@ Sometimes the import speed will drop to zero allowing other parts to catch up. T
| Panel | Description |
|:-----|:-----|
| Memory usage | Amount of memory occupied by each service |
| Number of Lightning Goroutines | Number of running goroutines used by TiDB Lightning |
| Number of TiDB Lightning Goroutines | Number of running goroutines used by TiDB Lightning |
| CPU% | Number of logical CPU cores utilized by each service |
### Row 4: Quota
@@ -162,11 +162,11 @@ Metrics provided by `tikv-importer` are listed under the namespace `tikv_import_

- **`tikv_import_write_chunk_bytes`** (Histogram)

Bucketed histogram for the uncompressed size of a block of KV pairs received from Lightning.
Bucketed histogram for the uncompressed size of a block of KV pairs received from TiDB Lightning.

- **`tikv_import_write_chunk_duration`** (Histogram)

Bucketed histogram for the time needed to receive a block of KV pairs from Lightning.
Bucketed histogram for the time needed to receive a block of KV pairs from TiDB Lightning.

- **`tikv_import_upload_chunk_bytes`** (Histogram)
