Commit

Squashed commit of the following:
commit c05d1be
Author: Daniël van Eeden <github@myname.nl>
Date:   Fri Sep 10 09:32:40 2021 +0200

    encryption-at-rest: Update (pingcap#6152)

commit f42e8fd
Author: Morgan Tocker <tocker@gmail.com>
Date:   Fri Sep 10 00:24:39 2021 -0600

    Update system variables for correctness (pingcap#6224)

commit 380d0df
Author: shichun-0415 <89768198+shichun-0415@users.noreply.github.com>
Date:   Fri Sep 10 14:14:39 2021 +0800

    release note: add a check item for feedback-probability (pingcap#6405)

commit b2e70ce
Author: Morgan Tocker <tocker@gmail.com>
Date:   Thu Sep 9 08:14:38 2021 -0600

    basic features: add feature matrix (pingcap#6130)

commit 055dbf2
Author: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com>
Date:   Thu Sep 9 21:00:39 2021 +0800

    release-notes: add 5.2.1 release notes (pingcap#6438)

commit bea12de
Author: Fendy <40378371+septemberfd@users.noreply.github.com>
Date:   Thu Sep 9 16:04:39 2021 +0800

    Enhance TiDB login descriptions - EN (pingcap#6427)

commit 815ed6f
Author: Yini Xu <34967660+YiniXu9506@users.noreply.github.com>
Date:   Thu Sep 9 15:06:39 2021 +0800

    chore: update ci scripts (pingcap#6429)

commit c666f05
Author: Kolbe Kegel <kolbe@pingcap.com>
Date:   Thu Sep 9 00:04:40 2021 -0700

    performance_schema -> information_schema (pingcap#6418)

commit b8d3b32
Author: Yini Xu <34967660+YiniXu9506@users.noreply.github.com>
Date:   Thu Sep 9 14:50:39 2021 +0800

    chore: fix byte encode (pingcap#6428)

commit 7db1f25
Author: Fendy <40378371+septemberfd@users.noreply.github.com>
Date:   Thu Sep 9 12:06:38 2021 +0800

    add doc links to overview.md (pingcap#6422)

commit efd371e
Author: Morgan Tocker <tocker@gmail.com>
Date:   Wed Sep 8 19:56:38 2021 -0600

    system-variables: improve noop functions warning (pingcap#6374)

commit 5a56bcc
Author: Enwei <jinenwei@pingcap.com>
Date:   Wed Sep 8 10:50:58 2021 +0200

    Configuration Options: remove two TiDB's command options (pingcap#6370)

commit 9e53b40
Author: Enwei <jinenwei@pingcap.com>
Date:   Wed Sep 8 10:48:58 2021 +0200

    BR FAQ: add a warning about multi br importing (pingcap#6263)

commit 391e4bb
Author: you06 <you1474600@gmail.com>
Date:   Wed Sep 8 16:46:58 2021 +0800

    update transaction doc (pingcap#6158)

commit 7c6c1de
Author: Enwei <jinenwei@pingcap.com>
Date:   Wed Sep 8 10:42:59 2021 +0200

    TiKV config: fix wrong description about `compaction-readahead-size` (pingcap#6371)

commit 5f16115
Author: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com>
Date:   Wed Sep 8 15:34:58 2021 +0800

    Add description for TiKV Ready handled panel (pingcap#6375)
Liuxiaozhen12 committed Sep 13, 2021
1 parent e57a95f commit 1af9e6f
Showing 36 changed files with 329 additions and 187 deletions.
1 change: 1 addition & 0 deletions TOC.md
@@ -556,6 +556,7 @@
+ Release Notes
+ [All Releases](/releases/release-notes.md)
+ v5.2
+ [5.2.1](/releases/release-5.2.1.md)
+ [5.2.0](/releases/release-5.2.0.md)
+ v5.1
+ [5.1.1](/releases/release-5.1.1.md)
219 changes: 144 additions & 75 deletions basic-features.md

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions br/backup-and-restore-faq.md
@@ -162,3 +162,11 @@ BR does not back up statistics (except in v4.0.9). Therefore, after restoring th
In v4.0.9, BR backs up statistics by default, which consumes too much memory. To ensure that the backup process goes well, the backup for statistics is disabled by default starting from v4.0.10.

If you do not execute `ANALYZE` on the table, TiDB will fail to select the optimized execution plan due to inaccurate statistics. If query performance is not a key concern, you can ignore `ANALYZE`.

## Can I use multiple BR processes at the same time to restore the data of a single cluster?

**It is strongly recommended not** to use multiple BR processes at the same time to restore the data of a single cluster, for the following reasons:

+ When BR restores data, it modifies some global configurations of PD. Therefore, if you use multiple BR processes for data restore at the same time, these configurations might be mistakenly overwritten and cause abnormal cluster status.
+ BR consumes a lot of cluster resources to restore data, so in fact, running BR processes in parallel improves the restore speed only to a limited extent.
+ There has been no test for running multiple BR processes in parallel for data restore, so it is not guaranteed to succeed.
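
One way to avoid accidentally starting a second restore from the same host is to wrap the restore command in a simple lock. The following is a minimal sketch, not a BR feature: the function name `run_single_restore` and the lock path are hypothetical, and the lock only guards processes on the same machine that share the same lock path.

```shell
# Run a command only if no other invocation currently holds the lock.
# The lock is a directory, because `mkdir` either creates it or fails atomically.
run_single_restore() {
    lock_dir="$1"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "another restore appears to be running (lock: $lock_dir)" >&2
        return 1
    fi
    # Run the wrapped command and remember its exit status.
    "$@"
    status=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$lock_dir"
    return "$status"
}
```

For example, `run_single_restore /tmp/br-restore.lock ./br restore full ...` refuses to start while another wrapped restore on that host is still running. It does not prevent restores started from other machines.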
2 changes: 2 additions & 0 deletions br/use-br-command-line-tool.md
@@ -307,6 +307,8 @@ To restore the cluster data, use the `br restore` command. You can add the `full
> - Where each peer is scattered to during restore is random. We don't know in advance which node will read which file.
>
> These can be avoided using shared storage, for example mounting an NFS on the local path, or using S3. With network storage, every node can automatically read every SST file, so these caveats no longer apply.
>
> Also, note that you can only run one restore operation for a single cluster at the same time. Otherwise, unexpected behaviors might occur. For details, see [FAQ](/br/backup-and-restore-faq.md#can-i-use-multiple-br-processes-at-the-same-time-to-restore-the-data-of-a-single-cluster).
### Restore all the backup data
11 changes: 0 additions & 11 deletions command-line-flags-for-tidb-configuration.md
@@ -14,12 +14,6 @@ When you start the TiDB cluster, you can use command-line options or environment
- Default: ""
- This address must be accessible by the rest of the TiDB cluster and the user.

## `--binlog-socket`

- The TiDB services use the unix socket file for internal connections, such as the Pump service
- Default: ""
- You can use "/tmp/pump.sock" to accept the communication of Pump unix socket file.

## `--config`

- The configuration file
@@ -103,11 +97,6 @@ When you start the TiDB cluster, you can use command-line options or environment
- Default: "/tmp/tidb"
- You can use `tidb-server --store=unistore --path=""` to enable a pure in-memory TiDB.

## `--tmp-storage-path`

+ TiDB's temporary storage path
+ Default: `<TMPDIR>/tidb/tmp-storage`

## `--proxy-protocol-networks`

- The list of proxy server's IP addresses allowed to connect to TiDB using the [PROXY protocol](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt).
2 changes: 1 addition & 1 deletion dashboard/dashboard-access.md
@@ -28,7 +28,7 @@ You can use TiDB Dashboard in the following common desktop browsers of a relativ
## Sign in

For the first-time access, TiDB Dashboard displays the user sign in interface, as shown in the image below. You can sign in using the TiDB `root` account.
For the first-time access, TiDB Dashboard displays the user sign in interface, as shown in the image below. You can sign in using the TiDB `root` account. By default, the `root` password is empty.

![Login interface](/media/dashboard/dashboard-access-login.png)

8 changes: 4 additions & 4 deletions download-ecosystem-tools.md
@@ -18,7 +18,7 @@ If you want to download the latest version of [TiDB Binlog](/tidb-binlog/tidb-bi

> **Note:**
>
> `{version}` in the above download link indicates the version number of TiDB. For example, the download link for `v5.2.0` is `https://download.pingcap.org/tidb-v5.2.0-linux-amd64.tar.gz`.
> `{version}` in the above download link indicates the version number of TiDB. For example, the download link for `v5.2.1` is `https://download.pingcap.org/tidb-v5.2.1-linux-amd64.tar.gz`.
## TiDB Lightning

@@ -30,7 +30,7 @@ Download [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) by using t

> **Note:**
>
> `{version}` in the above download link indicates the version number of TiDB Lightning. For example, the download link for `v5.2.0` is `https://download.pingcap.org/tidb-toolkit-v5.2.0-linux-amd64.tar.gz`.
> `{version}` in the above download link indicates the version number of TiDB Lightning. For example, the download link for `v5.2.1` is `https://download.pingcap.org/tidb-toolkit-v5.2.1-linux-amd64.tar.gz`.
## BR (backup and restore)

@@ -42,7 +42,7 @@ Download [BR](/br/backup-and-restore-tool.md) by using the download link in the

> **Note:**
>
> `{version}` in the above download link indicates the version number of BR. For example, the download link for `v5.0.0-beta` is `http://download.pingcap.org/tidb-toolkit-v5.0.0-beta-linux-amd64.tar.gz`.
> `{version}` in the above download link indicates the version number of BR. For example, the download link for `v5.2.1` is `https://download.pingcap.org/tidb-toolkit-v5.2.1-linux-amd64.tar.gz`.
## TiDB DM (Data Migration)

@@ -66,7 +66,7 @@ Download [Dumpling](/dumpling-overview.md) from the links below:

> **Note:**
>
> The `{version}` in the download link is the version number of Dumpling. For example, the link for downloading the `v5.2.0` version of Dumpling is `https://download.pingcap.org/tidb-toolkit-v5.2.0-linux-amd64.tar.gz`. You can view the currently released versions in [Dumpling Releases](https://github.com/pingcap/dumpling/releases).
> The `{version}` in the download link is the version number of Dumpling. For example, the link for downloading the `v5.2.1` version of Dumpling is `https://download.pingcap.org/tidb-toolkit-v5.2.1-linux-amd64.tar.gz`. You can view the currently released versions in [Dumpling Releases](https://github.com/pingcap/dumpling/releases).
>
> Dumpling supports arm64 linux. You can replace `amd64` in the download link with `arm64`, which means the `arm64` version of Dumpling.
82 changes: 60 additions & 22 deletions encryption-at-rest.md
@@ -4,23 +4,47 @@ summary: Learn how to enable encryption at rest to protect sensitive data.
aliases: ['/docs/dev/encryption at rest/']
---

# Encryption at Rest <span class="version-mark">New in v4.0.0</span>
# Encryption at Rest

> **Note:**
>
> If your cluster is deployed on AWS and uses the EBS storage, it is recommended to use the EBS encryption. See [AWS documentation - EBS Encryption](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html). If you are using non-EBS storage on AWS, such as local NVMe storage, it is recommended to use the encryption at rest introduced in this document.
Encryption at rest means that data is encrypted when it is stored. For databases, this feature is also referred to as TDE (transparent data encryption). This is in contrast to encryption in flight (TLS) or encryption in use (rarely used). Encryption at rest can be performed at different layers (SSD drive, file system, cloud vendor, and so on), but having TiKV encrypt data before storing it helps ensure that attackers must authenticate with the database to gain access to data. For example, an attacker who gains access to the physical machine cannot read the data by copying files on disk.

TiKV supports encryption at rest starting from v4.0.0. The feature allows TiKV to transparently encrypt data files using [AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) in [CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation) mode. To enable encryption at rest, an encryption key must be provided by user and this key is called master key. The master key can be provided via AWS KMS (recommended), or specifying a key stored as plaintext in a file. TiKV automatically rotates data keys that it used to encrypt actual data files. Manually rotating the master key can be done occasionally. Note that encryption at rest only encrypts data at rest (namely, on disk) and not while data is transferred over network. It is advised to use TLS together with encryption at rest.
## Encryption support in different TiDB components

In a TiDB cluster, different components use different encryption methods. This section introduces the encryption supports in different TiDB components such as TiKV, TiFlash, PD, and Backup & Restore (BR).

When a TiDB cluster is deployed, the majority of user data is stored on TiKV and TiFlash nodes. Some metadata is stored on PD nodes (for example, secondary index keys used as TiKV Region boundaries). To get the full benefits of encryption at rest, you need to enable encryption for all components. Backups, log files, and data transmitted over the network should also be considered when you implement encryption.

### TiKV

TiKV supports encryption at rest. This feature allows TiKV to transparently encrypt data files using [AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) in [CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation) mode. To enable encryption at rest, the user must provide an encryption key, which is called the master key. TiKV automatically rotates the data keys that it uses to encrypt the actual data files. The master key can be rotated manually when needed. Note that encryption at rest only encrypts data at rest (namely, on disk), not data in transit over the network. It is advised to use TLS together with encryption at rest.

Optionally, you can use AWS KMS for both cloud and on-premises deployments. You can also supply the plaintext master key in a file.

TiKV currently does not exclude encryption keys and user data from core dumps. It is advised to disable core dumps for the TiKV process when using encryption at rest. This is not currently handled by TiKV itself.
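
One common way to disable core dumps for a process started from a shell is to lower the core file size limit before launching it. The following is a minimal sketch, not something TiKV does for you. How the limit is applied in practice depends on how the TiKV process is started (for example, a deployment managed by systemd would typically set `LimitCORE=0` in the service unit instead).

```shell
# Set the core file size limit to 0 so that processes started
# from this shell do not write core dumps.
ulimit -c 0

# Show the limit now in effect for this shell.
ulimit -c
```

The limit is inherited by child processes, so a TiKV process launched from this shell after the `ulimit` call runs with core dumps disabled.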

TiKV tracks encrypted data files using the absolute path of the files. As a result, once encryption is turned on for a TiKV node, the user should not change data file paths configuration such as `storage.data-dir`, `raftstore.raftdb-path`, `rocksdb.wal-dir` and `raftdb.wal-dir`.

### TiFlash

TiFlash supports encryption at rest. Data keys are generated by TiFlash. All files (including data files, schema files, and temporary files) written into TiFlash (including TiFlash Proxy) are encrypted using the current data key. The encryption algorithms, the encryption configuration (in the `tiflash-learner.toml` file) supported by TiFlash, and the meanings of monitoring metrics are consistent with those of TiKV.

If you have deployed TiFlash with Grafana, you can check the **TiFlash-Proxy-Details** -> **Encryption** panel.

### PD

Also from v4.0.0, BR supports S3 server-side encryption (SSE) when backing up to S3. A customer owned AWS KMS key can also be used together with S3 server-side encryption.
Encryption-at-rest for PD is an experimental feature, which is configured in the same way as in TiKV.

## Warnings
### Backups with BR

The current version of TiKV encryption has the following drawbacks. Be aware of these drawbacks before you get started:
BR supports S3 server-side encryption (SSE) when backing up data to S3. A customer-owned AWS KMS key can also be used together with S3 server-side encryption. See [BR S3 server-side encryption](/encryption-at-rest.md#br-s3-server-side-encryption) for details.

* When a TiDB cluster is deployed, the majority of user data is stored in TiKV nodes, and that data will be encrypted when encryption is enabled. However, a small amount of user data is stored in PD nodes as metadata (for example, secondary index keys used as TiKV region boundaries). As of v4.0.0, PD doesn't support encryption at rest. It is recommended to use storage-level encryption (for example, file system encryption) to help protect sensitive data stored in PD.
* TiFlash supports encryption at rest since v4.0.5. For details, refer to [Encryption at Rest for TiFlash](#encryption-at-rest-for-tiflash-new-in-v405). When deploying TiKV with TiFlash earlier than v4.0.5, data stored in TiFlash is not encrypted.
* TiKV currently does not exclude encryption keys and user data from core dumps. It is advised to disable core dumps for the TiKV process when using encryption at rest. This is not currently handled by TiKV itself.
* TiKV tracks encrypted data files using the absolute path of the files. As a result, once encryption is turned on for a TiKV node, the user should not change data file paths configuration such as `storage.data-dir`, `raftstore.raftdb-path`, `rocksdb.wal-dir` and `raftdb.wal-dir`.
* TiKV, TiDB, and PD info logs might contain user data for debugging purposes. The info log and this data in it are not encrypted. It is recommended to enable [log redaction](/log-redaction.md).
### Logging

TiKV, TiDB, and PD info logs might contain user data for debugging purposes. The info log and this data in it are not encrypted. It is recommended to enable [log redaction](/log-redaction.md).

## TiKV encryption at rest

@@ -29,24 +53,42 @@ The current version of TiKV encryption has the following drawbacks. Be aware of
TiKV currently supports encrypting data using AES128, AES192 or AES256, in CTR mode. TiKV uses envelope encryption. As a result, two types of keys are used in TiKV when encryption is enabled.

* Master key. The master key is provided by user and is used to encrypt the data keys TiKV generates. Management of master key is external to TiKV.
* Data key. The data key is generated by TiKV and is the key actually used to encrypt data. The data key is automatically rotated by TiKV.
* Data key. The data key is generated by TiKV and is the key actually used to encrypt data.

The same master key can be shared by multiple instances of TiKV. The recommended way to provide a master key in production is via AWS KMS. Create a customer master key (CMK) through AWS KMS, and then provide the CMK key ID to TiKV in the configuration file. The TiKV process needs access to the KMS CMK while it is running, which can be done by using an [IAM role](https://aws.amazon.com/iam/). If TiKV fails to get access to the KMS CMK, it will fail to start or restart. Refer to AWS documentation for [KMS](https://docs.aws.amazon.com/kms/index.html) and [IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) usage.

Alternatively, if you want to use a custom key, you can supply the master key via a file. The file must contain a 256-bit (32-byte) key encoded as a hex string, end with a newline (namely, `\n`), and contain nothing else. Persisting the key on disk, however, leaks the key, so the key file is only suitable to be stored on a `tmpfs` in RAM.
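
A key file in this format can be produced with standard tools. The following is a minimal sketch, assuming `openssl` is installed. The function name `generate_master_key` and the path used in the usage note are hypothetical.

```shell
# Write a random 256-bit master key, hex-encoded, to the given path.
# `openssl rand -hex 32` prints 64 hex characters followed by a newline,
# which matches the file format described above.
generate_master_key() {
    key_path="$1"
    # Use a restrictive umask so that a newly created key file
    # is readable only by its owner.
    (umask 077 && openssl rand -hex 32 > "$key_path")
}
```

For example, `generate_master_key /dev/shm/tikv-master.key` writes the key to a RAM-backed location on systems where `/dev/shm` is a `tmpfs` mount. Because such a file does not survive a reboot, the key itself must also be kept somewhere safe outside the machine.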

Data keys are generated by TiKV and passed to the underlying storage engine (namely, RocksDB). All files written by RocksDB, including SST files, WAL files, and the MANIFEST file, are encrypted by the current data key. Other temporary files used by TiKV that may include user data are also encrypted using the same data key. Data keys are automatically rotated by TiKV every week by default, but the period is configurable. On key rotation, TiKV does not rewrite all existing files to replace the key, but RocksDB compaction is expected to rewrite old data into new data files, with the most recent data key, if the cluster has a constant write workload. TiKV keeps track of the key and encryption method used to encrypt each of the files and uses the information to decrypt the content on reads.
Data keys are passed to the underlying storage engine (namely, RocksDB). All files written by RocksDB, including SST files, WAL files, and the MANIFEST file, are encrypted by the current data key. Other temporary files used by TiKV that may include user data are also encrypted using the same data key. Data keys are automatically rotated by TiKV every week by default, but the period is configurable. On key rotation, TiKV does not rewrite all existing files to replace the key, but RocksDB compaction is expected to rewrite old data into new data files, with the most recent data key, if the cluster has a constant write workload. TiKV keeps track of the key and encryption method used to encrypt each of the files and uses the information to decrypt the content on reads.

Regardless of the data encryption method, data keys are encrypted using AES256 in GCM mode for additional authentication. This requires the master key to be 256 bits (32 bytes) when it is passed from a file instead of KMS.

### Key creation

To create a key on AWS, follow these steps:

1. Go to the [AWS KMS](https://console.aws.amazon.com/kms) on the AWS console.
2. Make sure that you have selected the correct region on the top right corner of your console.
3. Click **Create key** and select **Symmetric** as the key type.
4. Set an alias for the key.

You can also perform the operations using the AWS CLI:

```shell
aws --region us-west-2 kms create-key
aws --region us-west-2 kms create-alias --alias-name "alias/tidb-tde" --target-key-id 0987dcba-09fe-87dc-65ba-ab0987654321
```

The value to pass as `--target-key-id` in the second command is the key ID shown in the output of the first command.
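
To avoid copying the key ID by hand, the two commands can be chained. The following is a minimal sketch, assuming the AWS CLI is installed and configured with permission to create keys and aliases. The function name `create_tde_key` is hypothetical. It uses the AWS CLI's global `--query` and `--output` options to extract `KeyMetadata.KeyId` from the `create-key` response.

```shell
# Create a KMS key in the given region, attach the given alias to it,
# and print the new key ID.
create_tde_key() {
    region="$1"
    alias_name="$2"
    # Extract only the key ID from the create-key response.
    key_id="$(aws --region "$region" kms create-key \
        --query 'KeyMetadata.KeyId' --output text)" || return 1
    # Point the alias at the newly created key.
    aws --region "$region" kms create-alias \
        --alias-name "$alias_name" --target-key-id "$key_id" || return 1
    printf '%s\n' "$key_id"
}
```

For example, `create_tde_key us-west-2 alias/tidb-tde` prints the ID of the new key, which is the value to use as `key-id` in the master key configuration.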

### Configure encryption

To enable encryption, you can add the encryption section in TiKV's configuration file:
To enable encryption, you can add the encryption section in the configuration files of TiKV and PD:

```
[security.encryption]
data-encryption-method = aes128-ctr
data-key-rotation-period = 7d
data-encryption-method = "aes128-ctr"
data-key-rotation-period = "168h" # 7 days
```

Possible values for `data-encryption-method` are "aes128-ctr", "aes192-ctr", "aes256-ctr" and "plaintext". The default value is "plaintext", which means encryption is not turned on. `data-key-rotation-period` defines how often TiKV rotates the data key. Encryption can be turned on for a fresh TiKV cluster, or an existing TiKV cluster, though only data written after encryption is enabled is guaranteed to be encrypted. To disable encryption, remove `data-encryption-method` in the configuration file, or reset it to "plaintext", and restart TiKV. To change encryption method, update `data-encryption-method` in the configuration file and restart TiKV.
@@ -61,7 +103,9 @@ region = "us-west-2"
endpoint = "https://kms.us-west-2.amazonaws.com"
```

The `key-id` specifies the key id for the KMS CMK. The `region` is the AWS region name for the KMS CMK. The `endpoint` is optional and doesn't need to be specified normally, unless you are using a AWS KMS compatible service from a non-AWS vendor.
The `key-id` specifies the key ID for the KMS CMK. The `region` is the AWS region name for the KMS CMK. The `endpoint` is optional and you do not need to specify it normally unless you are using an AWS KMS-compatible service from a non-AWS vendor or need to use a [VPC endpoint for KMS](https://docs.aws.amazon.com/kms/latest/developerguide/kms-vpc-endpoint.html).

You can also use [multi-Region keys](https://docs.aws.amazon.com/kms/latest/developerguide/multi-region-keys-overview.html) in AWS. For this, you need to set up a primary key in a specific region and add replica keys in the regions you require.

To specify a master key that's stored in a file, the master key configuration would look like the following:

@@ -141,9 +185,3 @@ When restoring the backup, both `--s3.sse` and `--s3.sse-kms-key-id` should NOT
```
./br restore full --pd <pd-address> --storage "s3://<bucket>/<prefix>" --s3.region <region>
```

## Encryption at rest for TiFlash <span class="version-mark">New in v4.0.5</span>

TiFlash supports encryption at rest since v4.0.5. Data keys are generated by TiFlash. All files (including data files, schema files, and temporary files) written into TiFlash (including TiFlash Proxy) are encrypted using the current data key. The encryption algorithms, the encryption configuration (in the `tiflash-learner.toml` file) supported by TiFlash, and the meanings of monitoring metrics are consistent with those of TiKV.

If you have deployed TiFlash with Grafana, you can check the **TiFlash-Proxy-Details** -> **Encryption** panel.