Merge branch 'pingcap:master' into quick-start-tidb-refresh
alastori authored Nov 19, 2024
2 parents 927038f + d3738c3 commit 690e280
Showing 19 changed files with 174 additions and 92 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -63,6 +63,13 @@ Currently, we maintain the following versions of TiDB documentation in different

See [TiDB Documentation Contributing Guide](/CONTRIBUTING.md) to become a contributor! 🤓

<a href="https://next.ossinsight.io/widgets/official/compose-recent-active-contributors?repo_id=63995402&limit=30" target="_blank" style="display: block;" align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://next.ossinsight.io/widgets/official/compose-recent-active-contributors/thumbnail.png?repo_id=63995402&limit=30&image_size=auto&color_scheme=dark" width="655" height="auto" />
<img alt="Active Contributors of pingcap/docs - Last 28 days" src="https://next.ossinsight.io/widgets/official/compose-recent-active-contributors/thumbnail.png?repo_id=63995402&limit=30&image_size=auto&color_scheme=light" width="655" height="auto" />
</picture>
</a>

## License

All documentation starting from TiDB v7.0 is available under the terms of [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/).
88 changes: 44 additions & 44 deletions TOC.md
@@ -157,6 +157,7 @@
- [Migrate Large Datasets from MySQL](/migrate-large-mysql-to-tidb.md)
- [Migrate and Merge MySQL Shards of Small Datasets](/migrate-small-mysql-shards-to-tidb.md)
- [Migrate and Merge MySQL Shards of Large Datasets](/migrate-large-mysql-shards-to-tidb.md)
- [Migrate from Vitess](/migrate-from-vitess.md)
- [Migrate from MariaDB](/migrate-from-mariadb.md)
- [Migrate from CSV Files](/migrate-from-csv-files-to-tidb.md)
- [Migrate from SQL Files](/migrate-from-sql-files-to-tidb.md)
@@ -168,11 +169,52 @@
- [Migrate to a Downstream Table with More Columns](/migrate-with-more-columns-downstream.md)
- [Filter Binlog Events](/filter-binlog-event.md)
- [Filter DML Events Using SQL Expressions](/filter-dml-event.md)
- Integrate
- [Overview](/integration-overview.md)
- Stream Data
- [TiCDC Overview](/ticdc/ticdc-overview.md)
- [Deploy and Maintain](/ticdc/deploy-ticdc.md)
- Changefeed
- [Overview](/ticdc/ticdc-changefeed-overview.md)
- Create Changefeeds
- [Replicate Data to MySQL-compatible Databases](/ticdc/ticdc-sink-to-mysql.md)
- [Replicate Data to Kafka](/ticdc/ticdc-sink-to-kafka.md)
- [Replicate Data to Pulsar](/ticdc/ticdc-sink-to-pulsar.md)
- [Replicate Data to Storage Services](/ticdc/ticdc-sink-to-cloud-storage.md)
- [Manage Changefeeds](/ticdc/ticdc-manage-changefeed.md)
- [Log Filter](/ticdc/ticdc-filter.md)
- [DDL Replication](/ticdc/ticdc-ddl.md)
- [Bidirectional Replication](/ticdc/ticdc-bidirectional-replication.md)
- Monitor and Alert
- [Monitoring Metrics Summary](/ticdc/ticdc-summary-monitor.md)
- [Monitoring Metrics Details](/ticdc/monitor-ticdc.md)
- [Alert Rules](/ticdc/ticdc-alert-rules.md)
- Integration Scenarios
- [Overview](/integration-overview.md)
- [Integrate with Confluent and Snowflake](/ticdc/integrate-confluent-using-ticdc.md)
- [Integrate with Apache Kafka and Apache Flink](/replicate-data-to-kafka.md)
- Reference
- [TiCDC Architecture](/ticdc/ticdc-architecture.md)
- [TiCDC Server Configurations](/ticdc/ticdc-server-config.md)
- [TiCDC Changefeed Configurations](/ticdc/ticdc-changefeed-config.md)
- [TiCDC Client Authentication](/ticdc/ticdc-client-authentication.md)
- [Data Integrity Validation for Single-Row Data](/ticdc/ticdc-integrity-check.md)
- [Data Consistency Validation for Upstream and Downstream TiDB Clusters](/ticdc/ticdc-upstream-downstream-check.md)
- [TiCDC Behavior in Splitting UPDATE Events](/ticdc/ticdc-split-update-behavior.md)
- Output Protocols
- [TiCDC Avro Protocol](/ticdc/ticdc-avro-protocol.md)
- [TiCDC Canal-JSON Protocol](/ticdc/ticdc-canal-json.md)
- [TiCDC CSV Protocol](/ticdc/ticdc-csv.md)
- [TiCDC Debezium Protocol](/ticdc/ticdc-debezium.md)
- [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
- [TiCDC Simple Protocol](/ticdc/ticdc-simple-protocol.md)
- [TiCDC Open API v2](/ticdc/ticdc-open-api-v2.md)
- [TiCDC Open API v1](/ticdc/ticdc-open-api.md)
- TiCDC Data Consumption
- [TiCDC Row Data Checksum Verification Based on Avro](/ticdc/ticdc-avro-checksum-verification.md)
- [Guide for Developing a Storage Sink Consumer](/ticdc/ticdc-storage-consumer-dev-guide.md)
- [TiCDC Compatibility](/ticdc/ticdc-compatibility.md)
- [Troubleshoot](/ticdc/troubleshoot-ticdc.md)
- [FAQs](/ticdc/ticdc-faq.md)
- [Glossary](/ticdc/ticdc-glossary.md)
- Maintain
- Security
- [Best Practices for TiDB Security Configuration](/best-practices-for-security-configuration.md)
@@ -586,48 +628,6 @@
- [FAQ](/tidb-lightning/tidb-lightning-faq.md)
- [Glossary](/tidb-lightning/tidb-lightning-glossary.md)
- [Dumpling](/dumpling-overview.md)
- TiCDC
- [Overview](/ticdc/ticdc-overview.md)
- [Deploy and Maintain](/ticdc/deploy-ticdc.md)
- Changefeed
- [Overview](/ticdc/ticdc-changefeed-overview.md)
- Create Changefeeds
- [Replicate Data to MySQL-compatible Databases](/ticdc/ticdc-sink-to-mysql.md)
- [Replicate Data to Kafka](/ticdc/ticdc-sink-to-kafka.md)
- [Replicate Data to Pulsar](/ticdc/ticdc-sink-to-pulsar.md)
- [Replicate Data to Storage Services](/ticdc/ticdc-sink-to-cloud-storage.md)
- [Manage Changefeeds](/ticdc/ticdc-manage-changefeed.md)
- [TiCDC Client Authentication](/ticdc/ticdc-client-authentication.md)
- [Log Filter](/ticdc/ticdc-filter.md)
- [DDL Replication](/ticdc/ticdc-ddl.md)
- [Bidirectional Replication](/ticdc/ticdc-bidirectional-replication.md)
- [Data Integrity Validation for Single-Row Data](/ticdc/ticdc-integrity-check.md)
- [Data Consistency Validation for TiDB Upstream/Downstream Clusters](/ticdc/ticdc-upstream-downstream-check.md)
- [TiCDC Behavior in Splitting UPDATE Events](/ticdc/ticdc-split-update-behavior.md)
- Monitor and Alert
- [Monitoring Metrics Summary](/ticdc/ticdc-summary-monitor.md)
- [Monitoring Metrics Details](/ticdc/monitor-ticdc.md)
- [Alert Rules](/ticdc/ticdc-alert-rules.md)
- Reference
- [Architecture](/ticdc/ticdc-architecture.md)
- [TiCDC Server Configurations](/ticdc/ticdc-server-config.md)
- [TiCDC Changefeed Configurations](/ticdc/ticdc-changefeed-config.md)
- Output Protocols
- [TiCDC Avro Protocol](/ticdc/ticdc-avro-protocol.md)
- [TiCDC Canal-JSON Protocol](/ticdc/ticdc-canal-json.md)
- [TiCDC CSV Protocol](/ticdc/ticdc-csv.md)
- [TiCDC Debezium Protocol](/ticdc/ticdc-debezium.md)
- [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
- [TiCDC Simple Protocol](/ticdc/ticdc-simple-protocol.md)
- [TiCDC Open API v2](/ticdc/ticdc-open-api-v2.md)
- [TiCDC Open API v1](/ticdc/ticdc-open-api.md)
- TiCDC Data Consumption
- [TiCDC Row Data Checksum Verification Based on Avro](/ticdc/ticdc-avro-checksum-verification.md)
- [Guide for Developing a Storage Sink Consumer](/ticdc/ticdc-storage-consumer-dev-guide.md)
- [Compatibility](/ticdc/ticdc-compatibility.md)
- [Troubleshoot](/ticdc/troubleshoot-ticdc.md)
- [FAQs](/ticdc/ticdc-faq.md)
- [Glossary](/ticdc/ticdc-glossary.md)
- PingCAP Clinic Diagnostic Service
- [Overview](/clinic/clinic-introduction.md)
- [Quick Start](/clinic/quick-start-with-clinic.md)
59 changes: 48 additions & 11 deletions check-before-deployment.md
@@ -42,6 +42,13 @@ Take the `/dev/nvme0n1` data disk as an example:
parted -s -a optimal /dev/nvme0n1 mklabel gpt -- mkpart primary ext4 1 -1
```

For large NVMe devices, you can create multiple partitions:

```bash
parted -s -a optimal /dev/nvme0n1 mklabel gpt -- mkpart primary ext4 1 2000GB
parted -s -a optimal /dev/nvme0n1 -- mkpart primary ext4 2000GB -1
```

> **Note:**
>
> Use the `lsblk` command to view the device number of the partition: for an NVMe disk, the generated device number is usually `nvme0n1p1`; for a regular disk (for example, `/dev/sdb`), the generated device number is usually `sdb1`.
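
The naming rule in the note above can be sketched as a tiny helper. This is an illustration of the naming convention only (the `first_partition` function is hypothetical, not a system tool); on a real host, run `lsblk` to see the actual names.

```shell
# Sketch: derive the first partition name from a disk name.
# NVMe devices insert a "p" before the partition number, while
# SCSI-style devices (sdb, vdb, ...) append the number directly.
first_partition() {
  disk="$1"
  case "$disk" in
    nvme*) echo "${disk}p1" ;;
    *)     echo "${disk}1" ;;
  esac
}

first_partition nvme0n1   # -> nvme0n1p1
first_partition sdb       # -> sdb1
```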
Expand Down Expand Up @@ -93,6 +100,7 @@ Take the `/dev/nvme0n1` data disk as an example:

```bash
mkdir /data1 && \
systemctl daemon-reload && \
mount -a
```

Expand Down Expand Up @@ -138,25 +146,25 @@ Some operations in TiDB require writing temporary files to the server, so it is

- `Fast Online DDL` work area

When the variable [`tidb_ddl_enable_fast_reorg`](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) is set to `ON` (the default value in v6.5.0 and later versions), `Fast Online DDL` is enabled, and some DDL operations need to read and write temporary files in filesystems. The location is defined by the configuration item [`temp-dir`](/tidb-configuration-file.md#temp-dir-new-in-v630). You need to ensure that the user that runs TiDB has read and write permissions for that directory of the operating system. Taking the default directory `/tmp/tidb` as an example:
When the variable [`tidb_ddl_enable_fast_reorg`](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) is set to `ON` (the default value in v6.5.0 and later versions), `Fast Online DDL` is enabled, and some DDL operations need to read and write temporary files in filesystems. The location is defined by the configuration item [`temp-dir`](/tidb-configuration-file.md#temp-dir-new-in-v630). You need to ensure that the user that runs TiDB has read and write permissions for that directory of the operating system. The default directory `/tmp/tidb` uses tmpfs (temporary file system). It is recommended to explicitly specify a disk directory. The following uses `/data/tidb-deploy/tempdir` as an example:

> **Note:**
>
> If DDL operations on large objects exist in your application, it is highly recommended to configure an independent large file system for [`temp-dir`](/tidb-configuration-file.md#temp-dir-new-in-v630).

```shell
sudo mkdir /tmp/tidb
sudo mkdir -p /data/tidb-deploy/tempdir
```

If the `/tmp/tidb` directory already exists, make sure the write permission is granted.
If the `/data/tidb-deploy/tempdir` directory already exists, make sure the write permission is granted.

```shell
sudo chmod -R 777 /tmp/tidb
sudo chmod -R 777 /data/tidb-deploy/tempdir
```

> **Note:**
>
> If the directory does not exist, TiDB will automatically create it upon startup. If the directory creation fails or TiDB does not have the read and write permissions for that directory, [`Fast Online DDL`](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) might experience unpredictable issues during runtime.
> If the directory does not exist, TiDB will automatically create it upon startup. If the directory creation fails or TiDB does not have the read and write permissions for that directory, [`Fast Online DDL`](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) will be disabled during runtime.
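
For example, the corresponding setting in the TiDB configuration file might look like the following sketch (the path matches the example directory above):

```toml
# Location for temporary files used by Fast Online DDL
temp-dir = "/data/tidb-deploy/tempdir"
```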

## Check and stop the firewall service of target machines

@@ -336,7 +344,11 @@ sudo systemctl enable ntpd.service
For TiDB in the production environment, it is recommended to optimize the operating system configuration in the following ways:

1. Disable THP (Transparent Huge Pages). The memory access pattern of databases tends to be sparse rather than consecutive. If memory fragmentation is severe, higher latency occurs when THP pages are allocated.
2. Set the I/O Scheduler of the storage media to `noop`. For the high-speed SSD storage media, the kernel's I/O scheduling operations can cause performance loss. After the Scheduler is set to `noop`, the performance is better because the kernel directly sends I/O requests to the hardware without other operations. Also, the noop Scheduler is better applicable.
2. Set the I/O Scheduler of the storage media.

- For the high-speed SSD storage, the kernel's default I/O scheduling operations might cause performance loss. It is recommended to set the I/O Scheduler to first-in-first-out (FIFO), such as `noop` or `none`. This configuration allows the kernel to pass I/O requests directly to hardware without scheduling, thus improving performance.
- For NVMe storage, the default I/O Scheduler is `none`, so no adjustment is needed.
3. Choose the `performance` mode for the cpufreq module, which controls the CPU frequency. Performance is maximized when the CPU frequency is fixed at its highest supported operating frequency without dynamic adjustment.
Take the following steps to check the current operating system configuration and configure optimal parameters:
@@ -357,9 +369,9 @@ Take the following steps to check the current operating system configuration and
>
> If `[always] madvise never` is output, THP is enabled. You need to disable it.
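
Reading this output can be sketched as follows; the value in square brackets is the selected mode, and only a selected `[never]` means THP is fully disabled. The `thp_disabled` helper is a hypothetical illustration that works on sample strings, not on `/sys` directly:

```shell
# Sketch: interpret the THP status line.
# Only a selected "[never]" means THP is off.
thp_disabled() {
  case "$1" in
    *"[never]"*) echo "disabled" ;;
    *)           echo "enabled" ;;
  esac
}

thp_disabled "[always] madvise never"   # -> enabled
thp_disabled "always madvise [never]"   # -> disabled
```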
2. Execute the following command to see the I/O Scheduler of the disk where the data directory is located. Assume that you create data directories on both sdb and sdc disks:
2. Execute the following command to see the I/O Scheduler of the disk where the data directory is located.
{{< copyable "shell-regular" >}}
If your data directory uses an SD or VD device, run the following command to check the I/O Scheduler:
```bash
cat /sys/block/sd[bc]/queue/scheduler
@@ -374,6 +386,21 @@ Take the following steps to check the current operating system configuration and
>
> If `noop [deadline] cfq` is output, the I/O Scheduler for the disk is in the `deadline` mode. You need to change it to `noop`.
If your data directory uses an NVMe device, run the following command to check the I/O Scheduler:
```bash
cat /sys/block/nvme[01]*/queue/scheduler
```
```
[none] mq-deadline kyber bfq
[none] mq-deadline kyber bfq
```
> **Note:**
>
> `[none] mq-deadline kyber bfq` indicates that the NVMe device uses the `none` I/O Scheduler, and no changes are needed.
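
In both the SD/VD and NVMe outputs above, the scheduler shown in square brackets is the active one. Extracting it can be sketched like this (the `active_scheduler` helper is illustrative and parses sample strings rather than sysfs files):

```shell
# Sketch: pull the active (bracketed) scheduler out of the status line.
active_scheduler() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

active_scheduler "[none] mq-deadline kyber bfq"   # -> none
active_scheduler "noop [deadline] cfq"            # -> deadline
```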
3. Execute the following command to see the `ID_SERIAL` of the disk:
{{< copyable "shell-regular" >}}
@@ -389,7 +416,8 @@ Take the following steps to check the current operating system configuration and
> **Note:**
>
> If multiple disks are allocated with data directories, you need to execute the above command several times to record the `ID_SERIAL` of each disk.
> - If multiple disks are allocated with data directories, you need to execute the above command for each disk to record the `ID_SERIAL` of each disk.
> - If your device uses the `noop` or `none` Scheduler, you do not need to record the `ID_SERIAL` or configure udev rules or the tuned profile.
4. Execute the following command to see the power policy of the cpufreq module:
@@ -466,6 +494,10 @@ Take the following steps to check the current operating system configuration and
3. Apply the new tuned profile:
> **Note:**
>
> If your device uses the `noop` or `none` I/O Scheduler, skip this step. No Scheduler configuration is needed in the tuned profile.
{{< copyable "shell-regular" >}}
```bash
@@ -495,12 +527,12 @@ Take the following steps to check the current operating system configuration and
{{< copyable "shell-regular" >}}
```bash
grubby --args="transparent_hugepage=never" --update-kernel /boot/vmlinuz-3.10.0-957.el7.x86_64
grubby --args="transparent_hugepage=never" --update-kernel `grubby --default-kernel`
```
> **Note:**
>
> `--update-kernel` is followed by the actual default kernel version.
> You can also specify the actual version number after `--update-kernel`, for example, `--update-kernel /boot/vmlinuz-3.10.0-957.el7.x86_64`.
3. Execute `grubby --info` to see the modified default kernel configuration:
@@ -548,6 +580,10 @@ Take the following steps to check the current operating system configuration and
6. Apply the udev script:
> **Note:**
>
> If your device uses the `noop` or `none` I/O Scheduler, skip this step. No udev rules configuration is needed.
{{< copyable "shell-regular" >}}
```bash
@@ -640,6 +676,7 @@ Take the following steps to check the current operating system configuration and
> - The setting of `vm.min_free_kbytes` affects the memory reclaim mechanism. Setting it too large reduces the available memory, while setting it too small might cause memory request speeds to exceed background reclaim speeds, leading to memory reclamation and consequent delays in memory allocation.
> - It is recommended to set `vm.min_free_kbytes` to `1048576` KiB (1 GiB) at least. If [NUMA is installed](/check-before-deployment.md#install-the-numactl-tool), it is recommended to set it to `number of NUMA nodes * 1048576` KiB.
> - For servers with memory sizes less than 16 GiB, it is recommended to keep the default value of `vm.min_free_kbytes` unchanged.
> - `tcp_tw_recycle` is removed in Linux kernel 4.12. Skip this setting if you are using a later kernel version.
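
Because `tcp_tw_recycle` exists only on kernels earlier than 4.12, a deployment script might guard the setting on the kernel version (for example, taken from `uname -r`). A sketch with a hypothetical helper:

```shell
# Sketch: decide whether net.ipv4.tcp_tw_recycle can still be set.
# The parameter was removed in Linux kernel 4.12.
can_set_tw_recycle() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 12 ]; }; then
    echo "yes"
  else
    echo "no"
  fi
}

can_set_tw_recycle "3.10.0"   # -> yes (older kernel, parameter exists)
can_set_tw_recycle "5.15.0"   # -> no  (removed in 4.12)
```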
10. Execute the following command to configure the user's `limits.conf` file:

2 changes: 1 addition & 1 deletion clustered-indexes.md
@@ -37,7 +37,7 @@ On the other hand, tables with clustered indexes have certain disadvantages. See

## Usages

## Create a table with clustered indexes
### Create a table with clustered indexes

Since TiDB v5.0, you can add non-reserved keywords `CLUSTERED` or `NONCLUSTERED` after `PRIMARY KEY` in a `CREATE TABLE` statement to specify whether the table's primary key is a clustered index. For example:
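
For instance, a minimal sketch (the table and column names are illustrative):

```sql
CREATE TABLE t1 (a BIGINT PRIMARY KEY CLUSTERED, b VARCHAR(255));
CREATE TABLE t2 (a BIGINT PRIMARY KEY NONCLUSTERED, b VARCHAR(255));
```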

4 changes: 2 additions & 2 deletions develop/dev-guide-build-cluster-in-cloud.md
@@ -108,10 +108,10 @@ mysql Ver 8.0.28 for macos12.0 on arm64 (Homebrew)

<div label="Linux">

For Linux, the following takes CentOS 7 as an example:
For Linux, the following takes Ubuntu as an example:

```shell
yum install mysql
apt-get install mysql-client
```

Then, verify that the MySQL client is installed successfully: