
Update TiDB architecture (#3094) (#3299)
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ti-srebot authored Jul 16, 2020
1 parent 7d9737f commit 04ffe5a
Showing 4 changed files with 20 additions and 41 deletions.
3 changes: 2 additions & 1 deletion TOC.md
@@ -187,7 +187,8 @@
   + [Quick Start](/get-started-with-tispark.md)
   + [User Guide](/tispark-overview.md)
  + Reference
-  + [Architecture](/architecture.md)
+  + Cluster Architecture
+    + [Overview](/architecture.md)
  + Key Monitoring Metrics
   + [Overview](/grafana-overview-dashboard.md)
   + [TiDB](/grafana-tidb-dashboard.md)
56 changes: 17 additions & 39 deletions architecture.md
@@ -7,54 +7,32 @@ aliases: ['/docs/stable/architecture/','/docs/v4.0/architecture/']

# TiDB Architecture

-The TiDB platform is comprised of three key components: the TiDB server, the PD server, and the TiKV server. In addition, TiDB also provides the [TiSpark](https://github.com/pingcap/tispark/) component for complex OLAP requirements and the [TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/tidb-operator-overview) to make things simpler for the deployment and management on cloud.

-![image alt text](/media/tidb-architecture.png)

+Compared with traditional standalone databases, TiDB has the following advantages:
+
+* Has a distributed architecture with flexible and elastic scalability.
+* Fully compatible with the MySQL 5.7 protocol and the common features and syntax of MySQL. In many cases, you do not need to change a single line of application code to migrate to TiDB (see the connection sketch after this list).
+* Supports high availability with automatic failover when a minority of replicas fail; the failover is transparent to applications.
+* Supports ACID transactions, suitable for scenarios requiring strong consistency, such as bank transfers.
+* Provides a rich series of [data migration tools](/migration-overview.md) for migrating, replicating, or backing up data.
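The MySQL compatibility called out in the list above means a stock MySQL driver can talk to TiDB without any TiDB-specific client code. Below is a minimal Go sketch of that, assuming a local TiDB instance on the default port 4000 with a passwordless root user and a test database (these connection details are illustrative assumptions, not part of the commit):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // ordinary MySQL driver; no TiDB-specific client is needed
)

func main() {
	// 4000 is TiDB's default client port; the DSN uses plain MySQL syntax.
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected, server version:", version) // reports a MySQL-compatible version string
}
```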

+As a distributed database, TiDB is designed to consist of multiple components. These components communicate with each other and form a complete TiDB system. The architecture is as follows:
+
+![TiDB Architecture](/media/tidb-architecture.png)

-## TiDB server

-The TiDB server is in charge of the following operations:

-1. Receiving the SQL requests

-2. Processing the SQL related logics

-3. Locating the TiKV address for storing and computing data through Placement Driver (PD)

-4. Exchanging data with TiKV

-5. Returning the result

-The TiDB server is stateless. It does not store data and it is for computing only. TiDB is horizontally scalable and provides the unified interface to the outside through the load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5.

+## TiDB server
+
+The TiDB server is a stateless SQL layer that exposes the connection endpoint of the MySQL protocol to the outside. The TiDB server receives SQL requests, performs SQL parsing and optimization, and ultimately generates a distributed execution plan. It is horizontally scalable and provides a unified interface to the outside through load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. It does not store data; it only performs computing and SQL analysis, transmitting actual data read requests to TiKV nodes (or TiFlash nodes).
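The paragraph above describes the TiDB server as the layer that parses SQL and builds a distributed execution plan. One hedged way to observe that from a client is EXPLAIN; the Go sketch below prints whatever plan TiDB returns without assuming its exact column layout (the connection details and the table t are illustrative assumptions):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// EXPLAIN returns the distributed plan built by the TiDB server; operators
	// pushed down to the storage layer appear as coprocessor tasks in the output.
	rows, err := db.Query("EXPLAIN SELECT COUNT(*) FROM t WHERE k > 100")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		log.Fatal(err)
	}
	vals := make([]sql.RawBytes, len(cols))
	args := make([]interface{}, len(cols))
	for i := range vals {
		args[i] = &vals[i]
	}
	for rows.Next() {
		if err := rows.Scan(args...); err != nil {
			log.Fatal(err)
		}
		for i, v := range vals {
			fmt.Printf("%s=%s  ", cols[i], v) // print each plan column generically
		}
		fmt.Println()
	}
}
```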

-## Placement Driver server

-The Placement Driver (PD) server is the managing component of the entire cluster and is in charge of the following three operations:

-1. Storing the metadata of the cluster such as the region location of a specific key.

-2. Scheduling and load balancing regions in the TiKV cluster, including but not limited to data migration and Raft group leader transfer.

-3. Allocating the transaction ID that is globally unique and monotonically increasing.

-The PD server ensures redundancy by using the Raft consensus algorithm. The Raft leader is responsible for handling all operations, with remaining PD servers available for high availability only. It is recommended to deploy PD as an odd number of nodes.

+## Placement Driver (PD) server
+
+The PD server is the metadata management component of the entire cluster. It stores the metadata of the real-time data distribution of every single TiKV node and the topology structure of the entire TiDB cluster, provides the TiDB Dashboard management UI, and allocates transaction IDs to distributed transactions. The PD server is "the brain" of the entire TiDB cluster because it not only stores the metadata of the cluster, but also sends data scheduling commands to specific TiKV nodes according to the data distribution state reported by TiKV nodes in real time. In addition, the PD server consists of at least three nodes and has high availability. It is recommended to deploy an odd number of PD nodes.
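To make the "globally unique and monotonically increasing" timestamp idea above concrete, here is a toy Go sketch of a physical-plus-logical timestamp allocator. It illustrates the concept only and is not PD's implementation (which is Raft-replicated and batched); the bit layout is an assumption:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tsOracle is a toy, single-node illustration of timestamp allocation: each
// timestamp combines a physical part (milliseconds) with a logical counter,
// and the values handed out are strictly increasing.
type tsOracle struct {
	mu       sync.Mutex
	physical int64 // last physical component, in milliseconds
	logical  int64 // counter used to disambiguate timestamps within one millisecond
}

func (o *tsOracle) next() int64 {
	o.mu.Lock()
	defer o.mu.Unlock()

	now := time.Now().UnixMilli()
	if now > o.physical {
		o.physical = now
		o.logical = 0
	} else {
		o.logical++ // clock has not advanced (or went backwards): bump the logical part
	}
	// Pack both parts into one comparable integer; reserving 18 bits for the
	// logical part is an assumption for this sketch, not PD's actual layout.
	return o.physical<<18 | o.logical
}

func main() {
	o := &tsOracle{}
	for i := 0; i < 3; i++ {
		fmt.Println(o.next()) // strictly increasing values
	}
}
```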

-## TiKV server

-The TiKV server is responsible for storing data. From an external view, TiKV is a distributed transactional Key-Value storage engine. Region is the basic unit to store data. Each Region stores the data for a particular Key Range which is a left-closed and right-open interval from StartKey to EndKey. There are multiple Regions in each TiKV node. TiKV uses the Raft protocol for replication to ensure the data consistency and disaster recovery. The replicas of the same Region on different nodes compose a Raft Group. The load balancing of the data among different TiKV nodes is carried out by PD through scheduling the load in units of Region.

+## Storage servers
+
+### TiKV server
+
+The TiKV server is responsible for storing data. TiKV is a distributed transactional key-value storage engine. [Region](/glossary.md#regionpeerraft-group) is the basic unit to store data. Each Region stores the data for a particular Key Range, which is a left-closed and right-open interval from StartKey to EndKey. Multiple Regions exist in each TiKV node. The TiKV APIs provide native support for distributed transactions at the key-value pair level and support the snapshot isolation level by default. This is the core of how TiDB supports distributed transactions at the SQL level. After processing SQL statements, the TiDB server converts the SQL execution plan to an actual call to the TiKV API. Therefore, data is stored in TiKV. All the data in TiKV is automatically maintained in multiple replicas (three replicas by default), so TiKV has native high availability and supports automatic failover.
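The left-closed, right-open key ranges described above make Region routing an ordered lookup. Here is a small Go sketch of that idea with hypothetical Region boundaries; it is a routing illustration, not TiKV or client-library code:

```go
package main

import (
	"fmt"
	"sort"
)

// region mimics the [StartKey, EndKey) ranges described above: a key belongs to a
// Region when StartKey <= key < EndKey. An empty EndKey stands for "unbounded".
type region struct {
	id       uint64
	startKey string
	endKey   string
}

// locate returns the Region that owns key, assuming regions is sorted by startKey
// and covers the whole key space without gaps.
func locate(regions []region, key string) region {
	i := sort.Search(len(regions), func(i int) bool { return regions[i].startKey > key })
	// regions[i-1] is the last Region whose StartKey <= key.
	return regions[i-1]
}

func main() {
	regions := []region{
		{id: 1, startKey: "", endKey: "b"},
		{id: 2, startKey: "b", endKey: "m"},
		{id: 3, startKey: "m", endKey: ""}, // unbounded on the right
	}
	for _, k := range []string{"apple", "b", "zebra"} {
		fmt.Printf("key %q -> Region %d\n", k, locate(regions, k).id)
	}
}
```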

-## TiSpark

-TiSpark deals with the complex OLAP requirements. TiSpark makes Spark SQL directly run on the storage layer of the TiDB cluster, combines the advantages of the distributed TiKV cluster, and integrates into the big data ecosystem. With TiSpark, TiDB can support both OLTP and OLAP scenarios in one cluster, so the users never need to worry about data replication.

-## TiDB Operator

-TiDB Operator empowers TiDB users to deploy and manage TiDB clusters on mainstream cloud infrastructure (Kubernetes).

-TiDB Operator:

-+ Combines the best practices of the container orchestration from the cloud-native community with the know-how of TiDB operation and maintenance.

-+ Capable of quick deployment, mixed deployment among multiple clusters, automatic operation and maintenance, automatic fail-over, etc.

-+ Makes it user-friendly to use and manage TiDB.

+### TiFlash server
+
+The TiFlash server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data by column and is mainly designed to accelerate analytical processing.
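As a usage note, a table gets a columnar TiFlash replica only after you request one. The Go sketch below issues the documented `ALTER TABLE ... SET TIFLASH REPLICA` statement and checks replication progress; the connection details and table name are illustrative, and treat the information_schema query as an assumption to verify against your TiDB version:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Ask TiDB to maintain one columnar TiFlash replica of table t (t is a placeholder).
	if _, err := db.Exec("ALTER TABLE t SET TIFLASH REPLICA 1"); err != nil {
		log.Fatal(err)
	}

	// Replication progress is exposed through information_schema; the exact table
	// and column names below follow the TiDB 4.0 documentation but are assumptions here.
	var available, progress float64
	err = db.QueryRow(
		"SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE table_schema = 'test' AND table_name = 't'",
	).Scan(&available, &progress)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("tiflash replica available=%v progress=%.2f\n", available == 1, progress)
}
```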
2 changes: 1 addition & 1 deletion faq/tidb-faq.md
@@ -65,7 +65,7 @@ At the bottom layer, TiKV uses a model of replication log + State Machine to rep

Yes. TiDB distributes transactions across your cluster, whether it is a few nodes in a single location or many [nodes across multiple data centers](/multi-data-centers-in-one-city-deployment.md).

-Inspired by Google's Percolator, the transaction model in TiDB is mainly a two-phase commit protocol with some practical optimizations. This model relies on a timestamp allocator to assign the monotone increasing timestamp for each transaction, so conflicts can be detected. [PD](/architecture.md#placement-driver-server) works as the timestamp allocator in a TiDB cluster.
+Inspired by Google's Percolator, the transaction model in TiDB is mainly a two-phase commit protocol with some practical optimizations. This model relies on a timestamp allocator to assign the monotone increasing timestamp for each transaction, so conflicts can be detected. [PD](/architecture.md#placement-driver-pd-server) works as the timestamp allocator in a TiDB cluster.
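For intuition about the two-phase commit flow described above, here is a toy Go sketch: each transaction takes a start_ts, buffers its writes, then prewrites (locking keys) and commits at a commit_ts, reporting a conflict when a key was committed after its start_ts. This is a simplified illustration, not TiDB's or Percolator's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy in-memory store; value storage and Raft replication are omitted.
type store struct {
	latestCommit map[string]uint64 // key -> commit_ts of the last committed write
	locks        map[string]uint64 // key -> start_ts of the transaction holding the lock
	tso          uint64            // stand-in for the timestamp allocator (PD's role in TiDB)
}

type txn struct {
	s       *store
	startTS uint64
	writes  map[string]string // buffered writes, flushed only on commit
}

func (s *store) nextTS() uint64 { s.tso++; return s.tso }

func (s *store) begin() *txn {
	return &txn{s: s, startTS: s.nextTS(), writes: map[string]string{}}
}

func (t *txn) set(k, v string) { t.writes[k] = v }

func (t *txn) commit() error {
	// Phase 1 (prewrite): lock every written key, rejecting conflicts.
	for k := range t.writes {
		if t.s.latestCommit[k] > t.startTS {
			return errors.New("write conflict on " + k + ": committed after our start_ts")
		}
		if _, locked := t.s.locks[k]; locked {
			return errors.New(k + " is locked by another transaction")
		}
		t.s.locks[k] = t.startTS
	}
	// Phase 2 (commit): take commit_ts, record the commit, release the locks.
	commitTS := t.s.nextTS()
	for k := range t.writes {
		t.s.latestCommit[k] = commitTS
		delete(t.s.locks, k)
	}
	return nil
}

func main() {
	s := &store{latestCommit: map[string]uint64{}, locks: map[string]uint64{}}

	t1 := s.begin()
	t2 := s.begin()
	t1.set("balance:alice", "90")
	t2.set("balance:alice", "80")

	fmt.Println("t1 commit:", t1.commit()) // succeeds
	fmt.Println("t2 commit:", t2.commit()) // write conflict: the key was committed after t2's start_ts
}
```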

#### What programming language can I use to work with TiDB?

Expand Down
Binary file modified media/tidb-architecture.png
