
Update TiDB architecture (#3094) (#3299)
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ti-srebot authored Jul 16, 2020
1 parent 7d9737f commit 04ffe5a
Showing 4 changed files with 20 additions and 41 deletions.
3 changes: 2 additions & 1 deletion TOC.md
@@ -187,7 +187,8 @@
   + [Quick Start](/get-started-with-tispark.md)
   + [User Guide](/tispark-overview.md)
  + Reference
-  + [Architecture](/architecture.md)
+  + Cluster Architecture
+    + [Overview](/architecture.md)
  + Key Monitoring Metrics
   + [Overview](/grafana-overview-dashboard.md)
   + [TiDB](/grafana-tidb-dashboard.md)
56 changes: 17 additions & 39 deletions architecture.md
@@ -7,54 +7,32 @@ aliases: ['/docs/stable/architecture/','/docs/v4.0/architecture/']

# TiDB Architecture

-The TiDB platform is comprised of three key components: the TiDB server, the PD server, and the TiKV server. In addition, TiDB also provides the [TiSpark](https://github.com/pingcap/tispark/) component for complex OLAP requirements and the [TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/tidb-operator-overview) to make things simpler for the deployment and management on cloud.

-![image alt text](/media/tidb-architecture.png)

+Compared with traditional standalone databases, TiDB has the following advantages:
+
+* Has a distributed architecture with flexible and elastic scalability.
+* Fully compatible with the MySQL 5.7 protocol and the common features and syntax of MySQL. In many cases, you do not need to change a single line of application code to migrate to TiDB (see the connection sketch after this list).
+* Supports high availability with automatic failover when a minority of replicas fail; the failover is transparent to applications.
+* Supports ACID transactions, suitable for scenarios requiring strong consistency, such as bank transfers.
+* Provides a rich series of [data migration tools](/migration-overview.md) for migrating, replicating, or backing up data.
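The MySQL compatibility called out in the list above means a stock MySQL driver can talk to TiDB without any TiDB-specific client code. Below is a minimal Go sketch of that, assuming a local TiDB instance on the default port 4000 with a passwordless root user and a test database (these connection details are illustrative assumptions, not part of the commit):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // ordinary MySQL driver; no TiDB-specific client is needed
)

func main() {
	// 4000 is TiDB's default client port; the DSN uses plain MySQL syntax.
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected, server version:", version) // reports a MySQL-compatible version string
}
```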

+As a distributed database, TiDB is designed to consist of multiple components. These components communicate with each other and form a complete TiDB system. The architecture is as follows:
+
+![TiDB Architecture](/media/tidb-architecture.png)

-## TiDB server

-The TiDB server is in charge of the following operations:

-1. Receiving the SQL requests

-2. Processing the SQL related logics

-3. Locating the TiKV address for storing and computing data through Placement Driver (PD)

-4. Exchanging data with TiKV

-5. Returning the result

-The TiDB server is stateless. It does not store data and it is for computing only. TiDB is horizontally scalable and provides the unified interface to the outside through the load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5.

+## TiDB server
+
+The TiDB server is a stateless SQL layer that exposes the connection endpoint of the MySQL protocol to the outside. The TiDB server receives SQL requests, performs SQL parsing and optimization, and ultimately generates a distributed execution plan. It is horizontally scalable and provides a unified interface to the outside through load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. It does not store data; it only performs computing and SQL analysis, transmitting actual data read requests to TiKV nodes (or TiFlash nodes).
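The paragraph above describes the TiDB server as the layer that parses SQL and builds a distributed execution plan. One hedged way to observe that from a client is EXPLAIN; the Go sketch below prints whatever plan TiDB returns without assuming its exact column layout (the connection details and the table t are illustrative assumptions):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// EXPLAIN returns the distributed plan built by the TiDB server; operators
	// pushed down to the storage layer appear as coprocessor tasks in the output.
	rows, err := db.Query("EXPLAIN SELECT COUNT(*) FROM t WHERE k > 100")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		log.Fatal(err)
	}
	vals := make([]sql.RawBytes, len(cols))
	args := make([]interface{}, len(cols))
	for i := range vals {
		args[i] = &vals[i]
	}
	for rows.Next() {
		if err := rows.Scan(args...); err != nil {
			log.Fatal(err)
		}
		for i, v := range vals {
			fmt.Printf("%s=%s  ", cols[i], v) // print each plan column generically
		}
		fmt.Println()
	}
}
```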

-## Placement Driver server

-The Placement Driver (PD) server is the managing component of the entire cluster and is in charge of the following three operations:

-1. Storing the metadata of the cluster such as the region location of a specific key.

-2. Scheduling and load balancing regions in the TiKV cluster, including but not limited to data migration and Raft group leader transfer.

-3. Allocating the transaction ID that is globally unique and monotonically increasing.

-The PD server ensures redundancy by using the Raft consensus algorithm. The Raft leader is responsible for handling all operations, with remaining PD servers available for high availability only. It is recommended to deploy PD as an odd number of nodes.

+## Placement Driver (PD) server
+
+The PD server is the metadata management component of the entire cluster. It stores the metadata of the real-time data distribution of every single TiKV node and the topology structure of the entire TiDB cluster, provides the TiDB Dashboard management UI, and allocates transaction IDs to distributed transactions. The PD server is "the brain" of the entire TiDB cluster because it not only stores the metadata of the cluster, but also sends data scheduling commands to specific TiKV nodes according to the data distribution state reported by TiKV nodes in real time. In addition, the PD server consists of at least three nodes and has high availability. It is recommended to deploy an odd number of PD nodes.
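To make the "globally unique and monotonically increasing" timestamp idea above concrete, here is a toy Go sketch of a physical-plus-logical timestamp allocator. It illustrates the concept only and is not PD's implementation (which is Raft-replicated and batched); the bit layout is an assumption:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tsOracle is a toy, single-node illustration of timestamp allocation: each
// timestamp combines a physical part (milliseconds) with a logical counter,
// and the values handed out are strictly increasing.
type tsOracle struct {
	mu       sync.Mutex
	physical int64 // last physical component, in milliseconds
	logical  int64 // counter used to disambiguate timestamps within one millisecond
}

func (o *tsOracle) next() int64 {
	o.mu.Lock()
	defer o.mu.Unlock()

	now := time.Now().UnixMilli()
	if now > o.physical {
		o.physical = now
		o.logical = 0
	} else {
		o.logical++ // clock has not advanced (or went backwards): bump the logical part
	}
	// Pack both parts into one comparable integer; reserving 18 bits for the
	// logical part is an assumption for this sketch, not PD's actual layout.
	return o.physical<<18 | o.logical
}

func main() {
	o := &tsOracle{}
	for i := 0; i < 3; i++ {
		fmt.Println(o.next()) // strictly increasing values
	}
}
```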

-## TiKV server

-The TiKV server is responsible for storing data. From an external view, TiKV is a distributed transactional Key-Value storage engine. Region is the basic unit to store data. Each Region stores the data for a particular Key Range which is a left-closed and right-open interval from StartKey to EndKey. There are multiple Regions in each TiKV node. TiKV uses the Raft protocol for replication to ensure the data consistency and disaster recovery. The replicas of the same Region on different nodes compose a Raft Group. The load balancing of the data among different TiKV nodes is carried out by PD through scheduling the load in units of Region.

+## Storage servers
+
+### TiKV server
+
+The TiKV server is responsible for storing data. TiKV is a distributed transactional key-value storage engine. [Region](/glossary.md#regionpeerraft-group) is the basic unit to store data. Each Region stores the data for a particular Key Range, which is a left-closed and right-open interval from StartKey to EndKey. Multiple Regions exist in each TiKV node. The TiKV APIs provide native support for distributed transactions at the key-value pair level and support the snapshot isolation level by default. This is the core of how TiDB supports distributed transactions at the SQL level. After processing SQL statements, the TiDB server converts the SQL execution plan to an actual call to the TiKV API. Therefore, data is stored in TiKV. All the data in TiKV is automatically maintained in multiple replicas (three replicas by default), so TiKV has native high availability and supports automatic failover.
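The left-closed, right-open key ranges described above make Region routing an ordered lookup. Here is a small Go sketch of that idea with hypothetical Region boundaries; it is a routing illustration, not TiKV or client-library code:

```go
package main

import (
	"fmt"
	"sort"
)

// region mimics the [StartKey, EndKey) ranges described above: a key belongs to a
// Region when StartKey <= key < EndKey. An empty EndKey stands for "unbounded".
type region struct {
	id       uint64
	startKey string
	endKey   string
}

// locate returns the Region that owns key, assuming regions is sorted by startKey
// and covers the whole key space without gaps.
func locate(regions []region, key string) region {
	i := sort.Search(len(regions), func(i int) bool { return regions[i].startKey > key })
	// regions[i-1] is the last Region whose StartKey <= key.
	return regions[i-1]
}

func main() {
	regions := []region{
		{id: 1, startKey: "", endKey: "b"},
		{id: 2, startKey: "b", endKey: "m"},
		{id: 3, startKey: "m", endKey: ""}, // unbounded on the right
	}
	for _, k := range []string{"apple", "b", "zebra"} {
		fmt.Printf("key %q -> Region %d\n", k, locate(regions, k).id)
	}
}
```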

-## TiSpark

-TiSpark deals with the complex OLAP requirements. TiSpark makes Spark SQL directly run on the storage layer of the TiDB cluster, combines the advantages of the distributed TiKV cluster, and integrates into the big data ecosystem. With TiSpark, TiDB can support both OLTP and OLAP scenarios in one cluster, so the users never need to worry about data replication.

-## TiDB Operator

-TiDB Operator empowers TiDB users to deploy and manage TiDB clusters on mainstream cloud infrastructure (Kubernetes).

-TiDB Operator:

-+ Combines the best practices of the container orchestration from the cloud-native community with the know-how of TiDB operation and maintenance.

-+ Capable of quick deployment, mixed deployment among multiple clusters, automatic operation and maintenance, automatic fail-over, etc.

-+ Makes it user-friendly to use and manage TiDB.

+### TiFlash server
+
+The TiFlash server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data by column and is mainly designed to accelerate analytical processing.
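As a usage note, a table gets a columnar TiFlash replica only after you request one. The Go sketch below issues the documented `ALTER TABLE ... SET TIFLASH REPLICA` statement and checks replication progress; the connection details and table name are illustrative, and treat the information_schema query as an assumption to verify against your TiDB version:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Ask TiDB to maintain one columnar TiFlash replica of table t (t is a placeholder).
	if _, err := db.Exec("ALTER TABLE t SET TIFLASH REPLICA 1"); err != nil {
		log.Fatal(err)
	}

	// Replication progress is exposed through information_schema; the exact table
	// and column names below follow the TiDB 4.0 documentation but are assumptions here.
	var available, progress float64
	err = db.QueryRow(
		"SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE table_schema = 'test' AND table_name = 't'",
	).Scan(&available, &progress)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("tiflash replica available=%v progress=%.2f\n", available == 1, progress)
}
```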
2 changes: 1 addition & 1 deletion faq/tidb-faq.md
@@ -65,7 +65,7 @@ At the bottom layer, TiKV uses a model of replication log + State Machine to rep

Yes. TiDB distributes transactions across your cluster, whether it is a few nodes in a single location or many [nodes across multiple data centers](/multi-data-centers-in-one-city-deployment.md).

-Inspired by Google's Percolator, the transaction model in TiDB is mainly a two-phase commit protocol with some practical optimizations. This model relies on a timestamp allocator to assign the monotone increasing timestamp for each transaction, so conflicts can be detected. [PD](/architecture.md#placement-driver-server) works as the timestamp allocator in a TiDB cluster.
+Inspired by Google's Percolator, the transaction model in TiDB is mainly a two-phase commit protocol with some practical optimizations. This model relies on a timestamp allocator to assign the monotone increasing timestamp for each transaction, so conflicts can be detected. [PD](/architecture.md#placement-driver-pd-server) works as the timestamp allocator in a TiDB cluster.
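For intuition about the two-phase commit flow described above, here is a toy Go sketch: each transaction takes a start_ts, buffers its writes, then prewrites (locking keys) and commits at a commit_ts, reporting a conflict when a key was committed after its start_ts. This is a simplified illustration, not TiDB's or Percolator's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy in-memory store; value storage and Raft replication are omitted.
type store struct {
	latestCommit map[string]uint64 // key -> commit_ts of the last committed write
	locks        map[string]uint64 // key -> start_ts of the transaction holding the lock
	tso          uint64            // stand-in for the timestamp allocator (PD's role in TiDB)
}

type txn struct {
	s       *store
	startTS uint64
	writes  map[string]string // buffered writes, flushed only on commit
}

func (s *store) nextTS() uint64 { s.tso++; return s.tso }

func (s *store) begin() *txn {
	return &txn{s: s, startTS: s.nextTS(), writes: map[string]string{}}
}

func (t *txn) set(k, v string) { t.writes[k] = v }

func (t *txn) commit() error {
	// Phase 1 (prewrite): lock every written key, rejecting conflicts.
	for k := range t.writes {
		if t.s.latestCommit[k] > t.startTS {
			return errors.New("write conflict on " + k + ": committed after our start_ts")
		}
		if _, locked := t.s.locks[k]; locked {
			return errors.New(k + " is locked by another transaction")
		}
		t.s.locks[k] = t.startTS
	}
	// Phase 2 (commit): take commit_ts, record the commit, release the locks.
	commitTS := t.s.nextTS()
	for k := range t.writes {
		t.s.latestCommit[k] = commitTS
		delete(t.s.locks, k)
	}
	return nil
}

func main() {
	s := &store{latestCommit: map[string]uint64{}, locks: map[string]uint64{}}

	t1 := s.begin()
	t2 := s.begin()
	t1.set("balance:alice", "90")
	t2.set("balance:alice", "80")

	fmt.Println("t1 commit:", t1.commit()) // succeeds
	fmt.Println("t2 commit:", t2.commit()) // write conflict: the key was committed after t2's start_ts
}
```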

#### What programming language can I use to work with TiDB?

Expand Down
Binary file modified media/tidb-architecture.png
