title | summary | category | aliases | ||
---|---|---|---|---|---|
Split Region |
An overview of the usage of Split Region for the TiDB database. |
reference |
|
For each new table created in TiDB, one Region is segmented by default to store the data of this table. This default behavior is controlled by split-table in the configuration file. When the data in this Region exceeds the default Region size limit, the Region starts to split into two.
In the above case, because there is only one Region at the beginning, all write requests go to the TiKV node where that Region is located. If the newly created table receives a large number of writes, this causes a hotspot.
To solve the hotspot problem in this scenario, TiDB introduces the pre-split function, which pre-splits multiple Regions for a table according to the specified parameters and scatters them across the TiKV nodes.
Syntax diagrams (images not reproduced here): SplitRegionStmt, SplitSyntaxOption, TableName, PartitionNameListOpt, SplitOption, RowValue, Int64Num.
There are two types of Split Region syntax:
{{< copyable "sql" >}}
SPLIT TABLE table_name [INDEX index_name] BETWEEN (lower_value) AND (upper_value) REGIONS region_num
BETWEEN lower_value AND upper_value REGIONS region_num defines the lower boundary, the upper boundary, and the number of Regions. The current Region is then evenly split into region_num Regions between the lower and upper boundaries.
{{< copyable "sql" >}}
SPLIT TABLE table_name [INDEX index_name] BY (value_list) [, (value_list)] ...
BY value_list… specifies a series of split points manually, at which the current Region is split. It is suitable for scenarios with unevenly distributed data.
The following example shows the result of the SPLIT statement:
+--------------------+----------------------+
| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
+--------------------+----------------------+
| 4 | 1.0 |
+--------------------+----------------------+
TOTAL_SPLIT_REGION: the number of newly split Regions.
SCATTER_FINISH_RATIO: the completion rate of scattering for newly split Regions. 1.0 means that all Regions are scattered. 0.5 means that only half of the Regions are scattered and the rest are being scattered.
The key of row data in each table is encoded by table_id and row_id. The format is as follows:
t[table_id]_r[row_id]
For example, when table_id is 22 and row_id is 11:
t22_r11
Rows in the same table share the same table_id, but each row has a unique row_id that can be used for Region split.
Because row_id is an integer, the keys to split at can be calculated from the specified lower_value, upper_value, and region_num. TiDB first calculates the step value (step = (upper_value - lower_value)/region_num). It then splits evenly at every step between lower_value and upper_value to generate the number of Regions specified by region_num.
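The step calculation above can be sketched in Python. This is a minimal illustration, not TiDB's implementation: it assumes integer division and that the split points fall at lower_value + i * step. The function name `even_split_points` is hypothetical.

```python
def even_split_points(lower: int, upper: int, num: int) -> list[int]:
    """Return the num - 1 row_id values at which an even split would cut."""
    if num < 1 or upper <= lower:
        raise ValueError("need num >= 1 and lower < upper")
    # step = (upper_value - lower_value) / region_num, using integer division
    step = (upper - lower) // num
    # Assumption: one split point at every multiple of step above lower
    return [lower + step * i for i in range(1, num)]


print(even_split_points(0, 10000, 4))  # [2500, 5000, 7500]
```

For the range 0 to 10000 with 4 Regions, the sketch yields 2500, 5000, and 7500, which agrees with the row keys in the START_KEY column of the partitioned-table example in this document.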
For example, if you want 16 evenly split Regions from the key range minInt64~maxInt64 for table t, you can use this statement:
{{< copyable "sql" >}}
SPLIT TABLE t BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 16;
This statement splits table t into 16 Regions between minInt64 and maxInt64. If the actual primary key range is smaller than that, for example 0~1000000000, you can use 0 and 1000000000 in place of minInt64 and maxInt64 respectively to split Regions.
{{< copyable "sql" >}}
SPLIT TABLE t BETWEEN (0) AND (1000000000) REGIONS 16;
If the data is known to be unevenly distributed, and you want separate Regions for the key ranges -inf ~ 10000, 10000 ~ 90000, and 90000 ~ +inf, you can achieve this by setting fixed split points, as shown below:
{{< copyable "sql" >}}
SPLIT TABLE t BY (10000), (90000);
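The relationship between split points and the resulting key ranges can be illustrated with a short Python sketch. It is an illustration only, assuming half-open [start, end) ranges. The function name `ranges_from_points` is hypothetical.

```python
import math


def ranges_from_points(points: list[int]) -> list[tuple[float, float]]:
    """Turn split points into half-open [start, end) key ranges."""
    # N split points always produce N + 1 ranges, open-ended on both sides.
    bounds = [-math.inf, *sorted(points), math.inf]
    return list(zip(bounds[:-1], bounds[1:]))


for start, end in ranges_from_points([10000, 90000]):
    print(f"[{start}, {end})")
```

Two split points give three ranges, which matches the three key ranges listed before the statement.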
The key of the index data in the table is encoded by table_id, index_id, and the value of the index column. The format is as follows:
t[table_id]_i[index_id][index_value]
For example, when table_id is 22, index_id is 5, and index_value is abc:
t22_i5abc
All entries of the same index in one table share the same table_id and index_id. To split index Regions, you need to split based on index_value.
Splitting an index evenly works the same way as splitting row data evenly. However, calculating the value of step is more complicated, because index_value might not be an integer.
The values of lower and upper are first encoded into byte arrays. After the longest common prefix of the two byte arrays is removed, the first 8 bytes of what remains of lower and upper are converted into the uint64 format. Then step = (upper - lower)/region_num is calculated. After that, the calculated step is encoded into a byte array, which is appended to the longest common prefix of the lower and upper byte arrays for index split. Here is an example:
If the column of the idx index is of the integer type, you can use the following SQL statement to split index data:
{{< copyable "sql" >}}
SPLIT TABLE t INDEX idx BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 16;
This statement splits the Region of index idx in table t into 16 Regions from minInt64 to maxInt64.
If the column of index idx1 is of the varchar type and you want to split index data by prefix letters, you can use the following statement:
{{< copyable "sql" >}}
SPLIT TABLE t INDEX idx1 BETWEEN ("a") AND ("z") REGIONS 26;
This statement splits index idx1 into 26 Regions from a~z. The range of Region 1 is [minIndexValue, b); the range of Region 2 is [b, c); … the range of Region 26 is [z, maxIndexValue). For the idx1 index, data with the a prefix is written into Region 1, data with the b prefix is written into Region 2, and so on.
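The byte-array step calculation described for index splits can be sketched in Python. This is a minimal illustration under stated assumptions, not TiDB's implementation: it works on the raw bytes of the boundary values (TiDB applies its own encoding to index values first), zero-pads remainders shorter than 8 bytes, reads them as big-endian integers, and places split values at lower + i * step. The function name `index_split_values` is hypothetical.

```python
def index_split_values(lower: bytes, upper: bytes, num: int) -> list[bytes]:
    """Return num - 1 byte-array split values between lower and upper."""
    # Find the longest common prefix of the two byte arrays.
    n = 0
    while n < min(len(lower), len(upper)) and lower[n] == upper[n]:
        n += 1
    prefix = lower[:n]
    # Read the first 8 bytes after the prefix as unsigned 64-bit integers.
    # Assumption: shorter remainders are zero-padded on the right.
    lo = int.from_bytes(lower[n:n + 8].ljust(8, b"\x00"), "big")
    hi = int.from_bytes(upper[n:n + 8].ljust(8, b"\x00"), "big")
    if num < 1 or hi <= lo:
        raise ValueError("need num >= 1 and lower < upper")
    step = (hi - lo) // num
    # Encode each stepped value back to 8 bytes and re-attach the prefix.
    return [prefix + (lo + step * i).to_bytes(8, "big") for i in range(1, num)]


for value in index_split_values(b"a", b"e", 4):
    print(value[:1])  # first byte of each split value
```

With the hypothetical bounds `b"a"` and `b"e"` and 4 Regions, the step equals exactly one unit in the first byte, so the split values begin with b, c, and d.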
If the column of index idx2 is of a time type such as timestamp or datetime, and you want to split index Regions by time interval, you can use the following statement:
{{< copyable "sql" >}}
SPLIT TABLE t INDEX idx2 BETWEEN ("2010-01-01 00:00:00") AND ("2020-01-01 00:00:00") REGIONS 10;
This statement splits the Region of index idx2 in table t into 10 Regions from 2010-01-01 00:00:00 to 2020-01-01 00:00:00. The range of Region 1 is [minIndexValue, 2011-01-01 00:00:00); the range of Region 2 is [2011-01-01 00:00:00, 2012-01-01 00:00:00), and so on.
Region split methods for other types of index columns are similar.
For data Region split of joint indexes, the only difference is that you can specify values for multiple columns.
For example, index idx3 (a, b) contains 2 columns, with column a of the timestamp type and column b of the int type. If you only want to do a time range split according to column a, you can use the same SQL statement as for splitting a single-column time index. In this case, do not specify the value of column b in lower_value and upper_value.
{{< copyable "sql" >}}
SPLIT TABLE t INDEX idx3 BETWEEN ("2010-01-01 00:00:00") AND ("2020-01-01 00:00:00") REGIONS 10;
Within the same range of time, if you want to do one more split according to column b, specify the value for column b when splitting.
{{< copyable "sql" >}}
SPLIT TABLE t INDEX idx3 BETWEEN ("2010-01-01 00:00:00", "a") AND ("2010-01-01 00:00:00", "z") REGIONS 10;
This statement splits 10 Regions in the range of a~z according to the value of column b, with the same time prefix in column a. If the values specified for column a differ between the lower and upper bounds, the value of column b might not be used.
Index data can also be split by specified index values.
For example, there is idx4 (a,b), with column a of the varchar type and column b of the timestamp type.
{{< copyable "sql" >}}
SPLIT TABLE t1 INDEX idx4 BY ("a", "2000-01-01 00:00:01"), ("b", "2019-04-17 14:26:19"), ("c", "");
This statement specifies 3 split values, which produce 4 Regions. The range of each Region is as follows:
region1 [ minIndexValue , ("a", "2000-01-01 00:00:01"))
region2 [("a", "2000-01-01 00:00:01") , ("b", "2019-04-17 14:26:19"))
region3 [("b", "2019-04-17 14:26:19") , ("c", "") )
region4 [("c", "") , maxIndexValue )
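To see which Region a given composite index value falls into, the ranges above can be modelled in Python. This is only an illustration: it assumes that Python's lexicographic tuple and string comparison is an acceptable stand-in for the ordering of TiDB's encoded index keys. The names `SPLIT_POINTS` and `region_for` are hypothetical.

```python
from bisect import bisect_right

# The three split values from the statement above, in ascending order.
SPLIT_POINTS = [
    ("a", "2000-01-01 00:00:01"),
    ("b", "2019-04-17 14:26:19"),
    ("c", ""),
]


def region_for(value: tuple[str, str]) -> int:
    """Return the 1-based number of the [start, end) range holding value."""
    # bisect_right counts the split points <= value, so a value equal to a
    # split point lands in the Region that starts at that point.
    return bisect_right(SPLIT_POINTS, value) + 1


print(region_for(("a", "1999-12-31 00:00:00")))  # 1
print(region_for(("b", "2020-01-01 00:00:00")))  # 3
```

A value equal to a split point belongs to the Region that begins there, because each range is closed on the left and open on the right.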
Splitting Regions for partitioned tables is the same as splitting Regions for ordinary tables. The only difference is that the same split operation is performed for every partition.
- The syntax of even split:
{{< copyable "sql" >}}
SPLIT [PARTITION] TABLE table_name [PARTITION (partition_name_list...)] [INDEX index_name] BETWEEN (lower_value) AND (upper_value) REGIONS region_num
- The syntax of uneven split:
{{< copyable "sql" >}}
SPLIT [PARTITION] TABLE table_name [PARTITION (partition_name_list...)] [INDEX index_name] BY (value_list) [, (value_list)] ...
- Create a partitioned table t.
{{< copyable "sql" >}}
create table t (a int,b int,index idx(a)) partition by hash(a) partitions 2;
After creating the table t, a Region is split for each partition. Use the SHOW TABLE REGIONS syntax to view the Regions of this table:
{{< copyable "sql" >}}
show table t regions;
+-----------+-----------+---------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS            | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+-----------+---------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
| 1978      | t_1400_   | t_1401_ | 1979      | 4               | 1979, 1980, 1981 | 0          | 0             | 0          | 1                    | 0                |
| 6         | t_1401_   |         | 17        | 4               | 17, 18, 21       | 0          | 223           | 0          | 1                    | 0                |
+-----------+-----------+---------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
- Use the SPLIT syntax to split Regions for each partition. In the following example, four Regions are split in the range of [0,10000].
{{< copyable "sql" >}}
split partition table t between (0) and (10000) regions 4;
- Use the SHOW TABLE REGIONS syntax to view the Regions of this table again. You can see that this table now has ten Regions: each partition has five Regions, four for row data and one for index data.
{{< copyable "sql" >}}
show table t regions;
+-----------+---------------+---------------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY     | END_KEY       | LEADER_ID | LEADER_STORE_ID | PEERS            | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+---------------+---------------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
| 1998      | t_1400_r      | t_1400_r_2500 | 2001      | 5               | 2000, 2001, 2015 | 0          | 132           | 0          | 1                    | 0                |
| 2006      | t_1400_r_2500 | t_1400_r_5000 | 2016      | 1               | 2007, 2016, 2017 | 0          | 35            | 0          | 1                    | 0                |
| 2010      | t_1400_r_5000 | t_1400_r_7500 | 2012      | 2               | 2011, 2012, 2013 | 0          | 35            | 0          | 1                    | 0                |
| 1978      | t_1400_r_7500 | t_1401_       | 1979      | 4               | 1979, 1980, 1981 | 0          | 621           | 0          | 1                    | 0                |
| 1982      | t_1400_       | t_1400_r      | 2014      | 3               | 1983, 1984, 2014 | 0          | 35            | 0          | 1                    | 0                |
| 1990      | t_1401_r      | t_1401_r_2500 | 1992      | 2               | 1991, 1992, 2020 | 0          | 120           | 0          | 1                    | 0                |
| 1994      | t_1401_r_2500 | t_1401_r_5000 | 1997      | 5               | 1996, 1997, 2021 | 0          | 129           | 0          | 1                    | 0                |
| 2002      | t_1401_r_5000 | t_1401_r_7500 | 2003      | 4               | 2003, 2023, 2022 | 0          | 141           | 0          | 1                    | 0                |
| 6         | t_1401_r_7500 |               | 17        | 4               | 17, 18, 21       | 0          | 601           | 0          | 1                    | 0                |
| 1986      | t_1401_       | t_1401_r      | 1989      | 5               | 1989, 2018, 2019 | 0          | 123           | 0          | 1                    | 0                |
+-----------+---------------+---------------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
- You can also split Regions for the index of each partition. For example, you can split the [1000,10000] range of the idx index into two Regions:
{{< copyable "sql" >}}
split partition table t index idx between (1000) and (10000) regions 2;
You can specify the partition to be split. See the following usage example:
- Create a partitioned table:
{{< copyable "sql" >}}
create table t ( a int, b int, index idx(b)) partition by range( a ) ( partition p1 values less than (10000), partition p2 values less than (20000), partition p3 values less than (MAXVALUE) );
- Split two Regions in the [0,10000] range of the p1 partition:
{{< copyable "sql" >}}
split partition table t partition (p1) between (0) and (10000) regions 2;
- Split two Regions in the [10000,20000] range of the p2 partition:
{{< copyable "sql" >}}
split partition table t partition (p2) between (10000) and (20000) regions 2;
- Use the SHOW TABLE REGIONS syntax to view the Regions of this table:
{{< copyable "sql" >}}
show table t regions;
+-----------+----------------+----------------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY      | END_KEY        | LEADER_ID | LEADER_STORE_ID | PEERS            | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+----------------+----------------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
| 2040      | t_1406_        | t_1406_r_5000  | 2045      | 3               | 2043, 2045, 2044 | 0          | 0             | 0          | 1                    | 0                |
| 2032      | t_1406_r_5000  | t_1407_        | 2033      | 4               | 2033, 2034, 2035 | 0          | 0             | 0          | 1                    | 0                |
| 2046      | t_1407_        | t_1407_r_15000 | 2048      | 2               | 2047, 2048, 2050 | 0          | 35            | 0          | 1                    | 0                |
| 2036      | t_1407_r_15000 | t_1408_        | 2037      | 4               | 2037, 2038, 2039 | 0          | 0             | 0          | 1                    | 0                |
| 6         | t_1408_        |                | 17        | 4               | 17, 18, 21       | 0          | 214           | 0          | 1                    | 0                |
+-----------+----------------+----------------+-----------+-----------------+------------------+------------+---------------+------------+----------------------+------------------+
- Split two Regions in the [0,20000] range of the idx index of the p1 and p2 partitions:
{{< copyable "sql" >}}
split partition table t partition (p1,p2) index idx between (0) and (20000) regions 2;
To have evenly split Regions when a table is created, it is recommended that you use shard_row_id_bits together with pre_split_regions. When a table is created successfully, pre_split_regions pre-splits the table into the number of Regions specified by 2^(pre_split_regions).
Note:
The value of pre_split_regions must be less than or equal to that of shard_row_id_bits.
{{< copyable "sql" >}}
create table t (a int, b int,index idx1(a)) shard_row_id_bits = 4 pre_split_regions=2;
After the table is built, this statement splits 4 + 1 Regions for table t. 4 (2^2) Regions are used to save table row data, and 1 Region is for saving the index data of idx1.
The ranges of the 4 table Regions are as follows:
region1: [ -inf , 1<<61 )
region2: [ 1<<61 , 2<<61 )
region3: [ 2<<61 , 3<<61 )
region4: [ 3<<61 , +inf )
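The boundaries listed above can be reproduced with a short Python sketch. The shift of 63 - pre_split_regions is an assumption generalized from the example (it gives 61 when pre_split_regions is 2), not a statement of TiDB's implementation. The function name `pre_split_boundaries` is hypothetical.

```python
def pre_split_boundaries(pre_split_regions: int) -> list[int]:
    """Return the row_id boundaries between the pre-split row-data Regions."""
    # 2^(pre_split_regions) Regions hold the row data.
    count = 1 << pre_split_regions
    # Assumption: boundaries sit at i << (63 - pre_split_regions),
    # which is i << 61 for pre_split_regions = 2.
    shift = 63 - pre_split_regions
    return [i << shift for i in range(1, count)]


print(pre_split_boundaries(2) == [1 << 61, 2 << 61, 3 << 61])  # True
```

Three boundaries separate the four row-data Regions. The index Region for idx1 is counted separately.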
There are three SPLIT REGION related session variables: tidb_scatter_region, tidb_wait_split_region_finish, and tidb_wait_split_region_timeout. For details, see TiDB specific system variables and syntax.