-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs/design: add the table partition proposal #7969
Conversation
OK. I will review it later~ |
Comment addressed, PTAL, thanks @CaitinChen |
PTAL @CaitinChen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
CI fail, blocked by #8287 |
/run-all-tests |
1 similar comment
/run-all-tests |
PTAL @shenli |
/run-all-tests |
PTAL @shenli |
|
||
## Background | ||
|
||
MySQL has the [table partition](https://dev.mysql.com/doc/refman/8.0/en/partitioning.html) feature. If this feature is supported in TiDB, many issues could be addressed. For example, drop ranged partitions could be used to remove old data; partition by hash could address the hot data issue and thus the write performance is improved; query on partitioned tables could be faster than on manual sharding tables because of the partition pruning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query on partitioned tables could be faster than unpartitioned tables because of the partition pruning.
There are two levels of mapping in TiDB: SQL data to a logical key range, logical key range to the physical storage. | ||
When TiDB maps table data to key-value storage, it first encodes `table id + row id` as key, row data as value. Then the logical key range is split into regions and stored into TiKV. | ||
|
||
Table partition works on the first level of mapping, partition ID will be made equivalent of the table ID. A partitioned table row uses `partition id + row id` as encoded key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention that PartitionID
is unique in cluster scope.
select * from p3 where id < 30) | ||
``` | ||
|
||
During the logical optimization phase, the `DataSource` plan is translated into `UnionAll`, and then each partition generates its own `TableReader` in the physical optimization phase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataSource
plan -> DataSource
operator
UnionAll
-> UnionAll
operator
The drawbacks of this implementation are: | ||
|
||
* If the table has many partitions, there will be many readers, and then the `explain` result is not friendly to the user | ||
* The `UnionAll` executor cannot keep the results in order, so if some executor needs ordered results such as `IndexReader`, an extra `Sort` executor is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are talking about plan phrase, so there is no executor
.
|
||
### How to write to the partitioned table | ||
|
||
All the write operation calls functions like `table.AddRecord` eventually, so implementing the write operation on a partitioned table simply implements this interface method on the `PartitionedTable` struct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
singular? or plural?
All the write operation calls
89b2117
to
91e26ef
Compare
Comment addressed @shenli |
LGTM |
What problem does this PR solve?
Add a table partition design proposal document #7907
@CaitinChen @shenli
This change is