-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TPC-H 100GB on TiDB 5.0 with MPP benchmark #5244
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
--- | ||
title: TiDB TPC-H 100GB Performance Test Report -- v5.0 MPP mode vs. Greenplum 6.15.0 and Apache Spark 3.1.1 | ||
summary: Compare the TPC-H 100GB performance of TiDB 5.0 MPP mode, Greenplum 6.15.0, and Apache Spark 3.1.1. | ||
--- | ||
|
||
# TiDB TPC-H 100GB Performance Test Report -- TiDB v5.0 MPP mode vs. Greenplum 6.15.0 and Apache Spark 3.1.1 | ||
|
||
## Test overview | ||
|
||
This test aims at comparing the TPC-H 100GB performance of TiDB v5.0 in the MPP mode with that of Greenplum and Apache Spark, two mainstream analytics engines, in their latest versions. The test result shows that the performance of TiDB v5.0 in the MPP mode is two to three times faster than that of the other two solutions under TPC-H workload. | ||
|
||
In v5.0, TiDB introduces the MPP mode for [TiFlash](/tiflash/tiflash-overview.md), which significantly enhances TiDB's Hybrid Transactional and Analytical Processing (HTAP) capabilities. Test objects in this report are as follows: | ||
|
||
+ TiDB v5.0 columnar storage in the MPP mode | ||
+ Greenplum 6.15.0 | ||
+ Apache Spark 3.1.1 + Parquet | ||
|
||
## Test environment | ||
|
||
### Hardware prerequisite | ||
|
||
+ Node count: 3 | ||
+ CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 40 cores | ||
+ Memory: 189 GB | ||
+ Disks: NVMe 3TB * 2 | ||
|
||
### Software version | ||
|
||
| Service type | Software version | | ||
|:----------|:-----------| | ||
| TiDB | 5.0 | | ||
| Greenplum | 6.15.0 | | ||
| Apache Spark | 3.1.1 | | ||
|
||
### Parameter configuration | ||
|
||
#### TiDB v5.0 | ||
|
||
For the v5.0 cluster, TiDB uses the default parameter configuration except for the following configuration items. | ||
|
||
In the configuration file `users.toml` of TiFlash, configure `max_memory_usage` as follows: | ||
|
||
```toml | ||
[profiles.default] | ||
max_memory_usage = 10000000000000 | ||
``` | ||
|
||
Set session variables with the following SQL statements: | ||
|
||
```sql | ||
set @@tidb_isolation_read_engines='tiflash'; | ||
set @@tidb_allow_mpp=1; | ||
set @@tidb_opt_broadcast_join=0; | ||
set @@tidb_mem_quota_query = 10 << 30; | ||
``` | ||
|
||
All TPC-H test tables are replicated to TiFlash in columnar format, with no additional partitions or indexes. | ||
|
||
#### Greenplum | ||
|
||
Except for the initial 3 nodes, the Greenplum cluster is deployed using an additional master node. Each segment server contains 8 segments, which means 4 segments per NVMe SSD. So there are 24 segments in total. The storage format is append-only/column-oriented storage and partition keys are used as primary keys. | ||
|
||
{{< copyable "" >}} | ||
|
||
``` | ||
log_statement = all | ||
gp_autostats_mode = none | ||
statement_mem = 2048MB | ||
gp_vmem_protect_limit = 16384 | ||
``` | ||
|
||
#### Apache Spark | ||
|
||
The test of Apache Spark uses Apache Parquet as the storage format and stores the data on HDFS. The HDFS system consists of three nodes. Each node has two assigned NVMe SSD disks as the data disks. The Spark cluster is deployed in standalone mode, using NVMe SSD disks as the local directory of `spark.local.dir` to speed up the shuffle spill, with no additional partitions or indexes. | ||
|
||
{{< copyable "" >}} | ||
|
||
``` | ||
--driver-memory 20G | ||
--total-executor-cores 120 | ||
--executor-cores 5 | ||
--executor-memory 15G | ||
``` | ||
|
||
## Test result | ||
|
||
> **Note:** | ||
> | ||
> The following test results are the average data of three tests. All numbers are in seconds. | ||
|
||
| Query ID | TiDB v5.0 | Greenplum 6.15.0 | Apache Spark 3.1.1 + Parquet | | ||
| :-------- | :----------- | :------------ | :-------------- | | ||
| 1 | 8.08 | 64.1307 | 52.64 | | ||
| 2 | 2.53 | 4.76612 | 11.83 | | ||
| 3 | 4.84 | 15.62898 | 13.39 | | ||
| 4 | 10.94 | 12.88318 | 8.54 | | ||
| 5 | 12.27 | 23.35449 | 25.23 | | ||
| 6 | 1.32 | 6.033 | 2.21 | | ||
| 7 | 5.91 | 12.31266 | 25.45 | | ||
| 8 | 6.71 | 11.82444 | 23.12 | | ||
| 9 | 44.19 | 22.40144 | 35.2 | | ||
| 10 | 7.13 | 12.51071 | 12.18 | | ||
| 11 | 2.18 | 2.6221 | 10.99 | | ||
| 12 | 2.88 | 7.97906 | 6.99 | | ||
| 13 | 6.84 | 10.15873 | 12.26 | | ||
| 14 | 1.69 | 4.79394 | 3.89 | | ||
| 15 | 3.29 | 10.48785 | 9.82 | | ||
| 16 | 5.04 | 4.64262 | 6.76 | | ||
| 17 | 11.7 | 74.65243 | 44.65 | | ||
| 18 | 12.87 | 64.87646 | 30.27 | | ||
| 19 | 4.75 | 8.08625 | 4.7 | | ||
| 20 | 8.89 | 15.47016 | 8.4 | | ||
| 21 | 24.44 | 39.08594 | 34.83 | | ||
| 22 | 1.23 | 7.67476 | 4.59 | | ||
|
||
![TPC-H](/media/tidb-v5-tpch-100-vs-gp-spark.png) | ||
|
||
In the performance diagram above: | ||
|
||
- Blue lines represent TiDB v5.0; | ||
- Red lines represent Greenplum 6.15.0; | ||
- Yellow lines represent Apache Spark 3.1.1. | ||
- The y-axis represents the execution time of the query. The less the time is, the better the performance is. |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to add this in another PR. Please take it together.