Skip to content

Suboptimal join query plan for TPC-C "new order" transaction #3797

@andy-wm-arthur

Description

@andy-wm-arthur

TPC-C is composed of 5 transaction types. The "new order" transaction contains a join query on the customer and warehouse tables:

SELECT c_discount, c_last, c_credit, w_tax 
FROM customer1, warehouse1 
WHERE w_id = 1 AND c_w_id = w_id AND c_d_id = 2 AND c_id = 2327

The schemas for the customer and warehouse tables are as follows:

% dolt schema export customer1
CREATE TABLE `customer1` (
  `c_id` int NOT NULL,
  `c_d_id` tinyint NOT NULL,
  `c_w_id` smallint NOT NULL,
  `c_first` varchar(16),
  `c_middle` char(2),
  `c_last` varchar(16),
  `c_street_1` varchar(20),
  `c_street_2` varchar(20),
  `c_city` varchar(20),
  `c_state` char(2),
  `c_zip` char(9),
  `c_phone` char(16),
  `c_since` datetime,
  `c_credit` char(2),
  `c_credit_lim` bigint,
  `c_discount` decimal(4,2),
  `c_balance` decimal(12,2),
  `c_ytd_payment` decimal(12,2),
  `c_payment_cnt` smallint,
  `c_delivery_cnt` smallint,
  `c_data` text,
  PRIMARY KEY (`c_w_id`,`c_d_id`,`c_id`),
  KEY `idx_customer1` (`c_w_id`,`c_d_id`,`c_last`,`c_first`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

% dolt schema export warehouse1
CREATE TABLE `warehouse1` (
  `w_id` smallint NOT NULL,
  `w_name` varchar(10),
  `w_street_1` varchar(20),
  `w_street_2` varchar(20),
  `w_city` varchar(20),
  `w_state` char(2),
  `w_zip` char(9),
  `w_tax` decimal(4,2),
  `w_ytd` decimal(12,2),
  PRIMARY KEY (`w_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

The join condition matches a single row in each table and the result set is also a single row:

+------------+-------------+----------+-------+
| c_discount | c_last      | c_credit | w_tax |
+------------+-------------+----------+-------+
| 0.46       | PRESABLEBAR | GC       | 0.12  |
+------------+-------------+----------+-------+

This join has the following query plan in v0.40.16

+---------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                |
+---------------------------------------------------------------------------------------------------------------------+
| Project(customer1.c_discount, customer1.c_last, customer1.c_credit, warehouse1.w_tax)                               |
|  └─ IndexedJoin(customer1.c_w_id = warehouse1.w_id)                                                                 |
|      ├─ IndexedTableAccess(warehouse1 on [warehouse1.w_id] with ranges: [{[1, 1]}])                                 |
|      └─ Filter((customer1.c_d_id = 2) AND (customer1.c_id = 2327))                                                  |
|          └─ IndexedTableAccess(customer1 on [customer1.c_w_id,customer1.c_d_id,customer1.c_last,customer1.c_first]) |
+---------------------------------------------------------------------------------------------------------------------+

This query plan has a number of issues that cause it to access much more data than is necessary to answer the query:

  • The plan chooses the secondary index idx_customer1 for table customer1 table, rather than the clustered index, which causes an indirect secondary index lookup rather than a covering index lookup.
  • The plan does not pushdown the projection into the IndexedTableAccess node, meaning that many extraneous fields are accessed. The TEXT field c_data is particularly expensive to access as it requires an additional chunk read.
  • The filter predicate customer1.c_id = 2327 is not pushed down to the IndexedTableAccess node. This is possibly a side effect of the wrong index being chosen, as this column is not present in the index key for idx_customer1. The inability to pushdown this predicate is likely the most expensive problem with this query plan as it increases the cardinality of this index access from 1 to 3000.

The above query can be rewritten as:

SELECT c_discount, c_last, c_credit, w_tax 
FROM (
    SELECT c_w_id, c_discount, c_last, c_credit 
    FROM customer1 
    WHERE c_w_id = 1 AND c_d_id = 2 AND c_id = 2327 
) sub 
JOIN warehouse1 
ON sub.c_w_id = w_id 
WHERE w_id = 1

Which creates a much more optimal query plan:

+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project(sub.c_discount, sub.c_last, sub.c_credit, warehouse1.w_tax)                                                                                   |
|  └─ InnerJoin(sub.c_w_id = warehouse1.w_id)                                                                                                           |
|      ├─ SubqueryAlias(sub)                                                                                                                            |
|      │   └─ Project(customer1.c_w_id, customer1.c_discount, customer1.c_last, customer1.c_credit)                                                     |
|      │       └─ Projected table access on [c_w_id c_discount c_last c_credit]                                                                         |
|      │           └─ IndexedTableAccess(customer1 on [customer1.c_w_id,customer1.c_d_id,customer1.c_id] with ranges: [{[1, 1], [2, 2], [2327, 2327]}]) |
|      └─ IndexedTableAccess(warehouse1 on [warehouse1.w_id] with ranges: [{[1, 1]}])                                                                   |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

And results in significant performance improvements for the (modified) TPC-C benchmark

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions