-
-
Couldn't load subscription status.
- Fork 585
Closed
Labels
Description
TPC-C is composed of 5 transaction types. The "new order" transaction contains a join query on the customer and warehouse tables:
SELECT c_discount, c_last, c_credit, w_tax
FROM customer1, warehouse1
WHERE w_id = 1 AND c_w_id = w_id AND c_d_id = 2 AND c_id = 2327
The schemas for the customer and warehouse tables are as follows:
% dolt schema export customer1
CREATE TABLE `customer1` (
`c_id` int NOT NULL,
`c_d_id` tinyint NOT NULL,
`c_w_id` smallint NOT NULL,
`c_first` varchar(16),
`c_middle` char(2),
`c_last` varchar(16),
`c_street_1` varchar(20),
`c_street_2` varchar(20),
`c_city` varchar(20),
`c_state` char(2),
`c_zip` char(9),
`c_phone` char(16),
`c_since` datetime,
`c_credit` char(2),
`c_credit_lim` bigint,
`c_discount` decimal(4,2),
`c_balance` decimal(12,2),
`c_ytd_payment` decimal(12,2),
`c_payment_cnt` smallint,
`c_delivery_cnt` smallint,
`c_data` text,
PRIMARY KEY (`c_w_id`,`c_d_id`,`c_id`),
KEY `idx_customer1` (`c_w_id`,`c_d_id`,`c_last`,`c_first`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
% dolt schema export warehouse1
CREATE TABLE `warehouse1` (
`w_id` smallint NOT NULL,
`w_name` varchar(10),
`w_street_1` varchar(20),
`w_street_2` varchar(20),
`w_city` varchar(20),
`w_state` char(2),
`w_zip` char(9),
`w_tax` decimal(4,2),
`w_ytd` decimal(12,2),
PRIMARY KEY (`w_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
The join condition matches a single row in each table and the result set is also a single row:
+------------+-------------+----------+-------+
| c_discount | c_last | c_credit | w_tax |
+------------+-------------+----------+-------+
| 0.46 | PRESABLEBAR | GC | 0.12 |
+------------+-------------+----------+-------+
This join has the following query plan in v0.40.16
+---------------------------------------------------------------------------------------------------------------------+
| plan |
+---------------------------------------------------------------------------------------------------------------------+
| Project(customer1.c_discount, customer1.c_last, customer1.c_credit, warehouse1.w_tax) |
| └─ IndexedJoin(customer1.c_w_id = warehouse1.w_id) |
| ├─ IndexedTableAccess(warehouse1 on [warehouse1.w_id] with ranges: [{[1, 1]}]) |
| └─ Filter((customer1.c_d_id = 2) AND (customer1.c_id = 2327)) |
| └─ IndexedTableAccess(customer1 on [customer1.c_w_id,customer1.c_d_id,customer1.c_last,customer1.c_first]) |
+---------------------------------------------------------------------------------------------------------------------+
This query plan has a number of issues that cause it to access much more data than is necessary to answer the query:
- The plan chooses the secondary index
idx_customer1for tablecustomer1table, rather than the clustered index, which causes an indirect secondary index lookup rather than a covering index lookup. - The plan does not pushdown the projection into the
IndexedTableAccessnode, meaning that many extraneous fields are accessed. TheTEXTfieldc_datais particularly expensive to access as it requires an additional chunk read. - The filter predicate
customer1.c_id = 2327is not pushed down to theIndexedTableAccessnode. This is possibly a side effect of the wrong index being chosen, as this column is not present in the index key foridx_customer1. The inability to pushdown this predicate is likely the most expensive problem with this query plan as it increases the cardinality of this index access from 1 to 3000.
The above query can be rewritten as:
SELECT c_discount, c_last, c_credit, w_tax
FROM (
SELECT c_w_id, c_discount, c_last, c_credit
FROM customer1
WHERE c_w_id = 1 AND c_d_id = 2 AND c_id = 2327
) sub
JOIN warehouse1
ON sub.c_w_id = w_id
WHERE w_id = 1
Which creates a much more optimal query plan:
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project(sub.c_discount, sub.c_last, sub.c_credit, warehouse1.w_tax) |
| └─ InnerJoin(sub.c_w_id = warehouse1.w_id) |
| ├─ SubqueryAlias(sub) |
| │ └─ Project(customer1.c_w_id, customer1.c_discount, customer1.c_last, customer1.c_credit) |
| │ └─ Projected table access on [c_w_id c_discount c_last c_credit] |
| │ └─ IndexedTableAccess(customer1 on [customer1.c_w_id,customer1.c_d_id,customer1.c_id] with ranges: [{[1, 1], [2, 2], [2327, 2327]}]) |
| └─ IndexedTableAccess(warehouse1 on [warehouse1.w_id] with ranges: [{[1, 1]}]) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
And results in significant performance improvements for the (modified) TPC-C benchmark