@max-hoffman max-hoffman commented Nov 6, 2023

The randIO parameter for LOOKUP_JOIN costing was perhaps too strict, since that cost is already stacked on top of the sequential cost. This isn't a replacement for better costing, but it boosts TPC-C perf a bit and is no less correct than the previous version.

This was the motivating query, which executed as a HASH_JOIN before and now plans as a LOOKUP_JOIN:

sbt> explain SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+------------------------------------------------------------------------------------------------------------+
| plan                                                                                                       |
+------------------------------------------------------------------------------------------------------------+
| Project                                                                                                    |
|  ├─ columns: [countdistinct([stock3.s_i_id])]                                                              |
|  └─ GroupBy                                                                                                |
|      ├─ SelectedExprs(COUNTDISTINCT([stock3.s_i_id]))                                                      |
|      ├─ Grouping()                                                                                         |
|      └─ LookupJoin                                                                                         |
|          ├─ IndexedTableAccess(order_line3)                                                                |
|          │   ├─ index: [order_line3.ol_w_id,order_line3.ol_d_id,order_line3.ol_o_id,order_line3.ol_number] |
|          │   ├─ filters: [{[1, 1], [5, 5], [2983, 3003), [NULL, ∞)}]                                       |
|          │   └─ columns: [ol_o_id ol_d_id ol_w_id ol_i_id]                                                 |
|          └─ Filter                                                                                         |
|              ├─ ((stock3.s_w_id = 1) AND (stock3.s_quantity < 18))                                         |
|              └─ IndexedTableAccess(stock3)                                                                 |
|                  ├─ index: [stock3.s_w_id,stock3.s_i_id]                                                   |
|                  ├─ columns: [s_i_id s_w_id s_quantity]                                                    |
|                  └─ keys: 1, order_line3.ol_i_id                                                           |
+------------------------------------------------------------------------------------------------------------+
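
To make the "stacked" point concrete, here is a toy model of the choice. It is only a sketch: the formulas, the constants, and the names `SEQ_IO`, `hash_join_cost`, `lookup_join_cost`, and `pick_join` are assumptions for illustration, not the actual coster. Each lookup pays the sequential cost plus the randIO penalty, so lowering randIO can flip the plan.

```python
# Toy cost model. Every constant and formula here is an assumption made
# for illustration; it is not the optimizer's real costing code.
SEQ_IO = 1.0  # hypothetical cost of reading one row sequentially


def hash_join_cost(lhs_rows, rhs_rows):
    # Scan both sides once.
    return (lhs_rows + rhs_rows) * SEQ_IO


def lookup_join_cost(lhs_rows, rand_io):
    # Scan the LHS, then do one index lookup per LHS row. Each lookup pays
    # the sequential cost with the random-IO penalty stacked on top.
    return lhs_rows * SEQ_IO + lhs_rows * (SEQ_IO + rand_io)


def pick_join(lhs_rows, rhs_rows, rand_io):
    lookup = lookup_join_cost(lhs_rows, rand_io)
    hashed = hash_join_cost(lhs_rows, rhs_rows)
    return "LOOKUP_JOIN" if lookup < hashed else "HASH_JOIN"


# Same made-up row counts, two made-up penalties: only rand_io changes.
print(pick_join(200, 1000, rand_io=5.0))
print(pick_join(200, 1000, rand_io=2.0))
```

With these made-up numbers the hash join costs 1200 either way, while the lookup join costs 1400 at `rand_io=5.0` and 800 at `rand_io=2.0`, so the penalty alone decides the plan.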

@max-hoffman
Contributor Author

So this breaks one of our index join tests that's faster with MERGE_JOIN... should I close this for now, or change the parameter to a value such that both queries get the desired plan?

@max-hoffman
Contributor Author

The better fix will come when I get to phase 2 of costing and can get accurate estimates for the LHS and RHS of joins. A small LHS means LOOKUP_JOIN is better and a big LHS means MERGE_JOIN or HASH_JOIN is better, but we can't tell the difference right now.
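
A rough sketch of why the LHS estimate matters, under assumed costs (lookup join = scan the LHS plus one stacked lookup per LHS row; hash join = scan both sides once). The names `break_even_lhs` and `cheaper_join` and the default parameter values are hypothetical, not anything in the coster:

```python
def break_even_lhs(rhs_rows, seq_io=1.0, rand_io=2.0):
    """LHS row count at which the two toy costs are equal.

    Assumed costs (illustrative only):
      lookup = lhs * (2 * seq_io + rand_io)  # scan LHS + one lookup per row
      hash   = (lhs + rhs) * seq_io          # scan both sides once
    Setting them equal gives lhs = rhs * seq_io / (seq_io + rand_io).
    """
    return rhs_rows * seq_io / (seq_io + rand_io)


def cheaper_join(lhs_rows, rhs_rows, seq_io=1.0, rand_io=2.0):
    # Below the break-even point the per-row lookups are cheaper than
    # scanning the whole RHS; above it the full scan wins.
    if lhs_rows < break_even_lhs(rhs_rows, seq_io, rand_io):
        return "LOOKUP_JOIN"
    return "HASH_JOIN"


print(break_even_lhs(1000))      # roughly 333 rows with these defaults
print(cheaper_join(20, 1000))    # small LHS
print(cheaper_join(5000, 1000))  # big LHS
```

In this model the right answer depends entirely on which side of the break-even point the LHS falls, so a wrong LHS estimate picks the wrong join no matter how randIO is tuned.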

@max-hoffman
Contributor Author

I added sysbench plan tests so that we don't accidentally break benchmarks and to compensate for the randIO magic number.
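
For readers unfamiliar with plan tests, the idea is roughly the following. This is a hypothetical sketch, not the test harness added in this PR: `join_ops_in_plan` and `check_plan` are made-up helpers, the `HashJoin` and `MergeJoin` spellings are assumed by analogy with the `LookupJoin` node in the plan above, and `PLAN` is an abbreviated copy of that plan.

```python
import re

# Join node names to look for. "LookupJoin" appears in the plan above;
# the other two spellings are assumptions.
JOIN_OPS = ("LookupJoin", "HashJoin", "MergeJoin")

# Abbreviated copy of the EXPLAIN output for the motivating query.
PLAN = """
Project
 └─ GroupBy
     └─ LookupJoin
         ├─ IndexedTableAccess(order_line3)
         └─ Filter
             └─ IndexedTableAccess(stock3)
"""


def join_ops_in_plan(plan_text):
    """Return the join operators that appear in a plan, top to bottom."""
    return re.findall("|".join(JOIN_OPS), plan_text)


def check_plan(plan_text, expected_ops):
    """Fail loudly if the plan's join operators differ from the expected ones."""
    actual = join_ops_in_plan(plan_text)
    if actual != list(expected_ops):
        raise AssertionError(f"expected joins {list(expected_ops)}, got {actual}")


# Passes while the query keeps planning as a lookup join; a costing change
# that flips it to another join type would raise.
check_plan(PLAN, ["LookupJoin"])
```

Pinning the expected operator per benchmark query means a change to a magic number like randIO shows up as a test failure instead of a silent benchmark regression.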

@max-hoffman max-hoffman merged commit 1513b8c into main Nov 7, 2023
@max-hoffman max-hoffman deleted the max/more-tpcc branch November 7, 2023 18:09