`Limit` operator can not be pushed down to tikv when using `IN` clause #34882

Yriuns · 2022-05-23T03:50:57Z

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create a table

CREATE TABLE `t` (
  `id` bigint(22) NOT NULL AUTO_INCREMENT,
  `a` bigint(20) NOT NULL DEFAULT '0',
  `b` bigint(20) NOT NULL DEFAULT '0',
  `c` bigint(20) NOT NULL DEFAULT '0',
  `d` bigint(20) NOT NULL DEFAULT '0',
  UNIQUE KEY `uk_id` (`id`),
  KEY `a_b_c_id` (`a`,`b`,`c`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=1;

perform a query

explain
SELECT id, a, b, c, d
FROM t
WHERE a = 0 AND b = 1 AND c IN (2, 3)
ORDER BY id ASC
LIMIT 10;

2. What did you expect to see? (Required)

The field in WHERE and ORDER BY clause all hit the index a_b_c_id, so the Limit operator should be pushed down to tikv's each range, then tidb perform a final TopN, something like

mysql> explain SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c IN (2, 3) ORDER BY id ASC LIMIT 10;
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| id                               | estRows | task      | access object                        | operator info                                                      |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| TopN_9                           | 0.00    | root      |                                      | test.t.id, offset:0, count:10                                      |
| └─IndexLookUp_20                 | 0.00    | root      |                                      |                                                                    |
|   ├─Limit_19(Build)              | 0.00    | cop[tikv] |                                      | test.t.id, offset:0, count:10                                      |
|   │ └─IndexRangeScan_17          | 0.00    | cop[tikv] | table:t, index:a_b_c_id(a, b, c, id) | range:[0 1 2,0 1 2], [0 1 3,0 1 3], keep order:true, stats:pseudo |
|   └─TableRowIDScan_18(Probe)     | 0.00    | cop[tikv] | table:t                              | keep order:false, stats:pseudo                                     |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
5 rows in set (0.00 sec)

3. What did you see instead (Required)

The execution plan choose push down TopN rather than Limit, which results in an unnecessary Scan + Sort.

mysql> explain SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c IN (2, 3) ORDER BY id ASC LIMIT 10;
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| id                               | estRows | task      | access object                        | operator info                                                      |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| TopN_9                           | 0.00    | root      |                                      | test.t.id, offset:0, count:10                                      |
| └─IndexLookUp_20                 | 0.00    | root      |                                      |                                                                    |
|   ├─TopN_19(Build)               | 0.00    | cop[tikv] |                                      | test.t.id, offset:0, count:10                                      |
|   │ └─IndexRangeScan_17          | 0.00    | cop[tikv] | table:t, index:a_b_c_id(a, b, c, id) | range:[0 1 2,0 1 2], [0 1 3,0 1 3], keep order:false, stats:pseudo |
|   └─TableRowIDScan_18(Probe)     | 0.00    | cop[tikv] | table:t                              | keep order:false, stats:pseudo                                     |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
5 rows in set (0.00 sec)

4. What is your TiDB version? (Required)

Release Version: v6.0.0
Edition: Community
Git Commit Hash: 36a9810441ca0e496cbd22064af274b3be771081
Git Branch: heads/refs/tags/v6.0.0
UTC Build Time: 2022-03-31 10:27:25
GoVersion: go1.18
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

The text was updated successfully, but these errors were encountered:

winoros · 2022-05-23T09:16:12Z

Hi, this is not a bug.
The index is (a, b, c, d). And there's IN condition on c, equal condition on a and b. You want to return data with the order of d.

Since there's multiple values the column c returned, if you don't add top-n, the data get is with the order of (c, d), not d.

winoros · 2022-05-23T09:18:40Z

And when the request is pushed down to tikv. The base request unit is region, not range.
If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Yriuns · 2022-05-23T11:08:12Z

Hi, this is not a bug. The index is (a, b, c, d). And there's IN condition on c, equal condition on a and b. You want to return data with the order of d.

Since there's multiple values the column c returned, if you don't add top-n, the data get is with the order of (c, d), not d.

I'm afraid you misunderstood my question.

The optimal execution plan is:

tidb ask tikv to scan the first 10 records of (a=0, b=1, c=2), with their id fields. (Limit@cop)
tidb ask tikv to scan the first 10 records of (a=0, b=1, c=3), with their id fields. (Limit@cop)
tidb perform a Top-10 on these 20 records, order by id. (TopN@root)

There is no need to let tikv perform Top-10, right?

Yriuns · 2022-05-23T11:18:57Z

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.

I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

Yriuns · 2022-05-25T03:33:33Z

@winoros Or is it possible for planner to generate such execution plan (by itself)?

SELECT id, a, b, c, d
FROM (
    (SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c = 2 ORDER BY id ASC LIMIT 10)
    UNION
    (SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c = 3 ORDER BY id ASC LIMIT 10)
) r
ORDER BY id ASC
LIMIT 10;

winoros · 2022-05-25T12:53:59Z

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.

I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

Yriuns · 2022-05-25T16:01:41Z

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.
I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

I know that, I mean the cost-based optimizer should be able to find the minimum one between the cost of more RPC and the cost of more Scan.

In our senario, the number of records that match a = 0 AND b = 1 AND c IN (2, 3) may reach several millions, which means a range scan of several millions in tikv. However, if the planner can rewrite the SQL to the UNION format, it only needs to scan 20 records.

Yriuns · 2022-06-23T14:05:01Z

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.
I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

Hi there, I make a pull request to enable optimizer to generate an execution plan with Limit push down. Will you take a look? If you agree with the idea of this PR, I will continue to improve it. Otherwise, I will just closed it 😢

winoros · 2022-06-24T08:57:28Z

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.
I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

Hi there, I make a pull request to enable optimizer to generate an execution plan with Limit push down. Will you take a look? If you agree with the idea of this PR, I will continue to improve it. Otherwise, I will just closed it 😢

Let me contact our PM first.

winoros · 2022-06-24T09:21:21Z

Oh, I just ask, does the RDBMS you currently use support this behavior?

Yriuns · 2022-06-24T12:56:11Z

Oh, I just ask, does the RDBMS you currently use support this behavior?

I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra.

winoros · 2022-06-28T09:07:27Z

Oh, I just ask, does the RDBMS you currently use support this behavior?

I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra.

@Yriuns
Yes, it's not elegant enough to put into TiDB as a release behavior. A switch usually means that if we don't enable it, most of the users won't use it.

Yriuns · 2022-06-28T10:41:46Z

Oh, I just ask, does the RDBMS you currently use support this behavior?

I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra.

@Yriuns Yes, it's not elegant enough to put into TiDB as a release behavior. A switch usually means that if we don't enable it, most of the users won't use it.

Agreed, this feature will increase the burden of physical plan optimization. It should be used only if the gain exceeds the CPU overhead. Under current optimizer framework, it seems impossible for the optimizer itself to decide whether to use or not according to the cost.

But some is better than none, right? : )
And what's the official attitude about optimizer hint? I just notice that v6.1 add a LEADING hint.

Yriuns added the type/bug The issue is confirmed as a bug. label May 23, 2022

ChenPeng2013 added sig/planner SIG: Planner severity/moderate labels May 23, 2022

winoros added type/question The issue belongs to a question. and removed type/bug The issue is confirmed as a bug. severity/moderate labels May 23, 2022

Yriuns linked a pull request Jun 23, 2022 that will close this issue

planner: support expand IN expr #35699

Open

12 tasks

seiya-annie added the report/customer Customers have encountered this bug. label Jul 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Limit` operator can not be pushed down to tikv when using `IN` clause #34882

`Limit` operator can not be pushed down to tikv when using `IN` clause #34882

Yriuns commented May 23, 2022 •

edited

Loading

winoros commented May 23, 2022

winoros commented May 23, 2022

Yriuns commented May 23, 2022 •

edited

Loading

Yriuns commented May 23, 2022 •

edited

Loading

Yriuns commented May 25, 2022 •

edited

Loading

winoros commented May 25, 2022

Yriuns commented May 25, 2022 •

edited

Loading

Yriuns commented Jun 23, 2022 •

edited

Loading

winoros commented Jun 24, 2022

winoros commented Jun 24, 2022

Yriuns commented Jun 24, 2022 •

edited

Loading

winoros commented Jun 28, 2022 •

edited

Loading

Yriuns commented Jun 28, 2022

Limit operator can not be pushed down to tikv when using IN clause #34882

Limit operator can not be pushed down to tikv when using IN clause #34882

Comments

Yriuns commented May 23, 2022 • edited Loading

Bug Report

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

winoros commented May 23, 2022

winoros commented May 23, 2022

Yriuns commented May 23, 2022 • edited Loading

Yriuns commented May 23, 2022 • edited Loading

Yriuns commented May 25, 2022 • edited Loading

winoros commented May 25, 2022

Yriuns commented May 25, 2022 • edited Loading

Yriuns commented Jun 23, 2022 • edited Loading

winoros commented Jun 24, 2022

winoros commented Jun 24, 2022

Yriuns commented Jun 24, 2022 • edited Loading

winoros commented Jun 28, 2022 • edited Loading

Yriuns commented Jun 28, 2022

`Limit` operator can not be pushed down to tikv when using `IN` clause #34882

`Limit` operator can not be pushed down to tikv when using `IN` clause #34882

Yriuns commented May 23, 2022 •

edited

Loading

Yriuns commented May 23, 2022 •

edited

Loading

Yriuns commented May 23, 2022 •

edited

Loading

Yriuns commented May 25, 2022 •

edited

Loading

Yriuns commented May 25, 2022 •

edited

Loading

Yriuns commented Jun 23, 2022 •

edited

Loading

Yriuns commented Jun 24, 2022 •

edited

Loading

winoros commented Jun 28, 2022 •

edited

Loading