-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit
operator can not be pushed down to tikv when using IN
clause
#34882
Comments
Hi, this is not a bug. Since there's multiple values the column c returned, if you don't add top-n, the data get is with the order of (c, d), not d. |
And when the request is pushed down to tikv. The base request unit is |
I'm afraid you misunderstood my question. The optimal execution plan is:
There is no need to let |
Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently. I think such query is quite common. If the |
@winoros Or is it possible for planner to generate such execution plan (by itself)? SELECT id, a, b, c, d
FROM (
(SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c = 2 ORDER BY id ASC LIMIT 10)
UNION
(SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c = 3 ORDER BY id ASC LIMIT 10)
) r
ORDER BY id ASC
LIMIT 10; |
We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side. |
I know that, I mean the cost-based optimizer should be able to find the minimum one between the cost of more RPC and the cost of more Scan. In our senario, the number of records that match |
Hi there, I make a pull request to enable optimizer to generate an execution plan with |
Let me contact our PM first. |
Oh, I just ask, does the RDBMS you currently use support this behavior? |
I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra. |
@Yriuns |
Agreed, this feature will increase the burden of physical plan optimization. It should be used only if the gain exceeds the CPU overhead. Under current optimizer framework, it seems impossible for the optimizer itself to decide whether to use or not according to the cost. But some is better than none, right? : ) |
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
The field in
WHERE
andORDER BY
clause all hit the indexa_b_c_id
, so theLimit
operator should be pushed down totikv
's each range, thentidb
perform a finalTopN
, something like3. What did you see instead (Required)
The execution plan choose push down
TopN
rather thanLimit
, which results in an unnecessaryScan + Sort
.4. What is your TiDB version? (Required)
The text was updated successfully, but these errors were encountered: