Issue: Long SQL Causing TiKV gRPC to Hang #55845
Labels
component/tikv
may-affects-5.4
This bug maybe affects 5.4.x versions.
may-affects-6.1
may-affects-6.5
may-affects-7.1
may-affects-7.5
may-affects-8.1
report/customer
Customers have encountered this bug.
severity/major
type/bug
The issue is confirmed as a bug.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Background:
The SQL query is select * from t where id in (?), with many values in the IN clause, such as 10,000 entries. The SQL length is several MBs. During execution, TiKV experiences leader drop incidents, and monitoring shows server report failures. There is a noticeable leader drop.
After optimizing the SQL by reducing the contents of the IN clause, the issue no longer occurs. This effectively reduces the request size between TiDB and TiKV.
Problem Description:
This is a known issue. During interactions between TiDB and TiKV, if the request is too large, TiKV's gRPC threads become busy with frequent deserialization of large messages, causing the threads to hang. This affects the sending of other messages, such as heartbeat packets between the leader and followers of a Region.
This issue is caused by business reasons and can be reproduced in older versions, indicating that it exists across versions. The upgrade triggered this phenomenon. Many factors, including data and leader distribution, can affect the requests sent to TiKV, making it difficult to pinpoint specific changes. Restarting might also be a factor. However, the root cause is clear: this is an edge case. Future considerations may include enhancements to address it.
2. What did you expect to see? (Required)
3. What did you see instead (Required)
4. What is your TiDB version? (Required)
The text was updated successfully, but these errors were encountered: