process cop request by tikv instance instead of region

## Description
## Feature Request

**Is your feature request related to a problem? Please describe:**

Let's say we have an `order` table, whose schema is:

```sql
create table order (
    id bigint primary key auto_random, -- to avoid write hotspot
    customer_id bigint,
    ...,
    index idx_customer_id(customer_id)
);
```

A query like this:

```sql
select * from order where customer_id = ?;
```

As you can imagine, an optimal execution plan is to index lookup: scan all the `id` (note the int primary key in this table) based on the index `idx_customer_id`, then scan the table based on the obtained `id`. Since `id` is randomly generated, the obtained `id`s may cross lots of TiKV regions.

I've seen a user case that there are ~800 rows in total, which are placed in ~800 regions, to gather all the data, we need ~800 cop requests in total for all the involved region.

**Describe the feature you'd like:**

Typically, the number of regions is much larger than the number of tikv instances. If we can process cop requests by tikv instance instead of region, the number of cop requests can be greatly reduced, the performance can also be improved, because:

- the total network time between TiDB and TiKV can be reduced.
- the reduced cop task number can result in reduced cop wait time in a high QPS application.

The query optimizer may also need to be modified to adapt to this runtime optimization. For example, if the user queries the TopN on a table. That `TopN` operator may not be pushed down to TiKV coprocessor if the query optimizer thinks that a single TiKV region can not return more than `N` rows. But under the new cop task process mechanism, the query optimizer must estimate whether a TiKV instance can return more than `N` rows, not a single TiKV region.

As for the optimization itself, since the original region may be split or its leader may be transferred to another TiKV instance, the region-miss error and it's retry strategy need to be carefully designed. We may push pipeline-breaks which need to consume all the results of their child operators to complete their execution, like `TopN` and `Hash Aggregate`. So it's better to handle the region-miss error and retry other TiKV instances on the TiKV side. Make the TiKV instance guarantee to return the result if no error happens.

**Describe alternatives you've considered:**

N/A

**Teachability, Documentation, Adoption, Migration Strategy:**

N/A



## SIG slack channel
[#sig-exec](https://slack.tidb.io/invite?team=tidb-community&channel=sig-exec&ref=high-performance)

## Score
- 300

## Mentor
 * @XuHuaiyu


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

process cop request by tikv instance instead of region #19381

Description

Feature Request

SIG slack channel

Score

Mentor

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

process cop request by tikv instance instead of region #19381

Description

Description

Feature Request

SIG slack channel

Score

Mentor

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions