Skip to content

process cop request by tikv instance instead of region #19381

Open
@zz-jason

Description

Description

Feature Request

Is your feature request related to a problem? Please describe:

Let's say we have an order table, whose schema is:

create table order (
    id bigint primary key auto_random, -- to avoid write hotspot
    customer_id bigint,
    ...,
    index idx_customer_id(customer_id)
);

A query like this:

select * from order where customer_id = ?;

As you can imagine, an optimal execution plan is to index lookup: scan all the id (note the int primary key in this table) based on the index idx_customer_id, then scan the table based on the obtained id. Since id is randomly generated, the obtained ids may cross lots of TiKV regions.

I've seen a user case that there are ~800 rows in total, which are placed in ~800 regions, to gather all the data, we need ~800 cop requests in total for all the involved region.

Describe the feature you'd like:

Typically, the number of regions is much larger than the number of tikv instances. If we can process cop requests by tikv instance instead of region, the number of cop requests can be greatly reduced, the performance can also be improved, because:

  • the total network time between TiDB and TiKV can be reduced.
  • the reduced cop task number can result in reduced cop wait time in a high QPS application.

The query optimizer may also need to be modified to adapt to this runtime optimization. For example, if the user queries the TopN on a table. That TopN operator may not be pushed down to TiKV coprocessor if the query optimizer thinks that a single TiKV region can not return more than N rows. But under the new cop task process mechanism, the query optimizer must estimate whether a TiKV instance can return more than N rows, not a single TiKV region.

As for the optimization itself, since the original region may be split or its leader may be transferred to another TiKV instance, the region-miss error and it's retry strategy need to be carefully designed. We may push pipeline-breaks which need to consume all the results of their child operators to complete their execution, like TopN and Hash Aggregate. So it's better to handle the region-miss error and retry other TiKV instances on the TiKV side. Make the TiKV instance guarantee to return the result if no error happens.

Describe alternatives you've considered:

N/A

Teachability, Documentation, Adoption, Migration Strategy:

N/A

SIG slack channel

#sig-exec

Score

  • 300

Mentor

Metadata

Assignees

No one assigned

    Labels

    challenge-programcomponent/executorfeature/acceptedThis feature request is accepted by product managersgood first issueDenotes an issue ready for a new contributor, according to the "help wanted" guidelines.sig/executionSIG executiontype/feature-requestCategorizes issue or PR as related to a new feature.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions