when TiKV returns rate limit errors, lightning should continue on next regions instead of retry on the busy region #40205

Closed
lance6716 opened this issue Dec 28, 2022 · 4 comments · Fixed by #40278
Assignees
Labels
component/lightning This issue is related to Lightning of TiDB. type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@lance6716
Contributor

Feature Request

Is your feature request related to a problem? Please describe:

Currently, Lightning recomputes the regions that need to be ingested and retries from the start:

continue WriteAndIngest

When a region is busy, this logic makes Lightning keep retrying on that same region.

Describe the feature you'd like:

Lightning should continue with the next regions instead of retrying on the busy one.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@lance6716 lance6716 added the type/feature-request Categorizes issue or PR as related to a new feature. label Dec 28, 2022
@lance6716 lance6716 self-assigned this Dec 28, 2022
@lance6716
Contributor Author

/cc @gozssky
do I understand the behaviour correctly?

@lance6716 lance6716 added the component/lightning This issue is related to Lightning of TiDB. label Dec 28, 2022
@sleepymole
Contributor

Do you mean the "too many sst" error? It actually means the node is busy. If we continue, I think we should try regions that are not on this node.

@sleepymole
Contributor

Note that there is no check on the follower node. Before retrying, we'd better do some checks on the follower node.

@lance6716
Contributor Author

lance6716 commented Dec 28, 2022

Do you mean the "too many sst" error? It actually means the node is busy. If we continue, I think we should try regions that are not on this node.

Yes, currently "too many sst" is the only error of the "rate limit" type; we might add more soon.

It seems the multi-RocksDB feature will be merged soon, so I think we should not do the skip-node optimization. On the current single-RocksDB architecture, skipping only the busy region should still improve performance somewhat.

Note that there is no check on the follower node. Before retrying, we'd better do some checks on the follower node.

I'll consider merging #40116

2 participants