Speed up health check #216

Closed
Tracked by #465
djshow832 opened this issue Feb 10, 2023 · 0 comments · Fixed by #498
Labels
enhancement New feature or request

Comments

Collaborator

djshow832 commented Feb 10, 2023

Background

Currently, the health check is serial.

  • The health check interval is 3s
  • The dial timeout for etcd is 5s, but its max retries is unlimited
  • The dial timeout for both the SQL port and the HTTP port is 2s
  • The max retries for each SQL or HTTP port check is 3
  • The retry interval is 1s

For a cluster with N TiDB instances, the maximum overall interval is 3s + 5s + (3×2×2s + 2×2×1s)×N = 8s + 16s×N

If the graceful-wait-before-shutdown of TiDB is set to this duration, then it's too slow for scale-in or upgrading.

Solution

One possible way is to add a goroutine pool to do the health check.

@djshow832 djshow832 self-assigned this Feb 10, 2023
@djshow832 djshow832 removed their assignment Sep 7, 2023
@djshow832 djshow832 added the enhancement New feature or request label Jan 7, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in #498 Apr 8, 2024