config, cluster: add an option to halt the cluster scheduling #6498

Merged
merged 4 commits into from
May 25, 2023

@JmPotato JmPotato commented May 22, 2023

What problem does this PR solve?

Issue Number: ref #6493.

What is changed and how does it work?

Add an option to halt the cluster scheduling.

Check List

Tests

  • Unit test
  • Integration test

During the halt:

(three screenshots)

Release note

None.

Signed-off-by: JmPotato <ghzpotato@gmail.com>
@JmPotato JmPotato requested review from nolouch and rleungx May 22, 2023 05:06

ti-chi-bot bot commented May 22, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • CabinfeverB
  • nolouch

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label May 22, 2023

// HaltScheduling is the option to halt the scheduling. Once it's on, PD will halt the scheduling,
// and any other scheduling configs will be ignored.
HaltScheduling bool `toml:"halt-scheduling" json:"halt-scheduling,string,omitempty"`
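
The comment on this field says that once the option is on, other scheduling configs are ignored. A minimal sketch of that precedence follows; it is an illustration, not PD's actual code. The `scheduleOptions` type and `effectiveLeaderLimit` helper are hypothetical, and `LeaderScheduleLimit` is only an assumed example of "any other scheduling config".

```go
package main

import "fmt"

// scheduleOptions is a hypothetical, trimmed-down stand-in for PD's schedule
// config. Only HaltScheduling comes from the diff under review.
type scheduleOptions struct {
	HaltScheduling      bool
	LeaderScheduleLimit uint64 // assumed example of another scheduling config
}

// effectiveLeaderLimit returns the limit a scheduling round should honor.
// When scheduling is halted, the configured limit is ignored and 0 is returned.
func effectiveLeaderLimit(opts scheduleOptions) uint64 {
	if opts.HaltScheduling {
		return 0
	}
	return opts.LeaderScheduleLimit
}

func main() {
	running := scheduleOptions{LeaderScheduleLimit: 4}
	halted := scheduleOptions{HaltScheduling: true, LeaderScheduleLimit: 4}
	fmt.Println(effectiveLeaderLimit(running)) // 4
	fmt.Println(effectiveLeaderLimit(halted))  // 0
}
```

The point of the sketch is only that the halt flag is checked first, so no other setting can re-enable scheduling while it is on.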
Member

Previously, I was trying to introduce a scheduling mode to cover this case. For me, it's OK to use an individual config to control it. Maybe we can name it enable-scheduling or something else.

Member Author

@JmPotato JmPotato May 22, 2023

I think it's best to use a configuration name whose default value is false to control the global scheduling switch, so that we avoid unexpected behavior in scenarios that require compatibility considerations, such as upgrades. From this perspective, descriptions like "disable" or "halt" are more appropriate. At the same time, this global scheduling shutdown should not be long-term. In addition, we already have the concept and operation of "pause" for schedulers, so I ultimately chose the word "halt". WDYT?
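
The compatibility argument can be checked against the field's JSON tag with a small sketch. The `ScheduleConfig` type here is a hypothetical one-field stand-in that copies only the tag quoted in the diff, and `decode` and `encode` are illustrative helpers. A payload written before the option existed has no `halt-scheduling` key, so it decodes to `false` and scheduling stays on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScheduleConfig is a hypothetical one-field stand-in for PD's schedule
// config; the struct tag is copied from the diff under review.
type ScheduleConfig struct {
	HaltScheduling bool `toml:"halt-scheduling" json:"halt-scheduling,string,omitempty"`
}

// decode parses a JSON payload into a ScheduleConfig.
func decode(payload string) (ScheduleConfig, error) {
	var cfg ScheduleConfig
	err := json.Unmarshal([]byte(payload), &cfg)
	return cfg, err
}

// encode serializes a ScheduleConfig to a JSON string.
func encode(cfg ScheduleConfig) (string, error) {
	out, err := json.Marshal(cfg)
	return string(out), err
}

func main() {
	// A config persisted by an older version has no halt-scheduling key.
	old, _ := decode(`{}`)
	fmt.Println(old.HaltScheduling) // false

	// The ",string" tag option stores the bool as a quoted string.
	halted, _ := decode(`{"halt-scheduling":"true"}`)
	fmt.Println(halted.HaltScheduling) // true

	// With omitempty, the default (false) value is left out when encoding.
	off, _ := encode(ScheduleConfig{})
	on, _ := encode(ScheduleConfig{HaltScheduling: true})
	fmt.Println(off) // {}
	fmt.Println(on)  // {"halt-scheduling":"true"}
}
```

An option named `enable-scheduling` would need a default of `true`, which a missing key cannot express with a plain `bool`; that is one way to read the preference for a `false`-by-default name.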

Member

While working on #6553, I found that it may be better to use one config for both unsafe recovery and halt, so that we can decouple the dependencies between cluster and coordinator.
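
One way to picture this suggestion is sketched below. Nothing in it is taken from #6553 or from PD's code: `haltFlag`, `haltChecker`, and `coordinator` are hypothetical names, and the sketch assumes Go 1.19 or later for `atomic.Bool`. The idea shown is that the coordinator reads a single halt check and does not need to know whether the halt came from the user option or from unsafe recovery.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// haltFlag is a hypothetical shared switch with two independent sources.
type haltFlag struct {
	byUser     atomic.Bool // set when the halt-scheduling option is on
	byRecovery atomic.Bool // set while unsafe recovery is running
}

// Halted reports whether either source currently asks for a halt.
func (h *haltFlag) Halted() bool {
	return h.byUser.Load() || h.byRecovery.Load()
}

// haltChecker is the only thing the coordinator depends on.
type haltChecker interface {
	Halted() bool
}

// coordinator is a hypothetical stand-in for the scheduling coordinator.
type coordinator struct {
	checker haltChecker
}

// shouldSchedule returns true only when no halt source is active.
func (c *coordinator) shouldSchedule() bool {
	return !c.checker.Halted()
}

func main() {
	flag := &haltFlag{}
	c := &coordinator{checker: flag}
	fmt.Println(c.shouldSchedule()) // true

	flag.byRecovery.Store(true)
	fmt.Println(c.shouldSchedule()) // false
}
```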

Signed-off-by: JmPotato <ghzpotato@gmail.com>

codecov bot commented May 22, 2023

Codecov Report

Patch coverage: 60.71% and project coverage change: +0.31 🎉

Comparison is base (ccb0bba) 74.66% compared to head (60b1f87) 74.97%.

❗ Current head 60b1f87 differs from pull request most recent head e84b299. Consider uploading reports for the commit e84b299 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6498      +/-   ##
==========================================
+ Coverage   74.66%   74.97%   +0.31%     
==========================================
  Files         414      410       -4     
  Lines       42323    41910     -413     
==========================================
- Hits        31599    31421     -178     
+ Misses       7936     7727     -209     
+ Partials     2788     2762      -26     
Flag Coverage Δ
unittests 74.97% <60.71%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/schedule/config/config.go 33.33% <ø> (ø)
server/cluster/diagnostic_manager.go 75.26% <ø> (ø)
server/config/persist_options.go 90.82% <25.00%> (-0.90%) ⬇️
server/cluster/coordinator.go 72.11% <37.50%> (ø)
server/cluster/cluster.go 81.93% <66.66%> (-0.05%) ⬇️
server/cluster/cluster_worker.go 70.00% <100.00%> (+3.33%) ⬆️
server/cluster/metrics.go 100.00% <100.00%> (ø)
server/config/config.go 75.20% <100.00%> (+0.10%) ⬆️

... and 76 files with indirect coverage changes

Signed-off-by: JmPotato <ghzpotato@gmail.com>
@JmPotato JmPotato requested review from nolouch and rleungx May 22, 2023 08:50
server/cluster/cluster.go (3 resolved review threads)
Member

@CabinfeverB CabinfeverB left a comment

rest LGTM

"dashLength": 10,
"dashes": false,
"datasource": "${DS_TEST-CLUSTER}",
"description": "The allowance status of the scheduling.",
Member

How about putting it near "Scheduler is running"?
But it also makes sense where it is now.

Member Author

I prefer leaving it here since it's more like a cluster-level status rather than the scheduler. Another reason is that if it is placed in the Scheduler panel, it may cause many changes to the Grafana JSON file. If it is only appended here, there will be fewer changes.

Contributor

@nolouch nolouch left a comment

lgtm

@ti-chi-bot ti-chi-bot bot added status/LGT1 Indicates that a PR has LGTM 1. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 24, 2023
Signed-off-by: JmPotato <ghzpotato@gmail.com>
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2023
@JmPotato
Member Author

@rleungx @binshi-bing PTAL, thx.

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 25, 2023
@JmPotato
Member Author

/merge


ti-chi-bot bot commented May 25, 2023

@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.


ti-chi-bot bot commented May 25, 2023

This pull request has been accepted and is ready to merge.

Commit hash: e84b299

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 25, 2023
@ti-chi-bot ti-chi-bot bot merged commit 99e2419 into tikv:master May 25, 2023
@JmPotato JmPotato deleted the halt_scheduling branch May 25, 2023 08:56
@HuSharp HuSharp added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Jun 6, 2023
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #6558.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Jun 6, 2023
ref tikv#6493

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this pull request Jun 6, 2023
…#6558)

ref #6493, ref #6498

Add an option to halt the cluster scheduling.

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: husharp <jinhao.hu@pingcap.com>
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 1, 2023
)

ref tikv#6493

Add an option to halt the cluster scheduling.

Signed-off-by: JmPotato <ghzpotato@gmail.com>
Labels
needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. release-note-none Denotes a PR that doesn't merit a release note. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
7 participants