[DNM] skip evict leader for v7.5.1 #8614

Open
wants to merge 35 commits into
base: master
63eb0cb
placement: add rule/group count metrics (#7232) (#7243)
ti-chi-bot Oct 25, 2023
8b64ecf
rule_checker: fix the issue of not being able to achieve the better R…
ti-chi-bot Oct 25, 2023
7b3611a
*: check whether region is nil (#7263) (#7267)
ti-chi-bot Oct 26, 2023
a54621a
api: fix cannot dump trace (#7255) (#7265)
ti-chi-bot Oct 26, 2023
595d5b0
dashboard: update hotfix version (#7303) (#7307)
ti-chi-bot Nov 2, 2023
710ffcd
replication mode: fix wrong available store list (#7222) (#7328)
ti-chi-bot Nov 8, 2023
da1e92d
core: batch get region size (#7252) (#7332)
ti-chi-bot Nov 8, 2023
a22710c
checker: reduces the probability of deleting normal peers when the st…
lhy1024 Nov 8, 2023
7c65b8d
chore(dashboard): update tidb dashboard verstion to v2023.11.08.1 (#7…
ti-chi-bot Nov 9, 2023
d09a4f5
mcs/resourcemanager: delete expire tokenSlot (#7344) (#7350)
ti-chi-bot Nov 10, 2023
d0a17ca
etcdutil, leadership: avoid redundant created watch channel (#7352) (…
ti-chi-bot Nov 10, 2023
ef6ba85
resourcemanager: return resource-group priority in OnRequestWait (#73…
ti-chi-bot Nov 16, 2023
a5b9d66
go.mod: upgrade gin version from v1.8.1 to v1.9.1 (#7451) (#7514)
ti-chi-bot Dec 11, 2023
3d7f65e
resource_control: improve trace logs, ctl and metrics (#7510) (#7524)
ti-chi-bot Dec 12, 2023
d2074a9
resource_control: fix data race in controller (#7520) (#7526)
ti-chi-bot Dec 13, 2023
c9c9979
errs: remove redundant `FastGenWithCause` in `ZapError` (#7497) (#7545)
ti-chi-bot Dec 22, 2023
8ea0f6f
client: update the leader even if the connection creation fails (#744…
ti-chi-bot Dec 25, 2023
7ce5860
resource_mananger: deep clone resource group (#7623) (#7625)
ti-chi-bot Jan 2, 2024
511b094
resource_control: unify label name to group_name (#7547) (#7656)
ti-chi-bot Jan 3, 2024
a276843
resource_group: don't accumulate tokens when burstlimit less than 0 (…
ti-chi-bot Jan 4, 2024
0794b5e
memory: support cgroup with systemd (#7627) (#7666)
ti-chi-bot Jan 10, 2024
25071dd
scheduler: add aduit log for scheduler config API and add resp msg fo…
ti-chi-bot Jan 16, 2024
1be15d7
check: remove orphan peer only when the peers is greater than the rul…
ti-chi-bot Feb 1, 2024
6978558
client: return total wait duration in resource interceptor OnRequestW…
ti-chi-bot Feb 2, 2024
ae19047
member: avoid frequent campaign times (#7301) (#7790)
ti-chi-bot Feb 2, 2024
85e1a27
*: cherry-pick the etcd client health checker improvements (#7793)
JmPotato Feb 4, 2024
318a3fd
mcs: fix metrics cleanup (#7652) (#7659)
ti-chi-bot Feb 5, 2024
83f290a
*: fix context usage when watch etcd (#7806) (#7811)
ti-chi-bot Feb 7, 2024
decd310
schedule: fix panic when switching placement rules (#7415) (#7425)
ti-chi-bot Feb 7, 2024
ae9db49
api: fix panic when region doesn't have a leader (#7629) (#7650)
ti-chi-bot Feb 9, 2024
b8feb2b
prepare_check: remove redundant check (#7217) (#7818)
ti-chi-bot Feb 10, 2024
3488a65
*: fix region stats check (#7748) (#7812)
ti-chi-bot Feb 10, 2024
7294ff9
chore(dashboard): update TiDB Dashboard to v7.5.1-43fe8dac [release-7…
baurine Feb 20, 2024
d71a1a3
core: fix datarace in MergeLabels (#7537) (#7830)
ti-chi-bot Feb 20, 2024
463297b
scheduler: skip evict-leader-scheduler when setting schedule deny lab…
okJiang Jun 24, 2024
check: remove orphan peer only when the peers is greater than the rule count (#7581) (#7590)

close #7584

The healthy orphan peer should be the last one to be removed, and only when the region has extra peers, so that high availability is preserved.

Signed-off-by: bufferflies <1045931706@qq.com>

Co-authored-by: bufferflies <1045931706@qq.com>
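
The rule described in the commit message can be sketched as a small predicate. This is a minimal illustration, not the code in `rule_checker.go`: `shouldRemoveHealthyOrphan` is a hypothetical helper name, and `extra` is assumed to be the region's peer count minus the total count required by its placement rules.

```go
package main

import "fmt"

// shouldRemoveHealthyOrphan is a hypothetical helper that mirrors the
// condition introduced by this commit: a healthy orphan peer may be
// removed only when another healthy peer exists AND the region holds
// more peers than the placement rules require.
func shouldRemoveHealthyOrphan(hasHealthyPeer bool, extra int) bool {
	return hasHealthyPeer && extra > 0
}

func main() {
	// Region with exactly as many peers as the rules require:
	// removing an orphan would reduce availability, so it is kept.
	fmt.Println(shouldRemoveHealthyOrphan(true, 0))

	// Region with one peer more than required: the orphan can go.
	fmt.Println(shouldRemoveHealthyOrphan(true, 1))
}
```

Under this sketch, the pre-change behavior corresponds to ignoring `extra` entirely, which is what allowed a healthy orphan peer to be removed before a replacement had been added.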
ti-chi-bot and bufferflies authored Feb 1, 2024
commit 1be15d7b439e71f5b945c2a6fb0f86c4596dd4f8
4 changes: 3 additions & 1 deletion pkg/schedule/checker/rule_checker.go
@@ -551,6 +551,7 @@ func (c *RuleChecker) fixOrphanPeers(region *core.RegionInfo, fit *placement.Reg
 		}
 	}
 
+	extra := fit.ExtraCount()
 	// If hasUnhealthyFit is true, try to remove unhealthy orphan peers only if number of OrphanPeers is >= 2.
 	// Ref https://github.com/tikv/pd/issues/4045
 	if len(fit.OrphanPeers) >= 2 {
@@ -567,7 +568,8 @@ func (c *RuleChecker) fixOrphanPeers(region *core.RegionInfo, fit *placement.Reg
 			ruleCheckerRemoveOrphanPeerCounter.Inc()
 			return operator.CreateRemovePeerOperator("remove-unhealthy-orphan-peer", c.cluster, 0, region, orphanPeer.StoreId)
 		}
-		if hasHealthPeer {
+		// The healthy orphan peer can be removed to keep the high availability only if the peer count is greater than the rule requirement.
+		if hasHealthPeer && extra > 0 {
 			// there already exists a healthy orphan peer, so we can remove other orphan Peers.
 			ruleCheckerRemoveOrphanPeerCounter.Inc()
 			// if there exists a disconnected orphan peer, we will pick it to remove firstly.
36 changes: 36 additions & 0 deletions pkg/schedule/checker/rule_checker_test.go
@@ -2025,3 +2025,39 @@ func (suite *ruleCheckerTestAdvancedSuite) TestReplaceAnExistingPeerCases() {
 		suite.ruleManager.DeleteGroupBundle(groupName, false)
 	}
 }
+
+func (suite *ruleCheckerTestSuite) TestRemoveOrphanPeer() {
+	suite.cluster.AddLabelsStore(1, 1, map[string]string{"zone": "z1", "host": "h1"})
+	suite.cluster.AddLabelsStore(2, 1, map[string]string{"zone": "z1", "host": "h1"})
+	suite.cluster.AddLabelsStore(3, 1, map[string]string{"zone": "z1", "host": "h1"})
+	suite.cluster.AddLabelsStore(4, 1, map[string]string{"zone": "z2", "host": "h1"})
+	suite.cluster.AddLabelsStore(5, 1, map[string]string{"zone": "z2", "host": "h2"})
+	suite.cluster.AddLabelsStore(6, 1, map[string]string{"zone": "z2", "host": "h2"})
+	rule := &placement.Rule{
+		GroupID: "pd",
+		ID:      "test2",
+		Role:    placement.Voter,
+		Count:   3,
+		LabelConstraints: []placement.LabelConstraint{
+			{
+				Key:    "zone",
+				Op:     placement.In,
+				Values: []string{"z2"},
+			},
+		},
+	}
+	suite.ruleManager.SetRule(rule)
+	suite.ruleManager.DeleteRule("pd", "default")
+
+	// case1: regionA has 3 peers but not extra peer can be removed, so it needs to add peer first
+	suite.cluster.AddLeaderRegionWithRange(1, "200", "300", 1, 2, 3)
+	op := suite.rc.Check(suite.cluster.GetRegion(1))
+	suite.NotNil(op)
+	suite.Equal("add-rule-peer", op.Desc())
+
+	// case2: regionB has 4 peers and one extra peer can be removed, so it needs to remove extra peer first
+	suite.cluster.AddLeaderRegionWithRange(2, "300", "400", 1, 2, 3, 4)
+	op = suite.rc.Check(suite.cluster.GetRegion(2))
+	suite.NotNil(op)
+	suite.Equal("remove-orphan-peer", op.Desc())
+}
9 changes: 9 additions & 0 deletions pkg/schedule/placement/fit.go
@@ -93,6 +93,15 @@ func (f *RegionFit) IsSatisfied() bool {
 	return len(f.OrphanPeers) == 0
 }
 
+// ExtraCount return the extra count.
+func (f *RegionFit) ExtraCount() int {
+	desired := 0
+	for _, r := range f.RuleFits {
+		desired += r.Rule.Count
+	}
+	return len(f.regionStores) - desired
+}
+
 // GetRuleFit returns the RuleFit that contains the peer.
 func (f *RegionFit) GetRuleFit(peerID uint64) *RuleFit {
 	for _, rf := range f.RuleFits {