[DNM] skip evict leader for v7.5.1 #8614

Open
wants to merge 35 commits into base: master

Changes from 1 commit

Commits (35)
63eb0cb
placement: add rule/group count metrics (#7232) (#7243)
ti-chi-bot Oct 25, 2023
8b64ecf
rule_checker: fix the issue of not being able to achieve the better R…
ti-chi-bot Oct 25, 2023
7b3611a
*: check whether region is nil (#7263) (#7267)
ti-chi-bot Oct 26, 2023
a54621a
api: fix cannot dump trace (#7255) (#7265)
ti-chi-bot Oct 26, 2023
595d5b0
dashboard: update hotfix version (#7303) (#7307)
ti-chi-bot Nov 2, 2023
710ffcd
replication mode: fix wrong available store list (#7222) (#7328)
ti-chi-bot Nov 8, 2023
da1e92d
core: batch get region size (#7252) (#7332)
ti-chi-bot Nov 8, 2023
a22710c
checker: reduces the probability of deleting normal peers when the st…
lhy1024 Nov 8, 2023
7c65b8d
chore(dashboard): update tidb dashboard verstion to v2023.11.08.1 (#7…
ti-chi-bot Nov 9, 2023
d09a4f5
mcs/resourcemanager: delete expire tokenSlot (#7344) (#7350)
ti-chi-bot Nov 10, 2023
d0a17ca
etcdutil, leadership: avoid redundant created watch channel (#7352) (…
ti-chi-bot Nov 10, 2023
ef6ba85
resourcemanager: return resource-group priority in OnRequestWait (#73…
ti-chi-bot Nov 16, 2023
a5b9d66
go.mod: upgrade gin version from v1.8.1 to v1.9.1 (#7451) (#7514)
ti-chi-bot Dec 11, 2023
3d7f65e
resource_control: improve trace logs, ctl and metrics (#7510) (#7524)
ti-chi-bot Dec 12, 2023
d2074a9
resource_control: fix data race in controller (#7520) (#7526)
ti-chi-bot Dec 13, 2023
c9c9979
errs: remove redundant `FastGenWithCause` in `ZapError` (#7497) (#7545)
ti-chi-bot Dec 22, 2023
8ea0f6f
client: update the leader even if the connection creation fails (#744…
ti-chi-bot Dec 25, 2023
7ce5860
resource_mananger: deep clone resource group (#7623) (#7625)
ti-chi-bot Jan 2, 2024
511b094
resource_control: unify label name to group_name (#7547) (#7656)
ti-chi-bot Jan 3, 2024
a276843
resource_group: don't accumulate tokens when burstlimit less than 0 (…
ti-chi-bot Jan 4, 2024
0794b5e
memory: support cgroup with systemd (#7627) (#7666)
ti-chi-bot Jan 10, 2024
25071dd
scheduler: add aduit log for scheduler config API and add resp msg fo…
ti-chi-bot Jan 16, 2024
1be15d7
check: remove orphan peer only when the peers is greater than the rul…
ti-chi-bot Feb 1, 2024
6978558
client: return total wait duration in resource interceptor OnRequestW…
ti-chi-bot Feb 2, 2024
ae19047
member: avoid frequent campaign times (#7301) (#7790)
ti-chi-bot Feb 2, 2024
85e1a27
*: cherry-pick the etcd client health checker improvements (#7793)
JmPotato Feb 4, 2024
318a3fd
mcs: fix metrics cleanup (#7652) (#7659)
ti-chi-bot Feb 5, 2024
83f290a
*: fix context usage when watch etcd (#7806) (#7811)
ti-chi-bot Feb 7, 2024
decd310
schedule: fix panic when switching placement rules (#7415) (#7425)
ti-chi-bot Feb 7, 2024
ae9db49
api: fix panic when region doesn't have a leader (#7629) (#7650)
ti-chi-bot Feb 9, 2024
b8feb2b
prepare_check: remove redundant check (#7217) (#7818)
ti-chi-bot Feb 10, 2024
3488a65
*: fix region stats check (#7748) (#7812)
ti-chi-bot Feb 10, 2024
7294ff9
chore(dashboard): update TiDB Dashboard to v7.5.1-43fe8dac [release-7…
baurine Feb 20, 2024
d71a1a3
core: fix datarace in MergeLabels (#7537) (#7830)
ti-chi-bot Feb 20, 2024
463297b
scheduler: skip evict-leader-scheduler when setting schedule deny lab…
okJiang Jun 24, 2024
placement: add rule/group count metrics (#7232) (#7243)
close #7242

placement: add rule/group count metrics

Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: nolouch <nolouch@gmail.com>
ti-chi-bot and nolouch authored Oct 25, 2023
commit 63eb0cb4572b9dce804f61703d9a70d479a14c63
105 changes: 105 additions & 0 deletions metrics/grafana/pd.json
@@ -1139,6 +1139,111 @@
"timeFrom": null,
"timeShift": null
},
{
"bars": false,
"cacheTimeout": null,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_TEST-CLUSTER}",
"description": "The current number of placement rules and rule groups",
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 6,
"w": 4,
"x": 16,
"y": 13
},
"hiddenSeries": false,
"id": 22,
"interval": null,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"maxDataPoints": 100,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.10",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"exemplar": true,
"expr": "sum(pd_rule_manager_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\"}) by (type)",
"format": "time_series",
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{type}}",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Placement Rules Status",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:192",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:193",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"collapsed": true,
"gridPos": {
14 changes: 14 additions & 0 deletions pkg/schedule/placement/rule_manager.go
@@ -319,6 +319,20 @@ func (m *RuleManager) GetAllRules() []*Rule {
return rules
}

// GetRulesCount returns the number of rules.
func (m *RuleManager) GetRulesCount() int {
m.RLock()
defer m.RUnlock()
return len(m.ruleConfig.rules)
}

// GetGroupsCount returns the number of rule groups.
func (m *RuleManager) GetGroupsCount() int {
m.RLock()
defer m.RUnlock()
return len(m.ruleConfig.groups)
}

// GetRulesByGroup returns sorted rules of a group.
func (m *RuleManager) GetRulesByGroup(group string) []*Rule {
m.RLock()
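The two getters added in this hunk take the read lock and return a `len`. A minimal, stdlib-only sketch of that pattern follows. It assumes a hypothetical `ruleStore` type whose two maps stand in for `ruleConfig.rules` and `ruleConfig.groups`; the real field types are not shown in this diff, and the sample IDs simply reuse the ones from `TestSaveLoad`.

```go
package main

import (
	"fmt"
	"sync"
)

// ruleStore is a hypothetical stand-in for RuleManager. The map types are
// assumptions made for this sketch, not PD's actual types.
type ruleStore struct {
	sync.RWMutex
	rules  map[[2]string]struct{} // keyed by (group ID, rule ID)
	groups map[string]struct{}    // keyed by group ID
}

func newRuleStore() *ruleStore {
	return &ruleStore{
		rules:  map[[2]string]struct{}{},
		groups: map[string]struct{}{},
	}
}

// setRule records a rule and its group under the write lock.
func (s *ruleStore) setRule(group, id string) {
	s.Lock()
	defer s.Unlock()
	s.rules[[2]string{group, id}] = struct{}{}
	s.groups[group] = struct{}{}
}

// rulesCount returns the number of rules under the read lock.
func (s *ruleStore) rulesCount() int {
	s.RLock()
	defer s.RUnlock()
	return len(s.rules)
}

// groupsCount returns the number of groups under the read lock.
func (s *ruleStore) groupsCount() int {
	s.RLock()
	defer s.RUnlock()
	return len(s.groups)
}

func main() {
	s := newRuleStore()
	s.setRule("pd", "default")
	s.setRule("foo", "baz")
	s.setRule("foo", "bar")
	fmt.Println(s.rulesCount(), s.groupsCount()) // prints "3 2"
}
```

Because readers only take `RLock`, concurrent metric collection does not block other readers; it only waits while a writer holds the lock.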
2 changes: 2 additions & 0 deletions pkg/schedule/placement/rule_manager_test.go
@@ -163,6 +163,8 @@ func TestSaveLoad(t *testing.T) {
re.Equal(rules[0].String(), m2.GetRule("pd", "default").String())
re.Equal(rules[1].String(), m2.GetRule("foo", "baz").String())
re.Equal(rules[2].String(), m2.GetRule("foo", "bar").String())
re.Equal(3, manager.GetRulesCount())
re.Equal(2, manager.GetGroupsCount())
}

func TestSetAfterGet(t *testing.T) {
9 changes: 9 additions & 0 deletions pkg/schedule/schedulers/metrics.go
@@ -134,10 +134,19 @@ var (
Name: "hot_pending_sum",
Help: "Pending influence sum of store in hot region scheduler.",
}, []string{"store", "rw", "dim"})

ruleStatusGauge = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: "pd",
Subsystem: "rule_manager",
Name: "status",
Help: "Status of the rule.",
}, []string{"type"})
)

func init() {
prometheus.MustRegister(schedulerStatusGauge)
prometheus.MustRegister(ruleStatusGauge)
prometheus.MustRegister(schedulerCounter)
prometheus.MustRegister(balanceWitnessCounter)
prometheus.MustRegister(hotSchedulerResultCounter)
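The Grafana panel in this commit queries `pd_rule_manager_status`, while the Go code only declares `Namespace`, `Subsystem` and `Name`. The Prometheus Go client joins those three parts with underscores to form the exported metric name. The sketch below is a simplified, stdlib-only re-implementation of that joining rule for illustration; `fqName` is a hypothetical helper, not the client library's function.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with "_" to form a metric name.
// An empty name yields an empty result. This is an illustrative helper,
// not part of the Prometheus client.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// The values below are the ones declared for ruleStatusGauge.
	fmt.Println(fqName("pd", "rule_manager", "status")) // prints "pd_rule_manager_status"
}
```

The `type` label declared on the gauge is what the panel's `by (type)` clause groups on.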
20 changes: 18 additions & 2 deletions pkg/schedule/schedulers/scheduler_controller.go
@@ -36,7 +36,11 @@ import (

const maxScheduleRetries = 10

- var denySchedulersByLabelerCounter = labeler.LabelerEventCounter.WithLabelValues("schedulers", "deny")
+ var (
+ denySchedulersByLabelerCounter = labeler.LabelerEventCounter.WithLabelValues("schedulers", "deny")
+ rulesCntStatusGauge = ruleStatusGauge.WithLabelValues("rule_count")
+ groupsCntStatusGauge = ruleStatusGauge.WithLabelValues("group_count")
+ )

// Controller is used to manage all schedulers.
type Controller struct {
@@ -108,7 +112,6 @@ func (c *Controller) GetSchedulerHandlers() map[string]http.Handler {
// CollectSchedulerMetrics collects metrics of all schedulers.
func (c *Controller) CollectSchedulerMetrics() {
c.RLock()
- defer c.RUnlock()
for _, s := range c.schedulers {
var allowScheduler float64
// If the scheduler is not allowed to schedule, it will disappear in Grafana panel.
@@ -118,6 +121,15 @@ func (c *Controller) CollectSchedulerMetrics() {
}
schedulerStatusGauge.WithLabelValues(s.Scheduler.GetName(), "allow").Set(allowScheduler)
}
+ c.RUnlock()
+ ruleMgr := c.cluster.GetRuleManager()
+ if ruleMgr == nil {
+ return
+ }
+ ruleCnt := ruleMgr.GetRulesCount()
+ groupCnt := ruleMgr.GetGroupsCount()
+ rulesCntStatusGauge.Set(float64(ruleCnt))
+ groupsCntStatusGauge.Set(float64(groupCnt))
}

func (c *Controller) isSchedulingHalted() bool {
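In `CollectSchedulerMetrics`, the deferred `RUnlock` is replaced with an explicit one placed before the rule manager is consulted. The commit does not state the reason; one plausible reading is that the controller lock then covers only the scheduler loop and is not held while another component takes its own lock. A minimal sketch of that shape, using hypothetical `controller` and `ruleCounter` types rather than PD's:

```go
package main

import (
	"fmt"
	"sync"
)

// ruleCounter is a hypothetical stand-in for the rule manager; it guards
// its state with its own lock.
type ruleCounter struct {
	mu    sync.RWMutex
	rules int
}

func (r *ruleCounter) count() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.rules
}

// controller is a hypothetical stand-in for the scheduler controller.
type controller struct {
	sync.RWMutex
	schedulers map[string]bool // scheduler name -> allowed to schedule
	rules      *ruleCounter    // may be nil
}

// collect counts allowed schedulers under the controller lock, releases
// that lock, and only then reads the rule count. ok is false when no
// rule counter is configured.
func (c *controller) collect() (allowed, ruleCnt int, ok bool) {
	c.RLock()
	for _, allow := range c.schedulers {
		if allow {
			allowed++
		}
	}
	c.RUnlock() // released before calling into the other component
	if c.rules == nil {
		return allowed, 0, false
	}
	return allowed, c.rules.count(), true
}

func main() {
	c := &controller{
		schedulers: map[string]bool{"balance-leader": true, "evict-leader": false},
		rules:      &ruleCounter{rules: 3},
	}
	allowed, ruleCnt, ok := c.collect()
	fmt.Println(allowed, ruleCnt, ok) // prints "1 3 true"
}
```

The trade-off of dropping `defer` is that every early return added inside the locked region must unlock by hand, so the locked region is best kept short, as it is here.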
@@ -127,6 +139,10 @@ func (c *Controller) isSchedulingHalted() bool {
// ResetSchedulerMetrics resets metrics of all schedulers.
func (c *Controller) ResetSchedulerMetrics() {
schedulerStatusGauge.Reset()
+ ruleStatusGauge.Reset()
+ // Reset deletes all child gauges from the vector, so re-create the cached ones.
+ rulesCntStatusGauge = ruleStatusGauge.WithLabelValues("rule_count")
+ groupsCntStatusGauge = ruleStatusGauge.WithLabelValues("group_count")
}

// AddSchedulerHandler adds the HTTP handler for a scheduler.
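`ResetSchedulerMetrics` re-fetches the two cached gauges after calling `Reset`. In the Prometheus Go client, `Reset` deletes every child metric from the vector, so a handle obtained earlier no longer belongs to it. The toy model below shows why the re-fetch matters. `gaugeVec` and `gauge` here are hypothetical types written for this sketch with a single label; they only imitate the relevant behaviour and are not the client library.

```go
package main

import "fmt"

// gauge is a toy single-value metric.
type gauge struct{ v float64 }

func (g *gauge) Set(v float64) { g.v = v }

// gaugeVec is a toy vector keyed by one label value.
type gaugeVec struct{ children map[string]*gauge }

func newGaugeVec() *gaugeVec {
	return &gaugeVec{children: map[string]*gauge{}}
}

// WithLabelValues returns the child for label, creating it if missing.
func (gv *gaugeVec) WithLabelValues(label string) *gauge {
	g, ok := gv.children[label]
	if !ok {
		g = &gauge{}
		gv.children[label] = g
	}
	return g
}

// Reset drops every child; handles obtained earlier become detached.
func (gv *gaugeVec) Reset() { gv.children = map[string]*gauge{} }

// exported returns what a scrape of the vector would see.
func (gv *gaugeVec) exported() map[string]float64 {
	out := make(map[string]float64, len(gv.children))
	for label, g := range gv.children {
		out[label] = g.v
	}
	return out
}

func main() {
	vec := newGaugeVec()
	cached := vec.WithLabelValues("rule_count")

	vec.Reset()
	cached.Set(3)                    // writes to a detached child
	fmt.Println(len(vec.exported())) // prints "0"

	cached = vec.WithLabelValues("rule_count") // re-create after Reset
	cached.Set(3)
	fmt.Println(vec.exported()["rule_count"]) // prints "3"
}
```

An alternative is to call `WithLabelValues` on every update and not cache handles at all, at the cost of a label lookup per call.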