Hey all,
We are using Tempo and trying to make use of the Tempo mixin Grafana dashboards: https://github.com/grafana/tempo/tree/main/operations/tempo-mixin
However, I've noticed that some of the metrics the dashboards expect, which come from dskit/kv/memberlist/metrics, don't seem to be exposed correctly.
tempo_memberlist_client_kv_store_count
This chart https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/dashboards/tempo-operational.json#L4585
is looking for the metric tempo_memberlist_client_kv_store_count. I couldn't find it exposed on any of the /metrics endpoints I looked at.
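For anyone who wants to repeat the check, here is a minimal sketch of how a saved /metrics response can be scanned for a metric name. The helper name hasMetric and the sample text are my own assumptions for illustration, not real Tempo output. It only handles the plain Prometheus text exposition format.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMetric reports whether the exposition text contains at least one
// sample line whose metric name is exactly `name`.
// Comment lines (# HELP / # TYPE) are skipped on purpose, so a metric
// that is only described but never emitted counts as missing.
func hasMetric(exposition, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// A sample line looks like `name{labels} value` or `name value`.
		end := strings.IndexAny(line, "{ \t")
		if end < 0 {
			end = len(line)
		}
		if line[:end] == name {
			return true
		}
	}
	return false
}

// sample is hypothetical /metrics text, made up for this example.
const sample = `# HELP tempo_memberlist_client_cluster_node_health_score Health score of this cluster. Lower value is better. 0 = healthy
# TYPE tempo_memberlist_client_cluster_node_health_score gauge
tempo_memberlist_client_cluster_node_health_score 0
`

func main() {
	for _, name := range []string{
		"tempo_memberlist_client_kv_store_count",
		"gauge_memberlist_health_score",
		"tempo_memberlist_client_cluster_node_health_score",
	} {
		fmt.Printf("%s present: %v\n", name, hasMetric(sample, name))
	}
}
```

In practice the input would be the body of each component's /metrics response rather than the hard-coded sample.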
I went looking for where this metric is configured and found this: https://github.com/grafana/dskit/blob/main/kv/memberlist/metrics.go#L123
m.storeValuesDesc = prometheus.NewDesc(
	prometheus.BuildFQName(m.cfg.MetricsNamespace, subsystem, "kv_store_count"), // gauge
	"Number of values in KV Store",
	nil, nil)
I'm not super familiar with Prometheus, but the rest of the metric definitions in that file have NewGaugeVec or NewCounterVec. From what I can tell NewDesc is used for metric metadata?
But the metric name "kv_store_count" sounds like it should be a numerical value?
gauge_memberlist_health_score
This chart https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/dashboards/tempo-operational.json#LL4319C1-L4319C1 is looking for a metric gauge_memberlist_health_score. Again, I couldn't find it exposed in the /metrics endpoint.
I went looking for where this metric is configured and I couldn't find it!
But I did find this: https://github.com/grafana/dskit/blob/main/kv/memberlist/metrics.go#L154
m.memberlistHealthScore = promauto.With(m.registerer).NewGaugeFunc(prometheus.GaugeOpts{
	Namespace: m.cfg.MetricsNamespace,
	Subsystem: subsystem,
	Name:      "cluster_node_health_score",
	Help:      "Health score of this cluster. Lower value is better. 0 = healthy",
}, func() float64 {
	// m.memberlist is not set before Starting state
	if m.State() == services.Running || m.State() == services.Stopping {
		return float64(m.memberlist.GetHealthScore())
	}
	return 0
})
The Go struct member is memberlistHealthScore, which is similar to the metric I'm looking for, but the actual metric name is cluster_node_health_score.
Could this be a metric that has been changed in code, but the metric name hasn't been updated yet?
Also, I think the Tempo mixin might be wrong... gauge_memberlist_health_score seems like it should be tempo_memberlist_client_cluster_node_health_score or something?
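To show where that guess comes from, here is a small sketch of how the full metric name would be assembled from its parts. The namespace "tempo" and subsystem "memberlist_client" are assumptions I inferred from the name tempo_memberlist_client_kv_store_count; I have not confirmed them in the code. buildFQName is a local stand-in that mimics the underscore-joining done by prometheus.BuildFQName, not the library function itself.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQName joins the non-empty parts with "_", mimicking the behaviour
// of prometheus.BuildFQName. An empty name yields an empty result.
func buildFQName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Assumed values, inferred from tempo_memberlist_client_kv_store_count.
	const namespace = "tempo"
	const subsystem = "memberlist_client"

	fmt.Println(buildFQName(namespace, subsystem, "kv_store_count"))
	fmt.Println(buildFQName(namespace, subsystem, "cluster_node_health_score"))
}
```

Under those assumptions, the second name is the one I would expect the dashboard query to use instead of gauge_memberlist_health_score.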