
Potential memberlist metrics issues #305

@jcmackie


Hey all,

We're using Tempo and trying to use the Grafana dashboards from the Tempo mixin: https://github.com/grafana/tempo/tree/main/operations/tempo-mixin

However, I've noticed that some of the metrics the dashboards expect from dskit/kv/memberlist/metrics.go don't seem to be exported correctly.

  1. tempo_memberlist_client_kv_store_count
    This chart https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/dashboards/tempo-operational.json#L4585 queries the metric tempo_memberlist_client_kv_store_count, but I couldn't find it on any of the /metrics endpoints I checked.

I looked for where this metric is defined and found this: https://github.com/grafana/dskit/blob/main/kv/memberlist/metrics.go#L123

```go
m.storeValuesDesc = prometheus.NewDesc(
	prometheus.BuildFQName(m.cfg.MetricsNamespace, subsystem, "kv_store_count"), // gauge
	"Number of values in KV Store",
	nil, nil)
```

I'm not very familiar with Prometheus, but the other metric definitions in that file use NewGaugeVec or NewCounterVec. From what I can tell, NewDesc only describes a metric's metadata? Yet the name "kv_store_count" suggests it should carry a numeric value.
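For reference, as I understand client_golang, `prometheus.NewDesc` only builds a descriptor (fully-qualified name, help text, label names). A descriptor like this is usually paired with a custom `Collector` that sends actual samples from its `Collect` method, for example via `prometheus.MustNewConstMetric`, so the series appears on `/metrics` only if that collector is registered and emits a value. The name itself comes from `prometheus.BuildFQName`, which joins the non-empty namespace, subsystem and name with underscores. Below is a stdlib-only sketch of that joining rule; `buildFQName` is a hypothetical stand-in rather than the library function, and it assumes Tempo's `MetricsNamespace` is `tempo` and dskit's subsystem constant is `memberlist_client`, both inferred from the name the dashboard queries.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQName is a hypothetical stand-in mirroring the behaviour of
// prometheus.BuildFQName: an empty name yields "", otherwise the non-empty
// parts (namespace, subsystem, name) are joined with "_".
func buildFQName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Assumed values, inferred from the dashboard's metric name and not
	// taken from the Tempo or dskit source.
	const namespace = "tempo"
	const subsystem = "memberlist_client"

	fmt.Println(buildFQName(namespace, subsystem, "kv_store_count"))
}
```

Under those assumptions it prints `tempo_memberlist_client_kv_store_count`, the same name the dashboard queries.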

  2. gauge_memberlist_health_score
    This chart https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/dashboards/tempo-operational.json#LL4319C1-L4319C1 queries the metric gauge_memberlist_health_score. Again, I couldn't find it on the /metrics endpoint.

I looked for where this metric is defined and couldn't find it anywhere!

But I did find this: https://github.com/grafana/dskit/blob/main/kv/memberlist/metrics.go#L154

```go
m.memberlistHealthScore = promauto.With(m.registerer).NewGaugeFunc(prometheus.GaugeOpts{
	Namespace: m.cfg.MetricsNamespace,
	Subsystem: subsystem,
	Name:      "cluster_node_health_score",
	Help:      "Health score of this cluster. Lower value is better. 0 = healthy",
}, func() float64 {
	// m.memberlist is not set before Starting state
	if m.State() == services.Running || m.State() == services.Stopping {
		return float64(m.memberlist.GetHealthScore())
	}
	return 0
})
```

The Go struct field is named memberlistHealthScore, which resembles the metric the dashboard wants, but the exported metric name is cluster_node_health_score.

Could this metric have been changed in code without its name being updated to match?

I also suspect the Tempo mixin might be wrong: gauge_memberlist_health_score should perhaps be tempo_memberlist_client_cluster_node_health_score or something similar?
