cestorage is a cardinality estimator that receives Prometheus remote write streams
and exposes approximate time series cardinality as metrics (TODO: support remote write).
It is useful for tracking how many unique time series are flowing through across all metrics, metric name, or broken down by specific labels.
The goal of this project is to evaluate whether cardinality estimation can provide practical value within VictoriaMetrics and justify its inclusion in the core system (see proposal).
Running:
go run ./app/cestorage/... -config=streams.yaml -httpListenAddr=:8490
Configuration:
streams:
# Track total cardinality with no grouping.
- interval: '1h'
# Track cardinality grouped by metric name.
- interval: '1h'
group_by: ["__name__"]
# Track cardinality grouped by job label.
- interval: '1m'
group_by: ["job"]
# Track cardinality grouped by tenant info
- group_by: ["vm_account_id", "vm_project_id"]
# Track cardinality of jobs, with extra labels on the output metrics.
- group_by: ["job"]
labels:
region: 'eu-central-1'
env: 'production'Fields:
group_by(optional): list of label names to split cardinality by; each distinct combination gets its own estimategroup_limit(optional): maximum number of distinct groups to track; excess groups are counted in a rejected sketch but not individually; defaults to10000buckets(optional): number of internal shards for parallel ingestion; defaults tomin(20, availableCPUs)labels(optional): extra labels attached to all output metrics for this estimatorinterval(optional): how often to rotate (reset) counters; defaults to5mhll_precision(optional): HyperLogLog precision, must be in range[4, 18]; higher values yield more accurate estimates at the cost of more memory; defaults to14hll_sparse(optional): whether to use sparse HyperLogLog representation, which reduces memory for low-cardinality groups; defaults totrue
Cardinality generator:
go run ./app/cegen/main.go -cardI=100 -cardY=20 -template="foo{instance=\"127.0.0.[cardI]\",job=\"ametric[cardY]\"}"
By default, cardinality estimates are merged with regular metrics and exposed at /metrics.
This behavior is controlled by the -cardinalityMetrics.exposeAt flag:
-cardinalityMetrics.exposeAt=/metrics(default): cardinality metrics merged with regular metrics at/metrics-cardinalityMetrics.exposeAt=/cardinality/metrics: only cardinality metrics exposed at that path-cardinalityMetrics.exposeAt=: cardinality metrics not exposed via HTTP
All metrics include interval, group_by_keys, and group_by_values labels. Extra labels from the labels config field are inserted between interval and group_by_keys (sorted alphabetically).
Without grouping (group_by_keys is __global__ and group_by_values is not set):
cardinality_estimate{interval="1h0m0s",group_by_keys="__global__"} 142300
With grouping — one summary line (total distinct group count) plus one line per distinct label value combination. Each per-group line also includes individual by_{key}="{val}" labels for each group key:
cardinality_estimate{interval="5m0s",group_by_keys="__group__",group_by_values="instance,job"} 2
cardinality_estimate{interval="5m0s",group_by_keys="instance,job",group_by_values="host1:9090,prometheus",by_instance="host1:9090",by_job="prometheus"} 312
cardinality_estimate{interval="5m0s",group_by_keys="instance,job",group_by_values="host2:9100,node",by_instance="host2:9100",by_job="node"} 87
With extra labels:
cardinality_estimate{interval="5m0s",env="production",region="eu-central-1",group_by_keys="job",group_by_values="prometheus",by_job="prometheus"} 312
When grouping is enabled, cestorage exposes per-bucket operational metrics at /metrics:
cestorage_group_estimator_size{groupBy, bucket}— number of active groups in this bucket after the last rotationcestorage_group_estimator_rejected_size{groupBy, bucket}— estimated number of distinct group values rejected since the last rotation becausegroup_limitwas reachedcestorage_group_limit{groupBy, bucket}— configuredgroup_limitfor this bucket
There are Grafana dashboards available in dashboards directory:
$ go test ./... -run=none -bench=.
? github.com/makasim/cestimator/app/cegen [no test files]
goos: darwin
goarch: arm64
pkg: github.com/makasim/cestimator/app/cestorage
cpu: Apple M1 Pro
BenchmarkEstimator_WriteMetrics/NoGroup/NoPrev-10 937376 1265 ns/op 1504 B/op 12 allocs/op
BenchmarkEstimator_WriteMetrics/NoGroup/WithPrev-10 625159 1843 ns/op 1504 B/op 12 allocs/op
BenchmarkEstimator_WriteMetrics/Group100/NoPrev-10 56973 21076 ns/op 3745 B/op 81 allocs/op
BenchmarkEstimator_WriteMetrics/Group100/WithPrev-10 43438 27834 ns/op 3745 B/op 81 allocs/op
BenchmarkEstimator_WriteMetrics/Group10k/NoPrev-10 807 1530942 ns/op 3106 B/op 71 allocs/op
BenchmarkEstimator_WriteMetrics/Group10k/WithPrev-10 580 2060489 ns/op 3107 B/op 71 allocs/op
BenchmarkEstimator_InsertManyParallel/NoGroup-10 15398458 78.11 ns/op 0 B/op 0 allocs/op
BenchmarkEstimator_InsertManyParallel/Group100-10 14786208 82.26 ns/op 15 B/op 1 allocs/op
BenchmarkEstimator_InsertManyParallel/Group10k-10 13931193 84.10 ns/op 24 B/op 2 allocs/op
BenchmarkEstimator_InsertManyParallel/Group100k-10 7087110 174.6 ns/op 24 B/op 3 allocs/op
BenchmarkParse_EstimatorGlobal-10 2656 476446 ns/op 18224 B/op 26 allocs/op
BenchmarkParse_EstimatorGroup-10 4430 259190 ns/op 129 B/op 6 allocs/op
PASS
ok github.com/makasim/cestimator/app/cestorage 17.104s
goos: darwin
goarch: arm64
pkg: github.com/makasim/cestimator/app/cestorage/protoparser
cpu: Apple M1 Pro
BenchmarkStreamParse-10 96 12052191 ns/op 162.92 MB/s 225972 B/op 6 allocs/op
PASS
ok github.com/makasim/cestimator/app/cestorage/protoparser 1.482s