[WIP] hotspot telemetry performance demonstration #141518

angles-n-daemons · 2025-02-14T18:37:14Z

No description provided.

To track the movement of samples within a replica, we add a `SampleMovement` function, which reports the proportion of samples moving to the left or right over time. This is only tracked during replica traffic sampling, which requires a high volume of traffic to funnel through a single range. Knowing the "absolute sample movement", that is, when all of the movement is to one side or the other, will help us identify hotspots caused by monotonically increasing keys.

Fixes: cockroachdb#138575
Epic: CRDB-43150
Release note: None

blathers-crl · 2025-02-14T18:37:18Z

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

angles-n-daemons · 2025-02-14T18:37:21Z

angles-n-daemons · 2025-02-14T18:38:48Z

Because of competition and variability in write speed, the writes show some variance in direction (50% - 100% rightward). I believe this is caused by contention in reaching KV within the system, as well as by the fact that multiple workload processes are running simultaneously.

angles-n-daemons · 2025-02-14T18:45:36Z

Popular keys with zipfian:

github-actions · 2025-02-18T16:51:16Z

🟡 Sysbench [SQL, 3node, oltp_read_write]

Metric	Old Commit	New Commit	Delta	Note	Threshold
🟡 sec/op	10.93m ±1%	11.00m ±1%	+0.56%	p=0.043 n=10	3.0%
⚪ errs/op	0.000 ±0%	0.000 ±0%	~	p=1.000 n=10	0.0%
⚪ allocs/op	10.27k ±0%	10.27k ±0%	~	p=0.986 n=10	2.0%
⚪ B/op	2.216Mi ±0%	2.216Mi ±0%	~	p=0.971 n=10	2.0%

Reproduce

benchdiff binaries:

mkdir -p benchdiff/66c8c87/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/bin/pkg_sql_tests benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/562d26b/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/bin/pkg_sql_tests benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=562d26b --new=66c8c87 ./pkg/sql/tests

⚪ Sysbench [KV, 1node, local, oltp_read_only]

Metric	Old Commit	New Commit	Delta	Note	Threshold
⚪ sec/op	670.7µ ±2%	670.0µ ±1%	~	p=0.739 n=10	2.0%
⚪ errs/op	0.000 ±0%	0.000 ±0%	~	p=1.000 n=10	0.0%
⚪ allocs/op	439.0 ±0%	439.0 ±0%	~	p=1.000 n=10	1.5%
⚪ B/op	254.2Ki ±0%	254.2Ki ±0%	~	p=0.853 n=10	1.5%

Reproduce

benchdiff binaries:

mkdir -p benchdiff/66c8c87/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/bin/pkg_sql_tests benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/562d26b/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/bin/pkg_sql_tests benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/1node_local/oltp_read_only$ --old=562d26b --new=66c8c87 ./pkg/sql/tests

⚪ Sysbench [KV, 1node, local, oltp_write_only]

Metric	Old Commit	New Commit	Delta	Note	Threshold
⚪ sec/op	1.330m ±1%	1.335m ±0%	~	p=0.143 n=10	2.5%
⚪ errs/op	0.000 ±0%	0.000 ±0%	~	p=1.000 n=10	0.0%
⚪ allocs/op	1.393k ±0%	1.393k ±0%	~	p=0.559 n=10	1.8%
⚪ B/op	290.1Ki ±0%	290.1Ki ±0%	~	p=0.631 n=10	1.8%

Reproduce

benchdiff binaries:

mkdir -p benchdiff/66c8c87/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/bin/pkg_sql_tests benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/562d26b/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/bin/pkg_sql_tests benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/1node_local/oltp_write_only$ --old=562d26b --new=66c8c87 ./pkg/sql/tests

Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/13420580902-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/13420580902-1/\* old/

Legend

⚪ Neutral: No significant performance change.
🟡 Warning: Slight degradation, likely due to variance, but still within thresholds.
🔴 Regression: Likely performance regression, requiring investigation.
🟢 Improvement: Possible performance gain.

No regressions detected!

built with commit: 66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4

kvoli

Flushing some comments.

Reviewed 9 of 9 files at r1, 16 of 23 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kyle-a-wong)

pkg/server/serverpb/status.proto line 1287 at r2 (raw file):

  // flows.
  bytes flow_id = 1 [
    (gogoproto.nullable) = false,

nit: understand this is a prototype but for the actual PRs drop the refmt.

pkg/kv/kvserver/split/decider.go line 283 at r1 (raw file):

						log.KvDistribution.Infof(ctx, "%s, movement of samples is %.2f",
							causeMsg, movement)
						if math.Abs(movement) == 1 {

Consider updating the metric to be >99% instead of an absolute requirement. That way, even if a few write/read accesses pop in from the user (or system) that don't match the access direction, the metric still gets bumped and generates a signal.
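A minimal sketch of the suggested check, not the PR's code: the helper name and the `nearAbsoluteThreshold` constant are assumptions made for illustration, and 0.99 is simply the ">99%" figure mentioned above.

```go
package main

import "math"

// nearAbsoluteThreshold is a hypothetical cutoff taken from the ">99%"
// suggestion in this review; the right value would need tuning.
const nearAbsoluteThreshold = 0.99

// isNearAbsoluteMovement reports whether movement, a value in [-1, 1], is
// almost entirely in one direction. Unlike math.Abs(movement) == 1, it
// tolerates a small number of accesses that go against the dominant direction.
func isNearAbsoluteMovement(movement float64) bool {
	return math.Abs(movement) > nearAbsoluteThreshold
}
```

With this shape, a movement of -0.995 would still bump the metric, while 0.5 would not.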

pkg/kv/kvserver/split/decider_test.go line 479 at r1 (raw file):

	assert.Equal(t, dAbsoluteMovement.loadSplitterMetrics.PopularKeyCount.Count(), int64(0))
	assert.Equal(t, dAbsoluteMovement.loadSplitterMetrics.AbsoluteKeyMovementCount.Count(), int64(0))

Should this be getting bumped for this test case? It seems like it may require more .Record() calls, otherwise it'd hit insufficient counters like the test case above.

pkg/kv/kvserver/store_rebalancer.go line 267 at r2 (raw file):

		}
		desc := r.Desc()
		buf.Printf("\t%d: r%d:%v replicas=[%v] load=%v splitstats=%v",

👍

pkg/server/status.go line 3048 at r2 (raw file):

			// Patch values used for load balancing with shorter duration ones.
			loadStats := replica.LoadStatsShort()
			storeResp.HotRanges[i].QueriesPerSecond = loadStats.QueriesPerSecond

Is the plan to change to use the short load stats response? The top 128 hot ranges per-store are kept based on the longer stats, so this could lead to some confusion.

pkg/kv/kvserver/replica.go line 2542 at r2 (raw file):

	loadStats := r.LoadStats()
	localityInfo := r.loadStats.RequestLocalityInfo()
	mvccStats := r.GetMVCCStats()

This should be included later in generating a hot-ranges response, probably after topk ranking, otherwise it will be generated on each allocator call, without being used.

pkg/ui/workspaces/db-console/src/views/hotRanges/hotRangesTable.tsx line 398 at r2 (raw file):

    return "";
  }
  const direction = access_direction < 0 ? "descending" : "ascending";

When access_direction is 0, or say less than 0.3 (arbitrary), could this instead map to N/A?

Since the split decider won't be engaged most of the time, and will only engage temporarily when it does, the default state for ranges would be having no data, i.e., 0?
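The mapping being asked for could look roughly like the following. It is written in Go to match the rest of the thread rather than in the TSX of the component under review, and both the function name and the 0.3 cutoff (called arbitrary above) are assumptions.

```go
package main

import "math"

// minDirectionSignal is the arbitrary cutoff floated in this review; values
// closer to zero than this are treated as "no meaningful direction".
const minDirectionSignal = 0.3

// accessDirectionLabel maps an access direction in [-1, 1] to a display label.
// Zero, the expected default when the split decider is not engaged, maps to
// "N/A" rather than "ascending".
func accessDirectionLabel(accessDirection float64) string {
	if math.Abs(accessDirection) < minDirectionSignal {
		return "N/A"
	}
	if accessDirection < 0 {
		return "descending"
	}
	return "ascending"
}
```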

angles-n-daemons · 2025-02-18T18:58:01Z

pkg/kv/kvserver/load/replica_load.go

@@ -199,6 +211,23 @@ func (rl *ReplicaLoad) Stats() ReplicaLoadStats {
 	}
 }

+// StatsShort returns a current stat summary of replica load using only one bucket.
+func (rl *ReplicaLoad) StatsShort() ReplicaLoadStats {


This function returns the replica statistics using only one bucket of the ReplicaStats object.

angles-n-daemons · 2025-02-18T18:58:58Z

pkg/kv/kvserver/metrics.go

@@ -2432,6 +2432,12 @@ Note that the measurement does not include the duration for replicating the eval
 		Measurement: "Nanoseconds",
 		Unit:        metric.Unit_NANOSECONDS,
 	}
+	metaAbsoluteKeyMovementCount = metric.Metadata{


This metric tracks when key accesses seem to be moving in an absolute direction. It's similar to the popular key metric in that it aims to surface skewed data access within the cluster.

Similarly, it will only show up when the replica deciders are engaged and sampling for split keys.

angles-n-daemons · 2025-02-18T19:00:30Z

pkg/kv/kvserver/replica.go

@@ -2539,8 +2539,11 @@ func (r *Replica) MeasureRaftCPUNanos(start time.Duration) {
 func (r *Replica) RangeUsageInfo() allocator.RangeUsageInfo {
 	loadStats := r.LoadStats()
 	localityInfo := r.loadStats.RequestLocalityInfo()
+	mvccStats := r.GetMVCCStats()
+	garbagePct := 1 - (float64(mvccStats.LiveBytes) / float64(mvccStats.KeyBytes+mvccStats.ValBytes))


Adding garbage percentage to the range usage info, as it seems low cost to compute and has been useful in troubleshooting hotspots in the past:

https://cockroachlabs.atlassian.net/wiki/spaces/CKB/pages/2664300986/Playbook+Best+Buy+Case+Study+-+CPU+attribution+to+culprit+queries
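One thing the expression in the diff does not handle is an empty range, where `KeyBytes+ValBytes` is zero and the division yields NaN. A sketch of a guarded version follows; the `mvccStats` struct is a stand-in that carries only the three fields used in the diff, and the helper name is hypothetical.

```go
package main

// mvccStats is a stand-in for the MVCC stats used in the diff; only the three
// fields that the garbage computation reads are included.
type mvccStats struct {
	LiveBytes int64
	KeyBytes  int64
	ValBytes  int64
}

// garbageFraction returns the share of stored bytes that are not live, as a
// value between 0 and 1. It returns 0 for an empty range instead of dividing
// by zero.
func garbageFraction(s mvccStats) float64 {
	total := s.KeyBytes + s.ValBytes
	if total <= 0 {
		return 0
	}
	return 1 - float64(s.LiveBytes)/float64(total)
}
```

Note that the value is a fraction rather than a percentage, so a name like `garbagePct` would imply multiplying by 100 before display.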

angles-n-daemons · 2025-02-18T19:00:54Z

pkg/kv/kvserver/replica_metrics.go

+	return r.loadStats.StatsShort()
+}
+
+// SplitStats returns the split statistics collected if the decider is engaged.


See comments below on Split Statistics

angles-n-daemons · 2025-02-18T19:01:59Z

pkg/kv/kvserver/split/decider.go

-	// appears in the sampled candidate split keys.
-	PopularKeyFrequency() float64
+	// PopularKey returns the most popular key in the sample.
+	PopularKey() PopularKey


Popular key will now include the key itself, so that if the user has the appropriate access, they can know exactly which is the offending row.
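To illustrate the idea of returning the key alongside its frequency, here is a simplified sketch. The `popularKey` struct and `mostPopularKey` helper are hypothetical and operate on plain strings; they are not the `PopularKey` type from this PR.

```go
package main

// popularKey is a hypothetical result shape: the key itself plus the fraction
// of samples it accounts for.
type popularKey struct {
	Key       string
	Frequency float64
}

// mostPopularKey returns the key that occurs most often in sampleKeys along
// with its share of the samples. It returns the zero value for an empty input.
func mostPopularKey(sampleKeys []string) popularKey {
	if len(sampleKeys) == 0 {
		return popularKey{}
	}
	counts := make(map[string]int, len(sampleKeys))
	best := sampleKeys[0]
	for _, k := range sampleKeys {
		counts[k]++
		// A strict comparison keeps the current leader on ties.
		if counts[k] > counts[best] {
			best = k
		}
	}
	return popularKey{
		Key:       best,
		Frequency: float64(counts[best]) / float64(len(sampleKeys)),
	}
}
```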

angles-n-daemons · 2025-02-18T19:02:49Z

pkg/kv/kvserver/split/decider.go

+	// right counters of the samples contained. Returns a float64 value between
+	// -1 and 1, where -1 indicates all samples are to the left, 1 indicates all
+	// samples are to the right, and values in between indicate the proportion.
+	KeyAccessDirection() float64


KeyAccessDirection will surface the direction of key accesses over time, from -1 (descending) to 1 (ascending)
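One way to arrive at a value with these properties is to sum the left and right counters across samples and normalize the difference. This is a sketch under that assumption; the `sampleCounters` type and the formula are illustrative and may differ from what the PR actually computes.

```go
package main

// sampleCounters is a hypothetical stand-in for the per-sample left/right
// counters kept while sampling for split keys.
type sampleCounters struct {
	left, right int
}

// keyAccessDirection returns a value in [-1, 1]: -1 when every counted access
// fell to the left, 1 when every counted access fell to the right, and 0 when
// there is no data or the two sides balance out.
func keyAccessDirection(samples []sampleCounters) float64 {
	var left, right int
	for _, s := range samples {
		left += s.left
		right += s.right
	}
	total := left + right
	if total == 0 {
		return 0
	}
	return float64(right-left) / float64(total)
}
```

Returning 0 for the no-data case means that "no samples" and "perfectly balanced" are indistinguishable to callers, which matters for how the UI presents the default state.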

angles-n-daemons · 2025-02-18T19:03:30Z

pkg/kv/kvserver/split/decider.go

@@ -278,6 +302,18 @@ func (d *Decider) recordLocked(
 	return false
 }

+func (d *Decider) SplitStatistics() *SplitStatistics {


SplitStatistics will surface information on popular keys and access direction of samples *if* the decider is engaged.

angles-n-daemons · 2025-02-18T19:04:21Z

pkg/kv/kvserver/split/weighted_finder.go

@@ -49,6 +49,16 @@ type weightedSample struct {
 	count       int
 }

+func (ws weightedSample) Map() map[string]interface{} {


this is debugging stuff, ignore this.

kvoli

Nice prototype! I have a concern w.r.t. where stats are added in the pipeline back up to the status API endpoint, and what is displayed by default in the UI when the decider isn't generating any stats. Also some general notes from running this locally I'll post on a follow on comment.

Reviewed 5 of 23 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)

pkg/kv/kvserver/replicastats/replica_load_notifier.go line 10 at r2 (raw file):

import "github.com/cockroachdb/cockroach/pkg/util/syncutil"

// ReplicaLoadNotifier is used to signal a subscriber that load in the system has passed some threshold.

Is this going to be used?

kvoli · 2025-02-19T18:43:50Z

I'll post on a follow on comment.

May need to zoom in on the image:

angles-n-daemons

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @kyle-a-wong)

pkg/kv/kvserver/replica.go line 2542 at r2 (raw file):