Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] hotspot telemetry performance demonstration #141518

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

angles-n-daemons
Copy link
Contributor

No description provided.

To track the movement of samples within a replica, we expose a
`SampleMovement` function, which exposes the proportion of samples to
the left or right over time.

Doing so only occurs during replica traffic sampling, which requires a
high threshold of traffic to funnel through a single range. Knowing
"absolute sample movement", or rather, all of the movement on one side
or the other will help us identify hotspots due to monotonically
increasing.

Fixes: cockroachdb#138575
Epic: CRDB-43150

Release note: None
@angles-n-daemons angles-n-daemons requested review from a team as code owners February 14, 2025 18:37
@angles-n-daemons angles-n-daemons requested review from kyle-a-wong and removed request for a team February 14, 2025 18:37
Copy link

blathers-crl bot commented Feb 14, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@angles-n-daemons
Copy link
Contributor Author

image

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@angles-n-daemons angles-n-daemons added the T-microbench-post Enables posting microbenchmark results to a PR label Feb 14, 2025
@angles-n-daemons
Copy link
Contributor Author

Because of competition and variability in speed to writes, writes actually have some variance (50% - 100% rightward direction). I believe is caused by competition to reaching KV within the system, as well as the fact that multiple workload processes are running simultaneously.

@angles-n-daemons
Copy link
Contributor Author

Popular keys with zipfian:

image

@angles-n-daemons angles-n-daemons force-pushed the hotspot-telemetry-no-automated-logging branch 2 times, most recently from e5cc653 to 5bb778a Compare February 18, 2025 16:30
Copy link

github-actions bot commented Feb 18, 2025

🟡 Sysbench [SQL, 3node, oltp_read_write]
Metric Old Commit New Commit Delta Note Threshold
🟡 sec/op 10.93m ±1% 11.00m ±1% +0.56% p=0.043 n=10 3.0%
errs/op 0.000 ±0% 0.000 ±0% ~ p=1.000 n=10 0.0%
allocs/op 10.27k ±0% 10.27k ±0% ~ p=0.986 n=10 2.0%
B/op 2.216Mi ±0% 2.216Mi ±0% ~ p=0.971 n=10 2.0%
Reproduce

benchdiff binaries:

mkdir -p benchdiff/66c8c87/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/bin/pkg_sql_tests benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/562d26b/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/bin/pkg_sql_tests benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=562d26b --new=66c8c87 ./pkg/sql/tests
⚪ Sysbench [KV, 1node, local, oltp_read_only]
Metric Old Commit New Commit Delta Note Threshold
sec/op 670.7µ ±2% 670.0µ ±1% ~ p=0.739 n=10 2.0%
errs/op 0.000 ±0% 0.000 ±0% ~ p=1.000 n=10 0.0%
allocs/op 439.0 ±0% 439.0 ±0% ~ p=1.000 n=10 1.5%
B/op 254.2Ki ±0% 254.2Ki ±0% ~ p=0.853 n=10 1.5%
Reproduce

benchdiff binaries:

mkdir -p benchdiff/66c8c87/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/bin/pkg_sql_tests benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/562d26b/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/bin/pkg_sql_tests benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/1node_local/oltp_read_only$ --old=562d26b --new=66c8c87 ./pkg/sql/tests
⚪ Sysbench [KV, 1node, local, oltp_write_only]
Metric Old Commit New Commit Delta Note Threshold
sec/op 1.330m ±1% 1.335m ±0% ~ p=0.143 n=10 2.5%
errs/op 0.000 ±0% 0.000 ±0% ~ p=1.000 n=10 0.0%
allocs/op 1.393k ±0% 1.393k ±0% ~ p=0.559 n=10 1.8%
B/op 290.1Ki ±0% 290.1Ki ±0% ~ p=0.631 n=10 1.8%
Reproduce

benchdiff binaries:

mkdir -p benchdiff/66c8c87/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/bin/pkg_sql_tests benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/66c8c87/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/562d26b/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/bin/pkg_sql_tests benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/562d26b/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/1node_local/oltp_write_only$ --old=562d26b --new=66c8c87 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4/13420580902-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/562d26b3c86fbc0fb5363ad2bfeab118c2e08c58/13420580902-1/\* old/
Legend
  • Neutral: No significant performance change.
  • 🟡 Warning: Slight degradation, likely due to variance, but still within thresholds.
  • 🔴 Regression: Likely performance regression, requiring investigation.
  • 🟢 Improvement: Possible performance gain.

No regressions detected!

built with commit: 66c8c87fd56b2a1c6068eb5ca8bcc850b0bd23d4

@angles-n-daemons angles-n-daemons force-pushed the hotspot-telemetry-no-automated-logging branch from 5bb778a to dca4fd5 Compare February 18, 2025 17:02
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing some comments.

Reviewed 9 of 9 files at r1, 16 of 23 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kyle-a-wong)


pkg/server/serverpb/status.proto line 1287 at r2 (raw file):

  // flows.
  bytes flow_id = 1 [
    (gogoproto.nullable) = false,

nit: understand this is a prototype but for the actual PRs drop the refmt.


pkg/kv/kvserver/split/decider.go line 283 at r1 (raw file):

						log.KvDistribution.Infof(ctx, "%s, movement of samples is %.2f",
							causeMsg, movement)
						if math.Abs(movement) == 1 {

Consider updating the metric to be >99% istead of an absolute requirement. That way, even if a few writes/read accesses pop in from the user (or system) that don't match the access direction, the metric still gets bumped and generates a signal.


pkg/kv/kvserver/split/decider_test.go line 479 at r1 (raw file):

	assert.Equal(t, dAbsoluteMovement.loadSplitterMetrics.PopularKeyCount.Count(), int64(0))
	assert.Equal(t, dAbsoluteMovement.loadSplitterMetrics.AbsoluteKeyMovementCount.Count(), int64(0))

Should this be getting bumped for this test case? It seems like it may require more .Record() calls, otherwise it'd hit insufficient counters like the test case above.


pkg/kv/kvserver/store_rebalancer.go line 267 at r2 (raw file):

		}
		desc := r.Desc()
		buf.Printf("\t%d: r%d:%v replicas=[%v] load=%v splitstats=%v",

👍


pkg/server/status.go line 3048 at r2 (raw file):

			// Patch values used for load balancing with shorter duration ones.
			loadStats := replica.LoadStatsShort()
			storeResp.HotRanges[i].QueriesPerSecond = loadStats.QueriesPerSecond

Is the plan to change to use the short load stats response? The top 128 hot ranges per-store are kept based on the longer stats, so this could lead to some confusion.


pkg/kv/kvserver/replica.go line 2542 at r2 (raw file):

	loadStats := r.LoadStats()
	localityInfo := r.loadStats.RequestLocalityInfo()
	mvccStats := r.GetMVCCStats()

This should be included later in generating a hot-ranges response, probably after topk ranking, otherwise it will be generated on each allocator call, without being used.


pkg/ui/workspaces/db-console/src/views/hotRanges/hotRangesTable.tsx line 398 at r2 (raw file):

    return "";
  }
  const direction = access_direction < 0 ? "descending" : "ascending";

When access_direction is 0, or say less than 0.3 (arbitrary), could this instead map to N/A?

Since the majority of the time the split decider won't be engaged and will only temporarily engage if it does, the default state for ranges would be having no data i.e., 0?

@@ -199,6 +211,23 @@ func (rl *ReplicaLoad) Stats() ReplicaLoadStats {
}
}

// StatsShort returns a current stat summary of replica load using only one bucket.
func (rl *ReplicaLoad) StatsShort() ReplicaLoadStats {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns the replica statistics using only one bucket of the ReplicaStats object.

@@ -2432,6 +2432,12 @@ Note that the measurement does not include the duration for replicating the eval
Measurement: "Nanoseconds",
Unit: metric.Unit_NANOSECONDS,
}
metaAbsoluteKeyMovementCount = metric.Metadata{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metric tracks when key accesses seem to be moving in an absolute direction. It's similar to the popular key metric in that it aims to surface skewed data access within the cluster.

It also similarly will only show up when the replica deciders are engaged and sampling for split keys.

@@ -2539,8 +2539,11 @@ func (r *Replica) MeasureRaftCPUNanos(start time.Duration) {
func (r *Replica) RangeUsageInfo() allocator.RangeUsageInfo {
loadStats := r.LoadStats()
localityInfo := r.loadStats.RequestLocalityInfo()
mvccStats := r.GetMVCCStats()
garbagePct := 1 - (float64(mvccStats.LiveBytes) / float64(mvccStats.KeyBytes+mvccStats.ValBytes))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding garbage percentage to the range usage info, as it seems low cost to compute and has been useful in troubleshooting hotspots in the past:

https://cockroachlabs.atlassian.net/wiki/spaces/CKB/pages/2664300986/Playbook+Best+Buy+Case+Study+-+CPU+attribution+to+culprit+queries

return r.loadStats.StatsShort()
}

// SplitStats returns the split statistics collected if the decider is engaged.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments below on Split Statistics

// appears in the sampled candidate split keys.
PopularKeyFrequency() float64
// PopularKey returns the most popular key in the sample.
PopularKey() PopularKey
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Popular key will now include the key itself, so that if the user has the appropriate access, they can know exactly which is the offending row.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

// right counters of the samples contained. Returns a float64 value between
// -1 and 1, where -1 indicates all samples are to the left, 1 indicates all
// samples are to the right, and values in between indicate the proportion.
KeyAccessDirection() float64
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyAccessDirection will surface the direction of key accesses over time, from -1 (descending) to 1 (ascending)

@@ -278,6 +302,18 @@ func (d *Decider) recordLocked(
return false
}

func (d *Decider) SplitStatistics() *SplitStatistics {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitStatistics will surface information on popular keys and access direction of samples *if the decider is engaged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -49,6 +49,16 @@ type weightedSample struct {
count int
}

func (ws weightedSample) Map() map[string]interface{} {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is debugging stuff, ignore this.

Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice prototype! A concern w.r.t where stats are added in the pipeline back up to status API endpoint and what is displayed by default on the UI when the decider isn't generating any stats. Also some general notes from running this locally I'll post on a follow on comment.

Reviewed 5 of 23 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)


pkg/kv/kvserver/replicastats/replica_load_notifier.go line 10 at r2 (raw file):

import "github.com/cockroachdb/cockroach/pkg/util/syncutil"

// ReplicaLoadNotifier is used to signal a subscriber that load in the system has passed some threshold.

Is this going to be used?

// appears in the sampled candidate split keys.
PopularKeyFrequency() float64
// PopularKey returns the most popular key in the sample.
PopularKey() PopularKey
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@@ -278,6 +302,18 @@ func (d *Decider) recordLocked(
return false
}

func (d *Decider) SplitStatistics() *SplitStatistics {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kvoli
Copy link
Collaborator

kvoli commented Feb 19, 2025

I'll post on a follow on comment.

May need to zoom in on the image:
image

Copy link
Contributor Author

@angles-n-daemons angles-n-daemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @kyle-a-wong)


pkg/kv/kvserver/replica.go line 2542 at r2 (raw file):

Previously, kvoli (Austen) wrote…

This should be included later in generating a hot-ranges response, probably after topk ranking, otherwise it will be generated on each allocator call, without being used.

I see, that makes sense to me


pkg/kv/kvserver/replicastats/replica_load_notifier.go line 10 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Is this going to be used?

I'm trying to figure out if the store rebalancer logs in all the situations we want (or if we can modify it slightly to do so). Shouldn't be too much longer till I feel confident.


pkg/kv/kvserver/split/decider.go line 283 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Consider updating the metric to be >99% istead of an absolute requirement. That way, even if a few writes/read accesses pop in from the user (or system) that don't match the access direction, the metric still gets bumped and generates a signal.

Oh, this is a great point. I noticed that a lot of the inserts tend to race each other, so yeah in practice this ends up being 90-99% instead of 1.

Going to modify the threshold to be something like 85%.


pkg/server/status.go line 3048 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Is the plan to change to use the short load stats response? The top 128 hot ranges per-store are kept based on the longer stats, so this could lead to some confusion.

This is a good question, and one which might be better discussed in person. For hotspots where the ranges quickly fill up and move on, as well as hotspots where access comes in bursts, this smaller window allows operators to better see where / when the hotspots are occurring. I hadn't thought much about how this would be affect how the operators are using information which the system isn't acting on. Let me know if you have any immediate thoughts here, otherwise I'm going to think on this in my own time.


pkg/server/serverpb/status.proto line 1287 at r2 (raw file):

Previously, kvoli (Austen) wrote…

nit: understand this is a prototype but for the actual PRs drop the refmt.

definitely can do - this comment may as well be a good reminder for me to configure my editor to stop formatting protobuf files.


pkg/ui/workspaces/db-console/src/views/hotRanges/hotRangesTable.tsx line 398 at r2 (raw file):

Previously, kvoli (Austen) wrote…

When access_direction is 0, or say less than 0.3 (arbitrary), could this instead map to N/A?

Since the majority of the time the split decider won't be engaged and will only temporarily engage if it does, the default state for ranges would be having no data i.e., 0?

Yeah we definitely could. I'll plan on going with that.


pkg/kv/kvserver/split/decider_test.go line 479 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Should this be getting bumped for this test case? It seems like it may require more .Record() calls, otherwise it'd hit insufficient counters like the test case above.

Ahh I think I forgot to finish this test ou.

@angles-n-daemons angles-n-daemons force-pushed the hotspot-telemetry-no-automated-logging branch from dca4fd5 to 66c8c87 Compare February 19, 2025 19:17
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 23 files at r2, 3 of 11 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)


pkg/kv/kvserver/replicastats/replica_load_notifier.go line 10 at r2 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

I'm trying to figure out if the store rebalancer logs in all the situations we want (or if we can modify it slightly to do so). Shouldn't be too much longer till I feel confident.

ack


pkg/server/status.go line 3048 at r2 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

This is a good question, and one which might be better discussed in person. For hotspots where the ranges quickly fill up and move on, as well as hotspots where access comes in bursts, this smaller window allows operators to better see where / when the hotspots are occurring. I hadn't thought much about how this would be affect how the operators are using information which the system isn't acting on. Let me know if you have any immediate thoughts here, otherwise I'm going to think on this in my own time.

Lets talk in person.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-microbench-post Enables posting microbenchmark results to a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants