asim: introduce eval configurations and a chart viewer #150847

tbg · 2025-07-28T11:21:45Z

This PR

replaces the manual plot command with the automatic generation of all relevant charts.
moves from static PNG image outputs to a single JSON file per run, which can be viewed and analyzed in an interactive HTML viewer.
excludes those generated files from the repository. This significantly reduces churn.
introduces eval configurations (e.g., sma, mma, both) that automatically set the appropriate cluster settings for a test run.
consolidates numerous redundant and messy test files into fewer, cleaner tests that leverage the new configuration system.
add validation to warn against unrealistic test parameters.
address all such warnings.

I think some more work is needed to tell, based on the datadriven files along, whether the configured allocator is "doing a good enough job". But at least there are fewer test scenarios now, and none of them configure obviously bogus loads or capacities.

Epic: CRDB-52631

blathers-crl · 2025-07-28T11:21:51Z

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

cockroach-teamcity · 2025-07-28T11:22:25Z

This change is

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained

-- commits line 20 at r4:
this will be called "configs" later.

-- commits line 69 at r10:
This is stale, since the generated files are excluded (with one exception, example_rebalancing, for illustration). I hadn't yet done that when I first wrote this commit, but later rearranged things.

-- commits line 149 at r23:
as in, the files were completely identical, even run under the same config (mma-only) despite the one file name claiming it was running SMA.
I must have broken that sometime in the past.

pkg/kv/kvserver/asim/tests/testdata/generated/example_rebalancing/example_rebalancing_2_qps.png line 0 at r2 (raw file):
These shouldn't have been removed in r2 but in r1. Oh well.

pkg/kv/kvserver/asim/tests/testdata/viewer.html line 4 at r13 (raw file):

<html lang="en">
<head>
  <meta charset="UTF-8">

Made with Claude Code... I've looked at it a little bit while iterating on it, but I'm definitely not the expert on this. I don't think reviewing this too closely makes sense, better to try it out.

pkg/kv/kvserver/asim/tests/plotting_test.go line 115 at r13 (raw file):

		// TODO(tbg): there are too many metrics to print here; they clutter up
		// the datadriven output. We should rely on assertions instead.
		if false {

A later commit will allow passing a list of metrics to eval and they will be printed here. It's a compromise for now.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 180 at r2 (raw file):

//     ..US_3
//     ....└── [11 12 13 14 15]
func TestDataDriven(t *testing.T) {

reminder: need to update the help text above with the updates.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_high_cpu_unable_to_shed_leases.txt line 36 at r22 (raw file):

{s1,s3:*,s5}:6

gen_load rate=1000 rw_ratio=1.0 min_key=0 max_key=10000 raft_cpu_per_write=500000

This seems wrong. The node is supposed to be overloaded due to RAFT cpu. Now I turned this into a read-only workload and raft_cpu_per_write isn't even used.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 441 at r20 (raw file):

					// manually.
					"default": func(*gen.StaticEvents) {},
					// 'mma-only' runs with the multi-metric allocator and turns off the

It's worth thinking about these configs. Right now I'm lumping the SMA StoreRebalancer into the "sma" variable. It won't ever make sense to turn on both the legacy and MMA store rebalancer at the same time, so so far it seems to work out - there is no reason to want "both store rebalancers on".

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 562 at r7 (raw file):

				return ""
			case "setting":
				// NB: delay could be supported for the below settings,

I might be missing something here. Couldn't all setting changes be implemented through events?

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 429 at r4 (raw file):

				require.NotZero(t, rangeGen)

				modes := map[kvserver.LBRebalancingMode]string{

This will be inverted later. Didn't think it through the first time around.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_skewed_cpu_skewed_write.txt line 42 at r2 (raw file):

OK

print

Need to check if I removed print in the end. I may not have.

pkg/kv/kvserver/asim/tests/default_settings.go line 84 at r15 (raw file):

		replicationFactor: defaultReplicationFactor,
		bytes:             defaultBytes,
		// TODO(tbg): unused, remove.

TODO actually do this.

tbg · 2025-07-28T23:10:35Z

@wenyihu6 this is going to be a bigger one to chew on. I hope the commits are reasonably self-contained. They definitely aren't the direct line: some things I figured out properly only after having done them. I think the amount of backtracking is reasonable, though, so it may not be worth rewriting history. Happy to chat synchronously as well (tomorrow). This PR could be split up if you'd get something out of that; I may do this once I have parts reviewed, depending on how it goes.

wenyihu6

Exciting! Looks good, only minor comments. Feel free to let me know if I missed anything - some parts might have been outdated by later commits.

Reviewed 2 of 101 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

-- commits line 20 at r4:

Previously, tbg (Tobias Grieger) wrote…

this will be called "configs" later.

Ack.

-- commits line 69 at r10:

Previously, tbg (Tobias Grieger) wrote…

This is stale, since the generated files are excluded (with one exception, example_rebalancing, for illustration). I hadn't yet done that when I first wrote this commit, but later rearranged things.

Ack.

-- commits line 149 at r23:

Previously, tbg (Tobias Grieger) wrote…

as in, the files were completely identical, even run under the same config (mma-only) despite the one file name claiming it was running SMA.
I must have broken that sometime in the past.

Ack.

pkg/kv/kvserver/asim/tests/testdata/viewer.html line 4 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Made with Claude Code... I've looked at it a little bit while iterating on it, but I'm definitely not the expert on this. I don't think reviewing this too closely makes sense, better to try it out.

Ack.

pkg/kv/kvserver/asim/tests/testdata/generated/example_rebalancing/example_rebalancing_2_qps.png line at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

These shouldn't have been removed in r2 but in r1. Oh well.

Ack.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_high_cpu_unable_to_shed_leases.txt line 36 at r22 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This seems wrong. The node is supposed to be overloaded due to RAFT cpu. Now I turned this into a read-only workload and raft_cpu_per_write isn't even used.

Looks like it used to be

gen_load rate=1000 rw_ratio=0 min_block=0 max_block=0 min_key=0 max_key=10000 raft_cpu_per_write=100

Changed in tbg@46346ad

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_skewed_cpu_skewed_write.txt line 42 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Need to check if I removed print in the end. I may not have.

Looks like your recent commit did it.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 180 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

reminder: need to update the help text above with the updates.

Ack.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 429 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This will be inverted later. Didn't think it through the first time around.

Ack.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 562 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I might be missing something here. Couldn't all setting changes be implemented through events?

I think you’re right - there’s nothing technically wrong with it. I was concerned that a delay of 0 might subtly change behavior since settings are now applied slightly after the simulation starts. It is the first thing we do in a tick

cockroach/pkg/kv/kvserver/asim/asim.go

Line 253 in bd66dc4

s.tickEvents(ctx, tick)

. Worth double checking if the setting is used in any meaningful way when generating the simulation before the simulation run begins.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 441 at r20 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It's worth thinking about these configs. Right now I'm lumping the SMA StoreRebalancer into the "sma" variable. It won't ever make sense to turn on both the legacy and MMA store rebalancer at the same time, so so far it seems to work out - there is no reason to want "both store rebalancers on".

Agreed that I don't see a reason having both store rebalancers enabled at the same time. We might switch between them (probably as a delayed event).

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 233 at r45 (raw file):

float64(requestCPUPerAccess)*rwRatio

Should we remove *rwRatio since requestCPUPerAccess applies to both reads and writes (not just reads)?

cockroach/pkg/kv/kvserver/asim/workload/workload.go

Lines 139 to 158 in df2d9b2

    
           for read := 0; read < reads; read++ { 
        
           	size := int64(rwg.rand.Intn(rwg.maxSize-rwg.minSize+1) + rwg.minSize) 
        
           	key := rwg.keyGenerator.readKey() 
        
           	event := next[key] 
        
           	event.Reads++ 
        
           	event.ReadSize += size 
        
           	event.RequestCPU += rwg.requestCPUPerAccess 
        
           	next[key] = event 
        
           } 
        
           for write := 0; write < writes; write++ { 
        
           	size := int64(rwg.rand.Intn(rwg.maxSize-rwg.minSize+1) + rwg.minSize) 
        
           	key := rwg.keyGenerator.writeKey() 
        
           	event := next[key] 
        
           	event.Writes++ 
        
           	event.WriteSize += size 
        
           	event.RequestCPU += rwg.requestCPUPerAccess 
        
           	event.RaftCPU += rwg.raftCPUPerWrite 
        
           	next[key] = event 
        
           }

pkg/kv/kvserver/asim/tests/plotting_test.go line 115 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

A later commit will allow passing a list of metrics to eval and they will be printed here. It's a compromise for now.

Is this comment referring to the fact that we’re wrapping code inside if false blocks here?

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_splitting.txt line 15 at r46 (raw file):

# threshold set above. Also set to use only reads, to avoid size based splits
# (we could also disable this via setting the range max size).
gen_load rate=10000 rw_ratio=1.0 raft_cpu_per_write=500000

Are we setting raft cpu with a read only workload?

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_splitting.txt line 43 at r46 (raw file):

# number of splits as the uniform workload. However, that is not the case as we
# require more splits with a zipfian distribution.
gen_load rate=10000 rw_ratio=1.0 access_skew=true replace=true raft_cpu_per_write=500000

Ditto.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_constraint_satisfaction1.txt line 6 at r45 (raw file):

# specify zone survivability (2, 2, 1) replicas across 3 regions.
# a(n1.n2) b(n3,n4) c(n5) d(n6-n9)
gen_cluster nodes=9 region=(a,b,c,d) nodes_per_region=(3,2,1,3) node_cpu_rate_capacity=50000

Did a warning fire for this 50k ns-cpu/sec setting? Seems too low. It’s not part of your PR tho (we can do it in a follow-up unless you want to do it now).

pkg/kv/kvserver/asim/tests/testdata/non_rand/load_distribution_movement_disabled_enable_later.txt line 2 at r38 (raw file):

# Disable all lease and replica movement.
setting rebalance_mode=0 replicate_queue_enabled=false lease_queue_enabled=false split_queue_enabled=false

Perhaps this is the intended behavior - feel free to ignore if so. Just noting that sma-only was running here with all other queues disabled originally.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 592 at r4 (raw file):

				require.GreaterOrEqual(t, len(runs[len(runs)-1].hs), sample)

				h := runs[len(runs)-1].hs[sample-1]

Is this code reachable? iiuc, it also only plots the last simulation run for the last mode? This comment might become irrelevant later since I saw your demo with multiple plots in which case feel free to ignore.

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_skewed_cpu_even_ranges_mma_and_queues.txt line 34 at r22 (raw file):

----

eval duration=35m samples=1 seed=42 cfgs=(mma-only) metrics=(cpu,leases,replicas,write_bytes_per_second)

This might be the intended behavior in which case feel free to ignore me - just noting that purely based on this diff, it used to enable other rebalancing queues (which should be cfg=both)?

tbg

Dismissed @wenyihu6 from 3 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)

pkg/kv/kvserver/asim/tests/default_settings.go line 84 at r15 (raw file):

Previously, tbg (Tobias Grieger) wrote…

TODO actually do this.

Done on next push.

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_skewed_cpu_even_ranges_mma_and_queues.txt line 34 at r22 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

This might be the intended behavior in which case feel free to ignore me - just noting that purely based on this diff, it used to enable other rebalancing queues (which should be cfg=both)?

I'll double check, but at least at the end of this PR we have

example_skewed_cpu_even_ranges_mma.txt

with

eval duration=35m samples=1 seed=42 cfgs=(sma-only,mma-only,both) metrics=(cpu,leases,replicas,write\_bytes\_per\_second)

so we have a version with both at least by the time the dust settles.

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_splitting.txt line 15 at r46 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Are we setting raft cpu with a read only workload?

Not sure where this is coming from 😅 removed with next push.

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_splitting.txt line 43 at r46 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Ditto.

Done.

pkg/kv/kvserver/asim/tests/testdata/non_rand/load_distribution_movement_disabled_enable_later.txt line 2 at r38 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Perhaps this is the intended behavior - feel free to ignore if so. Just noting that sma-only was running here with all other queues disabled originally.

Sorry to slip this in. Yeah, I thought when looking at this that it seemed silly to run a ten node simulation without any activity for the first minutes. Do you understand why the test may originally have looked like that?

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_constraint_satisfaction1.txt line 6 at r45 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Did a warning fire for this 50k ns-cpu/sec setting? Seems too low. It’s not part of your PR tho (we can do it in a follow-up unless you want to do it now).

🤦

Leaving this comment open to fix this up.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_high_cpu_unable_to_shed_leases.txt line 36 at r22 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Looks like it used to be
gen_load rate=1000 rw_ratio=0 min_block=0 max_block=0 min_key=0 max_key=10000 raft_cpu_per_write=100
Changed in tbg@46346ad

Fixed with next push.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 180 at r2 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Ack.

Just did a pass and it looked up to date.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 592 at r4 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Is this code reachable? iiuc, it also only plots the last simulation run for the last mode? This comment might become irrelevant later since I saw your demo with multiple plots in which case feel free to ignore.

No, wasn't reachable at that point, and it's gone by the end of the PR.

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go line 233 at r45 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

float64(requestCPUPerAccess)*rwRatio

Should we remove *rwRatio since requestCPUPerAccess applies to both reads and writes (not just reads)?

cockroach/pkg/kv/kvserver/asim/workload/workload.go

Lines 139 to 158 in df2d9b2

for read := 0; read < reads; read++ {

size := int64(rwg.rand.Intn(rwg.maxSize-rwg.minSize+1) + rwg.minSize)

key := rwg.keyGenerator.readKey()

event := next[key]

event.Reads++

event.ReadSize += size

event.RequestCPU += rwg.requestCPUPerAccess

next[key] = event

}

for write := 0; write < writes; write++ {

size := int64(rwg.rand.Intn(rwg.maxSize-rwg.minSize+1) + rwg.minSize)

key := rwg.keyGenerator.writeKey()

event := next[key]

event.Writes++

event.WriteSize += size

event.RequestCPU += rwg.requestCPUPerAccess

event.RaftCPU += rwg.raftCPUPerWrite

next[key] = event

}

Done w/ next push.

pkg/kv/kvserver/asim/tests/plotting_test.go line 115 at r13 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Is this comment referring to the fact that we’re wrapping code inside if false blocks here?

Yeah, pointing out the temporary hack which is gone by the end of this PR.

tbg

Dismissed @wenyihu6 from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma_constraint_satisfaction1.txt line 6 at r45 (raw file):

Previously, tbg (Tobias Grieger) wrote…

🤦

Leaving this comment open to fix this up.

Done with last push.

wenyihu6

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

pkg/kv/kvserver/asim/tests/testdata/non_rand/example_skewed_cpu_even_ranges_mma_and_queues.txt line 34 at r22 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I'll double check, but at least at the end of this PR we have

example_skewed_cpu_even_ranges_mma.txt

with
eval duration=35m samples=1 seed=42 cfgs=(sma-only,mma-only,both) metrics=(cpu,leases,replicas,write\_bytes\_per\_second)
so we have a version with both at least by the time the dust settles.

Nice, no change needed then.

pkg/kv/kvserver/asim/tests/testdata/non_rand/load_distribution_movement_disabled_enable_later.txt line 2 at r38 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Sorry to slip this in. Yeah, I thought when looking at this that it seemed silly to run a ten node simulation without any activity for the first minutes. Do you understand why the test may originally have looked like that?

Agreed what you did here makes sense to me. I might have done it this way becaus mma doesn't interact well with other rebalancing queues, so I wanted to ensure it ran with the others disabled. And to get a more isolated comparison of just the store rebalancer between the existing and mma store rebalancer, I just disabled all other rebalancing queues throughout the test. 10 mins without any activity was probably silly tho.

pkg/kv/kvserver/asim/tests/plotting_test.go line 115 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Yeah, pointing out the temporary hack which is gone by the end of this PR.

Makes sense.

150972: orchestration: released CockroachDB version v25.2.3. Next version: v25.2.4 r=celiala a=cockroach-teamcity Release note: None Epic: None Release justification: non-production (release infra) change. 151169: sniffarg: unconditionally convert to double-dash args r=tbg a=tbg The old code was being dumb while trying to be too clever - it tried to do only the minimal amount of rewriting, but really should've just rewritten all the flags to `--format` instead. Found during #150847. Epic: CRDB-25222 Co-authored-by: Justin Beaver <teamcity@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>

tbg · 2025-07-31T22:41:07Z

TFTR! I just rebased (needed to push a commit that resolves a diff - probably similar to #151143) and will merge tomorrow after I do a final pass and find everything in good order.

tbg · 2025-08-01T08:01:24Z

TFTR!

bors r+

The CI failures are unrelated to this code. I pinged both issues.

tbg · 2025-08-01T08:01:54Z

bors r-

need another rebase first.

craig · 2025-08-01T08:01:57Z

Canceled.

This turns `plot`, `topology`, `print` into no-ops temporarily so that we can better refactor the `eval` command to run multiple iterations of the test under different configurations.

This introduces the ability to `eval` a given configuration across multiple "modes" (i.e. configurations) such as SMA vs MMA. For now, to keep the change minimal, it runs each test file in a "default" mode which just does exactly what it did before. In follow-up commits, we'll gradually convert test files to select the appropriate mode(s).

We're going to want to set a setting via an event depending on the eval mode, but the static event generator isn't reusable enough. I haven't quite figured out why we need it in the first place (over a slice of events, as introduced here), so potentially it can be removed at some point. For now, I left it alone.

We have since introduced inter-test concurrency, that ought to be enough, for now.

A delay of zero is a fine delay.

Rebrand the previously introduced stub "modes" as "configurations", which is hopefully a more evocative name. Also, prepare them for being able to set cluster settings. We'll use this in upcoming commits.

Port all the tests and remove the plot command.

This leaves only a few tests that manually set rebalance_mode, to be handled in follow-ups.

Save for unintentional drift, they are the same spec, just under different configurations.

Before this fix, the storeID on the graphs started at zero.

Strips the .txt suffix.

It was using two evals that were hard to parse. The second test didn't really do very much. In both, the CPU usage was basically nonexistent. Instead, do a single test that has a physically separated 5 vcpu read and a 10mb/s write workload. Let them run for two minutes under the single-metric allocator, then also enable the multi-metric allocator. It's hard to see from the testdata output, but the charts show that write and CPU both converge, along with lease count, qps, and replica count.

It wasn't enabling any queue, so replicas were just sitting around. This isn't worth the CPU time.

This prints a warning in the tests whenever "unrealistic" CPUs or loads are provisioned. We can work through this list and pick them off one by one.

I also realized some of these were not actually running what the filename claimed. It was all a mess: now there's just one file, running in SMA, MMA, and combined config.

The files were identical in their inputs, except that the deleted file used the `both` config. We run that one in the remaining file now.

The asim tests run in parallel now, and this was erasing a log marker that allows us to separate the log lines by test case.

It was accidentally added to a read-only workload.

It was accidentally turned into a read-only workload.

Writes also consume the configured request access CPU. The code was written under the assumption that only reads would.

We can address them in a separate PR.

Some output changed while rebasing, and the changes are not qualitatively meaningful. This might be similar to cockroachdb#151143. Epic: CRDB-25222

tbg · 2025-08-01T08:05:33Z

bors r+

craig · 2025-08-01T08:05:38Z

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

craig · 2025-08-01T08:46:22Z

This PR was included in a batch that successfully built, but then failed to merge into master (it was a non-fast-forward update). It will be automatically retried.

craig · 2025-08-01T09:20:12Z

Build succeeded:

tbg force-pushed the mma-spec branch 2 times, most recently from 759521a to 59fdf0d Compare July 28, 2025 22:46

tbg commented Jul 28, 2025

View reviewed changes

tbg requested review from a team and wenyihu6 July 28, 2025 23:08

tbg marked this pull request as ready for review July 28, 2025 23:08

tbg requested a review from a team as a code owner July 28, 2025 23:08

tbg force-pushed the mma-spec branch from 59fdf0d to 03b6baf Compare July 29, 2025 21:00

wenyihu6 reviewed Jul 31, 2025

View reviewed changes

tbg mentioned this pull request Jul 31, 2025

sniffarg: unconditionally convert to double-dash args #151169

Merged

tbg commented Jul 31, 2025

View reviewed changes

tbg requested a review from wenyihu6 July 31, 2025 21:11

wenyihu6 approved these changes Jul 31, 2025

View reviewed changes

tbg force-pushed the mma-spec branch from 49f46a3 to 6c094d5 Compare July 31, 2025 22:40

tbg added 7 commits August 1, 2025 10:02

asim: temporarily disable some commands

cc69340

This turns `plot`, `topology`, `print` into no-ops temporarily so that we can better refactor the `eval` command to run multiple iterations of the test under different configurations.

asim: error in eval when no ranges configured

8bd997e

asim: address a TODO

0d3ad09

We have since introduced inter-test concurrency, that ought to be enough, for now.

asim: unify two code paths

c2588ae

A delay of zero is a fine delay.

asim: modes -> configurations

05547b6

Rebrand the previously introduced stub "modes" as "configurations", which is hopefully a more evocative name. Also, prepare them for being able to set cluster settings. We'll use this in upcoming commits.

tbg added 20 commits August 1, 2025 10:02

asim: remove plot

400da36

Port all the tests and remove the plot command.

asim: add and adopt eval configs

06acd81

This leaves only a few tests that manually set rebalance_mode, to be handled in follow-ups.

asim: merge two tests

037f538

Save for unintentional drift, they are the same spec, just under different configurations.

asim: fix a one-off

a746149

Before this fix, the storeID on the graphs started at zero.

asim: print nicer name

16c6444

Strips the .txt suffix.

asim: delete load_distribution_movement_disabled.txt

f590847

It wasn't enabling any queue, so replicas were just sitting around. This isn't worth the CPU time.

asim: warn against likely accidental load or capacity

7a9149e

This prints a warning in the tests whenever "unrealistic" CPUs or loads are provisioned. We can work through this list and pick them off one by one.

asim: consolidate example_skewed_cpu_even_ranges_mma tests

d2c089f

I also realized some of these were not actually running what the filename claimed. It was all a mess: now there's just one file, running in SMA, MMA, and combined config.

asim: consolidate onto mma_skewed_cpu_skewed_write.txt

9a27281

The files were identical in their inputs, except that the deleted file used the `both` config. We run that one in the remaining file now.

asim: remove print

4093ac1

asim: do not reset log tags

e59a556

The asim tests run in parallel now, and this was erasing a log marker that allows us to separate the log lines by test case.

asim: better log tag

aed0b5c

asim: extend example_skewed_cpu_even_ranges_mma by 10m

ac17bfd

asim: remove errant raft_cpu_per_write

8970052

It was accidentally added to a read-only workload.

asim: revert accidental change to write workload

56a5f7e

It was accidentally turned into a read-only workload.

asim: correct workload usage approximation

ac0d578

Writes also consume the configured request access CPU. The code was written under the assumption that only reads would.

asim: remove some unused fields

cb5ef1f

asim: don't swallow warning about unrealistic node capacity

f36b72d

We can address them in a separate PR.

asim: regenerate files

13987c8

Some output changed while rebasing, and the changes are not qualitatively meaningful. This might be similar to cockroachdb#151143. Epic: CRDB-25222

tbg force-pushed the mma-spec branch from 6c094d5 to 13987c8 Compare August 1, 2025 08:05

craig bot merged commit 9919801 into cockroachdb:master Aug 1, 2025
22 checks passed

celeste-cockroachdb bot added the target-release-25.4.0 label Aug 1, 2025

celeste-cockroachdb bot added v25.4.0-prerelease and removed target-release-25.4.0 labels Sep 22, 2025

	for read := 0; read < reads; read++ {
	size := int64(rwg.rand.Intn(rwg.maxSize-rwg.minSize+1) + rwg.minSize)
	key := rwg.keyGenerator.readKey()
	event := next[key]
	event.Reads++
	event.ReadSize += size
	event.RequestCPU += rwg.requestCPUPerAccess
	next[key] = event
	}

	for write := 0; write < writes; write++ {
	size := int64(rwg.rand.Intn(rwg.maxSize-rwg.minSize+1) + rwg.minSize)
	key := rwg.keyGenerator.writeKey()
	event := next[key]
	event.Writes++
	event.WriteSize += size
	event.RequestCPU += rwg.requestCPUPerAccess
	event.RaftCPU += rwg.raftCPUPerWrite
	next[key] = event
	}

asim: introduce eval configurations and a chart viewer #150847

asim: introduce eval configurations and a chart viewer #150847

Uh oh!

Conversation

tbg commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

blathers-crl bot commented Jul 28, 2025

Uh oh!

cockroach-teamcity commented Jul 28, 2025

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

tbg commented Jul 28, 2025

Uh oh!

wenyihu6 left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

wenyihu6 left a comment

Choose a reason for hiding this comment

Uh oh!

tbg commented Jul 31, 2025

Uh oh!

tbg commented Aug 1, 2025

Uh oh!

tbg commented Aug 1, 2025

Uh oh!

craig bot commented Aug 1, 2025

Uh oh!

tbg commented Aug 1, 2025

Uh oh!

craig bot commented Aug 1, 2025

Uh oh!

craig bot commented Aug 1, 2025

Uh oh!

craig bot commented Aug 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tbg commented Jul 28, 2025 •

edited

Loading