Add support for Atomic Slot Migration to CLI #2755

murphyjacob4 · 2025-10-20T22:35:07Z

Adds a new option, --cluster-use-atomic-slot-migration, which applies to both the --cluster reshard and --cluster rebalance commands.

We could optimize this further, but for now we batch all the slot ranges for one (source, target) pair and send them as a single CLUSTER MIGRATESLOTS request. We then wait for the request to finish by polling CLUSTER GETSLOTMIGRATIONS once every 100ms. We parse the CLUSTER GETSLOTMIGRATIONS reply and look for the most recent migration affecting the requested slot range, then check whether it is in progress, failed, cancelled, or successful. If it failed or was cancelled, we report the error to the user.

Fixes #2504
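
For readers following along, the polling step described above could look roughly like the sketch below. It is not the code from this PR: the `mig_entry` struct, the `mig_state` enum, and the `fetch_fn`/`sleep_fn` callbacks are hypothetical stand-ins for sending CLUSTER GETSLOTMIGRATIONS, parsing its reply, and sleeping 100ms between polls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one parsed CLUSTER GETSLOTMIGRATIONS
 * entry; the real reply carries more fields than this. */
typedef enum { MIG_IN_PROGRESS, MIG_SUCCESS, MIG_FAILED, MIG_CANCELLED, MIG_NOT_FOUND } mig_state;

typedef struct {
    int first_slot;       /* inclusive */
    int last_slot;        /* inclusive */
    long long start_time; /* larger means more recent */
    mig_state state;
} mig_entry;

/* Return the state of the most recent entry that overlaps [first, last],
 * or MIG_NOT_FOUND if no entry touches that range. */
mig_state latest_state_for_range(const mig_entry *entries, size_t n, int first, int last) {
    assert(first <= last);
    const mig_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const mig_entry *e = &entries[i];
        if (e->first_slot > last || e->last_slot < first) continue; /* no overlap */
        if (best == NULL || e->start_time > best->start_time) best = e;
    }
    return best ? best->state : MIG_NOT_FOUND;
}

/* Assumed callbacks: fetch fills `out` with parsed entries and returns the
 * count; sleep_between_polls waits roughly 100ms. */
typedef size_t (*fetch_fn)(mig_entry *out, size_t cap, void *ctx);
typedef void (*sleep_fn)(void);

/* Poll until the migration leaves the in-progress state or max_polls is hit.
 * Returns MIG_IN_PROGRESS if it never settled. */
mig_state wait_for_migration(fetch_fn fetch, sleep_fn sleep_between_polls, void *ctx,
                             int first, int last, int max_polls) {
    mig_entry buf[64];
    for (int i = 0; i < max_polls; i++) {
        size_t n = fetch(buf, sizeof(buf) / sizeof(buf[0]), ctx);
        mig_state s = latest_state_for_range(buf, n, first, last);
        if (s != MIG_IN_PROGRESS) return s;
        sleep_between_polls();
    }
    return MIG_IN_PROGRESS;
}
```

Keeping the state selection in a pure function separate from the loop makes the "most recent migration affecting the range" rule easy to unit test without a running cluster.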

codecov · 2025-10-20T23:00:06Z

Codecov Report

❌ Patch coverage is 73.07692% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.42%. Comparing base (faac14a) to head (2339313).

Files with missing lines	Patch %	Lines
src/valkey-cli.c	73.07%	70 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2755      +/-   ##
============================================
- Coverage     72.48%   72.42%   -0.06%     
============================================
  Files           128      128              
  Lines         70485    70710     +225     
============================================
+ Hits          51088    51215     +127     
- Misses        19397    19495      +98

Files with missing lines	Coverage Δ
src/valkey-cli.c	`56.79% <73.07%> (+0.58%)`	⬆️

... and 16 files with indirect coverage changes


zuiderkwast

LGTM in general


zuiderkwast · 2025-10-30T11:03:11Z

src/valkey-cli.c

+                if (opts & CLUSTER_MANAGER_CMD_FLAG_USE_ATOMIC_SLOT_MIGRATION) {
+                    /* Now that the migration is done, print all the #'s */
+                    printf("#");
+                    continue;
+                }


Hehe, this is an atomic progress bar, completing atomically in one step. 😄

Would it make sense to track the progress in clusterManagerMoveSlotRangesASM and print some progress indicator based on the syncslots states or something? Maybe later? We can ignore it for now.

murphyjacob4 · 2025-11-26

To maintain full compatibility, the progress should show a "." for each slot that is moved.

We could do something like map the migration state (e.g. snapshot, replicating, failing over) to a percentage (e.g. 0%, 33%, 66% respectively) then multiply by the migration slot count. But it would be semantically different than before, since you might see 100 "."s, but then have the migration fail at the last step and end up with no slots migrated.

I would prefer that we go with a new text UI entirely rather than working on the previous one. It would be nice if we had a general-purpose CLI progress bar (something that looks like https://cli.r-lib.org/reference/cli_progress_bar.html#basic-usage or similar) that we could use for this and other long-running ops. But yeah, I would say let's tackle this problem separately.
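
As an illustration only, the state-to-percentage idea floated in this comment could be sketched as below. The `asm_phase` enum and both function names are hypothetical, and the 100% value for a finished migration is an assumption; the comment only mentions 0%, 33%, and 66%.

```c
#include <assert.h>

/* Hypothetical phases, named after the states mentioned in the comment. */
typedef enum { ASM_SNAPSHOT, ASM_REPLICATING, ASM_FAILING_OVER, ASM_DONE } asm_phase;

/* Map a phase to a coarse completion percentage.
 * 0/33/66 come from the comment; 100 for ASM_DONE is an assumption. */
int phase_percent(asm_phase p) {
    switch (p) {
    case ASM_SNAPSHOT: return 0;
    case ASM_REPLICATING: return 33;
    case ASM_FAILING_OVER: return 66;
    case ASM_DONE: return 100;
    }
    return 0;
}

/* Number of "." characters to have printed so far for a migration of
 * slot_count slots that is currently in phase p (rounded down). */
int dots_for_phase(asm_phase p, int slot_count) {
    assert(slot_count >= 0);
    return slot_count * phase_percent(p) / 100;
}
```

The sketch also makes the semantic problem raised above concrete: dots printed during the replicating or failing-over phases describe progress through the migration, not slots that have actually changed owner, so they would still be on screen if the migration later failed.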

zuiderkwast · 2025-11-26

Good point. I'm OK with all the dots appearing atomically.

I don't think we strictly need to preserve the exact output with these dots though. Another option is to just skip the dots.

These text UIs are not that elaborate. I assume simplicity and no dependencies were prioritized.


enjoy-binbin

overall LGTM.


enjoy-binbin · 2025-11-10T02:41:09Z

src/valkey-cli.c

+        fflush(stdout);
+        sdsfree(to_print);
+    }
+    int print_dots = (opts & CLUSTER_MANAGER_OPT_VERBOSE), option_cold = (opts & CLUSTER_MANAGER_OPT_COLD), success = 1, in_progress = 0;


What does CLUSTER_MANAGER_OPT_COLD do in ASM? Do we actually use cold in ASM?

murphyjacob4 · 2025-11-26

I thought cold was == dryrun, but it looks like it actually just migrates the keys without moving the slots. There is no such concept in ASM, so I will make this option do nothing.

zuiderkwast · 2025-11-26

COLD is used in the valkey-cli --cluster fix mode, which handles cases such as a slot with two owners, a slot with zero owners, or an aborted slot migration that has left slots in multiple nodes.

Maybe it's better to error out if someone tries to combine --cluster fix with --cluster-use-atomic-slot-migration?

zuiderkwast · 2025-11-26T14:05:22Z

@murphyjacob4 will you have some time to close on these details and get this merged?

murphyjacob4 · 2025-11-26T18:01:48Z

> will you have some time to close on these details and get this merged?

Yeah, apologies, I lost track of this PR. Let me work on the feedback.

Signed-off-by: Jacob Murphy <jkmurphy@google.com>

zuiderkwast

Good, it's very close. The only open question is COLD: we should avoid breaking a feature later, when ASM becomes enabled by default (in 10.0).

zuiderkwast · 2025-11-27T09:10:54Z

src/valkey-cli.c

+    while ((ln = listNext(&li)) != NULL) {
+        clusterManagerReshardTableItem *item = ln->value;
+        char *err;
+        if (opts & CLUSTER_MANAGER_OPT_USE_ATOMIC_SLOT_MIGRATION) {


If COLD is used in this code path (by --cluster fix) it seems safer to skip ASM in this case.

Suggested change

-        if (opts & CLUSTER_MANAGER_OPT_USE_ATOMIC_SLOT_MIGRATION) {
+        if ((opts & CLUSTER_MANAGER_OPT_USE_ATOMIC_SLOT_MIGRATION) &&
+            !(opts & CLUSTER_MANAGER_OPT_COLD)) {

Atomic + cold = Cold fusion. (It doesn't work.)
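
The effect of the suggested condition can be checked in isolation with a small sketch like the one below. The `EX_OPT_*` bit values and the `should_use_asm` helper are hypothetical and exist only for illustration; the real constants are defined in src/valkey-cli.c and may use different bits.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; not the values used in valkey-cli. */
#define EX_OPT_COLD (1 << 0)
#define EX_OPT_USE_ATOMIC_SLOT_MIGRATION (1 << 1)

/* Return 1 when the ASM path should be taken: the ASM flag is set and
 * COLD is not. Any request that includes COLD falls back to the legacy path. */
int should_use_asm(int opts) {
    /* This sketch only knows the two flags above. */
    assert((opts & ~(EX_OPT_COLD | EX_OPT_USE_ATOMIC_SLOT_MIGRATION)) == 0);
    return (opts & EX_OPT_USE_ATOMIC_SLOT_MIGRATION) && !(opts & EX_OPT_COLD);
}
```

With this shape, a --cluster fix run that sets COLD silently keeps the legacy behaviour even if ASM is requested or later becomes the default; erroring out on the combination, as raised earlier in the thread, would be the stricter alternative.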

github-actions bot assigned murphyjacob4 Oct 20, 2025

murphyjacob4 requested a review from enjoy-binbin October 20, 2025 22:38

zuiderkwast reviewed Oct 30, 2025


enjoy-binbin reviewed Nov 10, 2025


enjoy-binbin added this to Valkey 9.1 Nov 10, 2025

enjoy-binbin added the release-notes label (This issue should get a line item in the release notes) Nov 10, 2025

zuiderkwast added the to-be-merged label (Almost ready to merge) Nov 26, 2025

zuiderkwast mentioned this pull request Nov 26, 2025

Add replica sync checks before rebalancing in cluster manager #2864

Open

murphyjacob4 added 3 commits November 26, 2025 18:15

Add support for Atomic Slot Migration to CLI

bb8483b

Signed-off-by: Jacob Murphy <jkmurphy@google.com>

Clang format fixes

03a238c

Signed-off-by: Jacob Murphy <jkmurphy@google.com>

Address review feedback

2339313

Signed-off-by: Jacob Murphy <jkmurphy@google.com>

murphyjacob4 force-pushed the asm_cli branch from 2287bef to 2339313 Compare November 26, 2025 21:33

murphyjacob4 requested review from enjoy-binbin and zuiderkwast November 26, 2025 21:35

zuiderkwast reviewed Nov 27, 2025

