
[Bug] Regression: RKCL slow upgrade gets stuck when resharding #19

@albertompe

Description

When upgrading a Redkey cluster, the operation gets stuck while resharding a node before its rolling update.

If a slot remains in the importing or migrating state, the stabilizing mechanism is never launched.
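For reference, an open slot can be detected from the standard Redis CLUSTER NODES output: a node lists its own open slots on its "myself" line as [slot->-node-id] while migrating and [slot-<-node-id] while importing, so the check has to be run against every node. The following is a minimal sketch of that check, not Redkey's implementation; find_open_slots is a hypothetical helper name.

```python
import re

# In CLUSTER NODES output, the "myself" line lists open slots in brackets:
#   [<slot>->-<node-id>]  slot is MIGRATING to <node-id>
#   [<slot>-<-<node-id>]  slot is IMPORTING from <node-id>
OPEN_SLOT_RE = re.compile(r"\[(\d+)-([<>])-([0-9a-f]+)\]")


def find_open_slots(cluster_nodes_output: str) -> dict:
    """Hypothetical helper: return {slot: "migrating" | "importing"} for the local node."""
    open_slots = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        # Field index 2 holds the comma-separated flags, e.g. "myself,master".
        if len(fields) < 3 or "myself" not in fields[2].split(","):
            continue
        for slot, arrow, _peer in OPEN_SLOT_RE.findall(line):
            open_slots[int(slot)] = "migrating" if arrow == ">" else "importing"
    return open_slots
```

A non-empty result on any node means the cluster has an open slot that should be closed (the situation redis-cli --cluster fix is designed to repair) before resharding can continue.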

Steps to Reproduce

Deploy the Operator and the sample Redkey cluster:

make manifests
make install
make deploy
make apply-rkcl

Set the purgeKeysOnRebalance property to false.

Scale the cluster to 15 primaries.

Force an upgrade by editing the rkcl object and changing spec.config.maxmemory-samples to 6 (or to any other value, or by adding a comment).

Force the upgrade again with further configuration changes until the problem is reproduced (it does not occur every time).
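To repeat the upgrade trigger from the steps above without hand-editing the object each time, a small helper can rewrite the maxmemory-samples line and wrap the result in a JSON merge patch. This is a sketch under an assumption the issue does not confirm: that spec.config holds redis.conf-style "directive value" lines (suggested by the option of adding a comment). bump_maxmemory_samples and config_merge_patch are hypothetical names.

```python
import json


def bump_maxmemory_samples(config_text: str) -> str:
    """Rewrite (or append) the maxmemory-samples directive with an incremented value.

    ASSUMPTION: spec.config is redis.conf-style text, one "directive value" per line.
    """
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) == 2 and parts[0] == "maxmemory-samples":
            lines[i] = f"maxmemory-samples {int(parts[1]) + 1}"
            break
    else:
        # Redis defaults maxmemory-samples to 5, so the first forced change sets 6.
        lines.append("maxmemory-samples 6")
    return "\n".join(lines) + "\n"


def config_merge_patch(config_text: str) -> str:
    """JSON merge patch body that replaces the whole spec.config value."""
    return json.dumps({"spec": {"config": bump_maxmemory_samples(config_text)}})
```

The returned string could be passed as the -p argument of kubectl patch --type merge against the RedkeyCluster object. A merge patch replaces the whole spec.config value, so the helper must be given the current config text.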

The operator log then shows Robin reporting a ReshardingError status:

2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	RedkeyCluster reconciler called	{"redis-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "name": "redis-cluster-ephemeral", "ns": "redkey-operator"}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	Found RedkeyCluster	{"redis-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "name": "redis-cluster-ephemeral", "GVK": "redis.inditex.dev/v1, Kind=RedkeyCluster", "status": "Upgrading"}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	RedkeyCluster reconciler start	{"redkey-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "status": "Upgrading"}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	PodDisruptionBudget not deployed	{"redkey-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "PodDisruptionBudget Name": "redis-cluster-ephemeral-pdb"}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	Delete PVCs feature disabled in cluster spec or not specified. PVCs won't be deleted after scaling down or cluster deletion	{"redkey-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	Redis node pods are ready	{"redkey-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "pods": 16}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	Waiting for cluster to be Ready in Robin	{"redkey-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "currentStatus": "ReshardingError"}
2026-01-25T08:41:43Z	INFO	controllers.RedkeyCluster	RedkeyCluster reconciler end	{"redkey-cluster": {"name":"redis-cluster-ephemeral","namespace":"redkey-operator"}, "status": "Upgrading"}
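When trying to reproduce the problem, the stuck state can be spotted in these logs by tracking the currentStatus reported on the "Waiting for cluster to be Ready in Robin" lines. The following is a minimal sketch that assumes the tab-separated layout shown above (timestamp, level, logger, message, JSON context); robin_status, looks_stuck, and the threshold of 3 consecutive reports are hypothetical choices, not part of the operator.

```python
import json
from typing import Iterable, Optional

WAITING_MSG = "Waiting for cluster to be Ready in Robin"


def robin_status(log_line: str) -> Optional[str]:
    """Return currentStatus from a Robin 'waiting' line, else None."""
    # Layout seen in the log above: timestamp, level, logger, message, JSON context.
    fields = log_line.rstrip("\n").split("\t", 4)
    if len(fields) < 5 or fields[3] != WAITING_MSG:
        return None
    return json.loads(fields[4]).get("currentStatus")


def looks_stuck(log_lines: Iterable[str], threshold: int = 3) -> bool:
    """True if the last `threshold` Robin status reports are all ReshardingError."""
    statuses = [s for s in map(robin_status, log_lines) if s is not None]
    recent = statuses[-threshold:]
    return len(recent) == threshold and all(s == "ReshardingError" for s in recent)
```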

Expected Behavior

The upgrade must complete with all nodes updated and without data loss.
If a slot stays open, the stabilizing mechanism must be launched to close it.
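The expected behavior amounts to a reconcile decision in which open slots take priority over waiting on Robin. The following is a minimal sketch of that decision, assuming a per-node list of open slots such as the one CLUSTER NODES provides; NodeView, next_action, and the action names are hypothetical and do not describe how Redkey is actually structured.

```python
from dataclasses import dataclass, field


@dataclass
class NodeView:
    """Hypothetical per-node snapshot: pod name and its open slots {slot: state}."""
    name: str
    open_slots: dict = field(default_factory=dict)


def next_action(robin_status: str, nodes: list) -> str:
    """Sketch of the expected behavior: stabilize open slots instead of waiting forever."""
    if any(node.open_slots for node in nodes):
        # A slot left importing/migrating blocks resharding, so close it first.
        return "stabilize"
    if robin_status == "Ready":
        # Resharding finished: the rolling update can move to the next node.
        return "continue-rolling-update"
    # Otherwise keep requeueing and polling Robin.
    return "wait"
```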

Version / Environment

No response

Additional context or logs

No response

Labels: bug (Something isn't working)