
(rfc): Design Proposal for additional CRDs #63

Open

jdheyburn wants to merge 14 commits into valkey-io:main from jdheyburn:jdheyburn/rfc/crd-design

Conversation

@jdheyburn (Collaborator) commented Jan 23, 2026:

Would love to get as many opinions on this.

Please leave comments on the PR where appropriate.

I don't anticipate merging this in its current form; rather, once we have agreed on a design for each CRD, they can be merged individually (or discussed in Discussions).

There are still some CRD design guidelines that I need to enforce:

  • Nested sub-problems
  • Avoid bool fields
  • Use +listType=map
  • Conditions use metav1.Condition
  • observedGeneration
  • Enum extensibility documented
  • No cross-namespace refs by default
  • Sentence test for naming
  • Status uses adjectives/past-tense
  • Glossary defined
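
To make a few of these concrete (conditions via `metav1.Condition`, `observedGeneration`, and past-tense reason naming), a hypothetical status block might look like the following. The field values are illustrative only, not the final schema:

```yaml
# Hypothetical ValkeyCluster status; illustrates the guidelines above,
# not a proposed schema.
status:
  observedGeneration: 3
  conditions:
    - type: Ready
      status: "True"
      reason: ShardsReconciled        # past-tense, adjective-style reason
      message: "All shards healthy, all slots assigned"
      lastTransitionTime: "2026-01-23T00:00:00Z"
      observedGeneration: 3           # condition tracks the generation it observed
```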

The guidelines:

Here's a mock diagram of the CRD relationships. ValkeyCluster is similar, but it only provisions ValkeyNode.


  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                              ValkeyPool                                     │
  │                              (cache)                                        │
  │  ┌─────────────────────────────────────────────────────────────────────┐   │
  │  │ spec:                                                                │   │
  │  │   shards: 4                                                          │   │
  │  │   replicasPerShard: 2                                                │   │
  │  │   sentinel:                                                          │   │
  │  │     enabled: true ─────────────────────────────────┐                 │   │
  │  │   template: ...                                    │                 │   │
  │  └────────────────────────────────────────────────────│─────────────────┘   │
  └───────────────────────────────────────────────────────│─────────────────────┘
                            │                             │
                            │ creates                     │ creates
                            ▼                             ▼
  ┌─────────────────────────────────────────┐   ┌─────────────────────────┐
  │              Valkey (cache-0)           │   │    ValkeySentinel       │
  │  ┌───────────────────────────────────┐  │   │    (cache-sentinel)     │
  │  │ spec:                             │  │   │  ┌───────────────────┐  │
  │  │   replicas: 2                     │  │   │  │ spec:             │  │
  │  │   failover:                       │  │   │  │   replicas: 3     │  │
  │  │     mode: sentinel                │  │   │  └───────────────────┘  │
  │  │     sentinel:                     │  │   │                         │
  │  │       ref: ───────────────────────│──│───│───▶ (referenced)        │
  │  │         name: cache-sentinel      │  │   │                         │
  │  │       config:                     │  │   │  Creates 3 Sentinel     │
  │  │         quorum: 2                 │  │   │  pods that monitor      │
  │  │         downAfterMillis: 30000    │  │   │  all Valkey instances   │
  │  └───────────────────────────────────┘  │   │  referencing it         │
  │                   │                      │   └─────────────────────────┘
  │                   │ creates              │               ▲
  │                   ▼                      │               │
  │  ┌───────────────────────────────────┐  │               │
  │  │ ValkeyNode (cache-0-primary)      │  │               │
  │  │ ValkeyNode (cache-0-replica-0)    │  │               │
  │  │ ValkeyNode (cache-0-replica-1)    │  │               │
  │  └───────────────────────────────────┘  │               │
  └─────────────────────────────────────────┘               │
                                                            │
  ┌─────────────────────────────────────────┐               │
  │              Valkey (cache-1)           │               │
  │  ┌───────────────────────────────────┐  │               │
  │  │ spec.failover.sentinel.ref: ──────│──│───────────────┘
  │  │   name: cache-sentinel            │  │
  │  └───────────────────────────────────┘  │
  │                   │                      │
  │                   ▼                      │
  │  ┌───────────────────────────────────┐  │
  │  │ ValkeyNode (cache-1-primary)      │  │
  │  │ ValkeyNode (cache-1-replica-0)    │  │
  │  │ ValkeyNode (cache-1-replica-1)    │  │
  │  └───────────────────────────────────┘  │
  └─────────────────────────────────────────┘

    ... (cache-2, cache-3 similar)

@jdheyburn added the labels documentation (Improvements or additions to documentation) and question (Further information is requested) on Jan 23, 2026.

### Design principles

TODO comment to ask if there are any others that should be included
jdheyburn (Collaborator, Author):

Anything else I might've missed?


### Replication and Failover Semantics

TODO these semantics needs to be finalised
jdheyburn (Collaborator, Author):

We'll need another RFC to discuss how we want replication, failover, etc. to be managed. I have a great understanding of how we would do this for Sentinel, but I might need some help when it is operator-managed.

Signed-off-by: Joseph Heyburn <jdheyburn@gmail.com>

@jdheyburn force-pushed the jdheyburn/rfc/crd-design branch from 9fae52d to 1406af5 (January 23, 2026 14:17)
@jdheyburn force-pushed the jdheyburn/rfc/crd-design branch from d43e773 to d641add (January 24, 2026 18:27)
@jdheyburn force-pushed the jdheyburn/rfc/crd-design branch from 7cbb7e6 to 99f5047 (January 26, 2026 21:26)
@ysqyang (Contributor) commented Jan 29, 2026:

Are we allowing users to create ValkeyNode directly? I see that it's listed as an "internal" CRD in the doc but just wanted to clarify.

@jdheyburn (Collaborator, Author) replied:

@ysqyang I would not expect users to create them; they are managed by the Operator. Users can query ValkeyNode via the CLI to get its status. For example:

```
kubectl get valkeynode --selector valkey.io/valkeycluster=mycluster
NAME               ROLE
valkey-master-0    master
valkey-replica-0   replica
```

jdheyburn and others added 2 commits February 4, 2026 17:35
Documents the approved design for implementing ValkeyNode CRD with:
- StatefulSet-based singleton workloads
- Headless Service for stable DNS identity
- Minimal operational status fields
- Unit + integration testing strategy

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prevents worktree contents from being tracked.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
type: ClusterIP
annotations: {}

# Valkey-specific configuration (only when type=valkey)
Reviewer (Collaborator):

Not sure what type=valkey means here?

jdheyburn (Collaborator, Author) replied:

I am removing it. In a previous iteration I had Sentinel config here too, but I then decided to keep ValkeyNode exclusively for a Valkey server.


# Valkey-specific configuration (only when type=valkey)
valkeyConfig:
# Cluster configuration (presence-based)
Reviewer (Collaborator):

Shouldn't this be used for standalone mode as well as in cluster mode, i.e. to set a config with some options that a user wants?

jdheyburn (Collaborator, Author) replied:

valkeyConfig (or config) is defined in standalone mode too. This comment refers to the cluster section below it.

# Cluster configuration (presence-based)
# If set, node runs in cluster mode with assigned slots
# Omit entirely for standalone/replicated mode
# cluster:
Reviewer (Collaborator):

Slots are provided to the Valkey Cluster instance once the controller has decided that this specific pod should be a primary; the controller then connects to the valkey process via a client and sends commands to add slots.
If we want to use configs as an extra step, it makes it a bit more complex, I think. Something needs to give commands to the instance.

jdheyburn (Collaborator, Author) replied:

In either case, the slots allocated to a ValkeyNode would need to be defined somewhere in the spec; the ValkeyNode controller can then issue the commands to the node. Does that sound about right?

Perhaps this section can be reworked to:

```yaml
# saved to ConfigMap, mounted to pod at valkey.conf
# also 
config:
  key: value

# if defined, this is in cluster mode, and issues the commands to the cluster server
cluster:
  slots: 0-10000
```
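
If the `slots` field does end up as a range string like `0-10000`, the controller would need to parse it before issuing commands. A minimal sketch of that parsing, assuming a simple `start-end` format (`parseSlots` is a hypothetical helper, not part of the proposed API):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSlots parses a "start-end" hash-slot range such as "0-10000",
// validating it against the 0..16383 cluster slot space.
func parseSlots(s string) (int, int, error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid slot range %q", s)
	}
	start, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid start slot in %q: %w", s, err)
	}
	end, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid end slot in %q: %w", s, err)
	}
	if start < 0 || end > 16383 || start > end {
		return 0, 0, fmt.Errorf("slot range %q out of bounds", s)
	}
	return start, end, nil
}

func main() {
	start, end, err := parseSlots("0-10000")
	fmt.Println(start, end, err)
}
```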

message: "All shards healthy, all slots assigned"

totalShards: 3
readyShards: 3
Reviewer (Collaborator):

It's tough for a cluster user to get a grip on the fact that we are using shard from the Valkey domain, but replicas is from the K8s domain and not Valkey... the primary-replica concept is so deeply rooted.

jdheyburn (Collaborator, Author) replied:

I think it is a grey area, and one where I am happy to go with the consensus. We could have replicasPerShard or podsPerShard. For now, I wanted to reuse the terminology from the K8s domain.

**ValkeyNode naming:** `<cluster-name>-<shard-index>-<replica-index>`

Examples:
- `mycluster-0-0`: Shard 0, replica index 0 (primary)
Reviewer (Collaborator):

I think we should use a short hash here instead of indexes; at the least, we shouldn't give an example where replica index 0 is always a primary. The primary will move when there is a failure.

jdheyburn (Collaborator, Author) replied:

I think we need a discussion about how the pods are named; everyone has their own opinion on it.


**Formula:**
```go
slotsPerShard := 16384 / shards
```
Reviewer (Collaborator):

Just a feeling, but maybe we could skip code in this doc to tighten it; it's a big document.

jdheyburn (Collaborator, Author) replied:

Very fair. I'll work on condensing it down so that it focuses on CRD spec.
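
For reference, the `slotsPerShard` formula discussed above can be sketched end to end. How the leftover `16384 % shards` slots are handled (given one each to the first shards here) is an assumption for illustration, not settled design:

```go
package main

import "fmt"

// slotRanges splits the 16384 cluster hash slots across shards:
// each shard gets 16384 / shards slots, and the first 16384 % shards
// shards absorb one extra slot each so every slot is assigned.
func slotRanges(shards int) [][2]int {
	ranges := make([][2]int, shards)
	slotsPerShard := 16384 / shards
	remainder := 16384 % shards
	start := 0
	for i := 0; i < shards; i++ {
		size := slotsPerShard
		if i < remainder {
			size++ // first `remainder` shards take one extra slot
		}
		ranges[i] = [2]int{start, start + size - 1}
		start += size
	}
	return ranges
}

func main() {
	for i, r := range slotRanges(3) {
		fmt.Printf("shard %d: slots %d-%d\n", i, r[0], r[1])
	}
}
```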

3 participants