(rfc): Design Proposal for additional CRDs #63
jdheyburn wants to merge 14 commits into valkey-io:main
> ### Design principles
>
> TODO comment to ask if there are any others that should be included
Anything else I might've missed?
> ### Replication and Failover Semantics
>
> TODO these semantics need to be finalised
We'll need another RFC to discuss how we want replication/failover/etc. to be managed. I have a good understanding of how we would do this for Sentinel, but I might need some help when it is operator-managed.
Are we allowing users to create ValkeyNode directly? I see that it's listed as an "internal" CRD in the doc but just wanted to clarify.
@ysqyang I would not expect users to create them; they are managed by the Operator. The user can query ValkeyNode on the CLI to get its status. For example:
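A hypothetical query might look like the following. This is an illustrative sketch only; it assumes the ValkeyNode CRD is installed in the cluster, and borrows the `mycluster-0-0` name from the naming example elsewhere in the doc:

```shell
# Illustrative only; assumes the ValkeyNode CRD is registered
kubectl get valkeynodes
kubectl get valkeynode mycluster-0-0 -o yaml
```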
Documents the approved design for implementing ValkeyNode CRD with:
- StatefulSet-based singleton workloads
- Headless Service for stable DNS identity
- Minimal operational status fields
- Unit + integration testing strategy

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prevents worktree contents from being tracked.
> type: ClusterIP
> annotations: {}
>
> # Valkey-specific configuration (only when type=valkey)
Not sure what `type=valkey` means here?
I am removing it. In a previous iteration I had Sentinel config here too, but I then decided to keep ValkeyNode exclusive to a valkey server.
> # Valkey-specific configuration (only when type=valkey)
> valkeyConfig:
>   # Cluster configuration (presence-based)
Shouldn't this be used for standalone mode as well as in cluster mode, i.e. to set a config with some options that a user wants?
`valkeyConfig` (or `config`) is defined in standalone mode too. This comment refers to the cluster section below it.
> # Cluster configuration (presence-based)
> # If set, node runs in cluster mode with assigned slots
> # Omit entirely for standalone/replicated mode
> # cluster:
Slots are provided to the Valkey Cluster instance when the controller has decided that this specific pod should be a primary; the controller then connects to the valkey process via a client and sends commands to add slots.
If we want to use configs as an extra step, it makes it a bit more complex, I think. Something needs to give commands to the instance.
In either case, the slots allocated to a ValkeyNode would need to be defined somewhere in the spec; the ValkeyNode controller can then issue the commands to the node. Does that sound about right?

Perhaps this section can be reworked to:
```yaml
# saved to ConfigMap, mounted to pod at valkey.conf
# also
config:
  key: value
# if defined, this is in cluster mode, and issues the commands to the cluster server
cluster:
  slots: 0-10000
```
> message: "All shards healthy, all slots assigned"
>
> totalShards: 3
> readyShards: 3
It's tough for a cluster user to get a grip on the fact that we are using "shard" from the Valkey domain, but "replicas" from the K8s domain and not Valkey... the primary-replica concept is so deeply rooted.
I think it is a grey area, one where I am happy to go with the consensus. We could have `replicasPerShard` or `podsPerShard`. For now, I wanted to reuse the terminology from the K8s domain.
> **ValkeyNode naming:** `<cluster-name>-<shard-index>-<replica-index>`
>
> Examples:
> - `mycluster-0-0`: Shard 0, replica index 0 (primary)
I think we should use a short hash here instead of indexes; at the very least, we shouldn't give an example where replica index 0 is always a primary. The primary will move when there is a failure.
I think we need a discussion about how the pods are named; everyone has their own opinion on it.
> **Formula:**
>
> ```go
> slotsPerShard := 16384 / shards
> ```
Just a feeling, but maybe we could skip code in this doc to tighten it; it's a big document.
Very fair. I'll work on condensing it down so that it focuses on the CRD spec.
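As a side note on the quoted formula: `16384 / shards` truncates, so for example 3 shards leaves a remainder of 1 slot unassigned. One way to compute full-coverage contiguous ranges can be sketched in Go; the function name and return shape here are illustrative assumptions, not the operator's actual code:

```go
package main

import "fmt"

// distributeSlots is an illustrative sketch (not the operator's real code)
// of splitting Valkey's 16384 hash slots across shards. Integer division
// leaves a remainder, so the first (16384 % shards) shards each absorb
// one extra slot, guaranteeing every slot is assigned exactly once.
func distributeSlots(shards int) [][2]int {
	const totalSlots = 16384
	base := totalSlots / shards
	rem := totalSlots % shards
	ranges := make([][2]int, 0, shards)
	start := 0
	for i := 0; i < shards; i++ {
		size := base
		if i < rem {
			size++ // earlier shards take one extra slot from the remainder
		}
		ranges = append(ranges, [2]int{start, start + size - 1})
		start += size
	}
	return ranges
}

func main() {
	// For 3 shards: 0-5461, 5462-10922, 10923-16383.
	for i, r := range distributeSlots(3) {
		fmt.Printf("shard %d: slots %d-%d\n", i, r[0], r[1])
	}
}
```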
Would love to get as many opinions on this.
Please leave comments on the PR where appropriate.
I don't anticipate merging this in its current form; rather, once we have agreed on a design for each CRD, they can be merged in individually (or discussed in Discussions).
There are still some CRD design guidelines that I need to enforce:
Here's a mock diagram of the CRD relationships. ValkeyCluster is similar, but it only provisions ValkeyNode.