Conversation

@griffindvs
This PR introduces a proposal for Kafka rack awareness where node pools are assigned to racks/availability zones.

I have created a prototype implementation here. I have used this prototype with the following configuration:

Kafka CR:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  annotations:
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
  name: my-kafka
  namespace: strimzi
spec:
  kafka:
    rack:
      idType: pool-name
  ...
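
For contrast, rack awareness in Strimzi today is driven by a node label through spec.kafka.rack.topologyKey; the proposed idType: pool-name derives the rack from the node pool name instead. The existing form looks like this:

spec:
  kafka:
    rack:
      topologyKey: topology.kubernetes.io/zone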

Three KafkaNodePool CRs, one per zone, in the following format (where X is the zone index and Y the replica count):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  labels:
    strimzi.io/cluster: my-kafka
  name: zoneX
  namespace: strimzi
spec:
  replicas: Y
  roles:
  - broker
  - controller
  template:
    pod:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    strimzi.io/cluster: my-kafka
                    strimzi.io/pool-name: zoneX
                topologyKey: topology.kubernetes.io/zone
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 90
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: strimzi.io/cluster
                  operator: In
                  values:
                  - my-kafka
                - key: strimzi.io/pool-name
                  operator: NotIn
                  values:
                  - zoneX
              topologyKey: topology.kubernetes.io/zone
          - weight: 80
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  strimzi.io/cluster: my-kafka
              topologyKey: kubernetes.io/hostname
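
For illustration only: assuming the manifest above is saved as pool-template.yaml with the literal placeholders zoneX and Y (the file name and this workflow are mine, not from the PR), the three pools could be stamped out with a short shell loop:

for i in 0 1 2; do
  r=2; [ "$i" -eq 2 ] && r=1   # 2 + 2 + 1 = 5 brokers, matching the example below
  sed -e "s/zoneX/zone$i/g" -e "s/replicas: Y/replicas: $r/" pool-template.yaml \
    | kubectl apply -f -
done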

An example using five brokers and three zones:

kubectl get kafkanodepool.kafka -n strimzi
NAME    DESIRED REPLICAS   ROLES                     NODEIDS
zone0   2                  ["controller","broker"]   [0,1]
zone1   2                  ["controller","broker"]   [2,3]
zone2   1                  ["controller","broker"]   [4]

Pod placement by zone and node:

NAME                                       ZONE         NODE
my-kafka-entity-operator-fbbc6859-fpr8q    Raleigh      worker0.example.com
my-kafka-zone0-0                           ChapelHill   worker2.example.com
my-kafka-zone0-1                           ChapelHill   worker5.example.com
my-kafka-zone1-2                           Durham       worker1.example.com
my-kafka-zone1-3                           Durham       worker4.example.com
my-kafka-zone2-4                           Raleigh      worker3.example.com
strimzi-cluster-operator-558d7b695-th8mv   ChapelHill   worker5.example.com

Topic metadata from the prototype cluster:

Metadata for all topics (from broker -1: sasl_ssl://localhost:9094/bootstrap):
 5 brokers:
  broker 0 at my-kafka-zone0-0-strimzi.example.com:443
  broker 1 at my-kafka-zone0-1-strimzi.example.com:443 (controller)
  broker 2 at my-kafka-zone1-2-strimzi.example.com:443
  broker 3 at my-kafka-zone1-3-strimzi.example.com:443
  broker 4 at my-kafka-zone2-4-strimzi.example.com:443
 1 topics:
  topic "my-topic" with 5 partitions:
    partition 0, leader 4, replicas: 4,1,2, isrs: 2,4,1
    partition 1, leader 3, replicas: 1,3,4, isrs: 3,4,1
    partition 2, leader 2, replicas: 2,4,1, isrs: 2,4,1
    partition 3, leader 4, replicas: 4,1,3, isrs: 3,4,1
    partition 4, leader 3, replicas: 1,3,4, isrs: 3,4,1
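
Reading the node IDs against the pools above (0 and 1 in zone0, 2 and 3 in zone1, 4 in zone2), every partition's replica set spans all three zones, which is exactly the rack-aware placement the proposal aims for. The metadata listing resembles librdkafka/kcat output; an illustrative invocation (not taken from the PR, and omitting SASL credentials) would be along these lines:

kcat -L -b localhost:9094 -X security.protocol=sasl_ssl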

Sample broker config:

  server.config: |-
    ##########
    # Node ID
    ##########
    node.id=0

    ##########
    # Rack ID
    ##########
    broker.rack=zone0

...
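
For context beyond this PR: once broker.rack is populated, the standard Kafka rack-aware features can build on it. A sketch, assuming otherwise-default settings, of the properties involved in follower fetching (KIP-392):

# Broker side: allow consumers to fetch from the closest replica
replica.selector.class=org.apache.kafka.common.replication.RackAwareReplicaSelector

# Consumer side: declare the client's rack, matching a broker.rack value
client.rack=zone0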

Signed-off-by: Griffin Davis <gcd@ibm.com>
Comment on lines +23 to +25
Many users may require adherence to the [separation of duty security principle](https://csrc.nist.gov/glossary/term/separation_of_duty)
under which application pods processing user data should not have access to the Kubernetes API.
All usage of the Kubernetes API must then be delegated to the operator.
Member

Sorry, but there is nothing like that being said in the link.

Author

The linked definition from NIST discusses separation of duty at a higher level, not specific to Kubernetes.

In our specific Kubernetes case, we are separating two different roles for two different entities:

  • Operators which act on the Kubernetes API and therefore have associated RBAC
  • Operands which process/manage data and do not have access to the Kubernetes API

If you feel the link is misleading, I can remove it.

The underlying principle is one IBM has discussed with many enterprise clients in highly regulated industries (e.g., financial services, telecommunications). I'm not sure whether the specific requirements are publicly documented by those companies.

Member

I do think it is absolutely misleading. And I do not think it is any principle Strimzi aims to follow.

Also, keep in mind that the broker also reads Kubernetes Secrets, for example. So even if you want to follow your interpretation of this rule, this proposal won't help you much.

spec:
  kafka:
    rack:
      idType: pool-name
Member

If there were some reasonable justification for changing this, it would make much more sense to configure it in a separate field than to hardcode it to the node pool name, which is very limited.

Author

Would this involve moving the API change to the KafkaNodePool CR and allowing an arbitrary rack ID to be specified for each pool?
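
For illustration, a hypothetical sketch of that shape (the rackId field name is made up here; it is not part of the proposal or the Strimzi API):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  labels:
    strimzi.io/cluster: my-kafka
  name: pool-a
spec:
  rackId: us-east-1a   # hypothetical per-pool rack identifier
  replicas: 2
  roles:
  - broker
  - controller

Several pools could then share a single rack ID.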

Member

I'm not sure what you mean by that, sorry. But the node pool name is clearly not the right determinant, as there are many reasons to run multiple pools in a single zone.

This proposal maintains CRD compatibility by introducing a new, optional field.
All existing configurations would continue to be valid and maintain their existing behavior.

## Rejected alternatives
Member

Please keep in mind that there are also some new Kubernetes features coming to the downward API, as discussed in strimzi/strimzi-kafka-operator#11504. If nothing else, that should be mentioned here. But we will likely want to wait to see how it turns out.
