Conversation

@griffindvs
This PR introduces a proposal for Kafka rack awareness where node pools are assigned to racks/availability zones.

I have created a prototype implementation here. I have used this prototype with the following configuration:

Kafka CR:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  annotations:
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
  name: my-kafka
  namespace: strimzi
spec:
  kafka:
    rack:
      idType: pool-name
  ...
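
For contrast, rack awareness in Strimzi today is driven by a node label through spec.kafka.rack.topologyKey; the proposed idType: pool-name derives the rack from the node pool name instead. The existing form looks like this:

spec:
  kafka:
    rack:
      topologyKey: topology.kubernetes.io/zone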

Three KafkaNodePool CRs, one per zone, in the following format (where X is the zone index and Y the replica count):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  labels:
    strimzi.io/cluster: my-kafka
  name: zoneX
  namespace: strimzi
spec:
  replicas: Y
  roles:
  - broker
  - controller
  template:
    pod:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    strimzi.io/cluster: my-kafka
                    strimzi.io/pool-name: zoneX
                topologyKey: topology.kubernetes.io/zone
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 90
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: strimzi.io/cluster
                  operator: In
                  values:
                  - my-kafka
                - key: strimzi.io/pool-name
                  operator: NotIn
                  values:
                  - zoneX
              topologyKey: topology.kubernetes.io/zone
          - weight: 80
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  strimzi.io/cluster: my-kafka
              topologyKey: kubernetes.io/hostname
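
For illustration only: assuming the manifest above is saved as pool-template.yaml with the literal placeholders zoneX and Y (the file name and this workflow are mine, not from the PR), the three pools could be stamped out with a short shell loop:

for i in 0 1 2; do
  r=2; [ "$i" -eq 2 ] && r=1   # 2 + 2 + 1 = 5 brokers, matching the example below
  sed -e "s/zoneX/zone$i/g" -e "s/replicas: Y/replicas: $r/" pool-template.yaml \
    | kubectl apply -f -
done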

An example using five brokers and three zones:

kubectl get kafkanodepool.kafka -n strimzi
NAME    DESIRED REPLICAS   ROLES                     NODEIDS
zone0   2                  ["controller","broker"]   [0,1]
zone1   2                  ["controller","broker"]   [2,3]
zone2   1                  ["controller","broker"]   [4]

Pod placement by zone and node:

NAME                                       ZONE         NODE
my-kafka-entity-operator-fbbc6859-fpr8q    Raleigh      worker0.example.com
my-kafka-zone0-0                           ChapelHill   worker2.example.com
my-kafka-zone0-1                           ChapelHill   worker5.example.com
my-kafka-zone1-2                           Durham       worker1.example.com
my-kafka-zone1-3                           Durham       worker4.example.com
my-kafka-zone2-4                           Raleigh      worker3.example.com
strimzi-cluster-operator-558d7b695-th8mv   ChapelHill   worker5.example.com

Topic metadata from the prototype cluster:

Metadata for all topics (from broker -1: sasl_ssl://localhost:9094/bootstrap):
 5 brokers:
  broker 0 at my-kafka-zone0-0-strimzi.example.com:443
  broker 1 at my-kafka-zone0-1-strimzi.example.com:443 (controller)
  broker 2 at my-kafka-zone1-2-strimzi.example.com:443
  broker 3 at my-kafka-zone1-3-strimzi.example.com:443
  broker 4 at my-kafka-zone2-4-strimzi.example.com:443
 1 topics:
  topic "my-topic" with 5 partitions:
    partition 0, leader 4, replicas: 4,1,2, isrs: 2,4,1
    partition 1, leader 3, replicas: 1,3,4, isrs: 3,4,1
    partition 2, leader 2, replicas: 2,4,1, isrs: 2,4,1
    partition 3, leader 4, replicas: 4,1,3, isrs: 3,4,1
    partition 4, leader 3, replicas: 1,3,4, isrs: 3,4,1
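
Reading the node IDs against the pools above (0 and 1 in zone0, 2 and 3 in zone1, 4 in zone2), every partition's replica set spans all three zones, which is exactly the rack-aware placement the proposal aims for. The metadata listing resembles librdkafka/kcat output; an illustrative invocation (not taken from the PR, and omitting SASL credentials) would be along these lines:

kcat -L -b localhost:9094 -X security.protocol=sasl_ssl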

Sample broker config:

  server.config: |-
    ##########
    # Node ID
    ##########
    node.id=0

    ##########
    # Rack ID
    ##########
    broker.rack=zone0

...
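
For context beyond this PR: once broker.rack is populated, the standard Kafka rack-aware features can build on it. A sketch, assuming otherwise-default settings, of the properties involved in follower fetching (KIP-392):

# Broker side: allow consumers to fetch from the closest replica
replica.selector.class=org.apache.kafka.common.replication.RackAwareReplicaSelector

# Consumer side: declare the client's rack, matching a broker.rack value
client.rack=zone0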

Signed-off-by: Griffin Davis <gcd@ibm.com>
Comment on lines +23 to +25
Many users may require adherence to the [separation of duty security principle](https://csrc.nist.gov/glossary/term/separation_of_duty)
under which application pods processing user data should not have access to the Kubernetes API.
All usage of the Kubernetes API must then be delegated to the operator.
Member

Sorry, but there is nothing like that being said in the link.

Author

The linked definition from NIST discusses separation of duty at a higher level, not specific to Kubernetes.

In our specific Kubernetes case, we are separating two different roles for two different entities:

  • Operators which act on the Kubernetes API and therefore have associated RBAC
  • Operands which process/manage data and do not have access to the Kubernetes API

If you feel the link is misleading, I can remove it.

The underlying principle is one IBM has discussed with many enterprise clients in highly regulated industries (e.g., financial services, telecommunications). I'm not sure whether the specific requirements are publicly documented by those companies.

Member

I do think it is absolutely misleading. And I do not think it is any principle Strimzi aims to follow.

Also, keep in mind that the broker also reads Kubernetes Secrets, for example. So even if you want to follow your interpretation of this rule, this proposal won't help you much.

spec:
  kafka:
    rack:
      idType: pool-name
Member

If there were some reasonable justification for changing this, it would make much more sense to configure it in a separate field than to hardcode it to the node pool name, which is very limited.

Author

Would this involve moving the API change to the KafkaNodePool CR and allowing an arbitrary rack ID to be specified for each pool?
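
For illustration, a hypothetical sketch of that shape (the rackId field name is made up here; it is not part of the proposal or the Strimzi API):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  labels:
    strimzi.io/cluster: my-kafka
  name: pool-a
spec:
  rackId: us-east-1a   # hypothetical per-pool rack identifier
  replicas: 2
  roles:
  - broker
  - controller

Several pools could then share a single rack ID.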

Member

I'm not sure what you mean by that, sorry. But the node pool name is clearly not the right determinant, as there are many reasons to run multiple pools in a single zone.

This proposal maintains CRD compatibility by introducing a new, optional field.
All existing configurations would continue to be valid and maintain their existing behavior.

## Rejected alternatives
Member

Please keep in mind that there are also some new Kubernetes features coming to the downward API, as discussed in strimzi/strimzi-kafka-operator#11504. If nothing else, that should be mentioned here. But we will likely want to wait to see how it turns out.
