Add Pool-based Rack Awareness proposal #184
# Pool-based Rack Awareness

Rack-awareness in Strimzi Kafka can be achieved without broker access to cluster-level Kubernetes APIs by assigning one node pool per rack/availability zone.

## Current situation

In Kubernetes clusters spanning multiple availability zones, Kafka can tolerate the loss of an entire zone by ensuring partition replicas are spread across brokers in multiple zones.

This is achieved by configuring the [`broker.rack`](https://kafka.apache.org/documentation/#brokerconfigs_broker.rack) property in Kafka to enable rack-aware replication assignment, where a rack is analogous to an availability zone.

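For reference, `broker.rack` is a plain per-broker Kafka setting; a broker running in zone `us-east-1a` would carry a value like the following (illustrative value, not Strimzi-generated output):

```properties
# server.properties (illustrative): the rack ID Kafka uses for
# rack-aware replica assignment
broker.rack=us-east-1a
```
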
Strimzi currently provides rack-awareness through an init container that queries the Kubernetes API for the value of a specified topology label on the Kubernetes node where that broker is running.
This requires Kubernetes API access from the broker pods, as well as cluster-scoped RBAC.

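As an illustration of that RBAC requirement, the init-container approach needs cluster-scoped permissions along these lines (a sketch with a hypothetical resource name; the actual Strimzi-managed resources differ):

```yaml
# Hypothetical ClusterRole sketch: the broker pods' service account must
# be able to read Node objects cluster-wide to resolve the topology label
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kafka-broker-rack-reader   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
```
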
## Motivation

For users interested in a heightened security posture, the requirements of the current rack-awareness implementation are prohibitive.
Many users may require adherence to the [separation of duty security principle](https://csrc.nist.gov/glossary/term/separation_of_duty), under which application pods processing user data should not have access to the Kubernetes API.
All usage of the Kubernetes API must then be delegated to the operator.

Furthermore, many Kubernetes cluster administrators restrict access to cluster-scoped Kubernetes resources to ensure that an application and the user managing it are contained within a limited set of namespaces.
Today, Strimzi requires access to cluster-scoped Kubernetes resources only for rack-awareness and NodePort listener configuration.

Implementing the proposed pool-based rack awareness removes these potentially prohibitive requirements while maintaining a simple deployment model.

## Proposal

The `Kafka` CR will be updated to include an `idType` field that specifies how the rack ID is determined.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  annotations:
    strimzi.io/node-pools: enabled
  name: my-cluster
spec:
  kafka:
    rack:
      idType: pool-name
```

> **Reviewer:** If there would be some reasonable justification for changing this, then it would make much more sense to just configure it in a separate field than to hardcode it to the node pool name, which is very limited.

> **Author:** Would this involve moving the API change to the KafkaNodePool CR and allowing an arbitrary rack ID to be specified for each pool?

> **Reviewer:** I'm not sure what you mean by that, sorry. But the node pool name is clearly not the right determinant, as there would be many reasons for multiple pools in a single zone.

The field will be an optional enumeration with the following values:

* `node-label` (default)
  * This rack ID type maintains the existing behavior, where rack IDs are configured using a node label
  * The node label is determined using the existing `topologyKey` field
* `pool-name`
  * This rack ID type will use the node pool name as the rack ID

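For comparison, the existing default behavior would be expressed as follows (a sketch; field placement matches the `Kafka` example above):

```yaml
spec:
  kafka:
    rack:
      idType: node-label                       # default; may be omitted
      topologyKey: topology.kubernetes.io/zone # existing topologyKey field
```
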
When using the `pool-name` rack ID type, users must configure pod affinity and anti-affinity in the Kafka pod template to ensure that:

* Brokers within a pool are scheduled in the same availability zone
* Brokers in different pools are scheduled in different availability zones

||
This affinity would be configured in the `KafkaNodePool`:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: my-zone0-pool
  labels:
    strimzi.io/cluster: my-cluster
spec:
  template:
    pod:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    strimzi.io/cluster: my-cluster
                    strimzi.io/pool-name: my-zone0-pool
                topologyKey: topology.kubernetes.io/zone
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - my-cluster
                    - key: strimzi.io/pool-name
                      operator: NotIn
                      values:
                        - my-zone0-pool
                topologyKey: topology.kubernetes.io/zone
```

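With `idType: pool-name`, the configuration generated for a broker in this pool would then contain the pool name as its rack ID (a sketch of the intended result of the proposed `KafkaBrokerConfigurationBuilder` change, not existing operator output):

```properties
# Broker configuration rendered by the operator (sketch)
broker.rack=my-zone0-pool
```
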
## Affected/not affected projects

Only the [strimzi-kafka-operator](https://github.com/strimzi/strimzi-kafka-operator/) would be affected:

* Changes to the `Rack` API type:
  * New optional `idType` field
  * Change the `topologyKey` field from required to optional
* Changes to the cluster operator to:
  * Only create the ClusterRoleBinding when using the `node-label` rack ID type or when using a NodePort listener
  * Modify the `KafkaBrokerConfigurationBuilder` to use the pool name as the rack ID when the rack ID type is `pool-name`

## Compatibility

This proposal maintains CRD compatibility by introducing a new, optional field.
All existing configurations would continue to be valid and retain their existing behavior.

## Rejected alternatives

> **Reviewer:** Please keep in mind that there are also some new Kubernetes features coming to the downward API, as discussed in strimzi/strimzi-kafka-operator#11504. If nothing else, that should be mentioned here. But likely we might want to wait to see how it turns out.

* Continue with the existing rack-awareness support using Kubernetes API access in a broker init container
  * This alternative was rejected due to the heightened security requirements of some users, as outlined in the [Motivation](#motivation)
* Use node topology labels via the downward API, as suggested in [strimzi-kafka-operator#11504](https://github.com/strimzi/strimzi-kafka-operator/issues/11504)
  * This alternative was rejected due to its reliance on a Kubernetes feature that is only in alpha as of 1.33

---

> **Reviewer:** Sorry, but there is nothing like that being said in the link.

> **Author:** The linked definition from NIST discusses separation of duty at a higher level, not specific to Kubernetes. In our specific Kubernetes case, we are separating two different roles for two different entities. If you feel the link is misleading, I can remove it. The underlying principle is one IBM has discussed with many enterprise clients in highly regulated industries (e.g. financial services, telecommunications). I'm not sure if the specific requirements are publicly documented by those companies.

> **Reviewer:** I do think it is absolutely misleading. And I do not think it is any principle Strimzi aims to follow. Also, keep in mind that the broker also reads Kubernetes Secrets, for example. So even if you want to follow your interpretation of this rule, this proposal won't help you much.

> **Author:** Given there are broader concerns, as you note, with limiting broker access to the Kubernetes API, I've opened https://github.com/orgs/strimzi/discussions/12086 to focus on that. I can remove this as a motivation from the proposal and focus on the motivation of removing cluster-scoped access.

> **Reviewer:** Well, in general it is always better to start with a discussion to see if there is actual interest rather than open a proposal out of the blue. Keep in mind that the motivation might make sense to you, but it also needs to make sense to the community and the people working on the projects day to day in order to justify the effort it brings.

> **Author:** I apologize for misunderstanding the process here -- I thought the proposal would be the avenue to start the discussion. I'll try to improve the proposal to make the motivation easier to understand for others. I also want to note that I am happy to help implement the necessary changes. I do recognize, though, that there is still a cost on the maintainers to review PRs and ensure continued functionality going forward.

> **Reviewer:** It is one of the ways to start the discussion, but a pretty lengthy one. Starting a GitHub discussion, opening an issue where you describe the feature you are interested in, asking on Slack, or asking on the community call often makes it much faster and easier to get some quick feedback. (But don't get me wrong -- it is not like you broke any special rules or did something wrong. It just might not be the most efficient way.)