KEP-4742: Expose Node Labels via Downward API #4747
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,7 +58,7 @@ If none of those approvers are still appropriate, then changes to that list | |
should be approved by the remaining approvers and/or the owning SIG (or | ||
SIG Architecture for cross-cutting KEPs). | ||
--> | ||
# KEP-4742: Expose Node Labels to Pods via Downward API | ||
# KEP-4742: Expose Node Labels to pods via Downward API | ||
|
||
<!-- | ||
This is the title of your KEP. Keep it short, simple, and descriptive. A good | ||
|
@@ -161,6 +161,9 @@ to extract information. | |
## Motivation | ||
|
||
We’d like to change the runtime behavior of containers based on node labels. | ||
In our case, we’re using a CNI with DaemonSets to perform network setup, and | ||
would like to configure the network differently based on the presence of a node | ||
label. | ||
|
||
A number of other use cases exist for providing node labels to pods. One | ||
example is utilizing topology data from cloud providers, which are automatically | ||
|
@@ -169,45 +172,78 @@ transfers and reduce costs. Having an easy way for pods to access these node | |
topology labels would provide users a straightforward, maintainable way to | ||
optimize their workloads given topology constraints. | ||
|
||
While "topology" is usually associated with the physical layout of a cluster, | ||
it can also be used to describe other types of information about the cluster. | ||
This KEP proposes to allow the expansion of the concept of topology to include | ||
user-defined aspects about their cluster nodes, and in turn provide a way for | ||
pods to receive this information. | ||
|
||
Workarounds today typically involve using an initContainer to query the | ||
Maybe mention the NRI workaround as well: kubernetes/kubernetes#40610 (comment) |
||
Kubernetes API and then pass data via shared volume to other containers within | ||
docandrew marked this conversation as resolved.
|
||
the same pod. This adds additional demand on the API server and is burdensome | ||
compared to the ease of using downwardAPI for pod labels and metadata. | ||
the same pod. By comparison, this proposal would reduce the number of service | ||
accounts and API server clients. Another workaround is to use webhooks to inject | ||
labels into pods, but this relies on advance knowledge of where the pod is going | ||
to be scheduled and requires the webhook to be running and available at the time | ||
of pod creation. This proposal would provide an easier way to access node labels | ||
from pods, and would be more efficient than the current workarounds. | ||
|
||
### Goals | ||
I think we should also call out that this would become a property that load balancers/Service objects could select upon too, if it is a label. This may create a bit of confusion wrt topology aware routing, and its usage as a selector label should likely be discouraged/its caveats noted in documentation.
@munnerz you are implying a property which is not stated here, which your POC provides but I think the original idea of this KEP does not. This KEP says that the ONLY goals are for a pod to be able to access the node's labels via volumes and env. It does NOT say that it is a goal to make those labels actually visible in the API for use by outside observers (which would include LBs). IOW - do we think it is a goal for users, LBs, etc to be able to do |
||
|
||
* Gain access to node labels in form of `topology.k8s.io/*` on pods through volume mounts | ||
* Gain access to node labels in form of `topology.k8s.io/*` on pods through environmental variables | ||
* Gain access to node labels in form of `topology.k8s.io/*` and | ||
`*.topology.k8s.io/*` on pods through volume mounts | ||
* Gain access to node labels in form of `topology.k8s.io/*` and | ||
`*.topology.k8s.io/*` on pods through environmental variables | ||
|
||
### Non-Goals | ||
|
||
* Not to expose additional node info outside of labels | ||
* Not to pass any additional node labels other than `topology.k8s.io/*` to pods | ||
* Not to guarantee the label value assigned at pod creation is the most recent node label value because it is assigned at pod creation time | ||
* Not to pass any additional node labels other than `topology.k8s.io/*` and | ||
`*.topology.k8s.io/*` to pods | ||
* Not to update pod labels after the initial node -> pod copy has been made | ||
* Not to make assurances regarding timing and availability of the label beyond | ||
the initial pod label copy at scheduling time | ||
* Not to make assurances about the immutability of the pod label after the | ||
initial copy. As with other labels, the pod label can be updated by the user | ||
after the pod is created. | ||
|
||
## Proposal | ||
|
||
The initial design includes: | ||
|
||
In KEP 1659, the following labels are defined: | ||
* topology.kubernetes.io/region | ||
* topology.kubernetes.io/zone | ||
|
||
In addition to the above labels, KEP 1659 declares the entire `topology.kubernetes.io` prefix space as reserved for use by the Kubernetes project. | ||
|
||
This KEP expands upon KEP 1659 in the following ways: | ||
- The `x.topology.kubernetes.io` prefix is allocated for use by end users. The kubernetes project itself will not define any standard labels with that prefix. | ||
- The `<domain>.x.topology.kubernetes.io` prefix is likewise allocated for use by end users or third-parties. The `<domain>` portion is treated the same as a "normal" label prefix. For example, `example.com.x.topology.k8s.io/label-name`. | ||
- All labels using the `topology.kubernetes.io` or `*.topology.kubernetes.io` prefix spaces are considered "safe" for workloads. A workload may be exposed to the values of these labels which directly apply to the workload. For example, a pod may learn the topology of the node on which it is running. | ||
|
||
The idea is that we will expose those labels from nodes to pods via a literal copy from the Node, for instance using the method `GetNode` from Kubelet in the `podFieldSelectorRuntimeValue` function and `volume.VolumeHost` `GetNodeLabels` function in the `CollectData` function in the downward API. | ||
KEP 1659 defines the following labels: `topology.kubernetes.io/region` and | ||
`topology.kubernetes.io/zone` to be used for topology information. These labels | ||
are useful for pods as well to be able to make application decisions based on | ||
the region or zone the pod is running in. This KEP proposes to make these labels | ||
available to pods while also expanding upon KEP 1659 to allow for user-defined | ||
labels in the `*.topology.kubernetes.io` namespace. | ||
I think that |
||
|
||
This KEP expands on KEP 1659 in the following ways: | ||
|
||
1. Label prefixes of the form `<domain>.topology.kubernetes.io` are allocated | ||
for use by end users. The Kubernetes project itself will not define any | ||
labels with this prefix. | ||
2. Labels of the form `<domain>.topology.kubernetes.io/<field>` will be passed | ||
to pods. | ||
3. Labels of the form `topology.kubernetes.io/*` will be passed to pods but will | ||
continue to be reserved by the Kubernetes project. | ||
4. All labels with `topology.kubernetes.io` and `*.topology.kubernetes.io` | ||
prefixes should be considered safe for pods and should only contain | ||
information that pods and containers can safely consume. | ||
|
||
The idea is that we will expose those labels from nodes to pods via a literal | ||
copy from the node. From that point, the topology labels can be used in the same | ||
way as any other label. | ||
|
||
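The prefix rules in the numbered list above can be sketched as a small filter. This is only an illustration under stated assumptions, not the KEP's or the kubelet's implementation: the names `isTopologyLabel` and `copyTopologyLabels` are hypothetical, and the sketch uses the `topology.kubernetes.io` spelling from the Proposal text (the Goals section writes `topology.k8s.io`). It treats a key as eligible when its prefix is exactly `topology.kubernetes.io` or has the form `<domain>.topology.kubernetes.io` with a non-empty `<domain>`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// topologyDomain is the reserved prefix named in the Proposal text.
const topologyDomain = "topology.kubernetes.io"

// isTopologyLabel (hypothetical helper) reports whether a label key has the
// form topology.kubernetes.io/<name> or <domain>.topology.kubernetes.io/<name>.
func isTopologyLabel(key string) bool {
	prefix, name, found := strings.Cut(key, "/")
	if !found || name == "" {
		return false // no prefix, or an empty name part
	}
	if prefix == topologyDomain {
		return true
	}
	suffix := "." + topologyDomain
	// Require a non-empty <domain> in front of the suffix.
	return strings.HasSuffix(prefix, suffix) && len(prefix) > len(suffix)
}

// copyTopologyLabels (hypothetical helper) returns a new map holding only
// the node labels that match the topology prefixes.
func copyTopologyLabels(nodeLabels map[string]string) map[string]string {
	out := make(map[string]string)
	for k, v := range nodeLabels {
		if isTopologyLabel(k) {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Example node labels; the values are made up for illustration.
	node := map[string]string{
		"topology.kubernetes.io/zone":             "zone-a",
		"example.com.topology.kubernetes.io/rack": "r12",
		"kubernetes.io/hostname":                  "node-1",
	}
	copied := copyTopologyLabels(node)

	// Sort keys so the printed order is stable.
	keys := make([]string, 0, len(copied))
	for k := range copied {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, copied[k])
	}
}
```

The copy is a one-time snapshot, which matches the non-goal of not updating pod labels after the initial node -> pod copy.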
### User Stories | ||
|
||
* As a cluster operator, I want to make decisions based on node topology labels. | ||
* As a cluster operator, I want to access node topology labels inside of my pod | ||
* As a cluster operator, I want to access node instance types labels inside of my pod | ||
* As a developer, I want to know which region my app is serving, to be able to diagnose problems they may face in certain AZs or regions | ||
* As a cluster operator, I want to access node topology labels inside of my pod. | ||
* As a cluster operator, I want to access node instance types labels inside of | ||
my pod. | ||
* As a developer, I want to know which region my app is serving, to be able to | ||
diagnose problems they may face in certain AZs or regions. | ||
* As a cloud service provider, I want to make sure that this feature goes | ||
through the standard k8s feature graduation criteria to ensure that it is | ||
production-ready and that the exposure of `topology.k8s.io/*` and | ||
`*.topology.k8s.io/*` is widely accepted. | ||
|
||
### Notes/Constraints/Caveats (Optional) | ||
|
||
|
@@ -227,7 +263,7 @@ form `topology.k8s.io/*`. | |
|
||
* Exposing sensitive data as node labels to pods. This is mitigated by ensuring | ||
node labels contain the specific pattern `topology.k8s.io/*` in order to be | ||
available to Pods. | ||
available to pods. | ||
|
||
* Stale data. Information obtained through node labels is like information | ||
attained through a configmap or secret mounted to a pod, being passed on | ||
|
@@ -706,7 +742,7 @@ Describe them, providing: | |
Describe them, providing: | ||
- API type(s): | ||
- Estimated increase in size: (e.g., new annotation of size 32B) | ||
- Estimated amount of new objects: (e.g., new Object X for every existing Pod) | ||
- Estimated amount of new objects: (e.g., new Object X for every existing pod) | ||
--> | ||
|
||
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? | ||
|
For very specific cases, we have workarounds like sidecars. If these DaemonSets need more info, that's their escape path.