Allow user-controlled naming of Machines in Machine collections #10577
Comments
/area provider/control-plane-kubeadm
Can we include some use cases that are impacted/improved by this?
I have 'corporate naming policy', which accounts for a great many sins in practice. Such policy is often independent of the specifics of any particular application, such as CAPI.

Regardless of which provider you are using, the name of the Machine will almost certainly form a component of, quite possibly the entirety of, the name of the cloud resource which is eventually created. If your policy covers the naming of these resources, you will need flexibility in the naming of the Machine.

Similarly, the name of the cloud resource will typically be the basis, or the entirety, of the hostname of the guest OS running on that resource. This will typically be used as the Node name, and is therefore visible to the end-user. If policy, user documentation, or training covers the naming of these user-visible components, you will need flexibility in the naming of the Machine. Additional tooling may also have been built up which assumes that certain kinds of Nodes are named in a particular way.

It may be undesirable for the actor deploying a cluster to leak 'implementation' details, such as the name of a specific template, to end users of the cluster. In this case they will need the flexibility to name created machines independently of the creating template.

@lentzi90 Are you able to add your specifics? I'll collate into the description when I write it up properly.
Sure thing! The 'corporate naming policy' is pretty much it; it just takes multiple shapes. For example, we have downstream requirements on the hostnames to make security and/or network monitoring easier. The same goes for cloud resources (VMs, load balancers, etc.). In many cases the resources created by CAPI/CAPx need to co-exist with other resources and "fit in" with them. The other resources may be managed manually or through third-party tools. At minimum, we must be able to avoid name clashes.

In practice it is also necessary that the resources are named in a way that human operators can understand easily. We cannot have confusion about where resources "belong" or who owns them. It must be possible to tell at a glance what is going on. Existing third-party tooling must also be able to tell the resources apart.

A way to influence the naming of Machines would probably be enough to address all of this. As @mdbooth mentioned, the cloud resources will typically be named according to the Machines/InfraMachines, and the names then trickle through to hostnames and node names. As long as we get at least some kind of control over the prefix in these names, it will be enough.
/priority important-longterm
ACK, trying to help. I pick up where I left off in #10463 (comment): there are 3 layers to this problem:

1. How Machines (and the corresponding InfraMachines and bootstrap configs) are named inside the management cluster
2. How infrastructure providers derive cloud resource names from those objects
3. How guest OS hostnames and Node names are derived from the cloud resources
If I look at this issue so far, it is not clear which of these three problems we are trying to address.

So I think we should clarify what we are trying to achieve, to make the discussion easier to follow. Also, I think we should surface why the desired output cannot be achieved by simply naming the KCP or the MachineDeployment according to the 'corporate naming policy'. Last but not least, the main question that IMO we have to sort out before moving on to implementation details is whether it is an acceptable trade-off for everyone to sacrifice the consistency of the UX inside the management cluster, given that we are avoiding/cannot fix points two and three...
For MachineDeployments we can do this. However, IIUC a KCP cannot be replaced after the cluster has been created, which means that the KCP name, and with it the naming of control plane Machines, is fixed for the lifetime of the cluster and cannot be brought into line with a naming policy after creation.

We discussed allowing flexible machine naming for only KCP, but the feeling was that it would be more consistent to apply it to both KCP and MD/MS.
I want to leave the specifics of how providers name cloud resources out of this discussion. For the purposes of this issue, the only requirement is that cloud resources are named by a function of at least the Machine name. Some (e.g. CAPO) use the InfraMachine name, but as that is now identical to the Machine name, those providers are covered. Others use additional inputs such as namespace. As long as the Machine name is a component, they are covered.

As you point out, there has never been a provider contract for cloud resource naming. Trying to impose one now may not be practical at all, but at the very least it is likely to take a very long time to implement thoroughly. I also don't think it's necessary if we control the inputs to whatever naming function the providers have chosen to implement. An example is that the previous effort to standardise these mentioned such issues as name length limitations and special characters. CAPI does not need to concern itself with these matters. As long as CAPI provides a well-understood set of inputs to a naming function, the cloud provider can implement any function which suits their environment, and their users will understand what effects various changes will make. I agree we could do with some documented guidance, though.

So when the description talks about 'Machine name', 'cloud resource', and 'Node', the assumption is only that these things are related in some manner which an experienced user of a specific provider can easily understand and predict. I believe this is currently true in all cases. I should definitely clarify that!
I think it is also worth noting that each provider can make promises about how the naming works if they like. Having control over the input is essential, though, and that is the part we can address here.
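To make the 'naming function' idea concrete, here is a hypothetical Go sketch (not taken from any real provider): the cloud resource name is a pure function of inputs CAPI controls, so controlling the Machine name is enough to satisfy a naming policy regardless of the exact function a provider has chosen.

```go
package main

import "fmt"

// cloudResourceName is an illustrative provider-side naming function.
// Real providers differ in the inputs and formatting they use, but today
// the Machine name is always among the inputs.
func cloudResourceName(namespace, machineName string) string {
	return fmt.Sprintf("%s-%s", namespace, machineName)
}

func main() {
	// A Machine named per corporate policy yields a predictable resource name.
	fmt.Println(cloudResourceName("prod", "corp-cp-0")) // prod-corp-cp-0
}
```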
Thanks @mdbooth and @lentzi90 for the clarification; from my point of view we now have a clean explanation of what we are trying to achieve and why. Considering all that, I'm personally OK with the ask: allow user-controlled naming of Machines, plus the corresponding InfraMachine and bootstrap config objects, in Machine collections, given that this will be an opt-in feature and we can document what trade-offs this implies (implementation details TBD, but we already chatted about the option of doing something similar to the naming strategies in ClusterClass for consistency). Let's hear if this is acceptable for everyone else.
Sounds good to me! |
@mboersma to check this from the MachinePool PoV (if applicable) |
FYI @sbueringer already had some thoughts on this: https://kubernetes.slack.com/archives/C8TSNPY4T/p1715178789376599?thread_ts=1715173795.198159&cid=C8TSNPY4T

tl;dr: My initial suggestion was to add this naming scheme to MachineTemplateSpec so it would be inherited everywhere that machines are created from a template. @sbueringer pointed out that this would introduce it into the MachinePool API, which is unlikely to be able to support it.
Unfortunately I don't think I'll make it for today as I've had some fires to put out. I think I can do this for next week, though. Would that work?
I'll also try to join next week.

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: example
spec:
  namingStrategy:
    template: "{{ .cluster.name }}-{{ .kubeadmControlPlane.name }}-{{ .random }}"
  ...
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: example
spec:
  namingStrategy:
    template: "{{ .cluster.name }}-{{ .machineSet.name }}-{{ .random }}"
  ...
```
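For illustration, here is a minimal Go sketch (hypothetical, mirroring how ClusterClass naming strategies render Go templates today) of how a namingStrategy.template like the one above could be evaluated. The data keys (cluster, machineSet, random) follow the example; the actual contract is TBD.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

func main() {
	// The template string from the MachineDeployment example above.
	tmpl, err := template.New("machineName").Parse(
		"{{ .cluster.name }}-{{ .machineSet.name }}-{{ .random }}")
	if err != nil {
		panic(err)
	}

	// Hypothetical inputs the controller would supply.
	data := map[string]interface{}{
		"cluster":    map[string]interface{}{"name": "prod"},
		"machineSet": map[string]interface{}{"name": "example-6f9c7"},
		"random":     "zx8k1", // random suffix generated by the controller
	}

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		panic(err)
	}
	fmt.Println(buf.String()) // Output: prod-example-6f9c7-zx8k1
}
```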
Sounds good to me! The MachinePool topic is a bit complex. First, I think there are various reasons why we shouldn't add it to MachineTemplateSpec.

Regarding support of this feature in MachinePools specifically: today (if an InfraMachinePool supports MachinePool Machines) the InfraMachinePool controller creates the InfraMachines, and core CAPI then creates a Machine for each InfraMachine, reusing the InfraMachine's name.

So we don't have direct control over the name of the Machine, because we just take the name of the InfraMachine, and I have no idea how the various InfraMachinePool implementations calculate InfraMachine names today, or whether they could support a naming template.

So I don't know if it would be possible to establish a contract like "if MachinePool.spec.namingStrategy is set, InfraMachinePools have to calculate InfraMachine names based on the MachinePool.spec.namingStrategy.template".
Thanks @sbueringer for the thorough explanation! I guess this answers the question "why (don't) do it this way".

Should we take the opportunity to fix this mess? The field really seems to belong to MachineTemplateSpec IMO
Where would you want to add it?

```go
// MachineTemplateSpec describes the data needed to create a Machine from a template.
type MachineTemplateSpec struct {
	// Standard object's metadata.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
	// +optional
	ObjectMeta `json:"metadata,omitempty"`

	// Specification of the desired behavior of the machine.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
	// +optional
	Spec MachineSpec `json:"spec,omitempty"`
}
```

I think that MD.spec.template.namingStrategy is really not a good fit
I need to carve out some time to take a look at this; I was just reacting to the list of problems above (BTW, we should probably track those problems somewhere, those are really good points).

Definitely agree with solving these problems. It would be good if the "right place" to put the namingStrategy were an orthogonal concern, not influenced by problems elsewhere. Independent of these problems, my current take is that spec.namingStrategy is a good fit (but of course open for discussion :)). I just like the idea that template is basically a "template" for the Machine, in the sense that we ~copy & paste ObjectMeta & Spec from there to generate a new Machine. I don't think a namingStrategy field fits in there, just like other fields we have directly in MD.spec rather than in MD.spec.template.
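A minimal sketch (hypothetical, not the final API) of what placing the field directly in MachineDeploymentSpec, rather than in spec.template, could look like:

```go
package v1beta1

// Hypothetical sketch only: namingStrategy sits alongside Template in
// MachineDeploymentSpec, reflecting the view that Template should only
// contain data copied into each created Machine.
type MachineDeploymentSpec struct {
	// ... existing fields elided ...

	// NamingStrategy controls how Machine names are generated. It is not
	// copied into created Machines, so it does not live in Template.
	// +optional
	NamingStrategy *MachineNamingStrategy `json:"namingStrategy,omitempty"`

	// Template describes the Machines that will be created.
	Template MachineTemplateSpec `json:"template"`
}

// MachineNamingStrategy is a hypothetical type holding the naming template,
// mirroring the namingStrategy fields used by ClusterClass.
type MachineNamingStrategy struct {
	// Template is the Go template used to generate Machine names.
	// +optional
	Template string `json:"template,omitempty"`
}
```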
@fabriziopandini With the workaround in place we have some breathing space and I'm not in a rush. How long should I leave it? |
I will stick to the original plan that assumes the workaround should be deprecated now and go away by EOY (CAPI 1.9). |
Ok. I'll update the description before tomorrow's office hours. It'll be based on @lentzi90's namingStrategy proposal.
I'm not opposed to starting the implementation from the control plane (TBD if we consider this something KCP-specific or if we somehow want to make this a feature in the contract for all control plane providers). WRT the API, I would prefer consistency with something that already exists, e.g. the naming strategies in ClusterClass.
Definitely out of scope for this, but I wonder if this is a layering issue: perhaps the KCP should have a machine collection rather than be a machine collection.
My preference would be generateName, but it would require a small amount of refactoring. The current order of creation is:

1. Generate the Machine name client-side
2. Create the InfraMachine
3. Create the bootstrap config
4. Create the Machine

With a little refactoring this could be:

1. Create the Machine, letting the API server generate its name
2. Create the InfraMachine and bootstrap config using the Machine's name

This would be equivalent to what we currently do, except the name would be generated server-side.

Incidentally, do we have a resource leak here? We reconcile resources based on Machines, but we create the Machine last. e.g. if we create the InfraMachine and bootstrap but Machine creation fails, e.g. due to an ephemeral API interruption, what is going to clean up the InfraMachine and the bootstrap? It looks to me like they will leak.
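A minimal sketch of the server-side generateName approach under the refactored order, using a hypothetical helper inside KCP (required Machine spec fields are elided for brevity, so this is illustrative rather than a working controller change):

```go
package kcp

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createMachineFirst sketches the refactored order: create the Machine with
// a server-generated name, then name the InfraMachine and bootstrap config
// after it. Required Machine spec fields are elided here.
func createMachineFirst(ctx context.Context, c client.Client, kcpName, namespace string) (*clusterv1.Machine, error) {
	machine := &clusterv1.Machine{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: kcpName + "-", // API server appends a random suffix
			Namespace:    namespace,
		},
	}
	if err := c.Create(ctx, machine); err != nil {
		return nil, err
	}
	// machine.Name is now populated server-side, e.g. "example-x7k2q".
	// The InfraMachine and bootstrap config can be created with this name,
	// and the Machine's references patched to point at them.
	return machine, nil
}
```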
Just a reminder: ObjectMeta is a struct that is used in various places, so we cannot just add a new field there. Going with generateName also very much makes it impossible to evolve the API going forward (how would we introduce something like template in a consistent way later on if we go with generateName now? That is the reason why we went with namingStrategy.template for ClusterClass).
Can you expand on this?
This is a half-baked idea and would be a huge architecture change, but... what if KCP referenced, e.g., a MachineDeployment which created and owned the Machines? KCP could manage quorum with, e.g., the pre-drain hook.
Was that for me? If it was for me, using generateName would be an implementation detail. It would not be exposed in the API. |
@mdbooth Yup
I thought this means you would add the field to the KCP type as
According to the discussion going with |
@adilGhaffarDev I would be fine with someone opening a PR to implement it on the KCP object. Maybe let's do one round of review just with changes to the API type. Once we have consensus on that, the rest of the implementation can be completed (on the same PR).

I had some somewhat related discussions with some folks downstream. I would prefer implementing a namingStrategy.template field instead of namingStrategy.prefix, because a template is more flexible than a prefix and consistent with the namingStrategy fields we already have in ClusterClass.
I also found an additional use case: on Windows there are certain naming restrictions (https://learn.microsoft.com/en-us/windows/win32/adschema/a-samaccountname). Especially the length limit on sAMAccountName means users may need tight control over the generated names.

In general, it looks like interest in this decreased a bit since we introduced the flag. Just to be clear, we won't keep the flag forever just because there is no progress in implementing a proper alternative.
Thanks @sbueringer, I will start implementing it and share the draft PR as soon as it's ready. I will go with the namingStrategy.template implementation.
Since we are adding it to KCP, should we add the same naming strategy to MachineDeployments too? The issue exists in both MD and KCP, just less severely in MD, as explained above.
Do you need it for MDs for your use case? (it's just that MDs require a bunch more work to surface it in Cluster.spec.topology etc.) |
Yes, it is needed: the flag was added for MDs too, it will also be removed in 1.9, and the issue will then reappear. In MDs a workaround is to create new MachineDeployments with the required name, but that is not an ideal situation, so we should add a naming strategy for MD too. We can merge the KCP one first and I can open a separate PR with the changes for MD Machine names.

Yeah, makes sense. Let's implement it via 2 separate PRs.
This would be the long-term solution to the problem discussed in #10463
What would you like to be added (User Story)?
As a user, I must ensure all my nodes are named according to a corporate policy which, amongst other things, ensures that alerts from centralised monitoring are directed to the correct team.
Additional Context
Naming of Nodes
Although other configurations are technically possible, the name of a Node will almost always be based on the name of the cloud resource which is hosting the Node. This may be taken from the hostname of the guest OS, which will typically have been served to the OS on startup from a metadata service, or may have been set directly in kubelet configuration based on information queried from the same metadata service.
Naming of cloud resources
CAPI has never had a contract covering the naming of cloud resources by infrastructure providers. Consequently, different providers have settled on different conventions. As far as I am aware, the various infrastructure providers use some combination of:

- the Machine name
- the InfraMachine name (since #9833, identical to the Machine name)
- the InfraMachineTemplate name (which, prior to #9833, fed into the InfraMachine name)
- the namespace
In the following, we assume only that cloud resources (and therefore Nodes) are named by some function of these inputs which an end-user of the infrastructure provider can understand intuitively.
Breaking change
#9833 changed the creation of InfrastructureMachine objects to have the same name as their associated Machine object. In general this change is welcome, as now the bootstrap, Machine, and InfrastructureMachine all have the same name. This simplifies operating the cluster.
However, it changed one of the inputs to our naming function. The infrastructure machine was previously named after the template it was generated from. With this change it is now unrelated to the infrastructure machine template. This changes the naming of Nodes on any platform which uses InfrastructureMachine name in its naming function, which includes OpenStack.
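To make the input change concrete, a hedged Go sketch follows (the names and the use of SimpleNameGenerator are illustrative, not CAPI's exact code):

```go
package main

import (
	"fmt"

	"k8s.io/apiserver/pkg/storage/names"
)

func main() {
	templateName := "md-template"
	machineName := "md-12345-abcde"

	// Before #9833 (illustrative): the InfraMachine name was derived from
	// the template name plus a random suffix, so the template name reached
	// the provider's naming function.
	before := names.SimpleNameGenerator.GenerateName(templateName + "-")

	// After #9833: the InfraMachine simply reuses the Machine name, so the
	// template name no longer reaches the provider's naming function.
	after := machineName

	fmt.Println(before, after)
}
```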
MachineDeployments and MachineSets
MachineDeployments are affected because the InfrastructureTemplate name used to be the basis of the InfrastructureMachine name and therefore an input to the cloud resource naming function, but it no longer is.
However, as the MachineDeployment name now serves the same naming function as the InfrastructureTemplate name, a workaround is to create new MachineDeployments with the required name, instead of relying on the name of the InfrastructureTemplate.
KubeadmControlPlane
Control plane machines created by KubeadmControlPlane are affected for the same reason as MachineDeployments.
However, unlike with MachineDeployments, it is not possible to create a new KubeadmControlPlane for an existing cluster. This means that it is not possible to reflect Node naming policy in the control planes of existing clusters at all, and for new clusters only by constraining the name of the KubeadmControlPlane object itself.
Detailed Description
I propose making the smallest possible API change that enables flexible Node naming for all Nodes. As the only controller with no reasonable workaround is KubeadmControlPlane, I propose adding a naming field to KubeadmControlPlane.

We will add a new field to KubeadmControlPlaneMachineTemplate; a sketch of the field, and of what it would look like in a YAML template, follows at the end of this section.

The code implementing this change will live in the Machine creation path and in the logic which decides whether existing Machines need to be rolled out. The latter change will ensure that updating namePrefix will result in a control plane rollout.
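A minimal sketch of the proposed change, assuming the field is called namePrefix as referenced above (both the Go type and the embedded YAML are illustrative, not final):

```go
package v1beta1

// Hypothetical sketch of the proposed field on KubeadmControlPlaneMachineTemplate.
type KubeadmControlPlaneMachineTemplate struct {
	// ... existing fields elided ...

	// NamePrefix, if set, is used instead of the KubeadmControlPlane name
	// as the prefix when generating names for Machines and, through them,
	// InfraMachines and bootstrap configs.
	// +optional
	NamePrefix string `json:"namePrefix,omitempty"`
}

// In a YAML template this could look like:
const exampleKCP = `
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: example
spec:
  machineTemplate:
    namePrefix: corp-policy-cp
`
```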
Updating MachineDeployment/MachineSet
We had previously discussed applying the same solution to MachineDeployment for consistency. We can still do that, but as it would be a distinct API change I propose we consider it separately. However, on reflection I now believe we should not do this at all.
Firstly, it is a completely distinct change which must be made separately, to separate APIs and separate implementations, and it is not required by the use case.

Secondly, naming objects after their owning collection is a common and convenient pattern in Kubernetes, which we would be breaking. Consistent naming was the principal driver for #9833 and, despite this specific regression, it remains desirable. If the use case can be covered without separating the naming of Machines from their owning MachineDeployment, I think that is preferable.

Lastly, we cannot be entirely consistent here in any case, because the proposed API cannot be implemented by MachinePools. Given that there will be some inconsistency regardless, I prefer to reduce the 'special case' to the smallest scope which covers the use case.
Anything else you would like to add?
Relevant slack discussion here: https://kubernetes.slack.com/archives/C8TSNPY4T/p1715173795198159
Label(s) to be applied
/kind feature
/area provider/control-plane-kubeadm
/area machineset
/area machinedeployment