📖 add ipam integration proposal #6000
Conversation
Skipping CI for Draft Pull Request.
```go
// IPAddressClaimStatus contains the status of an IPAddressClaim
type IPAddressClaimStatus struct {
	// Address is a reference to the address that was allocated for this claim.
	Address corev1.LocalObjectReference `json:"address,omitempty"` // field type assumed; the diff excerpt is truncated here
}
```
Do we foresee the need for conditions here, to coordinate with the provider and signal the lifecycle of the claim?
Good call, like IP exhaustion. Maybe here, or maybe in IPAddressStatus.
Yes. I thought about that a bit, but didn't get around to specifying them yet. There needs to be some way to signal errors, like exhausted pools for example.
To start with, we do not have to delve into extreme detail on the specific conditions necessary for this. Just adding a `Conditions` array to the IPClaim status might be all that is necessary.
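A minimal sketch of what that could look like, reusing Cluster API's existing conditions type (the exact field layout is an assumption for illustration, not part of the proposal text):

```go
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// IPAddressClaimStatus contains the status of an IPAddressClaim.
type IPAddressClaimStatus struct {
	// Address is a reference to the address that was allocated for this claim.
	Address corev1.LocalObjectReference `json:"address,omitempty"`

	// Conditions summarises the current state of the claim, e.g. to surface
	// provider errors such as an exhausted pool.
	Conditions clusterv1.Conditions `json:"conditions,omitempty"`
}
```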
I like that idea. Ultimately the conditions could also be provider-specific. It might make sense to have a `Summary` or `Phase` field that provides a set of values that is identical between all providers, and then have additional details in the conditions, e.g. `Pending`, `Bound`, `Error`.
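A hedged sketch of such a provider-agnostic field as a simple string enum; the three values come from the examples above, everything else is illustrative:

```go
// IPAddressClaimPhase is a coarse-grained summary of a claim's lifecycle that
// is identical across all providers; provider-specific detail stays in the
// conditions.
type IPAddressClaimPhase string

const (
	// IPAddressClaimPhasePending means no address has been allocated yet.
	IPAddressClaimPhasePending IPAddressClaimPhase = "Pending"
	// IPAddressClaimPhaseBound means an IPAddress was allocated and bound to the claim.
	IPAddressClaimPhaseBound IPAddressClaimPhase = "Bound"
	// IPAddressClaimPhaseError means allocation failed, e.g. because the pool is exhausted.
	IPAddressClaimPhaseError IPAddressClaimPhase = "Error"
)
```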
```go
// IPAddressClaimSpec describes the desired state of an IPAddressClaim
type IPAddressClaimSpec struct {
	// Pool is a reference to the pool from which an IP address should be allocated.
	Pool corev1.LocalObjectReference `json:"pool"` // field type assumed; the diff excerpt is truncated here
}
```
Given an IP Pool is an arbitrary API of choice for the IPAM providers, wouldn't it be possible for multiple IPAM providers to coexist and have pools with the same name? I.e. wouldn't this need to be a TypedLocalObjectReference?
Good point, this is not really clear here. From the work on moving away from v1.ObjectReference I have some opinions on those reference types; this one would actually consist of Kind, Name and APIVersion. Not sure if that makes sense.
I planned to open a PR that adds those types to the current v1beta1 API so they can be used for further additions to that version, and to hopefully migrate to them with v1beta2 (even though it's arguably a breaking change). I think with the types already being there and already discussed, it would be a lot easier to migrate, since the discussion doesn't have to be part of the migration anymore.
+1 to what @enxebre pointed out; it would be necessary to include the Kind, APIVersion and Name of the IPPool, since the `IPPool` implementation is tied to the IPAM providers.
I've changed it to Group, Kind and Name. Version is explicitly excluded, as it actually does not make sense to have it: the client should decide on its own which version it wants to consume.
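A minimal sketch of a reference type with exactly those three fields (the type name and JSON tags are assumptions for illustration):

```go
// IPPoolReference identifies a pool by group, kind and name. The API version
// is deliberately omitted: the client decides on its own which version of the
// referenced resource it wants to consume.
type IPPoolReference struct {
	// Group is the API group of the referenced pool, e.g. "ipam.cluster.x-k8s.io".
	Group string `json:"group"`
	// Kind is the kind of the referenced pool, e.g. "IPPool".
	Kind string `json:"kind"`
	// Name is the name of the referenced pool object.
	Name string `json:"name"`
}
```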
Is this proposal sharing any concepts or implementation details with the podCIDRs for nodes proposal https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs?
I'm generally +1 on this proposal, because it re-uses a pattern that already exists in the ecosystem and it addresses a problem that is getting more and more relevant.
- Only the custom resource definitions for IPAddress and IPAddressClaim, and the relevant webhooks, should become part of the main cluster-api project
- An example provider that provides in-cluster IP address management (also useful for testing, or when no other IPAM solution is available) should be implemented as a separate project in a separate repository.
- A shared library for providers and consumers should be implemented as part of that extra project.
Let's add a few notes on what the API library is expected to offer
I will once I figure that out exactly 😄 I'm working on a reference in-cluster implementation, which might help with that. Not sure if it's necessary/helpful to try to design this beforehand, or if it's easier to build it when creating the first few implementations.
Makes sense; if possible we should document at a high level the capabilities we expect this library to offer, otherwise let's just add a note that details will be added in a follow-up iteration.
```yaml
network:
  devices:
  - dhcp4: false
    fromPool: # reference to the pool
      group: ipam.cluster.x-k8s.io/v1alpha1
      kind: IPPool
      name: testpool
```
I know there was already some back and forth, but given that this is getting relevant for a growing number of providers, I'm wondering if we should move this to the Machine level (we can also get there incrementally, but why not now?)
@vincepri @CecileRobertMichon @enxebre @sbueringer @yastij opinions?
Not opposed, though based on the journey that took us from v1alpha1 to v1alpha2, where we figured out the new abstractions as CRDs and the refs (and we are still evolving conditions), I think it'd be good to see some real usage to gain some knowledge, see a few IPAM implementations, and see how adoption by the providers goes before we figure out the best way to plumb it into the core API.
That would certainly make this proposal easier. Claims could be created by CAPI. Providers would still be required to fetch IPAddresses, though, if they are used.
The hard part is probably figuring out all the available parameters that can be set at the Machine level, and how to cleanly add additional parameters on the provider side.
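For illustration, the provider-side lookup could be as small as the following controller-runtime snippet. The import path and the `Status.Address` field are assumptions based on the draft types above, not a final API:

```go
package provider

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"

	// Hypothetical import path for the claim/address types drafted in this proposal.
	ipamv1 "sigs.k8s.io/cluster-api/exp/ipam/api/v1alpha1"
)

// resolveClaimedAddress follows a bound IPAddressClaim to the IPAddress that
// was allocated for it, and errors while the claim is still pending.
func resolveClaimedAddress(ctx context.Context, c client.Client, claim *ipamv1.IPAddressClaim) (*ipamv1.IPAddress, error) {
	if claim.Status.Address.Name == "" {
		return nil, fmt.Errorf("claim %s/%s is not bound yet", claim.Namespace, claim.Name)
	}
	addr := &ipamv1.IPAddress{}
	key := client.ObjectKey{Namespace: claim.Namespace, Name: claim.Status.Address.Name}
	if err := c.Get(ctx, key, addr); err != nil {
		return nil, err
	}
	return addr, nil
}
```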
@vincepri @CecileRobertMichon @sbueringer @yastij opinions?
This would definitely save the providers from doing one-off code changes for creating the Claim objects. I would group this as an enhancement in a similar bucket to failure domains. Specifying the IPPool will follow the ref conventions seen throughout CAPI objects.
Plus, it makes more sense for Machine objects to create the `IPClaim`s, as the proposal intends to introduce these API types as part of the CAPI codebase.
To me it makes more sense to have this as an optional field as part of core vs. distributing it across infra providers.
One issue is definitely API stability and guarantees. But as a lot of providers are also already on v1beta1, I think the impact would be smaller if we add the field to the core Machine CRD vs. in a bunch of infra providers (with the usual feature-gate mechanism).
We still have some room for changes until a potential v1beta2 or v1.
It definitely brings up the question of how we would integrate this with ClusterClass, as InfraMachineTemplates can be provided via templates and even patched while a MachineDeployment is generated based on fields in the topology. But I think we have a few options to make that happen.
Hm, given that we need the CRDs and at least a webhook in core CAPI already, I lean towards making it a part of core CAPI, so we can make sure that everything works as intended and coordinate if necessary (more or less like we already handle other contracts, such as the InfrastructureMachine, where we have a machine controller which coordinates).
If we go down the path of including this in the core types, let's make sure the proposal is updated to reflect this and elaborates on the behaviour that the core controllers would own, e.g. creating/deleting the IPClaim if there's a pool, etc.
For a WIP implementation of this proposal in CAPI see #6313.
#### Additional Notes

- An example provider that provides in-cluster IP address management (also useful for testing, or when no other IPAM solution is available) should be implemented as a separate project in a separate repository.
Should we go even further and make sure the core types contain useful information about address pools and can be managed without external information?
IP space management is a standard we can rely upon. This might be a bit of a change from the current proposal, but shouldn't an IPAM provider integrate with and inform Cluster API, rather than providing the entire set of functionality?
+1, I think this is going to be especially important if we want to leverage carving out huge CIDRs into multiple small ones.
So you want to add the in-cluster provider to CAPI directly? When I started with this proposal I got the impression that you wouldn't want to have something like that as part of core CAPI.
We can definitely include a standard `IPPool` or `InClusterIPPool` as part of CAPI. It's not that much code anyway, see telekom/cluster-api-ipam-provider-in-cluster.
Be aware that the proposal currently doesn't include separating large pools into smaller subnets. It's only about allowing integration with the machine creation process, as there is currently no way to automate that. Allocating subnets from a larger pool is very interesting in conjunction with ClusterClass, but it is currently not part of this proposal at all, and could also be another iteration later on.
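For a sense of scale, an in-cluster pool spec could be as small as the following speculative sketch (the field names are illustrative and not quoted from the proposal or the linked repository):

```go
// InClusterIPPoolSpec defines a pool of addresses that is managed entirely
// inside the cluster, without an external IPAM system.
type InClusterIPPoolSpec struct {
	// Addresses lists addresses or ranges (e.g. "10.0.0.10-10.0.0.50")
	// that may be allocated from this pool.
	Addresses []string `json:"addresses"`
	// Prefix is the network prefix length assigned alongside allocated addresses.
	Prefix int `json:"prefix"`
	// Gateway is an optional gateway to configure for allocated addresses.
	Gateway string `json:"gateway,omitempty"`
}
```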
I went through the proposal in its current state, and I think what is proposed is consistent with the goal of allowing CAPI to consume a set of IP pools defined and managed by something external to CAPI; I think this is already something valuable.
If I got it right, from a CAPI core PoV:
- We should host two CRDs - with minimal validation logic and no controllers - plus a library of functions to help implementation in providers.
- Ideally there should also be a reference implementation of an IPAM provider in CAPI.
- It is not clearly defined how to allow users to install IPAM providers (e.g. with clusterctl), but personally I'm ok with deferring this to the next iteration of this proposal.
- There are no impacts on ClusterClass, because IPAM management could be activated by configuring infrastructureMachineTemplates (eventually via patches).
@schrej Is this an accurate recap?
I'll defer to @yastij @CecileRobertMichon @enxebre for a final check from the providers' PoV, given that most of the implementation should happen on the providers' side.
@fabriziopandini yes, that's a very good summary, thank you!
I think Vince had some thoughts about reconsidering the approach in #6000 (comment). This lgtm otherwise.
As decided in the 20th April office hours.
For tracking, these are the outcomes of the discussion we had at the beginning of this week, trying to summarise the main points discussed about the current proposal:
Given the above considerations, the intent is to ask in the community meeting to proceed with the current proposal after a lazy consensus, and to eventually re-evaluate convergence towards a shared network/machine network model in a follow-up step, after gaining more expertise in the domain.
/lgtm
The lazy consensus on this PR expires today. Rebasing to fix the broken-links check.
Quick note: Vince is on PTO and should be back on Monday (afaik)
I'll look at this this morning and go back on PTO :)
I think with this being open for 3 months it can also wait until Monday, enjoy your PTO!
/hold cancel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: fabriziopandini. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What this PR does / why we need it:
This adds a proposal to add an API Contract for IP address management to Cluster API.
Also fixes the name of the cluster class proposal so the order of the proposals is correct.
Which issue(s) this PR fixes:
cc due to previous interest: @randomvariable @fabriziopandini @maelk @lubronzhan @MaxRink @Cellebyte @macaptain @yastij