-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDN-4604: Networking: egress IP per destination proposal #1594
base: master
Are you sure you want to change the base?
Conversation
@martinkennelly: This pull request references SDN-4604 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@martinkennelly: GitHub didn't allow me to request PR reviews from the following users: martinkennelly. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
04adba5
to
1435ec2
Compare
/unhold |
With the Kubernetes label selector semantics, cluster administrators can define a set of destination networks that multiple `EgressIP` | ||
CRs can consume. | ||
|
||
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the EgressIPTraffic have overlapping subnets? Do we do longest prefix match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they have overlapping subnets, its not a concern because all we care about is if the destination IP is within one of the subnets. Once one matches, SNATing occurs.
Lets say we have EIPTraffic1
with networks 1.1.1.0/24
and EIPTraffic2
with networks 1.1.0.0/16
. The aforementioned EgressIPTraffic
CRs are selected by an EIP and if a destination IP falls within either, then SNAT occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the reverse "overlap" occurs where two EgressIP on the same pod have overlapping destination CIDR? It could be the same EgressIPTraffic matching two EIP, or separate EgressIPTraffic with overlapping CIDRs on the two EIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a pod attempts to connect to an IP in a network that is selected by multiple `EgressIP`, then the behaviour is undefined, and either `EgressIP` maybe selected as source IP.
Added this to the enhancement.
CRs can consume. | ||
|
||
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one | ||
of the networks, then the pods source IP will be the `EgressIP`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if TrafficSelector is specified, and the traffic does not match any of the destinations. Does it get dropped? Does it SNAT out its normal node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't get dropped, just SNATing to an EIP does not occur but SNAT to node IP occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm... Is it correct that in this case the packet would egress at the node where the pod is hosted (instead of being forwarded to one of the egress-able nodes)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it would egress the node where the pod is hosted.
|
||
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one | ||
of the networks, then the pods source IP will be the `EgressIP`. | ||
If `TrafficSelector` is not specific, existing `EgressIP` behaviour is conserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is adding a TrafficSelector with 0.0.0.0/0 the same as not specifying a traffic selector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because theres no traffic constraints specified.
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one | ||
of the networks, then the pods source IP will be the `EgressIP`. | ||
If `TrafficSelector` is not specific, existing `EgressIP` behaviour is conserved. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if someone specifies Egress IP A with a TrafficSelector and Egress IP B with no traffic selector, and they both match on the same pod. The pod sends traffic that does not match the TrafficSelector, will it still be sent with Egress IP A?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That scenario would be undefined - either EgressIP
CR can be "active" but not both because EgressIP B would select all traffic.
Today, if 2 EgressIP
s select the same set of pods, one is active and the other is a standby.
With this enhancement, we allow overlapping (i.e. select set of one or more pods) EgressIP
CRs only if the selected EgressIPTraffic
destination networks do not overlap. If the destination networks overlap, then the behaviour is undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining this question with the previous one (a selector of 0.0.0.0/0). I would anticipate that a common pattern would be a pod with 2 egress networks where one is the "default" and the other hosts specific functionality on a given subnet. In this scenario the user would likely set EIP-A with a specific traffic selector (eg 1.2.3.0/24) and EIB-B with the default 0.0.0.0/0 to catch everything else.
- Would these be treated as overlapping and fall into the one-active-one-standby scenario?
- If so, would it be possible to allow at least this narrower use of overlap where one of the EgressIPTraffic contains the default route? Or could one EgressIPTraffic have an
isDefault: <true|false>
specifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for this insight from the users perspective.
This would complicate the implementation. We've have to have multiple logical router policy priorities to implement this.
Would these be treated as overlapping and fall into the one-active-one-standby scenario?
Good question and this makes me rethink the existing one-active one standby and instead if there are overlapping networks (like your example), its simply undefined which one gets selected.
Id love to do what you suggested but it increases the complexity of the implementation that I am slow to agree to.
If so, would it be possible to allow at least this narrower use of overlap where one of the EgressIPTraffic contains the default route? Or could one EgressIPTraffic have an isDefault: <true|false> specifier?
We could do this in a future RFE if its a limitation. If we added this, we'd need to add new logical route policies in OVN and new rule priorities in iproute2 ip rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question pending with the customer on whether this aspect is a limitation that can be worked with, or if it is seen as a serious issue. I'll follow up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up with customer did not raise any concerns.
// EgressIPTrafficSpec defines the desired state description of EgressIPTraffic. | ||
// +kubebuilder:validation:MaxProperties=1 | ||
// +kubebuilder:validation:MinProperties=1 | ||
type EgressIPTrafficSpec struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with this CRD we can specify a list of networks. I expect some customers will want to also match by layer 4 ports. Should we include that in scope for this enhancement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Id rather leave it out of this enhancement as it wasnt a defined goal - but its up to you, if you feel strongly about it.
|
||
### API Extensions | ||
|
||
Add `TrafficSelector` field to `EgressIPSpec`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this exist (?) type defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" | ||
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object | ||
// EgressIPTraffic defines a set of networks outside the cluster network. | ||
type EgressIPTraffic struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plan to featuregate the type: https://github.com/openshift/api/blob/master/example/v1alpha1/types_notstable.go#L19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
// TrafficSelector applies the egress IP only to the network traffic defined within the selected EgressIPTraffic(s). | ||
// If not set, all egress traffic is selected. | ||
// +optional | ||
TrafficSelector metav1.LabelSelector `json:"trafficSelector"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a single traffic selector matches many egressIPSpecs?
What happens when a traffic selector doesn't match any egressIPSpecs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a single traffic selector matches many egressIPSpecs?
Ill assume you meant matchs many EgressIPTraffic
CRs. Ive added an update to the enhancement.
What happens when a traffic selector doesn't match any egressIPSpecs?
Ditto.
|
||
```golang | ||
// EgressIPSpec is a desired state description of EgressIP. | ||
type EgressIPSpec struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plan to featuregate the new fields: // +openshift:enable:FeatureGate=Example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@trozet , will i have to create a downstream only ovn-k patch to add this featuregate restriction to the CRD?
yes, in CNO
A new `TrafficSelector` field is added to `EgressIP` spec which uses Kubernetes label selector semantics | ||
to select a new CRD called `EgressIPTraffic` which contains a list of destination networks. This list of networks defines when an `EgressIP` should | ||
be used as source IP when an application attempts to talk to a destination address, and it is within one of the predefined networks | ||
defined within a `EgressIPTraffic` CR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows a user to define the external networks once per network (ie in the EgressIPTraffic CR) which is really nice for maintenance. If a newly created subnet is reachable behind an interface the cluster admin only needs to update the EgressIPTraffic CR which contains the reachable routes for that interface and all EgressIPs making use of it will be updated.
One question...
- When adding another reachable subnet via an interface, is there a difference in functionality/performance if the user creates an additional EgressIPTraffic CR (with the same label) vs adding an additional CIDR within the existing EgressIPTraffic CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope - no data path performance difference and negligible control plane cpu usage increase.
With the Kubernetes label selector semantics, cluster administrators can define a set of destination networks that multiple `EgressIP` | ||
CRs can consume. | ||
|
||
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the reverse "overlap" occurs where two EgressIP on the same pod have overlapping destination CIDR? It could be the same EgressIPTraffic matching two EIP, or separate EgressIPTraffic with overlapping CIDRs on the two EIP.
CRs can consume. | ||
|
||
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one | ||
of the networks, then the pods source IP will be the `EgressIP`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm... Is it correct that in this case the packet would egress at the node where the pod is hosted (instead of being forwarded to one of the egress-able nodes)?
If greater than one `EgressIPTraffic` CRs are selected, the networks are added together, and if a destination IP is within one | ||
of the networks, then the pods source IP will be the `EgressIP`. | ||
If `TrafficSelector` is not specific, existing `EgressIP` behaviour is conserved. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining this question with the previous one (a selector of 0.0.0.0/0). I would anticipate that a common pattern would be a pod with 2 egress networks where one is the "default" and the other hosts specific functionality on a given subnet. In this scenario the user would likely set EIP-A with a specific traffic selector (eg 1.2.3.0/24) and EIB-B with the default 0.0.0.0/0 to catch everything else.
- Would these be treated as overlapping and fall into the one-active-one-standby scenario?
- If so, would it be possible to allow at least this narrower use of overlap where one of the EgressIPTraffic contains the default route? Or could one EgressIPTraffic have an
isDefault: <true|false>
specifier?
// match this pod selector. | ||
// +optional | ||
PodSelector metav1.LabelSelector `json:"podSelector,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be an update to the EgressIP status fields to show what EgressIPTraffic CRs are bound and/or the set of resolved destination CIDR applicable to this EIP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trozet WDYT? I am slow to add a new status field for this because it would be optionally filled.
All traffic egress-ing the cluster will be SNAT'd. | ||
|
||
If a user defines an `EgressIP` CR and populates the `TrafficSelector` field with a selector, and it matches or does not match an `EgressIPTraffic` | ||
CR but regardless, there are no defined networks among the selected `EgressIPTraffic` CRs, then no cluster external traffic is SNAT'd. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also the same case as the question @trozet asked above about a packet which does not fall within any CIDR in any selected EgressIPTraffic CRs? In this case there is no SNAT due to EgressIP but the packet may (would?) be SNAT to the node's interface IP as it egresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its sent out the primary interface and SNAT'd to that IP.
## Proposal | ||
|
||
A new `TrafficSelector` field is added to `EgressIP` spec which uses Kubernetes label selector semantics | ||
to select a new CRD called `EgressIPTraffic` which contains a list of destination networks. This list of networks defines when an `EgressIP` should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is implied, but to confirm, these aspects of EgressIP do not change as a result of this enhancement (either with or without a traffic selector applied):
- Traffic egressing under an EIP (either without traffic selector or as a result of being selected) will be load balanced across multiple IP addresses defined within that EIP.
- Placement of IP addresses from an EIP is based on egress-assignable nodes and interfaces with appropriate subnets to host the IP.
One aspect of this is that the user may design networking connectivity as desired/needed. The multiple egress networks (EIP) do not necessarily need to be connected to the same node(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Traffic egressing under an EIP (either without traffic selector or as a result of being selected) will be load balanced across multiple IP addresses defined within that EIP.
yes
Placement of IP addresses from an EIP is based on egress-assignable nodes and interfaces with appropriate subnets to host the IP.
yes
The cluster admin labels this node egress-able. | ||
The cluster admin creates two `EgressIPTraffic` CRs, specifies a destination network in each and labels each CR differently. | ||
The cluster admin creates two `EgressIP` CRs each containing an IP that falls within the previously added networks range. | ||
The two `EgressIP`s selects the same set of pods but different `TrafficSelector`s to select the previously created `EgressIPTraffic` CRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be an invalid configuration if the EgressIP CRs had IP addresses (not traffic selection cidr) which overlap on the same subnet (ie potentially would egress the same interface on the egress-able nodes)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not following - can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just a theoretical case...
EIP-a has eip 1.1.1.1 and traffic selector of 10.10.10.0/24
EIP-b has eip 1.1.1.2 and traffic selector of 11.11.11.0/24
Both EIP could be bound to the same interface on an egress node, but traffic going to the 10 net gets SNAT to 1.1.1.1 and traffic going to the 11 net gets SNAT to 1.1.1.2.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
1435ec2
to
e1c548f
Compare
@trozet @cgoncalves @deads2k @imiller0 Ive updated the enhancement following your comments. PTAL. |
@martinkennelly: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Debugging / introspection of this feature is not defined within this enhancement and requires definition. I will add. |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
/remove-lifecycle rotten |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
/remove-lifecycle rotten |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
/hold
/cc