-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WireGuard tunnels for Multi-cluster traffic #4606
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4606 +/- ##
==========================================
- Coverage 69.79% 69.60% -0.20%
==========================================
Files 402 403 +1
Lines 59570 59981 +411
==========================================
+ Hits 41579 41747 +168
- Misses 15189 15424 +235
- Partials 2802 2810 +8
*This pull request uses carry forward flags. Click here to find out more.
|
@hjiajing please update your PR summary with well-structure sentences, and also include summary in your top commit to help developer to check what does the commit do without checking the PR in the future. |
2d4d744
to
5ca2c28
Compare
08769ca
to
2154901
Compare
014d4ae
to
4ccdccc
Compare
multicluster/controllers/multicluster/member/gateway_controller.go
Outdated
Show resolved
Hide resolved
pkg/agent/openflow/pipeline.go
Outdated
@@ -377,7 +377,8 @@ var DispositionToString = map[uint32]string{ | |||
var ( | |||
// snatPktMarkRange takes an 8-bit range of pkt_mark to store the ID of | |||
// a SNAT IP. The bit range must match SNATIPMarkMask. | |||
snatPktMarkRange = &binding.Range{0, 7} | |||
snatPktMarkRange = &binding.Range{0, 7} | |||
multiclusterPktMarkRange = &binding.Range{0, 31} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to add a comment for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need this range as the mark is constant. LoadPktMarkRange accepts nil as the second parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One bit is enough if the mark is constant, like 0x10/0x10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
multicluster/controllers/multicluster/member/gateway_controller.go
Outdated
Show resolved
Hide resolved
e25e55c
to
f5fe28f
Compare
pkg/agent/openflow/pipeline.go
Outdated
@@ -377,7 +377,8 @@ var DispositionToString = map[uint32]string{ | |||
var ( | |||
// snatPktMarkRange takes an 8-bit range of pkt_mark to store the ID of | |||
// a SNAT IP. The bit range must match SNATIPMarkMask. | |||
snatPktMarkRange = &binding.Range{0, 7} | |||
snatPktMarkRange = &binding.Range{0, 7} | |||
multiclusterPktMarkRange = &binding.Range{0, 31} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need this range as the mark is constant. LoadPktMarkRange accepts nil as the second parameter.
pkg/agent/openflow/multicluster.go
Outdated
|
||
// MulticlusterPktMark is the packet marker used for mark the cross-cluster | ||
// traffic in Antrea Multi-cluster. | ||
MulticlusterPktMark uint32 = 0x4d2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if 0x4d2 (1234 in decimal) is a meaningful value here. Egress allocates [1,255] for the pkt_mark. So in theory any value above 255 should be OK. But we might want to reservice more bits in case we want to extend the egress feature in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Just use one bit now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some description about how the traffic of a Pod is forwarded to another cluster via WireGuard.
A question about the design: why policy-based routing (ip rule, pkt mark and another route table) is used? Is there any possible to implement the target by adding static route in main route table? If we introduce policy-based routing in this PR, do you think we can use policy-based routing in other components of Antrea @jianjuns @tnqn ?
pkg/agent/openflow/pipeline.go
Outdated
@@ -377,7 +377,8 @@ var DispositionToString = map[uint32]string{ | |||
var ( | |||
// snatPktMarkRange takes an 8-bit range of pkt_mark to store the ID of | |||
// a SNAT IP. The bit range must match SNATIPMarkMask. | |||
snatPktMarkRange = &binding.Range{0, 7} | |||
snatPktMarkRange = &binding.Range{0, 7} | |||
multiclusterPktMarkRange = &binding.Range{0, 31} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One bit is enough if the mark is constant, like 0x10/0x10.
} | ||
|
||
func (c *MCRouteController) initializeWireGuardInterface() error { | ||
c.wireGuardConfig.MTU = c.nodeConfig.NodeMTU - config.WireGuardOverhead - config.GeneveOverhead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why geneve overhead is included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, @hjiajing you should check current tunnel type in NetworkConfig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the advice, will do it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use NetworkConfig.MTUDeduction
in Lan's PR #4407.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminder, will rebase and fix it .
9017a57
to
d979b4e
Compare
/test-all |
@hjiajing could you check if possible to improve the patch test coverage? |
@hjiajing please resolve the file conflict. |
/test-all |
/test-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, a few nits on the test file. @jianjuns could you take a look again? thanks.
multicluster/config/crd/bases/multicluster.crd.antrea.io_gateways.yaml
Outdated
Show resolved
Hide resolved
pkg/agent/route/route_linux.go
Outdated
LinkIndex: linkIndex, | ||
} | ||
|
||
return c.netlink.RouteDel(route) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should check the error is not NotFound, otherwise if the route doesn't exist somehow, the caller will keep retrying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A check is added, when got "no such process", which means the route does not exist, the function will return nil
e02018e
to
0055bef
Compare
@tnqns Hi Quan. I have addressed the comments you left, but after I pushed the latest code, some comments disappeared both in the conversation and |
remoteWireGuardIP := serviceIP.Mask(serviceCIDR.Mask) | ||
remoteWireGuardNet := &net.IPNet{IP: remoteWireGuardIP, Mask: net.CIDRMask(32, 32)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the value is provided be user, not discovered by program?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, some minor comments
_, serviceCIDR, err := net.ParseCIDR(spec.ServiceCIDR) | ||
if err != nil { | ||
klog.Warningf("Failed to get the peer Gateway IP for WireGuard, error: %v", err) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can now remove the error handling of net.ParseCIDR
assuming the CIDR in the spec is either empty or valid. We should validate input before it's accepted and no need to validate it again when consuming it.
func getLocalGatewayIP(gateway *mcv1alpha1.Gateway, enableWireGuard bool) net.IP { | ||
if enableWireGuard { | ||
if gateway.ServiceCIDR == "" { | ||
klog.InfoS("The ServiceCIDR has not been updated, skip.", "Gateway", klog.KObj(gateway)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.InfoS("The ServiceCIDR has not been updated, skip.", "Gateway", klog.KObj(gateway)) | |
klog.InfoS("The ServiceCIDR of the Gateway has not been updated, skip it", "Gateway", klog.KObj(gateway)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if err != nil { | ||
klog.Warningf("Failed to get the local Gateway IP for WireGuard, error: %v", err) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
if cache.Spec.ServiceCIDR == cur.Spec.ServiceCIDR && cache.Spec.WireGuard != nil && cur.Spec.WireGuard != nil && | ||
cache.Spec.WireGuard.PublicKey == cur.Spec.WireGuard.PublicKey { | ||
return false | ||
} | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should return false when both old WireGuard and new WireGuard are nil.
if cache.Spec.ServiceCIDR == cur.Spec.ServiceCIDR && cache.Spec.WireGuard != nil && cur.Spec.WireGuard != nil && | |
cache.Spec.WireGuard.PublicKey == cur.Spec.WireGuard.PublicKey { | |
return false | |
} | |
return true | |
if cache.Spec.ServiceCIDR != cur.Spec.ServiceCIDR { | |
return true | |
} | |
if cache.Spec.WireGuard == nil && cur.Spec.WireGuard == nil { | |
return false | |
} | |
if cache.Spec.WireGuard == nil || cur.Spec.WireGuard == nil { | |
return true | |
} | |
return cache.Spec.WireGuard.PublicKey != cur.Spec.WireGuard.PublicKey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/agent/route/route_windows.go
Outdated
} | ||
|
||
func (c *Client) DeleteRouteForLink(dstCIDR *net.IPNet, linkIndex int) error { | ||
return errors.New("AddRouteForLink is not implemented on Windows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.New("AddRouteForLink is not implemented on Windows") | |
return errors.New("DeleteRouteForLink is not implemented on Windows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Add a traffic encryption mode for cross-cluster traffic. If WireGuard enabled, corresponding WireGuard configuration will be created on the Gateway Node. And all cross-cluster traffic will go through the WireGuard tunnel to remote Gateway. Signed-off-by: hjiajing <hjiajing@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@jianjuns @luolanzone do you have other comments for this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianjuns @luolanzone do you have other comments for this one?
I am good.
/test-multicluster-e2e |
/test-multicluster-e2e |
…ntrea-io#4606) Add a traffic encryption mode for cross-cluster traffic. If WireGuard is enabled, corresponding WireGuard configuration will be created on the Gateway Node. And all cross-cluster traffic will go through the WireGuard tunnel to remote Gateway. Signed-off-by: hjiajing <hjiajing@vmware.com>
Add WireGuard tunnels mode for Multi-cluster traffic. WireGuard tunnels will be established between member clusters and responsible for all cross-cluster traffic in this mode.
Implementation:
10.10.1.0/24
, then the WireGuard interface's IP address is10.10.1.0
.