Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify some functions in pkg/agent/route #3889

Merged
merged 1 commit into from
Apr 4, 2023

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Jun 13, 2022

This PR unifies the functions in route_linux.go and route_windows.go:

  • AddLoadBalancer
  • DeleteLoadBalancer

Remove:

  • DeleteClusterIPRoute
  • AddClusterIPRoute

Fix:

  • Add the route for calculated ClusterIP CIDR to Service route cache,
    then the route can be restored when it was deleted since members of
    the route cache are synchronized periodically.

Signed-off-by: Hongliang Liu lhongliang@vmware.com

@hongliangl hongliangl added action/release-note Indicates a PR that should be included in release notes. area/OS/windows Issues or PRs related to the Windows operating system. labels Jun 13, 2022
@hongliangl
Copy link
Contributor Author

/test-windows-proxyall-e2e

@hongliangl
Copy link
Contributor Author

/test-windows-proxyall-e2e

@hongliangl
Copy link
Contributor Author

/test-windows-e2e

@hongliangl
Copy link
Contributor Author

/test-all

@hongliangl
Copy link
Contributor Author

/test-e2e

pkg/agent/route/route_windows.go Show resolved Hide resolved
return fmt.Errorf("error listing ip routes: %w", err)
}
for _, rt := range routes {
if rt.GatewayAddress.Equal(gw) && !rt.DestinationSubnet.IP.Equal(svcIP) && rt.DestinationSubnet.Contains(svcIP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly installed route in line 269 also satisfies this condition, so is it possible it is removed by mistake?

Copy link
Contributor Author

@hongliangl hongliangl Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since AntreaProxy only has one worker (goroutine), it cannot be removed by mistake since it is only used when the first Service is installed, and curClusterIPCIDR is nil.

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
nodeConfig *config.NodeConfig
networkConfig *config.NetworkConfig
nodeRoutes *sync.Map
serviceRoutes *sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find functions to load route entry from serviceRoutes, all calls are for store or delete. Is this cache really needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments to explain what is stored in serviceRoutes?

pkg/agent/route/route_windows.go Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 21, 2022

Codecov Report

Merging #3889 (161692c) into main (6590c2a) will decrease coverage by 4.15%.
The diff coverage is 21.05%.

❗ Current head 161692c differs from pull request most recent head 3975a91. Consider uploading reports for the commit 3975a91 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3889      +/-   ##
==========================================
- Coverage   71.95%   67.81%   -4.15%     
==========================================
  Files         406      403       -3     
  Lines       60766    58449    -2317     
==========================================
- Hits        43723    39635    -4088     
- Misses      14113    16046    +1933     
+ Partials     2930     2768     -162     
Flag Coverage Δ *Carryforward flag
e2e-tests 38.59% <21.05%> (+0.20%) ⬆️
integration-tests 34.47% <ø> (+0.69%) ⬆️ Carriedforward from 3ecf6fd
kind-e2e-tests 38.55% <ø> (-8.56%) ⬇️ Carriedforward from 3ecf6fd
unit-tests 59.97% <ø> (-2.65%) ⬇️ Carriedforward from 3ecf6fd

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <ø> (ø)
pkg/agent/agent.go 55.87% <ø> (-1.20%) ⬇️
...gent/controller/noderoute/node_route_controller.go 63.93% <0.00%> (-5.53%) ⬇️
pkg/agent/nodeportlocal/rules/netnat_rule.go 0.00% <ø> (ø)
pkg/agent/proxy/proxier.go 66.40% <ø> (-1.96%) ⬇️
pkg/agent/route/route_windows.go 30.76% <ø> (-6.54%) ⬇️
pkg/agent/util/net.go 47.56% <0.00%> (-35.39%) ⬇️
pkg/agent/util/net_windows.go 7.25% <ø> (-76.68%) ⬇️
pkg/agent/route/route_linux.go 69.42% <23.52%> (-4.66%) ⬇️

... and 106 files with indirect coverage changes

pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
}
}
}

// Remove stale routes.
for _, rt := range staleRoutes {
if err = netlink.RouteDel(rt); err != nil {
return fmt.Errorf("failed to uninstall stale ClusterIP route %s: %w", rt.String(), err)
if err.Error() == "no such process" {
klog.InfoS("Failed to delete stale ClusterIP route since the route doesn't exist", "route", route)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this log if the routing entry does not exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was introduced in this PR https://github.com/antrea-io/antrea/pull/3827/files.

for i := 0; i < len(routes); i++ {
if routes[i].Gw.Equal(gw) && !routes[i].Dst.IP.Equal(svcIP) && routes[i].Dst.Contains(svcIP) {
// Skip the route whose destination CIDR only has one address.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip the routing entry for a single IP address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The routing entry for a single IP address (mask is 32 or 128) could be for LoadBalancer or the first ClusterIP, and such routing entry should be reserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code comment is unclear and puzzling.
And I feel the above explanation still doesn't make sense. It will never delete stale LoadBalancer IPs, and it doesn't make sense to handle first ClusterIP in this way.

Copy link
Contributor Author

@hongliangl hongliangl Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

// Skip the route whose destination CIDR has only one address since it could be the route for the first
// ClusterIP newly installed. Note that, this could also match a route for a LoadBalancer ingress IP, but
// it doesn't affect skipping the route for the first ClusterIP.

@tnqn

nodeConfig *config.NodeConfig
networkConfig *config.NetworkConfig
nodeRoutes *sync.Map
serviceRoutes *sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments to explain what is stored in serviceRoutes?

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/util/net_windows.go Show resolved Hide resolved
@hongliangl
Copy link
Contributor Author

/test-windows-proxyall-e2e
/test-windows-e2e
/test-e2e

@hongliangl
Copy link
Contributor Author

/test-windows-proxyall-e2e
/test-windows-e2e

pkg/agent/route/route_windows.go Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
route := generateRoute(svcIPNet, gw, linkIndex, metric)
if err := util.RemoveNetRoute(route); err != nil {
if strings.Contains(err.Error(), "No matching MSFT_NetRoute objects") {
klog.InfoS("Failed to delete LoadBalancer ingress IP route since the route doesn't exist", "route", route)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this log since the final result is the route is not existing, which is as expected?

Copy link
Contributor Author

@hongliangl hongliangl Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to consistent with Linux code.

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
@hongliangl
Copy link
Contributor Author

/test-windows-proxyall-e2e
/test-windows-e2e

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, otherwise LGTM

pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
Comment on lines 168 to 170
if reflect.DeepEqual(rt.DestinationSubnet, util.NewIPNet(c.nodeConfig.GatewayConfig.IPv4)) {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there a bug causing this route to be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the comment #3889 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know why it is added. But was there a bug causing this route to be deleted? If yes, what's the consequence of missing this route?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was no bug causing this route to be deleted. The route could be only deleted manually by mistake. This is copied from Linux code, see

// These routes are installed automatically by the kernel when the address is configured on
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does listRoutes contain the route? If yes, does the checks in Reconcile exclude the route?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, add IP 192.168.2.1 to interface antrea-gw0, and the following routes will be installed automatically.

    192.168.2.0    255.255.255.0                  On-Link     192.168.2.1    281
    192.168.2.1 255.255.255.255            On-Link      192.168.2.1   281
    192.168.2.255  255.255.255.255           On-Link      192.168.2.1    281

In listRoutes, the broadcast route (the third one) will be ignored.

if reflect.DeepEqual(rt.DestinationSubnet, c.nodeConfig.PodIPv4CIDR) {
      continue
}

The code above is used to exclude the first route in Reconcile.

if reflect.DeepEqual(rt.DestinationSubnet, util.NewIPNet(c.nodeConfig.GatewayConfig.IPv4)) {
      continue
}

The code above is used to exclude the second route in Reconcile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of this, but it's not related to my question:

But was there a bug causing this route to be deleted? If yes, what's the consequence of missing this route?

Not sure how to further explain. The check is newly added, right? Then how did it work previously if the route was removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is newly added, right?

Yes, the check is newly added.

Then how did it work previously if the route was removed?

If I remember correctly, if the route is removed, it will not be restored.

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/util/net_windows.go Outdated Show resolved Hide resolved
@hongliangl
Copy link
Contributor Author

/test-all
/test-windows-all

// 10.10.0.0 255.255.255.0 On-link 10.10.0.1 281
// 10.10.0.255 255.255.255.255 On-link 10.10.0.1 281
// The host (10.10.0.1) and broadcast (10.10.0.255) routes should be ignored since they are unnecessary and redundant.
// We can ignore them by comparing them to the calculated broadcast IP.
if !rt.GatewayAddress.Equal(config.VirtualServiceIPv4) && rt.DestinationSubnet.IP.Equal(iputil.GetLocalBroadcastIP(rt.DestinationSubnet)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host route is ignored here since the broadcast IP of the destination IP in host route is the destination itself. @tnqn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then should we remove the new check added here https://github.com/antrea-io/antrea/pull/3889/files/720023a9ed444a790842d8a62db295f70161d315#r1153399542? otherwise the code is contradictory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should remove the check since the route it should match will not be collected in Antrea.

// 10.10.0.0 255.255.255.0 On-link 10.10.0.1 281
// 10.10.0.255 255.255.255.255 On-link 10.10.0.1 281
// The host (10.10.0.1) and broadcast (10.10.0.255) routes should be ignored since they are unnecessary and redundant.
// We can ignore them by comparing them to the calculated broadcast IP.
if !rt.GatewayAddress.Equal(config.VirtualServiceIPv4) && rt.DestinationSubnet.IP.Equal(iputil.GetLocalBroadcastIP(rt.DestinationSubnet)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then should we remove the new check added here https://github.com/antrea-io/antrea/pull/3889/files/720023a9ed444a790842d8a62db295f70161d315#r1153399542? otherwise the code is contradictory.

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
This PR unifies the functions in route_linux.go and route_windows.go:
  - AddLoadBalancer
  - DeleteLoadBalancer

Remove:
  - DeleteClusterIPRoute
  - AddClusterIPRoute

Fix:
  - Add the route for calculated ClusterIP CIDR to Service route cache,
    then the route can be restored when it was deleted since members of
    the route cache are synchronized periodically.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Mar 31, 2023

/test-all
/test-windows-all

@hongliangl
Copy link
Contributor Author

/test-windows-networkpolicy
/test-windows-conformance
/test-windows-networkpolicy

@hongliangl
Copy link
Contributor Author

/test-windows-containerd-networkpolicy

@hongliangl
Copy link
Contributor Author

/test-windows-conformance

@hongliangl
Copy link
Contributor Author

/test-windows-containerd-networkpolicy

@jianjuns jianjuns merged commit 6143e78 into antrea-io:main Apr 4, 2023
XinShuYang added a commit to XinShuYang/antrea that referenced this pull request Apr 5, 2023
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023
This commit unifies the following functions in route_linux.go and
route_windows.go:
  - AddLoadBalancer
  - DeleteLoadBalancer

Removes:
  - DeleteClusterIPRoute
  - AddClusterIPRoute

Adds a fix to restore the ClusterIP CIDR route after it is deleted, by saving
the route to the Service route cache which will be re-sync'd periodically.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
@tnqn tnqn mentioned this pull request May 24, 2023
@hongliangl hongliangl deleted the unify-route branch December 5, 2023 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. area/OS/windows Issues or PRs related to the Windows operating system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants