Improve AntreaProxy route syncing on Windows (antrea-io#4941)
This PR fixes the following issues:

1. AntreaAgent logs "Failed to sync route" every time it attempts to sync route
  entries.
2. AntreaAgent logs "Failed to install route for Service CIDR" err="failed to
  delete stale Service CIDR route" during startup.

For the first issue, to recover the connected route of antrea-gw0 (assuming the
IP address is 10.10.0.1/24) in case it had been deleted by mistake, a route with
destination 10.10.0.1/24 and gateway 10.10.0.1 was previously synced
periodically. However, this caused an error because an active route with the
same destination but a different gateway, 0.0.0.0, should already have been
installed automatically when antrea-gw0 was created. To address this issue, this
PR changes the gateway of the recovery route from 10.10.0.1 to 0.0.0.0, which
matches the existing installed route, so the periodic sync no longer causes
errors.

For the second issue, previously, when syncing the second ClusterIP, the stale
route entry installed for the first ClusterIP was added to the list of stale
routes twice, which resulted in the error log.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl authored and ceclinux committed May 30, 2023
1 parent b3a28d3 commit d6c5525
Showing 2 changed files with 25 additions and 14 deletions.
35 changes: 23 additions & 12 deletions pkg/agent/route/route_windows.go
@@ -285,23 +285,34 @@ func (c *Client) addServiceCIDRRoute(serviceCIDR *net.IPNet) error {
 	// a new route with a newly calculated destination CIDR has been installed.
 	if serviceCIDRRouteExists {
 		staleRoutes = append(staleRoutes, oldServiceCIDRRoute.(*util.Route))
-	}
-	routes, err := c.listIPRoutesOnGW()
-	if err != nil {
-		return fmt.Errorf("error listing ip routes: %w", err)
-	}
-	// Collect stale per-IP routes for ClusterIPs before this patch.
-	for _, rt := range routes {
-		ones, _ := rt.DestinationSubnet.Mask.Size()
-		if ones == net.IPv4len*8 && serviceCIDR.Contains(rt.DestinationSubnet.IP) {
-			staleRoutes = append(staleRoutes, rt)
+	} else {
+		routes, err := c.listIPRoutesOnGW()
+		if err != nil {
+			return fmt.Errorf("error listing ip routes: %w", err)
+		}
+		for _, rt := range routes {
+			if !rt.GatewayAddress.Equal(gw) {
+				continue
+			}
+			// It's the latest route we just installed.
+			if iputil.IPNetEqual(rt.DestinationSubnet, serviceCIDR) {
+				continue
+			}
+			// The route covers the desired route. It was installed when the calculated ServiceCIDR is larger than the current one, which could happen after some Services are deleted.
+			if iputil.IPNetContains(rt.DestinationSubnet, serviceCIDR) {
+				staleRoutes = append(staleRoutes, rt)
+			}
+			// The desired route covers the route. It was either installed when the calculated ServiceCIDR is smaller than the current one, or a per-IP route generated before v1.12.0.
+			if iputil.IPNetContains(serviceCIDR, rt.DestinationSubnet) {
+				staleRoutes = append(staleRoutes, rt)
+			}
 		}
 	}

 	// Remove stale routes.
 	for _, rt := range staleRoutes {
 		if err := util.RemoveNetRoute(rt); err != nil {
-			if err.Error() == "No matching MSFT_NetRoute objects" {
+			if strings.Contains(err.Error(), "No matching MSFT_NetRoute objects") {
 				klog.InfoS("Failed to delete stale Service CIDR route since the route has been deleted", "route", rt)
 			} else {
 				return fmt.Errorf("failed to delete stale Service CIDR route %s: %w", rt.String(), err)
@@ -383,7 +394,7 @@ func (c *Client) syncRoute() error {
 	gwAutoconfRoute := &util.Route{
 		LinkIndex:         c.nodeConfig.GatewayConfig.LinkIndex,
 		DestinationSubnet: c.nodeConfig.PodIPv4CIDR,
-		GatewayAddress:    c.nodeConfig.GatewayConfig.IPv4,
+		GatewayAddress:    net.IPv4zero,
 		RouteMetric:       util.MetricDefault,
 	}
 	restoreRoute(gwAutoconfRoute)
4 changes: 2 additions & 2 deletions pkg/agent/util/powershell/powershell_windows.go
@@ -26,8 +26,8 @@ func RunCommand(cmd string) (string, error) {
 	// The try/catch command idea is from the following page:
 	// https://stackoverflow.com/questions/19282870/how-can-i-use-try-catch-and-get-my-script-to-stop-if-theres-an-error/19285405
 	psCmd := exec.Command("powershell.exe", "-NoLogo", "-NoProfile", "-NonInteractive", "-Command",
-		fmt.Sprintf(`$ErrorActionPreference="Stop";try {%s} catch {Write-Host $_;os.Exit(1)}`, cmd)) // #nosec G204
-	stdout, err := psCmd.Output()
+		fmt.Sprintf(`$ErrorActionPreference="Stop";try {%s} catch {Write-Host $_;Exit(1)}`, cmd)) // #nosec G204
+	stdout, err := psCmd.CombinedOutput()
 	stdoutStr := string(stdout)
 	if err != nil {
 		return "", fmt.Errorf("failed to run command '%s': output '%s', %v", cmd, stdoutStr, err)