 		// if stateless CNI fails to get the endpoint from CNS for any reason other than Endpoint Not found or CNS connection failure
-		// return a retriable error so the container runtime will retry this DEL later
-		// the implementation of this function returns nil if the endpoint doesn't exist, so
-		// we don't have to check that here
-		if err != nil {
-			switch {
-			case errors.Is(err, network.ErrConnectionFailure):
-				logger.Error("Failed to connect to CNS", zap.Error(err))
-				logger.Info("Endpoint will be deleted from state file asynchronously", zap.String("containerID", args.ContainerID))
-				// In SwiftV2 Linux stateless CNI mode, if the plugin cannot connect to CNS,
-				// we asynchronously remove the secondary (delegated) interface from the pod’s network namespace in the absence of the endpoint state.
-				// This is necessary because leaving the delegated NIC in the pod netns can cause the kernel to block rtnetlink operations.
-				// When that happens, kubelet and containerd hang during sandbox creation or teardown.
-				// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
-				// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
-				// This workaround mitigates the issue by removing the secondary NIC from the pod netns when CNS is unreachable during DEL to provide the endpoint state.
 		return plugin.RetriableError(fmt.Errorf("failed to retrieve endpoint: %w", err))
 	}
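For readers following the error handling here: the `%w` verb keeps the original cause reachable after it has been wrapped and marked retriable. The sketch below is a minimal illustration, not the repository's implementation. `retriableError`, `isRetriable`, `deleteEndpoint`, `errLookupFailed`, and the two lookup helpers are hypothetical names invented for this example, and the real `plugin.RetriableError` may be implemented differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errLookupFailed is a hypothetical sentinel used only for this sketch.
var errLookupFailed = errors.New("lookup failed")

// retriableError is a hypothetical stand-in for the value that
// plugin.RetriableError returns; the real type may look different.
type retriableError struct{ err error }

func (e *retriableError) Error() string { return e.err.Error() }
func (e *retriableError) Unwrap() error { return e.err }

// isRetriable reports whether any error in the chain is a *retriableError.
func isRetriable(err error) bool {
	var re *retriableError
	return errors.As(err, &re)
}

// deleteEndpoint wraps a failed lookup with %w and marks it retriable,
// following the shape of the return statement in the diff.
func deleteEndpoint(lookup func() error) error {
	if err := lookup(); err != nil {
		return &retriableError{err: fmt.Errorf("failed to retrieve endpoint: %w", err)}
	}
	return nil
}

func failingLookup() error { return errLookupFailed }
func okLookup() error      { return nil }

func main() {
	err := deleteEndpoint(failingLookup)
	fmt.Println(isRetriable(err))                // the caller may retry the DEL
	fmt.Println(errors.Is(err, errLookupFailed)) // the original cause is still reachable
	fmt.Println(deleteEndpoint(okLookup) == nil)
}
```

Because the wrapper implements `Unwrap`, callers can test both for "should I retry" and for the specific cause without parsing error strings.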
-
-	// for Stateful CNI when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
-	// this block is applied to stateless CNI only if there was a connection failure in previous block and asynchronous delete by CNS will remove the endpoint from state file
+	// when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
 		return plugin.RetriableError(fmt.Errorf("failed to release address: %w", err))
+	// This is a special case for stateless CNI when Asynchronous DEL to CNS will take place for SwiftV2 after the endpointinfo is recreated locally
+	// At this point the endpoint is already deleted and since it is created locally the IPAddress is nil. CNS will release the IP asynchronously whenever it is up
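The comment above implies that a locally rebuilt endpoint carries no IP addresses, so the plugin has nothing to release and CNS does that work later. The code that follows the comment is not visible in this diff, so the sketch below only illustrates that idea under stated assumptions. `endpointInfo`, `newEndpoint`, `releaseAddresses`, and `noopRelease` are hypothetical names, not the repository's types or functions.

```go
package main

import (
	"fmt"
	"net"
)

// endpointInfo is a hypothetical, trimmed-down stand-in for the real endpoint info.
type endpointInfo struct {
	EndpointID  string
	IPAddresses []net.IPNet // empty when the info was rebuilt locally
}

// newEndpoint builds an endpointInfo from CIDR strings (helper for this sketch).
func newEndpoint(id string, cidrs ...string) endpointInfo {
	ep := endpointInfo{EndpointID: id}
	for _, c := range cidrs {
		ip, ipNet, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		ipNet.IP = ip // keep the host address rather than the network address
		ep.IPAddresses = append(ep.IPAddresses, *ipNet)
	}
	return ep
}

// releaseAddresses calls release once per address and returns how many
// addresses were released. With no addresses it is a no-op.
func releaseAddresses(ep endpointInfo, release func(net.IPNet) error) (int, error) {
	released := 0
	for _, addr := range ep.IPAddresses {
		if err := release(addr); err != nil {
			return released, fmt.Errorf("failed to release address: %w", err)
		}
		released++
	}
	return released, nil
}

// noopRelease stands in for the real IPAM release call.
func noopRelease(net.IPNet) error { return nil }

func main() {
	// Rebuilt locally: no IPs, so nothing is released by the plugin.
	n, err := releaseAddresses(newEndpoint("rebuilt-locally"), noopRelease)
	fmt.Println(n, err)

	// Retrieved with state: every recorded address is released.
	n, err = releaseAddresses(newEndpoint("from-state", "10.0.0.4/24", "10.0.0.5/24"), noopRelease)
	fmt.Println(n, err)
}
```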
 // GetEndpointInfos gets all endpoint infos associated with a container id and networkID
+// In stateless CNI mode, it calls CNS GetEndpointState API to get the endpoint infos or generates them locally if CNS is unreachable in SwiftV2 mode and AsyncDelete is enabled
+// In stateful CNI mode, it fetches the endpoint infos by calling GetEndpointInfosFromContainerID
 			logger.Info("Failed to connect to CNS, endpoint will be deleted from state file asynchronously", zap.String("containerID", args.ContainerID))
+			// In SwiftV2 Linux stateless CNI mode, if the plugin cannot connect to CNS,
+			// we still have to remove the secondary (delegated) interface from the pod’s network namespace in the absence of the endpoint state.
+			// This is necessary because leaving the delegated NIC in the pod netns can cause the kernel to block rtnetlink operations.
+			// When that happens, kubelet and containerd hang during sandbox creation or teardown.
+			// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
+			// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
+			// This workaround mitigates the issue by generating a minimal endpointInfo via containerd args and netlink APIs that can be then passed to DeleteEndpoint API.
+			epInfos, err = nm.generateEndpointLocally(args)
+			if err != nil {
+				logger.Error("Failed to fetch secondary endpoint from pod netns", zap.String("netns", args.Netns), zap.Error(err))
+				return nil, errors.Wrap(err, "failed to fetch secondary interfaces")
+			}
+		case errors.Is(err, ErrEndpointStateNotFound):
+			logger.Info("Endpoint Not found", zap.String("containerID", args.ContainerID), zap.Error(err))
+			return nil, nil
+		default:
+			logger.Error("Get Endpoint State API returned error", zap.String("containerID", args.ContainerID), zap.Error(err))
+			return nil, ErrEndpointRetrievalFailure
+		}
+	}
+	for _, epInfo := range epInfos {
+		logger.Info("Found endpoint to delete", zap.String("IfName", epInfo.IfName), zap.String("EndpointID", epInfo.EndpointID), zap.Any("NICType", epInfo.NICType))
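The three-way switch added in this hunk can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the repository's code: the sentinel errors are local copies named after the ones in the diff, `fetchFunc`, `getEndpointInfos`, and the `cns*`/`localOne` helpers are hypothetical, and the SwiftV2 and AsyncDelete checks mentioned in the doc comment are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins named after the sentinels referenced in the diff.
var (
	errConnectionFailure        = errors.New("connection failure")
	errEndpointStateNotFound    = errors.New("endpoint state not found")
	errEndpointRetrievalFailure = errors.New("endpoint retrieval failure")
)

// endpointInfo is a hypothetical, trimmed-down endpoint record.
type endpointInfo struct {
	IfName     string
	EndpointID string
}

// fetchFunc is a hypothetical abstraction over "ask CNS" and "build locally".
type fetchFunc func(containerID string) ([]endpointInfo, error)

// getEndpointInfos tries CNS first. On a connection failure it falls back to
// locally generated infos, on not-found it reports nothing to delete, and on
// any other error it returns a generic retrieval failure.
func getEndpointInfos(containerID string, fromCNS, local fetchFunc) ([]endpointInfo, error) {
	epInfos, err := fromCNS(containerID)
	if err == nil {
		return epInfos, nil
	}
	switch {
	case errors.Is(err, errConnectionFailure):
		epInfos, err = local(containerID)
		if err != nil {
			return nil, fmt.Errorf("failed to fetch secondary interfaces: %w", err)
		}
		return epInfos, nil
	case errors.Is(err, errEndpointStateNotFound):
		return nil, nil
	default:
		return nil, errEndpointRetrievalFailure
	}
}

// Fake backends used to exercise each branch.
func cnsDown(string) ([]endpointInfo, error) {
	return nil, fmt.Errorf("GetEndpointState: %w", errConnectionFailure)
}
func cnsNotFound(string) ([]endpointInfo, error) { return nil, errEndpointStateNotFound }
func cnsBroken(string) ([]endpointInfo, error)   { return nil, errors.New("unexpected response") }
func localOne(containerID string) ([]endpointInfo, error) {
	return []endpointInfo{{IfName: "eth1", EndpointID: containerID + "-eth1"}}, nil
}

func main() {
	eps, err := getEndpointInfos("c1", cnsDown, localOne)
	fmt.Println(len(eps), err) // fallback path produced one endpoint

	eps, err = getEndpointInfos("c1", cnsNotFound, localOne)
	fmt.Println(eps == nil, err) // nothing to delete, no error

	_, err = getEndpointInfos("c1", cnsBroken, localOne)
	fmt.Println(err == errEndpointRetrievalFailure)
}
```

Passing the two sources in as functions keeps the branch logic testable without a running CNS or a real network namespace.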