logger.Info("add containerid file for Asynch delete", zap.String("containerID", args.ContainerID), zap.Error(addErr))
- if addErr != nil {
-     logger.Error("failed to add file to watcher", zap.String("containerID", args.ContainerID), zap.Error(addErr))
-     return errors.Wrap(addErr, fmt.Sprintf("failed to add file to watcher with containerID %s", args.ContainerID))
+ switch {
+ case errors.Is(err, network.ErrConnectionFailure):
+     logger.Error("Failed to connect to CNS", zap.Error(err))
+     logger.Info("Endpoint will be deleted from state file asynchronously", zap.String("containerID", args.ContainerID))
+     // In SwiftV2 Linux stateless CNI mode, if the plugin cannot connect to CNS,
+     // we asynchronously remove the secondary (delegated) interface from the pod’s network namespace in the absence of the endpoint state.
+     // This is necessary because leaving the delegated NIC in the pod netns can cause the kernel to block rtnetlink operations.
+     // When that happens, kubelet and containerd hang during sandbox creation or teardown.
+     // The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
+     // triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
+     // This workaround mitigates the issue by removing the secondary NIC from the pod netns when CNS is unreachable during DEL to provide the endpoint state.
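The added `switch` branch relies on `errors.Is` to recognize a connection failure even when the error has been wrapped on its way up. A minimal, self-contained sketch of that pattern follows. It is not the plugin's code: `errConnectionFailure` stands in for `network.ErrConnectionFailure`, and `errEndpointNotFound`, `getEndpointState`, `classifyDeleteError`, and the `deleteAction` values are hypothetical names introduced only for illustration. The `default` branch in particular is an assumption, since the diff does not show the other cases.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnectionFailure is a hypothetical stand-in for network.ErrConnectionFailure.
var errConnectionFailure = errors.New("connection failure")

// errEndpointNotFound is a hypothetical second sentinel, used only for contrast.
var errEndpointNotFound = errors.New("endpoint not found")

// deleteAction names what the DEL path does next (illustrative only).
type deleteAction string

const (
	actionProceed     deleteAction = "proceed"
	actionAsyncDelete deleteAction = "async-delete"
	actionFail        deleteAction = "fail"
)

// getEndpointState is a hypothetical lookup that returns a wrapped
// sentinel error when the state service cannot be reached.
func getEndpointState(cnsReachable bool) error {
	if !cnsReachable {
		return fmt.Errorf("get endpoint state: %w", errConnectionFailure)
	}
	return nil
}

// classifyDeleteError maps an error to the next step of the delete path.
// errors.Is walks the wrap chain, so the wrapped sentinel still matches.
func classifyDeleteError(err error) deleteAction {
	switch {
	case err == nil:
		return actionProceed
	case errors.Is(err, errConnectionFailure):
		// Assumed behavior: hand the cleanup off to an asynchronous path.
		return actionAsyncDelete
	default:
		// Assumed behavior: any other error fails the operation.
		return actionFail
	}
}

func main() {
	fmt.Println(classifyDeleteError(getEndpointState(true)))
	fmt.Println(classifyDeleteError(getEndpointState(false)))
	fmt.Println(classifyDeleteError(errEndpointNotFound))
}
```

Comparing with `==` instead of `errors.Is` would miss the wrapped error returned by `getEndpointState(false)`, which is why a sentinel check of this kind is usually written with `errors.Is`.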
// for when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
- // this block is not applied to stateless CNI
+ // for Stateful CNI when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
+ // this block is applied to stateless CNI only if there was a connection failure in previous block and asynchronous delete by CNS will remove the endpoint from state file
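The revised comments describe when the fallback block runs: always for stateful CNI, and for stateless CNI only after a connection failure in the preceding block. One reading of that rule can be sketched as a small predicate. The function name `appliesFallbackBlock` and its boolean parameters are hypothetical; the diff does not show how the real code expresses this condition.

```go
package main

import "fmt"

// appliesFallbackBlock is a hypothetical predicate for the rule in the
// comments: stateful CNI always takes the block; stateless CNI takes it
// only when the earlier step hit a connection failure.
func appliesFallbackBlock(stateless, connectionFailure bool) bool {
	return !stateless || connectionFailure
}

func main() {
	// Enumerate all four combinations of the two flags.
	for _, stateless := range []bool{false, true} {
		for _, connFailure := range []bool{false, true} {
			fmt.Printf("stateless=%t connectionFailure=%t applies=%t\n",
				stateless, connFailure, appliesFallbackBlock(stateless, connFailure))
		}
	}
}
```

Under this reading, the only combination that skips the block is stateless CNI with no connection failure.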