Conversation

@behzad-mir
Contributor

@behzad-mir behzad-mir commented Aug 27, 2025

This PR makes several changes to Stateless CNI to fully support SwiftV2 on Linux:

  1. For Stateless CNI, the EndpointID should include ifName when the NicType is Delegated, so that it can be distinguished from the InfraNIC endpoint. Stateless CNI uses only the ContainerID as the endpoint ID when saving the state file in CNS, to stay consistent with Cilium. Within CNI's in-memory representation, however, the EndpointID must uniquely identify each NIC. As a result:
    • We always pass the ContainerID to CNS as the endpoint state key.
    • Internally, CNI uses EndpointID (which includes ifName for delegated NICs) to differentiate endpoints.
  2. The delete flow has been revised, and NetNSPath has been added to the state file because the TransparentClient needs it for the frontend NIC.
  3. Stateful CNI uses Transparent mode for SwiftV1 and SwiftV2, and Stateless CNI should do the same. TransparentVlan, the original value, appears to be a mistake.
  4. Async delete file creation in Stateless CNI has been moved to CNS_invoker to be consistent with Stateful CNI.
  5. On SwiftV2 Linux, the secondary interface is now removed from the pod namespace during an asynchronous delete. Stateless CNI needs this because, when CNS is down, CNI does not know the secondary interface name. A dedicated function has been added to list the interfaces in the pod namespace and remove the secondary one.
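
To illustrate item 1, here is a minimal sketch of the stateless ID scheme. It is not the code in this PR: `NICType`, its two constants, `endpointID`, and `cnsStateKey` are hypothetical stand-ins, and only the stateless path is modeled. It assumes the secondary-NIC ID takes the form containerID-ifName.

```go
package main

import "fmt"

// NICType is a hypothetical stand-in for the NIC type used by CNI.
type NICType string

const (
	InfraNIC     NICType = "infra"     // illustrative value
	DelegatedNIC NICType = "delegated" // illustrative value
)

// endpointID returns the in-memory endpoint ID for Stateless CNI.
// Non-infra NICs get the interface name appended so that each NIC
// of the same container has a unique ID.
func endpointID(containerID, ifName string, nicType NICType) string {
	if nicType != InfraNIC {
		return containerID + "-" + ifName
	}
	return containerID
}

// cnsStateKey returns the key used for the endpoint state in CNS.
// It is always the bare container ID, regardless of NIC type.
func cnsStateKey(containerID string) string {
	return containerID
}

func main() {
	fmt.Println(endpointID("abc123", "eth0", InfraNIC))
	fmt.Println(endpointID("abc123", "eth1", DelegatedNIC))
	fmt.Println(cnsStateKey("abc123"))
}
```

Both NICs of a container map to the same CNS state key, while their in-memory IDs stay distinct.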

To validate these scenarios, ADD and DELETE calls were issued on a SwiftV2 cluster, and the logs and state file were analyzed to confirm that the behavior is consistent with Stateful CNI and that nothing is leaked.

Requirements:

Notes:

Copilot AI review requested due to automatic review settings August 27, 2025 07:19
@behzad-mir behzad-mir requested review from a team as code owners August 27, 2025 07:19
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes issues with Stateless CNI delete operations in SwiftV2 scenarios by modifying endpoint ID generation and improving the delete flow. The changes ensure proper distinction between different NIC types and provide necessary context for transparent client operations.

  • Modifies GetEndpointID to accept a NICType parameter and append interface name for delegated NICs
  • Updates delete flow to use proper network manager clients and adds NetNsPath to state information
  • Adds NetworkNameSpace field to CNS REST server structures for frontend NIC support

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File                           Description
network/manager.go             Core logic changes for endpoint ID generation and delete flow improvements
network/manager_mock.go        Mock implementation updated to match new GetEndpointID signature
cns/restserver/restserver.go   Added NetworkNameSpace field to IPInfo struct
cns/restserver/ipam.go         Updated validation and state management for NetworkNameSpace field
cni/network/network.go         Updated callers to pass NICType parameter to GetEndpointID

Contributor

@santhoshmprabhu santhoshmprabhu left a comment

I don't have a lot of context on the need for these changes. The changes look fine, but we should improve the PR a bit -

  1. Update the description to explain why the opMode changed
  2. Address other copilot comments (one about NICType in particular seems serious)
  3. Add a description of what validation steps have been carried out. Include screenshots/logs if necessary.
  4. Add tests.

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Contributor

@QxBytes QxBytes left a comment

Preliminary suggestions. Could you also add how you tested this and what happens without each of these changes?

@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 5 times, most recently from 4b7c752 to 5c01db3 Compare September 25, 2025 17:30
@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 4 times, most recently from 6b96a7c to 8584493 Compare October 6, 2025 18:22
Contributor

@QxBytes QxBytes left a comment

As I'm reading this it seems like a lot of the code is geared around accessing FetchInterfacesFromNetnsPath in secondary_endpoint_client_linux.go, requiring us to drill down through cni/network/network.go > network/manager.go > network/endpoint_linux.go > secondary_endpoint_client_linux.go, adding methods to interfaces which are only specific to swiftv2/secondary endpoint client.

What is it from secondary endpoint client that we actually need? At GetEndpoint() > nm.generateEndpointLocally, there is no guarantee we are even in swiftv2 right? Why can't we just make a helper function (doesn't even need to be part of a client) where generateEndpointLocally is called that does the following:

  1. Enters the pod's netns
  2. Searches for any swiftv2/secondary interfaces if they exist
  3. Creates endpoint infos and returns them if found

That would avoid the trouble of creating a secondary endpoint client, etc.
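
A rough sketch of what such a standalone helper could look like follows. Everything in it is hypothetical: `listFunc` stands in for whatever code enters the pod's netns and lists interfaces, `EndpointInfo` is a trimmed placeholder type, and the rule "anything other than loopback and the infra interface is secondary" is an assumption made only for illustration.

```go
package main

import "fmt"

// listFunc is a hypothetical hook that returns the names of the
// interfaces visible inside the network namespace at netnsPath.
type listFunc func(netnsPath string) ([]string, error)

// EndpointInfo is a trimmed, hypothetical stand-in for the real type.
type EndpointInfo struct {
	IfName    string
	NetNsPath string
}

// secondaryEndpoints lists the interfaces in the pod's netns and returns
// an EndpointInfo for each one that is neither loopback nor the infra NIC.
func secondaryEndpoints(netnsPath, infraIfName string, list listFunc) ([]*EndpointInfo, error) {
	names, err := list(netnsPath)
	if err != nil {
		return nil, fmt.Errorf("listing interfaces in %s: %w", netnsPath, err)
	}
	var infos []*EndpointInfo
	for _, name := range names {
		if name == "lo" || name == infraIfName {
			continue // assumption: only these two are not secondary
		}
		infos = append(infos, &EndpointInfo{IfName: name, NetNsPath: netnsPath})
	}
	return infos, nil
}

func main() {
	// fake lister used only to exercise the helper
	fake := func(string) ([]string, error) {
		return []string{"lo", "eth0", "eth1"}, nil
	}
	infos, err := secondaryEndpoints("/var/run/netns/pod1", "eth0", fake)
	if err != nil {
		panic(err)
	}
	for _, info := range infos {
		fmt.Println(info.IfName)
	}
}
```

Passing the lister as a function keeps the helper independent of any endpoint client, which is the point of the suggestion.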

// This is a special case for stateless CNI, when an asynchronous DEL to CNS will take place.
// At this point the endpoint has already been deleted in the previous block, and CNS will release the IP once it is up.
if epInfo.IPAddresses == nil && plugin.nm.IsStatelessCNIMode() {
logger.Warn("Release ip Asynchronously by CNS",
Member

Why release the IP only for stateless CNI and not for stateful CNI? The current code calls release in both cases.

Contributor Author

When we generate the endpoint locally for SwiftV2 stateless because CNS is not up, we reach this block at the end to release the IP. The datapath has already been cleaned up, so we just do an async delete via IPAMInvoker.Delete.
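
The condition described here can be sketched as follows. This is an illustration under assumptions, not the plugin code: `ipReleaser`, `releaseIfNeeded`, and `recordingReleaser` are hypothetical names, and IP addresses are modeled as plain strings.

```go
package main

import "fmt"

// ipReleaser is a hypothetical stand-in for the IPAM invoker's delete call.
type ipReleaser interface {
	Delete(containerID string) error
}

// releaseIfNeeded asks the releaser for an asynchronous release only when
// CNI runs stateless and the locally generated endpoint has no IP addresses.
// It reports whether a release was requested.
func releaseIfNeeded(stateless bool, ipAddresses []string, containerID string, r ipReleaser) (bool, error) {
	if !stateless || ipAddresses != nil {
		return false, nil
	}
	if err := r.Delete(containerID); err != nil {
		return false, fmt.Errorf("async release for %s: %w", containerID, err)
	}
	return true, nil
}

// recordingReleaser remembers which container IDs were released.
type recordingReleaser struct {
	calls []string
}

func (r *recordingReleaser) Delete(containerID string) error {
	r.calls = append(r.calls, containerID)
	return nil
}

func main() {
	rec := &recordingReleaser{}
	requested, err := releaseIfNeeded(true, nil, "abc123", rec)
	fmt.Println(requested, err, rec.calls)
}
```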

Comment on lines 125 to 127
GetEndpoint(networkID string, args *cniSkel.CmdArgs) ([]*EndpointInfo, error)
GetEndpointInfosFromContainerID(containerID string) []*EndpointInfo
GetEndpointState(networkID, containerID string) ([]*EndpointInfo, error)
GetEndpointState(networkID, containerID, netns string) ([]*EndpointInfo, error)
Member

Why do we need two APIs? Can we merge them into one? Otherwise this is confusing, and the purpose of having two APIs is unclear.

Contributor Author

We previously discussed that, and you suggested having two APIs.

Contributor Author

GetEndpointInfos will be a general API for both Stateful and Stateless CNI, while GetEndpointState is only for Stateless CNI.

// GetEndpointIDByNicType returns a unique endpoint ID based on the CNI mode and NIC type.
func (nm *networkManager) GetEndpointIDByNicType(containerID, ifName string, nicType cns.NICType) string {
// For stateless CNI, secondary NICs use containerID-ifName as endpointID.
if nm.IsStatelessCNIMode() && nicType != cns.InfraNIC {
Member

Is this not an issue for stateful CNI? What is the impact if we remove the IsStatelessCNIMode check here?

Contributor Author

I wanted to make sure we only append -ifName for stateless. Stateful already has this.

}
return epInfos, nil
}
return nm.GetEndpointInfosFromContainerID(args.ContainerID), nil
Member

If this API is called only for stateful CNI, can we keep it in an else branch to avoid errors in the future?

Contributor Author

It will cause a linter error, so I added a comment instead.
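
For context, many Go linters flag an else branch that follows an if block ending in return, which is why the stateful path stays as a fall-through with a comment. A minimal sketch of that shape, with hypothetical names and string results standing in for endpoint infos:

```go
package main

import "fmt"

// getEndpoint is a hypothetical, simplified version of the lookup.
func getEndpoint(stateless bool, containerID string) []string {
	if stateless {
		// Stateless CNI: endpoint state comes from CNS (or is generated locally).
		return []string{"state:" + containerID}
	}
	// Stateful CNI only: reached when the stateless branch above did not return.
	return []string{"local:" + containerID}
}

func main() {
	fmt.Println(getEndpoint(true, "abc123"))
	fmt.Println(getEndpoint(false, "abc123"))
}
```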

@github-actions

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Oct 22, 2025
@github-actions

Pull request closed due to inactivity.

@github-actions github-actions bot closed this Oct 29, 2025
@github-actions github-actions bot deleted the swiftv2-stateless branch October 29, 2025 00:01
@behzad-mir behzad-mir restored the swiftv2-stateless branch November 14, 2025 18:10
@behzad-mir behzad-mir reopened this Nov 14, 2025
@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 2 times, most recently from 89c39e4 to 38ada84 Compare November 14, 2025 23:35
@github-actions github-actions bot removed the stale Stale due to inactivity. label Nov 15, 2025
@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 2 times, most recently from e0bea0b to ece1a70 Compare November 17, 2025 06:02
@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 5 times, most recently from f9af850 to 3ca88d5 Compare November 18, 2025 18:16
6 participants