-
Notifications
You must be signed in to change notification settings - Fork 260
fix: fixing Stateless CNI delete in SwiftV2 scenario #3967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes issues with Stateless CNI delete operations in SwiftV2 scenarios by modifying endpoint ID generation and improving the delete flow. The changes ensure proper distinction between different NIC types and provide necessary context for transparent client operations.
- Modifies
GetEndpointIDto accept aNICTypeparameter and append interface name for delegated NICs - Updates delete flow to use proper network manager clients and adds
NetNsPathto state information - Adds
NetworkNameSpacefield to CNS REST server structures for frontend NIC support
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| network/manager.go | Core logic changes for endpoint ID generation and delete flow improvements |
| network/manager_mock.go | Mock implementation updated to match new GetEndpointID signature |
| cns/restserver/restserver.go | Added NetworkNameSpace field to IPInfo struct |
| cns/restserver/ipam.go | Updated validation and state management for NetworkNameSpace field |
| cni/network/network.go | Updated callers to pass NICType parameter to GetEndpointID |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
santhoshmprabhu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a lot of context on the need for these changes. The changes look fine, but we should improve the PR a bit -
- Update the description to cover why the opMode change
- Address other copilot comments (one about NICType in particular seems serious)
- Add a description of what validation steps have been carried out. Include screenshots/logs if necessary.
- Add tests.
52b1ed1 to
a6b378e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
QxBytes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preliminary suggestions-- could you also add how you tested and what happens without each of these changes?
4b7c752 to
5c01db3
Compare
6b96a7c to
8584493
Compare
QxBytes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I'm reading this it seems like a lot of the code is geared around accessing FetchInterfacesFromNetnsPath in secondary_endpoint_client_linux.go, requiring us to drill down through cni/network/network.go > network/manager.go > network/endpoint_linux.go > secondary_endpoint_client_linux.go, adding methods to interfaces which are only specific to swiftv2/secondary endpoint client.
What is it from secondary endpoint client that we actually need? At GetEndpoint() > nm.generateEndpointLocally, there is no guarantee we are even in swiftv2 right? Why can't we just make a helper function (doesn't even need to be part of a client) where generateEndpointLocally is called that does the following:
- Enters the pod's netns
- Searches for any swiftv2/secondary interfaces if they exist
- Creates endpoint infos and returns them if found
Instead of going through the trouble of creating a secondary endpoint client etc.
| // This is an special case for stateless CNI when Asychronous DEL to CNS will take place | ||
| // At this point the endpoint is already deleted in previous block and CNS will release the IP whenever it is up | ||
| if epInfo.IPAddresses == nil && plugin.nm.IsStatelessCNIMode() { | ||
| logger.Warn("Release ip Asynchronously by CNS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why releaseip only for stateless cni and not for stateful cni case? current code calling release for both cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we generate the ednpoint locally for SwiftV2 stateless when CNS is not up, then we will get to this block at the end to release the IP. The datapath already has been cleand up and we just do a Asynch delete via IPAMInvoker.Delete.
network/manager.go
Outdated
| GetEndpoint(networkID string, args *cniSkel.CmdArgs) ([]*EndpointInfo, error) | ||
| GetEndpointInfosFromContainerID(containerID string) []*EndpointInfo | ||
| GetEndpointState(networkID, containerID string) ([]*EndpointInfo, error) | ||
| GetEndpointState(networkID, containerID, netns string) ([]*EndpointInfo, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need 2 apis? can we merge to 1 api, otherwise this is confusing and not sure about purpose of 2 apis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We previous;y doscussed that, and you suggested to have two APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetEndpointInfos will be a genrall API for both Statefull and Stateless and then GetEndpointState is just for stateless CNI
| // GetEndpointIDByNicType returns a unique endpoint ID based on the CNI mode and NIC type. | ||
| func (nm *networkManager) GetEndpointIDByNicType(containerID, ifName string, nicType cns.NICType) string { | ||
| // For stateless CNI, secondary NICs use containerID-ifName as endpointID. | ||
| if nm.IsStatelessCNIMode() && nicType != cns.InfraNIC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for stateful cni, this is not an issue? what's the impact if we remove statelesscnimode check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make sure we only add - ifname for stateless. Stateful already has this.
| } | ||
| return epInfos, nil | ||
| } | ||
| return nm.GetEndpointInfosFromContainerID(args.ContainerID), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this api is called only for stateful cni, can we keep in else to avoid any error in future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will cause a linter error, so I added a comment instead.
|
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
|
Pull request closed due to inactivity. |
89c39e4 to
38ada84
Compare
e0bea0b to
ece1a70
Compare
f9af850 to
3ca88d5
Compare
3ca88d5 to
65b5530
Compare
A number of changes is made to stateless CNI to fully support SWiftV2 in Linux:
For validating the scenario ADD/Delete calls have been issues on SwiftV2 cluster and and logs and satefile has been analyzed to make sure it is consistent with Stateful CNI and also nothing gets leaked.
Requirements:
Notes: