Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep the cached Endpoints the same as those installed in OVS #4691

Merged
merged 1 commit into from
Mar 16, 2023

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Mar 8, 2023

The main purpose of this PR is to avoid potential inconsistencies between
the cached Endpoints and those installed in OVS, like #4681, #4692.

This PR also updates:

  • Method UninstallEndpointFlows of ofClient, support deleting flows of
    multiple Endpoints.
  • Remove possible groups when a Service is deleted.
  • Log something when a group for a Service is not created.
  • Optimize and unify log.

Signed-off-by: Hongliang Liu lhongliang@vmware.com

@hongliangl hongliangl added the area/proxy Issues or PRs related to proxy functions in Antrea label Mar 8, 2023
@hongliangl hongliangl force-pushed the 20230308-fix-proxy branch 2 times, most recently from 7ed1fb7 to 2a1a6db Compare March 8, 2023 17:13
pkg/agent/openflow/client.go Outdated Show resolved Hide resolved
// Remove NodePort flows and configurations.
if p.proxyAll && svcInfo.NodePort() > 0 {
if err := p.uninstallNodePortService(uint16(svcInfo.NodePort()), svcInfo.OFProtocol); err != nil {
klog.ErrorS(err, "Failed to remove flows and configurations of Service", "Service", svcPortName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error log sounds quite generic, given that this case is specific to NodePort Services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with

klog.ErrorS(err, "Error when uninstalling NodePort flows and configurations for Service", "ServiceName", svcPortName, "ServiceInfo", svcInfoStr)

continue
}
p.groupCounter.Recycle(svcPortName, true)
// Remove Service group which has only local Endpoints.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the svcInfo.ExternalPolicyLocal() condition was removed, yet the commend still mentions "local Endpoints". Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when svcInfo.InternalPolicyLocal() is true, the group has only local Endpoints too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there was a bug regarding this? a group may never be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there was a bug regarding this? a group may never be deleted?

For a ClusterIP, when svcInfo.InternalPolicyLocal() is true, a group may never be delete.

updatedEndpointsInstalled := map[string]k8sproxy.Endpoint{}
for _, endpoint := range allReachableEndpoints {
updatedEndpointsInstalled[endpoint.String()] = endpoint
// If the Endpoint is not newly installed, remove it from stale endpoints.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"remove it from stale endpoints": that doesn't sound correct. You are removing it from endpointsInstalled, which doesn't sound like a list of stale endpoints

Copy link
Contributor Author

@hongliangl hongliangl Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rephrased the comments here.

if mcsLocalService != nil {
allReachableEndpoints = append(allReachableEndpoints, mcsLocalService.Endpoint)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this function (installServices) is getting very long. Is there a way to extract some functionality into helper functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is quite long for this function, I thought of extracting the code under if needUpdateEndpoints and if needUpdateService into functions, but I am not sure if this is an appropriate idea.

continue
}
p.groupCounter.Recycle(svcPortName, true)
// Remove Service group which has only local Endpoints.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there was a bug regarding this? a group may never be deleted?

pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
@tnqn tnqn added this to the Antrea v1.11 release milestone Mar 10, 2023
@hongliangl hongliangl force-pushed the 20230308-fix-proxy branch 2 times, most recently from 597b534 to 5ed391b Compare March 14, 2023 13:40
pkg/agent/openflow/client.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Show resolved Hide resolved
pkg/agent/proxy/proxier.go Show resolved Hide resolved
pkg/agent/proxy/proxier.go Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20230308-fix-proxy branch 2 times, most recently from 54f15f9 to 6ab1d39 Compare March 14, 2023 17:16
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20230308-fix-proxy branch 3 times, most recently from fcde930 to b053114 Compare March 15, 2023 04:41
pkg/agent/proxy/proxier.go Show resolved Hide resolved
pkg/agent/proxy/proxier.go Show resolved Hide resolved
continue
}

// If previous Service is nil or NodePort flows and configurations of previous Service have been removed,
// install NodePort flows and configurations for current Service.
if svcInfo.NodePort() > 0 && (pSvcInfo == nil || needRemoval) {
if err := p.installNodePortService(nGroupID, uint16(svcInfo.NodePort()), svcInfo.OFProtocol, uint16(affinityTimeout), svcInfo.ExternalPolicyLocal()); err != nil {
klog.ErrorS(err, "Failed to install NodePort flows and configurations of Service", "Service", svcPortName)
groupID, exists = p.groupCounter.Get(svcPortName, externalPolicyLocal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why do you expect the OF group for the NodePort traffic to share the same identifier as the ClusterIP traffic for a Service? In the previous implementation, here calls p.groupCounter.AllocateIfNotExist to get or create an OF group ID for the new configured NodePort scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous implementation, here calls p.groupCounter.AllocateIfNotExist to get or create an OF group ID for the new configured NodePort scenario.

We should use method Get here to get a OVS group ID, otherwise, there is a chance that we allocate a group ID and pass it to InstallServiceGroup and the corresponding group was not created in OVS.

if err := p.ofClient.UninstallServiceFlows(svcInfo.ClusterIP(), uint16(svcInfo.Port()), svcInfo.OFProtocol); err != nil {
klog.ErrorS(err, "Failed to remove flows of Service", "Service", svcPortName)
klog.ErrorS(err, "Error when uninstalling ClusterIP flows for Service", "ServiceName", svcPortName, "ServiceInfo", svcInfoStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key "ServiceName" in such logs should be "servicePortName"?
BTW, the keys in the logs should not start with a capital according to the logs in other package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key "ServiceName" in such logs should be "servicePortName"?

I'll change that.

BTW, the keys in the logs should not start with a capital according to the logs in other package?

I thought that Service in ServiceName means Kubernetes Service, and I capitalized them all.

The main purpose of this PR is to avoid potential inconsistencies between
the cached Endpoints and those installed in OVS, like antrea-io#4681, antrea-io#4692.

This PR also updates:

- Method UninstallEndpointFlows of ofClient, support deleting flows of
  multiple Endpoints.
- Remove possible groups when a Service is deleted.
- Log something when a group for a Service is not created.
- Optimize and unify log.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hongliangl
Copy link
Contributor Author

Thanks for reviewing and approving @tnqn @wenyingd

@hongliangl
Copy link
Contributor Author

/test-all

@tnqn
Copy link
Member

tnqn commented Mar 16, 2023

@wenyingd do you have other comments?

@wenyingd
Copy link
Contributor

@wenyingd do you have other comments?

No other comments

@tnqn tnqn merged commit 4a80363 into antrea-io:main Mar 16, 2023
@hongliangl hongliangl deleted the 20230308-fix-proxy branch April 4, 2023 01:05
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023
…io#4691)

The main purpose of this PR is to avoid potential inconsistencies between
the cached Endpoints and those installed in OVS, like antrea-io#4681, antrea-io#4692.

This PR also updates:

- Method UninstallEndpointFlows of ofClient, support deleting flows of
  multiple Endpoints.
- Remove possible groups when a Service is deleted.
- Log something when a group for a Service is not created.
- Optimize and unify log.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy Issues or PRs related to proxy functions in Antrea
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants