Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Service Traffic Distribution in Antrea Proxy #6604

Merged

Conversation

hongliangl
Copy link
Contributor

For #6592

ServiceTrafficDistribution enables Traffic Distribution for Services in Antrea Proxy. This
feature allows for more flexible and intelligent routing decisions by considering both
topology and non-topology factors. For more details, refer to this link.

Note: To activate this feature in Antrea Proxy, Kubernetes must be version 1.30 or higher, with
the ServiceTrafficDistribution feature gate (a Kubernetes-specific feature gate) enabled to add
Endpoint zone hints in the Kubernetes EndpointSlice controller. Without these prerequisites,
the feature may not function as intended in Antrea Proxy.

@hongliangl hongliangl added the action/release-note Indicates a PR that should be included in release notes. label Aug 9, 2024
@hongliangl hongliangl added this to the Antrea v2.2 release milestone Aug 9, 2024
@hongliangl hongliangl force-pushed the 20240808-support-svc-traffic-distribute branch 6 times, most recently from bde680b to 4a6d33d Compare August 9, 2024 07:12
if hintsAnnotation != "" && hintsAnnotation != "Disabled" && hintsAnnotation != "disabled" {
klog.InfoS("Skipping topology aware Endpoint filtering since Service has unexpected value", "annotationTopologyAwareHints", v1.DeprecatedAnnotationTopologyAwareHints, "hints", hintsAnnotation)

// Ignore the value of hintsAnnotation if the ServiceTrafficDistribution feature is enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, I wonder why this code is necessary for correctness. It seems to me that the real consumer of service.kubernetes.io/topology-mode and .spec.trafficDistribution is the EndpointSlice controller, and not the Service proxy. @tnqn do you know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The current code is wrong. I made #6607 to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas both EndpointSlice controller and Service proxy are the consumers of the configuration.

  • EndpointSlice controller adds hints to each endpoint when the feature is enabled.
  • Service proxy consumes hints of endpoints when the feature is enabled, instead of unconditionally using the hints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I get your point. You are talking about each Service's configuration, instead of the global configuration. I agree with you this looks duplicate, and inconsistent for service.kubernetes.io/topology-mode and .spec.trafficDistribution. Perhaps it's just because the feature is implemented by different authors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I find the inconsistency strange and that's also related to my other comment. This seems to be the existing logic:

  1. if both features are disabled, ignore the hints
  2. if the TrafficDistribution feature is disabled (hence, the TopologyAwareHints feature is enabled), and we don't see an annotation, ignore the hints
  3. otherwise, consume the hints

For consistency, IMO it should have been:

  1. Consume the hints if and only if at least one the of the following conditions is true
  • the TrafficDistribution feature is enabled and .spec.trafficDistribution is set
  • the TopologyAwareHints feature is enabled and we see an annotation

However, I only see the current implementation being an issue if Feature Gates are set inconsistently between the proxy and the EndpointSlice controller. And we should probably just follow what kube-proxy is doing here.

Copy link
Contributor Author

@hongliangl hongliangl Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the TrafficDistribution feature is enabled and .spec.trafficDistribution is set

One more thing, we need to bump k8s.io/api v0.29.2 (the version we use right now) to v0.30.x, than we can consume .spec.trafficDistribution in Antrea Proxy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have decided to keep the implementation consistent with kube-proxy, there is no need to consume .spec.trafficDistribution directly and we don't need to update K8s dependencies just yet.

klog.InfoS("Skipping topology aware Endpoint filtering since Service has unexpected value", "annotationTopologyAwareHints", v1.DeprecatedAnnotationTopologyAwareHints, "hints", hintsAnnotation)

// Ignore the value of hintsAnnotation if the ServiceTrafficDistribution feature is enabled.
if !p.serviceTrafficDistributionEnabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is what kube-proxy is doing, but it's confusing to me because serviceTrafficDistributionEnabled is a global setting (determined only by the feature gate). It seems that checking svc.Spec.TrafficDistribution would make more sense (specific to this Service). But maybe I am missing something and this is the desired / expected behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also confused by that svc.Spec.TrafficDistribution is not consumed by kube-proxy code, and it is consumed by EndpointSlice controller code. If I understood correctly, the serviceTrafficDistributionEnabled is a global setting to decide whether to check svc.Spec.TrafficDistribution for a Service and reconcile hints for the corresponding EndpointSlice of the Service. https://github.com/kubernetes/kubernetes/blob/6cca485fe71dc5fb0cb8004b98d7aec792e9d791/staging/src/k8s.io/endpointslice/trafficdist/trafficdist.go#L28

@antoninbas antoninbas requested a review from tnqn August 9, 2024 20:35
@hongliangl hongliangl force-pushed the 20240808-support-svc-traffic-distribute branch from 4a6d33d to b6f6fd0 Compare August 14, 2024 07:28
@@ -35,6 +35,7 @@ edit the Agent configuration in the
| `AntreaProxy` | Agent | `true` | GA | v0.8 | v0.11 | v1.14 | Yes | Must be enabled for Windows. |
| `EndpointSlice` | Agent | `true` | GA | v0.13.0 | v1.11 | v1.14 | Yes | |
| `TopologyAwareHints` | Agent | `true` | Beta | v1.8 | v1.12 | N/A | Yes | |
| `ServiceTrafficDistribution` | Agent | `true` | Alpha | v2.2 | N/A | N/A | Yes | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is true? If the change is straightfoward and we are confident in its quality and want it to be enabled by default, the stage should be Beta to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I personally support going straight to Beta and enabling it by default.
K8s 1.31 was just released and the feature has been promoted already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, this should be false. I copied this from the line above it and forgot to modify the default value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongliangl It may still make sense to introduce this as Beta right away:

  • there is no requirement that feature gates always go through Alpha stage
  • code changes are small and mostly ported over from kube-proxy
  • this change should have no impact unless users set .spec.trafficDistribution
  • the feature gate is already Beta / enabled by default in the latest K8s version, as it only stayed in Alpha stage for one minor release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was done as you said.

@hongliangl hongliangl force-pushed the 20240808-support-svc-traffic-distribute branch from b6f6fd0 to c8e76c0 Compare August 22, 2024 03:19
`ServiceTrafficDistribution` enables Traffic Distribution for Services in Antrea Proxy. This
feature allows for more flexible and intelligent routing decisions by considering both
topology and non-topology factors. For more details, refer to this [link](https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4444-service-traffic-distribution).

Note: To activate this feature in Antrea Proxy, Kubernetes must be version 1.30 or higher, with
the `ServiceTrafficDistribution` feature gate (a Kubernetes-specific feature gate) enabled to add
Endpoint zone hints in the Kubernetes EndpointSlice controller. Without these prerequisites,
the feature may not function as intended in Antrea Proxy.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
@hongliangl hongliangl force-pushed the 20240808-support-svc-traffic-distribute branch from c8e76c0 to 19c6c43 Compare August 22, 2024 03:57
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Aug 22, 2024

/test-all

@antoninbas antoninbas merged commit b352435 into antrea-io:main Aug 22, 2024
52 of 58 checks passed
@hongliangl hongliangl deleted the 20240808-support-svc-traffic-distribute branch August 23, 2024 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants