Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for endpoint slices #5745

Merged
merged 5 commits into from
Oct 24, 2023

Conversation

clayton-gonsalves
Copy link
Contributor

This PR adds support for Kubernetes endpoint slices.

Fixes #2696

@clayton-gonsalves clayton-gonsalves requested a review from a team as a code owner September 5, 2023 12:39
@clayton-gonsalves clayton-gonsalves requested review from tsaarni, stevesloka, skriss and sunjayBhatia and removed request for a team, tsaarni and stevesloka September 5, 2023 12:39
@clayton-gonsalves clayton-gonsalves added the release-note/minor A minor change that needs about a paragraph of explanation in the release notes. label Sep 5, 2023
@clayton-gonsalves clayton-gonsalves self-assigned this Sep 5, 2023
@codecov
Copy link

codecov bot commented Sep 5, 2023

Codecov Report

Merging #5745 (362a5ca) into main (8f6aff1) will increase coverage by 0.03%.
Report is 9 commits behind head on main.
The diff coverage is 77.66%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5745      +/-   ##
==========================================
+ Coverage   78.51%   78.55%   +0.03%     
==========================================
  Files         138      139       +1     
  Lines       19291    19602     +311     
==========================================
+ Hits        15146    15398     +252     
- Misses       3853     3901      +48     
- Partials      292      303      +11     
Files Coverage Δ
cmd/contour/servecontext.go 83.63% <100.00%> (+0.07%) ⬆️
...ovisioner/objects/rbac/clusterrole/cluster_role.go 80.00% <100.00%> (+1.05%) ⬆️
pkg/config/parameters.go 85.93% <ø> (+0.13%) ⬆️
internal/xdscache/v3/endpointstranslator.go 88.64% <88.88%> (+0.40%) ⬆️
cmd/contour/serve.go 19.55% <0.00%> (-0.28%) ⬇️
internal/xdscache/v3/endpointslicetranslator.go 82.90% <82.90%> (ø)

... and 4 files with indirect coverage changes

Copy link
Contributor

@davinci26 davinci26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, one question and one test request

var healthCheckPort int32

for _, enpointSlice := range endpointSliceMap {
sort.Slice(enpointSlice.Endpoints, func(i, j int) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to sort here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from a carry over from the endpoint slice implementation. I think we can get rid of this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's desirable to sort, to produce stable results across runs. I forget the exact details but impacts performance and/or load balancing algorithms. So I'd say let's leave this in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to call out, that here the sorting happens only per slice, since we store all the slices in a map it's not possible to sort the combination of all the endpoints.

},
}

f.CreateHTTPProxyAndWaitFor(p, e2e.HTTPProxyValid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a case where we scale and we also have not ready pods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be hard to achieve a case where we can deterministically say during a test if the endpoints are ready or not. sure, w could remove the sleep function to catch some unready pods but that might lead to flakey tests

@clayton-gonsalves clayton-gonsalves force-pushed the use-endpoint-slices branch 4 times, most recently from ff1ed4f to 03a4e3d Compare September 19, 2023 09:47
@clayton-gonsalves
Copy link
Contributor Author

@skriss @sunjayBhatia this is ready for the first pass

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @clayton-gonsalves, this is looking pretty good - first pass just getting a few nits out of the way. Still looking into a few of the details

changelogs/unreleased/5745-clayton-gonsalves-minor.md Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
internal/xdscache/v3/endpointslicetranslator.go Outdated Show resolved Hide resolved
cmd/contour/serve.go Outdated Show resolved Hide resolved
cmd/contour/serve.go Outdated Show resolved Hide resolved
Comment on lines 88 to 91
// useEndpointSlice configures contour to fetch endpoint data
// from k8s endpoint slices. defaults to false and reading endpoint
// data from the k8s endpoints.
UseEndpointSlice bool `json:"useEndpointSlice,omitempty"`
Copy link
Member

@skriss skriss Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested to hear what the other maintainers think about adding a more generic FeatureFlags field instead. It could be used for other things, and deprecating a value for it should be simpler than deprecating an entire field.

cc @sunjayBhatia @tsaarni

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats a good idea, rather than polluting the config every time we want to add/deprecate a field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made the change in 1fd6eea but will also wait for feedback from the other maintainers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking a string slice, rather than a concrete field per feature, so that as features come and go it doesn't require any changes to the type itself, just to the values that are accepted in the field. But, would like to get a second opinion before making this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for string slice, similar in spirit to the --disable-feature flag which is more generic than adding a specific flag per thing we want to disable

I guess that naming makes it sound more generic than it is, it's really for disabling reconciling some optional extended support CRDs, i dunno if we should repurpose the existing flag, have to think about it a bit

Copy link
Member

@skriss skriss Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was looking at that flag as well for possible use, but since its semantics are opt-out rather than opt-in, it seemed hard to use here (would you opt out of using Endpoints which would result in opting into EndpointSlices? Seems unintuitive). I also don't quite remember why we went with a flag vs. a config file/ContourConfig field for that, will have to look back.

If we do just go ahead with a config file/ContourConfig field for opt-in features, then down the road we could always look at deprecating the --disable-feature flag and replacing with something more aligned with the opt-in feature flags. Or, alternately, I guess we could decide to not reconcile experimental GWAPI resources by default, and require them to be opted into (the opposite of what we have now). xref. #5801

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a string slice in ee4aa23

is this what y'all had in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me!

cmd/contour/serve.go Outdated Show resolved Hide resolved
@skriss
Copy link
Member

skriss commented Oct 3, 2023

Modulo the feature flags spec, this is looking good to me, no further comments at the moment

## Add Kubernetes Endpoint Slice support

This change optionally enables Contour to consume the kubernetes endpointslice API to determine the endpoints to configure Envoy with.
Note: This change is off by default and is gated by the feature flag `useEndpointSlice`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
Note: This change is off by default and is gated by the feature flag `useEndpointSlice`.
Note: This change is off by default and is gated by the feature flag `useEndpointSlices`.

@@ -84,8 +84,19 @@ type ContourConfigurationSpec struct {

// Tracing defines properties for exporting trace data to OpenTelemetry.
Tracing *TracingConfig `json:"tracing,omitempty"`

// FeatureFlags defines toggle to enable new contour features.
// available toggles are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// available toggles are
// Available toggles are:
// useEndpointSlices - Configures contour to fetch endpoint data
// from k8s EndpointSlices instead of the default behavior which reads
// endpoint data from Endpoint resources.

@skriss
Copy link
Member

skriss commented Oct 10, 2023

@clayton-gonsalves unfortunately looks like there are a couple small merge conflicts to resolve

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further comments from me, ready for this to go once the merge conflicts and latest comments are resolved

Signed-off-by: Clayton Gonsalves <clayton.gonsalves@reddit.com>
Signed-off-by: Clayton Gonsalves <clayton.gonsalves@reddit.com>
Signed-off-by: Clayton Gonsalves <clayton.gonsalves@reddit.com>
Copy link
Member

@tsaarni tsaarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the featureFlags also in Contour config file documentation here site/content/docs/main/configuration.md? It often tends to get out-of-sync and the naming for attributes is not very consistent (dash vs camelcase).

There is also example config in the same doc page, and same example config is in deployment manifest examples/contour/01-contour-config.yaml.

Signed-off-by: Clayton Gonsalves <clayton.gonsalves@reddit.com>
@clayton-gonsalves
Copy link
Contributor Author

Could you add the featureFlags also in Contour config file documentation here site/content/docs/main/configuration.md? It often tends to get out-of-sync and the naming for attributes is not very consistent (dash vs camelcase).

There is also example config in the same doc page, and same example config is in deployment manifest examples/contour/01-contour-config.yaml.

I added it to the `configuration.md, I would prefer keeping the example configs as is because this feature is still opt in. I don't want folks copy pasting the config and enabling something by default.

@clayton-gonsalves
Copy link
Contributor Author

@skriss @sunjayBhatia I think all comments are resolved. is this good to go?

Copy link
Member

@tsaarni tsaarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just couple more questions from me, nothing important though!

Comment on lines 78 to 80
// we can safely take the first element here.
// Refer to: https://issue.k8s.io/106267
addr := envoy_v3.SocketAddress(endpoint.Addresses[0], int(*p.Port))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the linked issue is not super trivial at a glance, maybe we could just refer to the description of kubernetes endpoint.Addresses where this assumption is also stated, including even the issue link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +46 to +48
sort.Slice(endpointSlice.Endpoints, func(i, j int) bool {
return endpointSlice.Endpoints[i].Addresses[0] < endpointSlice.Endpoints[j].Addresses[0]
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why do the addresses need to be sorted?

In the test there was also comment "addresses should be sorted". I know sorting was also done in endpointtranslator.go, where it likely found its way here as well, but it is not clear to me if sorting is necessary. Maybe the idea is just that the endpoints are potentially bit more stable over time that way? Also, the addresses are sorted as string, which is not the most logical order to sort IP addresses 😄.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a discussion around it here,.
I tried to follow the pattern from the endpoint implementation.

// endpointSliceMap may be nil, in which case, the result is also nil.
func (c *EndpointSliceCache) RecalculateEndpoints(port, healthPort v1.ServicePort, endpointSliceMap map[string]*discoveryv1.EndpointSlice) []*LoadBalancingEndpoint {
var lb []*LoadBalancingEndpoint
uniqueEndpoints := make(map[string]struct{}, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why map of structs as a "flag" to keep track of unique keys, instead of map of bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It;s out of habit but I reason that a map[T]struct{} is intuitive as a set compared to a map[T]bool. A map[T]bool could have an element as false which doesn't make much sense and means you need to have an if check to be sure.

If I am not mistaken, an empty struct also doesn't allocate memory. although at the scale we are using it, it won't make that big of a difference. If there is a precedence to use map[T]bool in this project I am happy to change it.

Signed-off-by: Clayton Gonsalves <clayton.gonsalves@reddit.com>
@skriss
Copy link
Member

skriss commented Oct 19, 2023

@tsaarni @sunjayBhatia anything else blocking here? I'd definitely like to get this into the upcoming 1.27 RC. Can always address any remaining non-blocking items between RC and GA if needed.

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will get this into the upcoming 1.27 RC

@sunjayBhatia
Copy link
Member

@tsaarni @sunjayBhatia anything else blocking here? I'd definitely like to get this into the upcoming 1.27 RC. Can always address any remaining non-blocking items between RC and GA if needed.

nothing blocking from me 👍🏽

@skriss skriss merged commit a216475 into projectcontour:main Oct 24, 2023
25 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor A minor change that needs about a paragraph of explanation in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EndpointSlice Support
5 participants