Skip to content

Conversation

@arkodg
Copy link
Contributor

@arkodg arkodg commented Aug 5, 2025

  • sorts a collection of resources on gatewayClass - creationTimestamp then name
  • moves all per resource sort function to provider layer

This ensures all resources will be sorted, and duplicate updates wont result into watchable subscribe calls to runners. It also means per XDS IR is also ordered

This allows us to delete Equal() functions added for ProviderResources and XDS IR which called expensive DeepCopy and sort functions, and rely on the top level

  • the internal svcMap field is removed, because it was affecting caching and equality performance, it can be introduced back in the gateway-api layer as part of Translator context to improve config convergence time once we start accurately tracking it

Tested with 2000 HTTPRoutes and a aggressive cache sync period of 1m

Observations

  • Watchable subscriber counts go down to 0
  • CPU usage drops by 80%
  • Memory drops by 20%

Before

Screenshot 2025-08-04 at 9 08 02 PM Screenshot 2025-08-04 at 9 07 58 PM Screenshot 2025-08-04 at 9 07 54 PM -0f7e-4495-9fdb-9e44cbdf1f81" /> Screenshot 2025-08-04 at 9 07 48 PM

After

Screenshot 2025-08-04 at 9 39 50 PM Screenshot 2025-08-04 at 9 39 19 PM Screenshot 2025-08-04 at 9 39 01 PM Screenshot 2025-08-04 at 9 38 22 PM

Fixes: #6690

@arkodg arkodg requested a review from a team as a code owner August 5, 2025 00:37
@arkodg arkodg added this to the v1.5.0 Release milestone Aug 5, 2025
@codecov
Copy link

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 30.91787% with 143 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.06%. Comparing base (406537f) to head (6d1e08d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/gatewayapi/resource/resource.go 25.13% 124 Missing and 19 partials ⚠️

❌ Your patch status has failed because the patch coverage (30.91%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6695      +/-   ##
==========================================
- Coverage   71.30%   71.06%   -0.24%     
==========================================
  Files         224      224              
  Lines       39558    39615      +57     
==========================================
- Hits        28207    28154      -53     
- Misses       9710     9799      +89     
- Partials     1641     1662      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines -437 to +453
name: target-httproute-with-attachment-conflict-truncated-ancestors
name: target-httproute-with-accepted-truncated-ancestors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't follow alphabetical order when createtime is same?

Copy link
Contributor Author

@arkodg arkodg Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no sorting in this layer, it's following test case index order, the sorting has now moved to the provider which is ensuring index order

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed something, is there a test to make sure the sort worked as expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah in internal/message/watchutil_test.go

@Xunzhuo
Copy link
Member

Xunzhuo commented Aug 5, 2025

/retest

@arkodg arkodg requested review from a team August 5, 2025 04:54
// or sort alphabetically by “{namespace}/{name}” if multiple gateways share same timestamp.
sort.Slice(c[idx].Gateways, func(i, j int) bool {
if c[idx].Gateways[i].CreationTimestamp.Equal(&(c[idx].Gateways[j].CreationTimestamp)) {
gatewayKeyI := fmt.Sprintf("%s/%s", c[idx].Gateways[i].Namespace, c[idx].Gateways[i].Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to compare the namespace/name directly instead of doing a printf here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we track this as a follow up, since ive already run the numbers with this code, and it is moving the same logic from gateway-api layer to provider layer

in the future we can run go test -bench tests to track func performance

Xunzhuo
Xunzhuo previously approved these changes Aug 5, 2025
Copy link
Member

@Xunzhuo Xunzhuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improves, looks like CPU cost reduces largely

Arko Dasgupta added 3 commits August 6, 2025 07:23
* sorts a collection of resources on gatewayClass - creationTimestamp
then name
* moves all per resource sort function to provider layer

This ensures all resources will be sorted, and duplicate updates
wont result into watchable subscribe calls to runners. It also
means per XDS IR is also ordered

This allows us to delete `Equal()` functions added for
`ProviderResources` and `XDS IR` which called expensive `DeepCopy`
and `sort` functions, and rely on the top level

* the internal `svcMap` field is removed, because it was affecting
caching and equality performance, it can be introduced back in the
gateway-api layer as part of Translator context to improve config
convergence time once we start accurately tracking it

Fixes: envoyproxy#6690

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: Arko Dasgupta <arko@tetrate.io>
@Xunzhuo Xunzhuo merged commit 958ef7a into envoyproxy:main Aug 6, 2025
29 of 30 checks passed
@arkodg arkodg deleted the sort-provider branch August 6, 2025 01:03
@arkodg
Copy link
Contributor Author

arkodg commented Aug 6, 2025

im linking some more data to this PR, of numbers from v1.4.2 thats running the same test
Screenshot 2025-08-05 at 6 00 34 PM

Screenshot 2025-08-05 at 6 00 29 PM Screenshot 2025-08-05 at 6 00 21 PM Screenshot 2025-08-05 at 6 00 17 PM

Summarizing the results in this table, justifying the reason to add this into v1.5.0

Metric v1.4.2 v1.5.0-rc.2 PR Comments
CPU Usage 0.01–0.02 0.03–0.06 0.01–0.04 This PR improves CPU compared to v1.5.0-rc.2 and brings it closer to v1.4.2
Container Mem Usage 240M–340M 190M–250M 180M–200M memory improvement in this PR
Go Mem Usage 80M–140M 60M–140M 70M–150M Looks similar for all
Watchable Subscriber Rate 20 120 0 Both v1.4.2 and v1.5.0-rc.2 have an issue, they shouldnt be publishing or subscribing when input resources dont change, which this PR addresses

zirain pushed a commit to zirain/gateway that referenced this pull request Aug 8, 2025
zirain pushed a commit to zirain/gateway that referenced this pull request Aug 8, 2025
Signed-off-by: zirain <zirain2009@gmail.com>
zirain added a commit that referenced this pull request Aug 8, 2025
* docs: rm latest from install egctl docs (#6700)

* we've removed the latest egctl artifacts

Relates to #6551

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* chore: fix globalResources in GNM (#6701)

Signed-off-by: zirain <zirain2009@gmail.com>

* move ordering gateway-api resources to provider (#6695)

Signed-off-by: zirain <zirain2009@gmail.com>

* Rate Limiter: Enable rate limit for month and year (#6715)

* update go control plane rate limiter version

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* enable rate limit for month and year

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* add attribution

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* fix gen

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* remove helm binaries

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

---------

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>
Co-authored-by: Rico Pahlisch <pahli88@googlemail.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: rm alpha alert for gateway namespace mode (#6709)

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* allow SNI and Cert SAN mismatch (#6719)

Fixes: #6442

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* chore(charts): update metadata for gateway-crds-helm (#6725)

* chore(charts): update metadata for gateway-crds-helm

Signed-off-by: Maxime Brunet <max@brnt.mx>
Signed-off-by: zirain <zirain2009@gmail.com>

* chore: Don't render bootstrap local cluster if topologyInjector is disabled (#6718)

* Consider topologyInjector when rendering bootstrap

Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>

* Add logic to provider

Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* remove nit log when Backend API is disabled (#6708)

* remove nit log when Backend API is disabled

Signed-off-by: zirain <zirain2009@gmail.com>

* fix: EnvoyProxy image with digest is rejected (#6720)

* fix: EnvoyProxy image with digest is rejected

Signed-off-by: zirain <zirain2009@gmail.com>

* Revert "feat: add listener metadata (#6639)" (#6727)

This reverts commit 20cb68b.

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: enhance extensibility index page (#6728)

make it easier to pick the relevant extension type

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: add docs explaining graceful shutdown (#6729)

* docs: add docs explaining graceful shutdown

fixes: #2686

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: ClusterTrustBundle Support in BackendTLSPolicy (#6714)

Signed-off-by: zirain <zirain2009@gmail.com>

* chore: bump go 1.24.6 (#6732)

Signed-off-by: zirain <zirain2009@gmail.com>

* [release-1.5] release-notes for v1.5.0 (#6731)

* [release-1.5] release-notes for v1.5.0

Signed-off-by: zirain <zirain2009@gmail.com>

* update compatibility matrix

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Pascal van Leeuwen <pascal@grove.city>
Signed-off-by: Maxime Brunet <max@brnt.mx>
Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
Co-authored-by: Arko Dasgupta <arkodg@users.noreply.github.com>
Co-authored-by: commoddity <47662958+commoddity@users.noreply.github.com>
Co-authored-by: Rico Pahlisch <pahli88@googlemail.com>
Co-authored-by: Maxime Brunet <max@brnt.mx>
Co-authored-by: Isaac <10012479+jukie@users.noreply.github.com>
zirain added a commit to zirain/gateway that referenced this pull request Sep 16, 2025
* docs: rm latest from install egctl docs (envoyproxy#6700)

* we've removed the latest egctl artifacts

Relates to envoyproxy#6551

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* chore: fix globalResources in GNM (envoyproxy#6701)

Signed-off-by: zirain <zirain2009@gmail.com>

* move ordering gateway-api resources to provider (envoyproxy#6695)

Signed-off-by: zirain <zirain2009@gmail.com>

* Rate Limiter: Enable rate limit for month and year (envoyproxy#6715)

* update go control plane rate limiter version

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* enable rate limit for month and year

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* add attribution

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* fix gen

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

* remove helm binaries

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>

---------

Signed-off-by: Pascal van Leeuwen <pascal@grove.city>
Co-authored-by: Rico Pahlisch <pahli88@googlemail.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: rm alpha alert for gateway namespace mode (envoyproxy#6709)

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* allow SNI and Cert SAN mismatch (envoyproxy#6719)

Fixes: envoyproxy#6442

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* chore(charts): update metadata for gateway-crds-helm (envoyproxy#6725)

* chore(charts): update metadata for gateway-crds-helm

Signed-off-by: Maxime Brunet <max@brnt.mx>
Signed-off-by: zirain <zirain2009@gmail.com>

* chore: Don't render bootstrap local cluster if topologyInjector is disabled (envoyproxy#6718)

* Consider topologyInjector when rendering bootstrap

Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>

* Add logic to provider

Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
Signed-off-by: zirain <zirain2009@gmail.com>

* remove nit log when Backend API is disabled (envoyproxy#6708)

* remove nit log when Backend API is disabled

Signed-off-by: zirain <zirain2009@gmail.com>

* fix: EnvoyProxy image with digest is rejected (envoyproxy#6720)

* fix: EnvoyProxy image with digest is rejected

Signed-off-by: zirain <zirain2009@gmail.com>

* Revert "feat: add listener metadata (envoyproxy#6639)" (envoyproxy#6727)

This reverts commit 20cb68b.

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: enhance extensibility index page (envoyproxy#6728)

make it easier to pick the relevant extension type

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: add docs explaining graceful shutdown (envoyproxy#6729)

* docs: add docs explaining graceful shutdown

fixes: envoyproxy#2686

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>

* docs: ClusterTrustBundle Support in BackendTLSPolicy (envoyproxy#6714)

Signed-off-by: zirain <zirain2009@gmail.com>

* chore: bump go 1.24.6 (envoyproxy#6732)

Signed-off-by: zirain <zirain2009@gmail.com>

* [release-1.5] release-notes for v1.5.0 (envoyproxy#6731)

* [release-1.5] release-notes for v1.5.0

Signed-off-by: zirain <zirain2009@gmail.com>

* update compatibility matrix

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Pascal van Leeuwen <pascal@grove.city>
Signed-off-by: Maxime Brunet <max@brnt.mx>
Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
Co-authored-by: Arko Dasgupta <arkodg@users.noreply.github.com>
Co-authored-by: commoddity <47662958+commoddity@users.noreply.github.com>
Co-authored-by: Rico Pahlisch <pahli88@googlemail.com>
Co-authored-by: Maxime Brunet <max@brnt.mx>
Co-authored-by: Isaac <10012479+jukie@users.noreply.github.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move sorting Gateway API resources to the Provider layer

3 participants