gateway-api v1.1.0 standard-install breaks envoy-gateway and cilium-operator #3075

Open
networkhermit opened this issue May 11, 2024 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@networkhermit
Contributor

networkhermit commented May 11, 2024

What happened:

I recently upgraded Gateway API to v1.1.0 from the standard channel, but found that both envoy-gateway and cilium-operator are now failing, even though I have not directly used GRPCRoute gateway.networking.k8s.io/v1alpha2. (The Istio controller is fine.)

From the v1.1.0 release notes:

GRPCRoute has graduated to GA (v1) and is now part of the Standard Channel. If
you are already using the experimental version GRPCRoute, we recommend holding
off on upgrading to the standard channel version of GRPCRoute until the
controllers you're using have been updated to support GRPCRoute v1. Until then,
it is safe to upgrade to the experimental channel version of GRPCRoute in v1.1
that includes both v1alpha2 and v1 API versions.

It seems to me that the experimental channel is safer to upgrade, which is unintuitive.
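The channel split described in the release note can be modeled in a few lines. This Go sketch is illustrative only: `grpcRouteVersions` and `servesVersion` are hypothetical names, and the table encodes only what the release note states (in v1.1 the standard channel serves GRPCRoute v1, while the experimental channel serves both v1alpha2 and v1).

```go
package main

import "fmt"

// grpcRouteVersions records which GRPCRoute API versions the v1.1 CRDs serve in
// each release channel, per the v1.1 release notes. Illustrative data only,
// not read from a real cluster.
var grpcRouteVersions = map[string][]string{
	"standard":     {"v1"},
	"experimental": {"v1alpha2", "v1"},
}

// servesVersion reports whether the channel's GRPCRoute CRD serves version.
func servesVersion(channel, version string) bool {
	for _, v := range grpcRouteVersions[channel] {
		if v == version {
			return true
		}
	}
	return false
}

func main() {
	// A controller that still watches GRPCRoute v1alpha2 only keeps working
	// if the installed CRD continues to serve that version.
	for _, ch := range []string{"standard", "experimental"} {
		fmt.Printf("v1.1 %s channel serves GRPCRoute v1alpha2: %v\n", ch, servesVersion(ch, "v1alpha2"))
	}
}
```

Run as-is, it prints `false` for the standard channel and `true` for the experimental one, which is why a controller hard-wired to v1alpha2 survives the experimental upgrade but not the standard one.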

I have two questions regarding this issue:

  1. Is it possible to keep old API versions in the standard channel and deprecate them only after a grace period spanning future releases?
  2. Is this issue supposed to be fixed by the envoy-gateway and cilium-operator implementations? Both require some experimental Gateway API CRDs for the controller to start, even when users don't use them at all, so upgrading Gateway API through the stable channel inadvertently breaks them.

Edited:

I wrongly assumed that the v1alpha2 GRPCRoute had been installed from the Gateway API standard channel; in reality it was installed by the envoy-gateway Helm chart, which embeds the experimental grpcroutes CRD. (The cilium-operator also requires the experimental Gateway API CRDs, but Istio only requires the stable Gateway API CRDs, so upgrading Gateway API did not break it.)

envoy-gateway log:

Error: failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "GRPCRoute" in version "gateway.networking.k8s.io/v1alpha2"
Usage:
  envoy-gateway server [flags]

Aliases:
  server, serve

Flags:
  -c, --config-path string   The path to the configuration file.
  -h, --help                 help for server

failed to create provider Kubernetes: failted to create gatewayapi controller: no matches for kind "GRPCRoute" in version "gateway.networking.k8s.io/v1alpha2"

cilium-operator log:

time="2024-05-11T05:43:22Z" level=info msg="Checking for required GatewayAPI resources" requiredGVK="[gateway.networking.k8s.io/v1, Kind=gatewayclasses gateway.networking.k8s.io/v1, Kind=gateways gateway.networking.k8s.io/v1, Kind=httproutes gateway.networking.k8s.io/v1beta1, Kind=referencegrants gateway.networking.k8s.io/v1alpha2, Kind=grpcroutes gateway.networking.k8s.io/v1alpha2, Kind=tlsroutes]" subsys=gateway-api
time="2024-05-11T05:43:22Z" level=error msg="Invoke failed" ="gateway-api.initGatewayAPIController (pkg/gateway-api/cell.go:70)" error="failed to create gateway controller: failed to setup reconciler: no matches for kind \"GRPCRoute\" in version \"gateway.networking.k8s.io/v1alpha2\"" subsys=hive
time="2024-05-11T05:43:22Z" level=info msg=Stopping subsys=hive
time="2024-05-11T05:43:22Z" level=fatal msg="failed to start: failed to create gateway controller: failed to setup reconciler: no matches for kind \"GRPCRoute\" in version \"gateway.networking.k8s.io/v1alpha2\"" subsys=cilium-operator-generic
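The cilium-operator log lists the group/version/kinds it requires at startup. A minimal Go sketch of a version-aware check follows. It assumes the served resources are already known (a real controller would ask the API server's discovery endpoint), uses Kind names rather than the plural resource names printed in the log, and all identifiers are hypothetical. The point it illustrates: after the standard-install upgrade the `grpcroutes` CRD still exists, only its v1alpha2 version is gone, so a check on CRD name alone would pass while the v1alpha2 lookup fails.

```go
package main

import "fmt"

// GVK identifies an API group, version and kind.
type GVK struct {
	Group, Version, Kind string
}

const gwGroup = "gateway.networking.k8s.io"

// requiredGVKs corresponds to the requiredGVK list in the cilium-operator log above.
var requiredGVKs = []GVK{
	{gwGroup, "v1", "GatewayClass"},
	{gwGroup, "v1", "Gateway"},
	{gwGroup, "v1", "HTTPRoute"},
	{gwGroup, "v1beta1", "ReferenceGrant"},
	{gwGroup, "v1alpha2", "GRPCRoute"},
	{gwGroup, "v1alpha2", "TLSRoute"},
}

// missingGVKs returns every required GVK that the cluster does not serve.
func missingGVKs(required []GVK, served map[GVK]bool) []GVK {
	var missing []GVK
	for _, gvk := range required {
		if !served[gvk] {
			missing = append(missing, gvk)
		}
	}
	return missing
}

func main() {
	// Assumed cluster state: the v1.1 standard-install was applied over CRDs
	// left by an earlier experimental install, so GRPCRoute now serves only
	// v1 while TLSRoute v1alpha2 is still present.
	served := map[GVK]bool{
		{gwGroup, "v1", "GatewayClass"}:        true,
		{gwGroup, "v1", "Gateway"}:             true,
		{gwGroup, "v1", "HTTPRoute"}:           true,
		{gwGroup, "v1beta1", "ReferenceGrant"}: true,
		{gwGroup, "v1", "GRPCRoute"}:           true,
		{gwGroup, "v1alpha2", "TLSRoute"}:      true,
	}
	for _, gvk := range missingGVKs(requiredGVKs, served) {
		fmt.Printf("no matches for kind %q in version \"%s/%s\"\n", gvk.Kind, gvk.Group, gvk.Version)
	}
}
```

With that assumed state, the only line printed is the GRPCRoute v1alpha2 miss, mirroring the fatal error in both logs.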

What you expected to happen:

I expected standard-install to be the safer option to upgrade.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

@networkhermit networkhermit added the kind/bug Categorizes issue or PR as related to a bug. label May 11, 2024
@tao12345666333
Member

  1. This issue has been discussed before. The maintainers concluded that there is no upgrade path from GRPCRoute v1alpha2 to v1 within the standard channel, since v1alpha2 was never part of it.
    Therefore, only the latest version, v1, is included in the standard channel.

  2. It is hard to guarantee smooth upgrades across the various implementations.
    Once an implementation upgrades its Gateway API dependency to v1.1, it is difficult for it to keep supporting v1alpha2.

@youngnick
Contributor

Yes, each implementation will need to update to support v1.1, which for Cilium will require supporting both v1 and v1alpha2 of GRPCRoute. Until then, Cilium only supports v1.0 of Gateway API.

@networkhermit
Contributor Author

Is this issue supposed to be fixed by the envoy-gateway and cilium-operator implementations? Both require some experimental Gateway API CRDs for the controller to start, even when users don't use them at all, so upgrading Gateway API through the stable channel inadvertently breaks them.

I mean, users might not use any v1alpha2 GRPCRoute at all, so there is no need to upgrade any custom resources. But currently neither envoy-gateway nor cilium-operator can even start, because of their hard requirement on the v1alpha2 GRPCRoute CRD.

Maybe it's a good idea for envoy-gateway, cilium-operator and other implementations to ignore missing experimental alpha/beta CRDs and start up anyway.
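The suggestion above amounts to classifying each watched kind as required or optional at startup. A minimal Go sketch, assuming the served GVKs are already known and using hypothetical names (`kindSpec`, `planWatches`): missing optional (Experimental-channel) kinds are logged and skipped, and only a missing required kind stops startup.

```go
package main

import (
	"fmt"
	"log"
)

// GVK identifies an API group, version and kind.
type GVK struct{ Group, Version, Kind string }

// kindSpec marks whether the controller can start without a given GVK.
type kindSpec struct {
	GVK      GVK
	Optional bool // true for Experimental-channel resources
}

// planWatches returns the GVKs to watch, skipping missing optional kinds and
// failing only when a required kind is not served.
func planWatches(specs []kindSpec, served map[GVK]bool) ([]GVK, error) {
	var watch []GVK
	for _, s := range specs {
		switch {
		case served[s.GVK]:
			watch = append(watch, s.GVK)
		case s.Optional:
			log.Printf("skipping %s %s/%s: not served by the cluster", s.GVK.Kind, s.GVK.Group, s.GVK.Version)
		default:
			return nil, fmt.Errorf("required kind %s %s/%s is not served", s.GVK.Kind, s.GVK.Group, s.GVK.Version)
		}
	}
	return watch, nil
}

func main() {
	const g = "gateway.networking.k8s.io"
	specs := []kindSpec{
		{GVK{g, "v1", "GatewayClass"}, false},
		{GVK{g, "v1", "Gateway"}, false},
		{GVK{g, "v1", "HTTPRoute"}, false},
		{GVK{g, "v1alpha2", "GRPCRoute"}, true},
		{GVK{g, "v1alpha2", "TLSRoute"}, true},
	}
	// Assumed cluster: only the v1.1 standard-install, so no v1alpha2 kinds.
	served := map[GVK]bool{
		{g, "v1", "GatewayClass"}: true,
		{g, "v1", "Gateway"}:      true,
		{g, "v1", "HTTPRoute"}:    true,
		{g, "v1", "GRPCRoute"}:    true,
	}
	watch, err := planWatches(specs, served)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("watching:", watch)
}
```

In this assumed cluster the controller starts with the three stable kinds instead of exiting; GRPCRoute and TLSRoute support simply stays off until their CRDs are served.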

P.S. I edited my original comment to correct my wrongly made assumption.

@howardjohn
Contributor

Maybe it's a good idea for envoy-gateway, cilium-operator and other implementations to ignore missing experimental alpha/beta CRDs and start up anyway.

FWIW, this is basically how we did it in Istiod. We already had old logic for "if the CRD doesn't exist, ignore it and start up anyway". Gateway API somewhat broke that by removing versions (our own CRDs do not do this), so we made it so you have to explicitly opt into experimental CRDs like GRPCRoute v1alpha2.
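The two behaviors described above (skip missing CRDs, and consider experimental kinds only on explicit opt-in) can be combined roughly as in this Go sketch. It is not Istio's code: `watchSet` and the `EXAMPLE_ENABLE_ALPHA_GATEWAY_API` environment variable are hypothetical, and served GVKs are assumed to be known already.

```go
package main

import (
	"fmt"
	"os"
)

// GVK identifies an API group, version and kind.
type GVK struct{ Group, Version, Kind string }

// watchSet decides which kinds to watch. Stable kinds are watched when served
// and skipped otherwise; experimental kinds are considered only when the
// operator has explicitly opted in, and are likewise skipped if not served.
func watchSet(stable, experimental []GVK, served map[GVK]bool, optIn bool) []GVK {
	candidates := append([]GVK{}, stable...)
	if optIn {
		candidates = append(candidates, experimental...)
	}
	var watch []GVK
	for _, gvk := range candidates {
		if served[gvk] {
			watch = append(watch, gvk)
		}
	}
	return watch
}

func main() {
	const g = "gateway.networking.k8s.io"
	stable := []GVK{{g, "v1", "Gateway"}, {g, "v1", "HTTPRoute"}}
	experimental := []GVK{{g, "v1alpha2", "GRPCRoute"}}
	served := map[GVK]bool{
		{g, "v1", "Gateway"}:   true,
		{g, "v1", "HTTPRoute"}: true,
		{g, "v1", "GRPCRoute"}: true,
	}

	// EXAMPLE_ENABLE_ALPHA_GATEWAY_API is a hypothetical opt-in switch.
	optIn := os.Getenv("EXAMPLE_ENABLE_ALPHA_GATEWAY_API") == "true"
	fmt.Println(watchSet(stable, experimental, served, optIn))
}
```

Because missing kinds are never fatal here, removing GRPCRoute v1alpha2 from the cluster degrades the feature rather than preventing startup, even for operators who opted in.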

@robscott
Member

Created #3084 to help document this pain point when upgrading to v1.1.

@youngnick
Contributor

I opened cilium/cilium#32539 to cover building out support for Cilium that's similar to what @howardjohn described for Istio.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2024
@networkhermit
Contributor Author

Not stale.

@networkhermit
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2024
@youngnick
Contributor

The required Cilium changes (in cilium/cilium#34212) have been merged into main and will be in Cilium 1.17.

We probably also need some updates to the Gateway API CRD management guide at https://gateway-api.sigs.k8s.io/guides/crd-management/ and the Implementer's guide at https://gateway-api.sigs.k8s.io/guides/implementers/ recommending that implementations ensure Experimental CRDs are optional for startup. Suggestions or PRs are welcome on both fronts.
