Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP-1651: Gateway Routability #1653

Merged
merged 34 commits into from
Jun 26, 2023
Merged

Conversation

dprotaso
Copy link
Contributor

@dprotaso dprotaso commented Jan 16, 2023

/kind gep

What this PR does / why we need it:
Support different types of routable Gateways

Which issue(s) this PR fixes:

Supports (but does not resolve) #1651

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 16, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @dprotaso. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 16, 2023
@dprotaso dprotaso marked this pull request as draft January 16, 2023 21:47
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 16, 2023
@dprotaso
Copy link
Contributor Author

Going to leave this as draft - until I propose an implementation/API - @evankanderson had a simple proposal in the discussion but wanted to dwell on it a bit before committing it to this GEP

@dprotaso dprotaso changed the title Initial GEP draft with goals/non-goals, intro and references GEP-1651: Cluster local Gateways Jan 16, 2023
@dprotaso
Copy link
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 16, 2023
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this @dprotaso! Have some questions about scoping here, especially around some of the edge cases about what "external" means in this context.

site-src/geps/gep-1651.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 13, 2023
@dprotaso dprotaso marked this pull request as ready for review February 13, 2023 16:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2023
@robscott
Copy link
Member

@dprotaso added this to agenda for community meeting today, want to try to figure out where this fits between a theoretical future ClusterIP GatewayClass, and the broader goal of providing shared config for all in-cluster implementations that are currently mapping a Gateway to a Service type=LB.

@howardjohn
Copy link
Contributor

Discussed on zoom: this seems like a specialized case of a more general problem: how do we customize the generate resources (Deployment/Service commonly, maybe others) for in cluster deployments? See also #1713 and #1355.

Currently I think most implementations seem to do this in different ways. It may make sense to have a unified approach here rather than tackling one specific use case

@robscott
Copy link
Member

Another follow up from community meeting. I think we have 3 questions to answer:

  1. Should we group this together with broader efforts to configure how Gateways are translated to Services (and maybe Deployments)? That's really @howardjohn's question above.
  2. Where should this config live? Is this something that should be per-Gateway or per-GatewayClass? In the case of cloud LBs, you're often talking about entirely different types of LB depending on if they're internal or external.
  3. Should we differentiate between same-cluster and same-network? For example, many in-cluster implementations will only be able to implement same-cluster, while many cloud providers will only be able to implement same-network.

@bowei
Copy link
Contributor

bowei commented Feb 14, 2023

+1 on seeing if this fits into a category of standardizing the experience for the user of Gateway proxies/infrastructure that is built and deployed using standard K8s.

Regarding whether GatewayClass or Gateway -- this really rests on the role you think who should be determining these parameters. Our most typical use case is the person owning the lifecycle of the GW would not be also configuring # of replicas etc. They may give hints as to the capacity they need, but not be the one fixing the actual resource values.

@robscott
Copy link
Member

Here's an attempt to answer the questions from the meeting I commented above:

Another follow up from community meeting. I think we have 3 questions to answer:

  1. Should we group this together with broader efforts to configure how Gateways are translated to Services (and maybe Deployments)? That's really @howardjohn's question above.

At a minimum, we should try to list out what users commonly want to configure here. Determining what those features are and how often they're likely to vary would help us decide if that's a separate effort or directly linked with this.

  1. Where should this config live? Is this something that should be per-Gateway or per-GatewayClass? In the case of cloud LBs, you're often talking about entirely different types of LB depending on if they're internal or external.

I'm biased because GKE already has different GatewayClasses for internal and external load balancers. If we were to include this on GWC, we could add something like a scope field that had the following options: External|SameNetwork|SameCluster.

While I can see how this could also fit on the Gateway itself, to me this feels more fundamentally about the type/class of Gateway that is being used than a switch on the Gateway.

I'm still not really sure how this would interact with the potential of a future "ClusterIP" GatewayClass, but I think this may be a bit less likely to conflict with that.

  1. Should we differentiate between same-cluster and same-network? For example, many in-cluster implementations will only be able to implement same-cluster, while many cloud providers will only be able to implement same-network.

I think these probably should be different values. It's possible that in-cluster implementations would be able to configure internal LBs on some providers that effectively provided "SameNetwork" behavior.

site-src/geps/gep-1651.md Outdated Show resolved Hide resolved
@howardjohn
Copy link
Contributor

Regarding whether GatewayClass or Gateway -- this really rests on the role you think who should be determining these parameters. Our most typical use case is the person owning the lifecycle of the GW would not be also configuring # of replicas etc. They may give hints as to the capacity they need, but not be the one fixing the actual resource values.

FWIW this is not how many users use our product, at least. See https://istio.io/latest/docs/setup/additional-setup/gateway/#dedicated-application-gateway and #567.

I think my views on GatewayClass haven't necessarily been congruent with the projects position, but in my mind, if we draw an analogy to deployments (which is not a stretch, we literally make a Deployment for us :-) )

GatewayClass: deploy the meta-infra. i.e. Deployment controller in controller-manager.
Gateway: similar to Deployment itself, configures a deployment; all configuration for a given instance is here

I know there is "params" on Gatewayclass but I think this is an anti-pattern for most use cases. This would be akin to putting a cluster scoped "ResourceClass" resource that has cpu/memory requests, and deployments reference that vs inlining it - this isn't a terrible idea, but feels more like a higher level abstraction that a PaaS may offer, while I would expect Kubernetes to be a bit lower level. Similarly, having the ability to configure things at a higher level seems nice for Gateway, but being able to configure them on a per-Gateway basis remains important.

I would also like to point out that the "1-2 Gateways per cluster" is a common case, but not the only one. We expect to have 100s of Gateways in a cluster which all may have bespoke config. Making each of theserequire a GatewayClass is problematic

@pleshakov
Copy link
Contributor

Could introducing parametersRef in the Gateway help with per-Gateway implementation-specific configuration?

@dprotaso
Copy link
Contributor Author

dprotaso commented Feb 21, 2023

Expanding scope of the GEP

Should we group this together with broader efforts to configure how Gateways are translated to Services (and maybe Deployments)? That's really @howardjohn's question above.

I think it's premature to suggest this use case warrants an expansion in scope to cover the definition of the Gateway's deployment. It feels this locks us into a 1:1 mapping between a Gateway and it's underlying k8s resources.

I worry this would interfere with my the proposal to define how multiple gateways can be merged. See #1713

Empirically, Knative's deployment of Istio makes use of a single deployment to handle cluster local traffic AND traffic from the internet. This is accomplished by using multiple K8s Services (one type: LoadBalancer and the other type: ClusterIP) each pointing to different ports on Istio's K8s deployment.

If a 1:1 mapping between Gateway => (Service & Deployment) is required we would have to push scope to be defined at the listener level and then add a corresponding scope property on the GatewayStatus.Addresses to figure out the address.

ie.

# Bad example
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: prod-web
spec:
  gatewayClassName: acme-lb
  listeners:  
  - protocol: HTTP
    port: 80
    scope: ClusterLocal
    name: prod-internal-gw
  - protocol: HTTP
    port: 80
    scope: Public
    name: prod-web-gw
status:
  addresses: 
  - type: IPAddress
    scope: Public
    value: 53.457.48.54
  - type: IPAddress
    value: 10.0.0.13
    scope: ClusterLocal

GatewayClass or Gateway

Where should this config live? Is this something that should be per-Gateway or per-GatewayClass? In the case of cloud LBs, you're often talking about entirely different types of LB depending on if they're internal or external.

GatewayClass seems solely for providing implementation specific configuration - given that it's so underspecified. Gateway is advertised as 1:1 with the life cycle of the configuration of infrastructure and since the proposed property influences the infrastructure being configured I would lean that it exists on the Gateway.

From howardjohn

Similarly, having the ability to configure things at a higher level seems nice for Gateway, but being able to configure them on a per-Gateway basis remains important.

Yeah +100

From howardjohn

I would also like to point out that the "1-2 Gateways per cluster" is a common case, but not the only one. We expect to have 100s of Gateways in a cluster which all may have bespoke config. Making each of theserequire a GatewayClass is problematic

Knative could easily have 1000s of Gateways resources that are merged (because of unique certs issued by HTTP01)

Same Cluster or Same Network

Should we differentiate between same-cluster and same-network? For example, many in-cluster implementations will only be able to implement same-cluster, while many cloud providers will only be able to implement same-network.

@robscott Can you elaborate on while many cloud providers will only be able to implement same-network.? What is considered same-network and what are the boundaries?

Given that I had to even ask that question I think SameCluster/ClusterIP is a very straight forward definition and has clear boundaries for users. I would encourage cloud providers to support this.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2023
@dprotaso
Copy link
Contributor Author

dprotaso commented Jun 16, 2023

sorry folks - I rebase to resolve nav conflict

@dprotaso
Copy link
Contributor Author

3 ERROR: failed to copy: httpReadSeeker: failed open: content at https://mirror.gcr.io/v2/library/golang/manifests/sha256:6fb612aac0ae076bd4f6a76e48c4c8e59a4bae89dc5201252ec2b4eb8a2ae2a0?ns=docker.io not found: not found

weird
/retest

@dprotaso
Copy link
Contributor Author

gcr transient error

/retest

@robscott
Copy link
Member

Thanks @dprotaso! This LGTM, but leaving for @youngnick to give final LGTM.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, howardjohn, robscott, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@youngnick
Copy link
Contributor

After one last pass, yes, this LGTM

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2023
@robscott
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2023
@k8s-ci-robot k8s-ci-robot merged commit d03200b into kubernetes-sigs:main Jun 26, 2023
@dprotaso dprotaso deleted the gep-1651 branch June 26, 2023 21:22
@dprotaso
Copy link
Contributor Author

wow no way - thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.