Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Listeners in terms of an endpoint concept. #193

Merged

Conversation

jpeach
Copy link
Contributor

@jpeach jpeach commented May 13, 2020

This change explores one way to solve the problem described here,
which is that there is an implied endpoint abstraction that is awkwardly
split across the cluster operator and application owner roles.

Rewrite Listeners to model the concept of a logical network endpoint.
This change defines a Gateway more strongly as the network element to
which addresses can be attached, and Listeners as logical endpoints
within the network element.

The main goal of this change is to consolidate the name of an application
(i.e. the endpoint) into a single part of the API that is controlled
by one user persona. In this change, the application name is strictly
part of the Gateway spec and controlled by the cluster operator role,
while routes are controlled by the application owner.

The Listener protocol is now strongly defined to a core set of common
protocol stacks. This loss of flexibility improves clarity because it
makes it possible to specify how Listeners should be collapsed into
virtual hosts in an underlying proxy implementation.

This change doesn't update ListenerStatus, though clearly it no longer
makes complete sense. The most likely change needed for GatewayStatus
is to drop the Listeners field and fold listener errors directly into
the Conditions field.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 13, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @jpeach. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 13, 2020
@jpeach
Copy link
Contributor Author

jpeach commented May 13, 2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 13, 2020
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This an interesting concept, thanks for the work on it! The diff is relatively large, but if I'm understanding it correctly, the change entails moving hostname from routes to listeners and moving the route selector into listeners. Did I miss any other significant changes in the diff?

Comment on lines 93 to 130
// TLSProtocolType accepts TLS sessions over TCP.
TLSProtocolType ProtocolType = "TLS"

// TLSProtocolType accepts TCP sessions.
TCPProtocolType ProtocolType = "TCP"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding the scope of this PR, but on the surface these protocol additions feel out of scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully. The problem is that certain behaviour can only be specified in terms of the protocol field. This means that, maybe, the open-ended protocol is a false abstraction and that in reality, the set of supported protocols is fixed.

Comment on lines 129 to 130
// Hostname must be supplied required for "HTTP" and "HTTPS" protocols, and may
// be supplied for "TLS" protocol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional validation complexity here is an unfortunate consequence of this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the complexity was already there, but this change lets us make it explicit. Previously, there was no apparant complexity exposed in the API because all the complexity of how to reconcile the hostname and TLS was implicit. The controller has to do something but the API provides no rules or other guidance.

api/v1alpha1/gateway_types.go Outdated Show resolved Hide resolved
// +optional
Protocol *string `json:"protocol,omitempty" protobuf:"bytes,4,opt,name=protocol"`
// +required
Port int32 `json:"port,omitempty" protobuf:"varint,3,opt,name=port"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under this model it seems like it would be helpful to have multiple ports/protocols per listener. I know we'd already discussed that as an option, but now that routes are per listener, it seems to be even more necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for that, and I'd argue that it doesn't work. You can't have multiple processes accepting on the same socket unless you know the application protocol and that protocol has some method of discrimination. This means that you are, by definition, doing something protocol-specific, so trying to make that generic is a false abstraction that obscures rather than clarifies.

In this model, you can have a listener per port, and the rules for collapsing listeners onto the same underlying network port are clear. If you need more ports, then add more listeners with the same selector.

A further aspect that I didn't draw out in this PR is that the protocol and the route selector fields are tightly coupled and there have to be rules about how to handle this. Tying them together in the same struct makes it easier to elaborate the rules for handling that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, as I've thought about this more it really does make sense to keep listeners unique per port/protocol combination.

@jpeach
Copy link
Contributor Author

jpeach commented May 13, 2020

This an interesting concept, thanks for the work on it! The diff is relatively large, but if I'm understanding it correctly, the change entails moving hostname from routes to listeners and moving the route selector into listeners. Did I miss any other significant changes in the diff?

Nope, you have nailed the main intent. See attached doc for my reasoning that there is a "logical endpoint" object that we should model.

@bowei
Copy link
Contributor

bowei commented May 13, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 13, 2020
@bowei
Copy link
Contributor

bowei commented May 13, 2020

/assign

@bowei
Copy link
Contributor

bowei commented May 19, 2020

Splitting the ports by listener seems to clean things up.
I'm not so sure about the hostname requirement:

How to have wildcard hostnames.
This is a common use case (in fact we added support for it Ingress GA, maybe step backwards).

We at least need to be able to handle default hostname.

Given that with the listener + TLS + associated routes => you can validate the hostname matches and certs w/ all of the objects involved, can we remove hostname from the Gateway level?

api/v1alpha1/gateway_types.go Outdated Show resolved Hide resolved
//
// Listeners are considered compatible if the Protocol field is
// "HTTP", HTTPS or "TLS", the Hostname field is specified, and the
// Port field matches.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this defining compatibility (if yes, then compatibility with what) or validity of a Listener?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defining compatibility for the purposes of "collapsing compatible Listeners" in the preceding paragraph. The concept of collapsing Listeners is what allows virtual hosting).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more text to specify the rules around collapsing Listeners into a virtual host configuration. I think it's clearer, but could still be better.

@hbagdi
Copy link
Contributor

hbagdi commented May 19, 2020

Given that with the listener + TLS + associated routes => you can validate the hostname matches and certs w/ all of the objects involved, can we remove hostname from the Gateway level?

That works for HTTP but not HTTPS.

I think this change is a positive one and we should proceed.
I have the same thought as @bowei on the hostname. Default and wildcard domains need to be supported here.
Here is what I think might work:

Keep Hostname in HTTPRoute. This is used for route matching.
In Gateway's Listener, we keep a field AllowedHostnames []string which acts as a filter.
Cluster Operator populate the field after they do the due diligence on their part (ensure DNS setup, TLS certs, approvals, etc).
App owners are now free to bind routes as they see fit as long as the hostname they wish to use is allowed in the Gateway Listener. If not, then they go and talk to the Cluster Operator.

AllowedHostnames can be strict, meaning each and every domain needs to be added, semi-strict by allowed a wildcard set of domains, or relaxed, where it allows for all domains (the easy path).

@jpeach jpeach force-pushed the introduce-endpoint-concept branch from 2fb6ca4 to 81fd996 Compare May 20, 2020 11:35
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 20, 2020
@jpeach
Copy link
Contributor Author

jpeach commented May 20, 2020

Splitting the ports by listener seems to clean things up.
I'm not so sure about the hostname requirement:

How to have wildcard hostnames.
This is a common use case (in fact we added support for it Ingress GA, maybe step backwards).

I need to go dig up and read the Ingress spec for this! I changed the Listener hostname to be a match type, so you can express wildcards as a domain match. The default would be an exact host name match.

We at least need to be able to handle default hostname.

I documented the case where the Listener can omit the Hostname. This would allow your to specify routes without pinning to any name.

This, together with domain matching, required a fair amount of new text to specify how Listeners should be collapsed into a virtual host configuration.

Given that with the listener + TLS + associated routes => you can validate the hostname matches and certs w/ all of the objects involved, can we remove hostname from the Gateway level?

Yeh I agree with @hbagdi on this. You need some deterministic information about the hostname in the Listener to do the right certificate selection. In the Google doc linked from the initial comment I discussed some of the corner cases here. Keeping some hostname spec in the Listener means that the semantics are consistent across changes in what the route selector selects.

@jpeach
Copy link
Contributor Author

jpeach commented May 20, 2020

Given that with the listener + TLS + associated routes => you can validate the hostname matches and certs w/ all of the objects involved, can we remove hostname from the Gateway level?

That works for HTTP but not HTTPS.

I think this change is a positive one and we should proceed.
I have the same thought as @bowei on the hostname. Default and wildcard domains need to be supported here.
Here is what I think might work:

Keep Hostname in HTTPRoute. This is used for route matching.
In Gateway's Listener, we keep a field AllowedHostnames []string which acts as a filter.
Cluster Operator populate the field after they do the due diligence on their part (ensure DNS setup, TLS certs, approvals, etc).
App owners are now free to bind routes as they see fit as long as the hostname they wish to use is allowed in the Gateway Listener. If not, then they go and talk to the Cluster Operator.

AllowedHostnames can be strict, meaning each and every domain needs to be added, semi-strict by allowed a wildcard set of domains, or relaxed, where it allows for all domains (the easy path).

I restored the Hostname in HTTPRoute as per this suggestion. Probably we need some more explanatory text to describe how we intend it to work.

To address the default and wildcard cases, I borrowed the match concept from routes and change the Listener hostname to have "domain" and "exact" match types. A hostname can be omitted, which would give the Listener default route semantics.

// Listeners associated with this Gateway. Listeners define logical
// endpoints that are bound on this Gateway. If no Listeners are
// provided, the GatewayClass may associate the Gateway with
// an address but the Gateway will not be able to accept any
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not be able to accept ? If you don't specify any listener it could default to HTTP/HTTPS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no Listeners, there's no way to know where to send the traffic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no listeners, you don't even know which port to listen on to accept any traffic in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for associated an address but not accepting connections? IF it's not meaningful/useful, we are better off making it an invalid configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can resolve this by making Listeners required?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- that's probably easiest rather than trying to figure out what empty listeners mean (for now)

@jpeach
Copy link
Contributor Author

jpeach commented Jun 4, 2020

Updated field tags and examples so that all examples can be applies to a cluster. Filed the following issues to track follow on work:

#206 Discuss domain match vs. wildcard match for listener hostnames
#205 WebSocket protocol support
#204 h2c (cleartext HTTP/2) protocol support
#203 Protocol type extensibility
#202 Re-address Gateway and Listener error conditions

@jpeach jpeach force-pushed the introduce-endpoint-concept branch 2 times, most recently from 479ebba to a0723b3 Compare June 4, 2020 04:31
@hanlins
Copy link
Member

hanlins commented Jun 4, 2020

Can you change gateway conditions since route is not part of Gateway? Also it would be nice to add GatewayConditionType for Addresses like InvalidAddress.

@jpeach
Copy link
Contributor Author

jpeach commented Jun 4, 2020

Can you change gateway conditions since route is not part of Gateway? Also it would be nice to add GatewayConditionType for Addresses like InvalidAddress.

There's a lot of conditions that don't make sense after this change, and also conditions that are missing. Since this PR is quite intrusive already, I'm proposing to spin that out into #202.

// definitions into single implementation-defined acceptor
// configuration even if their Port fields would otherwise conflict.
//
// Listeners are compatible if all of the following conditions are true:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can fix this in a later commit, but it would be good to frame this from the user's standpoint, as the current description is somewhat hard to parse.

@jpeach
Copy link
Contributor Author

jpeach commented Jun 5, 2020 via email

@bowei
Copy link
Contributor

bowei commented Jun 7, 2020

/approve

let's merge and iterate

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bowei, jpeach

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2020
Rewrite Listeners to model the concept of a logical network endpoint.
This change defines a Gateway more strongly as the network element to
which addresses can be attached, and Listeners as logical endpoints
within the network element.

The main goal of this change is to consolidate the name of an application
(i.e. the endpoint) into a single part of the API that is controlled
by one user persona. In this change, the application name is strictly
part of the Gateway spec and controlled by the cluster operator role,
while routes are controlled by the application owner.

The Listener protocol is now strongly defined to a core set of common
protocol stacks. This loss of flexibility improves clarity because it
makes it possible to specify how Listeners should be collapsed into
virtual hosts in an underlying proxy implementation.

This change doesn't update ListenerStatus, though clearly it no longer
makes complete sense. The most likely change needed for GatewayStatus
is to drop the Listeners field and fold listener errors directly into
the Conditions field.

Signed-off-by: James Peach <jpeach@vmware.com>
@jpeach jpeach force-pushed the introduce-endpoint-concept branch from a0723b3 to e1fc4ab Compare June 8, 2020 23:42
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 8, 2020
@robscott
Copy link
Member

robscott commented Jun 9, 2020

Just confirmed with @jpeach that we can remove the hold and get this in.

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2020
@k8s-ci-robot k8s-ci-robot merged commit 008ac89 into kubernetes-sigs:master Jun 9, 2020
@hbagdi hbagdi mentioned this pull request Jun 11, 2020
jpeach added a commit to jpeach/gateway-api that referenced this pull request Jun 16, 2020
The structure for exposing Listener status was broken in kubernetes-sigs#193 because
the Listener Name field was removed. This means that status can no longer
be associated to a Listener by matching the name.

However, since Listeners are basically metadata wrapped around ports,
we can expose granular Listener status by publishing a ListenerStatus
object for each port that is bound on the Gateway. This lets us model
port conflicts in a reasonable way as well as scope other errors more
tightly than the entire gateway.

In practice, the scoping of Listener conditions is not as granular as
previously, since most applications will all be exposed on port 443. In
this case, it may not be easy to distinguish which Listener's routes
were not ready.

Since IP addresses are not a property of the Gateway, they should also
be a property of the GatewayStatus. Note that the status field contains
the bound addresses, not the requested addresses.

Condition types have been redistributed between the GatewayConditionType
and ListenerConditionType in accordance with the responsibility for the
error that each condition signifies.

Signed-off-by: James Peach <jpeach@vmware.com>
jpeach added a commit to jpeach/gateway-api that referenced this pull request Jun 17, 2020
The structure for exposing Listener status was broken in kubernetes-sigs#193 because
the Listener Name field was removed. This means that status can no longer
be associated to a Listener by matching the name.

However, since Listeners are basically metadata wrapped around ports,
we can expose granular Listener status by publishing a ListenerStatus
object for each port that is bound on the Gateway. This lets us model
port conflicts in a reasonable way as well as scope other errors more
tightly than the entire gateway.

In practice, the scoping of Listener conditions is not as granular as
previously, since most applications will all be exposed on port 443. In
this case, it may not be easy to distinguish which Listener's routes
were not ready.

Since IP addresses are not a property of the Gateway, they should also
be a property of the GatewayStatus. Note that the status field contains
the bound addresses, not the requested addresses.

Condition types have been redistributed between the GatewayConditionType
and ListenerConditionType in accordance with the responsibility for the
error that each condition signifies.

Signed-off-by: James Peach <jpeach@vmware.com>
jpeach added a commit to jpeach/gateway-api that referenced this pull request Jun 17, 2020
The structure for exposing Listener status was broken in kubernetes-sigs#193 because
the Listener Name field was removed. This means that status can no longer
be associated to a Listener by matching the name.

However, since Listeners are basically metadata wrapped around ports,
we can expose granular Listener status by publishing a ListenerStatus
object for each port that is bound on the Gateway. This lets us model
port conflicts in a reasonable way as well as scope other errors more
tightly than the entire gateway.

In practice, the scoping of Listener conditions is not as granular as
previously, since most applications will all be exposed on port 443. In
this case, it may not be easy to distinguish which Listener's routes
were not ready.

Since IP addresses are a property of the Gateway, they should also be
a property of the GatewayStatus. Note that the status field contains
the bound addresses, not the requested addresses.

Condition types have been redistributed between the GatewayConditionType
and ListenerConditionType in accordance with the responsibility for the
error that each condition signifies.

Signed-off-by: James Peach <jpeach@vmware.com>
jpeach added a commit to jpeach/gateway-api that referenced this pull request Jun 29, 2020
The structure for exposing Listener status was broken in kubernetes-sigs#193 because
the Listener Name field was removed. This means that status can no longer
be associated to a Listener by matching the name.

However, since Listeners are basically metadata wrapped around ports,
we can expose granular Listener status by publishing a ListenerStatus
object for each port that is bound on the Gateway. This lets us model
port conflicts in a reasonable way as well as scope other errors more
tightly than the entire gateway.

In practice, the scoping of Listener conditions is not as granular as
previously, since most applications will all be exposed on port 443. In
this case, it may not be easy to distinguish which Listener's routes
were not ready.

Since IP addresses are a property of the Gateway, they should also be
a property of the GatewayStatus. Note that the status field contains
the bound addresses, not the requested addresses.

Condition types have been redistributed between the GatewayConditionType
and ListenerConditionType in accordance with the responsibility for the
error that each condition signifies.

Signed-off-by: James Peach <jpeach@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants