TLS: Align APIs with Personas #95

Closed
danehans opened this issue Feb 14, 2020 · 20 comments

@danehans
Contributor

What would you like to be added:
Docs that specify what persona or personas are responsible for TLS configuration.

Why is this needed:
This is needed to add TLS support to service-apis. Currently, all TLS config is associated with a Gateway listener. Gateways will need to perform different actions for TLS connections, e.g. terminate the client-side connection and create a new TLS connection to a backend service. Whether per-route TLS configuration is added to Gateway or xRoute should be dictated by the persona responsible for managing route-specific TLS configuration.

/assign @bowei
/cc @ironcladlou @Miciah @jpeach

PR xref: #71 #81

Issue xref: #94 #90 #49

@danehans danehans added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 14, 2020
@jpeach
Contributor

jpeach commented Feb 17, 2020

Yup, a few user stories showing the different possible personas in TLS configuration would be informative.

@danehans
Contributor Author

User Stories

Option 1: As a cluster operator, I need the ability to manage TLS configuration for Gateway and xRoute resources. This will allow the cluster operator to act as a single authority for managing TLS-related configuration.

#71 implements Gateway and xRoute TLS configuration managed by the cluster operator persona.

Option 2: As a developer, I need the ability to manage TLS configuration for Gateway and xRoute resources. This will allow the developer to act as a single authority for managing TLS-related configuration.

Option 3: As a cluster operator, I need the ability to manage TLS configuration for the Gateway resource and have the developer persona manage TLS configuration for their xRoute resources. This will allow cluster operators the ability to manage client<>gateway TLS configuration, while allowing developers to manage gateway<>backend service TLS configuration.

Currently, service-apis allows the cluster operator persona to manage TLS configuration for a Gateway listener. #81 implements xRoute TLS configuration managed by a developer persona.

@smarterclayton

It might be useful to write out an alternative way of looking at option 3 as:

  1. As an application deployer on a cluster, I may own or manage the secrets for my ingress as part of my application process, because this is a shared cluster OR I am the owner of those secrets.
  2. As a cluster operator, I may provide a default TLS termination policy (which may prefer reencryption from gateway to backend) for ingress so that application deployers on the cluster do not need to (or are explicitly prevented from) provide their own TLS certificates

The distinction is entirely dependent on user role and purpose of cluster - cluster authors often start with 1, then move to 2, but there will always be a dynamic tension between them.

There's a third use case that is implicit in the above but not fully called out:

  1. As a cluster operator, I can configure end-to-end TLS encryption between gateway and backend without requiring the application deployers to specify explicit credentials (e.g. using a wildcard on the frontend, auto-generating TLS certs for pods based on their DNS name, and then reencrypting from the gateway using the autogenerated CA and the DNS name).
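To make the auto-generated certificate flow in that third use case concrete, here is a minimal sketch using only the Go standard library. It is an illustration under assumptions, not how any particular cluster implements this: the CA name `cluster-internal-ca`, the backend DNS name `web.default.svc`, and the helper names `newCA`, `issueFor`, and `verifyBackend` are all hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a self-signed CA, standing in for the cluster's autogenerated CA.
func newCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "cluster-internal-ca"}, // hypothetical name
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// issueFor issues a serving certificate for a backend, keyed on its DNS name.
// The returned private key is what would be handed to the backend pod.
func issueFor(ca *x509.Certificate, caKey *ecdsa.PrivateKey, dnsName string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: dnsName},
		DNSNames:     []string{dnsName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyBackend is the check a gateway would make when reencrypting:
// the backend cert must chain to the cluster CA and match the DNS name.
func verifyBackend(ca, leaf *x509.Certificate, dnsName string) error {
	pool := x509.NewCertPool()
	pool.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: dnsName})
	return err
}

func main() {
	ca, caKey, err := newCA()
	if err != nil {
		panic(err)
	}
	leaf, _, err := issueFor(ca, caKey, "web.default.svc") // hypothetical backend name
	if err != nil {
		panic(err)
	}
	fmt.Println("matching name error:", verifyBackend(ca, leaf, "web.default.svc"))
	fmt.Println("other name error:", verifyBackend(ca, leaf, "other.default.svc"))
}
```

The point of the sketch is that the application deployer never touches a credential: the CA, the per-backend certificate, and the gateway-side verification are all derived from the DNS name.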

@jpeach
Contributor

jpeach commented Feb 17, 2020

Related user stories:

#103
#94
#93
#92
#91
#90
#52
#51

@jpeach
Contributor

jpeach commented Feb 17, 2020

TLS is a property of an application. That is, TLS policy is per hosted domain. A public website "www.example.com" has a different TLS policy from an internal website "www.internal.example.com", which in turn has a different TLS policy from an API endpoint "foo.api.example.com".

TLS is a concern of a cluster operator. That is, an operator has security policies and business concerns that affect which kinds of TLS policies are allowed. This could range from allowed TLS versions to CA bundles, allowed cipher suites, etc. An operator can also provide services to automatically provision TLS (by some criteria).

It seems to me that we are still in the process of unpicking the entanglement of roles and thinking things through.

For example:

Option 3: As a cluster operator, I need the ability to manage TLS configuration for the Gateway resource and have the developer persona manage TLS configuration for their xRoute resources. This will allow cluster operators the ability to manage client<>gateway TLS configuration, while allowing developers to manage gateway<>backend service TLS configuration.

In this option, are the cluster operator and the developer doing the same kind of management? I don't think we have enough information to be sure. Based on my assertions above, both personas have some interest in this scenario, but there are a number of different ways you could assign responsibility in different scenarios.

@youngnick
Contributor

I think it's important to remember that @danehans' option 1 definitely has a demand in the end-user community - it's why we built HTTPProxy for Contour the way we did. I've written up #102 and #103 covering the option 1 and option 2 ends of the "Who manages the TLS" spectrum.

I think that, as @jpeach mentioned in another discussion, there are a few decisions tied together when we talk about "TLS":

  • Is TLS terminated at the Gateway?

If so:

If not:

  • Is TLS passed through the Gateway to the backend service?
  • If so, is routing based on SNI required?

I think that the questions that are underneath all of this are:

What's the difference between a Gateway and a Route?
Is the Gateway the right place for most use cases to put TLS?
Should there be another place where TLS lives? On the Route?

@danehans
Contributor Author

In this option, are the cluster operator and the developer doing the same kind of management?

@jpeach in option 3, the Cluster Operator is responsible for managing how an app (i.e. TCPRoute) TLS connection is terminated. An App Dev can request a TLS termination type, but the Cluster Operator is responsible for making this decision.

  • If the Gateway terminates the connection, the App Dev is responsible for providing the required TLS config to the Cluster Operator. The Cluster Operator is then responsible for adding this config to the Gateway.
  • If the Cluster Operator requires TLS from Gateway to backend Service, then the App Dev is responsible for providing the TLS config, e.g. tcproute.spec.hosts["foo.example.com"].tls.
  • If the TLS connection is allowed to pass through to the backend Service, then the Gateway performs a route match based on SNI and the App Dev is responsible for providing the TLS config, e.g. tcproute.spec.hosts["foo.example.com"].tls.
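The three cases in the bullets above can be written down as a small lookup, sketched here in Go. The type and function names (`Mode`, `Placement`, `PlacementFor`) are hypothetical and are not part of service-apis; the sketch only restates who provides the TLS config and which resource carries it.

```go
package main

import "fmt"

// Mode is a hypothetical name for the three cases described above.
type Mode int

const (
	TerminateAtGateway  Mode = iota // Gateway terminates the client connection
	GatewayToBackendTLS             // Cluster Operator requires TLS from Gateway to backend Service
	Passthrough                     // TLS connection passes through; Gateway matches routes on SNI
)

// Placement records which resource carries the TLS config and which personas are involved.
type Placement struct {
	Resource   string // "Gateway" or "xRoute"
	ProvidedBy string // persona that supplies the TLS config
	AppliedBy  string // persona that adds it to the resource
	MatchOnSNI bool   // whether the Gateway must route on SNI
}

// PlacementFor returns the placement for a mode, or an error for an unknown mode.
func PlacementFor(m Mode) (Placement, error) {
	switch m {
	case TerminateAtGateway:
		// The App Dev hands the config to the Cluster Operator, who adds it to the Gateway.
		return Placement{Resource: "Gateway", ProvidedBy: "App Dev", AppliedBy: "Cluster Operator"}, nil
	case GatewayToBackendTLS:
		// The App Dev sets the config on the route, e.g. tcproute.spec.hosts[...].tls.
		return Placement{Resource: "xRoute", ProvidedBy: "App Dev", AppliedBy: "App Dev"}, nil
	case Passthrough:
		return Placement{Resource: "xRoute", ProvidedBy: "App Dev", AppliedBy: "App Dev", MatchOnSNI: true}, nil
	default:
		return Placement{}, fmt.Errorf("unknown mode %d", int(m))
	}
}

func main() {
	for _, m := range []Mode{TerminateAtGateway, GatewayToBackendTLS, Passthrough} {
		p, err := PlacementFor(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("mode=%d resource=%s providedBy=%q appliedBy=%q sni=%t\n",
			int(m), p.Resource, p.ProvidedBy, p.AppliedBy, p.MatchOnSNI)
	}
}
```

In every case the Cluster Operator still decides which mode applies; the table only captures where the resulting config ends up.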

Is the Gateway the right place for most use cases to put TLS?
I think the Gateway is the right place for TLS config when the connection is terminated at the Gateway. If the Gateway initiates the TLS connection to the backend Service or the TLS connection is passed through, xRoute is the correct place for TLS config.

Should there be another place where TLS lives? On the Route?
Another approach that I've been considering is the use of a separate resource for managing TLS config, e.g. TLSConfig. Any of the personas can create the TLSConfig resource. A Cluster Operator could associate a TLSConfig with a Gateway listener. An App Dev could associate a TLSConfig with an xRoute. This may ease the coordination efforts between the personas. We could even extend the idea to automate the coordination by adding a TLS config policy to Gateway that automatically allows TLS configs from certain namespaces, etc.
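As a rough sketch of what that namespace-based policy might look like, assuming a standalone TLSConfig resource existed: the names `TLSConfigRef`, `GatewayTLSPolicy`, and `AllowedNamespaces` are hypothetical and do not exist in service-apis, and the sketch assumes an empty allow-list means nothing is allowed.

```go
package main

import "fmt"

// TLSConfigRef points at a hypothetical standalone TLSConfig resource.
type TLSConfigRef struct {
	Namespace string
	Name      string
}

// GatewayTLSPolicy is a hypothetical Gateway-level policy that lists the
// namespaces whose TLSConfig resources may be attached automatically.
type GatewayTLSPolicy struct {
	AllowedNamespaces []string
}

// Allows reports whether a TLSConfig from ref's namespace may be attached.
// An empty AllowedNamespaces list allows nothing (an assumption of this sketch).
func (p GatewayTLSPolicy) Allows(ref TLSConfigRef) bool {
	for _, ns := range p.AllowedNamespaces {
		if ns == ref.Namespace {
			return true
		}
	}
	return false
}

func main() {
	policy := GatewayTLSPolicy{AllowedNamespaces: []string{"team-a", "team-b"}}
	for _, ref := range []TLSConfigRef{
		{Namespace: "team-a", Name: "foo-example-com"},
		{Namespace: "sandbox", Name: "foo-example-com"},
	} {
		fmt.Printf("%s/%s allowed: %t\n", ref.Namespace, ref.Name, policy.Allows(ref))
	}
}
```

With a check like this, the Cluster Operator sets the policy once and App Devs in allowed namespaces can attach their own TLSConfig without a manual handoff.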

@youngnick
Contributor

I'm not sure if I quite understood what you meant, @danehans, but here's what I understood:

  • There should be multiple places where a TLS config MAY be applied, but the choice of where it is applied is dependent on the TLS termination and reencryption requirements.
  • In the case of TLS termination at the gateway, the TLS config would be applied at the Gateway level.
  • In the case where TLS is reestablished to the backend, or the TLS connection is passed through, the TLS configuration should live on the xRoute.

I'm not sure if I agree or not, I need to think more about it, just want to check I've understood correctly.

I don't think that a separate CRD for TLSConfig is an answer here. There are definitely already complaints (which I think are fair) that we are replacing one object with three; let's not make it worse. I do think that a common type for TLSConfig might be useful, however. That type can then be used in the CRD document (which is what the user sees).
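To illustrate the "common type, not a separate CRD" idea, here is a minimal Go sketch in which one shared struct is embedded in both a Gateway listener and a route host. All type and field names (`TLSConfig`, `CertificateRef`, `MinVersion`, `Listener`, `RouteHost`) are hypothetical and are not the actual service-apis types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TLSConfig is a hypothetical shared Go type, not a standalone resource.
type TLSConfig struct {
	CertificateRef string `json:"certificateRef"`
	MinVersion     string `json:"minVersion,omitempty"`
}

// Listener is a hypothetical Gateway listener that reuses the shared type.
type Listener struct {
	Port int        `json:"port"`
	TLS  *TLSConfig `json:"tls,omitempty"`
}

// RouteHost is a hypothetical xRoute host entry that reuses the same type.
type RouteHost struct {
	Hostname string     `json:"hostname"`
	TLS      *TLSConfig `json:"tls,omitempty"`
}

// Render serializes a value the way it would appear inline in the resource document.
func Render(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	l, _ := Render(Listener{Port: 443, TLS: &TLSConfig{CertificateRef: "gw-cert"}})
	r, _ := Render(RouteHost{Hostname: "foo.example.com", TLS: &TLSConfig{CertificateRef: "backend-ca", MinVersion: "1.2"}})
	fmt.Println(l)
	fmt.Println(r)
}
```

The user still sees the TLS fields inline on the Gateway or the route; only the Go definition is shared, so the object count does not grow.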

@danehans
Contributor Author

danehans commented Feb 18, 2020

I'm not sure if I agree or not, I need to think more about it, just want to check I've understood correctly.

Yes, you understand it correctly. Simply put, 1) a Gateway should control TLS termination and 2) if the TLS connection is terminated at the Gateway, it should be where the TLS config is provided. With that said, I can see how this may be confusing. Users would need to think about where the TLS connection is terminated and then configure the corresponding resource.

I agree with your perspective about adding a new resource for TLS config. Every time I consider the idea, I tell myself the same thing. On the other hand, it's something we should consider if it provides the necessary flexibility for managing the TLS config variations.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 24, 2020
@hbagdi
Contributor

hbagdi commented Jun 24, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 24, 2020
@hbagdi
Contributor

hbagdi commented Jun 24, 2020

I've written up a proposal to support TLS feature requests that we have seen so far: https://docs.google.com/document/d/15fkzMrhN_7tA-i2mHKwZpqcjN1o2Pe9Am9Qt828x1lo/edit.

Your feedback is welcome and greatly appreciated!

@hbagdi
Contributor

hbagdi commented Aug 27, 2020

@danehans @robscott @jpeach @bowei Has this issue served its purpose and can it be closed now?

@szuecs

szuecs commented Sep 2, 2020

I have another point that was not discussed here, but should be.

As a cluster operator, I want to set a default TLS configuration for gateways.
As a developer, I have to be able to override the default decision to support my legacy service.

We normally set the strongest configuration as the default and let developers weaken it when their application requires it, for whatever reason.

The problem I see is that the split between gateway and route makes these kinds of patterns almost impossible.

@jpeach
Contributor

jpeach commented Sep 4, 2020

I have another point that was not discussed here, but should be.

As a cluster operator, I want to set a default TLS configuration for gateways.
As a developer, I have to be able to override the default decision to support my legacy service.

For some sites, many of these cases can be implemented by admission controllers such as Gatekeeper. There are some policy mechanisms in the API, but it's not going to be able to capture all the policy needs.

@jpeach
Contributor

jpeach commented Sep 4, 2020

@danehans @robscott @jpeach @bowei Has this issue served its purpose and can it be closed now?

I think that we need some editorial to put the available options into context.

@szuecs

szuecs commented Sep 4, 2020

For me, the default would be a flag to a controller that can be overridden by configuration. We have used this pattern successfully in all our controllers for years. In particular, kube-ingress-aws-controller, which creates cloud load balancers, uses this pattern for all possible configurations.
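A short Go sketch of the flag-as-default pattern, using the standard `flag` package. The flag name `default-ssl-policy`, the annotation key `example.com/ssl-policy`, and the policy values are hypothetical and are not taken from kube-ingress-aws-controller.

```go
package main

import (
	"flag"
	"fmt"
)

// policyAnnotation is a hypothetical per-resource configuration key.
const policyAnnotation = "example.com/ssl-policy"

// ResolvePolicy parses the controller's arguments for the cluster-wide default,
// then lets a per-resource annotation override it.
func ResolvePolicy(args []string, annotations map[string]string) (string, error) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	def := fs.String("default-ssl-policy", "strict", "default TLS policy applied when a resource sets none")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if v, ok := annotations[policyAnnotation]; ok && v != "" {
		return v, nil
	}
	return *def, nil
}

func main() {
	args := []string{"-default-ssl-policy=modern"}

	p, err := ResolvePolicy(args, nil)
	fmt.Println(p, err)

	p, err = ResolvePolicy(args, map[string]string{policyAnnotation: "legacy"})
	fmt.Println(p, err)
}
```

Here the operator owns the flag and the developer owns the annotation, so both personas can express intent without editing the same resource.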

@hbagdi
Contributor

hbagdi commented Oct 29, 2020

I'm going to go ahead and close this issue as it has served the original purpose.
If there are bugs or feature requests or any other discussion that stems from here, please open a new issue.

/close

@k8s-ci-robot
Contributor

@hbagdi: Closing this issue.

In response to this:

I'm going to go ahead and close this issue as it has served the original purpose.
If there are bugs or feature requests or any other discussion that stems from here, please open a new issue.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
