Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we sort routes produced by HTTPPRoxy #5773

Closed
davinci26 opened this issue Sep 21, 2023 · 14 comments
Closed

Should we sort routes produced by HTTPPRoxy #5773

davinci26 opened this issue Sep 21, 2023 · 14 comments
Labels
kind/decision Categorizes issue as a decision to be voted on. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@davinci26
Copy link
Contributor

Abstract

Since Envoy is greedy matching path routes, order is important. Contour decides to sort the routes in a way that is not really intuitive and can lead to suprises. In particular with the sorter.go we choose to sort the routing table based on heuristics. This approach can lead to confusion since sorting is low-level implementation primitive of Contour and most users will not know about.

Personas

There are two main personas in question are:

  • Cluster Administrator: A cluster administrator is responsible for providing core functionality to a k8s cluster. For the purposes of this document the cluster administrator installs Contour to the cluster and exposes to the internet. The administrator is responsible for the health of the Contour services.

  • ResourceOwner: Someone who manages an HTTPProxy resource inside the Kubernetes cluster configuring the Route/Service combination to allow users of the application to access. A resource owner might not be familiar with Envoy, doesn't have access to the Envoy admin interface and has surface level understanding of how the the HTPProxy Routes are enforced.

Problem

Imagine the following scenario where you have an HTTPProxy that looks like the following

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: foo
spec:
  ingressClassName: foo
  - authPolicy:
      disabled: true
    conditions:
    - regex: /(?i).foo2.*
    - name: service-foo
      port: 80
  - authPolicy:
      disabled: true
    conditions:
    - regex: /(?i).foo2bar.*
    services:
    - name: service-bar
      port: 80

The HTTPProxy above will generate a route where the route matching /(?i).foo2bar.* will be before the route matching /(?i).foo2.*. This is due to the sorter.go implementation. The behaviour of the sorter is problematic for the following cases:

  • HTTPProxy is written/maintained/debugged by ResourceOwner which are not familiar with the internals of Contour and the sorting. If a routing rule doesn't behave as expected the Cluster Administrator (or some tooling) would need to verify that the order is correct and there is no other issue.
  • Both Cluster Administrator and ResourceOwner have a hard time reasoning about the order through pure code inspection. This makes debugging issues more complicated.
  • It is hard to write tooling/abstractions on top of HTTPProxy without leaking this implementation detail to the users.

In more generic terms the sorter is too magical and it can work in unexpected ways.

What do you want the project to decide?

It is the opinion of the author that we should not be sorting the routes provided in HTTProxy. Instead we should create the Envoy routes in the exact same order that we get them from the user for each Virtual Host.

See draft PR to get a sense of the code change: #5772

@davinci26 davinci26 added lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. kind/decision Categorizes issue as a decision to be voted on. labels Sep 21, 2023
@davinci26
Copy link
Contributor Author

cc @projectcontour/maintainers

@davinci26
Copy link
Contributor Author

cc @clayton-gonsalves

@sunjayBhatia
Copy link
Member

sunjayBhatia commented Sep 21, 2023

In general with this topic we have to consider how route matching with includes present should work. Contour has attempted to favor consistency (i.e. sorting route matches) to prevent surprises when routes/included come and go.

This has meant that we aim to sort route matches on specificity, ordered by (for paths considered first, and then headers, query params, etc.)

  • exact matches
  • regex matches
  • prefix matches

@sunjayBhatia
Copy link
Member

sunjayBhatia commented Sep 21, 2023

Also at least with this example, if the order is kept the same/the shorter regex is sorted first somehow, i.e. the resulting order of routes matches is /(?i).foo2.* then /(?i).foo2bar.*, isn't it impossible for anything to match /(?i).foo2bar.*? Requests with paths like /afoo2bar or /afoo2barbaz will be only ever be matched on the first route

Just looking at the example, I would expect /(?i).foo2bar.* to take precedence because it is a more specific match, it happens to be the longer regex which is what we've so far used as a guess as to measure specificity in the route match

@davinci26
Copy link
Contributor Author

davinci26 commented Sep 21, 2023

In general with this topic we have to consider how route matching with includes present should work

Currently what the tests do is do a depth-first-approach such that includes go before the root routes. I think that should be fairly deterministic

@davinci26
Copy link
Contributor Author

davinci26 commented Sep 21, 2023

Also at least with this example, if the order is kept the same/the shorter regex is sorted first somehow,

Yeah the order here is maintained but there are other things in the sorter such as the presence of additional headers etc that also come into play. Maybe the example was not perfect but longest regex is only a heuristic for most specific rules and when the heuristic is wrong it can be a pain to debug without deep understanding of how contour works and for me that is the root of the problem.

Point taken: I will create a better example that has the same issue. I should be able to get one, I believe.

It is hard to socialize and communicate this sorting to people outside of the team that interacts with contour, yet HTTPProxy (or abstractions on top of it) are written by application developers that control routing.

@clayton-gonsalves
Copy link
Contributor

I believe that having the route sorter is a great QOL feature. However, with the includes and complex regex on the headers, params and paths, it can become hard to debug and reason about for service owners who may not necessarily have access to the generated envoy config.

As Contour has in the past, I think it would be great if we could grant cluster admins a flag to toggle this behaviour via the global config

@sunjayBhatia
Copy link
Member

To clear up some of this ambiguity we should definitely document more detail about how the default route sorting/precedence behavior works, a good spot to start adding to would be here: https://projectcontour.io/docs/1.26/config/request-routing/#conditions.

Gateway API has a pretty specific matching precedence order defined here which came out of Contour and other ingress controller implementations: https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io%2fv1beta1.HTTPRouteRule. These rules are backed up by conformance tests as well.

K8s Ingress v1 conformance tests were quite light in this regard, there is some specification on matching precedence, but much is left up to the implementer.

If we do add the suggested feature however, it will likely be an outlier/only apply to HTTPProxy, since Gateway API especially has a defined spec we must conform to. This could mean migrating from HTTPProxy to Gateway API could become problematic in the future for some users, a consideration to take into account.

Regardless, I think to consider this change we should write up a full design document with examples etc. since that will force us to tease out and make specific all the ambiguities in behavior that could arise. Especially with regard to how includes work, this seems like it relates to #5045 (if routes are no longer sorted on specificity, ordering and conflict/duplicate resolution becomes even more important IMO).

@sunjayBhatia
Copy link
Member

added #5774

@davinci26
Copy link
Contributor Author

davinci26 commented Sep 25, 2023

I went through our routing corpus and the PR #5752 is working as expecting for our existing rules. So this puts quite a bit off pressure off from us and we can discuss a better solution.

For the next steps let me do some homework:

  1. Find some usecases where the routing with sorting is a bit intuitive. Where does the heuristic not work so well?
  2. Read up on the conformance for API gateway
  3. Figure out if we can build an abstraction internally that makes it easier for us to expose this implementation detail to ResourceOwner

If we do add the suggested feature however, it will likely be an outlier/only apply to HTTPProxy, since Gateway API especially has a defined spec we must conform to. This could mean migrating from HTTPProxy to Gateway API could become problematic in the future for some users, a consideration to take into account.

Yeah that is correct, this would need to be implemented only for the HTTPProxy

And from there I will make a design doc that works similar to what Clayton has suggested: As Contour has in the past, I think it would be great if we could grant cluster admins a flag to toggle this behaviour via the global config

@davinci26
Copy link
Contributor Author

@sunjayBhatia I think I found the one case where this heuristic doesn't work well:

func TestSortRoutesLongestHeaders(t *testing.T) {
	want := []*dag.Route{
		{
                          // Goes to svcA always
			PathMatchCondition: matchPrefixString("/"),
			HeaderMatchConditions: []dag.HeaderMatchCondition{
				exactHeader("important-experiment", "this-should-go-to-dedicated-service"),
			},
		},
		// Goes to svc-foo
		{
			PathMatchCondition: matchExact("/foo"),
		},
                // Goes to svc-default
		{
			PathMatchCondition: matchExact("/"),
		},
	}
	shuffleAndCheckSort(t, want)
}

Consider that you have a special header that you want to use to basically "force" a specific backend and that fails because exact patch takes precedent.

Should we have narrow fix extending the heuristic in some way to address this usecase?

@sunjayBhatia
Copy link
Member

sunjayBhatia commented Oct 25, 2023

For that case above, given the existing sorting/route matching semantics, you would have to add an additional route match on /foo and the header important-experiment that goes to svcA to get the desired behavior. This new route would "override" the route match only on /foo and give a more specific one with the addition of the header match.

result would be:

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example
spec:
  virtualhost:
    fqdn: example.com
  routes:
    - conditions:
      - prefix: /
      - header:
          exact: this-should-go-to-dedicated-service
          name: important-experiment
      services:
        - name: svcA
          port: 80
    - conditions: # new route match
      - prefix: /foo
      - header:
          exact: this-should-go-to-dedicated-service
          name: important-experiment
      services:
        - name: svcA
          port: 80
    - conditions:
      - prefix: /foo
      services:
        - name: svc-foo
          port: 80
    - conditions:
      - prefix: /
      services:
        - name: svc-default
          port: 80

Copy link

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2023
Copy link

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/decision Categorizes issue as a decision to be voted on. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

3 participants