Requests can be misrouted due to HTTP/2 Connection Coalescing under certain scenarios #1493

alexbrand · 2019-09-12T14:56:53Z

tl;dr: Browsers reuse connections when using HTTP/2. With certain IngressRoute configurations (Envoy configurations), requests can be routed to the wrong backend or produce puzzling 404s.

We believe the root cause to be a bug in Envoy that has an open issue here: envoyproxy/envoy#6767

What we observed

We noticed that requests destined for an application (let's call it foo) were being routed to another application (let's call it bar).

Interestingly, the misrouting would only happen if we accessed bar before trying to reach foo. That is, if we quit the browser, opened it again, and accessed foo without first accessing bar, we would get routed to foo as expected. Once we accessed bar, all future requests to foo would get misrouted to bar.

In addition, we noticed that the misrouting only happened when using Chrome as the client. Requests produced by Firefox and cURL were being routed properly.

The deployment configuration

- The environment has a wildcard DNS that resolves to the IP addresses of ingress nodes (e.g. *.platform.us-east.example.com).
- Application foo is exposed via foo.platform.us-east.example.com using a regular IngressRoute with TLS enabled. The certificate used for TLS is a wildcard cert for *.platform.us-east.example.com.
- Application bar is exposed using a tcpproxy IngressRoute with TLS passthrough enabled. The certificate used for TLS is a wildcard cert for *.platform.us-east.example.com.

Our understanding of the issue

Based on envoyproxy/envoy#6767 (comment), we believe the problem is caused by the connection coalescing that happens in HTTP/2.

Once the browser opens the connection to bar.platform.us-east.example.com, it latches that connection to the Envoy listener/filter pair that is responsible for serving bar. Future requests to other applications on the same wildcard DNS entry (e.g. foo.platform.us-east.example.com) are sent over the same connection to the same Envoy listener/filter pair.
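To make the coalescing condition concrete, here is a minimal Python sketch of the kind of check a client might apply before reusing an existing HTTP/2 connection for a different hostname. It is an illustration under stated assumptions, not any browser's actual implementation: the function names `cert_covers` and `can_coalesce` are hypothetical, and the rule is simplified to "same IP address, and the certificate presented on the connection covers the new hostname".

```python
from typing import Iterable


def cert_covers(cert_name: str, host: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers host."""
    cert_labels = cert_name.lower().split(".")
    host_labels = host.lower().split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # A wildcard stands in for exactly one left-most label.
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


def can_coalesce(conn_ip: str, conn_cert_names: Iterable[str],
                 new_host: str, new_host_ip: str) -> bool:
    """Simplified reuse rule: same IP and the connection's cert covers new_host."""
    same_ip = conn_ip == new_host_ip
    return same_ip and any(cert_covers(n, new_host) for n in conn_cert_names)


# A connection opened for bar, presenting the wildcard cert, on the shared ingress IP.
reuse = can_coalesce(
    "10.0.0.1",
    ["*.platform.us-east.example.com"],
    "foo.platform.us-east.example.com",
    "10.0.0.1",
)
print(reuse)
```

With a wildcard DNS entry and a wildcard certificate, both conditions hold for every application behind the ingress, which is why a request for foo can end up on the connection that was opened for bar.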

Depending on which application is accessed first, we see different behavior:

App exposed over TCP Proxy with TLS passthrough is accessed first

1. Open fresh browser process
2. Browse to https://bar.platform.us-east.example.com
3. Browser opens a connection to Envoy's ingress_https listener on the envoy.tcp_proxy filter.
4. Request routed successfully
5. Browse to https://foo.platform.us-east.example.com
6. Browser re-uses the connection to Envoy's ingress_https listener on the envoy.tcp_proxy filter.
7. Request is misrouted to bar

App exposed over HTTPS IngressRoute is accessed first

1. Open fresh browser process
2. Browse to https://foo.platform.us-east.example.com
3. Browser opens a connection to Envoy's ingress_https listener on the envoy.http_connection_manager filter.
4. Request routed successfully
5. Browse to https://bar.platform.us-east.example.com
6. Browser re-uses the connection to Envoy's ingress_https listener on the envoy.http_connection_manager filter.
7. Request results in a 404 presented by Envoy. We believe the 404 is returned because an entry for bar is missing from the list of server names in the envoy.http_connection_manager filter chain.
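The two sequences above can be reproduced with a toy model. The Python sketch below is only an approximation of the behavior described in this issue, not Envoy's implementation: the `Connection` class and the `TCP_PROXIES` and `HTTP_VHOSTS` tables are hypothetical. The key assumption is that the filter chain is chosen once, from the SNI in the TLS handshake, and every later request on that connection goes through the same chain.

```python
# Hypothetical configuration: bar is TCP-proxied, foo is an HTTP virtual host.
TCP_PROXIES = {"bar.platform.us-east.example.com": "bar"}
HTTP_VHOSTS = {"foo.platform.us-east.example.com": "foo"}


class Connection:
    """A toy TLS connection pinned to the filter chain chosen from the SNI."""

    def __init__(self, sni: str):
        # Chain selection happens once, at handshake time.
        self.tcp_backend = TCP_PROXIES.get(sni)

    def request(self, host: str) -> str:
        if self.tcp_backend is not None:
            # TCP proxy: bytes are forwarded blindly, the Host header is never inspected.
            return f"served by {self.tcp_backend}"
        # HTTP connection manager: route on the Host header of each request.
        backend = HTTP_VHOSTS.get(host)
        return f"served by {backend}" if backend else "404 from proxy"


# bar first, then foo on the coalesced connection: foo is misrouted to bar.
conn = Connection("bar.platform.us-east.example.com")
print(conn.request("foo.platform.us-east.example.com"))

# foo first, then bar on the coalesced connection: the proxy has no route for bar.
conn = Connection("foo.platform.us-east.example.com")
print(conn.request("bar.platform.us-east.example.com"))
```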

Repro steps

I've put together a set of Kubernetes deployment manifests that can be used to reproduce this issue with Contour. The apps are based on this comment (istio/istio#13589 (comment)).

The apps are:

- index-application.yaml: An nginx container listening for HTTPS connections. The TLS certificates are mounted into the container via a volume. Exposed using an IngressRoute with TLS passthrough.
- png-serving-application.yaml: An nginx container listening for HTTP connections. An init container downloads a png into an emptyDir volume. The emptyDir volume is served by nginx. Exposed using a TLS-enabled IngressRoute.

Repro steps:

Install contour

```
kubectl apply -f https://raw.githubusercontent.com/heptio/contour/v0.15.0/examples/render/daemonset-rbac.yaml
```

Apply application manifests

```
# Index application
kubectl apply -f https://gist.githubusercontent.com/alexbrand/3f6e41afc04af902879f674604b6ee5e/raw/09ea42338e6b31d64370241ad0394e9ff7ff944a/index-application.yaml

# PNG serving application
kubectl apply -f https://gist.githubusercontent.com/alexbrand/3f6e41afc04af902879f674604b6ee5e/raw/09ea42338e6b31d64370241ad0394e9ff7ff944a/png-serving-application.yaml
```

Open a tunnel to Envoy using kubectl (might need to use sudo -E to bind port 443)
```
kubectl -n heptio-contour port-forward svc/contour 443
```

Verify both IngressRoutes are accessible with curl (Look for 200 response):

```
curl -I -k https://app.127.0.0.1.nip.io
curl -I -k https://image.127.0.0.1.nip.io/images/image.png
```

Browse to https://app.127.0.0.1.nip.io. Notice that the image does not load. If you open the browser's developer tools, you can see that the request for the image results in a 404.
Browse to https://image.127.0.0.1.nip.io. Notice that the request is served by the index application.
Open an incognito window and go to https://image.127.0.0.1.nip.io/images/image.png. Notice that the image is served as expected.
Browse to https://app.127.0.0.1.nip.io in the incognito window. Notice that you get a 404 from Envoy.


davecheney · 2019-09-13T06:24:58Z

Thank you for the report. After investigation I'm not sure what steps we can take to address this.

For the case where requests for bar arrive at Envoy's http router, we don't have a route for bar so 404 is the correct answer -- we do not have any http routes for bar. The fact that bar is being served directly from a k8s service which we're forwarding tcp traffic to isn't helpful because we don't know anything about the k8s service hosting bar.

A workaround could be to add a catch all http route for bar to return 421.

For the case where requests for foo land on the k8s service for bar there is even less we can do. This traffic is in tcp mode; we don't even intercept the tls handshake apart from the initial SNI routing. At this point the connection is established, and Envoy doesn't know that the Host: foo request has been proxied to bar. The misrouting of foo's traffic when it arrives at bar sounds like bar is not looking at the incoming Host: header on the request.

alexbrand · 2019-09-13T12:45:25Z

Thanks Dave. Your analysis lines up with my understanding described in the issue.

> For the case where requests for bar arrive at Envoy's http router, we don't have a route for bar so 404 is the correct answer.

^ My understanding is that Envoy should be responding with a 421 in this case, but it does not do that today due to envoyproxy/envoy#6767

As you say, the TCP scenario seems even worse. I am not sure if Envoy itself can do anything in this case.

davecheney · 2019-09-14T01:05:05Z

How could Envoy respond with a 421? Bar is not a hostname registered with the http manager. More precisely, there is no virtual host record for bar in the RDS tables.

alexbrand · 2019-09-16T12:37:29Z

After reading through envoyproxy/envoy#6767 (comment), my understanding is that Envoy would return a 421 if bar is registered with another listener. I could be misunderstanding, of course.

alexbrand · 2019-09-16T18:36:46Z

This table summarizes the behavior that occurs when mixing L7 IngressRoutes and TCP Proxy IngressRoutes, assuming wildcard certificates and a wildcard DNS entry are in play.

The table shows the results of what happens when two requests are made to separate IngressRoutes. The first request is made to the IngressRoute on the left-most column, while the second request is made to the IngressRoute in the top row.

| First request (rows) / second request (columns) | L7 IngressRoute | TCP Proxy w/ TLS Passthrough | TCP Proxy no TLS Passthrough |
| --- | --- | --- | --- |
| L7 IngressRoute | GOOD | (1) BAD: Both requests go to the L7 IR | (2) BAD: Both requests go to the L7 IR |
| TCP Proxy w/ TLS Passthrough | (3) BAD: Both requests go to TCPProxy IngressRoute | (4) BAD: Both requests go to first IR | (5) BAD: Both requests go to first IR |
| TCP Proxy no TLS Passthrough | GOOD | GOOD | GOOD |

(1) L7 IngressRoute + TCP Proxy with TLS passthrough

The initial connection is opened to the L7 IngressRoute. Requests to the TCPProxy IngressRoute will fail with a 404.

(2) L7 IngressRoute + TCP Proxy without TLS passthrough

The initial connection is opened to the L7 IngressRoute. Requests to the TCPProxy IngressRoute will fail with a 404.

(3) TCP Proxy w/ TLS Passthrough + L7 IngressRoute

The initial connection is opened to the TCP Proxy IngressRoute. Requests to the L7 IngressRoute are misrouted to the TCP Proxy IngressRoute.

(4) Two TCP Proxy w/ TLS Passthrough

The initial connection is opened to TCP Proxy IngressRoute 1. Requests to TCP Proxy IngressRoute 2 are misrouted to TCP Proxy IngressRoute 1.

(5) TCP Proxy w/ TLS Passthrough + TCP Proxy without TLS Passthrough

The initial connection is opened to the TCP Proxy IngressRoute with TLS passthrough. Requests to the other TCP Proxy IngressRoute are misrouted to the first TCP Proxy IngressRoute.

davecheney · 2019-09-20T04:16:34Z

Assigning to 0.15.1 for tracking purposes.

davecheney · 2019-09-22T23:47:19Z

Hello,

I want to give an update for the watchers of this ticket. The short version is my interpretation of this issue has not changed since #1493 (comment).

The longer version is that there is a possibility of applying a partial workaround in Envoy, but I am wary of promoting this as a solution: its effectiveness depends entirely on the percentage of virtual hosts handled in http mode versus the percentage in tcp proxy mode. The higher the latter, the less effective this workaround will be. Unless all connections are handled by Envoy in http mode, this issue will be present.

What I am proposing is:

1. We can program Envoy to respond with a 421 to http requests for virtual hosts that are handled in tcpproxy mode. Requests that were intended for a tcp proxy virtual host but were misdirected by the browser should then receive a 421 client error response code and reconnect as per https://tools.ietf.org/html/rfc7540#section-9.1.2
2. We can publish an advisory document describing how HTTP/2 connection reuse can affect Contour users.
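As an illustration of why a 421 helps, the Python sketch below models the client side of the proposal: a client that receives a 421 on a reused connection may retry the request on a fresh connection (RFC 7540, section 9.1.2). This is a toy model under assumptions, not browser code. The `open_connection` and `fetch` helpers and the `pool` dictionary are hypothetical, and the server is simplified so that a connection only serves the host it was opened for.

```python
MISDIRECTED_REQUEST = 421


def open_connection(sni: str):
    """Hypothetical server: a connection only serves the host it was opened for."""
    def send(host: str) -> int:
        return 200 if host == sni else MISDIRECTED_REQUEST
    return send


def fetch(host: str, pool: dict) -> int:
    """Send a request, retrying once on a fresh connection after a 421."""
    conn = pool.get("coalesced")
    if conn is None:
        # First request: open the connection that later requests will reuse.
        conn = pool["coalesced"] = open_connection(host)
    status = conn(host)
    if status == MISDIRECTED_REQUEST:
        # The server says this connection is wrong for this host: reconnect.
        conn = pool[host] = open_connection(host)
        status = conn(host)
    return status


pool = {}
print(fetch("bar.example.com", pool))  # served on the first connection
print(fetch("foo.example.com", pool))  # 421 on the reused connection, then retried
```

The retry only works if something on the path can actually emit the 421, which is the limitation discussed for traffic that lands on a tcpproxy connection.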

I don't see a way to address connections which were intended for Envoy in http mode but are misdirected by the browser onto an established connection in tcpproxy mode. Envoy is not able to interdict those requests: they are not in HTTP mode, they don't go through Envoy's HTTP connection manager, and in the case where TLS passthrough is used, they are still encrypted into and out of Envoy.

jpeach · 2020-04-28T04:15:51Z

Raising priority, since the 1.4 release means that all users of wildcard certificates will be affected by this problem.

youngnick · 2020-04-28T21:46:15Z

Probably should link in envoyproxy/envoy#6767 here as well, as it may help. Oops, it was linked in the initial comment, but this should ensure it's top-of-mind.

TLS routes are specialized to a unique virtual hostname. However, if wildcard certificates are being used, browsers will aggressively coalesce and reuse server connections even when the full origin hostname doesn't match. This results in 404 responses because each TLS virtual host only has routes for one host. We can avoid this behaviour bleeding out to users by generating a 421 Misdirected Request response if the request authority doesn't match the FQDN of the virtual host. In this case, the browser is supposed to understand that the request wasn't processed and re-send it on a new connection. This fixes projectcontour#1493. Signed-off-by: James Peach <jpeach@vmware.com>
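The check described in that commit message can be sketched in a few lines of Python. This is a hedged illustration of the idea only, not the code that was merged into Contour: `is_misdirected` and `handle` are hypothetical names, and the port handling assumes a plain `host[:port]` authority (IPv6 literals are not handled).

```python
def is_misdirected(authority: str, vhost_fqdn: str) -> bool:
    """True if the request authority does not name this virtual host."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    host = authority.partition(":")[0].lower()
    return host != vhost_fqdn.lower()


def handle(authority: str, vhost_fqdn: str, route) -> int:
    """Return 421 for a misdirected request, otherwise run normal routing."""
    if is_misdirected(authority, vhost_fqdn):
        return 421  # Misdirected Request: the client may retry on a new connection
    return route()


# A request for bar arriving on a connection whose virtual host is foo.
print(handle("bar.platform.us-east.example.com", "foo.platform.us-east.example.com", lambda: 200))
```

Answering 421 instead of 404 matters because a 404 tells the client the resource does not exist, while a 421 tells it the request reached the wrong place and was not processed.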

alexbrand changed the title ~~Surprising behavior due to HTTP/2 Connection Coalescing~~ Requests can be misrouted due to HTTP/2 Connection Coalescing Sep 12, 2019

alexbrand changed the title ~~Requests can be misrouted due to HTTP/2 Connection Coalescing~~ Requests can be misrouted due to HTTP/2 Connection Coalescing under certain scenarios Sep 12, 2019

davecheney added the blocked (Blocked waiting on a dependency) and priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.) labels Sep 12, 2019

davecheney added this to the 1.0.0-beta.1 milestone Sep 12, 2019

davecheney self-assigned this Sep 12, 2019

davecheney removed the blocked Blocked waiting on a dependency label Sep 13, 2019

davecheney added the blocked/needs-info Categorizes the issue or PR as blocked because there is insufficient information to advance it. label Sep 13, 2019

davecheney removed this from the 1.0.0-beta.1 milestone Sep 13, 2019

davecheney removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 17, 2019

davecheney added this to the Backlog milestone Sep 17, 2019

davecheney removed the blocked/needs-info Categorizes the issue or PR as blocked because there is insufficient information to advance it. label Sep 20, 2019

davecheney modified the milestones: Backlog, 0.15.1 Sep 20, 2019

davecheney modified the milestones: 0.15.1, Backlog Sep 30, 2019

shabx added the ZD3650 label Oct 10, 2019

davecheney removed their assignment Mar 10, 2020

lmickh mentioned this issue Apr 27, 2020

Possible envoy regression causes HTTP 404 #2468

Closed

jpeach self-assigned this Apr 28, 2020

jpeach added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 28, 2020

jpeach added the blocked Blocked waiting on a dependency label Apr 28, 2020

jpeach modified the milestones: Backlog, 1.5.0 Apr 28, 2020

jpeach mentioned this issue Apr 29, 2020

internal: filter misdirected TLS requests #2483

Merged

jpeach closed this as completed in e596934 May 22, 2020

jpeach mentioned this issue Jun 15, 2020

Envoy misdirecting valid URLs when Envoy HTTP/S default ports are not used #2568

Closed

primeroz mentioned this issue Jun 22, 2020

Add support to disable HTTP2. #2619

Closed

youngnick mentioned this issue Dec 17, 2020

Get upstream_connection_options configurable to do workaround for the flaky behavior of envoy #3214

Closed

youngnick mentioned this issue Apr 28, 2021

Interaction between TLS config on HTTPRoute and Gateway kubernetes-sigs/gateway-api#577

Closed

desimone mentioned this issue Apr 28, 2021

safari does not respect 421 status for HTTP/2 connection reuse / coalescing pomerium/pomerium#2150

Closed

keithhand mentioned this issue Jan 25, 2023

Refresh sometimes 404s when using Contour/Envoy kubecost/cost-analyzer-helm-chart#1923

Closed

sunjayBhatia mentioned this issue Apr 3, 2023

Wildcard certificate with HTTP/2 causes 421 Misdirected Request on cross-host connection reuse #5240

Closed

m-yosefpor mentioned this issue Oct 11, 2023

Add Per-HTTPProxy HTTP-Version Support to Address HTTP2 Coalescing Issues with Wildcards #5822

Closed
