Requests can be misrouted due to HTTP/2 Connection Coalescing under certain scenarios #1493

Closed
alexbrand opened this issue Sep 12, 2019 · 9 comments
Labels: blocked, priority/important-soon

Comments

@alexbrand
Contributor

alexbrand commented Sep 12, 2019

tl;dr: Browsers reuse connections when using HTTP/2. With certain IngressRoute configurations (Envoy configurations), requests can be routed to the wrong backend or produce puzzling 404s.

We believe the root cause to be a bug in Envoy that has an open issue here: envoyproxy/envoy#6767

What we observed

We noticed that requests destined for an application (let's call it foo) were being routed to another application (let's call it bar).

Interestingly, the misrouting would only happen if we accessed bar before trying to reach foo. That is, if we quit the browser, opened it again, and accessed foo without first accessing bar, we would get routed to foo as expected. Once we accessed bar, all future requests to foo would get misrouted to bar.

In addition, we noticed that the misrouting only happened when using Chrome as the client. Requests produced by Firefox and cURL were being routed properly.

The deployment configuration

  • The environment has a wildcard DNS that resolves to the IP addresses of ingress nodes (e.g. *.platform.us-east.example.com)
  • Application foo is exposed via foo.platform.us-east.example.com using a regular IngressRoute with TLS enabled. The certificate used for TLS is a wildcard cert for *.platform.us-east.example.com.
  • Application bar is exposed using a tcpproxy IngressRoute with TLS passthrough enabled. The certificate used for TLS is a wildcard cert for *.platform.us-east.example.com.

Our understanding of the issue

Based on envoyproxy/envoy#6767 (comment), we believe the problem is caused by the connection coalescing that happens in HTTP/2.

Once the browser opens the connection to bar.platform.us-east.example.com, it latches that connection to the Envoy listener/filter pair that is responsible for serving bar. Future requests to other applications on the same wildcard DNS entry (e.g. foo.platform.us-east.example.com) are sent over the same connection to the same Envoy listener/filter pair.
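The reuse decision described above can be sketched as follows. This is a simplified model of the rule in RFC 7540 section 9.1.1, not Chrome's actual implementation; the `Connection` class, the `can_coalesce` helper, and the IP address are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


def cert_covers(cert_name: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname.

    Simplified matching: a leading "*." stands in for exactly one DNS label.
    """
    cert_name, hostname = cert_name.lower(), hostname.lower()
    if cert_name.startswith("*."):
        label, _, parent = hostname.partition(".")
        return bool(label) and parent == cert_name[2:]
    return cert_name == hostname


@dataclass
class Connection:
    """Hypothetical record of an open HTTP/2 connection held by the client."""
    ip: str            # address the connection was opened to
    cert_names: tuple  # names presented in the server certificate
    sni: str           # hostname sent during the TLS handshake


def can_coalesce(conn: Connection, hostname: str, resolved_ip: str) -> bool:
    """Reuse the connection if the new host resolves to the same IP and
    the certificate already presented is valid for it."""
    same_ip = conn.ip == resolved_ip
    cert_ok = any(cert_covers(name, hostname) for name in conn.cert_names)
    return same_ip and cert_ok


# Connection opened for bar; the wildcard DNS entry sends foo to the same IP.
conn = Connection(
    ip="10.0.0.1",
    cert_names=("*.platform.us-east.example.com",),
    sni="bar.platform.us-east.example.com",
)
print(can_coalesce(conn, "foo.platform.us-east.example.com", "10.0.0.1"))  # True
```

Note that the SNI sent when the connection was opened plays no part in the decision, which is why the request for foo ends up on a connection that Envoy associated with bar.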

Depending on which application is accessed first, we see different behavior:

App exposed over TCP Proxy with TLS passthrough is accessed first

  1. Open fresh browser process
  2. Browse to https://bar.platform.us-east.example.com
  3. Browser opens a connection to Envoy's ingress_https listener on the envoy.tcp_proxy filter.
  4. Request routed successfully
  5. Browse to https://foo.platform.us-east.example.com
  6. Browser re-uses the connection to Envoy's ingress_https listener on the envoy.tcp_proxy filter.
  7. Request is misrouted to bar

App exposed over HTTPS IngressRoute is accessed first

  1. Open fresh browser process
  2. Browse to https://foo.platform.us-east.example.com
  3. Browser opens a connection to Envoy's ingress_https listener on the envoy.http_connection_manager filter.
  4. Request routed successfully
  5. Browse to https://bar.platform.us-east.example.com
  6. Browser re-uses the connection to Envoy's ingress_https listener on the envoy.http_connection_manager filter.
  7. Request results in a 404 presented by Envoy. We believe the 404 is returned because an entry for bar is missing from the list of server names in the envoy.http_connection_manager filter chain.
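Both sequences can be reproduced with a toy model of the listener. This is a sketch under the assumption that the filter chain is selected once per connection from the SNI; the `FILTER_CHAINS` table and the `ProxyConnection` class are hypothetical and do not reflect Envoy's real configuration schema.

```python
# Hypothetical model of the ingress_https listener: one filter chain per SNI name.
FILTER_CHAINS = {
    "foo.platform.us-east.example.com": {
        "filter": "envoy.http_connection_manager",
        "virtual_hosts": {"foo.platform.us-east.example.com": "foo"},
    },
    "bar.platform.us-east.example.com": {
        "filter": "envoy.tcp_proxy",
        "cluster": "bar",
    },
}


class ProxyConnection:
    """A client connection; the filter chain is chosen once, from the SNI."""

    def __init__(self, sni: str):
        self.chain = FILTER_CHAINS[sni]

    def send_request(self, host: str) -> str:
        if self.chain["filter"] == "envoy.tcp_proxy":
            # Bytes are forwarded without inspection, so the Host is never seen.
            return self.chain["cluster"]
        # HTTP mode: the Host is looked up among this chain's virtual hosts only.
        return self.chain["virtual_hosts"].get(host, "404")


# bar (TCP proxy with TLS passthrough) is accessed first.
conn = ProxyConnection(sni="bar.platform.us-east.example.com")
print(conn.send_request("bar.platform.us-east.example.com"))  # bar
print(conn.send_request("foo.platform.us-east.example.com"))  # bar (misrouted)

# foo (HTTPS IngressRoute) is accessed first.
conn = ProxyConnection(sni="foo.platform.us-east.example.com")
print(conn.send_request("foo.platform.us-east.example.com"))  # foo
print(conn.send_request("bar.platform.us-east.example.com"))  # 404
```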

Repro steps

I've put together a set of Kubernetes deployment manifests that can be used to reproduce this issue with Contour. The apps are based on this comment (istio/istio#13589 (comment)).

The apps are:

  • index-application.yaml: An nginx container listening for HTTPS connections. The TLS certificates are mounted into the container via a volume. Exposed using an IngressRoute with TLS passthrough.
  • png-serving-application.yaml: An nginx container listening for HTTP connections. An init container downloads a png into an emptyDir volume. The emptyDir volume is served by nginx. Exposed using a TLS-enabled IngressRoute.

Repro steps:

  1. Install contour

    kubectl apply -f https://raw.githubusercontent.com/heptio/contour/v0.15.0/examples/render/daemonset-rbac.yaml
    
  2. Apply application manifests

    # Index application
    kubectl apply -f https://gist.githubusercontent.com/alexbrand/3f6e41afc04af902879f674604b6ee5e/raw/09ea42338e6b31d64370241ad0394e9ff7ff944a/index-application.yaml
    
    # PNG serving application
    kubectl apply -f https://gist.githubusercontent.com/alexbrand/3f6e41afc04af902879f674604b6ee5e/raw/09ea42338e6b31d64370241ad0394e9ff7ff944a/png-serving-application.yaml
    
  3. Open a tunnel to Envoy using kubectl (might need to use sudo -E to bind port 443)

    kubectl -n heptio-contour port-forward svc/contour 443
    
  4. Verify both IngressRoutes are accessible with curl (Look for 200 response):

    curl -I -k https://app.127.0.0.1.nip.io
    curl -I -k https://image.127.0.0.1.nip.io/images/image.png
    
  5. Browse to https://app.127.0.0.1.nip.io. Notice that the image does not load. If you open the browser's developer tools, you can see that the request for the image results in a 404.

  6. Browse to https://image.127.0.0.1.nip.io. Notice that the request is served by the index application.

  7. Open an incognito window and go to https://image.127.0.0.1.nip.io/images/image.png. Notice that the image is served as expected.

  8. Browse to https://app.127.0.0.1.nip.io in the incognito window. Notice that you get a 404 from Envoy.

@alexbrand alexbrand changed the title from "Surprising behavior due to HTTP/2 Connection Coalescing" to "Requests can be misrouted due to HTTP/2 Connection Coalescing" Sep 12, 2019
@alexbrand alexbrand changed the title from "Requests can be misrouted due to HTTP/2 Connection Coalescing" to "Requests can be misrouted due to HTTP/2 Connection Coalescing under certain scenarios" Sep 12, 2019
@davecheney davecheney added blocked Blocked waiting on a dependency priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Sep 12, 2019
@davecheney davecheney added this to the 1.0.0-beta.1 milestone Sep 12, 2019
@davecheney davecheney self-assigned this Sep 12, 2019
@davecheney davecheney removed the blocked Blocked waiting on a dependency label Sep 13, 2019
@davecheney
Contributor

Thank you for the report. After investigation I'm not sure what steps we can take to address this.

For the case where requests for bar arrive at Envoy's http router, we don't have a route for bar so 404 is the correct answer. The fact that bar is being served directly from a k8s service to which we're forwarding TCP traffic doesn't help, because we don't know anything about the k8s service hosting bar.

A workaround could be to add a catch all http route for bar to return 421.

For the case where requests for foo land on the k8s service for bar, there is even less we can do. This traffic is in TCP mode; we don't even interfere with the TLS handshake apart from the initial SNI routing. At this point the connection is established, and Envoy doesn't know that the Host: foo request has been proxied to bar. The misrouting of foo's traffic when it arrives at bar sounds like bar is not looking at the incoming Host: header on the request.
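As an illustration of what "looking at the incoming Host: header" could mean for a backend such as bar, here is a minimal sketch using Python's standard `http.server`. The `SERVED_HOSTS` set and the `status_for_host` helper are assumptions made for this example; the repro apps use nginx, and `http.server` only speaks HTTP/1.x, so this shows the check itself rather than a drop-in fix.

```python
from http.server import BaseHTTPRequestHandler

# Assumption: the only hostname this backend is meant to serve.
SERVED_HOSTS = {"bar.platform.us-east.example.com"}


def status_for_host(host_header, served_hosts=SERVED_HOSTS) -> int:
    """Return 200 if this backend serves the requested host, else 421.

    Any port is dropped before comparing; IPv6 literals are not handled.
    """
    hostname = (host_header or "").partition(":")[0].lower()
    return 200 if hostname in served_hosts else 421


class HostCheckingHandler(BaseHTTPRequestHandler):
    """Rejects requests for hosts this backend does not serve."""

    def do_GET(self):
        status = status_for_host(self.headers.get("Host"))
        body = b"bar\n" if status == 200 else b"misdirected request\n"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


print(status_for_host("bar.platform.us-east.example.com:443"))  # 200
print(status_for_host("foo.platform.us-east.example.com"))      # 421
```

The handler can be served with `http.server.HTTPServer(("", 8080), HostCheckingHandler).serve_forever()`, where port 8080 is an arbitrary choice.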

@davecheney davecheney added the blocked/needs-info Categorizes the issue or PR as blocked because there is insufficient information to advance it. label Sep 13, 2019
@davecheney davecheney removed this from the 1.0.0-beta.1 milestone Sep 13, 2019
@alexbrand
Contributor Author

Thanks Dave. Your analysis lines up with my understanding described in the issue.

For the case where requests for bar arrive at Envoy's http router, we don't have a route for bar so 404 is the correct answer.

^ My understanding is that Envoy should be responding with a 421 in this case, but it does not do that today due to envoyproxy/envoy#6767

As you say, the TCP scenario seems even worse. I am not sure if Envoy itself can do anything in this case.

@davecheney
Contributor

davecheney commented Sep 14, 2019

How could Envoy respond with a 421? Bar is not a hostname registered with the HTTP connection manager. More precisely, there is no virtual host record for bar in the RDS tables.

@alexbrand
Contributor Author

alexbrand commented Sep 16, 2019

After reading through envoyproxy/envoy#6767 (comment), my understanding is that Envoy would return a 421 if bar is registered with another listener. I could be misunderstanding, of course.

@alexbrand
Contributor Author

alexbrand commented Sep 16, 2019

This table summarizes the behavior that occurs when mixing L7 IngressRoutes and TCP Proxy IngressRoutes, assuming wildcard certificates and a wildcard DNS entry are in play.

The table shows what happens when two requests are made to separate IngressRoutes. The first request is made to the IngressRoute in the left-most column, while the second request is made to the IngressRoute in the top row.

| First request / second request | L7 IngressRoute | TCP Proxy w/ TLS Passthrough | TCP Proxy no TLS Passthrough |
|---|---|---|---|
| L7 IngressRoute | GOOD | (1) BAD: Both requests go to the L7 IR | (2) BAD: Both requests go to the L7 IR |
| TCP Proxy w/ TLS Passthrough | (3) BAD: Both requests go to TCPProxy IngressRoute | (4) BAD: Both requests go to first IR | (5) BAD: Both requests go to first IR |
| TCP Proxy no TLS Passthrough | GOOD | GOOD | GOOD |

(1) L7 IngressRoute + TCP Proxy with TLS passthrough

The initial connection is opened to the L7 IngressRoute. Requests to the TCP Proxy IngressRoute will fail with a 404.

(2) L7 IngressRoute + TCP Proxy without TLS passthrough

The initial connection is opened to the L7 IngressRoute. Requests to the TCP Proxy IngressRoute will fail with a 404.

(3) TCP Proxy w/ TLS Passthrough + L7 IngressRoute

The initial connection is opened to the TCP Proxy IngressRoute. Requests to the L7 IngressRoute are misrouted to the TCP Proxy IngressRoute.

(4) Two TCP Proxy w/ TLS Passthrough

The initial connection is opened to TCP Proxy IngressRoute 1. Requests to TCP Proxy IngressRoute 2 are misrouted to TCP Proxy IngressRoute 1.

(5) TCP Proxy w/ TLS Passthrough + TCP Proxy without TLS Passthrough

The initial connection is opened to the TCP Proxy IngressRoute with TLS passthrough. Requests to the other TCP Proxy IngressRoute are misrouted to the first TCP Proxy IngressRoute.

@davecheney davecheney removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 17, 2019
@davecheney davecheney added this to the Backlog milestone Sep 17, 2019
@davecheney davecheney removed the blocked/needs-info Categorizes the issue or PR as blocked because there is insufficient information to advance it. label Sep 20, 2019
@davecheney davecheney modified the milestones: Backlog, 0.15.1 Sep 20, 2019
@davecheney
Contributor

Assigning to 0.15.1 for tracking purposes.

@davecheney
Contributor

Hello,

I want to give an update for the watchers of this ticket. The short version is my interpretation of this issue has not changed since #1493 (comment).

The longer version is that a partial workaround could be applied in Envoy, but I am wary of promoting it as a solution, because its effectiveness depends entirely on the percentage of virtual hosts handled in HTTP mode versus the percentage in TCP proxy mode. The higher the latter, the less effective this workaround will be. Unless all connections are handled by Envoy in HTTP mode, this issue will be present.

What I am proposing is

  1. We can program Envoy to respond with a 421 to HTTP requests for virtual hosts that are handled in tcpproxy mode. Requests that were intended for a TCP proxy virtual host but were misdirected by the browser would then receive a 421 client error response code, and the browser should reconnect as per https://tools.ietf.org/html/rfc7540#section-9.1.2
  2. We can publish an advisory document describing how HTTP/2 connection reuse can affect Contour users.

I don't see a way to address connections that were intended for Envoy in HTTP mode but are misdirected by the browser to an established connection in tcpproxy mode. Envoy cannot interdict those requests: they are not in HTTP mode, they don't go through Envoy's HTTP connection manager, and, where TLS passthrough is used, they are still encrypted into and out of Envoy.

@davecheney davecheney modified the milestones: 0.15.1, Backlog Sep 30, 2019
@shabx shabx added the ZD3650 label Oct 10, 2019
@davecheney davecheney removed their assignment Mar 10, 2020
@jpeach jpeach self-assigned this Apr 28, 2020
@jpeach
Contributor

jpeach commented Apr 28, 2020

Raising priority, since the 1.4 release means that all users of wildcard certificates will be affected by this problem.

@jpeach jpeach added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 28, 2020
@jpeach jpeach added the blocked Blocked waiting on a dependency label Apr 28, 2020
@jpeach jpeach modified the milestones: Backlog, 1.5.0 Apr 28, 2020
@youngnick
Member

youngnick commented Apr 28, 2020

Probably should link in envoyproxy/envoy#6767 here as well, as it may help. Oops, it was linked in the initial comment, but this should ensure it's top-of-mind.

jpeach added a commit to jpeach/contour that referenced this issue May 11, 2020
TLS routes are specialized to a unique virtual hostname. However, if
wildcard certificates are being used, browsers will aggressively coalesce
and reuse server connections even when the full origin hostname doesn't
match. This results in 404 responses because each TLS virtual host only
has routes for one host.

We can avoid this behaviour bleeding out to users by generating a 421
Misdirected Request response if the request authority doesn't match
the FQDN of the virtual host. In this case, the browser is supposed
to understand that the request wasn't processed and re-send it on a
new connection.

This fixes projectcontour#1493.

Signed-off-by: James Peach <jpeach@vmware.com>
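The check described in the commit message, together with the client behaviour it relies on, can be sketched roughly as follows. This is an illustrative model, not Contour's or Envoy's code; `VirtualHost`, `normalize`, `fetch`, and the hostname baz are hypothetical names introduced for this example.

```python
MISDIRECTED_REQUEST = 421


def normalize(authority: str) -> str:
    """Lower-case the authority and drop any port and trailing dot.

    IPv6 literals are not handled in this sketch.
    """
    host = authority.partition(":")[0]
    return host.rstrip(".").lower()


class VirtualHost:
    """Hypothetical HTTP-mode TLS virtual host bound to a single FQDN."""

    def __init__(self, fqdn: str, paths: set):
        self.fqdn = normalize(fqdn)
        self.paths = paths

    def handle(self, authority: str, path: str) -> int:
        if normalize(authority) != self.fqdn:
            # The request was coalesced onto a connection for another host.
            return MISDIRECTED_REQUEST
        return 200 if path in self.paths else 404


def fetch(vhosts: dict, reused_sni: str, authority: str, path: str) -> int:
    """Client side: try the reused connection, retry once on a new one after 421."""
    status = vhosts[reused_sni].handle(authority, path)
    if status == MISDIRECTED_REQUEST:
        # A new connection carries an SNI equal to the request authority.
        status = vhosts[normalize(authority)].handle(authority, path)
    return status


foo = "foo.platform.us-east.example.com"
baz = "baz.platform.us-east.example.com"  # hypothetical second HTTP-mode host
vhosts = {foo: VirtualHost(foo, {"/"}), baz: VirtualHost(baz, {"/"})}

# The connection was opened for foo; a request for baz is coalesced onto it.
print(vhosts[foo].handle(baz, "/"))  # 421
print(fetch(vhosts, foo, baz, "/"))  # 200
```

Without the authority check, the first call would fall through to the route lookup and return a 404, which is the behaviour reported in this issue.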
jpeach added a commit to jpeach/contour that referenced this issue May 20, 2020
jpeach added a commit to jpeach/contour that referenced this issue May 20, 2020
jpeach added a commit to jpeach/contour that referenced this issue May 21, 2020
jpeach added a commit to jpeach/contour that referenced this issue May 22, 2020
@jpeach jpeach closed this as completed in e596934 May 22, 2020