Skip to content

Conversation

@MasonM
Copy link
Member

@MasonM MasonM commented Jun 15, 2025

Fixes #14627.

Motivation

#14435 tried to fix the bug in the release-3.6 branch where requests failed with the following error:

Service Unavailable: connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434" on the UI

Unfortunately, this didn't work and broke the readiness probe, so it was reverted in 1285c11. The problem is the readiness probe makes a GET request to /:

readinessProbe:
httpGet:
port: 2746
scheme: HTTPS
path: /
initialDelaySeconds: 10
periodSeconds: 20

That request should be going to the HTTP server, and it does when using HTTP/1.1, since argoserver is configured to forward all HTTP/1.1 requests to the listener for the HTTP server:

httpL := tcpm.Match(cmux.HTTP1Fast())
grpcL := tcpm.Match(cmux.Any())

but with ALPN enabled, the client (the kubelet in this case) automatically upgraded the request to HTTP/2, which caused it to be forwarded to the gRPC server instead.

The reason this wasn't caught in #14435 is because the Argo server isn't run inside Kubernetes in either the local dev environment or CI builds, so the readiness probe wasn't being used. Additionally, http://localhost:8080/ in the local dev environment points to the webpack-dev-server, which proxies requests to the Argo server, without ALPN:

proxy: [
{
context: ['/api/v1', '/artifact-files', '/artifacts', '/input-artifacts', '/artifacts-by-uid', '/input-artifacts-by-uid', '/oauth2'],
target: proxyTarget,
secure: false,
xfwd: true // add x-forwarded-* headers to simulate real-world reverse proxy servers
}
]

Modifications

Initially, I tried to fix this by changing cmux to only forward requests with Content-Type: application/grpc (which is guaranteed to be present per the gRPC over HTTP2 spec) to the gRPC server:
main...MasonM:argo-workflows:fix-http2-multiplexing

but that doesn't work with TLS enabled due to the following limitation mentioned at https://github.com/soheilhy/cmux/:

TLS: net/http uses a type assertion to identify TLS connections; since cmux's lookahead-implementing connection wraps the underlying TLS connection, this type assertion fails. Because of that, you can serve HTTPS using cmux but http.Request.TLS would not be set in your handlers.

Link to the code it's referencing: https://github.com/golang/go/blob/1e756dc5f73dc19eb1cbf038807d18ef1cc54ebc/src/net/http/server.go#L1959

Instead, this uses the solution proposed in grpc/grpc-go#555 (comment), which relies on the h2c package.

I had to partially revert #14588 to restore the changes in util/tls/tls.go that enabled ALPN. To be extra safe, I didn't remove the ENV GRPC_ENFORCE_ALPN_ENABLED=false workaround yet, though it should theoretically be okay to remove now.

Verification

I added basic tests to argo_server_test.go for the readiness probe, but that only tests when TLS is disabled. I wrote the following script to manually test with both HTTPS and HTTP:

#!/bin/bash

set -euo pipefail
BASEURL="$scheme://localhost:2746"
docurl() {
    printf "Testing %s with HTTP/1.1\n" "$*"
    curl -kv --http1.1 "$@" 2>&1 | egrep -i 'ALPN|^< HTTP|< Content-Type|< Grpc-Metadata-Argo-Version'

    printf "Testing %s with HTTP/2 with prior knowledge\n" "$*"
    curl -kv --http2-prior-knowledge "$@" 2>&1 | egrep -i 'ALPN|^< HTTP|< Content-Type|< Grpc-Metadata-Argo-Version'

    printf "Testing %s with ALPN\n" "$*"
    curl -kv --alpn "$@" 2>&1 | egrep -i 'ALPN|^< HTTP|< Content-Type|< Grpc-Metadata-Argo-Version'
}

printf 'Testing root path\n'
docurl "$BASEURL/"

printf '\nTesting grpc-gateway\n'
docurl "$BASEURL/api/v1/workflow-templates/argo"

printf '\nTesting grpc server\n'
docurl -XPOST -H 'Content-Type: application/grpc' "$BASEURL/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates"
Output with HTTP
$ scheme=http ./test_mux.sh 
Testing root path                                                                                                                                                                                                                                    
Testing http://localhost:2746/ with HTTP/1.1                                                                                                                                                                                                         
< HTTP/1.1 200 OK                                                                                                                                                                                                                                    
< Content-Type: text/html; charset=utf-8
Testing http://localhost:2746/ with HTTP/2 with prior knowledge
< HTTP/2 200                                                                                                              
< content-type: text/html; charset=utf-8
Testing http://localhost:2746/ with ALPN     
< HTTP/1.1 200 OK           
< Content-Type: text/html; charset=utf-8                 
                                                             
Testing grpc-gateway                    
Testing http://localhost:2746/api/v1/workflow-templates/argo with HTTP/1.1
< HTTP/1.1 200 OK     
< Content-Type: application/json
< Grpc-Metadata-Argo-Version: latest+19a02a0.dirty
Testing http://localhost:2746/api/v1/workflow-templates/argo with HTTP/2 with prior knowledge
< HTTP/2 200                             
< content-type: application/json
< grpc-metadata-argo-version: latest+19a02a0.dirty
Testing http://localhost:2746/api/v1/workflow-templates/argo with ALPN
< HTTP/1.1 200 OK                       
< Content-Type: application/json
< Grpc-Metadata-Argo-Version: latest+19a02a0.dirty
                                                                                                                          
Testing grpc server         
Testing -XPOST -H Content-Type: application/grpc http://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with HTTP/1.1
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing -XPOST -H Content-Type: application/grpc http://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with HTTP/2 with prior knowledge
< HTTP/2 200                                                                                                              
< content-type: application/grpc
Testing -XPOST -H Content-Type: application/grpc http://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with ALPN
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Output with HTTPS
$ scheme=https ./test_mux.sh
Testing root path
Testing https://localhost:2746/ with HTTP/1.1
* ALPN: curl offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing https://localhost:2746/ with HTTP/2 with prior knowledge
* ALPN: curl offers h2
* ALPN: server accepted h2
< HTTP/2 200 
< content-type: text/html; charset=utf-8
Testing https://localhost:2746/ with ALPN
* ALPN: curl offers h2,http/1.1
* ALPN: server accepted h2
< HTTP/2 200 
< content-type: text/html; charset=utf-8

Testing grpc-gateway
Testing https://localhost:2746/api/v1/workflow-templates/argo with HTTP/1.1
* ALPN: curl offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
< HTTP/1.1 200 OK
< Content-Type: application/json
< Grpc-Metadata-Argo-Version: latest+19a02a0.dirty
Testing https://localhost:2746/api/v1/workflow-templates/argo with HTTP/2 with prior knowledge
* ALPN: curl offers h2
* ALPN: server accepted h2
< HTTP/2 200 
< content-type: application/json
< grpc-metadata-argo-version: latest+19a02a0.dirty
Testing https://localhost:2746/api/v1/workflow-templates/argo with ALPN
* ALPN: curl offers h2,http/1.1
* ALPN: server accepted h2
< HTTP/2 200 
< content-type: application/json
< grpc-metadata-argo-version: latest+19a02a0.dirty

Testing grpc server
Testing -XPOST -H Content-Type: application/grpc https://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with HTTP/1.1
* ALPN: curl offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing -XPOST -H Content-Type: application/grpc https://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with HTTP/2 with prior knowledge
* ALPN: curl offers h2
* ALPN: server accepted h2
< HTTP/2 200 
< content-type: application/grpc
Testing -XPOST -H Content-Type: application/grpc https://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with ALPN
* ALPN: curl offers h2,http/1.1
* ALPN: server accepted h2
< HTTP/2 200 
< content-type: application/grpc

Documentation

N/A

argoproj#14435 tried to fix the
bug in the release-3.6 branch where requests failed with the following
error:
```
Service Unavailable: connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: grpc/grpc-go#434" on the UI
```

Unfortunately, this didn't work and broke the readiness probe, so it was
reverted in
argoproj@1285c11. The problem is the readiness probe makes a GET request to `/`: https://github.com/argoproj/argo-workflows/blob/1e2a87f2afdebbcd0e55069df5a945f5faca9d45/manifests/base/argo-server/argo-server-deployment.yaml#L30-L36

With ALPN enabled, that request was automatically upraded to HTTP/2,
which caused it to be forwarded to the gRPC server instead of the HTTP
server. Initially, I tried to fix this by changing cmux to only forward
requests with `Content-Type: application/grpc` (which is guaranteed to
be present per the [gRPC over HTTP2 spec](https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md))
to the gRPC server:
argoproj/argo-workflows@main...MasonM:argo-workflows:fix-http2-multiplexing

but that doesn't work with TLS enabled due to the following limitation
mentioned at https://github.com/soheilhy/cmux/:
> TLS: net/http uses a type assertion to identify TLS connections; since cmux's lookahead-implementing connection wraps the underlying TLS connection, this type assertion fails. Because of that, you can serve HTTPS using cmux but http.Request.TLS would not be set in your handlers.

Instead, this uses the solution proposed in grpc/grpc-go#555 (comment), which relies on the [h2c package](https://pkg.go.dev/golang.org/x/net/http2/h2c).

I added basic tests to `argo_server_test.go` for the readiness probe, but
that only tests when TLS is disabled. I wrote following script to
manually test with both HTTPS and HTTP:
```shell

set -euo pipefail
BASEURL="$scheme://localhost:2746"
docurl() {
    printf "Testing %s with HTTP/1.1\n" "$*"
    curl -kv --http1.1 "$@" 2>&1 | egrep -i 'ALPN|^< HTTP|< Content-Type|< Grpc-Metadata-Argo-Version'

    printf "Testing %s with ALPN\n" "$*"
    curl -kv --alpn "$@" 2>&1 | egrep -i 'ALPN|^< HTTP|< Content-Type|< Grpc-Metadata-Argo-Version'
}

printf 'Testing root path\n'
docurl "$BASEURL/"

printf '\nTesting grpc-gateway\n'
docurl "$BASEURL/api/v1/workflow-templates/argo"

printf '\nTesting grpc server\n'
docurl -XPOST -H 'Content-Type: application/grpc' "$BASEURL/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates"
```

<details>
<summary>Output with HTTP</summary>

```
 $ scheme=http ./test_mux.sh
Testing root path
Testing http://localhost:2746/ with HTTP/1.1
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing http://localhost:2746/ with ALPN
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8

Testing grpc-gateway
Testing http://localhost:2746/api/v1/workflow-templates/argo with HTTP/1.1
< HTTP/1.1 200 OK
< Content-Type: application/json
< Grpc-Metadata-Argo-Version: latest+94f9148.dirty
Testing http://localhost:2746/api/v1/workflow-templates/argo with ALPN
< HTTP/1.1 200 OK
< Content-Type: application/json
< Grpc-Metadata-Argo-Version: latest+94f9148.dirty

Testing grpc server
Testing -XPOST -H Content-Type: application/grpc http://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with HTTP/1.1
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing -XPOST -H Content-Type: application/grpc http://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with ALPN
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
```
</details>

<details>
<summary>Output with HTTPS</summary>

```
$ scheme=https ./test_mux.sh
Testing root path
Testing https://localhost:2746/ with HTTP/1.1
* ALPN, offering http/1.1
* ALPN, server did not agree to a protocol
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing https://localhost:2746/ with ALPN
* ALPN, offering h2
* ALPN, offering http/1.1
* ALPN, server accepted to use h2
< HTTP/2 200
< content-type: text/html; charset=utf-8

Testing grpc-gateway
Testing https://localhost:2746/api/v1/workflow-templates/argo with HTTP/1.1
* ALPN, offering http/1.1
* ALPN, server did not agree to a protocol
< HTTP/1.1 200 OK
< Content-Type: application/json
< Grpc-Metadata-Argo-Version: latest+94f9148.dirty
Testing https://localhost:2746/api/v1/workflow-templates/argo with ALPN
* ALPN, offering h2
* ALPN, offering http/1.1
* ALPN, server accepted to use h2
< HTTP/2 200
< content-type: application/json
< grpc-metadata-argo-version: latest+94f9148.dirty

Testing grpc server
Testing -XPOST -H Content-Type: application/grpc https://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with HTTP/1.1
* ALPN, offering http/1.1
* ALPN, server did not agree to a protocol
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Testing -XPOST -H Content-Type: application/grpc https://localhost:2746/workflowtemplate.WorkflowTemplateService/ListWorkflowTemplates with ALPN
* ALPN, offering h2
* ALPN, offering http/1.1
* ALPN, server accepted to use h2
< HTTP/2 200
< content-type: application/grpc
```
</details>

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
MasonM added 2 commits July 5, 2025 18:44
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
@MasonM MasonM changed the title fix: support ALPN and H2C with gRPC fix: support ALPN and H2C with gRPC. Fixes #14627 Jul 6, 2025
@MasonM MasonM marked this pull request as ready for review July 6, 2025 02:38
@MasonM MasonM requested review from Joibel and isubasinghe July 6, 2025 02:38
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
@MasonM
Copy link
Member Author

MasonM commented Jul 14, 2025

/retest

MasonM added 2 commits July 15, 2025 19:15
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
// to an HTTP/2 connection which is understandable to s.ServeConn. (s.ServeConn
// understands HTTP/2 except for the h2c part of it.)"
func NewMuxHandler(grpcServerHandler http.Handler, httpServerHandler http.Handler) http.Handler {
return h2c.NewHandler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible concern here is performance, since h2c buffers the entire request in memory, unlike cmux. I'm not sure if that's actually going to be a problem here.

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
@MasonM
Copy link
Member Author

MasonM commented Jul 16, 2025

/retest

@Joibel Joibel self-assigned this Jul 25, 2025
@Joibel
Copy link
Member

Joibel commented Jul 25, 2025

This LGTM, and works for me. Should we merge the reversion of the ENV var in the Dockerfile into this PR, or do you think it wants doing separately?

@MasonM
Copy link
Member Author

MasonM commented Jul 27, 2025

@Joibel Thanks for the review! I was thinking of doing it separately to be extra safe.

Copy link
Member

@isubasinghe isubasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, haven't tested it so just leaving a comment, code looks good.
Thanks for the work!

@Joibel
Copy link
Member

Joibel commented Jul 28, 2025

/cherry-pick release-3.6

@Joibel
Copy link
Member

Joibel commented Jul 28, 2025

/cherry-pick release-3.7

@Joibel Joibel merged commit 9ed9ed9 into argoproj:main Jul 28, 2025
60 of 62 checks passed
@gcp-cherry-pick-bot
Copy link

Cherry-pick failed with Merge error 9ed9ed9972616395dd3457a47092de07b75fb5f9 into temp-cherry-pick-fd80cc-release-3.6

@gcp-cherry-pick-bot
Copy link

Cherry-pick failed with Merge error 9ed9ed9972616395dd3457a47092de07b75fb5f9 into temp-cherry-pick-fd80cc-release-3.7

Joibel pushed a commit that referenced this pull request Aug 5, 2025
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
(cherry picked from commit 9ed9ed9)
Joibel pushed a commit that referenced this pull request Aug 5, 2025
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
(cherry picked from commit 9ed9ed9)
Joibel pushed a commit that referenced this pull request Aug 5, 2025
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
(cherry picked from commit 9ed9ed9)
(cherry picked from commit 44754be)
Joibel pushed a commit that referenced this pull request Aug 5, 2025
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
(cherry picked from commit 9ed9ed9)
Joibel pushed a commit that referenced this pull request Aug 5, 2025
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
(cherry picked from commit 9ed9ed9)
(cherry picked from commit 44754be)
Joibel added a commit that referenced this pull request Aug 5, 2025
…ck 3.7) (#14741)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Joibel added a commit that referenced this pull request Aug 5, 2025
…ck release-3.6) (#14742)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

http2 connections are assumed to be gRPC

3 participants