Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Cmux #15510

Closed
wants to merge 15 commits into from
Closed

Replace Cmux #15510

wants to merge 15 commits into from

Conversation

aojea
Copy link

@aojea aojea commented Mar 18, 2023

The cmux project last commit is from Mar 2021 https://github.com/soheilhy/cmux
It also targets a more generic way of multiplexing connections offering different matches and doesn't offer the possibility of use the same cmux to differentiate between TLS and non-TLS.

etcd doesn't need the cmux flexibility, since the backends are always controlled by the project, and will benefit if it can support to discriminate between TLS and non-TLS.

This PR introduces a replacement from cmux targeted specifically for the etcd use case.
It uses the same method of detecting the connection, looking ahead on the connection to try to determine if is an HTTP or GRPC request.
The heuristic here are more simple, since HTTP connections are easy to identify (it always start with HTTP1/1 , see exception explanation below), and apply the following logic "if is not HTTP it has to be GRPC". In the corner case that the connection is forwarded and is not GRPC, the server will reject the connection anyway.

  • There is one special case that is not covered, that is HTTP2 with prior-knowledge, but this is not common in the wild, and it wasn't supported AFAIK here, because it requires a special handler in the server)

NOTE:

I'm not familiar with the etcd code so I did a grep for cmux, but I'm not completely sure I made a 1to1 replacement, the first commit introduces the new cmux for etcd, so I appreciate some guidance and feedback

@aojea
Copy link
Author

aojea commented Mar 18, 2023

/assign @serathius @fuweid

@serathius
Copy link
Member

This PR seems to not pass newly added cmux tests #15479
You can run them by running:

make
cd tests
go test -v  -count=1 -run TestConnectionMultiplexing ./e2e/ | grep '\-\-\-'

Results

--- FAIL: TestConnectionMultiplexing (2.76s)
    --- FAIL: TestConnectionMultiplexing/ServerTLS (1.18s)
        --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS (0.34s)
            --- PASS: TestConnectionMultiplexing/ServerTLS/ClientTLS/etcdctl (0.02s)
            --- PASS: TestConnectionMultiplexing/ServerTLS/ClientTLS/clientv3 (0.01s)
            --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl (0.31s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/http2 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/http1.1 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/http1.0 (0.07s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/default (0.07s)
    --- FAIL: TestConnectionMultiplexing/ServerNonTLS (0.71s)
        --- FAIL: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS (0.18s)
            --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/etcdctl (0.01s)
            --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/clientv3 (0.00s)
            --- FAIL: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl (0.16s)
                --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/http2 (0.04s)
                --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/http1.1 (0.04s)
                --- FAIL: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/http1.0 (0.03s)
                --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/default (0.04s)
    --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS (0.87s)
        --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS (0.34s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/etcdctl (0.02s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/clientv3 (0.01s)
            --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl (0.31s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/http2 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/http1.1 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/http1.0 (0.07s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/default (0.07s)
        --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS (0.19s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/etcdctl (0.01s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/clientv3 (0.00s)
            --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl (0.18s)
                --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/http2 (0.05s)
                --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/http1.1 (0.05s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/http1.0 (0.03s)
                --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/default (0.04s)

Also server fails to terminate correctly so I needed to change

defer clus.Close()

to

defer func(clus *e2e.EtcdProcessCluster) error {
  for _, member := range clus.Procs {
    member.Kill()
  }
  return clus.Stop()
}(clus)

@serathius
Copy link
Member

Still it would be great if this works and we don't need to introduce a breaking change to stop running grpc server under http.

@aojea
Copy link
Author

aojea commented Mar 18, 2023

Still it would be great if this works and we don't need to introduce a breaking change to stop running grpc server under http.

half way done by the comments, will try to come back to this tomorrow and run the failing tests

@ahrtr
Copy link
Member

ahrtr commented Mar 18, 2023

Great to see this PR, which I wanted to do but do not get time to do. If this PR eventually works and is accepted, then we don't need to add separate flag (e.g --listen-client-http-urls) to separate http and grpc server (#15446) ?

EDIT: I was thinking to update github.com/soheilhy/cmux to resolve the original issue just as I mentioned in #15451 (comment)? I feel like it makes more sense to delegate such generic function to a separate repository, e.g either cmux or other repository or put it under pkg/cmux?

We may also create a new repo under etcd-io, such as github.com/etcd-io/cmux. cc @mitake @ptabor @serathius @spzala @fuweid @chaochn47 @jmhbnz @tjungblu @aojea WDYT?

@aojea
Copy link
Author

aojea commented Mar 19, 2023

If this PR eventually works and is accepted, then we don't need to add separate flag (e.g --listen-client-http-urls) to separate http and grpc server (#15446) ?

that is the goal, I understand well the logic in the multiplexer, but I have trouble to understand the logic in etcd, I need some help to understand which request should be sent to which backends

We may also create a new repo under etcd-io, such as github.com/etcd-io/cmux. cc @mitake @ptabor @serathius @spzala @fuweid @chaochn47 @jmhbnz @tjungblu @aojea WDYT?

Putting the mux in a public repo means you may offer some guarantees to users and trying to generalize it can go against the supportability of etcd , there are too many protocols out there 😄
I think that is better to have it as an internal detail of etcd, since the project control the backends it is easy to make more deterministic heuristics in the mux making it more reliable and sustainable

@aojea aojea force-pushed the cmux branch 4 times, most recently from f81285f to df48868 Compare March 19, 2023 16:31
@aojea
Copy link
Author

aojea commented Mar 19, 2023

This PR seems to not pass newly added cmux tests #15479 You can run them by running:

make
cd tests
go test -v  -count=1 -run TestConnectionMultiplexing ./e2e/ | grep '\-\-\-'

Results

--- FAIL: TestConnectionMultiplexing (2.76s)
    --- FAIL: TestConnectionMultiplexing/ServerTLS (1.18s)
        --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS (0.34s)
            --- PASS: TestConnectionMultiplexing/ServerTLS/ClientTLS/etcdctl (0.02s)
            --- PASS: TestConnectionMultiplexing/ServerTLS/ClientTLS/clientv3 (0.01s)
            --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl (0.31s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/http2 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/http1.1 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/http1.0 (0.07s)
                --- FAIL: TestConnectionMultiplexing/ServerTLS/ClientTLS/curl/default (0.07s)
    --- FAIL: TestConnectionMultiplexing/ServerNonTLS (0.71s)
        --- FAIL: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS (0.18s)
            --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/etcdctl (0.01s)
            --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/clientv3 (0.00s)
            --- FAIL: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl (0.16s)
                --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/http2 (0.04s)
                --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/http1.1 (0.04s)
                --- FAIL: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/http1.0 (0.03s)
                --- PASS: TestConnectionMultiplexing/ServerNonTLS/ClientNonTLS/curl/default (0.04s)
    --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS (0.87s)
        --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS (0.34s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/etcdctl (0.02s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/clientv3 (0.01s)
            --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl (0.31s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/http2 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/http1.1 (0.08s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/http1.0 (0.07s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientTLS/curl/default (0.07s)
        --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS (0.19s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/etcdctl (0.01s)
            --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/clientv3 (0.00s)
            --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl (0.18s)
                --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/http2 (0.05s)
                --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/http1.1 (0.05s)
                --- FAIL: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/http1.0 (0.03s)
                --- PASS: TestConnectionMultiplexing/ServerTLSAndNonTLS/ClientNonTLS/curl/default (0.04s)

Also server fails to terminate correctly so I needed to change

defer clus.Close()

to

defer func(clus *e2e.EtcdProcessCluster) error {
  for _, member := range clus.Procs {
    member.Kill()
  }
  return clus.Stop()
}(clus)

tests are passing now

@aojea
Copy link
Author

aojea commented Mar 19, 2023

graceful shutdown was broken, next push with the fix

@aojea aojea force-pushed the cmux branch 2 times, most recently from 78d19ad to 30a0749 Compare March 19, 2023 20:34
@ahrtr
Copy link
Member

ahrtr commented Mar 19, 2023

that is the goal, I understand well the logic in the multiplexer, but I have trouble to understand the logic in etcd, I need some help to understand which request should be sent to which backends

Please see the diagram below,
multiplexer

The HTTP endpoints in the right bottom side isn't a full list, please read the section "Background" in #15402 (comment) to get a full list.
Please do not hesitate to ping me if you need any further clarification.

@aojea aojea force-pushed the cmux branch 2 times, most recently from c455eec to ca61d2e Compare March 19, 2023 23:37
@@ -233,7 +233,7 @@ func startGRPCProxy(cmd *cobra.Command, args []string) {
}
httpClient := mustNewHTTPClient(lg)

srvhttp, httpl := mustHTTPListener(lg, m, tlsInfo, client, proxyClient)
srvhttp, httpl := mustHTTPListener(lg, m, client, proxyClient)

if err := http2.ConfigureServer(srvhttp, &http2.Server{
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have doubts if I have to use the h2c handler here too, like in the server

@aojea aojea changed the title [WIP] Replace Cmux Replace Cmux Mar 19, 2023
@aojea
Copy link
Author

aojea commented Mar 19, 2023

ok, I think I got the logic right now, and it can be done anything from the same multiplexer

@aojea aojea force-pushed the cmux branch 3 times, most recently from c1402ef to 765a973 Compare March 20, 2023 00:03
@aojea
Copy link
Author

aojea commented Mar 27, 2023

We should not proceed #15510 (comment) , this was fun to code but not meant to be used in production

@aojea aojea closed this Mar 27, 2023
@aojea
Copy link
Author

aojea commented Mar 27, 2023

The cmux usage in etcd should be removed too based on this conversation, since is not adding much once the grpc and http servers are split, is it?

@serathius
Copy link
Member

Question remains, how we make the intended production configuration a default without a breaking change? Was hoping that this PR can be validated and made default in v3.6. Without this we will need to revert to previous design :(

@aojea
Copy link
Author

aojea commented Mar 27, 2023

Question remains, how we make the intended production configuration a default without a breaking change? Was hoping that this PR can be validated and made default in v3.6. Without this we will need to revert to previous design :(

can we serve redirects https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections from the grpc server to forward them to the new url?

@aojea
Copy link
Author

aojea commented Mar 28, 2023

Question remains, how we make the intended production configuration a default without a breaking change? Was hoping that this PR can be validated and made default in v3.6. Without this we will need to revert to previous design :(

let me reopen, I was with the idea that we want only passive, but if we write the ack setting to the connection then we can be 100% is a grpc connection.
I will put the ack setting handling and we can keep discussing

@aojea
Copy link
Author

aojea commented Apr 30, 2023

ok, I got it now, the trickies part is to support http2 with TLS, because grpc uses http2 but shows an interesting behavior, the clients block until they receive a settings frame. However, if we send the setting frame and is not grpc, the h2c handler logic considers it a protocol error and tear down the connection

aojea added 15 commits May 1, 2023 13:18
Introduce a new socket connection multiplexer that multiplexes the same ip:port
to an http and a grpc backends.
The multiplexer is able to terminate TLS and forward the request to the
corresponding grpc or http backends

Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
it was just adding one element in the path when it can be used
directly, as it is configured as a passthrough.

Signed-off-by: Antonio Ojea <aojea@google.com>
sockproxy allows to listen in the same listener both in http
and grpc with or without TLS enabled, allowing to remove the
duplicate logic for secure and insecure, and to remove the
http2 handler for splitting the grpc and http traffic.

Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
The only reliable method to detect a GRPC connection seems to be
parsing the content-type header.
We need to be able to establish the HTTP2 connection so the client
sends us the headers.

Signed-off-by: Antonio Ojea <aojea@google.com>
grpc clients block waiting for the settings frame.
http2 fails if we write the settings frame

we have to detect that the client didn't send more than 33 bytes:
24 preface + 9 settings (this seems to be optional)

to write the settings frame.

Signed-off-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
@aojea aojea marked this pull request as draft July 4, 2023 08:51
@stale
Copy link

stale bot commented Oct 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 15, 2023
@stale stale bot closed this Mar 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

6 participants