fix: error propagation in http-connect mode #475

ipochi
Contributor

@ipochi ipochi commented Mar 9, 2023

If the proxy-server encounters an error, such as a DNS lookup failure, that error is currently not propagated back to the client in http-connect mode.

This PR adds an HTTP response that conveys the error message to the client before the HTTP connection is closed.

I'm unsure whether the status code should be in the 400 or 500 range, so the value used here is merely a placeholder for now.
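To illustrate the idea of answering the client before closing the connection, here is a minimal sketch using only the Go standard library. It is not the code from this PR: `writeDialError`, `roundTrip`, and `errExample` are hypothetical names, and 502 is an arbitrary illustrative status. The sketch serializes an HTTP/1.1 error response onto a raw `io.Writer` (standing in for the client connection) and parses it back the way a client would.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// errExample is a made-up dial error used only for this illustration.
var errExample = errors.New("lookup example.invalid: no such host")

// writeDialError is a hypothetical helper (not taken from the PR). It writes
// a minimal HTTP/1.1 response whose body carries the dial error text, and
// marks the connection for closing.
func writeDialError(w io.Writer, status int, dialErr error) error {
	body := dialErr.Error() + "\n"
	resp := &http.Response{
		StatusCode:    status,
		ProtoMajor:    1,
		ProtoMinor:    1,
		Header:        http.Header{"Content-Type": []string{"text/plain; charset=utf-8"}},
		Body:          io.NopCloser(strings.NewReader(body)),
		ContentLength: int64(len(body)),
		Close:         true, // emits "Connection: close"
	}
	return resp.Write(w)
}

// roundTrip writes the error response into a buffer and reads it back with
// http.ReadResponse, returning the status code and body a client would see.
func roundTrip(status int, dialErr error) (int, string, error) {
	var buf bytes.Buffer
	if err := writeDialError(&buf, status, dialErr); err != nil {
		return 0, "", err
	}
	resp, err := http.ReadResponse(bufio.NewReader(&buf), nil)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	code, body, err := roundTrip(http.StatusBadGateway, errExample)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("status=%d body=%q\n", code, body)
}
```

In a real proxy the writer would be the hijacked client connection, and the connection would be closed right after the write returns.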

fixes: #458

@ipochi
Contributor Author

ipochi commented Mar 9, 2023

/cc @cheftako @jkh52

@k8s-ci-robot k8s-ci-robot requested review from cheftako and jkh52 March 9, 2023 14:46
@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Mar 9, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 9, 2023
@k8s-ci-robot
Contributor

Hi @ipochi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 9, 2023
@ipochi
Contributor Author

ipochi commented Mar 9, 2023

/ok-to-test

@k8s-ci-robot
Contributor

@ipochi: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jkh52
Contributor

jkh52 commented Mar 9, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 9, 2023
@jkh52
Contributor

jkh52 commented Mar 9, 2023

Thanks for sending this PR!

Is it feasible to add a unit test around this propagation?

@ipochi ipochi force-pushed the imran/fix-dns-error-propagation-http-connect branch from c5d7120 to 6994a35 Compare March 13, 2023 13:53
@ipochi
Contributor Author

ipochi commented Mar 13, 2023

/ok-to-test

@ipochi ipochi force-pushed the imran/fix-dns-error-propagation-http-connect branch from 6994a35 to 0e43f90 Compare March 13, 2023 13:57
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Mar 13, 2023
@jkh52
Contributor

jkh52 commented Mar 13, 2023

/ok-to-test

@jkh52
Contributor

jkh52 commented Mar 13, 2023

/retest

@ipochi ipochi force-pushed the imran/fix-dns-error-propagation-http-connect branch from 0e43f90 to fd1de7e Compare March 15, 2023 12:17
@ipochi
Contributor Author

ipochi commented Mar 21, 2023

/retest

@ipochi
Contributor Author

ipochi commented Mar 30, 2023

@cheftako I need some help understanding why the test fails.

 go test -race -run TestFailedDial_HTTPCONN
I0330 20:35:07.750773   22278 client.go:467] "error dialing backend" error="dial tcp 127.0.0.1:59864: connect: connection refused" dialID=8674665223082153551 connectionID=1 dialAddress="127.0.0.1:59864"
E0330 20:35:07.755274   22278 server.go:937] "DIAL_RSP contains failure" err="dial tcp 127.0.0.1:59864: connect: connection refused" dialID=8674665223082153551 agentID="1ef4d2b3-2f36-4084-bb01-55d048cb588d"
E0330 20:35:07.759688   22278 tunnel.go:150] "Received failure on connection" err="read tcp [::1]:59868->[::1]:59870: use of closed network connection"
E0330 20:35:07.761144   22278 server.go:861] "Receive stream from agent read failure" err="rpc error: code = Canceled desc = context canceled"
http connect server error:  accept tcp [::]:59868: use of closed network connection
E0330 20:35:07.862339   22278 server.go:861] "Receive stream from agent read failure" err="rpc error: code = Canceled desc = context canceled"
E0330 20:35:07.862569   22278 client.go:388] "could not read stream" err="rpc error: code = Canceled desc = grpc: the client connection is closing" serverID="b300f878-cbfb-42e7-8f76-2a8c021d5ab3" agentID="1ef4d2b3-2f36-4084-bb01-55d048cb588d"
--- FAIL: TestFailedDial_HTTPCONN (0.60s)
    proxy_test.go:735: <nil>
    proxy_test.go:1035: found unexpected goroutines:
        [Goroutine 84 in state select, with net/http.(*persistConn).readLoop on top of the stack:
        goroutine 84 [select]:
        net/http.(*persistConn).readLoop(0xc000313d40)
        	/usr/local/go/src/net/http/transport.go:2213 +0x14ef
        created by net/http.(*Transport).dialConn
        	/usr/local/go/src/net/http/transport.go:1751 +0x2586

         Goroutine 85 in state select, with net/http.(*persistConn).writeLoop on top of the stack:
        goroutine 85 [select]:
        net/http.(*persistConn).writeLoop(0xc000313d40)
        	/usr/local/go/src/net/http/transport.go:2392 +0x1a9
        created by net/http.(*Transport).dialConn
        	/usr/local/go/src/net/http/transport.go:1752 +0x261a
        ]
FAIL
exit status 1
FAIL	sigs.k8s.io/apiserver-network-proxy/tests

@ipochi ipochi force-pushed the imran/fix-dns-error-propagation-http-connect branch from fd1de7e to b911f64 Compare March 31, 2023 14:35
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 31, 2023
@ipochi
Contributor Author

ipochi commented Mar 31, 2023

Never mind, I figured it out. I added a unit test as well.
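For readers who hit a similar report: the two leftover goroutines in the trace above are the `readLoop` and `writeLoop` of a `net/http` persistent connection, which an `http.Transport` keeps alive for reuse. The thread does not say what the actual fix was. A common remedy in tests, sketched below under that assumption, is to drain and close the response body and then call `CloseIdleConnections` on the transport. `fetchAndCleanup` and the throwaway `httptest` server are hypothetical and are not taken from this repository's tests.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchAndCleanup issues one GET against a throwaway test server and then
// releases everything the client transport holds. It returns the status code.
func fetchAndCleanup() (int, error) {
	// Stand-in server that always answers with a 502 error response.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "dial failed", http.StatusBadGateway)
	}))
	defer srv.Close()

	tr := &http.Transport{}
	// Without this call, an idle keep-alive connection keeps its readLoop
	// and writeLoop goroutines alive after the request has finished.
	defer tr.CloseIdleConnections()
	client := &http.Client{Transport: tr}

	resp, err := client.Get(srv.URL)
	if err != nil {
		return 0, err
	}
	// Deferred calls run in reverse order, so the body is closed before
	// CloseIdleConnections runs and the connection is idle by then.
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

func main() {
	code, err := fetchAndCleanup()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("status:", code)
}
```

Setting `DisableKeepAlives: true` on the transport is another option when connection reuse is not needed in the test.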

/cc @jkh52 @cheftako

@k8s-ci-robot k8s-ci-robot requested a review from jkh52 March 31, 2023 14:36
@ipochi ipochi force-pushed the imran/fix-dns-error-propagation-http-connect branch from b911f64 to c37aed4 Compare March 31, 2023 14:37
@cheftako
Contributor

cheftako commented Apr 3, 2023

Seems like a good change. In answer to your earlier question, I think a 5xx series error makes more sense.

@ipochi
Contributor Author

ipochi commented Apr 3, 2023

Seems like a good change. In answer to your earlier question, I think a 5xx series error makes more sense.

@cheftako Thank you for the review. The 400 was merely a placeholder. I agree that the status code should be in the 5xx range.

If the proxy-server encounters an error, such as a DNS lookup
failure, it is not propagated back correctly to the client in
http-connect mode.

This PR adds an http response, conveying the error message to the client
before closing the http connection.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
@ipochi ipochi force-pushed the imran/fix-dns-error-propagation-http-connect branch from c37aed4 to 5c7f6ae Compare April 3, 2023 05:33
@ipochi ipochi requested a review from cheftako April 3, 2023 05:34
@ipochi
Contributor Author

ipochi commented Apr 3, 2023

/test pull-apiserver-network-proxy-test

@cheftako
Contributor

cheftako commented Apr 3, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 3, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, ipochi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 3, 2023
@k8s-ci-robot k8s-ci-robot merged commit a0bc007 into kubernetes-sigs:master Apr 3, 2023
@ipochi ipochi deleted the imran/fix-dns-error-propagation-http-connect branch April 3, 2023 07:13
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.