fix dial-timeout not affected for client watch command #18336

Open
wants to merge 1 commit into base: main

chengjoey

fix #18335

etcd/client/v3/client.go

Lines 327 to 330 in e7f5729

var cancel context.CancelFunc
dctx, cancel = context.WithTimeout(c.ctx, c.cfg.DialTimeout)
defer cancel() // TODO: Is this right for cases where grpc.WithBlock() is not set on the dial options?
}

As noted before, it is wrong not to set grpc.WithBlock() here.

In the test, we also add the grpc.WithBlock() option:

// grpc.WithBlock to block until connection up or timeout
testCfgs := []Config{
	{
		Endpoints:   []string{"http://254.0.0.1:12345"},
		DialTimeout: 2 * time.Second,
		DialOptions: []grpc.DialOption{grpc.WithBlock()},
	},
	{
		Endpoints:   []string{"http://254.0.0.1:12345"},
		DialTimeout: time.Second,
		DialOptions: []grpc.DialOption{grpc.WithBlock()},
		Username:    "abc",
		Password:    "def",
	},
}

@k8s-ci-robot

Hi @chengjoey. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@jmhbnz
Member

jmhbnz commented Jul 16, 2024

/ok-to-test

@codecov-commenter

codecov-commenter commented Jul 16, 2024

Codecov Report

Attention: Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.

Project coverage is 68.77%. Comparing base (c86c93c) to head (edae2a1).
Report is 137 commits behind head on main.

Current head edae2a1 differs from pull request most recent head 3d8fe91

Please upload reports for the commit 3d8fe91 to get more accurate results.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| etcdctl/ctlv3/command/global.go | 0.00% | 13 Missing ⚠️ |
| etcdctl/ctlv3/command/watch_command.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| etcdctl/ctlv3/command/watch_command.go | 44.08% <0.00%> (ø) |
| etcdctl/ctlv3/command/global.go | 0.00% <0.00%> (ø) |

... and 24 files with indirect coverage changes

@@           Coverage Diff           @@
##             main   #18336   +/-   ##
=======================================
  Coverage   68.77%   68.77%           
=======================================
  Files         420      420           
  Lines       35535    35548   +13     
=======================================
+ Hits        24438    24449   +11     
- Misses       9668     9673    +5     
+ Partials     1429     1426    -3     

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c86c93c...3d8fe91.

@chengjoey
Author

/test pull-etcd-unit-test-386
/test pull-etcd-unit-test-amd64
/test pull-etcd-e2e-amd64

@chengjoey
Author

/retest-required

@jberkus

jberkus commented Jul 18, 2024

@chengjoey are you able to troubleshoot these test failures on your own? Or do you need help?

@chengjoey
Author

> @chengjoey are you able to troubleshoot these test failures on your own? Or do you need help?

Thanks @jberkus, I will troubleshoot the test failures in the next two days.

@ahrtr
Member

ahrtr commented Jul 19, 2024

Thanks for raising this PR.

It's a valid fix, but it might break some of the existing user experience. I guess that might be why some test cases failed. So I suggest fixing this only in main.

@chengjoey
Author

Update:
Modifying the original logic by adding the grpc.WithBlock option does affect existing behavior. For example, grpc_proxy must create a client before starting the server. With blocking enabled at that point, the server never starts.

httpClient := mustNewHTTPClient()
srvhttp, httpl := mustHTTPListener(lg, m, tlsInfo, client, proxyClient)
if err := http2.ConfigureServer(srvhttp, &http2.Server{
	MaxConcurrentStreams: maxConcurrentStreams,
}); err != nil {
	lg.Fatal("Failed to configure the http server", zap.Error(err))
}
errc := make(chan error, 3)
go func() { errc <- newGRPCProxyServer(lg, client).Serve(grpcl) }()
go func() { errc <- srvhttp.Serve(httpl) }()
go func() { errc <- m.Serve() }()
if len(grpcProxyMetricsListenAddr) > 0 {
	mhttpl := mustMetricsListener(lg, tlsInfo)
	go func() {
		mux := http.NewServeMux()
		grpcproxy.HandleMetrics(mux, httpClient, client.Endpoints())
		grpcproxy.HandleHealth(lg, mux, client)
		grpcproxy.HandleProxyMetrics(mux)
		grpcproxy.HandleProxyHealth(lg, mux, proxyClient)
		lg.Info("gRPC proxy server metrics URL serving")
		herr := http.Serve(mhttpl, mux)
		if herr != nil {
			lg.Fatal("gRPC proxy server metrics URL returned", zap.Error(herr))
		} else {
			lg.Info("gRPC proxy server metrics URL returned")
		}
	}()
}

So I changed it so that only watch_command sets the blocking option when creating the client; other commands remain the same.
This fixes dial-timeout only for watch and does not affect other logic.
@ahrtr @jberkus PTAL

@chengjoey
Author

This matches what the e2e tests do: newClient sets grpc.WithBlock().

etcd/tests/e2e/utils.go

Lines 50 to 54 in d6c0127

ccfg := clientv3.Config{
	Endpoints:   entpoints,
	DialTimeout: 5 * time.Second,
	DialOptions: []grpc.DialOption{grpc.WithBlock()},
}

@jmhbnz
Member

jmhbnz commented Sep 26, 2024

Discussed during sig-etcd triage meeting, @chengjoey can you please rebase this to prepare it for review?

Signed-off-by: joey <zchengjoey@gmail.com>
@chengjoey chengjoey force-pushed the fix/watch-dial-timeout branch from 8caf4d0 to 3d8fe91 Compare September 27, 2024 01:33
@chengjoey
Author

> Discussed during sig-etcd triage meeting, @chengjoey can you please rebase this to prepare it for review?

done

@chengjoey
Author

/test pull-etcd-robustness-amd64

@jmhbnz
Member

jmhbnz commented Oct 24, 2024

Thanks for the rebase, I'll review shortly. cc tech leads @ahrtr and @serathius to also review.

@jmhbnz jmhbnz self-requested a review October 24, 2024 18:24
Successfully merging this pull request may close these issues.

dial-timeout option does not take effect on the watch command.