-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for execution result channel before recving #273
Conversation
If we fail to fetch connect to the xds server on startup, there's a race between the xds client placing the error on the channel and the resource manager dropping the receiver due to no longer being able to receive updates from the client. If the latter happens first, we get a generic 'channel dropped' error message instead of the actual error message. This adds a short sleep before checking the channel for an error message to give the client enough time to send the error. Also return a result from main rather than unwrapping. Doesn't change much beyond the error message being less scarier if it doesn't show up as a panic.
Build Succeeded 🥳 Build Id: 72238e45-da21-4a38-9f58-09f7f8942916 To build this version:
|
Build Succeeded 🥳 Build Id: d54b896d-2657-408e-8eb6-def90b401b97 To build this version:
|
src/proxy/server/resource_manager.rs
Outdated
@@ -201,6 +201,9 @@ impl DynamicResourceManagers { | |||
// initialize properly. | |||
// Check the client's execution result if exiting was due to some root cause | |||
// error and return that error if so. Otherwise return a generic error. | |||
// Wait a bit before checking the channel to give the client enough time | |||
// to put any values on the channel. | |||
tokio::time::sleep(std::time::Duration::from_millis(100)).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than a sleep
in the code, which always freaks me out a little 😄 as they tend to hide things, and can be annoyingly inconsistent, we do a blocking operation, with a timeout instead?
Something like tokio::time::timeout(Duration::from_secs(5), execution_result_rx)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A blocking wouldn't help in this case unfortunately, the issue is that if we check too soon we get an error back that the channel is closed (so blocking here will return immediately with an error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait yeah this should work just fine its the same thing, not sure what I was thinking will fix!
Build Succeeded 🥳 Build Id: 0290fc54-75d4-4a37-8864-6941b5539dca To build this version:
|
Build Succeeded 🥳 Build Id: d55fa14a-2932-4de9-811d-515595a475ed To build this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If we fail to fetch connect to the xds server on startup, there's
a race between the xds client placing the error on the channel
and the resource manager dropping the receiver due to no longer
being able to receive updates from the client.
If the latter happens first, we get a generic 'channel dropped' error
message instead of the actual error message.
This adds a short sleep before checking the channel for an error message
to give the client enough time to send the error.
Also return a result from main rather than unwrapping. Doesn't change
much beyond the error message being less scarier if it doesn't show
up as a panic.