-
Notifications
You must be signed in to change notification settings - Fork 180
RUST-1433 Propagate original error for some labeled retry errors #903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUST-1433 Propagate original error for some labeled retry errors #903
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
I spent some time thinking about this yesterday and came up with the following idea that passes locally with a rough POC I wrote. The gist is:
i.e.
Caveats: I'm not sure what the async-std equivalent here would be (although I don't think it'd be a huge deal if we only ran this test on tokio), and I'm not sure how safe this is against a race between the failpoint being set and the retry occurring. WDYT? Happy to defer to whatever you think the best approach is here. |
I hadn't thought of that approach, but I'm pretty sure that'll be flaky due to the race condition you called out. Once the task to set the failpoint is spawned it's down to the scheduler whether it'll execute before or after the retry, which even if it happened to work doesn't seem good to rely on. |
3849b58
to
8a31022
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that'll be flaky due to the race condition you called out.
Yeah agreed, I think it makes sense to stick with your strategy then. LGTM! (I filed RUST-1694 to update this test to use successive failpoints if/when the server supports that.)
RUST-1433
The actual functional change here is just the added
&& !err.contains_label("NoWritesPerformed")
clause, everything else is in service of testing. The core issue here is that the test calls for adding a new failpoint in response to aCommandSucceededEvent
, and the Rust driver'sCommandEventHandler
doesn't permit async code. Options I considered:I'm pretty sure the cheat option wouldn't work, or at best would be flaky, because it would end up having to block the Tokio executor thread to wait on the channel in the sync event handler. The code also is pretty hostile to being tested in isolation, since we don't have any kind of abstraction for those layers. So, I ended up going with the test-only async events as the least-bad route.
Attempting to implement that as a direct callback (
Fn(&CommandEvent) -> BoxFuture<...>
) pretty rapidly ran into gnarly lifetime issues, so I switched that over to using a channel withAcknowledgedMessage
s, which gives an equivalent effect ("notify external code of an event, wait for possibly-async code before continuing"). That worked a lot better, although to prevent deadlock due to the failpoint itself triggering events it still ended up being messy; I've left comments to try to clarify what's going on but let me know if it needs more.