RUST-1433 Propagate original error for some labeled retry errors #903

abr-egn · 2023-06-26T18:55:12Z

The actual functional change here is just the added && !err.contains_label("NoWritesPerformed") clause; everything else is in service of testing. The core issue is that the test calls for adding a new failpoint in response to a CommandSucceededEvent, and the Rust driver's CommandEventHandler doesn't permit async code. Options I considered:

try to cheat the async in there via a channel to an independent task
add a test-only way to run async event processing
test the code in isolation against some kind of fake server

I'm pretty sure the cheat option wouldn't work, or at best would be flaky, because it would end up having to block the Tokio executor thread to wait on the channel in the sync event handler. The code is also pretty hostile to being tested in isolation, since we don't have any kind of abstraction for those layers. So I ended up going with the test-only async events as the least-bad route.

Attempting to implement that as a direct callback (Fn(&CommandEvent) -> BoxFuture<...>) pretty rapidly ran into gnarly lifetime issues, so I switched that over to using a channel with AcknowledgedMessages, which gives an equivalent effect ("notify external code of an event, wait for possibly-async code before continuing"). That worked a lot better, although to prevent deadlock due to the failpoint itself triggering events it still ended up being messy; I've left comments to try to clarify what's going on but let me know if it needs more.
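
To make the functional change mentioned at the top of this description concrete, here is a minimal sketch of the decision it affects: which error to surface once a retry attempt has also failed. It is a simplified model, not the driver's actual code; LabeledError and error_to_report are hypothetical names, and only the contains_label("NoWritesPerformed") check comes from the PR.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a driver error; only the label set matters here.
#[derive(Debug, Clone, PartialEq)]
struct LabeledError {
    message: String,
    labels: HashSet<String>,
}

impl LabeledError {
    fn new(message: &str, labels: &[&str]) -> Self {
        Self {
            message: message.to_string(),
            labels: labels.iter().map(|l| l.to_string()).collect(),
        }
    }

    /// Mirrors the name of the check used in the PR's added clause.
    fn contains_label(&self, label: &str) -> bool {
        self.labels.contains(label)
    }
}

/// Picks which error to report after a retry attempt has also failed.
/// Assumption: a retry error labeled "NoWritesPerformed" says nothing new
/// about the write, so the error from the first attempt is propagated.
fn error_to_report(original: LabeledError, retry: LabeledError) -> LabeledError {
    if !retry.contains_label("NoWritesPerformed") {
        retry
    } else {
        original
    }
}
```

Calling error_to_report with a retry error that carries the NoWritesPerformed label returns the first attempt's error; with any other retry error, the retry error itself is returned.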

drshika

lgtm!

isabelatkinson · 2023-06-28T17:28:32Z

I spent some time thinking about this yesterday and came up with the following idea that passes locally with a rough POC I wrote. The gist is:

define an AsyncEventHandler type that stores a handle to a client and the async runtime
implement CommandEventHandler for AsyncEventHandler and call spawn on the stored runtime in the definition of handle_command_succeeded_event to configure the failpoint
create an AsyncEventHandler in the async test by calling tokio::runtime::Handle::current to grab the runtime handle and configure it on the client used to run the insert

i.e.

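struct AsyncEventHandler {
    // Handle to the Tokio runtime the test is running on.
    runtime_handle: tokio::runtime::Handle,
    // Client used to configure the failpoint.
    client: TestClient,
}

impl CommandEventHandler for AsyncEventHandler {
    fn handle_command_succeeded_event(&self, event: CommandSucceededEvent) {
        // Illustrative condition; match whichever event should trigger the failpoint.
        if event.command_name == "insert" {
            // The spawned task runs independently of this sync handler.
            self.runtime_handle.spawn(/* configure failpoint on self.client */);
        }
    }
}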

Caveats: I'm not sure what the async-std equivalent here would be (although I don't think it'd be a huge deal if we only ran this test on tokio), and I'm not sure how safe this is against a race between the failpoint being set and the retry occurring. WDYT? Happy to defer to whatever you think the best approach is here.

abr-egn · 2023-06-28T19:38:25Z

I hadn't thought of that approach, but I'm pretty sure that'll be flaky due to the race condition you called out. Once the task to set the failpoint is spawned it's down to the scheduler whether it'll execute before or after the retry, which even if it happened to work doesn't seem good to rely on.
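
For contrast with a spawned task, the "notify external code of an event, wait before continuing" handshake can be sketched as follows. This is an illustration only: it uses std threads and std::sync::mpsc rather than Tokio and the driver's AcknowledgedMessage so that it has no dependencies, and Acknowledged, notify_and_wait, and run_demo are hypothetical names.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical message type: a payload plus a channel used to acknowledge it.
struct Acknowledged<T> {
    payload: T,
    ack: Sender<()>,
}

/// Sends `payload` to the listener and blocks until the listener acknowledges it.
/// Returns false if the listener has gone away.
fn notify_and_wait<T>(events: &Sender<Acknowledged<T>>, payload: T) -> bool {
    let (ack, ack_rx) = channel();
    if events.send(Acknowledged { payload, ack }).is_err() {
        return false;
    }
    ack_rx.recv().is_ok()
}

/// Runs a fake operation that emits one event between two attempts and
/// returns the order in which the steps were recorded.
fn run_demo() -> Vec<String> {
    let (events, events_rx) = channel::<Acknowledged<String>>();
    let (log, log_rx) = channel::<String>();

    let listener_log = log.clone();
    let listener = thread::spawn(move || {
        for msg in events_rx {
            // Stand-in for "configure the next failpoint".
            listener_log.send(format!("handled {}", msg.payload)).unwrap();
            // Only after this acknowledgment may the sender continue.
            let _ = msg.ack.send(());
        }
    });

    log.send("first attempt".to_string()).unwrap();
    notify_and_wait(&events, "succeeded event".to_string());
    log.send("retry".to_string()).unwrap();

    drop(events); // closes the event channel so the listener loop ends
    listener.join().unwrap();
    drop(log); // last log sender gone, so the iterator below terminates
    log_rx.iter().collect()
}
```

Because the sender blocks on the acknowledgment, the listener's work is always recorded before the retry step, regardless of how the threads are scheduled. A fire-and-forget spawn gives no such ordering guarantee.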

isabelatkinson

> I'm pretty sure that'll be flaky due to the race condition you called out.

Yeah agreed, I think it makes sense to stick with your strategy then. LGTM! (I filed RUST-1694 to update this test to use successive failpoints if/when the server supports that.)

abr-egn requested review from isabelatkinson and drshika June 26, 2023 18:55

drshika approved these changes Jun 26, 2023

abr-egn added 7 commits June 28, 2023 15:39

test wip; failpoint builder

303e9de

unwind failpoint builder

7e6c06a

wip channels

c0d8d1d

maybe working?

c83dc49

passing

179523d

cleanup

eaec074

fmt

8a31022

abr-egn force-pushed the RUST-1433/retry-propagate-errors branch from 3849b58 to 8a31022 Compare June 28, 2023 19:39

isabelatkinson approved these changes Jun 28, 2023

abr-egn merged commit 3ea196f into mongodb:main Jun 29, 2023
