Skip to content

RUST-1069 Defer implicit session check out until after connection check out #811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 31, 2023

Conversation

abr-egn
Copy link
Contributor

@abr-egn abr-egn commented Jan 18, 2023

RUST-1069

Previously, retries were handled by a distinct execute_retry method; this was an issue for this change because there are two retry cases (connection check out failure and operation failure), and a retry should create an implicit session in the first case but not the second. This PR changes it to have a loop inside execute_operation_with_retry; this makes the handling there a bit more complicated, but removes the execute_retry method entirely, and also collapses ExecutionOutput into ExecutionDetails, so I think it's a net wash on complexity. Having the loop will also make executing multiple retries easier if/when we get to CSOT.

}
}
};
return Ok(details);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be returned directly in the Ok-handling case on 377, but this way it's clear that the loop unconditionally terminates unless there's an explicit continue.

@abr-egn abr-egn marked this pull request as ready for review January 20, 2023 18:01
@abr-egn abr-egn requested review from a team and isabelatkinson and removed request for a team January 20, 2023 18:01
Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions. As I reviewed this I noticed some areas in this file that could use refactoring, but I don't think they're relevant to this PR so I filed RUST-1575 to address them in the future.

Ok(server) => server,
Err(_) => {
return Err(first_error);
let retryability = self.get_retryability(&conn, &op, &session)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance this could fail during a retry? If so, I think we should return the original error rather than the one that occurs in this call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only error this can return is an internal one indicating an impossible/invalid connection state, which I think would be valuable to expose if it does happen :)

return Err(err);
// If the current transaction has been committed/aborted and it is not being
// re-committed/re-aborted, reset the transaction's state to TransactionState::None.
if let Some(ref mut session) = session {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved outside of the loop? I don't think these conditions would change on a retry: the session and operation should each stay the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, done.

Comment on lines +352 to +354
if session.is_none() {
implicit_session = self.start_implicit_session(&op).await?;
session = implicit_session.as_mut();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC here an implicit session will be started only on the first attempt and reused in subsequent retries. Is this the right behavior? I'm not sure if we should be dropping/re-creating the implicit session after checking out a connection each time around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is correct - the goal of the upstream change was to minimize the number of sessions created serverside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the retryable writes spec actually requires that the retry attempt use the same session as the initial operation, so this is definitely right, and I assume there are some spec tests that would start failing if we didn't preserve this behavior.

per spec:

When retrying a write command, drivers MUST resend the command with the same transaction ID.

where "transaction ID" is defined as:

the top-level lsid and txnNumber fields.

and per the sessions spec:

The retryable writes spec requires that both the original and retry attempt use the same server session. The server will block the retry attempt until the initial attempt completes at which point the retry attempt will continue executing.
For retryable reads that use an implicit session, drivers could choose to use a new server session for the retry attempt however this would lose the information that these two reads are related.

Comment on lines +310 to +313
let selection_criteria = session
.as_ref()
.and_then(|s| s.transaction.pinned_mongos())
.or_else(|| op.selection_criteria());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto above, does this need to be done in the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not - this (potentially) borrows from op, so moving it outside the loop would block the call to op.update_for_retry() (which requires a &mut).

@isabelatkinson isabelatkinson requested review from patrickfreed and removed request for patrickfreed January 25, 2023 15:19
@isabelatkinson
Copy link
Contributor

not sure if we want another review on this -- I'll leave it up to you, Abraham, if you want to wait until Kaitlin is back

@abr-egn abr-egn requested a review from kmahar January 25, 2023 15:40
@abr-egn
Copy link
Contributor Author

abr-egn commented Jan 25, 2023

not sure if we want another review on this -- I'll leave it up to you, Abraham, if you want to wait until Kaitlin is back

No rush. and this is core enough that more eyes will be good :)

Copy link
Contributor

@kmahar kmahar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment that I don't need to re-review for, LGTM otherwise.

max_lsids = std::cmp::max(max_lsids, unique.len());
lsids.clear();
}
assert!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future readers, I think it would be helpful to leave a comment here briefly summarizing why we used 2 rather than 1 as the prose test says

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done.

Comment on lines +352 to +354
if session.is_none() {
implicit_session = self.start_implicit_session(&op).await?;
session = implicit_session.as_mut();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the retryable writes spec actually requires that the retry attempt use the same session as the initial operation, so this is definitely right, and I assume there are some spec tests that would start failing if we didn't preserve this behavior.

per spec:

When retrying a write command, drivers MUST resend the command with the same transaction ID.

where "transaction ID" is defined as:

the top-level lsid and txnNumber fields.

and per the sessions spec:

The retryable writes spec requires that both the original and retry attempt use the same server session. The server will block the retry attempt until the initial attempt completes at which point the retry attempt will continue executing.
For retryable reads that use an implicit session, drivers could choose to use a new server session for the retry attempt however this would lose the information that these two reads are related.

@abr-egn abr-egn merged commit 910881b into mongodb:main Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants