Handle fatal errors in receiver state machine#1060
Handle fatal errors in receiver state machine#1060spacebear21 merged 7 commits intopayjoin:masterfrom
Conversation
arminsabouri
left a comment
There was a problem hiding this comment.
Approach ACK.
Just had some high level questions. In general this is moving in the right direction
| /// Represents a fatal rejection of a state transition. | ||
| /// When this error occurs, the session must be closed and cannot be resumed. | ||
| pub struct RejectFatal<Event, Err>(Event, Err); | ||
| pub struct RejectFatal<Event, Err>(pub(crate) Event, pub(crate) Err); |
There was a problem hiding this comment.
Was this pub(crate)'d for tests?
There was a problem hiding this comment.
Yeah, specifically to match on the rejection details
There was a problem hiding this comment.
Would pub(self) work?
There was a problem hiding this comment.
Any decision on whether pub(self) or pub(crate) is better here?
payjoin/src/core/receive/v2/mod.rs
Outdated
| &mut self, | ||
| &mut self, // NOTE: does this need to be &mut? e.g. extract_err_req just clones the inner keys |
There was a problem hiding this comment.
Definitely easier to deal with if not mut. I believe the concern is &mut self.session_context.ohttp_keys is supposed to get mutated if it ohttp_encapsulate ratchets instead of using randomness so that the same OHTTP ephemeral key doesn't get used twice which would allow a relay to correlate requests since they'd be encrypted with the same key & nonce (base nonce + seq).
I'd like @nothingmuch to confirm.
141b916 to
620c509
Compare
|
From our devcall: We should consider adding a pub |
9e52f2e to
779e172
Compare
Pull Request Test Coverage Report for Build 18170784561Details
💛 - Coveralls |
e0d168e to
58c4143
Compare
|
Re-requesting review because I believe all the main logic changes are in place and the overall design shouldn't change much unless something clearly wrong comes up in review. Leaving in draft status because some cleanup is needed (e.g. extract_err_req from SessionHistory and remove pub process_err_res pure function since it should only be called in HasReplyableError), and it will need additional testing. |
arminsabouri
left a comment
There was a problem hiding this comment.
Partial ApporachACK.
I mostly had highlevel questions. Mainly around:
- The necessity for a session event trait
- How the typestate progresses to the reply error state
| /// Represents a fatal rejection of a state transition. | ||
| /// When this error occurs, the session must be closed and cannot be resumed. | ||
| pub struct RejectFatal<Event, Err>(Event, Err); | ||
| pub struct RejectFatal<Event, Err>(pub(crate) Event, pub(crate) Err); |
There was a problem hiding this comment.
Would pub(self) work?
fa845c8 to
ca5da9c
Compare
|
The biggest change since the last round of review is ca5da9c which introduces a new state transition method for replyable errors resulting in the HasReplyableError typestate. This approach replaces the |
There was a problem hiding this comment.
ack on ca5da9c, Just a few more general clarifying questions. As well the terminal_error TODO can be ignored as we are getting rid of that method entirely, but there still remains the added TODO in the test_err_response in the integrations tests, I'm assuming that can safely be handled after 1.0 in a followup
ca5da9c to
dcfe2f4
Compare
Yeah this should be getting addressed in #1114 |
| /// Represents a fatal rejection of a state transition. | ||
| /// When this error occurs, the session must be closed and cannot be resumed. | ||
| pub struct RejectFatal<Event, Err>(Event, Err); | ||
| pub struct RejectFatal<Event, Err>(pub(crate) Event, pub(crate) Err); |
| pub struct RejectTransient<Err>(pub(crate) Err); | ||
| /// Represents a replyable error that transitions to an error state but keeps the session open. | ||
| /// When this error occurs, the session transitions to the ErrorState. | ||
| pub struct RejectReplyableError<Event, ErrorState, Err>( |
There was a problem hiding this comment.
Note to follow up with some unit testing for this
DanGould
left a comment
There was a problem hiding this comment.
In general I think this works. Wild how few changes actually end up in payjoin-cli's implementation after all this internal jostling. Overall it seems like a much more realistic abstraction of reality. It does seem that tests could be a little tighter if InternalPersistedError could be matched on.
I'm also still a bit bugged out by that name since Transient InternalPersistedErrors are not Persisted. They're PersistErrors which arise from persistence attempts, not errors that are persisted by their nature.
| async fn handle_error( | ||
| &self, | ||
| ohttp_relay: &payjoin::Url, | ||
| session_history: &SessionHistory, | ||
| session: Receiver<HasReplyableError>, | ||
| persister: &ReceiverPersister, | ||
| ) -> Result<()> { |
There was a problem hiding this comment.
Other than an Err(anyhow::Error) being propagated if post_request or process_error_response fails, are there circumstances where we want to close the session in this function? for example, if process_error_response fails? Or if the post_request fails for one of a few enumerated reasons as a follow up with the session.fail() implementation?
| .check_broadcast_suitability(None, |_| Ok(false)) | ||
| .save(&persister) | ||
| .expect_err("Unbroadcastable should error"); | ||
| // NOTE: it would be good to assert against the internal error type but InternalPersistedError is private |
There was a problem hiding this comment.
InternalPersistedError is pub(crate).
Would making the internal field in PersistedError pub(super) (to the core scope) address this concern?
#[derive(Debug)]
pub struct PersistedError<
ApiError: std::error::Error,
StorageError: std::error::Error,
ErrorState: fmt::Debug = (),
>(pub(super) InternalPersistedError<ApiError, StorageError, ErrorState>);| events: vec![ | ||
| SessionEvent::Created(SHARED_CONTEXT.clone()), | ||
| SessionEvent::CheckedBroadcastSuitability(), | ||
| SessionEvent::SessionInvalid(mock_err.0.clone(), Some(mock_err.1.clone())), | ||
| ], |
There was a problem hiding this comment.
Why change the static vec! definition to mut & .push() instead? nit.
There was a problem hiding this comment.
Hmm I don't think I changed that; this old test got deleted and the new test_session_transient_error is being displayed side by side but it's unrelated. I think I just copied to mut events approach from a different test, we seem to mix and match already.
This session event holds the JSON representation of the error reply to the sender that should be attempted by the receiver, as specified in BIP78. Co-authored-by: spacebear <git@spacebear.dev>
This enforces proper handling within the receiver state machine, such that they must attempt replying to the sender with an error response before the session can be considered closed. Co-authored-by: DanGould <d@ngould.dev>
`TerminalFailure` was never a real state in the receiver state machine. Instead, a SessionEvent::Closed simply results in an empty `()` next state.
Previously extract_err_req was exposed as a public method on SessionHistory, and process_err_res as a pure function, for implementers to manually process error responses. Now this can and should be done via the HasReplyableError typestate, so these functions are no longer needed.
Only a subset of fatal errors should result in immediate closure of areceiver session. In many cases an attempt should first be made torespond to the sender with an error as specified in BIP78. Instead of scrutinizing various error types to determine whether itshould be replyable or not, the callsite can simply call the appropriate state transition method: `replyable_error()` for replyable errors or `fatal()` for non-replyable ones. For replyable errors, the resulting HasReplyableError typestate can be obtained from the resulting `PersistedError` with the `error_state()` method.
dcfe2f4 to
9cdb1eb
Compare
|
Latest push is a simple rebase on the latest master. There is an outstanding task in the PR description for the |
This brings fatal error handling into the receiver state machine.
Supersedes #1045
TODOs:
session.fail()andsession.cancel()to allow receiver to explicitly close a session. Maybe fail() should take aErrorparameter and transition toHasReplyableErrorstate if appropriate? see discussion here Fix apply_fee_range error handling #1108 (review)Pull Request Checklist
Please confirm the following before requesting review:
AI
in the body of this PR.