Skip to content

Improve receive error heirarchy#1031

Merged
DanGould merged 7 commits intopayjoin:masterfrom
DanGould:receive-error-heirarchy
Sep 2, 2025
Merged

Improve receive error heirarchy#1031
DanGould merged 7 commits intopayjoin:masterfrom
DanGould:receive-error-heirarchy

Conversation

@DanGould
Copy link
Contributor

@DanGould DanGould commented Sep 2, 2025

I delete a few error types, refine function definitions, and ultimately change the receive module error heirarchy so that the top level receive::Error is the only one that contains the Implementation variant, where previously these could be defined in multiple places across the stack.

In order to solve #793 and #403 two (currently conflated) problems must be solved: Error Handling and error classification. This addresses the latter.

I'll need to follow this up to address #793, but getting the hierarchy of classification right, including the removal of superfluous errors was my first step.

Error handling involves a response to the counterparty and reporting to the client. In V1, that means serializing the error as JSON and making the network call to respond as well as printing a log message. That's demonstrated here in the 5th commit. We'll need to introduce some testing around this as it firms up. In v2, errors need to be persisted. They can be either fatal or transient in nature, with basically all errors being Implementation errors. It is not yet clear to me whether all ImplementationError types imply transient errors for certain, but for now that is how we make the distinction. if this is not the case, we must come up with some enumeration for ImplementationErrors that let us better categorize them so that the history replay can handle them appropriately.

Implementation Error classification For example, this was an LLM generated suggestion of such classification:
pub enum TransientImplementationError {
    StorageTemporaryFailure(String),    // Serialized error message
    NetworkFailure(String),
    WalletUnavailable(String),
    DatabaseConnection(String),
}

pub enum FatalImplementationError {
    InvalidConfiguration(String),
    WalletSigningFailure(String),
    PsbtProcessingFailure(String),
    SessionExpired(String),
}

I think all error states, fatal or transient, need to be persisted in v2+ that way on replay a client can be presented with the error as the current state, even if transient. Fatal errors may need to be replayed and marked as to whether or not they have been communicated to the counterparty or if further action is required at a later resumption. In order to serialize these errors I believe we want to expand on the ErrorCode type to be a SerializableError that includes the error kind with fixed messages for presentation. Perhaps more details may be logged at the time of conversion.

In order to replay them I think we need to replace the SessionEvent::SessionInvalid variant with Transient and Fatal variants that just hold this SerializableError instead of the current TODO marked type SessionInvalid(String, Option<JsonReply>),

Pull Request Checklist

Please confirm the following before requesting review:

The only non-implmementation variant is in fact an error arising
from the implmentation. Therefore, remove that variant and all of the
propagation code resulting from it.
The private functions just return an InternalPayloadError. This is a
step toward separate reply error behavior for v1/v2.
@DanGould DanGould requested a review from spacebear21 September 2, 2025 16:50
{
Ok(inner) => inner,
Err(e) => {
// FIXME: follow up by returning a terminal error rather than replyable error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is intentionally left in to be fixed when SessionInvalid is swapped for more specific classification.

@coveralls
Copy link
Collaborator

coveralls commented Sep 2, 2025

Pull Request Test Coverage Report for Build 17412954810

Details

  • 125 of 155 (80.65%) changed or added relevant lines in 8 files are covered.
  • 5 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.01%) to 85.902%

Changes Missing Coverage Covered Lines Changed/Added Lines %
payjoin/src/core/receive/v1/error.rs 0 1 0.0%
payjoin/src/core/receive/v1/mod.rs 16 17 94.12%
payjoin/src/core/receive/error.rs 40 46 86.96%
payjoin/src/core/receive/v2/mod.rs 39 46 84.78%
payjoin-cli/src/app/v1.rs 6 21 28.57%
Files with Coverage Reduction New Missed Lines %
payjoin-cli/src/app/v1.rs 1 71.37%
payjoin/src/core/receive/v2/error.rs 1 27.27%
payjoin/src/core/receive/error.rs 3 45.9%
Totals Coverage Status
Change from base Build 17337532308: -0.01%
Covered Lines: 8171
Relevant Lines: 9512

💛 - Coveralls

Copy link
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK - The Protocol v. Implementation error distinction is way more meaningful than ReplyToSender v. V2.

In V1, that means serializing the error as JSON and making the network call to respond as well as printing a log message. That's demonstrated here in the 5th commit. We'll need to introduce some testing around this as it firms up

I think introducing a basic test for this would fix the failing cargo-mutants check in CI.

I think all error states, fatal or transient, need to be persisted in v2+ that way on replay a client can be presented with the error as the current state, even if transient. Fatal errors may need to be replayed and marked as to whether or not they have been communicated to the counterparty or if further action is required at a later resumption.

This sounds right to me.

Comment on lines 12 to 13
/// Errors that can be replied to the sender
#[error("Replyable error: {0}")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error string and docstring need update too

@@ -9,32 +9,40 @@ use crate::error_codes::ErrorCode::{
#[non_exhaustive]
pub enum Error {
/// Errors that can be replied to the sender
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too

/// after conversion into [`JsonReply`]
#[derive(Debug)]
pub enum ReplyableError {
pub enum ProtocolError {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for this enum too

V1(e) => e.into(),
Implementation(_) => JsonReply::new(Unavailable, "Receiver error"),
#[cfg(feature = "v2")]
V2(_) => JsonReply::new(Unavailable, "Receiver error"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to implement impl From<&SessionError> for JsonReply even if all variants are handled in the same way (this is currently the case for v1::RequestError)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because such From impl is part of the public api, I figured this one liner was a simpler solution. Are yo opposed to handling RequestError that way instead here too, in contrast to your suggestion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling RequestError here instead sounds good

Before we were just responding with Display strings instead of the
correct JSON error payload.

Add a `JsonReply::status_code` function to report the status code also.
And v2 to re-named ProtocolError
@DanGould DanGould force-pushed the receive-error-heirarchy branch from 3bc42dd to 59dad2e Compare September 2, 2025 18:45
@DanGould DanGould requested a review from spacebear21 September 2, 2025 18:49
if expected_ntxid != actual_ntxid {
return Err(FinalizeProposalError::NtxidMismatch(expected_ntxid, actual_ntxid));
return Err(ImplementationError::from(
format!("Ntxid mismatch: expected {expected_ntxid}, got {actual_ntxid}").as_str(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is just a code move, but I wonder if we should be more descriptive about why this error occurs e.g. "this is not the expected Payjoin transaction"

Copy link
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code ACK 59dad2e, I can re-ACK if you want to address my newer comments, or they can be handled in a follow-up.

@DanGould DanGould merged commit 43a5a83 into payjoin:master Sep 2, 2025
14 checks passed
@DanGould DanGould deleted the receive-error-heirarchy branch September 2, 2025 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants