Skip to content

One expiration error: Separate session replay & protocol operation#1036

Merged
arminsabouri merged 6 commits intopayjoin:masterfrom
DanGould:one-expiry-error
Sep 8, 2025
Merged

One expiration error: Separate session replay & protocol operation#1036
arminsabouri merged 6 commits intopayjoin:masterfrom
DanGould:one-expiry-error

Conversation

@DanGould
Copy link
Contributor

@DanGould DanGould commented Sep 3, 2025

Do not throw an Expired error as a ReplayError variant. An expired
session does indicate an error with replay. Rather, a replay of an
expired session that has not yet reached a terminal state will produce
an error when it follows protocol and tries to create a request. That
error will create a SessionEvent::SessionInvalid (or whatever that type
turns into after follow ups) which will later be replayed by the state
machine.

This separates the concerns of the protocol from the session replay and
more accurately reflects the protocol operation. This lets us record to
what extent the protocol actually was able to operate before expiration
since all of the events will be replayed up to the terminal expiration
event.

I noticed that the receiver's final post was not checking expiration, so
I added and tested that as well after removing the replay expiration
test.

Make SessionError(pub(super) InternalSessionError) so that the error
variant can be used in matches in unit tests.

I did use claude-4-sonnet to write the skeleton of the fn test_create_post_request_fails_when_expired( so that I didn't have to fetch all of the boilerplate. After it was done I used ctrl+k in-file llm edit with gpt-4o to "prune this" to reduce some redundancy and generated comments. Then, because the robot generated assertions against the Display impl of the error I rewrote the assert statements to match against the exact variant I expected. so that the test was as precise as possible.

Pull Request Checklist

Please confirm the following before requesting review:

@coveralls
Copy link
Collaborator

coveralls commented Sep 3, 2025

Pull Request Test Coverage Report for Build 17554627187

Details

  • 74 of 74 (100.0%) changed or added relevant lines in 3 files are covered.
  • 9 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.007%) to 85.926%

Files with Coverage Reduction New Missed Lines %
payjoin/src/core/receive/v2/session.rs 9 95.91%
Totals Coverage Status
Change from base Build 17504088813: -0.007%
Covered Lines: 8224
Relevant Lines: 9571

💛 - Coveralls

let mut valid_proposal = payjoin_proposal_from_test_vector();
assert!(!matches!(
valid_proposal.create_post_request(&*EXAMPLE_URL),
Err(Error::Protocol(ProtocolError::V2(SessionError(InternalSessionError::Expired(_)))))
));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the "valid_proposal" expected to fail?

Copy link
Contributor Author

@DanGould DanGould Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not: !matches! not matches!

Perhaps assert_not! is more legible? <- that's a whole ass dependency or another macro, not part of the language.

Or would you prefer the condition be something else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, why not is_ok() or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale here is that it's supposed to be NOT that one error for sure, but not necessarily OK or another specific error. is that not the right test condition?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we it's a "valid_proposal" it implies to me it should be "ok" but I may just be bikeshedding now.

Comment on lines 1024 to 1026
if SystemTime::now() > self.state.session_context.expiry {
return Err(InternalSessionError::Expired(self.state.session_context.expiry).into());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to think of a good reason why a Receiver might want to post the proposal PSBT even after expiry, and I'm coming up short.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I don't think there is one. At that point it's optional to post the fallback and that's it.

Copy link
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SystemTime exclusion in mutants.toml needs to be updated to fix CI.

@DanGould
Copy link
Contributor Author

DanGould commented Sep 3, 2025

Fixed the mutants exclusion. Thanks for the suggestion I did not find that on my own.

&mut self,
ohttp_relay: impl IntoUrl,
) -> Result<(Request, ohttp::ClientResponse), Error> {
if SystemTime::now() > self.state.session_context.expiry {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that expired session is not a replay error. And it should have never been. However it still make sense and provides better UX to error out on protocol related errors as soon as we can.

Consider Liana hardware signer(s) that replay an expired session, collect signature(s) from their airgapped hw wallet, and then gets a session is expired error. This can be avoided if we propogate protocol errors during session replays.

Copy link
Collaborator

@arminsabouri arminsabouri Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a great suggestion. Perhaps we can add ProtocolError as a variant to InternalReplayError.

Edit: that is essentially what we do today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is simple to solve actually. When a protocol error is hit for the first time, it'll create a TerminalFailure event (SessionInvalid as of this PR) which gets persisted. When the session is replayed this is hit and that error can be displayed. I think that is current behavior. It's possible for us to check for TerminalFailure straight away in replay as an optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider Liana hardware signer(s) that replay an expired session, collect signature(s) from their airgapped hw wallet, and then gets a session is expired error

To alleviate this specific concern another expiration check could be done before any sort of async check that might have a great deal of delay before the next expiration check

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is current behavior.

Yes, I believe this is the current behavior. And this is why expired session was a special carveout. All other protocol errors would have and will be encountered during state machine progression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. For some reason I figured the request for a signature would initiate the closure that could check for expiration but I see how that's a problem. The replay itself needs to check for this protocol error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still a good reason to check expiry in create_post_request if we already checked on replay?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think so.
Its currently checked in 4 places:

  • In replay_event_log (session.rs:71)
  • In create_post_request (mod.rs:252)
  • In process_post_request (mod.rs:344)
  • In process_get_request (mod.rs:1024)

I believe create_post_request is redundant if a non-default expiry is long enough.

Copy link
Contributor Author

@DanGould DanGould Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine a receiver that never goes offline and replays. In that case I think at each stage you do indeed need to check the expiration. Because an arbitrary amount of time can pass between each stage.

The only place I can see this being unnecessary is in process_post_request if a good payload is returned and the state machine would have already progressed to the point that it's waiting on the counterparty to sign and broadcast.

if a non-default expiry is long enough

This assumption would then have to be enforced

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because an arbitrary amount of time can pass between each stage.

In practice this could be a likely outcome. A server could keep a session active in memory and never replay.

The only place I can see this being unnecessary is in process_post_request

I agree.

@DanGould DanGould marked this pull request as draft September 3, 2025 23:25
@arminsabouri arminsabouri self-assigned this Sep 5, 2025
}

let ctx =
history.session_context().expect("Session context should be present after the first event");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this expect makes more sense after we remove uninitlized as a session state #1014

@arminsabouri arminsabouri marked this pull request as ready for review September 5, 2025 18:24
Copy link
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR addresses the receiver side, I suppose we'd also want to check session expiry when replaying sender session events?


let ctx =
history.session_context().expect("Session context should be present after the first event");
if SystemTime::now() > ctx.expiry {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this needs a mutants exclusion too, see #1036 (review)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy that. I excluded this pattern: ""replace > with >= in replay_event_log","

We'll see if that works. Thanks

let mut valid_proposal = payjoin_proposal_from_test_vector();
assert!(!matches!(
valid_proposal.create_post_request(&*EXAMPLE_URL),
Err(Error::Protocol(ProtocolError::V2(SessionError(InternalSessionError::Expired(_)))))
));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we it's a "valid_proposal" it implies to me it should be "ok" but I may just be bikeshedding now.

&mut self,
ohttp_relay: impl IntoUrl,
) -> Result<(Request, ohttp::ClientResponse), Error> {
if SystemTime::now() > self.state.session_context.expiry {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still a good reason to check expiry in create_post_request if we already checked on replay?

@arminsabouri arminsabouri force-pushed the one-expiry-error branch 2 times, most recently from 85d94f2 to e5687ea Compare September 5, 2025 21:05
@arminsabouri
Copy link
Collaborator

@spacebear21
sender side is now implemented in e7406f9

@arminsabouri
Copy link
Collaborator

I am realizing the behavior for handling an expired session is different when replaying and normal state machine progressions.

if SystemTime::now() > self.context.expiry {
            return Err(InternalSessionError::Expired(self.context.expiry).into());
        }

In most cases we return an error instead of closing teh session and pushing an error event. I don't think that should be a blocker for this release. But something to figure out before the 1.0 RC release.

@arminsabouri arminsabouri added this to the payjoin-1.0 milestone Sep 8, 2025
DanGould and others added 6 commits September 8, 2025 10:42
Do not throw an Expired error as a ReplayError variant. An expired
session does indicate an error with replay. Rather, a replay of an
expired session that has not yet reached a terminal state will produce
an error when it follows protocol and tries to create a request. That
error will create a SessionEvent::SessionInvalid (or whatever that type
turns into after follow ups) which will later be replayed by the state
machine.

This separates the concerns of the protocol from the session replay and
more accurately reflects the protocol operation. This lets us record to
what extent the protocol actually was able to operate before expiration
since all of the events will be replayed up to the terminal expiration
event.

I noticed that the receiver's final post was not checking expiration, so
I added and tested that as well after removing the replay expiration
test.

Make SessionError(pub(super) InternalSessionError) so that the error
variant can be used in matches in unit tests.
extract_v2 was renamed create_v2_post_request.
Close the session and save a session expired error after replaying the
event log.
Close the session and save a session expired error after replaying
the sender event log.
Copy link
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 95a2bed

I don't think that should be a blocker for this release. But something to figure out before the 1.0 RC release.

Isn't this release the 1.0 RC release?

@arminsabouri
Copy link
Collaborator

ACK 95a2bed

I don't think that should be a blocker for this release. But something to figure out before the 1.0 RC release.

Isn't this release the 1.0 RC release?

My bad. Typo. I meant to say "shouldn't be a blocker for this PR getting merged"

@arminsabouri arminsabouri merged commit ca35bac into payjoin:master Sep 8, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants