Skip to content

Create fallback ohttp-relay logic for payjoin-cli#607

Merged
DanGould merged 2 commits intopayjoin:masterfrom
benalleng:ohttp-fallback
May 19, 2025
Merged

Create fallback ohttp-relay logic for payjoin-cli#607
DanGould merged 2 commits intopayjoin:masterfrom
benalleng:ohttp-fallback

Conversation

@benalleng
Copy link
Collaborator

@benalleng benalleng commented Mar 27, 2025

The reliance on ohttp-relays can cause some distruption if they ever go offline but we did not have any examples of a fallback mechanism for choosing the relays at random and falling back to other relays in case they fail.

This adds a new RelayState to manage which relays have already been tried and ulitmately hangs on to the first successful relay that it reaches per session.

Closes #528

@coveralls
Copy link
Collaborator

coveralls commented Mar 27, 2025

Pull Request Test Coverage Report for Build 15071763691

Details

  • 48 of 83 (57.83%) changed or added relevant lines in 4 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 82.135%

Changes Missing Coverage Covered Lines Changed/Added Lines %
payjoin-cli/src/app/v2.rs 33 47 70.21%
payjoin/src/io.rs 6 27 22.22%
Files with Coverage Reduction New Missed Lines %
payjoin/src/io.rs 1 58.26%
Totals Coverage Status
Change from base Build 15031183812: -0.1%
Covered Lines: 5508
Relevant Lines: 6706

💛 - Coveralls

@benalleng benalleng force-pushed the ohttp-fallback branch 2 times, most recently from faa5d95 to b3c5707 Compare March 27, 2025 21:58
@DanGould
Copy link
Contributor

Nice lookin' draft. There are some code repeats that can probably be squashed but you probably know that and are just sharing early. I suppose this fixes #528? A mention in the OP will close the issue when this merges, if so. Thanks Ben

@benalleng benalleng force-pushed the ohttp-fallback branch 8 times, most recently from 36376eb to 1b80380 Compare March 29, 2025 00:43
@benalleng benalleng force-pushed the ohttp-fallback branch 5 times, most recently from ec11bd6 to 6685f85 Compare March 31, 2025 17:52
@benalleng benalleng marked this pull request as ready for review March 31, 2025 19:07
@benalleng benalleng force-pushed the ohttp-fallback branch 2 times, most recently from c424ad6 to 001379c Compare April 3, 2025 16:53
Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a once over as a way to stop blocking and make progress but bear in mind I didn't load the code and give it the most thorough review. Some questions:

  • is the http dependency new or is it already in cargo tree? If so, which version? May you please specify exactly instead of just "1"
  • same for rand. Do we need rand explicitly or is payjoin::bitcoin::rand available in all configurations?
  • How necessary are the dual code paths for _danger-local-https? To what extent may they be reduced?
  • Will the relay state mutex unlock ever be contentious (lots of clients during resume?) If so, we should use tokio::sync::Mutex to await instead of blocking the async fn. If not, the synchronous one is fine.

I did check through commit text for rationale but missed answers to these questions

Hope this allows us to continue the conversation even though more thorough review may be required

This is important work for reliability of payjoin clients in the wild, so we need to make sure this is shaping up to be an excellent reference once merged.

Comment on lines 280 to 269
let selected_relay = self.relay_state.lock().unwrap().get_selected_relay();
let ohttp_relay = match selected_relay {
Some(relay) => relay,
None => validate_relay(&self.config, self.relay_state.clone()).await?,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has got to be a non-duplicated helper

}
}

//Local relays are incapable of acting as proxies so we must opportunistically fetch keys from the config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also GET directly from the payjoin-directory 🤔 not sure if that's any worse or less cumbersome than ohttp_keys configuration

Copy link
Collaborator Author

@benalleng benalleng Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only qualm about doing a direct GET request is that the local test that uses this then must depend on a good connection to the directory. This path currently only exists as our tests use dummy relays and pre-configured ohttp-keys so no changes are needed for those tests with these dual path functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be /// not //

@benalleng
Copy link
Collaborator Author

benalleng commented Apr 10, 2025

Thanks for the feedback, always helpful to get some nudges in the right direction

is the http dependency new or is it already in cargo tree? If so, which version? May you please specify exactly instead of just "1"

It looks like the payjoin tree uses 1.1.0 though we also need to specify the version there as well.

same for rand. Do we need rand explicitly or is payjoin::bitcoin::rand available in all configurations?

rand can definitely be just from within payjoin, we can remove the independent dep there.

How necessary are the dual code paths for _danger-local-https? To what extent may they be reduced?

These were holdouts from v1 and I think ultimately they can be somewhat reduced though there will be some separation assuming we desire to keep the tests local and not having to actually reach out to a server.
Ultimately I think that the _danger-local-https flag here is useful for our tests as we can use local relays and directories without needing a good connection to an actual directory.

Will the relay state mutex unlock ever be contentious (lots of clients during resume?) If so, we should use tokio::sync::Mutex to await instead of blocking the async fn. If not, the synchronous one is fine.

I'll need to take a closer look at this. See below

@benalleng benalleng force-pushed the ohttp-fallback branch 4 times, most recently from 13b6273 to a9af1e3 Compare April 10, 2025 18:21
@benalleng benalleng force-pushed the ohttp-fallback branch 3 times, most recently from 8bb7896 to f13ddfa Compare May 13, 2025 20:41
@benalleng
Copy link
Collaborator Author

benalleng commented May 13, 2025

Will the relay state mutex unlock ever be contentious (lots of clients during resume?) If so, we should use tokio::sync::Mutex to await instead of blocking the async fn. If not, the synchronous one is fine.

I do not think we need to use async Mutex as all of our mutex unlocking and usage is done within small blocks and our mutexes are dropped before any sort of await call is made.

Comment on lines 493 to 497
let status_code = error
.to_string()
.split_whitespace()
.find_map(|code| http::StatusCode::from_str(code).ok())
.expect("Could not find status code");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing some string splits here instead of changing the payjoin io Error::UnexpectedStatusCode It still feels a bit brittle, curious about other ideas you might have beyond making InternalError public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the error public is the way to go, perhaps with just an/some private variants

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will revert this logic back to 6685f85

@benalleng benalleng requested a review from DanGould May 13, 2025 21:13
@benalleng benalleng force-pushed the ohttp-fallback branch 2 times, most recently from b3ea7d5 to 78ee4f2 Compare May 14, 2025 02:32
@DanGould
Copy link
Contributor

I think I spoke imprecisely. The Internal Error should remain private. If it's time to make certain variants that are public, but keep an Internal Variant that's private, I think that would solve the problem.

Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling proposed after my last comment is more complicated than I had imagined. I don't think we need a pub enum PublicError type because we already have an Error type that is public, it just needs to be a non-exhaustive enum with an Internal variant.

Since we've gone back and forth a couple of times, I took the liberty to write up a sketch of what I'm talking about. If we're not stabilizing the error and would rather kick the error can down the road this seems like the incremental step forward:

Read the last 2 commits here: https://github.com/DanGould/rust-payjoin/tree/ohttp-fallback

and specifically check out this section

https://github.com/DanGould/rust-payjoin/blob/83d8f7888ff7e2a6522d0c67028711501424246f/payjoin/src/io.rs#L65-L84

hyper-util = { version = "0.1", optional = true }
log = "0.4.7"
payjoin = { version = "0.23.0", default-features = false }
payjoin = { path="../payjoin", default-features = false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't need to be here. We're already patching payjoin from crates.io with our local path in the top level Cargo.toml

[patch.crates-io.payjoin]
path = "payjoin"

@benalleng
Copy link
Collaborator Author

benalleng commented May 15, 2025

what msrv is #[allow(private_interfaces)] included in?
https://github.com/DanGould/rust-payjoin/actions/runs/15049802630/job/42301513021#step:5:803

This isn't included until 1.72.0.

@benalleng benalleng force-pushed the ohttp-fallback branch 2 times, most recently from 0fdc761 to c97f23f Compare May 15, 2025 17:12
@benalleng
Copy link
Collaborator Author

benalleng commented May 15, 2025

Since using this #[allow(public_interfaces)] + #[doc(hidden)] approach won't work with our .msrv is it ok to make the InternalError variants public as well while keeping them doc hidden like I did in c97f23f? Or should we find another approach?
I guess we could do something along the lines of just making all the variants of InternalError doc hidden except our UnexpectedStatusCode error.

@DanGould
Copy link
Contributor

One fix would be to make the current InternalError into InternalErrorInner and then make a pub struct InternalError(InternalErrorInner).

Otherwise, #[doc(hidden)] on both the Internal Variant AND pub enum InternalError could work.

@benalleng benalleng force-pushed the ohttp-fallback branch 2 times, most recently from 3c4d612 to 83abee1 Compare May 15, 2025 17:48
Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you dealt with the error now.

However we've got some .unwrap()s on locks. Shouldn't those be .expect()s?

{
cmd = cmd.arg(
Arg::new("ohttp_relay")
Arg::new("ohttp_relays")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? Someone passes --ohttp_relays "https://a.com" --ohttp_relays "https:://b.com"? Wouldn't we still want the singular --ohttp_relay name? Or do you expect them to be passed --ohttp_relays "https:://b.com","https://b.com ?

https://docs.rs/clap-action-command/latest/clap_action_command/fn.get_many.html

Copy link
Collaborator Author

@benalleng benalleng May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we pass them as one or more value like --ohttp-relays "https://pj.benalleng.com,https://pj.bobspacebkk.com" or --ohttp-relays "https://pj.benalleng.com"?

Comment on lines 426 to 444
match ohttp_keys {
Ok(keys) => return Ok(Some(keys)),
Err(error) => match error {
payjoin::io::Error(InternalError::UnexpectedStatusCode(_)) => {
log::debug!("{error:?}");
return Err(error.into());
}
_ => {
log::debug!("Failed to connect to relay: {selected_relay}, {error:?}");
relay_state.lock().unwrap().add_failed_relay(selected_relay);
}
},
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same flatten done to the other match should probably happen here too

@benalleng benalleng force-pushed the ohttp-fallback branch 5 times, most recently from 421e017 to c57927c Compare May 15, 2025 19:44
DanGould and others added 2 commits May 16, 2025 11:16
So that they can be matched on in downstream logic. Keep Internal
variants private.
The reliance on ohttp-relays can cause some distruption
if they ever go offline but we did not have any examples of
a fallback mechanism for choosing the relays at random and falling back
to other relays in case they fail.

This adds a new RelayState to manage which relays have already been tried
and ulitmately hangs on to the first successful relay that it reaches.
Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 152393f

This is way more reliable. I think we need to think about what errors we exactly expose publicly (re: #403 but moreso to be a good reference example) and see if we abstract the RelayState away from the main v2.rs file. We can also very likely remove some of the duplicate flags for local-https in the follow up.

@DanGould DanGould merged commit 5b7678a into payjoin:master May 19, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document and reference best practice for OHTTP Relay selection

3 participants