Static invoice server #3628
Conversation
Force-pushed from 1c6073b to 4a287ea.
Codecov Report. Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3628      +/-   ##
==========================================
- Coverage   88.89%   88.84%   -0.05%
==========================================
  Files         165      165
  Lines      118886   118931      +45
==========================================
- Hits       105680   105665      -15
- Misses      10889    10941      +52
- Partials     2317     2325       +8

View full report in Codecov by Sentry.
Force-pushed from 4a287ea to 93bd4d0.
Force-pushed from 23f3276 to 706e277.
In upcoming commits, we need to check whether a static invoice or its underlying offer is expired in no-std builds. Here we expose the methods to do so. The methods could instead be kept private to the crate, but they seem potentially useful.
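For illustration, here is a minimal sketch of how a no-std caller might use these expiry checks. It assumes the caller obtains the current UNIX time from a trusted source; `now_since_epoch` is a placeholder, and `is_expired_no_std` on the invoice itself is inferred from the commit message rather than confirmed from the diff.

```rust
use core::time::Duration;

// Placeholder: a no-std build can't call SystemTime::now(), so the caller
// supplies "seconds since the UNIX epoch" from a trusted source, e.g. a
// recent block header timestamp.
let now: Duration = now_since_epoch();

// Skip invoices whose underlying offer (or the invoice itself) has expired.
if static_invoice.is_offer_expired_no_std(now) || static_invoice.is_expired_no_std(now) {
	// Don't serve this invoice; wait for the recipient to build a fresh one.
}
```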
Force-pushed from 706e277 to f5a78e4.
As part of serving static invoices to payers on behalf of often-offline recipients, these recipients need a way to contact the static invoice server to retrieve blinded paths to include in their offers. Add a utility to create blinded paths for this purpose as a static invoice server. The recipient will be configured with the resulting paths and use them to request offer paths on startup.
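As a rough sketch of the server-side step this commit adds (only the method names come from this PR; the exact parameters shown here, a recipient identifier plus an optional relative expiry, are assumptions):

```rust
use core::time::Duration;

// Opaque identifier the server uses to tie paths and invoices to a particular
// client, e.g. an LSP user id (illustrative value).
let recipient_id: Vec<u8> = b"user-4242".to_vec();

// Create blinded message paths terminating at the server node. The parameter
// list here is assumed, not the final API.
let paths_to_server = channel_manager
	.blinded_paths_for_async_recipient(recipient_id, Some(Duration::from_secs(60 * 60 * 24 * 30)))
	.expect("failed to create paths to the static invoice server");

// The server delivers `paths_to_server` to the recipient out-of-band; the
// recipient configures them (via `set_paths_to_static_invoice_server`, per the
// docs quoted below) and uses them to request offer paths on startup.
```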
Force-pushed from f5a78e4 to 2dc0bc6.
Still need to write a handful of tests but this should be good for a look.
Did a high level pass. Fairly straightforward and less complicated than the client.
Two main points:
- Persistence/rate limiting is now offloaded to the user of LDK and/or to LDK Node. Perhaps code should be provided that takes a KVStore and makes it happen. Related: persistence for static invoices is event based, but for chain and channel monitor there are the Persist and Persister traits. Perhaps the same concept should be used?
- A lot of the 'refresh' logic is handled through expiry, but does that work if you can't control your peers?
pub fn is_offer_expired_no_std(&self, duration_since_epoch: Duration) -> bool {
	self.contents.is_offer_expired_no_std(duration_since_epoch)
}

#[allow(unused)] // TODO: remove this once we remove the `async_payments` cfg flag
Remove cfg flag in this PR?
If we don't, it might be worth switching these TODOs to cfg_attr(async_payments, allow(unused)), which would make sure we remove them.
We can't remove the cfg flag until we have blinded path auth at a minimum (either via #3845 or re-adding the nonce and hmacs that were previously present in #3618). There may be other requirements as well; need to discuss with @TheBlueMatt to get on the same page about which follow-ups documented on #2298 are blocking.
Can't the cfg flag be removed everywhere except for the points/gates where auth would be required?
If we can't remove them right now, it might also be worth switching the cfg(async_payments) CI to cover the entire workspace, not only the lightning crate, see #3904.
Can't the cfg flag be removed everywhere except for the points/gates where auth would be required?
Maybe, but it does feel weird to publicly expose unusable APIs? Feels like it will be less work to expose them all in one go than to surgically remove a few now IMO, but we can go down that route if people feel strongly.
Aiming to take a look at solving #3904 tomorrow, thanks @tnull for catching that!
I still think it's better to have as few conditional compiles as possible: the onion message handlers and the public API only. Indeed, to allow as many bugs as possible to be caught without a separate async_payments build.
Currently there are 78 #[cfg(async_payments)] directives. I'd expect that 90% can be removed already now.
Also interested to know how much work re-adding the nonce and hmacs really is. If that's an hour of work, I'd just do it to get rid of the dependency on another PR.
///
/// ## Usage
/// 1. Static invoice server calls [`Self::blinded_paths_for_async_recipient`]
/// 2. Static invoice server communicates the resulting paths out-of-band to the async recipient,
How does this out-of-band communication work in practice? Maybe an LSP-proprietary comm channel to the LSP clients?
Might indeed be interesting to explore an LSP spec extension protocol based on bLIP-50 to transmit the paths.
IIUC, LSPs frequently need a way to communicate out-of-band anyway, such as transmitting intercept SCIDs for inclusion in a recipient's invoice. I discussed the out-of-band approach in this PR with the Lexe folks and they seemed comfortable with it. Not opposed to an LSP spec extension at all though.
If that out-of-band channel exists anyway, couldn't it have been used for storing the static invoices too?
Maybe it is worth thinking this through a bit more outside the context of this PR. What are the instructions/recommendations that we'd give users who want to run an LSP with this?
Currently the instructions are largely in the new event docs, let me know if you think there's something missing. Of course pending the outcome of this discussion #3628 (comment)
I was specifically thinking of out-of-band communication for transmitting paths to the static invoice server.
I think that'd be pretty specific to the LSP's architecture. Not sure if we want to go down the road of describing what websockets/REST calls are. Also have very limited platform-specific knowledge myself 😅
I am just wondering why we defined the recipient<->server protocol via onion messages when the LSP already has a communication channel with its clients (LSP spec, websockets, or something else)?
) -> (msgs::OnionMessage, msgs::OnionMessage) {
	// First provide an OfferPathsRequest from the recipient to the server.
	recipient.node.timer_tick_occurred();
	let offer_paths_req = loop {
Doesn't this loop need a sleep?
Why would it?
Oh I thought perhaps it is polling and taking up max cpu without a sleep
Ah, no the messages are all there already but the onion messenger won't pull them from the handlers unless next_onion_message_for_peer is called at the moment, hence the loop.
I'll address this by changing the test-only method release_pending_messages to automatically pull from all the message handlers (if it doesn't break too many tests), which should allow us to get rid of all the loops.
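To illustrate why no sleep is needed, here is a simplified, non-runnable sketch of the pattern (variable names follow the quoted test; `server_node_id` stands in for the server's node id): each `next_onion_message_for_peer` call pulls any queued messages from the handlers, so the request surfaces after at most a few iterations rather than being busily polled for.

```rust
// Illustrative only: the OfferPathsRequest is already queued in the handlers,
// so this loop terminates as soon as the messenger pulls it.
let offer_paths_req = loop {
	if let Some(msg) = recipient.onion_messenger.next_onion_message_for_peer(server_node_id) {
		break msg;
	}
};
```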
Took a first high-level look.
I agree with some of the points Joost raised here. In particular, it might be worth exploring whether/how we can enforce rate-limiting across related contexts. In lightning-liquidity / for LSPS5 we've now concluded that we'll enforce rate-limiting where possible, but also piggyback on top of the LSPS1/LSPS2 flows, which allow authenticating peers via the user-provided token field (that an LSP might or might not require).
/// [`ChannelManager::blinded_paths_for_async_recipient`]: crate::ln::channelmanager::ChannelManager::blinded_paths_for_async_recipient
/// [`ChannelManager::set_paths_to_static_invoice_server`]: crate::ln::channelmanager::ChannelManager::set_paths_to_static_invoice_server
#[cfg(async_payments)]
PersistStaticInvoice {
I assume there has been prior discussion around this that I'm unaware of, but it strikes me as a bit odd to manage persistence via the event queue if we have the KVStore trait readily available in the crate.
Pushing/popping the event will result in at least two additional ChannelManager repersists. Assuming that ~all users run this against the same persistence backend, I wonder why/if this approach is preferable in any way. I think @joostjager brought up a similar question above.
This is similar to the approach that was taken in #2973, though unfortunately I can't find a link to any discussion that took place landing on that approach...
But, either way, I think the server needs to verify the recipient_id and rate limit them in their business logic before any persistence takes place, such as in the KVStore. I agree the events are weird, but it feels like rate limiting and persistence are something that should be handled in the layer that actually manages the database itself. Let me know what you think.

Pushing/popping the event will result in at least two additional ChannelManager repersists

Hopefully @joostjager and I can get rid of ChannelManager persistence entirely after this feature is done 😅
Hopefully @joostjager and I can get rid of ChannelManager persistence entirely after this feature is done 😅

Hmm, but even then we'd repersist the event queue twice, no? Granted, it would be much less overhead, but we'd still eat the IO latency once or twice before actually getting to persist the data (incurring more IO latency).
The link to KVStore needs to be made at some point anyway to get a working server. If we don't do it in LDK, the work is just shifted to LDK-node? Then if we do it here and can cut out the event queue and event queue persistence, the overall amount of work might be less.
I do see some intersection with #3778. The events as proposed in this PR would allow for async persistence, but it is yet again another way to do it, alongside a native async interface and the InProgress/callback mechanism in the chain monitor.
Hmm, but even then we'd repersist the event queue twice, no? Granted, it would be much less overhead, but we'd still eat the IO latency once or twice before actually getting to persist the data (incurring more IO latency).

Hmm... Thoughts on a separate PR that would allow us to process the event queue and skip manager persistence entirely for certain events? With these events and several others I can think of, no manager persistence is needed and an event being dropped/failed to handle/redundantly handled should not be a problem.
I guess I'm just not sure how we can properly rate limit without knowing anything about the underlying database and its limitations. It feels like an instance where finer-grained control might be desirable and rust-lightning is the wrong layer, unlike with the current uses of KVStore where all channel updates are definitely necessary to persist. Also, I still think users need to authenticate recipient_ids in their business logic before any persistence takes place.
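To make the flow being debated concrete, here is a purely illustrative sketch of server-side handling for such an event: authenticate and rate limit the recipient in the operator's business logic first, and only then write to storage. The function, parameter set, and namespaces are hypothetical; `KVStore::write` and `Writeable::encode` are existing LDK APIs, but the event's actual fields aren't taken from this PR's diff.

```rust
use lightning::io;
use lightning::offers::static_invoice::StaticInvoice;
use lightning::util::persist::KVStore;
use lightning::util::ser::Writeable;

/// Hypothetical handler: gate persistence on the server's own auth and rate
/// limiting before touching the database.
fn handle_persist_static_invoice<K: KVStore>(
	store: &K, recipient_id: &[u8], invoice: &StaticInvoice,
	is_authorized_and_within_limits: impl Fn(&[u8]) -> bool,
) -> Result<(), io::Error> {
	if !is_authorized_and_within_limits(recipient_id) {
		// Unknown or over-limit recipient: drop the request without persisting.
		return Ok(());
	}
	// Key the invoice by hex-encoded recipient id; the namespaces are made up.
	let key: String = recipient_id.iter().map(|b| format!("{:02x}", b)).collect();
	store.write("static_invoice_server", "invoices", &key, &invoice.encode())
}
```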
Assuming almost all users would use ldk-node, it doesn't matter so much in which layer KVStore is accessed. To me it seems fine to do it in ldk-node. I am just not very happy with the event mechanism used here. I think a StaticInvoicePersister trait would be simpler, faster (no event persistence) and better encapsulated (not part of a pile of events of all types).
Assuming almost all users would use ldk-node, it doesn't matter so much in which layer KVStore is accessed. To me it seems fine to do it in ldk-node. I am just not very happy with the event mechanism used here. I think a StaticInvoicePersister trait would be simpler, faster (no event persistence) and better encapsulated (not part of a pile of events of all types).

I agree with this sentiment: generally I don't have too much of a strong opinion on how we do it, but (ab)using the persisted event queue for persistence calls seems like unnecessary overhead, IMO.
As mentioned, I think we can avoid persisting the event queue when these events are present specifically, as well as existing non-critical events that are in the same boat like ConnectionNeeded and OnionMessagePeerConnected.
But, a separate trait is another approach that has been discussed in the past that we can definitely revisit, will start an offline discussion.
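For reference, a purely illustrative sketch of the trait-based alternative floated above. The trait name comes from the discussion; the method names, signatures, and return types are assumptions, not a proposed LDK API.

```rust
use lightning::offers::static_invoice::StaticInvoice;

/// Hypothetical persister the server operator would implement instead of
/// consuming a `PersistStaticInvoice` event from the persisted event queue.
pub trait StaticInvoicePersister {
	/// Called when an async recipient delivers a finished static invoice.
	/// Implementations should authenticate/rate limit `recipient_id` and
	/// persist the invoice, returning an error to reject it.
	fn persist_static_invoice(
		&self, recipient_id: &[u8], invoice: &StaticInvoice,
	) -> Result<(), ()>;

	/// Called when an invoice request arrives for `recipient_id`, returning
	/// the previously persisted invoice to forward to the payer, if any.
	fn load_static_invoice(&self, recipient_id: &[u8]) -> Option<StaticInvoice>;
}
```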
As part of serving static invoices to payers on behalf of often-offline recipients, we need to provide the async recipient with blinded message paths to include in their offers. Support responding to inbound requests for offer paths from async recipients.
As part of serving static invoices to payers on behalf of often-offline recipients, the recipient will send us the final static invoice once it's done being interactively built. We will then persist this invoice and confirm to them that the corresponding offer is ready to be used for async payments. Surface an event once the invoice is received and expose an API to tell the recipient that it's ready for payments.
Here we implement serving static invoices to payers on behalf of often-offline recipients. These recipients previously encoded blinded paths terminating at our node in their offer, so we receive invoice requests on their behalf. Handle those inbound invreqs by retrieving a static invoice we previously persisted on behalf of the payee, and forward it to the payer as a reply to their invreq.
We're about to add a bunch more async payments tests, so take this opportunity to clean up the existing tests by no longer hardcoding the keysend payment preimage bytes ahead of time. This previously caused an MPP test to spuriously fail because all the session_privs were the same, and is generally not ideal. Also add a few comments to an existing test and a few more trivial cleanups.
We were manually creating the static invoice in tests, but now we can use the static invoice server protocol to interactively build the invoice.
Force-pushed from 2dc0bc6 to 685a364.
Addressed feedback (diff). I think the main thing outstanding to discuss is whether the event-based approach makes sense conceptually or whether more work should be shifted into LDK and we should persist directly to the KVStore.
As part of supporting async payments, we want to support serving static invoices on behalf of an often-offline payee, in response to invoice requests from payers.
See this doc for more details on the protocol. Here we implement the server-side of the linked protocol.
Based on #3618