
Static invoice server #3628

Open
wants to merge 8 commits into main from 2025-02-static-invoice-server

Conversation

valentinewallace
Contributor

As part of supporting async payments, we want to support serving static invoices on behalf of an often-offline payee, in response to invoice requests from payers.

See this doc for more details on the protocol. Here we implement the server-side of the linked protocol.

Based on #3618

@valentinewallace valentinewallace force-pushed the 2025-02-static-invoice-server branch from 1c6073b to 4a287ea Compare May 29, 2025 16:58

codecov bot commented May 29, 2025

Codecov Report

Attention: Patch coverage is 40.42553% with 28 lines in your changes missing coverage. Please review.

Project coverage is 88.84%. Comparing base (6771d84) to head (685a364).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning/src/util/test_utils.rs | 52.17% | 7 Missing and 4 partials ⚠️ |
| lightning/src/offers/static_invoice.rs | 0.00% | 9 Missing ⚠️ |
| lightning/src/offers/invoice.rs | 0.00% | 3 Missing ⚠️ |
| lightning/src/offers/invoice_macros.rs | 0.00% | 3 Missing ⚠️ |
| lightning/src/ln/channelmanager.rs | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3628      +/-   ##
==========================================
- Coverage   88.89%   88.84%   -0.05%     
==========================================
  Files         165      165              
  Lines      118886   118931      +45     
  Branches   118886   118931      +45     
==========================================
- Hits       105680   105665      -15     
- Misses      10889    10941      +52     
- Partials     2317     2325       +8     


@valentinewallace valentinewallace force-pushed the 2025-02-static-invoice-server branch 4 times, most recently from 23f3276 to 706e277 Compare June 10, 2025 21:18
In upcoming commits, we need to check whether a static invoice or its underlying
offer is expired in no-std builds. Here we expose the methods to do so. The
methods could instead be kept private to the crate, but they seem potentially
useful.
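
For illustration, a no-std caller might use the newly exposed check roughly like this. This is a minimal sketch: it assumes the method quoted later in this PR lands on `StaticInvoice`, and `now` is a hypothetical stand-in for however a no-std caller tracks time (e.g. the latest block header timestamp).

```rust
use core::time::Duration;

use lightning::offers::static_invoice::StaticInvoice;

// Decide whether a stored static invoice is still worth serving, based on the
// offer-level expiry check exposed in this commit.
fn offer_still_valid(invoice: &StaticInvoice, now: Duration) -> bool {
	!invoice.is_offer_expired_no_std(now)
}
```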
@valentinewallace valentinewallace force-pushed the 2025-02-static-invoice-server branch from 706e277 to f5a78e4 Compare June 30, 2025 22:46
@valentinewallace valentinewallace marked this pull request as ready for review June 30, 2025 22:46
As part of serving static invoices to payers on behalf of often-offline
recipients, these recipients need a way to contact the static invoice server to
retrieve blinded paths to include in their offers.

Add a utility to create blinded paths for this purpose as a static invoice
server. The recipient will be configured with the resulting paths and use them
to request offer paths on startup.
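
For illustration, the configuration flow might be glued together roughly as follows. Only the method names `blinded_paths_for_async_recipient` and `set_paths_to_static_invoice_server` come from this PR's docs; the parameter shapes, the closures, and the generic path type are assumptions made to keep the sketch self-contained.

```rust
// Hypothetical glue code, generic over the blinded path type `P` so it doesn't
// depend on exact LDK type names. `create_paths` wraps the new
// ChannelManager::blinded_paths_for_async_recipient call; `deliver` is the
// LSP's own out-of-band delivery mechanism (websocket, REST, LSP spec, ...).
fn configure_async_recipient<P>(
	recipient_id: Vec<u8>,
	create_paths: impl Fn(Vec<u8>) -> Result<Vec<P>, ()>,
	deliver: impl Fn(Vec<P>),
) -> Result<(), ()> {
	// 1. Create blinded message paths terminating at the server's node.
	let paths = create_paths(recipient_id)?;
	// 2. Hand them to the recipient out-of-band; the recipient then stores them
	//    via ChannelManager::set_paths_to_static_invoice_server and uses them to
	//    request offer paths on startup.
	deliver(paths);
	Ok(())
}
```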
@valentinewallace valentinewallace force-pushed the 2025-02-static-invoice-server branch from f5a78e4 to 2dc0bc6 Compare June 30, 2025 23:27
@valentinewallace
Contributor Author

Still need to write a handful of tests but this should be good for a look.

Contributor

@joostjager joostjager left a comment

Did a high-level pass. Fairly straightforward and less complicated than the client.

Two main points:

  • Persistence/rate limiting is now offloaded to the user of LDK and/or to LDK node. Perhaps code should be provided that takes a KVStore and makes it happen.

    Related: persistence for static invoices is event-based, but for the chain and channel monitors there are the Persist and Persister traits. Perhaps the same concept should be used?

  • A lot of the 'refresh' logic is handled through expiry, but does that work if you can't control your peers?

pub fn is_offer_expired_no_std(&self, duration_since_epoch: Duration) -> bool {
	self.contents.is_offer_expired_no_std(duration_since_epoch)
}

#[allow(unused)] // TODO: remove this once we remove the `async_payments` cfg flag
Contributor

Remove cfg flag in this PR?

Contributor

If we don't, it might be worth switching these TODOs to cfg_attr(async_payments, allow(unused)), which would make sure we remove them.
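
For illustration, the suggested attribute form would look like the sketch below (`handle_static_invoice_request` is a placeholder item name, not from the PR):

```rust
// Current form, with a TODO to remove once the cfg flag goes away:
#[allow(unused)] // TODO: remove this once we remove the `async_payments` cfg flag
fn handle_static_invoice_request() {}

// Suggested form: tie the allowance to the flag itself so that removing
// `async_payments` forces these attributes to be cleaned up too.
#[cfg_attr(async_payments, allow(unused))]
fn handle_static_invoice_request_alt() {}
```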

Contributor Author

We can't remove the cfg flag until we have blinded path auth at a minimum (either via #3845 or re-adding the nonce and hmacs that were previously present in #3618). There may be other requirements as well; I need to discuss with @TheBlueMatt to get on the same page about which of the follow-ups documented in #2298 are blocking.

Contributor

@joostjager joostjager Jul 2, 2025

Can't the cfg flag be removed everywhere except for the points/gates where auth would be required?

Contributor

If we can't remove them right now, it might also be worth switching the cfg(async_payments) CI to cover the entire workspace, not only the lightning crate; see #3904

Contributor Author

Can't the cfg flag be removed everywhere except for the points/gates where auth would be required?

Maybe, but it does feel weird to publicly expose unusable APIs? It feels like less work to expose them all in one go than to surgically remove a few now, IMO, but we can go down that route if people feel strongly.

Aiming to take a look at solving #3904 tomorrow, thanks @tnull for catching that!

Contributor

@joostjager joostjager Jul 3, 2025

I still think it's better to have as few conditional compiles as possible: only the onion message handlers and the public API. That would indeed allow as many bugs as possible to be caught without a separate async_payments build.

Currently there are 78 #[cfg(async_payments)] directives. I'd expect that 90% can be removed already now.

Also interested to know how much work re-adding the nonce and hmacs really is. If that's an hour's work, I'd just do it to get rid of the dependency on another PR.

///
/// ## Usage
/// 1. Static invoice server calls [`Self::blinded_paths_for_async_recipient`]
/// 2. Static invoice server communicates the resulting paths out-of-band to the async recipient,
Contributor

How does this out-of-band communication work in practice? Maybe an LSP-proprietary comm channel to the LSP clients?

Contributor

Might indeed be interesting to explore an LSP spec extension protocol based on bLIP-50 to transmit the paths.

Contributor Author

IIUC, LSPs frequently need a way to communicate out-of-band anyway, such as transmitting intercept SCIDs for inclusion in a recipient's invoice. I discussed the out-of-band approach in this PR with the Lexe folks and they seemed comfortable with it. Not opposed to an LSP spec extension at all though.

Contributor

If that out-of-band channel exists anyway, could it have been used for storing the static invoices too?

Contributor

Maybe it is worth thinking this through a bit more outside the context of this PR. What are the instructions/recommendations that we'd give users who want to run an LSP with this?

Contributor Author

Currently the instructions are largely in the new event docs, let me know if you think there's something missing. Of course pending the outcome of this discussion #3628 (comment)

Contributor

I was specifically thinking of out-of-band communication for transmitting paths to the static invoice server.

Contributor Author

I think that'd be pretty specific to the LSP's architecture. Not sure if we want to go down the road of describing what websockets/REST calls are. Also have very limited platform-specific knowledge myself 😅

Contributor

I am just wondering why we defined the recipient<->server protocol via onion messages when the LSP already has a communication channel with its clients (LSP spec, websockets, or something else)?

) -> (msgs::OnionMessage, msgs::OnionMessage) {
	// First provide an OfferPathsRequest from the recipient to the server.
	recipient.node.timer_tick_occurred();
	let offer_paths_req = loop {
Contributor

Doesn't this loop need a sleep?

Contributor Author

Why would it?

Contributor

Oh, I thought perhaps it was polling and taking up max CPU without a sleep.

Contributor Author

@valentinewallace valentinewallace Jul 2, 2025

Ah, no, the messages are all there already, but at the moment the onion messenger won't pull them from the handlers unless next_onion_message_for_peer is called, hence the loop.

I'll address this by changing the test-only method release_pending_messages to automatically pull from all the message handlers (if it doesn't break too many tests), which should allow us to get rid of all the loops.

Contributor

@tnull tnull left a comment

Took a first high-level look.

I agree with some of the points Joost raised here. In particular, it might be worth exploring whether/how we can enforce rate-limiting across related contexts. In lightning-liquidity / for LSPS5 we have now concluded that we'll enforce rate-limiting where possible, but also piggyback on top of the LSPS1/LSPS2 flows, which allow authenticating peers via the user-provided token field (which an LSP might or might not require).

///
/// ## Usage
/// 1. Static invoice server calls [`Self::blinded_paths_for_async_recipient`]
/// 2. Static invoice server communicates the resulting paths out-of-band to the async recipient,
Contributor

Might indeed be interesting to explore an LSP spec extension protocol based on bLIP-50 to transmit the paths.

/// [`ChannelManager::blinded_paths_for_async_recipient`]: crate::ln::channelmanager::ChannelManager::blinded_paths_for_async_recipient
/// [`ChannelManager::set_paths_to_static_invoice_server`]: crate::ln::channelmanager::ChannelManager::set_paths_to_static_invoice_server
#[cfg(async_payments)]
PersistStaticInvoice {
Contributor

I assume there has been prior discussion around this that I'm unaware of, but it strikes me as a bit odd to manage persistence via the event queue if we have the KVStore trait readily available in the crate.

Pushing/popping the event will result in at least two additional ChannelManager repersists. Assuming that ~all users run this against the same persistence backend, I wonder why/if this approach is preferable in any way. I think @joostjager brought up a similar question above.

Contributor Author

This is similar to the approach that was taken in #2973, though unfortunately I can't find a link to any discussion that took place landing on that approach...

But, either way, I think the server needs to verify the recipient_id and rate limit them in their business logic before any persistence takes place, such as in the KVStore. I agree the events are weird, but it feels like rate limiting and persistence are things that should be handled in the layer that actually manages the database itself. Let me know what you think.

Pushing/popping the event will result in at least two additional ChannelManager repersists

Hopefully @joostjager and I can get rid of ChannelManager persistence entirely after this feature is done 😅
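
To make that flow concrete, the server-side handling would look roughly like the sketch below. This is only an illustration: the event's field names and the helper closures are hypothetical, not taken from this PR; the point is just that authentication and rate limiting live in the operator's layer before anything hits the database.

```rust
// Sketch of business-logic handling for a PersistStaticInvoice-style event.
// The recipient id and serialized invoice are assumed inputs; the closures are
// the server operator's own checks and storage backend.
fn handle_persist_static_invoice(
	recipient_id: &[u8],
	invoice_bytes: &[u8],
	is_registered_recipient: impl Fn(&[u8]) -> bool,
	over_storage_quota: impl Fn(&[u8]) -> bool,
	store_invoice: impl Fn(&[u8], &[u8]) -> Result<(), ()>,
) -> Result<(), ()> {
	// 1. Only persist for recipients this server has agreed to serve.
	if !is_registered_recipient(recipient_id) {
		return Err(());
	}
	// 2. Enforce per-recipient storage/rate limits where the database is known.
	if over_storage_quota(recipient_id) {
		return Err(());
	}
	// 3. Only then write, e.g. via the KVStore implementation backing the node.
	store_invoice(recipient_id, invoice_bytes)
}
```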

Contributor

@tnull tnull Jul 2, 2025

Hopefully @joostjager and I can get rid of ChannelManager persistence entirely after this feature is done 😅

Hmm, but even then we'd repersist the event queue twice, no? Granted, it would be much less overhead, but we'd still eat the IO latency once or twice before actually getting to persist the data (incurring more IO latency).

Contributor

The link to KVStore needs to be made at some point anyway to get a working server. If we don't do it in LDK, the work is just shifted to LDK-node? Then if we do it here and can cut out the event queue and event queue persistence, the overall amount of work might be less.

I do see some intersection with #3778. The events as proposed in this PR would allow for async persistence, but it is yet again another way to do it alongside a native async interface and the InProgress/callback mechanism in chain monitor.

Contributor Author

Hmm, but even then we'd repersist the event queue twice, no? Granted, it would be much less overhead, but we'd still eat the IO latency once or twice before actually getting to persist the data (incurring more IO latency).

Hmm... Thoughts on a separate PR that would allow us to process the event queue and skip manager persistence entirely for certain events? With these events and several others I can think of, no manager persistence is needed, and an event being dropped, failed to handle, or redundantly handled should not be a problem.

I guess I'm just not sure how we can properly rate limit without knowing anything about the underlying database and its limitations. It feels like an instance where finer-grained control might be desirable and rust-lightning is the wrong layer, unlike with the current uses of KVStore where all channel updates are definitely necessary to persist. Also, I still think users need to authenticate recipient_ids in their business logic before any persistence takes place.

Contributor

Assuming almost all users would use ldk-node, it doesn't matter so much in which layer KVStore is accessed. To me it seems fine to do it in ldk-node. I am just not very happy with the event mechanism used here. I think a StaticInvoicePersister trait would be simpler, faster (no event persistence) and better encapsulated (not part of a pile of events of all types).
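
A rough sketch of what that could look like is below. Only the trait name comes from this comment; the method set and signatures are assumptions, in the spirit of the existing Persist/Persister traits where LDK calls into the implementation directly instead of surfacing an event.

```rust
use lightning::offers::static_invoice::StaticInvoice;

// Hypothetical trait sketch; not part of this PR.
pub trait StaticInvoicePersister {
	/// Persist `invoice` for `recipient_id`, failing if the recipient is
	/// unknown, over quota, or the write fails.
	fn persist_static_invoice(
		&self, recipient_id: &[u8], invoice: &StaticInvoice,
	) -> Result<(), ()>;

	/// Fetch the invoice to serve in reply to an invoice request received over
	/// one of the recipient's blinded paths.
	fn fetch_static_invoice(&self, recipient_id: &[u8]) -> Option<StaticInvoice>;
}
```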

Contributor

Assuming almost all users would use ldk-node, it doesn't matter so much in which layer KVStore is accessed. To me it seems fine to do it in ldk-node. I am just not very happy with the event mechanism used here. I think a StaticInvoicePersister trait would be simpler, faster (no event persistence) and better encapsulated (not part of a pile of events of all types).

I agree with this sentiment: generally I don't have too much of a strong opinion on how we do it, but (ab)using the persisted event queue for persistence calls seems like unnecessary overhead, IMO.

Contributor Author

As mentioned, I think we can avoid persisting the event queue when these events specifically are present, as well as for existing non-critical events that are in the same boat, like ConnectionNeeded and OnionMessagePeerConnected.

But, a separate trait is another approach that has been discussed in the past that we can definitely revisit, will start an offline discussion.

As part of serving static invoices to payers on behalf of often-offline
recipients, we need to provide the async recipient with blinded message paths
to include in their offers.

Support responding to inbound requests for offer paths from async recipients.

As part of serving static invoices to payers on behalf of often-offline
recipients, the recipient will send us the final static invoice once it's done
being interactively built. We will then persist this invoice and confirm to
them that the corresponding offer is ready to be used for async payments.

Surface an event once the invoice is received and expose an API to tell the
recipient that it's ready for payments.

Here we implement serving static invoices to payers on behalf of often-offline
recipients. These recipients previously encoded blinded paths terminating at
our node in their offer, so we receive invoice requests on their behalf.

Handle those inbound invreqs by retrieving a static invoice we previously
persisted on behalf of the payee and forwarding it to the payer as a reply to
their invreq.

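In miniature, the serve step looks roughly like the sketch below. This is an illustration only: the storage lookup is modeled as a closure standing in for the operator's persistence layer, and it assumes the offer-expiry check shown earlier lands on StaticInvoice; the real PR wires this through ChannelManager's invoice request handling.

```rust
use core::time::Duration;

use lightning::offers::static_invoice::StaticInvoice;

// Fetch whatever invoice was persisted for this recipient and only reply with
// it while its underlying offer is unexpired; the reply goes back to the payer
// over the invoice request's reply path.
fn invoice_to_serve(
	fetch_persisted: impl Fn(&[u8]) -> Option<StaticInvoice>,
	recipient_id: &[u8],
	now: Duration,
) -> Option<StaticInvoice> {
	fetch_persisted(recipient_id).filter(|invoice| !invoice.is_offer_expired_no_std(now))
}
```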
We're about to add a bunch more async payments tests, so take this opportunity
to clean up the existing tests by no longer hardcoding the keysend payment
preimage bytes ahead of time. This previously caused an MPP test to spuriously
fail because all the session_privs were the same, and is generally not ideal.

Also add a few comments to an existing test and a few more trivial cleanups.

We were manually creating the static invoice in tests, but now we can use the
static invoice server protocol to interactively build the invoice.
@valentinewallace valentinewallace force-pushed the 2025-02-static-invoice-server branch from 2dc0bc6 to 685a364 Compare July 1, 2025 20:12
@valentinewallace
Contributor Author

Addressed feedback. Diff

I think the main thing outstanding to discuss is whether the event-based approach makes sense conceptually or whether more work should be shifted into LDK and we should persist directly to the KVStore (correct me if that's off). Continuing discussion in this thread.

@valentinewallace valentinewallace self-assigned this Jul 3, 2025