
Resolve forwarded HTLC claims via MonitorEvents #4524

Draft
valentinewallace wants to merge 33 commits into lightningdevkit:main from valentinewallace:2026-03-monitor-resolves-fwds

Conversation

@valentinewallace
Contributor

@valentinewallace valentinewallace commented Mar 30, 2026

As part of #4482, we're looking into changing our architecture -- currently, the Channel{Manager} is responsible for managing the resolution of off-chain HTLCs, and the ChannelMonitor is responsible for them once they're on-chain. See the issue description for details, but significant complexity results from this split design.

Quoting the issue, "Instead, we want to do all resolution in ChannelMonitors (in response to ChannelMonitorUpdates) and pass them back to ChannelManager in the form of MonitorEvents (similar to how HTLCs are resolved after channels are closed). In order to have reliable resolution, we'll need to keep MonitorEvents around in the ChannelMonitor until the ChannelManager has finished processing them - adding a new MonitorEvent resolution path through a new method (rather than via ChannelMonitorUpdates)."

Here we begin resolving forwarded HTLC claims using MonitorEvents: when a preimage is persisted in the ChannelMonitor via ChannelMonitorUpdateStep::LatestHolderCommitment{TXInfo}, we generate a persistent monitor event containing the preimage update. This monitor event is regenerated until the preimage is durably persisted in the inbound edge, at which point it is ACK'd. This allows us to disable a bunch of startup reconstruction code when persistent_monitor_events is enabled, because we can simply wait for the monitor event replay instead. We also stop using RAA monitor update blocking actions for forwards: there is no need to block the outbound edge's RAA update, because the monitor event will keep being replayed until the inbound edge has the preimage.

Based on #4491, #4545
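The replay-until-ack lifecycle described above can be sketched with simplified types. All names here (PersistentMonitor, queue_event, release_pending_events, etc.) are illustrative stand-ins, not the actual LDK API:

```rust
// Minimal sketch of the persistent-MonitorEvent lifecycle: events stay
// in the monitor (and are re-provided on every startup) until the
// manager explicitly acks them. Types are hypothetical, not LDK's.
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum MonitorEvent {
    HTLCClaim { preimage: [u8; 32] },
}

#[derive(Default)]
struct PersistentMonitor {
    next_event_id: u64,
    // Events live here until acked, so they survive restarts.
    pending_events: BTreeMap<u64, MonitorEvent>,
}

impl PersistentMonitor {
    // Called when e.g. a preimage is persisted via a monitor update.
    fn queue_event(&mut self, ev: MonitorEvent) -> u64 {
        let id = self.next_event_id;
        self.next_event_id += 1;
        self.pending_events.insert(id, ev);
        id
    }

    // The manager re-reads these on every startup until it acks them.
    fn release_pending_events(&self) -> Vec<(u64, MonitorEvent)> {
        self.pending_events.iter().map(|(id, ev)| (*id, ev.clone())).collect()
    }

    // Called once the ChannelManager has durably processed the event.
    fn ack_monitor_event(&mut self, id: u64) {
        self.pending_events.remove(&id);
    }
}

fn main() {
    let mut mon = PersistentMonitor::default();
    let id = mon.queue_event(MonitorEvent::HTLCClaim { preimage: [0x42; 32] });
    // Until acked, the event keeps being replayed on every read.
    assert_eq!(mon.release_pending_events().len(), 1);
    assert_eq!(mon.release_pending_events().len(), 1);
    mon.ack_monitor_event(id);
    assert!(mon.release_pending_events().is_empty());
}
```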

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

@valentinewallace valentinewallace changed the title 2026 03 monitor resolves fwds Resolve forwarded HTLCs via MonitorEvents Mar 30, 2026
@valentinewallace valentinewallace changed the title Resolve forwarded HTLCs via MonitorEvents Resolve forwarded HTLC claims via MonitorEvents Mar 30, 2026
@codecov

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 83.90646% with 117 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.06%. Comparing base (044f3fa) to head (01fab23).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 77.46% 61 Missing and 12 partials ⚠️
lightning/src/chain/channelmonitor.rs 91.05% 14 Missing and 8 partials ⚠️
lightning/src/ln/channel.rs 88.00% 10 Missing and 2 partials ⚠️
lightning/src/ln/functional_test_utils.rs 30.76% 9 Missing ⚠️
lightning/src/chain/chainmonitor.rs 95.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4524      +/-   ##
==========================================
+ Coverage   86.19%   87.06%   +0.86%     
==========================================
  Files         161      163       +2     
  Lines      108611   109252     +641     
  Branches   108611   109252     +641     
==========================================
+ Hits        93621    95119    +1498     
+ Misses      12356    11630     -726     
+ Partials     2634     2503     -131     
Flag Coverage Δ
fuzzing 40.25% <50.21%> (?)
tests 86.15% <83.21%> (-0.05%) ⬇️


@valentinewallace valentinewallace added the blocked on next release Should Wait Until Next Release To Land label Mar 30, 2026
@valentinewallace valentinewallace force-pushed the 2026-03-monitor-resolves-fwds branch from 43ca8ab to 01fab23 Compare March 31, 2026 14:53
Helps when debugging to know which variants failed.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart, instead of examining the set of HTLCs in
monitors, we can simply replay all the pending MonitorEvents.

As a first step towards this, here we persist a flag in the ChannelManager and
ChannelMonitors indicating whether this new feature is enabled. It will be used
in upcoming commits to maintain compatibility and create an upgrade/downgrade
path between LDK versions.
Cleans up the next commit
This field will be deprecated in upcoming commits when we start persisting
MonitorEvent ids alongside the MonitorEvents.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart, instead of examining the set of HTLCs in
monitors, we can simply replay all the pending MonitorEvents.

Here we add an as-yet-unused API to chain::Watch to allow the ChannelManager to
tell a ChannelMonitor that a MonitorEvent has been irrevocably processed and
can be deleted.
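The shape of such an ack hook could look roughly like the following. The trait name echoes chain::Watch, but the method signature, id types, and the toy implementation are all assumptions for illustration, not the real LDK interface:

```rust
// Hypothetical sketch of a Watch-style ack API: the manager tells the
// monitor that an event id has been irrevocably processed. Not the
// actual chain::Watch definition.
use std::cell::RefCell;
use std::collections::HashSet;

trait Watch {
    // Signals that the MonitorEvent with this id has been fully
    // processed and may be deleted from persistent storage.
    fn ack_monitor_event(&self, channel_id: u64, event_id: u64);
}

#[derive(Default)]
struct TestChainMonitor {
    acked: RefCell<HashSet<(u64, u64)>>,
}

impl Watch for TestChainMonitor {
    fn ack_monitor_event(&self, channel_id: u64, event_id: u64) {
        // A real implementation would remove the event from the
        // channel's pending-event set and re-persist the monitor.
        self.acked.borrow_mut().insert((channel_id, event_id));
    }
}

fn main() {
    let mon = TestChainMonitor::default();
    mon.ack_monitor_event(7, 0);
    assert!(mon.acked.borrow().contains(&(7, 0)));
}
```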
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart, instead of examining the set of HTLCs in
monitors, we can simply replay all the pending MonitorEvents.

To allow the ChannelManager to ack specific monitor events once they are
resolved in upcoming commits, here we give each MonitorEvent a corresponding
unique id. It's implemented in such a way that we can delete legacy monitor
event code in the future when the new persistent monitor events flag is enabled
by default.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart instead of examining the set of HTLCs in
monitors we can simply replay all the pending MonitorEvents.

Here for the purposes of merging initial support for persistent monitor events,
we ack each immediately after it is received/handled by the ChannelManager,
which is equivalent to the behavior we had prior to monitor events becoming
persistent.

In upcoming work, we'll want to have much more special handling for HTLCUpdate
monitor events in particular -- e.g. for outbound payment claim events, we
should only ACK the monitor event when the PaymentSent event is processed,
until that point we want it to keep being provided back to us on startup.

All the other monitor events are trivial to ACK, since they don't need to be
re-processed on startup.
@valentinewallace valentinewallace force-pushed the 2026-03-monitor-resolves-fwds branch from 01fab23 to 5248533 Compare April 7, 2026 18:26
@valentinewallace valentinewallace added this to the 0.4 milestone Apr 7, 2026
@valentinewallace valentinewallace force-pushed the 2026-03-monitor-resolves-fwds branch from 5248533 to 6c8a07b Compare April 7, 2026 19:03
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart instead of examining the set of HTLCs in
monitors we can simply replay all the pending MonitorEvents.

Here we complete work that was built on recent prior commits and actually start
re-providing monitor events on startup if they went un-acked during runtime.
This isn't actually supported in production yet, so the new code path is
exercised randomly in tests, ensuring we still support the old paths.
And log them in check_added_monitors if it fails.
Will be used in upcoming commits when generating MonitorEvents.
Processing MonitorEvent::HTLCEvent causes the ChannelManager to call
claim_funds_internal, but it will currently pass in None for the
user_channel_id parameter. In upcoming commits when we begin generating monitor
events for off-chain HTLC claims as well as onchain, we'll want to start using
an accurate value instead.
Used in an upcoming commit to insert a pending payment if it's missing on
startup and we need to re-claim.
Used in an upcoming commit to generate an
EventCompletionAction::AckMonitorEvent.
In upcoming commits, we'll be generating monitor events for off-chain claims as
well as on-chain. As a small prefactor, calculate the from_onchain value rather
than hardcoding it to true.
If ChannelManager::persistent_monitor_events is enabled, we may want to avoid
acking a monitor event until after an Event is processed by the user. In
upcoming commits, we'll use this to ensure a MonitorEvent::HTLCEvent will keep
being re-provided back to us until after an Event::PaymentSent is processed.
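The deferred-ack pattern described in this commit can be sketched as below. The enum and struct names mirror the ones mentioned (Event::PaymentSent, EventCompletionAction::AckMonitorEvent), but the surrounding machinery is a simplified assumption, not ChannelManager's real event loop:

```rust
// Sketch: defer acking a monitor event until the corresponding user
// event has been handled. If we crash before the handler returns, the
// un-acked monitor event is re-provided on restart. Illustrative only.
#[derive(Debug)]
enum Event {
    PaymentSent { payment_hash: [u8; 32] },
}

enum EventCompletionAction {
    AckMonitorEvent { event_id: u64 },
}

struct Manager {
    pending_events: Vec<(Event, Option<EventCompletionAction>)>,
    acked: Vec<u64>,
}

impl Manager {
    // Run the user's event handler, then perform the completion action.
    // The ack happens strictly after the handler returns.
    fn process_pending_events<F: FnMut(&Event)>(&mut self, mut handler: F) {
        for (ev, action) in self.pending_events.drain(..) {
            handler(&ev);
            if let Some(EventCompletionAction::AckMonitorEvent { event_id }) = action {
                self.acked.push(event_id);
            }
        }
    }
}

fn main() {
    let mut mgr = Manager {
        pending_events: vec![(
            Event::PaymentSent { payment_hash: [0; 32] },
            Some(EventCompletionAction::AckMonitorEvent { event_id: 3 }),
        )],
        acked: vec![],
    };
    let mut seen = 0;
    mgr.process_pending_events(|_| seen += 1);
    assert_eq!(seen, 1);
    assert_eq!(mgr.acked, vec![3]);
}
```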
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart instead of examining the set of HTLCs in
monitors we can simply replay all the pending MonitorEvents.

In recent work, we added support for keeping monitor events around until they
are explicitly acked by the ChannelManager, but would always ack monitor events
immediately, which preserved the previous behavior and didn't break any tests.

Up until this point, we only generated HTLC monitor events when a payment was
claimed/failed on-chain.

In this commit, we start generating persistent monitor events whenever a payment
is claimed *off*-chain, specifically when new latest holder commitment data is
provided to the monitor.

For the purpose of making incremental progress on this feature, these events
will be a no-op and/or continue to be acked immediately except in the narrow
case of an off-chain outbound payment claim. HTLC forward claim monitor events
will be a no-op, and on-chain outbound payment claim events continue to be
acked immediately.

Off-chain outbound payment claims, however, now have monitor events generated
for them that will not be acked by the ChannelManager until the PaymentSent
event is processed by the user. This also allows us to stop blocking the RAA
monitor update that removes the preimage, because the purpose of that behavior
was to ensure the user got a PaymentSent event and the monitor event now serves
that purpose instead.
This isn't a bug at the moment because a claim in this situation would already
be filtered out due to its inclusion in htlcs_resolved_to_user. However, when
we stop issuing ReleasePaymentComplete monitor updates for claims in upcoming
commits, HTLC claims will no longer be in htlcs_resolved_to_user when
get_onchain_failed_htlcs checks.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart instead of examining the set of HTLCs in
monitors we can simply replay all the pending MonitorEvents.

In recent work, we added support for keeping monitor events around until they
are explicitly acked by the ChannelManager, but would always ack monitor events
immediately, which preserved the previous behavior and didn't break any tests.

Here we start acking monitor events for on-chain HTLC claims when the user
processes the PaymentSent event, if the persistent_monitor_events feature is
enabled. This allows us to stop issuing ReleasePaymentComplete monitor updates
for onchain payment claims, because the purpose of that behavior is to ensure
we won't keep repeatedly issuing PaymentSent events, and the monitor event and
acking it on PaymentSent processing now serves that purpose instead.
Use new ForwardEventContents struct as the return value of the closure passed
to claim_funds_from_htlc_forward_hop.

Useful as upcoming commits will add a struct that wants to contain a forward
event specifically, not just any Event.
claim_mpp_part will use this event id in future commits when generating monitor
update completion actions to ack the event corresponding to the id.
Channel will store this event id in upcoming commits, to allow the monitor
event corresponding to the id to be acked after the HTLC is removed via
revoke_and_ack.
We don't actually need to write this field to disk because monitor events will
be re-provided on startup.

Will be used in upcoming commits to ack the monitor events corresponding to
these ids after the htlcs are removed via revoke_and_ack.
If we stored a monitor event id with a pending inbound HTLC and that HTLC is
about to be fully removed from the channel via revoke_and_ack, we should ack
the monitor event corresponding to that id when the monitor update associated
with the RAA is complete.

We do this so the monitor event will keep being re-provided to us on startup
until the HTLC is removed, to ensure the HTLC gets resolved even if we lose the
holding cell.
The first time we claim an HTLC in a channel, we may not have an associated
monitor event id. Once we later process the monitor event for the claim, we
need to update the channel's internal htlc state to include the monitor event
id, so the event can be properly acked after the HTLC is removed.
We'll be adding another parameter to this closure soon. Also this makes it a
little clearer what's going on at the callsites.
Indicates we should emit an Event::PaymentForwarded and possibly ack a monitor
event via Watch::ack_monitor_event.

This is generated when we've completed an inbound edge preimage update for an
HTLC forward, at which point it's safe to generate the forward event. If the
inbound edge is closed, a monitor event id may be included so we can tell the
outbound edge to stop telling us about the claim once the forward event is
processed by the user.
We need this level of precision when generating this event for off-chain
claims.

One test failed because we previously overreported a fee due to this lack of
precision; that is now fixed.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart instead of examining the set of HTLCs in
monitors we can simply replay all the pending MonitorEvents.

Building on the work of prior commits, here we stop immediately acking monitor
events for HTLC claims for forwarded HTLCs, and instead ack the monitor event
when the forward is fully resolved and we don't want the monitor event to be
re-provided back to us on startup.

If the inbound edge channel is on-chain, we'll ack the event when the inbound
edge preimage monitor update completes and the user processes a
PaymentForwarded event. If the inbound edge is open, we'll ack the event when
the inbound HTLC is fully removed via revoke_and_ack (this ensures we'll
remember to resolve the htlc off-chain even if we lose the holding cell).
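The two ack conditions for forwarded-HTLC claim events described above can be condensed into a small decision sketch. The enum variants, field names, and helper are hypothetical labels for the two cases in this commit message, not real LDK types:

```rust
// Sketch: when to ack the monitor event for a forwarded HTLC claim,
// depending on the state of the inbound edge. Names are illustrative.
enum InboundEdge {
    // Inbound channel is closed/on-chain: ack after the inbound-edge
    // preimage monitor update completes and the user has processed the
    // PaymentForwarded event.
    OnChain { payment_forwarded_processed: bool },
    // Inbound channel is open: ack once the inbound HTLC is fully
    // removed via revoke_and_ack, so the claim survives losing the
    // holding cell.
    Open { inbound_htlc_removed_via_raa: bool },
}

fn should_ack_forward_claim(edge: &InboundEdge) -> bool {
    match edge {
        InboundEdge::OnChain { payment_forwarded_processed } => *payment_forwarded_processed,
        InboundEdge::Open { inbound_htlc_removed_via_raa } => *inbound_htlc_removed_via_raa,
    }
}

fn main() {
    // Keep re-providing the event while the inbound HTLC is pending...
    assert!(!should_ack_forward_claim(&InboundEdge::Open {
        inbound_htlc_removed_via_raa: false
    }));
    // ...and ack once the forward is fully resolved.
    assert!(should_ack_forward_claim(&InboundEdge::OnChain {
        payment_forwarded_processed: true
    }));
}
```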
We're moving towards having the ChannelMonitor be responsible for resolving
HTLCs, via MonitorEvents, so we need to start tracking more payment information
in monitor updates so the monitor later has access to it. Here we add tracking
for the skimmed_fee field for claims.
We're moving towards having the ChannelMonitor be responsible for resolving
HTLCs, via MonitorEvents, so we need to start surfacing more information in
monitor events.

Here we start persisting/surfacing a specific HTLC failure reason and the
skimmed fee in MonitorEvent::HTLCEvents, which is useful when generating
and handling monitor events for off-chain claims/fails.

The skimmed fee for forwards is already put to use in this commit for
generating correct PaymentForwarded events. The failure reason will be used in
upcoming commits. We add the failure reason now to not have to change what
we're serializing to disk later.
@valentinewallace valentinewallace force-pushed the 2026-03-monitor-resolves-fwds branch from 6c8a07b to dd56009 Compare April 8, 2026 17:33

Labels

blocked on dependent pr blocked on next release Should Wait Until Next Release To Land
