
Conversation

t-bast
Collaborator

@t-bast t-bast commented May 2, 2024

Splicing allows spending the current funding transaction to replace it with a new one that changes the capacity of the channel, allowing both peers to add or remove funds to/from their channel balance.

Splicing takes place while a channel is quiescent, to ensure that both peers have the same view of the current commitments.

We don't want channels to be unusable while waiting for transactions to confirm, so channel operation returns to normal once the splice transaction has been signed and we're waiting for it to confirm. The channel can then be used for payments, as long as those payments are valid for every pending splice transaction. Splice transactions can be RBF-ed to speed up confirmation.

Once one of the pending splice transactions confirms and reaches acceptable depth, peers exchange splice_locked to discard the other pending splice transactions and the previous funding transaction. The confirmed splice transaction becomes the channel funding transaction.
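The lifecycle described above can be sketched as follows. This is an illustrative model only (the `Channel` class and method names are hypothetical, not from the spec): after `tx_signatures`, a splice is tracked as pending, and once `splice_locked` is exchanged for one of the pending transactions, it becomes the channel funding transaction and all other candidates are discarded.

```python
# Hypothetical sketch of the pending-splice lifecycle; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Channel:
    funding_txid: str
    # txids of signed but unconfirmed splice transactions
    pending_splices: list = field(default_factory=list)

    def add_signed_splice(self, txid: str) -> None:
        # After tx_signatures: the channel returns to normal operation, but
        # updates must remain valid for every pending splice transaction.
        self.pending_splices.append(txid)

    def on_splice_locked(self, txid: str) -> None:
        # Both peers exchanged splice_locked for this txid: it becomes the
        # channel funding transaction; other pending splices (and the
        # previous funding transaction) are discarded.
        assert txid in self.pending_splices
        self.funding_txid = txid
        self.pending_splices.clear()
```

For example, with two pending splices `s1` and `s2`, locking `s2` discards `s1` along with the previous funding transaction.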

Nodes then advertise this spliced channel to the network, so that the network keeps routing payments through it without any downtime.

This PR replaces #863 which contains a lot of legacy mechanisms for early versions of splicing, which didn't work in some edge cases (detailed in the test vectors provided in this PR). It can be very helpful to read the protocol flows described in the test vector: they give a better intuition of how splicing works, and how it deals with message concurrency and disconnections.

This PR requires the quiescence feature (#869) to start negotiating a splice.

Credits to @rustyrussell and @ddustin will be added in the commit messages once we're ready to merge this PR.

@ProofOfKeags
Contributor

Can I suggest we do this as an extension BOLT rather than layering it in with the existing BOLT2 text? It makes it easier to implement when all of the requirement deltas are in a single document than when they are inlined into the original spec. Otherwise, the PR/branch-diff itself is the only way to see the diff, and that can get very messy during the review process as people's commentary comes in. While there are other ways to get at this diff without the commentary, an extension document would make the UX of getting at it rather straightforward.

Given that the change is gated behind a feature bit anyway it also makes it easier for a new implementation to bootstrap itself without the splice feature by just reading the main BOLTs as is.

At some point in the future when splicing support becomes standard across the network we can consolidate the extension BOLT into the main BOLTs if people still prefer.

@t-bast
Collaborator Author

t-bast commented May 3, 2024

Why not, if others also feel that it would be better as an extension bolt. I prefer it directly in Bolt 2, for the following reasons:

  • Most of it is self-contained in its own section(s) anyway.
  • It's an important part of the channel lifecycle: channels are opened, then during normal operation payments are relayed and splices happen, then the channel eventually closes. It is nicely reflected in the architecture of the Bolt 2 sections right now.
  • The few additions to existing message TLVs (commit_sig, tx_add_input, tx_signatures) should not be in a separate document when merging, because otherwise different features may use the same TLV tags without realizing it, with a risk of inadvertently shipping incompatible code. I think it's important that all TLVs for a given message are listed in that message's section, this way you know you don't have to randomly search the BOLTs for another place where TLVs may be defined.

But if I'm the only one thinking this is better, I'll move it to a separate document!

One thing to note is that we already have two implementations (eclair and cln), and maybe a 3rd one (LDK), that are very close to code-complete and have had months of experience on mainnet, which means the spec is almost final and we should be able to merge it to the BOLTs in the not-so-distant future (:crossed_fingers:).

@ddustin
Contributor

ddustin commented Jun 4, 2024

One thing I've been thinking about is that with large splices across many nodes, if some node fails to send signatures (likely because two nodes in the cluster each demand to sign last), then the splice will hang on tx_signatures.

I believe we need two things to address this:

  1. Timeout logic where splices are aborted
  2. Being lax about having sent our tx_signatures but getting nothing back

Currently CLN fails the channel in this case, as taking signatures and not responding is rather rude. But this is bad, because it could lead to clusters of spliced channels being closed.

The unfortunate side effect of this is we have to be comfortable sending out signatures with no recourse for not getting any back.

I believe the long-term solution is to maintain a signature-sending reputation for each peer and eventually blacklist peers from doing splices and/or fail your channels with that peer.

A reputation system may be beyond the needs of the spec, but what to do with a hanging tx_signatures (timeout, etc.) should be in the spec, with a note about this problem.

@t-bast
Collaborator Author

t-bast commented Jun 6, 2024

  1. Timeout logic where splices are aborted

This is already covered at the quiescence level: quiescence will timeout if the splice doesn't complete (e.g. because we haven't received tx_signatures).

  2. Being lax about having sent our tx_signatures but getting nothing back

I don't think this is necessary, and I think we should really require people to send tx_signatures when it is owed, to ensure that we get to a clean state on both peers.

if some node fails to send signatures (likely because two nodes in the cluster demand to sign last)

It seems like we've discussed this many times already: this simply cannot happen, because ordering based on contributed amounts fixes it. Can you detail a concrete scenario where tx_signatures ordering leads to a deadlock?
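The ordering rule referred to here can be sketched as follows. This is an illustrative simplification (the function name is hypothetical): the node with the lower total `tx_add_input` value sends `tx_signatures` first, and since the splice initiator spends the previous funding output via `tx_add_input`, the full pre-splice capacity counts toward the initiator's total. The tie-break rule is omitted here; see the spec for the exact requirement.

```python
# Illustrative sketch of tx_signatures ordering; tie-break omitted.
def signs_first(my_input_total_sat: int, their_input_total_sat: int) -> bool:
    """Return True if we must send tx_signatures first (strictly lower
    total value of our tx_add_input contributions)."""
    return my_input_total_sat < their_input_total_sat
```

For example, if the initiator spends a 1,000,000 sat funding output and adds a 100,000 sat input, while the other node adds a 500,000 sat input, the other node (500,000 < 1,100,000) signs first, so both sides can unambiguously decide who is owed `tx_signatures`.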

Comment on lines +1711 to +1780
- Either side has added an output other than the channel funding output
and the balance for that side is less than the channel reserve that
matches the new channel capacity.
Contributor

What does it mean to have a channel reserve that "matches the new channel capacity"? AFAICT the channel_reserve is specified in satoshis, and reading the negotiation process of this proposal, there doesn't seem to be any change happening to that parameter during negotiation.

Collaborator Author

AFAICT the channel_reserve is specified in satoshis

Not with dual-funding, where the channel reserve is 1% of the channel capacity. That's why this is potentially changing "automatically" when splicing on top of a dual-funded channel if we want to keep using 1%.

But you're right to highlight this: the channel reserve behavior is very loosely specified for now, and there were a lot of previous discussions with @morehouse regarding what we should do when splicing. Another edge case that we must better specify is what happens when splicing on top of a non-dual-funded channel, where the channel reserve was indeed a static value instead of a proportional one!

The channel reserve behavior is IMO the only missing piece of this specification, and we should discuss it; thanks for bringing it up!

Contributor

Could be a good thing to discuss in Tokyo!

Also worth stepping back and double checking the reserve requirement makes sense in its current form generally 👀.

Collaborator Author

What do you think of the following behavior for handling channel reserves:

  • Whenever a splice happens, the channel is automatically enrolled into the 1% reserve policy, even if it wasn't initially a dual-funded channel (unless 0-reserve is used of course, see Add option_zero_reserve (FEATURE 64/65) #1140)
  • Splice-out is not allowed if you end up below your pre-splice reserve (your peer will reject that splice with tx_abort)
  • Otherwise, it's ok if one side ends up below the channel reserve after a splice: this is the same behavior as when a new channel is created. If we get into that state, the peer that is below the channel reserve:
    • is not allowed to send outgoing HTLCs
    • is allowed to receive incoming HTLCs
    • if it is paying the commit fees, it is allowed to dip further into its channel reserve to receive HTLCs (because of the added weight of the HTLC output), because we must be able to move liquidity to their side to get them above their reserve
  • When there are multiple unconfirmed splices, we use the highest channel reserve of all pending splices (i.e. requirements must be satisfied for all pending splice transactions)

As discussed during yesterday's meeting, there are subtle edge cases due to concurrent updates: this is inherent to the current commitment protocol, but will eventually become much simpler with #867
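The proposed rules could be sketched as follows. This is a rough illustration under the assumptions stated in the proposal (1% proportional reserve after any splice, `option_zero_reserve` ignored); function names are hypothetical.

```python
# Illustrative sketch of the proposed post-splice reserve policy.
def post_splice_reserve(capacity_sat: int) -> int:
    # Proposal: any spliced channel is enrolled into the 1% reserve policy,
    # even if it wasn't initially a dual-funded channel.
    return capacity_sat // 100

def splice_out_allowed(balance_after_sat: int, pre_splice_reserve_sat: int) -> bool:
    # Splice-out is rejected (via tx_abort) if the sender would end up
    # below its pre-splice reserve.
    return balance_after_sat >= pre_splice_reserve_sat

def effective_reserve(pending_capacities_sat: list[int]) -> int:
    # With multiple unconfirmed splices, the highest reserve of all pending
    # splice transactions applies.
    return max(post_splice_reserve(c) for c in pending_capacities_sat)
```

For example, with pending splices of 1,000,000 and 2,000,000 sat capacity, the effective reserve is 20,000 sat until one of them confirms and the others are discarded.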

@ddustin @ProofOfKeags @rustyrussell @ziggie1984 @morehouse

Contributor

@ziggie1984 ziggie1984 Sep 11, 2024

related: ACINQ/eclair#2899 (comment), tries to specify the concurrent edge cases and also the requirement when we would already (without splicing) allow the peer paying the fees being dipped below its reserve.

Contributor

@morehouse morehouse Sep 11, 2024

@t-bast

That all seems reasonable to me. The one part where we could get into trouble is:

if it is paying the commit fees, it is allowed to dip further into its channel reserve to receive HTLCs (because of the added weight of the HTLC output), because we must be able to move liquidity to their side to get them above their reserve

This allows the reserve to be violated, potentially all the way down to 0. In that situation, there is ~zero incentive to broadcast the latest commitment on force close.

That said, I know the implementation details are hairy to do things completely safely. And we can also look forward to zero-fee commitments with TRUC and ephemeral anchors, which would obsolete the "dip-into-reserve to pay fees" exception entirely.

Collaborator Author

This allows the reserve to be violated, potentially all the way down to 0. In that situation, there is ~zero incentive to broadcast the latest commitment on force close.

Since we only allow this to happen when the node paying the fee receives HTLCs, the other node sending that HTLC can limit the exposure by controlling how many HTLCs they send in a batch (or keep pending the commit tx) when we're in this state.

There are unfortunately cases where even a single HTLC would leave the node paying the fee with no output (small channels with high feerates), but when that happens you don't really have any alternative: the channel is otherwise unusable, so your only other option is to force-close anyway, which isn't great...

And we can also look forward to zero-fee commitments with TRUC and ephemeral anchors, which would obsolete the "dip-into-reserve to pay fees" exception entirely.

Exactly, this is coming together (look at this beautiful 0-fee commitment transaction: https://mempool.space/testnet4/tx/85f2256c8d6d61498c074d53912d1f0ef907ee508bb06f5701f3826432ba53b8) which will finally get rid of this kind of mess: I'm fine with using an imperfect but simple work-around in the meantime!

Contributor

@ziggie1984 ziggie1984 Sep 13, 2024

I wonder whether this requirement would be used solely for the splicing case (allowing HTLCs that dip the opener into its reserve), or whether we should make it an overall requirement. If so, there is a problem with backwards compatibility, because older nodes (speaking for LND nodes) will force-close if the opener dips below its reserve. So maybe it makes sense to only activate it for splicing use cases, so that we don't run into the backwards compatibility issues?

Collaborator Author

Good idea!

As proposed by @ddustin, we explicitly narrow the requirements for
`start_batch` to match our only use-case for it (splicing). We can
change that in the future if we use this message for other features.
Some of those requirements shouldn't be gated on announcing the channel,
and we clarify that we retransmit once per connection.
Comment on lines 3382 to 3385
- if it receives `channel_ready` for that transaction after exchanging `channel_reestablish`:
- MUST retransmit `channel_ready` in response, if not already sent since reconnecting.
- if it receives `splice_locked` for that transaction after exchanging `channel_reestablish`:
- MUST retransmit `splice_locked` in response, if not already sent since reconnecting.
Contributor

As written these requirements are dependent on receiving other messages. This seems more complicated than it needs to be. Instead, can't the retransmission requirements be entirely within the channel_reestablish's last "A receiving node" section? There's already a requirement there to retransmit splice_locked, so the one here seems redundant. We'd just need to add a requirement there for retransmitting channel_ready.

Am I missing something?

Collaborator Author

I'm not sure it would be clearer, I like keeping all of those requirements under the `if option_splice was negotiated` section... Overall I think that the channel_reestablish requirements deserve a refactoring, but nobody was interested in it (see #1049 and #1051), so I gave up 🤷‍♂️

Can you try refactoring like what you suggest? If it's better I'll include that.

Contributor

Given the spec already states that redundant channel_ready messages must be ignored, any reason why we don't just do the same with splice_locked and always re-send channel_ready / splice_locked upon connection? Seems that would simplify the requirements quite a bit.

Contributor

After discussing offline and re-parsing the spec, it seems we should never need to (re-)transmit splice_locked during channel reestablishment. The node receiving channel_reestablish can always infer the splice_locked from the my_current_funding_locked that is set.

Updating the spec would remove much of those requirements. Here's a first pass at it: jkczyz@b15e506

We'd still re-transmit channel_ready, but I'm not sure if the parts removed in that commit related to retransmitting channel_ready are actually necessary.

Am I missing anything that would prevent this from working?

Collaborator Author

That's a great point, thanks for raising this! You are right that at this stage, the retransmission of the splice_locked message is completely redundant since we have the my_current_funding_locked TLV inside channel_reestablish.

However, the reason I kept it was to allow transmitting more data (in the TLV stream of splice_locked): my_current_funding_locked gives you the "main" information of splice_locked (that the sender considers the splice transaction deeply confirmed), but lacks optional data that could be included in splice_locked TLVs. I was thinking about this in the context of taproot, where for announcements we need to send nonces to our peer before announcement_signatures, and splice_locked is a good candidate to transmit this data. But we don't necessarily need to do this in splice_locked, and even if we do, we could simply provide those TLVs directly in channel_reestablish.

I need to think a bit more about how we manage announcement_signatures retransmission before deciding whether we can remove splice_locked retransmission: we may need to introduce a TLV in channel_reestablish if we want to handle it more cleanly than what we currently do? It would be quite similar to my_current_funding_locked, but for announcements (e.g. my_current_funding_announcement). The way we currently do announcement_signatures retransmission has been bothering me for a while (especially since it won't work for taproot because of the necessary nonce exchange), it's probably a good opportunity to fix it?

Collaborator Author

I have applied your suggestion in f9fd539 and gone further to simplify the reestablish TLVs. Thanks for raising this, I wasn't really satisfied with the way we handle retransmission of announcement_signatures, and this cleans it up nicely. Let me know how it goes implementation-wise on your side!

I've changed eclair's behavior to match this in the latest commit of ACINQ/eclair#2887

We introduce a `retransmit_flags` field to ask our peer to retransmit
`announcement_signatures` if we're expecting them: this way we don't
need to retransmit `splice_locked` on reconnection to trigger the
exchange of `announcement_signatures`. We don't need to retransmit it
to let our peer know that we've seen enough confirmations for the splice
either, since `my_current_funding_locked` implies that.

This allows us to completely remove retransmission of `splice_locked` on
reconnection, and also get rid of the `your_last_funding_locked` TLV,
which greatly simplifies the reconnection logic.

This will work with taproot channels since we'll simply provide nonces
in `channel_reestablish` when we need our peer to send announcement
signatures.

Note that we use a different TLV type to more easily allow migrating
existing users of the previous versions of the spec to the latest one.
1. type: 5 (`my_current_funding_locked`)
2. data:
* [`sha256`:`my_current_funding_locked_txid`]
* [`byte`:`retransmit_flags`]
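A minimal sketch of serializing this TLV's value (a 32-byte txid followed by a single flags byte); the surrounding TLV stream framing (BigSize type and length) is omitted for brevity, and the function names are illustrative.

```python
# Sketch of the my_current_funding_locked TLV value: sha256 txid + flags byte.
def encode_my_current_funding_locked(txid: bytes, retransmit_flags: int) -> bytes:
    assert len(txid) == 32 and 0 <= retransmit_flags <= 0xFF
    return txid + bytes([retransmit_flags])

def decode_my_current_funding_locked(value: bytes) -> tuple[bytes, int]:
    if len(value) != 33:
        raise ValueError("my_current_funding_locked value must be 33 bytes")
    return value[:32], value[32]
```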
Contributor

Do we want this always tied to my_current_funding_locked_txid? In other words, is there any future use where a flag is independent of that?

Collaborator Author

@t-bast t-bast Aug 18, 2025

That's a good question! I chose to include this in my_current_funding_locked because announcement_signatures depends on sending splice_locked first, so it makes sense for this specific case to include it in the same TLV.

It's unlikely that we'll add new flags in the future, so I chose to only think about the announcement_signatures case so far. If we need to add more flags, then if they depend on splice_locked, they can use this retransmit_flags field, otherwise we'll simply introduce a separate TLV?

Collaborator

@wpaulino, @jkczyz and I were discussing this and were thinking it might make sense to add a flag for requesting a commitment_signed as well.

Currently, the next_commitment_number field in channel_reestablish always refers to the next commitment number in the inbound direction (from the PoV of the sender) of the channel. Now, with splicing, it can mean two totally different things: if we're not in a splice, seeing a next_commitment_number that is one behind where you think it is implies they've lost a commitment_signed. If we are splicing (based on the my_current_funding_locked_txid field), then seeing a next_commitment_number that is one behind where you think it is implies that you need to resend the splice commitment_signed, which is a very different message.

IOW, the definition of next_commitment_number changes based on whether my_current_funding_locked_txid is set, which is fine (I think its unambiguous), but honestly pretty weird. ISTM we should prefer a resend_splice_commitment_signed flag in the retransmit flags here to make clear what state the peer is in wrt the splice, so that we can leave next_commitment_number to refer to the next "normal" commitment_signed the peer expects to receive.

Collaborator

Oh, I guess actually rereading it, I'm not sure this made sense to add to this thread; that would still imply flags staying inside the next-funding-locked field, but whatever.

Contributor

I think the new flag should apply to next_funding_txid instead, since my_current_funding_locked_txid for a splice implies tx_signatures (and commitment_signed) was already exchanged. Also, if we plan to have one for commitment_signed, might as well add one for tx_signatures?

Collaborator

To simplify backwards-compat for existing cln and eclair nodes, it would be best to change the TLV type for next_funding, which would become for example:

Hmm, okay. Are you okay with breaking the dual-funding compat with previous versions, though? That stuff shipped already? Why not just add a new freestanding TLV with resend-messages flags? I don't see much drawback to that and its more compatible.

Contributor

FWIW, it's a bit simpler to implement the parsing logic in LDK for the combined next_funding TLV as proposed.

Collaborator Author

I discussed with Matt offline, and IMO it's fine to break compat for this edge case: we've already broken compat between eclair and cln since eclair has implemented #1214 but cln hasn't, and we've never run into this issue in several months (there would only be an issue if nodes disconnect after exchanging tx_complete but before exchanging commit_sig).

I'll implement this update of the next_funding TLV in eclair to verify that nothing stands out, but I think it will be a nice simplification!

Collaborator Author

I've implemented this in the last commit of ACINQ/eclair#2887 if you want to take a look. This is indeed cleaner (IMO) than using the next_commitment_number, and this way we introduce this more general pattern of having explicit retransmit_flags for various scenarios. Let's wait for #1214 to be updated to reflect that 🔥

Collaborator Author

Done in #1289

The specification for who sends `tx_signatures` first states that the
node that has the lowest satoshis contributed, "as defined by total
`tx_add_input` values" must sign first. Since the splice initiator is
sending `tx_add_input` for the current channel output, that means all
of the channel's previous capacity is attributed to the initiator,
instead of taking into account each node's pre-splice channel balance.

Since this detail is easy to miss, we add a note in the requirements
for `tx_signatures` in the splice section to highlight this subtlety.

Suggested by @wpaulino
@t-bast
Collaborator Author

t-bast commented Sep 23, 2025

@wpaulino @jkczyz I've added a note in f6346ba about the ordering for tx_signatures: IMO this was already properly specified in the tx_signatures section, since we explicitly say that we count each node's tx_add_input value, but it's worth highlighting since this can easily be missed.

@t-bast
Collaborator Author

t-bast commented Sep 23, 2025

@wpaulino I've thought about the channel_ready scenario you mentioned during yesterday's spec meeting. The main reason it isn't treated as implicitly received with my_current_funding_locked is that channel_ready may contain an scid_alias, which isn't included in my_current_funding_locked or channel_reestablish. It is even allowed to re-send channel_ready later to update the scid_alias: this was probably a bad idea and we should have introduced a dedicated update_alias message instead, but we have to live with this technical debt for now and thus cannot really treat channel_ready the same way we treat splice_locked.

But we can add a requirement that if a node sets my_current_funding_locked for a splice transaction (not for the initial channel opening), then it must not retransmit channel_ready (because since a splice was negotiated, it means that both nodes received each other's channel_ready already). Unless of course you changed your scid_alias, in which case you re-send a different channel_ready (but it doesn't really count as a retransmission). Similarly, if you receive my_current_funding_locked for a splice transaction, you mustn't retransmit channel_ready. WDYT?

@wpaulino
Contributor

But we can add a requirement that if a node sets my_current_funding_locked for a splice transaction (not for the initial channel opening), then it must not retransmit channel_ready (because since a splice was negotiated, it means that both nodes received each other's channel_ready already). Unless of course you changed your scid_alias, in which case you re-send a different channel_ready (but it doesn't really count as a retransmission). Similarly, if you receive my_current_funding_locked for a splice transaction, you mustn't retransmit channel_ready. WDYT?

SGTM.

I've thought about the channel_ready scenario you mentioned during yesterday's spec meeting. The main reason it isn't treated as implicitly received with my_current_funding_locked is that channel_ready may contain an scid_alias, which isn't included in my_current_funding_locked or channel_reestablish. It is even allowed to re-send channel_ready later to update the scid_alias: this was probably a bad idea and we should have introduced a dedicated update_alias message instead, but we have to live with this technical debt for now and thus cannot really treat channel_ready the same way we treat splice_locked.

Wouldn't we want to use my_current_funding_locked with dual funding RBF as well, perhaps with the requirement that you have to send an explicit channel_ready message as opposed to implying it as we do with splice_locked? We also have the possibility of nodes sending channel_ready for different transactions like splice_locked.

@t-bast
Collaborator Author

t-bast commented Sep 24, 2025

I added the requirement to avoid channel_ready retransmission after a splice in ea25a7e

Wouldn't we want to use my_current_funding_locked with dual funding RBF as well, perhaps with the requirement that you have to send an explicit channel_ready message as opposed to implying it as we do with splice_locked?

Unless I'm misunderstanding your point, that's already what we do. We currently have the following requirement:

- if it never sent `splice_locked` for any transaction, but it sent `channel_ready`:
  - MUST include `my_current_funding_locked` with the txid of the channel funding transaction.

Since we don't introduce new requirements for channel_ready in that section, channel_ready must be retransmitted if we haven't locked a splice yet and the commitment index is still 0.
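That retransmission condition could be sketched as follows (illustrative; the function name is hypothetical): `channel_ready` is retransmitted only when no splice has ever been locked and the commitment index is still at its initial value.

```python
# Sketch of the channel_ready retransmission condition described above.
def should_retransmit_channel_ready(sent_splice_locked: bool,
                                    next_commitment_number: int) -> bool:
    # If a splice was negotiated, both nodes necessarily already received
    # each other's channel_ready, so no retransmission is needed.
    return not sent_splice_locked and next_commitment_number == 1
```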

We also have the possibility of nodes sending channel_ready for different transactions like splice_locked.

True, that could in theory happen when using RBF in case of a reorg...that's potentially an issue with the current spec, because channel_ready doesn't contain the funding_txid and thus could refer to different funding transactions. I'm not sure we should add more complexity to the spec for this though: if that happens, it will result in a force-close, which isn't ideal, but no funds will be at risk, and nodes should just have waited for enough confirmations to ensure this doesn't happen?

If nodes have started a splice, this means they have both sent and
received `channel_ready` already: in that case, it's unnecessary to
retransmit `channel_ready` on reconnection.

Suggested by @wpaulino
- if `next_commitment_number` is 1 in both the `channel_reestablish` it
sent and received:
sent and received, and none of those `channel_reestablish` messages
contain `my_current_funding_locked` for a splice transaction:
Contributor

Shouldn't this also consider next_funding_txid for a splice transaction as well to cover the reestablish cases where we still need to retransmit commitment_signed/tx_signatures? Sending splice_init already requires that both sides have received channel_ready.

Collaborator Author

Great point, done in f5fcdbf 👍
