
Fix splicing hang, speed CI #8911

Open
rustyrussell wants to merge 9 commits into ElementsProject:master from rustyrussell:guilt/flakes31


@rustyrussell (Contributor) commented Feb 24, 2026

test_splice_rbf kept hanging. It turns out one side stopped sending before a commitment_signed, while the other side was waiting to receive it, resulting in a hang. I clarified the logic around when we can send STFU and when we should defer new actions; it's now both simpler to understand and doesn't hang.

I also ended up speeding CI further:

  • min-btc-support was sometimes timing out (after 2 hours!), so I now use the -O3 version, and it's down to about 53 minutes.
  • Added caching for things we download, plus other reductions, so setup.sh now takes under 30 seconds, down from about 1m15s.
  • Our entire prebuild check now takes 1m15s, down from about 2m45s.
  • Other stages speed up by about 1 minute each.

@rustyrussell requested a review from @ddustin February 24, 2026 00:48

@ddustin (Collaborator) left a comment


Awesome! Makes total sense to have a new state. Feels like a cleaner approach on top of fixing the bug.

ACK 26ea5ef

@rustyrussell force-pushed the guilt/flakes31 branch 2 times, most recently from 7c72d0c to 309d453, February 24, 2026 01:24
@rustyrussell enabled auto-merge (rebase) February 24, 2026 01:24
I got a flake, but all I see is:
```

>           assert suspended == set()
E           AssertionError: assert {'c26485f2839c5a27'} == set()
E             
E             Extra items in the left set:
E             'c26485f2839c5a27'
E             
E             Full diff:
E             - set()
E             + {
E             +     'c26485f2839c5a27',
E             + }

tests/test_misc.py:5049: AssertionError
```

Add some more diagnostics, but it's also clear that something is being
suspended and not terminating before we finish.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is useful for splicing: given an HTLC state, do we need to send
more messages to get it into a non-pending state?

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
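A minimal sketch of the kind of helper this commit describes, under loudly stated assumptions: the enum below is a hypothetical, simplified model covering only the *add* half of the BOLT #2 update_add_htlc / commitment_signed / revoke_and_ack exchange (removal states are omitted), and `hs_is_pending()` and `hs_we_must_send()` are illustrative names, not the functions added in this PR. The question it answers is whether *we* owe the next message for an HTLC to leave the pending state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified HTLC states: only the add half of the
 * BOLT #2 exchange is modelled; removal states are omitted. */
enum hs_htlc_state {
	/* HTLCs we offered */
	HS_SENT_ADD_HTLC,           /* update_add_htlc sent; we owe commitment_signed */
	HS_SENT_ADD_COMMIT,         /* commitment_signed sent; awaiting revoke_and_ack */
	HS_RCVD_ADD_REVOCATION,     /* they revoked; awaiting their commitment_signed */
	HS_RCVD_ADD_ACK_COMMIT,     /* their commitment_signed received; we owe revoke_and_ack */
	HS_SENT_ADD_ACK_REVOCATION, /* irrevocably committed: no longer pending */
	/* HTLCs they offered */
	HS_RCVD_ADD_HTLC,           /* update_add_htlc received; awaiting their commitment_signed */
	HS_RCVD_ADD_COMMIT,         /* their commitment_signed received; we owe revoke_and_ack */
	HS_SENT_ADD_REVOCATION,     /* we revoked; we owe commitment_signed */
	HS_SENT_ADD_ACK_COMMIT,     /* our commitment_signed sent; awaiting revoke_and_ack */
	HS_RCVD_ADD_ACK_REVOCATION, /* irrevocably committed: no longer pending */
	HS_STATE_MAX
};

/* True while the HTLC is not yet irrevocably committed on both sides. */
bool hs_is_pending(enum hs_htlc_state s)
{
	assert(s < HS_STATE_MAX);
	return s != HS_SENT_ADD_ACK_REVOCATION && s != HS_RCVD_ADD_ACK_REVOCATION;
}

/* True if *we* must send the next message to move this HTLC towards a
 * non-pending state; false if we are waiting on the peer, or it is done. */
bool hs_we_must_send(enum hs_htlc_state s)
{
	assert(s < HS_STATE_MAX);
	switch (s) {
	case HS_SENT_ADD_HTLC:       /* owe commitment_signed */
	case HS_RCVD_ADD_ACK_COMMIT: /* owe revoke_and_ack */
	case HS_RCVD_ADD_COMMIT:     /* owe revoke_and_ack */
	case HS_SENT_ADD_REVOCATION: /* owe commitment_signed */
		return true;
	default:
		return false;
	}
}
```

With a helper like this, a splicing node could scan its HTLCs and keep sending while any of them returns true from `hs_we_must_send()`, rather than going quiet while the peer is still waiting on it.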
And remove `uncommitted_ok` flag which was always false.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise, we can hang: we don't send commitment_signed, and they're
waiting to receive it.

1. We defer fee updates, blockheight updates and master requests
   (adding and closing htlcs) if we're *trying* or *started* to quiesce.
2. We only stop actually sending commitment_signed if we have sent
   STFU.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: avoid an occasional hang when splicing with a pending closing HTLC.
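The two rules above can be sketched as follows. This is a hypothetical model, not the channeld code: `enum quiesce_state` and the helper functions are illustrative names, *trying* is assumed to mean "we want to quiesce but have not sent STFU yet", and *started* that STFU has been sent. `may_send_stfu()` additionally assumes, as a simplification, that STFU can go out once none of our own updates are still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical quiescence tracking (illustrative names only). */
enum quiesce_state {
	QUIESCE_NONE,    /* not quiescing */
	QUIESCE_TRYING,  /* want to quiesce, but STFU not sent yet */
	QUIESCE_STARTED, /* STFU has been sent */
};

/* Rule 1: defer fee updates, blockheight updates and master requests
 * (adding and closing HTLCs) if we are trying to, or started to, quiesce. */
bool defer_new_updates(enum quiesce_state q)
{
	return q != QUIESCE_NONE;
}

/* Rule 2: only stop sending commitment_signed once STFU has been sent. */
bool may_send_commitment_signed(enum quiesce_state q, bool have_unsigned_changes)
{
	return have_unsigned_changes && q != QUIESCE_STARTED;
}

/* Assumption: STFU may be sent once we are trying to quiesce and none of
 * our own updates are still pending. */
bool may_send_stfu(enum quiesce_state q, bool our_updates_pending)
{
	return q == QUIESCE_TRYING && !our_updates_pending;
}

/* Transition after actually sending STFU; only legal from TRYING. */
enum quiesce_state quiesce_sent_stfu(enum quiesce_state q)
{
	assert(q == QUIESCE_TRYING);
	return QUIESCE_STARTED;
}
```

The key point is `may_send_commitment_signed()`: while only *trying* to quiesce we keep signing, so pending HTLCs (including closing ones) can settle and STFU eventually becomes sendable. Halting commitment_signed as soon as we start trying could leave both peers waiting on each other, which is the hang described above.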
It's timing out after 2 hours sometimes: this now makes it finish in 53
minutes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…f random delays.

Sometimes this times out after 30 minutes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell force-pushed the guilt/flakes31 branch 5 times, most recently from e69d5d3 to ce19937, February 25, 2026 05:07
@rustyrussell changed the title Fix splicing hang to Fix splicing hang, speed CI Feb 25, 2026
Usually downloading and installing takes 90 seconds.  But sometimes it
takes an hour!  Use caching for this, to keep it consistent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
.github/scripts/setup.sh does this already, *and* it uses the cache now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell force-pushed the guilt/flakes31 branch 4 times, most recently from 1d42ca2 to 6b5650a, February 25, 2026 22:22
I noticed this line causing a delay; ChatGPT thinks the cause is NSS name lookup.
I've removed the useradd line altogether.

I also install eatmydata first, to try to speed the other installs.

This drops the install step from 1m45 to about 30 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>