routing: delay pruning closed channels from the channel graph by 12 blocks  #7865

Open
@Roasbeef

Description

Spec Change & Potential Implications

A bit over a year ago, a change was merged into the spec advising nodes to wait 12 blocks before removing spent funding outputs. This has implications for splicing: if nodes simply wait before removing a channel, then from the PoV of path finders, it's as if the channel never closed. In that case, the only thing lost is channel age, which some (?) algos use as a proxy for reliability. We should catch up to observe this, while also thinking through the path finding and re-org implications.

On the path finding side, if we see a channel X with 1 BTC, which is then "closed" and replaced with a 1.1 BTC channel, would we see a 2.1 BTC link between the two parties for the 12-block interval, or a 1.1 BTC link? All our path finding algos now factor the capacity into the apriori probability, so done in an iterative manner, we'd think the channel is getting larger and larger, and try to send payments through it that would likely fail. Does this matter much in practice?
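To make the distortion concrete, here's a minimal sketch. It assumes a toy success estimate of 1 - amt/capacity in place of the real apriori model, and that the path finder sums the capacity of parallel edges between the same pair of nodes:

package main

import "fmt"

// successProb is a deliberately simplified stand-in for the capacity-based
// apriori success estimate used in pathfinding: the larger the channel
// relative to the payment, the more likely we assume the payment succeeds.
func successProb(amtSat, capSat int64) float64 {
	if amtSat >= capSat {
		return 0
	}
	return 1 - float64(amtSat)/float64(capSat)
}

func main() {
	const btc = 100_000_000 // satoshis per BTC

	amt := int64(9 * btc / 10)       // a 0.9 BTC payment
	newCap := int64(11 * btc / 10)   // the 1.1 BTC replacement channel
	phantom := int64(1*btc) + newCap // old + new seen together: 2.1 BTC

	// Reality after the splice: only the 1.1 BTC channel exists.
	fmt.Printf("real:     %.2f\n", successProb(amt, newCap))

	// During the 12-block window both the spent 1 BTC channel and its
	// 1.1 BTC replacement sit in the graph, so a path finder that sums
	// parallel capacity sees a 2.1 BTC link and is far too optimistic.
	fmt.Printf("inflated: %.2f\n", successProb(amt, phantom))
}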

Re re-orgs, it's possible that a channel is closed, a few blocks elapse, and then the close itself is re-org'd out. In this case, we haven't yet removed the channel, as it's still valid, but would nodes continue to observe the channel as if it was never closed?
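One way to keep the delayed pruning re-org safe is to buffer observed closes alongside their close height, and simply forget any close whose block is disconnected. A minimal sketch with hypothetical names (closeBuffer and pendingClose are illustrative, not existing lnd types):

package graphprune

import "github.com/btcsuite/btcd/wire"

// pendingClose tracks a spent funding output that we've seen on-chain but
// deliberately haven't pruned from the channel graph yet.
type pendingClose struct {
	chanPoint   wire.OutPoint
	closeHeight uint32
}

// closeBuffer delays pruning by pruneDelay blocks, and unwinds cleanly if
// the closing transaction is re-org'd out before the delay elapses.
type closeBuffer struct {
	pruneDelay uint32
	pending    map[wire.OutPoint]pendingClose
}

// blockConnected records newly spent channel points, then returns the set
// of closes that are now pruneDelay blocks deep and safe to prune.
func (b *closeBuffer) blockConnected(height uint32,
	spent []wire.OutPoint) []wire.OutPoint {

	for _, op := range spent {
		b.pending[op] = pendingClose{chanPoint: op, closeHeight: height}
	}

	var matured []wire.OutPoint
	for op, pc := range b.pending {
		if height >= pc.closeHeight+b.pruneDelay {
			matured = append(matured, op)
			delete(b.pending, op)
		}
	}
	return matured
}

// blockDisconnected handles a re-org: a close that happened at or above the
// stale height never happened, so we forget it and the channel remains in
// the graph as if it was never closed.
func (b *closeBuffer) blockDisconnected(staleHeight uint32) {
	for op, pc := range b.pending {
		if pc.closeHeight >= staleHeight {
			delete(b.pending, op)
		}
	}
}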

Architecture Overview

The FilteredChainView is used today to handle removing spent channels for a given block. Upon startup, we scan forward in the chain to find all channels closed while we were down. This logic is also re-org aware: it's able to detect that we were on a stale chain, ensuring we consume the proper blocks. Once caught up, for each new block, we prune the closed channels from the channel graph.
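For context, here's a paraphrased sketch of the relevant surface (the real interface lives in routing/chainview; the fields and method set are simplified here, with UpdateFilter and friends omitted):

package chainview

import (
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
)

// FilteredBlock is a block filtered down to just the transactions that
// spend currently watched channel points.
type FilteredBlock struct {
	Hash         chainhash.Hash
	Height       uint32
	Transactions []*wire.MsgTx
}

// FilteredChainView notifies the router of connected blocks that contain
// channel point spends, and of blocks disconnected during a re-org.
type FilteredChainView interface {
	// FilteredBlocks delivers blocks at the tip whose transactions spend
	// watched outpoints; the router prunes those channels from the graph.
	FilteredBlocks() <-chan *FilteredBlock

	// DisconnectedBlocks signals stale blocks so the caller can rewind
	// state that was derived from them.
	DisconnectedBlocks() <-chan *FilteredBlock

	Start() error
	Stop() error
}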

Suggested Implementation Path

Given the above, it may make the most sense to implement this logic at the filtered block level. Rather than notifying for the chain tip, the view would first deliver a notification for the block 12 blocks back, then maintain a lagging pointer from there on.
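A minimal sketch of that lagging pointer, assuming a pruneDelay constant of 12 (all names here are illustrative):

package chainview

// pruneDelay is the number of blocks a close must be buried under before
// the corresponding channel is pruned from the graph.
const pruneDelay = 12

// block stands in for a filtered block; the spends it carries are elided.
type block struct {
	height uint32
}

// laggedBlocks implements the lagging pointer: it reads connected blocks
// from in and forwards each one only once pruneDelay further blocks have
// been connected on top of it, so consumers always act 12 blocks behind
// the tip. A real version would also evict buffered blocks on
// DisconnectedBlocks notifications so a re-org can't flush stale closes.
func laggedBlocks(in <-chan block, out chan<- block) {
	var window []block
	for blk := range in {
		window = append(window, blk)

		// Release every buffered block that is now pruneDelay deep.
		for len(window) > 0 &&
			blk.height >= window[0].height+pruneDelay {

			out <- window[0]
			window = window[1:]
		}
	}
	close(out)
}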

The catch-up logic would then just (?) lag behind by 12 blocks, reducing the bounds of this loop:

lnd/routing/router.go, lines 803 to 815 at commit 1871970:

// If we're not yet caught up, then we'll walk forward in the chain
// pruning the channel graph with each new block that hasn't yet been
// consumed by the channel graph.
var spentOutputs []*wire.OutPoint
for nextHeight := pruneHeight + 1; nextHeight <= uint32(bestHeight); nextHeight++ {
	// Break out of the rescan early if a shutdown has been
	// requested, otherwise long rescans will block the daemon from
	// shutting down promptly.
	select {
	case <-r.quit:
		return ErrRouterShuttingDown
	default:
	}
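A sketch of the adjusted bound, with pruneDelay as an illustrative constant (not an existing lnd identifier) and a guard for the case where the chain is still shorter than the delay:

// Stop the catch-up rescan pruneDelay blocks short of the tip so that
// recently spent channels survive the delay window.
laggedHeight := uint32(0)
if uint32(bestHeight) > pruneDelay {
	laggedHeight = uint32(bestHeight) - pruneDelay
}

for nextHeight := pruneHeight + 1; nextHeight <= laggedHeight; nextHeight++ {
	// ...rescan body unchanged...
}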

This change would also affect most of our itests, since they always assert that the channels aren't present in the graph right after the close tx is mined. We can either add a build tag to disable the delay for itests, or have every test mine 12 blocks at the end and then assert that all the channels are gone.
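If we take the mine-12-blocks route, the tail of each affected itest would look roughly like this (all helper names are hypothetical, for illustration only):

// Close the channel and confirm the closing transaction on-chain.
closeChannelAndAssert(t, net, alice, chanPoint)

// With delayed pruning, the spent channel is still in the graph...
assertChannelInGraph(t, alice, chanPoint)

// ...until the 12-block delay window has been mined out.
mineBlocks(t, net, 12)
assertChannelNotInGraph(t, alice, chanPoint)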


Labels

P2 (should be fixed if one has time), advanced (issues suitable for very experienced developers), brainstorming (long term ideas/discussion/requests for feedback), enhancement (improvements to existing features / behaviour), path finding, protocol, splicing
