Skip to content

Conversation

@ziggie1984
Copy link
Collaborator

@ziggie1984 ziggie1984 commented Dec 22, 2025

This commit fixes a critical race condition in MarkChanFullyClosed and
pruneLinkNode where link nodes could be incorrectly deleted despite
having pending channel.

The race occurred because the check for open channels and the link node
deletion happened in separate database transactions:

Thread A: TX1 checks open channels → [] (empty)
Thread A: TX1 commits
Thread B: Opens new channel with same peer
Thread A: TX2 deletes link node (using stale data)
Result: Link node deleted despite pending channel existing

This creates a TOCTOU (time-of-check to time-of-use) vulnerability where
database state changes between reading the channel count and deleting
the node.

Fix for MarkChanFullyClosed:

  • Move link node deletion into the same transaction as the channel
    closing check, making the check-and-delete operation atomic

Fix for pruneLinkNode:

  • Add double-check within the write transaction to verify no channels
    were opened since the caller's initial check
  • Maintains performance by keeping early return for common case
  • Prevents deletion if channels exist at delete time

This ensures the invariant: "link node exists iff channels exist"
is never violated, preventing database corruption and potential
connection issues.

@gemini-code-assist
Copy link

Summary of Changes

Hello @ziggie1984, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical race condition within the channeldb package that could result in the erroneous deletion of link nodes, even when active or pending channels existed. By making channel state checks and link node pruning operations atomic or by introducing robust double-checks within database transactions, the changes ensure data integrity and prevent potential connection disruptions caused by stale database states.

Highlights

  • Race Condition Fix: Addressed a critical race condition in MarkChanFullyClosed and pruneLinkNode that could lead to incorrect deletion of link nodes, causing database corruption and connection issues. This resolves a Time-of-Check to Time-of-Use (TOCTOU) vulnerability.
  • Atomic Deletion in MarkChanFullyClosed: The link node deletion logic within MarkChanFullyClosed has been moved into the same database transaction as the channel closing check, ensuring atomicity and preventing race conditions during channel closure.
  • Double-Check in pruneLinkNode: Implemented a robust double-check within the pruneLinkNode's write transaction. This verifies that no new channels were opened since the initial check, preventing premature link node deletion even if concurrent operations occur.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ziggie1984 ziggie1984 self-assigned this Dec 22, 2025
@ziggie1984 ziggie1984 added channel closing Related to the closing of channels cooperatively and uncooperatively channels data-race Related to data-races labels Dec 22, 2025
@ziggie1984 ziggie1984 added this to the v0.20.1 milestone Dec 22, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly identifies and fixes a critical TOCTOU race condition in MarkChanFullyClosed and pruneLinkNode. By moving the link node existence check and deletion into the same atomic database transaction, it prevents a state where a link node could be deleted despite a new channel being opened. The fix in MarkChanFullyClosed is direct and effective. The fix in pruneLinkNode correctly uses a double-check pattern to maintain performance while ensuring correctness. I have one suggestion to improve the implementation of pruneLinkNode for better code clarity and consistency.

@ziggie1984
Copy link
Collaborator Author

Happened out there in the wild, was reported by a user:

2025-12-21 21:09:07.723 [INF] PEER: Peer(XXX): Local close channel request is going to be delivered to the peer
2025-12-21 21:09:07.729 [INF] PEER: Peer(XXX): Delivery addr for channel close: XXX
2025-12-21 21:14:56.725 [INF] FNDG: Initiating fundingRequest(local_amt=XXX BTC (subtract_fees=false), push_amt=0 mSAT, chain_hash=XX, peer=XXX, min_confs=XXX)
2025-12-21 21:15:25.451 [INF] CHDB: Pruning link node XXX with zero open channels from database

So although a Pending OpenChannel flow was going on in the background we deleted the node from the db.

@ziggie1984 ziggie1984 marked this pull request as ready for review December 22, 2025 09:05
This commit fixes a critical race condition in MarkChanFullyClosed and
pruneLinkNode where link nodes could be incorrectly deleted despite
having pending or open channels.

The race occurred because the check for open channels and the link node
deletion happened in separate database transactions:

  Thread A: TX1 checks open channels → [] (empty)
  Thread A: TX1 commits
  Thread B: Opens new channel with same peer
  Thread A: TX2 deletes link node (using stale data)
  Result: Link node deleted despite pending channel existing

This creates a TOCTOU (time-of-check to time-of-use) vulnerability where
database state changes between reading the channel count and deleting
the node.

Fix for MarkChanFullyClosed:
- Move link node deletion into the same transaction as the channel
  closing check, making the check-and-delete operation atomic

Fix for pruneLinkNode:
- Add double-check within the write transaction to verify no channels
  were opened since the caller's initial check
- Maintains performance by keeping early return for common case
- Prevents deletion if channels exist at delete time

This ensures the invariant: "link node exists iff channels exist"
is never violated, preventing database corruption and potential
connection issues.
@wildsats
Copy link

My node was impacted by this bug. We had to ask the peer to force close the channel and then used chantools to sweep the funds.

// Decide whether we want to remove the link node, based upon the number
// of still open channels.
return c.pruneLinkNode(openChannels, pruneLinkNode)
err = deleteLinkNode(tx, remotePub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, we should have been doing this in the same db transaction the entire time.

Do we need any sort of reconciliation logic on start up to handle instances that might've happened in the wild? So sanity check that for each open channel, we have a link node present.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a fix at startup, not sure if it is worth it but I think it does not cost much either and can be removed when moving to sql native ?

We make sure that nodes previously suffering from this error will
have a consitent db view when restarting their node.
Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️

"from database",
remotePub.SerializeCompressed())

return c.linkNodeDB.DeleteLinkNode(remotePub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this DeleteLinkNode can now be deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is still used for tests so that we can add and remove link nodes, I think we can still keep it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel closing Related to the closing of channels cooperatively and uncooperatively channels data-race Related to data-races no-changelog

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants