-
Notifications
You must be signed in to change notification settings - Fork 2.2k
channeldb: fix race condition in link node pruning #10462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
channeldb: fix race condition in link node pruning #10462
Conversation
Summary of ChangesHello @ziggie1984, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical race condition within the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly identifies and fixes a critical TOCTOU race condition in MarkChanFullyClosed and pruneLinkNode. By moving the link node existence check and deletion into the same atomic database transaction, it prevents a state where a link node could be deleted despite a new channel being opened. The fix in MarkChanFullyClosed is direct and effective. The fix in pruneLinkNode correctly uses a double-check pattern to maintain performance while ensuring correctness. I have one suggestion to improve the implementation of pruneLinkNode for better code clarity and consistency.
faf8cca to
d249c83
Compare
|
Happened out there in the wild, was reported by a user: So although a Pending OpenChannel flow was going on in the background we deleted the node from the db. |
d249c83 to
50cad71
Compare
This commit fixes a critical race condition in MarkChanFullyClosed and pruneLinkNode where link nodes could be incorrectly deleted despite having pending or open channels. The race occurred because the check for open channels and the link node deletion happened in separate database transactions: Thread A: TX1 checks open channels → [] (empty) Thread A: TX1 commits Thread B: Opens new channel with same peer Thread A: TX2 deletes link node (using stale data) Result: Link node deleted despite pending channel existing This creates a TOCTOU (time-of-check to time-of-use) vulnerability where database state changes between reading the channel count and deleting the node. Fix for MarkChanFullyClosed: - Move link node deletion into the same transaction as the channel closing check, making the check-and-delete operation atomic Fix for pruneLinkNode: - Add double-check within the write transaction to verify no channels were opened since the caller's initial check - Maintains performance by keeping early return for common case - Prevents deletion if channels exist at delete time This ensures the invariant: "link node exists iff channels exist" is never violated, preventing database corruption and potential connection issues.
50cad71 to
1186e3e
Compare
|
My node was impacted by this bug. We had to ask the peer to force close the channel and then used chantools to sweep the funds. |
| // Decide whether we want to remove the link node, based upon the number | ||
| // of still open channels. | ||
| return c.pruneLinkNode(openChannels, pruneLinkNode) | ||
| err = deleteLinkNode(tx, remotePub) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, we should have been doing this in the same db transaction the entire time.
Do we need any sort of reconciliation logic on start up to handle instances that might've happened in the wild? So sanity check that for each open channel, we have a link node present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a fix at startup, not sure if it is worth it but I think it does not cost much either and can be removed when moving to sql native ?
We make sure that nodes previously suffering from this error will have a consitent db view when restarting their node.
yyforyongyu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️
| "from database", | ||
| remotePub.SerializeCompressed()) | ||
|
|
||
| return c.linkNodeDB.DeleteLinkNode(remotePub) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess this DeleteLinkNode can now be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is still used for tests so that we can add and remove link nodes, I think we can still keep it.
This commit fixes a critical race condition in MarkChanFullyClosed and
pruneLinkNode where link nodes could be incorrectly deleted despite
having pending channel.
The race occurred because the check for open channels and the link node
deletion happened in separate database transactions:
Thread A: TX1 checks open channels → [] (empty)
Thread A: TX1 commits
Thread B: Opens new channel with same peer
Thread A: TX2 deletes link node (using stale data)
Result: Link node deleted despite pending channel existing
This creates a TOCTOU (time-of-check to time-of-use) vulnerability where
database state changes between reading the channel count and deleting
the node.
Fix for MarkChanFullyClosed:
closing check, making the check-and-delete operation atomic
Fix for pruneLinkNode:
were opened since the caller's initial check
This ensures the invariant: "link node exists iff channels exist"
is never violated, preventing database corruption and potential
connection issues.