@calvinrzachman
Contributor

@calvinrzachman calvinrzachman commented Dec 19, 2025

Change Description

This PR demonstrates a subtle bug in the htlcswitch that causes an incorrect error to be returned for on-chain HTLC resolutions. With the introduction of the switchrpc service in #9489, the bug becomes more serious: the Switch can report a payment result that carries no forwarding error, preimage, or encrypted error for the RPC client, leaving the outcome ambiguous.

Problem

When the contractcourt resolves a dust HTLC on-chain, it sends a ResolutionMsg to the switch that results in an UpdateFailHTLC with a nil error Reason.

The core issue stems from a subtle artifact in the networkResult store's persistence logic:

  1. A nil htlc.Reason slice is serialized to disk as a zero-length byte array.
  2. Upon deserialization, this is read back as a non-nil, empty byte slice ([]byte{}).

This nil -> []byte{} transformation causes the check for on-chain resolutions to fail. As a result, this code path, which is intended to return FailPermanentChannelFailure to signal the channel is being resolved on-chain, is never executed for results read from disk.

Instead, the code falls through to the default case. The Switch attempts to decrypt the empty slice, which fails, causing parseFailedPayment to return ErrUnreadableFailureMessage.

The impact on master is a misclassification of the error. The ChannelRouter receives a generic, imprecise error instead of the correct FailPermanentChannelFailure, hindering its ability to accurately understand the payment's terminal state.
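To illustrate why the misclassification matters, the sketch below models how a caller might branch on the returned error. linkError, errUnreadableFailureMessage and describe are hypothetical stand-ins, not lnd's LinkError type or the ChannelRouter's actual logic.

```go
package main

import (
	"errors"
	"fmt"
)

// linkError is a hypothetical stand-in for a link error carrying a precise
// failure code; only the field needed here is modelled.
type linkError struct {
	failure string // e.g. "FailPermanentChannelFailure"
}

func (e *linkError) Error() string { return "link error: " + e.failure }

// errUnreadableFailureMessage is a hypothetical stand-in for the generic
// error returned when a failure reason cannot be decrypted.
var errUnreadableFailureMessage = errors.New("unreadable failure message")

// describe branches on the error the way a caller might. A misclassified
// result takes the generic branch and loses the precise failure code.
func describe(err error) string {
	var le *linkError
	switch {
	case errors.As(err, &le):
		return "precise failure: " + le.failure
	case errors.Is(err, errUnreadableFailureMessage):
		return "generic failure: reason unknown"
	default:
		return "unexpected error"
	}
}

func main() {
	fmt.Println(describe(&linkError{failure: "FailPermanentChannelFailure"}))
	fmt.Println(describe(errUnreadableFailureMessage))
}
```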

Solution

The fix is to make the check in parseFailedPayment robust to the serialization artifact by checking the length of the reason instead of whether it's nil.

--- a/htlcswitch/switch.go
+++ b/htlcswitch/switch.go
-       case isResolution && htlc.Reason == nil:
+       case isResolution && len(htlc.Reason) == 0:

This change ensures that on-chain resolutions are correctly identified and processed, whether the Reason is nil (in-memory) or an empty slice (post-deserialization). This makes the intended code path reachable and allows the switch to return the precise error to the ChannelRouter.

Steps to Test

  • Run this to demo the nil -> []byte{} transformation during a disk round-trip:
    • go test -v -timeout 30s --tags dev -run ^TestNetworkResultSerialization$ github.com/lightningnetwork/lnd/htlcswitch
  • Then run the more realistic TestExtractResult, which simulates the disk I/O, both with and without the final commit containing the fix:
    • go test -v -timeout 30s -run ^TestExtractResult$ github.com/lightningnetwork/lnd/htlcswitch

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

@gemini-code-assist

Summary of Changes

This pull request resolves a critical bug within the htlcswitch component that caused on-chain HTLC resolutions to be miscategorized. The problem arose from how nil error reasons were transformed into empty byte slices during disk serialization and deserialization, preventing the system from correctly identifying these resolutions. By adjusting the logic to check for an empty reason length instead of a nil value, the fix ensures accurate error reporting, particularly for FailPermanentChannelFailure, which is crucial for the ChannelRouter to understand payment outcomes and for RPC clients to receive precise forwarding errors.

Highlights

  • Bug Fix for On-Chain HTLC Resolutions: Corrects an issue where on-chain HTLC resolutions were misclassified due to a nil error reason transforming into an empty byte slice during serialization, leading to imprecise error reporting.
  • Robust Error Identification: The condition for identifying on-chain resolutions in htlcswitch/switch.go was updated from checking for a nil htlc.Reason to checking if len(htlc.Reason) is 0, making it resilient to the serialization behavior.
  • Enhanced Testing: New test cases and a dedicated TestExtractResult in htlcswitch/switch_test.go have been added to validate the fix and ensure correct processing of payment results under various conditions, including simulating disk persistence.
  • Mock Error Decryptor: A mockErrorDecryptor was introduced in htlcswitch/mock.go to facilitate comprehensive testing of error decryption scenarios.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a subtle bug in htlcswitch that led to misclassifying on-chain HTLC resolution errors. The fix of checking len(htlc.Reason) == 0 instead of htlc.Reason == nil is robust and handles the serialization artifact well. The addition of the TestExtractResult unit test is excellent, as it thoroughly covers various scenarios and prevents regressions. I have a couple of minor suggestions to improve comment clarity and fix a typo in the new test code.

isResolution: true,
},
expectedResult: &PaymentResult{
Error: NewDetailedLinkError(
Collaborator

So this test fails with the old nil check?

Contributor Author

Yeah, it fails because we fall into the default case of the switch in parseFailedPayment, which attempts to perform error decryption.

@ziggie1984
Collaborator

Not super high priority, because the local router still flags it as a failed payment, but I think we can add the fix if you polish this PR.

@calvinrzachman
Contributor Author

Basically, this should let parseFailedPayment handle the following case from its godoc comment, as seems to have been intended:

//  2. A resolution from the chain arbitrator, which possibly has no failure
//     reason attached.

@calvinrzachman calvinrzachman force-pushed the switch-resolution-bug branch 2 times, most recently from bcb2bcd to 68c92d6 on December 22, 2025 17:56
This commit adds a new test, TestOnChainResolutionFailure,
which demonstrates an error misclassification in the htlcswitch.

The test shows that when an on-chain resolution with a nil/empty
error reason is read from the result store, it is misclassified by
the parseFailedPayment function. On master, this leads to an
incorrect error being returned to the ChannelRouter by the
GetAttemptResult function.

This test will fail on the current commit and will be fixed in the
subsequent commit.

This commit fixes a bug in `parseFailedPayment` where an on-chain
resolution with an empty reason was not being correctly identified
after being deserialized from disk by SubscribeResult or
GetResult. This impacts callers waiting for the final settle/fail
result on an htlc attempt via GetAttemptResult.

The lnwire codec transforms a `nil` reason into a non-nil, empty
byte slice, which bypasses the existing check for `htlc.Reason == nil`.
Thankfully this does not crash the daemon. Rather, it just leads to
the caller (in our case, the ChannelRouter) not actually receiving
the expected FailPermanentChannelFailure link error.