Skip to content

Conversation

@bitromortac
Copy link
Collaborator

Tries to fix #9769.

Demonstrates the migration error and attempts a fix by making the failure message an optional tlv record. I'm not sure if we want to do this because it's strange that the failure message is nil in the first place. Perhaps we want to instead migrate nil failure messages to CodeNone instead or drop these entries.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 28, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@saubyk saubyk added this to the v0.19.0 milestone Apr 28, 2025
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass looks good, need to take a second look tho.

msg: tlv.NewRecordT[tlv.TlvType1](failureMessage{failureMsg}),
}

if failureMsg != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment why this might happen, decoding error of the error msg.

What about the case where we have a nil source index:

if sourceIdx == nil {
return &paymentFailure{}
}

we should instead return nil here, so we do not create the failure information.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should instead return nil here, so we do not create the failure information.

Ah, I think we should do something about the pointer, I converted to use a fn.Option[paymentFailure] in other places and removed the pointer, because newPaymentFailure always constructs a paymentFailure, it still represents a payment failure if the info fields are absent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment why this might happen, decoding error of the error msg.

I added comments to the paymentFailure fields.

What about the case where we have a nil source index

The sourceIdx is a prerequisite to be able to decrypt the error via the hop's shared secret, so I think it makes sense to return an empty paymentFailure here, because if we don't know the index, we won't be able to get a message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I more or less had the same question when looking at the code. Perhaps this explanation is worth a comment in both versions of newPaymentFailure where we return an empty paymentFailure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like one level of nesting can be removed, trying that out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work, I removed paymentFailure, whose role is now taken over by the old paymentFailureInfo. Instead of checking the optional record in paymentFailure, I think we can use the source index instead. I put all mission control migration changes together in the last commit, which was easier for me. Let me know if I should make changes in lockstep.

@bitromortac bitromortac force-pushed the 2504-mc-migration-fix branch from 70f4cba to 68e6ad9 Compare April 29, 2025 10:40
@bitromortac bitromortac marked this pull request as ready for review April 29, 2025 10:40
@bitromortac bitromortac requested a review from ziggie1984 April 29, 2025 10:46
@bitromortac bitromortac force-pushed the 2504-mc-migration-fix branch from 68e6ad9 to dc7d1c9 Compare April 30, 2025 14:52
@bitromortac bitromortac requested a review from guggero April 30, 2025 14:52
Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

This is later used to handle their nil values.
We can remove paymentFailureInfo since we can gate the result
interpretation on the source index, meaning that if we don't have a
source index, we deal with an unknown payment outcome because we
couldn't pinpoint the failing hop.
This introduces an option instead to signal whether there was a failure
for result interpretation.
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, in my opinion we can also remove the sourceIdx pointer and check the nil case outside and return early or return an option. Because I don't think we want to persist an empty payment failure tlv in the case sourceIdx == nil espcially since the tlv is optional anyways ?

if failure.info.IsNone() {
// Not having a source index means that we were unable to decrypt the
// error message.
if failure.sourceIdx.IsNone() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we error out if the sourceIdx is none but the failure.msg is not ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned in other response


// If we can't identify a failure source, we also won't have a decrypted
// failure message. In this case we return an empty payment failure.
if sourceIdx == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer removing this pointer for sourceIdx, and catch the nil case outside of this function. or we return an option here for the paymentFailure, and set it to None in the sourceIdx=nil case wdyt ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should check the case sourceIdx == nil but failureMsg not and report an error ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer removing this pointer for sourceIdx, and catch the nil case outside of this function. or we return an option here for the paymentFailure, and set it to None in the sourceIdx=nil case wdyt ?

Because I don't think we want to persist an empty payment failure tlv in the case sourceIdx == nil espcially since the tlv is optional anyways ?

I think we always want to construct a paymentFailure here even if it's empty, because its presence in paymentResult indicates a payment failure (just one we don't know how to interpret precisely). So I think we can't save more in terms of serialization?

// failure with unknown details. Otherwise, the index and failure message are
// used to populate the info field of the paymentFailure.
func newPaymentFailure(sourceIdx *int,
failureMsg lnwire.FailureMessage) *paymentFailure {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned before I think we can also get rid of the sourceIdx pointer ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently not seeing a lot of advantages in removing the source index pointer and would like to keep further changes in this bug fix PR minimal (also with regards to changing the signature to return an error), since this is not touching the serialization. I can change it though if you feel strongly about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok cool let's keep it as is.

store](https://github.com/lightningnetwork/lnd/pull/8911) to use a more
minimal encoding for payment attempt routes as well as use [pure TLV
encoding](https://github.com/lightningnetwork/lnd/pull/9167).
* [Migrate the mission control
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release notes in a separate commit please.

This commit is temporary and demonstrates a panic. To be squashed with
the following commit.
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bitromortac bitromortac force-pushed the 2504-mc-migration-fix branch from dc7d1c9 to 5020e1f Compare May 2, 2025 06:55
@bitromortac
Copy link
Collaborator Author

Will also check for errors if somebody ran with the previous serialization.

@bitromortac
Copy link
Collaborator Author

If anybody ran with the migration included in v0.19.0-rc3 and has migrated or experienced routing failures, we'd expect something like

2025-05-02 11:30:17.213 [INF] CRTR missioncontrol_store.go:107: Loaded 6 mission control entries
2025-05-02 11:30:17.213 [DBG] CRTR missioncontrol.go:455: [default]: Mission control state reconstruction started
2025-05-02 11:30:17.213 [ERR] LTND lnd.go:169: Shutting down due to error in main method rev="" err="can't create mission control manager: ErrTypeForDecoding want (type: *uint8, length: 1), got (type: *uint8, length: 147)"
2025-05-02 11:30:17.213 [INF] NTFN bitcoind.go:228: Cancelling epoch notification, epoch_id=1
2025-05-02 11:30:17.214 [INF] LTND lnd.go:252: Stopping pprof server... rev=""
2025-05-02 11:30:17.214 [INF] LTND lnd.go:152: Shutdown complete
unable to create server: can't create mission control manager: ErrTypeForDecoding want (type: *uint8, length: 1), got (type: *uint8, length: 147)

because of the change in serialization, just to be aware of that. This can be resolved by lncli resetmc running v0.19.0-rc3.

@guggero guggero merged commit 334a7d1 into lightningnetwork:master May 2, 2025
35 of 36 checks passed
@ziggie1984
Copy link
Collaborator

because of the change in serialization, just to be aware of that. This can be resolved by lncli resetmc running v0.19.0-rc3.

But the user has to revert back to a prior version to startup LND or ?

@guggero
Copy link
Collaborator

guggero commented May 2, 2025

They need to run lncli resetmc before updating to the next RC (which will be RC4). I'll note that in the release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: mission control migration fails for nil payment failure message

4 participants