onionmessage: graph-based pathfinding for onion messages #10612
onionmessage: graph-based pathfinding for onion messages #10612Abdulkbk wants to merge 15 commits intolightningnetwork:masterfrom
Conversation
This commit introduces a new Mailbox interface that abstracts the message queue implementation for actors. Previously, actors used a direct channel for their mailbox, which limited flexibility and made it difficult to implement alternative mailbox strategies. The new Mailbox interface provides methods for sending, receiving, and draining messages, with full context support for cancellation. The Receive method leverages Go 1.23's iter.Seq pattern, providing a clean iterator-based API that allows natural for-range loops over messages. The ChannelMailbox implementation maintains the existing channel-based behavior while conforming to the new interface. It stores the actor's context internally, ensuring both caller and actor contexts are properly respected during send and receive operations. This simplifies context handling compared to complex context merging approaches. This abstraction enables future implementations such as priority mailboxes, persistent mailboxes, or bounded mailboxes with overflow strategies, without requiring changes to the actor implementation.
This commit adds thorough test coverage for the new Mailbox interface and ChannelMailbox implementation. The tests verify correct behavior across various scenarios including successful sends, context cancellation, mailbox closure, and concurrent operations. The test suite specifically validates that the mailbox respects both the caller's context and the actor's context during send and receive operations. This ensures that actors properly shut down when their context is cancelled, and that callers can cancel operations without affecting the actor's lifecycle. Additional tests cover edge cases such as zero-capacity mailboxes (which default to a capacity of 1), draining messages after closure, and concurrent sends from multiple goroutines. The concurrent test uses 10 senders each sending 100 messages to verify thread-safety and proper message ordering. All tests pass with the race detector enabled, confirming the implementation is free from data races.
This commit refactors the Actor implementation to use the new Mailbox interface instead of directly managing a channel. This change significantly simplifies the actor's message processing loop and improves separation of concerns. The main changes include replacing the direct channel field with a Mailbox interface, updating NewActor to create a ChannelMailbox instance, and refactoring the process method to use the iterator pattern provided by mailbox.Receive. The new implementation uses a clean for-range loop over the mailbox's message iterator, eliminating the complex select statement that previously handled both message reception and context cancellation. The Tell and Ask methods in actorRefImpl have been simplified to use the mailbox's Send method, which internally handles both the caller's context and the actor's context. This eliminates the need for complex select statements in these methods and ensures consistent context handling throughout the actor system. Message draining during shutdown is now handled through the mailbox's Drain method, providing a cleaner separation between normal message processing and cleanup operations. The actor still properly sends unprocessed messages to the Dead Letter Office and completes pending promises with appropriate errors during shutdown.
The new wire message defines the OnionMessagePayload, FinalHopPayload, ReplyPath, and related TLV encoding/decoding logic.
Update lightning-onion to commit that includes onion-messaging support.
Adds the NewNonFinalBlindedRouteDataOnionMessage function to create blinded route data specifically for onion messages.
Initialize a sphinx router without persistent replay protection logging for onion message processing. Onion messages don't require replay protection since they don't involve payment routing.
Introduce OnionPeerActor to handle the sending of onion message to each peer. The actor is registered with the receptionist pattern to enable message routing through the actor system. Also adds onion message feature bits to the protocol, so that the actor is only spawned when the peer supports onion messages.
Add onion message forwarding capability using the OnionPeerActor for communication. Messages are routed through a receptionist pattern where each peer has a dedicated OnionPeerActor for handling message sends. The OnionEndpoint uses the sphinx router for decoding and decrypting the onion message packet and the encrypted recipient data in the payload of the onion messages.
Change SubscribeOnionMessages RPC to return OnionMessageUpdate instead of OnionMessage, including the decrypted payload to enable payload inspection in integration tests. The encrypted recipient data is still not decrypted here.
Previously, each peer created its own OnionEndpoint during Start(), requiring SphinxOnionMsg and OnionMessageServer to be passed through the peer config. This added unnecessary fields to the peer's config struct. This commit refactors onion message handling so that: - The OnionEndpoint is created once at server startup - The shared endpoint is passed to peers via config - Peers simply register the endpoint with their message router This reduces the peer config surface and properly encapsulates the OnionEndpoint's dependencies (sphinx router, resolver, message server) at the server level where they naturally belong.
This commit adds a configuration flag to disable onion messaging support. When set, lnd will: - Not advertise the onion messages feature bit (39) in init and node announcements - Skip creating the OnionEndpoint at server startup - Not register an onion message handler with peers, so incoming onion messages are not processed
Add an LRU cache to GraphNodeResolver to avoid repeated database lookups when resolving SCIDs to node public keys. The cache stores up to 1000 compressed pubkey entries, which is sufficient for typical onion message forwarding scenarios. This change also introduces a NewGraphNodeResolver constructor to properly initialize the cache, replacing direct struct literal usage.
In this commit we add FindPath, a BFS-based shsortest-path algorithm that finds routes through the channel graph for onion messages. The search filters nodes by the OnionMessage feature bits (38/39). We also add a unit tests covering: direct neighbor routing, multi-hop paths, feature-bit filtering, missing destination nodes, destination without onion support, max hop limits, cycle handling, and shortest-path selection. choice of BFS is because there isn't any weight involve.
Summary of ChangesHello @Abdulkbk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly advances onion message capabilities by introducing a graph-based pathfinding mechanism. It refactors the internal actor system for more robust message handling and establishes a dedicated endpoint for processing and forwarding onion messages. These changes lay the groundwork for more sophisticated onion message interactions within the network, allowing nodes to efficiently discover and utilize paths while adhering to new feature signaling standards. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
The newly added commit is 744769f. |
There was a problem hiding this comment.
Code Review
This pull request introduces graph-based pathfinding for onion messages and refactors the actor model's mailbox implementation. My review focuses on improving code clarity, documentation, and robustness in the new onionmessage and related packages. I've suggested simplifying a boolean flag in the message endpoint, correcting a function comment to align with the style guide, adding a missing function comment, and adjusting a length check for more precise error handling in payload decoding. Overall, the changes are well-structured and the new functionality is thoroughly tested.
| // NewNonFinalBlindedRouteData creates the data that's provided for hops within | ||
| // a blinded route. | ||
| func NewNonFinalBlindedRouteDataOnionMessage( |
There was a problem hiding this comment.
The comment for this function appears to be a copy of the one for NewNonFinalBlindedRouteData and doesn't accurately describe this new function. According to the repository style guide (lines 17-18), function comments should begin with the function name and describe its purpose. Please update the comment to reflect that this function is for onion messages and can handle both node IDs and SCIDs for the next hop.
| // NewNonFinalBlindedRouteData creates the data that's provided for hops within | |
| // a blinded route. | |
| func NewNonFinalBlindedRouteDataOnionMessage( | |
| // NewNonFinalBlindedRouteDataOnionMessage creates the data that's provided for | |
| // hops within a blinded route for an onion message. It can specify the next hop | |
| // by either public key or short channel ID. | |
| func NewNonFinalBlindedRouteDataOnionMessage( | |
References
- Every function must be commented with its purpose and assumptions, and the comment must begin with the function name. (link)
| func findPeerActor(receptionist *actor.Receptionist, pubKey [33]byte, | ||
| ) fn.Option[actor.ActorRef[*Request, *Response]] { |
There was a problem hiding this comment.
This function is missing a comment, which violates the repository style guide (line 17: "Every function must be commented with its purpose and assumptions."). Please add a comment that describes what this function does.
| func findPeerActor(receptionist *actor.Receptionist, pubKey [33]byte, | |
| ) fn.Option[actor.ActorRef[*Request, *Response]] { | |
| // findPeerActor looks up an onion peer actor in the receptionist using the | |
| // peer's public key. | |
| func findPeerActor(receptionist *actor.Receptionist, pubKey [33]byte, | |
| ) fn.Option[actor.ActorRef[*Request, *Response]] { | |
References
- Every function must be commented with its purpose and assumptions. (link)
| var isProcessedSuccessful bool = false | ||
|
|
||
| // Handle the routing action. | ||
| payload := fn.ElimEither(routingAction, | ||
| func(forwardAction forwardAction) *lnwire.OnionMessagePayload { | ||
| log.DebugS(logCtx, "Forwarding onion message", | ||
| lnutils.LogPubKey("next_node_id", | ||
| forwardAction.nextNodeID), | ||
| ) | ||
|
|
||
| err := o.forwardMessage( | ||
| ctx, forwardAction.nextNodeID, | ||
| forwardAction.nextPathKey, | ||
| forwardAction.nextPacket, | ||
| ) | ||
|
|
||
| if err != nil { | ||
| log.ErrorS(logCtx, "Failed to forward onion "+ | ||
| "message", err) | ||
| } | ||
|
|
||
| isProcessedSuccessful = true | ||
|
|
||
| return forwardAction.payload | ||
| }, | ||
| func(deliverAction deliverAction) *lnwire.OnionMessagePayload { | ||
| log.DebugS(logCtx, "Delivering onion message to self") | ||
| isProcessedSuccessful = true | ||
| return deliverAction.payload | ||
| }) | ||
|
|
||
| // Convert peer []byte to [33]byte. | ||
| var peerArr [33]byte | ||
| copy(peerArr[:], peer) | ||
|
|
||
| // Convert path key []byte to [33]byte. | ||
| var pathKeyArr [33]byte | ||
| copy(pathKeyArr[:], onionMsg.PathKey.SerializeCompressed()) | ||
|
|
||
| // Create the onion message update to send to subscribers. | ||
| update := &OnionMessageUpdate{ | ||
| Peer: peerArr, | ||
| PathKey: pathKeyArr, | ||
| OnionBlob: onionMsg.OnionBlob, | ||
| } | ||
|
|
||
| // If we have a payload, add its contents to our update. | ||
| if payload != nil { | ||
| customRecords := make(record.CustomSet) | ||
| for _, v := range payload.FinalHopTLVs { | ||
| customRecords[uint64(v.TLVType)] = v.Value | ||
| } | ||
| update.CustomRecords = customRecords | ||
| update.ReplyPath = payload.ReplyPath | ||
| update.EncryptedRecipientData = payload.EncryptedData | ||
| } | ||
|
|
||
| // Send the update to any subscribers. | ||
| if sendErr := o.onionMessageServer.SendUpdate(update); sendErr != nil { | ||
| log.ErrorS(logCtx, "Failed to send onion message update", | ||
| sendErr) | ||
|
|
||
| return false | ||
| } | ||
|
|
||
| return isProcessedSuccessful | ||
| } |
There was a problem hiding this comment.
The isProcessedSuccessful variable is initialized to false and then always set to true if the message can be routed. This logic can be simplified by removing the variable and returning true at the end of the function if no errors that warrant a false return are encountered. This would make the code's intent clearer.
// Handle the routing action.
payload := fn.ElimEither(routingAction,
func(forwardAction forwardAction) *lnwire.OnionMessagePayload {
log.DebugS(logCtx, "Forwarding onion message",
lnutils.LogPubKey("next_node_id",
forwardAction.nextNodeID),
)
err := o.forwardMessage(
ctx, forwardAction.nextNodeID,
forwardAction.nextPathKey,
forwardAction.nextPacket,
)
if err != nil {
log.ErrorS(logCtx, "Failed to forward onion "+
"message", err)
}
return forwardAction.payload
},
func(deliverAction deliverAction) *lnwire.OnionMessagePayload {
log.DebugS(logCtx, "Delivering onion message to self")
return deliverAction.payload
})
// Convert peer []byte to [33]byte.
var peerArr [33]byte
copy(peerArr[:], peer)
// Convert path key []byte to [33]byte.
var pathKeyArr [33]byte
copy(pathKeyArr[:], onionMsg.PathKey.SerializeCompressed())
// Create the onion message update to send to subscribers.
update := &OnionMessageUpdate{
Peer: peerArr,
PathKey: pathKeyArr,
OnionBlob: onionMsg.OnionBlob,
}
// If we have a payload, add its contents to our update.
if payload != nil {
customRecords := make(record.CustomSet)
for _, v := range payload.FinalHopTLVs {
customRecords[uint64(v.TLVType)] = v.Value
}
update.CustomRecords = customRecords
update.ReplyPath = payload.ReplyPath
update.EncryptedRecipientData = payload.EncryptedData
}
// Send the update to any subscribers.
if sendErr := o.onionMessageServer.SendUpdate(update); sendErr != nil {
log.ErrorS(logCtx, "Failed to send onion message update",
sendErr)
return false
}
return true| // If we have the correct type, and the length is sufficient (first node | ||
| // pubkey (33) + blinding point (33) + hop count (1) = 67 bytes), decode | ||
| // the reply path. | ||
| if p, ok := val.(*sphinx.BlindedPath); ok && l > 67 { |
There was a problem hiding this comment.
The check l > 67 is a bit too strict. A reply path payload must be at least 67 bytes. If a payload of exactly 67 bytes is received, this check will fail and a generic TLV decoding error will be returned. However, a 67-byte payload could contain a hop count of 0, which should result in the more specific ErrNoHops error. Changing the check to l >= 67 would allow the function to handle this case more gracefully and provide a more precise error.
| if p, ok := val.(*sphinx.BlindedPath); ok && l > 67 { | |
| if p, ok := val.(*sphinx.BlindedPath); ok && l >= 67 { |
Change Description
Part of #10220
Depends on #10089
This PR builds on top of #10089 (onion message forwarding) and is part of the onion message tracking issue #10220. The forwarding layer can relay messages hop-by-hop, but a sender still needs a way to find a path through the graph to a destination. This PR adds that piece.
A FindPath function in
onionmessage/pathfind.gothat discovers a route from a local node to a destination through nodes that advertise onion message support (feature bit 38/39, OnionMessagesOptional/OnionMessagesRequired).We use
Breadth-First Search(BFS) over hop count. Since onion messages are not priced by fees or liquidity, the meaningful routing metric is path length. Shorter paths reduce the relay burden on intermediate nodes and lower latency. BFS naturally finds the minimum-hop path, which is appropriate for the channel graph. Intermediate nodes are included only if they advertise support for onion messages (bits 39/38).A set of unit tests is included to test for various scenarios.