
Avoid parsing PublicKeys when handling RGS updates #3581

Merged

TheBlueMatt merged 4 commits into lightningdevkit:main on Feb 4, 2025

Conversation

TheBlueMatt
Collaborator

The first commit fixes a regression in 0.1 and should be backported, the second is just an optimization but shouldn't be backported as it changes the public API.

arik-so previously approved these changes Jan 31, 2025
@TheBlueMatt
Collaborator Author

Oh, new commit improves the bench some 25% or so, too.

arik-so previously approved these changes Jan 31, 2025
Contributor

@tnull left a comment


LGTM, could just use some more verbose comments/log messages.

@tnull
Contributor

tnull commented Jan 31, 2025

That said, CI is sad as the PublicKey to NodeId change broke something.

arik-so previously approved these changes Jan 31, 2025
@tnull
Contributor

tnull commented Feb 1, 2025

Feel free to squash.

Member

@shaavan left a comment


Overall, LGTM!

Since there aren’t any direct code references explaining why we switch to using NodeId in RGS, it might be helpful to add a quick note in the commit message (of ddc3326) on why skipping PublicKey verification is safe in this case.
I think adding a little extra context could make it clearer for future readers!

`PublicKey` parsing is relatively expensive as we have to check if
the point is actually on the curve. To avoid it, our `NetworkGraph`
uses `NodeId`s which don't have the validity requirement.

Sadly, we were always parsing the broadcasting node's `PublicKey`
from the `node_id` in the network graph whenever we see an update
for that channel, whether we have a corresponding signature or not.

Here we fix this, only parsing the public key (and hashing the
message) if we're going to check a signature.
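
A minimal sketch of that pattern, assuming the `secp256k1` crate; the `NodeId` struct and `apply_update` function here are illustrative stand-ins, not LDK's actual API:

```rust
use secp256k1::{ecdsa::Signature, Message, PublicKey, Secp256k1, VerifyOnly};

/// A 33-byte node id stored as raw bytes, with no guarantee that it is a
/// valid curve point (mirroring the `NodeId` described above).
struct NodeId([u8; 33]);

fn apply_update(
    secp: &Secp256k1<VerifyOnly>,
    node_id: &NodeId,
    msg_hash: impl FnOnce() -> Message, // hash lazily, like the parsing
    sig: Option<&Signature>,
) -> Result<(), secp256k1::Error> {
    // Only pay for point decompression/validation (and message hashing)
    // when there is actually a signature to check.
    if let Some(sig) = sig {
        let pk = PublicKey::from_slice(&node_id.0)?;
        secp.verify_ecdsa(&msg_hash(), sig, &pk)?;
    }
    // ...apply the update to the graph using only the raw `NodeId` bytes...
    Ok(())
}
```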
`PublicKey` parsing is relatively expensive as we have to check if
the point is actually on the curve. To avoid it, our `NetworkGraph`
uses `NodeId`s which don't have the validity requirement.

Here, we take advantage of that in RGS application to avoid parsing
`PublicKey`s, improving performance.
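
To illustrate why this helps, a hedged sketch (again with hypothetical types, not LDK's real definitions) of how a byte-wise `NodeId` keeps secp256k1 parsing off the lookup path entirely:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId([u8; 33]);

impl NodeId {
    /// Copy the bytes as-is. Unlike `PublicKey::from_slice`, this never
    /// has to verify that the point lies on the curve.
    fn from_slice(bytes: &[u8]) -> Option<Self> {
        Some(NodeId(bytes.try_into().ok()?))
    }
}

/// Look up a node by its serialized id using byte-wise hashing and
/// comparison only; no elliptic-curve math on the hot path.
fn lookup<'a, T>(nodes: &'a HashMap<NodeId, T>, raw_id: &[u8]) -> Option<&'a T> {
    nodes.get(&NodeId::from_slice(raw_id)?)
}
```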
When we build a new `NetworkGraph` from empty, we're generally
doing an initial startup and will be syncing the graph very soon.
Using an initially-empty `IndexedMap` for the `channels` and
`nodes` results in quite some memory churn, with the initial RGS
application benchmark showing 15% of its time in pagefault handling
alone (i.e. allocating new memory from the OS, let alone the 23%
of time in `memmove`).

Further, when deserializing a `NetworkGraph`, we'd swapped the
expected node and channel count constants, leaving the node map
too small and causing map doubling as we read entries from disk.

Finally, when deserializing, allocating only exactly the amount of
map entries we need is likely to lead to at least one doubling, so
we're better off just over-estimating the number of nodes and
channels and allocating what we want.

Here we just always allocate `channels` and `nodes` based on
constants, leading to a 20%-ish speedup in the initial RGS
application benchmark.
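
A sketch of the allocation strategy described above; the constants and field types are illustrative stand-ins (LDK uses an `IndexedMap`, and its actual estimates differ):

```rust
use std::collections::HashMap;

// Deliberate over-estimates of the current graph size, so neither a
// fresh sync nor deserialization triggers repeated map doubling.
const CHANNEL_COUNT_ESTIMATE: usize = 125_000;
const NODE_COUNT_ESTIMATE: usize = 20_000;

struct NetworkGraph {
    channels: HashMap<u64, ()>,   // scid -> channel info (elided)
    nodes: HashMap<[u8; 33], ()>, // node id -> node info (elided)
}

impl NetworkGraph {
    fn new() -> Self {
        // Allocate from constants whether we start empty or read a
        // serialized graph: sizing exactly to the on-disk entry count
        // would likely force at least one doubling as the graph grows.
        NetworkGraph {
            channels: HashMap::with_capacity(CHANNEL_COUNT_ESTIMATE),
            nodes: HashMap::with_capacity(NODE_COUNT_ESTIMATE),
        }
    }
}
```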
@TheBlueMatt
Collaborator Author

Squashed with one extra commit at the top that just adds more detail to the NodeId docs to describe why it exists.

@tnull
Contributor

tnull commented Feb 3, 2025

Mhh, seems fuzz is failing. Not sure if related, but it blocks merging.

@TheBlueMatt
Collaborator Author

Hmm, it's possible this caused performance of some fuzz target to regress, but I think it's more likely GitHub just decided to run it on a slower machine or so, so I'm gonna disable the fuzz-required check and merge.

@TheBlueMatt TheBlueMatt merged commit be93dc3 into lightningdevkit:main Feb 4, 2025
23 of 25 checks passed
@TheBlueMatt
Collaborator Author

Backported the first (fixing the perf regression) and third (more correctly reserving the graph structures for a large performance win) commits in #3613

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Apr 3, 2025
v0.1.2 - Apr 02, 2025 - "Foolishly Edgy Cases"

API Updates
===========

 * `lightning-invoice` is now re-exported as `lightning::bolt11_invoice`
   (lightningdevkit#3671).

Performance Improvements
========================

 * `rapid-gossip-sync` graph parsing is substantially faster, resolving a
   regression in 0.1 (lightningdevkit#3581).
 * `NetworkGraph` loading is now substantially faster and does fewer
   allocations, resulting in a 20% further improvement in `rapid-gossip-sync`
   loading when initializing from scratch (lightningdevkit#3581).
 * `ChannelMonitor`s for closed channels are no longer always re-persisted
   immediately after startup, reducing on-startup I/O burden (lightningdevkit#3619).

Bug Fixes
=========

 * BOLT 11 invoices longer than 1023 bytes (and up to 7089 bytes) now
   parse properly (lightningdevkit#3665).
 * In some cases, when using synchronous persistence with higher latency than
   the latency to communicate with peers, when receiving an MPP payment with
   multiple parts received over the same channel, a channel could hang and not
   make progress, eventually leading to a force-closure due to timed-out HTLCs.
   This has now been fixed (lightningdevkit#3680).
 * Some rare cases with multi-hop BOLT 11 route hints or multiple redundant
   blinded paths that could have led to the router creating invalid `Route`s
   were fixed (lightningdevkit#3586).
 * Corrected the decay logic in `ProbabilisticScorer`'s historical buckets
   model. Note that by default historical buckets are only decayed if no new
   datapoints have been added for a channel for two weeks (lightningdevkit#3562).
 * `{Channel,Onion}MessageHandler::peer_disconnected` will now be called if a
   different message handler refused connection by returning an `Err` from its
   `peer_connected` method (lightningdevkit#3580).
 * If the counterparty broadcasts a revoked state with pending HTLCs, those
   will now be claimed with other outputs which we consider to not be
   vulnerable to pinning attacks if they are not yet claimable by our
   counterparty, potentially reducing our exposure to pinning attacks (lightningdevkit#3564).