-
Notifications
You must be signed in to change notification settings - Fork 116
Fix two circular Arc references
#736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix two circular Arc references
#736
Conversation
|
I've assigned @tnull as a reviewer! |
58ab500 to
05d7a7d
Compare
|
Note that this is (obviously) blocked on landing the upstream LDK PR. |
|
This needs a rebase now that lightningdevkit/rust-lightning#4294 and #743 landed. When you bump the LDK dependency to |
`LiquiditySource` takes a reference to our `PeerManager` but the `PeerManager` holds an indirect reference to the `LiquiditySource`. As a result, after our `Node` instance is `stop`ped and the `Node` `drop`ped, much of the node's memory will stick around, including the `NetworkGraph`. Here we fix this issue by using `Weak` pointers, though note that there is another issue caused by LDK's gossip validation API.
In added logic to use the HRN resolver from `bitcoin-payment-instructions`, we created a circular `Arc` reference - the `LDKOnionMessageDNSSECHrnResolver` is used as a handler for the `OnionMessenger` but we also set a post-queue-action which holds a reference to the `PeerManager`. As a result, after our `Node` instance is `stop`ped and the `Node` `drop`ped, much of the node's memory will stick around, including the `NetworkGraph`. Here we fix this issue by using `Weak` pointers.
05d7a7d to
150b470
Compare
LDK's gossip validation API basically forced us to have a circular `Arc` reference, leading to memory leaks after `drop`ping an instance of `Node`. This is fixed upstream in LDK PR #4294 which we update to here.
150b470 to
93c38bd
Compare
|
Cleaned up and rebased. Note that a second arc circular ref slipped in in the interim. IMO the first commit should be backported as it by itself fixes |
1d9d3ba to
4e1df76
Compare
Due to two circular `Arc` references, after `stop`ping and `drop`ping the `Node` instance the bulk of ldk-node's memory (in the form of the `NetworkGraph`) would hang around. Here we add a test for this in our integration tests, checking if the `NetworkGraph` (as a proxy for other objects held in reference by the `PeerManager`) hangs around after `Node`s are `drop`ped.
4e1df76 to
97a750f
Compare
Thanks, will take a look. CI failure is unrelated (see #748). |
tnull
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, two questions regarding the cycle tests commit.
| - name: Run CLN integration tests | ||
| run: RUSTFLAGS="--cfg cln_test" cargo test --test integration_tests_cln | ||
| run: RUSTFLAGS="--cfg cln_test --cfg cycle_tests" cargo test --test integration_tests_cln |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, any reason we need to add this cfg to all integration tests? Why not only on the Rust tests where the test lives?
| }) | ||
| } | ||
|
|
||
| #[cfg(cycle_tests)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than leaking internals and adding all that boilerplate which only makes sense in the cycle_tests context, could we just add a LeakChecker object (probably living in src/leak_checker.rs) taking a number of weak references and holding it as the last field in Node. This LeakChecker could then check all counts are 0 in its drop implementation? Seems like that would a) encapsulate the checks more cleanly b) could also be easily used to check that more (most?) of the main object Arcs aren't leaking on Node::drop?
Based on lightningdevkit/rust-lightning#4294, basically, but the first fix should probably get backported to 0.7 as it fixes the issue for RGS ldk-node instances even though we'll have to wait for an LDK update to fix the issue for P2P gossip instances.