
lightning-block-sync: Implement serialization logic for header Cache types #3600


Open · tnull wants to merge 1 commit into main from 2025-02-header-cache-persistence

Conversation

@tnull (Contributor) commented Feb 13, 2025

During syncing, lightning-block-sync populates a block header Cache that is crucial for being able to cleanly disconnect previously-connected headers in case of a reorg. Moreover, the Cache can have performance benefits, as subsequently synced listeners might not need to look up all headers from the chain source again.

While this Cache is ~crucial to the clean operation of lightning-block-sync, it was previously not possible to persist it to disk due to an absence of serialization logic for the corresponding sub-types. Here, we implement said serialization logic to allow users to persist the Cache.

Making use of the serialization logic for all the sub-types, we also switch the UnboundedCache type to be a newtype wrapper around a HashMap (rather than a straight typedef) and implement TLV-based serialization logic on it.
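
For reviewers skimming the diff, the rough shape of the UnboundedCache change is the following (illustrative sketch only; the exact field layout, constructors, and the TLV-based Writeable/Readable impls live in the patch itself):

```rust
use std::collections::HashMap;

use bitcoin::BlockHash;
use lightning_block_sync::poll::ValidatedBlockHeader;
use lightning_block_sync::Cache;

// `UnboundedCache` becomes a newtype wrapper instead of a plain typedef, so the
// crate can hang serialization logic off of it.
pub struct UnboundedCache(HashMap<BlockHash, ValidatedBlockHeader>);

impl Cache for UnboundedCache {
    fn look_up(&self, block_hash: &BlockHash) -> Option<&ValidatedBlockHeader> {
        self.0.get(block_hash)
    }

    fn block_connected(&mut self, block_hash: BlockHash, block_header: ValidatedBlockHeader) {
        self.0.insert(block_hash, block_header);
    }

    fn block_disconnected(&mut self, block_hash: &BlockHash) -> Option<ValidatedBlockHeader> {
        self.0.remove(block_hash)
    }
}
```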

@tnull requested a review from jkczyz on February 13, 2025
@tnull force-pushed the 2025-02-header-cache-persistence branch from bed21f8 to a403bf9 on February 13, 2025 at 13:44
@tnull force-pushed the 2025-02-header-cache-persistence branch from a403bf9 to 0207228 on February 13, 2025 at 13:52
@TheBlueMatt (Collaborator) commented

I'm not sure I quite understand the desire to serialize this. We use it to keep track of things as we fetch them, but generally shouldn't ever need the cache entries after we connect the blocks (and persist the things that we connected the blocks to).

@tnull (Contributor, Author) commented Feb 13, 2025

I'm not sure I quite understand the desire to serialize this. We use it to keep track of things as we fetch them, but generally shouldn't ever need the cache entries after we connect the blocks (and persist the things that we connected the blocks to).

Talked with @jkczyz offline about this: AFAIU, we use the cache to keep headers around that might have been connected to another fork. Say we synced a listener up to a certain tip. If it now goes offline, and a reorg happens in the meantime, lightning-block-sync might not have the necessary old header data around to call block_disconnected before re-connecting the blocks of the new best chain.

Additionally, it might bring quite some performance improvements for lightning_block_sync::init::synchronize_listeners, which first walks the chain for each listener individually before connecting blocks to them, and only in this last step does it actually insert into the cache. Meaning: if we sync 6 chain listeners, we're walking the header chain 6 times completely uncached on restart before entering the 'block connection phase'. If we don't persist the Cache, that is.
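
For context, the kind of downstream usage I have in mind looks roughly like this (hypothetical sketch; the function names, paths, and error handling are illustrative, and it assumes the Writeable/Readable impls this PR adds on UnboundedCache):

```rust
use std::io::Cursor;
use std::path::Path;

use lightning::util::ser::{Readable, Writeable};
use lightning_block_sync::UnboundedCache;

// Persist the header cache alongside the node's other data, e.g. on shutdown.
fn persist_cache(cache: &UnboundedCache, path: &Path) -> std::io::Result<()> {
    std::fs::write(path, cache.encode())
}

// On restart, hand the deserialized cache back to the block sync logic; `None`
// simply means we start over with an empty cache.
fn load_cache(path: &Path) -> Option<UnboundedCache> {
    let bytes = std::fs::read(path).ok()?;
    UnboundedCache::read(&mut Cursor::new(bytes)).ok()
}
```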

@jkczyz (Contributor) commented Feb 13, 2025

Additionally, it might bring quite some performance improvements for lightning_block_sync::init::synchronize_listeners, which first walks the chain for each listener individually before connecting blocks to them, and only in this last step does it actually insert into the cache. Meaning: if we sync 6 chain listeners, we're walking the header chain 6 times completely uncached on restart before entering the 'block connection phase'. If we don't persist the Cache, that is.

Alternatively (or in addition), we could manually update the cache at each iteration during initialization from the ChainDifference, which would mean we'd want some AppendOnlyCache instead of a ReadOnlyCache there. That way we'd only need to query the chain source once for anything not yet cached prior to restarting.
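
Something along these lines, purely as a hypothetical sketch (AppendOnlyCache doesn't exist today; the method signatures shown follow my reading of the public Cache trait):

```rust
use bitcoin::BlockHash;
use lightning_block_sync::poll::ValidatedBlockHeader;
use lightning_block_sync::Cache;

// Append-only wrapper: look-ups and inserts pass through to the user's cache,
// but nothing is ever evicted while initialization walks the chain back.
struct AppendOnlyCache<'a, C: Cache>(&'a mut C);

impl<'a, C: Cache> Cache for AppendOnlyCache<'a, C> {
    fn look_up(&self, block_hash: &BlockHash) -> Option<&ValidatedBlockHeader> {
        self.0.look_up(block_hash)
    }

    fn block_connected(&mut self, block_hash: BlockHash, block_header: ValidatedBlockHeader) {
        // Newly fetched headers can be added during initialization...
        self.0.block_connected(block_hash, block_header);
    }

    fn block_disconnected(&mut self, _block_hash: &BlockHash) -> Option<ValidatedBlockHeader> {
        // ...but entries are intentionally never removed here.
        None
    }
}
```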

@TheBlueMatt (Collaborator) commented Feb 13, 2025

If it now goes offline, and a reorg happens in the meantime, lightning-block-sync might not have the necessary old header data around to call block_disconnected before re-connecting the blocks of the new best chain.

This is true if you switch to a new bitcoind, but AFAIU using the same bitcoind (data dir) will always provide us with the missing block (headers).

Additionally, it might bring quite some performance improvements for lightning_block_sync::init::synchronize_listeners, which first walks the chain for each listener individually before connecting blocks to them, and only in this last step does it actually insert into the cache. Meaning: if we sync 6 chain listeners, we're walking the header chain 6 times completely uncached on restart before entering the 'block connection phase'. If we don't persist the Cache, that is.

This seems like a bug we could fix another way, no?

@tnull (Contributor, Author) commented Feb 13, 2025

Alternatively (or in addition), we could manually update the cache at each iteration during initialization from the ChainDifference, which would mean we'd want some AppendOnlyCache instead of a ReadOnlyCache there. That way we'd only need to query the chain source once for anything not yet cached prior to restarting.

I guess we could make the cache append only, but that's ~orthogonal to this PR, IMO.

This is true if you switch to a new bitcoind, but AFAIU using the same bitcoind (data dir) will always provide us with the missing block (headers).

Right, so if we want to be able to safely handle switching chain sources, we need to at least support a minimal persisted cache (in LDK Node we restrict it to 100 entries currently, but could even get away with ~ANTI_REORG_DELAY, IIUC).

This seems like a bug we could fix another way, no?

I don't think it's a bug per se. We need to walk the chain for each listener separately to make sure we find the connection points reliably. It would be nice to make use of caching during that, though.

@TheBlueMatt (Collaborator) commented

Right, so if we want to be able to safely handle switching chain sources, we need to at least support a minimal persisted cache (in LDK Node we restrict it to 100 entries currently, but could even get away with ~ANTI_REORG_DELAY, IIUC).

If we want to support this, IMO it's pretty weird to require the cache be persisted by a downstream project. Rather, if we need it (we should check whether we can just remove the header in the disconnect call and not worry about it; I think we probably can), we should store a few block headers/hashes going back in the listeners themselves, because they're the things responsible for communicating their current chain location, no?

@tnull (Contributor, Author) commented Feb 14, 2025

If we want to support this, IMO it's pretty weird to require the cache be persisted by a downstream project. Rather, if we need it (we should check whether we can just remove the header in the disconnect call and not worry about it; I think we probably can), we should store a few block headers/hashes going back in the listeners themselves, because they're the things responsible for communicating their current chain location, no?

On LDK Node's end it would be nice to maintain the cache, also as a performance optimization. And IMO we can leave it to the user to decide if/how much data they want to persist. In any case, I'm not sure why we wouldn't want to allow for the cache to be persisted?

@TheBlueMatt (Collaborator) commented

I'm a bit confused, though: the cache should really only be relevant when we're reorging, which shouldn't be common? Why is it a material performance difference?

@tnull (Contributor, Author) commented Mar 5, 2025

I'm a bit confused, though: the cache should really only be relevant when we're reorging, which shouldn't be common? Why is it a material performance difference?

Well, a persisted cache could help performance when resuming syncing in the initial 'walking the chain' step. As also discussed offline, it's also required to allow for switching chain sources, e.g., switching from one bitcoind instance to another, as we can't be sure that all nodes have seen the same forks and keep the data readily available. So even if you don't follow the former reason, I'd still argue we should support the latter in LDK Node / Server to make the chain syncing logic more robust.

@TheBlueMatt (Collaborator) commented

Okay, but in this case we should probably persist a fixed-size cache of, let's say, 6 blocks (the depth beyond which we assume no reorgs will happen), no?

@tnull (Contributor, Author) commented Apr 10, 2025

Okay, but in this case we should probably persist a fixed-size cache of, let's say, 6 blocks (the depth beyond which we assume no reorgs will happen), no?

Well, maybe, but a) I think that should be up to the user's Cache implementation and b) even if we go down that road in the future, we'd still need the changes in this PR.
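
For illustration, such a user-side bounded cache could look roughly like this (hypothetical sketch, not LDK Node's actual implementation):

```rust
use std::collections::{HashMap, VecDeque};

use bitcoin::BlockHash;
use lightning_block_sync::poll::ValidatedBlockHeader;
use lightning_block_sync::Cache;

// Keeps at most `capacity` headers, evicting the oldest-inserted entry first.
pub struct BoundedCache {
    headers: HashMap<BlockHash, ValidatedBlockHeader>,
    order: VecDeque<BlockHash>,
    capacity: usize,
}

impl BoundedCache {
    pub fn new(capacity: usize) -> Self {
        Self { headers: HashMap::new(), order: VecDeque::new(), capacity }
    }
}

impl Cache for BoundedCache {
    fn look_up(&self, block_hash: &BlockHash) -> Option<&ValidatedBlockHeader> {
        self.headers.get(block_hash)
    }

    fn block_connected(&mut self, block_hash: BlockHash, block_header: ValidatedBlockHeader) {
        if self.headers.insert(block_hash, block_header).is_none() {
            self.order.push_back(block_hash);
        }
        // Evict the oldest entries once we exceed the configured capacity.
        while self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.headers.remove(&oldest);
            }
        }
    }

    fn block_disconnected(&mut self, block_hash: &BlockHash) -> Option<ValidatedBlockHeader> {
        self.order.retain(|hash| hash != block_hash);
        self.headers.remove(block_hash)
    }
}
```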

Would be great to get this going again, as I'd like to be able to make use of cache persistence in LDK Node after the next LDK release.

@TheBlueMatt (Collaborator) commented

I still really strongly disagree with the approach of persisting the cache.

In the "performance improvement" case, we shouldn't be relying on downstream-of-LDK applications to fix lightning-block-sync's performance issues. We have a header cache in synchronize_listeners and we even use it for the poller passed to find_difference, the performance issue is actually that we don't use the header cache when doing header lookups in, eg, UnboundedCache. If we fixed that, then we'd only load headers for the walk-back once (as long as all objects are at the same tip, but in any case we'll only ever load headers once). That seems fine given, in the absence of reorgs, we'd actually only ever load headers we hadn't seen when we shut down.

For the "switching chain sources" issue, I'd very much prefer to look into this approach, again because LDK requiring that downstream applications work around API limitations is a really bad thing in general:

If we want to support this, IMO it's pretty weird to require the cache be persisted by a downstream project. Rather, if we need it (we should check whether we can just remove the header in the disconnect call and not worry about it; I think we probably can), we should store a few block headers/hashes going back in the listeners themselves, because they're the things responsible for communicating their current chain location, no?

@tnull (Contributor, Author) commented May 27, 2025

Excuse the delay here.

In the "performance improvement" case, we shouldn't be relying on downstream-of-LDK applications to fix lightning-block-sync's performance issues. We have a header cache in synchronize_listeners and we even use it for the poller passed to find_difference, the performance issue is actually that we don't use the header cache when doing header lookups in, eg, UnboundedCache.

No, but the cache is immutable at that stage on purpose, to avoid essential entries being dropped (e.g., in case of a bounded cache) as we walk back to the latest known connection point. We even have ReadOnlyCache to ensure that. AFAIU, the cache was always meant to be persisted as it's crucial for reorg handling, and not adding support for this was merely an oversight. (cc @jkczyz, please correct me on this if I'm wrong.)

If we fixed that, then we'd only load headers for the walk-back once (as long as all objects are at the same tip, but in any case we'll only ever load headers once). That seems fine given that, in the absence of reorgs, we'd actually only ever load headers we hadn't seen when we shut down.

No, because we walk (and arguably need to walk) each listener individually back to the last connection point it knows, and during that we can't overwrite/remove entries from the cache, as that would mess with the data we need for proper reorg handling.

I don't think we need to/should fundamentally rewrite and rethink all of lightning-block-sync just to mitigate the issues at hand. This PR is really all we need.

For the "switching chain sources" issue, I'd very much prefer to look into this approach, again because LDK requiring that downstream applications work around API limitations is a really bad thing in general:

So you'd rather have lightning-block-sync take a KVStore, with all the async/blocking messiness to come, and take care of cache persistence itself, likely then as yet another thing driven by the background processor? Do we expect so many users of 'raw' lightning-block-sync going forward that prioritizing that API over a clear separation of concerns in our code makes sense? Why can't I just persist the cache and give it back to lightning-block-sync after reinit, just like LDK requires me to do with all these other objects?
