kvdb+channeldb: speed up graph cache #6111
Conversation
kvdb/postgres/readwrite_bucket.go (Outdated)

func (b *readWriteBucket) Prefetch(paths ...[]string) {}

func (b *readWriteBucket) ForEachFast(cb func(k, v []byte) error) error {
How about naming this ForAll? Since it'd be a valid question to ask why ForEach is not as fast as ForEachFast.
Hm, naming is difficult. ForEach and ForAll also sound like the same thing to me. The functional difference is that the faster one doesn't allow further queries in the callback. Maybe that can somehow be expressed in the name.
Aha, I missed the part about not allowing more queries in the callback. What I expected instead was that it would simply batch-fetch more items at once under the hood.
Also weighing in on the naming discussion :) How about ForEachUnordered (the main promise of the cursor-based ForEach is the sort by key and seek functionality, right?) or ForEachNative (implying that the underlying implementation might decide on the best approach for executing it; could also mention that no specific ordering is guaranteed).
This is what I wanted to do originally with the cursor. In that case, no new method is needed either. But I remember there was a problem with that; what was it again?
I don't think it was writing to the bucket inside the loop, because that is already not supported with bbolt - or is it? Don't we have these iterate-and-delete loops? We could also do the optimization only for read-only transactions.
Or alternatively we can use start (index). That would translate to OFFSET .. LIMIT .. in SQL.
I think it is still the case with GetAllPaginated that the callback isn't allowed to do other queries, so maybe this is just scope creep beyond a single ForAll?
Or maybe your idea is to not have a callback with GetAllPaginated and return []KV?
Interesting. So this would also allow nested keys to be retrieved in a single query. And besides limit you'd also want to specify some kind of fromKey, right?

Yes.

> Or alternatively we can use start (index). That would translate to OFFSET .. LIMIT .. in SQL.

I think fromKey is more "portable".

> I think it is still the case with GetAllPaginated that the callback isn't allowed to do other queries, so maybe this is just scope creep beyond a single ForAll?

Maybe we should go even further and add it to the kvdb.DB interface as an extension. That'd indicate that it's a pure optimization thing and is not needed to be compatible with the rest of the kvdb interfaces.

> Or maybe your idea is to not have a callback with GetAllPaginated and return []KV?

Yeah, sgtm!
I get the idea, but after sleeping on it, I think we should do the minimum that is necessary at this point: just fix the immediate problem with postgres startup loading.
If another critical optimization comes up, we can look at what is needed for that. I don't expect that updating the ForAllNative that we use now to something slightly different in the future will be a problem.

Concept ACK.
Force-pushed 2fc8656 to cd403eb.
Concept ACK from my side as well.
I'm asking myself why ForEach would be implemented using cursors in the first place. I get that this is coming from porting over the bbolt bucket logic. But thinking about it now, is there ever a reason to use cursors instead of basically doing what ForEachFast does now (but maybe with an ORDER BY clause)?
kvdb/postgres/readwrite_bucket.go (Outdated)

func (b *readWriteBucket) Prefetch(paths ...[]string) {}

func (b *readWriteBucket) ForEachFast(cb func(k, v []byte) error) error {
> Also weighing in on the naming discussion :) How about ForEachUnordered (the main promise of the cursor-based ForEach is the sort by key and seek functionality, right?) or ForEachNative (implying that the underlying implementation might decide on the best approach for executing it; could also mention that no specific ordering is guaranteed).

The reason to use a cursor (or really a single query per row) is that it wouldn't be possible otherwise to query within the callback. While the SELECT is running, nothing else can be done in that same tx.
Force-pushed 0d43346 to 6270ba1.
Force-pushed 6270ba1 to 91ab9ea.
Cleaned up PR, added test coverage for graph cache population.
Force-pushed 4bd2ad3 to f3802e2.
I discovered that …
Did a first level pass. Very nice performance improvement!
Will run some performance and other manual tests during the next pass.
Also, this can be rebased now that #6116, which fixes the postgres itests, is merged.
Allows for pure deserialization without depending on a database connection.
Force-pushed 4003cb1 to 0cca454.
Yes, the problem for low-ish memory environments was that there was a huge initial spike in memory, but then most of it could be garbage collected. So a lot of throw-away instances/allocations.
I ran some tests (imported a mainnet channel graph into a regtest node with the help of #6149).

Current master (ed511bb): database open time 2m 8s. Memory profile: [memory profile screenshot]

This PR (0cca4549): database open time 3.3s. Memory profile: [memory profile screenshot]

So while this PR massively decreases the time to initialize the graph cache (and also fixes the …), … While I think we could still optimize the memory usage (by effectively making it possible to re-use the …), …
LGTM 👍

err := kvdb.ForAll(edges, func(k, edgeBytes []byte) error {
	// Skip embedded buckets.
	if bytes.Equal(k, edgeIndexBucket) ||
I'm thinking maybe a more future-proof way to avoid iterating the embedded buckets is to check if the value is nil. Not perfect either, but AFAIK there's no nil-valued key in this bucket other than sub-buckets. The unknown policy is an empty slice ([]byte{}), so that would work too.
I made it explicit as a sanity check that there is nothing unexpected in this bucket. But you could also consider it less future-proof to not skip over unexpected data.
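To illustrate the suggestion above, a minimal sketch of the value-based check (assuming, as with bbolt's ForEach, that nested buckets are surfaced with a nil value; the helper name is hypothetical):

```go
// skipNestedBuckets wraps an edge callback so that nested buckets, which
// show up with a nil value, are silently skipped. Real entries, including
// the empty-slice ([]byte{}) "unknown policy" marker, have non-nil values.
// Illustrative only; the PR instead compares keys explicitly.
func skipNestedBuckets(cb func(k, v []byte) error) func(k, v []byte) error {
	return func(k, v []byte) error {
		if v == nil {
			return nil
		}
		return cb(k, v)
	}
}
```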
// Skip ahead:
// - LastUpdate (8 bytes)
if _, err := r.Read(node.nodeScratch[:]); err != nil {
	return err
if _, err := r.Read(nodeScratch[:]); err != nil {
I dug into this a bit and I think many of the allocations might just be due to us using the io.Reader interface here, since both Read and ReadFull do allocate. We may in the future want to switch to just passing around the byte slices to these deserialize functions and just copy out the important parts.
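As a rough sketch of the slice-based approach (hypothetical helper, not the actual lnd deserialization code):

```go
import "io"

// next returns the first n bytes of buf without copying, plus the remainder,
// avoiding the per-call allocations an io.Reader-based Read can incur.
// Hypothetical helper for illustration only.
func next(buf []byte, n int) (field, rest []byte, err error) {
	if len(buf) < n {
		return nil, nil, io.ErrUnexpectedEOF
	}
	return buf[:n], buf[n:], nil
}
```

Skipping the 8-byte LastUpdate field would then just be `_, buf, err = next(buf, 8)`, with no scratch buffer or copy involved.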
Seems related to: #4884
@guggero at what point exactly did you insert the memory profile write action? Haven't used the mem profiler much and trying to repro. [update] …
I extracted the memory profile through HTTP (using …).
So you timed it somewhere during the cache population phase? As mentioned, I know embarrassingly little about mem profiling, but will what you did get you the info you're looking for?
I waited until the log said the cache was filled. Then I ran …
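For reference, a programmatic variant of the same thing: a sketch (assuming a hook right after cache population) that writes a heap profile to disk. Note it captures live allocations at that moment, not the transient peak.

```go
import (
	"os"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap profile to path. Called right after
// the graph cache is populated, it shows what is still live at that point;
// allocations the GC has already reclaimed won't appear in inuse_space.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	return pprof.WriteHeapProfile(f)
}
```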
Does this PR also need #6136? Or will that one be what ultimately gets merged?
Alternatively you could do something like this @joostjager: …

Assuming … Timing is somewhat tricky as Oliver mentions, since it's based on that logging point. Alternatively you could rig lnd to shut down as soon as the graph is populated, and set the …
defer cancel()

for rows.Next() {
	var key, value []byte
Declare at the top of this loop, with the assumption that the caller won't use the byte slices outside of the scope of the closure? A similar assumption exists w.r.t the way bbolt works.
For ForEach, the bbolt docs say:

> Please note that keys and values in ForEach() are only valid while the transaction is open. If you need to use a key or value outside of the transaction, you must use copy() to copy it to another byte slice.

So the scope for this is wider than just the callback?
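For illustration, a sketch of the kind of iteration being discussed (hypothetical, not the actual lnd code), with key/value declared inside the loop:

```go
import "database/sql"

// forAllRows hands every key/value pair of a query result to cb. Declaring
// key and value inside the loop means each callback invocation gets its own
// freshly scanned slices; an implementation that reused buffers instead
// (e.g. by scanning into sql.RawBytes) would need a bbolt-style contract
// where the callback must copy anything it keeps.
func forAllRows(rows *sql.Rows, cb func(k, v []byte) error) error {
	for rows.Next() {
		var key, value []byte
		if err := rows.Scan(&key, &value); err != nil {
			return err
		}
		if err := cb(key, value); err != nil {
			return err
		}
	}
	return rows.Err()
}
```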
// First, load all edges in memory indexed by node and channel
// id.
channelMap, err := c.getChannelMap(edges)
This impacts all other callers of this method (which ideally should just be hitting the main graph cache instead?) to optimize only for the startup case.
There are no other callers besides graph cache population and DescribeGraph, which needs this optimization as well.
It can't hit the main graph cache, because that only contains a subset of the graph data needed for pathfinding.
> and DescribeGraph which needs this optimization as well.

FWIW we now have in-memory caching of the proto serialization here.
// Skip ahead:
// - LastUpdate (8 bytes)
if _, err := r.Read(node.nodeScratch[:]); err != nil {
	return err
if _, err := r.Read(nodeScratch[:]); err != nil {
Seems related to: #4884
hit wrong button...
Yeah, this can just hit the channel graph cache instead. Prob worthy of spinning out into another issue. Alternatively it could use a bloom filter; not that big of a deal if we fetch some stuff we don't actually need.
	},
)
if err != nil {
	return nil, err
}

err = g.ForEachChannel(func(info *ChannelEdgeInfo,
Related to my comment elsewhere: perhaps we just want to have a new private forEachChannelX method here that skips the intermediate map and inserts directly into the cache?

Yes, this is what I did. I saved a mem profile right after graph population. But interestingly, some allocations were already gone. Perhaps because the Go compiler already knew that I wasn't going to use that data anymore even though the function hadn't returned yet. I guess what we really want to know here is peak allocations rather than a snapshot.
In this commit, we modify the implementation of ForEachChannel to utilize the new kvdb method ForAll. This greatly reduces the number of round-trips to the database needed to iterate over all channels in the graph.
Allows cacheableNode to be used outside of the callback. This is a preparation for optimization of the graph cache population.
Use the optimized ForEachChannel method to reduce the graph cache loading time.
Force-pushed 0cca454 to 352008a.
We can cherry-pick it into this PR; however, it's OK to separate them too. See #6111 (comment).
LGTM ☄️

// First, load all edges in memory indexed by node and channel
// id.
channelMap, err := c.getChannelMap(edges)
> and DescribeGraph which needs this optimization as well.
> FWIW we now have in-memory caching of the proto serialization here.

Ah right, thanks for the reminder! Pushed the tag now.
This PR adds ForAll to kvdb to allow for efficient range queries. The graph cache loader is rewritten to take advantage of this new method, improving loading time on my machine by approx. 150x.

Fixes #6041
Fixes #6107
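For context, a minimal sketch of what a ForAll-style extension could look like (names and signatures here are illustrative stand-ins, not necessarily the exact interface added in this PR):

```go
// RBucket is a trimmed-down stand-in for the kvdb read bucket interface.
type RBucket interface {
	ForEach(cb func(k, v []byte) error) error
}

// ExtendedRBucket is an optional extension a backend can implement to expose
// a faster bulk-iteration primitive. Unlike the cursor-based ForEach, the
// callback must not issue further queries in the same transaction, and no
// particular ordering is guaranteed.
type ExtendedRBucket interface {
	RBucket
	ForAll(cb func(k, v []byte) error) error
}

// ForAll uses the fast path when the backend supports it and falls back to
// the regular cursor-based ForEach otherwise.
func ForAll(bucket RBucket, cb func(k, v []byte) error) error {
	if fast, ok := bucket.(ExtendedRBucket); ok {
		return fast.ForAll(cb)
	}
	return bucket.ForEach(cb)
}
```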