
kvdb+channeld: extend kvdb with Prefetch for prefetching buckets in one go and speed up payment control by prefetching payments on hot paths #5640

Merged · 5 commits · Sep 20, 2021

Conversation

bhandras (Collaborator):

This PR is a reworked and cleaned-up split from #5392. With this API extension, kvdb drivers are able to add bucket prefetching to reduce the number of round trips later on in a given transaction. The PR adds this functionality to the etcd driver and demonstrates the speedup it provides on the flattened HTLC attempts bucket.

The PR builds on: #5635

@@ -159,6 +176,8 @@ func TestFirstLastNextPrev(t *testing.T) {
require.NoError(t, err)

apply := func(stm STM) error {
stm.Prefetch(prefetchKeys, prefetchRange)
Contributor:

Does this add sufficient test coverage for the changes? Scenarios like deleting prefetched keys, prefetching after a delete, or overwriting prefetches are also candidates for testing.

Collaborator Author:

I agree. I'll add some more tests.

RBucket

// Prefetch will attempt to prefetch all values under a path.
Prefetch(paths ...[]string)
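As a rough illustration of how a caller might use an optional Prefetch capability, here is a self-contained sketch. The names below (ExtendedRTx, maybePrefetch, fakeTx) and the bucket paths are hypothetical stand-ins, not the actual kvdb API:

```go
package main

import "fmt"

// ExtendedRTx models a read transaction that optionally supports
// Prefetch, in the spirit of the interface method quoted above.
// (Hypothetical name; the real interface lives in lnd's kvdb package.)
type ExtendedRTx interface {
	Prefetch(paths ...[]string)
}

// fakeTx records prefetched paths so the probe below can be shown.
type fakeTx struct{ prefetched [][]string }

func (f *fakeTx) Prefetch(paths ...[]string) {
	f.prefetched = append(f.prefetched, paths...)
}

// maybePrefetch probes for the optional capability: backends without
// Prefetch (e.g. bbolt) simply skip it and keep working unchanged.
func maybePrefetch(tx interface{}, paths ...[]string) bool {
	etx, ok := tx.(ExtendedRTx)
	if ok {
		etx.Prefetch(paths...)
	}
	return ok
}

func main() {
	tx := &fakeTx{}
	// Prefetch a (hypothetical) payment bucket and its nested
	// htlc-attempts bucket in one go.
	ok := maybePrefetch(tx,
		[]string{"payments-root", "<payment-hash>"},
		[]string{"payments-root", "<payment-hash>", "htlc-attempts"},
	)
	fmt.Println(ok, len(tx.prefetched)) // prints: true 2
}
```

The type-assertion probe keeps the extension backwards compatible: drivers that cannot prefetch never see the call.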
Contributor:

I think this could use a better explanation of what Prefetch really does from the caller's point of view. Does it load the given values/ranges into memory and hold them until they are requested via a Get in the same transaction? It would probably also be good to describe how this interacts with the other bucket operations (Delete, CreateBucket, etc.) and whether the data remains consistent.

An extension of the generic kvdb-level tests would be a useful safeguard here too, and would help document the behavior as well.

I am also wondering how to implement prefetching for Postgres. The bucket interface as proposed here gives a lot of freedom to call Prefetch wherever one wants. I think the implementation can get quite complicated (and risky) if the prefetched keys need to be kept consistent.

Isn't there a more minimalistic alternative that works just as well for performance? One idea I had is to convert Prefetch into something like GetMulti that directly returns all the buckets/data ranges fetched in an optimized way. Then it is clear that the data is just that, and doesn't live on in a background cache that needs to be kept consistent.

bhandras (Collaborator Author), Aug 24, 2021:

GetMulti could work, but in the case where we also want to spare the round trips that come from just nesting into buckets, how would that work semantically? Currently we prefetch based on the payment hash, fetching several bucket keys, so that whenever we nest we won't go to the DB again in the same transaction.

Edit: we also fetch all keys in the payment's bucket and the htlc attempts bucket with all its keys in one go. How would we get around that? Return a nested map structure with all the key/values and buckets?

Edit (2): what about error cases, where the DB is corrupted or keys/buckets are missing?

Contributor:

Just a GetMulti that gets all keys in a bucket should be quite simple, but I don't know how much of the bottleneck would remain, because the bucket also needs to be reached first. Could be worth testing.

For including the downward traversal in the same transaction too, GetMulti might perhaps take a tree of keys as input. In that case, can't you do the exact same thing that you do currently, just hidden? So in the implementation of GetMulti for etcd, you'd internally call prefetch first, followed by a (recursive) series of get calls to populate the result tree that you mention?

For Postgres, GetMulti(tree) in a single tx would be an interesting challenge. Either magic with subqueries, or a change to the db structure to make it possible to retrieve deeply nested keys in one step (using the sha256 ids perhaps).

For errors, I'd think that just returning an error from GetMulti can be enough, because currently we probably also don't do anything special when a condition occurs that should never happen? Missing keys are probably happy flow, but I think that can be signaled in the result tree.

If something like this can work, I think it is worth looking into. The Prefetch approach may work for client-side transactions, but I think it gets hard when connected to database backends that manage the transaction server-side.
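The key-tree idea could be sketched roughly as follows. All names here (KeyTree, Result, getMulti) are hypothetical, and the flat in-memory map stands in for a real backend:

```go
package main

import (
	"fmt"
	"strings"
)

// KeyTree names which nested buckets to descend into, recursively.
type KeyTree struct {
	Children map[string]*KeyTree
}

// Result mirrors the request tree with the fetched key/values filled in.
type Result struct {
	Values   map[string]string
	Children map[string]*Result
}

// getMulti resolves the whole tree against a flat path->value store in
// one call. An etcd driver could prefetch the union of the paths first
// and serve this recursion from its cache; a Postgres driver could
// translate the tree into a single query.
func getMulti(store map[string]string, path string, t *KeyTree) *Result {
	res := &Result{
		Values:   map[string]string{},
		Children: map[string]*Result{},
	}
	for k, v := range store {
		if dir, key := splitLast(k); dir == path {
			res.Values[key] = v
		}
	}
	for name, child := range t.Children {
		res.Children[name] = getMulti(store, path+"/"+name, child)
	}
	return res
}

// splitLast splits "a/b/c" into ("a/b", "c").
func splitLast(p string) (string, string) {
	i := strings.LastIndex(p, "/")
	return p[:i], p[i+1:]
}

func main() {
	store := map[string]string{
		"payments/hash1/creation-info": "...",
		"payments/hash1/attempts/1":    "a1",
		"payments/hash1/attempts/2":    "a2",
	}
	tree := &KeyTree{Children: map[string]*KeyTree{"attempts": {}}}
	res := getMulti(store, "payments/hash1", tree)
	fmt.Println(len(res.Values), len(res.Children["attempts"].Values)) // prints: 1 2
}
```

The caller receives exactly the requested slice of the hierarchy and nothing lingers in a background cache that would need to stay consistent.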

Collaborator Author:

I'm wondering if we actually need Prefetch for Postgres? I recall performance on Postgres being stable, where the round trips seemingly did not cause any slowdown over time.

bhandras (Collaborator Author), Aug 24, 2021:

How about adding both Prefetch and GetMulti? GetMulti would simply work on the current bucket, so we'd at least avoid the round-trip overhead coming from the cursor.

Edit: there's a nice synergy in that Prefetch helps for etcd and GetMulti simply returns the prefetched items, whereas for Postgres GetMulti would just reduce round trips for the current bucket.

Edit (2): perhaps GetAll is better naming than GetMulti in this case.

Contributor:

Postgres doesn't need it for stability as far as I know, but reducing round trips should surely increase performance. If the current gain for etcd can also be achieved with a slightly different interface that is compatible with server-side transactions, I think it is a win.

Has the root cause of the etcd slowdown been found in the meantime? Maybe the prefetching fixes the problem on some machines, but not on others. Or there may be other unknown factors that could cause it to return on nodes with a different history than the benchmark nodes.

Contributor:

If just GetAll is sufficient for Postgres, I'd think it should also be sufficient for etcd? The round trip itself is likely the most costly part, and I'd expect round-trip counts to be comparable across backends.

It would also probably make the delta of stm.go smaller and save time in adding sufficient test coverage for all the ways Prefetch can be used.

Contributor:

Seems this PR is moving into its final stages, but shouldn't the Prefetch API be compatible with database backends that support server-side transactions? @Roasbeef

Collaborator Author:

I think the idea is that we have Prefetch as an optional requirement (see kvdb/interface.go), and we could extend it (in a separate PR) with GetAll as suggested. The reason is that Prefetch makes it possible to fetch a payment in one round trip, while GetAll can optimize ForEach calls.

Contributor:

If optimal performance requires getting a payment in one round trip (which makes sense), how can a separate PR that adds GetAll deliver that? It still won't be a single round trip. Implementing Prefetch as currently proposed isn't feasible for systems with server-side transactions, in my opinion.

guggero (Collaborator) left a comment:

Did a first pass to understand the added logic to the STM code. I think I now understand most of it 😅 Nice work with this to reduce the number of round trips needed 🎉


@Roasbeef Roasbeef modified the milestones: v0.15.0, v0.14.0 Sep 8, 2021
Roasbeef (Member), Sep 9, 2021:

Can be rebased now with the dependent PR merged!

Roasbeef (Member) left a comment:

Completed an initial pass; def need a few more to ensure I fully grok the proposed changes.

Commit-wise, I think things may be easier to follow if one commit moves the existing slice of the read set into the b-tree, then subsequent commits introduce the explicit prefetch, with the new methods added in a final step.

@@ -452,20 +652,8 @@ func (s *stm) Get(key string) ([]byte, error) {
return []byte(put.val), nil
}

// Populate read set if key is present in
Member:

So a main difference here is that the prefetch set is declared up front rather than reduced from the initial read and write sets?

Collaborator Author:

Yeah, the original code is a fork of etcd's own STM implementation, which has a separate prefetch set to optimize for when we read less on the next retry (to reduce the number of cmp ops on commit).

This (reading less upon retry) isn't very common in our case, so I thought it might be a good idea to reduce complexity and just get rid of that prefetch set. From my tests it seems that we actually retry very rarely, so imho it's not a super important optimization to have.

Member:

deduced*


fetchPrefixes := make([]string, 0, len(prefixes))
for _, prefix := range prefixes {
if s.rset.hasFullRange(prefix) {
Member:

Since the read set has only just been created and we haven't prefetched yet, won't this always be false? In that case we wouldn't have pulled in any of these keys?

Collaborator Author:

This is to make sure that if the user calls Prefetch with already-prefetched keys/ranges in the same transaction, we won't fetch them again. One such example is when we retry.
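A minimal sketch of that dedup check, with hypothetical names (the real logic lives in the STM's read set in kvdb/etcd/stm.go):

```go
package main

import "fmt"

// readSet tracks which full ranges have already been fetched in this
// transaction (a hypothetical simplification of the STM's read set).
type readSet struct {
	fullRanges map[string]bool
}

func (r *readSet) hasFullRange(prefix string) bool {
	return r.fullRanges[prefix]
}

func (r *readSet) markFullRange(prefix string) {
	r.fullRanges[prefix] = true
}

// filterPrefixes drops prefixes that were already prefetched, so a
// repeated Prefetch call (e.g. on transaction retry) costs nothing.
func filterPrefixes(r *readSet, prefixes []string) []string {
	fetch := make([]string, 0, len(prefixes))
	for _, p := range prefixes {
		if r.hasFullRange(p) {
			continue
		}
		fetch = append(fetch, p)
		r.markFullRange(p)
	}
	return fetch
}

func main() {
	rs := &readSet{fullRanges: map[string]bool{}}
	fmt.Println(filterPrefixes(rs, []string{"a", "b"})) // prints: [a b]
	fmt.Println(filterPrefixes(rs, []string{"a", "c"})) // prints: [c]
}
```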

bhandras force-pushed the kvdb-prefetch branch 2 times, most recently from 4a89b84 to 445cc15 on September 9, 2021.
bhandras (Collaborator Author) left a comment:

Thanks to all reviewers! As next steps I'll break the commit structure up a bit more and add more tests, as Joost suggested.


guggero (Collaborator) left a comment:

Great work, really looking forward to the performance improvement with this.

The etcd itest is still failing, but with the known flake in the update_channel_policy test, which seems unaffected by any changes in this PR.

This commit adds more full-range caching to the STM such that we'll be able to prefetch keys and ranges on the kvdb level. This will allow us to reduce round trips to etcd and do fast iteration of small ranges.

This commit extends the kvdb interface in a backwards compatible way such that we'll be able to prefetch all keys in a bucket in one go, reducing the number of round trips.
Roasbeef (Member) left a comment:

LGTM 💣

guggero merged commit 29a8661 into lightningnetwork:master on Sep 20, 2021.
bhandras deleted the kvdb-prefetch branch on September 12, 2023.