trie, core/state: improve memory usage and performance #3135
Conversation
@fjl, thanks for your PR! By analyzing the history of the files in this pull request, we identified @karalabe, @zsfelfoldi and @obscuren to be potential reviewers.
Force-pushed from d5e3508 to 199c9a3
This avoids memory copies when unwrapping node interface values.

name      old time/op  new time/op  delta
Get       388ns ± 8%   215ns ± 2%   -44.56%  (p=0.000 n=15+15)
GetDB     363ns ± 3%   202ns ± 2%   -44.21%  (p=0.000 n=15+15)
UpdateBE  1.57µs ± 2%  1.29µs ± 3%  -17.80%  (p=0.000 n=13+15)
UpdateLE  1.92µs ± 2%  1.61µs ± 2%  -16.25%  (p=0.000 n=14+14)
HashBE    2.16µs ± 6%  2.18µs ± 6%  ~        (p=0.436 n=15+15)
HashLE    7.43µs ± 3%  7.21µs ± 3%  -2.96%   (p=0.000 n=15+13)
Nodes decoded from a DB load kept hashes and values as sub-slices of the DB value. This can be a problem because loading from leveldb often returns []byte with a cap that's larger than necessary, increasing memory usage.
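The cap problem described above is easy to reproduce in isolation. A minimal sketch (the `dbValue` helper and the 32/4096 sizes are invented for illustration; leveldb's actual buffer sizes vary):

```go
package main

import "fmt"

// dbValue stands in for a value returned by a database read: the
// returned []byte often has a cap much larger than its len.
func dbValue() []byte {
	return make([]byte, 32, 4096)
}

func main() {
	v := dbValue()

	// Keeping a sub-slice pins the whole 4096-byte backing array in memory.
	sub := v[:8]
	fmt.Println(cap(sub)) // still 4096

	// Copying into a fresh slice (the pattern this PR uses, e.g.
	// append(valueNode{}, val...)) allocates only what is needed, so the
	// large buffer can be collected once v goes out of scope.
	cpy := append([]byte{}, v[:8]...)
	fmt.Println(len(cpy), cap(cpy) < 4096) // 8 true
}
```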
@@ -43,6 +43,9 @@ const (
	// is max uncle depth + 1.
	maxTrieCacheLength = 8
While you're at it, would you mind renaming this? It's really confusing.
// canUnload tells whether a node can be unloaded.
func (n nodeFlag) canUnload(cachegen, cachelimit uint16) bool {
	var dist uint16
	if n.gen > cachegen {
This if statement should not be here. Suppose `n.gen = 65534`; when `cachegen = 65535`, you calculate (correctly) `cachegen - n.gen = 65535 - 65534 = 1`. When `cachegen = 0` you calculate (incorrectly) `n.gen - cachegen = 65534 - 0 = 65534`.

If instead you always compute `cachegen - n.gen`, the result in the last case is `0 - 65534 = -65534 = 2 mod 65536`.
Are you able to add a test for integer overflow?
I added the extra check this morning because I got worried about overflow issues. It had `n.gen - cachegen` before. I'll add a test instead.
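The wraparound behavior is easy to check in isolation. A sketch of the logic under discussion, using the `cachegen-n.gen >= cachelimit` form the thread converges on (the `nodeFlag` type here is simplified for illustration, not the real definition from trie/node.go):

```go
package main

import "fmt"

type nodeFlag struct {
	gen   uint16
	dirty bool
}

// canUnload relies on unsigned wraparound: cachegen-n.gen is the
// correct distance even after cachegen has overflowed past 65535.
func (n nodeFlag) canUnload(cachegen, cachelimit uint16) bool {
	return !n.dirty && cachegen-n.gen >= cachelimit
}

func main() {
	n := nodeFlag{gen: 65534}
	fmt.Println(n.canUnload(65535, 16)) // distance 1 -> false
	fmt.Println(n.canUnload(0, 16))     // distance 2 (mod 2^16) -> false
	fmt.Println(n.canUnload(14, 16))    // distance 16 -> true
}
```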
func (n shortNode) String() string { return n.fstring("") }
func (n hashNode) String() string  { return n.fstring("") }
func (n valueNode) String() string { return n.fstring("") }
func (n *fullNode) String() string { return n.fstring("") }
Can you add a bit to the PR about why `fullNode`s and `shortNode`s are now pointers, but `hashNode`s and `valueNode`s still aren't? Are you sure this is an improvement?
My guess is that both hash and value nodes are already byte slices, so they are effectively `1 x pointer + 2 x int`.
Yeah that's the reason. I doubt that there'd be much of a difference with pointers to slices.
I'm a little worried about unintuitive differences in how things behave when some things are pointers and some are not, though. In general, pointer-izing the nodes in this PR seems like it introduces extra risk.
Did you see the benchmark? It's in the commit message.
if db == nil {
	return hash, n, nil
}
if n.canUnload(h.cachegen, h.cachelimit) {
Could you add a comment somewhere explaining when cache eviction happens? If I'm understanding correctly, whenever a node is modified, we evaluate all its unmodified children for eviction, in effect. Meaning we could retain parts of the trie much longer than the cache policy implies, but as soon as a parent node is modified, we'll notice and discard any old nodes.
This also means that we could theoretically retain a node so long that the generation counter wraps around a 16 bit int and the node appears recent even though it's not. With 100 generation retention and a 16 bit int, this should happen at most 100/65536 = .15% of the time, which seems acceptable.
It's also an invariant, if I'm not mistaken, that the cache generation of a node is >= that of all its subnodes, meaning when a node expires, we can be certain all its subnodes are also expired and discardable; it'd be good to note that, too.
return n, original, err
collapsed.Val, cached.Val, err = h.hash(n.Val, db, false)
if err != nil {
	return original, original, err
Why did the first argument change from `n` to `original`? Shouldn't it in this case be `collapsed`? In the original code `n` had its Key `compactEncode`-ed, whereas the new code doesn't do that.
You're right. It doesn't really change semantics because this is the error case, but it probably needs to be fixed.
// Hash the full node's children, caching the newly hashed subtrees
cached := fullNode{dirty: n.dirty}
collapsed, cached := n.copy(), n.copy()
Wouldn't it save a few copy ops if we didn't copy the entire `n` into `cached` as in the new code, but rather just the few fields really needed, as the original code did?
`copy` is a shallow copy.
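A shallow struct copy duplicates the array of child pointers but not the children themselves, so it is cheap and the copies still share subtrees. A toy sketch (these simplified types are illustrative, not the real trie node definitions):

```go
package main

import "fmt"

type child struct{ label string }

// fullNode is a stand-in: copying the struct value copies the pointer
// array, but both copies keep pointing at the same children.
type fullNode struct {
	Children [3]*child
	dirty    bool
}

func (n *fullNode) copy() *fullNode {
	c := *n // shallow copy
	return &c
}

func main() {
	a := &fullNode{Children: [3]*child{{"A"}, {"B"}, {"C"}}}
	b := a.copy()

	// Replacing a slot in the copy does not touch the original...
	b.Children[0] = &child{"X"}
	fmt.Println(a.Children[0].label, b.Children[0].label) // A X

	// ...but mutating a shared child is visible through both.
	b.Children[1].label = "Y"
	fmt.Println(a.Children[1].label) // Y
}
```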
return n, original, err
collapsed.Children[i], cached.Children[i], err = h.hash(n.Children[i], db, false)
if err != nil {
	return original, original, err
The semantics changed here too a bit. The old code returned a copy of `original` as the first parameter, whereas here we return `original` itself (instead of the `collapsed` copy). It may be correct, but I'd be curious to hear the rationale behind it.
key := compactDecode(kbuf)
if key[len(key)-1] == 16 {
	// value node
	val, _, err := rlp.SplitString(rest)
	if err != nil {
		return nil, fmt.Errorf("invalid value node: %v", err)
	}
	return shortNode{key, valueNode(val), hash, false}, nil
	return &shortNode{key, append(valueNode{}, val...), flag}, nil
I'm not exactly sure how Go behaves in this case. Are we sure it won't allocate more memory than needed if we `append`? In theory appending elements grows the underlying array exponentially. Maybe appending to an empty array could be a special case, but I can equally easily imagine that it will leave a lot of extra allocated but unused bytes at the end.
AFAIK append to empty does the right thing.
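This is checkable: appending a whole slice to an empty one in a single `append` call sizes the new backing array for the element count being copied (rounded up to an allocator size class), not by exponential doubling. A quick sketch (the 100/4096 sizes are illustrative):

```go
package main

import "fmt"

func main() {
	src := make([]byte, 100, 4096) // large cap, as a DB read might return

	// One-shot append into an empty slice allocates for the 100 bytes
	// being copied, not for src's 4096-byte cap.
	dst := append([]byte{}, src...)
	fmt.Println(len(dst), cap(dst) < 200) // 100 true

	// Element-by-element appends are the case that grows geometrically.
	var grown []byte
	for i := 0; i < 100; i++ {
		grown = append(grown, byte(i))
	}
	fmt.Println(len(grown)) // 100; cap may be noticeably larger than len
}
```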
} else {
	dist = cachegen - n.gen
}
return !n.dirty && dist > cachelimit
Curious whether we want `dist > cachelimit` or `dist >= cachelimit`. Specifically thinking about the special case where `cachelimit == 0` (I've seen it in some places throughout the code).
It should probably be `>=`. That way cachelimit zero discards all nodes on Commit every time.
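The boundary case can be spelled out directly. A sketch of the two comparisons for a node touched in the current generation (variable names are illustrative):

```go
package main

import "fmt"

func main() {
	cachegen := uint16(5)
	gen := uint16(5) // node was touched in the current generation
	cachelimit := uint16(0)

	dist := cachegen - gen // 0

	// With '>', a cachelimit of 0 still retains current-generation nodes.
	fmt.Println(dist > cachelimit) // false: node is kept

	// With '>=', a cachelimit of 0 unloads every clean node on Commit.
	fmt.Println(dist >= cachelimit) // true: node is discarded
}
```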
return &SecureTrie{
	trie: *trie,
}, nil
trie.SetCacheLimit(cachelimit)
Any particular reason for having a separate `SetCacheLimit` method rather than adding a `cachelimit` param to `New` (the same way it's part of `NewSecure`)?
I did this because most Trie instances don't need the cache. The idea is that caching is an optional feature of Trie but you must choose a cache size for SecureTrie. Does that make sense?
Would you mind renaming `maxTrieCacheLength` while you're working on this file?
	dist = cachegen - n.gen
}
return !n.dirty && dist > cachelimit
return !n.dirty && cachegen-n.gen >= cachelimit
Style nit: spacing around `-`.
gofmt chooses the spacing
…ce (ethereum#3135)

* trie: store nodes as pointers

  This avoids memory copies when unwrapping node interface values.

  name      old time/op  new time/op  delta
  Get       388ns ± 8%   215ns ± 2%   -44.56%  (p=0.000 n=15+15)
  GetDB     363ns ± 3%   202ns ± 2%   -44.21%  (p=0.000 n=15+15)
  UpdateBE  1.57µs ± 2%  1.29µs ± 3%  -17.80%  (p=0.000 n=13+15)
  UpdateLE  1.92µs ± 2%  1.61µs ± 2%  -16.25%  (p=0.000 n=14+14)
  HashBE    2.16µs ± 6%  2.18µs ± 6%  ~        (p=0.436 n=15+15)
  HashLE    7.43µs ± 3%  7.21µs ± 3%  -2.96%   (p=0.000 n=15+13)

* trie: close temporary databases in GetDB benchmark

* trie: don't keep []byte from DB load around

  Nodes decoded from a DB load kept hashes and values as sub-slices of the DB value. This can be a problem because loading from leveldb often returns []byte with a cap that's larger than necessary, increasing memory usage.

* trie: unload old cached nodes

* trie, core/state: use cache unloading for account trie

* trie: use explicit private flags (fixes Go 1.5 reflection issue).

* trie: fixup cachegen overflow at request of nick

* core/state: rename journal size constant

(cherry picked from commit 40cdcf1)
In this PR, old nodes will be evicted if they are old enough. But how can this approach improve the performance of the basic Get/Insert methods? There can be redundant resolves from the database between two Get operations, which makes me really confused.
This PR adds a new cache eviction strategy to the trie and improves the performance
of basic Get/Insert operations.