
trie, core/state: improve memory usage and performance #3135

Merged 8 commits from trie-perf into ethereum:develop on Oct 14, 2016

Conversation

@fjl (Contributor) commented Oct 14, 2016

This PR adds a new cache eviction strategy to the trie and improves the performance
of basic Get/Insert operations.

@mention-bot

@fjl, thanks for your PR! By analyzing the history of the files in this pull request, we identified @karalabe, @zsfelfoldi and @obscuren to be potential reviewers.

@fjl fjl force-pushed the trie-perf branch 3 times, most recently from d5e3508 to 199c9a3 Compare October 14, 2016 07:10
This avoids memory copies when unwrapping node interface values.

name      old time/op  new time/op  delta
Get        388ns ± 8%   215ns ± 2%  -44.56%  (p=0.000 n=15+15)
GetDB      363ns ± 3%   202ns ± 2%  -44.21%  (p=0.000 n=15+15)
UpdateBE  1.57µs ± 2%  1.29µs ± 3%  -17.80%  (p=0.000 n=13+15)
UpdateLE  1.92µs ± 2%  1.61µs ± 2%  -16.25%  (p=0.000 n=14+14)
HashBE    2.16µs ± 6%  2.18µs ± 6%     ~     (p=0.436 n=15+15)
HashLE    7.43µs ± 3%  7.21µs ± 3%   -2.96%  (p=0.000 n=15+13)
Nodes decoded from a DB load kept hashes and values as sub-slices of
the DB value. This can be a problem because loading from leveldb often
returns []byte with a cap that's larger than necessary, increasing
memory usage.
@@ -43,6 +43,9 @@ const (
// is max uncle depth + 1.
maxTrieCacheLength = 8
Contributor

While you're at it, would you mind renaming this? It's really confusing.

// canUnload tells whether a node can be unloaded.
func (n nodeFlag) canUnload(cachegen, cachelimit uint16) bool {
var dist uint16
if n.gen > cachegen {
Contributor

This if statement should not be here. Suppose n.gen = 65534; when cachegen = 65535, you calculate (correctly) cachegen - n.gen = 65535 - 65534 = 1. When cachegen = 0 you calculate (incorrectly) n.gen - cachegen = 65534 - 0 = 65534.

If instead you always compute cachegen - n.gen, the result in the last case is 0 - 65534 = -65534 = 2 mod 65536.

Are you able to add a test for integer overflow?
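The wraparound argument above can be checked with a tiny sketch. This is an illustrative helper, not the actual go-ethereum code; it only demonstrates that unsigned subtraction yields the correct distance modulo 2^16:

```go
package main

import "fmt"

// cacheDistance mirrors the suggested fix: always compute cachegen - n.gen
// and let uint16 wraparound produce the distance modulo 65536.
func cacheDistance(gen, cachegen uint16) uint16 {
	return cachegen - gen
}

func main() {
	// No wraparound: cachegen is ahead of the node's generation.
	fmt.Println(cacheDistance(65534, 65535)) // 1

	// cachegen has wrapped past zero; the subtraction still yields
	// the correct distance because uint16 arithmetic is modular.
	fmt.Println(cacheDistance(65534, 0)) // 2
}
```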

Contributor Author

I added the extra check this morning because I got worried about overflow issues. It had n.gen - cachegen before. I'll add a test instead.

func (n shortNode) String() string { return n.fstring("") }
func (n hashNode) String() string { return n.fstring("") }
func (n valueNode) String() string { return n.fstring("") }
func (n *fullNode) String() string { return n.fstring("") }
Contributor

Can you add a bit to the PR about why fullNodes and shortNodes are now pointers, but hashNodes and valueNodes still aren't? Are you sure this is an improvement?

Member

My guess is that both hash and value nodes are already byte slices. So they are effectively 1 x pointer + 2 x int.

Contributor Author

Yeah that's the reason. I doubt that there'd be much of a difference with pointers to slices.

Contributor

I'm a little worried about unintuitive differences in how things behave when some things are pointers and some are not, though. In general, pointer-izing the nodes in this PR seems like it introduces extra risk.

Contributor Author

Did you see the benchmark? (It's in the commit message.)

if db == nil {
return hash, n, nil
}
if n.canUnload(h.cachegen, h.cachelimit) {
Contributor

Could you add a comment somewhere explaining when cache eviction happens? If I'm understanding correctly, whenever a node is modified, we evaluate all its unmodified children for eviction, in effect. Meaning we could retain parts of the trie much longer than the cache policy implies, but as soon as a parent node is modified, we'll notice and discard any old nodes.

This also means that we could theoretically retain a node so long that the generation counter wraps around a 16 bit int and the node appears recent even though it's not. With 100 generation retention and a 16 bit int, this should happen at most 100/65536 = .15% of the time, which seems acceptable.

It's also an invariant, if I'm not mistaken, that the cache generation of a node is >= that of all its subnodes, meaning when a node expires, we can be certain all its subnodes are also expired and discardable; it'd be good to note that, too.

return n, original, err
collapsed.Val, cached.Val, err = h.hash(n.Val, db, false)
if err != nil {
return original, original, err
Member

Why did the first argument change from n to original? Shouldn't it in this case be collapsed? In the original code n had its Key compactEncode-ed, whereas the new code doesn't do that.

Contributor Author

You're right. It doesn't really change semantics because this is the error case but probably needs to be fixed.

// Hash the full node's children, caching the newly hashed subtrees
cached := fullNode{dirty: n.dirty}
collapsed, cached := n.copy(), n.copy()
Member

Wouldn't it save a few copy ops if, instead of copying the entire n into cached, we copied just the few fields really needed, as the original code did?

Contributor Author

copy is a shallow copy.

return n, original, err
collapsed.Children[i], cached.Children[i], err = h.hash(n.Children[i], db, false)
if err != nil {
return original, original, err
Member

The semantics changed a bit here too. The old code returned a copy of original as the first parameter, whereas here we return original itself (instead of the collapsed copy). It may be correct, but I'd be curious to hear the rationale behind it.

key := compactDecode(kbuf)
if key[len(key)-1] == 16 {
// value node
val, _, err := rlp.SplitString(rest)
if err != nil {
return nil, fmt.Errorf("invalid value node: %v", err)
}
return shortNode{key, valueNode(val), hash, false}, nil
return &shortNode{key, append(valueNode{}, val...), flag}, nil
Member

I'm not exactly sure how Go behaves in this case. Are we sure it won't allocate more memory than needed if we append? In theory appending elements grows the underlying array exponentially. Maybe appending to an empty array could be a special case, but I can equally easily imagine that it will leave a lot of extra allocated but unused bytes at the end.

Contributor Author

AFAIK append to empty does the right thing.
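The point of the copy can be demonstrated with a small standalone sketch (illustrative, not project code): appending a sub-slice onto an empty slice moves the bytes into a fresh, small backing array, so the large database value is no longer pinned in memory.

```go
package main

import "fmt"

func main() {
	// Simulate a large value returned by the database: the interesting
	// payload is a small sub-slice of a much bigger backing array.
	dbValue := make([]byte, 4096)
	val := dbValue[10:14]

	// Keeping val would pin the whole 4096-byte array in memory.
	// Appending onto an empty slice copies just the bytes we need
	// into a new backing array sized for the payload.
	copied := append([]byte{}, val...)

	fmt.Println(len(copied), cap(copied) < len(dbValue)) // 4 true
}
```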

} else {
dist = cachegen - n.gen
}
return !n.dirty && dist > cachelimit
Member

Curious whether we want dist > cachelimit or dist >= cachelimit. Specifically thinking about the special case where cachelimit == 0 (I've seen it some places throughout the code).

Contributor Author

It should probably be >=. That way cachelimit zero discards all nodes on Commit every time.
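The difference between > and >= at cachelimit == 0 can be shown with a condensed sketch of the predicate (simplified, with the fields passed as plain parameters rather than the actual nodeFlag struct):

```go
package main

import "fmt"

// canUnload sketches the form discussed in the thread: a clean node
// unloads once its generation distance reaches cachelimit.
func canUnload(dirty bool, gen, cachegen, cachelimit uint16) bool {
	return !dirty && cachegen-gen >= cachelimit
}

func main() {
	// With cachelimit == 0 and >=, every clean node is discarded
	// on Commit, even one from the current generation.
	fmt.Println(canUnload(false, 5, 5, 0)) // true
	// With a nonzero limit, a current-generation node is retained.
	fmt.Println(canUnload(false, 5, 5, 1)) // false
	// Dirty nodes are never unloaded, regardless of age.
	fmt.Println(canUnload(true, 0, 100, 0)) // false
}
```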

return &SecureTrie{
trie: *trie,
}, nil
trie.SetCacheLimit(cachelimit)
Member

Any particular reason for having a separate SetCacheLimit method instead of adding the cachelimit param to New (the same way it's part of NewSecure)?

Contributor Author

I did this because most Trie instances don't need the cache. The idea is that caching is an optional feature of Trie but you must choose a cache size for SecureTrie. Does that make sense?
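The API shape being described could be sketched roughly like this (a hypothetical, heavily condensed illustration; the real Trie and SecureTrie types have many more fields and parameters):

```go
package main

import "fmt"

// Trie caching is opt-in via a setter; SecureTrie always configures it.
type Trie struct{ cachelimit uint16 }

// New creates a trie with caching disabled by default.
func New() *Trie { return &Trie{} }

// SetCacheLimit opts a plain Trie into cache unloading.
func (t *Trie) SetCacheLimit(l uint16) { t.cachelimit = l }

// NewSecure requires callers to pick a cache size up front.
func NewSecure(cachelimit uint16) *Trie {
	t := New()
	t.SetCacheLimit(cachelimit)
	return t
}

func main() {
	fmt.Println(New().cachelimit)          // 0
	fmt.Println(NewSecure(120).cachelimit) // 120
}
```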

@Arachnid (Contributor) left a comment

Would you mind renaming maxTrieCacheLength while you're working on this file?

dist = cachegen - n.gen
}
return !n.dirty && dist > cachelimit
return !n.dirty && cachegen-n.gen >= cachelimit
Contributor

Style nit: spacing around -.

Contributor Author

gofmt chooses the spacing.

@karalabe karalabe merged commit 40cdcf1 into ethereum:develop Oct 14, 2016
@obscuren obscuren removed the review label Oct 14, 2016
karalabe pushed a commit to karalabe/go-ethereum that referenced this pull request Oct 14, 2016
…ce (ethereum#3135)

* trie: store nodes as pointers

This avoids memory copies when unwrapping node interface values.

name      old time/op  new time/op  delta
Get        388ns ± 8%   215ns ± 2%  -44.56%  (p=0.000 n=15+15)
GetDB      363ns ± 3%   202ns ± 2%  -44.21%  (p=0.000 n=15+15)
UpdateBE  1.57µs ± 2%  1.29µs ± 3%  -17.80%  (p=0.000 n=13+15)
UpdateLE  1.92µs ± 2%  1.61µs ± 2%  -16.25%  (p=0.000 n=14+14)
HashBE    2.16µs ± 6%  2.18µs ± 6%     ~     (p=0.436 n=15+15)
HashLE    7.43µs ± 3%  7.21µs ± 3%   -2.96%  (p=0.000 n=15+13)

* trie: close temporary databases in GetDB benchmark

* trie: don't keep []byte from DB load around

Nodes decoded from a DB load kept hashes and values as sub-slices of
the DB value. This can be a problem because loading from leveldb often
returns []byte with a cap that's larger than necessary, increasing
memory usage.

* trie: unload old cached nodes

* trie, core/state: use cache unloading for account trie

* trie: use explicit private flags (fixes Go 1.5 reflection issue).

* trie: fixup cachegen overflow at request of nick

* core/state: rename journal size constant

(cherry picked from commit 40cdcf1)
@rjl493456442 (Member)

In this PR, an "old" node will be evicted if it is old enough. But how can this approach improve the performance of the basic Get/Insert methods? There can be redundant resolves from the database between two get actions, which makes me really confused.

6 participants