This repository was archived by the owner on Nov 6, 2020. It is now read-only.

Migrate rlp from ElasticArray to SmallVec #1957

Closed
wants to merge 4 commits into ethcore:master from nipunn1313:rlp

Conversation

nipunn1313
Contributor

This helps out with #1822.
Unfortunately, the smallvec::VecLike trait does not support an extend operation directly, so I had to replace those calls with push loops. It is slightly less ergonomic, but should hopefully compile down to the same code. We can likely upstream this improvement to smallvec's VecLike trait.
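
For illustration, the push-loop replacement reads roughly like this; the helper name and the fixed inline size are made up for the sketch, and the real call sites in the PR differ:

```rust
use smallvec::SmallVec;

// Sketch only: the extend-style call becomes an explicit push loop,
// which the optimizer can often compile down to the same code.
fn append_all(out: &mut SmallVec<[u8; 1024]>, bytes: &[u8]) {
    for &b in bytes {
        out.push(b);
    }
}
```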

```rust
out.append_slice(slice);
fn to_small_vec(slice: &[u8]) -> SmallVec<[u8; 1024]> {
    let mut out = SmallVec::new();
    out.extend(slice.to_vec());
```
Contributor

favor slice.iter().cloned() here; it'll save a heap allocation in a relatively hot function.
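
For reference, the suggested change would read roughly as follows. This is a sketch of the reviewer's suggestion (assuming SmallVec's extend accepts any iterator, as it does in current releases), not necessarily the code that landed:

```rust
use smallvec::SmallVec;

// Extending from a borrowed iterator avoids allocating a temporary Vec.
fn to_small_vec(slice: &[u8]) -> SmallVec<[u8; 1024]> {
    let mut out = SmallVec::new();
    out.extend(slice.iter().cloned()); // rather than out.extend(slice.to_vec())
    out
}
```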

@rphmeier rphmeier added the A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. label Aug 18, 2016
@rphmeier
Contributor

rphmeier commented Aug 18, 2016

Thanks for doing this, @nipunn1313. Got a couple minor grumbles but mostly LGTM.
The push loops probably can't be optimized as well as an extend, which would pre-emptively do the allocation, but it probably won't hurt much at all.
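
The point about pre-emptive allocation, in miniature: an extend over a slice knows the exact length up front and can reserve once, while a bare push loop re-checks capacity on every push. A sketch using today's SmallVec API (the helper itself is illustrative):

```rust
use smallvec::SmallVec;

// One reserve up front, then pushes that never need to reallocate.
fn extend_from_bytes(out: &mut SmallVec<[u8; 1024]>, bytes: &[u8]) {
    out.reserve(bytes.len()); // single capacity check / growth
    for &b in bytes {
        out.push(b);
    }
}
```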

@coveralls

Coverage Status

Coverage decreased (-0.04%) to 86.99% when pulling 7be4a76 on nipunn1313:rlp into d7c184b on ethcore:master.

@nipunn1313
Contributor Author

Coolio. I fixed up some to_vec() and to_owned() calls to be more efficient/cleaner.

@nipunn1313
Contributor Author

Trying to get smallvec to support Extend here. Looks like the extend inside smallvec is currently implemented as a push loop, but I'm sure that could also be optimized in the future.

servo/rust-smallvec#27

@coveralls

Coverage Status

Coverage decreased (-0.004%) to 86.964% when pulling 2901834 on nipunn1313:rlp into 0e0cc20 on ethcore:master.

@rphmeier rphmeier added A8-looksgood 🦄 Pull request is reviewed well. and removed A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. labels Aug 19, 2016
@gavofyork
Contributor

introducing this kind of complexity into our codebase is a retrograde step.

```diff
-   self.bytes.insert_slice(pos, &res);
+   for (idx, &item) in res.iter().enumerate() {
+       self.bytes.insert(pos + idx, item);
+   }
```

if we are to go with SmallVec over ElasticArray, then appropriate traits should be implemented to avoid such low-level algorithms straying into high-level code.
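
A sketch of the kind of trait being asked for here, written against today's smallvec Array trait; the trait name, bound, and naive body are illustrative rather than proposed API. With something along these lines, call sites keep the one-line self.bytes.insert_slice(pos, &res) form:

```rust
use smallvec::{Array, SmallVec};

// Hypothetical trait: the element shuffling lives behind one method,
// so high-level code never spells out the loop.
trait InsertSlice<T> {
    fn insert_slice(&mut self, pos: usize, slice: &[T]);
}

impl<A: Array<Item = u8>> InsertSlice<u8> for SmallVec<A> {
    fn insert_slice(&mut self, pos: usize, slice: &[u8]) {
        // Naive body for the sketch; a real implementation would reserve once
        // and shift the tail with a single copy.
        for (idx, &item) in slice.iter().enumerate() {
            self.insert(pos + idx, item);
        }
    }
}
```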

@gavofyork gavofyork added A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 19, 2016
@nipunn1313
Contributor Author

Hey @gavofyork. I'm working on getting extend() added to the VecLike trait, but insert_slice may be a bit trickier. Let me see what I can do.

@nipunn1313
Contributor Author

servo/rust-smallvec#28 and servo/rust-smallvec#27 together should upgrade SmallVec to include the mechanisms we need (extend and insert_slice).

SmallVec just landed a change to implement Hash, which handles the issue in kvdb, so this task seems quite tractable now.

@gavofyork gavofyork added A8-looksgood 🦄 Pull request is reviewed well. and removed A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. labels Aug 23, 2016
@gavofyork
Contributor

code looks good, but fails tests.

@gavofyork gavofyork added A7-looksgoodtestsfail 🤖 Pull request is reviewed well, but cannot be merged due to tests failing. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 23, 2016
@nipunn1313
Contributor Author

I do have a couple of gripes that concern me before landing:

  1. SmallVec does not support insert_many/insert_slice just yet.
  2. SmallVec supports extend, but it doesn't use a memcpy the way elastic_array does, because SmallVec's extend takes an arbitrary iterable. SmallVec could provide an optimized implementation for Vec or slices (sketched below) if it used https://github.com/rust-lang/rfcs/blob/master/text/1210-impl-specialization.md, but specialization is an unstable feature right now.

I could probably get SmallVec to support InsertMany, but I think it'll be a little tougher to get support for the fast version.

That being said, I don't expect these code paths to suffer much from the iterable implementation, since they typically copy <= 128 bytes at a time.

I want to point out these disadvantages so we are aware of them.
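
For concreteness, the memcpy-style fast path described in point 2 amounts to something like the following. The helper is hypothetical (current smallvec releases expose comparable functionality as extend_from_slice) and is written against today's SmallVec API:

```rust
use std::ptr;
use smallvec::SmallVec;

// Hypothetical fast path: reserve once, then copy the whole slice with a
// single memcpy instead of one ptr::write per element.
fn extend_from_slice_fast(out: &mut SmallVec<[u8; 1024]>, slice: &[u8]) {
    let old_len = out.len();
    out.reserve(slice.len());
    unsafe {
        ptr::copy_nonoverlapping(slice.as_ptr(), out.as_mut_ptr().add(old_len), slice.len());
        out.set_len(old_len + slice.len());
    }
}
```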

@coveralls

Coverage Status

Changes Unknown when pulling 4a03f6e on nipunn1313:rlp into ethcore:master.

@rphmeier rphmeier added the A8-looksgood 🦄 Pull request is reviewed well. label Aug 23, 2016
@rphmeier rphmeier removed the A7-looksgoodtestsfail 🤖 Pull request is reviewed well, but cannot be merged due to tests failing. label Aug 23, 2016
@rphmeier
Contributor

Master:

running 7 tests
test bench_decode_nested_empty_lists ... bench:         537 ns/iter (+/- 12)
test bench_decode_u256_value         ... bench:         111 ns/iter (+/- 3)
test bench_decode_u64_value          ... bench:          50 ns/iter (+/- 2)
test bench_stream_1000_empty_lists   ... bench:      11,258 ns/iter (+/- 803)
test bench_stream_nested_empty_lists ... bench:         261 ns/iter (+/- 6)
test bench_stream_u256_value         ... bench:         565 ns/iter (+/- 12)
test bench_stream_u64_value          ... bench:         138 ns/iter (+/- 6)

test result: ok. 0 passed; 0 failed; 0 ignored; 7 measured

This PR:

running 7 tests
test bench_decode_nested_empty_lists ... bench:         533 ns/iter (+/- 8)
test bench_decode_u256_value         ... bench:         111 ns/iter (+/- 4)
test bench_decode_u64_value          ... bench:          49 ns/iter (+/- 1)
test bench_stream_1000_empty_lists   ... bench:       7,554 ns/iter (+/- 980)
test bench_stream_nested_empty_lists ... bench:         303 ns/iter (+/- 7)
test bench_stream_u256_value         ... bench:         655 ns/iter (+/- 20)
test bench_stream_u64_value          ... bench:         191 ns/iter (+/- 5)

test result: ok. 0 passed; 0 failed; 0 ignored; 7 measured

The empty-lists case got a lot faster, but everything else got a bit slower. I think the missing insert_many optimization is probably the culprit. I'll throw together a crate for it.

@rphmeier
Contributor

Made an InsertMany trait but the benchmarks remain the same. Probably not a problem in practice but it remains to be seen: https://github.com/rphmeier/insert_many
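
Roughly the shape such a trait takes; this is a sketch, not the actual insert_many crate. For Vec, the standard library's splice already performs a bulk insert that moves the existing tail only once:

```rust
// Illustrative InsertMany-style trait.
trait InsertMany<T> {
    fn insert_many<I: IntoIterator<Item = T>>(&mut self, index: usize, iterable: I);
}

impl<T> InsertMany<T> for Vec<T> {
    fn insert_many<I: IntoIterator<Item = T>>(&mut self, index: usize, iterable: I) {
        // An empty range turns splice into a pure bulk insert.
        self.splice(index..index, iterable);
    }
}
```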

@gavofyork
Contributor

@arkpar what do you think? bench_stream_u64_value got ~50% slower...

@gavofyork gavofyork added A0-pleasereview 🤓 Pull request needs code review. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 24, 2016
@nipunn1313
Contributor Author

@rphmeier I believe that elastic_array uses a memcpy for extend, but SmallVec currently uses a (somewhat inefficient) series of ptr::write calls in a loop. I don't think your insert_many crate uses that optimization, and it definitely doesn't address all the calls to Extend.

I set up a PR to use memcpy for extend and insert_many in SmallVec.
servo/rust-smallvec#29
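
In the same spirit, the insert_many fast path amounts to something like this; purely illustrative and written against today's SmallVec API, not the code in servo/rust-smallvec#29:

```rust
use std::ptr;
use smallvec::SmallVec;

// Hypothetical bulk insert: make room once, shift the tail with one memmove,
// then memcpy the new bytes into the gap.
fn insert_slice_fast(out: &mut SmallVec<[u8; 1024]>, index: usize, slice: &[u8]) {
    let old_len = out.len();
    assert!(index <= old_len);
    out.reserve(slice.len());
    unsafe {
        let p = out.as_mut_ptr().add(index);
        ptr::copy(p, p.add(slice.len()), old_len - index); // shift the existing tail right
        ptr::copy_nonoverlapping(slice.as_ptr(), p, slice.len()); // fill the gap
        out.set_len(old_len + slice.len());
    }
}
```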

@rphmeier
Contributor

rphmeier commented Aug 24, 2016

The trie benches might be more indicative of practical effects (as they use RLP extensively). But they show a definite slowdown (even with insert_many).

Master:

$ rustup run nightly cargo bench -p ethcore-util --bench trie
running 13 tests
test sha3x10000                     ... bench:   6,555,022 ns/iter (+/- 110,064)
test trie_insertions_32_mir_1k      ... bench:   4,504,945 ns/iter (+/- 346,085)
test trie_insertions_32_ran_1k      ... bench:   4,503,505 ns/iter (+/- 250,628)
test trie_insertions_random_mid     ... bench:   3,910,607 ns/iter (+/- 228,881)
test trie_insertions_six_high       ... bench:   3,538,344 ns/iter (+/- 133,353)
test trie_insertions_six_low        ... bench:   8,409,129 ns/iter (+/- 416,024)
test trie_insertions_six_mid        ... bench:   5,216,624 ns/iter (+/- 307,361)
test triehash_insertions_32_mir_1k  ... bench:   1,939,029 ns/iter (+/- 26,818)
test triehash_insertions_32_ran_1k  ... bench:   1,943,441 ns/iter (+/- 322,224)
test triehash_insertions_random_mid ... bench:   1,208,044 ns/iter (+/- 31,692)
test triehash_insertions_six_high   ... bench:   1,386,069 ns/iter (+/- 24,725)
test triehash_insertions_six_low    ... bench:   1,969,671 ns/iter (+/- 37,346)
test triehash_insertions_six_mid    ... bench:   1,554,332 ns/iter (+/- 72,234)

This PR:

$ rustup run nightly cargo bench -p ethcore-util --bench trie
running 13 tests
test sha3x10000                     ... bench:   6,555,235 ns/iter (+/- 159,885)
test trie_insertions_32_mir_1k      ... bench:   4,951,804 ns/iter (+/- 566,059)
test trie_insertions_32_ran_1k      ... bench:   4,887,106 ns/iter (+/- 318,864)
test trie_insertions_random_mid     ... bench:   4,152,541 ns/iter (+/- 169,546)
test trie_insertions_six_high       ... bench:   3,779,139 ns/iter (+/- 192,456)
test trie_insertions_six_low        ... bench:   9,127,628 ns/iter (+/- 495,289)
test trie_insertions_six_mid        ... bench:   5,550,867 ns/iter (+/- 200,107)
test triehash_insertions_32_mir_1k  ... bench:   2,300,948 ns/iter (+/- 28,523)
test triehash_insertions_32_ran_1k  ... bench:   2,258,644 ns/iter (+/- 31,240)
test triehash_insertions_random_mid ... bench:   1,370,415 ns/iter (+/- 29,024)
test triehash_insertions_six_high   ... bench:   1,588,540 ns/iter (+/- 73,133)
test triehash_insertions_six_low    ... bench:   2,286,975 ns/iter (+/- 48,567)
test triehash_insertions_six_mid    ... bench:   1,788,652 ns/iter (+/- 28,400)

This PR with the insert_many crate:

$ rustup run nightly cargo bench -p ethcore-util --bench trie
running 13 tests
test sha3x10000                     ... bench:   6,541,748 ns/iter (+/- 248,270)
test trie_insertions_32_mir_1k      ... bench:   4,919,459 ns/iter (+/- 300,605)
test trie_insertions_32_ran_1k      ... bench:   4,864,872 ns/iter (+/- 251,946)
test trie_insertions_random_mid     ... bench:   4,087,234 ns/iter (+/- 160,874)
test trie_insertions_six_high       ... bench:   3,757,468 ns/iter (+/- 160,793)
test trie_insertions_six_low        ... bench:   8,852,989 ns/iter (+/- 455,819)
test trie_insertions_six_mid        ... bench:   5,462,281 ns/iter (+/- 156,619)
test triehash_insertions_32_mir_1k  ... bench:   2,286,986 ns/iter (+/- 21,361)
test triehash_insertions_32_ran_1k  ... bench:   2,249,132 ns/iter (+/- 74,144)
test triehash_insertions_random_mid ... bench:   1,376,235 ns/iter (+/- 23,817)
test triehash_insertions_six_high   ... bench:   1,592,853 ns/iter (+/- 63,619)
test triehash_insertions_six_low    ... bench:   2,318,934 ns/iter (+/- 44,270)
test triehash_insertions_six_mid    ... bench:   1,796,758 ns/iter (+/- 30,831)

@gavofyork gavofyork added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. and removed A0-pleasereview 🤓 Pull request needs code review. labels Aug 24, 2016
@rphmeier
Contributor

I've done a little more testing and I think what gives elastic_array the edge here is the optimization to resize and insert many items simultaneously. If you update your PR to smallvec to accommodate that, it should be competitive and safe.

@nipunn1313
Contributor Author

@rphmeier Is that for the extend or the insert_many operation?
I believe my PR does do that (servo/rust-smallvec#29). It calls reserve once and then does a ptr::copy.

Can you elaborate a little on which optimization you're referring to?
Is it specifically the fact that reserve might do one copy and then there's a follow up ptr::copy?

Seems like I should add some benchmarks to my PR.

@rphmeier
Contributor

@nipunn No, the optimization I'm speaking of can be seen in elastic_array here: https://github.com/debris/elastic-array/blob/master/src/lib.rs#L118

reserve will copy over all the data from the array to the heap, and only then do you do the insert_many operation. However, it's more efficient to combine the move from the array to the allocation with the insert_many; it is this edge which I believe leads to elastic_array being faster.
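
The distinction, sketched with plain Vec for clarity (illustration only, not elastic_array's actual code): instead of reserving and then shifting, the spill from the inline buffer to the heap can place prefix, new items, and suffix directly into their final positions, so the tail moves exactly once:

```rust
// Combined "spill + insert": build the heap buffer in its final order directly,
// rather than copying everything over and then shifting the tail a second time.
fn spill_and_insert(inline: &[u8], index: usize, new_items: &[u8]) -> Vec<u8> {
    let mut heap = Vec::with_capacity(inline.len() + new_items.len());
    heap.extend_from_slice(&inline[..index]); // prefix lands in its final place
    heap.extend_from_slice(new_items);        // inserted items fill the middle
    heap.extend_from_slice(&inline[index..]); // suffix is moved exactly once
    heap
}
```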

@rphmeier rphmeier added the M4-core ⛓ Core client code / Rust. label Sep 13, 2016
@gavofyork
Contributor

closing due to inactivity - please reopen on further activity.

@gavofyork gavofyork closed this Sep 27, 2016
@rphmeier
Contributor

rphmeier commented Sep 27, 2016

i've checked up on this every once in a while -- seems that we're waiting on the smallvec PRs before this can move forward. See servo/rust-smallvec#28 and servo/rust-smallvec#29

Note to @nipunn1313: in the meantime, RLP has moved to its own crate and your changes will need to be rebased against that. Thanks for undertaking this change, it seems to be a lot more work than initially expected :)

@nipunn1313
Contributor Author

Yep sounds good. Will reopen when I make progress upstream!

--Nipunn


@rphmeier
Contributor

@nipunn1313 Thanks for the herculean effort on getting servo/rust-smallvec#29 in. It should be possible to revive this effort again, if you're still interested.

@nipunn1313
Contributor Author

Ah yeah. Perhaps I'll take a look at some point this week. Glad you were still following along!

Labels
A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. M4-core ⛓ Core client code / Rust.
4 participants