Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-naive implementation of VecDeque.append #52553

Merged
merged 8 commits into from
Aug 18, 2018

Conversation

Pazzaz
Copy link
Contributor

@Pazzaz Pazzaz commented Jul 19, 2018

Replaces the old, simple implementation with a more manual (and unsafe 😱) one. I've added 1 more test and verified that it covers all 6 code paths in the function.

This new implementation was about 60% faster than the old naive one when I tried benchmarking it.

@rust-highfive
Copy link
Collaborator

r? @shepmaster

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 19, 2018
@shepmaster
Copy link
Member

when I tried benchmarking it.

Which benchmarks did you use? Are there some existing or did you write your own? If you wrote new ones, it seems like those should be added (but I'm not 100% sure on that).

Either way, it'd be good to see the benchmarks (and the raw data) to check that they are valid.

a.drain(a_pop_front..);
b.drain(..b_pop_back);
b.drain(b_pop_front..);
let checked = a.iter().chain(b.iter()).map(|&x| x).collect::<Vec<usize>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.map(|&x| x) -> .cloned()

@@ -928,6 +928,60 @@ fn test_append() {
assert_eq!(a.iter().cloned().collect::<Vec<_>>(), []);
}

#[test]
fn test_append_advanced() {
fn check(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check what? Tests are a form of documentation. All tests "check" something, that's rather their purpose.

@@ -928,6 +928,60 @@ fn test_append() {
assert_eq!(a.iter().cloned().collect::<Vec<_>>(), []);
}

#[test]
fn test_append_advanced() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"advanced" isn't really a great descriptor. What makes it advanced?

a_push_front: usize,
a_pop_front: usize,
b_push_front: usize,
b_pop_front: usize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of argument list is about at the top of the list for "easiest to pass the wrong arguments". There's a huge mass of arguments, subtly intertwined, all of the same type.

Instead, WDYT about

#[derive(Debug, Default)]
struct SomeUsefulName {
    a_push_back: usize,
    a_pop_back: usize,
    b_push_back: usize,
    b_pop_back: usize,
    a_push_front: usize,
    a_pop_front: usize,
    b_push_front: usize,
    b_pop_front: usize
}
check(SomeUsefulName { a_push, a_pop, b_push, b_pop, ..SomeUsefulName::default()});

assert_eq!(a, checked);
assert!(b.is_empty());
}
for a_push in 0..17 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 17. Preferably there's a constant with an evocative name.

Copy link
Member

@shepmaster shepmaster Jul 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even 0..=SOME_CONSTANT if 16 happens to be the reason...


// When minimizing the amount of calls to `copy_part`, there are
// 6 different cases to handle. Whether src and/or dst wrap are 4
// combinations and there are 3 distinct cases when they both wrap.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this the first time, I assumed you mathed wrong. Having the number 4 and then the number 3 made me think that there would be 7 cases. On rereading it, I see my mistake. I don't have a suggestion for how to reword, but maybe someone will.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, what benefit does the math in this comment bring?

// 3 3 H 1 1 1 2
let src_2 = dst_before_wrap - src_before_wrap;
let dst_start_2 = dst_start_1 + src_before_wrap;
let src_3 = src_total - dst_before_wrap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the _1, _2, _3 suffixes. How do I know to use _1 vs _2? It's nice that there's a picture, but it's a shame that the code cannot stand on its own without it, especially knowing how code and comments frequently diverge.

self.extend(other.drain(..));
// Copy from src[i1..i1 + len] to dst[i2..i2 + len].
// Does not check if the ranges are valid.
unsafe fn copy_part<T>(i1: usize, i2: usize, len: usize, src: &[T], dst: &mut [T]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing why this part needs to be unsafe. Can't we build slices and use [T]::copy_from_slice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because T doesn't necessarily implement Copy, of course. However, it feels odd to "reimplement" a half-slice on top of a slice due to the i1 / i2.

i1 and i2 are actually... offsets(?) into src and dst respectively? Those should be renamed to at least have src and dst in there.

// 6 different cases to handle. Whether src and/or dst wrap are 4
// combinations and there are 3 distinct cases when they both wrap.
// 6 = 3 + 1 + 1 + 1
match (src_wraps, dst_wraps) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pure speculation, but I wonder if there could be a method to encapsulate this logic, something along the lines of

fn source_parts(&self) -> (&[T], &[T]); // might exist as `as_slices`
unsafe fn destination_parts(&mut self) -> (&mut [T], &mut [T]);

This code will still need to exist and handle the same cases to minimize the copies.

}
}
};
other.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, until this point we are in a Bad State where a given element actually exists twice, once in both collections, right? I believe that means the panic safety of this function needs to be documented to help prevent the next person to touching it from shooting themselves in the foot.

@shepmaster
Copy link
Member

and verified that it covers all 6 code paths in the function

If that's important, there should be comments in the test showcasing which part(s) of the test exercise which conditions to avoid losing that coverage.

@Pazzaz
Copy link
Contributor Author

Pazzaz commented Jul 23, 2018

OK, I've pushed some changes that address most things you commented on. Instead of working with offsets in append, slices are now used to make it easier to understand and less error-prone. (thanks for the idea with "destination parts")
The test was also reworked entirely to make it cover more cases.

Which benchmarks did you use?

This one. I didn't use the standard bench harness as the VecDeques have to be cloned in the loop and that would force the bench to measure .clone() too.

@pietroalbini
Copy link
Member

Ping from triage @shepmaster! This PR needs your review.

@shepmaster
Copy link
Member

Thanks for putting up with my feedback! Now that I've done what I can, I'll pass it off to a better member of the team to actually make a decision!

r? @SimonSapin

@SimonSapin
Copy link
Contributor

I’ll need more time than I have right now to properly review the code (especially because unsafe). But in the meantime: it’d be nice to have the benchmark included in the PR. Try adding it as-is (with a main() function and no #[bench] attribute) to src/liballoc/benches/vec_deque_append.rs, then in src/liballoc/Cargo.toml you should find this section:

[[bench]]
name = "collectionsbenches"
path = "../liballoc/benches/lib.rs"

Below it, add this:

[[bench]]
name = "vec_deque_append_bench"
path = "../liballoc/benches/vec_deque_append.rs"
harness = false

(harness = false is what makes the file be compiled as a "normal" executable rather than with the built in bench / test harness.)

Then you should be able to run it with something like ./x.py bench src/liballoc --test-args "--bench vec_deque_append"

@rust-lang/infra Is there any automation that parses the ouptut of ./x.py bench and would be thrown off by a non-standard-harness benchmark like this?

@Mark-Simulacrum
Copy link
Member

Nothing today parses the benchmark output.

@Pazzaz
Copy link
Contributor Author

Pazzaz commented Aug 10, 2018

@SimonSapin The benchmark has been added. The only changes I did were to make it more similar to the default bench harness (use the median instead of the mean and force the use of nanoseconds).

let (before, after) = buf.split_at_mut(head);
(after, before)
} else {
RingSlices::ring_slices(buf, tail, head)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth a comment explaining why using ring_slices with the arguments swapped provides the desired behavior.

// naive impl
self.extend(other.drain(..));
// Copies all values from `src_slice` to the start of `dst_slice`.
unsafe fn copy_whole_slice<T>(src_slice: &[T], dst_slice: &mut [T]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is like [T]::copy_from_slice but without T: Copy? Then it is up to callers to be careful to not "duplicate" items and have for example some heap memory be double-freed.

… in fact this code ends up calling other.clear() which will incorrectly drop the items that have just been copied. Instead it should probably do something like other.tail = 0; other.head = 0 or other.tail = other.head.

Consider modifying the test to have not just T = usize, but a type with a Drop impl that does something non-trivial, ideally detect double drops. Perhaps T = Box<usize> is enough, if you can verify that CI runs tests with ASAN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new test that checks for double drops similar to the Vec tests has been added.

// 6. `src` and `dst` are discontiguous
// + dst_high is the same size as src_high
let src_contiguous = src_low.is_empty();
let dst_contiguous = dst_high.len() >= src_total;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took me a while to figure out.

The wording of the comment above suggests that “dst is contiguous” is a property of dst alone (namely that the pair of empty/unused slices are in fact one), similar to “src is contiguous”.

However the definition of this dst_contiguous value is not that. Instead, it is “we will only need to write into a contiguous portion of dst”. This does not match the comment or the name of the variable. Consider rewording both.

// it is important we clear the old values from `other`...
other.clear();
// and that we update `head` as the last step, making the values accessible in `self`.
self.head = new_head;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With clear() replaced per above comment there is no need to delay the write to self.head, since there is no arbitrary destructor code being called, only shallow copies (moves). Inlining it in each 6 cases might be clearer. (The reset of other can also be moved before the copies.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assignment of self.head and the reset of other both need to be done after dst and src are retrieved and the borrow checker complains if I do the assignments while they are in scope as self and other are borrowed.

@bors
Copy link
Contributor

bors commented Aug 17, 2018

💔 Test failed - status-travis

@rust-highfive
Copy link
Collaborator

Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 17, 2018
@kennytm
Copy link
Member

kennytm commented Aug 18, 2018

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 18, 2018
@bors
Copy link
Contributor

bors commented Aug 18, 2018

⌛ Testing commit b063bd4 with merge 4ceb23dfca0457fb399a72b3f7526b54a83b7476...

@bors
Copy link
Contributor

bors commented Aug 18, 2018

💔 Test failed - status-travis

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-aux of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[01:47:50] 
[01:47:50] testing https://github.com/BurntSushi/xsv
[01:47:51] Initialized empty Git repository in /checkout/obj/build/ct/xsv/.git/
[01:47:51] fatal: Could not parse object '66956b6bfd62d6ac767a6b6499c982eae20a2c9f'.
[01:48:11] fatal: unable to access 'https://github.com/BurntSushi/xsv/': Could not resolve host: github.com
[01:48:11] thread 'main' panicked at 'assertion failed: status.success()', tools/cargotest/main.rs:128:13
[01:48:11] 
[01:48:11] 
[01:48:11] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/cargotest" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "/checkout/obj/build/ct"
[01:48:11] expected success, got: exit code: 101
[01:48:11] expected success, got: exit code: 101
[01:48:11] 
[01:48:11] 
[01:48:11] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/test/pretty src/test/run-pass/pretty src/test/run-fail/pretty src/test/run-pass-valgrind/pretty src/test/run-pass-fulldeps/pretty src/test/run-fail-fulldeps/pretty src/tools/cargo src/tools/cargotest
[01:48:11] Build completed unsuccessfully in 0:45:19
[01:48:11] make: *** [check-aux] Error 1
[01:48:11] Makefile:60: recipe for target 'check-aux' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:00497e4e
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:1a5fffba:start=1534574918091265952,finish=1534574918097774891,duration=6508939
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0743de88
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:02594940
travis_time:start:02594940
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:14a46a05
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 18, 2018
@pietroalbini
Copy link
Member

@bors retry travis-ci/travis-ci#9696

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 18, 2018
@bors
Copy link
Contributor

bors commented Aug 18, 2018

⌛ Testing commit b063bd4 with merge d5b6b95...

bors added a commit that referenced this pull request Aug 18, 2018
Non-naive implementation of `VecDeque.append`

Replaces the old, simple implementation with a more manual (and **unsafe** 😱) one. I've added 1 more test and verified that it covers all 6 code paths in the function.

This new implementation was about 60% faster than the old naive one when I tried benchmarking it.
@bors
Copy link
Contributor

bors commented Aug 18, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: SimonSapin
Pushing d5b6b95 to master...

@bors bors merged commit b063bd4 into rust-lang:master Aug 18, 2018
MaloJaffre added a commit to MaloJaffre/rust that referenced this pull request Aug 22, 2018
…onSapin"

This partially reverts commit d5b6b95,
reversing changes made to 6b1ff19.

Fixes rust-lang#53529.
Cc: rust-lang#53564.
bors added a commit that referenced this pull request Aug 29, 2018
Reoptimize VecDeque::append

~Unfortunately, I don't know if these changes fix the unsoundness mentioned in #53529, so it is stil a WIP.
This is also completely untested.
The VecDeque code contains other unsound code: one example : [reading unitialized memory](https://play.rust-lang.org/?gist=6ff47551769af61fd8adc45c44010887&version=nightly&mode=release&edition=2015) (detected by MIRI), so I think this code will need a bigger refactor to make it clearer and safer.~

Note: this is based on #53571.
r? @SimonSapin
Cc: #53529 #52553 @yorickpeterse @jonas-schievink @Pazzaz @shepmaster.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 15, 2021
Optimize VecDeque::append

Optimize `VecDeque::append` to do unsafe copy rather than iterating through each element.

On my `Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz`, the benchmark shows 37% improvements:
```
Master:
custom-bench vec_deque_append 583164 ns/iter
custom-bench vec_deque_append 550040 ns/iter

Patched:
custom-bench vec_deque_append 349204 ns/iter
custom-bench vec_deque_append 368164 ns/iter
```

Additional notes on the context: this is the third attempt to implement a non-trivial version of `VecDeque::append`, the last two are reverted due to unsoundness or regression, see:
- rust-lang#52553, reverted in rust-lang#53571
- rust-lang#53564, reverted in rust-lang#54851

Both cases are covered by existing tests.

Signed-off-by: tabokie <xy.tao@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants