Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve worst-case performance of BTreeSet intersection v1 #58577

Closed
wants to merge 17 commits into from

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Feb 19, 2019

Major performance boost when comparing tiny and huge sets. Probably with controversial changes and I sure have questions:

  • Names and places of functions and types
  • How many comments to write where
  • Why does rustc tell me to ref mut and ref matches on the iterator, while the book says ref is old school: fixed in code, should take it up to compiler people.
  • (Why) do I have to write out the clone like that (#[derive(Clone)] doesn't work): see improve worst-case performance of BTreeSet intersection v3 #59186 (comment)_
  • Am I allowed to use #[derive(Debug)] there at all? Yes, but the actual Debug.fmt implementation should supersede it.
  • I'd like to test function are_proportionate_for_intersection in test_intersection (or another test case next to it) itself, but I think the private function is inaccessible there. liballoc has other test cases not in the tests directory.

PS I don't list these questions to start a discussion here, just to inspire reviewers and to remember myself.

@rust-highfive
Copy link
Collaborator

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 19, 2019
@KodrAus
Copy link
Contributor

KodrAus commented Feb 21, 2019

Thanks @ssomers!

Would you be able to put together some benchmarks so we could get an idea of what the impact is for intersecting variously sized sets?

@ssomers
Copy link
Contributor Author

ssomers commented Feb 21, 2019

put together some benchmarks so we could get an idea of what the impact is for intersecting variously sized sets?

I'm not sure how to contrast the various implementations in the same build. But I could first PR a benchmark in src\liballoc\benches\btree using the existing implementation, then merge this here and anyone looking closely enough would see times improve.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 21, 2019

Rough numbers on my system:

bset::tree ttest case ns/iter before this PR ns/iter after this PR
intersect_neg_vs_pos_100 500 =
intersect_neg_vs_pos_10_vs_10k 60 150
intersect_neg_vs_pos_10k 50,000 =
intersect_neg_vs_pos_10k_vs_10 50,000 400
intersect_pos_vs_neg_100 400 =
intersect_pos_vs_neg_10_vs_10k 40,000 150
intersect_pos_vs_neg_10k 40,000 =
intersect_pos_vs_neg_10k_vs_10 60 400
intersect_random_100 700 =
intersect_random_10_vs_10k 40,000 300
intersect_random_10k 200,000 =
intersect_random_10k_vs_10 50,000 300
intersect_staggered_100 700 =
intersect_staggered_10_vs_10k 80 150
intersect_staggered_10k 70,000 =
intersect_staggered_10k_vs_10 80 150

As you see, intersection of very asymmetric set (10 versus 10,000 elements) is mostly up, except if the set containing only numbers lower than any in the other set (neg = negative ints vs. pos=positive ints) is also a small set (i.e. the algorithm merely iterates over that small set and finishes).

The latter could be improved upon by limiting both sets to the range started with the smallest element of the other set. The whole algorithm could work like that: instead of iterating, find the next element >= the other set's lowest unseen value. But then sometimes/often the binary search towards the next element would be worse than iterating, in particular for the staggered test case.

@KodrAus
Copy link
Contributor

KodrAus commented Feb 21, 2019

Thanks for putting the numbers together @ssomers. That's much appreciated.

It looks like we aren't regressing in what I'd expect to be the common cases and smoothing out those worst cases is a really nice improvement.

@bors r+

@bors
Copy link
Contributor

bors commented Feb 21, 2019

📌 Commit 4fd0cc1 has been approved by KodrAus

@bors
Copy link
Contributor

bors commented Feb 21, 2019

🌲 The tree is currently closed for pull requests below priority 99, this pull request will be tested once the tree is reopened

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 21, 2019
Centril added a commit to Centril/rust that referenced this pull request Feb 22, 2019
…drAus

improve worst-case performance of BTreeSet intersection

Major performance boost when comparing tiny and huge sets. Probably with controversial changes and I sure have questions:

- Names and places of functions and types
- How many comments to write where
- Why does rustc tell me to `ref mut` and `ref` matches on the iterator, while the book says ref is old school.
- (Why) do I have to write out the clone like that (`#[derive(Clone)]` doesn't work)
- Am I allowed to use `#[derive(Debug)]` there at all?
- I'd like to test function `are_proportionate_for_intersection` in test_intersection (or another test case next to it) itself, but I think the private function is inaccessible there. liballoc has other test cases not in the tests directory.

PS I don't list these questions to start a discussion here, just to inspire reviewers and to remember myself.
Centril added a commit to Centril/rust that referenced this pull request Feb 22, 2019
…marks, r=KodrAus

introduce benchmarks of BTreeSet.intersection

16 tests combining 4 kinds of contents with different sizes exposing edge cases.
The ones with asymmetric sizes are addressed by rust-lang#58577.
The pos_vs_neg cases seems (are were meant to be) the same as the neg_vs_pos case (same thing, reverse order) but reality shows a surprsing 25% difference.
@ssomers
Copy link
Contributor Author

ssomers commented Feb 24, 2019

With the last commit added on top, rough numbers on my system are almost all down:

bset::tree ttest case ns/iter before this PR ns/iter after this PR
intersect_neg_vs_pos_100 500 60
intersect_neg_vs_pos_10_vs_10k 60 50
intersect_neg_vs_pos_10k 50,000 80
intersect_neg_vs_pos_10k_vs_10 50,000 400
intersect_pos_vs_neg_100 400 60
intersect_pos_vs_neg_10_vs_10k 40,000 50
intersect_pos_vs_neg_10k 40,000 90
intersect_pos_vs_neg_10k_vs_10 60 400
intersect_random_100 700 600
intersect_random_10_vs_10k 40,000 300
intersect_random_10k 200,000 170,000
intersect_random_10k_vs_10 50,000 300
intersect_staggered_100 700 =
intersect_staggered_10_vs_10k 80 170
intersect_staggered_10k 70,000 =
intersect_staggered_10k_vs_10 80 180

As to taking this further with "instead of iterating, find the next element >= the other set's lowest unseen value", no cigar: it slows down intersect_random_10k 2.5 times, and (absolute worst case) intersect_staggered_10k 14x.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 25, 2019

More tweaks:

  • expose both intersection strategies as an unstable feature
  • benchmark both at the same time, along with the actual strategy chosen
  • seriously lower the threshold for the new strategy while still erring on the conservative side
  • quantify are_proportionate_for_intersection with static assertions instead of unit test

No more ideas at the moment so that's going to be the last batch, unless there are remarks.

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:063bf03b:start=1551137061283394269,finish=1551137273491208861,duration=212207814592
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---

[00:03:32] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:99: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:100: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:102: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:103: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:105: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:106: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:108: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:109: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:111: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:112: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:114: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/benches/btree/set.rs:115: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/collections/btree/set.rs:373: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/collections/btree/set.rs:374: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/collections/btree/set.rs:389: line longer than 100 chars
[00:03:33] tidy error: /checkout/src/liballoc/collections/btree/set.rs:390: line longer than 100 chars
[00:03:34] some tidy checks failed
[00:03:34] 
[00:03:34] 
[00:03:34] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:34] 
[00:03:34] 
[00:03:34] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:34] Build completed unsuccessfully in 0:00:46
[00:03:34] Build completed unsuccessfully in 0:00:46
[00:03:34] make: *** [tidy] Error 1
[00:03:34] Makefile:68: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:1fa8d203
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Mon Feb 25 23:31:37 UTC 2019
---
travis_time:end:30e2118a:start=1551137498538054658,finish=1551137498542746470,duration=4691812
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0918cf09
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:164e2594
travis_time:start:164e2594
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0ed429a8
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive
Copy link
Collaborator

Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:0082f334:start=1551175199493104002,finish=1551175272346776090,duration=72853672088
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=mingw-check
---
Attempting to download s3://rust-lang-ci-sccache2/docker/b5493eee5f15270054538ad6f259fd35cdcde8a43e617233529277e394d7185f136e6026d0c390061a3c94b487e75625efa873039c2128199e11805e10d47851
[00:00:15] Attempting with retry: curl -y 30 -Y 10 --connect-timeout 30 -f -L -C - -o /tmp/rustci_docker_cache https://s3-us-west-1.amazonaws.com/rust-lang-ci-sccache2/docker/b5493eee5f15270054538ad6f259fd35cdcde8a43e617233529277e394d7185f136e6026d0c390061a3c94b487e75625efa873039c2128199e11805e10d47851
[00:00:15]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[00:00:15]                                  Dload  Upload   Total   Spent    Left  Speed
No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@ssomers ssomers changed the title improve worst-case performance of BTreeSet intersection improve worst-case performance of BTreeSet intersection more Mar 9, 2019
bors added a commit that referenced this pull request Mar 13, 2019
improve worst-case performance of BTreeSet intersection

Alternative to [pull request #58577](#58577): back out of attempts to optimize using ranges, more elegant code (I think).

The stable public type Intersection changes from struct to enum. If that matters, then perhaps changing the fields like in the other proposal also mattered.
@ssomers ssomers changed the title improve worst-case performance of BTreeSet intersection more improve worst-case performance of BTreeSet intersection v1 Mar 14, 2019
@ssomers ssomers closed this Mar 21, 2019
@ssomers ssomers deleted the btreeset_intersection branch April 6, 2019 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants