Auto merge of #6336 - Eh2406:im-rs, r=alexcrichton
Persistent data structures by im-rs

There has been a long-standing "TODO: We should switch to persistent hash maps if we can" in the resolver. This PR introduces a dependency on [im-rs](bodil/im-rs#26) to provide persistent hash maps. It then uses `im_rc::OrdSet` to store the priority list of dependencies, instead of `std::BinaryHeap`, as cloning the `Heap` was one of the most expensive things we did. In Big O terms these changes are very big wins; in practice the improvement is small. This is due to a number of factors: the `std::collections` types are very fast, N is usually only in the hundreds, we only clone when the borrow checker insists on it, and we use `Rc` and other tricks to avoid cloning.

I would advocate that we accept this PR, as:
- Things are only going to get more complicated in the future. It would be nice to have Big O on our side.
- When developing a new feature, finding each place to add `Rc` should be an afterthought, not absolutely required before merge. (cc #6129, which is stuck on this kind of per-tick performance problem, as well as other things.)
- As the code gets more complex, making sure we never break any of the tricks becomes harder. It would be easier to work on this if such mistakes were marginal instead of show-stoppers. (cc #5130, a bug report of one of these being removed and affecting `./x.py build` enough to trigger an investigation.)
- The resolver is already very complicated, with a lot of moving parts to keep in your head at the same time, and this will only get worse as we add features. There is a long list of tricks that are *critical* before this change and worth only `~5%` after it; it would be nice to consider whether each one is worth the code complexity. (I can list some of the ones I have tried to remove but that are still small wins, if that would help the discussion.)

Overall, I think it is worth doing now, but I can see that it is not a slam dunk.
bors committed Nov 21, 2018
2 parents e192fdf + 1699a5b commit c93e847
Showing 4 changed files with 39 additions and 25 deletions.
1 change: 1 addition & 0 deletions Cargo.toml
@@ -59,6 +59,7 @@ url = "1.1"
clap = "2.31.2"
unicode-width = "0.1.5"
openssl = { version = '0.10.11', optional = true }
im-rc = "12.1.0"

# A noop dependency that changes in the Rust repository, it's a bit of a hack.
# See the `src/tools/rustc-workspace-hack/README.md` file in `rust-lang/rust`
16 changes: 7 additions & 9 deletions src/cargo/core/resolver/context.rs
@@ -5,6 +5,7 @@ use core::interning::InternedString;
use core::{Dependency, FeatureValue, PackageId, SourceId, Summary};
use util::CargoResult;
use util::Graph;
use im_rc;

use super::errors::ActivateResult;
use super::types::{ConflictReason, DepInfo, GraphNode, Method, RcList, RegistryQueryer};
@@ -19,12 +20,9 @@ pub use super::resolve::Resolve;
// possible.
#[derive(Clone)]
pub struct Context {
// TODO: Both this and the two maps below are super expensive to clone. We should
// switch to persistent hash maps if we can at some point or otherwise
// make these much cheaper to clone in general.
pub activations: Activations,
pub resolve_features: HashMap<PackageId, Rc<HashSet<InternedString>>>,
pub links: HashMap<InternedString, PackageId>,
pub resolve_features: im_rc::HashMap<PackageId, Rc<HashSet<InternedString>>>,
pub links: im_rc::HashMap<InternedString, PackageId>,

// These are two cheaply-cloneable lists (O(1) clone) which are effectively
// hash maps but are built up as "construction lists". We'll iterate these
@@ -36,16 +34,16 @@ pub struct Context {
pub warnings: RcList<String>,
}

pub type Activations = HashMap<(InternedString, SourceId), Rc<Vec<Summary>>>;
pub type Activations = im_rc::HashMap<(InternedString, SourceId), Rc<Vec<Summary>>>;

impl Context {
pub fn new() -> Context {
Context {
resolve_graph: RcList::new(),
resolve_features: HashMap::new(),
links: HashMap::new(),
resolve_features: im_rc::HashMap::new(),
links: im_rc::HashMap::new(),
resolve_replacements: RcList::new(),
activations: HashMap::new(),
activations: im_rc::HashMap::new(),
warnings: RcList::new(),
}
}
46 changes: 30 additions & 16 deletions src/cargo/core/resolver/types.rs
@@ -1,5 +1,5 @@
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap, HashSet};
use std::collections::{HashMap, HashSet};
use std::ops::Range;
use std::rc::Rc;
use std::time::{Duration, Instant};
@@ -9,6 +9,8 @@ use core::{Dependency, PackageId, PackageIdSpec, Registry, Summary};
use util::errors::CargoResult;
use util::Config;

use im_rc;

pub struct ResolverProgress {
ticks: u16,
start: Instant,
@@ -52,7 +54,11 @@ impl ResolverProgress {
// with all the algorithm improvements.
// If any of them are removed then it takes more than I am willing to measure.
// So let's fail the test fast if we have been running for too long.
debug_assert!(self.ticks < 50_000);
debug_assert!(
self.ticks < 50_000,
"got to 50_000 ticks in {:?}",
self.start.elapsed()
);
// The largest test in our suite takes less than 30 sec
// with all the improvements to how fast a tick can go.
// If any of them are removed then it takes more than I am willing to measure.
@@ -299,48 +305,56 @@ impl Ord for DepsFrame {
fn cmp(&self, other: &DepsFrame) -> Ordering {
self.just_for_error_messages
.cmp(&other.just_for_error_messages)
.then_with(||
// the frame with the sibling that has the least number of candidates
// needs to get bubbled up to the top of the heap we use below, so
// reverse comparison here.
self.min_candidates().cmp(&other.min_candidates()).reverse())
.reverse()
.then_with(|| self.min_candidates().cmp(&other.min_candidates()))
}
}

/// Note that a `BinaryHeap` is used for the remaining dependencies that need
/// activation. This heap is sorted such that the "largest value" is the most
/// constrained dependency, or the one with the least candidates.
/// Note that an `OrdSet` is used for the remaining dependencies that need
/// activation. This set is sorted by how many candidates each dependency has.
///
/// This helps us get through super constrained portions of the dependency
/// graph quickly and hopefully lock down what later larger dependencies can
/// use (those with more candidates).
#[derive(Clone)]
pub struct RemainingDeps(BinaryHeap<DepsFrame>);
pub struct RemainingDeps {
/// A monotonic counter, increased for each new insertion.
time: u32,
/// The data is augmented by the insertion time.
/// This ensures that no two items will compare equal,
/// forcing the `OrdSet` to behave as a multi-set.
data: im_rc::OrdSet<(DepsFrame, u32)>,
}

impl RemainingDeps {
pub fn new() -> RemainingDeps {
RemainingDeps(BinaryHeap::new())
RemainingDeps {
time: 0,
data: im_rc::OrdSet::new(),
}
}
pub fn push(&mut self, x: DepsFrame) {
self.0.push(x)
let insertion_time = self.time;
self.data.insert((x, insertion_time));
self.time += 1;
}
pub fn pop_most_constrained(&mut self) -> Option<(bool, (Summary, (usize, DepInfo)))> {
while let Some(mut deps_frame) = self.0.pop() {
while let Some((mut deps_frame, insertion_time)) = self.data.remove_min() {
let just_here_for_the_error_messages = deps_frame.just_for_error_messages;

// Figure out what our next dependency to activate is, and if nothing is
// listed then we're entirely done with this frame (yay!) and we can
// move on to the next frame.
if let Some(sibling) = deps_frame.remaining_siblings.next() {
let parent = Summary::clone(&deps_frame.parent);
self.0.push(deps_frame);
self.data.insert((deps_frame, insertion_time));
return Some((just_here_for_the_error_messages, (parent, sibling)));
}
}
None
}
pub fn iter(&mut self) -> impl Iterator<Item = (&PackageId, Dependency)> {
self.0.iter().flat_map(|other| other.flatten())
self.data.iter().flat_map(|(other, _)| other.flatten())
}
}

1 change: 1 addition & 0 deletions src/cargo/lib.rs
@@ -62,6 +62,7 @@ extern crate termcolor;
extern crate toml;
extern crate unicode_width;
extern crate url;
extern crate im_rc;

use std::fmt;

