Description
Originally reduced from a production application using the `tower` crate - see @fasterthanlime's reduced repro repo for more background (though note that it exhibits several other failure modes as well).
```rust
trait Trait<'a> {
    type A;
    type B;
    type C;
    type D;

    fn method() {}
}

impl<T> Trait<'_> for &'_ T
where
    for<'x> T: Trait<'x, A = (), B = (), C = (), D = ()>,
{
    type A = ();
    type B = ();
    type C = ();
    type D = ();
}

impl Trait<'_> for () {
    type A = ();
    type B = ();
    type C = ();
    type D = ();
}

pub fn main() {
    #[cfg(depth = "7")]
    <&&&&&&&() as Trait>::method();
    #[cfg(depth = "8")]
    <&&&&&&&&() as Trait>::method();
    #[cfg(depth = "9")]
    <&&&&&&&&&() as Trait>::method();
    #[cfg(depth = "10")]
    <&&&&&&&&&&() as Trait>::method();
}
```
The above example currently takes an exponential amount of time to compile, based on the type depth:
```sh
$ curl -O https://gist.githubusercontent.com/eddyb/cd4221f14fff265280d135ddce5c9712/raw/a17cbf00af1894756d2b1bdd2e838cdd4db2bbe2/proj-exp.rs
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "7"'
took 0.74s
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "8"'
took 2.98s
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "9"'
took 11.92s
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "10"'
took 51.10s
```
With every extra reference layer, the compile time increases by ~4x, which aligns well with the trait having 4 associated types.

While this example is a bit silly, it doesn't take more than two associated types (both constrained at once) to cause the issue, although it then only shows up at a greater depth, or with a larger constant factor from additional bounds; a sketch of such a two-associated-type variant follows.
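For illustration, here's a minimal sketch of that two-associated-type variant (my own sketch, not part of the original reduction; the trait name `Trait2` and the chosen reference depth are arbitrary):

```rust
// Hypothetical two-associated-type variant of the reduction above; the same
// blowup should appear, presumably with a ~2x factor per layer instead of
// ~4x, so a deeper chain of references is needed before it becomes painful.
trait Trait2<'a> {
    type A;
    type B;

    fn method() {}
}

impl<T> Trait2<'_> for &'_ T
where
    for<'x> T: Trait2<'x, A = (), B = ()>,
{
    type A = ();
    type B = ();
}

impl Trait2<'_> for () {
    type A = ();
    type B = ();
}

pub fn main() {
    // Depth chosen arbitrarily for illustration.
    <&&&&&&&&&&&&&&() as Trait2>::method();
}
```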
And I'm guessing it might be one of the main causes for applications built with the `tower` crate to experience long compile times (its main trait, `Service`, has `Response` and `Error` as two associated types), in large part because that's the kind of codebase this was originally reduced from (as per the note at the top of this issue).
The reason I suspect caching is at fault is a combination of factors:

- instrumenting `evaluate_predicate_recursively` found an exponential ramp in terms of duplicates (i.e. the number of times each unique obligation shows up), and many of them were `ProjectionPredicate`s
  - this was done via `-Z self-profile`, though `RUSTC_LOG` might also work (and not require compiler changes)
- many of those `ProjectionPredicate`s had a bound lifetime listed in their `Binder`
- attempting to create a `ProjectionCacheKey` returns `None` iff there are bound variables
  - the comment is a bit worrying, given that it e.g. talks about "escaping regions" but checks for bound ones
  - (as an aside, `ProjectionCacheKey` not holding a `ParamEnv` is probably risky long-term)
  - I've talked to @nikomatsakis and we're unsure if this is a historical artifact - my best guess is that it might have something to do with the fact that normalizing under binders didn't use to work (maybe a micro-opt to not even bother trying to cache those cases? were they that common?)
- replacing the `for<'x> T: Trait<'x, ...` with `T: Trait<'static, ...` removes the exponential slowdown (see the sketch after this list)
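For concreteness, that last point corresponds to changing the blanket impl from the reduction roughly like this (a sketch; only the `where` bound differs from the code at the top):

```rust
// Same blanket impl as in the reduction above, but with the higher-ranked
// `for<'x>` bound replaced by a `'static` one; with this change the compile
// time no longer grows exponentially with the reference depth.
impl<T> Trait<'_> for &'_ T
where
    T: Trait<'static, A = (), B = (), C = (), D = ()>,
{
    type A = ();
    type B = ();
    type C = ();
    type D = ();
}
```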
So the next step was to try always caching (warning: this is actually an unsound quick hack; `ProjectionCacheKey` should be modified to carry a `Binder<ProjectionTy>` instead of a `ProjectionTy`):
```diff
diff --git a/compiler/rustc_trait_selection/src/traits/project.rs b/compiler/rustc_trait_selection/src/traits/project.rs
index b3e7fbb3578..5d5f2f67842 100644
--- a/compiler/rustc_trait_selection/src/traits/project.rs
+++ b/compiler/rustc_trait_selection/src/traits/project.rs
@@ -2123,7 +2123,7 @@ fn from_poly_projection_predicate(
         let infcx = selcx.infcx();
         // We don't do cross-snapshot caching of obligations with escaping regions,
         // so there's no cache key to use
-        predicate.no_bound_vars().map(|predicate| {
+        Some(predicate.skip_binder()).map(|predicate| {
             ProjectionCacheKey::new(
                 // We don't attempt to match up with a specific type-variable state
                 // from a specific call to `opt_normalize_projection_type` - if
```
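To spell out why this is only a hack, here's a toy illustration of the `Binder` semantics involved (an assumed simplification with made-up types, not rustc's actual API): `no_bound_vars()` only yields the contents when nothing is bound, while `skip_binder()` unconditionally discards the binder, so keying the cache on its result can conflate predicates that differ only in their bound variables.

```rust
// Toy model of the relevant `Binder` semantics (assumed simplification, not
// rustc's real types): `no_bound_vars` refuses to peel a binder that binds
// anything, while `skip_binder` silently drops that information.
#[derive(Clone)]
struct Binder<T> {
    value: T,
    // Number of variables bound by this binder (e.g. the `'x` in `for<'x> ...`).
    bound_vars: usize,
}

impl<T> Binder<T> {
    fn no_bound_vars(self) -> Option<T> {
        // Only hand out the contents if the binder binds nothing.
        if self.bound_vars == 0 { Some(self.value) } else { None }
    }

    fn skip_binder(self) -> T {
        // Unconditionally discard the binder, losing track of what was bound.
        self.value
    }
}

fn main() {
    let higher_ranked = Binder { value: "T: Trait<'x, A = ()>", bound_vars: 1 };
    // Original code: no cache key is produced for a higher-ranked predicate.
    assert!(higher_ranked.clone().no_bound_vars().is_none());
    // The hack: cache it anyway, keyed only by the binder's contents.
    assert_eq!(higher_ranked.skip_binder(), "T: Trait<'x, A = ()>");
}
```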
However, the above hack does not appear to help, and we're unsure why - the main other kind of `Predicate` that `evaluate_predicate_recursively` appears to process is `TraitPredicate`, which IIUC is almost always locally cached?
(EDIT: turns out that the caching is more complex than initially assumed and requires further changes - see #99188 (comment) for a success)
Then again, there are other reductions that don't involve `ProjectionPredicate` and still cause their own exponential curves, so there might be several different caching issues.
I primarily wanted to open this issue on its own because there's definitely something weird with `ProjectionCacheKey` (even if a proper fix might be far more involved than just that type), and there's a clear correlation between the number of associated types (being constrained) and the exponential curve.
cc @rust-lang/types