Skip to content

A grab bag of latency reductions #35714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 4, 2020
Merged

A grab bag of latency reductions #35714

merged 7 commits into from
May 4, 2020

Conversation

timholy
Copy link
Member

@timholy timholy commented May 3, 2020

Yesterday I learned about JL_DEBUG_METHOD_INVALIDATION and have spent some time analyzing the sources of invalidation-related latencies from loading Revise. Having Revise be slow to start up is not exactly what we want for a package designed to reduce "practical" latency, so this seemed like an important case to optimize. But I expect some of these changes will have more widespread impacts.

This is organized in a series of independent commits; for details, it's probably best to look at the commit message for each one. The bottom line is that this reduces the number of lines from julia --startup-file=no -e "using Revise" &> /tmp/invalidated.log, when compiled with JL_DEBUG_METHOD_INVALIDATION = 1, by more than 200 lines.

That may sound good (and it is!), but there's an even more awesome prize waiting (EDIT: fixed below): figure out how to prevent massive invalidation due to the logging framework (CC @c42f). Revise defines its own custom logger, and this seems to invalidate basically every method throughout Base and packages that uses a @debug, @info, @warn, or @error macro. That's really expensive, and I'm contemplating removing all the logging from Revise just to prevent this from happening, but it would be a sad loss.

I made a guess that it was catch_exception, see final commit, but this was not effective. The culprit really does appear to be shouldlog, and I'm surprised because ReviseLogger ends up last on its method table so I don't understand why this invalidates so much. ReviseLogger ends up first on the method table for min_enabled_level, so maybe that's the real problem? (I honestly don't understand why it sorts that way, it would seem that having equally-specific methods insert at the end of the method table makes the most sense and that seems to be what happens most of the time.)

With these changes and if I disable Revise's custom logger, I go from about 1300 lines of invalidation-output to around 100. That has a very noticeable positive effect on "snappiness" when using Revise. It sure would be nice to get there!

@timholy timholy added the latency Latency label May 3, 2020
@timholy
Copy link
Member Author

timholy commented May 3, 2020

I also stole a couple of commits from the https://github.com/JuliaLang/julia/tree/kc/stale_precompile branch. (Thanks @KristofferC!)

@timholy
Copy link
Member Author

timholy commented May 3, 2020

Build failure is just for docs (I didn't try that locally). I might wait to see if anybody has any insight about the logging issues first before I decide to tackle this further. Sure would be nice to see this in 1.5, of course, so if needed I can back that last commit out and do surgery on Revise.

timholy added a commit to JuliaLang/Pkg.jl that referenced this pull request May 3, 2020
This is a companion PR to JuliaLang/julia#35714
and accounts for a subset of the gains reported there.
The more specific implementations of `==` and `hash` avoid a `collect`;
that branch prevents many `collect`-induced invalidations via the
`iteratorsize` trick, but this change is probably a good idea anyway.

The more important and subtle one is the one in TOML._print.
This is used by a large number of methods in Pkg, so it's important
to prevent it from being invalidated.
Do to nospecialize markers, this method gets inferred for partially-
specified types, and the `a` is typically just `AbstractDict`.
However, `Iterators.Pairs` is also an AbstractDict, and the return
type of `keys` cannot be inferred from the parameters of
`AbstractDict{K,V}` because the `itr` field of a pair uses a later
parameter.

This short-circuits the whole problem by "hand-inlining" the result
of `invoke(keys, Tuple{AbstractDict})` at the call site, i.e.,
just creating a `KeySet` directly.
@timholy
Copy link
Member Author

timholy commented May 3, 2020

Forgot there was an important companion I needed to push: JuliaLang/Pkg.jl#1806. That TOML._print one-line change accounts for a large subset of the gains reported so far.

@timholy
Copy link
Member Author

timholy commented May 3, 2020

OK, so the difference in method sorting mentioned above seems to be that min_enabled_level gets classified isleafsig and the rest do not. I do not think that min_enabled_level is a big contributor to the problem. The bigger issue is that prior to insertion shouldlog has 91 direct backedges and many more indirect backedges. Curiously, handle_message (which appears in the same block of code) does not have any backedges.

My guess is that this occurs because shouldlog is simple and gets inlined via union-splitting. I'll try rebuilding with all such methods annotated @noinline and see what happens.

@@ -502,7 +513,7 @@ with_logger(logger) do
end
```
"""
with_logger(f::Function, logger::AbstractLogger) = with_logstate(f, LogState(logger))
with_logger(@nospecialize(f::Function), logger::AbstractLogger) = with_logstate(f, LogState(logger))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not expect this to make a difference; it doesn't affect inference and we will already avoid specializing this function on f.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it doesn't seem to change anything.

"""
copyto!(dest, src)

function copyto!(dest::AbstractArray{T1,N}, src::AbstractArray{T2,N}) where {T1,T2,N}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method really not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure, for now this is just a guess. Have not yet run tests or benchmarks, to me that will be the proof of the pudding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a good one: on master,

julia> a = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> b = [3, 1, 4, 2]
4-element Array{Int64,1}:
 3
 1
 4
 2

julia> copyto!(view(a, b), a)
4-element view(::Array{Int64,1}, [3, 1, 4, 2]) with eltype Int64:
 1
 3
 1
 1

vs

julia> copyto!(view(a, b), copy(a))
4-element view(::Array{Int64,1}, [3, 1, 4, 2]) with eltype Int64:
 1
 3
 2
 4

I'm hoping to fix it in this PR, or will report it if not.

KristofferC pushed a commit to JuliaLang/Pkg.jl that referenced this pull request May 3, 2020
This is a companion PR to JuliaLang/julia#35714
and accounts for a subset of the gains reported there.
The more specific implementations of `==` and `hash` avoid a `collect`;
that branch prevents many `collect`-induced invalidations via the
`iteratorsize` trick, but this change is probably a good idea anyway.

The more important and subtle one is the one in TOML._print.
This is used by a large number of methods in Pkg, so it's important
to prevent it from being invalidated.
Do to nospecialize markers, this method gets inferred for partially-
specified types, and the `a` is typically just `AbstractDict`.
However, `Iterators.Pairs` is also an AbstractDict, and the return
type of `keys` cannot be inferred from the parameters of
`AbstractDict{K,V}` because the `itr` field of a pair uses a later
parameter.

This short-circuits the whole problem by "hand-inlining" the result
of `invoke(keys, Tuple{AbstractDict})` at the call site, i.e.,
just creating a `KeySet` directly.
@timholy timholy force-pushed the teh/less_invalidation branch from c341326 to 7a1bf8c Compare May 3, 2020 20:37
@timholy
Copy link
Member Author

timholy commented May 3, 2020

OK, I seem to have discovered an exit on the logging framework: the combination of @nospecialize and invoke seems to force runtime dispatch. For maximum effect I also need to modify this line in Pkg to call _invoked_min_enabled_level, but I'll wait until I know whether this is a good idea before pushing that. To see the cumulative impact, see where we started and where this ended up: an invalidation log that went from 1384 lines down to 88.

I've not yet run timing comparisons on anything meaningful to see what this actually adds up to in practice; it's entirely possible that the improvement will be modest. (It may be complicated because it will also depend on what other packages get loaded.) It's also quite possible this will cause dreadful runtime regressions. If this passes tests, let's run our benchmarks and find out---to me it seems that the experiment is well worth doing.

@timholy
Copy link
Member Author

timholy commented May 3, 2020

OK, this is cool. using Revise by itself is basically no faster (and a heck of a lot slower than I wish it were). But while working on this I noticed that a huge number of Pkg methods get invalidated, and so that prompted another way to look for possible benefits. On 1.4, if I say using Revise followed by (@v1.4) pkg> up there's a lag of more than 4 seconds before I see any text output. On this, it's instantaneous. Woot!

@c42f
Copy link
Member

c42f commented May 4, 2020

I've been vaguely wondering about how backedges interact with the logging system for a while. I have noticed that installing a new logger causes a lot more recompilation than you might expect or want, but I sure didn't realize that shouldlog was inlined via union splitting. That's both amazing and terrifying! Amazing because it means we could have better performance than I'd imagined, and terrifying in the amount of method invalidation it could cause.

It's also known that setting the global logger doesn't really work well with the world age - see #33865 . That would suggest, perhaps, we should have an invokelatest barrier somewhere, but I'm not exactly sure where yet.

So anyway, I'm cautiously positive about just breaking some backedges here. The logging system needs some other refactoring to fix (runtime) performance issues as well so it's probably a fair bit slower than it could be already.

timholy and others added 7 commits May 4, 2020 05:16
When the eltype and/or dimensionality is unspecified, there were
2 or 3 methods that might apply here (see
`code_typed(copyto!, (Array, Array))`). Consequently partially-specified
codegen tends to be poor for this operation.

This harmonizes our implementations and eliminates eltype and dimensionality
as a consideration for the dispatch. By default, it dispatches on
`IndexStyle(src)` after a call to `unalias`, which makes sense because
copying an array can change its preferred `IndexStyle`.
It also fixes an aliasing bug.
CodeTracking.jl (a foundation for Revise & JuliaInterpreter)
defines additional `PkgId` methods; leaving this
untyped leads to invalidation of the entire code-loading path.
This aims to reduce disruption when adding methods to
`convert`  and `copymutable`.
@timholy
Copy link
Member Author

timholy commented May 4, 2020

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy
Copy link
Member Author

timholy commented May 4, 2020

Well, that's rather unremarkable (which is what I was hoping for). @JeffBezanson, let me know if there's anything objectionable here before merging.

@timholy timholy merged commit 84d7e67 into master May 4, 2020
@timholy timholy deleted the teh/less_invalidation branch May 4, 2020 21:00
@c42f
Copy link
Member

c42f commented May 5, 2020

Well, that's rather unremarkable

Well I doubt we have any logging performance tests ;-) But anyway I feel like this is probably a good usability tradeoff for now.

In the future we can do some proper performance benchmarking and optimization of the logging system.

KristofferC pushed a commit to JuliaLang/Pkg.jl that referenced this pull request May 10, 2020
This is a companion PR to JuliaLang/julia#35714
and accounts for a subset of the gains reported there.
The more specific implementations of `==` and `hash` avoid a `collect`;
that branch prevents many `collect`-induced invalidations via the
`iteratorsize` trick, but this change is probably a good idea anyway.

The more important and subtle one is the one in TOML._print.
This is used by a large number of methods in Pkg, so it's important
to prevent it from being invalidated.
Do to nospecialize markers, this method gets inferred for partially-
specified types, and the `a` is typically just `AbstractDict`.
However, `Iterators.Pairs` is also an AbstractDict, and the return
type of `keys` cannot be inferred from the parameters of
`AbstractDict{K,V}` because the `itr` field of a pair uses a later
parameter.

This short-circuits the whole problem by "hand-inlining" the result
of `invoke(keys, Tuple{AbstractDict})` at the call site, i.e.,
just creating a `KeySet` directly.

(cherry picked from commit d30adaf)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
latency Latency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants