RFC/DNM: profile-guided despecialization #2597
Conversation
This reduces specialization of numerous method arguments. The list was generated by automated tools and may not be particularly well-chosen, as it depends on the nature of the workload. This was generated by running the test suite, which is probably not representative of real-world workloads. The list is sorted from low-to-high inference time, so the ones towards the end are more important. This shaves about 200s off the time to run the tests (out of 700s total). Similar changes to Base & other dependencies would shave off more, though always with the risk of runtime penalty.
Thank you for looking into this. We have not resolved #2566 yet (other issues unfortunately got higher priority). A few questions so that I understand:
Going into more detail (this is probably more informative to @nalimilan):
@timholy - do I get the correct impression that it is good to disable specialization for functions that we know do little work, while keeping specialization for the functions that actually do most of the work?
@timholy - as a second thought: do you think it would be feasible/sensible to have some global "compilation aggressiveness" switch in DataFrames.jl? By default it would be set to "non-aggressive" (since, answering your question about typical workflows, compilation in practice takes a significant portion of time relative to the actual work for most functions in DataFrames.jl). The user could then toggle "aggressive" mode when working with very big data frames.
Good questions & ideas!
So I don't overwrite any user
Absolutely
Yes, the test suite probably doesn't place enough demand on runtime to be very accurate. The sorting is just by total inference time. There is a threshold in my code that omits methods whose runtime is at least 10% of their inference time, but if the tests only process small DataFrames, the measured runtime may fall below that threshold.
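For concreteness, here is a minimal sketch of the selection rule just described; the `MethodProfile` type and the function names are invented for illustration, not the actual tool's API:

```julia
# Hypothetical sketch of the despecialization-candidate selection criterion.
struct MethodProfile
    name::String
    inference_time::Float64   # seconds spent inferring this method
    run_time::Float64         # seconds spent running it in the workload
end

# Keep only methods whose runtime is below 10% of their inference time,
# sorted from least to most inference time (most important candidates last).
function despecialization_candidates(profiles::Vector{MethodProfile})
    candidates = filter(p -> p.run_time < 0.1 * p.inference_time, profiles)
    return sort(candidates; by = p -> p.inference_time)
end
```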
Yes, this could be a template. You could have two different settings for specialization. Keep in mind that this won't help much for things that only need to be compiled once, unless that one compilation can be captured by precompilation. However, by reducing the specialization, you also make a single compiled instance cover many argument types, so fewer fresh compilations are needed later.
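A minimal sketch of what such a template might look like, assuming a simple global flag; none of these names are DataFrames.jl API:

```julia
# Hypothetical "compilation aggressiveness" switch.
const AGGRESSIVE_COMPILATION = Ref(false)

set_aggressive_compilation!(flag::Bool) = (AGGRESSIVE_COMPILATION[] = flag)

# Specialized path: recompiled for each new function type, fastest at runtime.
_map_rows_specialized(f, v::AbstractVector) = map(f, v)

# Despecialized path: the wrapper itself is compiled only once for all `f`
# (callees such as `map` may still specialize internally).
_map_rows_despecialized(@nospecialize(f), v::AbstractVector) = map(f, v)

map_rows(f, v::AbstractVector) =
    AGGRESSIVE_COMPILATION[] ? _map_rows_specialized(f, v) :
                               _map_rows_despecialized(f, v)
```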
The most problematic case in DataFrames.jl is that users very often use anonymous functions in top-level code. This is the use case that causes the most problems (and hopefully can be helped by your changes). Here is a simple example on 0.22.2:
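Since the snippet itself did not survive, here is a sketch of the kind of pattern being described; the data and timings are illustrative, not from the original:

```julia
using DataFrames

df = DataFrame(x = 1:3, y = ["a", "b", "c"])

# Every syntactically distinct anonymous function is its own type, so each
# top-level call like this triggers fresh inference and compilation:
@time filter(:x => x -> x > 1, df)   # first call pays full compilation latency
@time filter(:x => x -> x > 1, df)   # a new anonymous function: compiles again
```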
the issue is that these methods specialize on the user-supplied function. The following diff:
```diff
diff --git a/src/abstractdataframe/abstractdataframe.jl b/src/abstractdataframe/abstractdataframe.jl
index 75703ca8..9ed1c227 100644
--- a/src/abstractdataframe/abstractdataframe.jl
+++ b/src/abstractdataframe/abstractdataframe.jl
@@ -991,12 +991,13 @@ julia> filter(AsTable(:) => nt -> nt.x == 1 || nt.y == "b", df)
 3 │ 1 b
 """
-@inline function Base.filter(f, df::AbstractDataFrame; view::Bool=false)
+@inline function Base.filter(@nospecialize(f), df::AbstractDataFrame; @nospecialize(view::Bool=false))
     rowidxs = _filter_helper(f, eachrow(df))
     return view ? Base.view(df, rowidxs, :) : df[rowidxs, :]
 end
-@inline function Base.filter((cols, f)::Pair, df::AbstractDataFrame; view::Bool=false)
+@inline function Base.filter(@nospecialize(colf::Pair), df::AbstractDataFrame; @nospecialize(view::Bool=false))
+    cols, f = colf
     int_cols = index(df)[cols] # it will be AbstractVector{Int} or Int
     if length(int_cols) == 0
         rowidxs = [f() for _ in axes(df, 1)]
```

This reduces the cost by about 20ms for me, but the real costs are in Base.Broadcast. I might take a quick peek and see if I can improve that, but in the meantime, is this kind of change something that's viable for you?
😮 - I would never have thought of that. I understand that the downside would be that the call to the function then goes through dynamic dispatch?
The net effect is quite subtle, and I still don't always predict correctly how it will work out. Here's an example:

```julia
julia> f(@nospecialize(::Integer)) = 1
f (generic function with 1 method)

julia> f(@nospecialize(::AbstractFloat)) = 2
f (generic function with 2 methods)

julia> g(c) = map(f, c)
g (generic function with 1 method)

julia> g([1,2])
2-element Vector{Int64}:
 1
 1

julia> using MethodAnalysis

julia> methodinstances(f) # surprise!
1-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Int64)

julia> g(AbstractFloat[1,2])
2-element Vector{Int64}:
 2
 2

julia> methodinstances(f) # probably more along the lines of what you expected
2-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Int64)
 MethodInstance for f(::AbstractFloat)

julia> Base.return_types(g, (Vector{Int},))
1-element Vector{Any}:
 Vector{Int64} (alias for Array{Int64, 1})

julia> Base.return_types(g, (Vector{AbstractFloat},))
1-element Vector{Any}:
 Vector{Int64} (alias for Array{Int64, 1})
```

The last one still comes back as `Vector{Int64}`: both methods of `f` return an `Int`, so inference computes a precise return type even for the despecialized arguments.
And regarding broadcasting - I have noticed that moving from broadcasting to a comprehension does not hurt performance and often improves compilation time. Is this something that can be expected? (We could then rework the internals to use broadcasting less.) Thank you!
Yes, broadcasting codegen is pretty expensive.
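To make the comparison concrete, a small illustrative sketch (what matters is the first-call timing in a fresh session; the numbers will vary):

```julia
v = collect(1.0:1_000.0)

add1_broadcast(v) = v .+ 1                  # goes through Base.Broadcast machinery
add1_comprehension(v) = [x + 1 for x in v]  # simpler lowered code

@time add1_broadcast(v)       # first call: includes broadcast codegen
@time add1_comprehension(v)   # first call: typically cheaper to compile
```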
@timholy - we are mostly done with the functionality of DataFrames.jl for the 1.0 release, and I am now going to work on despecialization + splitting methods to reduce compile cost. I would appreciate it if you commented on the related issues. Thank you!
@timholy - thank you for raising this issue. We are now manually despecializing, either with `@nospecialize` or by splitting out small kernel functions. Thank you!
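A sketch of the method-splitting pattern referred to above, with invented names: the large outer method is despecialized so it compiles only once, and only a tiny kernel is recompiled for each new user-supplied function:

```julia
# Outer method: compiled once, regardless of which `f` users pass in.
function apply_rows(@nospecialize(f), v::AbstractVector)
    # ... imagine argument checking, index handling, etc. here ...
    return _apply_rows_kernel(f, v)   # one dynamic dispatch per call
end

# Inner kernel: specialized on `f`, but small, so recompiling it for each
# new anonymous function is cheap compared to recompiling everything.
_apply_rows_kernel(f, v) = [f(x) for x in v]
```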
DO NOT MERGE
This is an experiment with a new tool I'm trying to develop, which I'm currently calling "profile-guided despecialization." I think of it as the converse of (or a variant of) profile-guided optimization, a technique perhaps most famously used in JavaScript compilers to dynamically decide which code deserves more aggressive compilation. Since Julia aggressively compiles almost everything, in Julia the "tweak" must instead be to reduce specialization, and it must be a "static" (source-code level) manipulation, since otherwise the specialization would happen before you ever run the code.
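To illustrate the tradeoff this rests on, a toy sketch (exact behavior can be subtle, as the discussion above shows): for a method that does very little work, first-call compilation dwarfs the work itself, so giving up specialization costs little at runtime while avoiding repeated inference:

```julia
cheap(@nospecialize(x)) = x + 1   # despecialized: one compiled instance

@time cheap(1)     # first call: time dominated by compilation
@time cheap(1.0)   # new argument type, but typically no fresh compilation
```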
I'd be interested in feedback about whether this seems useful. Since this was built by running the test suite, the profile data (i.e., the runtime performance) was probably not typical of a real workload, so this probably despecializes more than would be ideal. One way to handle that would be to come up with a workload that better reflects the real-world tradeoffs between latency and runtime performance and rerun the tools. (I would need help from the DataFrames developers if we go that path.) Another might be to simply truncate the top part of the despecialize.jl file, since the entries are sorted in increasing order of inference cost.