more type-stable type-inference #41697

Merged
merged 1 commit into master from avi/jetdemo on Jul 29, 2021

Conversation

aviatesk
Member

(this PR is the final output of my demo at [our workshop](https://github.com/aviatesk/juliacon2021-workshop-pkgdev))

This PR eliminates many of the runtime dispatches within our type-inference
routine that are reported by the following JET analysis:
```julia
using JETTest

const CC = Core.Compiler

# exclude tiny utility functions whose dynamic dispatch is acceptable
function function_filter(@nospecialize(ft))
    ft === typeof(CC.isprimitivetype) && return false
    ft === typeof(CC.ismutabletype) && return false
    ft === typeof(CC.isbitstype) && return false
    ft === typeof(CC.widenconst) && return false
    ft === typeof(CC.widenconditional) && return false
    ft === typeof(CC.widenwrappedconditional) && return false
    ft === typeof(CC.maybe_extract_const_bool) && return false
    ft === typeof(CC.ignorelimited) && return false
    return true
end

# only report frames that originate from compiler code
function frame_filter((; linfo))
    meth = linfo.def
    isa(meth, Method) || return true
    return occursin("compiler/", string(meth.file))
end

report_dispatch(CC.typeinf, (CC.NativeInterpreter, CC.InferenceState); function_filter, frame_filter)
```
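For context, here is a minimal, self-contained illustration (hypothetical, not code from this PR) of the kind of call site `report_dispatch` flags: when a callee's argument type is only known abstractly, the method has to be looked up at runtime.
```julia
# Hypothetical illustration of a runtime-dispatch site, not code from this PR.
struct Box
    x::Any              # abstractly-typed field: contents unknown to inference
end
dynamic_len(b::Box) = length(b.x)   # length(::Any) must be dispatched at runtime

struct TypedBox
    x::Vector{Int}      # concretely-typed field
end
static_len(b::TypedBox) = length(b.x)   # resolved statically, no dispatch report
```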

> on master
```
═════ 137 possible errors found ═════
...
```
> on this PR
```
═════ 51 possible errors found ═════
...
```

And it seems like this PR makes the JIT slightly faster:
> on master
```julia
~/julia/julia master
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3));'
  3.659865 seconds (7.19 M allocations: 497.982 MiB, 3.94% gc time, 0.39% compilation time)
  2.696410 seconds (3.62 M allocations: 202.905 MiB, 7.49% gc time, 56.39% compilation time)
```
> on this PR
```julia
~/julia/julia avi/jetdemo* 7s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3));'
  3.396974 seconds (7.16 M allocations: 491.442 MiB, 4.80% gc time, 0.28% compilation time)
  2.591130 seconds (3.48 M allocations: 196.026 MiB, 7.29% gc time, 56.72% compilation time)
```
```
@@ -85,9 +85,12 @@ function abstract_call_gf_by_type(interp::AbstractInterpreter, @nospecialize(f),
                push!(edges, edge)
            end
            this_argtypes = isa(matches, MethodMatches) ? argtypes : matches.applicable_argtypes[i]
            const_rt, const_result = abstract_call_method_with_const_args(interp, result, f, this_argtypes, match, sv, false)
```
Member
This change is less type stable, so it seems harder to optimize. Why is it better?

Member Author

The main reason is that our optimizer emits somewhat complex code when we destructure a union-typed tuple:

```julia
julia> code_typed((Union{Tuple{Int,Nothing}, Tuple{Integer,String}},)) do tpl
           if getfield(tpl, 2) !== nothing
               a, b = tpl
               return a, b
           end
           return a, b
       end |> first
CodeInfo(
1 ─ %1  = Main.getfield(tpl, 2)::Union{Nothing, String}
│   %2  = (%1 === Main.nothing)::Bool
│   %3  = Core.Intrinsics.not_int(%2)::Bool
└──       goto #9 if not %3
2 ─ %5  = (isa)(tpl, Tuple{Int64, Nothing})::Bool
└──       goto #4 if not %5
3 ─ %7  = π (tpl, Tuple{Int64, Nothing})
│   %8  = Base.getfield(%7, 1)::Union{Nothing, Int64}
│   %9  = Base.add_int(1, 1)::Int64
│   %10 = Core.tuple(%8, %9)::Tuple{Union{Nothing, Int64}, Int64}
└──       goto #5
4 ─ %12 = Base.indexed_iterate(tpl, 1)::Tuple{Integer, Int64}
└──       goto #5
5 ┄ %14 = φ (#3 => %10, #4 => %12)::Tuple{Integer, Int64}
│   %15 = Core.getfield(%14, 1)::Integer
│   %16 = (isa)(tpl, Tuple{Int64, Nothing})::Bool
└──       goto #7 if not %16
6 ─ %18 = π (tpl, Tuple{Int64, Nothing})
│   %19 = Base.getfield(%18, 2)::Union{Nothing, Int64}
│   %20 = Base.add_int(2, 1)::Int64
│   %21 = Core.tuple(%19, %20)::Tuple{Union{Nothing, Int64}, Int64}
└──       goto #8
7 ─ %23 = Base.indexed_iterate(tpl, 2, 2)::Union{Tuple{Nothing, Int64}, Tuple{String, Int64}}
└──       goto #8
8 ┄ %25 = φ (#6 => %21, #7 => %23)::Union{Tuple{Nothing, Int64}, Tuple{String, Int64}}
│   %26 = Core.getfield(%25, 1)::Union{Nothing, String}
│   %27 = Core.tuple(%15, %26)::Tuple{Integer, Union{Nothing, String}}
└──       return %27
9 ─       $(Expr(:throw_undef_if_not, :a, false))::Any
└──       unreachable
) => Tuple{Integer, Union{Nothing, String}}
```
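Compare this with branching on `nothing` directly, where the destructured tuple is concretely typed and the IR stays simple: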

```julia
julia> code_typed((Union{Nothing, Tuple{Integer,String}},)) do tpl
           if tpl !== nothing
               a, b = tpl
               return a, b
           end
           return nothing
       end |> first
CodeInfo(
1 ─ %1 = (tpl === Main.nothing)::Bool
│   %2 = Core.Intrinsics.not_int(%1)::Bool
└──      goto #3 if not %2
2 ─ %4 = π (tpl, Tuple{Integer, String})
│   %5 = Base.getfield(%4, 1)::Integer
│   %6 = π (tpl, Tuple{Integer, String})
│   %7 = Base.getfield(%6, 2)::String
│   %8 = Core.tuple(%5, %7)::Tuple{Integer, String}
└──      return %8
3 ─      return Main.nothing
) => Union{Nothing, Tuple{Integer, String}}
```

So we can get a performance improvement if we branch on `nothing`-ness and avoid the union-typed tuple:

```julia
julia> using BenchmarkTools

julia> function f1(ts)
           local a, b
           for t in ts
               if getfield(t, 2) !== nothing
                   a, b = t
               end
           end
           return a, b
       end
f1 (generic function with 1 method)

julia> function f2(ts)
           local a, b
           for t in ts
               if t !== nothing
                   a, b = t
               end
           end
           return a, b
       end
f2 (generic function with 1 method)

julia> let ts = subtypes(Any)
           ts1 = Union{Tuple{DataType,Nothing},Tuple{Any,Char}}[]
           ts2 = Union{Nothing,Tuple{Any,Char}}[]
           for _ in 1:1000
               if rand(Bool)
                   push!(ts1, (Any, nothing)); push!(ts2, nothing)
               else
                   t = rand(ts), rand(Char)
                   push!(ts1, t); push!(ts2, t)
               end
           end
           display(@benchmark(f1($ts1)))
           display(@benchmark(f2($ts2)))
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  123.927 μs …   4.081 ms  ┊ GC (min … max): 0.00% … 95.08%
 Time  (median):     130.366 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   150.662 μs ± 107.727 μs  ┊ GC (mean ± σ):  2.69% ±  3.84%

  ▇█▇▅▄▃▂▁▂▂▂▂▂▂▁▁ ▁▂▂▁▁▁    ▁                                  ▂
  █████████████████████████████████▆█▇█▇▆▇▇▇▆▇▇▆▆▆▆▆▇▅▆▅▅▅▅▅▄▅▅ █
  124 μs        Histogram: log(frequency) by time        308 μs <

 Memory estimate: 82.19 KiB, allocs estimate: 3347.
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.817 μs … 145.613 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.272 μs ±   8.474 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▆▄▁▁                                                       ▁
  ███████▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▅▅▅▆▄▅▄▄▅▆▅▆▆▆▆▇▇▇▆▆▆▇▇▇▇▆▆▃▃▃▄▄▂▅▆▅▅ █
  17.8 μs       Histogram: log(frequency) by time      57.6 μs <

 Memory estimate: 8.08 KiB, allocs estimate: 516.
```

Member

Ah, that is not what I expected that code to infer to; I was expecting `Tuple{Any, Union{Char, Nothing}}`. For that particular tuple we could then potentially add codegen optimizations, so that we don't box it immediately.

Member Author (@aviatesk, Jul 29, 2021)

Yes, I agree that special handling of union-typed tuples in the optimizer would be worth doing.
For now I want to have this change (currently the return type of `abstract_call_method_with_const_args` is `Union{Tuple{DataType, Nothing}, Tuple{Any, InferenceResult}}`).
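
To illustrate the pattern (a minimal sketch, not this PR's actual diff; `lookup_before`/`lookup_after` are made-up stand-ins): return `nothing` on the miss path instead of a union-of-tuples, so callers branch on `nothing` and only ever destructure a concretely-typed tuple.
```julia
# Hypothetical sketch of the refactoring pattern; `lookup_before` and
# `lookup_after` are stand-ins, not functions from this PR.

# before: the return type is a union of two tuple types, so destructuring
# at the call site goes through the union-splitting IR shown above
lookup_before(x::Int) = x > 0 ? (Int, nothing) : (Any, "hit")

# after: `nothing` encodes the miss; the tuple, when present, is concrete
lookup_after(x::Int) = x > 0 ? nothing : (Any, "hit")

function caller(x::Int)
    r = lookup_after(x)
    r === nothing && return nothing
    rt, result = r  # destructures a concretely-typed tuple: no φ/indexed_iterate fallback
    return (rt, result)
end
```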

Member

Ah, I see. I did not expect that inference could do this so accurately:

```julia
julia> Base.return_types(Core.Compiler.abstract_call_method_with_const_args)
1-element Vector{Any}:
 Union{Tuple{Any, Core.Compiler.InferenceResult}, Tuple{DataType, Nothing}}
```

Member Author

Yes. Do you think we can move this PR forward?

@vtjnash vtjnash merged commit 795935f into master Jul 29, 2021
@vtjnash vtjnash deleted the avi/jetdemo branch July 29, 2021 19:04
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jul 30, 2021
@vtjnash vtjnash mentioned this pull request Sep 9, 2021
aviatesk added a commit that referenced this pull request Sep 9, 2021
cherry-picked from 795935f