reduce invalidations from ! #35904

Closed
wants to merge 3 commits
Conversation

JeffBezanson
Member

The first commit disallows the loathsome `Union{Missing, Bool, Base.var"#66#67"{_A} where _A}` return type of `!`.

The second commit tries to split fewer call sites. It only splits a call site into branches if the argument types are fully covered by the matching definitions, while also allowing disjoint union elements. For example `f(::Union{Int,Nothing})` will still be split if only `f(::Int)` exists.
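As an illustrative sketch of that rule (the definitions below are invented for the example, not code from the PR):

```julia
f(x::Int) = x + 1          # only method of f

# A call site where the argument type is a Union:
g(x::Union{Int,Nothing}) = f(x)   # would throw a MethodError for nothing

# Under the new rule this call can still be union-split: the Int element
# is fully covered by f(::Int), and Nothing is disjoint from every f
# method, so the compiler can emit an isa(x, Int) branch plus an error path.
```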

Based on #35891.

Before:

Base  ─────────── 38.485334 seconds
Stdlibs: ────  54.897741 seconds 58.7876%
-rwxr-xr-x 1 jeff jeff 149314712 May 14 18:33 sys.so
TTFP: 13.533637 seconds (15.12 M allocations: 800.206 MiB, 3.08% gc time)
TTSP: 12.241329 seconds (26.22 M allocations: 1.345 GiB, 7.52% gc time)

after:

Base  ─────────── 37.817779 seconds
Stdlibs: ────  54.142378 seconds 58.8748%
-rwxr-xr-x 1 jeff jeff 148314272 May 15 13:35 sys.so
TTFP: 13.072420 seconds (14.86 M allocations: 787.247 MiB, 1.74% gc time)
TTSP:  7.809885 seconds (17.53 M allocations: 924.497 MiB, 8.53% gc time)

So the main target here is the invalidations causing the long time-to-second-plot (after loading SIMD). But it's also notable that we seem to spend almost 1MB of system image on this.

@JeffBezanson JeffBezanson added the compiler:latency Compiler latency label May 15, 2020
```diff
 function issimpleenoughtype(@nospecialize t)
-    return unionlen(t) <= MAX_TYPEUNION_LENGTH && unioncomplexity(t) <= MAX_TYPEUNION_COMPLEXITY
+    return unionlen(t) + union_count_abstract(t) <= MAX_TYPEUNION_LENGTH && unioncomplexity(t) <= MAX_TYPEUNION_COMPLEXITY
```
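For context, a minimal sketch of what a helper like `union_count_abstract` might compute (this is an assumption for illustration, not the PR's actual implementation): count the abstract elements of a union, so each abstract element effectively costs double against `MAX_TYPEUNION_LENGTH`.

```julia
# Hypothetical sketch, not the PR's code: count the abstract
# (non-concrete) elements of a possibly-nested Union type.
function union_count_abstract(@nospecialize t)
    if t isa Union
        return union_count_abstract(t.a) + union_count_abstract(t.b)
    else
        return isconcretetype(t) ? 0 : 1
    end
end

union_count_abstract(Union{Int,Missing})        # both elements concrete
union_count_abstract(Union{Int,AbstractString}) # AbstractString is abstract
```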
Member
I was thinking this would get included in the complexity score computation. Just wondering about the reasoning for doing it this way, though either seems fine.

Member Author
Yes, I think that would make more sense, but it doesn't increase the union complexity over the maximum. The current complexity of the type is 2; if we're willing to move that to 4 then it can work.

@timholy
Member

timholy commented May 15, 2020

Does this cover abstract types?

julia> abstract type AbsType end

julia> struct Type1 <: AbsType end

julia> struct Type2 <: AbsType end

julia> f(::Type1) = 1
f (generic function with 1 method)

julia> f(::Type2) = 2
f (generic function with 2 methods)

julia> g(t::AbsType) = f(t)
g (generic function with 1 method)

julia> h(c) = g(c[1])
h (generic function with 1 method)

julia> c = Any[Type1()]
1-element Array{Any,1}:
 Type1()

julia> h(c)
1

julia> using MethodAnalysis

julia> visit(g) do item
           @show item
           true
       end
item = g
item = # 1 method for generic function "g":
[1] g(t::AbsType) in Main at REPL[6]:1
item = g(t::AbsType) in Main at REPL[6]:1
item = MethodInstance for g(::AbsType)
item = Core.CodeInstance(MethodInstance for g(::AbsType), #undef, 0x0000000000006c71, 0xffffffffffffffff, Int64, #undef, nothing, false, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)
item = MethodInstance for g(::Type1)
item = Core.CodeInstance(MethodInstance for g(::Type1), #undef, 0x0000000000006c70, 0xffffffffffffffff, Int64, 1, nothing, false, Ptr{Nothing} @0x00007f27f695c7c0, Ptr{Nothing} @0x0000000000000000)

julia> code_typed(g, (AbsType,))
1-element Array{Any,1}:
 CodeInfo(
1 ─ %1 = (isa)(t, Type2)::Bool
└──      goto #3 if not %1
2 ─      goto #6
3 ─ %4 = (isa)(t, Type1)::Bool
└──      goto #5 if not %4
4 ─      goto #6
5 ─ %7 = Main.f(t)::Int64
└──      goto #6
6 ┄ %9 = φ (#2 => 2, #4 => 1, #5 => %7)::Int64
└──      return %9
) => Int64

which is vulnerable if I add

struct Type3 <: AbsType end
f(::Type3) = 3

@JeffBezanson
Member Author

With this PR there is no splitting in g(::AbsType), but unfortunately it would still need to be invalidated since we can infer f(::AbsType) isa Int, and adding a new method to f might change that. Maybe I should change this to still split if we can infer a type better than Any? In other words, since we need the backedge anyway, might as well take more advantage of it?

@JeffBezanson
Member Author

Inference failures in the tests seem to be mysterious, à la #35800.

@timholy
Member

timholy commented May 16, 2020

unfortunately it would still need to be invalidated since we can infer f(::AbsType) isa Int, and adding a new method to f might change that

But if the new method does happen to return an Int, does this branch obviate the need? `!` returns a new type for `::Vec{N,Bool}` and that requires invalidation, but I'm thinking about cases like isempty and length, which most likely will return a Bool and Int, respectively, for most container types (even ones we've not seen previously). Traits like IndexStyle are another good example, since the list of possible return types is fairly small and you can imagine these having a very large set of both direct and indirect dependencies.

Curiously, the functions I "want" to union-split have many methods which return a narrow set of types; conversely, the dangerous-to-split `!` has very few methods and each returns a different type. Is this a pattern? Here are all functions in Base with more than 100 specTypes and fewer than 10 rettypes:

julia> filter(pr -> pr.second.nspecTypes > 100 && pr.second.nrettypes < 10, funcdata)
IdDict{Any,SigCount} with 27 entries:
  ht_keyindex2!          => SigCount(158, 1)
  lastindex              => SigCount(109, 2)
  _growend!              => SigCount(162, 1)
  #sprint#338            => SigCount(132, 1)
  isempty                => SigCount(491, 4)
  show                   => SigCount(209, 4)
  eachindex              => SigCount(235, 4)
  _deleteend!            => SigCount(130, 1)
  isslotempty            => SigCount(127, 1)
  length                 => SigCount(1230, 4)
  axes                   => SigCount(433, 9)
  isslotmissing          => SigCount(127, 1)
  print_to_string        => SigCount(363, 1)
  axes1                  => SigCount(144, 2)
  #with_output_color#701 => SigCount(104, 3)
  ==                     => SigCount(424, 8)
  throw_boundserror      => SigCount(161, 1)
  print                  => SigCount(417, 2)
  string                 => SigCount(426, 2)
  ht_keyindex            => SigCount(156, 1)
  checkbounds            => SigCount(307, 5)
  write                  => SigCount(119, 5)
  haskey                 => SigCount(435, 1)
  isslotfilled           => SigCount(127, 1)
  hash                   => SigCount(126, 2)
  !=                     => SigCount(128, 5)
  in                     => SigCount(192, 7)

A pretty sizable subset look like something I'd imagine union-splitting if the input types were not fully specialized. (I was initially surprised by how many return types length has, but for the record they are Union{}, Any, Int, UInt so perhaps that's not so surprising.)

Some that have <5 MethodInstances and <5 return types are

  signless                   => SigCount(2, 1)
  #BigFloat#5                => SigCount(1, 1)
  initmeta                   => SigCount(1, 1)
  nan_dom_err                => SigCount(1, 1)
  #BigFloat#7                => SigCount(1, 1)
  pow_ui                     => SigCount(1, 1)
  newindex                   => SigCount(3, 2)
  pipe_reader                => SigCount(3, 2)
  include                    => SigCount(1, 1)
  sub_fast                   => SigCount(4, 0)
  argtype_decl               => SigCount(1, 1)
  #s69#135                   => SigCount(1, 1)
  bigitlength                => SigCount(1, 1)
  splitexpr                  => SigCount(4, 4)
  macroexpand                => SigCount(1, 1)
  string_with_env            => SigCount(3, 1)
  rm                         => SigCount(1, 1)
  umul256                    => SigCount(1, 1)
  _nthbyte                   => SigCount(2, 1)
  throw_overflowerr_binaryop => SigCount(3, 1)
  #parse#4                   => SigCount(2, 1)
  cmp_si                     => SigCount(1, 1)
  fieldoffset                => SigCount(1, 1)
  fetch_unbuffered           => SigCount(1, 1)
  isabspath                  => SigCount(3, 2)
  wait_readnb                => SigCount(4, 1)
  #searchsorted#6            => SigCount(1, 1)
  sincos                     => SigCount(2, 0)
  test_success               => SigCount(1, 1)
  ui_sub!                    => SigCount(1, 1)
  _bufcmp                    => SigCount(1, 1)
  reindex                    => SigCount(4, 3)
  sourceinfo_slotnames       => SigCount(1, 1)
  findmeta                   => SigCount(1, 1)
  #reduce#582                => SigCount(2, 2)
  reduce_shortest            => SigCount(2, 1)
  #parse#348                 => SigCount(3, 1)
  srand                      => SigCount(2, 1)
  estimatepower              => SigCount(1, 1)
  unsafe_indices             => SigCount(3, 1)
  @r_str                     => SigCount(2, 1)
  generatecounteddigits!     => SigCount(1, 1)
  LOG10_2U                   => SigCount(2, 0)
  isbitsunion                => SigCount(4, 1)
  decimallength              => SigCount(2, 1)
  filldigits32               => SigCount(2, 1)
  fdiv_q_2exp!               => SigCount(1, 1)
  cbrt                       => SigCount(4, 0)
  ⋮                          => ⋮

but obviously this is a much larger list (>1000 entries). In general these do look like less interesting ones to union-split.

Let's see if functions tend to be in one category or the other with a pretty clean separation:

[image: sigdata — scatter of signature counts vs. return-type counts]

Nope, not much of a pattern ☹️ .

Still, when deciding whether a nonspecific call should be union-split, it doesn't seem crazy to pay attention to the ratio of input-to-output types among the already-compiled methods. I wonder if one can count on having a lot of the specializations already generated before the first abstract case is encountered?

@timholy
Member

timholy commented May 16, 2020

But if the new method does happen to return an Int, does this branch obviate the need?

Ah, wait, I may have been implicitly assuming that when a new method is defined that we know the answer---but now I am realizing that we probably don't run inference on every new method. Hmmm.

There probably would be cases where the number of invalidations would make it worth just running inference on the new method in the hopes that no invalidation is necessary. Perhaps we could get an estimate just from the number of backedges?

@JeffBezanson
Member Author

Wow, really cool analysis.

Ah, wait, I may have been implicitly assuming that when a new method is defined that we know the answer---but now I am realizing that we probably don't run inference on every new method. Hmmm.

Correct, we currently don't take the return type of the new method into account at all. That would be a BIG fish to hook. We've certainly talked about it. If we could do it, it would certainly eliminate a large fraction of invalidations.

it doesn't seem crazy to pay attention to the ratio of input-to-output types among the already-compiled methods

Unfortunately that would make the compiler unpredictable --- how it compiles something would depend on what you happen to have run before. Not a lot of easy answers here.

@timholy
Member

timholy commented May 20, 2020

Not a lot of easy answers here.

A really cheap form of inference would be to check whether the new method has a return-type annotation, and use it when deciding whether to invalidate. (I don't use such annotations in my code now, but this might be a use case.)
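To make that concrete (an illustrative sketch of the proposal, not anything the compiler currently does): a return-type annotation on a new method is a declared guarantee, since the convert enforces it, so it could be checked against what call sites already inferred.

```julia
# Hypothetical example: suppose call sites have inferred
# isempty(::Any) isa Bool and hold backedges to isempty.

struct MyContainer
    items::Vector{Int}
end

# The ::Bool annotation forces a convert(Bool, ...) on every return,
# so the declared return type is guaranteed. Under the proposal, the
# compiler could compare it to the inferred Bool at existing call
# sites and skip invalidating them.
Base.isempty(c::MyContainer)::Bool = length(c.items) == 0
```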

@JeffBezanson
Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson
Member Author

Some interesting benchmark results:

  • ["union"]["array"]["perf_countequals","Float64"] --- The PR changed the order of branches, and the array is 90% Float64 so you want to check that first in this case.
  • meteor_contest --- Uses a Matrix{Any} (very slow code, but I'm glad we have a benchmark for a case like that), and we get a nice speedup by branching for UInt64 | Any. I don't see a simple way to get that optimization without also splitting !, except special-casing !, which frankly might be the best thing to do.

@KristofferC
Member

KristofferC commented Jun 5, 2020

I don't see a simple way to get that optimization without also splitting !, except special-casing !, which frankly might be the best thing to do.

In this case, the optimization took very slow code and made it 30% faster. The value of that has to be weighed against the latency improvement of the reduced invalidations. If most of the benefit of the optimization is just to make slow code slightly faster, perhaps it's just not worth doing it?

There is also a 2x regression in the raytrace benchmark though.

@JeffBezanson
Member Author

There is also a 2x regression in the raytrace benchmark though.

Looks like the same kind of issue. It uses an abstract element type for the array of objects. The speedup there definitely seems worthwhile --- although I guess it would go away if you had more than 4 types of objects 😂

@JeffBezanson
Member Author

I tried it, and only doing this for `!` still gives essentially all of the latency improvement.

@PallHaraldsson
Contributor

What is "TTSP"? ("TTFP" = time to first plot?)

@oscardssmith
Member

I think second plot.

@timholy timholy mentioned this pull request Jun 28, 2020