Skip to content

Julep: tfuncs for all #36463

Closed
Closed

Description

TL;DR: one of the harder compiler problems is handling inference on abstract types. This proposes to cheat by adding lookup tables for a few functions. This also analyzes how many poorly-inferred calls we actually have.

There's been a lot of thinking lately about how to handle abstract inference, e.g., #34742, "typocalypse" #36208, and many of the invalidation PRs are nibbling at individual pieces of this problem. Some of the discussion in #35904 centered on whether it would be possible for inference be more assertive about output types, but the cost of running inference on everything is large and recent progress has focused on doing less inference, not more.

I have a growing sense that the majority of inference problems affect a relatively small fraction of functions. One potential solution (which I'm sure must have been considered before) is to do something akin to the tfunc mechanism† for generic functions, automatically adding type-assertions as needed for particular functions (EDIT: previous was clarified in response to @martinholters). I'm not exactly sure how it would work (and therein lies the rub), but something like "if you're calling a method in ModuleA with abstract types as wide as the method signature itself, then its permissible to impose custom tfuncs on its callees." This is essentially a strong form of piracy-prevention.

To see how this might work, let's consider the example of eof. We document that it returns a Bool. In a sense, if you're calling one of "our" eof implementations, and your object is a subtype of IO, then this would seem to be a restriction you should have to live with (meaning, we'd enforce it by automatically adding a type-assert). So if you have a method myreader(io::IO, args...) and you call eof(io), then it seems fair to require that this eof call return a Bool. Conversely, if you duck-type the method, myreader(io, args...), and if inference ends up producing a purely-abstract version without knowing the type of io, then the abstract call is on eof(::Any), and in that case we would not make any guarantees about the return type. (Even though Base "owns" Any, we make an exception and pretend it doesn't.) If this is still too restrictive, it could conceivably be loosened by exempting known concrete types from the pseudo-tfunc constraints. I suspect this isn't a good idea, though, as program behavior would change based on the success or failure of inference, and moreover it just seems weird to allow specific concrete subtypes to escape constraints that we otherwise apply to the abstract type itself.

Somewhat in the spirit of #36393, I thought that perhaps the best way to think about this more clearly would be to get a sense for the scale of the problem. So I did an analysis on our current codebase. The script is in https://gist.github.com/timholy/9d2aabaeabb22239b5e7a4e95e35d298, but in rough terms what it does is go through all methods, run inference on them with their declared signatures (i.e., the widest type signature permissible), and then look for poorly inferred calls to a specific subset of functions. It exempts a "hit" if it is immediately followed by a typeassert. For example, one of the functions on the list is ==, which should return Union{Bool,Missing}. The method

julia> f(x, y) = (x == y)::Bool
f (generic function with 1 method)

julia> code_typed(f, (Any, Any))
1-element Array{Any,1}:
 CodeInfo(
1%1 = (x == y)::Any
│        Core.typeassert(%1, Main.Bool)::Bool%3 = π (%1, Bool)
└──      return %3
) => Bool

would be deemed OK because of the type-assert that follows the == call (which has an inferred return type of Any). But without that type-assert, this would be deemed a violation of the "contract" that == must satisfy. This Julep is essentially exploring the possibility of adding the missing type-assert systematically to everything that doesn't infer to the expected type and doesn't already have its own equally- or more-restrictive type-assert.

While you'll want to see the code for full details, roughly speaking this will count violations of things that I think most people would agree on, e..g., we expect convert(T, x)::T, isempty(x)::Bool, etc.

The results of the analysis: out of ~16000 methods, about 1500 of them have one or more "violating" calls to one of the functions I put on the tfunc list. That's more than I hoped but fewer than I feared. Here are the functions on the list and a count of the number of calling Methods they have (somewhat like "unrealized backedges" since these depend on the inferred code but not MethodInstances):

!: 163
&: 15
+: 102
-: 186
<: 297
<=: 221
==: 552
>: 217
>=: 102
axes: 144
cconvert: 106
cmp: 21
codeunit: 15
convert: 297
copyto!: 40
displaysize: 11
eof: 16
getindex: 13
isascii: 1
isempty: 170
isequal: 33
iseven: 10
isfinite: 34
isinf: 25
isinteger: 11
isless: 24
ismarked: 3
isnan: 45
isodd: 9
isone: 7
isopen: 2
isperm: 2
ispow2: 2
isreadable: 3
isreal: 10
issorted: 6
issubset: 2
istextmime: 3
isvalid: 12
iswritable: 3
iszero: 36
iterate: 395
leading_zeros: 12
length: 342
ncodeunits: 31
nextind: 25
prevind: 20
readline: 1
resize!: 15
size: 300
sizeof: 29
thisind: 3
unsafe_convert: 63
|: 2

In terms of paths forward (aside from just closing this issue...), we could:

  • implement a tfunc-like mechanism in inference to impose these limits "behind the scenes"
  • write a bot that automatically annotates source code with typeasserts for some or all of these functions
  • continue to add type-asserts to specific locations only when we discover they matter for one reason or another (performance, latency, whatever).

The third is kind of the default, but I was curious to know how many annotations would be required to shut most of these down. It turns out, quite a few! In contrast with my impressions from #36393, where I felt that we seem to closing in on eliminating many of the biggest sources of invalidation, from this perspective I think we've barely started: the lists look eerily similar for 1.4 and master.


†If you're unfamiliar with tfunc, it's an inference lookup table for Julia's intrinsic & builtin functions. See https://github.com/JuliaLang/julia/blob/master/base/compiler/tfuncs.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    compiler:inferenceType inferencecompiler:latencyCompiler latencyjulepJulia Enhancement ProposalspeculativeWhether the change will be implemented is speculative

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions