Julep: `tfunc`s for all

TL;DR: one of the harder compiler problems is handling inference on abstract types. This proposes to cheat by adding lookup tables for a few functions. This also analyzes how many poorly-inferred calls we actually have.

There's been a lot of thinking lately about how to handle abstract inference, e.g., #34742, "typocalypse" #36208, and many of the invalidation PRs are nibbling at individual pieces of this problem. Some of the discussion in #35904 centered on whether it would be possible for inference be more assertive about output types, but the cost of running inference on everything is large and recent progress has focused on doing less inference, not more.

I have a growing sense that the majority of inference problems affect a relatively small fraction of functions. One potential solution (which I'm sure must have been considered before) is to do something akin to the `tfunc` mechanism† for generic functions, automatically adding type-assertions as needed for particular functions (EDIT: previous was clarified in response to @martinholters). I'm not exactly sure how it would work (and therein lies the rub), but something like "if you're calling a method in `ModuleA` with abstract types as wide as the method signature itself, then its permissible to impose custom tfuncs on its callees." This is essentially a strong form of piracy-prevention.

To see how this might work, let's consider the example of `eof`. We document that it returns a `Bool`. In a sense, if you're calling one of "our" `eof` implementations, and your object is a subtype of `IO`, then this would seem to be a restriction you should have to live with (meaning, we'd enforce it by automatically adding a type-assert). So if you have a method `myreader(io::IO, args...)` and you call `eof(io)`, then it seems fair to require that this `eof` call return a `Bool`. Conversely, if you duck-type the method, `myreader(io, args...)`, and if inference ends up producing a purely-abstract version without knowing the type of `io`, then the abstract call is on `eof(::Any)`, and in that case we would not make any guarantees about the return type. (Even though Base "owns" `Any`, we make an exception and pretend it doesn't.) If this is still too restrictive, it could conceivably be loosened by exempting known *concrete* types from the pseudo-`tfunc` constraints. I suspect this isn't a good idea, though, as program behavior would change based on the success or failure of inference, and moreover it just seems weird to allow specific concrete subtypes to escape constraints that we otherwise apply to the abstract type itself.

Somewhat in the spirit of #36393, I thought that perhaps the best way to think about this more clearly would be to get a sense for the scale of the problem. So I did an analysis on our current codebase. The script is in https://gist.github.com/timholy/9d2aabaeabb22239b5e7a4e95e35d298, but in rough terms what it does is go through all methods, run inference on them with their declared signatures (i.e., the widest type signature permissible), and then look for poorly inferred calls to a specific subset of functions. It exempts a "hit" if it is immediately followed by a typeassert. For example, one of the functions on the list is `==`, which should return `Union{Bool,Missing}`. The method

```julia
julia> f(x, y) = (x == y)::Bool
f (generic function with 1 method)

julia> code_typed(f, (Any, Any))
1-element Array{Any,1}:
 CodeInfo(
1 ─ %1 = (x == y)::Any
│        Core.typeassert(%1, Main.Bool)::Bool
│   %3 = π (%1, Bool)
└──      return %3
) => Bool
```

would be deemed OK because of the type-assert that follows the `==` call (which has an inferred return type of `Any`). But without that type-assert, this would be deemed a violation of the "contract" that `==` must satisfy. This Julep is essentially exploring the possibility of adding the missing type-assert systematically to everything that doesn't infer to the expected type and doesn't already have its own equally- or more-restrictive type-assert.

While you'll want to see the code for full details, roughly speaking this will count violations of things that I think most people would agree on, e..g., we expect `convert(T, x)::T`, `isempty(x)::Bool`, etc.

The results of the analysis: out of ~16000 methods, about 1500 of them have one or more "violating" calls to one of the functions I put on the `tfunc` list. That's more than I hoped but fewer than I feared. Here are the functions on the list and a count of the number of calling Methods they have (somewhat like "unrealized backedges" since these depend on the inferred code but not MethodInstances):

```
!: 163
&: 15
+: 102
-: 186
<: 297
<=: 221
==: 552
>: 217
>=: 102
axes: 144
cconvert: 106
cmp: 21
codeunit: 15
convert: 297
copyto!: 40
displaysize: 11
eof: 16
getindex: 13
isascii: 1
isempty: 170
isequal: 33
iseven: 10
isfinite: 34
isinf: 25
isinteger: 11
isless: 24
ismarked: 3
isnan: 45
isodd: 9
isone: 7
isopen: 2
isperm: 2
ispow2: 2
isreadable: 3
isreal: 10
issorted: 6
issubset: 2
istextmime: 3
isvalid: 12
iswritable: 3
iszero: 36
iterate: 395
leading_zeros: 12
length: 342
ncodeunits: 31
nextind: 25
prevind: 20
readline: 1
resize!: 15
size: 300
sizeof: 29
thisind: 3
unsafe_convert: 63
|: 2
```

In terms of paths forward (aside from just closing this issue...), we could:
- implement a `tfunc`-like mechanism in inference to impose these limits "behind the scenes"
- write a bot that automatically annotates source code with typeasserts for some or all of these functions
- continue to add type-asserts to specific locations only when we discover they matter for one reason or another (performance, latency, whatever).

The third is kind of the default, but I was curious to know how many annotations would be required to shut most of these down. It turns out, quite a few! In contrast with my impressions from #36393, where I felt that we seem to closing in on eliminating many of the biggest sources of invalidation, from this perspective I think we've barely started: the lists look eerily similar for 1.4 and master.

-------------------

†If you're unfamiliar with `tfunc`, it's an inference lookup table for Julia's intrinsic & builtin functions. See https://github.com/JuliaLang/julia/blob/master/base/compiler/tfuncs.jl



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Julep: `tfunc`s for all #36463

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Julep: tfuncs for all #36463

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Julep: `tfunc`s for all #36463