Skip to content

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

Closed

Description

Short version: as protection against invalidation (and possibly "code quality" in general), perhaps we should consider some kind of nanosoldier test on the count of backedges of poorly-inferred MethodInstances. Also an update on progress (& regressions) from 1.4 to master.

There are two things that bother me about my recent work on improving inferrability of Base, Pkg, REPL, etc.:

  • it's all package-specific: I care about PkgX, so I search for and stomp on invalidations that it triggers, but that may or may not help users of PkgY. Worse, is there any risk it could make things worse for PkgY?
  • it's easy to imagine regressions. The fixes I've been making only rarely have a consequence that can be caught by @inferred, as many methods return a known type even if their internals are not well-inferred.

For these reasons, I've begun to wonder whether we can & should add some tests that look for "problematic" MethodInstances and tally their number of backedges. The goal would be to develop an absolute measure of our vulnerability to invalidation: at-risk MethodInstances that have the most backedges can cause the most invalidation. If we can reliably measure our vulnerability, then we could also introduce some "hold the line" tests, which prevent our vulnerability from increasing above a certain level. I recognize that there are serious downsides in adding such tests---someone wants to add cool feature X but it's blocked because it's not easy to do that inferrably---but there are also quality-concern risks with not having such tests. So, I thought I'd begin by at least trying to come up with some kind of overall measure of our vulnerability.

At the bottom of this I present a couple of scripts that seem useful. What they try to do is pull out MethodInstances that seem possibly problematic, in the sense that they fit patterns that I've seen while studying invalidations. Overwhelmingly, they involve abstract types, but many methods with abstract types seem reasonably OK, so I've tried to accommodate that in some specific ways. I should say right from the outset that I'm not at all confident that I've made the selection well, but at least it does pull out a lot of MethodInstances that have featured heavily in my invalidation work (as well as many that haven't). For more detail you should really just open up the code below.

The output of the analysis is a scatter plot, comparing where we were on Julia 1.4.2 vs where we are now on master. Each dot represents one possibly-problematic MethodInstance signature. Magenta is for exported names, green for non-exported names:

method_scatter_log

You can see that overall there seems to have been some improvement, at least if you restrict yourself to exported names: there's a strong tendency for magenta dots to be near or below the diagonal. Until I saw this, I had no idea how far we were towards "done," or even whether it's possible to ask that question in a reasonable way. While I've put a lot of time into hunting these invalidations down, overall this result makes me think that efforts to improve our code-quality are worth it, and that we're mostly not living in a world where #35844 is our only hope (CC @c42f). Time will tell...

It's worth noting that the scripts below generate an interactive plot, where you can click on individual dots and it will print the corresponding signature to the REPL. So running it locally can be an informative exercise. How informative it actually is comes down to how well I've selected "problematic" MethodInstances, of course, so take all this with a big grain of salt. (Quite a few of the dots on the plot are nothing to worry about, e.g., setindex!(::Vector{Any}, ::Any, ::Int) is perfectly fine. Others, like notify(::Base.GenericCondition{Base.Threads.SpinLock},::Any,::Bool,::Bool) seem to be at low risk for being extended by packages.)

Some cases where we've made a lot of progress are in various ==, !=, and isequal methods, a small handful of convert methods, some obscure yet surprisingly-important methods like +(::Ptr{Nothing},::Integer), and several string-generation and manipuluation methods (e.g., repr, thisind, lastindex, isempty(::AbstractString), getindex(::String, ::Any), ^(::String, ::Integer), Base.isidentifier, and Base._show_default). Interestingly, there are also a handful of significant regressions since 1.4, mostly in path-handling (we've had a huge increase in backedges for basename(::AbstractString), isfile(::Any)) but also a small number of others like in(::Any, ::Tuple{Symbol}), +(::Int, ::Any, ::Any), and Base.split_sign(::Integer). It is possible that some of these are consequences of inference fixes (e.g., basename(::AbstractString) seems better than basename(::Any)), but these would be worth digging into.

Based on this result I'm beginning to think that it probably would be worth introducing some "hold the line" tests in Base. But I'd be very curious about other folks' thoughts on this matter.

This script saves the data to a log file. Run once on each Julia version you want to analyze:

using MethodAnalysis

# Analyze MethodInstance signatures and select those that seem at risk for being invalidated.
function atrisktype(@nospecialize(typ))
    # signatures like `convert(Vector, a)`, `foo(::Vararg{Synbol,N}) where N` do not seem to pose a problem
    isa(typ, TypeVar) && return false
    # isbits parameters are not a problem
    isa(typ, Type) || return false
    if isa(typ, UnionAll)
        typ = Base.unwrap_unionall(typ)
    end
    # Exclude signatures with Union{}
    typ === Union{} && return false
    isa(typ, Union) && return atrisktype(typ.a) | atrisktype(typ.b)
    # Type{T}: signatures like `convert(::Type{AbstractString}, ::String)` are not problematic, so mark Type as OK
    typ <: Type && return false
    if typ <: Tuple && length(typ.parameters) >= 1
        p1 = typ.parameters[1]
        # Constructor calls are not themselves a problem (any `convert`s they trigger might be, but those are covered)
        isa(p1, Type) && p1 <: Type && return false
        # convert(::Type{T}, ::S) where S<:T is not problematic
        if p1 === typeof(Base.convert) || p1 === typeof(Core.convert) || p1 === typeof(Core.Compiler.convert)
            p2, p3 = typ.parameters[2], typ.parameters[3]
            if isa(p2, Type)
                p2 = Base.unwrap_unionall(p2)
                if isa(p2, DataType) && length(p2.parameters) === 1
                    T = p2.parameters[1]
                    isa(p3, Type) && isa(T, Type) && p3 <: T && return false
                end
            end
        # `getindex`, `length`, etc are OK for various Tuple{T1,T2,...}
        elseif (p1 === typeof(Base.getindex) || p1 === typeof(Core.Compiler.getindex)) ||
               (p1 === typeof(Base.length)  || p1 === typeof(Core.Compiler.length)) ||
               (p1 === typeof(Base.isempty) || p1 === typeof(Core.Compiler.isempty)) ||
               (p1 === typeof(Base.iterate) || p1 === typeof(Core.iterate) || p1 === typeof(Core.Compiler.iterate))
            p2 = typ.parameters[2]
            if isa(p2, Type)
                p2 = Base.unwrap_unionall(p2)
                p2 <: Tuple && return false
            end
        # show(io::IO, x) is OK as long as typeof(x) is safe
        elseif p1 === typeof(Base.show)
            atrisktype(typ.parameters[2]) && return true
            length(typ.parameters) == 3 && atrisktype(typ.parameters[3]) && return true
            return false
        end
    end
    # Standard DataTypes
    isconcretetype(typ) && return false
    # ::Function args are excluded
    typ === Function && return false
    !isempty(typ.parameters) && (any(atrisktype, typ.parameters) || return false)
    return true
end

# A few tests
@assert  atrisktype(Tuple{typeof(==),Any,Any})
@assert  atrisktype(Tuple{typeof(==),Symbol,Any})
@assert  atrisktype(Tuple{typeof(==),Any,Symbol})
@assert !atrisktype(Tuple{typeof(==),Symbol,Symbol})
@assert !atrisktype(Tuple{typeof(convert),Type{Any},Any})
@assert !atrisktype(Tuple{typeof(convert),Type{AbstractString},AbstractString})
@assert !atrisktype(Tuple{typeof(convert),Type{AbstractString},String})
@assert  atrisktype(Tuple{typeof(convert),Type{String},AbstractString})
@assert !atrisktype(Tuple{typeof(map),Function,Vector{Any}})
@assert !atrisktype(Tuple{typeof(getindex),Dict{Union{String,Int},Any},Union{String,Int}})
@assert  atrisktype(Tuple{typeof(getindex),Dict{Union{String,Int},Any},Any})
@assert !atrisktype(Tuple{Type{BoundsError},Any,Any})
@assert  atrisktype(Tuple{typeof(sin),Any})
@assert !atrisktype(Tuple{typeof(length),Tuple{Any,Any}})

function collect_mis(m::Method)
    list = Core.MethodInstance[]
    visit(m) do item
        if isa(item, Core.MethodInstance)
            push!(list, item)
            return false
        end
        return true
    end
    return list
end

const mis = Dict{Method,Vector{Core.MethodInstance}}()
visit() do item
    if item isa Method
        m = item
        mis[m] = collect_mis(m)
        return false
    end
    return true
end

# Count # of backedges for MethodInstances with abstract types
const becounter = Dict{Core.MethodInstance,Int}()
visit() do item
    if item isa Core.MethodInstance
        if atrisktype(item.specTypes)
            becounter[item] = length(all_backedges(item))
        end
        return false
    end
    return true
end

prs = sort!(collect(becounter); by=last)

open("/tmp/methdata_$VERSION.log", "w") do io
    for (mi, c) in prs
        c == 0 && continue
        println(io, mi.specTypes=>c)
    end
end

This script generates the interactive plot. You'll probably have to modify some file names.

using PyPlot

function parseline(line)
    m = match(r"(.*) => (.*)", line)
    sigstr, count = m.captures[1], parse(Int, m.captures[2])
    sig = try
        ex = Meta.parse(sigstr)
        eval(ex)
    catch
        return nothing
    end
    return sig, count
end

function parsedata(filename)
    lines = readlines(filename)
    sigcount = IdDict{Any,Int}()
    for line in lines
        ret = parseline(line)
        ret === nothing && continue
        sig, count = ret
        sigcount[sig] = count
    end
    return sigcount
end

function split_comparable(sigc1, sigc2)
    c1, c2, sigs = Int[], Int[], Any[]
    for (sig, c) in sigc1
        push!(sigs, sig)
        push!(c1, sigc1[sig])
        push!(c2, get(sigc2, sig, 0))
    end
    for (sig, c) in sigc2
        if !haskey(sigc1, sig)
            push!(sigs, sig)
            push!(c1, 0)
            push!(c2, c)
        end
    end
    return sigs, c1, c2
end

sigc16 = parsedata("/tmp/methdata_$VERSION.log")
sigc14 = parsedata("/tmp/methdata_1.4.3-pre.0.log")

sigs, c1, c2 = split_comparable(sigc14, sigc16)
mx1, mx2 = maximum(c1), maximum(c2)
isexported(sig) = (ft = Base.unwrap_unionall(sig).parameters[1]; isdefined(Main, ft.name.mt.name))
colors = [isexported(sig) ? "magenta" : "green" for sig in sigs]

function on_click(event)
    x, y = event.xdata, event.ydata
    normsqrdist(pr) = ((pr[1]-x)/mx1)^2 + ((pr[2]-y)/mx2)^2
    idx = argmin(normsqrdist.(zip(c1, c2)))
    println(sigs[idx])
end
begin
    fig, ax = plt.subplots()
    ax.scatter(c1 .+ 1, c2 .+ 1, c=colors)  # + 1 for the log-scaling
    ax.set_xlabel("# backedges + 1, 1.4")
    ax.set_ylabel("# backedges + 1, 1.6")
    ax.set_xscale("log")
    ax.set_yscale("log")
    fig.canvas.callbacks.connect("button_press_event", on_click)
    fig
end

Note: in the plotting script, if you copy/paste the begin...end block to the REPL and execute it, you can use the zoom features of matplotlib. For reasons I don't understand, they don't show up when you just include the script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions