Same LLVM IR, radically different performance #22408

Closed
@timholy


This came up in the context of #22210, where I noticed a big performance hit on transpose for sparse matrices. A convenient test case comes from copying these lines to a separate file, annotating _computecolptrs_halfperm! with @noinline (not strictly necessary, since it doesn't inline on master), and then comparing the results of putting either @noinline or @inline on _distributevals_halfperm!.

Demo:

A = sprand(600, 600, 0.01);
X = transpose(A);
using BenchmarkTools

With @inline on _distributevals_halfperm!:

julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  166.98 KiB
  allocs estimate:  10685
  --------------
  minimum time:     921.938 μs (0.00% GC)
  median time:      936.064 μs (0.00% GC)
  mean time:        954.923 μs (0.40% GC)
  maximum time:     1.627 ms (38.60% GC)
  --------------
  samples:          1046
  evals/sample:     1

With @noinline on _distributevals_halfperm!:

julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     23.175 μs (0.00% GC)
  median time:      23.390 μs (0.00% GC)
  mean time:        23.658 μs (0.00% GC)
  maximum time:     52.727 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Inspection does not suggest an immediate reason for this roughly 40x performance gap (921.9 μs vs. 23.2 μs minimum time, and 10685 allocations vs. 2); profiling places all the blame at this line with the function evaluation. It made me wonder whether there is some problem inlining the function call.

However, the truly bizarre part is that, with @inline, the output of @code_llvm _distributevals_halfperm!(X, A, 1:A.n, identity) is, for all practical purposes that I can see, identical to that of @code_llvm halfperm!(X, A, 1:A.n, identity) (aside from the obvious call to _computecolptrs_halfperm!). I am not at all good at reading assembly, but even there the differences do not seem dramatic to me (some of the constants in the movq instructions differ, which might be problematic?).
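
For anyone who wants to check the "identical IR" claim mechanically rather than by eye, here is a minimal sketch of one way to capture two @code_llvm dumps as strings and list the lines that differ. The helper llvm_ir and the toy functions inner! and outer! are hypothetical stand-ins, not part of Base or of the sparse code; substitute _distributevals_halfperm! and halfperm! with the arguments above. Expect some noise, since function names and SSA value names legitimately differ between the two dumps.

```julia
using InteractiveUtils  # Julia 0.7 and later; on 0.6 drop this line, code_llvm is exported from Base

# Hypothetical helper (not part of Base): return the LLVM IR for `f(args...)` as a String.
llvm_ir(f, args...) = sprint(io -> code_llvm(io, f, Base.typesof(args...)))

# Toy stand-ins for the inner (inlined) function and its caller.
@inline inner!(y, x) = (for i in eachindex(x); y[i] = 2x[i]; end; y)
outer!(y, x) = inner!(y, x)

x = rand(10)
y = similar(x)

ir_inner = split(llvm_ir(inner!, y, x), '\n')
ir_outer = split(llvm_ir(outer!, y, x), '\n')

# Lines that appear in one dump but not in the other.
only_outer = setdiff(ir_outer, ir_inner)
only_inner = setdiff(ir_inner, ir_outer)

println(length(only_outer), " lines only in outer!, ", length(only_inner), " lines only in inner!")
foreach(println, only_outer)
```

A set difference ignores line order and repeated lines, so this is only a first pass; writing the two strings to files and running a real diff tool gives a more faithful comparison.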

This seems really puzzling. LLVM bug? Present at least on 0.6.0-rc3 and master.
