Description
This came up in the context of #22210, where I'm noticing a big performance hit on transpose for sparse matrices. A convenient test case comes from copying these lines to a separate file, annotating `_computecolptrs_halfperm!` with `@noinline` (not strictly necessary, since it doesn't inline on master), and then comparing the result of using either `@noinline` or `@inline` on `_distributevals_halfperm!`.
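For context, the kernel under test has a two-phase structure along the lines of the sketch below. This is a paraphrase, not the actual Base code (in particular, the real `_distributevals_halfperm!` avoids the scratch `ptrs` allocation); it is only meant to show the shape of the two phases and where the annotations are toggled.

```julia
# Paraphrased sketch of the two-phase sparse-transpose kernel; not the Base source.
@noinline function _computecolptrs_halfperm!(X::SparseMatrixCSC, A::SparseMatrixCSC)
    # Phase one: column j of X corresponds to row j of A, so count A's entries per
    # row and turn the counts into X.colptr via a cumulative sum.
    fill!(X.colptr, 0)
    @inbounds for k in 1:nnz(A)
        X.colptr[A.rowval[k] + 1] += 1
    end
    X.colptr[1] = 1
    @inbounds for j in 2:length(X.colptr)
        X.colptr[j] += X.colptr[j - 1]
    end
    return X
end

@inline function _distributevals_halfperm!(X::SparseMatrixCSC, A::SparseMatrixCSC,
                                           q::AbstractVector{<:Integer}, f)
    # Phase two: walk A's columns in the order given by q and scatter f(value) into X.
    # Toggling @inline/@noinline on this method is what produces the gap below.
    ptrs = copy(X.colptr)                # next free slot in each column of X
    @inbounds for (newrow, oldcol) in enumerate(q)
        for k in A.colptr[oldcol]:(A.colptr[oldcol + 1] - 1)
            i = A.rowval[k]              # row index in A == column index in X
            p = ptrs[i]
            X.rowval[p] = newrow
            X.nzval[p] = f(A.nzval[k])
            ptrs[i] += 1
        end
    end
    return X
end

function halfperm!(X::SparseMatrixCSC, A::SparseMatrixCSC,
                   q::AbstractVector{<:Integer}, f = identity)
    _computecolptrs_halfperm!(X, A)
    _distributevals_halfperm!(X, A, q, f)
    return X
end
```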
Demo:
```julia
A = sprand(600, 600, 0.01);
X = transpose(A);
using BenchmarkTools
```
With `@inline` on `_distributevals_halfperm!`:
```julia
julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial:
  memory estimate:  166.98 KiB
  allocs estimate:  10685
  --------------
  minimum time:     921.938 μs (0.00% GC)
  median time:      936.064 μs (0.00% GC)
  mean time:        954.923 μs (0.40% GC)
  maximum time:     1.627 ms (38.60% GC)
  --------------
  samples:          1046
  evals/sample:     1
```
With `@noinline` on `_distributevals_halfperm!`:
```julia
julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     23.175 μs (0.00% GC)
  median time:      23.390 μs (0.00% GC)
  mean time:        23.658 μs (0.00% GC)
  maximum time:     52.727 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
Inspection does not suggest an immediate reason for this 40x performance gap; profiling places all the blame on this line, at the function evaluation. It made me wonder whether there is some problem inlining the function call.
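For reference, the profile can be reproduced along these lines (a minimal sketch, not necessarily how I gathered it originally; the loop count is arbitrary, and `halfperm!`, `X`, `A` are assumed to be in scope from the demo above):

```julia
# Collect profile samples over repeated calls, then print the report.
Base.Profile.clear()
@profile for _ in 1:1000
    halfperm!(X, A, 1:A.n, identity)
end
Base.Profile.print()   # the samples concentrate on the line with the f(...) evaluation
```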
However, the truly bizarre part is that, with `@inline`, `@code_llvm _distributevals_halfperm!(X, A, 1:A.n, identity)` is, for all practical purposes that I can see, identical to `@code_llvm halfperm!(X, A, 1:A.n, identity)` (aside from the obvious call to `_computecolptrs_halfperm!`). I am not at all good at reading assembly, but even there the differences do not seem dramatic to me (there are some constant differences in `movq` statements that might be problematic?).
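To make that comparison easier to reproduce, the two listings can be dumped to files and diffed. This is only an illustrative sketch (the file names are made up; `code_llvm` with an `IO` argument is the programmatic form of `@code_llvm`):

```julia
# Write both LLVM IR listings to files so they can be compared outside the REPL.
# Assumes X, A, and the annotated functions from the demo above are in scope.
argtypes = Tuple{typeof(X), typeof(A), typeof(1:A.n), typeof(identity)}
open("llvm_inner.ll", "w") do io
    code_llvm(io, _distributevals_halfperm!, argtypes)
end
open("llvm_outer.ll", "w") do io
    code_llvm(io, halfperm!, argtypes)
end
# then, e.g., `diff llvm_inner.ll llvm_outer.ll` from a shell
```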
This seems really puzzling. An LLVM bug? The behavior is present at least on 0.6.0-rc3 and on master.