mapreducedim! is super slow #352

@simeonschaub

Reductions along a dimension of a CLArray seem to be almost 100x slower than the same reduction on a Base Array (measured with the pocl CPU backend):

julia> using OpenCL, pocl_jll, BenchmarkTools

julia> X = rand(Float32, 1000, 1000);

julia> X′ = CLArray(X);

julia> @benchmark sum(X; dims = 1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   58.380 μs … 922.617 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      90.710 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   103.658 μs ±  35.143 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▄▃▁ ▃▇█▃                                                 
  ▁▁▁▂▄█████████▆▆▆▅▅▄▄▃▂▃▃▃▃▄▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  58.4 μs          Histogram: frequency by time          212 μs <

 Memory estimate: 4.02 KiB, allocs estimate: 3.

julia> @benchmark OpenCL.synchronize(sum(X′; dims = 1))
BenchmarkTools.Trial: 653 samples with 1 evaluation per sample.
 Range (min … max):  5.585 ms … 12.908 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.424 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.604 ms ±  1.126 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁▁▁▂▃▁▄█ ▁▇▇▃▄▄▄▃▄▂▃▂▃▁▁▃▁                             
  ▃▂▃▅▆██████████████████████████▇▆█▇▄▇▇█▆▅▇▄▁▄▅▄▄▃▃▅▄▃▂▂▂▂▃ ▅
  5.59 ms        Histogram: frequency by time        10.6 ms <

 Memory estimate: 22.52 KiB, allocs estimate: 247.

Is there any low-hanging fruit in terms of optimizations here?
