Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of broadcast_dims on DimStack vs DimArray #728

Closed
haakon-e opened this issue Jun 11, 2024 · 2 comments · Fixed by #767
Closed

Performance of broadcast_dims on DimStack vs DimArray #728

haakon-e opened this issue Jun 11, 2024 · 2 comments · Fixed by #767

Comments

@haakon-e
Copy link
Contributor

It appears that broadcast_dims over a DimStack is significantly slower than over a DimArray. Consider the following example:

julia> xs, ys, zs = X(1:70), Y(1:50), Z(1:500)
 X 1:70,
 Y 1:50,
↗ Z 1:500

julia> AB = DimStack((;A=rand(xs,ys,zs), B=rand(xs,ys,zs)))
╭────────────────────╮
│ 70×50×500 DimStack │
├────────────────────┴───────────────────────────── dims ┐
   X Sampled{Int64} 1:70 ForwardOrdered Regular Points,
   Y Sampled{Int64} 1:50 ForwardOrdered Regular Points,
  ↗ Z Sampled{Int64} 1:500 ForwardOrdered Regular Points
├──────────────────────────────────────────────── layers ┤
  :A eltype: Float64 dims: X, Y, Z size: 70×50×500
  :B eltype: Float64 dims: X, Y, Z size: 70×50×500
└────────────────────────────────────────────────────────┘

julia> ab = DimStack((;A=rand(zs), B=rand(zs)))
╭──────────────────────╮
│ 500-element DimStack │
├──────────────────────┴─────────────────────────── dims ┐
   Z Sampled{Int64} 1:500 ForwardOrdered Regular Points
├──────────────────────────────────────────────── layers ┤
  :A eltype: Float64 dims: Z size: 500
  :B eltype: Float64 dims: Z size: 500
└────────────────────────────────────────────────────────┘

# 6ms!!
julia> @benchmark broadcast_dims(-, AB, ab)
BenchmarkTools.Trial: 768 samples with 1 evaluation.
 Range (min  max):  6.045 ms   11.359 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     6.587 ms               ┊ GC (median):    3.23%
 Time  (mean ± σ):   6.506 ms ± 336.666 μs  ┊ GC (mean ± σ):  3.86% ± 3.73%

        ▃███▆▂▁▂▁                  ▁ ▁▅▄▅▄▂    ▁               
  ▃▃▄▆▇██████████▇▇▆▅▅▅▃▂▂▁▁▁▂▃▂▁▅▆███████████▇█▇▆█▄▃▅▅▅▃▃▃▃▃ ▅
  6.05 ms         Histogram: frequency by time        7.06 ms <

 Memory estimate: 53.42 MiB, allocs estimate: 114.

# baseline: 1ms
julia> @benchmark broadcast_dims(-, AB.A, ab.A)
BenchmarkTools.Trial: 4448 samples with 1 evaluation.
 Range (min  max):  955.459 μs    1.922 ms  ┊ GC (min  max): 0.00%   0.00%
 Time  (median):       1.043 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.120 ms ± 200.487 μs  ┊ GC (mean ± σ):  6.72% ± 11.55%

     ▃▆▇█▅▄▃                                                     
  ▂▄▇████████▆▅▄▃▃▂▂▂▂▂▂▁▁▂▂▁▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▃▃▃▄▃▄▄▃▃▃▃▃▃▂ ▃
  955 μs           Histogram: frequency by time         1.68 ms <

 Memory estimate: 13.35 MiB, allocs estimate: 35.

If I define my own broadcast_dims, it has the right speed:

julia> function my_broadcast_dims(func, AB, ab)
           names = keys(AB)
           out = map(names) do name
               broadcast_dims(func, AB[name], ab[name])
           end
           DimStack(NamedTuple{names}(out), dims(AB))
       end
my_broadcast_dims (generic function with 1 method)

# It takes ~twice as long to broadcast over the 2-stack compared to one DimArray
julia> @benchmark my_broadcast_dims(-, AB, ab)
BenchmarkTools.Trial: 2222 samples with 1 evaluation.
 Range (min  max):  1.992 ms    3.999 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.152 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.245 ms ± 221.165 μs  ┊ GC (mean ± σ):  4.72% ± 7.53%

      ▁▂▄▃▅█▆█▆▆▂                                              
  ▂▃▅▆███████████▇▇▆▆▃▃▃▂▂▁▂▁▁▁▁▁▁▁▁▂▁▁▂▂▃▃▄▄▃▆▄▄▆▅▆▅▅▄▄▃▃▃▃▂ ▄
  1.99 ms         Histogram: frequency by time        2.77 ms <

 Memory estimate: 26.71 MiB, allocs estimate: 84.
@rafaqz
Copy link
Owner

rafaqz commented Jun 12, 2024

Thanks. @profview may help. I would make a function that runs it like so

f(AB, ab, n) = for _ in 1:n broadcast_dims(-, AB, ab)

And then try 100 times

using ProfileView
@profview f(AB, ab, 100)

That should highly the problems. Feel free to PR if you fix it.

@rafaqz
Copy link
Owner

rafaqz commented Aug 16, 2024

Ok so the problem is there are a lot of edge cases where your formulation will make the order of arguments change the results - as stack layers don't have all the dimensions of each stack.

What Ive done in #767 is check for these cases and avoid the performance cost of using DimExtensionArray on everything.

With that I'm getting the expected performance for your case, but it still fixes the edge cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants