Skip to content
This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Performance issues with nonlinear_diffusion! #422

Open
bgroenks96 opened this issue Jul 5, 2021 · 4 comments
Open

Performance issues with nonlinear_diffusion! #422

bgroenks96 opened this issue Jul 5, 2021 · 4 comments

Comments

@bgroenks96
Copy link

bgroenks96 commented Jul 5, 2021

nonlinear_diffusion! appears to be quite slow and unexpectedly allocates. MWE (including simple reference implementation):

using BenchmarkTools
using DiffEqOperators

const nknots = 100
# grid edges
x = LinRange(0.0,10.0,nknots+1) |> Vector
# grid cell centers
x_c = (x[1:end-1] .+ x[2:end]) ./ 2
# state variable
u = sin.(2π.*x_c)
# uniform grid spacing pretending to be non-uniform ;)
dx = diff(x_c)
# boundary padded dx
dx_pad = [dx[1], dx..., dx[end]]
# boundary padded u
q = [0.0,u...,0.0]
# conductivity (diffusion function)
# note: really this should have length(x) because it is defined on the
# boundaries of the grid cells, not on the centers? but nonlinear_diffusion!
# seems to want it to match the length of q.
k = 0.5.*ones(length(q))
# output
du = similar(u)
@benchmark nonlinear_diffusion!($du,1,1,2,$k,$q,$dx_pad,$nknots)

"""
nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)

Second order Laplacian with non-linear diffusion function, k.
"""
function nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)
    @inbounds let x₁ = (@view x[1:end-2]),
        x₂ = (@view x[2:end-1]),
        x₃ = (@view x[3:end]),
        k₁ = (@view k[1:end-1]),
        k₂ = (@view k[2:end]),
        Δx₁ = (@view Δx[1:end-1]),
        Δx₂ = (@view Δx[2:end]);
        @. ∂y = (k₂*(x₃ - x₂)/Δx₂ - k₁*(x₂-x₁)/Δx₁)/Δk
    end
end

du2 = similar(u)
k_ = k[1:end-1] # nonlineardiffusion! expects k to be on the boundaries
dk = diff(x)
@benchmark nonlineardiffusion!($du2, $q, $dx_pad, $k_,$ dk)

@assert all(du .≈ du2)

Benchmark result from nonlinear_diffusion!

BechmarkTools.Trial: 6922 samples with 1 evaluations.
 Range (min … max):  556.640 μs …  14.328 ms  ┊ GC (min … max): 0.00% … 93.33%
 Time  (median):     644.482 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   716.439 μs ± 607.819 μs  ┊ GC (mean ± σ):  5.61% ±  6.28%

      ▇ █▁                                                       
  ▁▂▇▃█▇██▆▇▄▆▅▄▅▅▄▃▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  557 μs           Histogram: frequency by time         1.21 ms <

 Memory estimate: 273.14 KiB, allocs estimate: 3766.

..and from reference implementation:

BechmarkTools.Trial: 10000 samples with 140 evaluations.
 Range (min … max):  705.121 ns …   4.145 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     730.871 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   789.835 ns ± 164.178 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▁█▅▄▁    ▂▁▂▁▁▁▁     ▁                                       ▁
  █████████████████████▇████████▇██▇█▇▇█▇▇█▇█▆▆▇▅▆▆▆▆█▇▇▆▅▅▄▄▄▅ █
  705 ns        Histogram: log(frequency) by time       1.37 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

edit: forgot to interpolate the arguments on the reference impl run; but it doesn't seem to make much of a difference

@ChrisRackauckas
Copy link
Member

@mjsheikh could you take a look?

@mjsheikh
Copy link
Contributor

mjsheikh commented Jul 6, 2021

I checked, so the allocations aren't for the output but rather the internally used CenteredDifference.
The reference implementation doesn't allocate since here you need first-order derivatives and can directly implement those using FD but for higher orders, the C.D. structures would be required.
But yes I agree this is slow.
@ChrisRackauckas I wonder if there is any nicer way for computing :
(CenteredDifference{N}(first_differential_order,approx_order,dx,nknots)*q) .* (CenteredDifference{N}(second_differential_order,approx_order,dx,nknots)*p)

@bgroenks96
Copy link
Author

Is there a way to provide CenteredDifference with pre-allocated arrays?

@ChrisRackauckas
Copy link
Member

Yes it's just the mul! operation.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants