Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using HybridArrays.jl #58

Closed
SteffenPL opened this issue Jul 20, 2020 · 17 comments
Closed

using HybridArrays.jl #58

SteffenPL opened this issue Jul 20, 2020 · 17 comments

Comments

@SteffenPL
Copy link

SteffenPL commented Jul 20, 2020

I'm trying to simulate interacting particle systems. The state vector has dimension '2N' or '3N'.
It seems beneficial to use an Array of StaticArrays for the 2D/3D points.

HybridArrays.jl would be super nice here, since it avoids a Vector of SVectors
and is still very fast. (timings are in the code below; notice that access is exactly as for a 2xN Array!).

Issues:

  • HybridArrays does not work with all methods (for example not with Rosenbrock23(), Rosenbrock23(autodiff=false)).
  • Missing plot recipe (should be easy to implement)

I can try to work on the integration if wanted. Of course, hints are very welcome :D

I think HybridArrays could be an alternative to an Array of StaticArrays and without problems like #47 and SciML/LabelledArrays.jl#68.

using HybridArrays, StaticArrays
using DifferentialEquations, BenchmarkTools, Random

N = 50
Dim = 2

# just some random dynamics to show speed benefit of static arrays
function f_ode(du, u, p, t)
    N = size(u,2)
    du[:] .= 0.
    for i = 1:N
        for j = 1:i-1
            v = u[:,i] - u[:,j]
            dist_squared = sum( x -> x^2, v)
            du[:,i] -= v * dist_squared
            du[:,j] += v * dist_squared
        end
    end
end

# same as f_ode but for vector of svectors
function f_ode_vec_svec(du, u, p, t)
    N = size(u,1)
    du[:] .-= du[:]  #(sorry, I don't know how to zero a vector of svectors fast.)
    for i = 1:N
        for j = 1:i-1
            v = u[i] - u[j]
            dist_squared = sum( x -> x^2, v)
            du[i] -= v * dist_squared
            du[j] += v * dist_squared
        end
    end
end


Random.seed!(1)
z0 = rand(Dim,N)
z0_hybrid = HybridArray{Tuple{Dim,StaticArrays.Dynamic()}}(z0)

Random.seed!(1)
z0_vec_svec = rand(SVector{Dim,Float64},N)


prob          = ODEProblem(f_ode,          z0,          (0., 10.))
prob_hybrid   = ODEProblem(f_ode,          z0_hybrid,   (0., 10.))
prob_vec_svec = ODEProblem(f_ode_vec_svec, z0_vec_svec, (0., 10.))

sol          = @btime solve(prob,          Euler(), dt=0.01) seconds=2
# 682.401 ms (33153208 allocations: 1.65 GiB)
sol_hybrid   = @btime solve(prob_hybrid,   Euler(), dt=0.01) seconds=2
# 7.337 ms (14070 allocations: 3.04 MiB)
sol_vec_svec = @btime solve(prob_vec_svec, Euler(), dt=0.01) seconds=2
# 5.019 ms (20074 allocations: 4.78 MiB)


sol2          = solve(prob,          Rosenbrock23())
# works
sol2_hybrid   = solve(prob_hybrid,   Rosenbrock23())
# MethodError: copyto!(::DiffEqBase.DiffEqBC{Array{Float64,2}}, ::Base.Broadcast.Broadcasted{HybridArrays.HybridArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(muladd),Tuple{Base.Broadcast.Broadcasted{HybridArrays.HybridArrayStyle{2},Nothing,typeof(DiffEqBase.ODE_DEFAULT_NORM),Tuple{HybridArray{Tuple{2,StaticArrays.Dynamic()},Float64,2,2,Array{Float64,2}},Float64}},Float64,Float64}})
# is ambiguous.
sol2_vec_svec = solve(prob_vec_svec, Rosenbrock23())
# ArgumentError: Cannot create a dual over scalar type SArray{Tuple{2},Float64,1,2}.
#  If the type behaves as a scalar, define FowardDiff.can_dual.

using Plots
plot( sol[1,1,:], sol[2,1,:])
plot( sol_hybrid[1,1,:], sol_hybrid[2,1,:])
plot( getindex.(sol_vec_svec[1,:],1) , getindex.(sol_vec_svec[1,:],2))
# plots look all the same

plot( sol )
# works
plot( sol_hybrid )
# MethodError: no method matching u_n(...)  
# Note: Here a ::CartesianIndex{2}  does not match the expected ::Int64
plot( sol_vec_svec )
# MethodError: no method matching one(::Type{SArray{Tuple{2},Float64,1,2}})

I can provide full error messages if needed.

@mateuszbaran
Copy link
Contributor

Hi!

Thanks for using HybridArrays 🙂 . I can see at least two problems here:

  1. copyto!(dest::DiffEqBC, bc::$B) from diffeqfastbc.jl needs to be specialized for HybridArrayStyle somewhere. I guess https://github.com/SciML/DiffEqBase.jl/blob/master/src/diffeqfastbc.jl would be the right place, just executed on demand using Requires.jl. I think the same copyto! as here: https://github.com/SciML/DiffEqBase.jl/blob/d6769e8aeac3eb29e54d2e557ceacc4f4fa27891/src/diffeqfastbc.jl#L46 , just for HybridArrayStyle, would be enough.
  2. HybridArrays needs to overload some additional functions. I'm not sure which ones, definitely zero to make the right cache, but if someone more experienced with DiffEq ecosystem could make a list (or link to one) it would be helpful. At the moment HybridArrays doesn't overload that much, mostly broadcasting.

I can help fixing this on the HybridArrays side but I'd like to hear from DiffEq people before I start working on it.

And, just in case, HybridArrays will most likely drop SSubArray in the near-ish future since allocation improvements in Julia 1.5 should make this trick obsolete.

@mateuszbaran
Copy link
Contributor

That plot issue is really intriguing. On one hand, HybridArrays doesn't expose linear indexing where it could. On the other hand, plotting recipe system doesn't work with arrays that are not linearly indexable. On yet another hand, I've noticed that for some reason eachindex on static arrays returns Base.OneTo instead of SOneTo.

@mateuszbaran
Copy link
Contributor

And, by the way, I could implement ArrayInteface.jl methods for HybridArray, seems easy enough. I'm just not sure where should I put the code.

@ChrisRackauckas
Copy link
Member

It's best to put the code in the array library since less Requires will mean less compile times. Right now Requires is just used in order to jump start the adoption of an extended trait ecosystem: if the standard arrays don't have the trait, then there's no point in using it.

@mateuszbaran
Copy link
Contributor

OK, I've implemented ismutable, can_setindex, parent_type and restructure in HybridArrays. That should be enough I guess.

@mateuszbaran
Copy link
Contributor

FYI, with the changes from JuliaArrays/HybridArrays.jl#19 HybridArray is almost as fast as Vector of SVectors in this example (4.968 ms vs 4.492 ms) and plotting also works.

@SteffenPL
Copy link
Author

SteffenPL commented Jul 21, 2020

Wow, thanks for the implementation! Yes, I can confirm that the speed is very good and plotting works.

Implicit methods still fail, since ldiv! is not defined (I tested ImplicitEuler, Rosenbrock23).
But this also doesn't work with SVectors and MVectors, so it might be more complicated. (I'm not sure if DiffEq has special treatment for those.)

PS: For my application, implicit methods are not important. Your changes already solve the other difficulties I had 👍

using LinearAlgebra

A = qr([1 2.2 4; 3.1 0.2 3; 4 1 2]);
X = [1; 2.5; 3];
Yd = HybridArray{Tuple{StaticArrays.Dynamic()}}(X)
Ys = HybridArray{Tuple{3}}(X)
Ysa = SVector{3,Float64}(X)
Ym = MVector{3,Float64}(X)

ldiv!(A,X) # works

ldiv!(A,Yd)
# ERROR: MethodError: no method matching ldiv!(::LinearAlgebra.QRCompactWY{Float64,Array{Float64,2}}, ::HybridArray{Tuple{StaticArrays.Dynamic()},Float64,1,1,Array{Float64,1}})
# Closest candidates are:
#   ldiv!(::IterativeSolvers.Identity, ::Any) at C:\Users\Steffen Plunder\.julia\packages\IterativeSolvers\3g7hG\src\common.jl:31
#   ldiv!(::Number, ::AbstractArray) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\LinearAlgebra\src\generic.jl:252
#   ldiv!(::Adjoint{T,#s48} where #s48<:(BandedMatrices.BandedLU{T,#s41} where #s41<:BandedMatrices.BandedMatrix), ::Union{AbstractArray{T,1}, AbstractArray{T,2}}) where T<:Real at C:\Users\Steffen Plunder\.julia\packages\BandedMatrices\vcLoH\src\banded\linalg.jl:54

ldiv!(A,Ys)
# ERROR: MethodError: no method matching ldiv!(::LinearAlgebra.QRCompactWY{Float64,Array{Float64,2}}, ::HybridArray{Tuple{3},Float64,1,1,Array{Float64,1}})
# ...

ldiv!(A,Ysa)
# ERROR: MethodError: no method matching ldiv!(::LinearAlgebra.QRCompactWY{Float64,Array{Float64,2}}, ::SArray{Tuple{3},Float64,1,3})
# ...

ldiv!(A,Ym)
# ERROR: MethodError: no method matching ldiv!(::LinearAlgebra.QRCompactWY{Float64,Array{Float64,2}}, ::SArray{Tuple{3},Float64,1,3})
# ...

@mateuszbaran
Copy link
Contributor

I've implemented vec for HybridArray and now implicit solvers also work. They are also much faster on HybridArray:

julia> sol   = @btime solve(prob,   ImplicitEuler(), dt=0.01); seconds=2
  760.786 ms (5048795 allocations: 534.85 MiB)
2

julia> sol_hybrid   = @btime solve(prob_hybrid,   ImplicitEuler(), dt=0.01); seconds=2
  27.109 ms (10635 allocations: 976.16 KiB)
2

julia> sol_hybrid   = @btime solve(prob_hybrid,   Rosenbrock23(), dt=0.01); seconds=2
  38.912 ms (5940 allocations: 622.34 KiB)
2

julia> sol   = @btime solve(prob,   Rosenbrock23(), dt=0.01); seconds=2
  884.613 ms (3313352 allocations: 723.02 MiB)
2

@ChrisRackauckas
Copy link
Member

Sounds great. I think this is done?

@ChrisRackauckas
Copy link
Member

How are hybrid arrays so much faster there? That's awesome.

@SteffenPL
Copy link
Author

I guess for NBody or particle simulations this is very attractive ( https://github.com/SciML/DiffEqPhysics.jl )
maybe we could document somewhere that HybridArrays are working?

(A tutorial/example somewhere?)

@SteffenPL
Copy link
Author

Is there a way to test that an array type is fully compatible with DiffEq?

@mateuszbaran
Copy link
Contributor

Sounds great. I think this is done?

More or less. I guess we could also check if it's possible to wrap CuArray in HybridArray and get even better performance on GPU but I don't have an NVidia GPU.

How are hybrid arrays so much faster there? That's awesome.

HybridArrays uses the same tricks for indexing and broadcasting as StaticArrays, just adapted to arrays where only a subset of dimensions have statically known sizes. As long as indexing or broadcasting only touches parts with statically known dimensions, the generated code is more or less on par with StaticArrays.

The rest is just the magic of Julia 🙂 .

By the way, I've tried solving that problem using MArray. Compilation of ImplicitEuler solve used all my memory and I had to kill the process...

@ChrisRackauckas
Copy link
Member

Yeah haha, MArray of 100x100 would get quite nasty.

@SteffenPL
Copy link
Author

More or less. I guess we could also check if it's possible to wrap CuArray in HybridArray and get even better performance on GPU but I don't have an NVidia GPU.

I have a NVIDIA GPU and can test whatever code you want.

By the way, I've tried solving that problem using MArray. Compilation of ImplicitEuler solve used all my memory and I had to kill the process...

Yes, this happened to me too.

@chriselrod
Copy link
Collaborator

FWIW, PaddedMatrices.FixedSizeArray is implemented very similarly to StaticArrays.MArray without the same scalability problems.

time julia -O3 -e "using PaddedMatrices; A = @FixedSize rand(100,100); println(A)"
[0.9738329286808532 0.4599862777976286 0.48498955744583994 0.18491771009648705 0.2365293987699838 0.957770504316224 0.6177615805770781 0.0544641305454997 0.19571731988461483 0.554215222653149
... # sparing you the giant wall of numbers
19208889887300795 0.09906601496709622 0.3807470557006286 0.573958391535584 0.7664251041801781 0.9345468648746978 0.9869468068118447 0.8664953718670015 0.747479486334229 0.5900026319707349 0.2775619077088477 0.7748335565754129 0.04467450567646358 0.7174328175565213 0.8681297152062201 0.4501124824839816 0.034289888377784794 0.005733963427150823 0.395402109068577]

________________________________________________________
Executed in    2.39 secs   fish           external
   usr time    2.38 secs  406.00 micros    2.38 secs
   sys time    0.61 secs   81.00 micros    0.61 secs

Vs MArray, which crashed after 90 seconds:

time julia -O3 -e "using StaticArrays; A = @MMatrix rand(100,100); println(A)"
Internal error: encountered unexpected error in runtime:
MethodError(f=typeof(Core.Compiler.widenconst)(), args=(SSAValue(132439),), world=0x0000000000000fd0)
jl_method_error_bare at /home/chriselrod/Documents/languages/julia/src/gf.c:1724
jl_method_error at /home/chriselrod/Documents/languages/julia/src/gf.c:1742
jl_lookup_generic_ at /home/chriselrod/Documents/languages/julia/src/gf.c:2292
jl_apply_generic at /home/chriselrod/Documents/languages/julia/src/gf.c:2313
widen_all_consts! at ./compiler/typeinfer.jl:226
finish at ./compiler/typeinfer.jl:188 [inlined]
typeinf at ./compiler/typeinfer.jl:34
typeinf_edge at ./compiler/typeinfer.jl:529
abstract_call_method at ./compiler/abstractinterpretation.jl:472
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:146
abstract_call_known at ./compiler/abstractinterpretation.jl:989
abstract_call at ./compiler/abstractinterpretation.jl:1012
abstract_call at ./compiler/abstractinterpretation.jl:996
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1117
typeinf_local at ./compiler/abstractinterpretation.jl:1373
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1429
typeinf at ./compiler/typeinfer.jl:12
typeinf_ext at ./compiler/typeinfer.jl:616
typeinf_ext_toplevel at ./compiler/typeinfer.jl:649
typeinf_ext_toplevel at ./compiler/typeinfer.jl:645
jfptr_typeinf_ext_toplevel_8018 at /home/chriselrod/Documents/languages/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/chriselrod/Documents/languages/julia/src/gf.c:2116 [inlined]
jl_apply_generic at /home/chriselrod/Documents/languages/julia/src/gf.c:2317 [inlined]
jl_apply at /home/chriselrod/Documents/languages/julia/src/julia.h:1721 [inlined]
jl_type_infer at /home/chriselrod/Documents/languages/julia/src/gf.c:300
jl_generate_fptr at /home/chriselrod/Documents/languages/julia/src/jitlayers.cpp:294
jl_compile_method_internal at /home/chriselrod/Documents/languages/julia/src/gf.c:1872
jl_compile_method_internal at /home/chriselrod/Documents/languages/julia/src/gf.c:1835 [inlined]
_jl_invoke at /home/chriselrod/Documents/languages/julia/src/gf.c:2126 [inlined]
jl_apply_generic at /home/chriselrod/Documents/languages/julia/src/gf.c:2317
jl_apply at /home/chriselrod/Documents/languages/julia/src/julia.h:1721 [inlined]
do_call at /home/chriselrod/Documents/languages/julia/src/interpreter.c:117
eval_value at /home/chriselrod/Documents/languages/julia/src/interpreter.c:206
eval_stmt_value at /home/chriselrod/Documents/languages/julia/src/interpreter.c:157 [inlined]
eval_body at /home/chriselrod/Documents/languages/julia/src/interpreter.c:551
jl_interpret_toplevel_thunk at /home/chriselrod/Documents/languages/julia/src/interpreter.c:659
top-level scope at /home/chriselrod/.julia/packages/StaticArrays/l7lu2/src/MMatrix.jl:154
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/julia/src/toplevel.c:838
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/julia/src/toplevel.c:788
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/julia/src/toplevel.c:788
jl_toplevel_eval at /home/chriselrod/Documents/languages/julia/src/toplevel.c:847 [inlined]
jl_toplevel_eval_in at /home/chriselrod/Documents/languages/julia/src/toplevel.c:881
eval at ./boot.jl:340
exec_options at ./client.jl:260
_start at ./client.jl:485
jfptr__start_30713 at /home/chriselrod/Documents/languages/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/chriselrod/Documents/languages/julia/ui/../src/julia.h:1721 [inlined]
true_main at /home/chriselrod/Documents/languages/julia/ui/repl.c:106
main at /home/chriselrod/Documents/languages/julia/ui/repl.c:227
__libc_start_main at /usr/lib64/haswell/libc.so.6 (unknown line)
_start at /home/chriselrod/Documents/languages/julia/usr/bin/julia (unknown line)
julia: /home/chriselrod/Documents/languages/julia/src/codegen.cpp:4393: jl_cgval_t emit_expr(jl_codectx_t&, jl_value_t*, ssize_t): Assertion `token.V->getType()->isTokenTy()' failed.

signal (6): Aborted
in expression starting at none:1
gsignal at /usr/lib64/haswell/libc.so.6 (unknown line)
abort at /usr/lib64/haswell/libc.so.6 (unknown line)
unknown function (ip: 0x7fddb2f51753)
__assert_fail at /usr/lib64/haswell/libc.so.6 (unknown line)
emit_expr at /home/chriselrod/Documents/languages/julia/src/codegen.cpp:4393
emit_ssaval_assign at /home/chriselrod/Documents/languages/julia/src/codegen.cpp:3893
emit_stmtpos at /home/chriselrod/Documents/languages/julia/src/codegen.cpp:4091 [inlined]
emit_function at /home/chriselrod/Documents/languages/julia/src/codegen.cpp:6656
jl_emit_code at /home/chriselrod/Documents/languages/julia/src/codegen.cpp:7002
jl_emit_codeinst at /home/chriselrod/Documents/languages/julia/src/codegen.cpp:7036
_jl_compile_codeinst at /home/chriselrod/Documents/languages/julia/src/jitlayers.cpp:95
jl_generate_fptr at /home/chriselrod/Documents/languages/julia/src/jitlayers.cpp:306
jl_compile_method_internal at /home/chriselrod/Documents/languages/julia/src/gf.c:1872
jl_compile_method_internal at /home/chriselrod/Documents/languages/julia/src/gf.c:1835 [inlined]
_jl_invoke at /home/chriselrod/Documents/languages/julia/src/gf.c:2126 [inlined]
jl_apply_generic at /home/chriselrod/Documents/languages/julia/src/gf.c:2317
rand at /home/chriselrod/Documents/languages/julia/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:256
rand at /home/chriselrod/Documents/languages/julia/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:259
jl_apply at /home/chriselrod/Documents/languages/julia/src/julia.h:1721 [inlined]
do_call at /home/chriselrod/Documents/languages/julia/src/interpreter.c:117
eval_value at /home/chriselrod/Documents/languages/julia/src/interpreter.c:206
eval_stmt_value at /home/chriselrod/Documents/languages/julia/src/interpreter.c:157 [inlined]
eval_body at /home/chriselrod/Documents/languages/julia/src/interpreter.c:551
jl_interpret_toplevel_thunk at /home/chriselrod/Documents/languages/julia/src/interpreter.c:659
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/julia/src/toplevel.c:838
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/julia/src/toplevel.c:788
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/julia/src/toplevel.c:788
jl_toplevel_eval at /home/chriselrod/Documents/languages/julia/src/toplevel.c:847 [inlined]
jl_toplevel_eval_in at /home/chriselrod/Documents/languages/julia/src/toplevel.c:881
eval at ./boot.jl:340
exec_options at ./client.jl:260
_start at ./client.jl:485
jfptr__start_30713 at /home/chriselrod/Documents/languages/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/chriselrod/Documents/languages/julia/ui/../src/julia.h:1721 [inlined]
true_main at /home/chriselrod/Documents/languages/julia/ui/repl.c:106
main at /home/chriselrod/Documents/languages/julia/ui/repl.c:227
__libc_start_main at /usr/lib64/haswell/libc.so.6 (unknown line)
_start at /home/chriselrod/Documents/languages/julia/usr/bin/julia (unknown line)
Allocations: 65191659 (Pool: 65191032; Big: 627); GC: 24

________________________________________________________
Executed in   97.71 secs   fish           external
   usr time   96.26 secs  551.00 micros   96.26 secs
   sys time    1.28 secs    0.00 micros    1.28 secs

fish: “time /home/chriselrod/Documents…” terminated by signal SIGABRT (Abort)

Being statically sized or wrapping giant tuples doesn't cause bad performance, it's just their implementation.

@mateuszbaran
Copy link
Contributor

FWIW, PaddedMatrices.FixedSizeArray is implemented very similarly to StaticArrays.MArray without the same scalability problems.

Nice, I didn't know about your library. Maybe it would be even faster than HybridArrays.

More or less. I guess we could also check if it's possible to wrap CuArray in HybridArray and get even better performance on GPU but I don't have an NVidia GPU.

I have a NVIDIA GPU and can test whatever code you want.

Thanks, but I don't actually need that right now. I just thought that someone else might find it useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants