[WIP] improved dense-sparse and sparse-dense matrix multiplication kernels #24045
Branch force-pushed from cf44d90 to 302917c.
Equivalent kernels for sparse-dense matrix multiplication are now live in the second commit. I have a third commit that consolidates the dense-sparse and sparse-dense methods, but would like to run nanosoldier before and after pushing that commit to check that the consolidation does not impact performance.
@nanosoldier
From nanosoldier's logs, ERROR: LoadError: MethodError: no method matching h5d_write(::Int32, ::Int32, ::Base.ReinterpretArray{UInt8,1,HDF5.HDF5ReferenceObj,Array{HDF5.HDF5ReferenceObj,1}})
Closest candidates are:
h5d_write(::Any, ::Any, ::Any, !Matched::Any, !Matched::Any, !Matched::Any) at /home/nanosoldier/.julia/v0.7/HDF5/src/HDF5.jl:2053
h5d_write(::Int32, ::Int32, !Matched::String) at /home/nanosoldier/.julia/v0.7/HDF5/src/HDF5.jl:1922
h5d_write(::Int32, ::Int32, !Matched::Array{S<:String,N} where N) where S<:String at /home/nanosoldier/.julia/v0.7/HDF5/src/HDF5.jl:1929
...
Stacktrace:
[1] writearray(::HDF5.HDF5Dataset, ::Int32, ::Base.ReinterpretArray{UInt8,1,HDF5.HDF5ReferenceObj,Array{HDF5.HDF5ReferenceObj,1}}) at /home/nanosoldier/.julia/v0.7/HDF5/src/HDF5.jl:1807
[2] #_write#17(::Array{Any,1}, ::Function, ::JLD.JldFile, ::String, ::Array{Any,1}, ::JLD.JldWriteSession) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:574
[3] _write(::JLD.JldFile, ::String, ::Array{Any,1}, ::JLD.JldWriteSession) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:559
[4] #write#14(::Array{Any,1}, ::Function, ::JLD.JldFile, ::String, ::Any, ::JLD.JldWriteSession) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:512
[5] #jldopen#9(::Bool, ::Bool, ::Bool, ::Function, ::String, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:178
[6] (::getfield(JLD, Symbol("#kw##jldopen")))(::Array{Any,1}, ::typeof(JLD.jldopen), ::String, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./<missing>:0
[7] #jldopen#10(::Bool, ::Bool, ::Bool, ::Function, ::String, ::String) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:231
[8] (::getfield(JLD, Symbol("#kw##jldopen")))(::Array{Any,1}, ::typeof(JLD.jldopen), ::String, ::String) at ./<missing>:0
[9] #jldopen#11(::Array{Any,1}, ::Function, ::getfield(JLD, Symbol("##34#35")){String,Dict{String,String},Tuple{String,BenchmarkTools.BenchmarkGroup}}, ::String, ::Vararg{Any,N} where N) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:241
[10] (::getfield(JLD, Symbol("#kw##jldopen")))(::Array{Any,1}, ::typeof(JLD.jldopen), ::Function, ::String, ::String) at ./<missing>:0
[11] #save#33(::Bool, ::Bool, ::Function, ::FileIO.File{FileIO.DataFormat{:JLD}}, ::String, ::Any, ::Any, ::Vararg{Any,N} where N) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:1220
[12] save(::FileIO.File{FileIO.DataFormat{:JLD}}, ::String, ::Dict{String,String}, ::String, ::Vararg{Any,N} where N) at /home/nanosoldier/.julia/v0.7/JLD/src/JLD.jl:1217
[13] #save#14(::Array{Any,1}, ::Function, ::String, ::String, ::Vararg{Any,N} where N) at /home/nanosoldier/.julia/v0.7/FileIO/src/loadsave.jl:61
[14] save(::String, ::String, ::Dict{String,String}, ::String, ::Vararg{Any,N} where N) at /home/nanosoldier/.julia/v0.7/FileIO/src/loadsave.jl:61
[15] save(::String, ::String, ::Vararg{Any,N} where N) at /home/nanosoldier/.julia/v0.7/BenchmarkTools/src/serialization.jl:36
[16] include_relative(::Module, ::String) at ./loading.jl:533
[17] include(::Module, ::String) at ./sysimg.jl:14
[18] process_options(::Base.JLOptions) at ./client.jl:325
[19] _start() at ./client.jl:391
in expression starting at /home/nanosoldier/workdir/tmpra50k1/benchscript.jl:28

This looks like #23750 (comment)? (Update: the JLD issue appears resolved by JuliaIO/JLD.jl#192.)
Yep, that's a JLD issue that just got fixed. Let's try this again. @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
@nanosoldier
Nanosoldier reported meaningful regressions only in the kernel for
Fixes #22615
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
The sparse matmul benchmarks have been pretty noisy on master, so it's hard to determine how much is related and how much is noise.
Should we try to bring back these improvements?
I'd love to. Regrettably I won't have time to see this through for the foreseeable future. Best!
Is there anything "big" left to do here @Sacha0? Getting these in would be very nice, so if it is just a matter of rebasing…
[Insert hazy memory disclaimer here.] Many thanks for looking into resuscitating this pull request! :) Rebasing through the lazy adjoint/transpose change and the associated cleanup should make this mergeworthy again. Of course there remains potential downstream work, but none of it is pressing. (If this pull request nonetheless remains open by then, I hope to polish it off after submitting my dissertation and figuring out immediate next steps.) Best!
It would be amazing to have this land :)
Currently,
@Sacha0 Would it be possible for you to revive this PR?
Thank you. :-)
IIUC, this was revived in #38876, but we might still need the tests from here, as it seems like that was merged without tests. @Sacha0, @dkarrasch, or @ViralBShah, would you be able to confirm and close or update this, as appropriate?
We are covering all but one case according to https://codecov.io/gh/JuliaLang/julia/src/master/stdlib/SparseArrays/src/linalg.jl. I'll add one as soon as I get to it. Other than that, this could be closed. The remaining aspect of this one was to (silently) materialize certain adjoints/transposes, but in #40089 I found this can be sometimes advantageous and sometimes not, so it doesn't look like a good general solution, and materializing should be left to the user.
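As a hedged illustration of the tradeoff described above (this is a generic sketch, not code from this PR or from #40089): multiplying through a lazy `transpose` wrapper dispatches to a kernel specialized for the wrapper, while materializing first pays for a copy of the sparse matrix and then uses the plain column-major kernel. Either can win depending on sizes, sparsity, and how many times the product is reused, which is why materializing is left to the user.

```julia
using SparseArrays, LinearAlgebra, Random

Random.seed!(0)
A = sprand(1_000, 1_000, 0.01)   # sparse CSC matrix
x = rand(1_000)

# Multiply through the lazy wrapper: no copy is made; a kernel
# specialized for the Transpose wrapper is used.
y_lazy = transpose(A) * x

# Materialize the transpose first: pays for constructing a new CSC
# matrix, then multiplies with the ordinary sparse-times-dense kernel.
y_mat = copy(transpose(A)) * x

# Both paths compute the same product; only performance differs.
@assert y_lazy ≈ y_mat
```

If the transposed matrix is multiplied many times, the one-time cost of `copy(transpose(A))` can amortize; for a single product it often does not.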
This pull request provides improved dense-sparse matrix multiplication kernels. Equivalent kernels for sparse-dense matrix multiplication to follow. (Edit: equivalent kernels for sparse-dense matrix multiplication have landed.) Further optimization of the trickier cases in this pull request (see comments) is also to follow.

Performance in a machine learning application from @pevnak motivated this pull request (see slack/#linalg). Specifically, IIRC TensorFlow outperforms Flux on master on his problem, whereas with these kernels (and the sparse-dense equivalents) the tables turn.

Comprehensive benchmarks via BaseBenchmarks PR JuliaCI/BaseBenchmarks.jl#128.

Best!
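For readers unfamiliar with these kernels, the core idea in the dense-sparse case can be sketched as follows. This is a simplified, hypothetical kernel, not the PR's actual implementation (the name `densesparsemul` and the loop structure are illustrative): for `C = A*B` with `B` stored in CSC format, walk the stored entries of each column of `B` and accumulate the corresponding scaled columns of `A`, so no zero entries of `B` are ever touched.

```julia
using SparseArrays

# Hypothetical sketch of a dense * sparse(CSC) kernel.
function densesparsemul(A::Matrix{Float64}, B::SparseMatrixCSC{Float64,Int})
    size(A, 2) == size(B, 1) || throw(DimensionMismatch())
    C = zeros(size(A, 1), size(B, 2))
    rows = rowvals(B)    # row indices of B's stored entries
    vals = nonzeros(B)   # values of B's stored entries
    for j in 1:size(B, 2)          # column j of B produces column j of C
        for k in nzrange(B, j)     # stored entries in column j of B
            i = rows[k]            # B[i, j] = vals[k]
            v = vals[k]
            @inbounds for m in 1:size(A, 1)
                C[m, j] += A[m, i] * v   # axpy: C[:, j] += A[:, i] * B[i, j]
            end
        end
    end
    return C
end

A = rand(3, 4)
B = sprand(4, 5, 0.5)
@assert densesparsemul(A, B) ≈ A * Matrix(B)
```

The sparse-dense case is analogous, with the roles of the two operands swapped; the trickier cases mentioned above arise when adjoint/transpose wrappers change the natural traversal order of the CSC storage.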