This repository has been archived by the owner on Mar 12, 2021. It is now read-only.
Performance issue with v2.1.0 compared with v1.7.3 #701
Closed
Description
Opened on May 3, 2020
Describe the bug
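CuArrays v2.1.0 is slower than v1.7.3 for small models. In the benchmark below, the minimum time for one forward pass of a small three-layer network rises from 45.6 μs (v1.7.3) to 93.9 μs (v2.1.0).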
To Reproduce
Minimal working example (MWE):
```
(@v1.4) pkg> st
  [587475ba] Flux v0.10.4
  [3a865a2d] CuArrays v2.1.0 #master (https://github.com/JuliaGPU/CuArray
  [be33ccc6] CUDAnative v3.0.4
```

```julia
julia> using Flux, CuArrays, BenchmarkTools

julia> model = Chain(
           Dense(4, 128, relu),
           Dense(128, 128, relu),
           Dense(128, 2),
       ) |> gpu
Chain(Dense(4, 128, relu), Dense(128, 128, relu), Dense(128, 2))

julia> @benchmark CuArrays.@sync model($(cu(rand(4))))
BenchmarkTools.Trial:
  memory estimate:  8.80 KiB
  allocs estimate:  276
  --------------
  minimum time:     93.864 μs (0.00% GC)
  median time:      115.179 μs (0.00% GC)
  mean time:        125.542 μs (1.97% GC)
  maximum time:     50.622 ms (48.86% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> CuArrays.version()
v"10.1.243"
```
For comparison, the same model with CuArrays v1.7.3:

```
(@v1.4) pkg> st
  [be33ccc6] CUDAnative v2.10.2
  [3a865a2d] CuArrays v1.7.3
  [587475ba] Flux v0.10.3
```

```julia
julia> @benchmark CuArrays.@sync model($(cu(rand(4))))
BenchmarkTools.Trial:
  memory estimate:  8.16 KiB
  allocs estimate:  223
  --------------
  minimum time:     45.627 μs (0.00% GC)
  median time:      74.875 μs (0.00% GC)
  mean time:        85.175 μs (2.61% GC)
  maximum time:     32.836 ms (33.09% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> CUDAdrv.version()
v"10.1.0"
```
Expected behavior
Performance of v2.1.0 on small models comparable to v1.7.3.
Environment details
Details on Julia:
```julia
julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```
Additional context
Tested on an RTX 2080 Ti.
Note that the model above is quite small. For some larger models, performance is similar between v2.1.0 and v1.7.3. However, I'm still quite interested in why there is such a significant difference for small models.
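One simple model is consistent with both observations, under the unverified assumption that v2.1.0 adds a roughly constant overhead Δ to each forward pass, independent of layer sizes. The relative slowdown is then 1 + Δ/t, where t is the v1.7.3 time per pass, so it shrinks as t grows. The sketch below estimates Δ from the minimum times reported above and plugs in a hypothetical larger model taking 10 ms per pass on v1.7.3 (an illustrative figure, not a measurement):

```latex
\begin{aligned}
\Delta &\approx 93.864\ \mu\text{s} - 45.627\ \mu\text{s} = 48.237\ \mu\text{s} \\
\text{slowdown}(t) &= \frac{t + \Delta}{t} = 1 + \frac{\Delta}{t} \\
\text{slowdown}(45.627\ \mu\text{s}) &\approx 1 + \frac{48.237}{45.627} \approx 2.06 \\
\text{slowdown}(10\ \text{ms}) &\approx 1 + \frac{48.237\ \mu\text{s}}{10\,000\ \mu\text{s}} \approx 1.005
\end{aligned}
```

If the assumption holds, the regression would be nearly invisible for large models, which matches the observation above. Confirming it would still require profiling where the extra time, and the extra allocations (276 vs 223), actually go.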