Opened on Jan 15, 2024
While trying to reduce load time for some of our packages, I timed `using CUDA`. I noticed that some dependencies are relatively heavy for what they provide. In particular, `DataFrames` and `PrettyTables` directly account for more than 20 % of the load time (without considering their dependencies). While this is not necessarily a lot of time (about 1 s in the example below), it seems to me that `DataFrames` and `PrettyTables` are used exclusively in `profile.jl` and are not required for the core operations of `CUDA.jl`. This might be low-hanging fruit for reducing the load time of `CUDA.jl` and downstream packages, either by removing the dependencies or by moving them into package extensions.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.4 (2023-11-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> @time_imports using CUDA
2.7 ms CEnum
9.3 ms Preferences
0.3 ms JLLWrappers
195.0 ms LLVMExtra_jll 98.84% compilation time (98% recompilation)
30.3 ms LLVM
0.3 ms ExprTools
24.9 ms TimerOutputs
0.3 ms Scratch
174.0 ms GPUCompiler 4.37% compilation time
0.4 ms Adapt
0.1 ms Reexport
1.4 ms GPUArraysCore
0.6 ms Statistics
74.7 ms GPUArrays
0.2 ms Requires
3.9 ms BFloat16s
0.2 ms LLVM → BFloat16sExt
0.1 ms LLVMLoopInfo
75.0 ms CUDA_Driver_jll 37.98% compilation time
6.4 ms CUDA_Runtime_jll
140.1 ms CUDA_Runtime_Discovery
47.3 ms FixedPointNumbers
66.2 ms ColorTypes
48.7 ms Colors
1.2 ms NVTX_jll
0.6 ms JuliaNVTXCallbacks_jll
21.9 ms NVTX
9.7 ms RandomNumbers
2.9 ms Random123
0.2 ms DataValueInterfaces
1.2 ms DataAPI
0.2 ms IteratorInterfaceExtensions
0.1 ms TableTraits
25.7 ms Tables
0.2 ms PrecompileTools
10.3 ms StringManipulation
15.4 ms Crayons
0.6 ms LaTeXStrings
277.5 ms PrettyTables
0.3 ms Compat
0.2 ms Compat → CompatLinearAlgebraExt
59.5 ms DataStructures
1.1 ms SortingAlgorithms
17.9 ms PooledArrays
8.7 ms Missings
2.3 ms InvertedIndices
24.8 ms SentinelArrays
26.2 ms Parsers
6.7 ms InlineStrings
696.0 ms DataFrames
33.3 ms AbstractFFTs
0.4 ms AbstractFFTs → AbstractFFTsTestExt
4.5 ms UnsafeAtomics
11.3 ms Atomix
8.7 ms MacroTools
3.5 ms StaticArraysCore
407.8 ms StaticArrays
0.3 ms Adapt → AdaptStaticArraysExt
0.2 ms StaticArrays → StaticArraysStatisticsExt
3.4 ms UnsafeAtomicsLLVM
21.0 ms KernelAbstractions
1390.2 ms CUDA 1.40% compilation time
Total load time, measured with `@time`:
julia> @time using CUDA
4.312124 seconds (6.52 M allocations: 441.875 MiB, 4.30% gc time, 5.83% compilation time: 75% of which was recompilation)
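
As a rough sanity check of the "more than 20 %" figure, the numbers above can be combined as in the sketch below. It assumes the `@time_imports` and `@time` measurements are comparable even though they come from separate runs, so the result is only approximate. The variable names are purely illustrative.

```julia
# Direct load times in milliseconds, copied from the @time_imports output above
# (the dependencies of these two packages are not included).
prettytables_ms = 277.5
dataframes_ms = 696.0

# Total wall time in seconds, as reported by `@time using CUDA`.
total_s = 4.312124

# Combined direct cost and its share of the total load time.
direct_ms = prettytables_ms + dataframes_ms
share = direct_ms / (total_s * 1000)

println("Direct cost: ", direct_ms, " ms")
println("Share of total load time: ", round(100 * share; digits = 1), " %")
```

The combined direct cost comes out just under 1 s, which matches the estimate given in the description.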