Constant memory support #552
One alternative is to create a device-side …
3c0923a to 31b0619
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:
```diff
@@            Coverage Diff             @@
##           master     #552      +/-   ##
==========================================
- Coverage   79.69%   79.56%   -0.13%
==========================================
  Files         122      124       +2
  Lines        7356     7429      +73
==========================================
+ Hits         5862     5911      +49
- Misses       1494     1518      +24
```
I realized that only works for arguments; the conversion machinery doesn't get called for globals or captured variables (#67). Concerning initialization, we definitely don't want to do this on every launch. Modules are cached, so the values should persist (if we don't consider global device memory for now). That means we only need to initialize after compiling a module, or when re-initializing. But to do so we need both mappings (module->gv for the initial compilation, gv->module for re-initialization). Let's maybe keep it simpler for now, requiring either a literal value in the constant constructor (in which case you can inline it when creating the GV, so that you don't even need an external initializer), or that the user does the …
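The two mappings mentioned above can be kept side by side. The following is a minimal sketch only: modules and constant globals are identified by plain `Symbol`s, and `ConstantRegistry`, `register!`, `globals_of` and `modules_of` are hypothetical names, not anything in CUDA.jl.

```julia
# Hypothetical bookkeeping for constant-memory initialization.
# Modules and globals are identified by Symbols purely for illustration.
struct ConstantRegistry
    module_to_globals::Dict{Symbol,Set{Symbol}}  # module -> gv, used right after compiling
    global_to_modules::Dict{Symbol,Set{Symbol}}  # gv -> module, used when re-initializing
end
ConstantRegistry() = ConstantRegistry(Dict{Symbol,Set{Symbol}}(), Dict{Symbol,Set{Symbol}}())

# Record that compiled module `mod` contains constant global `gv`, in both directions.
function register!(r::ConstantRegistry, mod::Symbol, gv::Symbol)
    push!(get!(Set{Symbol}, r.module_to_globals, mod), gv)
    push!(get!(Set{Symbol}, r.global_to_modules, gv), mod)
    return r
end

# After compiling `mod`: which globals need their initial value written?
globals_of(r::ConstantRegistry, mod::Symbol) = get(r.module_to_globals, mod, Set{Symbol}())

# After the user changes `gv`: which cached modules must be re-initialized?
modules_of(r::ConstantRegistry, gv::Symbol) = get(r.global_to_modules, gv, Set{Symbol}())
```

Keeping both directions costs a second dictionary, but each lookup stays a single hash access instead of a scan over all cached modules.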
Another way of doing initialisation I've been thinking about which retains all of the current functionality without some of the overhead:
The only issue I see with this approach is that there's not really a way of freeing the memory inside of …
Global state is annoying (esp. in the presence of multiple threads, tasks, devices), so I'd rather not introduce it if we can avoid it. I implemented the more convenient …

```julia
c = CuConstantMemory{Int}([42])
# const initializer put in the LLVM IR
function kernel(...)
    c[...]
end
@cuda kernel(...)
```

```julia
c = CuConstantMemory{Int}(undef, (1,))
# external initializer
function kernel(...)
    c[...]
end
kernel_obj = @cuda delayed=true kernel(...)
constant_memory(kernel_obj)[c] = [42] # or some other function, doesn't really matter
kernel_obj(...)
```
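For the external-initializer form, the values can live on the kernel object rather than in a global map. The sketch below is hypothetical throughout: `ConstDecl`, `KernelObj` and `ConstantStore` are stand-ins (not CUDA.jl types), and `constant_memory` only mirrors the placeholder name floated in the comment. It shows where the state would be stored, not how it would be uploaded to the device.

```julia
# Hypothetical stand-in for `CuConstantMemory{T}(undef, dims)`: a global name and a shape,
# but no values.
struct ConstDecl{T,N}
    name::Symbol
    dims::NTuple{N,Int}
end
ConstDecl{T}(dims::NTuple{N,Int}) where {T,N} = ConstDecl{T,N}(gensym("constant_memory"), dims)

# Hypothetical stand-in for the object returned by a delayed `@cuda` launch.
# It owns the initial values, so no global initializer map is needed.
struct KernelObj
    constants::Dict{Symbol,Vector}
end
KernelObj() = KernelObj(Dict{Symbol,Vector}())

# Thin view so that `constant_memory(kernel_obj)[c] = vals` reads like the proposal.
struct ConstantStore
    kernel::KernelObj
end
constant_memory(k::KernelObj) = ConstantStore(k)

function Base.setindex!(s::ConstantStore, vals::AbstractArray, c::ConstDecl{T}) where {T}
    size(vals) == c.dims ||
        throw(DimensionMismatch("expected size $(c.dims), got $(size(vals))"))
    # Flatten column-major and copy, so later changes to `vals` don't leak in.
    s.kernel.constants[c.name] = collect(T, vec(vals))
    return s
end

Base.getindex(s::ConstantStore, c::ConstDecl) = s.kernel.constants[c.name]
```

A real launch would then copy each stored vector into the matching global of the compiled module before running it. Because the dictionary hangs off one kernel object, two tasks preparing different kernel objects never touch shared state.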
91db6b0 to 06fe10b
eb14117 to ca6de0a
Squashed and rebased.
src/memory_constant.jl (outdated)
```julia
arr = constant_memory_initializer[constant_memory_name].value
@assert !isnothing(arr) "calling kernel containing garbage collected constant memory"

flattened_arr = reduce(vcat, arr)
```
```julia
julia> reduce(vcat, [1])
1

...

ERROR: MethodError: no method matching LLVM.ConstantArray(::Int32, ::LLVM.Context)
Closest candidates are:
  LLVM.ConstantArray(::AbstractArray{Float64, N}, ::LLVM.Context) where N at /home/tim/Julia/pkg/LLVM/src/core/value/constant.jl:163
  LLVM.ConstantArray(::AbstractArray{Float32, N}, ::LLVM.Context) where N at /home/tim/Julia/pkg/LLVM/src/core/value/constant.jl:161
  LLVM.ConstantArray(::AbstractArray{Float16, N}, ::LLVM.Context) where N at /home/tim/Julia/pkg/LLVM/src/core/value/constant.jl:159
```
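The failure above comes from `reduce(vcat, arr)` returning the bare element when `arr` has a single entry, while `LLVM.ConstantArray` expects an array. A minimal sketch of a flattening step that always yields a vector, assuming a hypothetical helper name `flatten_initializer`:

```julia
# Hypothetical helper: flatten an initializer into a Vector{T} in column-major order.
# `vec` always returns a 1-D array (possibly sharing memory with `arr`), and
# `collect` turns it into a fresh Vector{T}, even when `arr` has one element.
function flatten_initializer(arr::AbstractArray{T}) where {T}
    isbitstype(T) || throw(ArgumentError("only bits types can live in constant memory"))
    return collect(T, vec(arr))
end

flatten_initializer([42])        # 1-element Vector{Int}, not the scalar 42
flatten_initializer([1 2; 3 4])  # [1, 3, 2, 4]
flatten_initializer(Int32[7])    # element type stays Int32
```

`reduce(vcat, arr; init=T[])` would also avoid the scalar case, but `vec` expresses the intent (reinterpret an N-d array as 1-D) more directly.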
ca6de0a to f690590
I pushed a WIP commit illustrating what I meant by eagerly initializing to avoid initializing as part of the compiler. This makes it so that LLVM can optimize based on the actual constant values. There's an issue though: Julia discards the linkage when importing these variables via `llvmcall`:

```julia
function main()
    Base.llvmcall(
        ("""@constant_memory = addrspace(4) externally_initialized global [1 x i32] [i32 42]
            define void @entry() {
                ret void
            }
            """, "entry"), Nothing, Tuple{})
end
main()

##

using InteractiveUtils
@code_llvm dump_module=true main()

##

using LLVM

# get the method instance
world = Base.get_world_counter()
meth = which(main, Tuple{})
sig = Base.signature_type(main, Tuple{})::Type
(ti, env) = ccall(:jl_type_intersection_with_env, Any,
                  (Any, Any), sig, meth.sig)::Core.SimpleVector
meth = Base.func_for_method_checked(meth, ti, env)
method_instance = ccall(:jl_specializations_get_linfo, Ref{Core.MethodInstance},
                        (Any, Any, Any, UInt), meth, ti, env, world)

# set-up the compiler interface
params = Base.CodegenParams()

# generate IR
native_code = ccall(:jl_create_native, Ptr{Cvoid},
                    (Vector{Core.MethodInstance}, Base.CodegenParams, Cint),
                    [method_instance], params, #=extern policy=# 1)
@assert native_code != C_NULL
llvm_mod_ref = ccall(:jl_get_llvm_module, LLVM.API.LLVMModuleRef,
                     (Ptr{Cvoid},), native_code)
@assert llvm_mod_ref != C_NULL
llvm_mod = LLVM.Module(llvm_mod_ref)
println(llvm_mod)
```

The (implicitly) external symbol becomes …

Anyway, once this is solved the approach is a bit better, I think. We could decide to make non-undef ConstantArrays not externally visible then, which would make the SRA transformation legal. I'm not sure whether we should keep them externally initialized: if not, it would also be legal to inline the constants in the function, but in that case the constant memory hardware wouldn't be used anymore. But maybe that's an improvement when LLVM can deduce the indices anyway. Finally, we could get rid of the initializer map by putting the values in the device counterpart's type (as a …
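One possible reading of "putting the values in the device counterpart's type" is to encode them as a tuple type parameter. The sketch below assumes that reading and uses a hypothetical `TypeEncodedConstants` type (not part of CUDA.jl). Because the values are part of the type, the compiler sees them as constants and no initializer map is needed.

```julia
# Hypothetical device-side counterpart whose values live in a type parameter.
# `Vals` must be an isbits tuple so it is allowed as a type parameter.
struct TypeEncodedConstants{T,Vals} <: AbstractVector{T} end

TypeEncodedConstants(vals::NTuple{N,T}) where {N,T} = TypeEncodedConstants{T,vals}()

Base.IndexStyle(::Type{<:TypeEncodedConstants}) = IndexLinear()
Base.size(::TypeEncodedConstants{T,Vals}) where {T,Vals} = (length(Vals),)
Base.getindex(::TypeEncodedConstants{T,Vals}, i::Int) where {T,Vals} = Vals[i]

c = TypeEncodedConstants((1, 2, 3))
isbits(c)  # a field-less singleton, so it can be passed as a kernel argument
c[2]       # the value is recovered from the type, not from memory
```

The trade-off is that every distinct set of values is a distinct type, so a kernel taking such an argument is compiled once per value set. This is the same recompilation concern raised below about tying the global's name to the instance, in exchange for constants that LLVM can fold.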
src/memory_constant.jl (outdated)
```julia
function CuConstantMemory(value::Array{T,N}) where {T,N}
    Base.isbitstype(T) || throw(ArgumentError("CuConstantMemory only supports bits types"))
    name = gensym("constant_memory")
```
A problem with this is that every invocation of a function that launches a kernel with constant memory will result in a recompilation (because the name is tied to the instance).
Based on empirical evidence from playing around with CUDA C code, …
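To make the recompilation concern concrete, here is a toy model: a compile cache keyed on the global's name. Everything in it is hypothetical (`compile_kernel`, `stable_name`, and the cache itself stand in for a real kernel cache whose key includes that name). It contrasts fresh `gensym` names with a name derived only from element type and shape.

```julia
# Hypothetical compile cache keyed on the name of the constant-memory global.
const compile_cache = Dict{Symbol,String}()
const compilations = Ref(0)

function compile_kernel(global_name::Symbol)
    get!(compile_cache, global_name) do
        compilations[] += 1
        "kernel referencing @$(global_name)"  # placeholder for compiled code
    end
end

# gensym: each new constant-memory instance gets a fresh name, so every call misses.
for _ in 1:3
    compile_kernel(gensym("constant_memory"))
end

# Hypothetical alternative: derive the name from element type and shape only,
# so repeated calls with the same layout hit the cache after the first compile.
stable_name(::Type{T}, dims::Dims) where {T} =
    Symbol("constant_memory_", T, "_", join(dims, "x"))
for _ in 1:3
    compile_kernel(stable_name(Int, (1,)))
end

println("compilations: ", compilations[])  # 4: three for gensym, one for the stable name
```

The catch with a shared name is that instances with the same layout then map to the same global, so their values can no longer be baked into the IR per instance and would have to be written at launch time instead.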
f690590 to 02ffcaa
Did some more hacking, but I'm not entirely happy with the result yet. Previously, you … I also tried getting rid of the global map in favor of passing the initializer as … So I might just revert all that and go back to your design, but I wanted to try some things out first :-)
a1af3c1 to 46f6109
5d585c4 to c850163
This PR adds support for constant memory. A non-exhaustive list of stuff to think about and/or work on:
- `CuConstantMemory` is an `isbits` type, which in turn makes initialisation more convoluted, because it cannot store an array field. Alternatively, some compiler work could be done to allow non-`isbits` types as CUDA kernel arguments.
- Use a `WeakKeyDict` to avoid needlessly storing `CuConstantMemory` objects that should have been GC'd.
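The two items interact. In current Base, a `WeakKeyDict` attaches a finalizer to each key, and Julia only allows finalizers on mutable objects, so an immutable `isbits` value could not itself be the key; a mutable host-side handle would have to play that role. A minimal sketch, assuming a hypothetical `HostConstantHandle` type (not part of CUDA.jl):

```julia
# Hypothetical mutable host-side handle for a constant-memory global.
# It must be mutable: WeakKeyDict registers a finalizer on each key.
mutable struct HostConstantHandle
    name::Symbol
end

# Initial values keyed weakly: once a handle becomes unreachable and is collected,
# its entry can be dropped instead of keeping the array alive forever.
const initializers = WeakKeyDict{HostConstantHandle,Vector{Int}}()

h = HostConstantHandle(gensym("constant_memory"))
initializers[h] = [42]
@show initializers[h]

# An immutable key is rejected, because no finalizer can be attached to it.
struct ImmutableHandle
    name::Symbol
end
bad = WeakKeyDict{ImmutableHandle,Vector{Int}}()
try
    bad[ImmutableHandle(:c)] = [42]
catch err
    println("immutable key rejected: ", err)
end
```

When exactly a collected handle's entry disappears depends on when the GC runs, so code reading from such a map still has to handle a missing entry, much like the `isnothing` check in the outdated `src/memory_constant.jl` snippet.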