Insights: JuliaGPU/CUDA.jl
Overview
10 Pull requests merged by 7 people
- Remove the unnecessary reshape during mapreduce. (#2778 merged May 12, 2025)
- cuTENSOR: Preserve storage type when multiplying (#2775 merged May 9, 2025)
- Fix SPGEMM_ALGOS setup (#2773 merged May 8, 2025)
- SparseMatricesCSR Dispatch (#2720 merged May 8, 2025)
- Update to CUDA 12.9. (#2772 merged May 8, 2025)
- Support new functionality from KA 0.9.32 (#2774 merged May 8, 2025)
- unsafe_wrap for symbols (#2753 merged May 8, 2025)
- Remove second import of aligned_sizeof (#2767 merged May 7, 2025)
- CUSPARSE SpGEMM: Support algorithms 2 and 3 (#2769 merged May 7, 2025)
- Get rid of unneeded version checks (#2765 merged May 7, 2025)
2 Pull requests opened by 2 people
- Update subpackages. (#2776 opened May 9, 2025)
- add KA.get_backend(dev) (#2779 opened May 12, 2025)
3 Issues closed by 1 person
- `sum!` throws dispatch error beyond a threshold number of rows (#2777 closed May 12, 2025)
- aligned_sizeof with an existing identifier (#2766 closed May 7, 2025)
- CUSPARSE_SPGEMM_ALG2 not working (#2768 closed May 7, 2025)
1 Issue opened by 1 person
- Error in testset libraries/cublas/level3/gemm and core/device/intrinsics/atomics (#2770 opened May 5, 2025)
48 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Prefix scan using warp primitives and fallback for non primitive types (#287 commented on May 6, 2025 • 0 new comments)
- Constant memory support (#552 commented on May 6, 2025 • 0 new comments)
- Add GPU to CPU RPC mechanism (#567 commented on May 6, 2025 • 0 new comments)
- add custom sum tutorial (#666 commented on May 6, 2025 • 0 new comments)
- Implement sync_threads using an unaligned barrier. (#798 commented on May 6, 2025 • 0 new comments)
- Add support for strided cufft (#912 commented on May 6, 2025 • 0 new comments)
- updating docs to include matrix-vector multiply example (#918 commented on May 6, 2025 • 0 new comments)
- Latency improvements. (#1066 commented on May 6, 2025 • 0 new comments)
- Register BFloat16 (#1092 commented on May 6, 2025 • 0 new comments)
- Fixed reducedim operations for CuQRPackedQ (#1118 commented on May 6, 2025 • 0 new comments)
- Add a hostcall interface (#1140 commented on May 6, 2025 • 0 new comments)
- Calling similar(...) on AbstractCuSparseArray with dims (#1184 commented on May 6, 2025 • 0 new comments)
- 2-arg `show` (#1197 commented on May 6, 2025 • 0 new comments)
- improve sparse matrix conversions (#1215 commented on May 6, 2025 • 0 new comments)
- [POC] Support for cuFile. (#1235 commented on May 6, 2025 • 0 new comments)
- Add explicit strides to CuArray and CuDeviceArray. (#1322 commented on May 6, 2025 • 0 new comments)
- Support ordering argument for atomics (#1393 commented on May 6, 2025 • 0 new comments)
- WMMA TensorFloat32 (TF32) (#1419 commented on May 6, 2025 • 0 new comments)
- WMMA BFloat16 (BF16) (#1425 commented on May 6, 2025 • 0 new comments)
- WMMA Float64 (#1426 commented on May 6, 2025 • 0 new comments)
- CUSPARSE: Better error msg for unsupported sparse mm (#1467 commented on May 6, 2025 • 0 new comments)
- limit csc/csr/bsr sparse conversion index to be cint & fix a few conversion bugs (#1563 commented on May 6, 2025 • 0 new comments)
- Add scoped atomic_thread_fence (#1644 commented on May 6, 2025 • 0 new comments)
- Switch secrets management to cryptic. (#1687 commented on May 6, 2025 • 0 new comments)
- Support for qr of strided inputs (non-contiguous views) (#1764 commented on May 6, 2025 • 0 new comments)
- Adding copyto for non-contiguous matrices and vectors (#1778 commented on May 6, 2025 • 0 new comments)
- Use Atomix (#1790 commented on May 6, 2025 • 0 new comments)
- Add wrappers for NVPERF (#1823 commented on May 6, 2025 • 0 new comments)
- Attempted to add method definitions for supporting the SubArray type … (#1830 commented on May 6, 2025 • 0 new comments)
- Add an experimental opaque closure type. (#1853 commented on May 6, 2025 • 0 new comments)
- WIP: Add an index typevar to CuDeviceArray. (#1895 commented on May 6, 2025 • 0 new comments)
- Add contract through FastmathOverlays.jl (#2037 commented on May 6, 2025 • 0 new comments)
- Support FFT adjoint plans and test (#2073 commented on May 6, 2025 • 0 new comments)
- Use TaskLocalValues (#2075 commented on May 6, 2025 • 0 new comments)
- docs: perf tips: deemphasize `assume` in favor of UnsafeAssume.jl (#2181 commented on May 6, 2025 • 0 new comments)
- Add a dispatch for LinearAlgebra.norm2 (#2302 commented on May 6, 2025 • 0 new comments)
- Use PrecompileTools to warmup CUDA.jl (#2325 commented on May 6, 2025 • 0 new comments)
- High Level Wrapper for Fused Matmul + Bias + Activation (#2360 commented on May 6, 2025 • 0 new comments)
- WIP: Native I/O. (#2485 commented on May 6, 2025 • 0 new comments)
- [CUSPARSE] Fix constructor of sparse empty matrices (#2575 commented on May 6, 2025 • 0 new comments)
- Directed rounding (#2576 commented on May 6, 2025 • 0 new comments)
- make CUDA randn work with Zygote (#2581 commented on May 6, 2025 • 0 new comments)
- Allow disabling the linking of libdevice in CUDACompilerParams (#2611 commented on May 6, 2025 • 0 new comments)
- Try fast linear indexes for KA (#2612 commented on May 6, 2025 • 0 new comments)
- Wrap and test some more Float16 intrinsics (#2644 commented on May 6, 2025 • 0 new comments)
- Use invariant.load for ldg (#2655 commented on May 6, 2025 • 0 new comments)
- Added new api and fixed type errors in cuStateVec (#2728 commented on May 6, 2025 • 0 new comments)
- add fastmath flag (#2732 commented on May 6, 2025 • 0 new comments)