Tags · JuliaGPU/CUDA.jl

v5.8.1

[Diff since v5.8.0](v5.8.0...v5.8.1)

**Merged pull requests:**
- CUSPARSE: Bugfixes for sparse vector broadcast. (#2780) (@maleadt)

May 14, 2025
a4a7af4

v5.8.0

[Diff since v5.7.3](v5.7.3...v5.8.0)

**Merged pull requests:**
- SparseMatricesCSR Dispatch (#2720) (@Abdelrahman912)
- Very rough implementation of bcast for CuSparseVector (#2733) (@kshyatt)
- Possible fix for #2745, change args in call to `cusparseCreateBsr` (#2747) (@manuelbb-upb)
- Simple tests for check and explain_eltype (#2748) (@kshyatt)
- Test for printing OutOfGPUMemoryError (#2749) (@kshyatt)
- Fix log_message pileup (#2750) (@fps)
- Test for parse_limit (#2751) (@kshyatt)
- unsafe_wrap for symbols (#2753) (@vchuravy)
- Use thread adoption to handle log messages. (#2754) (@maleadt)
- Add pre-commit configuration (#2755) (@vchuravy)
- Broaden check for eltypes to make sure we don't allow invalid stuff (#2756) (@kshyatt)
- Prefer aligned_sizeof (#2757) (@vchuravy)
- More array tests (#2758) (@kshyatt)
- A few more tests for CUSOLVER Q mats (#2759) (@kshyatt)
- More tests for CuArrayPtr (#2760) (@kshyatt)
- [CUSOLVER] Update gesvdp! (#2763) (@amontoison)
- Get rid of unneeded version checks (#2765) (@kshyatt)
- Remove second import of aligned_sizeof (#2767) (@vchuravy)
- CUSPARSE SpGEMM: Support algorithms 2 and 3 (#2769) (@maleadt)
- Update to CUDA 12.9. (#2772) (@maleadt)
- Fix SPGEMM_ALGOS setup (#2773) (@jonas-schulze)
- Support new functionality from KA 0.9.32 (#2774) (@michel2323)
- cuTENSOR: Preserve storage type when multiplying (#2775) (@christiangnrd)
- Update subpackages. (#2776) (@maleadt)
- Remove the unnecessary reshape during mapreduce. (#2778) (@maleadt)

**Closed issues:**
- Type conversions in broadcast fail when compiling with `always_inline=true` (#2722)
- cuDNN loses memory to log messages in Pluto.jl context (#2743)
- `Xgesvdp!` failure when only requesting singular values (#2761)
- CUDA 5.7.3 fails to precompile on Julia 1.12.0-beta2 (#2762)
- aligned_sizeof with an existing identifier (#2766)
- CUSPARSE_SPGEMM_ALG2 not working (#2768)
- `sum!` throws dispatch error beyond a threshold number of rows (#2777)

May 14, 2025
438722e

v5.7.3

[Diff since v5.7.2](v5.7.2...v5.7.3)

**Merged pull requests:**
- Merge CSC/CSR broadcast kernels (#2731) (@kshyatt)
- GPUToolbox v0.2 take 2 (#2736) (@christiangnrd)
- Add dispatches to access device matrix data via SparseArrays interface (#2738) (@termi-official)
- More tests for CuContext (#2739) (@kshyatt)
- Fill in missing KA functionality (KA.functional + sparse matrices adaption from CUDAbackend) (#2740) (@Abdelrahman912)
- Small tests and changes for coverage (#2742) (@kshyatt)
- More tests and better error type for cusparse generic (#2744) (@kshyatt)
- Restore the descriptors in CUSPARSE (#2746) (@amontoison)

Apr 17, 2025
1a006ea

v5.7.2

[Diff since v5.7.1](v5.7.1...v5.7.2)

**Merged pull requests:**
- Support disabling implicit synchronization (#2662) (@vchuravy)
- More tests and bugfixes for CUSOLVER (#2707) (@kshyatt)
- Set neutral element to zero for sparse reduce (#2710) (@kshyatt)
- Bugfix and tests for cusolver/base (#2712) (@kshyatt)
- Small fixes and missed tests for CUTENSORNET (#2713) (@kshyatt)
- Even more tests and small fixes for CUTENSORNET (#2715) (@kshyatt)
- Tests for CUSTATEVEC errors (#2716) (@kshyatt)
- Add compat entries for recent devices and toolkits. (#2717) (@maleadt)
- Split out copyto for texture arrays and add more tests (#2719) (@kshyatt)
- Add a docstring for pointer (#2721) (@maleadt)
- More CUSOLVER dense tests (#2723) (@kshyatt)
- Tests for some helper functions (#2724) (@kshyatt)
- More tests and bugfixing for CUSPARSE (#2725) (@kshyatt)
- Add more methods for all versions to unstick tests (#2726) (@kshyatt)

**Closed issues:**
- Ability to opt out of / improved automatic synchronization between tasks for shared array usage (#2617)
- maximum(abs, CuSparseMatrixCSR) returns Inf (#2705)
- mapreduce(f, op, A) for sparse A is wrong if f(0) =/= 0 (#2709)

Apr 7, 2025
57e06f9

v5.7.1

[Diff since v5.7.0](v5.7.0...v5.7.1)

**Merged pull requests:**
- Tests for MIME printing and indexing (#2686) (@kshyatt)
- Loosen VERSION check for sketchy test (#2688) (@kshyatt)
- CompatHelper: bump compat for GPUToolbox to 0.2, (keep existing compat) (#2689) (@github-actions[bot])
- Even more sparse printing and tril/triu tests (#2692) (@kshyatt)
- Even more sparse tests (#2695) (@kshyatt)
- More tests and a matmatmul fix (#2697) (@kshyatt)
- Sparse conversion tests (#2698) (@kshyatt)
- Tests for descriptors (#2700) (@kshyatt)
- More tests for some missing kron methods (#2701) (@kshyatt)
- Don't duplicate const defs (#2703) (@kshyatt)
- Exclude device-side sorting code from coverage (#2704) (@kshyatt)
- More tests for CuRef/CuRefArray (#2706) (@kshyatt)
- Update Project.toml (#2708) (@kshyatt)

**Closed issues:**
- GC corruption on 1.10 during cusparse/reduce tests (#2027)
- Launch bounds interface (#2674)
- Precompilation errors: `ERROR: LoadError: invalid redefinition of constant CUSPARSE.CuSparseUpperOrUnitUpperTriangular` (#2690)

Mar 21, 2025
6180d2c

v5.7.0

[Diff since v5.6.1](v5.6.1...v5.7.0)

**Merged pull requests:**
- Bugfix for batched gemv (#2481) (@kose-y)
- Split out level 3 gemm tests (#2610) (@kshyatt)
- Switch CUBLAS to device-side pointer mode (#2616) (@kshyatt)
- Elide bounds checks when kernels contain manual ones. (#2621) (@maleadt)
- Support passing symbols as arguments (#2624) (@vchuravy)
- Remove eager synchronization with HtoD copies. (#2625) (@maleadt)
- Don't prefetch on multi-device systems (#2626) (@vchuravy)
- Cooperative groups: add a boundscheck to avoid confusing inexact errors. (#2631) (@maleadt)
- NFC fixes (#2632) (@maleadt)
- Update to CUDA 12.8 (#2634) (@maleadt)
- [CUSOLVER] Update the test of syevBatched! (#2636) (@amontoison)
- Improve NSight Systems activation by inspecting the session list. (#2638) (@maleadt)
- [CUSPARSE] Support CuSparseMatrixBSR in the generic mm! (#2639) (@amontoison)
- [CUSOLVER] Support symmetric factorization without pivoting (#2640) (@amontoison)
- Wrap the Givens rotation methods (#2642) (@kshyatt)
- Remove kron methods and use those in GPUArrays (#2643) (@kshyatt)
- Add a simpler CuRefValue. (#2645) (@maleadt)
- Use GPUToolbox.jl (#2646) (@christiangnrd)
- DtoH copies: perform a nonblocking sync before calling into libcuda. (#2648) (@maleadt)
- Support Adjoint/Transpose -> COO (#2649) (@kshyatt)
- Support cuTENSOR contractors for 1D views (#2650) (@kshyatt)
- Re-enable mixed precision sparse mv (#2651) (@kshyatt)
- Proper support for similar on CuSparseMats (#2652) (@kshyatt)
- Test error throw for accumulate (#2656) (@kshyatt)
- Lots more tests for CUBLAS (#2657) (@kshyatt)
- MORE tests for CUBLAS and a bugfix (#2659) (@kshyatt)
- Add tests for gemmEx in fast math mode (#2660) (@kshyatt)
- More tests/better coverage for CUSPARSE (#2663) (@kshyatt)
- Fixes and tests for CuStateVec (#2664) (@kshyatt)
- Re-enable NVTX on Windows. (#2665) (@maleadt)
- Protect against occupancy calculations with very large numbers. (#2666) (@maleadt)
- Fixes and tests for COO indexing, exclude more kernels from coverage (#2668) (@kshyatt)
- Exclude lib*jl from coverage also for CUSTATEVEC, CUTENSOR, and CUTENSORNET (#2669) (@kshyatt)
- Even MORE tests and cov for CUBLAS (#2670) (@kshyatt)
- Fix and test for mgpu batch measure (#2671) (@kshyatt)
- Remove some invalid conversions and test more (#2673) (@kshyatt)
- Exclude more device side code in CUSPARSE (#2676) (@kshyatt)
- More tests, better errors, more exclusions for CUSPARSE (#2677) (@kshyatt)
- Try re-enabling the convolution tests (#2678) (@kshyatt)
- Fix Markdown formatting in overview.md (#2680) (@singularitti)
- Even more CUSPARSE tests (#2682) (@kshyatt)
- Fix inference of FFT plan creation (#2683) (@jipolanco)
- Some cudadrv tests (#2684) (@kshyatt)

**Closed issues:**
- Batched strided GEMM tests fail (#151)
- CuArrays.CURAND.curand missing methods (#141)
- Rationals behave badly (#118)
- Matrix inversion for CuArray (#116)
- Dot product of a complex CuArray with a real CuArray performance (#668)
- Sporadic cudnn/convolution test failures (#725)
- Support for LinearAlgebra.pinv (#883)
- Update mv!, mm!, sv! and sm! with the future release of CUSPARSE (#1610)
- [CUSPARSE] changing size in similar returns a cpu array (#1667)
- Mix precision sparse mul is not dispatched correctly (#1760)
- Make CuRef(Value) behave more like Ref (#1803)
- [cuTENSOR] Issue when contracting views of CuArrays with cuTENSOR (#2407)
- versioninfo broken on Jetson Orin due to NVML lookup failure (#2542)
- CUBLAS: Improve concurrency using device pointer mode (#2571)
- NVML issues on Jetson Nano Orin (#2580)
- Passing Symbol as an argument fails (#2590)
- Remove kron functionality (#2602)
- Disable or make automatic prefetching of unified memory optional (#2618)
- Circular dependency in CUDA with Julia 1.10 (#2622)
- Regression with `nsys profile` and `CUDA.@profile` (#2629)
- PrecompileTools.jl with CUDA.jl causes kernels to fail to run on 1.11 (#2637)
- Support Adjoint Sparse Matrices for CuSparseMatrixCOO (#2647)
- Implicit stream sync in tasks serialise kernel execution (#2654)
- Broadcasting on arrays larger than `typemax(Int32)` yields truncation error (#2658)
- Problem with function in CUDA (#2667)
- CUDA.limit errors with `invalid argument (code 1, ERROR_INVALID_VALUE)` (#2672)
- CUDA.jl does not support tuples of UInt128 (#2675)
- Cannot `permutedims!` CuArray with length larger than `typemax(Int32)` (#2679)
- Support for older GPUs (#2685)

Mar 11, 2025
c75b56f

v5.6.1

[Diff since v5.6.0](v5.6.0...v5.6.1)

**Merged pull requests:**
- Support GPUArrays allocations cache (#2593) (@pxl-th)
- Fix `resize!` when `pool=none` is in use (#2613) (@luraess)
- Update to new alloc cache interface. (#2614) (@maleadt)
- Work around NVML issue on Jetson Orin. (#2620) (@maleadt)

**Closed issues:**
- Add strides, implement CUDA Array Interface (#1298)
- Restore broken CUBLAS test (#2584)
- Issues with multiple GPUs on a single node (#2615)

Jan 15, 2025
6ef1a3d

v5.6.0

[Diff since v5.5.2](v5.5.2...v5.6.0)

CUDA.jl v5.6 is a relatively minor release, with the most important change happening behind the scenes: [GPUArrays.jl v11 has switched to KernelAbstractions.jl](https://juliagpu.org/post/2025-01-07-gpuarrays-11/) (#2524).

- Update to CUDA 12.6.2 (#2512)
- CUSOLVER: support for `Xgeev!` (#2513), `XsyevBatched` (#2577), `gesv!` and `gels!` (#2406)
- CUBLAS: added multiplication of transpose / adjoint matrices by diagonal matrices (#2518, #2538)
- Improve handle cache performance in the presence of many short-lived tasks (#2583)
- CUFFT: Pre-allocate the buffer required for complex-to-real FFTs only once (#2578)
- Improved batched pointer conversion for very large batches (#2608)

- Fix `findall` with an empty CuArray (#2554)
- CUBLAS: Fix use of level 1 methods with strided arrays (#2528)
- CUSOLVER: Fix `Xgesvdr!` (#2556)
- Preserve the array buffer type with more linear algebra operations (#2534)
- Work around LinearAlgebra.jl breakage in Julia 1.11.2 concerning generic triangular `(l/r)mul!` (#2585)
- Fix ambiguity of `LinearAlgebra.dot` (#2569)
- Native RNG: Fixes when working with very large arrays (#2561)
- Avoid a deadlock due to union splitting in the `mapreduce` kernel (#2595)
- Fix pinning of resized CPU memory by automatically re-pinning (#2599)

**Merged pull requests:**
- [CUSOLVER] Interface gesv! and gels! (#2406) (@amontoison)
- Update wrappers for CUDA v12.6.2 (#2512) (@amontoison)
- [CUSOLVER] Interface Xgeev! (#2513) (@amontoison)
- Added multiplication of transpose / adjoint matrices by diagonal matrices (#2518) (@amontoison)
- CompatHelper: bump compat for GPUCompiler to 1, (keep existing compat) (#2521) (@github-actions[bot])
- Adapt to GPUArrays.jl transition to KernelAbstractions.jl. (#2524) (@maleadt)
- Switch CI to 1.11. (#2525) (@maleadt)
- CUTENSOR: Reduce amount of broadcasts compiled during tests. (#2527) (@maleadt)
- CUBLAS: Don't use BLAS1 wrappers for strided arrays, only vectors. (#2528) (@maleadt)
- Clarify the synchronize(ctx)/device_synchronize() docstrings (#2532) (@JamesWrigley)
- Issue #2533: Preserving the buffer type in linear algebra (#2534) (@kmp5VT)
- Clarify description of how `LocalPreferences.toml` is generated in the docs (#2535) (@glwagner)
- Adapt to JuliaGPU/GPUArrays.jl#567. (#2537) (@maleadt)
- Removed allocations for transpose/adjoint - diagonal multiplications (#2538) (@RedRussianBear)
- Consistent use of Nsight Compute (#2541) (@huiyuxie)
- Fix formatting in profiling docs page (#2543) (@efaulhaber)
- Fix typo in EnzymeCoreExt.jl (#2550) (@wsmoses)
- Enhance warning under a profiler (#2552) (@huiyuxie)
- Fix findall with an empty CuArray of Bool (#2554) (@amontoison)
- [CUSOLVER] Fix Xgesvdr! (#2556) (@amontoison)
- Test restore Enzyme.jl (#2557) (@wsmoses)
- Native RNG fixes for very large arrays (#2561) (@maleadt)
- [Enzyme] Mark launch_configuration as inactive (#2563) (@wsmoses)
- Update EnzymeCoreExt.jl (#2565) (@simenhu)
- Fix ambiguity of LinearAlgebra.dot (#2569) (@amontoison)
- [CUSOLVER] Add more tests for the dense SVD (#2574) (@amontoison)
- [CUSOLVER] Interface XsyevBatched (#2577) (@amontoison)
- [CUFFT] Preallocate a buffer for complex-to-real FFT (#2578) (@amontoison)
- Run the GC when failing to find a handle, but lots are active. (#2583) (@maleadt)
- Work around LinearAlgebra.jl breakage in 1.11.2. (#2585) (@maleadt)
- mapreduce: avoid deadlock by forcing the accumulator type. (#2596) (@maleadt)
- Switch to GitHub Actions-based benchmarks. (#2597) (@maleadt)
- Re-pin variable sized memory (#2599) (@jipolanco)
- Enzyme: add make_zero of cuarrays (#2600) (@wsmoses)
- Update cache.jl (#2604) (@jarbus)
- Enzyme: mark device_sync as non-differentiable [only downstream] (#2605) (@wsmoses)
- Move strided batch pointer conversion to GPU (#2608) (@THargreaves)
- Split linalg tests into multiple files (#2609) (@kshyatt)

**Closed issues:**
- Inference failure with sort(::CuMatrix) after loading MLDatasets (#2258)
- Kron Support for CuSparseMatrixCSC (#2370)
- Broadcasting a function returning an anonymous function with a constructor over CUDA arrays fails to compile, "not isbits" (#2514)
- CuArray view has different variable type outside x inside the cuda kernel (#2516)
- Can't build cuDNN on centos7.8 (#2517)
- Precompile errors (#2519)
- Precompile errors (#2520)
- Error returned from CUDA function in CUDA-aware MPI multi-GPU test (#2522)
- Broadcasting over random static array errors on Julia 1.11 (#2523)
- `gemm_strided_batched` only using strided CUDA kernel when first matrix is transposed (#2529)
- CUDA runtime libraries are loaded from a system path due to LD_LIBRARY_PATH being set (#2530)
- [Bug] `UnifiedMemory` buffer changes during LinearAlgebra operations (#2533)
- Improve system library warning when running under profiler (#2540)
- Local CUDA settings not propagated to Pkg.test (#2545)
- Out of Memory when working with Distributed for Small Matrices (#2548)
- findall is not working with an empty vector of bool (#2553)
- CUDA code does not return when running under VSC Debugging mode (#2558)
- dot is quite slow in multinest Arrays (#2559)
- UndefVarError: `backend` not defined in `GPUArrays` (#2564)
- view() returns CuArray instead of view for 1-D CuArrays (#2566)
- dot ambiguity (#2568)
- InvalidIRError thrown only if critical function is not previously compiled (#2573)
- circular dependency during precompilation (#2579)
- Sparse MatVec Is Nondeterministic? (#2582)
- CUDA triggers long Circular dependency list (#2586)
- Release v5.5.3 for GPUArray v11? (#2587)
- 'dot' gives different answers when viewing rather than slicing multidimensional arrays (#2589)
- Scalar indexing when performing `kron` on two `CuVector`s (#2591)
- Faster strided-batched to batched wrapper (#2592)
- Error when copying data to pinned and resized CPU array (#2594)
- mapreducedim! size-dependent fail when narrowing float element types (#2595)
- Missing `Enzyme.make_zero` in Enzyme extension leads to incorrect behaviour (#2598)
- 'ArgumentError: array must be non-empty' when attempting to pop idle handles from HandleCache (#2603)
- Do a release as current one doesn't support `GPUArrays` v11 (#2606)

Jan 8, 2025
fc952a3

v5.5.2

[Diff since v5.5.1](v5.5.1...v5.5.2)

**Merged pull requests:**
- Fix type of AbstractFFTs.Plan for real-complex FFTs (#2504) (@jipolanco)
- Profiler: Demangle kernel names. (#2505) (@maleadt)
- Bump CUDNN. (#2507) (@maleadt)
- Restore Enzyme checks (#2508) (@wsmoses)

Sep 26, 2024
a1db081

v5.5.1

Enzyme: Adapt to pending version breaking update (#2490)

[only downstream]

Sep 22, 2024
3b05baf
