Tags: JuliaGPU/CUDA.jl

v5.8.1

[Diff since v5.8.0](v5.8.0...v5.8.1)

**Merged pull requests:**
- CUSPARSE: Bugfixes for sparse vector broadcast. (#2780) (@maleadt)

v5.8.0

[Diff since v5.7.3](v5.7.3...v5.8.0)

**Merged pull requests:**
- SparseMatricesCSR Dispatch (#2720) (@Abdelrahman912)
- Very rough implementation of bcast for CuSparseVector (#2733) (@kshyatt)
- Possible fix for #2745, change args in call to `cusparseCreateBsr` (#2747) (@manuelbb-upb)
- Simple tests for check and explain_eltype (#2748) (@kshyatt)
- Test for printing OutOfGPUMemoryError (#2749) (@kshyatt)
- Fix log_message pileup (#2750) (@fps)
- Test for parse_limit (#2751) (@kshyatt)
- unsafe_wrap for symbols (#2753) (@vchuravy)
- Use thread adoption to handle log messages. (#2754) (@maleadt)
- Add pre-commit configuration (#2755) (@vchuravy)
- Broaden check for eltypes to make sure we don't allow invalid stuff (#2756) (@kshyatt)
- Prefer aligned_sizeof (#2757) (@vchuravy)
- More array tests (#2758) (@kshyatt)
- A few more tests for CUSOLVER Q mats (#2759) (@kshyatt)
- More tests for CuArrayPtr (#2760) (@kshyatt)
- [CUSOLVER] Update gesvdp! (#2763) (@amontoison)
- Get rid of unneeded version checks (#2765) (@kshyatt)
- Remove second import of aligned_sizeof (#2767) (@vchuravy)
- CUSPARSE SpGEMM: Support algorithms 2 and 3 (#2769) (@maleadt)
- Update to CUDA 12.9. (#2772) (@maleadt)
- Fix SPGEMM_ALGOS setup (#2773) (@jonas-schulze)
- Support new functionality from KA 0.9.32 (#2774) (@michel2323)
- cuTENSOR: Preserve storage type when multiplying (#2775) (@christiangnrd)
- Update subpackages. (#2776) (@maleadt)
- Remove the unnecessary reshape during mapreduce. (#2778) (@maleadt)

**Closed issues:**
- Type conversions in broadcast fails when compiling with `always_inline=true` (#2722)
- cuDNN loses memory to log messages in Pluto.jl context (#2743)
- `Xgesvdp!` failure when only requesting singular values (#2761)
- CUDA 5.7.3 fails to precompile on Julia 1.12.0-beta2 (#2762)
- aligned_sizeof with an existing identifier (#2766)
- CUSPARSE_SPGEMM_ALG2 not working (#2768)
- `sum!` throws dispatch error beyond a threshold number of rows (#2777)

v5.7.3

[Diff since v5.7.2](v5.7.2...v5.7.3)

**Merged pull requests:**
- Merge CSC/CSR broadcast kernels (#2731) (@kshyatt)
- GPUToolbox v0.2 take 2 (#2736) (@christiangnrd)
- Add dispatches to access device matrix data via SparseArrays interface (#2738) (@termi-official)
- More tests for CuContext (#2739) (@kshyatt)
- Fill in missing KA functionality (KA.functional + sparse matrices adaption from CUDAbackend) (#2740) (@Abdelrahman912)
- Small tests and changes for coverage (#2742) (@kshyatt)
- More tests and better error type for cusparse generic (#2744) (@kshyatt)
- Restore the descriptors in CUSPARSE (#2746) (@amontoison)

v5.7.2

[Diff since v5.7.1](v5.7.1...v5.7.2)

**Merged pull requests:**
- Support disabling implicit synchronization (#2662) (@vchuravy)
- More tests and bugfixes for CUSOLVER (#2707) (@kshyatt)
- Set neutral element to zero for sparse reduce (#2710) (@kshyatt)
- Bugfix and tests for cusolver/base (#2712) (@kshyatt)
- Small fixes and missed tests for CUTENSORNET (#2713) (@kshyatt)
- Even more tests and small fixes for CUTENSORNET (#2715) (@kshyatt)
- Tests for CUSTATEVEC errors (#2716) (@kshyatt)
- Add compat entries for recent devices and toolkits. (#2717) (@maleadt)
- Split out copyto for texture arrays and add more tests (#2719) (@kshyatt)
- Add a docstring for pointer (#2721) (@maleadt)
- More CUSOLVER dense tests (#2723) (@kshyatt)
- Tests for some helper functions (#2724) (@kshyatt)
- More tests and bugfixing for CUSPARSE (#2725) (@kshyatt)
- Add more methods for all versions to unstick tests (#2726) (@kshyatt)

**Closed issues:**
- Ability to opt out of / improved automatic synchronization between tasks for shared array usage (#2617)
- maximum(abs, CuSparseMatrixCSR) returns Inf (#2705)
- mapreduce(f, op, A) for sparse A is wrong if f(0) =/= 0 (#2709)

v5.7.1

[Diff since v5.7.0](v5.7.0...v5.7.1)

**Merged pull requests:**
- Tests for MIME printing and indexing (#2686) (@kshyatt)
- Loosen VERSION check for sketchy test (#2688) (@kshyatt)
- CompatHelper: bump compat for GPUToolbox to 0.2, (keep existing compat) (#2689) (@github-actions[bot])
- Even more sparse printing and tril/triu tests (#2692) (@kshyatt)
- Even more sparse tests (#2695) (@kshyatt)
- More tests and a matmatmul fix (#2697) (@kshyatt)
- Sparse conversion tests (#2698) (@kshyatt)
- Tests for descriptors (#2700) (@kshyatt)
- More tests for some missing kron methods (#2701) (@kshyatt)
- Don't duplicate const defs (#2703) (@kshyatt)
- Exclude device-side sorting code from coverage (#2704) (@kshyatt)
- More tests for CuRef/CuRefArray (#2706) (@kshyatt)
- Update Project.toml (#2708) (@kshyatt)

**Closed issues:**
- GC corruption on 1.10 during cusparse/reduce tests (#2027)
- Launch bounds interface (#2674)
- Precompilation errors: `ERROR: LoadError: invalid redefinition of constant CUSPARSE.CuSparseUpperOrUnitUpperTriangular` (#2690)

v5.7.0

[Diff since v5.6.1](v5.6.1...v5.7.0)

**Merged pull requests:**
- Bugfix for batched gemv (#2481) (@kose-y)
- Split out level 3 gemm tests (#2610) (@kshyatt)
- Switch CUBLAS to device-side pointer mode (#2616) (@kshyatt)
- Elide bounds checks when kernels contains manual ones. (#2621) (@maleadt)
- Support passing symbols as arguments (#2624) (@vchuravy)
- Remove eager synchronization with HtoD copies. (#2625) (@maleadt)
- Don't prefetch on multi-device systems (#2626) (@vchuravy)
- Cooperative groups: add a boundscheck to avoid confusing inexact errors. (#2631) (@maleadt)
- NFC fixes (#2632) (@maleadt)
- Update to CUDA 12.8 (#2634) (@maleadt)
- [CUSOLVER] Update the test of syevBatched! (#2636) (@amontoison)
- Improve NSight Systems activation by inspecting the session list. (#2638) (@maleadt)
- [CUSPARSE] Support CuSparseMatrixBSR in the generic mm! (#2639) (@amontoison)
- [CUSOLVER] Support symmetric factorization without pivoting (#2640) (@amontoison)
- Wrap the Givens rotation methods (#2642) (@kshyatt)
- Remove kron methods and use those in GPUArrays (#2643) (@kshyatt)
- Add a simpler CuRefValue. (#2645) (@maleadt)
- Use GPUToolbox.jl (#2646) (@christiangnrd)
- DtoH copies: perform a nonblocking sync before calling into libcuda. (#2648) (@maleadt)
- Support Adjoint/Transpose -> COO (#2649) (@kshyatt)
- Support cuTENSOR contractors for 1D views (#2650) (@kshyatt)
- Re-enable mixed precision sparse mv (#2651) (@kshyatt)
- Proper support for similar on CuSparseMats (#2652) (@kshyatt)
- Test error throw for accumulate (#2656) (@kshyatt)
- Lots more tests for CUBLAS (#2657) (@kshyatt)
- MORE tests for CUBLAS and a bugfix (#2659) (@kshyatt)
- Add tests for gemmEx in fast math mode (#2660) (@kshyatt)
- More tests/better coverage for CUSPARSE (#2663) (@kshyatt)
- Fixes and tests for CuStateVec (#2664) (@kshyatt)
- Re-enable NVTX on Windows. (#2665) (@maleadt)
- Protect against occupancy calculations with very large numbers. (#2666) (@maleadt)
- Fixes and tests for COO indexing, exclude more kernels from coverage (#2668) (@kshyatt)
- Exclude lib*jl from coverage also for CUSTATEVEC, CUTENSOR, and CUTENSORNET (#2669) (@kshyatt)
- Even MORE tests and cov for CUBLAS (#2670) (@kshyatt)
- Fix and test for mgpu batch measure (#2671) (@kshyatt)
- Remove some invalid conversions and test more (#2673) (@kshyatt)
- Exclude more device side code in CUSPARSE (#2676) (@kshyatt)
- More tests, better errors, more exclusions for CUSPARSE (#2677) (@kshyatt)
- Try re-enabling the convolution tests (#2678) (@kshyatt)
- Fix Markdown formatting in overview.md (#2680) (@singularitti)
- Even more CUSPARSE tests (#2682) (@kshyatt)
- Fix inference of FFT plan creation (#2683) (@jipolanco)
- Some cudadrv tests (#2684) (@kshyatt)

**Closed issues:**
- Batched strided GEMM tests fail (#151)
- CuArrays.CURAND.curand missing methods (#141)
- Rationals behave badly (#118)
- Matrix inversion for CuArray (#116)
- Dot product of a complex CuArray with a real CuArray performance (#668)
- Sporadic cudnn/convolution test failures (#725)
- Support for LinearAlgebra.pinv (#883)
- Update mv!, mm!, sv! and sm! with the future release of CUPARSE (#1610)
- [CUSPARSE] changing size in similar returns a cpu array (#1667)
- Mix precision sparse mul is not dispatched correctly (#1760)
- Make CuRef(Value) behave more like Ref (#1803)
- [cuTENSOR] Issue when contracting views of CuArrays with cuTENSOR (#2407)
- versioninfo broken on Jetson Orin due to NVML lookup failure (#2542)
- CUBLAS: Improve concurrency using device pointer mode (#2571)
- NVML issues on Jetson Nano Orin (#2580)
- Passing Symbol as a an argument fails (#2590)
- Remove kron functionality (#2602)
- Disable or make automatic prefecthing of unified memory optional (#2618)
- Circular dependency in CUDA with Julia 1.10 (#2622)
- Regression with `nsys profile` and `CUDA.@profile` (#2629)
- PrecompileTools.jl with CUDA.jl causes kernels to fail to run on 1.11 (#2637)
- Support Adjoint Sparse Matrices for CuSparseMatrixCOO (#2647)
- Implicit stream sync in tasks serialise kernel execution (#2654)
- Broadcasting on arrays larger than `typemax(Int32)` yields truncation error (#2658)
- Problem with function in CUDA (#2667)
- CUDA.limit errors with `invalid argument (code 1, ERROR_INVALID_VALUE)` (#2672)
- CUDA.jl does not support tuples of UInt128 (#2675)
- Can not `permutedims!` CuArray with length larger that `typemax(Int32)` (#2679)
- Support for older GPUs (#2685)

v5.6.1

[Diff since v5.6.0](v5.6.0...v5.6.1)

**Merged pull requests:**
- Support GPUArrays allocations cache (#2593) (@pxl-th)
- Fix `resize!` when `pool=none` is in use (#2613) (@luraess)
- Update to new alloc cache interface. (#2614) (@maleadt)
- Work around NVML issue on Jetson Orin. (#2620) (@maleadt)

**Closed issues:**
- Add strides, implement CUDA Array Interface (#1298)
- Restore broken CUBLAS test (#2584)
- Issues with multiple GPUs on a single node (#2615)

v5.6.0

[Diff since v5.5.2](v5.5.2...v5.6.0)

CUDA.jl v5.6 is a relatively minor release, with the most important change being behind the scenes: [GPUArrays.jl v11 has switched to KernelAbstractions.jl](https://juliagpu.org/post/2025-01-07-gpuarrays-11/) (#2524).

- Update to CUDA 12.6.2 (#2512)
- CUSOLVER: support for `Xgeev!` (#2513), `XsyevBatched` (#2577), `gesv!` and `gels!` (#2406)
- CUBLAS: added multiplication of transpose / adjoint matrices by diagonal matrices (#2518, #2538)
- Improve handle cache performance in the presence of many short-lived tasks (#2583)
- CUFFT: Pre-allocate the buffer required for complex-to-real FFTs only once (#2578)
- Improved batched pointer conversion for very large batches (#2608)

- Fix `findall` with an empty CuArray (#2554)
- CUBLAS: Fix use of level 1 methods with strided arrays (#2528)
- CUSOLVER: Fix `Xgesvdr!` (#2556)
- Preserve the array buffer type with more linear algebra operations (#2534)
- Work around LinearAlgebra.jl breakage in Julia 1.11.2 concerning generic triangular `(l/r)mul!` (#2585)
- Fix ambiguity of `LinearAlgebra.dot` (#2569)
- Native RNG: Fixes when working with very large arrays (#2561)
- Avoid a deadlock due to union splitting in the `mapreduce` kernel (#2595)
- Fix pinning of resized CPU memory by automatically re-pinning (#2599)

**Merged pull requests:**
- [CUSOLVER] Interface gesv! and gels! (#2406) (@amontoison)
- Update wrappers for CUDA v12.6.2 (#2512) (@amontoison)
- [CUSOLVER] Interface Xgeev! (#2513) (@amontoison)
- Added multiplication of transpose / adjoint matrices by diagonal matrices (#2518) (@amontoison)
- CompatHelper: bump compat for GPUCompiler to 1, (keep existing compat) (#2521) (@github-actions[bot])
- Adapt to GPUArrays.jl transition to KernelAbstractions.jl. (#2524) (@maleadt)
- Switch CI to 1.11. (#2525) (@maleadt)
- CUTENSOR: Reduce amount of broadcasts compiled during tests. (#2527) (@maleadt)
- CUBLAS: Don't use BLAS1 wrappers for strided arrays, only vectors. (#2528) (@maleadt)
- Clarify the synchronize(ctx)/device_synchronize() docstrings (#2532) (@JamesWrigley)
- Issue #2533: Preserving the buffer type in linear algebra (#2534) (@kmp5VT)
- Clarify description of how `LocalPreferences.toml` is generated in the docs (#2535) (@glwagner)
- Adapt to JuliaGPU/GPUArrays.jl#567. (#2537) (@maleadt)
- Removed allocations for transpose/adjoint - diagonal multiplications (#2538) (@RedRussianBear)
- Consistent use of Nsight Compute (#2541) (@huiyuxie)
- Fix formatting in profiling docs page (#2543) (@efaulhaber)
- Fix typo in EnzymeCoreExt.jl (#2550) (@wsmoses)
- Enhance warning under a profiler (#2552) (@huiyuxie)
- Fix findall with an empty CuArray of Bool (#2554) (@amontoison)
- [CUSOLVER] Fix Xgesvdr! (#2556) (@amontoison)
- Test restore Enzyme.jl (#2557) (@wsmoses)
- Native RNG fixes for very large arrays (#2561) (@maleadt)
- [Enzyme] Mark launch_configuration as inactive (#2563) (@wsmoses)
- Update EnzymeCoreExt.jl (#2565) (@simenhu)
- Fix ambiguity of LinearAlgebra.dot (#2569) (@amontoison)
- [CUSOLVER] Add more tests for the dense SVD (#2574) (@amontoison)
- [CUSOLVER] Interface XsyevBatched (#2577) (@amontoison)
- [CUFFT] Preallocate a buffer for complex-to-real FFT (#2578) (@amontoison)
- Run the GC when failing to find a handle, but lots are active. (#2583) (@maleadt)
- Work around LinearAlgebra.jl breakage in 1.11.2. (#2585) (@maleadt)
- mapreduce: avoid deadlock by forcing the accumulator type. (#2596) (@maleadt)
- Switch to GitHub Actions-based benchmarks. (#2597) (@maleadt)
- Re-pin variable sized memory (#2599) (@jipolanco)
- Enzyme: add make_zero of cuarrays (#2600) (@wsmoses)
- Update cache.jl (#2604) (@jarbus)
- Enzyme: mark device_sync as non-differentiable [only downstream] (#2605) (@wsmoses)
- Move strided batch pointer conversion to GPU (#2608) (@THargreaves)
- Split linalg tests into multiple files (#2609) (@kshyatt)

**Closed issues:**
- Inference failure with sort(::CuMatrix) after loading MLDatasets (#2258)
- Kron Support for CuSparseMatrixCSC (#2370)
- Broadcasting a function returning an anonymous function with a constructor over CUDA arrays fails to compile, "not isbits" (#2514)
- CuArray view has different variable type outside x inside the cuda kernel (#2516)
- Can't build cuDNN on centos7.8 (#2517)
- Precompile errors (#2519)
- Precompile errors (#2520)
- Error returned from CUDA function in CUDA-aware MPI multi-GPU test (#2522)
- Broadcasting over random static array errors on Julia 1.11 (#2523)
- `gemm_strided_batched` only using strided CUDA kernel when first matrix is transposed (#2529)
- CUDA runtime libraries are loaded from a system path due to LD_LIBRARY_PATH being set (#2530)
- [Bug] `UnifiedMemory` buffer changes during LinearAlgebra operations (#2533)
- Improve system library warning when running under profiler (#2540)
- Local CUDA settings not propagated to Pkg.test (#2545)
- Out of Memory when working with Distributed for Small Matricies (#2548)
- findall is not working with an empty vector of bool (#2553)
- CUDA code does not return when running under VSC Debugging mode (#2558)
- dot is quite slow in multinest Arrays (#2559)
- UndefVarError: `backend` not defined in `GPUArrays` (#2564)
- view() returns CuArray instead of view for 1-D CuArrays (#2566)
- dot ambiguity (#2568)
- InvalidIRError thrown only if critical function is not previously compiled (#2573)
- circular dependency during precompilation (#2579)
- Sparse MatVec Is Nondeterministic? (#2582)
- CUDA triggers long Circular dependency list (#2586)
- Release v5.5.3 for GPUArray v11? (#2587)
- 'dot' gives different answers when viewing rather than slicing multidimensional arrays (#2589)
- Scalar indexing when performing `kron` on two `CuVector`s (#2591)
- Faster strided-batched to batched wrapper (#2592)
- Error when copying data to pinned and resized CPU array (#2594)
- mapreducedim! size-dependent fail when narrowing float element types (#2595)
- Missing `Enzyme.make_zero` in Enzyme extension leads to incorrect behaviour (#2598)
- 'ArgumentError: array must be non-empty' when attempting to pop idle handles from HandleCache (#2603)
- Do a release as current one doesn't support `GPUArrays` v11 (#2606)

v5.5.2

[Diff since v5.5.1](v5.5.1...v5.5.2)

**Merged pull requests:**
- Fix type of AbstractFFTs.Plan for real-complex FFTs (#2504) (@jipolanco)
- Profiler: Demangle kernel names. (#2505) (@maleadt)
- Bump CUDNN. (#2507) (@maleadt)
- Restore Enzyme checks (#2508) (@wsmoses)

v5.5.1

Enzyme: Adapt to pending version breaking update (#2490)

[only downstream]