Tags: JuliaGPU/CUDA.jl

[Diff since v5.7.3](v5.7.3...v5.8.0)

**Merged pull requests:**
- SparseMatricesCSR Dispatch (#2720) (@Abdelrahman912)
- Very rough implementation of bcast for CuSparseVector (#2733) (@kshyatt)
- Possible fix for #2745, change args in call to `cusparseCreateBsr` (#2747) (@manuelbb-upb)
- Simple tests for check and explain_eltype (#2748) (@kshyatt)
- Test for printing OutOfGPUMemoryError (#2749) (@kshyatt)
- Fix log_message pileup (#2750) (@fps)
- Test for parse_limit (#2751) (@kshyatt)
- unsafe_wrap for symbols (#2753) (@vchuravy)
- Use thread adoption to handle log messages. (#2754) (@maleadt)
- Add pre-commit configuration (#2755) (@vchuravy)
- Broaden check for eltypes to make sure we don't allow invalid stuff (#2756) (@kshyatt)
- Prefer aligned_sizeof (#2757) (@vchuravy)
- More array tests (#2758) (@kshyatt)
- A few more tests for CUSOLVER Q mats (#2759) (@kshyatt)
- More tests for CuArrayPtr (#2760) (@kshyatt)
- [CUSOLVER] Update gesvdp! (#2763) (@amontoison)
- Get rid of unneeded version checks (#2765) (@kshyatt)
- Remove second import of aligned_sizeof (#2767) (@vchuravy)
- CUSPARSE SpGEMM: Support algorithms 2 and 3 (#2769) (@maleadt)
- Update to CUDA 12.9. (#2772) (@maleadt)
- Fix SPGEMM_ALGOS setup (#2773) (@jonas-schulze)
- Support new functionality from KA 0.9.32 (#2774) (@michel2323)
- cuTENSOR: Preserve storage type when multiplying (#2775) (@christiangnrd)
- Update subpackages. (#2776) (@maleadt)
- Remove the unnecessary reshape during mapreduce. (#2778) (@maleadt)

**Closed issues:**
- Type conversions in broadcast fail when compiling with `always_inline=true` (#2722)
- cuDNN loses memory to log messages in Pluto.jl context (#2743)
- `Xgesvdp!` failure when only requesting singular values (#2761)
- CUDA 5.7.3 fails to precompile on Julia 1.12.0-beta2 (#2762)
- aligned_sizeof with an existing identifier (#2766)
- CUSPARSE_SPGEMM_ALG2 not working (#2768)
- `sum!` throws dispatch error beyond a threshold number of rows (#2777)

[Diff since v5.7.2](v5.7.2...v5.7.3)

**Merged pull requests:**
- Merge CSC/CSR broadcast kernels (#2731) (@kshyatt)
- GPUToolbox v0.2 take 2 (#2736) (@christiangnrd)
- Add dispatches to access device matrix data via SparseArrays interface (#2738) (@termi-official)
- More tests for CuContext (#2739) (@kshyatt)
- Fill in missing KA functionality (KA.functional + sparse matrices adaption from CUDAbackend) (#2740) (@Abdelrahman912)
- Small tests and changes for coverage (#2742) (@kshyatt)
- More tests and better error type for cusparse generic (#2744) (@kshyatt)
- Restore the descriptors in CUSPARSE (#2746) (@amontoison)

[Diff since v5.7.1](v5.7.1...v5.7.2)

**Merged pull requests:**
- Support disabling implicit synchronization (#2662) (@vchuravy)
- More tests and bugfixes for CUSOLVER (#2707) (@kshyatt)
- Set neutral element to zero for sparse reduce (#2710) (@kshyatt)
- Bugfix and tests for cusolver/base (#2712) (@kshyatt)
- Small fixes and missed tests for CUTENSORNET (#2713) (@kshyatt)
- Even more tests and small fixes for CUTENSORNET (#2715) (@kshyatt)
- Tests for CUSTATEVEC errors (#2716) (@kshyatt)
- Add compat entries for recent devices and toolkits. (#2717) (@maleadt)
- Split out copyto for texture arrays and add more tests (#2719) (@kshyatt)
- Add a docstring for pointer (#2721) (@maleadt)
- More CUSOLVER dense tests (#2723) (@kshyatt)
- Tests for some helper functions (#2724) (@kshyatt)
- More tests and bugfixing for CUSPARSE (#2725) (@kshyatt)
- Add more methods for all versions to unstick tests (#2726) (@kshyatt)

**Closed issues:**
- Ability to opt out of / improved automatic synchronization between tasks for shared array usage (#2617)
- `maximum(abs, CuSparseMatrixCSR)` returns Inf (#2705)
- `mapreduce(f, op, A)` for sparse `A` is wrong if `f(0) != 0` (#2709)

[Diff since v5.7.0](v5.7.0...v5.7.1)

**Merged pull requests:**
- Tests for MIME printing and indexing (#2686) (@kshyatt)
- Loosen VERSION check for sketchy test (#2688) (@kshyatt)
- CompatHelper: bump compat for GPUToolbox to 0.2, (keep existing compat) (#2689) (@github-actions[bot])
- Even more sparse printing and tril/triu tests (#2692) (@kshyatt)
- Even more sparse tests (#2695) (@kshyatt)
- More tests and a matmatmul fix (#2697) (@kshyatt)
- Sparse conversion tests (#2698) (@kshyatt)
- Tests for descriptors (#2700) (@kshyatt)
- More tests for some missing kron methods (#2701) (@kshyatt)
- Don't duplicate const defs (#2703) (@kshyatt)
- Exclude device-side sorting code from coverage (#2704) (@kshyatt)
- More tests for CuRef/CuRefArray (#2706) (@kshyatt)
- Update Project.toml (#2708) (@kshyatt)

**Closed issues:**
- GC corruption on 1.10 during cusparse/reduce tests (#2027)
- Launch bounds interface (#2674)
- Precompilation errors: `ERROR: LoadError: invalid redefinition of constant CUSPARSE.CuSparseUpperOrUnitUpperTriangular` (#2690)

[Diff since v5.6.1](v5.6.1...v5.7.0)

**Merged pull requests:**
- Bugfix for batched gemv (#2481) (@kose-y)
- Split out level 3 gemm tests (#2610) (@kshyatt)
- Switch CUBLAS to device-side pointer mode (#2616) (@kshyatt)
- Elide bounds checks when kernels contain manual ones. (#2621) (@maleadt)
- Support passing symbols as arguments (#2624) (@vchuravy)
- Remove eager synchronization with HtoD copies. (#2625) (@maleadt)
- Don't prefetch on multi-device systems (#2626) (@vchuravy)
- Cooperative groups: add a boundscheck to avoid confusing inexact errors. (#2631) (@maleadt)
- NFC fixes (#2632) (@maleadt)
- Update to CUDA 12.8 (#2634) (@maleadt)
- [CUSOLVER] Update the test of syevBatched! (#2636) (@amontoison)
- Improve Nsight Systems activation by inspecting the session list. (#2638) (@maleadt)
- [CUSPARSE] Support CuSparseMatrixBSR in the generic mm! (#2639) (@amontoison)
- [CUSOLVER] Support symmetric factorization without pivoting (#2640) (@amontoison)
- Wrap the Givens rotation methods (#2642) (@kshyatt)
- Remove kron methods and use those in GPUArrays (#2643) (@kshyatt)
- Add a simpler CuRefValue. (#2645) (@maleadt)
- Use GPUToolbox.jl (#2646) (@christiangnrd)
- DtoH copies: perform a nonblocking sync before calling into libcuda. (#2648) (@maleadt)
- Support Adjoint/Transpose -> COO (#2649) (@kshyatt)
- Support cuTENSOR contractors for 1D views (#2650) (@kshyatt)
- Re-enable mixed precision sparse mv (#2651) (@kshyatt)
- Proper support for similar on CuSparseMats (#2652) (@kshyatt)
- Test error throw for accumulate (#2656) (@kshyatt)
- Lots more tests for CUBLAS (#2657) (@kshyatt)
- MORE tests for CUBLAS and a bugfix (#2659) (@kshyatt)
- Add tests for gemmEx in fast math mode (#2660) (@kshyatt)
- More tests/better coverage for CUSPARSE (#2663) (@kshyatt)
- Fixes and tests for CuStateVec (#2664) (@kshyatt)
- Re-enable NVTX on Windows. (#2665) (@maleadt)
- Protect against occupancy calculations with very large numbers. (#2666) (@maleadt)
- Fixes and tests for COO indexing, exclude more kernels from coverage (#2668) (@kshyatt)
- Exclude `lib*jl` from coverage also for CUSTATEVEC, CUTENSOR, and CUTENSORNET (#2669) (@kshyatt)
- Even MORE tests and cov for CUBLAS (#2670) (@kshyatt)
- Fix and test for mgpu batch measure (#2671) (@kshyatt)
- Remove some invalid conversions and test more (#2673) (@kshyatt)
- Exclude more device side code in CUSPARSE (#2676) (@kshyatt)
- More tests, better errors, more exclusions for CUSPARSE (#2677) (@kshyatt)
- Try re-enabling the convolution tests (#2678) (@kshyatt)
- Fix Markdown formatting in overview.md (#2680) (@singularitti)
- Even more CUSPARSE tests (#2682) (@kshyatt)
- Fix inference of FFT plan creation (#2683) (@jipolanco)
- Some cudadrv tests (#2684) (@kshyatt)

**Closed issues:**
- Batched strided GEMM tests fail (#151)
- CuArrays.CURAND.curand missing methods (#141)
- Rationals behave badly (#118)
- Matrix inversion for CuArray (#116)
- Dot product of a complex CuArray with a real CuArray performance (#668)
- Sporadic cudnn/convolution test failures (#725)
- Support for LinearAlgebra.pinv (#883)
- Update mv!, mm!, sv! and sm! with the future release of CUSPARSE (#1610)
- [CUSPARSE] changing size in similar returns a cpu array (#1667)
- Mixed precision sparse mul is not dispatched correctly (#1760)
- Make CuRef(Value) behave more like Ref (#1803)
- [cuTENSOR] Issue when contracting views of CuArrays with cuTENSOR (#2407)
- versioninfo broken on Jetson Orin due to NVML lookup failure (#2542)
- CUBLAS: Improve concurrency using device pointer mode (#2571)
- NVML issues on Jetson Nano Orin (#2580)
- Passing Symbol as an argument fails (#2590)
- Remove kron functionality (#2602)
- Disable or make automatic prefetching of unified memory optional (#2618)
- Circular dependency in CUDA with Julia 1.10 (#2622)
- Regression with `nsys profile` and `CUDA.@profile` (#2629)
- PrecompileTools.jl with CUDA.jl causes kernels to fail to run on 1.11 (#2637)
- Support Adjoint Sparse Matrices for CuSparseMatrixCOO (#2647)
- Implicit stream sync in tasks serialises kernel execution (#2654)
- Broadcasting on arrays larger than `typemax(Int32)` yields truncation error (#2658)
- Problem with function in CUDA (#2667)
- CUDA.limit errors with `invalid argument (code 1, ERROR_INVALID_VALUE)` (#2672)
- CUDA.jl does not support tuples of UInt128 (#2675)
- Cannot `permutedims!` CuArray with length larger than `typemax(Int32)` (#2679)
- Support for older GPUs (#2685)

[Diff since v5.6.0](v5.6.0...v5.6.1)

**Merged pull requests:**
- Support GPUArrays allocations cache (#2593) (@pxl-th)
- Fix `resize!` when `pool=none` is in use (#2613) (@luraess)
- Update to new alloc cache interface. (#2614) (@maleadt)
- Work around NVML issue on Jetson Orin. (#2620) (@maleadt)

**Closed issues:**
- Add strides, implement CUDA Array Interface (#1298)
- Restore broken CUBLAS test (#2584)
- Issues with multiple GPUs on a single node (#2615)

[Diff since v5.5.2](v5.5.2...v5.6.0)

CUDA.jl v5.6 is a relatively minor release, with the most important change happening behind the scenes: [GPUArrays.jl v11 has switched to KernelAbstractions.jl](https://juliagpu.org/post/2025-01-07-gpuarrays-11/) (#2524).

- Update to CUDA 12.6.2 (#2512)
- CUSOLVER: support for `Xgeev!` (#2513), `XsyevBatched` (#2577), `gesv!` and `gels!` (#2406)
- CUBLAS: added multiplication of transpose / adjoint matrices by diagonal matrices (#2518, #2538)
- Improve handle cache performance in the presence of many short-lived tasks (#2583)
- CUFFT: Pre-allocate the buffer required for complex-to-real FFTs only once (#2578)
- Improved batched pointer conversion for very large batches (#2608)
- Fix `findall` with an empty CuArray (#2554)
- CUBLAS: Fix use of level 1 methods with strided arrays (#2528)
- CUSOLVER: Fix `Xgesvdr!` (#2556)
- Preserve the array buffer type with more linear algebra operations (#2534)
- Work around LinearAlgebra.jl breakage in Julia 1.11.2 concerning generic triangular `(l/r)mul!` (#2585)
- Fix ambiguity of `LinearAlgebra.dot` (#2569)
- Native RNG: Fixes when working with very large arrays (#2561)
- Avoid a deadlock due to union splitting in the `mapreduce` kernel (#2595)
- Fix pinning of resized CPU memory by automatically re-pinning (#2599)

**Merged pull requests:**
- [CUSOLVER] Interface gesv! and gels! (#2406) (@amontoison)
- Update wrappers for CUDA v12.6.2 (#2512) (@amontoison)
- [CUSOLVER] Interface Xgeev! (#2513) (@amontoison)
- Added multiplication of transpose / adjoint matrices by diagonal matrices (#2518) (@amontoison)
- CompatHelper: bump compat for GPUCompiler to 1, (keep existing compat) (#2521) (@github-actions[bot])
- Adapt to GPUArrays.jl transition to KernelAbstractions.jl. (#2524) (@maleadt)
- Switch CI to 1.11. (#2525) (@maleadt)
- CUTENSOR: Reduce amount of broadcasts compiled during tests. (#2527) (@maleadt)
- CUBLAS: Don't use BLAS1 wrappers for strided arrays, only vectors. (#2528) (@maleadt)
- Clarify the synchronize(ctx)/device_synchronize() docstrings (#2532) (@JamesWrigley)
- Issue #2533: Preserving the buffer type in linear algebra (#2534) (@kmp5VT)
- Clarify description of how `LocalPreferences.toml` is generated in the docs (#2535) (@glwagner)
- Adapt to JuliaGPU/GPUArrays.jl#567. (#2537) (@maleadt)
- Removed allocations for transpose/adjoint - diagonal multiplications (#2538) (@RedRussianBear)
- Consistent use of Nsight Compute (#2541) (@huiyuxie)
- Fix formatting in profiling docs page (#2543) (@efaulhaber)
- Fix typo in EnzymeCoreExt.jl (#2550) (@wsmoses)
- Enhance warning under a profiler (#2552) (@huiyuxie)
- Fix findall with an empty CuArray of Bool (#2554) (@amontoison)
- [CUSOLVER] Fix Xgesvdr! (#2556) (@amontoison)
- Test restore Enzyme.jl (#2557) (@wsmoses)
- Native RNG fixes for very large arrays (#2561) (@maleadt)
- [Enzyme] Mark launch_configuration as inactive (#2563) (@wsmoses)
- Update EnzymeCoreExt.jl (#2565) (@simenhu)
- Fix ambiguity of LinearAlgebra.dot (#2569) (@amontoison)
- [CUSOLVER] Add more tests for the dense SVD (#2574) (@amontoison)
- [CUSOLVER] Interface XsyevBatched (#2577) (@amontoison)
- [CUFFT] Preallocate a buffer for complex-to-real FFT (#2578) (@amontoison)
- Run the GC when failing to find a handle, but lots are active. (#2583) (@maleadt)
- Work around LinearAlgebra.jl breakage in 1.11.2. (#2585) (@maleadt)
- mapreduce: avoid deadlock by forcing the accumulator type. (#2596) (@maleadt)
- Switch to GitHub Actions-based benchmarks. (#2597) (@maleadt)
- Re-pin variable sized memory (#2599) (@jipolanco)
- Enzyme: add make_zero of cuarrays (#2600) (@wsmoses)
- Update cache.jl (#2604) (@jarbus)
- Enzyme: mark device_sync as non-differentiable [only downstream] (#2605) (@wsmoses)
- Move strided batch pointer conversion to GPU (#2608) (@THargreaves)
- Split linalg tests into multiple files (#2609) (@kshyatt)

**Closed issues:**
- Inference failure with sort(::CuMatrix) after loading MLDatasets (#2258)
- Kron Support for CuSparseMatrixCSC (#2370)
- Broadcasting a function returning an anonymous function with a constructor over CUDA arrays fails to compile, "not isbits" (#2514)
- CuArray view has different variable type outside vs. inside the cuda kernel (#2516)
- Can't build cuDNN on centos7.8 (#2517)
- Precompile errors (#2519)
- Precompile errors (#2520)
- Error returned from CUDA function in CUDA-aware MPI multi-GPU test (#2522)
- Broadcasting over random static array errors on Julia 1.11 (#2523)
- `gemm_strided_batched` only using strided CUDA kernel when first matrix is transposed (#2529)
- CUDA runtime libraries are loaded from a system path due to LD_LIBRARY_PATH being set (#2530)
- [Bug] `UnifiedMemory` buffer changes during LinearAlgebra operations (#2533)
- Improve system library warning when running under profiler (#2540)
- Local CUDA settings not propagated to Pkg.test (#2545)
- Out of Memory when working with Distributed for Small Matrices (#2548)
- findall is not working with an empty vector of bool (#2553)
- CUDA code does not return when running under VSC Debugging mode (#2558)
- dot is quite slow in multinest Arrays (#2559)
- UndefVarError: `backend` not defined in `GPUArrays` (#2564)
- view() returns CuArray instead of view for 1-D CuArrays (#2566)
- dot ambiguity (#2568)
- InvalidIRError thrown only if critical function is not previously compiled (#2573)
- circular dependency during precompilation (#2579)
- Sparse MatVec Is Nondeterministic? (#2582)
- CUDA triggers long Circular dependency list (#2586)
- Release v5.5.3 for GPUArray v11? (#2587)
- 'dot' gives different answers when viewing rather than slicing multidimensional arrays (#2589)
- Scalar indexing when performing `kron` on two `CuVector`s (#2591)
- Faster strided-batched to batched wrapper (#2592)
- Error when copying data to pinned and resized CPU array (#2594)
- mapreducedim! size-dependent fail when narrowing float element types (#2595)
- Missing `Enzyme.make_zero` in Enzyme extension leads to incorrect behaviour (#2598)
- 'ArgumentError: array must be non-empty' when attempting to pop idle handles from HandleCache (#2603)
- Do a release as current one doesn't support `GPUArrays` v11 (#2606)

[Diff since v5.5.1](v5.5.1...v5.5.2)

**Merged pull requests:**
- Fix type of AbstractFFTs.Plan for real-complex FFTs (#2504) (@jipolanco)
- Profiler: Demangle kernel names. (#2505) (@maleadt)
- Bump CUDNN. (#2507) (@maleadt)
- Restore Enzyme checks (#2508) (@wsmoses)