CUDA Toolkit 12.4.0 `tuple` incompatibility #3690

##### System information (version)


- OpenCV => 4.9.0
- Operating System / Platform => Windows 64 Bit
- Compiler => Visual Studio 2022

##### Detailed description

OpenCV with CUDA support cannot be built with CUDA Toolkit 12.4.0.

CUDA Toolkit 12.3.2 ships Thrust 2.2.0 (https://docs.nvidia.com/cuda/archive/12.3.2/cuda-toolkit-release-notes/index.html), whereas CUDA Toolkit 12.4.0 updates to Thrust 2.3.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html). In Thrust 2.3.0, the tuple implementation was replaced with a standard tuple implementation (https://github.com/NVIDIA/cccl/pull/262). Notably, this changes the definition from a 10-parameter class template to a variadic template: a tuple of _n_ items is no longer padded with _10 - n_ null types so that it always has 10 template parameters; it now has exactly _n_ template parameters. As a result, the function templates in cudev that specify 10 template parameters per tuple can no longer be deduced for tuples whose size is not 10.
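To show the deduction failure in isolation, here is a minimal host-only C++ sketch. It does not use Thrust or OpenCV. The names `null_type`, `padded_tuple`, `takes_padded`, `takes_variadic`, and the detection helpers are all hypothetical: `padded_tuple` stands in for the old 10-parameter tuple, and `std::tuple` stands in for the new variadic one.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the old tuple: always exactly 10 template
// parameters, with unused trailing slots defaulted to a null type.
struct null_type {};

template <class T0 = null_type, class T1 = null_type, class T2 = null_type,
          class T3 = null_type, class T4 = null_type, class T5 = null_type,
          class T6 = null_type, class T7 = null_type, class T8 = null_type,
          class T9 = null_type>
struct padded_tuple {};

// cudev-style signature: 10 named template parameters per tuple.
template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
void takes_padded(const padded_tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&) {}

// The same signature written against a variadic tuple (std::tuple here).
template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
void takes_variadic(const std::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&) {}

// Small detection helpers: "is the call well-formed for this tuple type?"
template <class...> struct make_void { using type = void; };
template <class... Ts> using void_of = typename make_void<Ts...>::type;

template <class Tuple, class = void>
struct padded_call_ok : std::false_type {};
template <class Tuple>
struct padded_call_ok<Tuple,
    void_of<decltype(takes_padded(std::declval<const Tuple&>()))>>
    : std::true_type {};

template <class Tuple, class = void>
struct variadic_call_ok : std::false_type {};
template <class Tuple>
struct variadic_call_ok<Tuple,
    void_of<decltype(takes_variadic(std::declval<const Tuple&>()))>>
    : std::true_type {};

// Old layout: a 2-element tuple really has 10 arguments, so deduction works.
constexpr bool old_two_ok = padded_call_ok<padded_tuple<int, int>>::value;

// New layout: a 2-element tuple has 2 arguments, so the 10-parameter
// signature cannot be deduced and the call is not viable.
constexpr bool new_two_ok = variadic_call_ok<std::tuple<int, int>>::value;

// Only a tuple of exactly 10 elements still matches.
constexpr bool new_ten_ok = variadic_call_ok<
    std::tuple<int, int, int, int, int, int, int, int, int, int>>::value;

static_assert(old_two_ok, "padded 2-element tuple matches");
static_assert(!new_two_ok, "variadic 2-element tuple does not match");
static_assert(new_ten_ok, "variadic 10-element tuple matches");
```

The three `static_assert`s mirror the situation described above: the fixed-arity signature only keeps working when the tuple type itself always carries 10 template arguments.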

An example of one such function template that's no longer viable, `cv::cudev::blockReduce`:

https://github.com/opencv/opencv_contrib/blob/6b5142ff657ca676ab35233556b49a532e75e2b7/modules/cudev/include/opencv2/cudev/block/reduce.hpp#L68-L81

An example of an error I encounter:

```
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error : no instance of overloaded function "cv::cudev::blockReduce" matches the argument list [Z:\dev\1\opencv\out\build\user\modules\world\opencv_world.vcxproj]
[build]               argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
[build]                 blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp));
[build]                 ^
[build]   Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(72): note #3327-D: candidate function template "cv::cudev::blockReduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, uint, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
[build]     __declspec(__device__) __forceinline void blockReduce(const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
[build]                                               ^
[build]   Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(63): note #3327-D: candidate function template "cv::cudev::blockReduce<N,T,Op>(volatile T *, T &, uint, const Op &)" failed deduction
[build]     __declspec(__device__) __forceinline void blockReduce(volatile T* smem, T& val, uint tid, const Op& op)
[build]                                               ^
[build]             detected during:
[build]               instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=int, BLOCK_SIZE=256]" at line 412
[build]               instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 421
[build]               instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 460
[build]               instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 206 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build]               instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 349 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build]               instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 68 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
[build]               instantiation of "void <unnamed>::minMaxImpl<T,R>(const cv::cuda::GpuMat &, const cv::cuda::GpuMat &, cv::cuda::GpuMat &, cv::cuda::Stream &) [with T=uchar, R=int]" at line 92 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
```

The first candidate listed in the error message, which now fails deduction, is the function template linked above. It was viable, and was the overload selected, with previous CUDA Toolkit versions.

I think every template that specifies 10 template parameters per tuple can be updated to work with the new tuple definition by replacing each set of 10 template parameters with a parameter pack. I think this should remain compatible with the old tuple definition as well, since a pack can deduce all 10 (padded) template arguments. For example, I think this would be a viable implementation of `cv::cudev::blockReduce`:

```cpp
template <int N, typename... P, typename... R, class... Op>
__device__ __forceinline__ void blockReduce(const tuple<P...>& smem,
                                            const tuple<R...>& val,
                                            uint tid,
                                            const tuple<Op...>& op)
{
    block_reduce_detail::Dispatcher<N>::reductor::template reduce<
        const tuple<P...>&,
        const tuple<R...>&,
        const tuple<Op...>&>(smem, val, tid, op);
}
```
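As a rough check of the compatibility claim, the following host-only C++ sketch applies the same pack-based signature to both tuple layouts. It is not the cudev code: `null_type`, `padded_tuple`, and `deduced_arity` are hypothetical names, `padded_tuple` stands in for the old 10-parameter tuple, and `std::tuple` stands in for the new variadic one. The function only reports how many template arguments were deduced.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical stand-in for the old tuple: always 10 template parameters.
struct null_type {};

template <class T0 = null_type, class T1 = null_type, class T2 = null_type,
          class T3 = null_type, class T4 = null_type, class T5 = null_type,
          class T6 = null_type, class T7 = null_type, class T8 = null_type,
          class T9 = null_type>
struct padded_tuple {};

// Pack-based signature against the padded (old-style) tuple.
// N is given explicitly, as in blockReduce<BLOCK_SIZE>(...); the packs are deduced.
template <int N, class... P, class... R, class... Op>
constexpr std::size_t deduced_arity(const padded_tuple<P...>&,
                                    const padded_tuple<R...>&,
                                    const padded_tuple<Op...>&)
{
    return sizeof...(P) + sizeof...(R) + sizeof...(Op);
}

// The same pack-based signature against a variadic (new-style) tuple.
template <int N, class... P, class... R, class... Op>
constexpr std::size_t deduced_arity(const std::tuple<P...>&,
                                    const std::tuple<R...>&,
                                    const std::tuple<Op...>&)
{
    return sizeof...(P) + sizeof...(R) + sizeof...(Op);
}

// Old layout: each pack deduces all 10 arguments, including the null padding.
constexpr std::size_t padded_total = deduced_arity<256>(
    padded_tuple<int, int>{}, padded_tuple<float, float>{}, padded_tuple<char, char>{});

// New layout: each pack deduces exactly the 2 element types.
constexpr std::size_t variadic_total = deduced_arity<256>(
    std::tuple<int, int>{}, std::tuple<float, float>{}, std::tuple<char, char>{});

static_assert(padded_total == 30, "3 tuples x 10 padded arguments");
static_assert(variadic_total == 6, "3 tuples x 2 real arguments");
```

In both cases deduction succeeds. With the old layout the packs simply absorb the null-type padding, so any code downstream that counts or iterates over the pack would still need to handle those padding types as it does today.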

##### Steps to reproduce

Attempt to build cudev using CUDA Toolkit 12.4.0. I suspect that this error will be observed with any combination of OpenCV version, OS, platform, and compiler, provided they are recent enough not to hit some other error first.

##### Issue submission checklist

 - [x] I report the issue, it's not a question
   
 - [x] I checked the problem with documentation, FAQ, open issues,
       forum.opencv.org, Stack Overflow, etc and have not found any solution
   
 - [x] I updated to the latest OpenCV version and the issue is still there
   
 - [x] There is reproducer code and related data files: videos, images, onnx, etc

Linked snippet (`modules/cudev/include/opencv2/cudev/block/reduce.hpp`, lines 68-81):

	template <int N,
	    typename P0, typename P1, typename P2, typename P3, typename P4, typename P5, typename P6, typename P7, typename P8, typename P9,
	    typename R0, typename R1, typename R2, typename R3, typename R4, typename R5, typename R6, typename R7, typename R8, typename R9,
	    class Op0, class Op1, class Op2, class Op3, class Op4, class Op5, class Op6, class Op7, class Op8, class Op9>
	__device__ __forceinline__ void blockReduce(const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
	                                            const tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9>& val,
	                                            uint tid,
	                                            const tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9>& op)
	{
	    block_reduce_detail::Dispatcher<N>::reductor::template reduce<
	        const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&,
	        const tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9>&,
	        const tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9>&>(smem, val, tid, op);
	}
