Implementation of Computer Vision algorithms with Nvidia CUDA support.
- Benchmarked on this high-resolution 29192x5140 picture by Umit Cukurel.
- Execution time in seconds.
Algorithm | CPU (OpenCV) | CUDA-CV (GTX-1060) |
---|---|---|
RGB2GRAY | 0.112 | 0.058 |
EDGE::SIMPLE | 0.018 | |
EDGE::SOBEL | 3.932 | 0.030 |
EDGE::PREWITT | 0.074 | |
FILTER::BOX 3X3 | 0.149 | 0.019 |
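
To make the table easier to read, the timings can be turned into speedup factors and throughput. The following is only a small arithmetic sketch over the numbers above; the names `speedup`, `megapixels_per_second`, and `print_speedups` are hypothetical helpers and are not part of the project.

```cpp
#include <cstdio>

// Pixel count of the 29192x5140 benchmark image.
constexpr long long kPixels = 29192LL * 5140LL;

// Speedup of the CUDA path over the CPU path for one algorithm.
constexpr double speedup(double cpu_seconds, double gpu_seconds) {
    return cpu_seconds / gpu_seconds;
}

// Throughput in megapixels per second for a given runtime.
constexpr double megapixels_per_second(double seconds) {
    return static_cast<double>(kPixels) / seconds / 1e6;
}

// Prints one line per algorithm that has both timings in the table.
inline void print_speedups() {
    std::printf("RGB2GRAY        %.1fx\n", speedup(0.112, 0.058));
    std::printf("EDGE::SOBEL     %.1fx\n", speedup(3.932, 0.030));
    std::printf("FILTER::BOX 3X3 %.1fx\n", speedup(0.149, 0.019));
}
```

With the table's values this works out to roughly 1.9x for RGB2GRAY, 131x for EDGE::SOBEL, and 7.8x for FILTER::BOX 3X3.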
- OpenCV 3.4 - For converting images to Mat. That's it.
- CUDA
The entire project is built using CMake (3.9) with the MSVC 2017 generator on Windows. I will add *nix support after the core algorithms have been implemented.
- Download the OpenCV 3.4 source from the GitHub repo. Let's call this directory `OPENCV`.
- Start the CMake GUI (I use the GUI on Windows because many dependencies have to be linked manually).
- Point Source to `OPENCV/src`.
- Point Build to `OPENCV/build`.
- Configure. (After the configuration has been generated, you can disable the modules that you don't need. This project only needs the OpenCV core and highgui modules.)
- Set Compiler to MSVC 2017 64. (Mine is a 64-bit OS.)
- Generate.
- Add `OPENCV/build/install/<your_platform>/vc15/lib` to the system `Path` variable.
- Set `OpenCV_DIR` to `OPENCV/build/install`.
- The .exe downloaded from the website is just a compressed file. After the setup files have been extracted, make a copy of the folder (let's call this `CUDA`). You'll need a few files from it later for CUDA support in MSVC 2017.
- Install all the components except Visual Studio Integration.
- After the installation is complete, copy the files under `CUDA/CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions` to `C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\BuildCustomizations`.
- Now you have successfully installed and integrated CUDA support in Visual Studio.
- You might want to edit `host_config.h` and raise the upper limit for `_MSC_VER` above 1911, like this: `#if _MSC_VER < 1600 || _MSC_VER > 1955`
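
The effect of that edit can be read as a simple range check on the compiler version. The sketch below only mirrors the condition in plain C++ so the before and after behaviour is easy to compare; `msc_ver_rejected` is a hypothetical helper, the value 1912 is just an example of a compiler newer than 1911, and the default bounds are the ones from the edited line above.

```cpp
// Mirrors the guard "#if _MSC_VER < lower || _MSC_VER > upper":
// returns true when the compiler version falls outside the accepted range.
// The defaults are the edited bounds (1600 and 1955) from the step above.
constexpr bool msc_ver_rejected(int msc_ver, int lower = 1600, int upper = 1955) {
    return msc_ver < lower || msc_ver > upper;
}

// Example: a compiler reporting 1912 (assumed value, newer than 1911).
constexpr bool kRejectedBeforeEdit = msc_ver_rejected(1912, 1600, 1911); // true
constexpr bool kRejectedAfterEdit  = msc_ver_rejected(1912);             // false
```

Raising the upper bound does not add any compiler support by itself; it only stops the version check from refusing a newer MSVC.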
- Thread blocks are organized into grids.
- All thread blocks are then scheduled and handed over to the SMs (streaming multiprocessors).
- Each SM executes warps of 32 threads, with each thread running on one core.
- The GTX 1060 has 128 CUDA cores per SM, so each SM can execute 4 warps (4 x 32 threads) per clock cycle.
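
The arithmetic behind these notes can be sketched in host-side C++. This is only an illustration: the 16x16 thread block size is an assumption and not necessarily what this project uses, and `blocks_for`, `warps_per_block`, and `warps_per_clock` are hypothetical helper names. The warp size of 32 and the figure of 128 cores come from the notes above.

```cpp
constexpr int kWarpSize = 32;   // threads per warp
constexpr int kCores    = 128;  // CUDA core figure quoted in the notes

// Ceiling division: number of blocks needed to cover `extent` pixels
// when each block spans `block_dim` pixels along that axis.
constexpr int blocks_for(int extent, int block_dim) {
    return (extent + block_dim - 1) / block_dim;
}

// Number of warps in one block of block_x * block_y threads, rounded up.
constexpr int warps_per_block(int block_x, int block_y) {
    return (block_x * block_y + kWarpSize - 1) / kWarpSize;
}

// Warps that fit on the cores at once if each thread occupies one core.
constexpr int warps_per_clock() {
    return kCores / kWarpSize;
}

// Grid for the 29192x5140 benchmark image with an assumed 16x16 block.
constexpr int kGridX = blocks_for(29192, 16);  // 1825 blocks
constexpr int kGridY = blocks_for(5140, 16);   // 322 blocks
```

With these assumptions each block holds 8 warps, and the image needs a 1825 x 322 grid. The last block along each axis is only partly filled, so a kernel launched this way would still need a bounds check on the pixel coordinates.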
- Do not include any `.cu` files in the C++ header files (`.h`). When a `.cu` file is pulled in through a header, it is mistaken for a C++ file and compiled by the native host compiler instead of by nvcc.
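
One way to follow this rule is to keep headers free of CUDA code and only declare a plain C++ entry point in them. The sketch below shows that layout in a single listing, with comments marking where each file would begin. The file names and the functions `gray_of` and `rgb2gray` are hypothetical, and a CPU loop with a simple channel average stands in for the kernel launch so that the sketch compiles without the CUDA toolkit.

```cpp
#include <cstddef>

// ---- rgb2gray.h (hypothetical file) ----
// Plain C++ declaration only: no CUDA keywords and no #include of a .cu file,
// so any .cpp that includes this header stays valid for the host compiler.
void rgb2gray(const unsigned char* rgb, unsigned char* gray, std::size_t pixels);

// ---- rgb2gray.cu (hypothetical file, added to the build as its own source) ----
// Stand-in for the per-pixel work: a simple average of the three channels.
constexpr unsigned char gray_of(int c0, int c1, int c2) {
    return static_cast<unsigned char>((c0 + c1 + c2) / 3);
}

// In a real .cu file this definition would launch a __global__ kernel.
// Here a CPU loop takes its place so the listing builds with any C++ compiler.
void rgb2gray(const unsigned char* rgb, unsigned char* gray, std::size_t pixels) {
    for (std::size_t i = 0; i < pixels; ++i) {
        gray[i] = gray_of(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```

Callers include only the header and link against the object file produced from the `.cu` source, so the CUDA code is never seen by the host compiler through an `#include`.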
- Feature Matching.
- Euclidean Distance Transform.