Starting point for GPU-accelerated Python libraries
Adapted from the original work at https://github.com/PWhiddy/pybind11-cuda
This version uses the modern CMake/CUDA approach
CUDA
Python 3.6 or greater
CMake >= 3.12 (for CUDA support and the new FindPython3 module)
mkdir build; cd build
# set the default CUDA hardware architecture to build for
export CUDAFLAGS="-arch=sm_50"
cmake ..
make
Test it with
python3 test_mul.py
gpu_library.so and test_mul.py must be in the same folder. Alternatively, you can add the directory containing gpu_library.so to your PYTHONPATH environment variable.
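If you would rather not move files or change your shell environment, the same effect can be approximated from inside Python. The sketch below is not part of this repository: the helper name make_importable is hypothetical, and it assumes the module was built into a directory called build, as in the steps above. It only adjusts sys.path and checks whether the module can be found.

```python
import importlib.util
import sys
from pathlib import Path


def make_importable(build_dir, module_name="gpu_library"):
    """Put build_dir on sys.path and report whether module_name can be found.

    make_importable is a hypothetical helper, not something shipped with
    this project.
    """
    path = str(Path(build_dir).resolve())
    if path not in sys.path:
        # Similar in effect to listing the directory in PYTHONPATH.
        sys.path.insert(0, path)
    # Drop cached directory listings so a freshly built .so is noticed.
    importlib.invalidate_caches()
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "build" is assumed to be the directory created by the build steps.
    found = make_importable("build")
    print("gpu_library importable:", found)
```

Calling make_importable before "import gpu_library" lets a script run from any folder, at the cost of hard-coding the build location.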
Compiles out of the box with CMake
NumPy integration
C++ templating for composable kernels with generic data types
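To give a feel for what the NumPy integration looks like from the Python side, here is a rough sketch. It rests on assumptions: multiply_with_scalar is a hypothetical name for a function exported by gpu_library that scales a float64 array in place, so check the module's actual bindings before relying on it. When the module or that function is missing, the sketch falls back to plain NumPy so it still runs on a machine without a GPU build.

```python
import numpy as np

try:
    import gpu_library  # the compiled extension, if it is on the import path
except ImportError:
    gpu_library = None


def scale_in_place(values, factor):
    """Multiply every element of a float64 array by factor, in place."""
    # ASSUMPTION: "multiply_with_scalar" is a hypothetical name for an
    # in-place kernel wrapper; the real binding may differ.
    kernel = getattr(gpu_library, "multiply_with_scalar", None)
    if kernel is not None:
        kernel(values, factor)  # GPU path, assumed to modify values in place
    else:
        values *= factor  # CPU fallback using NumPy only
    return values


if __name__ == "__main__":
    data = np.linspace(1.0, 100.0, 5)
    expected = data * 3.0  # reference result computed on the CPU
    scale_in_place(data, 3.0)
    print("matches NumPy reference:", np.allclose(data, expected))
```

Comparing against a NumPy reference, as in the last lines, is a cheap way to confirm that a kernel produces the expected values.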
Originally based on https://github.com/torstem/demo-cuda-pybind11