Just opening a placeholder for a possible HIP implementation for AMD GPUs, in parallel to the CUDA implementation for Nvidia GPUs.
In principle, very little code should differ significantly between CUDA and HIP, and those differences can easily be ifdef'ed in the same way as is now done for CUDA vs C++.
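
As a rough illustration of the ifdef approach, here is a minimal sketch, not the project's actual code. It assumes the compiler-defined macros `__CUDACC__` (nvcc) and `__HIPCC__` (hipcc) are used to pick the backend. The `gpu*` aliases, `GPU_ENABLED`, and the helper functions `backendName`, `allocBuffer`, `copyToBuffer`, `copyFromBuffer` and `freeBuffer` are hypothetical names made up for this example. Error handling is omitted for brevity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical backend-selection layer: all gpu* names below are
// illustrative aliases, not existing project or library macros.
#if defined(__CUDACC__)
#include <cuda_runtime.h>
#define GPU_ENABLED 1
#define GPU_BACKEND "CUDA"
#define gpuSuccess cudaSuccess
#define gpuMalloc cudaMalloc
#define gpuFree cudaFree
#define gpuMemcpy cudaMemcpy
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
#define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
#elif defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define GPU_ENABLED 1
#define GPU_BACKEND "HIP"
#define gpuSuccess hipSuccess
#define gpuMalloc hipMalloc
#define gpuFree hipFree
#define gpuMemcpy hipMemcpy
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
#define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
#else
#define GPU_BACKEND "C++" // plain host build, no GPU runtime
#endif

// Name of the backend selected at compile time.
inline const char* backendName() { return GPU_BACKEND; }

// Allocate n doubles on the device (GPU builds) or on the host (C++ build).
inline double* allocBuffer(std::size_t n) {
#ifdef GPU_ENABLED
  double* p = nullptr;
  if (gpuMalloc(reinterpret_cast<void**>(&p), n * sizeof(double)) != gpuSuccess)
    return nullptr;
  return p;
#else
  return static_cast<double*>(std::malloc(n * sizeof(double)));
#endif
}

// Copy n doubles from host memory into the buffer.
inline void copyToBuffer(double* dst, const double* src, std::size_t n) {
#ifdef GPU_ENABLED
  gpuMemcpy(dst, src, n * sizeof(double), gpuMemcpyHostToDevice);
#else
  std::memcpy(dst, src, n * sizeof(double));
#endif
}

// Copy n doubles from the buffer back into host memory.
inline void copyFromBuffer(double* dst, const double* src, std::size_t n) {
#ifdef GPU_ENABLED
  gpuMemcpy(dst, src, n * sizeof(double), gpuMemcpyDeviceToHost);
#else
  std::memcpy(dst, src, n * sizeof(double));
#endif
}

// Release a buffer obtained from allocBuffer.
inline void freeBuffer(double* p) {
#ifdef GPU_ENABLED
  gpuFree(p);
#else
  std::free(p);
#endif
}
```

With a layer like this, only the alias block at the top is backend-specific. The rest of the code calls the `gpu*` names and compiles unchanged with nvcc, hipcc, or a plain C++ compiler.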