Your task is now to combine the MPI parallelization described for the CPU-only code with OpenMP offloading.
You can base your work on the hybrid MPI + OpenMP code and on the previous work on offloading with a single GPU.
In order to achieve a working multi-GPU code, you should:
- Assign MPI tasks to devices
- Handle the MPI communication in one of two ways:
  - Copy the data between host and device before and after the MPI communication, or
  - Pass device pointers to the MPI routines
- Use OpenMP offload constructs in the `evolve` routine
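
The steps above could be sketched in C roughly as follows. This is a minimal illustration and not the exercise's reference solution: the function signatures, the row-major layout with one ghost layer, the neighbour ranks `up` and `down`, the helper `device_for_rank`, and the `USE_MPI` macro are all assumptions made for the example. The MPI parts are guarded so that the numerical kernel also compiles without MPI. Passing device pointers to MPI works only with a GPU-aware MPI library.

```c
#include <assert.h>

/* Hypothetical helper: map a node-local rank to a device number.
 * Returns -1 when no device is available, so the caller can stay on the host. */
int device_for_rank(int local_rank, int num_devices)
{
    assert(local_rank >= 0);
    if (num_devices <= 0)
        return -1;
    return local_rank % num_devices;   /* round-robin within the node */
}

/* One time step of a 2D heat equation on an (nx+2) x (ny+2) row-major grid.
 * Assumed signature; the real evolve routine may look different.
 * The map clauses transfer the arrays on every call. If the arrays are
 * already mapped by an enclosing target data region, no copy takes place
 * and the host copy must be refreshed with "target update" when needed. */
void evolve(double *curr, const double *prev, int nx, int ny,
            double a, double dt, double dx2, double dy2)
{
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: prev[0:(nx + 2) * (ny + 2)]) \
        map(tofrom: curr[0:(nx + 2) * (ny + 2)])
    for (int i = 1; i <= nx; i++) {
        for (int j = 1; j <= ny; j++) {
            int idx = i * (ny + 2) + j;
            curr[idx] = prev[idx] + a * dt *
                ((prev[idx + ny + 2] - 2.0 * prev[idx] + prev[idx - ny - 2]) / dx2 +
                 (prev[idx + 1] - 2.0 * prev[idx] + prev[idx - 1]) / dy2);
        }
    }
}

#if defined(USE_MPI) && defined(_OPENMP)
#include <mpi.h>
#include <omp.h>

/* Assign this MPI task to a device based on its rank within the node. */
int select_device(MPI_Comm comm)
{
    MPI_Comm node_comm;
    int local_rank;
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);

    int dev = device_for_rank(local_rank, omp_get_num_devices());
    if (dev >= 0)
        omp_set_default_device(dev);
    return dev;
}

/* Exchange ghost rows using device pointers. Assumes that field is already
 * mapped to the device and that the grid is split along the first dimension.
 * Use MPI_PROC_NULL for up or down at the domain boundaries. */
void exchange(double *field, int nx, int ny, int up, int down, MPI_Comm comm)
{
    int row = ny + 2;   /* number of elements in one grid row */
    #pragma omp target data use_device_ptr(field)
    {
        /* Inside this region, field holds the device address. */
        MPI_Sendrecv(field + row, row, MPI_DOUBLE, up, 11,
                     field + (nx + 1) * row, row, MPI_DOUBLE, down, 11,
                     comm, MPI_STATUS_IGNORE);
        MPI_Sendrecv(field + nx * row, row, MPI_DOUBLE, down, 12,
                     field, row, MPI_DOUBLE, up, 12,
                     comm, MPI_STATUS_IGNORE);
    }
}
#endif
```

To follow the host-copy alternative instead, the `use_device_ptr` region in `exchange` would be dropped and the ghost rows synchronized with `#pragma omp target update from(...)` before the communication and `#pragma omp target update to(...)` after it. The flags needed to enable offloading depend on the compiler and are not shown here.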