[DRAFT] Allow GPU code path to be executed on CPU #299

samhatfield · 2025-08-05T10:22:26Z

This PR implements a "host backend" for the GPU code path. Essentially I have made some minor tweaks to the Fortran and implemented host-side C++ backends for the Fourier and Legendre transforms, using the existing LAPACK and FFTW functionality brought in by the CPU version.

Along the way I have tried to split the "host OpenMP" and "accelerator OpenMP" functionality into separate CMake features, as I didn't want switching off OpenMP for the GPU build (which here actually runs on the CPU) to affect the CPU build. Once this PR is working we should properly scrutinise how the various features and builds interact.

Right now the host GPU backend is not working numerically, which is why this is still a draft. I'm almost there though... see this screenshot of the transformed spherical harmonic used in the benchmark case:

It seems to be correct, except for some latitudes which have obvious error patterns. Really weird. Not sure what's going on there, but I will keep testing.
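To pin down which latitudes are affected, one option is to compare the host backend's grid-point output against a trusted reference (for example the legacy CPU path) one latitude at a time. The following is only a minimal sketch: the function name `bad_latitudes`, the flat latitude-major storage, and the `offsets` array are assumptions made for illustration and are not taken from this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of this PR): returns the indices of all
// latitudes whose maximum absolute difference from the reference exceeds tol.
// Assumed layout: the points of latitude j occupy the half-open index range
// [offsets[j], offsets[j + 1]) in both arrays.
std::vector<std::size_t> bad_latitudes(const std::vector<double>& test,
                                       const std::vector<double>& ref,
                                       const std::vector<std::size_t>& offsets,
                                       double tol) {
    std::vector<std::size_t> bad;
    for (std::size_t j = 0; j + 1 < offsets.size(); ++j) {
        double max_err = 0.0;
        for (std::size_t i = offsets[j]; i < offsets[j + 1]; ++i) {
            max_err = std::max(max_err, std::fabs(test[i] - ref[i]));
        }
        if (max_err > tol) {
            bad.push_back(j);  // record the zero-based latitude index
        }
    }
    return bad;
}
```

Because the number of points can differ per latitude, the sketch takes explicit offsets rather than assuming a fixed row length. Printing the returned indices next to the number of points on each latitude may show whether the failures correlate with particular transform lengths.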

samhatfield · 2025-08-05T10:24:09Z

I forgot to mention why I'm doing this! The GPU code path has a number of data layout changes and general optimisations that we expect will also improve CPU performance. In the long term, we may therefore want to use this code path for operations instead of the "legacy" CPU version. But we need to do a proper side-by-side comparison first.

Also, this will be useful for debugging the behaviour of the GPU version when it's actually running on GPU.

samhatfield · 2025-08-05T16:04:30Z

The latest commit seems to fix the above error.

samhatfield added the enhancement and gpu labels Aug 5, 2025

github-actions bot added the contributor label Aug 5, 2025

samhatfield force-pushed the host_side_gpu branch 2 times, most recently from 160ec9a to 2f44473 Compare August 12, 2025 12:17

samhatfield added 7 commits August 18, 2025 13:49

Introduce skeleton for "host-side GPU emulation"

87cae0d

Rename OMPOFF

3ef379a

No way we can have -DENABLE_OMPOFF=ON 😳

Implement GEMMs for host-side GPU version

150368c

Implement FFTs for host-side GPU version

784a65d

I don't think this works yet.

Link GPU host backend with LAPACK and FFTW

87766c4

Switch to in-place FFT for host-side GPU version

7097d58

For some reason out-of-place FFTs give incorrect numbers, but only at seemingly random (though always the same) latitudes, such as latitude 14. Why? No idea.

Remove OpenMP offload specific feature and add GPU_HOST feature

1fb5410

This allows one to re-enable OpenMP directives when building the GPU host backend. I am testing whether we can also use the OpenMP target directives to accelerate host-side loops.
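One thing that differs between in-place and out-of-place real-to-complex transforms in FFTW is the buffer layout: an in-place transform of length n needs the real array padded to 2*(n/2+1) doubles so that the n/2+1 complex outputs fit in the same memory. The sketch below only illustrates that bookkeeping for a set of latitudes with different lengths. It is not a diagnosis of the issue described in the commits above, and the names `padded_real_length` and `inplace_offsets`, as well as the packed per-latitude layout, are assumptions rather than anything in this PR.

```cpp
#include <cstddef>
#include <vector>

// Number of complex outputs of a real-to-complex FFT of length n.
inline std::size_t complex_length(std::size_t n) { return n / 2 + 1; }

// Number of doubles one latitude occupies when transformed in place:
// each complex output takes two doubles.
inline std::size_t padded_real_length(std::size_t n) {
    return 2 * complex_length(n);
}

// Hypothetical helper: start offsets (in doubles) of each latitude in a
// packed buffer sized for in-place transforms. nlon[j] is the assumed
// number of longitude points on latitude j.
std::vector<std::size_t> inplace_offsets(const std::vector<std::size_t>& nlon) {
    std::vector<std::size_t> offsets(nlon.size() + 1, 0);
    for (std::size_t j = 0; j < nlon.size(); ++j) {
        offsets[j + 1] = offsets[j] + padded_real_length(nlon[j]);
    }
    return offsets;
}
```

The last entry of the returned vector is the total buffer size in doubles. Checking offsets like these against the ones the backend actually uses is one possible sanity check when results differ between the two modes.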

samhatfield force-pushed the host_side_gpu branch from 2f44473 to 1fb5410 Compare August 18, 2025 12:49

wdeconinck force-pushed the develop branch from aad830b to f1d16a6 Compare August 26, 2025 09:27
