Elliot -- on my drive into work today, I found myself wondering what it would take to extend this code to use a GPU and cuFFT for the backend FFT.
We already use scratch space for the YZ transform, so we would just need to allocate it via a CUDA call (which could use unified memory). We'd let cuFFT do all the heavy lifting (from a quick look, I think it has all the functionality we need) and use the Chapel code on the host just to coordinate the data transfer.
Now, the dominant cost in the FFTs is communication, but I'd hope we can overlap the data transfer between nodes with the transfers to and from the GPU plus the GPU computation, so we might still end up ahead.
Thoughts?
(tagging @ronawho )