GPU version of the code

Elliot -- on my drive into work today, I found myself wondering what it would take to extend this code to use a GPU and cuFFT for the backend FFT. 

We already use scratch space to do the YZ transform, so we would just need to allocate that via a CUDA call (which could use unified memory). We'd let cuFFT do all the heavy lifting (I took a quick look and I think it has all the functionality we need), and the Chapel code on the host would just coordinate the data transfer.
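To make the idea above concrete, here is a minimal sketch of what the GPU side might look like. It is not taken from this code base: the function name `yz_forward_managed`, the double-precision complex type, and the contiguous row-major plane layout (`nz` fastest) are all assumptions for illustration. It allocates the scratch buffer with `cudaMallocManaged` and runs a batch of 2D (YZ) transforms in place with one `cufftPlanMany` plan.

```cuda
#include <cmath>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: forward 2D FFT over `nx` contiguous ny x nz planes
// held in a unified-memory scratch buffer. Returns 0 on success.
int yz_forward_managed(int nx, int ny, int nz) {
  const size_t count = (size_t)nx * ny * nz;
  cufftDoubleComplex* scratch = nullptr;

  // Unified memory: the same pointer is valid on host and device.
  if (cudaMallocManaged(&scratch, count * sizeof(cufftDoubleComplex)) != cudaSuccess)
    return 1;

  // Fill with a constant so the result is easy to check (assumed test data).
  for (size_t i = 0; i < count; ++i) { scratch[i].x = 1.0; scratch[i].y = 0.0; }

  // One plan covering all nx planes; nullptr embeds mean "tightly packed".
  int dims[2] = {ny, nz};
  cufftHandle plan;
  if (cufftPlanMany(&plan, 2, dims, nullptr, 1, ny * nz, nullptr, 1, ny * nz,
                    CUFFT_Z2Z, nx) != CUFFT_SUCCESS) {
    cudaFree(scratch);
    return 2;
  }

  int rc = 0;
  if (cufftExecZ2Z(plan, scratch, scratch, CUFFT_FORWARD) != CUFFT_SUCCESS) rc = 3;
  // Wait for the GPU before touching managed memory on the host again.
  if (cudaDeviceSynchronize() != cudaSuccess) rc = 4;

  // A constant input of 1 transforms to ny*nz in the zero-frequency bin.
  if (rc == 0 && std::fabs(scratch[0].x - (double)ny * nz) > 1e-9) rc = 5;

  cufftDestroy(plan);
  cudaFree(scratch);
  return rc;
}
```

In a real port the Chapel host code would presumably call something like this through its C interoperability layer, and the buffer would be filled by the existing transpose step rather than with constants.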

Now, the dominant cost with the FFTs is the communication, but I'd hope we can overlap the data transfer between nodes with the data transfer to and from the GPU plus the computation there, and so we might end up ahead.
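The GPU half of that overlap could be sketched with CUDA streams, as below. This is only an illustration under stated assumptions: the name `yz_forward_pipelined`, the chunk count, and the data layout are hypothetical, and the inter-node communication (which would stay in Chapel) is not shown. Each chunk of planes gets its own stream, so the copy of one chunk can overlap with the FFT of another; pinned host memory from `cudaMallocHost` is what allows `cudaMemcpyAsync` to run asynchronously.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: split nx planes into nchunks pieces and pipeline
// host-to-device copy, 2D FFT, and device-to-host copy on separate streams.
// Assumes nx is divisible by nchunks. Returns 0 on success.
int yz_forward_pipelined(int nx, int ny, int nz, int nchunks) {
  if (nchunks <= 0 || nx <= 0 || nx % nchunks != 0) return 1;
  const int batch = nx / nchunks;
  const size_t plane = (size_t)ny * nz;
  const size_t chunkElems = plane * batch;
  const size_t chunkBytes = chunkElems * sizeof(cufftDoubleComplex);

  cufftDoubleComplex *host = nullptr, *dev = nullptr;
  bool ok = cudaMallocHost(&host, chunkBytes * nchunks) == cudaSuccess;  // pinned
  ok = ok && cudaMalloc(&dev, chunkBytes * nchunks) == cudaSuccess;
  if (ok)
    for (size_t i = 0; i < chunkElems * nchunks; ++i) { host[i].x = 1.0; host[i].y = 0.0; }

  std::vector<cudaStream_t> streams;
  std::vector<cufftHandle> plans;
  int dims[2] = {ny, nz};
  for (int c = 0; ok && c < nchunks; ++c) {
    cudaStream_t s;
    ok = cudaStreamCreate(&s) == cudaSuccess;
    if (!ok) break;
    streams.push_back(s);
    cufftHandle p;
    ok = cufftPlanMany(&p, 2, dims, nullptr, 1, ny * nz, nullptr, 1, ny * nz,
                       CUFFT_Z2Z, batch) == CUFFT_SUCCESS;
    if (!ok) break;
    plans.push_back(p);
    ok = cufftSetStream(p, s) == CUFFT_SUCCESS;  // FFT runs on this chunk's stream
  }

  // Enqueue copy-in, transform, copy-out per chunk; streams may overlap.
  for (int c = 0; ok && c < nchunks; ++c) {
    cufftDoubleComplex* h = host + c * chunkElems;
    cufftDoubleComplex* d = dev + c * chunkElems;
    ok = cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[c]) == cudaSuccess
      && cufftExecZ2Z(plans[c], d, d, CUFFT_FORWARD) == CUFFT_SUCCESS
      && cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[c]) == cudaSuccess;
  }
  ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

  // Constant input of 1: every plane's zero-frequency bin should be ny*nz.
  for (int p = 0; ok && p < nx; ++p)
    ok = std::fabs(host[p * plane].x - (double)plane) < 1e-9;

  for (cufftHandle p : plans) cufftDestroy(p);
  for (cudaStream_t s : streams) cudaStreamDestroy(s);
  if (dev) cudaFree(dev);
  if (host) cudaFreeHost(host);
  return ok ? 0 : 2;
}
```

Whether this actually ends up ahead would depend on how well the per-chunk network transfers can be interleaved with these streams from the host side, which this sketch does not attempt to measure.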

Thoughts? 

(tagging @ronawho)

GPU version of the code #64