GPU version of the code #64

@npadmana

Elliot -- on my drive into work today, I found myself wondering what it would take to extend this code to use a GPU and cuFFT for the backend FFT.

We already use scratch space to do the YZ transform, so we would just need to allocate that via a CUDA call (which could use unified memory). We'd let the CUDA FFTs do all the heavy lifting (I took a quick look and I think they have all the functionality we need) and we'd use the Chapel code on the host to just coordinate the data transfer.
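To make the idea concrete, here is a minimal CUDA C++ sketch of the GPU side only, assuming the scratch buffer holds `nx` contiguous YZ planes of double-precision complex values. The function name `yz_fft_unified`, the layout, and the constant test input are all hypothetical and not taken from the existing code; the Chapel host code would call something like this through its C interop.

```cuda
#include <cmath>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper (not part of the existing code): allocate the YZ scratch
// space in unified memory, run a batched in-place 2D FFT over nx planes of
// size ny x nz, and check the result for a constant input.
bool yz_fft_unified(int nx, int ny, int nz) {
    const size_t plane = (size_t)ny * nz;
    const size_t count = (size_t)nx * plane;

    // Unified (managed) memory is addressable from both host and device.
    cufftDoubleComplex *scratch = nullptr;
    if (cudaMallocManaged(&scratch, count * sizeof(cufftDoubleComplex)) != cudaSuccess)
        return false;

    // Fill from the host; in the real code this would be the slab data.
    for (size_t i = 0; i < count; ++i) { scratch[i].x = 1.0; scratch[i].y = 0.0; }

    // One plan covering all planes: rank-2 transforms, nx of them, each
    // separated by ny*nz elements. Passing nullptr for inembed/onembed
    // selects the default contiguous layout.
    int dims[2] = {ny, nz};
    cufftHandle plan;
    bool ok = cufftPlanMany(&plan, 2, dims,
                            nullptr, 1, (int)plane,
                            nullptr, 1, (int)plane,
                            CUFFT_Z2Z, nx) == CUFFT_SUCCESS;
    if (ok) {
        ok = cufftExecZ2Z(plan, scratch, scratch, CUFFT_FORWARD) == CUFFT_SUCCESS;
        // Wait for the GPU before the host touches managed memory again.
        ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;
        cufftDestroy(plan);
    }

    // The FFT of a constant 1 is ny*nz in the zero-frequency bin and 0 elsewhere.
    for (int p = 0; ok && p < nx; ++p) {
        const cufftDoubleComplex *v = scratch + p * plane;
        ok = std::fabs(v[0].x - (double)plane) < 1e-9 && std::fabs(v[1].x) < 1e-9;
    }

    cudaFree(scratch);
    return ok;
}
```

Calling `yz_fft_unified(4, 8, 8)` from a small `main` and building with `nvcc -lcufft` should be enough to try it out. Whether managed memory performs well here, versus explicit copies, is an open question.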

Now, the dominant cost of the FFTs is the communication, but I'd hope we can overlap the data transfer between nodes with the data transfer to and from the GPU plus the GPU computation, and so we might end up ahead.
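As a rough illustration of the overlap on the GPU side, the sketch below pushes YZ planes through two CUDA streams so that the copy of one plane can overlap with the FFT of another. It assumes pinned host memory and explicit copies rather than unified memory. The names `Slot`, `make_slot`, `free_slot`, and `pipelined_yz_fft` are hypothetical, and the inter-node communication itself (which the Chapel host code would drive) is only marked by a comment.

```cuda
#include <cmath>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical per-stream resources: a stream, a device buffer, and a plan.
struct Slot { cudaStream_t stream; cufftDoubleComplex *dev; cufftHandle plan; };

static void free_slot(Slot &s) {
    cufftDestroy(s.plan);
    cudaFree(s.dev);
    cudaStreamDestroy(s.stream);
}

static bool make_slot(Slot &s, int ny, int nz, size_t bytes) {
    if (cudaStreamCreate(&s.stream) != cudaSuccess) return false;
    if (cudaMalloc(&s.dev, bytes) != cudaSuccess) {
        cudaStreamDestroy(s.stream);
        return false;
    }
    if (cufftPlan2d(&s.plan, ny, nz, CUFFT_Z2Z) != CUFFT_SUCCESS) {
        cudaFree(s.dev);
        cudaStreamDestroy(s.stream);
        return false;
    }
    // Bind the plan to the stream so its kernels are ordered with the copies.
    if (cufftSetStream(s.plan, s.stream) != CUFFT_SUCCESS) { free_slot(s); return false; }
    return true;
}

// Transform nplanes planes of size ny x nz, alternating between two streams.
bool pipelined_yz_fft(int nplanes, int ny, int nz) {
    const int kSlots = 2;
    const size_t plane = (size_t)ny * nz;
    const size_t bytes = plane * sizeof(cufftDoubleComplex);

    // Pinned host memory is needed for copies to be truly asynchronous.
    cufftDoubleComplex *host = nullptr;
    if (cudaMallocHost(&host, nplanes * bytes) != cudaSuccess) return false;
    for (size_t i = 0; i < nplanes * plane; ++i) { host[i].x = 1.0; host[i].y = 0.0; }

    Slot slots[kSlots];
    int made = 0;
    while (made < kSlots && make_slot(slots[made], ny, nz, bytes)) ++made;
    bool ok = (made == kSlots);

    for (int p = 0; ok && p < nplanes; ++p) {
        Slot &s = slots[p % kSlots];
        cufftDoubleComplex *h = host + p * plane;
        // Work within one stream runs in order; the two streams may overlap.
        ok = cudaMemcpyAsync(s.dev, h, bytes, cudaMemcpyHostToDevice, s.stream) == cudaSuccess
          && cufftExecZ2Z(s.plan, s.dev, s.dev, CUFFT_FORWARD) == CUFFT_SUCCESS
          && cudaMemcpyAsync(h, s.dev, bytes, cudaMemcpyDeviceToHost, s.stream) == cudaSuccess;
        // The host returns immediately here, so this is where inter-node
        // communication for other planes could be issued (not shown).
    }
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

    // Constant input: each plane should have ny*nz in its zero-frequency bin.
    for (int p = 0; ok && p < nplanes; ++p)
        ok = std::fabs(host[p * plane].x - (double)plane) < 1e-9;

    for (int i = 0; i < made; ++i) free_slot(slots[i]);
    cudaFreeHost(host);
    return ok;
}
```

How much this buys in practice depends on how the per-plane transfer and FFT times compare with the network time, which would need to be measured.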

Thoughts?

(tagging @ronawho )
