cuFINUFFT vs. FINUFFT: Slowdown in cuFINUFFT for 3D Stacked Type 1 Transform #649
Replies: 3 comments 14 replies
-
Hi @remy-abergel, does
Edit: If I do
If I do
-
Just to give you a bit of context, I am developing forward and backward operators related to 4D spectral-spatial image reconstruction for Electron Paramagnetic Resonance (to be included in the next release of PyEPRI). The images to be reconstructed from the EPR measurements (a.k.a. projections) are four-dimensional (they can be viewed as 3D images in which each "voxel" contains a 1D EPR spectrum).
Forward operator (spectral-spatial projection operator)
Given a 4D image where
Since 4D NUFFT is not available, I compute where
Adjoint operator (spectral-spatial backprojection operator)
I use a similar strategy for evaluating
Toeplitz kernel
On top of that, I need to compute a Toeplitz convolution kernel enabling the evaluation of
Typical sizes
EPR imaging does not allow high spatial resolution (contrary to MRI), so increasing the spatial domain is not a priority. Users are not equipped with HPC workstations, so the memory budget is only several GB. By the way, I use Hermitian symmetry properties to reduce the memory usage:
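The author's actual scheme is not shown in this copy of the thread. As a generic illustration only of how Hermitian symmetry can save memory, the sketch below uses NumPy's real-input FFT: the spectrum of a real-valued array is Hermitian-symmetric, so roughly half of it is enough to recover the array. The array `kernel` and its 8x8x8 size are hypothetical stand-ins, not PyEPRI data structures.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 8, 8))  # hypothetical real-valued array

full = np.fft.fftn(kernel)   # full complex spectrum, shape (8, 8, 8)
half = np.fft.rfftn(kernel)  # non-redundant half along the last axis, shape (8, 8, 5)

# The half spectrum is exactly the leading slice of the full one;
# the remaining entries follow from conjugate symmetry.
same_leading_part = np.allclose(full[..., : half.shape[-1]], half)

# The real array is recovered from the half spectrum alone.
restored = np.fft.irfftn(half, s=kernel.shape)

print("stored fraction:", half.nbytes / full.nbytes)
print("round-trip OK:", np.allclose(restored, kernel))
```

The saving approaches one half as the last dimension grows; for an odd or small last axis it is somewhat less.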
-
Here is some fresh feedback on this issue: installation without binaries still fails on my own machine (with cudatoolkit installed with conda), but I could make it work on another machine equipped with two NVIDIA A40 GPUs (also managing the CUDA installation with conda). I was able to rerun the reported code with both kinds of installation.
cufinufft v. 2.3.1 installed with binaries (installation:
cufinufft v. 2.3.1 installed without binaries (installation:
Installation steps with conda
In case this can be helpful to someone else, here are my installation notes.
####################################
# create a fresh conda environment #
####################################
conda update -y -n base -c defaults conda
conda create -y -n finufft-conda-no-binaries pip
conda activate finufft-conda-no-binaries
########################
# install dependencies #
########################
pip install setuptools # needed for cudatoolkit-dev install at the next step
conda install -c conda-forge cudatoolkit-dev # cudatoolkit is not enough (nvcc is missing)
conda install -c conda-forge cxx-compiler gcc=11 # cufinufft install fails with gcc > 11
conda install cupy # install of cupy-cuda12x with pip causes issues on this machine
pip install finufft # fails with option --no-binary finufft
pip install packaging # to avoid error on finufft import (version 2.3.1)
pip install --no-binary cufinufft cufinufft # works
########################################
# to install from master (still fails) #
########################################
conda install -c conda-forge cuda-runtime # to get the crt/host_config.h file
On master
git clone https://github.com/flatironinstitute/finufft.git
cd finufft
pip install python/cufinufft # fails
This install attempt fails with
-
Hi,
My apologies for posting multiple times these days. I am currently working with 3D stacked Type 1 transforms (through the Python interface), and for the first time since using (cu)FINUFFT, I have observed a slower execution time with cuFINUFFT compared to FINUFFT. The settings are as follows:
M = 595350
N1 = N2 = N3 = 32
n_trans = 684
dtype = 'complex64' (single precision)
I am unsure whether this should be reported as an issue. Here is the code (sorry, it is a bit messy, but my attempts to simplify it bring me back to a situation similar to issue 648 that I failed to solve for the moment). To reproduce the experiment, you can run the following (changing the setting at line 14: lib = cp for GPU or lib = np for CPU).
@DiamonDinoia edits to run both in one go and synchronize the GPU:
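The edited script is not reproduced in this copy of the thread. As a rough, library-agnostic sketch of why synchronization matters when timing GPU code: GPU calls can return before the work has finished, so the clock should only be stopped after an explicit synchronization. The helper name `best_time` and the wrapper `run_gpu_transform` mentioned in the comments are hypothetical, not part of (cu)FINUFFT.

```python
import time


def best_time(fn, sync=None, repeats=3):
    """Return the best wall-clock time (in seconds) of fn() over `repeats` runs.

    `sync` is an optional zero-argument callable, invoked before the clock
    starts and again before it stops, so that asynchronous work (for
    example GPU kernels) is fully included in the measurement.
    """
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain any pending work before starting the clock
        t0 = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to finish
        best = min(best, time.perf_counter() - t0)
    return best


# CPU usage: no synchronization is needed.
cpu_time = best_time(lambda: sum(i * i for i in range(100_000)))
print(f"CPU stand-in workload: {cpu_time:.4f} s")

# GPU usage (assumes CuPy is installed; not executed here):
#   import cupy as cp
#   gpu_time = best_time(run_gpu_transform, sync=cp.cuda.Device().synchronize)
# where run_gpu_transform is a hypothetical wrapper around the cuFINUFFT call.
```

Taking the best of several runs also keeps one-off costs, such as a first-call warm-up, from dominating the comparison.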
I checked that the computed f values are the same (up to machine epsilon) for lib=np and lib=cp, so I believe that I am not in the situation reported in issue 648. Here are the measured times on my laptop:
cp
np
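A check of the kind described above (same f on CPU and GPU) could look like the following minimal sketch. The arrays here are synthetic stand-ins, since the actual outputs are not available in this thread; the function name `relative_error` is hypothetical.

```python
import numpy as np


def relative_error(f_ref, f_test):
    """Relative l2 distance between two arrays of identical shape."""
    f_ref = np.asarray(f_ref)
    f_test = np.asarray(f_test)
    diff_norm = np.linalg.norm((f_test - f_ref).ravel())
    return float(diff_norm / np.linalg.norm(f_ref.ravel()))


rng = np.random.default_rng(0)
shape = (4, 8, 8, 8)  # small stand-in for (n_trans, N1, N2, N3)
f_cpu = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)).astype(np.complex64)

# Stand-in for the GPU result: the same values perturbed at the level of
# single-precision round-off. A real CuPy array would first need to be
# copied to the host, e.g. with cp.asnumpy(f_gpu).
eps32 = np.finfo(np.float32).eps
f_gpu = f_cpu * np.complex64(1 + eps32)

err = relative_error(f_cpu, f_gpu)
print(f"relative error: {err:.2e} (float32 eps: {eps32:.2e})")
```

A relative error, rather than an elementwise maximum, avoids being dominated by a few entries that happen to be close to zero.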
I usually achieve a nice ~10x speedup when using cuFINUFFT instead of FINUFFT. For instance, the Type 2 transformation applied to f is faster with cuFINUFFT (~0.6 sec) compared to FINUFFT (~4 sec), which seems more typical to me.
I would be glad to hear your comments if you have any.
Many thanks,
Rémy
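For a sense of the data volumes implied by the settings listed in this post, here is a small back-of-the-envelope calculation. It assumes the stacked Type 1 input holds n_trans x M complex64 strengths and the output n_trans x N1 x N2 x N3 complex64 modes, with float32 coordinates. It ignores any internal workspace the libraries allocate and is not meant as a diagnosis of the slowdown.

```python
# Settings quoted in the post
M = 595350
N1 = N2 = N3 = 32
n_trans = 684

BYTES_COMPLEX64 = 8  # two float32 values per complex number
BYTES_FLOAT32 = 4

# Stacked Type 1: n_trans sets of M strengths in, n_trans grids of modes out
input_bytes = n_trans * M * BYTES_COMPLEX64
output_bytes = n_trans * N1 * N2 * N3 * BYTES_COMPLEX64
coord_bytes = 3 * M * BYTES_FLOAT32  # x, y, z coordinates (assumed float32)

print(f"input strengths : {input_bytes / 1e9:.2f} GB")
print(f"output modes    : {output_bytes / 1e9:.2f} GB")
print(f"coordinates     : {coord_bytes / 1e6:.1f} MB")
```

With these numbers the input strengths occupy about 3.26 GB and the output about 0.18 GB, so the input array alone takes up a large share of a memory budget of several GB.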
Environment
pip install cufinufft, could not be installed using the --no-binary cufinufft option yet)