The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal that leverages GPU compute resources via CuPy; it also contains Numba CUDA and raw CuPy CUDA kernels that provide additional speedups for selected functions. cuSignal achieves its best gains on large signals and compute-intensive functions, but it also emphasizes online processing with zero-copy (pinned, mapped) memory shared between CPU and GPU.
NOTE: For the latest stable README.md ensure you are on the latest branch.
- Quick Start
- Documentation
- Installation
- Optional Dependencies
- Benchmarking
- Contribution Guide
- cuSignal Blogs and Talks
cuSignal has an API that mimics SciPy Signal. In-depth functionality is demonstrated in the notebooks section of the repo, but let's examine the workflow for polyphase resampling under multiple scenarios:
SciPy Signal (CPU)
import numpy as np
from scipy import signal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
%%timeit
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on 2x Xeon E5-2600 in 2.36 sec.
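As used here, resample_poly changes the sampling rate by the rational factor resample_up/resample_down: it upsamples by 2, applies an FIR low-pass filter designed with the given window (a Kaiser window with beta 0.5), and downsamples by 3. The scaled-down sketch below checks the resulting output length, ceil(n * up / down). The 1,000-sample size is our own choice so it runs quickly, and expected_resample_len is a hypothetical helper, not part of SciPy or cuSignal:

```python
import numpy as np
from scipy import signal


def expected_resample_len(n_in: int, up: int, down: int) -> int:
    """Output length of polyphase resampling by up/down: ceil(n_in * up / down)."""
    return -(-n_in * up // down)  # integer ceiling division, exact even for huge n_in


# Scaled-down version of the benchmark input (1,000 samples instead of 1e8)
num_samps = 1000
resample_up, resample_down = 2, 3
cx = np.linspace(0, 10, num_samps, endpoint=False)
cy = np.cos(-cx**2 / 6.0)

cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))
print(len(cf), expected_resample_len(num_samps, resample_up, resample_down))  # 667 667
```

For the 1e8-sample benchmark, the same formula gives 66,666,667 output samples.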
cuSignal with Data Generated on the GPU with CuPy
import cupy as cp
import cusignal
# Optional: Precompile custom CUDA kernels to eliminate JIT overhead on first run
cusignal.precompile_kernels()
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
gx = cp.linspace(start, stop, num_samps, endpoint=False)
gy = cp.cos(-gx**2/6.0)
%%timeit
gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA V100 in 13.8 ms, roughly a 170x speedup over SciPy Signal.
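%%timeit only works inside IPython/Jupyter. CuPy also launches GPU kernels asynchronously, so a plain wall-clock timer stopped as soon as cusignal.resample_poly returns can end up measuring the kernel launch rather than the computation. Here is a minimal script-friendly timer sketch. time_call is our own hypothetical helper, not a cuSignal API. It synchronizes before starting and before stopping the clock, and it keeps the minimum over several runs:

```python
import time
from typing import Callable, Optional


def time_call(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              repeats: int = 5) -> float:
    """Return the minimum wall-clock time of fn(), in seconds, over `repeats` runs.

    `sync` should block until all queued device work has finished. For CuPy
    arrays, cp.cuda.Device().synchronize is one such callable; pass None for
    CPU-only code.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain work queued before the timed region
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait until the kernels launched by fn() have finished
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # CPU-only usage; no synchronization needed.
    print(f"{time_call(lambda: sum(range(100_000))):.6f} s")
    # GPU usage (requires CuPy, cuSignal and the gy array from the example above):
    # t = time_call(lambda: cusignal.resample_poly(gy, 2, 3, window=('kaiser', 0.5)),
    #               sync=cp.cuda.Device().synchronize)
```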
cuSignal with Data Generated on the CPU with Mapped, Pinned (zero-copy) Memory
import cupy as cp
import numpy as np
import cusignal
# Optional: Precompile custom CUDA kernels to eliminate JIT overhead on first run
cusignal.precompile_kernels()
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
# Create shared memory between CPU and GPU and load with CPU signal (cy)
gpu_signal = cusignal.get_shared_mem(num_samps, dtype=np.float64)
%%time
# Move data to GPU/CPU shared buffer and run polyphase resampler
gpu_signal[:] = cy
gf = cusignal.resample_poly(gpu_signal, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA V100 in 174 ms.
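The shared buffer pays off when it is allocated once and refilled for every new block of samples, which is the online-processing pattern cuSignal emphasizes. The sketch below illustrates that loop under our own assumptions. stream_resample is a hypothetical helper. Each frame is resampled independently, so filter transients appear at every frame boundary; a production stream would carry filter state across frames. The NumPy/SciPy defaults stand in for cusignal.get_shared_mem and cusignal.resample_poly, which lets the loop run without a GPU:

```python
import numpy as np
from scipy import signal


def stream_resample(frames, frame_len, up, down,
                    alloc=np.empty, resample=signal.resample_poly, sync=None):
    """Resample fixed-length host frames through one reusable buffer (a sketch).

    On a GPU system, alloc could be cusignal.get_shared_mem, resample could be
    cusignal.resample_poly, and sync could be cp.cuda.Device().synchronize.
    Every frame must contain exactly frame_len samples.
    """
    buf = alloc(frame_len, dtype=np.float64)  # allocated once, outside the loop
    for frame in frames:
        if sync is not None:
            sync()  # don't overwrite buf while a previous kernel may still read it
        buf[:] = frame  # copy the new frame into the shared buffer in place
        yield resample(buf, up, down, window=('kaiser', 0.5))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.standard_normal(300) for _ in range(4)]
    for out in stream_resample(frames, 300, 2, 3):
        print(out.shape)
```

The sync call matters on the GPU because kernels run asynchronously: without it, the next frame could overwrite mapped memory that a still-running kernel is reading.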
cuSignal with Data Generated on the CPU and Copied to GPU [AVOID THIS FOR ONLINE SIGNAL PROCESSING]
import cupy as cp
import numpy as np
import cusignal
# Optional: Precompile custom CUDA kernels to eliminate JIT overhead on first run
cusignal.precompile_kernels()
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
%%time
gf = cusignal.resample_poly(cp.asarray(cy), resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA V100 in 637 ms.
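The four examples above differ only in which array module builds the input and which module provides resample_poly. One way to keep a single code path is to pick that pair at runtime. This is a minimal sketch that assumes cusignal.resample_poly accepts the same arguments as the SciPy call used above; get_backend and chirp_resample are hypothetical names of our own:

```python
import numpy as np
from scipy import signal


def get_backend(use_gpu: bool):
    """Return (array module, signal module): NumPy/SciPy on CPU, CuPy/cuSignal on GPU."""
    if use_gpu:
        import cupy as cp  # imported lazily so CPU-only machines never need CuPy
        import cusignal
        return cp, cusignal
    return np, signal


def chirp_resample(use_gpu: bool = False, num_samps: int = 10_000,
                   up: int = 2, down: int = 3):
    """Generate the chirp from the examples above and resample it on either backend."""
    xp, sig = get_backend(use_gpu)
    x = xp.linspace(0, 10, num_samps, endpoint=False)
    y = xp.cos(-x**2 / 6.0)
    return sig.resample_poly(y, up, down, window=('kaiser', 0.5))


if __name__ == "__main__":
    print(chirp_resample(use_gpu=False).shape)
```

Because the data is created with the selected array module, the GPU path generates the signal directly on the device, matching the fastest scenario above.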
The full cuSignal API documentation, including a list of all functionality and examples, is available for both the Stable and Nightly (Experimental) releases.
cuSignal 0.14 API | cuSignal 0.15 Nightly
cuSignal can be installed with conda (Miniconda, or the full Anaconda distribution) from the rapidsai channel. If you're using a Jetson GPU, please follow the build instructions below.
For cusignal version == 0.14:
# For CUDA 10.0
conda install -c rapidsai -c nvidia -c conda-forge \
-c defaults cusignal=0.14 python=3.6 cudatoolkit=10.0
# or, for CUDA 10.1.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
cusignal=0.14 python=3.6 cudatoolkit=10.1
# or, for CUDA 10.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
cusignal=0.14 python=3.6 cudatoolkit=10.2
For the nightly version of cusignal, currently 0.15a:
# For CUDA 10.0
conda install -c rapidsai-nightly -c nvidia -c conda-forge \
-c defaults cusignal python=3.6 cudatoolkit=10.0
# or, for CUDA 10.1.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
cusignal python=3.6 cudatoolkit=10.1
# or, for CUDA 10.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
cusignal python=3.6 cudatoolkit=10.2
cuSignal has been tested and confirmed to work with Python 3.6, 3.7, and 3.8.
See the Get RAPIDS version picker for more OS and version info.
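After installing, one quick smoke test is to compare cuSignal's output with SciPy's on a small signal. This sketch assumes a CUDA-capable GPU for the cuSignal comparison. matches_scipy is a hypothetical helper, and its tolerances are our own choice rather than a documented accuracy guarantee:

```python
import numpy as np
from scipy import signal


def matches_scipy(resample_fn, to_device=np.asarray, to_host=np.asarray,
                  num_samps=10_000, up=2, down=3, rtol=1e-6, atol=1e-9):
    """Check a resample_poly implementation against scipy.signal.resample_poly."""
    x = np.cos(-np.linspace(0, 10, num_samps, endpoint=False) ** 2 / 6.0)
    ref = signal.resample_poly(x, up, down, window=('kaiser', 0.5))
    out = to_host(resample_fn(to_device(x), up, down, window=('kaiser', 0.5)))
    return out.shape == ref.shape and bool(np.allclose(out, ref, rtol=rtol, atol=atol))


if __name__ == "__main__":
    print("SciPy vs itself:", matches_scipy(signal.resample_poly))
    try:
        import cupy as cp
        import cusignal
    except ImportError:
        print("CuPy/cuSignal not importable; skipping the GPU comparison")
    else:
        # Copy the input to the GPU, run cuSignal, and copy the result back to NumPy.
        print("cuSignal vs SciPy:",
              matches_scipy(cusignal.resample_poly, to_device=cp.asarray, to_host=cp.asnumpy))
```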
While there are many versions of Anaconda for AArch64 platforms, cuSignal has been tested and supports conda4aarch64. Conda4aarch64 is also described in the Numba aarch64 installation instructions. Further, it's assumed that your Jetson device is running a current edition of JetPack and contains the CUDA Toolkit.
- Clone the repository
# Set the location to cuSignal in an environment variable CUSIGNAL_HOME
export CUSIGNAL_HOME=$(pwd)/cusignal
# Download the cuSignal repo
git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
- Install conda4aarch64 and create the cuSignal conda environment:
cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_jetson_base.yml
- Activate the conda environment
conda activate cusignal-dev
- Install the cuSignal module
cd $CUSIGNAL_HOME/python
python setup.py install
or
cd $CUSIGNAL_HOME
# install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
# run ./build.sh -h to print the supported command line options
./build.sh
- Once installed, periodically update the environment
cd $CUSIGNAL_HOME
conda env update -f conda/environments/cusignal_jetson_base.yml
- Also, confirm the installation by running the unit tests with pytest
cd $CUSIGNAL_HOME/python
pytest -v  # verbose mode
pytest -v -k <function name>  # run only the tests matching <function name>
To build cuSignal from source on other (non-Jetson) Linux systems:
- Clone the repository
# Set the location to cuSignal in an environment variable CUSIGNAL_HOME
export CUSIGNAL_HOME=$(pwd)/cusignal
# Download the cuSignal repo
git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
- Download and install Anaconda or Miniconda, then create the cuSignal conda environment:
Base environment (core dependencies for cuSignal)
cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_base.yml
Full environment (including RAPIDS's cuDF, cuML, cuGraph, and PyTorch)
cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_full.yml
- Activate the conda environment
conda activate cusignal-dev
- Install the cuSignal module
cd $CUSIGNAL_HOME/python
python setup.py install
or
cd $CUSIGNAL_HOME
# install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
# run ./build.sh -h to print the supported command line options
./build.sh
- Once installed, periodically update the environment
cd $CUSIGNAL_HOME
# use conda/environments/cusignal_full.yml instead if you created the full environment
conda env update -f conda/environments/cusignal_base.yml
- Also, confirm the installation by running the unit tests with pytest
cd $CUSIGNAL_HOME/python
pytest -v  # verbose mode
pytest -v -k <function name>  # run only the tests matching <function name>
- Download and install Anaconda for Windows. In an Anaconda Prompt, navigate to your checkout of cuSignal.
- Create the cuSignal conda environment
conda create --name cusignal-dev
- Activate the conda environment
conda activate cusignal-dev
- Install cuSignal's core dependencies
conda install numpy numba scipy cudatoolkit pip
pip install cupy-cudaXXX
where XXX is the version of the CUDA toolkit you have installed; for CUDA 10.1, for example, the package is cupy-cuda101. See the CuPy documentation for information on getting Windows wheels for other versions of CUDA.
- Install the cuSignal module
cd python
python setup.py install
- [Optional] Run the tests. In the cuSignal top-level directory:
pip install pytest
pytest
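The cupy-cudaXXX naming in the Windows steps above joins the CUDA major and minor versions without the dot. This is a small sketch of that mapping; cupy_wheel_name is a hypothetical helper. It only covers the versioned naming described here, since CuPy has used a different scheme for newer CUDA releases, so check the CuPy documentation for your version:

```python
def cupy_wheel_name(cuda_version: str) -> str:
    """Map a CUDA toolkit version such as "10.1" to a wheel name such as "cupy-cuda101".

    Any patch component (e.g. the ".243" in "10.1.243") is ignored.
    """
    major, minor = cuda_version.strip().split(".")[:2]
    return f"cupy-cuda{int(major)}{int(minor)}"


if __name__ == "__main__":
    print(f"pip install {cupy_wheel_name('10.1')}")  # pip install cupy-cuda101
```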
Docker (all RAPIDS libraries, including cuSignal)
For cusignal version == 0.14:
# For CUDA 10.0
docker pull rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
For the nightly version of cusignal:
docker pull rapidsai/rapidsai-nightly:cuda10.0-runtime-ubuntu18.04
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
rapidsai/rapidsai-nightly:cuda10.0-runtime-ubuntu18.04
Please see the RAPIDS Release Selector for more information on supported Python, Linux, and CUDA versions.
- nvidia-docker if using Docker
- RTL-SDR or other SDR driver/packaging. Find more information and follow the instructions for setup here. We have also tested cuSignal integration with SoapySDR.
cuSignal uses pytest-benchmark to compare performance between CPU and GPU signal processing implementations. To run cuSignal's benchmark suite, navigate to the topmost python directory ($CUSIGNAL_HOME/python) and run:
pytest --benchmark-only
As with the standard pytest tool, use the -v and -k flags for verbose mode and to select a specific benchmark to run, respectively. When interpreting the output, we recommend comparing the minimum execution time reported.
Review the CONTRIBUTING.md file for information on how to contribute code and issues to the project.
- Announcement Talk - GTC DC 2019 - Recording | Slides
- GPU Accelerated Signal Processing with cuSignal - Adam Thompson - Medium
- cuSignal 0.13 - Entering the Big Leagues and Focused on Screamin' Streaming Performance - Adam Thompson - Medium
- cuSignal: Easy CUDA GPU Acceleration for SDR DSP and Other Applications - RTL-SDR.com
- cuSignal on the AIR-T - Deepwave Digital
- Detecting, Labeling, and Recording Training Data with the AIR-T and cuSignal - Deepwave Digital
- Signal Processing and Deep Learning - Deepwave Digital
- cuSignal and CyberRadio Demonstrate GPU Accelerated SDR - Andrew Back - LimeMicro
- Follow the latest cuSignal Announcements on Twitter