Description
This follows from a discussion on the Numba discourse channel.

Currently, STUMPY can be executed on both CPUs and GPUs (i.e., given the same input, both should produce the same results, which is useful for validation). The GPU kernel is essentially self-contained in the `gpu_stump.py` module. In fact, all of the heavy lifting occurs in repeatedly launching a single GPU kernel called `_compute_and_update_PI_kernel`:
```python
for i in range(range_start, range_stop):
    _compute_and_update_PI_kernel[blocks_per_grid, threads_per_block](
        i,
        device_T_A,
        device_T_B,
        m,
        device_QT_even,
        device_QT_odd,
        device_QT_first,
        device_M_T,
        device_Σ_T,
        device_μ_Q,
        device_σ_Q,
        k,
        ignore_trivial,
        excl_zone,
        device_profile,
        device_indices,
        True,
    )
```
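The `device_QT_even`/`device_QT_odd` buffers hold the sliding dot products that the kernel carries from one row `i` to the next. A rough CPU sketch of that recurrence (my own naive NumPy rendering for illustration, not STUMPY's actual code) is:

```python
import numpy as np

def sliding_dot_products(Q, T):
    """Naive O(n*m) dot products of query Q against every length-m window of T."""
    m = len(Q)
    return np.array([np.dot(Q, T[j : j + m]) for j in range(len(T) - m + 1)])

def update_QT(QT_prev, T, m, i):
    """Advance QT from row i-1 to row i in O(n) via the recurrence
    QT[i, j] = QT[i-1, j-1] - T[i-1]*T[j-1] + T[i+m-1]*T[j+m-1]."""
    l = len(T) - m + 1
    QT = np.empty(l)
    QT[1:] = QT_prev[: l - 1] - T[i - 1] * T[: l - 1] + T[i + m - 1] * T[m : l + m - 1]
    QT[0] = np.dot(T[i : i + m], T[:m])  # first column has no predecessor
    return QT

rng = np.random.default_rng(0)
T = rng.random(64)
m = 8
QT = sliding_dot_products(T[:m], T)  # row i = 0, computed directly
QT = update_QT(QT, T, m, 1)          # row i = 1, via the O(n) recurrence
assert np.allclose(QT, sliding_dot_products(T[1 : 1 + m], T))
```

Ping-ponging between two buffers (`even`/`odd`) lets each iteration read the previous row's values while writing the current row's.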
STUMPY Installation
Technically, STUMPY can be `pip`/`conda` installed, but since we are interested in injecting code, let's clone and install from source:
```shell
git clone https://github.com/TDAmeritrade/stumpy.git ./stumpy.git
cd stumpy.git
python -m pip install .
```
For GPUs, it is expected that the appropriate NVIDIA driver and cudatoolkit are also installed.
STUMPY on GPUs
Below is a simple example that runs the `gpu_stump` function on a small data set. `gpu_stump` is just a light wrapper that does a tiny bit of pre-work and then hands off the computation to the `_compute_and_update_PI_kernel` GPU kernel:
```python
import stumpy
import numpy as np

T = np.random.rand(10000)
m = 50

gpu_mp = stumpy.gpu_stump(T, m)
```
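Part of that pre-work is computing per-window statistics (inferred from the kernel's `device_M_T`, `device_Σ_T`, `device_μ_Q`, and `device_σ_Q` arguments above; this is a naive NumPy sketch, not STUMPY's actual implementation):

```python
import numpy as np

def rolling_mean_std(T, m):
    """Mean and standard deviation of every length-m window of T."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    return windows.mean(axis=1), windows.std(axis=1)

T = np.arange(10, dtype=float)
M_T, Sigma_T = rolling_mean_std(T, 4)
assert np.allclose(M_T[:2], [1.5, 2.5])  # means of [0..3] and [1..4]
```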
Validating STUMPY GPU Results
Since we have both CPU and GPU implementations in STUMPY, we can validate that the GPU output is correct by comparing it against our CPU code (which is practical for small- to medium-sized data, i.e., less than 100K in length):
```python
import stumpy
import numpy as np
import numpy.testing as npt

T = np.random.rand(10000)
m = 50

gpu_mp = stumpy.gpu_stump(T, m)
cpu_mp = stumpy.stump(T, m)

npt.assert_almost_equal(gpu_mp, cpu_mp)
```
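Both code paths ultimately store z-normalized Euclidean distances in the matrix profile. As a sanity check on what is being compared, here is a minimal NumPy sketch (my own, using the standard Pearson-correlation identity rather than any STUMPY internals) showing that the distance derived from a dot product `QT` matches the direct computation:

```python
import numpy as np

def znorm_dist_direct(a, b):
    """Z-normalize both subsequences, then take the Euclidean distance."""
    az = (a - a.mean()) / a.std()
    bz = (b - b.mean()) / b.std()
    return np.linalg.norm(az - bz)

def znorm_dist_from_QT(a, b):
    """Same distance, derived from the dot product via d = sqrt(2m(1 - rho))."""
    m = len(a)
    QT = np.dot(a, b)
    rho = (QT - m * a.mean() * b.mean()) / (m * a.std() * b.std())
    return np.sqrt(2 * m * (1 - rho))

rng = np.random.default_rng(42)
a, b = rng.random(50), rng.random(50)
assert np.isclose(znorm_dist_direct(a, b), znorm_dist_from_QT(a, b))
```

This identity is why the kernel only needs the dot products (`QT`) plus the window means and standard deviations to update the profile.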