Improve GPU-STUMP Speed #245

[This follows from a discussion on the Numba discourse channel](https://numba.discourse.group/t/understanding-cuda-stream/216)

Currently, STUMPY can be executed on both CPUs and GPUs. Given the same input, both should produce the same results, which is useful for validation. The GPU implementation is essentially self-contained in the [`gpu_stump.py`](https://github.com/TDAmeritrade/stumpy/blob/master/stumpy/gpu_stump.py) module. In fact, all of the heavy lifting occurs in [repeatedly launching a single GPU kernel](https://github.com/TDAmeritrade/stumpy/blob/164368e97e2acae5f882a3acb5e7caefc144b7a0/stumpy/gpu_stump.py#L343-L362) called [`_compute_and_update_PI_kernel`](https://github.com/TDAmeritrade/stumpy/blob/164368e97e2acae5f882a3acb5e7caefc144b7a0/stumpy/gpu_stump.py#L17-L168):

```
for i in range(range_start, range_stop):
    _compute_and_update_PI_kernel[blocks_per_grid, threads_per_block](
        i,
        device_T_A,
        device_T_B,
        m,
        device_QT_even,
        device_QT_odd,
        device_QT_first,
        device_M_T,
        device_Σ_T,
        device_μ_Q,
        device_σ_Q,
        k,
        ignore_trivial,
        excl_zone,
        device_profile,
        device_indices,
        True,
    )
```
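To make the launch pattern easier to reason about, here is a minimal CPU-only sketch of the same loop structure for a self-join. It is not the actual kernel: it assumes, from the argument names, that `device_QT_even` and `device_QT_odd` act as alternating (ping-pong) buffers for the sliding dot product and that `device_QT_first` holds the first row. The helper names `sliding_dot`, `update_row`, and `all_rows` are hypothetical, and the distance and profile updates are left out entirely.

```
def sliding_dot(Q, T):
    """Brute-force dot product of Q with every length-len(Q) window of T."""
    m = len(Q)
    return [sum(q * t for q, t in zip(Q, T[j:j + m])) for j in range(len(T) - m + 1)]


def update_row(i, T, m, QT_in, QT_out, QT_first):
    """Stand-in for one kernel launch: build row i of QT from row i - 1."""
    for j in range(len(QT_out)):  # on a GPU, each j would map to one thread
        if j == 0:
            # Self-join symmetry: QT[i][0] equals QT[0][i]
            QT_out[j] = QT_first[i]
        else:
            # Drop the oldest product, add the newest one
            QT_out[j] = (
                QT_in[j - 1]
                - T[i - 1] * T[j - 1]
                + T[i + m - 1] * T[j + m - 1]
            )


def all_rows(T, m):
    """Compute every row of QT using two alternating buffers."""
    n = len(T) - m + 1
    QT_first = sliding_dot(T[:m], T)  # row 0
    QT_even = list(QT_first)          # holds even-numbered rows
    QT_odd = [0.0] * n                # holds odd-numbered rows
    rows = [list(QT_even)]
    for i in range(1, n):
        # Row i reads the buffer written by row i - 1 and writes the other one
        if i % 2 == 1:
            src, dst = QT_even, QT_odd
        else:
            src, dst = QT_odd, QT_even
        update_row(i, T, m, src, dst, QT_first)
        rows.append(list(dst))
    return rows
```

Under these assumptions, each iteration reads what the previous iteration wrote, so the launches have to run one after another; only the work inside a single launch is parallel.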

### STUMPY Installation

Technically, STUMPY can be installed with `pip` or `conda`, but since we are interested in injecting code, let's clone and install from source:

```
git clone https://github.com/TDAmeritrade/stumpy.git ./stumpy.git
cd stumpy.git
python -m pip install .
```

For GPU support, the appropriate NVIDIA driver and `cudatoolkit` must also be installed.
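Before running anything on the GPU, it can help to confirm that Numba actually sees a CUDA device. The sketch below uses Numba's `cuda.is_available()`; the wrapper name `cuda_is_available` is hypothetical and only exists so that the check also works on a machine without Numba installed.

```
def cuda_is_available():
    """Return True if Numba can see a usable CUDA GPU, False otherwise."""
    try:
        from numba import cuda
    except ImportError:
        return False  # Numba itself is not installed
    return bool(cuda.is_available())


print("CUDA GPU available:", cuda_is_available())
```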

### STUMPY on GPUs

Below is a simple example that runs the `gpu_stump` function on a small data set. `gpu_stump` is just a light wrapper that does a tiny bit of pre-work and then hands off the computation to the `_compute_and_update_PI_kernel` GPU kernel:

```
import stumpy
import numpy as np

T = np.random.rand(10000)
m = 50

gpu_mp = stumpy.gpu_stump(T, m)
```
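Since the goal is to improve speed, a small timing helper makes it easier to compare runs before and after a change. This is only a sketch: `time_call` is a hypothetical helper, not part of STUMPY, and it simply records wall-clock time per call. The first timing of a Numba-compiled function includes compilation, so it is best read separately from the rest.

```
import time


def time_call(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times and return the wall-clock time of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return timings


# Example usage (requires a CUDA-capable GPU):
# timings = time_call(stumpy.gpu_stump, T, m)
# print("first call (includes compilation):", timings[0])
# print("best subsequent call:", min(timings[1:]))
```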

### Validating STUMPY GPU Results

Since STUMPY has both CPU and GPU implementations, we can validate that the GPU output is correct using our CPU code (which is appropriate for small to medium-sized data with fewer than 100K data points):

```
import stumpy
import numpy as np
import numpy.testing as npt

T = np.random.rand(10000)
m = 50

gpu_mp = stumpy.gpu_stump(T, m)
cpu_mp = stumpy.stump(T, m)

npt.assert_almost_equal(gpu_mp, cpu_mp)
```
