
Improve GPU-STUMP Speed #245

Closed
@seanlaw


This follows from a discussion on the Numba Discourse channel.

Currently, STUMPY can run on both CPUs and GPUs. Given the same input, both should produce the same results, which is useful for validation. The GPU code is essentially self-contained in the gpu_stump.py module. In fact, all of the heavy lifting comes from repeatedly launching a single GPU kernel called _compute_and_update_PI_kernel:

        for i in range(range_start, range_stop):
            _compute_and_update_PI_kernel[blocks_per_grid, threads_per_block](
                i,
                device_T_A,
                device_T_B,
                m,
                device_QT_even,
                device_QT_odd,
                device_QT_first,
                device_M_T,
                device_Σ_T,
                device_μ_Q,
                device_σ_Q,
                k,
                ignore_trivial,
                excl_zone,
                device_profile,
                device_indices,
                True,
            )
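To make the shape of that loop easier to follow, here is a minimal CPU-only sketch of the "ping-pong" pattern that the QT_even / QT_odd / QT_first arguments suggest: each iteration reads the previous row of sliding dot products from one buffer and writes the current row into the other. This is an illustration under assumptions, not the actual kernel. The function name sliding_dot_product_rows is hypothetical, the recurrence shown is the standard STOMP-style update for a self-join, and the real kernel additionally computes distances and updates the profile and indices.

```python
def sliding_dot_product_rows(T, m):
    """Yield (i, QT_i), where QT_i[j] = dot(T[i:i+m], T[j:j+m]).

    Hypothetical CPU analogue of launching one kernel per row `i`,
    alternating between two buffers (self-join case only).
    """
    n_windows = len(T) - m + 1

    # Row 0 is computed directly; it is also kept around (QT_first)
    # because it supplies the j = 0 entry of every later row.
    QT_first = [sum(T[k] * T[j + k] for k in range(m)) for j in range(n_windows)]
    QT_even = QT_first[:]
    QT_odd = [0] * n_windows
    yield 0, QT_even[:]

    for i in range(1, n_windows):
        # Ping-pong: the buffer written in the previous iteration is read now.
        if i % 2 == 0:
            QT_out, QT_in = QT_even, QT_odd
        else:
            QT_out, QT_in = QT_odd, QT_even

        # Each j is independent of every other j, which is why this inner
        # loop maps naturally onto one GPU thread per j.
        for j in range(1, n_windows):
            QT_out[j] = (
                QT_in[j - 1]
                - T[i - 1] * T[j - 1]
                + T[i + m - 1] * T[j + m - 1]
            )
        # For a self-join, dot(T[i:i+m], T[0:m]) equals QT_first[i].
        QT_out[0] = QT_first[i]
        yield i, QT_out[:]
```

The outer loop over i is inherently sequential because row i depends on row i - 1, so in this sketch only the inner loop over j can run in parallel. That matches the one-launch-per-i structure shown above.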

STUMPY Installation

Technically, STUMPY can be installed with pip or conda, but since we are interested in injecting code, let's clone the repository and install from source:

git clone https://github.com/TDAmeritrade/stumpy.git ./stumpy.git
cd stumpy.git
python -m pip install .

For GPUs, the appropriate NVIDIA driver and cudatoolkit must also be installed.
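A quick way to check that the driver and toolkit are visible to Numba before going any further is sketched below. The helper name cuda_gpu_available is hypothetical. It relies on numba.cuda.is_available() and returns False rather than raising when Numba is not installed or CUDA cannot be initialized.

```python
def cuda_gpu_available():
    """Return True if Numba reports a usable CUDA GPU, else False."""
    try:
        from numba import cuda
    except ImportError:
        # Numba itself is missing, so no GPU code can run.
        return False
    try:
        return bool(cuda.is_available())
    except Exception:
        # Driver or toolkit problems can surface as errors during detection.
        return False


if __name__ == "__main__":
    print("CUDA GPU available:", cuda_gpu_available())
```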

STUMPY on GPUs

Below is a simple example that runs the gpu_stump function on a small data set. gpu_stump is just a light wrapper that does a tiny bit of pre-work and then hands off the computation to the _compute_and_update_PI_kernel GPU kernel:

import stumpy
import numpy as np

T = np.random.rand(10000)
m = 50

gpu_mp = stumpy.gpu_stump(T, m)

Validating STUMPY GPU Results

Since STUMPY has both CPU and GPU implementations, we can validate the GPU output against our CPU code (which is appropriate for small to medium-sized data sets of fewer than 100K data points):

import stumpy
import numpy as np
import numpy.testing as npt

T = np.random.rand(10000)
m = 50

gpu_mp = stumpy.gpu_stump(T, m)
cpu_mp = stumpy.stump(T, m)

npt.assert_almost_equal(gpu_mp, cpu_mp)
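Because the goal of this issue is speed, it may also help to time any candidate change in a consistent way. The helper below is a minimal sketch. The name best_time and the choice of three repeats are assumptions, not part of STUMPY. It makes one untimed warm-up call first, since the first call to a Numba-compiled function includes JIT compilation time, and then reports the fastest of the timed runs.

```python
import time


def best_time(func, *args, repeats=3):
    """Return the fastest wall-clock time (seconds) of `repeats` calls."""
    func(*args)  # warm-up call, excluded from the timings

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Intended use with the variables from the example above (requires a GPU):
# gpu_seconds = best_time(stumpy.gpu_stump, T, m)
# cpu_seconds = best_time(stumpy.stump, T, m)
```

Taking the minimum rather than the mean reduces the influence of unrelated system noise, although for GPU work it is still worth repeating the measurement on larger inputs than the validation example uses.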


Labels: enhancement (New feature or request), question (Further information is requested)
