
Gaps in GPU Reduction #153

Closed

@PokhodenkoSA

Description

"Evaluate the GPU Reduction features that are provided by numba.cuda. We currently do not have anything similar and the output of this step should be a design to support a similar @reduce decorator for numba-dppy.

Links:

Related issues:

Features:

  1. Reduction kernels (convert a simple binary operation into a reduction kernel)
from numba import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

res = sum_reduce(arr)  # arr: a NumPy array or CUDA device array

numba-dppy does not provide a decorator for reductions.
From the HowTo:

This can be implemented by invoking the kernel once, but that requires support for local device memory and barrier, which is a work in progress.

See example.
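
For comparison, below is a minimal sketch of the kind of boilerplate that @cuda.reduce hides: a hand-written per-block tree reduction in numba.cuda that relies on shared (local) memory and a barrier, the same two features noted above as a work in progress for numba-dppy. The kernel name, block size, and the host-side final combine are illustrative choices, not part of any existing API.

from numba import cuda, float64
import numpy as np

TPB = 128  # threads per block (illustrative choice)

@cuda.jit
def block_sum_kernel(arr, partial):
    # Shared memory plus a barrier: the features a @reduce decorator
    # would have to generate for numba-dppy as well.
    sdata = cuda.shared.array(TPB, dtype=float64)
    tid = cuda.threadIdx.x
    i = cuda.grid(1)
    sdata[tid] = arr[i] if i < arr.size else 0.0
    cuda.syncthreads()
    # Tree reduction inside the block.
    s = TPB // 2
    while s > 0:
        if tid < s:
            sdata[tid] += sdata[tid + s]
        cuda.syncthreads()
        s //= 2
    if tid == 0:
        partial[cuda.blockIdx.x] = sdata[0]

arr = np.arange(100_000, dtype=np.float64)
blocks = (arr.size + TPB - 1) // TPB
partial = np.zeros(blocks, dtype=np.float64)
block_sum_kernel[blocks, TPB](arr, partial)
res = partial.sum()  # final combine done on the host for brevity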
2. Support lambdas

sum_reduce = cuda.reduce(lambda a, b: a + b)

numba-dppy: no equivalent.
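
For reference, a minimal usage sketch of the numba.cuda lambda form (here a max reduction, purely as an illustration):

from numba import cuda
import numpy as np

# cuda.reduce also accepts a plain lambda instead of a decorated function.
max_reduce = cuda.reduce(lambda a, b: a if a > b else b)

arr = np.random.rand(1024)
res = max_reduce(arr)  # equals arr.max()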
3. Possible parameters

Parameter                         | CUDA | DPPY
--------------------------------- | ---- | -------------
Works with host and device arrays | Yes  | ?
Size of array                     | Yes  | ?
Return value or output parameter  | Yes  | ?
Initial value                     | Yes  | ?
Pin to stream                     | Yes  | Pin to queue?
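
To make the table concrete, here is a sketch of those parameters as exposed by numba.cuda's Reduce object (keyword names size, res, init, and stream as documented for numba.cuda); the open question for numba-dppy is what the DPPY column would look like, e.g. a dpctl queue in place of a stream.

from numba import cuda
import numpy as np

@cuda.reduce
def sum_reduce(a, b):
    return a + b

arr = np.arange(1024, dtype=np.float64)
d_arr = cuda.to_device(arr)              # device array input also works
stream = cuda.stream()

res = sum_reduce(arr)                    # host array, result returned
res = sum_reduce(d_arr, size=512)        # reduce only the first 512 elements
res = sum_reduce(arr, init=10.0)         # initial value folded into the sum

out = cuda.device_array(1, dtype=np.float64)
sum_reduce(d_arr, res=out, stream=stream)  # result written to out[0], pinned to a stream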

Side gap:

  • Support for local device memory and barrier

Questions:

We provide all the features needed to write reductions by hand, but we should also provide a @reduce decorator.
Writing a reduction by hand is harder; @reduce would autogenerate the boilerplate code.

Another missing feature is support for reductions inside parfors. We should wait for MLIR support for this.
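
To illustrate what "reductions inside parfors" refers to, here is the standard numba parallel=True pattern on the CPU via prange. This is plain numba, not a numba-dppy feature; it is shown only as the pattern the MLIR-based backend would eventually need to support.

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def parfor_sum(a):
    s = 0.0
    for i in prange(a.size):
        s += a[i]   # numba recognizes this as a parfor reduction
    return s

arr = np.arange(1024, dtype=np.float64)
print(parfor_sum(arr))  # 523776.0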

Missing features:

Missing example:
