Description
High-level objective:
- Provide users familiar with `numba.cuda` an easy guide to start using `numba-dppy`. At the end of this gap analysis we should be able to provide use cases showing how a `numba.cuda` program can be translated into a `numba-dppy` program (see the sketch after this list).
- Identify features that are supported in `numba.cuda` but are not yet supported by `numba-dppy`.
- Open separate tickets to track the design of the missing features in `numba-dppy`.
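As a first taste of the kind of transliteration the guide should demonstrate, here is a minimal sketch of the same vector-add kernel in both APIs. The `numba-dppy` half assumes the current `@dppy.kernel`, `dppy.get_global_id`, `dppy.DEFAULT_LOCAL_SIZE`, and `dpctl.device_context` spellings; treat it as illustrative, not normative.

```python
import numpy as np

# numba.cuda version
from numba import cuda

@cuda.jit
def vec_add_cuda(a, b, c):
    i = cuda.grid(1)
    if i < c.size:
        c[i] = a[i] + b[i]

# numba-dppy version (assumed API)
import numba_dppy as dppy
import dpctl

@dppy.kernel
def vec_add_dppy(a, b, c):
    i = dppy.get_global_id(0)
    c[i] = a[i] + b[i]

a = np.arange(1024, dtype=np.float32)
b = np.ones_like(a)
c = np.empty_like(a)

vec_add_cuda[4, 256](a, b, c)  # CUDA: grid/block launch configuration

with dpctl.device_context("opencl:gpu"):  # dppy: device chosen via context
    vec_add_dppy[1024, dppy.DEFAULT_LOCAL_SIZE](a, b, c)  # global/local NDRange
```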
Detailed Goals
Examples of documentation to orient towards:
- Produce a guide and documentation similar to https://developer.codeplay.com/products/computecpp/ce/guides/sycl-for-cuda-developers. We should eventually include the documentation in a future gh-pages site for numba-dppy.
- Produce a guide for numba-dppy similar to https://numba.pydata.org/numba-doc/dev/cuda/index.html
Sections to analyze:
- Study the Writing a CUDA Kernel section and document with examples the equivalent features in `numba-dppy`. (Gaps in writing kernel features #157)
- Study the Memory Management section and document with examples the equivalent features in `numba-dppy`. Identify missing features, e.g. device arrays. (Gap analysis for Memory Management #151)
- Study the Writing Device Functions section and document the equivalent feature in `numba-dppy`. (Gaps in Writing Device Functions #152)
- Evaluate whether the Supported Python features in CUDA Python are currently supported by `numba-dppy`. (Gaps in supported Python features in CUDA Python #155)
- Evaluate whether the Supported Atomic Operations are currently supported by `numba-dppy`. (Gaps in supported Atomics Operation #156)
- Evaluate the RNG feature supported by `numba.cuda` and develop a plan for supporting similar functionality in `dppy.kernel`. (Gaps in RNG function supported by numba_dpex.kernel v/s numba.cuda #159)
- Evaluate the debugging features supported by `numba.cuda`. We probably do not need a simulator feature, since for us a `dppy.kernel` can be debugged by changing the `dpctl.device_context` to CPU. But the rest of the debugging functionality should be evaluated. (Gaps in Debugging features #158)
- Evaluate the GPU reduction features provided by `numba.cuda`. We currently do not have anything similar, and the output of this step should be a design to support a similar `@reduce` decorator in `numba-dppy` (see the `@cuda.reduce` sketch after this list). (Gaps in GPU Reduction #153)
- Evaluate the level of support for the Vectorize and GUVectorize functions in `numba.cuda`. We do support `@vectorize`, but support for `@guvectorize` is missing. (Gaps in Vectroize and GUVectorize #154)
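For the reduction item above, the feature being matched is `numba.cuda`'s `@cuda.reduce` decorator. As a reminder of what it looks like on the CUDA side:

```python
import numpy as np
from numba import cuda

@cuda.reduce
def sum_reduce(a, b):
    # Only the binary operation is written by the user;
    # numba.cuda builds the full GPU reduction around it.
    return a + b

A = np.arange(1024, dtype=np.float64)
expect = A.sum()     # NumPy reduction on the host
got = sum_reduce(A)  # equivalent reduction run on the GPU
assert expect == got
```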
Other topics:
- Evaluate how to support pipelined asynchronous execution of GPU kernels to overlap compute and host-device data movement; a sketch of the pattern follows below. Refer to the example in the following comment. (Add support for pipelined GPU kernel execution to numba-dpex #147)
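The `numba.cuda` pattern in question looks roughly like the following sketch: chunked transfers and kernel launches are queued on independent streams so copies and compute can overlap. Pinned host memory is required for the copies to actually be asynchronous.

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, factor):
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= factor

nchunks, chunk = 4, 1 << 18
host = cuda.pinned_array((nchunks, chunk), dtype=np.float32)  # pinned for async copies
host[:] = 1.0
streams = [cuda.stream() for _ in range(nchunks)]

for s, h in zip(streams, host):
    d = cuda.to_device(h, stream=s)      # H2D copy queued on this stream
    scale[chunk // 256, 256, s](d, 2.0)  # kernel queued behind the copy
    d.copy_to_host(h, stream=s)          # D2H copy queued behind the kernel
cuda.synchronize()                       # wait for all streams to drain
```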
The goal of this exercise is to identify the features that are missing and need to be added, and to develop a guide that makes it easier for users to start using `numba-dppy`.
Not supported topics:
Not all features provided by `numba.cuda` are relevant or necessary for `numba-dppy`. An example is the support for NumPy functions inside kernels: CUDA supports it, but `numba-dppy` should not, as it is really an anti-pattern (#146).
Topics for dpctl:
Some of the sections from the Numba for CUDA GPUs documentation are not relevant to `numba-dppy` and should, in our case, be handled in `dpctl`:
- Device Management (Gaps in Device Management dpctl#240)
- External Memory (Gaps in External Memory dpctl#252)
- Sharing CUDA Memory (Gaps in IPC for Device Memory dpctl#245)
Sources of information:
- Docs
- Tests
- Examples (where are the examples for `numba.cuda`?)
Acceptance criteria for analysis:
- Tickets for missing features
- Comparison/transition guide from `numba.cuda` to `numba-dppy`
- Examples for `numba-dppy`
- Documentation: explanation of the feature
Documentation should contain:
- Explanation of the feature
- Examples for `numba-dppy`
- Missing features in `numba-dppy`
- Transition from `numba.cuda` to `numba-dppy`
- Limitations of `numba-dppy`
Missing features
- Memory management:
  - Data transfer: `device_array`, `device_array_like`, `to_device`, `as_dppy_array`, `is_dppy_array` (It is necessary to implement the elements of the data transfer #162)
  - Device array (Implement mechanism of the Device arrays #163)
  - Local memory (Implement SYCL local memory #164)
  - Private memory (Implement SYCL Private memory #165)
  - Constant memory (Implement SYCL Constant memory #166)
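  For orientation, a minimal sketch of the `numba.cuda` data-transfer idiom that these tickets would mirror:

  ```python
  import numpy as np
  from numba import cuda

  a = np.arange(10, dtype=np.float32)
  d_a = cuda.to_device(a)              # explicit host-to-device transfer
  d_out = cuda.device_array_like(d_a)  # uninitialized device-resident array
  # ... launch kernels on d_a / d_out ...
  result = d_out.copy_to_host()        # explicit device-to-host transfer
  print(cuda.is_cuda_array(d_a))       # True: d_a lives on the device
  ```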
- Writing device functions:
  - `dppy.jit` (Rename dppy.kernel and dppy.func to dppy.jit #189)
  - Call from `@numba.jit` (Support for Calling Device Functions from numba.jit functions #205)
  - Call from `@vectorize` (Support for Calling Device Functions from Ufuncs #193)
- Supported Python features in CUDA Python:
  - Statements: raise, assert (Make python statements available in the kernel #169)
  - Built-in types: complex, bool, None, tuple (Implement support for types complex, bool, None, tuple on the kernel #170)
  - Built-in functions: complex, enumerate, min, max, zip (Implement support for built-in python functions on the kernel #171)
  - cmath library (Make the cmath library working on the kernel #172)
  - Operators: &, &=, <<=, ~=, |=, >>=, ^=, >>, ^ (Add support for all operators in kernels #178)
- Supported atomic operations:
  - min, max, nanmin, nanmax, compare_and_swap for int, float, and uint types (Add support for atomic operations inside a kernel #161)
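  The `numba.cuda` counterpart for reference (`cuda.atomic.max` shown; the ticket tracks the equivalent operations for `numba-dppy`):

  ```python
  import numpy as np
  from numba import cuda

  @cuda.jit
  def max_kernel(result, values):
      i = cuda.grid(1)
      if i < values.size:
          cuda.atomic.max(result, 0, values[i])  # atomic read-modify-write

  values = np.random.rand(16384).astype(np.float64)
  result = np.zeros(1, dtype=np.float64)
  max_kernel[64, 256](result, values)
  assert result[0] == values.max()
  ```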
- Random number generation:
  - RNG in `dppy.kernel` (Random number generation on GPU #202)
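  For reference, the `numba.cuda` RNG feature being evaluated:

  ```python
  import numpy as np
  from numba import cuda
  from numba.cuda.random import (create_xoroshiro128p_states,
                                 xoroshiro128p_uniform_float32)

  @cuda.jit
  def fill_uniform(states, out):
      i = cuda.grid(1)
      if i < out.size:
          out[i] = xoroshiro128p_uniform_float32(states, i)  # per-thread stream

  threads, blocks = 256, 64
  states = create_xoroshiro128p_states(threads * blocks, seed=1)
  out = np.zeros(threads * blocks, dtype=np.float32)
  fill_uniform[blocks, threads](states, out)
  ```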
- Debugging:
  - `@dppy.kernel(debug=True)` (Enable debug option inside dppy.kernel #174)
  - pdb (Implement changing the dpctl.device_context to CPU while debugging #198)
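  A minimal sketch of the CPU-context debugging idea from #198, assuming the `dpctl.device_context` filter-string API:

  ```python
  import numpy as np
  import dpctl
  import numba_dppy as dppy

  @dppy.kernel
  def twice(a):
      i = dppy.get_global_id(0)
      a[i] *= 2

  a = np.arange(8, dtype=np.float32)
  # Re-running the same kernel under a CPU queue lets standard host-side
  # tools (e.g. pdb, printf-style checks) be used while debugging.
  with dpctl.device_context("opencl:cpu"):
      twice[a.size, dppy.DEFAULT_LOCAL_SIZE](a)
  ```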
- GPU reduction:
  - `@reduce` (@reduce decorator #182)
- Vectorize and GUVectorize:
  - `@guvectorize` (Support @guvectorize #192)
  - Calling device functions (Support for Calling Device Functions from Ufuncs #193)
  - Intra-device arrays (Support for passing intra-device arrays to Ufuncs #194)
  - Asynchronous launching (Support for launching Ufuncs asynchronously #195)
  - Control of thread block size (Support for explicit control maximum size of the thread block for Ufuncs #196)
  - Offload diagnostics (SYCL level diagnostics for offloading #207)
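  For reference, the `numba.cuda` form of the missing `@guvectorize` support:

  ```python
  import numpy as np
  from numba import guvectorize

  # target='cuda' compiles the generalized ufunc for the GPU; the dppy
  # equivalent is what issue #192 tracks.
  @guvectorize(['void(float32[:], float32[:], float32[:])'],
               '(n),(n)->(n)', target='cuda')
  def vec_add(a, b, out):
      for i in range(a.shape[0]):
          out[i] = a[i] + b[i]

  a = np.arange(1024, dtype=np.float32)
  b = np.ones_like(a)
  print(vec_add(a, b)[:4])
  ```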
- Device management:
  - Multi-GPU machines (Support for multi-GPU machines (sort devices by performance) dpctl#246)
  - Functions for selecting device (Functions for selecting current device dpctl#247)
  - Device list (Functions for enumerating devices dpctl#248)
- External memory management:
  - Implementing an EMM Plugin (dpctl#254)
  - Implement plugin for EMM Deallocation Behavior (dpctl#253)
  - Implement The Host-Only Memory Manager (dpctl#255)
  - Implement Memory Pointers (#699)
  - Implement Memory Info (dpctl#257)
  - Implement IPC (dpctl#258)
- Sharing device memory:
  - IPC for Device Memory (Support IPC for Device Memory dpctl#249)
- Pipelined asynchronous execution of GPU kernels to overlap compute and host-device data movement:
  - Pipelined execution (Add support for pipelined GPU kernel execution to numba-dpex #147)
Missing examples
- Reduction:
  - Sum reduction in one call (Example for sum reduction with local memory and barriers #186); a reference sketch follows below
- RNG:
  - RNG via dpNP (Example for RNG on GPU #203)
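For the missing sum-reduction example (#186), the `numba.cuda` idiom it would mirror combines block-local (shared) memory with barriers. A sketch:

```python
import numpy as np
from numba import cuda, float32

TPB = 128  # threads per block; frozen into the kernel as a constant

@cuda.jit
def block_sum(x, partial):
    sm = cuda.shared.array(TPB, dtype=float32)  # block-local scratch memory
    tid = cuda.threadIdx.x
    i = cuda.grid(1)
    sm[tid] = x[i] if i < x.size else 0.0
    cuda.syncthreads()                          # barrier: all loads visible
    s = TPB // 2
    while s > 0:                                # tree reduction within a block
        if tid < s:
            sm[tid] += sm[tid + s]
        cuda.syncthreads()
        s //= 2
    if tid == 0:
        partial[cuda.blockIdx.x] = sm[0]        # one partial sum per block

x = np.ones(1 << 20, dtype=np.float32)
blocks = (x.size + TPB - 1) // TPB
partial = np.zeros(blocks, dtype=np.float32)
block_sum[blocks, TPB](x, partial)
assert partial.sum() == x.size
```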