Description
Study the Memory Management section of the numba.cuda documentation and document, with examples, the equivalent features in numba-dppy. Identify missing features, e.g. device arrays.
- Data transfer
CUDA | DPPY |
---|---|
numba.cuda.device_array | - |
numba.cuda.device_array_like | - |
numba.cuda.to_device | - |
numba.cuda.as_cuda_array (create a DeviceNDArray from any object that implements the CUDA Array Interface) | - |
numba.cuda.is_cuda_array | - |
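A minimal sketch of the contrast, assuming the 0.x-era numba-dppy/dpctl stack (the numba_dppy module name, dpctl.device_context, and the "opencl:gpu" selector are assumptions about the installed environment): CUDA transfers data explicitly through device arrays, while numba-dppy has no device-array API and transfers NumPy arrays implicitly at kernel launch.

```python
import numpy as np
import dpctl
import numba_dppy as dppy
from numba import cuda

# CUDA: explicit host<->device transfer via device arrays
a = np.arange(1024, dtype=np.float32)
d_a = cuda.to_device(a)        # copy host -> device, returns a DeviceNDArray
h_a = d_a.copy_to_host()       # copy device -> host

# DPPY: no device-array API; arrays move implicitly at kernel launch
@dppy.kernel
def twice(x):
    i = dppy.get_global_id(0)
    x[i] *= 2

with dpctl.device_context("opencl:gpu"):
    twice[1024, dppy.DEFAULT_LOCAL_SIZE](a)  # a is copied in and written back
```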
- Device arrays in CUDA
CUDA | DPPY |
---|---|
numba.cuda.cudadrv.devicearray.DeviceNDArray | - |
copy_to_host | - |
is_c_contiguous | - |
is_f_contiguous | - |
ravel | - |
reshape | - |
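For reference, a short sketch of the DeviceNDArray methods listed above on the CUDA side; numba-dppy currently offers no device-array counterpart:

```python
import numpy as np
from numba import cuda

d = cuda.to_device(np.zeros((4, 4), dtype=np.float32))  # a DeviceNDArray
print(d.is_c_contiguous())   # True
print(d.is_f_contiguous())   # False
flat = d.ravel()             # flattened view, stays on the device
r = d.reshape(2, 8)          # reshaped without a round trip to the host
host = r.copy_to_host()      # explicit device -> host copy
```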
- Pinned memory in CUDA / no explicit mechanism to request pinned memory in SYCL
There are generally two ways in which host memory can be allocated:
  * When the cl::sycl::property::buffer::use_host_ptr property is not used, the SYCL runtime allocates host memory when required. This uses an implementation-specific mechanism, which may attempt to request pinned memory.
  * When the cl::sycl::property::buffer::use_host_ptr property is used, the SYCL runtime does not allocate host memory and instead uses the pointer provided when the buffer is constructed. In this case it is the user's responsibility to ensure that any requirements on the allocation for it to be treated as pinned memory are satisfied.

Users can also manually allocate pinned memory on the host and hand it over to the SYCL implementation. This often involves allocating host memory with a suitable alignment and size multiple, and can sometimes be managed manually using OS-specific operations such as mmap and munmap.
CUDA | DPPY |
---|---|
numba.cuda.pinned | - |
numba.cuda.pinned_array | - |
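A sketch of the two CUDA pinned-memory APIs from the table; as described above, SYCL (and therefore numba-dppy) exposes no explicit equivalent:

```python
import numpy as np
from numba import cuda

# Allocate a new page-locked (pinned) host array
buf = cuda.pinned_array(1024, dtype=np.float32)

# Temporarily pin an existing NumPy array for the duration of a transfer
a = np.arange(1024, dtype=np.float32)
stream = cuda.stream()
with cuda.pinned(a):
    d_a = cuda.to_device(a, stream=stream)  # transfer from pinned memory may be asynchronous
stream.synchronize()
```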
- Streams in CUDA / Queue in SYCL
In a similar fashion to CUDA streams, SYCL queues submit command groups for execution asynchronously. However, SYCL is a higher-level programming model, and data transfer operations are implicitly deduced from the dependencies of the kernels submitted to any queue. Furthermore, SYCL queues can map to multiple OpenCL queues, enabling transparent overlapping of data-transfer and kernel execution. The SYCL runtime handles the execution order of the different command groups (kernel + dependencies) automatically across multiple queues in different devices.
CUDA | DPPY |
---|---|
numba.cuda.stream | SYCL queue (via dpctl.device_context) |
numba.cuda.default_stream | - |
numba.cuda.legacy_default_stream | - |
numba.cuda.per_thread_default_stream | - |
numba.cuda.external_stream | - |
numba.cuda.cudadrv.driver.Stream | - |
auto_synchronize | - |
synchronize | event |
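A sketch contrasting the two models, assuming the 0.x-era numba-dppy/dpctl API: on the CUDA side an explicit stream with auto_synchronize, on the DPPY side a device context whose SYCL queue receives the submitted kernels.

```python
import numpy as np
import dpctl
import numba_dppy as dppy
from numba import cuda

a = np.ones(1024, dtype=np.float32)

# CUDA: an explicit stream; auto_synchronize waits on exit from the block
stream = cuda.stream()
with stream.auto_synchronize():
    d_a = cuda.to_device(a, stream=stream)

# DPPY: kernels launched inside a device_context are submitted to that
# context's SYCL queue; the runtime orders the command groups
@dppy.kernel
def incr(x):
    i = dppy.get_global_id(0)
    x[i] += 1

with dpctl.device_context("opencl:gpu"):
    incr[1024, dppy.DEFAULT_LOCAL_SIZE](a)
```

Note that DPPY exposes no per-stream control from Python: ordering and synchronization are handled by the SYCL runtime, which is why most rows in the table have no mapping.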
- Per-block Shared memory and thread synchronization in CUDA / Local memory in SYCL
CUDA | DPPY |
---|---|
numba.cuda.shared.array | dppy.local.static_alloc |
numba.cuda.syncthreads | dppy.barrier |
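A sketch of this mapping using the identifiers from the table (the kernel body, work-group size, and device selector are illustrative):

```python
import numpy as np
import dpctl
import numba_dppy as dppy
from numba import float32

@dppy.kernel
def reverse(A):
    # work-group local memory, like numba.cuda.shared.array
    lm = dppy.local.static_alloc(64, float32)
    i = dppy.get_local_id(0)
    lm[i] = A[i]
    dppy.barrier(dppy.CLK_LOCAL_MEM_FENCE)  # like numba.cuda.syncthreads
    A[i] = lm[63 - i]

a = np.arange(64, dtype=np.float32)
with dpctl.device_context("opencl:gpu"):
    reverse[64, 64](a)
```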
- Per-thread Local memory / Private memory in SYCL
CUDA | DPPY |
---|---|
numba.cuda.local.array | - |
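A sketch of the CUDA side; in SYCL the analogue is private memory, which is simply a variable declared inside the kernel, so no dedicated numba-dppy API exists yet:

```python
from numba import cuda, float32

@cuda.jit
def poly(xs, out):
    i = cuda.grid(1)
    if i < xs.size:
        # private, per-thread scratch array
        c = cuda.local.array(3, float32)
        c[0], c[1], c[2] = 1.0, 2.0, 3.0
        out[i] = c[0] + c[1] * xs[i] + c[2] * xs[i] * xs[i]
```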
- Constant memory in CUDA / Constant memory in SYCL
CUDA | DPPY |
---|---|
numba.cuda.const.array_like | - |
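A sketch of numba.cuda.const.array_like, which copies a host array into constant memory at kernel compile time; the table shows no numba-dppy equivalent:

```python
import numpy as np
from numba import cuda

WEIGHTS = np.array([0.25, 0.5, 0.25], dtype=np.float32)

@cuda.jit
def smooth(x, out):
    w = cuda.const.array_like(WEIGHTS)  # placed in constant memory at compile time
    i = cuda.grid(1)
    if 0 < i < x.size - 1:
        out[i] = w[0] * x[i - 1] + w[1] * x[i] + w[2] * x[i + 1]
```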
- Deallocation Behavior
https://numba.pydata.org/numba-doc/dev/cuda/external-memory.html#cuda-emm-plugin
CUDA | DPPY |
---|---|
numba.cuda.defer_cleanup | - |
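A sketch of the deferred-deallocation context manager described at the link above:

```python
import numpy as np
from numba import cuda

with cuda.defer_cleanup():
    for _ in range(10):
        d_a = cuda.to_device(np.empty(2 ** 20, dtype=np.float32))
        # ... launch kernels using d_a ...
        del d_a  # deallocation is deferred until the context manager exits
```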