
Commit 8e76adb

Doc for writing kernels (#191)
1 parent 3963904 commit 8e76adb


2 files changed: +46 -0 lines


docs/dppy/index.rst

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Numba for DPPY GPUs
 .. toctree::
    :maxdepth: 2
 
+   writing_kernels.rst
    memory-management.rst
    device-functions.rst
    atomic-operations.rst

docs/dppy/writing_kernels.rst

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
Writing DPPY kernels
====================

Introduction
------------

Numba-dppy has an execution model unlike the traditional sequential model used for programming CPUs.
In Numba-dppy, the code you write is executed by many threads at once (often hundreds or thousands).
You model your solution by defining a thread hierarchy of work-groups and work-items.

Numba-dppy exposes facilities to declare and manage this hierarchy of threads.
These facilities are largely similar to those exposed by the `OpenCL language <https://www.khronos.org/opencl/>`_.
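
The following sketch is a sequential, pure-Python emulation of this execution model, meant only as an illustration; it is not numba-dppy code. ``run_ndrange`` and ``WorkItem`` are hypothetical names, and a real device runs the work-items concurrently rather than in a loop. The kernel body is written once, from the point of view of a single work-item, and runs once per work-item; every work-item belongs to exactly one work-group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical record of one work-item's position in a 1-D launch."""
    global_id: int
    local_id: int
    group_id: int


def run_ndrange(kernel, global_size, local_size, *args):
    """Emulate a 1-D launch by calling `kernel` once per work-item, in order.

    Simplifying assumption: uniform work-groups, i.e. global_size must be
    a multiple of local_size. A real device runs work-items concurrently.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            kernel(WorkItem(global_id, local_id, group_id), *args)


def record_group(item, out):
    # Written from the point of view of one work-item: it fills only its own slot.
    out[item.global_id] = item.group_id


groups = [None] * 8
run_ndrange(record_group, 8, 4, groups)  # 2 work-groups of 4 work-items
print(groups)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With a global size of 8 and work-groups of 4, the first four work-items fall in group 0 and the remaining four in group 1.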

Kernel declaration
------------------

A kernel function is a GPU function that is meant to be called from CPU code. This gives it
two fundamental characteristics:

- kernels cannot explicitly return a value; all result data must be written to an array passed to the function
  (if computing a scalar, you will probably pass a one-element array);
- kernels explicitly declare their thread hierarchy when called, i.e. the total number of work-items
  (global size) and the number of work-items per work-group (local size). Note that while a kernel is
  compiled once, it can be called multiple times with different global or local sizes.
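
The first rule can be illustrated with a small sequential CPU emulation, assuming a hypothetical ``launch`` helper that calls the kernel body once per work-item (this is not the numba-dppy launch API). Neither kernel returns anything: ``square`` writes one element per work-item, and ``total`` accumulates a scalar into a one-element array.

```python
import numpy as np


def launch(kernel, global_size, *args):
    """Hypothetical sequential stand-in for a 1-D launch (not the numba-dppy API)."""
    for gid in range(global_size):
        # The emulation ignores any return value; results leave only through arrays.
        kernel(gid, *args)


def square(gid, a, out):
    # Per-element result: each work-item writes its own slot of `out`.
    out[gid] = a[gid] * a[gid]


def total(gid, a, result):
    # Scalar result: accumulated into a one-element array.
    # On a device many work-items would update result[0] at once, so this
    # would need an atomic add; the sequential loop here hides that race.
    result[0] += a[gid]


a = np.arange(5, dtype=np.float64)
out = np.empty_like(a)
launch(square, a.size, a, out)

result = np.zeros(1)
launch(total, a.size, a, result)
print(out.tolist())     # [0.0, 1.0, 4.0, 9.0, 16.0]
print(result.tolist())  # [10.0]
```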

Example
~~~~~~~

.. literalinclude:: ../../numba_dppy/examples/sum.py

Kernel invocation
-----------------

A kernel is typically launched in the following way:

.. literalinclude:: ../../numba_dppy/examples/sum.py
   :pyobject: driver
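
Numba's GPU targets configure a launch by subscripting the kernel and then calling it, e.g. ``kernel[global_size, local_size](args)``. Assuming numba-dppy follows that form, the sketch below emulates the subscript-then-call pattern in plain Python to show why one compiled kernel can be launched with different sizes: subscripting only fixes the configuration, and calling runs the unchanged body once per work-item. ``EmulatedKernel`` and ``add_vectors`` are hypothetical names, not part of numba-dppy, and the emulation ignores the local size apart from checking that it divides the global size.

```python
import numpy as np


class EmulatedKernel:
    """Hypothetical CPU emulation of the kernel[global_size, local_size](args) form."""

    def __init__(self, func):
        self.func = func

    def __getitem__(self, sizes):
        # Subscripting only fixes the launch configuration ...
        global_size, local_size = sizes
        if global_size % local_size != 0:
            raise ValueError("this sketch assumes global_size % local_size == 0")

        def configured(*args):
            # ... and calling runs the same body once per work-item.
            for gid in range(global_size):
                self.func(gid, *args)

        return configured


@EmulatedKernel
def add_vectors(gid, a, b, c):
    # A real kernel would query its own position; the emulator passes it in.
    c[gid] = a[gid] + b[gid]


a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

c = np.zeros(4)
add_vectors[4, 2](a, b, c)  # 2 work-groups of 2 work-items
d = np.zeros(4)
add_vectors[4, 4](a, b, d)  # same kernel object, 1 work-group of 4
print(c.tolist())  # [11.0, 22.0, 33.0, 44.0]
```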
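
Positioning
-----------

Inside a kernel, each work-item can query its position in the thread hierarchy with the following
functions (as in OpenCL, each takes the dimension to query):

- ``dppy.get_local_id``: index of the work-item within its work-group
- ``dppy.get_local_size``: number of work-items in a work-group
- ``dppy.get_group_id``: index of the work-group the work-item belongs to
- ``dppy.get_num_groups``: number of work-groups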
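
For a one-dimensional launch with uniform work-groups and no global offset, these values are related by ``global_id = group_id * local_size + local_id`` and ``num_groups = global_size // local_size``. The sketch below tabulates, in plain Python, what each query would return for every work-item of a small launch. ``positions`` is a hypothetical helper, and the dimension argument is left out because only one dimension is modelled.

```python
def positions(global_size, local_size):
    """Hypothetical table of what each 1-D positioning query returns per work-item.

    Plain Python, not numba-dppy; assumes uniform work-groups and no global offset.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    num_groups = global_size // local_size
    table = []
    for global_id in range(global_size):
        table.append({
            "global_id": global_id,
            "get_local_id": global_id % local_size,
            "get_local_size": local_size,
            "get_group_id": global_id // local_size,
            "get_num_groups": num_groups,
        })
    return table


for row in positions(global_size=6, local_size=3):
    print(row)
```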
