  Features
  ========

- DPPL is currently implemented using OpenCL 2.1. The features currently available
+ DPPY is currently implemented using OpenCL 2.1. The features currently available
  are listed below with the help of sample code snippets. In this release we have
  the implementation of the OAK approach described in MS138 in section 4.3.2. The
  new decorator is described below.

- To access the features driver module have to be imported from numba.dppl.dppl_driver
+ To access the features driver module have to be imported from numba_dppy.dppl_driver

  New Decorator
  =============

- The new decorator included in this release is *dppl.kernel*. Currently this decorator
+ The new decorator included in this release is *numba_dppy.kernel*. Currently this decorator
  takes only one option *access_types* which is explained below with the help of an example.
  Users can write OpenCL tpye kernels where they can identify the global id of the work item
  being executed. The supported methods inside a decorated function are:

- - dppl.get_global_id(dimidx)
- - dppl.get_local_id(dimidx)
- - dppl.get_group_num(dimidx)
- - dppl.get_num_groups(dimidx)
- - dppl.get_work_dim()
- - dppl.get_global_size(dimidx)
- - dppl.get_local_size(dimidx)
+ - numba_dppy.get_global_id(dimidx)
+ - numba_dppy.get_local_id(dimidx)
+ - numba_dppy.get_group_num(dimidx)
+ - numba_dppy.get_num_groups(dimidx)
+ - numba_dppy.get_work_dim()
+ - numba_dppy.get_global_size(dimidx)
+ - numba_dppy.get_local_size(dimidx)

  Currently no support is provided for local memory in the device and everything is in the
  global memory. Barrier and other memory fences will be provided once support for local
@@ -61,7 +61,7 @@ Primitive types are passed by value to the kernel, currently supported are int,
  Math Kernels
  ============

- This release has support for math kernels. See numba/dppl/tests/dppl/test_math_functions.py
+ This release has support for math kernels. See numba_dppy/tests/dppl/test_math_functions.py
  for more details.

@@ -72,7 +72,7 @@ Examples
  Sum of two 1d arrays
  ====================

- Full example can be found at numba/dppl/examples/sum.py.
+ Full example can be found at numba_dppy/examples/sum.py.

  To write a program that sums two 1d arrays we at first need a OpenCL device environment.
  We can get the environment by using *ocldrv.runtime.get_gpu_device()* for getting the
@@ -82,7 +82,7 @@ where *device_env.copy_array_to_device(data)* will read the ndarray and copy tha
  and *ocldrv.DeviceArray(device_env.get_env_ptr(), data)* will create a buffer in the device
  that has the same memory size as the ndarray being passed. The OpenCL Kernel in the
  folllowing example is *data_parallel_sum*. To get the id of the work item we are currently
- executing we need to use the *dppl.get_global_id(0)*, since this example only 1 dimension
+ executing we need to use the *numba_dppy.get_global_id(0)*, since this example only 1 dimension
  we only need to get the id in dimension 0.

  While invoking the kernel we need to pass the device environment and the global work size.
@@ -91,9 +91,9 @@ back to the host and we can use *device_env.copy_array_from_device(ddata)*.

  .. code-block:: python

-     @dppl.kernel
+     @numba_dppy.kernel
      def data_parallel_sum(a, b, c):
-         i = dppl.get_global_id(0)
+         i = numba_dppy.get_global_id(0)
          c[i] = a[i] + b[i]

      global_size = 10
@@ -126,7 +126,7 @@ ndArray Support

  Support for passing ndarray directly to kernels is also supported.

- Full example can be found at numba/dppl/examples/sum_ndarray.py
+ Full example can be found at numba_dppy/examples/sum_ndarray.py

  For availing this feature instead of creating device buffers explicitly like the previous
  example, users can directly pass the ndarray to the kernel. Internally it will result in
@@ -148,7 +148,7 @@ Reduction

  This example will demonstrate a sum reduction of 1d array.

- Full example can be found at numba/dppl/examples/sum_reduction.py.
+ Full example can be found at numba_dppy/examples/sum_reduction.py.

  In this example to sum the 1d array we invoke the Kernel multiple times.
  This can be implemented by invoking the kernel once, but that requires
@@ -161,15 +161,15 @@ ParFor Support

  *Parallel For* is supported in this release for upto 3 dimensions.

- Full examples can be found in numba/dppl/examples/pa_examples/
+ Full examples can be found in numba_dppy/examples/pa_examples/


  =======
  Testing
  =======

- All examples can be found in numba/dppl/examples/
+ All examples can be found in numba_dppy/examples/

- All tests can be found in numba/dppl/tests/dppl and can be triggered by the following command:
+ All tests can be found in numba_dppy/tests/dppl and can be triggered by the following command:

- ``python -m numba.runtests numba.dppl.tests``
+ ``python -m numba.runtests numba_dppy.tests``
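
Note on the kernel model touched by this diff: the *data_parallel_sum* kernel in the sum example runs once per work item and uses the global id in dimension 0 as its array index. The sketch below emulates that model on the host in plain Python, so it can be read and run without an OpenCL device. It does not use numba_dppy at all. The names ``get_global_id`` (a local stand-in for ``numba_dppy.get_global_id``), ``launch``, and ``_current_global_id`` are hypothetical helpers introduced only for illustration, and the sequential loop is an assumption about semantics, not how the real driver schedules work items.

```python
# Host-side emulation of the work-item model. Assumption: this is NOT the
# numba_dppy API; every helper here is a hypothetical stand-in.

_current_global_id = [0]  # global id of the work item being emulated, per dimension


def get_global_id(dim):
    """Hypothetical stand-in for numba_dppy.get_global_id(dim)."""
    return _current_global_id[dim]


def data_parallel_sum(a, b, c):
    # Same body as the kernel in the diff, minus the decorator.
    i = get_global_id(0)
    c[i] = a[i] + b[i]


def launch(kernel, global_size, *args):
    """Call the kernel once per work item, sequentially (1 dimension only)."""
    for gid in range(global_size):
        _current_global_id[0] = gid
        kernel(*args)


global_size = 10
a = [float(i) for i in range(global_size)]
b = [1.0] * global_size
c = [0.0] * global_size

launch(data_parallel_sum, global_size, a, b, c)
```

After ``launch`` returns, each ``c[i]`` holds ``a[i] + b[i]``. On a real device the work items may run in parallel and in any order, which is why the kernel body only touches the element at its own global id.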