Python module usage
oclude can be used as a Python 3 module.
The oclude module exports the device and kernel commands as 2 separate functions.
The arguments of the following functions correspond to the arguments of the CLI, the documentation of which can be found here.
The device command corresponds to the following function:
profile_opencl_device(platform_id=0,
device_id=0,
verbose=False)
profile_opencl_device returns a dictionary with the following entries:
- profiling overhead (time) (in milliseconds, float)
- profiling overhead (percentage) (str)
- command latency (in milliseconds, float)
- device-to-device transfer latency (in milliseconds, float)
- device-to-host transfer latency (in milliseconds, float)
- host-to-device transfer latency (in milliseconds, float)
- device-device bandwidth (dict with experiments of different bytes transferred)
- device-host bandwidth (dict with experiments of different bytes transferred)
- host-device bandwidth (dict with experiments of different bytes transferred)
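As a rough illustration of consuming this dictionary, the sketch below picks out the three transfer latency entries and reports the slowest one. It assumes the dictionary keys are spelled exactly like the entry names listed above. The transfer_latencies helper is hypothetical (not part of oclude), and the numbers in example_profile are placeholders, not measurements.

```python
# Hypothetical helper, not part of oclude.
TRANSFER_LATENCY_KEYS = (
    'device-to-device transfer latency',
    'device-to-host transfer latency',
    'host-to-device transfer latency',
)


def transfer_latencies(profile):
    """Return the transfer latency entries (in milliseconds) present in profile."""
    return {key: profile[key] for key in TRANSFER_LATENCY_KEYS if key in profile}


# Placeholder values for illustration only. A real dictionary would be
# obtained with: profile = profile_opencl_device(platform_id=0, device_id=0)
example_profile = {
    'command latency': 0.5,
    'device-to-device transfer latency': 0.25,
    'device-to-host transfer latency': 0.75,
    'host-to-device transfer latency': 1.0,
}

# Pick the (name, value) pair with the largest latency.
slowest = max(transfer_latencies(example_profile).items(), key=lambda kv: kv[1])
print(slowest)
```

Looking entries up through a fixed tuple of names keeps the helper tolerant of a missing entry, at the cost of having to update the tuple if the key names differ from the assumption above.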
The kernel command corresponds to the following function:
profile_opencl_kernel(file,
kernel,
gsize,
lsize=None,
platform_id=0,
device_id=0,
samples=1,
instcounts=False,
timeit=False,
verbose=False,
clear_cache=False,
ignore_cache=False,
no_cache_warnings=False)
profile_opencl_kernel returns a dictionary with the following entries:
- original file (str)
- instrumented file (str, or None if instcounts=False)
- kernel (str)
- results (list of dicts, or None if instcounts=False and timeit=False)
If instcounts=True and/or timeit=True, results is a list of dicts whose length equals samples.
The i-th dict of the results list holds the following information about the i-th execution of the specified OpenCL kernel.
If instcounts=True:
- instcounts (dict with the whole LLVM instruction set as keys and the corresponding counts as values)
If timeit=True:
- timeit (dict with 3 entries corresponding to the measured time in the hostcode, device and transfer, in milliseconds)
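Because the instcounts dict contains every LLVM instruction, most of its values are typically 0. A minimal sketch of reducing one results entry to the instructions that were actually counted follows. The nonzero_instcounts helper is hypothetical (not part of oclude), and the sample data is an excerpt of the first results entry from the example session further down.

```python
# Hypothetical helper, not part of oclude.
def nonzero_instcounts(sample):
    """Return the counted instructions of one results entry, most frequent first."""
    counts = sample.get('instcounts') or {}
    executed = {name: n for name, n in counts.items() if n > 0}
    # Sort by count, descending; dicts preserve insertion order in Python 3.7+.
    return dict(sorted(executed.items(), key=lambda kv: kv[1], reverse=True))


# Excerpt of one results entry; the full dict lists every LLVM instruction.
sample = {'instcounts': {'add': 583, 'addrspacecast': 0, 'alloca': 10252,
                         'va_arg': 0, 'xor': 0, 'zext': 0}}

print(nonzero_instcounts(sample))
```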
Here is a usage example for the profile_opencl_kernel function:
$ python
Python 3.7.7 (default, Mar 10 2020, 15:16:38)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from oclude import *
>>> from pprint import pprint
>>>
>>>
>>> args = dict(
... file='tests/rodinia_kernels/particlefilter/particle_double.cl',
... kernel='normalize_weights_kernel',
... gsize=1024, lsize=128,
... instcounts=True,
... timeit=True,
... samples=3
... )
>>>
>>>
>>> res = profile_opencl_kernel(**args)
[oclude] WARNING: Instruction count and execution time measurement were both requested.
[oclude] This will result in the time measurement of the instrumented kernel and not the original.
[oclude] INFO: Input file tests/rodinia_kernels/particlefilter/particle_double.cl is cached
[oclude] INFO: Using cached instrumented file
[oclude] Running kernel 'normalize_weights_kernel' from file tests/rodinia_kernels/particlefilter/particle_double.cl
[hostcode] Using the following device:
[hostcode] Platform: Intel(R) OpenCL HD Graphics
[hostcode] Device: Intel(R) Gen9 HD Graphics NEO
[hostcode] Version: OpenCL 2.1 NEO
[hostcode] Kernel name: normalize_weights_kernel
[hostcode] Kernel arg 1: weights (double*, global)
[hostcode] Kernel arg 2: Nparticles (int, private)
[hostcode] Kernel arg 3: partial_sums (double*, global)
[hostcode] Kernel arg 4: CDF (double*, global)
[hostcode] Kernel arg 5: u (double*, global)
[hostcode] Kernel arg 6: seed (int*, global)
[hostcode] About to execute kernel with Global NDRange = 1024 and Local NDRange = 128
[hostcode] Number of executions (a.k.a. samples) to perform: 3
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 225.28 kernel executions/s]
[hostcode] Kernel runs completed successfully
>>>
>>>
>>> pprint(res)
{'instrumented file': '<...>/oclude/utils/.cache/instr_particle_double.cl',
'kernel': 'normalize_weights_kernel',
'original file': 'tests/rodinia_kernels/particlefilter/particle_double.cl',
'results': [{'instcounts': {'add': 583,
'addrspacecast': 0,
'alloca': 10252,
< ... a lot more ... >
'va_arg': 0,
'xor': 0,
'zext': 0},
'timeit': {'device': 5.839665999999999,
'hostcode': 7.50422477722168,
'transfer': 1.6645587772216803}},
{'instcounts': {'add': 143,
'addrspacecast': 0,
'alloca': 10252,
< ... a lot more ... >
'va_arg': 0,
'xor': 0,
'zext': 0},
'timeit': {'device': 2.552083,
'hostcode': 2.6907920837402344,
'transfer': 0.13870908374023427}},
{'instcounts': {'add': 1,
'addrspacecast': 0,
'alloca': 10252,
< ... a lot more ... >
'va_arg': 0,
'xor': 0,
'zext': 0},
'timeit': {'device': 1.0034159999999999,
'hostcode': 1.1868476867675781,
'transfer': 0.18343168676757826}}]}
>>>
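With several samples, the per-sample timeit dicts are usually more useful once aggregated. The sketch below averages each timing component over all samples, and treats results=None (no measurement requested) as an empty list. The mean_timings helper is hypothetical (not part of oclude). The timing values are copied from the session above, with the instcounts entries left out for brevity.

```python
from statistics import mean


# Hypothetical helper, not part of oclude.
def mean_timings(res):
    """Average each 'timeit' component (in milliseconds) over all samples."""
    samples = res.get('results') or []  # results is None when nothing was measured
    timings = [s['timeit'] for s in samples if 'timeit' in s]
    if not timings:
        return {}
    return {part: mean(t[part] for t in timings)
            for part in ('hostcode', 'device', 'transfer')}


# Timing values copied from the session above.
res = {'results': [
    {'timeit': {'device': 5.839665999999999,
                'hostcode': 7.50422477722168,
                'transfer': 1.6645587772216803}},
    {'timeit': {'device': 2.552083,
                'hostcode': 2.6907920837402344,
                'transfer': 0.13870908374023427}},
    {'timeit': {'device': 1.0034159999999999,
                'hostcode': 1.1868476867675781,
                'transfer': 0.18343168676757826}},
]}

print(mean_timings(res))
```

In this session the first sample is noticeably slower than the later ones, so a median (statistics.median) may be a more robust summary than the mean when only a few samples are taken.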