
Python module usage

Sotiris Niarchos edited this page Jun 21, 2020 · 12 revisions

oclude can be used as a Python 3 module.

The oclude module exposes the CLI's device and kernel commands as two separate functions.

The arguments of these functions correspond to the arguments of the CLI commands; see the CLI documentation for a description of each one.

The device command corresponds to the following function:

profile_opencl_device(platform_id=0,
                      device_id=0,
                      verbose=False)

profile_opencl_device returns a dictionary with the following entries:

  • profiling overhead (time) (in milliseconds, float)
  • profiling overhead (percentage) (str)
  • command latency (in milliseconds, float)
  • device-to-device transfer latency (in milliseconds, float)
  • device-to-host transfer latency (in milliseconds, float)
  • host-to-device transfer latency (in milliseconds, float)
  • device-device bandwidth (dict of experiments with different numbers of bytes transferred)
  • device-host bandwidth (dict of experiments with different numbers of bytes transferred)
  • host-device bandwidth (dict of experiments with different numbers of bytes transferred)
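As a minimal sketch of consuming this dictionary, the helper below picks out the four scalar latency entries. It assumes that the dictionary keys are exactly the entry names listed above (e.g. 'command latency'); that is an assumption, so check the keys returned by your oclude version. The helper name `latencies_ms` is hypothetical and not part of oclude.

```python
# Hypothetical helper (not part of oclude): extract the latency entries from
# the dict returned by profile_opencl_device.
# ASSUMPTION: the keys are the entry names listed in the documentation above.
LATENCY_KEYS = (
    "command latency",
    "device-to-device transfer latency",
    "device-to-host transfer latency",
    "host-to-device transfer latency",
)


def latencies_ms(device_profile):
    """Return {entry name: latency in ms} for the latency entries present."""
    return {k: device_profile[k] for k in LATENCY_KEYS if k in device_profile}
```

It would be called on the result of the profiling call, e.g. `latencies_ms(profile_opencl_device(platform_id=0, device_id=0))`. Missing keys are skipped rather than raising, so a mismatch in key names shows up as an empty or partial dict.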

The kernel command corresponds to the following function:

profile_opencl_kernel(file,
                      kernel,
                      gsize,
                      lsize=None,
                      platform_id=0,
                      device_id=0,
                      samples=1,
                      instcounts=False,
                      timeit=False,
                      verbose=False,
                      clear_cache=False,
                      ignore_cache=False,
                      no_cache_warnings=False)
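Because the function takes keyword arguments that mirror the CLI, it is easy to call repeatedly from a script. The sketch below is a hypothetical helper, `sweep_lsize`, that runs the same kernel with several local sizes and averages the 'device' time of each run. `profile_fn` is expected to be `profile_opencl_kernel`; it is passed in as a parameter so the helper can be tried without an OpenCL device. `kernel_args` must not contain `lsize` or `timeit`, since the helper sets both. If you also pass `instcounts=True`, oclude warns that it times the instrumented kernel rather than the original.

```python
# Hypothetical helper (not part of oclude): profile one kernel with several
# local sizes and report the mean device time (ms) for each local size.
from statistics import mean


def sweep_lsize(profile_fn, lsizes, **kernel_args):
    """Map each local size to the mean 'device' time over all samples."""
    means = {}
    for lsize in lsizes:
        # timeit=True guarantees that 'results' is a list with 'timeit' dicts
        res = profile_fn(lsize=lsize, timeit=True, **kernel_args)
        means[lsize] = mean(r["timeit"]["device"] for r in res["results"])
    return means
```

With oclude installed, a call might look like `sweep_lsize(profile_opencl_kernel, [64, 128, 256], file='tests/rodinia_kernels/particlefilter/particle_double.cl', kernel='normalize_weights_kernel', gsize=1024, samples=3)`.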

profile_opencl_kernel returns a dictionary with the following entries:

  • original file (str)
  • instrumented file (str or None if instcounts=False)
  • kernel (str)
  • results (list of dicts or None if instcounts=False and timeit=False)
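Since two of these entries may be None, code that consumes the result should check for that before indexing. A minimal sketch, using a hypothetical helper name `summarize_kernel_result` and only the four keys listed above:

```python
# Hypothetical helper (not part of oclude): turn the dict returned by
# profile_opencl_kernel into a short text summary, handling the None cases.
def summarize_kernel_result(res):
    lines = [f"kernel: {res['kernel']} ({res['original file']})"]
    # 'instrumented file' is None when instcounts=False
    if res["instrumented file"] is not None:
        lines.append(f"instrumented copy: {res['instrumented file']}")
    # 'results' is None when both instcounts=False and timeit=False
    if res["results"] is None:
        lines.append("no measurements (instcounts=False and timeit=False)")
    else:
        lines.append(f"{len(res['results'])} sample(s) recorded")
    return "\n".join(lines)
```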

If instcounts=True and/or timeit=True, results is a list of dicts whose length equals samples.

The i-th dict of the results list holds the following information about the i-th execution of the specified OpenCL kernel.

If instcounts=True:

  • instcounts (dict with the whole LLVM instruction set as keys and the corresponding counts as values)
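Because every sample carries its own instcounts dict, a common next step is to combine them. The sketch below sums the counts over all samples with `collections.Counter` and drops instructions that never executed. The helper name `total_instcounts` is hypothetical; it assumes each element of results has an 'instcounts' entry, i.e. that instcounts=True was requested.

```python
# Hypothetical helper (not part of oclude): sum per-sample LLVM instruction
# counts across all samples and keep only instructions with a non-zero total.
from collections import Counter


def total_instcounts(results):
    """Return {instruction: total count} over all samples, zeros removed."""
    total = Counter()
    for sample in results:
        total.update(sample["instcounts"])  # adds counts key by key
    return {op: n for op, n in total.items() if n > 0}
```

Applied to the three samples in the example below, restricted to the 'add', 'addrspacecast', 'alloca' and 'xor' entries shown there, it would return `{'add': 727, 'alloca': 30756}`.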

If timeit=True:

  • timeit (dict with 3 entries, hostcode, device and transfer, holding the measured times in milliseconds)
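Individual samples can vary considerably (in the example below, the device time drops from about 5.8 ms to about 1.0 ms over three runs), so averaging over samples is often more useful than reading one run. A minimal sketch with a hypothetical helper `mean_timeit`, assuming timeit=True was requested and results is non-empty:

```python
# Hypothetical helper (not part of oclude): average each timeit entry
# ('hostcode', 'device', 'transfer') over all samples.
from statistics import mean


def mean_timeit(results):
    """Return {entry: mean time in ms}; results must be a non-empty list."""
    keys = results[0]["timeit"].keys()
    return {k: mean(r["timeit"][k] for r in results) for k in keys}
```

In the example output below, each sample's transfer value equals hostcode minus device (e.g. 7.5042 - 5.8397 = 1.6646 ms), which suggests transfer is derived from the other two rather than measured separately. This is an observation from that output, not documented behavior.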

Here is an example usage of the profile_opencl_kernel function:

$ python
Python 3.7.7 (default, Mar 10 2020, 15:16:38) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from oclude import *
>>> from pprint import pprint
>>>
>>>
>>> args = dict(
...     file='tests/rodinia_kernels/particlefilter/particle_double.cl',
...     kernel='normalize_weights_kernel',
...     gsize=1024, lsize=128,
...     instcounts=True,
...     timeit=True,
...     samples=3
... )
>>>
>>>
>>> res = profile_opencl_kernel(**args)
[oclude] WARNING: Instruction count and execution time measurement were both requested.
[oclude] This will result in the time measurement of the instrumented kernel and not the original.
[oclude] INFO: Input file tests/rodinia_kernels/particlefilter/particle_double.cl is cached
[oclude] INFO: Using cached instrumented file
[oclude] Running kernel 'normalize_weights_kernel' from file tests/rodinia_kernels/particlefilter/particle_double.cl
[hostcode] Using the following device:
[hostcode] Platform:	Intel(R) OpenCL HD Graphics
[hostcode] Device:	Intel(R) Gen9 HD Graphics NEO
[hostcode] Version:	OpenCL 2.1 NEO
[hostcode] Kernel name: normalize_weights_kernel
[hostcode] Kernel arg 1: weights (double*, global)
[hostcode] Kernel arg 2: Nparticles (int, private)
[hostcode] Kernel arg 3: partial_sums (double*, global)
[hostcode] Kernel arg 4: CDF (double*, global)
[hostcode] Kernel arg 5: u (double*, global)
[hostcode] Kernel arg 6: seed (int*, global)
[hostcode] About to execute kernel with Global NDRange = 1024 and Local NDRange = 128
[hostcode] Number of executions (a.k.a. samples) to perform: 3
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 225.28 kernel executions/s]
[hostcode] Kernel runs completed successfully
>>>
>>>
>>> pprint(res)
{'instrumented file': '<...>/oclude/utils/.cache/instr_particle_double.cl',
 'kernel': 'normalize_weights_kernel',
 'original file': 'tests/rodinia_kernels/particlefilter/particle_double.cl',
 'results': [{'instcounts': {'add': 583,
                             'addrspacecast': 0,
                             'alloca': 10252,
                             < ... a lot more ... >
                             'va_arg': 0,
                             'xor': 0,
                             'zext': 0},
              'timeit': {'device': 5.839665999999999,
                         'hostcode': 7.50422477722168,
                         'transfer': 1.6645587772216803}},
             {'instcounts': {'add': 143,
                             'addrspacecast': 0,
                             'alloca': 10252,
                             < ... a lot more ... >
                             'va_arg': 0,
                             'xor': 0,
                             'zext': 0},
              'timeit': {'device': 2.552083,
                         'hostcode': 2.6907920837402344,
                         'transfer': 0.13870908374023427}},
             {'instcounts': {'add': 1,
                             'addrspacecast': 0,
                             'alloca': 10252,
                             < ... a lot more ... >
                             'va_arg': 0,
                             'xor': 0,
                             'zext': 0},
              'timeit': {'device': 1.0034159999999999,
                         'hostcode': 1.1868476867675781,
                         'transfer': 0.18343168676757826}}]}
>>>