Description
In several places in `dppy_lowerer.py` we use `long` to mean `int64`. Because `long` is not guaranteed to be 64 bits (it is 32 bits on Windows), this assumption can cause subtle issues on Windows. Instead, we should use `long long` or `unsigned long long`, which the C standard guarantees to be at least 64 bits wide, or `size_t`, which is 64 bits on 64-bit platforms.
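One way to see the platform dependence from Python is `ctypes`, which reports the C type widths of the running interpreter. This is only an illustration and is not part of numba-dppy. On 64-bit Linux (LP64), `c_long` is 8 bytes. On 64-bit Windows (LLP64), it is 4 bytes. `c_longlong` is 8 bytes on both.

```python
import ctypes

# Widths (in bytes) of the C types discussed above, as seen by this interpreter.
widths = {
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "unsigned long long": ctypes.sizeof(ctypes.c_ulonglong),
    "size_t": ctypes.sizeof(ctypes.c_size_t),
}

for name, size in widths.items():
    print(f"{name:>20}: {size * 8} bits")

# `long long` is guaranteed to be at least 64 bits by the C standard,
# `long` only at least 32 bits, so `long` cannot stand in for int64 portably.
assert widths["long long"] >= 8
assert widths["long"] >= 4
```

Running this on Linux and on Windows should show `long` changing width while `long long` stays at 64 bits.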
The problem is this function: https://github.com/IntelPython/numba-dppy/blob/0105e830a42829ea7206cf4d87d9f9d9246253f2/numba_dppy/dppy_host_fn_call_gen.py#L115
`dppy_lowerer.py` defines kernel arguments based on the enum values defined in https://github.com/IntelPython/dpctl/blob/8db2b2c389dab0c2e6792c7a10b8d5fbec9125df/dpctl-capi/include/dpctl_sycl_enum_types.h#L71. As the enum shows, `7` corresponds to `long`, `9` to `long long`, and `11` to `size_t`.
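The hard-coded integers could be made self-describing by mirroring the quoted values in Python. The sketch below is hypothetical. The class name `DPCTLKernelArgType` is an assumption, and only the four members cited in this issue are listed. The authoritative list remains `dpctl_sycl_enum_types.h`.

```python
from enum import IntEnum


class DPCTLKernelArgType(IntEnum):
    """Partial, hypothetical mirror of dpctl's kernel-argument enum.

    Only the members referenced in this issue are listed; the values must
    match dpctl_sycl_enum_types.h at the pinned dpctl commit.
    """

    DPCTL_LONG = 7                 # C `long`: 32 bits on Windows, 64 bits on Linux x86-64
    DPCTL_LONG_LONG = 9            # C `long long`: at least 64 bits everywhere
    DPCTL_UNSIGNED_LONG_LONG = 10  # C `unsigned long long`: at least 64 bits
    DPCTL_SIZE_T = 11              # C `size_t`: 64 bits on 64-bit platforms


# Translate a hard-coded value into its name, and a name into its value.
print(DPCTLKernelArgType(9).name)            # DPCTL_LONG_LONG
print(int(DPCTLKernelArgType.DPCTL_SIZE_T))  # 11
```

Because `IntEnum` members compare equal to plain integers, such a mirror could replace the magic numbers without changing what is passed to dpctl.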
- Please add documentation to the `resolve_and_return_dpctl_type` function explaining how it looks up the dpctl-defined enum values and what the hard-coded integer values mean.
- Change this line to use `DPCTL_LONG_LONG` (9): https://github.com/IntelPython/numba-dppy/blob/0105e830a42829ea7206cf4d87d9f9d9246253f2/numba_dppy/dppy_host_fn_call_gen.py#L123
- We pass all literal integer kernel arguments as 64-bit values, not as `int`. Therefore, this line should also be changed to use `DPCTL_LONG_LONG` (9): https://github.com/IntelPython/numba-dppy/blob/0105e830a42829ea7206cf4d87d9f9d9246253f2/numba_dppy/dppy_host_fn_call_gen.py#L117
- Similarly, `uint32` should be passed as `DPCTL_UNSIGNED_LONG_LONG` (10) or as `DPCTL_SIZE_T` (11): https://github.com/IntelPython/numba-dppy/blob/0105e830a42829ea7206cf4d87d9f9d9246253f2/numba_dppy/dppy_host_fn_call_gen.py#L120
- It should be `DPCTL_SIZE_T` (11) here: https://github.com/IntelPython/numba-dppy/blob/0105e830a42829ea7206cf4d87d9f9d9246253f2/numba_dppy/dppy_host_fn_call_gen.py#L125
- The variable name is confusing because `long` is not guaranteed to be 8 bytes. It should be renamed to `int64_t`: https://github.com/IntelPython/numba-dppy/blob/0105e830a42829ea7206cf4d87d9f9d9246253f2/numba_dppy/dppy_host_fn_call_gen.py#L55
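For the first bullet, the requested docstring could look roughly like the sketch below. It is not the current `resolve_and_return_dpctl_type`. Several details are assumptions made only for illustration:

- the hypothetical function name `dpctl_int_arg_type`;
- its string-based input;
- the choice of `DPCTL_UNSIGNED_LONG_LONG` rather than `DPCTL_SIZE_T` for unsigned types.

The sketch follows the proposal that integer kernel arguments are widened to 64 bits.

```python
# Values from dpctl_sycl_enum_types.h, as quoted in this issue.
DPCTL_LONG_LONG = 9
DPCTL_UNSIGNED_LONG_LONG = 10

_SIGNED = {"int8", "int16", "int32", "int64"}
_UNSIGNED = {"uint8", "uint16", "uint32", "uint64"}


def dpctl_int_arg_type(type_name):
    """Return the dpctl kernel-argument enum value for an integer argument.

    dpctl identifies the C type of each kernel argument by an integer enum
    defined in dpctl_sycl_enum_types.h. This sketch assumes every integer
    argument is widened to 64 bits before it is passed, so it must be tagged
    with a C type that is at least 64 bits wide on every platform:

    * signed integers   -> DPCTL_LONG_LONG (9), i.e. C ``long long``
    * unsigned integers -> DPCTL_UNSIGNED_LONG_LONG (10), i.e. C ``unsigned long long``

    DPCTL_LONG (7) is deliberately avoided: C ``long`` is 32 bits on Windows.
    """
    if type_name in _SIGNED:
        return DPCTL_LONG_LONG
    if type_name in _UNSIGNED:
        return DPCTL_UNSIGNED_LONG_LONG
    raise TypeError(f"not an integer type: {type_name!r}")


print(dpctl_int_arg_type("int32"))   # 9
print(dpctl_int_arg_type("uint32"))  # 10
```

Keeping the mapping in one documented place means the 32-bit/64-bit decision discussed below only has to change in a single function.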
I have not tested #46 on Windows, but I found at least a partial reproducer on Linux:
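```python
from numba import njit
import dpctl
import numba_dppy
import numpy as np


@njit
def func(b):
    a = np.ones((64), dtype=np.float64)
    np.sin(a, b)  # write sin(a) into the output array b


numba_dppy.compiler.DEBUG = 1
expected = np.ones((64), dtype=np.float64)
got_cpu = np.ones((64), dtype=np.float64)

# Offloaded run on the OpenCL CPU device: this fails with CL_INVALID_ARG_SIZE without the changes.
with dpctl.device_context("opencl:cpu"):
    func(got_cpu)
numba_dppy.compiler.DEBUG = 0

# Reference run on the host, outside any device context.
func(expected)

print(got_cpu)
print(expected)
```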
Without the changes I get `Native API failed. Native API returns: -51 (CL_INVALID_ARG_SIZE) -51 (CL_INVALID_ARG_SIZE)` because an `int32` argument is passed as a 32-bit integer. With the above changes, the error goes away.
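`CL_INVALID_ARG_SIZE` is what `clSetKernelArg` returns when a scalar argument's byte size does not match the size of the kernel parameter's type. The host-side arithmetic can be illustrated with `struct`. The hypothetical helper `arg_size_ok` and the assumption that the kernel parameter is declared as a 64-bit integer are for illustration only. Neither models OpenCL itself.

```python
import struct

# Size in bytes of the kernel parameter, assuming it is declared as a 64-bit integer.
KERNEL_PARAM_SIZE = 8


def arg_size_ok(packed):
    """Hypothetical check mirroring the size match clSetKernelArg enforces for scalars."""
    return len(packed) == KERNEL_PARAM_SIZE


value = 64
as_int32 = struct.pack("=i", value)  # 4 bytes: a 32-bit integer argument
as_int64 = struct.pack("=q", value)  # 8 bytes: the same value upcast to 64 bits

print(len(as_int32), arg_size_ok(as_int32))  # 4 False
print(len(as_int64), arg_size_ok(as_int64))  # 8 True
```

The `=` prefix selects standard sizes with no padding, so the byte counts do not depend on the platform.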
We should reevaluate whether 32-bit integer values should always be upcast to 64 bits, but that is a separate discussion.
PS: I still get `OMP: Info #275: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.` I am not sure why, but that is unrelated to this ticket.