Description
Version: numba_0.20.0dev3 and main
The following three dpctl calls (1, 2, 3) have a huge wall time on the edge devcloud (py-spy measures 10 to 30 ms per call, see the speedscope report).
On the devcloud, this adds about 80 seconds to the k-means benchmark (which is expected to take about 10 seconds).
I didn't see the issue on a local machine, but maybe the remaining small overhead that we reported comes from these calls.
@oleksandr-pavlyk I'm not sure whether this should be considered unreasonable usage in numba_dpex (should those calls be expected to take that long, and therefore be cached?) or a bug in dpctl.
I've been experimenting with caching the values and can confirm that caching those 3 calls completely removes the overhead.
Regarding the scope of the cache, I'll check whether a hotfix is enough that stores those values in a `WeakKeyDictionary` whose keys are `val` and `usm_mem`, and wraps the `SyclDevice(device)` call in an `lru_cache`. (If so, I will monkey-patch it in sklearn_numba_dpex in the meantime.)