
Huge overhead on devcloud linked to dpctl calls #945

Closed
@fcharras


Version: numba_0.20.0dev3 and main

The following three dpctl calls (1, 2, 3) have a huge wall time on the edge devcloud (py-spy measures 10 to 30 ms per call, see the speedscope report): report download link

On the devcloud this adds about 80 seconds to the k-means benchmark (expected to run in about 10 seconds).

I didn't see the issue on a local machine, but maybe the remaining small overhead that we reported comes from there.

@oleksandr-pavlyk not sure whether this should be considered an unreasonable use in numba_dpex (are those calls expected to take that long, and should their results be cached?) or a bug in dpctl.

I've been experimenting with caching the values and can confirm that caching those 3 calls completely removes the overhead.

Regarding the scope of the cache, I'll check whether a hotfix is enough: storing those values in a WeakKeyDictionary keyed on val and usm_mem, and wrapping the SyclDevice(device) call in an lru_cache. If so, I'll monkey-patch it in sklearn_numba_dpex in the meantime.


Labels: user (User submitted issue)
