
Request: subsetting with numpy arrays/masks #2053

Open
@david-cortes-intel

Description


ref uxlfoundation/scikit-learn-intelex#2350

Currently, it's not possible to subset dpctl or dpnp arrays with a numpy array of integers:

```python
import numpy as np
import dpnp
X = np.arange(16).reshape((4,4))
dpnp.array(X)[np.arange(2)]
```
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 4
      2 import dpnp
      3 X = np.arange(16).reshape((4,4))
----> 4 dpnp.array(X)[np.arange(2)]

File /localdisk2/mkl/dcortes/miniforge3/envs/icxconda/lib/python3.12/site-packages/dpnp/dpnp_array.py:355, in dpnp_array.__getitem__(self, key)
    352 """Return ``self[key]``."""
    353 key = _get_unwrapped_index_key(key)
--> 355 item = self._array_obj.__getitem__(key)
    356 return dpnp_array._create_from_usm_ndarray(item)

File dpctl/tensor/_usmarray.pyx:937, in dpctl.tensor._usmarray.usm_ndarray.__getitem__()

File dpctl/tensor/_slicing.pxi:300, in dpctl.tensor._usmarray._basic_slice_meta()

IndexError: Only integers, slices (`:`), ellipsis (`...`), dpctl.tensor.newaxis (`None`) and integer and boolean arrays are valid indices.
```
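Until this is supported, one possible caller-side workaround is to copy the NumPy index to the same SYCL queue as the array before indexing: the error message lists integer and boolean arrays as valid indices, but apparently only accepts them as device arrays. The sketch below is illustrative only: `subset_rows` is a hypothetical helper (not part of dpctl, dpnp or scikit-learn), and it assumes `dpnp.asarray` and `dpctl.tensor.asarray` accept a `sycl_queue=` keyword, as they do in recent releases. It falls back to plain NumPy indexing when neither package is installed.

```python
import numpy as np

# Optional SYCL-enabled packages; the helper falls back to plain NumPy
# indexing when they are not installed.
try:
    import dpnp
except ImportError:
    dpnp = None

try:
    import dpctl.tensor as dpt
except ImportError:
    dpt = None


def subset_rows(x, idx):
    """Hypothetical helper: index `x` with a host-side (NumPy) integer
    array or boolean mask.

    Assumption: dpnp/dpctl reject NumPy index arrays, so the index is first
    copied to the same SYCL queue as `x` before indexing.
    """
    idx = np.asarray(idx)
    if dpnp is not None and isinstance(x, dpnp.ndarray):
        idx = dpnp.asarray(idx, sycl_queue=x.sycl_queue)
    elif dpt is not None and isinstance(x, dpt.usm_ndarray):
        idx = dpt.asarray(idx, sycl_queue=x.sycl_queue)
    # For NumPy inputs the index is used as-is.
    return x[idx]


X = np.arange(16).reshape((4, 4))
first_two = subset_rows(X, np.arange(2))
even_rows = subset_rows(X, np.array([True, False, True, False]))
```

Since the NumPy indices are created and consumed inside scikit-learn itself, a caller-side helper like this cannot be slotted into those code paths, which is why the request is for `__getitem__` to accept NumPy indices directly.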

Allowing indexing with NumPy arrays directly, even if it is not very efficient, would be very helpful for scikit-learn-intelex.

scikit-learn-intelex is meant to be compatible with scikit-learn: classes from the two libraries can be used interchangeably, but scikit-learn-intelex can additionally work with dpctl arrays and offers algorithms that run on GPU.

Internally, scikit-learn (on which scikit-learn-intelex relies) has many routines that simply create arrays of indices and use them to subset a larger array. These currently do not work with dpctl/dpnp inputs, because the integer indices are NumPy arrays and scikit-learn is not SYCL-aware.
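The pattern described above looks roughly like the following sketch. `kfold_indices` is a hypothetical stand-in for `sklearn.model_selection.KFold(n_splits).split`, which likewise yields NumPy arrays of training and test indices; it is not scikit-learn's actual code. It runs as-is on NumPy data; with a dpnp or dpctl `X`, the `X[train_idx]` step is where the `IndexError` shown above would be raised.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Hypothetical stand-in for KFold(n_splits).split without shuffling:
    yields (train_idx, test_idx) as NumPy integer arrays."""
    folds = np.array_split(np.arange(n_samples), n_splits)
    for i, test_idx in enumerate(folds):
        # Training indices are all folds except the current test fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


X = np.arange(16).reshape((4, 4))

# Host-side NumPy indices subset the data; this is the step that currently
# fails when X is a dpnp/dpctl array.
splits = [
    (X[train_idx], X[test_idx])
    for train_idx, test_idx in kfold_indices(X.shape[0], n_splits=2)
]
```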

Enabling subsetting of dpctl arrays with NumPy integer and boolean arrays (as scikit-learn does internally) would immediately unlock many useful GPU features in scikit-learn-intelex that are currently not possible, such as tuning the hyperparameters of machine learning models on GPU and calculating cross-validated metrics, among many others.

Labels: enhancement (New feature or request)
