Description
ref uxlfoundation/scikit-learn-intelex#2350
Currently, it is not possible to index dpctl or dpnp arrays with a NumPy array of integers:
```python
import numpy as np
import dpnp
X = np.arange(16).reshape((4,4))
dpnp.array(X)[np.arange(2)]
```
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 4
      2 import dpnp
      3 X = np.arange(16).reshape((4,4))
----> 4 dpnp.array(X)[np.arange(2)]

File /localdisk2/mkl/dcortes/miniforge3/envs/icxconda/lib/python3.12/site-packages/dpnp/dpnp_array.py:355, in dpnp_array.__getitem__(self, key)
    352 """Return ``self[key]``."""
    353 key = _get_unwrapped_index_key(key)
--> 355 item = self._array_obj.__getitem__(key)
    356 return dpnp_array._create_from_usm_ndarray(item)

File dpctl/tensor/_usmarray.pyx:937, in dpctl.tensor._usmarray.usm_ndarray.__getitem__()

File dpctl/tensor/_slicing.pxi:300, in dpctl.tensor._usmarray._basic_slice_meta()

IndexError: Only integers, slices (`:`), ellipsis (`...`), dpctl.tensor.newaxis (`None`) and integer and boolean arrays are valid indices.
```
Supporting this kind of indexing, even if not very efficiently, would be very helpful for scikit-learn-intelex.
The scikit-learn-intelex library is meant to be compatible with scikit-learn: classes from scikit-learn and scikit-learn-intelex can be used interchangeably, but scikit-learn-intelex can additionally work with dpctl arrays and offers algorithms that run on GPU.
Internally, scikit-learn (on which scikit-learn-intelex relies) has many routines that simply create arrays of indices and use them to subset a larger array. These currently fail with dpctl/dpnp inputs, because the integer indices are NumPy arrays and scikit-learn is not SYCL-aware.
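To illustrate the pattern, here is a minimal sketch in plain NumPy. `kfold_indices` is a hypothetical, simplified stand-in for a non-shuffling k-fold splitter such as scikit-learn's `KFold` (it is not scikit-learn's code). The splitter produces host NumPy index arrays, and the caller then uses them to subset `X`. If `X` were a dpnp or dpctl array, the `X[train_idx]` / `X[test_idx]` step is where the `IndexError` above would be raised.

```python
import numpy as np

def kfold_indices(n_samples, n_splits):
    """Hypothetical simplified k-fold splitter (no shuffling).

    Yields (train_idx, test_idx) pairs of NumPy integer arrays, similar in
    spirit to scikit-learn's KFold, but not its actual implementation.
    """
    # Split samples as evenly as possible; the first folds get one extra sample.
    fold_sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    fold_sizes[: n_samples % n_splits] += 1
    all_idx = np.arange(n_samples)
    start = 0
    for size in fold_sizes:
        test_idx = all_idx[start : start + size]
        train_idx = np.concatenate([all_idx[:start], all_idx[start + size :]])
        yield train_idx, test_idx
        start += size

X = np.arange(16).reshape((4, 4))
for train_idx, test_idx in kfold_indices(X.shape[0], n_splits=2):
    # Works for a NumPy X. With X = dpnp.array(...), these host NumPy index
    # arrays trigger the IndexError shown in the traceback above.
    X_train, X_test = X[train_idx], X[test_idx]
```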
Enabling subsetting of dpctl arrays with NumPy integer and boolean arrays (as scikit-learn does internally) would immediately unlock many useful GPU features in scikit-learn-intelex, such as tuning the hyperparameters of machine learning models on GPU and computing cross-validated metrics, which are currently not possible.
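One possible (not necessarily efficient) approach is to copy a host NumPy index array into the indexed array's own type before indexing, since the error message says integer and boolean arrays of the library's own type are already accepted. The sketch below uses a hypothetical helper, `subset_with_host_index`, that takes the conversion function as a parameter so it can be exercised with NumPy alone. The dpnp conversion mentioned in the comment (`dpnp.asarray(..., sycl_queue=X.sycl_queue)`) is an assumption about how it could be wired up, not a description of how dpnp handles this internally.

```python
import numpy as np

def subset_with_host_index(X, key, to_array_type):
    """Hypothetical helper: index X with a NumPy integer or boolean array.

    `to_array_type` copies the host index into X's own array type. For dpnp
    this might be (assumption, untested here):
        lambda k: dpnp.asarray(k, sycl_queue=X.sycl_queue)
    after which X's own advanced indexing handles the converted index.
    """
    if isinstance(key, np.ndarray):
        if key.dtype != np.bool_ and not np.issubdtype(key.dtype, np.integer):
            raise IndexError("only integer or boolean index arrays are supported")
        # Host-to-device copy of the index: the "not very efficient" part.
        key = to_array_type(key)
    return X[key]

X = np.arange(16).reshape((4, 4))
rows = subset_with_host_index(X, np.arange(2), np.asarray)
masked = subset_with_host_index(X, np.array([True, False, True, False]), np.asarray)
```

Copying the small index array, rather than copying `X` back to the host, would keep the (typically much larger) data on the device.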