Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of scanpy.
- (optional) I have confirmed this bug exists on the master branch of scanpy.
Minimal code sample (that we can copy&paste without having any data)
This issue is not yet visible in any builds, because the error fixed by scverse/anndata#970 prevents the tests from running properly. Once that PR's fix is released in a stable anndata version, we'll start seeing this error unless we fix it first. (The "minimal tests" and "anndata dev" jobs don't have dask installed, so the error doesn't show up there either.)
$ pip install dask
$ pytest -k test_normalize_total
The error happens in this line:
scanpy/scanpy/preprocessing/_normalization.py
Line 186 in 3e3427d
| f'normalization factor computation:\n{adata.var_names[~gene_subset].tolist()}' |
i.e. in this expression: adata.var_names[~gene_subset], where gene_subset is a dask.array<invert, shape=(3,), dtype=bool, chunksize=(3,), chunktype=numpy.ndarray>
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
The traceback pytest shows is slightly off and points at this line instead:
scanpy/scanpy/preprocessing/_normalization.py
Line 185 in 3e3427d
| ' The following highly-expressed genes are not considered during ' |
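One possible way around the failing expression, sketched here as an assumption rather than as the fix scanpy will adopt: materialize the mask as a NumPy boolean array before indexing `adata.var_names`. Judging from the traceback, pandas seems to pass the dask array straight through to `ndarray.__getitem__`, which then treats its three elements as three separate indices. The helper name `excluded_gene_names` is hypothetical and not part of scanpy; the dask import is optional so the snippet also runs without dask installed.

```python
import numpy as np
import pandas as pd


def excluded_gene_names(var_names: pd.Index, gene_subset) -> list:
    """Return the names of the genes for which ``gene_subset`` is False.

    Hypothetical helper, not part of scanpy. ``gene_subset`` may be a
    NumPy array or any object implementing ``__array__`` (dask arrays do).
    """
    # np.asarray computes a lazy array and yields a plain ndarray[bool],
    # which pandas accepts as a boolean indexer.
    mask = np.asarray(gene_subset, dtype=bool)
    return var_names[~mask].tolist()


var_names = pd.Index(["0", "1", "2"])
gene_subset = np.array([True, False, True])

try:
    import dask.array as da  # optional; mirrors the failing test case

    gene_subset = da.from_array(gene_subset)
except ImportError:
    pass  # fall back to the plain NumPy mask

names = excluded_gene_names(var_names, gene_subset)
print(names)
```

The trade-off is that the mask is computed eagerly. Since it has one entry per gene and is only used to build a log message, that is probably acceptable, but it does trigger a dask computation at that point.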
./scanpy/tests/test_normalization.py::test_normalize_total[dask-array-int64] Failed: [undefined]
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
typ = <function from_array at 0x7f6c35f8d940>, dtype = 'int64'
    @pytest.mark.parametrize('dtype', ['float32', 'int64'])
    def test_normalize_total(typ, dtype):
        adata = AnnData(typ(X_total), dtype=dtype)
        sc.pp.normalize_total(adata, key_added='n_counts')
        assert np.allclose(np.ravel(adata.X.sum(axis=1)), [3.0, 3.0, 3.0])
        sc.pp.normalize_total(adata, target_sum=1, key_added='n_counts2')
        assert np.allclose(np.ravel(adata.X.sum(axis=1)), [1.0, 1.0, 1.0])
        adata = AnnData(typ(X_frac), dtype=dtype)
>       sc.pp.normalize_total(adata, exclude_highly_expressed=True, max_fraction=0.7)
scanpy/tests/test_normalization.py:45:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
scanpy/preprocessing/_normalization.py:185: in normalize_total
' The following highly-expressed genes are not considered during '
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Index(['0', '1', '2'], dtype='object')
key = dask.array<invert, shape=(3,), dtype=bool, chunksize=(3,), chunktype=numpy.ndarray>
    def __getitem__(self, key):
        """
        Override numpy.ndarray's __getitem__ method to work as desired.
        This function adds lists and Series as valid boolean indexers
        (ndarrays only supports ndarray with dtype=bool).
        If resulting ndim != 1, plain ndarray is returned instead of
        corresponding `Index` subclass.
        """
        getitem = self._data.__getitem__
        if is_integer(key) or is_float(key):
            # GH#44051 exclude bool, which would return a 2d ndarray
            key = com.cast_scalar_indexer(key, warn_float=True)
            return getitem(key)
        if isinstance(key, slice):
            # This case is separated from the conditional above to avoid
            # pessimization com.is_bool_indexer and ndim checks.
            result = getitem(key)
            # Going through simple_new for performance.
            return type(self)._simple_new(result, name=self._name)
        if com.is_bool_indexer(key):
            # if we have list[bools, length=1e5] then doing this check+convert
            # takes 166 µs + 2.1 ms and cuts the ndarray.__getitem__
            # time below from 3.8 ms to 496 µs
            # if we already have ndarray[bool], the overhead is 1.4 µs or .25%
            key = np.asarray(key, dtype=bool)
>       result = getitem(key)
E IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
../../venvs/single-cell/lib/python3.8/site-packages/pandas/core/indexes/base.py:5055: IndexError
Versions
Details
anndata 0.9.0rc2.dev18+g7771f6ee
scanpy 1.10.0.dev50+g3e3427d0
PIL 9.1.1
asciitree NA
beta_ufunc NA
binom_ufunc NA
cffi 1.15.0
cloudpickle 2.2.1
cycler 0.10.0
cython_runtime NA
dask 2023.3.2
dateutil 2.8.2
defusedxml 0.7.1
entrypoints 0.4
fasteners 0.17.3
h5py 3.7.0
hypergeom_ufunc NA
igraph 0.10.4
jinja2 3.1.2
joblib 1.1.0
kiwisolver 1.4.3
leidenalg 0.9.1
llvmlite 0.38.1
markupsafe 2.1.1
matplotlib 3.5.2
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
numba 0.55.2
numcodecs 0.10.2
numpy 1.22.4
packaging 21.3
pandas 1.4.3
pkg_resources NA
psutil 5.9.1
pyparsing 3.0.9
pytz 2022.1
scipy 1.8.1
session_info 1.0.0
setuptools 67.2.0
setuptools_scm NA
six 1.16.0
sklearn 1.1.1
sphinxcontrib NA
texttable 1.6.7
threadpoolctl 3.1.0
tlz 0.12.0
toolz 0.12.0
typing_extensions NA
wcwidth 0.2.5
yaml 6.0
zarr 2.12.0
zipp NA
Python 3.8.16 (default, Dec 7 2022, 12:42:00) [GCC 12.2.0]
Linux-6.2.10-zen1-1-zen-x86_64-with-glibc2.34
Session information updated at 2023-04-11 15:57