sc.pp.normalize_total does not support dask arrays #2465

@flying-sheep

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

Minimal code sample (that we can copy&paste without having any data)

This issue is not yet visible in any builds, since the error fixed by scverse/anndata#970 prevents the tests from running properly. Once that PR’s fix is released in a stable anndata version, we’ll start seeing this error unless we fix it first. (The minimal-deps tests against anndata dev don’t install dask, so the error doesn’t surface there either.)

$ pip install dask
$ pytest -k test_normalize_total

The error happens in this line:

f'normalization factor computation:\n{adata.var_names[~gene_subset].tolist()}'

i.e. in this expression: adata.var_names[~gene_subset], where gene_subset is a dask.array<invert, shape=(3,), dtype=bool, chunksize=(3,), chunktype=numpy.ndarray>

IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
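A minimal sketch of a possible workaround, assuming the fix is simply to materialize the lazy boolean mask before handing it to pandas: `materialize_mask` is a hypothetical helper (not part of scanpy), and a plain numpy array stands in for the dask mask so the snippet runs without dask installed.

```python
import numpy as np
import pandas as pd

def materialize_mask(mask):
    """Return a plain numpy bool array for any array-like mask.

    Lazy arrays (e.g. dask.array) expose .compute(); calling it first
    avoids pandas' Index.__getitem__ choking on the lazy object.
    """
    if hasattr(mask, "compute"):  # duck-typed check for dask-like arrays
        mask = mask.compute()
    return np.asarray(mask, dtype=bool)

var_names = pd.Index(["0", "1", "2"])  # mirrors the Index in the traceback below
gene_subset = np.array([True, False, True])  # stands in for the dask mask
excluded = var_names[~materialize_mask(gene_subset)]
# excluded -> Index(['1'], dtype='object')
```

With a real dask array, `mask.compute()` would trigger the computation and return a numpy array, which pandas indexes without issue.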

The traceback pytest shows is slightly off and points at this line instead:

' The following highly-expressed genes are not considered during '

./scanpy/tests/test_normalization.py::test_normalize_total[dask-array-int64] Failed: [undefined]IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
typ = <function from_array at 0x7f6c35f8d940>, dtype = 'int64'

    @pytest.mark.parametrize('dtype', ['float32', 'int64'])
    def test_normalize_total(typ, dtype):
        adata = AnnData(typ(X_total), dtype=dtype)
        sc.pp.normalize_total(adata, key_added='n_counts')
        assert np.allclose(np.ravel(adata.X.sum(axis=1)), [3.0, 3.0, 3.0])
        sc.pp.normalize_total(adata, target_sum=1, key_added='n_counts2')
        assert np.allclose(np.ravel(adata.X.sum(axis=1)), [1.0, 1.0, 1.0])
    
        adata = AnnData(typ(X_frac), dtype=dtype)
>       sc.pp.normalize_total(adata, exclude_highly_expressed=True, max_fraction=0.7)

scanpy/tests/test_normalization.py:45: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
scanpy/preprocessing/_normalization.py:185: in normalize_total
    ' The following highly-expressed genes are not considered during '
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = Index(['0', '1', '2'], dtype='object')
key = dask.array<invert, shape=(3,), dtype=bool, chunksize=(3,), chunktype=numpy.ndarray>

    def __getitem__(self, key):
        """
        Override numpy.ndarray's __getitem__ method to work as desired.
    
        This function adds lists and Series as valid boolean indexers
        (ndarrays only supports ndarray with dtype=bool).
    
        If resulting ndim != 1, plain ndarray is returned instead of
        corresponding `Index` subclass.
    
        """
        getitem = self._data.__getitem__
    
        if is_integer(key) or is_float(key):
            # GH#44051 exclude bool, which would return a 2d ndarray
            key = com.cast_scalar_indexer(key, warn_float=True)
            return getitem(key)
    
        if isinstance(key, slice):
            # This case is separated from the conditional above to avoid
            # pessimization com.is_bool_indexer and ndim checks.
            result = getitem(key)
            # Going through simple_new for performance.
            return type(self)._simple_new(result, name=self._name)
    
        if com.is_bool_indexer(key):
            # if we have list[bools, length=1e5] then doing this check+convert
            #  takes 166 µs + 2.1 ms and cuts the ndarray.__getitem__
            #  time below from 3.8 ms to 496 µs
            # if we already have ndarray[bool], the overhead is 1.4 µs or .25%
            key = np.asarray(key, dtype=bool)
    
>       result = getitem(key)
E       IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed

../../venvs/single-cell/lib/python3.8/site-packages/pandas/core/indexes/base.py:5055: IndexError

Versions

Details

anndata 0.9.0rc2.dev18+g7771f6ee
scanpy 1.10.0.dev50+g3e3427d0

PIL 9.1.1
asciitree NA
beta_ufunc NA
binom_ufunc NA
cffi 1.15.0
cloudpickle 2.2.1
cycler 0.10.0
cython_runtime NA
dask 2023.3.2
dateutil 2.8.2
defusedxml 0.7.1
entrypoints 0.4
fasteners 0.17.3
h5py 3.7.0
hypergeom_ufunc NA
igraph 0.10.4
jinja2 3.1.2
joblib 1.1.0
kiwisolver 1.4.3
leidenalg 0.9.1
llvmlite 0.38.1
markupsafe 2.1.1
matplotlib 3.5.2
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
numba 0.55.2
numcodecs 0.10.2
numpy 1.22.4
packaging 21.3
pandas 1.4.3
pkg_resources NA
psutil 5.9.1
pyparsing 3.0.9
pytz 2022.1
scipy 1.8.1
session_info 1.0.0
setuptools 67.2.0
setuptools_scm NA
six 1.16.0
sklearn 1.1.1
sphinxcontrib NA
texttable 1.6.7
threadpoolctl 3.1.0
tlz 0.12.0
toolz 0.12.0
typing_extensions NA
wcwidth 0.2.5
yaml 6.0
zarr 2.12.0
zipp NA

Python 3.8.16 (default, Dec 7 2022, 12:42:00) [GCC 12.2.0]
Linux-6.2.10-zen1-1-zen-x86_64-with-glibc2.34

Session information updated at 2023-04-11 15:57
