(feat): read_lazy for whole AnnData lazy-loading + xarray reading + read_elem_as_dask -> read_elem_lazy #1247

Open
wants to merge 407 commits into base: main
Changes from 250 commits
7c2e4da
Fix Read/Write
flying-sheep Jul 11, 2024
1ba5b99
Fix one more
flying-sheep Jul 11, 2024
49c0d49
unify names
flying-sheep Jul 11, 2024
3666735
claift ReadCallback signature
flying-sheep Jul 11, 2024
3a332ad
Fix type aliases
flying-sheep Jul 11, 2024
d0f4d13
(fix): clean up typing to use `RWAble`
ilan-gold Jul 11, 2024
6e89e14
Merge branch 'main' into ig/protocol_for_callback
ilan-gold Jul 11, 2024
ea29cfa
(fix): use `Union`
ilan-gold Jul 11, 2024
f4ff236
(fix): add qualname override
ilan-gold Jul 11, 2024
f50b286
(fix): ignore dask and masked array
ilan-gold Jul 11, 2024
712e085
(fix): ignore erroneous class warning
ilan-gold Jul 11, 2024
24dd18b
(fix): upgrade `scanpydoc`
ilan-gold Jul 11, 2024
79d3fdc
(fix): use `MutableMapping` instead of `dict` due to broken docstring
ilan-gold Jul 11, 2024
9a2be00
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 11, 2024
d3bcddf
Add data docs
flying-sheep Jul 11, 2024
3bae623
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 11, 2024
84fdc96
Revert "(fix): use `MutableMapping` instead of `dict` due to broken d…
flying-sheep Jul 11, 2024
2608bc3
(fix): add clarification
ilan-gold Jul 11, 2024
e551e18
Simplify
flying-sheep Jul 11, 2024
13e3bb1
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 11, 2024
2935e45
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 11, 2024
bf0be15
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 11, 2024
890b02a
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 11, 2024
9d37fc8
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 12, 2024
1ffe43e
(fix): remove double `dask` intersphinx
ilan-gold Jul 12, 2024
4ab1409
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 12, 2024
f9df5bc
(fix): remove `_types.DaskArray` from type checking block
ilan-gold Jul 12, 2024
a85da39
(refactor): use `block_info` for resolving fetch location
ilan-gold Jul 15, 2024
3bef77c
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 15, 2024
e78334a
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 15, 2024
eb84176
(fix): don't set chunk sizes manually
ilan-gold Jul 15, 2024
899184f
(fix): dtype for reading
ilan-gold Jul 15, 2024
efb70ec
(fix): ignore import cycle problem (why??)
ilan-gold Jul 16, 2024
118f43c
(fix): add issue
ilan-gold Jul 16, 2024
f742a0a
(fix): subclass `Reader` to remove `datasetkwargs`
ilan-gold Jul 18, 2024
ae68731
(fix): add message tp errpr
ilan-gold Jul 18, 2024
f5e7760
Update tests/test_io_elementwise.py
ilan-gold Jul 18, 2024
d3f3624
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 18, 2024
96b13a3
(fix): correct `self.callback` check
ilan-gold Jul 18, 2024
68e507a
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 18, 2024
9c68e36
(fix): erroneous diffs
ilan-gold Jul 18, 2024
410aeda
(fix): extra `read_elem` `dataset_kwargs`
ilan-gold Jul 18, 2024
31a30c4
(fix): remove more `dataset_kwargs` nonsense
ilan-gold Jul 18, 2024
80fe8cb
(chore): add docs
ilan-gold Jul 18, 2024
b314248
(fix): use `block_info` for dense
ilan-gold Jul 18, 2024
02d4735
(fix): more erroneous diffs
ilan-gold Jul 18, 2024
6e5534a
(fix): use context again
ilan-gold Jul 18, 2024
d26cfe8
(fix): change size by dimension in tests
ilan-gold Jul 22, 2024
94e43a3
(refactor): clean up `get_elem_name`
ilan-gold Jul 22, 2024
5160016
(fix): try new sphinx for error
ilan-gold Jul 22, 2024
43da9a3
(fix): return type
ilan-gold Jul 22, 2024
9735ced
(fix): protocol for reading
ilan-gold Jul 22, 2024
f1730c3
(fix): bring back ignored warning
ilan-gold Jul 22, 2024
9861b56
Fix docs
flying-sheep Jul 22, 2024
235096a
almost fix typing
flying-sheep Jul 22, 2024
dce9f07
add wrapper
flying-sheep Jul 22, 2024
2725ef2
move into type checking
flying-sheep Jul 22, 2024
ffe89f0
(fix): small type fxes
ilan-gold Jul 22, 2024
6cb231e
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 22, 2024
029add9
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 22, 2024
75a64fc
block info types
flying-sheep Jul 22, 2024
3f734fe
simplify
flying-sheep Jul 22, 2024
c4c2356
rename
flying-sheep Jul 22, 2024
cc67a9b
simplify more
flying-sheep Jul 22, 2024
fcb1763
(fix): migrate to use `read_elem` infrastructure
ilan-gold Jul 23, 2024
adcd48a
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 23, 2024
2a72ec0
Merge branch 'main' into ig/xarray_compat
ilan-gold Jul 23, 2024
4c659a1
(fix): no first access of categories
ilan-gold Jul 23, 2024
d3a811a
(fix): last small cleanups
ilan-gold Jul 23, 2024
e852a74
(fix): try not runnign `xarray` tests
ilan-gold Jul 23, 2024
8c92a41
(fix): oops! forgot one test to mark!
ilan-gold Jul 23, 2024
47be954
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 5, 2024
55f706f
Update pyproject.toml
ilan-gold Aug 6, 2024
6fa97f0
(fix): change unused category function from method to function
ilan-gold Aug 6, 2024
9e2e21d
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 6, 2024
eb1237c
(fix): actually track keys instead of relying on `deafultdict` behavior
ilan-gold Aug 6, 2024
6724c62
(chore): test unconsolidated warning
ilan-gold Aug 6, 2024
53796a0
Update pyproject.toml
ilan-gold Aug 6, 2024
076b92f
(fix): use `test-full`/`test`
ilan-gold Aug 6, 2024
036ff3f
(fix): typing for `_gen_dataframe`
ilan-gold Aug 6, 2024
9415a14
(chore): imrpoved comments for `Dataset2D`
ilan-gold Aug 6, 2024
b5dfaac
(fix): `iloc` is an `attr` not a `meth`
ilan-gold Aug 6, 2024
d45a2ce
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 13, 2024
cff41c4
(fix): release notes
ilan-gold Aug 13, 2024
3ccbfaf
(fix): `zarr` doc in `read_backed`
ilan-gold Aug 13, 2024
ff4d487
(fix): docs string
ilan-gold Aug 13, 2024
ed8fedf
(fix): wording in release note
ilan-gold Aug 13, 2024
e0e8891
Merge branch 'main' into ig/xarray_compat
flying-sheep Aug 26, 2024
3325f38
(chore): move `_remove_unused_categories` to static method
ilan-gold Aug 27, 2024
528026f
(chore): use one `isinstance` call in `coerce_arrays`
ilan-gold Aug 27, 2024
aa0d161
(chore): clean up `read_dataframe`
ilan-gold Aug 27, 2024
41e3038
(chore): handle case where `chunks` is not needed
ilan-gold Aug 27, 2024
dc5c6e6
(chore): make reusable `LazyDataStructures`
ilan-gold Aug 27, 2024
4edd279
(chore): use `Path.suffix`
ilan-gold Aug 27, 2024
969c6af
(chore): `msg` for `warnings`
ilan-gold Aug 27, 2024
2a31ab8
(chore): remove erroneous `Union` in `TypeVar`
ilan-gold Aug 27, 2024
2521ff8
(fix): use `cached_property` for accessing `dtype` + test
ilan-gold Aug 27, 2024
628f9fc
(refactor): use `cached_property` for `categories`
ilan-gold Aug 27, 2024
ff9412a
(refactor): use guard clause in `__getitem__` better
ilan-gold Aug 27, 2024
36d57be
(chore): type `get_index_dim`
ilan-gold Aug 27, 2024
51610b1
(fix): `shape` return type
ilan-gold Aug 27, 2024
ba8d147
(refactor): `_subset` guard clause
ilan-gold Aug 27, 2024
2cf1262
(fix): use `Counter`
ilan-gold Aug 27, 2024
b1feb6f
(refactor): `fix_known_differences` usage of `as_type`
ilan-gold Aug 27, 2024
02741f5
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 27, 2024
24e8970
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 28, 2024
ab3e718
(chore): fragment
ilan-gold Aug 28, 2024
4412710
(chore): fix the generic problem
ilan-gold Aug 28, 2024
d3401b2
(chore): clean up tests
ilan-gold Aug 28, 2024
ef0bbf3
Merge branch 'main' into ig/xarray_compat
flying-sheep Aug 30, 2024
3b6d194
Update tests/test_read_backed_experimental.py
ilan-gold Aug 30, 2024
4587fec
Merge branch 'main' into ig/xarray_compat
flying-sheep Aug 30, 2024
97eace5
(fix): should -> shall
ilan-gold Sep 2, 2024
58654a8
Apply suggestions from code review
ilan-gold Sep 2, 2024
67af64f
(fix): `_gen_xarray_dict_iterator_from_elems` -> `_gen_xarray_dict_it…
ilan-gold Sep 2, 2024
bfc2e73
(feat): indexing with `DataArray`
ilan-gold Sep 2, 2024
b801d35
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 2, 2024
a1d0b89
(fix): check h5 store
ilan-gold Sep 4, 2024
160b522
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 4, 2024
5f80b61
(fix):check `DataArray` closer
ilan-gold Sep 4, 2024
8ce6409
(fix): clean up `api.md` from merge
ilan-gold Sep 4, 2024
f9ef9f0
(fix): remove `read_elem_as_dask` docs reference
ilan-gold Sep 4, 2024
5e69a50
(chore): add notebooks/read_backed_experimental
ilan-gold Sep 4, 2024
1540d27
(chore): update notebooks
ilan-gold Sep 4, 2024
371fc2b
(refactor): set `pytestmark` at the top
ilan-gold Sep 18, 2024
5cb2d8d
(chore): clarify comment
ilan-gold Sep 18, 2024
b86ee6b
(refactor): add `assert_access_count` method for `AccessTrackingStore`
ilan-gold Sep 18, 2024
227a3c6
(refactor): `read_backed`->`read_lazy`
ilan-gold Sep 18, 2024
3debf9b
(fix): actually `read_backed` -> `read_lazy`
ilan-gold Sep 18, 2024
4b48988
(chore): time to require `aiohttp` `fsspec` and `zarr` and `requests`
ilan-gold Sep 18, 2024
28c95a7
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 18, 2024
cec633a
(chore): update notebook
ilan-gold Sep 18, 2024
97aff04
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 18, 2024
bf710d0
(fix): actually only read `index` once
ilan-gold Sep 20, 2024
1dfebde
(chore): add `concat` test
ilan-gold Sep 23, 2024
d355ed0
(feat): add `columns` compat
ilan-gold Sep 23, 2024
cfae08a
(fix): type of subset
ilan-gold Sep 23, 2024
1c46ec6
(fix): `MaskedArray` `dtype
ilan-gold Sep 23, 2024
f491724
(fix): add `index.setter`
ilan-gold Sep 23, 2024
411bd91
(chore): add `concat` compat for xarray
ilan-gold Sep 23, 2024
7f89eb3
(fix): refactor concat for in-memory
ilan-gold Sep 23, 2024
d77ba37
(chore): add rest of test
ilan-gold Sep 23, 2024
096f2c6
(feat): allow for concat with masked type using dask
ilan-gold Sep 30, 2024
3c7c627
(refactor): own function for concat xarray
ilan-gold Sep 30, 2024
74b1940
(fix): add basic off-axis mapping without reading in i.e just an index
ilan-gold Sep 30, 2024
04206b8
(feat): add merge for alt annot
ilan-gold Sep 30, 2024
a7cbbd8
(fix): notebook
ilan-gold Sep 30, 2024
36bc262
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 30, 2024
af4520a
(fix): NaS in sparse dask
ilan-gold Sep 30, 2024
fa7358f
(chore): add `X` tracker
ilan-gold Sep 30, 2024
4db8c1b
(chore): add more robust matrix type tests
ilan-gold Sep 30, 2024
eefbee6
(fix): ok now notebooks?
ilan-gold Sep 30, 2024
39f2838
(feat): fix additional index load
ilan-gold Oct 1, 2024
dcca711
(feat): no-load index
ilan-gold Oct 1, 2024
d3121b7
(fix): only use range indices for `{obs,var}`
ilan-gold Oct 1, 2024
b25e8ba
(chore): add range index testing
ilan-gold Oct 1, 2024
870a4f2
(fix): ensure `{obs,var}_names` always exists on dataset
ilan-gold Oct 1, 2024
c2681e0
(fix): only use `use_range_index` on `{obs,var}`
ilan-gold Oct 1, 2024
e7a915b
(fix): don't check uniqueness to prevent index load + rename variable
ilan-gold Oct 1, 2024
f93499d
(fix): check for presence of indexing key
ilan-gold Oct 1, 2024
63a4515
always return `index` object
ilan-gold Oct 1, 2024
b1c8c22
(fix): remove unnecessary check?
ilan-gold Oct 1, 2024
d082a35
(chore): update notebook
ilan-gold Oct 1, 2024
e81d155
(fix): docstring class
ilan-gold Oct 1, 2024
c8c5271
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 1, 2024
c4d0146
(fix): explicit 1d chunking for `concat`
ilan-gold Oct 1, 2024
b34ac0a
(chore): change `concat` test name
ilan-gold Oct 1, 2024
1f4ab92
(chore): rename test file
ilan-gold Oct 1, 2024
d25f559
(chore): rename notebook
ilan-gold Oct 1, 2024
c661d39
(chore): clarify `load_annotation_index`
ilan-gold Oct 1, 2024
bff63cb
(fix): allow concatenation along arbitrary index + indexing test
ilan-gold Oct 1, 2024
784ea9b
(fix): notebook path in docs
ilan-gold Oct 1, 2024
9b2d9a3
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 1, 2024
489cc8d
(fix): don't copy column in dataset concat
ilan-gold Oct 2, 2024
24f11be
(fix): actually test h5
ilan-gold Oct 2, 2024
57bcfd9
(feat): add h5ad concat support
ilan-gold Oct 2, 2024
8b07f43
(fix): add docsting example
ilan-gold Oct 2, 2024
408d62a
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Oct 2, 2024
cb125bf
(fix): catch warnings
ilan-gold Oct 2, 2024
e14f53f
(fix): format
ilan-gold Oct 2, 2024
3c5641c
(fix): threaded tests annotation index
ilan-gold Oct 2, 2024
dbe09ca
(fix): remove xarray test from minimum deps
ilan-gold Oct 2, 2024
67fc546
(fix): skip experimental backed tests if xarray not installed
ilan-gold Oct 2, 2024
41a9335
(chore): remove todo
ilan-gold Oct 2, 2024
58e595b
(feat): add `uns` reading
ilan-gold Oct 4, 2024
8875374
(feat): add `Raw` reading + tests
ilan-gold Oct 4, 2024
4c991d4
(chore): make `test-full` shorter
ilan-gold Oct 4, 2024
50cdc66
(fix): stricter type checking
ilan-gold Oct 4, 2024
eb881a9
(fix): dtype casting for concat
ilan-gold Oct 4, 2024
fe1f0a6
(chore): separate into two cleaner unit tests
ilan-gold Oct 4, 2024
c0c0c6c
(fix): typing of `make_xarray_extension_dtypes_dask
ilan-gold Oct 4, 2024
fc72011
(chore): remove comment
ilan-gold Oct 4, 2024
562817d
(fix): clean up compat
ilan-gold Oct 4, 2024
9ef7cf5
(chore): xarray raises import error for `read_lazy`
ilan-gold Oct 4, 2024
69b6cc1
(fix) ignore some uncovered lines
flying-sheep Oct 14, 2024
3dfefc4
use existing warn import
flying-sheep Oct 14, 2024
1eb440e
merge
ilan-gold Oct 16, 2024
fe77a5c
(chore): add awkward `nitpick_ignore` comment
ilan-gold Oct 16, 2024
a654421
(refactor): use generator for new datasets
ilan-gold Oct 16, 2024
015bdca
(chore): docs + types in `merge.py`
ilan-gold Oct 16, 2024
7201bad
(recactor): use set for `index_name` in `merge.py`
ilan-gold Oct 16, 2024
5d813c0
(refactor): comprehension for `{alt_}annotations_in_memory`
ilan-gold Oct 16, 2024
b50e8ad
(chore): types in `lazy_methods.py`
ilan-gold Oct 16, 2024
3ca669c
(chore): `lazy_methods.py` index handling made clearer
ilan-gold Oct 16, 2024
7ea20df
(chore): move comment
ilan-gold Oct 16, 2024
91fdb90
(chore): dedupe `read_params` usage
ilan-gold Oct 16, 2024
a71dad8
(chore): `**kwargs` usage doesn't affect call when empty
ilan-gold Oct 16, 2024
0ac61c6
(fix): clean up `_lazy_arrays.py` typing
ilan-gold Oct 16, 2024
af5c2fe
(fix): no assert, raise ValueError
ilan-gold Oct 16, 2024
edac279
(fix): use `get` instead of membership check + no in-place
ilan-gold Oct 16, 2024
cdd9b89
(chore): pytest mark to beginning + `diskfmt` -> `diskfmt` + thread s…
ilan-gold Oct 16, 2024
e07426a
(fix): remove resetting key trackers
ilan-gold Oct 16, 2024
8278e0f
(fix): adata only has 4 cchunks in test, udpate comment
ilan-gold Oct 16, 2024
30d1bb1
(chore): better use arrange-act-assert
ilan-gold Oct 16, 2024
509af7f
(chore): ids for boolean params
ilan-gold Oct 16, 2024
58122a1
(chore): contextlib + better assert objects
ilan-gold Oct 16, 2024
e6fea74
(chore): refactor concatenation for arrange-act-assert
ilan-gold Oct 16, 2024
bd509a1
merge again?
ilan-gold Oct 16, 2024
62cda13
(fix): notebook submodule
ilan-gold Oct 16, 2024
4e1a1f6
(fix): use `find_spec` pattern
ilan-gold Oct 17, 2024
a242dea
(chore): re-insert types for `AccessTrackingStore`
ilan-gold Oct 17, 2024
07caf93
(chore): dedent docstrings
ilan-gold Oct 17, 2024
8fd1fa0
(chore): raise error if slots have changed on `ZarrOrHDF5Wrapper`
ilan-gold Oct 17, 2024
94cf8ea
(fix): add slots to please xarray
ilan-gold Oct 17, 2024
f01818a
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 17, 2024
aca24db
Merge branch 'main' into ig/xarray_compat
flying-sheep Oct 17, 2024
2c082bf
(chore): remove redefinition
ilan-gold Oct 17, 2024
81c5fb9
(refactor): reuse join type
ilan-gold Oct 18, 2024
bb49dd2
(fix): mixed type dataframe merging
ilan-gold Oct 18, 2024
942661f
(fix): condition for going to memory in mixed typing
ilan-gold Oct 21, 2024
99219c6
(refactor): mixed type helper function
ilan-gold Oct 21, 2024
98197fe
(fix): try linking to dask/awkward in docs build
ilan-gold Oct 21, 2024
752e02b
(fix): awkward array docs
ilan-gold Oct 21, 2024
2a38900
(chore): `ValueError` -> `AssertionError`
ilan-gold Oct 25, 2024
cc40369
(fix): clean up `_lazy_arrays.py`
ilan-gold Oct 25, 2024
a807673
(fix): `ValueError`->`KeyError` for store
ilan-gold Oct 25, 2024
852ab20
(chore): add note about `unify_extension_dtypes`
ilan-gold Oct 25, 2024
310191c
(chore): add ids
ilan-gold Oct 25, 2024
1c15b70
Apply suggestions from code review
ilan-gold Oct 31, 2024
b96bd55
(fix): move all changes form anndata_elem
ilan-gold Oct 31, 2024
a663c5d
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 31, 2024
d1fce7e
(fix): `read_elem_as_dask`->`read_elem_lazy`
ilan-gold Oct 31, 2024
52b6a01
(chore): refactor `test_read_lazy` fixtures
ilan-gold Oct 31, 2024
a796d9b
Update tests/test_read_lazy.py
ilan-gold Oct 31, 2024
be786d0
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Oct 31, 2024
e48377a
(chore): restore types
ilan-gold Oct 31, 2024
90f6d77
(fix): do `randint`
ilan-gold Oct 31, 2024
8e29713
(chore): ermove slots check
ilan-gold Oct 31, 2024
f13bfb4
(fix): return read_lazy
ilan-gold Oct 31, 2024
1643da6
(fix): concating with hdf5 and cluster obviates need for locks works …
ilan-gold Nov 6, 2024
8 changes: 6 additions & 2 deletions .azure-pipelines.yml
Expand Up @@ -53,7 +53,7 @@ jobs:
- script: printf "llvmlite>=0.43\nscanpy>=1.10.0rc1" | tee /tmp/constraints.txt
displayName: "Create constraints file for `pre-release` and `latest` jobs"

- script: uv pip install --system --compile "anndata[dev,test] @ ." -c /tmp/constraints.txt
- script: uv pip install --system --compile "anndata[dev,test-full] @ ." -c /tmp/constraints.txt
displayName: "Install dependencies"
condition: eq(variables['DEPENDENCIES_VERSION'], 'latest')

Expand All @@ -65,7 +65,7 @@ jobs:
displayName: "Install minimum dependencies"
condition: eq(variables['DEPENDENCIES_VERSION'], 'minimum')

- script: uv pip install -v --system --compile --pre "anndata[dev,test] @ ." -c /tmp/constraints.txt
- script: uv pip install -v --system --compile --pre "anndata[dev,test-full] @ ." -c /tmp/constraints.txt
displayName: "Install dependencies release candidates"
condition: eq(variables['DEPENDENCIES_VERSION'], 'pre-release')

Expand All @@ -76,6 +76,10 @@ jobs:
displayName: "PyTest"
condition: eq(variables['TEST_TYPE'], 'standard')

- script: pytest
displayName: "PyTest (minimum)"
condition: eq(variables['DEPENDENCIES_VERSION'], 'minimum')

- script: pytest --cov --cov-report=xml --cov-context=test
displayName: "PyTest (coverage)"
condition: eq(variables['TEST_TYPE'], 'coverage')
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -7,6 +7,7 @@ __pycache__/
/*cache/
/node_modules/
/data/
/venv/

# Distribution / packaging
/dist/
Expand Down
6 changes: 5 additions & 1 deletion docs/api.md
Expand Up @@ -131,7 +131,8 @@ Low level methods for reading and writing elements of an {class}`AnnData` object
.. autosummary::
:toctree: generated/

experimental.read_elem_as_dask
experimental.read_elem_lazy
experimental.read_lazy
```

Utilities for customizing the IO process:
Expand All @@ -156,6 +157,9 @@ Types used by the former:
experimental.ReadCallback
experimental.WriteCallback
experimental.StorageType
experimental.backed._lazy_arrays.MaskedArray
experimental.backed._lazy_arrays.CategoricalArray
experimental.backed._xarray.Dataset2D
Comment on lines +160 to +162
Member

The other lines in this listing are type aliases, protocols, and so on. Are these actual classes?

It’s OK to have whatever in experimental, but we need to be very careful about how big our API surface gets when we move these things out of experimental.

Contributor Author @ilan-gold, Aug 27, 2024

Hmmm, that's an interesting point. I thought that by exporting read_lazy/making it general, I might appease @ivirshup pre-emptively by not creating only a monolith function and making something more composable a la read_elem. But in fact, by doing that, we create more API surface, yes. Dataset2D is something that will be visible on the AnnData object via adata.obs but the others would not be if we didn't make read_lazy so general (i.e., because we added functionality to it from its previous state as read_elem_as_dask). Tough question.

Member @flying-sheep, Aug 30, 2024

As said in #1616, I think it’s fine to not export the full API surface of the classes we have and just commit to a carefully designed protocol instead.

Contributor Author @ilan-gold, Oct 2, 2024

@flying-sheep I have been thinking about this a bit. Would it be acceptable to simply say "we subclass these things, but make no guarantees about the stability of our added APIs - please see the super class' documentation for supported methods." ?

I think this is perfectly fair.

Member

If we do this, we don’t need to write anything, people will only see the superclass, i.e. the stable API:

qualname_overrides = {
    # ... existing entries ...
    "anndata.xyz.MaskedArray": "numpy.ma.MaskedArray",
}

Contributor Author

Back to our sphinx-autodoc-singledispatch issue: I don't think we can do this because we don't install the package, unless this can work without the package being installed or mocked.

Contributor Author

hmm, right the intersphinx mapping is independent of installed packages!

Contributor Author

Ok @flying-sheep I am not sure qualname_overrides works this way - I tried adding it in but could not get the links to redirect to the xarray docs

Member

Yeah, I think they only work for param docs. I should get around to checking whether I can extend it to work for pure Sphinx as well.

```

## Errors and warnings
Expand Down
1 change: 1 addition & 0 deletions docs/conf.py
Expand Up @@ -108,6 +108,7 @@
("py:class", "numpy.ma.core.MaskedArray"),
("py:class", "dask.array.core.Array"),
("py:class", "awkward.highlevel.Array"),
("py:class", "awkward.Array"),
Member

just add awkward to the intersphinx mapping instead please

Suggested change
("py:class", "awkward.Array"),

Contributor Author

Ah yes, we can't actually. sphinx-doc/sphinx#10591 prevents us from using autodoc mocking with singledispatch in our codebase, so we could either add awkward to the docs build or do this.
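
For reference, a minimal sketch of what adding awkward to the intersphinx mapping could look like in `docs/conf.py`. The existing mapping is not shown in this diff, so the `numpy` entry is only a placeholder, and the awkward inventory URL is an assumption that would need checking against the awkward documentation.

```python
# docs/conf.py (sketch only; entries and URLs are assumptions, not taken from this PR)
intersphinx_mapping = dict(
    # Placeholder for whatever entries the real config already has.
    numpy=("https://numpy.org/doc/stable", None),
    # Assumed location of awkward's objects.inv; verify before relying on it.
    awkward=("https://awkward-array.org/doc/stable", None),
)
```

With an entry like this, Sphinx resolves cross-references against the remote inventory, which does not require the package to be installed in the docs environment.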

("py:class", "anndata._core.sparse_dataset.BaseCompressedSparseDataset"),
("py:obj", "numpy._typing._array_like._ScalarType_co"),
# https://github.com/sphinx-doc/sphinx/issues/10974
Expand Down
2 changes: 1 addition & 1 deletion docs/release-notes/0.11.0rc1.md
Expand Up @@ -20,8 +20,8 @@
- `scipy.sparse.csr_array` and `scipy.sparse.csc_array` are now supported when constructing `AnnData` objects {user}`ilan-gold` {user}`isaac-virshup` ({pr}`1028`)
- Allow `axis` parameter of e.g. :func:`anndata.concat` to accept `'obs'` and `'var'` {user}`flying-sheep` ({pr}`1244`)
- Add `settings` object with methods for altering internally-used options, like checking for uniqueness on `obs`' index {user}`ilan-gold` ({pr}`1270`)
- Add :func:`~anndata.experimental.read_elem_lazy` function to handle i/o with sparse and dense arrays {user}`ilan-gold` ({pr}`1469`)
- Add {attr}`~anndata.settings.remove_unused_categories` option to {attr}`anndata.settings` to override current behavior {user}`ilan-gold` ({pr}`1340`)
- Add :func:`~anndata.experimental.read_elem_as_dask` function to handle i/o with sparse and dense arrays {user}`ilan-gold` ({pr}`1469`)
- Add ability to convert strings to categoricals on write in {meth}`~anndata.AnnData.write_h5ad` and {meth}`~anndata.AnnData.write_zarr` via `convert_strings_to_categoricals` parameter {user}` falexwolf` ({pr}`1474`)
- Add {attr}`~anndata.settings.check_uniqueness` option to {attr}`anndata.settings` to override current behavior {user}`ilan-gold` ({pr}`1507`)
- Add functionality to write from GPU {class}`dask.array.Array` to disk {user}`ilan-gold` ({pr}`1550`)
Expand Down
1 change: 1 addition & 0 deletions docs/release-notes/1247.feature.md
@@ -0,0 +1 @@
Add {func}`~anndata.experimental.read_elem_lazy` (in place of `read_elem_as_dask`) to handle backed dataframes, sparse arrays, and dense arrays, as well as a {func}`~anndata.experimental.read_lazy` to handle reading in as much of the on-disk data as possible to produce a {class}`~anndata.AnnData` object {user}`ilan-gold`
1 change: 1 addition & 0 deletions docs/tutorials/index.md
Expand Up @@ -14,4 +14,5 @@ notebooks/anncollection-annloader
notebooks/anndata_dask_array
notebooks/awkward-arrays
notebooks/{read,write}_dispatched
notebooks/read_lazy
```
2 changes: 1 addition & 1 deletion docs/tutorials/notebooks
Submodule notebooks updated 2 files
+8 −2 .readthedocs.yml
+3,280 −0 read_lazy.ipynb
2 changes: 2 additions & 0 deletions pyproject.toml
Expand Up @@ -84,6 +84,7 @@ doc = [
"anndata[dev-doc]",
]
dev-doc = ["towncrier>=24.8.0"] # release notes tool
test-full = ["anndata[test,lazy]"]
test = [
"loompy>=3.0.5",
"pytest>=8.2",
Expand All @@ -109,6 +110,7 @@ cu12 = ["cupy-cuda12x"]
cu11 = ["cupy-cuda11x"]
# https://github.com/dask/dask/issues/11290
dask = ["dask[array]>=2022.09.2,<2024.8.0"]
lazy = ["xarray>=2024.06.0", "aiohttp", "requests", "zarr<3.0.0a0", "anndata[dask]"]
Contributor Author

Not sure what to do about optional deps here... I think fsspec should be added as well to make it feature complete, or we should remove all of them except xarray.

ilan-gold marked this conversation as resolved.

[tool.hatch.version]
source = "vcs"
Expand Down
18 changes: 16 additions & 2 deletions src/anndata/_core/aligned_df.py
@@ -1,6 +1,7 @@
from __future__ import annotations

import warnings
from collections.abc import Mapping
from functools import singledispatch
from typing import TYPE_CHECKING

Expand All @@ -10,13 +11,26 @@
from .._warnings import ImplicitModificationWarning

if TYPE_CHECKING:
from collections.abc import Iterable, Mapping
from collections.abc import Iterable
from typing import Any, Literal


@singledispatch
def _gen_dataframe(
anno: Mapping[str, Any],
anno: Any,
index_names: Iterable[str],
*,
source: Literal["X", "shape"],
attr: Literal["obs", "var"],
length: int | None = None,
) -> pd.DataFrame: # pragma: no cover
raise ValueError(f"Cannot convert {type(anno)} to {attr} DataFrame")


@_gen_dataframe.register(Mapping)
@_gen_dataframe.register(type(None))
def _gen_dataframe_mapping(
anno: Mapping[str, Any] | None,
index_names: Iterable[str],
*,
source: Literal["X", "shape"],
Expand Down
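
The dispatch change in this hunk can be illustrated in isolation. The sketch below is not the anndata implementation: `describe` and its return strings are hypothetical, and only the pattern mirrors the diff (a catch-all `singledispatch` base that raises, plus one implementation registered for both `Mapping` and `type(None)`).

```python
from collections.abc import Mapping
from functools import singledispatch
from typing import Any, Optional


@singledispatch
def describe(anno: Any) -> str:
    # Fallback for unregistered types, analogous to the new base `_gen_dataframe`.
    raise ValueError(f"Cannot convert {type(anno)} to a description")


@describe.register(Mapping)
@describe.register(type(None))
def _describe_mapping(anno: Optional[Mapping[str, Any]]) -> str:
    # One implementation serves both mappings and a missing annotation.
    if anno is None:
        return "empty"
    return ", ".join(sorted(anno))
```

Because the base function now accepts `Any` and raises, other types (for example a lazy dataframe class) can register their own implementation without touching the mapping code path.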
11 changes: 6 additions & 5 deletions src/anndata/_core/anndata.py
Expand Up @@ -8,7 +8,7 @@
from collections import OrderedDict
from collections.abc import Mapping, MutableMapping, Sequence
from copy import copy, deepcopy
from functools import partial
from functools import partial, singledispatch
from pathlib import Path
from textwrap import dedent
from typing import TYPE_CHECKING
Expand Down Expand Up @@ -41,7 +41,6 @@
from .sparse_dataset import BaseCompressedSparseDataset, sparse_dataset
from .storage import coerce_array
from .views import (
DataFrameView,
DictView,
_resolve_idxs,
as_view,
Expand Down Expand Up @@ -301,8 +300,8 @@ def _init_as_view(self, adata_ref: AnnData, oidx: Index, vidx: Index):
self._remove_unused_categories(adata_ref.obs, obs_sub, uns)
self._remove_unused_categories(adata_ref.var, var_sub, uns)
# set attributes
self._obs = DataFrameView(obs_sub, view_args=(self, "obs"))
self._var = DataFrameView(var_sub, view_args=(self, "var"))
self._obs = as_view(obs_sub, view_args=(self, "obs"))
self._var = as_view(var_sub, view_args=(self, "var"))
self._uns = uns

# set data
Expand Down Expand Up @@ -1023,8 +1022,10 @@ def __getitem__(self, index: Index) -> AnnData:
oidx, vidx = self._normalize_indices(index)
return AnnData(self, oidx=oidx, vidx=vidx, asview=True)

@staticmethod
@singledispatch
def _remove_unused_categories(
self, df_full: pd.DataFrame, df_sub: pd.DataFrame, uns: dict[str, Any]
df_full: pd.DataFrame, df_sub: pd.DataFrame, uns: dict[str, Any]
):
for k in df_full:
if not isinstance(df_full[k].dtype, pd.CategoricalDtype):
Expand Down
15 changes: 11 additions & 4 deletions src/anndata/_core/index.py
Expand Up @@ -50,10 +50,13 @@ def _normalize_index(
| pd.Index,
index: pd.Index,
) -> slice | int | np.ndarray: # ndarray of int or bool
if not isinstance(index, pd.RangeIndex):
msg = "Don’t call _normalize_index with non-categorical/string names"
assert index.dtype != float, msg
assert index.dtype != int, msg
from ..experimental.backed._compat import DataArray

# TODO: why is this here? All tests pass without it and it seems at the minimum not strict enough.
Contributor Author

Note this comment. This line was causing problems with the load_annotation_index=True case, if I remember correctly. But all tests pass without it.

Member @flying-sheep, Oct 14, 2024

asserts don’t necessarily exist in runtime code (they can be optimized out). So if whoever wrote them knows what they’re doing, asserts in runtime code are purely there to make debugging easier in case the asserts fail.

Ruff has a check to disallow asserts; we should probably activate it for non-test code and replace them with raise AssertionError(msg) (which isn’t optimized out).
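
The point about asserts being optimized out can be checked directly. This is a minimal sketch that uses the built-in `compile(..., optimize=1)` to emulate `python -O` in-process; the function names are made up for illustration and are not part of anndata.

```python
SOURCE = '''
def check_with_assert(value):
    assert value >= 0, "value must be non-negative"
    return value

def check_with_raise(value):
    if value < 0:
        raise AssertionError("value must be non-negative")
    return value
'''


def load(optimize):
    # Compile the same source at a given optimization level and return its namespace.
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace


def raises_assertion(func, value):
    # Report whether calling func(value) raises AssertionError.
    try:
        func(value)
    except AssertionError:
        return True
    return False


normal = load(optimize=0)     # asserts kept
optimized = load(optimize=1)  # asserts stripped, as with `python -O`
```

In the optimized namespace the `assert`-based check accepts the bad value silently, while the explicit `raise` still fires.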

Contributor Author

But it would be good to understand what is even going on here, @ivirshup?

Contributor Author

I mean if the tests pass, we're good? And if we're not good, we should add a test?

Member

> asserts in runtime code are purely there to make debugging easier in case the asserts fail.

Hmm, so it’s not possible that _normalize_index ever gets called with the wrong type of index as result of user action?

If that’s impossible, feel free to remove. Otherwise we should probably update this check to a TypeError or so.
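
If user action can reach `_normalize_index` with the wrong kind of index, the commented-out asserts could become an explicit check. The sketch below is hypothetical: `check_label_index` is not an anndata function, a plain sequence of labels stands in for `pd.Index`, and `range` stands in for `pd.RangeIndex`, so the example has no dependencies.

```python
from numbers import Real


def check_label_index(labels):
    # A range stands in for pd.RangeIndex, which the original check let through.
    if isinstance(labels, range):
        return
    # Collect numeric labels; string or categorical-like names pass untouched.
    numeric = [label for label in labels if isinstance(label, Real)]
    if numeric:
        msg = (
            "Expected string/categorical names or a range index, "
            f"got numeric labels such as {numeric[0]!r}"
        )
        # TypeError is raised even under `python -O`, unlike an assert.
        raise TypeError(msg)
```

Whether numeric names should be rejected at all is the open question in this thread; the sketch only shows the shape such a check could take.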

# if not isinstance(index, pd.RangeIndex):
# msg = "Don’t call _normalize_index with non-categorical/string names and non-range index"
# assert index.dtype != float, msg
# assert index.dtype != int, msg

# the following is insanely slow for sequences,
# we replaced it using pandas below
Expand Down Expand Up @@ -107,6 +110,10 @@ def name_idx(i):
"are not valid obs/ var names or indices."
)
return positions # np.ndarray[int]
elif isinstance(indexer, DataArray):
if isinstance(indexer.data, DaskArray):
return indexer.data.compute()
return indexer.data
else:
raise IndexError(f"Unknown indexer {indexer!r} of type {type(indexer)}")

Expand Down