[python] Experiment-level upgrader/resizer #3157

Merged: 4 commits, Oct 10, 2024

johnkerl (Member) commented Oct 9, 2024

Issue and/or context: As tracked on issue #2407 / [sc-51048].

Note that the intended Python and R API changes are all agreed on and finalized as described in #2407.

Changes:

Adds the experiment-level upgrader and resizer to tiledbsoma.io: show_experiment_shapes, upgrade_experiment_shapes, and resize_experiment.

Notes for Reviewer:

More work remains in tiledbsoma.io for append-mode ingest; that is tracked on #3148.

Example verbose output:

uri = "/var/ns/pbmc3k"
tiledbsoma.io.show_experiment_shapes(uri)
[DataFrame] obs
  URI file:///var/ns/pbmc3k/obs
  count                2638
  domain               ((0, 9223372036854773758),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             False

[DataFrame] ms/RNA/var
  URI file:///var/ns/pbmc3k/ms/RNA/var
  count                1838
  domain               ((0, 9223372036854773758),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             False

[SparseNDArray] ms/RNA/X/data
  URI file:///var/ns/pbmc3k/ms/RNA/X/data
  used_shape           ((0, 2637), (0, 1837))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/obsm/X_draw_graph_fr
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_draw_graph_fr
  used_shape           ((0, 2637), (0, 1))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/obsm/X_pca
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_pca
  used_shape           ((0, 2637), (0, 49))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/obsm/X_tsne
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_tsne
  used_shape           ((0, 2637), (0, 1))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/obsm/X_umap
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_umap
  used_shape           ((0, 2637), (0, 1))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/obsp/connectivities
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/connectivities
  used_shape           ((0, 2637), (0, 2637))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/obsp/distances
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/distances
  used_shape           ((0, 2637), (0, 2637))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[SparseNDArray] ms/RNA/varm/PCs
  URI file:///var/ns/pbmc3k/ms/RNA/varm/PCs
  used_shape           ((0, 1837), (0, 49))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False

[DataFrame] ms/raw/var
  URI file:///var/ns/pbmc3k/ms/raw/var
  count                13714
  domain               ((0, 9223372036854773758),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             False

[SparseNDArray] ms/raw/X/data
  URI file:///var/ns/pbmc3k/ms/raw/X/data
  used_shape           ((0, 2637), (0, 13713))
  shape                (9223372036854773759, 9223372036854773759)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             False
uri = "/var/ns/pbmc3k"
tiledbsoma.io.upgrade_experiment_shapes(uri, verbose=True)
[DataFrame] obs
  URI file:///var/ns/pbmc3k/obs
  Applying upgrade_soma_joinid_shape(2638)

[DataFrame] ms/RNA/var
  URI file:///var/ns/pbmc3k/ms/RNA/var
  Applying upgrade_soma_joinid_shape(1838)

[SparseNDArray] ms/RNA/X/data
  URI file:///var/ns/pbmc3k/ms/RNA/X/data
  Applying upgrade_shape((2638, 1838))

[SparseNDArray] ms/RNA/obsm/X_draw_graph_fr
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_draw_graph_fr
  Applying upgrade_shape((2638, 2))

[SparseNDArray] ms/RNA/obsm/X_pca
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_pca
  Applying upgrade_shape((2638, 50))

[SparseNDArray] ms/RNA/obsm/X_tsne
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_tsne
  Applying upgrade_shape((2638, 2))

[SparseNDArray] ms/RNA/obsm/X_umap
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_umap
  Applying upgrade_shape((2638, 2))

[SparseNDArray] ms/RNA/obsp/connectivities
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/connectivities
  Applying upgrade_shape((2638, 2638))

[SparseNDArray] ms/RNA/obsp/distances
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/distances
  Applying upgrade_shape((2638, 2638))

[SparseNDArray] ms/RNA/varm/PCs
  URI file:///var/ns/pbmc3k/ms/RNA/varm/PCs
  Applying upgrade_shape((1838, 50))

[DataFrame] ms/raw/var
  URI file:///var/ns/pbmc3k/ms/raw/var
  Applying upgrade_soma_joinid_shape(13714)

[SparseNDArray] ms/raw/X/data
  URI file:///var/ns/pbmc3k/ms/raw/X/data
  Applying upgrade_shape((2638, 13714))
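
In the output above, every shape passed to upgrade_shape matches the upper bound of the array's used_shape plus one, on each dimension. A minimal sketch of that relationship follows. The helper name `shape_from_used_shape` is hypothetical and is not part of tiledbsoma; the input values are copied from the first show_experiment_shapes listing.

```python
from typing import Tuple

Bounds = Tuple[Tuple[int, int], ...]


def shape_from_used_shape(used_shape: Bounds) -> Tuple[int, ...]:
    """Return the smallest shape that holds every occupied coordinate.

    Hypothetical helper: used_shape holds inclusive (lo, hi) bounds per
    dimension, so the tightest shape on each dimension is hi + 1.
    """
    return tuple(hi + 1 for (_lo, hi) in used_shape)


# used_shape values copied from the listing for three of the arrays
used = {
    "ms/RNA/X/data": ((0, 2637), (0, 1837)),
    "ms/RNA/obsm/X_pca": ((0, 2637), (0, 49)),
    "ms/RNA/varm/PCs": ((0, 1837), (0, 49)),
}

for name, bounds in used.items():
    print(name, shape_from_used_shape(bounds))
```
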
uri = "/var/ns/pbmc3k"
tiledbsoma.io.resize_experiment(uri, 2638, {"RNA": 5000, "raw": 13714})
[DataFrame] obs
  URI file:///var/ns/pbmc3k/obs
  Applying resize_soma_joinid_shape(2638)

[DataFrame] ms/RNA/var
  URI file:///var/ns/pbmc3k/ms/RNA/var
  Applying resize_soma_joinid_shape(5000)

[SparseNDArray] ms/RNA/X/data
  URI file:///var/ns/pbmc3k/ms/RNA/X/data
  Applying resize((2638, 5000))

[SparseNDArray] ms/RNA/obsm/X_draw_graph_fr
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_draw_graph_fr
  Applying resize((2638, 2))

[SparseNDArray] ms/RNA/obsm/X_pca
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_pca
  Applying resize((2638, 50))

[SparseNDArray] ms/RNA/obsm/X_tsne
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_tsne
  Applying resize((2638, 2))

[SparseNDArray] ms/RNA/obsm/X_umap
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_umap
  Applying resize((2638, 2))

[SparseNDArray] ms/RNA/obsp/connectivities
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/connectivities
  Applying resize((2638, 2638))

[SparseNDArray] ms/RNA/obsp/distances
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/distances
  Applying resize((2638, 2638))

[SparseNDArray] ms/RNA/varm/PCs
  URI file:///var/ns/pbmc3k/ms/RNA/varm/PCs
  Applying resize((5000, 50))

[DataFrame] ms/raw/var
  URI file:///var/ns/pbmc3k/ms/raw/var
  Applying resize_soma_joinid_shape(13714)

[SparseNDArray] ms/raw/X/data
  URI file:///var/ns/pbmc3k/ms/raw/X/data
  Applying resize((2638, 13714))
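
The resize output above suggests a simple mapping from the two arguments (the obs count and the per-measurement var counts) to each array's new shape. The sketch below reproduces that mapping as inferred from the printed output only; `resized_shape` is a hypothetical helper, not the code in shaping.py, and the `varp` case is an assumption by analogy with `obsp`.

```python
from typing import Dict, Tuple


def resized_shape(
    path: str,
    current_shape: Tuple[int, int],
    nobs: int,
    nvars: Dict[str, int],
) -> Tuple[int, int]:
    """Guess the post-resize shape of an array at ms/<name>/<collection>/<key>.

    Inferred from the verbose output: obs-indexed dimensions take nobs,
    var-indexed dimensions take nvars[<name>], and any other dimension
    (for example the number of PCA components) is left unchanged.
    """
    _ms, ms_name, collection, _key = path.split("/")
    nvar = nvars[ms_name]
    if collection == "X":
        return (nobs, nvar)
    if collection == "obsm":
        return (nobs, current_shape[1])
    if collection == "obsp":
        return (nobs, nobs)
    if collection == "varm":
        return (nvar, current_shape[1])
    if collection == "varp":  # assumption: not shown in the output above
        return (nvar, nvar)
    raise ValueError(f"unrecognized collection: {collection}")


nvars = {"RNA": 5000, "raw": 13714}
print(resized_shape("ms/RNA/X/data", (2638, 1838), 2638, nvars))
print(resized_shape("ms/RNA/varm/PCs", (1838, 50), 2638, nvars))
```
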
uri = "/var/ns/pbmc3k"
tiledbsoma.io.show_experiment_shapes(uri)
[DataFrame] obs
  URI file:///var/ns/pbmc3k/obs
  count                2638
  domain               ((0, 2637),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             True

[DataFrame] ms/RNA/var
  URI file:///var/ns/pbmc3k/ms/RNA/var
  count                1838
  domain               ((0, 4999),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             True

[SparseNDArray] ms/RNA/X/data
  URI file:///var/ns/pbmc3k/ms/RNA/X/data
  used_shape           ((0, 2637), (0, 1837))
  shape                (2638, 5000)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/obsm/X_draw_graph_fr
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_draw_graph_fr
  used_shape           ((0, 2637), (0, 1))
  shape                (2638, 2)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/obsm/X_pca
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_pca
  used_shape           ((0, 2637), (0, 49))
  shape                (2638, 50)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/obsm/X_tsne
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_tsne
  used_shape           ((0, 2637), (0, 1))
  shape                (2638, 2)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/obsm/X_umap
  URI file:///var/ns/pbmc3k/ms/RNA/obsm/X_umap
  used_shape           ((0, 2637), (0, 1))
  shape                (2638, 2)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/obsp/connectivities
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/connectivities
  used_shape           ((0, 2637), (0, 2637))
  shape                (2638, 2638)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/obsp/distances
  URI file:///var/ns/pbmc3k/ms/RNA/obsp/distances
  used_shape           ((0, 2637), (0, 2637))
  shape                (2638, 2638)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/varm/PCs
  URI file:///var/ns/pbmc3k/ms/RNA/varm/PCs
  used_shape           ((0, 1837), (0, 49))
  shape                (5000, 50)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[DataFrame] ms/raw/var
  URI file:///var/ns/pbmc3k/ms/raw/var
  count                13714
  domain               ((0, 13713),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             True

[SparseNDArray] ms/raw/X/data
  URI file:///var/ns/pbmc3k/ms/raw/X/data
  used_shape           ((0, 2637), (0, 13713))
  shape                (2638, 13714)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True
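
One way to read the final listing is as a consistency check: after the upgrade and resize, every array's used_shape should still fit inside its new shape. The sketch below expresses that check in plain Python. The function name `fits` is hypothetical, and the values are copied from the listing above rather than read from a live experiment.

```python
from typing import Tuple

Bounds = Tuple[Tuple[int, int], ...]


def fits(used_shape: Bounds, shape: Tuple[int, ...]) -> bool:
    """True when every inclusive (lo, hi) bound lies inside [0, dim)."""
    return len(used_shape) == len(shape) and all(
        lo >= 0 and hi < dim for (lo, hi), dim in zip(used_shape, shape)
    )


# (used_shape, shape) pairs copied from the post-resize listing
after_resize = {
    "ms/RNA/X/data": (((0, 2637), (0, 1837)), (2638, 5000)),
    "ms/RNA/varm/PCs": (((0, 1837), (0, 49)), (5000, 50)),
    "ms/raw/X/data": (((0, 2637), (0, 13713)), (2638, 13714)),
}

for name, (used, shape) in after_resize.items():
    print(name, "fits" if fits(used, shape) else "does NOT fit")
```
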

Base automatically changed from kerl/py-exp-shaping to main October 9, 2024 22:01
4 review threads on apis/python/src/tiledbsoma/io/shaping.py (outdated, resolved)
johnkerl (Member, Author) commented Oct 9, 2024

@nguyenv I decided to stay with TypedDict rather than switch to attrs. It's sort of a po-tay-to po-tah-to: things that were working before the change didn't work after, and I didn't see the benefit of debugging. (I did change args["foo"] to args.foo etc.; there were other errors.)

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 84.83412% with 32 lines in your changes missing coverage. Please review.

Project coverage is 83.30%. Comparing base (7b09dd7) to head (dcecc32).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3157      +/-   ##
==========================================
+ Coverage   83.01%   83.30%   +0.29%     
==========================================
  Files          50       51       +1     
  Lines        5247     5458     +211     
==========================================
+ Hits         4356     4547     +191     
- Misses        891      911      +20     
Flag Coverage Δ
python 83.30% <84.83%> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
python_api 83.30% <84.83%> (+0.29%) ⬆️
libtiledbsoma ∅ <ø> (∅)
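
The percentages in the report can be reproduced from the hit and line counts in the diff table. The sketch below does that arithmetic. It assumes the figures are truncated to two decimals rather than rounded; that assumption is inferred from these numbers (4356/5247 is about 83.0189%, reported as 83.01%) and not taken from Codecov documentation. The patch size of 211 lines is likewise inferred from 32 missing lines at 84.83412% coverage.

```python
def truncated_pct(hits: int, lines: int) -> float:
    """Percentage truncated (not rounded) to two decimals, via integer math."""
    return (hits * 10000 // lines) / 100


base = truncated_pct(4356, 5247)        # main before the PR
head = truncated_pct(4547, 5458)        # head of the PR branch
patch = truncated_pct(211 - 32, 211)    # changed lines that are covered

print(f"base  {base:.2f}%")
print(f"head  {head:.2f}%")
print(f"delta {head - base:+.2f}%")
print(f"patch {patch:.2f}%")
```
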

nguyenv (Member) left a comment

LGTM. Internally, there isn't much difference between attrs and TypedDict, so keeping it as is is fine. I do think attrs gives us some additional advantages, such as stronger type checking and default-value factories, which could simplify the code here, but not substantially enough to warrant changing.
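
For readers unfamiliar with the trade-off discussed in this thread, here is a rough illustration of the two styles. It is not the code from shaping.py: the field names are made up for the example, and the standard-library `dataclasses` module stands in for attrs so the snippet runs without third-party packages (attrs offers a similar declaration style plus validators and converters).

```python
from dataclasses import dataclass, field
from typing import Dict, TypedDict


class ArgsDict(TypedDict):
    """TypedDict style: a plain dict at runtime, accessed as args["foo"]."""

    nobs: int
    nvars: Dict[str, int]
    verbose: bool


@dataclass
class ArgsClass:
    """Class style (dataclass as a stand-in for attrs): accessed as args.foo."""

    nobs: int
    nvars: Dict[str, int] = field(default_factory=dict)  # default-value factory
    verbose: bool = False


# TypedDict: every key is supplied by the caller; no defaults at runtime
d: ArgsDict = {"nobs": 2638, "nvars": {"RNA": 5000}, "verbose": True}

# Class style: omitted fields fall back to declared defaults
c = ArgsClass(nobs=2638)

print(d["nobs"], d["nvars"], d["verbose"])
print(c.nobs, c.nvars, c.verbose)
```
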

@johnkerl johnkerl merged commit 01aba35 into main Oct 10, 2024
11 checks passed
@johnkerl johnkerl deleted the kerl/py-exp-shaping2 branch October 10, 2024 03:43
2 participants