Skip to content

Conversation

selmanozleyen
Copy link
Member

@selmanozleyen selmanozleyen commented Jun 30, 2025

Hi @timtreis @ilan-gold ,

I am working on this PR also and thought maybe it will be smart to have the code I use there supported here because they seem useful at first glance. So I wrote and tested subset_sdata_by_table_mask in this PR.

@LucaMarconato I can't ask for reviews other than @ilan-gold do you know why?


Some notes

  • For PointsModel (GeoDataFrame) it assumes the index is instance_id always
  • For Label2DModel the image is assumed to have the instance_ids as values themselves
  • For Label2DModel when the element is a xr.DataTree it assumes the keys are the different scales
  • I didn't use the relational operations to get the shapes and points for this reason because they assume the merge is on the element indices themselves
  • I didn't add support for AnnData because we would have to also document the return types and stuff in the case of AnnData input which doesn't seem reasonable to me if its going to make it 1-1 same as scanpy.pp.filter_cells

Code excerpt from tests to demonstrate the usage:

sdata = concatenate(
        {
            "labels": blobs_annotating_element("blobs_labels"),
            "shapes": blobs_annotating_element("blobs_circles"),
            "points": blobs_annotating_element("blobs_points"),
            "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
        },
        concatenate_tables=True,
    )
    third_elems = sdata.tables["table"].obs["instance_id"] == 3
    subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)

    labels_remaining_ids = set(np.unique(subset_sdata.labels["blobs_labels-labels"].data.compute())) - {0}
    assert labels_remaining_ids == {3}

    for scale in subset_sdata.labels["blobs_multiscale_labels-multiscale_labels"]:
        ms_labels_remaining_ids = set(
            np.unique(subset_sdata.labels["blobs_multiscale_labels-multiscale_labels"][scale].image.compute())
        ) - {0}
        assert ms_labels_remaining_ids == {3}

    points_remaining_ids = set(np.unique(subset_sdata.points["blobs_points-points"]["instance_id"].compute())) - {0}
    assert points_remaining_ids == {3}

    shapes_remaining_ids = set(np.unique(subset_sdata.shapes["blobs_circles-shapes"].index)) - {0}
    assert shapes_remaining_ids == {3}

Copy link

codecov bot commented Jun 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.44%. Comparing base (3f01688) to head (b4901cb).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   92.36%   92.44%   +0.08%     
==========================================
  Files          48       48              
  Lines        7416     7470      +54     
==========================================
+ Hits         6850     6906      +56     
+ Misses        566      564       -2     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 96.42% <ø> (ø)
src/spatialdata/_core/query/relational_query.py 92.44% <100.00%> (+1.31%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@timtreis timtreis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this PR make use of

and ?

from spatialdata.datasets import blobs_annotating_element


def test_filter_labels2dmodel_by_instance_ids():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parametrise, don't loop over inputs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the loop here is for the multiple scales. Or did you mean something else?

preserved_ids = np.unique(labels_element[scale].image.compute())
assert filtered_ids == (set(all_instance_ids) - {2, 3})
# check if there is modification of the original labels
assert set(preserved_ids) == set(all_instance_ids) | {0}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use np.testing instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but these are all simple set comparisons. would have to order then compare the matrices which seems convoluted. But if you think the same still I can do it


@_filter_by_instance_ids.register(DataArray)
def _(element: DataArray, ids_to_remove: list[int], instance_key: str) -> DataArray:
del instance_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean why del instance_key? It is to explicitly clarify that I won't be using it for this dispatch

@selmanozleyen
Copy link
Member Author

selmanozleyen commented Jul 10, 2025

@timtreis About using def match_sdata_to_table( and the other. I noticed for shapes the index is assumed to be the instance_id but this doesn't match how the blobs are filled and would fail in the tests. It was documented that way so I assumed this was intentional but match_sdata_to_table wouldn't (and didn't) pass the current tests because it assumed the element index was the instance_id.

@selmanozleyen
Copy link
Member Author

@timtreis now I make use of match_element_to_table. Other than one comment all seems resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants