Some suggestions and proposals for annotations in SpatialData #975

@selmanozleyen

Hi,

I'd like to start this conversation first and then open more specific issues for the points you agree with. I have some suggestions for modifying and generalizing the internals of SpatialData annotations.

One row can only link to one spatial element

Currently, a table row can annotate at most one spatial element. For example, if sdata['table'][i] annotates sdata['shape'][i], then sdata['table'][i] cannot also annotate sdata['label'][i].

Take, for example, this test code I wrote (#946):

sdata = concatenate(
    {
        "labels": blobs_annotating_element("blobs_labels"),
        "shapes": blobs_annotating_element("blobs_circles"),
        "points": blobs_annotating_element("blobs_points"),
        "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
    },
    concatenate_tables=True,
)
third_elems = sdata.tables["table"].obs["instance_id"] == 3
subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)
# instance_id 3 now appears in more than one table row (once per annotated region):
# just to annotate the same cell in another region, I had to duplicate the count information etc.

My conclusion

Because each row-to-row mapping is stored in the table itself, we have to "explode" the table, which duplicates the count information.
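To make the duplication concrete, here is a toy sketch in plain Python. It does not use SpatialData or AnnData; the `counts` values and the `links` structure are made up for illustration, and only the region names come from the test above.

```python
# Toy illustration only: plain Python stands in for the table and its elements.
counts = {1: [5, 0, 2], 2: [1, 3, 0], 3: [0, 0, 7]}  # instance_id -> made-up counts
regions = ["blobs_labels", "blobs_circles", "blobs_points", "blobs_multiscale_labels"]

# Current scheme: one table row per (region, instance), so every count
# vector is copied once per annotated region.
exploded = [
    {"region": region, "instance_id": iid, "counts": list(vec)}
    for region in regions
    for iid, vec in counts.items()
]

# Direction proposed in this issue: one row per instance, plus a small
# description of which elements the rows link to (hypothetical structure).
compact_rows = [{"instance_id": iid, "counts": vec} for iid, vec in counts.items()]
links = [{"src_key": "instance_id", "dst_element_name": region} for region in regions]

# Every instance shows up once per region in the exploded table.
rows_for_instance_3 = [row for row in exploded if row["instance_id"] == 3]
print(len(exploded), len(compact_rows), len(links), len(rows_for_instance_3))
```

The exploded table grows with the number of regions times the number of instances, while the compact form grows with their sum.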

One row can only link to one item of a spatial element

A one-to-many relationship is something I think we'd actually like to have for points. We already have it implicitly for labels, and we can support it by generalizing the current annotation scheme.
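As a rough picture of what a one-to-many link means, the sketch below groups the rows of a destination element by the instance they point back to. Everything here is hypothetical: the `points` rows, the column names, and `resolve_one_to_many` are illustration only and not part of SpatialData.

```python
from collections import defaultdict

# Hypothetical points element: each point records the instance it belongs to.
points = [
    {"point_id": 0, "contained_in_shape_id": 1},
    {"point_id": 1, "contained_in_shape_id": 1},
    {"point_id": 2, "contained_in_shape_id": 3},
]
table_instance_ids = [1, 2, 3]


def resolve_one_to_many(instance_ids, dst_rows, dst_instance_key, dst_id_key):
    """Map each table instance to the ids of all destination rows linked to it."""
    groups = defaultdict(list)
    for row in dst_rows:
        groups[row[dst_instance_key]].append(row[dst_id_key])
    # Instances without any linked row get an empty list.
    return {iid: groups.get(iid, []) for iid in instance_ids}


linked = resolve_one_to_many(
    table_instance_ids, points, "contained_in_shape_id", "point_id"
)
```

A one-to-one link would be the special case where every list has at most one entry.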

My suggestion to solve both issues

Ultimately we want a mapping {src_key: {dst_element_name: (dst_access, dst_kind, link_kind, dst_instance_key)}}.

  • dst_access is the access method of the dst element, for example "value" or "key". Currently we use "value" for labels, since a raster image has no columns, and "key" for shapes and points, since there we have a column to match on.
  • dst_kind is the kind of the dst element, for example "label", "shape", "point".
  • link_kind is the kind of the link, for example "one-to-one", "one-to-many".
  • dst_instance_key is the key of the dst element; it is only needed if dst_access is "key".

Currently dst_kind serves no purpose, because the kind of linking we want is already defined by the other fields, but I added it for future flexibility.

The user interface might look like this:
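The four fields above could be checked up front with a small record type. This is a minimal sketch, assuming the rules implied by the list (a "key" link needs a dst_instance_key, a "value" link does not take one); `LinkSpec` and its validation are hypothetical and not an existing SpatialData class.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DST_ACCESS = {"value", "key"}
LINK_KINDS = {"one-to-one", "one-to-many"}


@dataclass(frozen=True)
class LinkSpec:
    """Hypothetical description of one link from a table key to an element."""

    dst_access: str
    dst_kind: str
    link_kind: str
    dst_instance_key: Optional[Tuple[str, ...]] = None

    def __post_init__(self) -> None:
        if self.dst_access not in DST_ACCESS:
            raise ValueError(f"unknown dst_access: {self.dst_access!r}")
        if self.link_kind not in LINK_KINDS:
            raise ValueError(f"unknown link_kind: {self.link_kind!r}")
        # Assumed rule: "key" access needs a column, "value" access has none.
        if self.dst_access == "key" and not self.dst_instance_key:
            raise ValueError("dst_instance_key is required when dst_access is 'key'")
        if self.dst_access == "value" and self.dst_instance_key is not None:
            raise ValueError("dst_instance_key must be None when dst_access is 'value'")


# The same tuples as in the proposal unpack directly into the record.
labels_link = LinkSpec("value", "label", "one-to-one", None)
points_link = LinkSpec("key", "point", "one-to-many", ("contained_in_shape_id",))
```

Failing early on a malformed tuple would keep bad entries out of the stored mapping.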

mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None), 
        "blobs_circles": ("key",   "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key",   "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key",   "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
add_links(sdata, "table", mapping)

The mapping would be stored in exploded, normalized form, for example in sdata.tables["table"].uns["row_mappings"]:

| src_instance_key | dst_elem_name | dst_instance_key | dst_access | dst_kind | link_kind |
| --- | --- | --- | --- | --- | --- |
| "instance_id" | "blobs_labels" | ... | "value" | "label" | "one-to-one" |
| "instance_id" | "blobs_circles" | ... | "key" | "shape" | "one-to-one" |
| "instance_id" | "parts_of_a_cell" | ... | "key" | "shape" | "one-to-many" |
| "instance_id" | "blobs_points" | ... | "key" | "point" | "one-to-many" |
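Going from the nested user-facing mapping to this row form is a plain flattening step. The sketch below is one way it might look; `explode_mapping` and `collapse_rows` are hypothetical helper names, and how the rows would actually be serialized into .uns is left open.

```python
def explode_mapping(mapping):
    """Flatten {src_key: {dst_elem_name: (access, kind, link, key)}} into rows."""
    rows = []
    for src_key, targets in mapping.items():
        for dst_elem_name, spec in targets.items():
            dst_access, dst_kind, link_kind, dst_instance_key = spec
            rows.append(
                {
                    "src_instance_key": src_key,
                    "dst_elem_name": dst_elem_name,
                    "dst_instance_key": dst_instance_key,
                    "dst_access": dst_access,
                    "dst_kind": dst_kind,
                    "link_kind": link_kind,
                }
            )
    return rows


def collapse_rows(rows):
    """Inverse of explode_mapping: rebuild the nested mapping from the rows."""
    mapping = {}
    for row in rows:
        targets = mapping.setdefault(row["src_instance_key"], {})
        targets[row["dst_elem_name"]] = (
            row["dst_access"],
            row["dst_kind"],
            row["link_kind"],
            row["dst_instance_key"],
        )
    return mapping


example = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
rows = explode_mapping(example)
```

The row count depends only on the number of links, not on the number of observations in the table.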

I think we can manage these changes in a backwards-compatible way, and they will open up a lot of possibilities for future extensions.

Bonus points: this would also make #293 (comment) easier to achieve, since a mapping description is much smaller than an added column in .obs.
