Hi,
I'd like to start this conversation first and then create more specific issues for the points you agree with. I have some suggestions for modifying and generalizing the internals of SpatialData annotations.
One row can only link to one spatial element
Currently, a row in a table can annotate at most one spatial element. For example, if `sdata['table'][i]` annotates `sdata['shape'][i]`, then `sdata['table'][i]` can't also annotate `sdata['label'][i]`.
Take, for example, this test code I wrote (#946):
sdata = concatenate(
    {
        "labels": blobs_annotating_element("blobs_labels"),
        "shapes": blobs_annotating_element("blobs_circles"),
        "points": blobs_annotating_element("blobs_points"),
        "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
    },
    concatenate_tables=True,
)
third_elems = sdata.tables["table"].obs["instance_id"] == 3
subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)
# here the table contains more than one row with instance_id 3 (one per region)
# just to be able to annotate a cell in another region, I had to duplicate the count information etc.
My conclusion
Because we store each row-to-row mapping in the table itself, we have to "explode" the table and end up duplicating the count information.
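To make the duplication concrete, here is a minimal sketch in plain Python. It uses toy data rather than the SpatialData API, and the names `counts`, `exploded`, `compact_rows`, and `links` are made up for illustration. It only compares how many count values get stored when every (region, instance) pair needs its own row versus when the links live outside the table.

```python
# Toy data (hypothetical): per-gene counts for two instances.
counts = {1: [5, 0, 2], 2: [1, 3, 0]}  # instance_id -> counts
regions = ["blobs_labels", "blobs_circles"]

# Current scheme: one row per (region, instance_id) pair,
# so the count vector is copied once per annotated region.
exploded = [
    {"region": region, "instance_id": iid, "counts": list(vec)}
    for region in regions
    for iid, vec in counts.items()
]

# Proposed scheme: one row per instance, links kept separately.
compact_rows = [{"instance_id": iid, "counts": vec} for iid, vec in counts.items()]
links = {"instance_id": regions}

n_exploded_values = sum(len(row["counts"]) for row in exploded)
n_compact_values = sum(len(row["counts"]) for row in compact_rows)
print(n_exploded_values, n_compact_values)
```

The stored count values grow with the number of annotated regions in the first layout and stay constant in the second.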
One row can only link to one item of a spatial element
A one-to-many relationship is something I think we'd actually like to have for points. We already have this implicitly for labels, and we can support it by generalizing the current annotation scheme.
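As a rough illustration of what a one-to-many link could mean for points, here is a sketch in plain Python. It assumes each point carries a column naming the instance it belongs to (I use the hypothetical name `contained_in_shape_id`), and that resolving a link means collecting all points whose value matches the table row's instance id. The helper `resolve_one_to_many` is made up and not part of SpatialData.

```python
from collections import defaultdict

def resolve_one_to_many(instance_ids, dst_rows, dst_key):
    """Map each table instance id to the indices of all matching dst rows."""
    by_key = defaultdict(list)
    for index, row in enumerate(dst_rows):
        by_key[row[dst_key]].append(index)
    # Instances without any matching row get an empty list.
    return {iid: by_key.get(iid, []) for iid in instance_ids}

# Hypothetical points: two fall inside instance 3, one inside instance 5.
points = [
    {"x": 0.1, "y": 0.2, "contained_in_shape_id": 3},
    {"x": 0.4, "y": 0.1, "contained_in_shape_id": 3},
    {"x": 0.9, "y": 0.7, "contained_in_shape_id": 5},
]

matches = resolve_one_to_many([3, 4, 5], points, "contained_in_shape_id")
print(matches)
```

A single table row then annotates any number of points without the table needing one row per point.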
My suggestion to solve both issues
Ultimately we want a mapping `{src_key: {dst_element_name: (dst_access, dst_kind, link_kind, dst_instance_key)}}`.
- `dst_access` is the access method of the dst element, for example `"value"` or `"key"`. Currently for `labels` we use `"value"` since there are no columns in a raster image, and for `shapes` and `points` we use `"key"` since we have a column in the table.
- `dst_kind` is the kind of the dst element, for example `"labels"`, `"shapes"`, `"points"`.
- `link_kind` is the kind of the link, for example `"one-to-one"`, `"one-to-many"`.
- `dst_instance_key` is the key of the dst element if `dst_access` is `"key"`.
Currently `dst_kind` serves no purpose, since we already define the kind of link we want explicitly, but I added it for future flexibility.
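The four fields above could be given a small typed container so that malformed links are caught early. This is only a sketch: `LinkSpec` and `validate_link` are hypothetical names, and the rule that `"key"` access requires a `dst_instance_key` is my reading of the field descriptions, not an existing SpatialData check.

```python
from typing import NamedTuple, Optional, Tuple

VALID_ACCESS = {"value", "key"}
VALID_LINK_KINDS = {"one-to-one", "one-to-many"}

class LinkSpec(NamedTuple):
    dst_access: str                              # "value" or "key"
    dst_kind: str                                # kept for future use, not validated
    link_kind: str                               # "one-to-one" or "one-to-many"
    dst_instance_key: Optional[Tuple[str, ...]]  # column(s) in the dst element

def validate_link(spec: LinkSpec) -> LinkSpec:
    """Return the spec unchanged, or raise ValueError if it is inconsistent."""
    if spec.dst_access not in VALID_ACCESS:
        raise ValueError(f"unknown dst_access: {spec.dst_access!r}")
    if spec.link_kind not in VALID_LINK_KINDS:
        raise ValueError(f"unknown link_kind: {spec.link_kind!r}")
    if spec.dst_access == "key" and not spec.dst_instance_key:
        raise ValueError("dst_access='key' requires a dst_instance_key")
    return spec

label_link = validate_link(LinkSpec("value", "label", "one-to-one", None))
shape_link = validate_link(LinkSpec("key", "shape", "one-to-many", ("shape_id",)))
```

Because `LinkSpec` is a `NamedTuple`, plain 4-tuples could still be accepted from users and converted with `LinkSpec(*tuple_value)`.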
The user interface might look like this:
mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_circles": ("key", "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key", "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
add_links(sdata, "table", mapping)
The mapping would be stored in exploded, normalized form, for example in `sdata.tables["table"].uns["row_mappings"]`:
| src_instance_key | dst_elem_name | dst_instance_key | dst_access | dst_kind | link_kind |
| --- | --- | --- | --- | --- | --- |
| "instance_id" | "blobs_labels" | ... | "value" | "label" | "one-to-one" |
| "instance_id" | "blobs_circles" | ... | "key" | "shape" | "one-to-one" |
| "instance_id" | "parts_of_a_cell" | ... | "key" | "shape" | "one-to-many" |
| "instance_id" | "blobs_points" | ... | "key" | "point" | "one-to-many" |
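Going from the nested user-facing mapping to these normalized rows is a simple flattening step. The sketch below shows one way it might be done in plain Python; `normalize_links` is a hypothetical helper (not an existing SpatialData function), and how the rows would actually be serialized into `.uns` is left open.

```python
COLUMNS = (
    "src_instance_key", "dst_elem_name", "dst_instance_key",
    "dst_access", "dst_kind", "link_kind",
)

def normalize_links(mapping):
    """Flatten {src_key: {dst_name: (access, kind, link_kind, dst_key)}} into rows."""
    rows = []
    for src_key, targets in mapping.items():
        for dst_name, (dst_access, dst_kind, link_kind, dst_key) in targets.items():
            rows.append({
                "src_instance_key": src_key,
                "dst_elem_name": dst_name,
                "dst_instance_key": dst_key,
                "dst_access": dst_access,
                "dst_kind": dst_kind,
                "link_kind": link_kind,
            })
    return rows

mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_circles": ("key", "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key", "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}

rows = normalize_links(mapping)
for row in rows:
    print(row)
```

The result has one row per link rather than one row per observation, which is why the description stays small no matter how many rows the table has.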
I think we can make these changes in a backwards-compatible way, and they will open up a lot of possibilities for future extensions.
Bonus points: we would have an easier time achieving #293 (comment) as well, since the mapping description is much smaller than adding a column to the .obs