…st name to be in line with the new method name
src/segtraq/rs/region_similarity.py
Outdated
| and computes a correlation (e.g. Pearson) between the gene expression profiles | ||
| of the cell and that nucleus. | ||
| and computes the similarity (cosine similarity, Pearson correlation, Spearman correlation) | ||
| between the gene expression profiles of the whole cell (including the nucleus) and that nucleus. |
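For reference, the three similarity measures named in this docstring can be sketched in plain Python as below. This is only an illustration of the measures on two per-gene count vectors; the function names are hypothetical and this is not the module's implementation, which may well use pandas or scipy instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (NaN if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation, computed as the cosine similarity of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical per-gene counts for one cell and its matched nucleus
cell_counts = [4, 0, 2, 1]
nucleus_counts = [8, 0, 4, 2]
similarities = {
    "cosine": cosine_similarity(cell_counts, nucleus_counts),
    "pearson": pearson(cell_counts, nucleus_counts),
    "spearman": spearman(cell_counts, nucleus_counts),
}
```

Because the two example vectors are proportional, all three measures come out at 1 (up to floating-point error).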
"Including the nucleus" is a bit misleading. If the nucleus is outside the cell, the part that is outside the cell is not considered.
src/segtraq/rs/region_similarity.py
Outdated
| Returns DataFrame with columns ["cell_id", "best_nuc_id", "IoU", "correlation_parts"]. | ||
| For each cell in the SpatialData table, identifies the nucleus with highest intersection over union (IoU) | ||
| and computes the similarity (cosine similarity, Pearson correlation, Spearman correlation) | ||
| between the gene expression profiles of the cytoplasm (cell - nucleus) and that nucleus. |
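To make the region definitions under discussion concrete, here is a minimal sketch that splits transcripts into the part of the cell overlapping the nucleus, the rest of the cell (cell - nucleus), and the part of the nucleus that lies outside the cell. It assumes axis-aligned rectangles instead of real segmentation polygons, and `in_rect` and `split_transcripts` are hypothetical helpers, not functions from the package.

```python
def in_rect(point, rect):
    """Return True if (x, y) lies inside rect = (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def split_transcripts(points, cell, nucleus):
    """Split points into (cell and nucleus, cell minus nucleus, nucleus outside the cell)."""
    intersection, remainder, outside = [], [], []
    for p in points:
        in_cell = in_rect(p, cell)
        in_nuc = in_rect(p, nucleus)
        if in_cell and in_nuc:
            intersection.append(p)
        elif in_cell:
            remainder.append(p)
        elif in_nuc:
            # Part of the nucleus that sticks out of the cell: not used in the comparison
            outside.append(p)
    return intersection, remainder, outside


cell = (0, 0, 10, 10)
nucleus = (8, 4, 12, 6)  # protrudes 2 units beyond the cell's right edge
points = [(1, 1), (9, 5), (11, 5), (5, 5), (20, 20)]
intersection, remainder, outside = split_transcripts(points, cell, nucleus)
```

In this toy example the transcript at (11, 5) belongs to the nucleus but not to the cell, so it ends up in neither of the two profiles that are compared.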
suggestion:
"between the gene expression profiles of the cytoplasm (cell - nucleus) and the cell region overlapping the nucleus."
| Neighborhood radius factor in the same coordinate units as the shapes. | ||
| neighborhood_radius_factor : float, default=2.0 | ||
| For each cell, the neighborhood consists of the cells whose centroids | ||
| lie within the radius of the cell times this factor. |
I wasn't aware that the neighbor centroids have to lie within that distance. Is that really the case?
As far as I can tell, this is what you do in rs/utils.py/_compute_ncvs_within_radius().
Code snippet:
# Query neighbors within radius (including itself)
idxs = tree.query_ball_point(coords[i], r=radii[i] * neighborhood_radius_factor)
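The behaviour described by that query can be reproduced with a brute-force, dependency-free sketch: for each cell, collect every centroid within `radius * neighborhood_radius_factor` of its own centroid, itself included. The function name and the toy coordinates are assumptions for illustration; the actual code uses a KD-tree via `query_ball_point`, which is much faster on real data.

```python
import math


def neighbors_within_scaled_radius(centroids, radii, neighborhood_radius_factor=2.0):
    """For each cell, list the indices of cells whose centroids lie within
    radius * neighborhood_radius_factor of that cell's centroid (self included)."""
    neighborhoods = []
    for center, radius in zip(centroids, radii):
        limit = radius * neighborhood_radius_factor
        idxs = [
            j for j, other in enumerate(centroids)
            if math.dist(center, other) <= limit
        ]
        neighborhoods.append(idxs)
    return neighborhoods


# Toy data: three cells on a line with different radii
centroids = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
radii = [2.0, 1.0, 1.0]
neighborhoods = neighbors_within_scaled_radius(centroids, radii)
```

Note that the relation is not symmetric: the large cell 0 reaches cell 1 (distance 3 within its limit of 4), but the smaller cell 1 does not reach cell 0 (distance 3 beyond its limit of 2).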
LazDaria
left a comment
Looks good to me! I would suggest re-running the notebook after merging into main, since the changes in join_points_regions might affect the results.
| total_counts = (counts_intersection_raw + counts_remainder_raw).sum(axis=1).replace(0, np.nan) | ||
| counts_intersection_norm = counts_intersection_raw.div(total_counts, axis=0) * scale | ||
| counts_remainder_norm = counts_remainder_raw.div(total_counts, axis=0) * scale | ||
| counts_intersection_norm = np.log1p(counts_intersection_norm).fillna(0.0) |
We should come back to this after discussing the normalisation step in the meeting with WH.
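As a worked example of the arithmetic in the hunk above, here is a single-cell, plain-Python sketch: both regions are divided by the cell's combined total, multiplied by a scale factor, and passed through log1p, with all-zero cells mapped to zeros. The function name and the default `scale` value are assumptions (the hunk does not show the value of `scale`), and applying log1p to the remainder as well is assumed by symmetry.

```python
import math


def normalize_regions(intersection_raw, remainder_raw, scale=1e4):
    """Normalise per-gene counts of two regions of one cell by their combined total.

    Assumed to mirror the pandas logic in the diff: divide by the total,
    multiply by `scale`, apply log1p, and return zeros when the total is 0.
    """
    total = sum(intersection_raw) + sum(remainder_raw)
    if total == 0:
        # Equivalent of replace(0, np.nan) followed by fillna(0.0)
        return [0.0] * len(intersection_raw), [0.0] * len(remainder_raw)

    def norm(counts):
        return [math.log1p(c / total * scale) for c in counts]

    return norm(intersection_raw), norm(remainder_raw)


# Toy counts for two genes; the combined total is 10, so scale=10 leaves the raw values
intersection_norm, remainder_norm = normalize_regions([3, 1], [1, 5], scale=10)
empty_intersection, empty_remainder = normalize_regions([0, 0], [0, 0])
```

Dividing both regions by the same combined total keeps their relative sizes comparable, which differs from normalising each region by its own total.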
This is a refactor of the region similarity (formerly region correlation) module. It renames functions and outputs to be more consistent with the rest of the package. In addition, the logic for assigning nuclei to cells changed slightly (a nucleus can now only be assigned to one cell, not to two). The run_region_similarity() function, which runs all methods in the module, was also reworked.
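One way a one-nucleus-per-cell constraint like this could be enforced is sketched below. The PR's actual assignment logic is not shown in this conversation, so the function name, the input format (a dict of pairwise IoU values), and the tie-breaking are all assumptions.

```python
def assign_nuclei_to_cells(iou):
    """Assign each nucleus to at most one cell and each cell at most one nucleus.

    `iou` maps (cell_id, nuc_id) pairs to their intersection over union.
    Hypothetical helper: a nucleus goes to the cell it overlaps best, and a
    cell keeps the best nucleus among those it won.
    """
    # Step 1: for each nucleus, keep only the cell with the highest IoU
    best_cell_for_nuc = {}
    for (cell_id, nuc_id), value in iou.items():
        if value <= 0:
            continue
        current = best_cell_for_nuc.get(nuc_id)
        if current is None or value > current[1]:
            best_cell_for_nuc[nuc_id] = (cell_id, value)

    # Step 2: for each cell, keep the highest-IoU nucleus among those assigned to it
    best_nuc_for_cell = {}
    for nuc_id, (cell_id, value) in best_cell_for_nuc.items():
        current = best_nuc_for_cell.get(cell_id)
        if current is None or value > current[1]:
            best_nuc_for_cell[cell_id] = (nuc_id, value)

    return {cell_id: nuc_id for cell_id, (nuc_id, _) in best_nuc_for_cell.items()}


# Toy IoU values: nucleus n1 overlaps three cells but is assigned only to c1
iou = {
    ("c1", "n1"): 0.6,
    ("c2", "n1"): 0.3,
    ("c3", "n1"): 0.2,
    ("c2", "n2"): 0.5,
    ("c1", "n2"): 0.1,
}
assignment = assign_nuclei_to_cells(iou)
```

Under this scheme a cell such as c3, whose only overlapping nucleus is claimed by another cell, ends up without a nucleus.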