Improve Performance of Resolving Transcript Based Segmentations #197

Open
wants to merge 3 commits into
base: master


@jeffquinn-msk (Contributor) commented Feb 3, 2025

Two issues with the solve_conflicts method I noticed:

I've noticed that resolving baysor patches has an ETA of 14 hours in my usage.

The solve_conflicts method merges overlapping segments only between patches, but the number of overlaps still grows quadratically with the number of patches. I get something like 9 million overlaps, and iterating over each geometry pair takes forever.
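To illustrate why iterating over geometry pairs gets expensive, here is a minimal sketch of a brute-force search for cross-patch overlaps. It is not the library's code: segments are reduced to axis-aligned bounding boxes, and `boxes_intersect` and `cross_patch_overlaps` are hypothetical names used only for this example.

```python
from itertools import combinations


def boxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share a positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def cross_patch_overlaps(segments):
    """Return index pairs of overlapping segments that come from different patches.

    segments: list of (patch_index, (xmin, ymin, xmax, ymax)).
    Every pair is tested, so the work grows quadratically with the input size.
    """
    pairs = []
    for (i, (patch_i, box_i)), (j, (patch_j, box_j)) in combinations(enumerate(segments), 2):
        if patch_i != patch_j and boxes_intersect(box_i, box_j):
            pairs.append((i, j))
    return pairs


segments = [
    (0, (0, 0, 10, 10)),
    (0, (5, 5, 15, 15)),    # overlaps segment 0, but same patch: ignored
    (1, (8, 8, 12, 12)),    # overlaps segments 0 and 1 from the other patch
    (1, (20, 20, 30, 30)),  # overlaps nothing
]
overlaps = cross_patch_overlaps(segments)
```

A spatial index would cut down the number of pairs that need testing, but every reported overlap still has to be processed one by one, which is the cost described above.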

People will have to set a small patch size for baysor because, for some reason, it takes huge amounts of memory and time to run on larger patches. Even with 500 micron patches it can take many hours per patch. So I assume I won't be the only person to run into this.

I don't think merging overlapping segments from different patches is really appropriate in the context of baysor. I feel like it would create edge effects. It may be appropriate for the imaging-based methods; I'm not sure.

Previously, in my work with baysor, I have simply kept only the segments that are closest to the centroid of the patch to which they belong. This essentially splits the overlap regions in half and keeps the segments that are furthest from the edge. It has the benefit of being computationally very fast, and it eliminates the potential for edge effects, assuming the overlap size is big enough.
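A minimal sketch of this centroid rule, as one possible reading of it rather than the code in this PR: a segment is kept only when the center of its own patch is the nearest patch center to the segment's centroid. It assumes rectangular patches of equal size given as `(xmin, ymin, xmax, ymax)`, and segments reduced to their centroid; `patch_center` and `keep_nearest_own_patch` are hypothetical names.

```python
from math import dist


def patch_center(bounds):
    """Center of a rectangular patch given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bounds
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def keep_nearest_own_patch(patches, segments):
    """Return indices of segments whose own patch center is the nearest one.

    patches: list of (xmin, ymin, xmax, ymax).
    segments: list of (patch_index, (x, y)), where (x, y) is the segment centroid.
    Ties go to the patch with the lowest index.
    """
    centers = [patch_center(bounds) for bounds in patches]
    kept = []
    for segment_id, (patch_index, centroid) in enumerate(segments):
        nearest = min(range(len(centers)), key=lambda i: dist(centroid, centers[i]))
        if nearest == patch_index:
            kept.append(segment_id)
    return kept


# Two patches overlapping on x in [400, 600]; the midline of the overlap is x = 500.
patches = [(0, 0, 600, 600), (400, 0, 1000, 600)]
segments = [
    (0, (100, 300)),  # interior of patch 0: kept
    (0, (450, 300)),  # overlap, on patch 0's side of the midline: kept
    (0, (550, 300)),  # overlap, closer to patch 1's center: dropped
    (1, (450, 300)),  # overlap, closer to patch 0's center: dropped
    (1, (550, 300)),  # overlap, on patch 1's side of the midline: kept
    (1, (900, 300)),  # interior of patch 1: kept
]
kept = keep_nearest_own_patch(patches, segments)
```

Each segment costs one distance per patch, with no pairwise geometry tests. Segments that straddle the midline can still overlap after this step, which is why a fallback merge is still needed.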

In this PR I use this method in solve_conflicts where possible. After it runs, any remaining overlapping segments are merged in the previous way.

@quentinblampey (Collaborator)

Hi @jeffquinn-msk,

How many cells do you have? I'm surprised about your ETA. Would you mind sharing your shapes (as a GeoPandas parquet file)? This way I can compare the timing of different methods on your test case. But don't worry if it's not possible to share it.

I agree that the current "resolve" function is not really suited to transcript-based methods. I already planned to update this after shapely==2.1.0 is released (it will add a quick way to remove the overlap using the cells' corresponding Voronoi polygons). Would that also sound good to you?
