- TIA Toolbox version: 1.4
- Python version: ALL
- Operating System: ALL
Description
I think the logic for getting the coordinate list in patch extraction isn't quite right.
The following code:
```python
output_x_end = (
    np.ceil(image_shape[0] / patch_output_shape[0]) * patch_output_shape[0]
)
output_x_list = np.arange(0, int(output_x_end), stride_shape[0])
output_y_end = (
    np.ceil(image_shape[1] / patch_output_shape[1]) * patch_output_shape[1]
)
output_y_list = np.arange(0, int(output_y_end), stride_shape[1])
```
This only works if the stride is the same as the patch size. If it isn't, some of the generated patch locations can lie entirely outside the slide.
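A short derivation makes the failure condition explicit. Write $W$ for the image length along one axis, $P$ for the patch output size, $S$ for the stride and $E$ for the rounded-up end point used in the quoted code (these symbols are introduced here only for illustration):

```latex
\begin{aligned}
E &= \left\lceil W / P \right\rceil P, \qquad 0 \le E - W < P, \\
s_{\max} &= E - S \quad (\text{when } S \text{ divides } P), \\
s_{\max} \ge W &\iff E - W \ge S .
\end{aligned}
```

For the y axis of the example below, $E = 58 \cdot 256 = 14848$ and $E - W = 214 \ge 128$, so the last start is $14720$, past the slide edge. With $S = P$, $E - W < P = S$ always holds, so no start can reach $W$.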
What I Did
An example:
```python
import numpy as np
from tiatoolbox.tools.patchextraction import PatchExtractor

wsi_shape = [43668, 14634]
coords = PatchExtractor.get_coordinates(
    image_shape=wsi_shape,
    patch_input_shape=(256, 256),
    stride_shape=(128, 128),
    input_within_bound=False,
)
np.max(coords, axis=0)  # gives array([43648, 14720, 43904, 14976])
```
Note that there are patches with a top-left y coordinate of 14720, but the slide's y dimension is only 14634. That means some patches lie wholly outside the slide bounds, which should not happen. (`input_within_bound=False` just means we allow patches that overlap the slide boundary.)
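To isolate the arithmetic from the rest of `get_coordinates`, the sketch below re-implements only the quoted range computation for a single axis with plain NumPy. The helper name `current_starts` is hypothetical, not part of TIAToolbox, and the sketch assumes that when no output shape is passed the output shape equals the (256, 256) input shape, which is consistent with the numbers reported above.

```python
import numpy as np


def current_starts(image_len, patch_len, stride):
    """Hypothetical one-axis mirror of the current logic: the end point is
    rounded up to a multiple of the *patch* size, but the range is stepped
    by the *stride*."""
    end = np.ceil(image_len / patch_len) * patch_len
    return np.arange(0, int(end), stride)


wsi_shape = (43668, 14634)
x_starts = current_starts(wsi_shape[0], 256, 128)
y_starts = current_starts(wsi_shape[1], 256, 128)

print(x_starts.max(), y_starts.max())  # 43648 14720

# A start at or beyond the image edge means the whole patch is outside the slide.
print(y_starts[y_starts >= wsi_shape[1]])  # [14720]
```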
This also raises another discussion point. `WSIReader.read_rect` (or `read_bounds`) will happily read a patch that is entirely outside the bounds of the slide, and will do so silently. It's designed to safely pad regions that overlap the edge of the slide, which is fine, but I think in most cases, if your code ends up reading patches from entirely outside the slide, something is wrong somewhere and it would be good to know that it's happening. So we could discuss what the behaviour should be in this case.
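One possible behaviour is a warning when a requested region has no overlap with the slide at all. The sketch below is a hypothetical standalone guard, not an existing TIAToolbox API; it assumes bounds are given as `(start_x, start_y, end_x, end_y)` with an exclusive end, the same four-column layout as the coordinates printed above, and that the slide shape is `(width, height)` at the same resolution.

```python
import warnings


def check_bounds_overlap(bounds, slide_shape):
    """Hypothetical guard: warn if a region lies entirely outside the slide.

    bounds: (start_x, start_y, end_x, end_y), end exclusive.
    slide_shape: (width, height) at the same resolution as bounds.
    Returns True if the region overlaps the slide at all.
    """
    start_x, start_y, end_x, end_y = bounds
    width, height = slide_shape
    overlaps = start_x < width and start_y < height and end_x > 0 and end_y > 0
    if not overlaps:
        warnings.warn(
            f"Region {bounds} lies entirely outside slide of shape "
            f"{slide_shape}; the result would contain only padding.",
            stacklevel=2,
        )
    return overlaps


# The out-of-bounds patch from the example warns; a partially overlapping one does not.
check_bounds_overlap((43648, 14720, 43904, 14976), (43668, 14634))  # warns, False
check_bounds_overlap((43648, 14592, 43904, 14848), (43668, 14634))  # True
```

Whether such a case should warn, raise, or stay silent behind a flag is exactly the open question; the sketch only shows that the check itself is cheap.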
Potential solution
I think the code should look like:
```python
output_x_end = (
    np.ceil(image_shape[0] / stride_shape[0]) * stride_shape[0]
)
output_x_list = np.arange(0, int(output_x_end), stride_shape[0])
output_y_end = (
    np.ceil(image_shape[1] / stride_shape[1]) * stride_shape[1]
)
output_y_list = np.arange(0, int(output_y_end), stride_shape[1])
```
which gives, for the above example:

```python
wsi_shape = [43668, 14634]
np.max(coords, axis=0)  # gives array([43648, 14592, 43904, 14848])
```
so every patch now overlaps the slide at least partially, as it should.
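The proposed per-axis computation can be checked in isolation with plain NumPy. In the sketch below, `proposed_starts` is a hypothetical helper mirroring the suggested code for one axis; it reproduces the start and end values quoted above and asserts that every start lies inside the slide.

```python
import numpy as np


def proposed_starts(image_len, stride):
    """Hypothetical one-axis mirror of the proposed logic: round the end
    point up to a multiple of the stride, so the last start is the largest
    multiple of the stride that is still below image_len."""
    end = np.ceil(image_len / stride) * stride
    return np.arange(0, int(end), stride)


wsi_shape = (43668, 14634)
patch = 256
stride = 128

for axis_len in wsi_shape:
    starts = proposed_starts(axis_len, stride)
    # Every start is inside the slide, so every patch overlaps it.
    assert (starts < axis_len).all()
    print(starts.max(), starts.max() + patch)
# 43648 43904
# 14592 14848
```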
This removes `patch_output_shape` from those equations entirely. I can't see a problem with that, but I'm also not 100% sure why it was there in the first place, so I want to make sure I'm not missing anything.
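As far as this range computation goes, dropping `patch_output_shape` does not seem to lose any start that overlaps the slide. The proposed end point, ceil(W / S) * S, keeps exactly the multiples of the stride below W, while the current end point, ceil(W / P) * P, is at least W, so it keeps all of those plus possibly a few starts at or past the edge. The sketch below (both helpers are hypothetical one-axis re-implementations, not TIAToolbox functions) checks this on the example; it says nothing about other places where `get_coordinates` may use `patch_output_shape`.

```python
import numpy as np


def current_starts(image_len, patch_len, stride):
    # End point rounded up to a multiple of the patch size (current logic).
    end = np.ceil(image_len / patch_len) * patch_len
    return np.arange(0, int(end), stride)


def proposed_starts(image_len, stride):
    # End point rounded up to a multiple of the stride (proposed logic).
    end = np.ceil(image_len / stride) * stride
    return np.arange(0, int(end), stride)


# When stride == patch size the two end points coincide, so the lists match.
for image_len in (256, 1000, 14634, 43668):
    assert np.array_equal(
        current_starts(image_len, 256, 256), proposed_starts(image_len, 256)
    )

# With an overlapping stride, the extra starts from the current logic
# are the ones at or past the image edge.
cur = current_starts(14634, 256, 128)
new = proposed_starts(14634, 128)
print(np.setdiff1d(cur, new))  # [14720]
```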
The code quoted above is from tiatoolbox/tiatoolbox/tools/patchextraction.py, lines 455 to 462 in commit eb49f66.