WSI image reader #1504

drbeh · 2021-01-25T21:28:34Z

Is your feature request related to a problem? Please describe.
Whole slide imaging (WSI) in pathology generates huge images which makes is inefficient to load the whole image into memory and extract a small patch for training.

Describe the solution you'd like
There are libraries that are able to more efficiently extract and load small patches into memory from very large images. OpenSlide is the most popular and CuClaraImage is another under-development library. We should have a WSIreader that uses either of them under the hood.

drbeh · 2021-01-25T22:39:15Z

@Nic-Ma, @wyli
As we discussed before, I was implementing WSIReader as one of image reader classes (MONAI/monai/data/image_reader.py). However, I realized that there is a fundamental API difference between WSI libraries (OpenSlide and CuImage) and regular image loading libraries (ITK, nibabel, PIL, etc). In regular images, we first read the image from their filename(s) (with provided parameters) and create an object (or list of objects), and then just by invoking get_data on that object (or list of objects) we fill and load a (list of) numpy array(s) into memory. However, in WSI images, we read the image(s) (without any other parameter than filename) and create an object (or list of objects). Then we instruct separately for each object to where in the image and at what size extract patches and at which level (resolution) provide them as numpy array.

Regular images:

read(self, data: Union[Sequence[str], str], **kwargs)
get_data(self, img)

WSI images:

read(self, data: Union[Sequence[str], str])
get_data(self, img, location, size, level)

Thus, an option is to separate it from image_reader as wsi_reader and create its own abstract class (WSIReader instead of ImageReader ) . What do you think?

Nic-Ma · 2021-01-26T02:43:12Z

Hi @behxyz ,

As the ImageReader base class provides only the abstract APIs, I think you can extend it for your WSIReader:

class WSIReader(ImageReader):
    def read(self, data: str):
        return cuClaraImage.load(data) ...

    def get_data(self, img, location, size, level):
        return extract_patch(img, location, size, level)...

Overloading of functions to support different arg list in the sub-class.
Does it work for your case?

Thanks.

drbeh · 2021-01-26T17:03:32Z

Hi @Nic-Ma,
Thanks, that is the way that initially was going to implement and it definitely works but my concern is more on the different expected behavior from the same API (which is defined by the abstract class). Here, the concept of get_data is beyond just getting data (which is the case in other method) and involves with calculating and extracting regions and sometimes splitting into patches on a grid with cropping and padding parameters. I just didn't want to violate the expectation of a class derived from ImageReader class but if you are fine with it, I can live with that and will implement it by inheriting from the same abstract class.

wyli · 2021-01-26T20:23:50Z

I think get_data should allow for more arguments, perhaps it's time to revisit this thread of discussion: #893 (comment)

Nic-Ma · 2021-01-26T23:52:07Z

Hi @behxyz and @wyli ,

Thanks for the discussion.
I think in our initial discussion about get_data(), we didn't limit the behavior in it, actually, we supposed several different methods to get data in this API (including this Pathology request):
#893 (comment)

Thanks.

Leengit · 2021-01-28T15:03:23Z

We could have WSIReader inherit from ImageReader and have the former implement both an override of ImageReader.get_data and a WSIReader.get_WSI_data method. The WSIReader.get_data override method could be implemented in one of two ways depending on how ambitious we want to be.

It bypasses all the smarts of OpenSlide and reads in the whole slide despite its size.
It doesn't actually read in any pixel values but returns a subclass of the usual class returned by ImageReader.get_data (which maybe is a numpy array?). This subclass would be similar to a numpy view and would have to implement the methods of its base class. In these methods, it wouldn't actually read pixel values from disk except when absolutely necessary. It is otherwise always returning another object of its own type, much like the way that numpy array views are returned rather than array copies whenever reasonable.

The programmer is permitted to check for existence of the WSIReader.get_WSI_data method and use it if present, which argues for giving it a name distinct from get_data. Alternatively the programmer can just make use of the more generic get_data method; if it is an OpenSlide image and if we've gone with option 2 above, this gains much of (or at least some of) the efficiency that OpenSlide provides.

Leengit · 2021-01-28T15:34:15Z

The class returned by Zarr.open is like the lazily loaded whole slide image that I am imagining above. It or something that leverages its implementation might do for Option 2.

Additionally, there is a way to read OpenSlide images via the Zarr interface, provided via

from napari_lazy_openslide.lazy_openslide import OpenSlideStore
# and a subsequent call to zarr.open(...).

It is not as efficient as OpenSlide directly nor as efficient as a normal Zarr directory store, but it could be part of an Option 2 implementation.

The more I think about it, Option 2 would be a lot of work. :-(

drbeh · 2021-02-01T20:15:42Z

@Leengit Thanks for your suggestions. I agree that option 2 requires a lot of work. For option 1, we should also consider that in practice no one reads the whole slide, which is tens of gigabytes in memory. Thus to be consistent with ImageReader API, I am leaning towards expanding the scope of get_data, which is currently returns the entire image for radiology images, and return patches for pathology given the extra parameters. What do you think?

Leengit · 2021-02-02T01:40:10Z

Is there functionality of ImageReader that would be useful for WSIReader to inherit? If so, we could make WSIReader a subclass, and deal with the problem that its interface doesn't support all of the ImageReader interface by throwing explanatory exceptions. In particular, if code tries to load an entire WSI into RAM, we'd throw a clearly worded exception. It won't work the way full inheritance would have worked, but at least the bug won't be subtle and hard to find.

I probably would use a name other than get_data for the WSIReader functionality that goes beyond the ImageReader functionality.

aylward · 2021-02-02T19:13:17Z

I wonder if we should expand this discussion to include Curtis Lisle (clisle@knowledgevis.com) who has been working with the NIH/NCI on whole slide processing (patches and pyramids) using OpenSlide and other formats / libraries. He has been leading multiple efforts for large image (whole slide / pathology / microscopy) AI. He already approached several folks on the MONAI team with suggestions. He might have additional API/use-case considerations to include.

Leengit · 2021-02-03T15:24:20Z

There is a new version of tifffile out, which may be better for WSIs than OpenSlide. Whether it also happens to make the "Option 2" approach any easier ... I don't know.

drbeh · 2021-02-23T18:58:57Z

@aylward @Leengit @wyli @Nic-Ma
We need to move forward with this. I have implemented WSIReader with two options of OpenSlide and CuClaraImage (which will be released soon), and we can add support for tifffile later. Currently, In the current implementation, I expanded the get_data method to load patches (or regions) in a consistent with other image readers. Now if you call get_data with only the image object (similar to the other readers), it will return the whole image (similar to the other readers) although it requires a lot of memory. What do you think? Any volunteer to code review #1548 ?

wyli · 2021-02-23T21:58:01Z

@behxyz in #1548 the tests are skipped, and cuimage is not available, we couldn't verify the code changes at the moment. please let me know if that's not the case.

drbeh · 2021-02-24T14:52:18Z

@wyli
Right, CuImage is not available on the MONAI docker but it is currently accessible via NGC Clara Train EA docker. The most important thing here is to agree on the API so that we can build applications on top of that. If it helps, I can remove CuImage from WSIReader and we test and merge it only for OpenSlide. Then, I can add CuImage in a later PR. What do you think?

wyli · 2021-02-24T14:57:52Z

thanks, if we could test the implementation with an EA docker that's also good @behxyz. we should be able to review the PR now cc@Nic-Ma

drbeh · 2021-02-24T17:14:34Z

@Nic-Ma

Nic-Ma · 2021-02-25T01:54:57Z

Thanks for the discussion to clarify.
Is this PR #1548 ready for review now?

Thanks.

drbeh · 2021-03-02T00:29:42Z

@Nic-Ma,
Yes, it is ready! Thanks

drbeh self-assigned this Jan 25, 2021

drbeh added the enhancement New feature or request label Jan 25, 2021

drbeh mentioned this issue Feb 2, 2021

Implement readers for Whole Slide Images #1537

Closed

7 tasks

drbeh mentioned this issue Feb 4, 2021

WSI reader #1548

Merged

7 tasks

drbeh closed this as completed in #1548 Mar 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WSI image reader #1504

WSI image reader #1504

drbeh commented Jan 25, 2021

drbeh commented Jan 25, 2021 •

edited

Loading

Nic-Ma commented Jan 26, 2021 •

edited

Loading

drbeh commented Jan 26, 2021

wyli commented Jan 26, 2021 •

edited

Loading

Nic-Ma commented Jan 26, 2021

Leengit commented Jan 28, 2021

Leengit commented Jan 28, 2021

drbeh commented Feb 1, 2021

Leengit commented Feb 2, 2021

aylward commented Feb 2, 2021

Leengit commented Feb 3, 2021

drbeh commented Feb 23, 2021

wyli commented Feb 23, 2021

drbeh commented Feb 24, 2021

wyli commented Feb 24, 2021

drbeh commented Feb 24, 2021

Nic-Ma commented Feb 25, 2021

drbeh commented Mar 2, 2021

WSI image reader #1504

WSI image reader #1504

Comments

drbeh commented Jan 25, 2021

drbeh commented Jan 25, 2021 • edited Loading

Nic-Ma commented Jan 26, 2021 • edited Loading

drbeh commented Jan 26, 2021

wyli commented Jan 26, 2021 • edited Loading

Nic-Ma commented Jan 26, 2021

Leengit commented Jan 28, 2021

Leengit commented Jan 28, 2021

drbeh commented Feb 1, 2021

Leengit commented Feb 2, 2021

aylward commented Feb 2, 2021

Leengit commented Feb 3, 2021

drbeh commented Feb 23, 2021

wyli commented Feb 23, 2021

drbeh commented Feb 24, 2021

wyli commented Feb 24, 2021

drbeh commented Feb 24, 2021

Nic-Ma commented Feb 25, 2021

drbeh commented Mar 2, 2021

drbeh commented Jan 25, 2021 •

edited

Loading

Nic-Ma commented Jan 26, 2021 •

edited

Loading

wyli commented Jan 26, 2021 •

edited

Loading