Efficient whole slide imaging IO #856

@Mr-Milk

Whole slide image (WSI) data plays a significant role in digital pathology. However, integrating WSI into SpatialData is challenging.

What makes WSI different:

  1. Large file size: WSI files are large on disk, typically about 300 MB to 2 GB per slide, even with JPEG compression.
  2. Proprietary formats: Most WSI formats are vendor-specific rather than plain TIFF, let alone OME-TIFF. Many can only be read through a library such as OpenSlide or Bio-Formats.
  3. Read-only: In 99% of cases, users only need to read the WSI data, not modify it.

So far, there have been a few attempts to integrate WSI into SpatialData:

  1. DVP image readers: lucas-diedrich/spatialdata-io#1
  2. SOPA's reader: https://github.com/gustaveroussy/sopa

These attempts wrap OpenSlide behind xarray or a zarr store to mimic the image interface in SpatialData. The issue is that this approach creates an unnecessary copy of the WSI data when the SpatialData object is serialized to disk, and without proper compression the copy can take up substantial disk space. This is feasible for small datasets, such as spatial transcriptomics (ST) experiments with a few slides, but it becomes impractical in digital pathology, where a dataset often contains thousands of slides.

I currently have a solution that extends SpatialData with WSI readers: rendeirolab/wsidata. wsidata holds a reader object with extra APIs for accessing the WSI image, but unlike the previous solutions it does not mount the image in the images slot of SpatialData. This avoids the unnecessary copy during serialization. The main drawback is that it does not comply with the scverse ecosystem whenever something image-related comes up.
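
To make the "hold a reader, do not copy pixels" idea concrete, here is a minimal sketch using only the Python standard library. It is not the wsidata API: `SlideRef`, `save_reference`, `load_reference`, and the `wsi_reference.json` file name are all hypothetical names made up for illustration. The point is only that saving writes a small pointer to the slide rather than the slide itself.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SlideRef:
    """Hypothetical handle: records where the slide lives, never its pixels."""
    path: Path
    reader: str = "openslide"  # assumed label for the backend used to reopen it


def save_reference(ref: SlideRef, store_dir: Path) -> Path:
    """Write a small JSON pointer into the store; the slide file is not copied."""
    store_dir.mkdir(parents=True, exist_ok=True)
    out = store_dir / "wsi_reference.json"  # hypothetical file name
    out.write_text(json.dumps({"path": str(ref.path.resolve()), "reader": ref.reader}))
    return out


def load_reference(store_dir: Path) -> SlideRef:
    """Rebuild the handle from the pointer; opening the slide is left to the reader."""
    meta = json.loads((store_dir / "wsi_reference.json").read_text())
    return SlideRef(path=Path(meta["path"]), reader=meta["reader"])


# Usage with a dummy file standing in for a real slide.
with tempfile.TemporaryDirectory() as tmp:
    slide = Path(tmp) / "slide.svs"
    slide.write_bytes(b"\0" * 4096)  # placeholder bytes, not a valid slide
    pointer = save_reference(SlideRef(slide), Path(tmp) / "sample.zarr")
    print(pointer.stat().st_size, "bytes written for a", slide.stat().st_size, "byte slide")
```

The trade-off is the one described above: nothing is duplicated on disk, but tools that expect an image in the images slot will not see the slide.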

Another potential solution is for SpatialData to create soft links to the WSI files on disk, so that saving a SpatialData object does not require copying the WSI data.
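
A rough sketch of what the soft-link idea could look like at the file-system level, again with the standard library only. `link_slide` and the `wsi` subfolder are hypothetical; SpatialData has no such function today, and how a link would be registered in the zarr metadata is left open here.

```python
import os
import tempfile
from pathlib import Path


def link_slide(slide_path: Path, store_dir: Path) -> Path:
    """Put a symbolic link to the slide inside the store instead of a copy."""
    target_dir = store_dir / "wsi"  # hypothetical subfolder name
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / slide_path.name
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier save
    # Raises FileExistsError if a regular file already occupies this name.
    os.symlink(slide_path.resolve(), link)
    return link


# Usage with a dummy file standing in for a real slide.
with tempfile.TemporaryDirectory() as tmp:
    slide = Path(tmp) / "slide.svs"
    slide.write_bytes(b"\0" * 4096)  # placeholder bytes, not a valid slide
    link = link_slide(slide, Path(tmp) / "sample.zarr")
    print(link, "->", os.readlink(link))
```

Two caveats with this approach: the link breaks if the original slide is moved or the store is copied to another machine, and creating symbolic links on Windows may require extra privileges.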

Hi @LucaMarconato, I discussed this with you a few months ago at the scverse conference. Hope we can find a graceful solution soon!
