Working examples demonstrating TIAToolbox computational pathology analysis tools applied to slide microscopy data from NCI Imaging Data Commons (IDC).
TIAToolbox is a comprehensive Python library for computational pathology providing WSI reading, stain normalization, tissue detection, patch classification, semantic segmentation, nucleus instance segmentation, and more -- all with pretrained deep learning models.
Imaging Data Commons (IDC) hosts thousands of publicly accessible slide microscopy images in DICOM WSI format, along with annotations and segmentations. No authentication is required to access IDC data.
TIAToolbox's DICOMWSIReader can directly read IDC's DICOM WSI files, making these two tools naturally complementary. These tutorials show you how to use them together.
| # | Notebook | TIAToolbox Tools | IDC Data | GPU |
|---|---|---|---|---|
| 01 | Reading IDC Slides | WSIReader, DICOMWSIReader | CPTAC lung | No |
| 02 | Tissue Masking & Patches | OtsuTissueMasker, PatchExtractor | CPTAC/TCGA | No |
| 03 | Stain Normalization | Macenko, Reinhard, Vahadane | Two collections | No |
| 04 | Patch Classification | PatchPredictor | TCGA colorectal | Recommended |
| 05 | Semantic Segmentation | SemanticSegmentor | TCGA breast | Recommended |
| 06 | Nucleus Segmentation | NucleusInstanceSegmentor | TCGA | Required |
| 07 | Comparing with IDC Annotations | HoVer-Net, SQLiteStore | TCGA + Pan-Cancer ANN | Recommended |
All notebooks pin tiatoolbox==1.6.0. This is necessary because several workarounds are required for DICOM WSI compatibility at this version, and pinning ensures they remain matched to the API they were tested against.
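Because the workarounds are tied to one release, a notebook could check the installed version before applying them. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `pin_status` is hypothetical and is not taken from the notebooks.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_status(dist_name, expected):
    """Report whether an installed distribution matches the expected pin."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment at all.
        return "missing"
    if installed == expected:
        return "ok"
    return f"mismatch ({installed})"


# Prints "ok", "missing", or "mismatch (<installed version>)".
print(pin_status("tiatoolbox", "1.6.0"))
```

Anything other than `ok` suggests the workarounds may no longer match the installed API.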
The following issues affect DICOM WSI files from IDC. Workarounds are implemented directly in the notebooks where needed:
- `DICOMWSIReader` does not populate `objective_power` or `mpp` (all notebooks): The reader may return `None` for these metadata fields. Workaround: set them manually from IDC's `sm_index` query results after opening the slide.
- `WSIPatchDataset` rejects directories (notebook 04): `PatchPredictor` in WSI mode calls `Path.is_file()`, which fails for DICOM WSI directories (which contain multiple `.dcm` files). Workaround: temporarily patch `Path.is_file` to also accept directories during the `predict()` call.
- `PatchPredictor` creates its own `WSIReader` internally (notebook 04): Metadata fixes applied to the user's reader don't carry over. Workaround: temporarily patch `WSIReader.open` to inject `objective_power` and `mpp` into any newly created reader.
- `PatchPredictor` coordinates are in extraction resolution space (notebook 04): Returned patch coordinates use `coord_space="resolution"` (at the requested mpp), not baseline pixel coordinates. Visualization code must use `reader.slide_dimensions(resolution=..., units="mpp")` and `read_bounds(..., coord_space="resolution")` accordingly.
- `DICOMWSIReader` coordinate issues at non-baseline resolutions (notebooks 05, 06, 07): `read_bounds` with resolutions mapping to non-baseline pyramid levels can produce incorrect results. Workaround: read at native resolution and resize manually.
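Two of the workarounds above can be illustrated with the standard library alone. The sketch below is not the notebooks' code. It shows one way to temporarily make `Path.is_file()` accept directories, and the arithmetic relating resolution-space coordinates to baseline pixels, assuming a uniform scale factor of requested mpp divided by baseline mpp. The names `dirs_count_as_files` and `resolution_to_baseline` are hypothetical.

```python
from contextlib import contextmanager
from pathlib import Path
from unittest import mock


@contextmanager
def dirs_count_as_files():
    """Temporarily make Path.is_file() return True for directories too."""
    original_is_file = Path.is_file

    def is_file_or_dir(self, *args, **kwargs):
        # Accept real files as before, and additionally accept directories
        # such as a DICOM WSI series folder.
        return original_is_file(self, *args, **kwargs) or self.is_dir()

    # mock.patch.object restores the original method on exit, even on error.
    with mock.patch.object(Path, "is_file", is_file_or_dir):
        yield


def resolution_to_baseline(x, y, read_mpp, baseline_mpp):
    """Scale a point from resolution space (at read_mpp) to baseline pixels.

    Assumption: both axes share one scale factor, read_mpp / baseline_mpp.
    """
    scale = read_mpp / baseline_mpp
    return round(x * scale), round(y * scale)
```

In a setting like notebook 04, the `predict()` call would sit inside `with dirs_count_as_files():` so the patch is active only for that call. For the coordinate helper, a point at (100, 50) read at 0.5 mpp on a slide with an assumed 0.25 mpp baseline maps to (200, 100) in baseline pixels.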
Click the "Open in Colab" badge at the top of any notebook to run it directly in Google Colab. No local setup required.
To run locally:
pip install -r requirements.txt
jupyter notebook notebooks/
Requirements:
- A Google account (for Colab) or a local Python 3.9+ environment
- No authentication needed for IDC data access
- GPU runtime recommended for notebooks 04-07 (free T4 GPU available on Colab)
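A local environment can be checked against these requirements before opening a notebook. The following is a rough sketch, assuming the models run on PyTorch. The helper names `python_ok` and `pick_device` are hypothetical, and the code falls back to `"cpu"` when PyTorch is not installed.

```python
import importlib.util
import sys


def python_ok(min_version=(3, 9)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so no GPU can be used from here.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print("Python 3.9+:", python_ok())
print("Device:", pick_device())
```

On Colab, a result of `cpu` usually means the runtime type has not been switched to a GPU runtime.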
| Notebook | Download Size |
|---|---|
| 01 | ~300 MB (1 slide) |
| 02 | ~300 MB (1 slide) |
| 03 | ~600 MB (2 slides) |
| 04 | ~300 MB + model weights |
| 05 | ~300 MB + model weights |
| 06 | ~300 MB + model weights |
| 07 | ~500 MB + model weights |
- IDC Portal - Interactive data exploration
- IDC Documentation - IDC learning resources
- IDC Tutorials - More IDC tutorials
- IDC Forum - Community support
- TIAToolbox Documentation - TIAToolbox API reference
- TIAToolbox Examples - Official TIAToolbox examples
This repository was generated on February 16, 2026 using Claude Code (Anthropic's AI coding agent, model: Claude Opus 4.6).
Process:
- Claude Code researched TIAToolbox capabilities (via web search of GitHub, readthedocs, published papers) and IDC slide microscopy data (via the IDC Claude skill)
- A plan was designed matching TIAToolbox tools to appropriate IDC collections (e.g., Kather100K model with TCGA colorectal data, BCSS model with TCGA breast data)
- All 7 notebooks, README, and supporting files were generated in a single session
- Existing IDC-Tutorials pathomics notebooks were used as reference for IDC API conventions
Colab testing: All 7 notebooks have been tested end-to-end on Google Colab and confirmed to run successfully. Notebook outputs are saved in the repository so users can preview results without running the code. Several dependency and API issues were discovered and fixed during testing (numpy binary incompatibility, missing OpenSlide, zarr/numcodecs version conflict, DICOMWSIReader metadata gaps, PatchPredictor DICOM directory support, coordinate space mismatches). See the Known Issues and Workarounds section and dev/PROCESS.md for details.
If you use these tutorials or the underlying tools in your work, please cite:
IDC:
Fedorov, A., et al. "National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence." RadioGraphics 43.12 (2023). https://doi.org/10.1148/rg.230180
TIAToolbox:
Pocock, J., et al. "TIAToolbox as an end-to-end library for advanced tissue image analytics." Communications Medicine 2, 120 (2022). https://doi.org/10.1038/s43856-022-00186-5
