This repository contains the Google Colaboratory notebooks and other related materials accompanying the following dataset, which was generated as part of the effort to enrich the public data available within NCI Imaging Data Commons (IDC):
Krishnaswamy, D., Bontempi, D., Clunie, D., Aerts, H. & Fedorov, A. AI-derived annotations for the NLST and NSCLC-Radiomics computed tomography imaging collections. (2022). https://doi.org/10.5281/zenodo.7473970
For more details on how this dataset was generated and how to use it please refer to our publication:
Krishnaswamy D, Bontempi D, Thiriveedhi VK, Punzo D, Clunie D, Bridge CP, Aerts HJ, Kikinis R, Fedorov A. Enrichment of lung cancer computed tomography collections with AI-derived annotations. Scientific Data. 2024 Jan 4;11(1):25.
To generate this dataset, we use publicly available pre-trained AI tools to enhance CT lung cancer collections that are unlabeled or partially labeled. The first tool is the nnU-Net v1 deep learning framework for volumetric segmentation of organs, where we use a pretrained model (Task D18 using the SegTHOR dataset) for labeling volumetric regions in the image corresponding to the heart, trachea, aorta and esophagus. These are the major organs-at-risk for radiation therapy for lung cancer. We further enhance these annotations by computing 3D shape radiomics features using pyradiomics. The second tool is BodyPartRegression - a pretrained model for per-slice automatic labeling of anatomic landmarks and imaged body part regions in axial CT volumes.
We focus on enhancing two publicly available collections: the Non-small Cell Lung Cancer Radiomics (NSCLC-Radiomics collection) (available in TCIA and IDC), and the National Lung Screening Trial (NLST collection) (available in TCIA and IDC). Importantly, the NSCLC-Radiomics collection includes expert-generated manual annotations of several chest organs, allowing us to quantify the performance of the AI tools on that subset of the data.
While the files corresponding to this dataset can be downloaded from the Zenodo record listed above, it is more convenient to explore the dataset using NCI Imaging Data Commons, where it has been included since data release v13, see https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=nnU-Net-BPR-annotations.
Code organization:
- The nnunet, bpr and common directories hold the code and metadata for creating the DICOM Segmentation and SR objects.
- The usage_notebooks directory contains materials that demonstrate how to interact with the data, download data from IDC, convert DICOM to alternate medical imaging formats, and visualize the data using open source tools.
To get started with the dataset, check out this usage notebook. It lets you explore the dataset by clicking on points in the bokeh plot and opening the corresponding images using viewer links embedded in the plot.
You can also use the pre-generated interactive bokeh plots referenced below. Each of the figures you see below is linked with its interactive version, as demonstrated in the video below.
Click the figures below to interact with them!
Figure 4 - Evaluation of the AI-generated annotations with respect to the expert annotations of the heart for NSCLC-Radiomics.
We also provide the Dice score, Hausdorff distance and Hausdorff distance 95 metrics for the heart and the esophagus at these links:
- Dice score of the heart
- Hausdorff distance of the heart
- Hausdorff distance 95 of the heart
- Dice score of the esophagus
- Hausdorff distance of the esophagus
- Hausdorff distance 95 of the esophagus
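For readers unfamiliar with these metrics, the following is a minimal sketch of how Dice, Hausdorff distance and a percentile variant of the Hausdorff distance can be computed for two segmentations. It is not the evaluation code used in this repository. As simplifying assumptions, each segmentation is represented as a set of (x, y, z) voxel indices, distances are computed by brute force over all labeled voxels rather than over extracted surfaces, and the percentile uses the nearest-rank rule, so the values may differ from those shown in the plots. All function names are hypothetical.

```python
import math


def dice_score(a, b):
    """Dice coefficient between two non-empty sets of voxel indices."""
    if not a and not b:
        raise ValueError("both segmentations are empty")
    return 2.0 * len(a & b) / (len(a) + len(b))


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1]


def directed_distances(src, dst, spacing):
    """Distance in mm from every voxel in src to its nearest voxel in dst."""
    def dist(p, q):
        return math.sqrt(
            sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing))
        )
    return [min(dist(p, q) for q in dst) for p in src]


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0), percentile=100.0):
    """Symmetric Hausdorff distance in mm; percentile=95 gives a 95% variant."""
    if not a or not b:
        raise ValueError("both segmentations must be non-empty")
    d_ab = nearest_rank_percentile(directed_distances(a, b, spacing), percentile)
    d_ba = nearest_rank_percentile(directed_distances(b, a, spacing), percentile)
    return max(d_ab, d_ba)


# Toy example: two 2x2 patches shifted by one voxel along the third axis.
expert = {(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)}
ai = {(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)}

print(dice_score(expert, ai))
print(hausdorff(expert, ai, spacing=(1.0, 1.0, 2.5)))
print(hausdorff(expert, ai, spacing=(1.0, 1.0, 2.5), percentile=95.0))
```

The brute-force search is quadratic in the number of voxels, so it is only suitable for illustration on small inputs.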
Figure 5 - Evaluation of the heart sphericity radiomics features from the AI-generated annotations compared to the expert annotations from NSCLC-Radiomics.
Figure 6 - Evaluation of the sphericity radiomics features from the AI-generated annotations from NLST.
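As a reminder of what the sphericity feature measures, here is a minimal sketch assuming the standard definition, the cube root of 36πV² divided by the surface area A, which equals 1 for a perfect sphere and is lower for less compact shapes. The function name and the example values are illustrative only; the features in this dataset were computed with pyradiomics from the segmentation meshes, not with this snippet.

```python
import math


def sphericity(volume_mm3, surface_area_mm2):
    """Sphericity from a volume (mm^3) and a surface area (mm^2)."""
    return (36.0 * math.pi * volume_mm3 ** 2) ** (1.0 / 3.0) / surface_area_mm2


# A sphere of radius 10 mm and a cube of side 10 mm for comparison.
radius = 10.0
sphere = sphericity(4.0 / 3.0 * math.pi * radius ** 3, 4.0 * math.pi * radius ** 2)
cube = sphericity(10.0 ** 3, 6.0 * 10.0 ** 2)

print(sphere)
print(cube)
```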
Figure 7 - Difference between the expert lung segmentation and the BPR-derived lung_start and lung_end landmarks for NSCLC-Radiomics.
Figure 8 - Evaluation of the distribution of distances between the start and end of the lungs in mm for the NLST collection.
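The quantities plotted in Figures 7 and 8 can be illustrated with a short sketch. It assumes that each axial slice has a known z position in mm and that the lung_start and lung_end landmarks are available as slice indices. The function names, the input layout and the example numbers are hypothetical and do not reflect how the landmarks are encoded in the DICOM SR objects.

```python
def lung_extent_mm(slice_z_mm, lung_start_index, lung_end_index):
    """Distance in mm between the slices marked lung_start and lung_end."""
    return abs(slice_z_mm[lung_end_index] - slice_z_mm[lung_start_index])


def landmark_error_mm(slice_z_mm, landmark_index, expert_index):
    """Signed offset in mm between a landmark slice and the expert boundary slice."""
    return slice_z_mm[landmark_index] - slice_z_mm[expert_index]


# Hypothetical series: 120 slices spaced 2.5 mm apart, starting at z = -300 mm.
slice_z_mm = [-300.0 + 2.5 * i for i in range(120)]

print(lung_extent_mm(slice_z_mm, 10, 110))
print(landmark_error_mm(slice_z_mm, 10, 12))
```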
Figure 9 - Evaluation of the AI-generated annotations with respect to the expert annotations of the heart for NSCLC-Radiomics.
For any questions related to this dataset, please open an issue in this repository, or post your question in the IDC forum.