Diagnosis of histologic growth patterns of lung cancer in digital slides using deep learning.
Classification of colon and lung cancer types in digital slides using deep learning.
Classification of lung cancer types in digital slides using deep learning.
- General Information
- Requirements
- Installation
- Usage
- Known Issues and Limitations
- Future Work
- Sources
- This is a final year graduation project.
1-Growth patterns classification :
- We are using 26 whole-slide images obtained from The Cancer Genome Atlas (LUAD).
- You can download the images using the manifest file.
- Download annotations from here.
- Annotations were acquired from the second source.
- Distribution of data among histologic patterns is as follows:
Histologic pattern | ACINAR | CRIBRIFORM | MICROPAPILLARY | NON CANCEROUS | SOLID |
---|---|---|---|---|---|
Crops | 22 | 4 | 23 | 53 | 85 |
Patches (no-overlap) | 1328 | 85 | 4053 | 6222 | 5207 |
Patches (overlap) | 5277 (0.45) | 921 (0.3) | 5706 (0.75) | 6222 (1) | 5207 (1) |
- We augmented data to 10000 patches per class.
2-Colon lung cancer classification :
- We used the Lung and Colon Cancer Histopathological Image Dataset (LC25000), which contains 25000 patches of size 768 × 768 × 3.
- We augmented the data to 10000 patches per class.
3-Lung cancer classification :
- We used the same dataset (LC25000) but we only worked with the lung cancer image sets.
- We did not perform any data augmentation.
- OpenSlide Python
- PIL
- Python 3.6
- PyTorch
- pytorch-gradcam
- scikit-image
- scikit-learn
- seaborn
- tensorboard
- torchvision
sudo pip3 install -r requirements.txt
git clone https://github.com/Ala-Eddine-BOUDEMIA/Lung-Cancer-Diagnosis.git
cd Lung-Cancer-Diagnosis/2-PFE_Modification/
Codes
│
├────── 1-Growth patterns classification
│       │
│       ├────── All_WSI
│       │       ├──── Folder_1/Image_1.svs
│       │       ├──── Folder_2/Image_2.svs
│       │       └──── Folder_n/Image_n.svs
│       │
│       ├────── Annotations
│       │       ├──── Annotation_Image_1.xml
│       │       ├──── Annotation_Image_2.xml
│       │       └──── Annotation_Image_n.xml
│       │
│       ├────── CSV_files
│       │       ├──── Annotations
│       │       ├──── Diagnostics
│       │       └──── Predictions
│       │
│       ├────── Patches
│       │       ├──── ACINAR
│       │       ├──── CRIB
│       │       ├──── MICROPAP
│       │       ├──── NC
│       │       └──── SOLID
│       │
│       ├────── Train_folder
│       │       ├────── Model
│       │       │       ├──── Best_model_weights
│       │       │       └──── Checkpoints
│       │       │
│       │       ├────── Test_patches
│       │       ├────── Train_patches
│       │       └────── Validation_patches
│       │
│       ├────── Tensorboard
│       │
│       └────── Visualization
│               └────── Patchs
│                       ├────── folder_1
│                       │       └──── Image_1_visualization.tiff
│                       ├────── folder_2
│                       │       └──── Image_2_visualization.tiff
│                       └────── folder_n
│                               └──── Image_n_visualization.tiff
│
├────── 2-Colon lung cancer classification
│       │
│       ├────── CSV_files
│       │       ├──── Annotations
│       │       ├──── Diagnostics
│       │       └──── Predictions
│       │
│       ├────── lung_colon_image_set
│       │       ├────── colon_image_sets
│       │       │       ├──── colon_aca
│       │       │       └──── colon_n
│       │       └────── lung_image_sets
│       │               ├──── lung_aca
│       │               ├──── lung_n
│       │               └──── lung_scc
│       │
│       ├────── Patches
│       │       ├──── colon_aca
│       │       ├──── colon_n
│       │       ├──── lung_aca
│       │       ├──── lung_n
│       │       └──── lung_scc
│       │
│       ├────── Train_folder
│       │       ├────── Model
│       │       │       ├──── Best_model_weights
│       │       │       └──── Checkpoints
│       │       │
│       │       ├────── Test_patches
│       │       ├────── Train_patches
│       │       └────── Validation_patches
│       │
│       ├────── Tensorboard
│       │
│       └────── Visualization
│               └────── Patchs
│                       ├────── folder_1
│                       │       └──── Image_1_visualization.jpeg
│                       ├────── folder_2
│                       │       └──── Image_2_visualization.jpeg
│                       └────── folder_n
│                               └──── Image_n_visualization.jpeg
│
└────── 3-Lung cancer classification
        │
        ├────── CSV_files
        │       ├──── Annotations
        │       ├──── Diagnostics
        │       └──── Predictions
        │
        ├────── lung_colon_image_set
        │       └────── lung_image_sets
        │               ├──── lung_aca
        │               ├──── lung_n
        │               └──── lung_scc
        │
        ├────── Patches
        │       ├──── lung_aca
        │       ├──── lung_n
        │       └──── lung_scc
        │
        ├────── Train_folder
        │       ├────── Model
        │       │       ├──── Best_model_weights
        │       │       └──── Checkpoints
        │       │
        │       ├────── Test_patches
        │       ├────── Train_patches
        │       └────── Validation_patches
        │
        ├────── Tensorboard
        │
        └────── Visualization
                └────── Patchs
                        ├────── folder_1
                        │       └──── Image_1_visualization.jpeg
                        ├────── folder_2
                        │       └──── Image_2_visualization.jpeg
                        └────── folder_n
                                └──── Image_n_visualization.jpeg
- Take a look at Config.py before you begin to get a feel for what parameters can be changed.
The code in 1-Growth patterns classification is meant to:
- Read from the Annotations folder, which contains the XML annotation files.
  - XML files must be directly contained in the Annotations folder.
  - For example: Annotations/annotation_1.xml
- Read from the All_WSI folder, which contains the whole-slide images.
  - Make sure that the WSIs are contained in at least one subfolder of the All_WSI folder.
  - For example: All_WSI/WSI_1/Image.svs
- Generate patches and save information about the patches in a CSV file.
python3 1-Preprocessing.py
Inputs: All_WSI, Annotations
Outputs: Patches/SUBTYPE, CSV_files/Annotations
- Note that SUBTYPE is one of ACINAR, CRIB, MICROPAP, NC, SOLID.
If your histopathology images are H&E-stained, whitespace will automatically be filtered.
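One common way to filter whitespace is to discard patches in which almost every pixel is close to white. The following is a minimal sketch of that idea, not necessarily what 1-Preprocessing.py does: the function name and both thresholds are assumptions, and the patch is represented as plain nested lists of RGB tuples to keep the example dependency-free.

```python
def is_mostly_whitespace(pixels, intensity_threshold=220, max_white_fraction=0.9):
    """Return True when a patch is dominated by near-white background.

    pixels: rows of (r, g, b) tuples with values in 0..255.
    Both thresholds are hypothetical defaults and should be tuned on real slides.
    """
    total = 0
    white = 0
    for row in pixels:
        for r, g, b in row:
            total += 1
            # A pixel counts as "white" only if all three channels are bright.
            if min(r, g, b) >= intensity_threshold:
                white += 1
    return total > 0 and white / total >= max_white_fraction


# Usage: a blank background patch versus a pink/purple tissue-like patch.
background = [[(255, 255, 255)] * 4 for _ in range(4)]
tissue = [[(180, 90, 150)] * 4 for _ in range(4)]
print(is_mostly_whitespace(background), is_mostly_whitespace(tissue))
```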
You can change the overlapping area using the --Overlap option.
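To illustrate how an overlap setting changes the number of patches, here is a small sketch that enumerates patch origins on a regular grid. It assumes the overlap value acts as a stride factor (stride = patch size × factor, so 1 means no overlap), which is consistent with the table above where classes with factor 1 have identical counts in both rows, but the script may interpret --Overlap differently. The function name and the 224-pixel patch size are hypothetical.

```python
def patch_origins(width, height, patch_size, overlap_factor=1.0):
    """List the top-left (x, y) corners of patches tiled over a region.

    Assumption: overlap_factor scales the stride, so 1.0 gives
    non-overlapping patches and smaller values give denser tiling.
    """
    stride = max(1, int(patch_size * overlap_factor))
    xs = range(0, width - patch_size + 1, stride)
    ys = range(0, height - patch_size + 1, stride)
    return [(x, y) for y in ys for x in xs]


# Usage: the same 448 x 448 crop tiled without and with overlap.
print(len(patch_origins(448, 448, 224, 1.0)))
print(len(patch_origins(448, 448, 224, 0.5)))
```

Under this assumption, halving the factor roughly quadruples the number of patches extracted from a crop, which is why under-represented classes benefit from a smaller value.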
The code in 2-Colon lung cancer classification and 3-Lung cancer classification is meant to:
- Parse the image sets in the lung_colon_image_set folder.
- Resize images from 768 × 768 × 3 to 224 × 224 × 3.
- Save the resized images in Patches/subtype.
python3 1-Resize.py
Inputs: lung_colon_image_set
Outputs: Patches/subtype
- Note that subtype is one of colon_aca, colon_n, lung_aca, lung_n, lung_scc.
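The resize step can be sketched with Pillow as follows. This is an illustrative sketch rather than the contents of 1-Resize.py: the function name, the `*.jpeg` glob pattern, and the use of Pillow's default resampling filter are assumptions.

```python
from pathlib import Path

from PIL import Image


def resize_image_set(src_dir, dst_dir, size=(224, 224)):
    """Resize every .jpeg in src_dir and write the result to dst_dir.

    Returns the list of written paths. File names are kept unchanged.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.jpeg")):
        with Image.open(src) as img:
            # Force three channels so every output is size[0] x size[1] x 3.
            resized = img.convert("RGB").resize(size)
        out = dst_dir / src.name
        resized.save(out)
        written.append(out)
    return written
```

For example, `resize_image_set("lung_colon_image_set/lung_image_sets/lung_aca", "Patches/lung_aca")` would fill one subtype folder.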
The goal of this code is to balance data using data augmentation techniques.
- Reads patches at random, one subtype directory at a time.
- Applies different transformations to the image.
- The nature and number of transformations applied to an image are chosen randomly.
- Modified patches are saved in the same directory as the original image.
python3 2-Processing.py
- Note that this may take some time and, eventually, a significant amount of disk space.
  - Change --Maximum to a smaller value if you do not wish to generate as many patches.
  - Make sure that --Maximum is not more than 15 times greater than the initial number of patches contained in the least represented class.
Inputs: Patches/SUBTYPE
Outputs: Patches/SUBTYPE
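The random augmentation described above, together with the --Maximum sanity check, can be sketched as follows. This is a dependency-free illustration: the three transformations, the function names, and the representation of a patch as a list of pixel rows are assumptions and stand in for whatever image transformations 2-Processing.py actually applies.

```python
import random


def hflip(patch):
    """Mirror the patch left to right."""
    return [row[::-1] for row in patch]


def vflip(patch):
    """Mirror the patch top to bottom."""
    return patch[::-1]


def rot90(patch):
    """Rotate the patch 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


# Hypothetical transformation pool; a real pipeline would likely have more.
TRANSFORMS = [hflip, vflip, rot90]


def augment(patch, rng):
    """Apply a random number of randomly chosen transformations."""
    chosen = rng.sample(TRANSFORMS, rng.randint(1, len(TRANSFORMS)))
    for transform in chosen:
        patch = transform(patch)
    return patch, [transform.__name__ for transform in chosen]


def maximum_is_reasonable(class_counts, maximum, factor=15):
    """Check that the target is at most `factor` times the smallest class."""
    return maximum <= factor * min(class_counts.values())


# Usage with the overlapping patch counts from the table above.
counts = {"ACINAR": 5277, "CRIB": 921, "MICROPAP": 5706, "NC": 6222, "SOLID": 5207}
print(maximum_is_reasonable(counts, 10000))
```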
Splits the data into train, validation and test sets. By default, 1000 patches per class go to the validation set and 1000 to the test set.
You can change these numbers with the --Validation_Set_Size and --Test_Set_Size options.
You can skip this step if you did a custom split.
Note that augmented images are distributed to the same set as their original, so the model cannot score well on the validation or test set simply by memorizing training patches.
python3 3-split.py
Inputs: Patches
Outputs: Train_folder/Train_patches, Train_folder/Validation_patches, Train_folder/Test_patches
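Keeping each augmented patch in the same set as its original amounts to splitting by group rather than by file. The sketch below shows one way to do that. It assumes a hypothetical naming convention in which augmented copies are called `<original>_aug<k>.<ext>`; the names actually produced by 2-Processing.py may differ, and the function names are illustrative.

```python
import random
from collections import defaultdict


def group_key(filename):
    """Map a patch file name to the name of its original patch.

    Assumption: augmented copies carry an "_aug<k>" suffix before the extension.
    """
    stem = filename.rsplit(".", 1)[0]
    return stem.split("_aug")[0]


def split_by_group(filenames, val_size, test_size, seed=0):
    """Assign whole groups (original + augmented copies) to one split each."""
    groups = defaultdict(list)
    for name in filenames:
        groups[group_key(name)].append(name)

    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    splits = {"validation": [], "test": [], "train": []}
    for key in keys:
        # Fill validation first, then test; everything else is training data.
        if len(splits["validation"]) < val_size:
            target = "validation"
        elif len(splits["test"]) < test_size:
            target = "test"
        else:
            target = "train"
        splits[target].extend(groups[key])
    return splits
```

Because groups are moved as a unit, the validation and test sets can end up slightly larger than the requested sizes; that is the price of never separating a patch from its augmented copies.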
We recommend using ResNet-18 if you are training on a relatively small histopathology dataset. You can change hyperparameters using the argparse flags. There is an option to resume training from a previous checkpoint. By default, model checkpoints are saved every epoch in Train_folder/Model/Checkpoints.
python3 4-Train_val.py
Inputs: Train_folder/Train_patches, Train_folder/Validation_patches
Outputs: Train_folder/Model/Checkpoints, Train_folder/Model/Best_model_weights, CSV_files/Diagnostics, Tensorboard
Run the model on all the patches for each WSI in the test set.
python3 5-test.py
By default, the model that reached the best validation accuracy during training is used. You can also specify your own.
Inputs: Train_folder/Test_patches
Outputs: CSV_files/Diagnostics, CSV_files/Predictions, Tensorboard
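Selecting the model with the best validation accuracy can be sketched as below. The record keys (`epoch`, `val_acc`, `path`) and the checkpoint file names are hypothetical and do not describe the format written by 4-Train_val.py; ties are broken in favor of the earliest epoch, which is a design choice of this sketch.

```python
def select_best_checkpoint(records):
    """Return the record with the highest validation accuracy.

    records: list of dicts with hypothetical keys "epoch", "val_acc", "path".
    When several epochs tie, the earliest one is returned.
    """
    if not records:
        raise ValueError("no checkpoint records given")
    return max(records, key=lambda r: (r["val_acc"], -r["epoch"]))


# Usage with made-up numbers.
history = [
    {"epoch": 1, "val_acc": 0.81, "path": "Checkpoints/epoch_1.pt"},
    {"epoch": 2, "val_acc": 0.90, "path": "Checkpoints/epoch_2.pt"},
    {"epoch": 3, "val_acc": 0.90, "path": "Checkpoints/epoch_3.pt"},
]
print(select_best_checkpoint(history)["path"])
```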
We are using tensorboard to evaluate the model.
- Uploads a grid of train and validation images to make sure that the patches are good.
- Uploads the model's graph.
- Uploads confusion matrix and classification report of each epoch.
- Plots the loss function.
- Plots precision recall curve for each class.
To run TensorBoard, enter the following command in your terminal; it will start a local server for you.
tensorboard --logdir=Tensorboard
Aggregates the patch predictions from the test step to predict a label at the whole-slide level. There are various ways to do so; we chose patch averaging: we average the probabilities of all patch predictions for a slide and take the class with the highest mean probability.
python3 Code/6-Evaluation.py
Inputs: CSV_files/Predictions
Outputs: CSV_files/Predictions_cleaned, CSV_files/WSI_Name_Prediction.csv
Note that WSI_Name_Prediction refers to the actual name of the WSI in question.
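The patch-averaging rule can be written in a few lines. This is a minimal sketch of the idea, not the code of 6-Evaluation.py: the function name and the in-memory input format (slide name plus one probability per class) are assumptions, whereas the script itself reads its input from CSV_files/Predictions.

```python
from collections import defaultdict


def aggregate_slide_predictions(patch_rows, classes):
    """Average patch probabilities per slide and pick the top class.

    patch_rows: iterable of (slide_name, [p_class_0, p_class_1, ...]).
    Returns {slide_name: (predicted_class, mean_probabilities)}.
    """
    sums = defaultdict(lambda: [0.0] * len(classes))
    counts = defaultdict(int)
    for slide, probs in patch_rows:
        for i, p in enumerate(probs):
            sums[slide][i] += p
        counts[slide] += 1

    result = {}
    for slide, total in sums.items():
        mean = [t / counts[slide] for t in total]
        result[slide] = (classes[mean.index(max(mean))], mean)
    return result


# Usage with made-up probabilities for two slides.
classes = ["lung_aca", "lung_n", "lung_scc"]
rows = [
    ("WSI_1", [0.7, 0.2, 0.1]),
    ("WSI_1", [0.4, 0.5, 0.1]),
    ("WSI_2", [0.1, 0.1, 0.8]),
]
for slide, (label, mean) in aggregate_slide_predictions(rows, classes).items():
    print(slide, label)
```

Averaging probabilities rather than counting patch votes lets confident patches weigh more than uncertain ones, as the second WSI_1 patch above shows: it votes for lung_n, but the slide-level mean still favors lung_aca.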
This code lets you see what the network is looking at by visualizing the predictions for each class.
Note that the visualization is a patch-level visualization using Grad-CAM.
python3 Code/7-visualization.py
Inputs: CSV_files, Train_folder
Outputs: Visualization/Patchs
- 3_Split takes a lot of time since it is a naive implementation.
- 4_Train_val should have a better way to save the best model weights.
- Try different architectures.
- Optimize the code to:
  - Reduce computation time.
  - Support multiprocessing.
  - Handle different situations.
- Visualize on WSI level.
- Create a web interface.
- Wei, J., Tafe, L., Linnik, Y., Vaickus, L., Tomita, N., Hassanpour, S. Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks. Sci Rep 9, 3358 (2019).
- Gertych, A., Swiderska-Chadaj, Z., Ma, Z. et al. Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Sci Rep 9, 1483 (2019). (https://doi.org/10.1038/s41598-018-37638-9)