Feature Extraction and Unsupervised Clustering of Histopathological Images of Pancreatic Cancer Using Information Maximization
This repository contains the code and figures for the conference paper titled Feature Extraction and Unsupervised Clustering of Histopathological Images of Pancreatic Cancer Using Information Maximization.
Let's walk through this repository:
- kpc16 folder:
- part_1_5000e_16c_training.ipynb: This file has the code for training the unsupervised model using the 16-cluster set.
- part_2_5000e_16c_clustering.ipynb: This file has the code for clustering the training images using the trained unsupervised model with the 16-cluster set. Because of page limits in the paper submission, we showed only the samples with HE staining.
- part_3_5000e_16c_umap.ipynb: This file shows the code for plotting UMAP for the 16-cluster set.
- part_4_5000e_16c_NLL.ipynb: This file includes the source code to compute the negative log-likelihood for the 16-cluster set.
- part_5_save_clusters.ipynb: This file contains the code for saving clusters individually for the 16-cluster set.
HHH16 & C16 & D16.csv file has the information regarding the latent space.
The kpc16 folder has two subfolders: clus16 and models16.
clus16 has the individual clusters in the form of .csv files. We obtained these files by running the part_5_save_clusters.ipynb notebook.
models16 has 3 .ckpt files: model_en_202203211747_5000.ckpt, model_cl_202203211747_5000.ckpt, and model_de_202203211747_5000.ckpt. These three files form our unsupervised model. The hist_modelS_202203211747_5000.tsv file stores different metrics obtained during the training of our model.
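To inspect the training metrics outside the notebooks, the .tsv log can be read with Python's standard library. This is a minimal sketch, not code from the repository: the helper names load_history and column_as_floats are hypothetical, and it assumes the file's first row holds column names, which this README does not confirm.

```python
import csv
from pathlib import Path


def load_history(tsv_path):
    """Read a tab-separated training log into a list of row dictionaries."""
    with Path(tsv_path).open(newline="") as handle:
        # Assumption: the first row is a header; adjust if the file has none.
        return list(csv.DictReader(handle, delimiter="\t"))


def column_as_floats(rows, name):
    """Collect one column as floats, skipping missing or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[name]))
        except (KeyError, TypeError, ValueError):
            continue
    return values
```

Calling load_history on hist_modelS_202203211747_5000.tsv and printing rows[0].keys() would show the actual metric names, which can then be passed to column_as_floats.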
The files in the kpc16 folder can also be produced for cluster sets of 8, 12, and 20. We have not included them here to avoid repetition. In the paper, we showed the UMAP and clustering results for the 8-cluster set. To reproduce them, train the model for the 8-cluster set using part_1_5000e_8c_training.ipynb, then run part_2_5000e_8c_clustering.ipynb and part_3_5000e_8c_umap.ipynb. Users will have to carry out these steps themselves.
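The notebook names follow a regular pattern, so the expected file names for another cluster set can be generated rather than typed by hand. The sketch below only builds names. The helper notebook_names is hypothetical, the names for the 12- and 20-cluster sets are extrapolated from the 8- and 16-cluster names given above, and reading the 5000e token as an epoch count is an assumption.

```python
# Cluster sets mentioned in this README.
CLUSTER_SETS = (8, 12, 16, 20)

# Step number -> file name suffix, taken from the 16-cluster notebooks.
STEPS = {1: "training", 2: "clustering", 3: "umap", 4: "NLL"}


def notebook_names(n_clusters, epochs=5000):
    """Return the per-cluster-set notebook names in execution order."""
    return [
        f"part_{step}_{epochs}e_{n_clusters}c_{suffix}.ipynb"
        for step, suffix in sorted(STEPS.items())
    ]


if __name__ == "__main__":
    for k in CLUSTER_SETS:
        print(k, notebook_names(k))
```

part_5_save_clusters.ipynb is left out because its name does not carry a cluster count.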
The optimum_NLL.ipynb notebook plots a line graph of the negative log-likelihoods for the chosen cluster sets (8, 12, 16, and 20).
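The comparison across cluster sets can be sketched in a few lines. This is not the code in optimum_NLL.ipynb. It assumes the negative log-likelihoods have already been computed per cluster set, and that the optimum is taken as the lowest value, which this README does not state. The function names are hypothetical.

```python
def sorted_series(nll_by_clusters):
    """Return (cluster_counts, nll_values) ordered by cluster count.

    The two lists can be passed directly to a plotting call to draw the line graph.
    """
    counts = sorted(nll_by_clusters)
    return counts, [nll_by_clusters[k] for k in counts]


def lowest_nll(nll_by_clusters):
    """Return the cluster count with the smallest negative log-likelihood.

    Assumption: lower NLL is treated as better; ties go to the first key seen.
    """
    return min(nll_by_clusters, key=nll_by_clusters.get)
```

For example, with placeholder numbers that are not results from the paper, lowest_nll({8: 3.0, 12: 2.5, 16: 2.0, 20: 2.25}) selects 16.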
Finally, the Figures folder has 5 .png files: Fig_1.png, Fig_2.png, Fig_3.png, Fig_4.png, and Fig_5.png. These are the same figures that appear in the conference paper.