Experiments with an autoencoder that reconstructs corrupted 3D lung masks. Masks are corrupted to resemble a failure to segment high-density pathologies.
Segmentation masks are from
https://www.kaggle.com/sandorkonya/ct-lung-heart-trachea-segmentation
which are derived from
https://www.kaggle.com/competitions/osic-pulmonary-fibrosis-progression/overview
Run data/preprocess_osic_masks.py to unpack and preprocess the lung masks. It will produce both 5mm and 2.5mm lung masks.
If the data archive is not already downloaded to the data/osic_fibrosis_masks/ directory, the script will print instructions for downloading.
Note that the preprocessing will take some time.
A data-info file defining dataset splits is provided in osic_fibrosis_masks/data-info.csv. If you wish to run experiments with different data splits, either delete the file or change the path in preprocess_osic_masks.main.data_info.
The files in src/denoise_lung_masks are needed for the experiments. Either run reinstall_package.sh to install denoise_lung_masks as a python package, or create a symlink to src/denoise_lung_masks in the experiment directory.
Experiments are in experiments/.
Directory experiments/denoising-autoencoder.
Train an autoencoder to reconstruct a corrupted 3D lung mask. Masks are corrupted to resemble a failure to segment high-density pathologies. The autoencoder is fully convolutional with the following layers:
TODO: specify layers
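As a rough illustration of the kind of corruption described above, the following sketch removes a random ball of voxels from a binary mask, mimicking a region that a segmentation model failed to label. The function name `corrupt_mask`, the `max_radius` parameter, and the spherical shape are assumptions made for this example; the actual corruption in denoise_lung_masks may work differently.

```python
import numpy as np


def corrupt_mask(mask, rng, max_radius=8):
    """Remove a random ball of voxels from a binary 3D mask.

    Hypothetical corruption, not necessarily the scheme used in this repository.
    """
    mask = np.asarray(mask, dtype=bool)
    foreground = np.argwhere(mask)
    if foreground.size == 0:
        return mask.copy()  # nothing to corrupt

    # Pick a random lung voxel as the centre of the simulated missed region.
    center = foreground[rng.integers(len(foreground))]
    radius = rng.integers(2, max_radius + 1)

    # Squared distance of every voxel to the centre, via broadcasting.
    grids = np.ogrid[tuple(slice(0, s) for s in mask.shape)]
    dist2 = sum((g - c) ** 2 for g, c in zip(grids, center))

    corrupted = mask.copy()
    corrupted[dist2 <= radius**2] = False
    return corrupted


rng = np.random.default_rng(0)
clean = np.zeros((32, 32, 32), dtype=bool)
clean[8:24, 8:24, 8:24] = True  # toy stand-in for a lung mask
corrupted = corrupt_mask(clean, rng)
```

The autoencoder would then be trained with `corrupted` as input and `clean` as the reconstruction target.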
There are three versions of the experiments.
| Name | Description |
|---|---|
| Version 0 | Use 5mm isotropic resolution and always corrupt masks |
| Version 1 | Use 2.5mm isotropic resolution and always corrupt masks |
| Version 2 | Use 2.5mm isotropic resolution and corrupt 3/4 of the masks |
Parameters for each version are stored in parameters.py. Adjust batch_size as needed; versions 1 and 2 require around 20MB of GPU RAM.
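One way the three versions could be encoded is sketched below. The dictionary layout, the key names `spacing_mm` and `corruption_probability`, and the helper `should_corrupt` are hypothetical; the real parameters.py may be organised differently. The sketch also assumes that "corrupt 3/4 of the masks" means each mask is corrupted independently with probability 0.75, rather than a fixed subset being corrupted.

```python
import random

# Hypothetical encoding of the version table; the real parameters.py may differ.
VERSIONS = {
    0: {"spacing_mm": 5.0, "corruption_probability": 1.0},
    1: {"spacing_mm": 2.5, "corruption_probability": 1.0},
    2: {"spacing_mm": 2.5, "corruption_probability": 0.75},
}


def should_corrupt(version, rng):
    """Decide whether the next training mask gets corrupted."""
    return rng.random() < VERSIONS[version]["corruption_probability"]


rng = random.Random(0)
draws = [should_corrupt(2, rng) for _ in range(10_000)]
fraction_corrupted = sum(draws) / len(draws)  # close to 0.75 for version 2
```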
monai.losses.DiceLoss is used for all experiments.
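For readers unfamiliar with the Dice loss, here is a minimal NumPy sketch of the quantity it measures: one minus twice the overlap divided by the summed sizes of prediction and target. This is not MONAI's implementation, which offers further options such as activation and smoothing settings; the function name `soft_dice_loss` and the `eps` value are assumptions for this example.

```python
import numpy as np


def soft_dice_loss(pred, target, eps=1e-5):
    """Return 1 - Dice coefficient for arrays with values in [0, 1]."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)
    denominator = np.sum(pred) + np.sum(target)
    # eps avoids division by zero when both masks are empty.
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


target = np.zeros((8, 8, 8))
target[2:6, 2:6, 2:6] = 1.0  # 64 foreground voxels
half = target.copy()
half[2:4] = 0.0              # drop half of the foreground

perfect_loss = soft_dice_loss(target, target)  # approximately 0
half_loss = soft_dice_loss(half, target)       # approximately 1/3
```

A prediction that recovers only half of the target volume has a Dice coefficient of 2·32 / (32 + 64) = 2/3, hence a loss of about 1/3.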
Train each version with
python train.py <version-number>
or all versions sequentially with
bash train_all.sh
The two models with lowest validation loss are kept.
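The retention rule above can be sketched with a small heap that keeps only the best `keep` checkpoints. This is an illustration of the idea only: train.py may instead rely on a checkpoint callback from its training framework, and the names `update_best` and the `.ckpt` file names are made up for this example.

```python
import heapq


def update_best(best, val_loss, path, keep=2):
    """Track the `keep` lowest-loss checkpoints; return paths to discard."""
    # Losses are negated so that the heap root is the worst kept checkpoint.
    heapq.heappush(best, (-val_loss, path))
    discarded = []
    while len(best) > keep:
        _, worst_path = heapq.heappop(best)
        discarded.append(worst_path)
    return discarded


best = []
removed = []
history = [(0.40, "epoch0.ckpt"), (0.25, "epoch1.ckpt"),
           (0.31, "epoch2.ckpt"), (0.22, "epoch3.ckpt")]
for loss, path in history:
    # A real training loop would delete the discarded files from disk here.
    removed.extend(update_best(best, loss, path))

kept = sorted(path for _, path in best)
```

After the four epochs in this toy history, `kept` holds the checkpoints from epochs 1 and 3, which have the two lowest validation losses.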
Approximate runtime on an RTX 3090:
| Name | Approximate wall clock time |
|---|---|
| Version 0 | 11 min |
| Version 1 | 38 min |
| Version 2 | 40 min |
Predict each version with
python predict.py <model-checkpoint> <outdir> <version-number> [--with-corruptions]
or all versions sequentially with
bash predict_all.sh
where you must set v*_checkpoint manually to the desired checkpoint.
The flag --with-corruptions will enable data corruption on all samples before prediction.
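The command line described above could be parsed along the following lines. This is a sketch that mirrors the documented usage string, not the actual contents of predict.py; the destination names, help texts, and the example arguments `best.ckpt` and `predictions/v2` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented predict.py usage."""
    parser = argparse.ArgumentParser(prog="predict.py")
    parser.add_argument("model_checkpoint", help="path to a trained checkpoint")
    parser.add_argument("outdir", help="directory for the predicted masks")
    parser.add_argument("version_number", type=int, choices=[0, 1, 2])
    parser.add_argument("--with-corruptions", action="store_true",
                        help="corrupt every sample before prediction")
    return parser


# Example invocation with made-up paths.
args = build_parser().parse_args(
    ["best.ckpt", "predictions/v2", "2", "--with-corruptions"]
)
plain = build_parser().parse_args(["best.ckpt", "predictions/v2", "2"])
```

With `action="store_true"`, corruption stays off unless the flag is given, which matches the optional `[--with-corruptions]` in the usage line.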
In experiments/analysis there are two tools for analysing the results:
- view_predictions.py: visualizes one or more predictions using napari.
- estimate_volume.py: estimates volume for one or more scans based on denoised masks.
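Estimating volume from a binary mask typically amounts to counting foreground voxels and multiplying by the volume of one voxel. The sketch below shows that calculation; it is an assumption about the general approach, not a copy of estimate_volume.py, and the function name `estimate_volume_ml` is hypothetical.

```python
import numpy as np


def estimate_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres, given voxel spacing in mm."""
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    # 1 mL equals 1000 cubic millimetres.
    return np.count_nonzero(mask) * voxel_volume_mm3 / 1000.0


mask = np.zeros((40, 40, 40), dtype=bool)
mask[10:30, 10:30, 10:30] = True  # 8000 foreground voxels
volume_ml = estimate_volume_ml(mask, (2.5, 2.5, 2.5))
```

At 2.5mm isotropic resolution each voxel is 15.625 mm³, so the 8000 voxels in this toy mask correspond to 125 mL.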
There are some helper scripts to generate plots:
- v0_estimate_volume.sh
- v1_estimate_volume.sh
- v2_estimate_volume.sh

These scripts assume the directory structure generated by predict_all.sh.
Results are stored in experiments/denoising-autoencoder/results.