This repository contains the code necessary for running federated training of MammoFL. This work is based on Deep-LIBRA, a deep-learning pipeline for breast percent density (PD) estimation from mammography. The paper and code for the original Deep-LIBRA work are linked.
Our base model predicts percent density in three steps: one UNet segments the breast from the mammogram, a second UNet segments the dense tissue from the image, and the percent density of the imaged breast is computed from the areas of the two segmentations. This base model improves upon the original Deep-LIBRA method by (1) removing the explicit pectoral muscle segmentation step and instead folding pectoral muscle removal into the breast segmentation step, and (2) using deep learning rather than traditional machine learning methods to segment the dense tissue. Finally, we add federated learning to the training process so that users can train models on datasets from multiple institutions under privacy constraints.
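The final step, turning the two segmentations into a percent density, can be sketched as follows. This is a minimal illustration rather than the repository's own implementation: it assumes PD is 100 times the dense tissue area divided by the breast area, with masks given as rows of 0/1 pixel values. The function names `mask_area` and `percent_density` are hypothetical.

```python
def mask_area(mask):
    """Count foreground pixels in a binary mask given as rows of 0/1 values."""
    return sum(1 for row in mask for pixel in row if pixel)


def percent_density(breast_mask, dense_mask):
    """Return 100 * dense area / breast area (assumed definition of PD).

    Returns 0.0 when the breast mask is empty, to avoid dividing by zero.
    """
    breast_area = mask_area(breast_mask)
    if breast_area == 0:
        return 0.0
    return 100.0 * mask_area(dense_mask) / breast_area


# Toy 3x4 masks: 8 breast pixels, 2 of which are dense tissue.
breast = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
dense = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
pd_value = percent_density(breast, dense)
print(f"{pd_value:.1f}")  # prints 25.0
```

With masks loaded as NumPy arrays, the same areas could be obtained with `mask.sum()`.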
Python >= 3.6 is required.
Install all necessary Python packages:
pip3 install -r requirements.txt
Also, ensure that you are running the code on a machine with access to a CUDA-capable GPU.
For each of the two datasets, you must have: (1) a directory of mammograms in DICOM format, where the file names are exactly the subject identifiers; (2) a directory of breast masks in PNG format, where the file names are exactly the subject identifiers, each pixel value is either 0 or 1, and each mask has the same dimensions as its corresponding DICOM image; and (3) a directory of dense tissue masks in PNG format, with the same constraints as the breast masks.
An example is provided below:
- dataset1
  - original_dicoms
    - sub_1.dcm
    - sub_2.dcm
  - breast_masks
    - sub_1.png
    - sub_2.png
  - dense_masks
    - sub_1.png
    - sub_2.png
- dataset2
  - original_dicoms
    - sub_1.dcm
    - sub_2.dcm
  - breast_masks
    - sub_1.png
    - sub_2.png
  - dense_masks
    - sub_1.png
    - sub_2.png
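Before training, it may help to confirm that every mammogram has both masks. The sketch below checks only the file-name constraint, assuming the example layout above (`original_dicoms`, `breast_masks`, `dense_masks` under one dataset directory, with `.dcm` and `.png` extensions). It does not check pixel values or image dimensions, which would need an image library. The helper names `subject_ids` and `check_dataset` are hypothetical.

```python
import tempfile
from pathlib import Path


def subject_ids(directory, suffix):
    """Return the set of file stems in `directory` whose extension is `suffix`."""
    return {p.stem for p in Path(directory).iterdir() if p.suffix.lower() == suffix}


def check_dataset(root):
    """Return a list of file-name mismatches found in one dataset directory."""
    root = Path(root)
    dicoms = subject_ids(root / "original_dicoms", ".dcm")
    problems = []
    for mask_dir in ("breast_masks", "dense_masks"):
        masks = subject_ids(root / mask_dir, ".png")
        for sid in sorted(dicoms - masks):
            problems.append(f"{mask_dir}: missing mask for {sid}")
        for sid in sorted(masks - dicoms):
            problems.append(f"{mask_dir}: no DICOM for {sid}")
    return problems


# Demo on a throwaway dataset where one dense tissue mask is missing.
layout = {
    "original_dicoms": ["sub_1.dcm", "sub_2.dcm"],
    "breast_masks": ["sub_1.png", "sub_2.png"],
    "dense_masks": ["sub_1.png"],
}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset1"
    for sub, names in layout.items():
        (root / sub).mkdir(parents=True)
        for name in names:
            (root / sub / name).touch()
    demo_problems = check_dataset(root)
print(demo_problems)  # ['dense_masks: missing mask for sub_2']
```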
Run the following command in the terminal, inside the MammoFL directory:
./pipeline/federated_wrapper.sh path_to_dicoms_dataset1/ path_to_dicoms_dataset2/ path_to_breast_masks_dataset1/ path_to_breast_masks_dataset2/ path_to_dense_masks_dataset1/ path_to_dense_masks_dataset2/ output_dir/
The last argument, the output directory, must already exist.
The final model weights are saved in output_dir/results_breast_segmentation/final_aggregated_model.pth and output_dir/results_dense_segmentation/final_aggregated_model.pth for the breast and dense tissue segmentation models, respectively. The TensorBoard logs for the models are saved in output_dir/results_breast_segmentation/logs/ and output_dir/results_dense_segmentation/logs/, respectively.
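Because the wrapper interleaves the two datasets (DICOMs for both, then breast masks for both, then dense masks for both), the argument list is easy to get wrong by hand. The sketch below assembles it in that order and creates the output directory first. It assumes each dataset follows the example layout above; the function name `build_training_command` is hypothetical and not part of the repository.

```python
import tempfile
from pathlib import Path


def build_training_command(dataset1, dataset2, output_dir):
    """Create output_dir if needed and return the wrapper's argument list."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    cmd = ["./pipeline/federated_wrapper.sh"]
    # Order matches the command above: dataset1 then dataset2 for each kind.
    for sub in ("original_dicoms", "breast_masks", "dense_masks"):
        cmd.append((Path(dataset1) / sub).as_posix() + "/")
        cmd.append((Path(dataset2) / sub).as_posix() + "/")
    cmd.append(output_dir.as_posix() + "/")
    return cmd


# Demo with a temporary output directory; nothing is executed here.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output_dir"
    cmd = build_training_command("dataset1", "dataset2", out)
    output_dir_created = out.is_dir()
print(cmd)
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)` from inside the MammoFL directory.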
The input directory must contain one or more DICOM files for inference. The name of each DICOM file is assumed to be the subject ID.
After ensuring the input directory is correct, run the following command in the terminal, inside the MammoFL directory:
./pipeline/run_inference.sh path_to_input_dir/ path_to_breast_model_pth_file/ path_to_dense_model_pth_file/ output_dir/
The last argument, the output directory, must already exist.
The breast masks from inference are saved as PNG images in output_dir/breast_masks_inference/ with the subject ID as the file name. The dense tissue masks from inference are saved as PNG images in output_dir/dense_masks_inference/. The PD calculations for all subjects are saved as a CSV file: output_dir/pd_inference.csv.
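After inference, the output locations described above can be checked with a short script. This is a sketch under stated assumptions: input files end in `.dcm`, each mask is written as `<subject id>.png`, and dense tissue masks follow the same naming as breast masks. The helper names `expected_outputs` and `missing_outputs` are hypothetical.

```python
import tempfile
from pathlib import Path


def expected_outputs(input_dir, output_dir):
    """Map each subject ID (DICOM file stem) to its expected mask paths."""
    output_dir = Path(output_dir)
    outputs = {}
    for dicom in sorted(Path(input_dir).glob("*.dcm")):
        sid = dicom.stem
        outputs[sid] = (
            output_dir / "breast_masks_inference" / f"{sid}.png",
            output_dir / "dense_masks_inference" / f"{sid}.png",
        )
    return outputs


def missing_outputs(input_dir, output_dir):
    """List expected output files that do not exist."""
    missing = [
        path
        for paths in expected_outputs(input_dir, output_dir).values()
        for path in paths
        if not path.is_file()
    ]
    csv_path = Path(output_dir) / "pd_inference.csv"
    if not csv_path.is_file():
        missing.append(csv_path)
    return missing


# Demo: one subject whose dense tissue mask was never written.
with tempfile.TemporaryDirectory() as tmp:
    inp, out = Path(tmp) / "input_dir", Path(tmp) / "output_dir"
    inp.mkdir()
    (inp / "sub_1.dcm").touch()
    (out / "breast_masks_inference").mkdir(parents=True)
    (out / "dense_masks_inference").mkdir()
    (out / "breast_masks_inference" / "sub_1.png").touch()
    (out / "pd_inference.csv").touch()
    demo_missing = [p.relative_to(out).as_posix() for p in missing_outputs(inp, out)]
print(demo_missing)  # ['dense_masks_inference/sub_1.png']
```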