
Improve root README and 2d classification documentation #2005


Open
wants to merge 6 commits into base: main
45 changes: 34 additions & 11 deletions 2d_classification/mednist_tutorial.ipynb
@@ -17,14 +17,27 @@
"\n",
"# Medical Image Classification Tutorial with the MedNIST Dataset\n",
"\n",
"In this tutorial, we introduce an end-to-end training and evaluation example based on the MedNIST dataset.\n",
"This tutorial demonstrates how to build a complete medical image classification system using MONAI and the MedNIST dataset.\n",
"\n",
"We'll go through the following steps:\n",
"* Create a dataset for training and testing\n",
"* Use MONAI transforms to pre-process data\n",
"* Use the DenseNet from MONAI for classification\n",
"* Train the model with a PyTorch program\n",
"* Evaluate on test dataset\n",
"## Tutorial Overview\n",
"\n",
"This end-to-end tutorial covers the complete machine learning pipeline for medical image classification:\n",
"\n",
"1. **Dataset Preparation**: Create training, validation, and test datasets\n",
"2. **Data Preprocessing**: Apply medical image transforms and augmentations\n",
"3. **Model Architecture**: Use DenseNet121 for medical image classification\n",
"4. **Training Workflow**: Train with PyTorch\n",
"5. **Model Evaluation**: Assess performance on the held-out test set and visualize the results\n",
"6. **Advanced Features**: Occlusion sensitivity analysis for model interpretability\n",
"\n",
"## Learning Objectives\n",
"\n",
"- Understand MONAI's integration with PyTorch workflows\n",
"- Learn medical image preprocessing techniques\n",
"- Implement data augmentation strategies for medical images\n",
"- Train robust classification models for medical data\n",
"- Evaluate model performance with standard classification metrics\n",
"- Use interpretation techniques to understand model decisions\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb)"
]
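The occlusion sensitivity step in the overview can be illustrated with a plain-Python sketch. The idea is to cover one patch of the image with a constant value and measure how much the model's score drops. Large drops mark regions the model relies on. The `toy_score` function is a hypothetical stand-in for a trained classifier. MONAI provides its own `monai.visualize.OcclusionSensitivity`, and this sketch does not reproduce that implementation.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image, score: Callable[[Image], float],
                  patch: int = 2, fill: float = 0.0) -> Image:
    """Slide a non-overlapping patch over the image and record the score drop.

    Each pixel of the returned map holds (baseline score - occluded score)
    for the patch that covered it; larger values mean the region mattered more.
    """
    h, w = len(image), len(image[0])
    baseline = score(image)
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Copy the image and blank out one patch with the fill value.
            occluded = [row[:] for row in image]
            for r in range(top, min(top + patch, h)):
                for c in range(left, min(left + patch, w)):
                    occluded[r][c] = fill
            drop = baseline - score(occluded)
            # Attribute the drop to every pixel inside the occluded patch.
            for r in range(top, min(top + patch, h)):
                for c in range(left, min(left + patch, w)):
                    heat[r][c] = drop
    return heat


def toy_score(img: Image) -> float:
    # Hypothetical "classifier" that only responds to the top-left 2x2 region.
    return sum(img[r][c] for r in range(2) for c in range(2))


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score)
for row in heat:
    print(row)
```

Only the top-left patch changes the toy score, so only that region lights up in the map. A real model produces a graded map instead.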
@@ -217,11 +230,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read image filenames from the dataset folders\n",
"## Explore the Dataset Structure\n",
"\n",
"Let's examine how the MedNIST dataset is organized and how many images each class contains before starting training.\n",
Member

There is a stylistic choice of voice when describing what's being done. One way is to be neutral and avoid personal perspectives, e.g. no "us" or "you" when describing actions or observations. This could instead read "Here the dataset is explored...." so that no first- or second-person voice is used. It's a question of what we want to do and prompting the network to adhere to that.

Contributor Author

For this particular point, we could add some guidance to CONTRIBUTING.md and use an agent to review PRs for stylistic consistency. We can also instruct the coding agent to follow these contributing guidelines directly. Either approach — or both — should help keep the voice aligned.

"\n",
"### Dataset Organization\n",
"\n",
"The MedNIST dataset contains 6 medical image categories:\n",
"- **Hand**: X-ray images of hands\n",
"- **AbdomenCT**: CT scans of the abdomen \n",
"- **CXR**: Chest X-rays\n",
"- **ChestCT**: CT scans of the chest\n",
"- **BreastMRI**: MRI images of breast tissue\n",
"- **HeadCT**: CT scans of the head\n",
"\n",
"First of all, check the dataset files and show some statistics. \n",
"There are 6 folders in the dataset: Hand, AbdomenCT, CXR, ChestCT, BreastMRI, HeadCT, \n",
"which should be used as the labels to train our classification model."
"Each folder name serves as the class label for our classification model."
]
},
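The "folder name is the label" convention can be sketched with the standard library. This sketch lists each class folder, assigns integer labels in sorted order, and counts images per class. The function names and the miniature dummy layout in the usage part are illustrative assumptions. The notebook's own exploration cell may build its lists differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def index_dataset(root: Path) -> Tuple[List[str], List[Path], List[int]]:
    """Treat each sub-folder name as a class label and list its files."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    files: List[Path] = []
    labels: List[int] = []
    for idx, name in enumerate(class_names):
        for f in sorted((root / name).iterdir()):
            if f.is_file():
                files.append(f)
                labels.append(idx)  # integer label = position in class_names
    return class_names, files, labels


def class_counts(class_names: List[str], labels: List[int]) -> Dict[str, int]:
    """Number of images per class, useful for spotting class imbalance."""
    counts = {name: 0 for name in class_names}
    for label in labels:
        counts[class_names[label]] += 1
    return counts


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical miniature layout: two class folders holding dummy files.
    for name, n in [("Hand", 3), ("CXR", 2)]:
        (root / name).mkdir()
        for i in range(n):
            (root / name / f"{i:06d}.jpeg").touch()
    names, files, labels = index_dataset(root)
    counts = class_counts(names, labels)

print(names, counts)
```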
{
121 changes: 91 additions & 30 deletions 2d_classification/monai_101.ipynb
@@ -16,19 +16,31 @@
"See the License for the specific language governing permissions and \n",
"limitations under the License.\n",
"\n",
"# MONAI 101 tutorial\n",
"# MONAI 101 Tutorial\n",
"\n",
"In this tutorial, we will introduce how simple it can be to run an end-to-end classification pipeline with MONAI.\n",
"This tutorial introduces the basics of building an end-to-end medical image classification pipeline with MONAI.\n",
"\n",
"These steps will be included in this tutorial, and each of them will take only a few lines of code:\n",
"- Dataset download\n",
"- Data pre-processing\n",
"- Define a DenseNet-121 and run training\n",
"- Check the results on test dataset\n",
"## What You'll Learn\n",
"\n",
"This tutorial will use about 7GB of GPU memory and 10 minutes to run.\n",
"In this tutorial, you'll discover how simple it can be to create a complete medical image classification system. We'll cover each step with just a few lines of code:\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/2d_classification/monai_101.ipynb)"
"- **Dataset Download**: Automatically retrieve and set up the MedNIST dataset\n",
"- **Data Preprocessing**: Transform medical images for training\n",
"- **Model Definition**: Set up a DenseNet-121 neural network for classification\n",
"- **Training**: Train your model with medical imaging data\n",
"- **Evaluation**: Test your trained model's performance\n",
"\n",
"## Requirements\n",
"\n",
"- **GPU Memory**: Approximately 7GB\n",
"- **Runtime**: About 10 minutes\n",
"- **Level**: Beginner (no prior MONAI experience required)\n",
"\n",
"## Quick Start\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/2d_classification/monai_101.ipynb)\n",
"\n",
"*Click the badge above to run this tutorial in Google Colab without any local setup.*"
]
},
{
@@ -130,11 +142,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup data directory\n",
"## Set Up Data Directory\n",
"\n",
"You can specify a directory with the `MONAI_DATA_DIRECTORY` environment variable. \n",
"This allows you to save results and reuse downloads. \n",
"If not specified a temporary directory will be used."
"You can specify a directory for storing datasets and results using the `MONAI_DATA_DIRECTORY` environment variable. \n",
"This allows you to:\n",
"- Keep results after the session ends\n",
"- Reuse downloaded datasets across sessions instead of downloading them again\n",
"\n",
"If not specified, a temporary directory will be used (data will be lost after the session ends)."
]
},
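The directory logic described here can be sketched in a few lines. The sketch uses `MONAI_DATA_DIRECTORY` when it is set and falls back to a fresh temporary directory otherwise. The helper name `resolve_data_dir` is hypothetical, and creating the directory with `os.makedirs` is an added convenience that the tutorial's own cell may not perform.

```python
import os
import tempfile


def resolve_data_dir() -> str:
    """Return MONAI_DATA_DIRECTORY if it is set, else a new temporary directory."""
    directory = os.environ.get("MONAI_DATA_DIRECTORY")
    if directory:  # an empty string is treated the same as "not set"
        os.makedirs(directory, exist_ok=True)
        return directory
    # A fresh temporary directory is created on every call, so downloads
    # stored here are not reused across sessions.
    return tempfile.mkdtemp()


root_dir = resolve_data_dir()
print(root_dir)
```

For example, running `export MONAI_DATA_DIRECTORY=/path/to/data` in the shell before launching Jupyter makes every run reuse the same folder.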
{
@@ -163,12 +179,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MONAI transforms to preprocess data\n",
"## Use MONAI Transforms to Preprocess Data\n",
"\n",
"Medical images require specialized methods for input/output (I/O), preprocessing, and augmentation. Unlike natural images, medical images often:\n",
"- Follow specific formats (DICOM, NIfTI, etc.)\n",
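"- Are acquired and handled under modality-specific protocols\n",
"- Have high-dimensional data arrays (e.g. 3D or 4D volumes)\n",
"- Carry metadata, such as voxel spacing and orientation, that preprocessing needs to account for\n",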
"\n",
"Medical images require specialized methods for I/O, preprocessing, and augmentation.\n",
"They often follow specific formats, are handled with specific protocols, and the data arrays are often high-dimensional.\n",
"In this example, we'll create a preprocessing pipeline using three MONAI transforms:\n",
"\n",
"In this example, we will perform image loading, data format verification, and intensity scaling with three `monai.transforms` listed below, and compose a pipeline ready to be used in next steps."
"1. **`LoadImageD`**: Loads medical images from various formats\n",
"2. **`EnsureChannelFirstD`**: Moves or adds the channel dimension so that it comes first\n",
"3. **`ScaleIntensityD`**: Rescales pixel intensities to a fixed range, [0, 1] by default\n",
"\n",
"These transforms are combined into a pipeline that will be applied to our data."
]
},
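MONAI's dictionary transforms (the `D` suffix) take a dict and return a dict, and `Compose` applies them in order. The plain-Python sketch below mimics that chaining on a toy 2D image stored under an `"image"` key. It is not MONAI code. The two helper functions only approximate what `EnsureChannelFirstD` and `ScaleIntensityD` (with its default [0, 1] range) do to a single-channel 2D image. Other keys, such as `"label"`, pass through untouched.

```python
from typing import Any, Callable, Dict

Sample = Dict[str, Any]
Transform = Callable[[Sample], Sample]


def compose(*transforms: Transform) -> Transform:
    """Apply dictionary transforms left to right, like a Compose pipeline."""
    def apply(sample: Sample) -> Sample:
        for t in transforms:
            sample = t(sample)
        return sample
    return apply


def ensure_channel_first(sample: Sample) -> Sample:
    # A 2D grayscale image (H x W) gains a leading channel axis: (1, H, W).
    out = dict(sample)
    out["image"] = [sample["image"]]
    return out


def scale_intensity(sample: Sample, minv: float = 0.0, maxv: float = 1.0) -> Sample:
    # Min-max rescale every pixel of every channel to [minv, maxv].
    channels = sample["image"]
    flat = [v for ch in channels for row in ch for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    out = dict(sample)
    out["image"] = [
        [[minv + (v - lo) / span * (maxv - minv) for v in row] for row in ch]
        for ch in channels
    ]
    return out


pipeline = compose(ensure_channel_first, scale_intensity)
result = pipeline({"image": [[0, 128], [255, 64]], "label": 3})
print(result["image"])
print(result["label"])
```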
{
@@ -191,18 +216,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare datasets using MONAI Apps\n",
"## Prepare Dataset Using MONAI Apps\n",
"\n",
"We'll use the `MedNISTDataset` from MONAI Apps to automatically download and set up our dataset. This convenience class will:\n",
"- Download the dataset to your specified directory\n",
"- Apply the preprocessing transforms we defined above\n",
"- Split the data into training, validation, and test sets, returning the subset selected with the `section` argument\n",
"\n",
"We use `MedNISTDataset` in MONAI Apps to download a dataset to the specified directory and perform the pre-processing steps in the `monai.transforms` compose.\n",
"### About the MedNIST Dataset\n",
"\n",
"The MedNIST dataset was gathered from several sets from [TCIA](https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions),\n",
"[the RSNA Bone Age Challenge](http://rsnachallenges.cloudapp.net/competitions/4),\n",
"and [the NIH Chest X-ray dataset](https://cloud.google.com/healthcare/docs/resources/public-datasets/nih-chest).\n",
"\n",
"The dataset is kindly made available by [Dr. Bradley J. Erickson M.D., Ph.D.](https://www.mayo.edu/research/labs/radiology-informatics/overview) (Department of Radiology, Mayo Clinic)\n",
"under the Creative Commons [CC BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/).\n",
"- **Size**: 58,954 images\n",
"- **Classes**: 6 medical image types (AbdomenCT, BreastMRI, CXR, ChestCT, Hand, HeadCT)\n",
"- **Format**: 2D grayscale images\n",
"- **License**: Creative Commons [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)\n",
"\n",
"If you use the MedNIST dataset, please acknowledge the source. "
"The dataset is kindly made available by [Dr. Bradley J. Erickson M.D., Ph.D.](https://www.mayo.edu/research/labs/radiology-informatics/overview) (Department of Radiology, Mayo Clinic).\n",
"\n",
"*If you use the MedNIST dataset in your research, please acknowledge the source.*"
]
},
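Each `MedNISTDataset` instance exposes one split. The sketch below shows one way to compute a seeded, fraction-based train/validation/test split in plain Python. The 10% fractions, the seed, and the helper name `split_indices` are assumptions for illustration, and this is not `MedNISTDataset`'s actual code. Because the seed is fixed, every call shuffles the indices in the same order. As a result, the three splits never overlap even when they are requested separately.

```python
import random
from typing import List, Tuple


def split_indices(n: int, val_frac: float = 0.1, test_frac: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle 0..n-1 with a fixed seed and cut it into train/val/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed -> same order every call
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 58,954 is the MedNIST image count listed above.
train, val, test = split_indices(58954)
print(len(train), len(val), len(test))  # 47164 5895 5895
```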
{
@@ -236,11 +268,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define a network and a supervised trainer\n",
"## Define Network and Supervised Trainer\n",
"\n",
"Now we'll set up our machine learning model and training configuration.\n",
"\n",
"### Model Selection: DenseNet-121\n",
"\n",
"We'll use DenseNet-121, a widely used convolutional neural network architecture that:\n",
"- Is known for its strong performance on the ImageNet benchmark\n",
"- Connects each layer to all subsequent layers within a dense block, which improves gradient flow and encourages feature reuse\n",
"- Is relatively parameter-efficient, since reused features do not have to be relearned\n",
"\n",
"To train a model that can perform the classification task, we will use the DenseNet-121 which is known for its performance on the ImageNet dataset.\n",
"### Training Configuration\n",
"\n",
"For a typical supervised training workflow, MONAI provides `SupervisedTrainer` to define the hyper-parameters."
"MONAI provides `SupervisedTrainer` to simplify the training process. This high-level API handles:\n",
"- Training loops and optimization\n",
"- Loss computation and backpropagation \n",
"- Metric tracking and logging\n",
"- Device management (CPU/GPU)"
]
},
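A little arithmetic makes the dense connectivity concrete. Each layer in a dense block concatenates `growth_rate` new feature maps onto everything before it. Transition layers between blocks then halve the channel count. The sketch assumes the standard DenseNet-121 configuration: 64 initial features, growth rate 32, blocks of 6, 12, 24 and 16 layers, and halving transitions. These values are believed to match the defaults of MONAI's `DenseNet121`, but check the MONAI API reference before relying on them. The sketch also shows where the "121" in the name comes from.

```python
from typing import List, Tuple


def densenet_channels(init_features: int = 64, growth_rate: int = 32,
                      block_config: Tuple[int, ...] = (6, 12, 24, 16)) -> List[int]:
    """Channel count at the output of each dense block."""
    channels = init_features
    after_blocks: List[int] = []
    for i, num_layers in enumerate(block_config):
        # Every dense layer appends `growth_rate` feature maps to its input.
        channels += num_layers * growth_rate
        after_blocks.append(channels)
        if i != len(block_config) - 1:
            # A transition layer halves the channels before the next block.
            channels //= 2
    return after_blocks


def weighted_layer_count(block_config: Tuple[int, ...] = (6, 12, 24, 16)) -> int:
    # 1 stem conv + 2 convs per dense layer + 1 conv per transition + 1 classifier.
    return 1 + 2 * sum(block_config) + (len(block_config) - 1) + 1


print(densenet_channels())     # [256, 512, 1024, 1024]
print(weighted_layer_count())  # 121
```

The final 1024 channels are what the classification head sees after global pooling.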
{
@@ -270,7 +315,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the training"
"## Run the Training\n",
"\n",
"Now let's start the training process! The trainer will:\n",
"- Load batches of medical images\n",
"- Forward them through the DenseNet-121 model\n",
"- Calculate the loss and update model weights\n",
"- Track training progress\n",
"\n",
"This should take about 10 minutes on a GPU."
]
},
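A supervised trainer automates the per-iteration steps listed here. The plain-Python loop below walks through them for a hypothetical one-parameter linear model trained with mean squared error and plain gradient descent. That toy model stands in for DenseNet-121 and cross-entropy only to keep the sketch dependency-free. It is not how `SupervisedTrainer` is implemented internally.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (input, target) pairs


def make_batches(data: Data, batch_size: int) -> List[Data]:
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def train(data: Data, epochs: int = 50, lr: float = 0.05,
          batch_size: int = 4) -> Tuple[float, List[float]]:
    w = 0.0  # the single model weight
    epoch_losses: List[float] = []
    for _ in range(epochs):
        total = 0.0
        for batch in make_batches(data, batch_size):       # load a batch
            preds = [w * x for x, _ in batch]                # forward pass
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            loss = sum(e * e for e in errors) / len(batch)   # MSE loss
            # d(loss)/dw = mean(2 * error * x)
            grad = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            w -= lr * grad                                   # weight update
            total += loss * len(batch)
        epoch_losses.append(total / len(data))               # track progress
    return w, epoch_losses


data = [(float(x), 3.0 * x) for x in range(-4, 5)]  # targets follow y = 3x
w, losses = train(data)
print(round(w, 3), losses[0] > losses[-1])
```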
{
@@ -287,7 +340,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check the prediction on the test dataset"
"## Evaluate Model Performance on Test Dataset\n",
"\n",
"Let's see how well our trained model performs! We'll:\n",
"- Load the test dataset (images the model has never seen)\n",
"- Run predictions on these images\n",
"- Compare predictions with ground truth labels\n",
"- Display the results to see classification accuracy\n",
"\n",
"This evaluation helps us understand whether our model generalizes to new medical images."
]
},
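Test-set accuracy can be computed from predicted and true label indices with the standard library. So can per-class recall, which shows which classes the model gets wrong. The label lists below are made-up values for illustration, not results from the tutorial. The class order follows the alphabetical listing given for MedNIST.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     class_names: Sequence[str]) -> Dict[str, float]:
    """Fraction of each class's images that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Classes absent from y_true are skipped to avoid dividing by zero.
    return {name: hits[i] / totals[i]
            for i, name in enumerate(class_names) if totals[i]}


classes = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]
y_true = [0, 0, 1, 2, 3, 4, 5, 5]  # hypothetical ground truth
y_pred = [0, 1, 1, 2, 3, 4, 5, 0]  # hypothetical predictions
acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred, classes)
print(acc)
print(recalls)
```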
{
57 changes: 43 additions & 14 deletions 2d_classification/monai_201.ipynb
@@ -16,16 +16,31 @@
"See the License for the specific language governing permissions and \n",
"limitations under the License.\n",
"\n",
"# MONAI 201 tutorial\n",
"# MONAI 201 Tutorial: Advanced Training Techniques\n",
"\n",
"In this tutorial we'll revisit the [MONAI 101 notebook](https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/monai_101.ipynb) and add more features representing best practice concepts. This will include evaluation and tensorboard handler techniques.\n",
"Welcome to MONAI 201! This tutorial builds upon [MONAI 101](https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/monai_101.ipynb) and introduces advanced training techniques and best practices for developing medical AI models.\n",
"\n",
"These steps will be included in this tutorial, and each of them will take only a few lines of code:\n",
"- Dataset download and Data pre-processing\n",
"- Define a DenseNet-121 and run training\n",
"- Run inference using SupervisedEvaluator\n",
"## What You'll Learn\n",
"\n",
"This tutorial will use about 7GB of GPU memory and 10 minutes to run.\n",
"This intermediate tutorial covers advanced concepts that are essential for building robust medical AI systems:\n",
"\n",
"- **Advanced Training Workflow**: Enhanced training with validation monitoring\n",
"- **Model Evaluation**: Comprehensive evaluation using `SupervisedEvaluator`\n",
"- **Experiment Tracking**: TensorBoard integration for training visualization\n",
"- **Model Checkpointing**: Save and restore model states during training\n",
"- **Production Best Practices**: Techniques used in real-world medical AI applications\n",
"\n",
"## Prerequisites\n",
"\n",
"- Complete [MONAI 101](https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/monai_101.ipynb) or have basic MONAI knowledge\n",
"- Understanding of deep learning concepts (training, validation, etc.)\n",
"- Familiarity with PyTorch basics\n",
"\n",
"## Requirements\n",
"\n",
"- **GPU Memory**: Approximately 7GB\n",
"- **Runtime**: About 10 minutes\n",
"- **Level**: Intermediate\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/2d_classification/monai_201.ipynb)"
]
@@ -127,9 +142,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MONAI transforms to preprocess data\n",
"## Prepare Data with MONAI Transforms\n",
"\n",
"We'll first prepare the data very much like in the previous tutorial with the same transforms and dataset:"
"We'll prepare our data using the same transforms as MONAI 101, but this time we'll also create a validation dataset. This separation is crucial for monitoring training progress and preventing overfitting."
]
},
{
@@ -180,10 +195,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define a network and a supervised trainer\n",
"## Advanced Training Setup with Evaluation and Monitoring\n",
"\n",
"For training we have the same elements again and will slightly change the `SupervisedTrainer` by expanding its train_handlers. This upgrade will be beneficial for efficient utilization of TensorBoard.\n",
"Furthermore, we introduce a `SupervisedEvaluator` object that will efficiently track model progress. Accompanied by `TensorBoardStatsHandler`, it will log statistics for TensorBoard, ensuring precise tracking and management."
"Now we'll create a more complete training setup that adds validation monitoring and experiment tracking, following common practice in medical AI development.\n",
"\n",
"### Key Components\n",
"\n",
"1. **`SupervisedEvaluator`**: Handles validation during training to monitor model performance\n",
"2. **`TensorBoardStatsHandler`**: Logs training metrics for visualization\n",
"3. **`CheckpointSaver`**: Automatically saves model checkpoints during training\n",
"4. **`ValidationHandler`**: Coordinates validation runs at specified intervals\n",
"\n",
"This setup provides real-time monitoring of your model's learning progress and helps identify issues like overfitting early in the training process."
]
},
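The interplay between periodic validation and checkpoint saving can be sketched without MONAI. The loop validates every few epochs and snapshots the state whenever the validation metric improves. In the notebook these jobs are delegated to the MONAI handlers listed in this cell rather than written by hand. The `val_interval`, the fake epoch and metric functions, and the in-memory `best_state` copy are all hypothetical stand-ins. A real checkpoint would be written to disk.

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]


def train_with_validation(
    state: State,
    num_epochs: int,
    train_one_epoch: Callable[[State], None],
    validate: Callable[[State], float],
    val_interval: int = 2,
) -> Tuple[Optional[State], float, List[Tuple[int, float]]]:
    """Train, validate every `val_interval` epochs, and keep the best state."""
    best_state: Optional[State] = None
    best_metric = float("-inf")
    history: List[Tuple[int, float]] = []
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(state)
        if epoch % val_interval == 0:
            metric = validate(state)
            history.append((epoch, metric))
            if metric > best_metric:
                # Stand-in for saving a checkpoint: snapshot the current state.
                best_metric = metric
                best_state = copy.deepcopy(state)
    return best_state, best_metric, history


def fake_epoch(state: State) -> None:
    state["w"] += 1.0  # hypothetical parameter update


def fake_validate(state: State) -> float:
    # Hypothetical metric that peaks at w == 4 and then degrades,
    # mimicking a model that starts to overfit.
    return -(state["w"] - 4.0) ** 2


best_state, best_metric, history = train_with_validation(
    {"w": 0.0}, 8, fake_epoch, fake_validate)
print(best_state, best_metric, history)
```

Even though training continues to epoch 8, the kept snapshot is the one from epoch 4. This is why monitoring a validation metric helps catch overfitting.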
{
@@ -252,9 +275,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## View training in tensorboard\n",
"## Visualize Training Progress with TensorBoard\n",
"\n",
"TensorBoard provides visualization tools for monitoring training. Depending on what the attached handlers log, you can view:\n",
"- Training and validation loss curves\n",
"- Model performance metrics over time\n",
"- Learning rate schedules\n",
"- Model architecture graphs\n",
"\n",
"Please uncomment the following cell to load tensorboard results."
"To view the results, uncomment and run the following cell. TensorBoard will then display the logged training metrics."
]
},
{