Welcome to the 2D Anatomy Segmentation Datasets repository! This project provides a collection of scripts, tools, and datasets to streamline training and evaluation for anatomical segmentation tasks in medical imaging. Whether you're a researcher or practitioner in the medical imaging field, this repository is designed to collect 2D anatomical datasets and make it easy to download, process, and visualize data from various imaging modalities.
This repository supports the following publicly available datasets:
Dataset | Modality | #Images | Description | Link |
---|---|---|---|---|
BS80k | Scintigraphy | 6,494 | Large-scale dataset for anatomy segmentation in nuclear imaging. | Link |
JSRT | Chest X-Ray | 247 | Dataset of chest radiographs for lung segmentation. | Link |
PAX-Ray | Chest X-Ray | 852 | Dataset for fine-grained anatomy segmentation. | Link |
PAX-Ray++ | Chest X-Ray | 14,754 | Enhanced version of PAX-Ray with additional annotations. | Link |
DUKE | Optical Coherence Tomography | 110 | OCT dataset for retinal layer segmentation. | Link |
RAVIR | Infrared Reflectance Imaging | 23 | Dataset for retinal vessel segmentation from infrared images. | Link |
Teeth | Panoramic X-Ray | 598 | Dataset of dental X-rays for teeth segmentation. | Link |
Here’s an overview of the repository's structure:
src/
│
├── data/ # PyTorch dataloaders for all datasets
│
├── prepare_data/
│ ├── prepare_bs80k/ # Scripts for processing the anatomical segmentations of BS80k dataset
│ ├── prepare_jsrt/ # Scripts for processing the JSRT dataset
│ ├── prepare_paxray/ # Scripts for processing the PAX-Ray dataset
│ ├── prepare_paxraypp/ # Scripts for processing the PAX-Ray++ dataset
│ ├── prepare_duke/ # Scripts for processing the DUKE dataset
│ ├── prepare_ravir/ # Scripts for processing the RAVIR dataset
│ └── prepare_teeth_kaggle/ # Scripts for processing the Teeth dataset
│
├── visualization/
│ └── visualize.py # Utility scripts for visualizing dataset samples
│
├── Notebooks/
│ └── Download_datasets.ipynb # shows how to setup all the datasets
│ └── Dataloader_example.ipynb # Examples for to setup the dataloaders for each dataset and visualize annotations
Each dataset includes standardized processing scripts to ensure consistency across datasets. These scripts:
- Download the raw data.
- Normalize and preprocess the images.
- Prepare data splits (train/val/test).
- Create PyTorch-friendly formats.
Use the scripts in src/visualization
to visualize the datasets, ground-truth labels, and predictions for easy inspection of data quality.
Prepared datasets can be plugged directly into any PyTorch-based segmentation model for training and evaluation.
git clone https://github.com/yourusername/2D-Anatomy-Segmentation-Datasets.git
cd 2D-Anatomy-Segmentation-Datasets
pip install -r requirements.txt
Run the dataset preparation script for the dataset of interest. For example:
sh src/prepare_data/prepare_bs80k/get_bs80k_full.sh
sh src/prepare_data/prepare_jsrt/get_jsrt_full.sh
...
To visualize samples:
python src/visualization/visualize.py --dataset BS80k
Contributions are welcome! If you'd like to add support for more datasets, improve baseline performance, or enhance data preparation scripts, feel free to open a pull request.
This project is licensed under the MIT License.
For questions or feedback, feel free to reach out:
📧 Email: constantinseibold[at]gmail.com
🔗 GitHub Issues
Enjoy building robust segmentation models with these datasets! 😊