This document collects links to selected datasets, models, and papers on AI in medical image and video analysis, along with related PyTorch resources.
The links also cover general-purpose foundation models, essential PyTorch models, and datasets. The links were verified in December 2025.
🔴 Important: Click on the Outline button (upper-right button in GitHub) for a table of contents and to jump to a particular topic.
🔴 Important: Right-click on each link to open it in a new browser window.
Please reference:
A. S. Panayides et al., "Position Paper: Artificial Intelligence in Medical Image Analysis: Advances, Clinical Translation, and Emerging Frontiers," IEEE J. Biomed. Health Inform., vol. 10, no. 2, pp. 1187–1202, Feb. 2026, doi: 10.1109/JBHI.2025.3649496.
@article{AIinMedicalImaging,
title={Position Paper: Artificial Intelligence in Medical Image Analysis: Advances, Clinical Translation, and Emerging Frontiers},
author={Panayides, A. S. and Chen, H. and Filipovic, N. D. and Geroski, T. and Hou, K. and Lekadir, K. and
Marias, K. and Matsopoulos, G. and Papanastasiou, G. and Sarder, P. and Tourassi, G. and
Tsaftaris, S. A. and Amini, A. and Fu, H. and Kyriacou, E. and Loizou, C. P. and Zervakis, M. and
Saltz, J. H. and Shamout, F. E. and Wong, K. C. L. and Yao, J. and Fotiadis, D. I. and
Pattichis, C. S. and Pattichis, M. S.},
journal={IEEE Journal of Biomedical and Health Informatics},
volume = {10},
number = {2},
pages = {1187--1202},
month = feb,
year = {2026},
doi = {10.1109/JBHI.2025.3649496}
}
For updates, email Prof. Marios S. Pattichis at pattichi@unm.edu.
- BiomedGPT is pre-trained and fine-tuned with multi-modal & multi-task biomedical datasets
- BiomedGPT paper link
- BiomedGPT Google Drive based Google Colab
- vit-pytorch: PyTorch-based implementations of vision transformer architectures
- vit-tensorflow: TensorFlow-based implementations of vision transformer architectures
- pytorch-image-models: Contains pre-trained transformer models
- HistoQC is an open-source quality control tool for digital pathology slides
- CLAM: A Deep-Learning-based Pipeline for Data Efficient and Weakly Supervised Whole-Slide-level Analysis
- HistomicsTK is a Python package for the analysis of digital pathology images
- Slideflow is a deep learning library for digital pathology, offering a user-friendly interface for model development
- Sarder Lab: Codes for computational pathology from Pinaki Sarder's lab
- MoPaDi - Morphing Histopathology Diffusion
- Low-Rank Adaptation of Pre-Trained Large Vision Models for Improved Lung Nodule Malignancy Classification, NLSTx Dataset: A subset of difficult lung nodules from the NLST database.
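For readers new to low-rank adaptation (LoRA), the idea named in the entry above can be sketched in a few lines: the pre-trained weight matrix `W` stays frozen, and only a low-rank update `B @ A` is trained and added to it, scaled by `alpha / r`. The sketch below uses plain Python lists so it runs without any dependencies. The function names (`matvec`, `lora_forward`) and the toy matrices are hypothetical and are not taken from the linked paper or its code.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x).

    W: frozen weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    rank = len(A)
    base = matvec(W, x)                # output of the frozen layer
    update = matvec(B, matvec(A, x))   # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]          # rank r = 1
    B_init = [[0.0], [0.0]]   # B is commonly initialized to zero
    print(lora_forward(W, A, B_init, [2.0, 3.0]))
```

With `B` initialized to zero, the adapted layer reproduces the frozen layer exactly, so fine-tuning starts from the pre-trained behavior.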
- NCI's CPTAC: Clinical Proteomic Tumor Analysis Consortium (proteomics, genomics, histopathology)
- NCI Imaging Data Commons (IDC) is a cloud-based repository of publicly available cancer imaging data co-located with analysis and exploration tools
- HTAN is a National Cancer Institute (NCI)-funded Cancer Moonshot℠ initiative to construct 3-dimensional atlases of the dynamic cellular, morphological, and molecular features of human cancers as they evolve from precancerous lesions to advanced disease
- The KPMP is a multi-year collaboration of leading research institutions to study patients with kidney disease
- HUBMAP: Human BioMolecular Atlas Program Data Portal: An open platform to discover, visualize and download standardized healthy single-cell and spatial tissue data
- TCGA: The Cancer Genome Atlas Program
- GTEx: The Genotype-Tissue Expression (GTEx) Portal is a comprehensive public resource for researchers studying tissue and cell-specific gene expression and regulation across individuals, development, and species, with data from 3 NIH projects
- Download and visually explore data to understand the functionality of human tissues at the cellular level with Chan Zuckerberg CELL by GENE Discover (CZ CELLxGENE Discover)
- The Human Cell Atlas is a global consortium that is mapping every cell type in the human body, creating a 3-dimensional Atlas of human cells to transform our understanding of biology and disease
- BRIDGE2AI's Functional genomics project (protein-protein interactions, single-cell imaging; Trey Ideker at UCSD is PI)
- 4-dimensional nucleome (imaging and omics to relate the spatial orientation of the nucleome to gene regulation)
- Apollo: The Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) Network (DOD, VA, NCI; cancer program with genomics, proteomics, pathology, and excellent longitudinal clinical data of veterans)
- CELLxGENE is a suite of tools that help scientists to find, download, explore, analyze, annotate, and publish single cell datasets
- GDS: This database stores curated gene expression DataSets, as well as original Series and Platform records in the Gene Expression Omnibus (GEO) repository
- EchoNet-LVH: A Large Parasternal Long Axis Echocardiography Video Dataset, Model, and Paper, model, paper.
- EchoNet-Pediatric: A Large Pediatric Echocardiography Video Dataset and Model link, paper.
- EchoNet-Dynamic: Interpretable AI for beat-to-beat cardiac function assessment Dataset, Model, and Paper
- EchoNet: Tee-View-Classifier datasets and paper and model
- EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing and paper (also see Generative AI Video Models).
- Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation (CVPR 2022)
- Detectron2 is Facebook AI Research's next-generation library that provides state-of-the-art detection and segmentation
- CHIEF - Clinical Histopathology Imaging Evaluation Foundation Model (focused on cancer)
- UNI HIPT: Towards a general-purpose foundation model for computational pathology
- CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models
- Cellpose-SAM: cell and nucleus segmentation with superhuman generalization
- Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data
- H0-mini is a lightweight foundation model for histology
- UNI: Towards a General-Purpose Foundation Model for Computational Pathology
- Contains links to 10 different endoscopy video datasets.
- A large-scale endoscopic video dataset with over 33K video clips.
- Supports 3 types of downstream tasks, including classification, segmentation, and detection.
- SAM (Segment Anything Model, META, 2023)
- SAM2 foundation model for video
- SAM2 paper
- SAM3 A unified model for detection, segmentation, and tracking of objects in images and video using text, exemplar, and visual prompts
- SAM3 paper
- SAM 3D contains two state-of-the-art models that enable 3D reconstruction of objects and humans from a single image, GitHub.
- Main website with model: OpenBiomedVid
- OpenBiomedVid dataset
- SurgeryVideoQA
- MIMIC-IV-ECHO: Echocardiogram Matched Subset
- Related OpenAI o3 and o4-mini System
- OpenAI models
- Deterministic Medical Image Translation via High-fidelity Brownian Bridges (CVPR 2025 preprint; paper only). General cross-modality translation using simulated MRI/CT datasets (no fixed subject count). Image-to-image method. Uses deterministic diffusion along Brownian bridge paths to connect source and target modalities, improving realism and consistency without stochastic sampling.
- GDM-VE: Geodesic Diffusion Models for Medical Image-to-Image Generation (2025) (GitHub and paper links). MRI and CT (brain, thoracic; open datasets). Geodesic diffusion model; image-to-image method. Introduces a geodesic metric in latent space for efficient and stable sampling in medical image-to-image synthesis.
- Cross-conditioned Diffusion Model for Medical Image to Image Translation (2024) (paper only). Multi-modal MRI (T1, T2, FLAIR; public datasets). Image-to-image method. Uses cross-modality conditioning in which the source MRI guides target-modality diffusion; modality-specific encoders enhance structural and contrast fidelity.
- Cascaded diffusion models for medical image translation (GitHub and paper links). Brain / cardiac (general datasets). Image-to-image method. Combines a coarse GAN prior with a diffusion refinement stage; shortcut paths reduce the number of steps while preserving fidelity and uncertainty quantification.
- Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation (JBHI 2025) (GitHub and paper links). General (denoising, SR, modality transfer). MRI / CT (brain, thorax; open datasets). Image-to-image method. Efficient DDPM variant using only 10 diffusion steps; achieves state-of-the-art results on denoising, super-resolution, and modality translation tasks.
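To give a feel for what "only 10 diffusion steps" can mean in practice, the sketch below picks 10 evenly spaced timesteps out of a conventional 1000-step linear noise schedule and reports the cumulative signal level at each of them. This is an illustrative assumption, not the Fast-DDPM implementation: the function names, the evenly spaced selection rule, and the schedule endpoints (`1e-4` to `0.02`, the commonly used DDPM defaults) are chosen here for demonstration, and the linked repositories should be consulted for the actual samplers.

```python
def linear_alpha_bars(total_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    Requires total_steps >= 2 so the linear interpolation is well defined.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    alpha_bars, running = [], 1.0
    for t in range(total_steps):
        beta = beta_start + (beta_end - beta_start) * t / (total_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def subsample_timesteps(total_steps=1000, num_steps=10):
    """Pick num_steps evenly spaced timesteps, ending at the last one."""
    if not 0 < num_steps <= total_steps:
        raise ValueError("num_steps must be in 1..total_steps")
    stride = total_steps // num_steps
    return [i * stride + stride - 1 for i in range(num_steps)]


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    for t in subsample_timesteps():
        # alpha_bar close to 1 means mostly signal; close to 0 means mostly noise.
        print(t, round(alpha_bars[t], 4))
```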
- EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing and paper (also see Echonet datasets and models).
- Endora: Video Generation Models as Endoscopy Simulators
- A multimodal video dataset of human spermatozoa
- A public endoscopic video dataset for polyp detection
- Carotid Ultrasound Boundary Study (CUBS): Technical considerations on an open multi-center analysis of computerized measurement systems for intima-media thickness measurement on common carotid artery longitudinal B-mode ultrasound scans
- DISIML models: Echo, ECG, tabular data models, and autoencoders for dimensionality reduction
- A Large-scale Multimodal Study for Predicting Mortality Risk Using Minimal and Low Parameter Models and Separable Risk Assessment
- Deep-learning-assisted analysis of echocardiographic videos improves predictions of all-cause mortality
- Gradcam: Advanced AI explainability for PyTorch
- HiResCAM: A small demo of the HiResCAM and Grad-CAM gradient-based neural network explanation methods
- Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks
- Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
- Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization
- Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs
- LayerCAM: Exploring Hierarchical Class Activation Maps for Localization
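The methods listed above are variations on class activation mapping. As a minimal sketch of the basic Grad-CAM computation, the dependency-free example below weights each feature map by the spatial average of its gradient, sums the weighted maps, applies a ReLU, and rescales the result to [0, 1]. Nested Python lists stand in for tensors here, and the function name `grad_cam` and the toy inputs are hypothetical. In a real PyTorch workflow the activations and gradients would come from hooks on a convolutional layer, which the linked libraries handle.

```python
def grad_cam(activations, gradients):
    """Basic Grad-CAM heatmap from K feature maps and their gradients.

    activations, gradients: nested lists of shape [K][H][W], where
    gradients holds d(class score) / d(activation).
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of each gradient map.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum over channels followed by ReLU.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Rescale to [0, 1] when the map is not all zeros.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))
```

In the toy input, the second channel has a negative average gradient, so the location it highlights is suppressed by the ReLU and only the first channel's location remains in the heatmap.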
- Models documentation
- Models on GitHub
- Pretrained models on specific datasets and performance
- Build your own model tutorial
- Vision ResNet model
- 3D ResNet
- X3D: Expanding Architectures for Efficient Video Recognition
- SlowFast Networks for Video Recognition
- MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
- Video Swin Transformer model
- A (Very Short) Visual Introduction to Learning Rate Schedulers (With Code)
- Step learning rate scheduler
- Reduce the learning rate when we reach a plateau
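As a rough illustration of what the two scheduler types listed above compute, the following dependency-free Python sketch reimplements the step-decay rule and a simplified reduce-on-plateau rule. The function names (`step_lr`, `plateau_lrs`) and default values are hypothetical and chosen for illustration. PyTorch's own `torch.optim.lr_scheduler.StepLR` and `ReduceLROnPlateau` provide further options, such as improvement thresholds and cooldown periods, that are omitted here.

```python
import math


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Step decay: scale the base rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def plateau_lrs(losses, base_lr, factor=0.1, patience=2):
    """Simplified reduce-on-plateau rule.

    The rate is multiplied by factor once the monitored loss has failed
    to improve for more than `patience` consecutive epochs. Returns the
    learning rate in effect after each epoch.
    """
    lr, best, bad_epochs, history = base_lr, math.inf, 0, []
    for loss in losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    print([step_lr(0.1, epoch, step_size=10, gamma=0.5) for epoch in (0, 10, 25)])
    print(plateau_lrs([1.0, 0.9, 0.9, 0.9, 0.9, 0.8], base_lr=0.1))
```

Step decay follows a fixed timetable, while the plateau rule reacts to the validation metric, which can be helpful when it is unclear in advance how long training will take to stall.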
For evaluating your models, consider Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning by Sebastian Raschka.
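One evaluation strategy covered in that article is k-fold cross-validation. The sketch below shows, under simple assumptions, how sample indices can be split into k train/validation partitions using only the Python standard library. The function name `k_fold_indices` and its parameters are hypothetical. For medical data, splits are usually made per patient rather than per image to avoid leakage, which this minimal version does not handle.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds.
    Each fold serves as the validation set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        ]
        yield train, validation


if __name__ == "__main__":
    for train, validation in k_fold_indices(10, 5):
        print(len(train), len(validation))
```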
- Search for datasets on Google Dataset Search.
- Search for papers on Papers with Code. Look separately for Methods and Datasets.
- Search for datasets, models, and dataset competitions on Kaggle.
- Search for computer vision datasets on the PyTorch vision datasets website.
- Search for pretrained PyTorch models on the PyTorch models website.