A collection of resources on applications of multi-modal learning in medical imaging.

TFboys-lzz/awesome-multimodal-in-medical-imaging

 
 

Awesome-Multimodal-Applications-In-Medical-Imaging

This repository collects resources on applications of multi-modal learning in medical imaging, including papers related to large language models (LLMs). Papers involving LLMs are shown in bold.

Contributing

Feel free to send a pull request or an email to add links or to discuss this area. Use the following Markdown format for new entries:

- [**Name of Conference or Journal + Year**] Paper Name. [[pdf]](link) [[code]](link)
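
As a rough illustration of the entry format above, the following Python sketch builds one such line. The function name `format_entry`, its parameters, and the `example.org` URLs are hypothetical and not part of this repository; bolding the paper title for LLM papers is only one assumed reading of the repository's bold convention.

```python
def format_entry(venue, year, title, pdf_url, code_url=None, involves_llm=False):
    """Build one list line in the contribution format shown above.

    All names here are illustrative assumptions, not an official tool.
    """
    # Assumption: LLM-related papers are marked by bolding the paper title.
    shown_title = f"**{title}**" if involves_llm else title
    line = f"- [**{venue} {year}**] {shown_title}. [[pdf]]({pdf_url})"
    # The code link is optional; many entries in the list have only a pdf link.
    if code_url:
        line += f" [[code]]({code_url})"
    return line


if __name__ == "__main__":
    # Placeholder values only; these are not real entries.
    print(format_entry("MICCAI", 2023, "Example Paper",
                       "https://example.org/paper.pdf",
                       code_url="https://example.org/code"))
```

Passing `code_url=None` leaves out the `[[code]]` link, which matches entries that link only to a pdf.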

Overview

Survey

  • [arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
  • [arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]
  • [arXiv 2023] Vision Language Models for Vision Tasks: A Survey [pdf] [code]
  • [arXiv 2023] A Systematic Review of Deep Learning-based Research on Radiology Report Generation [pdf] [code]
  • [Artif Intell Med 2023] Medical Visual Question Answering: A Survey [pdf]
  • [arXiv 2023] Medical Vision Language Pretraining: A survey [pdf]
  • [arXiv 2023] CLIP in Medical Imaging: A Comprehensive Survey [pdf] [code]

Medical Report Generation

2018

  • [EMNLP 2018] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
  • [ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
  • [NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]

2019

  • [AAAI 2019] Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation [pdf]
  • [ICDM 2019] Automatic Generation of Medical Imaging Diagnostic Report with Hierarchical Recurrent Neural Network [pdf]
  • [MICCAI 2019] Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment [pdf]

2020

  • [AAAI 2020] When Radiology Report Generation Meets Knowledge Graph [pdf]
  • [EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
  • [ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]

2021

  • [NeurIPS 2021 D&B] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
  • [ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
  • [CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
  • [MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
  • [NAACL-HLT 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
  • [MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
  • [MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
  • [MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
  • [MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
  • [ACL 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]

2022

  • [CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
  • [Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
  • [MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
  • [MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
  • [MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
  • [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
  • [ICML 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
  • [TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
  • [MedIA 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
  • [MedIA 2022] Knowledge matters: Chest radiology report generation with general and specific knowledge [pdf] [code]
  • [MICCAI 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
  • [BMVC 2022] On the Importance of Image Encoding in Automated Chest X-Ray Report Generation [pdf] [code]
  • [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
  • [COLING 2022] DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis [pdf] [code]
  • [ECCV 2022] Cross-modal Prototype Driven Network for Radiology Report Generation [pdf] [code]

2023

  • [ICIP 2023] Self adaptive global-local feature enhancement for radiology report generation [pdf]
  • [TMI 2023] Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation [pdf]
  • [arXiv 2023] Unified Chest X-ray and Radiology Report Generation Model with Multi-view Chest X-rays [pdf] [code]
  • [WWW 2023] Auxiliary signal-guided knowledge encoder-decoder for medical report generation [pdf]
  • [CVPR 2023] Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [pdf] [code]
  • [CVPR 2023] KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation [pdf]
  • [CVPR 2023] Interactive and Explainable Region-guided Radiology Report Generation [pdf] [code]
  • [MIDL 2023] Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation [pdf] [code]
  • [arXiv 2023] Visual-Linguistic Causal Intervention for Radiology Report Generation [pdf] [code]
  • [MIDL 2023] Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [pdf]
  • [arXiv 2023] Cross-Modal Causal Intervention for Medical Report Generation [pdf] [code]
  • [ICASSP 2023] MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation [pdf]
  • [CHIL 2023] Token Imbalance Adaptation for Radiology Report Generation [pdf] [code]
  • [arXiv 2023] Boosting Radiology Report Generation by Infusing Comparison Prior [pdf]
  • [AAAI 2023] "Nothing Abnormal": Disambiguating Medical Reports via Contrastive Knowledge Infusion [pdf] [code]
  • [arXiv 2023] Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives [pdf]
  • [arXiv 2023] S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts [pdf] [code]
  • [arXiv 2023] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [pdf] [code]
  • [ACL W 2023] shs-nlp at RadSum23: Domain-Adaptive Pre-training of Instruction-tuned LLMs for Radiology Report Impression Generation [pdf]
  • [arXiv 2023] Customizing General-Purpose Foundation Models for Medical Report Generation [pdf]
  • [arXiv 2023] Utilizing Longitudinal Chest X-Rays and Reports to Pre-Fill Radiology Reports [pdf]
  • [ACL 2023] Replace and Report: NLP Assisted Radiology Report Generation [pdf]
  • [ICCV 2023] PRIOR: Prototype Representation Joint Learning from Medical Images and Reports [pdf] [code]
  • [ICML W 2023] Rethinking Medical Report Generation: Disease Revealing Enhancement with Knowledge Graph [pdf] [code]
  • [MICCAI 2023] Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting [pdf] [code]
  • [arXiv 2023] IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer [pdf]
  • [arXiv 2023] Can Prompt Learning Benefit Radiology Report Generation? [pdf]
  • [arXiv 2023] Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting [pdf]
  • [arXiv 2023] PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation [pdf]
  • [arXiv 2023] Dynamic Multi-Domain Knowledge Networks for Chest X-ray Report Generation [pdf]
  • [arXiv 2023] ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [pdf]
  • [MedIA 2023] C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [pdf]
  • [EMNLP 2023 Findings] Controllable Chest X-Ray Report Generation from Longitudinal Representations [pdf]
  • [BIBM 2023] Enhanced Knowledge Injection for Radiology Report Generation [pdf]
  • [EMNLP 2023 Findings] Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting [pdf]
  • [arXiv 2023] Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning [pdf]
  • [ACL 2023] ORGAN: Observation-Guided Radiology Report Generation via Tree-Reasoning [pdf] [code]
  • [EMNLP 2023 Findings] RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning [pdf] [code]
  • [arXiv 2023] RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance [pdf] [code]
  • [arXiv 2023] Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation [pdf]
  • [NeurIPS W 2023] Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation [pdf]
  • [arXiv 2023] Radiology-Aware Model-Based Evaluation Metric for Report Generation [pdf]
  • [arXiv 2023] Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [pdf]
  • [arXiv 2023] Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [pdf]
  • [EMNLP 2023] PhenotypeCLIP: Phenotype-based Contrastive Learning for Medical Imaging Report Generation [pdf]
  • [arXiv 2023] Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation [pdf]
  • [arXiv 2023] Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models [pdf]
  • [arXiv 2023] Breast Ultrasound Report Generation using LangChain [pdf]
  • [NLPCC 2023] Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning [pdf]
  • [MICCAI 2023] SGT: Scene Graph-Guided Transformer for Surgical Report Generation [pdf] [code]

2024

  • [WACV 2024] Complex Organ Mask Guided Radiology Report Generation [pdf] [code]
  • [TMM 2024] From Observation to Concept: A Flexible Multi-view Paradigm for Medical Report Generation [pdf] (Early Access)
  • [TMI 2024] SGT++: Improved Scene Graph-guided Transformer for Surgical Report Generation [pdf] (Early Access)

Medical Visual Question Answering

2020

  • [TMI 2020] A Question-Centric Model for Visual Question Answering in Medical Imaging [pdf] [code]
  • [CLEF 2020 Working Notes] HCP-MIC at VQA-Med 2020: Effective visual representation for medical visual question answering [pdf] [code]

2021

  • [CLEF 2021 Working Notes] TeamS at VQA-Med 2021: BBN-Orchestra for long-tailed medical visual question answering [pdf] [code]
  • [arXiv 2021] MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [pdf]
  • [Nature Scientific Reports 2021] MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain [pdf]

2022

  • [MICCAI 2022] Consistency-preserving Visual Question Answering in Medical Imaging [pdf] [code]
  • [MICCAI 2022] Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer [pdf] [code]
  • [ECCV 2022] Distilled Dual-Encoder Model for Vision-Language Understanding [pdf] [code]
  • [arXiv 2022] UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering [pdf]

2023

  • [TMI 2023] A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering [pdf] [code]
  • [ISBI 2023] MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering [pdf]
  • [ISBI 2023] Self-supervised vision-language pretraining for Medical visual question answering [pdf] [code]
  • [arXiv 2023] Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [pdf]
  • [arXiv 2023] Medical visual question answering using joint self-supervised learning [pdf]
  • [ACM MM 2023] RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [pdf] [code]
  • [IPMI 2023] Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder [pdf]
  • [MICCAI 2023] Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models [pdf] [code]
  • [arXiv 2023] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [pdf] [code]
  • [MICCAI 2023] Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering [pdf] [code]
  • [MICCAI 2023] Localized Questions in Medical Visual Question Answering [pdf] [code]
  • [arXiv 2023] Multimodal Prompt Retrieval for Generative Visual Question Answering [pdf] [code]
  • [KDD 2023] Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering [pdf] [code]
  • [MICCAI 2023] Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery [pdf] [code]
  • [MICCAI 2023] CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery [pdf] [code]
  • [CLEF 2023] UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering [pdf]
  • [DICTA 2023] Visual Question Answering in the Medical Domain [pdf]
  • [NeurIPS 2023 D&B] EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images [pdf] [code]
  • [TETCI 2023] Parameter-Efficient Transfer Learning for Medical Visual Question Answering [pdf]
  • [MICCAI 2023] Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting [pdf] [code]
  • [arXiv 2023] BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering [pdf] [demo]

Medical Vision-Language Model

2022

  • [EMNLP 2022] MedCLIP: Contrastive learning from unpaired medical images and text [pdf] [code]
  • [NeurIPS W 2022] Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [pdf]
  • [ACL 2022] ViLMedic: a framework for research at the intersection of vision and language in medical AI [pdf] [code]
  • [MICCAI 2022] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training [pdf] [code]
  • [JBHI 2022] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training [pdf] [code]
  • [AAAI 2022] Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation [pdf]
  • [JBHI 2022] Vision-language transformer for interpretable pathology visual question answering [link]
  • [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
  • [ECCV 2022] Making the most of text semantics to improve biomedical vision–language processing [pdf]
  • [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
  • [NeurIPS 2022] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning [pdf] [code]
  • [MICCAI 2022] BERTHop: An effective vision-and-language model for chest x-ray disease diagnosis [pdf]

2023

  • [TMI 2023] LViT: Language meets Vision Transformer in Medical Image Segmentation [pdf] [code]
  • [ICCV 2023] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [pdf] [code]
  • [ICCV 2023] CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection [pdf] [code]
  • [arXiv 2023] Towards General Purpose Medical AI: Continual Learning Medical Foundation Model [pdf]
  • [arXiv 2023] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing [pdf] [code]
  • [ICLR 2023] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf] [code]
  • [ICLR 2023] Advancing Radiograph Representation Learning with Masked Record Modeling [pdf] [code]
  • [arXiv 2023] ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax [pdf]
  • [MICCAI 2023] PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents [pdf]
  • [arXiv 2023] ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [pdf] [code]
  • [ICCV 2023] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training [pdf] [project]
  • [CVPR 2023] Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [pdf]
  • [CVPR W 2023] One-shot and Partially-Supervised Cell Image Segmentation Using Small Visual Prompt [pdf]
  • [arXiv 2023] CLIP-Lung: Textual Knowledge-Guided Lung Nodule Malignancy Prediction [pdf]
  • [MICCAI 2023] UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner [pdf] [code]
  • [ICCV 2023] UniverSeg: Universal Medical Image Segmentation [pdf] [project website]
  • [arXiv 2023] Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation [pdf]
  • [arXiv 2023] Prompt-based Tuning of Transformer Models for Multi-Center Medical Image Segmentation [pdf]
  • [arXiv 2023] FoPro-KD: Fourier Prompted Effective Knowledge Distillation for Long-Tailed Medical Image Recognition [pdf]
  • [arXiv 2023] ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs [pdf] [code]
  • [arXiv 2023] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [pdf] [code]
  • [arXiv 2023] BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks [pdf] [code]
  • [CHIL 2023] Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark [pdf] [code]
  • [arXiv 2023] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias [pdf]
  • [arXiv 2023] OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [pdf] [code]
  • [ICML W 2023] A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis [pdf]
  • [MICCAI 2023] M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization [pdf] [code]
  • [arXiv 2023] Towards Generalist Biomedical AI [pdf] [Med-PaLM]
  • [MICCAI 2023] Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training [pdf] [code]
  • [MICCAI 2023] Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt [pdf]
  • [arXiv 2023] Few-shot medical image classification with simple shape and texture text descriptors using vision-language models [pdf] [code]
  • [ICML W 2023] Med-Flamingo: a Multimodal Medical Few-shot Learner [pdf] [code]
  • [MICCAI 2023] Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images [pdf] [code]
  • [arXiv 2023] A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision [pdf] [code]
  • [arXiv 2023] Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [pdf]
  • [ICCV 2023] ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data [pdf] [code]
  • [arXiv 2023] IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training [pdf]
  • [arXiv 2023] Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images [pdf]
  • [MICCAI 2023] CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training [pdf] [code]
  • [arXiv 2023] BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys [pdf] [project]
  • [arXiv 2023] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare [pdf] [code]
  • [NeurIPS 2023 D&B] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [pdf] [code]
  • [arXiv 2023] Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [pdf] [code]
  • [arXiv 2023] Unified Medical Image Pre-training in Language-Guided Common Semantic Space [pdf]
  • [arXiv 2023] RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization [pdf]
  • [arXiv 2023] MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model [pdf]
  • [arXiv 2023] G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training [pdf]
  • [npj Digital Medicine 2023] A medical multimodal large language model for future pandemics [pdf]
  • [arXiv 2023] MEDAGENTS: Large Language Models as Collaborators for Zero-shot Medical Reasoning [pdf] [code]
  • [arXiv 2023] A Foundational Multimodal Vision Language AI Assistant for Human Pathology [pdf]
  • [arXiv 2023] UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts [pdf]
  • [arXiv 2023] ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training [pdf] [code]

2024

  • [ICASSP 2024] Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training [pdf]
