This repository includes resources on several applications of multi-modal learning in medical imaging.
Please feel free to send pull requests or email me (richard.peng.xia@gmail.com) to add links or to discuss this area. Markdown format:
- [**Name of Conference or Journal + Year**] Paper Name. [[pdf]](link) [[code]](link)
- [arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
- [arXiv 2022] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf]
- [arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]
- [NeurIPS 2021 Datasets and Benchmarks Track (Round 2)] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
- [CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
- [EMNLP 2018] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
- [ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
- [ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
- [NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]
- [CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
- [MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
- [NAACL-HLT 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
- [MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
- [TMI 2023] Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation [pdf]
- [EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
- [ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]
- [MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
- [MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
- [MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
- [Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
- [MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
- [ACL-IJCNLP 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]
- [MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
- [MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
- [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
- [PMLR 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
- [TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
- [MedIA 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
- [MICCAI 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
- [arXiv 2022] Self adaptive global-local feature enhancement for radiology report generation [pdf]
- [BMVC 2022] On the Importance of Image Encoding in Automated Chest X-Ray Report Generation [pdf] [code]
- [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
- [arXiv 2022] DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis [pdf] [code]
- [arXiv 2023] Unified Chest X-ray and Radiology Report Generation Model with Multi-view Chest X-rays [pdf] [code]
- [ECCV 2022] Cross-modal Prototype Driven Network for Radiology Report Generation [pdf] [code]
- [WWW 2023] Auxiliary signal-guided knowledge encoder-decoder for medical report generation [pdf]
- [CVPR 2023] Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [pdf] [code]
- [MIDL 2023] Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation [pdf] [code]
- [arXiv 2023] Visual-Linguistic Causal Intervention for Radiology Report Generation [pdf] [code]
- [MIDL 2023] Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [pdf]
- [ICASSP 2023] MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation [pdf]
- [CHIL 2023] Token Imbalance Adaptation for Radiology Report Generation [pdf] [code]
- [arXiv 2023] Boosting Radiology Report Generation by Infusing Comparison Prior [pdf]
- [arXiv 2021] MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [pdf]
- [MICCAI 2022] Consistency-preserving Visual Question Answering in Medical Imaging [pdf] [code]
- [TMI 2020] A Question-Centric Model for Visual Question Answering in Medical Imaging [pdf] [code]
- [arXiv 2021] Medical Visual Question Answering: A Survey [pdf]
- [MICCAI 2022] Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer [pdf] [code]
- [CLEF 2020 Working Notes] HCP-MIC at VQA-Med 2020: Effective visual representation for medical visual question answering [pdf] [code]
- [CLEF 2021 Working Notes] TeamS at VQA-Med 2021: BBN-Orchestra for long-tailed medical visual question answering [pdf] [code]
- [Nature Scientific Reports 2021] MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain [pdf]
- [ECCV 2022] Distilled Dual-Encoder Model for Vision-Language Understanding [pdf] [code]
- [arXiv 2022] A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering [pdf] [code]
- [arXiv 2022] MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering [pdf]
- [arXiv 2022] Self-supervised vision-language pretraining for Medical visual question answering [pdf] [code]
- [arXiv 2022] UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering [pdf]
- [arXiv 2023] Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [pdf]
- [arXiv 2023] Medical visual question answering using joint self-supervised learning [pdf]
- [arXiv 2023] RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [pdf] [code]
- [arXiv 2023] Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder [pdf]
- [arXiv 2023] Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models [pdf]
- [ICLR 2023] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf] [code]
- [EMNLP 2022] MedCLIP: Contrastive Learning from Unpaired Medical Images and Text [pdf] [code]
- [arXiv 2023] CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection [pdf] [code]
- [arXiv 2023] Towards General Purpose Medical AI: Continual Learning Medical Foundation Model [pdf]
- [NeurIPS W 2022] Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [pdf]
- [ACL 2022] ViLMedic: a framework for research at the intersection of vision and language in medical AI [pdf] [code]
- [MICCAI 2022] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training [pdf] [code]
- [JBHI 2022] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training [pdf] [code]
- [AAAI 2022] Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation [pdf]
- [arXiv 2022] LViT: Language meets Vision Transformer in Medical Image Segmentation [pdf] [code]
- [arXiv 2023] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [pdf] [code]
- [JBHI 2022] Vision-language transformer for interpretable pathology visual question answering [link]
- [ECCV 2022] Making the most of text semantics to improve biomedical vision–language processing [pdf]
- [arXiv 2023] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing [pdf] [code]
- [MICCAI 2022] BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis [pdf]
- [ICLR 2023] Advancing Radiograph Representation Learning with Masked Record Modeling [pdf] [code]
- [arXiv 2023] ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax [pdf]
- [arXiv 2023] PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents [pdf]
- [arXiv 2023] ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [pdf]
- [arXiv 2023] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training [pdf] [project]
- [CVPR 2023] Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [pdf]
- [NeurIPS 2022] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning [pdf] [code]
- [CVPR W 2023] One-shot and Partially-Supervised Cell Image Segmentation Using Small Visual Prompt [pdf]
- [arXiv 2023] CLIP-Lung: Textual Knowledge-Guided Lung Nodule Malignancy Prediction [pdf]
- [arXiv 2023] UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner [pdf] [code]
- [arXiv 2023] UniverSeg: Universal Medical Image Segmentation [pdf] [project website]