This repository contains the official implementation of our paper, MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition, which addresses multimodal learning with missing modalities. The method introduces a low-rank adaptation approach designed for robust visual recognition when one or more modalities are absent during training or inference.
- ✨ Robust to Missing Modalities: Handles missing text, image, or both modalities
- 🚀 Efficient Fine-tuning: Low-rank adaptation for parameter-efficient training
- 🎯 State-of-the-art Performance: Evaluated on MM-IMDb, Hateful Memes, and Food-101
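To make the parameter-efficiency point concrete, here is a minimal sketch of the arithmetic behind generic LoRA-style low-rank adaptation. It is not taken from the MoRA code and may differ from the paper's exact formulation. It assumes a frozen weight matrix of shape `d_out x d_in` plus a trainable update `B @ A`, with `B` of shape `d_out x r` and `A` of shape `r x d_in`. The layer size (768) and rank (4) are illustrative assumptions, and the function names are hypothetical.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` update B @ A.

    B has d_out * rank entries and A has rank * d_in entries.
    """
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the full matrix size that the low-rank update trains."""
    return low_rank_params(d_in, d_out, rank) / full_finetune_params(d_in, d_out)


if __name__ == "__main__":
    # Illustrative sizes only; not the settings used in the paper.
    d_in = d_out = 768
    rank = 4
    print(full_finetune_params(d_in, d_out))          # 589824
    print(low_rank_params(d_in, d_out, rank))         # 6144
    print(f"{trainable_fraction(d_in, d_out, rank):.2%}")  # 1.04%
```

Under these assumed sizes, the low-rank update trains roughly 1% of the parameters of the layer it adapts, which is where the efficiency of this family of methods comes from.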
- Python >= 3.12
- PyTorch >= 2.6.0
- CUDA compatible GPU (recommended)
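The requirements above can be checked with a short script before installing. This is a sketch and not part of the repository; the helper names `parse_version` and `check_environment` are hypothetical. It only reports problems and treats a missing GPU as a note, since CUDA is recommended rather than required.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 12)
MIN_TORCH = (2, 6, 0)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '2.6.0+cu124'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def check_environment() -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found, "
            f"need >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch

        if parse_version(torch.__version__) < MIN_TORCH:
            problems.append(f"PyTorch {torch.__version__} found, need >= 2.6.0")
        if not torch.cuda.is_available():
            # A GPU is recommended, not required, so this is only a note.
            print("Note: no CUDA device detected; training will run on CPU.")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks compatible.")
```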
We recommend using uv for fast dependency management:
# Clone the repository
git clone https://github.com/Tree-Shu-Zhao/MoRA.git
cd MoRA
# Install dependencies using uv
uv sync
# Activate the virtual environment
source .venv/bin/activate

Please refer to DATA.md for detailed instructions on downloading and organizing the datasets.
After organizing the dataset directories, run the preprocessing script:
bash scripts/preprocess.sh

This will generate the required preprocessed files for training.
Train on different datasets with various missing modality configurations:
# Train on Hateful Memes
python src/main.py experiment=mora_hatememes
# Train on MM-IMDb
python src/main.py experiment=mora_mmimdb
# Train on Food-101
python src/main.py experiment=mora_food101

Evaluate a trained model checkpoint:
# Test on Hateful Memes
python src/main.py \
    experiment=mora_hatememes \
    test.TEST_ONLY=True \
    test.CHECKPOINT_PATH=/path/to/checkpoint.pth

If you found our paper useful, please cite it:
@article{zhao2025mora,
title={MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition},
author={Zhao, Shu and Ahuja, Nilesh and Yu, Tan and Shen, Tianyi and Narayanan, Vijaykrishnan},
journal={arXiv preprint arXiv:2511.06225},
year={2025}
}