Gated Recurrent Fusion (GRF): a compact, scalable multimodal fusion model that performs competitively with MulT using about 1.8× fewer parameters (4.5M vs. ~8M).

Vixel2006/GRF

🚀 Gated Recurrent Fusion (GRF): Efficient Multimodal Learning with Fewer Parameters

📄 Preprint: arXiv:2507.02985
🧪 Submitted to: ACMMM 2025 Workshop UAVM

🔥 TL;DR: We propose GRF, a lightweight gated recurrent fusion module that competes with MulT in the unaligned CMU-MOSI setting using about 1.8× fewer parameters (4.5M vs. ~8M).


🧠 Overview

Multimodal models like MulT are powerful but suffer from computational overhead and sensitivity to alignment. We propose Gated Recurrent Fusion (GRF) — a sparse, parameter-efficient fusion module that captures cross-modal dynamics without relying on input alignment.
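
To make the idea of gated, alignment-free fusion concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper or the code in `src/`: it assumes each modality has already been projected to a common feature size, mean-pools each sequence so that differing lengths do not matter, and applies a generic GRU-style gate with randomly initialised, shared weights. All function names (`mean_pool`, `gated_fuse`, `fuse_modalities`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(seq):
    """Average a variable-length sequence of feature vectors into one vector."""
    dim = len(seq[0])
    return [sum(step[i] for step in seq) / len(seq) for i in range(dim)]


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def gated_fuse(state, feat, w_gate, w_cand):
    """One GRU-style update (an assumption, not the paper's exact block)."""
    joint = state + feat  # concatenate running state and new modality features
    gate = [sigmoid(z) for z in matvec(w_gate, joint)]
    cand = [math.tanh(z) for z in matvec(w_cand, joint)]
    # The gate decides, per dimension, how much of the candidate replaces the state.
    return [g * c + (1.0 - g) * s for g, c, s in zip(gate, cand, state)]


def _rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fuse_modalities(sequences, dim, seed=0):
    """Fold any number of modality sequences into one fixed-size state."""
    rng = random.Random(seed)  # stand-in for learned weights
    w_gate = _rand_matrix(rng, dim, 2 * dim)
    w_cand = _rand_matrix(rng, dim, 2 * dim)
    state = [0.0] * dim
    for seq in sequences:
        state = gated_fuse(state, mean_pool(seq), w_gate, w_cand)
    return state


if __name__ == "__main__":
    # Three toy modalities with different sequence lengths (unaligned).
    text = [[0.1 * i] * 4 for i in range(3)]
    audio = [[0.2] * 4 for _ in range(5)]
    video = [[-0.3] * 4 for _ in range(2)]
    print(fuse_modalities([text, audio, video], dim=4))
```

Because the state is updated one modality at a time, the same loop runs for any number or order of modalities, and the number of fusion weights does not grow with the number of modality pairs. In this sketch the result generally depends on the order in which the modalities are fed.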


📦 Features

  • ✅ Gated Recurrent Fusion (GRF) for unaligned multimodal fusion
  • ✅ Modular design compatible with any modality order
  • ✅ Includes ablation studies and visualization tools
  • ✅ Reproducible experiments using config files and MLflow

📊 Results

| Model | F1 Score | Parameters | Dataset | Alignment |
|---|---|---|---|---|
| MulT | 81 | ~8M | CMU-MOSI | Unaligned |
| GRF (Ours) | 79 | 4.5M | CMU-MOSI | Unaligned |
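
The trade-off in the table can be checked with a few lines of arithmetic. The snippet below only restates the table's numbers; MulT's parameter count is given there as approximate (~8M), so the derived ratio is approximate too, and the variable names are illustrative.

```python
# Values copied from the results table; MulT's count is approximate (~8M).
mult_params = 8.0e6
grf_params = 4.5e6
mult_f1 = 81
grf_f1 = 79

param_ratio = mult_params / grf_params         # how many times larger MulT is
param_saving = 1.0 - grf_params / mult_params  # fraction of parameters removed
f1_gap = mult_f1 - grf_f1                      # F1 points given up by GRF

print(f"MulT / GRF parameter ratio: {param_ratio:.2f}x")
print(f"Parameters saved by GRF:    {param_saving:.0%}")
print(f"F1 gap (MulT - GRF):        {f1_gap} points")
```

With these inputs the ratio comes out at about 1.78×, for a 2-point difference in F1.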

For full details, see the arXiv paper.


📁 Project Structure

GRF
├── src/ # GRF model definitions, data handlers, utils, and engine
├── data/ # Dataloaders for CMU-MOSI
├── train.py # Training script
├── configs/ # YAML config files for training setups
├── requirements.txt # Python dependencies
└── README.md # You're here

🔧 Installation

Clone the repo and install dependencies:

git clone https://github.com/Vixel2006/GRF.git
cd GRF
pip install -r requirements.txt

🏃‍♂️ Quick Start

Train the GRF Model

chmod +x run_experiments.sh

./run_experiments.sh

📚 Citation

@misc{shihata2025gatedrecursivefusionstateful,
      title={Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers}, 
      author={Yusuf Shihata},
      year={2025},
      eprint={2507.02985},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.02985}, 
}
