Yuhao Wang · Yang Liu · Aihua Zheng · Pingping Zhang*
DeMo is an advanced multi-modal object Re-Identification (ReID) framework designed to tackle dynamic imaging quality variations across modalities. By employing decoupled features and a novel Attention-Triggered Mixture of Experts (ATMoE), DeMo dynamically balances modality-specific and modality-shared information, enabling robust performance even under missing modality conditions. The framework sets new benchmarks for multi-modal and missing-modality object ReID.
- We released the DeMo codebase and paper! 🚀
- Great news! Our paper has been accepted to AAAI 2025! 🎉
Multi-modal object ReID combines the strengths of different modalities (e.g., RGB, NIR, TIR) to achieve robust identification across challenging scenarios. DeMo introduces a decoupled approach using Mixture of Experts (MoE) to preserve modality uniqueness and enhance diversity. This is achieved through:
- Patch-Integrated Feature Extractor (PIFE): Captures multi-granular representations.
- Hierarchical Decoupling Module (HDM): Separates modality-specific and shared features.
- Attention-Triggered Mixture of Experts (ATMoE): Dynamically adjusts feature importance with adaptive attention-guided weights.
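The decouple-then-reweight idea behind ATMoE can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's actual module: the function name `atmoe_gate`, the feature shapes, and the single scaled dot-product query are all assumptions made for the example; the real ATMoE derives its attention-guided weights from the decoupled features produced by HDM.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def atmoe_gate(expert_feats, query):
    """Attention-guided gating (sketch): score each expert's decoupled
    feature against a query vector, softmax the scores into mixing
    weights, and return the weighted mixture."""
    # expert_feats: (num_experts, dim); query: (dim,)
    dim = expert_feats.shape[1]
    scores = expert_feats @ query / np.sqrt(dim)  # scaled dot-product scores
    gates = softmax(scores)                       # context-aware weights, sum to 1
    fused = gates @ expert_feats                  # mixture of expert features
    return fused, gates

# Toy example: 4 experts (e.g. RGB/NIR/TIR-specific + shared), 8-dim features
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
query = rng.normal(size=8)  # e.g. a pooled multi-modal descriptor as the query
fused, gates = atmoe_gate(feats, query)
```

Because the weights come from a softmax over attention scores, a degraded modality (low similarity to the query) is automatically down-weighted, which is what makes this style of gating robust to dynamic quality changes and missing modalities.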
- Introduced a decoupled feature-based MoE framework, DeMo, addressing dynamic quality changes in multi-modal imaging.
- Developed the Hierarchical Decoupling Module (HDM) for enhanced feature diversity and Attention-Triggered Mixture of Experts (ATMoE) for context-aware weighting.
- Achieved state-of-the-art performance on RGBNT201, RGBNT100, and MSVR310 benchmarks under both full and missing-modality settings.
- RGBNT201: Google Drive
- RGBNT100: Baidu Pan (Code: rjin)
- MSVR310: Google Drive
- RGBNT201: `configs/RGBNT201/DeMo.yml`
- RGBNT100: `configs/RGBNT100/DeMo.yml`
- MSVR310: `configs/MSVR310/DeMo.yml`
```bash
conda create -n DeMo python=3.8.12 -y
conda activate DeMo
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
cd (your_path)
pip install -r requirements.txt
```
```bash
python train_net.py --config_file configs/RGBNT201/DeMo.yml
```
- This repository is based on MambaPro. Prompt and adapter tuning on the CLIP backbone are reserved (the corresponding hyperparameters are set to `False`), allowing users to explore them independently.
- This code provides multi-modal Grad-CAM visualization, multi-modal ranking-list generation, and t-SNE visualization tools to facilitate further research.
- The hyperparameter configuration is designed to ensure compatibility with GPUs that have less than 24GB of memory.
- Thank you for your attention and interest!
If you find DeMo helpful in your research, please consider citing:
```bibtex
@inproceedings{wang2025DeMo,
  title={DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification},
  author={Wang, Yuhao and Liu, Yang and Zheng, Aihua and Zhang, Pingping},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}
```








