Chuanyu Sun¹*, Jiqing Zhang²*, Yang Wang¹, Huilin Ge³, Qianchen Xia⁴, Baocai Yin⁵, Xin Yang¹
¹Key Laboratory of Social Computing and Cognitive Intelligence, Dalian University of Technology
²Dalian Maritime University
³Jiangsu University of Science and Technology
⁴The Future Laboratory, Tsinghua University
⁵Beijing University of Technology
*Equal Contribution
Project Page
⭐ Our paper has been accepted to CVPR 2025!
Combining the advantages of conventional and event cameras for robust visual tracking has drawn extensive interest. However, existing tracking approaches rely heavily on complex cross-modal fusion modules, which increases computational complexity and makes training harder. These methods also generally ignore the effective integration of historical information, which is crucial for capturing changes in the target's appearance and its motion trends. Given recent advances in Mamba's long-range modeling and its linear complexity, we explore its potential for addressing these issues in RGBE tracking. Specifically, we first propose an efficient Mamba-based fusion module that uses a simple gate-based interaction scheme to achieve effective modality-selective fusion. This module can be seamlessly integrated into the encoding layers of prevalent Transformer-based backbones. We further present a novel historical decoder that leverages Mamba's long-sequence modeling to capture changes in the target's appearance with autoregressive queries. Extensive experiments show that our approach achieves state-of-the-art performance on multiple challenging short-term and long-term RGBE benchmarks. A thorough ablation study confirms the effectiveness of each key Mamba-based component.
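To make the idea of gate-based, modality-selective fusion more concrete, here is a minimal illustrative sketch. It is not the module from the paper: it omits the Mamba state-space component entirely and only shows a generic per-channel sigmoid gate that blends an RGB feature vector with an event feature vector. The names `gated_fusion`, `w_rgb`, `w_event`, and `bias` are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(rgb_feat, event_feat, w_rgb, w_event, bias):
    """Blend two equal-length feature vectors with a per-channel gate.

    Hypothetical illustration, not the paper's implementation:
        g_i     = sigmoid(w_rgb[i] * rgb[i] + w_event[i] * event[i] + bias[i])
        fused_i = g_i * rgb[i] + (1 - g_i) * event[i]
    A gate near 1 favors the RGB channel; a gate near 0 favors the event channel.
    """
    lengths = {len(rgb_feat), len(event_feat), len(w_rgb), len(w_event), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for r, e, wr, we, b in zip(rgb_feat, event_feat, w_rgb, w_event, bias):
        g = sigmoid(wr * r + we * e + b)
        fused.append(g * r + (1.0 - g) * e)
    return fused
```

With all weights and biases set to zero, every gate equals 0.5 and the output is the plain average of the two modalities; a large positive bias pushes the gate toward the RGB features. In a learned model these parameters would be trained so the gate selects the more reliable modality per channel.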
If you find this project useful, please consider citing:
@inproceedings{mamtrack,
title={Exploring Historical Information for RGBE Visual Tracking with Mamba},
author={Sun, Chuanyu and Zhang, Jiqing and Wang, Yang and Ge, Huilin and Xia, Qianchen and Yin, Baocai and Yang, Xin},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
year={2025}
}