The official implementation of MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection. (Submitted to ICASSP 2025)
Authors: Zehao Wang, Haobo Yue, Zhicheng Zhang, Da Mu, Jin Tang, Jianqin Yin
Code will be released soon!
Comparing the feature maps of the long-term and short-term audio adapters with the spectrograms of the input data shows that most of the time-frequency patterns modeled by the short-term audio adapter are temporally isolated and disjoint. In contrast, the patterns of the long-term audio adapter connect with their neighbors into a coherent whole, forming a distinctive time-frequency representation. We also compare the feature map aggregated by the DBMF module with the feature map of the baseline: the baseline feature map is blurred, as shown in the second row of the figure. After the local and global features are aggregated, the information within the DBMF feature map is more prominent, as shown in the third row of the figure.

MTDA-HSED is evaluated on DESED and MAESTRO.
Model | PSDS1 | PSDS1(sed score) | mpAUC |
---|---|---|---|
Baseline | 0.494 | 0.499 | 0.709 |
ATST-SED | 0.297 | 0.301 | 0.554 |
MONA | 0.497 | 0.507 | 0.709 |
ADAPTER | 0.494 | 0.503 | 0.704 |
ACT-NET | 0.308 | 0.316 | 0.696 |
M3A(ours) | 0.503 | 0.511 | 0.753 |
DBMF(ours) | 0.494 | 0.501 | 0.748 |
MTDA-HSED(ours) | 0.503 | 0.514 | 0.757 |
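To make the table easier to read, the following minimal sketch computes the absolute gain of one model over another and lists the top-scoring model(s) per metric. The scores are copied from the table above. The helper names `gains_over` and `best_per_metric` are illustrative only and are not part of this repository.

```python
# Scores copied from the results table: (PSDS1, PSDS1(sed score), mpAUC).
# The helper functions below are hypothetical and not part of the released code.
RESULTS = {
    "Baseline": (0.494, 0.499, 0.709),
    "ATST-SED": (0.297, 0.301, 0.554),
    "MONA": (0.497, 0.507, 0.709),
    "ADAPTER": (0.494, 0.503, 0.704),
    "ACT-NET": (0.308, 0.316, 0.696),
    "M3A": (0.503, 0.511, 0.753),
    "DBMF": (0.494, 0.501, 0.748),
    "MTDA-HSED": (0.503, 0.514, 0.757),
}
METRICS = ("PSDS1", "PSDS1(sed score)", "mpAUC")


def gains_over(reference, model, results=RESULTS):
    """Absolute per-metric gain of `model` over `reference`, rounded to 3 decimals."""
    ref, cur = results[reference], results[model]
    return {m: round(c - r, 3) for m, r, c in zip(METRICS, ref, cur)}


def best_per_metric(results=RESULTS):
    """Map each metric to the list of models sharing the top score (ties kept)."""
    best = {}
    for i, metric in enumerate(METRICS):
        top = max(scores[i] for scores in results.values())
        best[metric] = [name for name, scores in results.items() if scores[i] == top]
    return best


if __name__ == "__main__":
    print(gains_over("Baseline", "MTDA-HSED"))
    print(best_per_metric())
```

On these numbers, MTDA-HSED gains 0.009 PSDS1, 0.015 PSDS1(sed score), and 0.048 mpAUC over the baseline, and it ties with M3A for the best PSDS1.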
If this repository helped your work, please cite the paper below! 😘
@article{wang2024mtda,
title={MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection},
author={Wang, Zehao and Yue, Haobo and Zhang, Zhicheng and Mu, Da and Tang, Jin and Yin, Jianqin},
journal={arXiv preprint arXiv:2409.06196},
year={2024}
}