The official implementation of "Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection" (accepted at ICPR 2024).
Authors: Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang
Code is available now!
Full-frequency dynamic convolution (FFDConv) is proposed as the first fully dynamic method in sound event detection (SED). It generates a frequency kernel for every frequency band, so frequency-dependent modeling is built directly into the structure of the convolution. In this way, FFDConv physically furnishes 2D convolution with the capability of frequency-dependent modeling.
Most SED models are trained with frame-based supervision, which tends to make the features and outputs discrete over time. FFDConv can alleviate this through frequency-dependent modeling. In addition, the convolution kernel of FFDConv for a given frequency band is shared across all frames, which can produce temporally coherent representations. This is consistent with both the continuity of the sound waveform and the vocal continuity of sound events.
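The idea of one kernel per frequency band, shared across all frames, can be illustrated with a minimal pure-Python sketch. This is not the code in this repository: it assumes a single-channel spectrogram stored as a list of rows (one row per frequency band), a fixed 3x3 kernel size, and a hypothetical kernel generator that maps the time-averaged value of each band to kernel weights. The names `band_descriptors`, `generate_kernels`, `ffd_conv`, `weight`, and `bias` are illustrative only.

```python
def band_descriptors(x):
    """Summarize each frequency band by its average over time (assumed descriptor)."""
    return [sum(row) / len(row) for row in x]


def generate_kernels(x, weight, bias):
    """Hypothetical generator: one 3x3 kernel per frequency band.

    kernel[f][i][j] = bias[i][j] + weight[i][j] * descriptor[f]
    """
    return [
        [[bias[i][j] + weight[i][j] * d for j in range(3)] for i in range(3)]
        for d in band_descriptors(x)
    ]


def ffd_conv(x, kernels):
    """Apply a different 3x3 kernel to each frequency band, with zero padding.

    The kernel of band f is reused for every frame t, so the filtering is
    frequency-dependent but identical along the time axis.
    """
    n_freq, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        k = kernels[f]  # same kernel for all frames of this band
        for t in range(n_time):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    ff, tt = f + i - 1, t + j - 1
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += k[i][j] * x[ff][tt]
            out[f][t] = acc
    return out


# Toy spectrogram: 3 frequency bands x 4 frames.
spec = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
center = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]

kernels = generate_kernels(spec, weight=center, bias=zeros)
result = ffd_conv(spec, kernels)
```

In this toy setup the kernels differ from band to band because each one depends on its own band descriptor, while within a band every frame is filtered by the same weights. The actual model operates on multi-channel feature maps with learned kernel-generation branches.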
FFDConv is evaluated on the DESED dataset:
Model | Params | PSDS1 | PSDS2 | EB-F1 | IB-F1 |
---|---|---|---|---|---|
CRNN | 4M | 0.370 | 0.579 | 0.469 | 0.714 |
DDFConv | 7M | 0.387 | 0.624 | 0.467 | 0.720 |
FTDConv | 7M | 0.395 | 0.651 | 0.495 | 0.740 |
SKConv | - | 0.400 | - | 0.520 | - |
FDConv | 11M | 0.431 | 0.663 | 0.521 | 0.738 |
MFDConv | 33M | 0.461 | 0.680 | 0.542 | - |
FFDConv | 11M | 0.436 | 0.685 | 0.526 | 0.751 |
Our code builds on FDY-SED and ddfnet.
Specifically, the experimental environment is based on FDY-SED, and the model structure is based on ddfnet.
Thanks for their great work!
If this repository helps your work, please cite the paper below! 😘
@article{yue2024fullfrequency,
title={Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection},
author={Haobo Yue and Zhicheng Zhang and Da Mu and Yonghao Dang and Jianqin Yin and Jin Tang},
journal={arXiv preprint arXiv:2401.04976},
year={2024},
}