E-PANNs [1, 2] (~92 MB, 24M parameters) is obtained by pruning filters from the original PANNs (CNN14) model (~312 MB, 81M parameters) [3].
A sample demo video can be viewed at https://youtu.be/HiZw0pGXGQk.
[1] Arshdeep Singh, Haohe Liu, and Mark D. Plumbley, "E-PANNs: Sound Recognition Using Efficient Pre-Trained Audio Neural Networks," accepted at INTER-NOISE 2023.
[2] Arshdeep Singh and Mark D. Plumbley, "Efficient CNNs via Passive Filter Pruning," arXiv preprint arXiv:2304.02319, 2023.
[3] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley, "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition," arXiv preprint arXiv:1912.10211, 2019.
The demonstration shown in the figure above is a modification built on top of the AI4S project's (https://ai4s.surrey.ac.uk/) previous demo, https://github.com/yinkalario/General-Purpose-Sound-Recognition-Demo, by Yin Cao and Andres Fernandez.
The prediction/confidence values in the figure above are obtained by applying the audio tagging system (E-PANNs) to consecutive short audio segments.
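For illustration, below is a minimal sketch of such a segment-wise tagging loop. The function `tag_segments`, the 1-second segment length, and the random stand-in tagger are assumptions for this example, not the actual demo_tag.py implementation:

```python
import numpy as np

def tag_segments(audio, tag_fn, sr=32000, seg_sec=1.0, top_k=5):
    """Apply an audio tagger to consecutive short segments.

    audio:  1-D mono float array (PANNs-family models expect 32 kHz audio).
    tag_fn: callable mapping a (1, n_samples) array to (1, n_classes)
            per-class confidences, e.g. a wrapped E-PANNs model.
    Returns, per segment, the top_k (class_index, confidence) pairs.
    """
    seg_len = int(sr * seg_sec)
    results = []
    for start in range(0, len(audio) - seg_len + 1, seg_len):
        conf = tag_fn(audio[None, start:start + seg_len])[0]
        top = np.argsort(conf)[::-1][:top_k]
        results.append([(int(i), float(conf[i])) for i in top])
    return results

# Toy usage with a random stand-in tagger (527 classes, as in AudioSet);
# in the real demo this would be the E-PANNs model.
dummy_tagger = lambda x: np.random.rand(1, 527)
print(tag_segments(np.random.randn(3 * 32000), dummy_tagger)[0])
```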
We recommend using Anaconda to install the dependencies as follows:
```bash
conda create -n panns python=3.7
conda activate panns
pip install -r requirements.txt
conda install -c anaconda pyaudio
```
A comprehensive list of working dependencies can be found in the full_dependencies.txt file.
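As a quick sanity check of the environment (a minimal sketch; it assumes PyTorch is installed via requirements.txt):

```python
# Quick environment check: both imports should succeed inside the
# activated "panns" environment.
import torch
import pyaudio

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("portaudio:", pyaudio.get_portaudio_version_text())
```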
Download the model from https://doi.org/10.5281/zenodo.7939403 into your preferred <model_location>. In the following we assume <model_location> == <repo_root>/models.
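To verify the download, the checkpoint can be opened with torch.load; the filename below is a placeholder for the actual file on the Zenodo record:

```python
# Sanity-check the downloaded checkpoint (<checkpoint_file> is a placeholder).
import torch

ckpt = torch.load("models/<checkpoint_file>.pth", map_location="cpu")
# PANNs-style checkpoints are typically dicts wrapping the state_dict;
# printing the top-level keys confirms the file loaded intact.
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```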
Assuming the command line is at <repo_root>, the panns environment is active, and the model has been downloaded into <repo_root>/models, the following command should run the GUI with default parameters (tested on Ubuntu 20.04):
```bash
python demo_tag.py
```
Note that the terminal will print a model summary upon start. Default parameters can be overridden with the same syntax as MODEL_PATH; e.g., to change the number of classes displayed to 10, add TOP_K=10.
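For example, combining both overrides on one command line (<model_file> is a placeholder for the checkpoint downloaded above):

```bash
python demo_tag.py MODEL_PATH=models/<model_file> TOP_K=10
```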
Convergence plots of E-PANNs when 25%, 50%, and 75% of the filters are removed from the C7 to C13 layers of PANNs (CNN14).
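For intuition, the sketch below shows structured pruning of a single convolutional layer, using a simple l1-norm importance score as a stand-in; the passive filter pruning method of [2] ranks filters with a different importance measure computed from the filter weights, but the removal step is analogous:

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv, keep_ratio):
    """Return a new Conv2d keeping the top-`keep_ratio` filters of `conv`,
    ranked by a simple l1-norm importance score (a stand-in for the
    passive importance measure of [2])."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # One importance score per output filter: sum of absolute weights.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.argsort(scores, descending=True)[:n_keep]
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

# Example: remove 50% of the filters from a CNN14-sized conv layer.
layer = nn.Conv2d(1024, 1024, kernel_size=3, padding=1)
print(prune_conv_filters(layer, keep_ratio=0.5))  # Conv2d(1024, 512, ...)
```

In the full pipeline, the input channels of the following layer must be reduced to match, and the pruned network is then fine-tuned; the convergence plots above track this fine-tuning for the three pruning ratios.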
- https://research.google.com/audioset/dataset
- https://github.com/qiuqiangkong/audioset_tagging_cnn
- https://github.com/qiuqiangkong/panns_inference
- https://github.com/yinkalario/Sound-Event-Detection-AudioSet
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant EP/T019751/1 “AI for Sound (AI4S)”. Project link: https://ai4s.surrey.ac.uk/