This project provides components for predicting audio input and showing its corresponding spectrogram representation. The repository contains a backend and a sample frontend Kivy app which periodically calls the backend REST endpoints for prediction and spectrogram updates.
- /server: directory containing all backend components
- /viewer: directory with a sample Python GUI
To get the code running reliably, use Python 3 and Anaconda.
Version used during development: 3.6.7
- Install the project in development mode with

  python setup.py develop

- Create the conda environment from environment.yml with

  conda env create -f environment.yml
The figure below should give a basic understanding of the information flow and the components involved in the running backend system:
The file config.py includes some options which can be set before backend startup. Before the backend can be started, please set the variable PROJECT_ROOT to the absolute path of the project's root directory.
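For example, the relevant line in config.py could look like this (the path shown is only a placeholder for your local checkout):

```python
# config.py (excerpt): PROJECT_ROOT must be the absolute path to the
# project's root directory; the value below is a placeholder.
PROJECT_ROOT = "/home/user/audio_tagger"
```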
The current predictions and spectrograms can be accessed via a REST interface. The following command starts the backend:
python server/webserver.py
To keep the interface independent of any particular programming language, all output can be accessed by calling URL endpoints.
| | |
| --- | --- |
| Http-Method | GET |
| Response Content-Type | JPEG |
| URL | http://127.0.0.1:5000/live_visual |
| Return | a JPEG of the current visualization (e.g. spectrogram) |
There is an additional endpoint to display the same content in the browser:
http://127.0.0.1:5000/live_visual_browser
Example output: a spectrogram JPEG rendered from the current audio input.
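For programmatic access, a minimal sketch of polling this endpoint with the third-party requests library (host and port taken from the table above; the output filename is arbitrary):

```python
import requests

# Fetch the current visualization as raw JPEG bytes.
response = requests.get("http://127.0.0.1:5000/live_visual")
response.raise_for_status()

# Store the image; a GUI would instead render these bytes directly.
with open("current_visualization.jpg", "wb") as f:
    f.write(response.content)
```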
| | |
| --- | --- |
| Http-Method | GET |
| Response Content-Type | JSON |
| URL | http://127.0.0.1:5000/live_pred |
| Return | 2D array of the most current class probabilities with respect to the currently selected predictor |
Example response: [["Acoustic_guitar", 0.0006955251446925104, 0], ["Applause", 0.0032770668622106314, 1], ...]
- 1st element: category name
- 2nd element: probability of prediction for this class
- 3rd element: positional argument (can be used if a special order of displayed classes is desired)
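A minimal sketch of consuming this endpoint with the requests library, assuming the response format shown above (sorting by the positional argument is just one possible use):

```python
import requests

predictions = requests.get("http://127.0.0.1:5000/live_pred").json()

# Each entry is [category name, probability, position]; sort by the
# positional element to obtain the intended display order.
for name, probability, position in sorted(predictions, key=lambda p: p[2]):
    print(f"{name}: {probability:.4f}")
```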
| | |
| --- | --- |
| Http-Method | GET |
| Response Content-Type | JSON |
| URL | http://127.0.0.1:5000/audiofile_list |
| Return | the available audio files as a list of JSON objects |
Example response: [{"id": 0, "displayname": "Trumpets"}, {"id": 1, "displayname": "Song1"}, {"id": 2, "displayname": "Song2"}, ...]
| | |
| --- | --- |
| Http-Method | GET |
| Response Content-Type | JSON |
| URL | http://127.0.0.1:5000/pred_list |
| Return | the available prediction models as a list of JSON objects |
Example response: [{"id": 0, "displayname": "DCASEPredictor", "classes": "41", "description": "sample description for dcase"}, {"id": 1, "displayname": "SportsPredictor", "classes": "3", "description": "sample description for detecting sports"}, ...]
The audio tagger backend implements another endpoint to change the audio source as well as the currently active prediction model on the fly.
| | |
| --- | --- |
| Http-Method | POST |
| Request Body | JSON |
| URL | http://127.0.0.1:5000/settings |
Example request body: {"isLive": 1, "file": 0, "predictor": 1}
where
- isLive: 1 -> true, 0 -> false
- file: id of the selected file
- predictor: id of the selected predictor
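As a sketch, the example request body from above could be sent with the requests library like this:

```python
import requests

# Use the live audio source (isLive = 1) and activate predictor 1;
# body fields as documented above.
settings = {"isLive": 1, "file": 0, "predictor": 1}
response = requests.post("http://127.0.0.1:5000/settings", json=settings)
response.raise_for_status()
```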
New predictors can be added by editing the CSV file predictors.csv.
The next few steps show how to integrate a predictor into the backend system of the audio tagger:
- Extend predictors.csv with the properties of the new predictor.
  - Important note: Make sure that the given path in the column predictorClassPath correctly identifies the path to the wrapper class. Otherwise, the backend cannot find the new predictor. Two predictors are already included; have a look at them for reference.
- Implement a predictor such that it inherits from PredictorContract (see here).
- Inform the manager once a new prediction has been made by calling onNewPredictionCalculated(probabilities).
  - parameter probabilities: [["class1", 0.0006955251446925104, 0], ["class2", 0.0032770668622106314, 1], ...]
    - 1st element: category name
    - 2nd element: probability of prediction for this class
    - 3rd element: positional argument (can be used if a special order of displayed classes is desired)
- Consumers should rely on the global timing variable tGroundTruth, which is provided by AudioTaggerManager. This counter variable guarantees synchronization among consumers.
For further information read the corresponding documentation and have a look at the existing predictors (see here).
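As an illustration only, a new predictor could be structured roughly as follows. PredictorContract and onNewPredictionCalculated(probabilities) are named in this README, but the module path, the manager attribute, and the predict hook are assumptions; consult the bundled predictors for the actual interface:

```python
# Hypothetical sketch: module path and hook names are assumptions.
from predictors.predictor_contract import PredictorContract  # assumed path


class MyPredictor(PredictorContract):
    """Toy predictor that always reports the same two class probabilities."""

    def predict(self, audio_chunk):  # assumed hook invoked per audio chunk
        # Result format as documented above: [name, probability, position].
        probabilities = [
            ["class1", 0.7, 0],
            ["class2", 0.3, 1],
        ]
        # Inform the AudioTaggerManager; consumers synchronize on its
        # global timing variable tGroundTruth.
        self.manager.onNewPredictionCalculated(probabilities)  # 'manager' attribute assumed
```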
The backend can be equipped with new selectable WAV files by editing the CSV file sources.csv. The CSV file has the following form:
id;displayname;path
0;ExampleFile1;pathToWAVfile/file1.wav
1;ExampleFile2;pathToWAVfile/file2.wav
2;ExampleFile3;pathToWAVfile/file3.wav
The /viewer directory contains a sample GUI for the audio tagger backend. The app is based on the Python framework Kivy. GUI startup can be easily performed with the following command:
python viewer/startup.py
Important: The GUI requires a running instance of server/webserver.py.
If you want to dive deeper into the source code of the audio tagger backend, please have a look at the detailed documentation.
Please contact the Institute for Computational Perception at Johannes Kepler University in Linz for questions regarding usage and code.