This repository implements generative AI-based data augmentation techniques for improving bioacoustic classification in noisy environments, particularly for bird species detection at wind farm sites. We explore Auxiliary Classifier Generative Adversarial Networks (ACGANs) and Denoising Diffusion Probabilistic Models (DDPMs) to synthesize spectrograms, enhancing training data diversity without requiring extensive expert labeling.
The project includes a new audio dataset of 640 hours of bird calls recorded at wind farm sites in Ireland. Approximately 800 samples are expert-labeled, providing a challenging benchmark due to background wind and turbine noise.
- Clone the repository:
    git clone https://github.com/gibbona1/SpectrogramGenAI.git
    cd SpectrogramGenAI
- Install dependencies:
    pip install -r requirements.txt
    Ensure you have Python 3.8+.
- Prepare Data: Either .wav files or 256x256 spectrograms
- Train ACGAN:
    python train_acgan.py
- Train DDPM:
    python train_ddpm.py
- Generate Synthetic Spectrograms:
    python generate_spectrograms.py --model ddpm --num_samples 1000 --output_path data/synthetic/
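Before training, it can help to confirm that a data folder really contains one of the two accepted input types, .wav files or 256x256 spectrograms. The following is a minimal sketch of such a check, not part of the repository: it assumes spectrograms are stored as PNG images (the README does not state the image format), and the names `png_size`, `check_input`, and `scan_folder` are hypothetical. The training scripts' own loaders may apply different rules.

```python
import struct
import wave
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (256, 256)  # spectrogram size stated in the README


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    return struct.unpack(">II", header[16:24])


def check_input(path):
    """Classify one file as 'wav' or 'spectrogram', or raise ValueError."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".wav":
        with wave.open(str(path), "rb") as w:
            if w.getnframes() == 0:
                raise ValueError(f"{path} contains no audio frames")
        return "wav"
    if suffix == ".png":  # assumption: spectrograms are stored as PNG
        size = png_size(path)
        if size != EXPECTED_SIZE:
            raise ValueError(f"{path} is {size[0]}x{size[1]}, expected 256x256")
        return "spectrogram"
    raise ValueError(f"{path}: unsupported file type")


def scan_folder(folder):
    """Count usable inputs in a folder, collecting problems instead of stopping."""
    counts, problems = {"wav": 0, "spectrogram": 0}, []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            try:
                counts[check_input(path)] += 1
            except (ValueError, wave.Error, EOFError) as err:
                problems.append(str(err))
    return counts, problems
```

Calling `scan_folder("data/")` on a hypothetical data directory returns the per-type counts together with a list of messages for files that were rejected, so a mixed or partly corrupted folder can be spotted before a long training run.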
Inception Score
python inception_score.py <image_folder>

Fréchet Inception Distance
pip install pytorch-fid
python -m pytorch_fid folder1 folder2

Fréchet Audio Distance
python frechet_audio_distance.py --bg_dir folder1 --eval_dir folder2

Train all classifiers on combined real and synthetic data:
python train_classifiers.py

For more details, refer to the arXiv preprint.
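As background for the Inception Score metric listed above, here is a minimal sketch of its standard definition: the exponential of the mean KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y) over all images. It assumes the class probabilities have already been produced by some classifier. The function name `inception_score` is illustrative and is not taken from `inception_score.py`, whose implementation may differ (for example, by averaging over several splits).

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities p(y|x).

    probs: list of rows, each a probability vector over the same classes.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y): the average prediction over all images.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    # Mean KL(p(y|x) || p(y)); zero-probability terms contribute nothing.
    total_kl = 0.0
    for row in probs:
        total_kl += sum(
            p * math.log(p / marginal[k]) for k, p in enumerate(row) if p > 0
        )
    return math.exp(total_kl / n)
```

Under this definition the score is 1 when every image receives the same prediction, and it reaches the number of classes when predictions are fully confident and evenly spread across classes, so higher values indicate samples that are both recognizable and diverse.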