An acoustic system for detecting and classifying drones based on their sound signatures, primarily using the Audio Spectrogram Transformer (AST) model.
```bash
# Clone and enter the project directory (if you haven't already)
git clone https://github.com/preszzz/hover.git
cd hover
cp .env.example .env

# Install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows
```

This project uses a pre-trained Audio Spectrogram Transformer (AST) model from Hugging Face for binary classification of audio signals (drone vs. non-drone). Key features include:
- AST Model: Leverages `MIT/ast-finetuned-audioset-10-10-0.4593` as a base. The model was fine-tuned on a substantial dataset (approximately 585k training samples, 32k validation samples, and 32k test samples) for the drone detection task.
- Fine-tuned Model on Hugging Face Hub: The fine-tuned model is available on the Hugging Face Hub: Drone Audio Detection
- On-the-Fly Feature Extraction: Uses `ASTFeatureExtractor` to convert raw audio (resampled to 16kHz) into spectrograms during data loading. No pre-computation or storage of spectrograms is required.
- Hugging Face `datasets`: Manages data loading and transformations. Supports loading from the Hugging Face Hub or local audio folders.
- PyTorch Framework: Model training and evaluation are implemented in PyTorch.
- Hyperparameter Tuning: An Optuna-based script (`src/hyperparameter/tune.py`) is provided for finding optimal hyperparameters. Results (best parameters) are logged to the console.
- Configuration: Core settings are managed in `src/config.py`. Environment variables (e.g., Hugging Face dataset and model IDs) are loaded via a `.env` file.
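For illustration, the on-the-fly conversion can be reproduced in a few lines with the `transformers` feature extractor (a minimal sketch; the actual wrapping in `src/feature_engineering/feature_loader.py` may differ):

```python
import numpy as np
from transformers import ASTFeatureExtractor

# Feature extractor bundled with the base AST checkpoint
extractor = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")

# One second of placeholder audio at 16 kHz (substitute a real waveform)
waveform = np.zeros(16_000, dtype=np.float32)

# Converts the raw waveform into the spectrogram tensor the AST model expects
inputs = extractor(waveform, sampling_rate=16_000, return_tensors="pt")
print(inputs["input_values"].shape)  # (batch, time_frames, mel_bins)
```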
The system primarily uses Hugging Face `datasets` for data input.
- Hugging Face Hub: You can load datasets directly from the Hub (e.g., `load_dataset("your_username/your_dataset_name")`).
- Local Audio Files: Use `load_dataset("audiofolder", data_dir="data/raw/your_audio_data")`. Your audio files should be organized into subdirectories where each subdirectory name corresponds to a label (e.g., `drone`, `non_drone`):

  ```
  data/raw/your_audio_data/
  ├── drone/
  │   ├── drone_audio_001.wav
  │   └── ...
  └── non_drone/
      ├── ambient_sound_001.wav
      └── ...
  ```
The AST feature extractor requires audio to be at a 16kHz sampling rate. The data loading script (`src/feature_engineering/feature_loader.py`) handles casting the audio to this rate.
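As a reference point, loading a local audio folder and casting it to 16 kHz with the `datasets` library looks roughly like the sketch below (paths are placeholders; the project's loader handles this step internally):

```python
from datasets import load_dataset, Audio

# Load audio files from labeled subdirectories (drone/, non_drone/)
dataset = load_dataset("audiofolder", data_dir="data/raw/your_audio_data")

# Decode every clip at the 16 kHz rate required by the AST feature extractor
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

print(dataset["train"][0]["audio"]["sampling_rate"])  # 16000
```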
- Environment Setup & Configuration (`.env`, `src/config.py`):
  - Create a `.env` file from `.env.example` and fill in `HF_DATASET_ID` (your dataset on the Hugging Face Hub) and `HF_MODEL_ID` (the specific pre-trained or fine-tuned model ID on the Hugging Face Hub to be used, e.g., `preszzz/your-fine-tuned-model`).
  - Review base parameters in `src/config.py` like batch size, learning rate, and number of epochs if needed. These can be overridden by hyperparameter tuning (see the sketch below).
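By way of illustration only, `src/config.py` presumably reads these environment variables with `python-dotenv`; the sketch below shows one plausible shape (all default values here are assumptions, not the project's actual settings):

```python
# Hypothetical excerpt of src/config.py
import os
from dotenv import load_dotenv

load_dotenv()  # pulls variables from the .env file created from .env.example

HF_DATASET_ID = os.getenv("HF_DATASET_ID")   # dataset repo on the Hugging Face Hub
MODEL_HUB_ID = os.getenv("HF_MODEL_ID")      # pre-trained or fine-tuned model ID
BATCH_SIZE = 16                              # assumed default; may be overridden by tuning
LEARNING_RATE = 3e-5                         # assumed default; may be overridden by tuning
NUM_EPOCHS = 5                               # assumed default; may be overridden by tuning
```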
- Hyperparameter Tuning (Optional, Recommended):
  - The script `src/hyperparameter/tune.py` uses Optuna with the Hugging Face `Trainer`'s `hyperparameter_search` method (see the sketch after this list).
  - It defines search spaces for hyperparameters like learning rate, batch size, etc.
  - Outputs from trials (checkpoints, logs) are saved in `output_models/hpo_trainer_output/`.
  - The best hyperparameters are logged to the console.

  ```bash
  uv run src/hyperparameter/tune.py
  ```

  - Update `src/config.py` or a separate optimal config file with the best parameters found, or use them to inform the next training run.
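The core of such a search, heavily condensed and with assumed search ranges, could look like this (`train_ds`/`val_ds` stand in for datasets prepared as above; the actual `tune.py` may structure it differently):

```python
# Hypothetical sketch of an Optuna search via Trainer.hyperparameter_search
from transformers import ASTForAudioClassification, Trainer, TrainingArguments

def model_init():
    # A fresh model instance is created for every trial
    return ASTForAudioClassification.from_pretrained(
        "MIT/ast-finetuned-audioset-10-10-0.4593",
        num_labels=2,
        ignore_mismatched_sizes=True,
    )

def optuna_hp_space(trial):
    # Example search space; the real script may tune more parameters
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
    }

args = TrainingArguments(
    output_dir="output_models/hpo_trainer_output",
    eval_strategy="epoch",
)

trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=train_ds,   # dataset with on-the-fly feature extraction attached
    eval_dataset=val_ds,
)

best_run = trainer.hyperparameter_search(
    direction="minimize",     # by default the objective is the validation loss
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=10,
)
print(best_run.hyperparameters)  # best parameters are logged to the console
```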
- Training (`src/training/train.py`):
  - This script trains the AST model using settings from `src/config.py` (or your optimal config).
  - It loads data, performs on-the-fly feature extraction, and saves the best model checkpoint (based on validation performance) to `trained_models_pytorch/`. (A condensed sketch follows below.)

  ```bash
  uv run src/training/train.py
  ```
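In outline, the training step amounts to fine-tuning the AST checkpoint with the Hugging Face `Trainer`. The sketch below is hypothetical: it uses placeholder hyperparameters where `src/training/train.py` would read them from `src/config.py`, and assumes `dataset` already carries the feature-extraction transform and has train/validation splits:

```python
import numpy as np
import evaluate
from transformers import ASTForAudioClassification, Trainer, TrainingArguments

# Replace the 527-class AudioSet head with a binary drone / non-drone head
model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return accuracy.compute(predictions=preds, references=eval_pred.label_ids)

args = TrainingArguments(
    output_dir="trained_models_pytorch",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # keep the best checkpoint by validation performance
    learning_rate=3e-5,              # placeholder; real value comes from src/config.py
    per_device_train_batch_size=16,  # placeholder
    num_train_epochs=5,              # placeholder
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.save_model("trained_models_pytorch/best")
```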
- Evaluation (`src/training/evaluate.py`):
  - Evaluates the trained model (specified by `config.MODEL_HUB_ID`, which should point to your fine-tuned model on the Hub, or a local path if adapted) on a test set.
  - Outputs metrics like accuracy, precision, recall, and F1-score to the console (see the metrics sketch below).
  - A summary of evaluation results is saved to `evaluation_results/evaluation_summary.txt`.

  ```bash
  uv run src/training/evaluate.py
  ```
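As a rough illustration of the metrics step (the real script may compute these differently), predictions over the test split can be scored with scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred would come from running the fine-tuned model over the test set
y_true = [1, 0, 1, 1, 0]   # placeholder labels (1 = drone, 0 = non-drone)
y_pred = [1, 0, 1, 0, 0]   # placeholder predictions

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(
    f"accuracy={accuracy_score(y_true, y_pred):.3f} "
    f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}"
)
```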
(Note: The primary workflow for this project uses the AST model with on-the-fly feature extraction. The following describes an older MFCC-based preprocessing pipeline which may be used for other models or tasks. Its components are located in `src/preprocessing/`.)
This pipeline processes raw audio into MFCC features:
- Convert and Resample (`src/preprocessing/step_1_resample.py`): Converts to WAV, resamples to 16kHz.
- Process and Label (`src/preprocessing/step_2_process.py`): Chunks audio, extracts MFCCs, applies labels via `src/preprocessing/label_mapping.yaml` (see the MFCC sketch below).

Usage:

- Configure `src/preprocessing/label_mapping.yaml`.
- Run:

  ```bash
  uv run src/preprocessing/main_preprocess.py
  ```

Output: `data/processed/` with `.npy` features and `.txt` labels:

- Raw signal data (`.npy`)
- MFCC features (`.npy`)
- Label information (`.txt`)
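For context, a minimal MFCC extraction step with `librosa` might look like the following (a sketch only; chunk length, hop size, and the number of coefficients used by `step_2_process.py` are assumptions):

```python
import librosa
import numpy as np

# Load one clip at the 16 kHz rate used throughout the project
signal, sr = librosa.load("data/raw/your_audio_data/drone/drone_audio_001.wav", sr=16_000)

# 13 MFCCs per frame is a common choice; the pipeline's exact settings may differ
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

np.save("example_mfcc.npy", mfcc)   # the pipeline stores features under data/processed/
print(mfcc.shape)                   # (n_mfcc, n_frames)
```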
- Python 3.11
- `uv` (for package management)
- `torch`
- `transformers`
- `datasets`
- `evaluate`
- `optuna`
- `scikit-learn`
- `numpy`
- `librosa`
- `soundfile`
- `python-dotenv`

Installation is handled by `uv sync` based on `pyproject.toml`.
This section lists publicly available drone audio datasets. You may need to adapt their structure or use the Hugging Face `datasets` tools to load them.
- Audio Based Drone Detection and Identification using Deep Learning
  - Sara A Al-Emadi, Abdulla K Al-Ali, Abdulaziz Al-Ali, Amr Mohamed.
  - GitHub
- Drone Detection and Classification using Machine Learning and Sensor Fusion
  - Svanström F. (2020).
  - GitHub
- DREGON
  - Audio-Based Search and Rescue with a Drone: Highlights from the IEEE Signal Processing Cup 2019 Student Competition. Antoine Deleforge, Diego Di Carlo, Martin Strauss, Romain Serizel, & Lucio Marcenaro. (2019). IEEE Signal Processing Magazine, 36(5), 138-144.
  - Kaggle
- DronePrint
  - Harini Kolamunna, Thilini Dahanayake, Junye Li, Suranga Seneviratne, Kanchana Thilakaratne, Albert Y. Zomaya, Aruna Seneviratne.
  - GitHub
- DroneNoise Database
  - Carlos Ramos-Romero, Nathan Green, César Asensio and Antonio J Torija Martinez.
  - Figshare
- ESC: Dataset for Environmental Sound Classification (general environmental sounds, useful for the non-drone class)
  - Piczak, Karol J.
  - GitHub
- drone-audio-detection
- SOUND-BASED DRONE FAULT CLASSIFICATION USING MULTI-TASK LEARNING
  - Wonjun Yi, Jung-Woo Choi, & Jae-Woo Lee. (2023). 29th International Congress on Sound and Vibration (ICSV29).
  - Zenodo