
# 🎯 YOHO - You Only Hear Once

Real-time audio event detection inspired by YOLO's philosophy, adapted for temporal audio processing.

## 🚀 Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/armanrasta/yoho
cd yoho

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

### Training

```bash
python train.py \
  --audio_dir /path/to/audio/files \
  --annotations /path/to/annotations.json \
  --num_classes 10 \
  --epochs 100 \
  --batch_size 16 \
  --save_dir checkpoints
```

### Detection

```bash
python detect.py \
  --model_path checkpoints/yoho_best.pth \
  --audio_path test_audio.wav \
  --num_classes 10 \
  --confidence_thresh 0.5 \
  --visualize
```

## 📁 Project Structure

```
yoho/
├── 🐍 train.py                  # Training script
├── 🔍 detect.py                 # Detection script
├── 📚 requirements.txt          # Dependencies
├── ⚙️ setup.py                  # Package setup
├── 📖 README.md                 # This file
├── 📊 example_annotations.json  # Example data format
├── 🎵 yoho/                     # Core YOHO package
│   ├── __init__.py
│   ├── 🧠 model.py              # YOHO architecture
│   ├── 📉 loss.py               # YOHO loss function
│   ├── 🎼 data.py               # Dataset & feature extraction
│   ├── 🏋️ trainer.py            # Training utilities
│   └── 🔮 detector.py           # Inference engine
└── 🔧 utils/                    # Utility functions
    ├── __init__.py
    ├── ⚓ anchors.py             # Anchor calculation
    └── 📈 evaluation.py         # Evaluation metrics
```

## 🎯 Key Features

- ⚡ **Real-time Detection**: Single-pass inference like YOLO
- 🎵 **Multi-scale Architecture**: Detects events at different temporal resolutions
- 🔊 **Professional Audio Processing**: Mel-spectrograms, MFCCs, and more
- 🔄 **Data Augmentation**: Audio-specific augmentations for robustness
- 📊 **Visualization**: Detection results with audio waveform and spectrogram
- 🏭 **Production Ready**: Proper training pipeline and model checkpointing

## 🧠 Model Architecture

YOHO adapts YOLO's core principles for audio:

- **Backbone**: CNN with residual connections for temporal feature extraction
- **Neck**: Feature pyramid network for multi-scale feature fusion
- **Heads**: Multiple detection heads for different temporal resolutions
- **Anchors**: Optimized for typical audio event durations
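Anchor durations can be derived directly from the training annotations, e.g. by clustering event lengths. A minimal sketch using 1-D k-means (a hypothetical helper for illustration, not necessarily how `utils/anchors.py` computes anchors):

```python
import random

def duration_anchors(events, k=3, iters=50, seed=0):
    """Cluster event durations (end - start) into k anchor lengths.

    events: list of (start_time, end_time) pairs from the annotations.
    Returns k representative durations, sorted ascending.
    """
    durations = sorted(end - start for start, end in events)
    rng = random.Random(seed)
    centers = rng.sample(durations, k)
    for _ in range(iters):
        # Assign each duration to its nearest center.
        clusters = [[] for _ in range(k)]
        for d in durations:
            i = min(range(k), key=lambda j: abs(d - centers[j]))
            clusters[i].append(d)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

events = [(0.0, 0.2), (1.0, 1.3), (2.0, 3.9), (5.0, 7.1), (0.5, 0.75)]
print(duration_anchors(events, k=2))
```

With two clear groups of durations (short ~0.25 s, long ~2 s) the two anchors converge to the group means.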

## 📊 Input Format

### Audio Annotations (JSON)

Each event is a `[start_time, end_time, class_id, confidence]` quadruple (JSON itself does not allow inline comments):

```json
{
  "audio1.wav": [
    [1.2, 2.5, 0, 1.0],
    [3.1, 4.0, 2, 1.0]
  ],
  "audio2.wav": [
    [0.5, 1.8, 1, 1.0]
  ]
}
```
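For reference, a minimal loader that flattens this format into labeled event records (a sketch; the dict field names are illustrative assumptions, not the package's actual API):

```python
import json

def load_annotations(path):
    """Read the annotation JSON and return {filename: [event dicts]}."""
    with open(path) as f:
        raw = json.load(f)
    out = {}
    for audio_file, events in raw.items():
        out[audio_file] = [
            {"start": s, "end": e, "class_id": int(c), "confidence": conf}
            for s, e, c, conf in events
        ]
    return out
```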

## 🎼 Supported Features

- Mel-spectrograms (recommended)
- MFCCs with delta features
- Log-spectrograms
- Combined features (Mel + MFCC)
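Delta (first-derivative) features capture how each MFCC coefficient changes between frames. A minimal NumPy sketch using a central difference (an illustration only, not necessarily the extractor implemented in `yoho/data.py`):

```python
import numpy as np

def add_deltas(mfcc):
    """Stack first-order deltas onto an (n_mfcc, n_frames) matrix.

    Deltas are computed as a central difference across frames, with
    edge frames padded by repetition so the frame count is preserved.
    """
    padded = np.pad(mfcc, ((0, 0), (1, 1)), mode="edge")
    delta = (padded[:, 2:] - padded[:, :-2]) / 2.0
    return np.vstack([mfcc, delta])  # shape: (2 * n_mfcc, n_frames)
```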

## ⚡ Performance

- **Real-time capable** on modern GPUs
- **Multi-event detection** in a single audio clip
- **Temporal localization** with start/end times
- **Class confidence scores** for each detection

## 🔧 Configuration

Key training parameters:

```bash
--num_classes 10                # Number of event classes
--batch_size 16                 # Training batch size
--lr 1e-4                       # Learning rate
--epochs 100                    # Training epochs
--feature_type mel_spectrogram  # Feature extraction method
```

## 📈 Evaluation Metrics

- **Event-based F1 Score**: Temporal matching with tolerance
- **Precision/Recall**: Standard detection metrics
- **Temporal IoU**: Intersection-over-Union for time segments
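Temporal IoU for two time segments is the overlap length divided by the union length; a self-contained sketch:

```python
def temporal_iou(a, b):
    """IoU of two (start, end) time segments."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - overlap
    return overlap / union if union > 0 else 0.0

print(temporal_iou((0.0, 2.0), (1.0, 3.0)))  # 1 s overlap / 3 s union ≈ 0.333
```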

## 🚀 Use Cases

- 🎵 **Music Analysis**: Chord detection, beat tracking
- 🔊 **Sound Event Detection**: Environmental sounds, alarms
- 🎬 **Audio Analysis**: Scene segmentation, event tagging
- 🦻 **Healthcare**: Cough detection, heart sound analysis
- 🐾 **Bioacoustics**: Animal call detection

## 🛠️ Customization

### Adding New Feature Extractors

```python
class CustomFeatureExtractor(AudioFeatureExtractor):
    def forward(self, waveform, feature_type='custom'):
        if feature_type == 'custom':
            # Your custom feature extraction
            return custom_features
        # Fall back to the built-in feature types
        return super().forward(waveform, feature_type)
```

### Modifying Model Architecture

```python
class CustomYOHO(YOHO):
    def _build_backbone(self):
        # Your custom backbone
        return custom_backbone
```

## 📚 Citation

If you use YOHO in your research, please cite:

```bibtex
@software{yoho2024,
  title = {YOHO: You Only Hear Once for Audio Event Detection},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/armanrasta/yoho}
}
```

## 🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

*YOHO - Because you should only have to hear it once!* 🎯
