# YOHO: You Only Hear Once

Real-time audio event detection inspired by YOLO's philosophy, adapted for temporal audio processing.
## Installation

```bash
# Clone the repository
git clone https://github.com/armanrasta/yoho
cd yoho

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

## Training

```bash
python train.py \
--audio_dir /path/to/audio/files \
--annotations /path/to/annotations.json \
--num_classes 10 \
--epochs 100 \
--batch_size 16 \
--save_dir checkpoints
```

## Detection

```bash
python detect.py \
--model_path checkpoints/yoho_best.pth \
--audio_path test_audio.wav \
--num_classes 10 \
--confidence_thresh 0.5 \
--visualize
```
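The detector can also be scripted. The snippet below is a hypothetical sketch: the class name `Detector` and the `detect()` method are assumptions made for illustration, not a verified API; consult `yoho/detector.py` for the actual interface.

```python
# Hypothetical usage sketch -- `Detector` and `detect()` are assumed names,
# not the verified API; see yoho/detector.py for the real interface.
import torch
from yoho.detector import Detector  # assumed import path

detector = Detector(
    model_path="checkpoints/yoho_best.pth",
    num_classes=10,
    confidence_thresh=0.5,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Each detection is (start_time, end_time, class_id, confidence)
for start, end, class_id, conf in detector.detect("test_audio.wav"):
    print(f"{start:6.2f}s-{end:6.2f}s  class={class_id}  conf={conf:.2f}")
```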
## Project Structure

```
yoho/
├── 🐍 train.py                  # Training script
├── 🔍 detect.py                 # Detection script
├── 📚 requirements.txt          # Dependencies
├── ⚙️ setup.py                  # Package setup
├── 📖 README.md                 # This file
├── 📊 example_annotations.json  # Example data format
├── 🎵 yoho/                     # Core YOHO package
│   ├── __init__.py
│   ├── 🧠 model.py              # YOHO architecture
│   ├── 📉 loss.py               # YOHO loss function
│   ├── 🎼 data.py               # Dataset & feature extraction
│   ├── 🏋️ trainer.py            # Training utilities
│   └── 🔮 detector.py           # Inference engine
└── 🔧 utils/                    # Utility functions
    ├── __init__.py
    ├── ⚓ anchors.py             # Anchor calculation
    └── 📈 evaluation.py         # Evaluation metrics
```
## Features

- ⚡ Real-time Detection: Single-pass inference like YOLO
- 🎵 Multi-scale Architecture: Detects events at different temporal resolutions
- 🔊 Professional Audio Processing: Mel-spectrograms, MFCCs, and more
- 🔄 Data Augmentation: Audio-specific augmentations for robustness
- 📊 Visualization: Detection results with audio waveform and spectrogram
- 🏭 Production Ready: Proper training pipeline and model checkpointing
## Architecture

YOHO adapts YOLO's core principles for audio (see the sketch after this list):
- Backbone: CNN with residual connections for temporal feature extraction
- Neck: Feature pyramid network for multi-scale feature fusion
- Heads: Multiple detection heads for different temporal resolutions
- Anchors: Optimized for typical audio event durations
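As a concrete illustration, here is a minimal single-scale sketch of this backbone/neck/head layout in PyTorch. It is not the actual `yoho/model.py`: the real model adds residual connections and multiple heads at different temporal resolutions, and all layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class TinyYOHO(nn.Module):
    """Illustrative backbone -> neck -> head layout; not the real yoho/model.py."""
    def __init__(self, num_classes=10, anchors_per_cell=3):
        super().__init__()
        # Backbone: strided 1D convs over spectrogram frames extract temporal features
        self.backbone = nn.Sequential(
            nn.Conv1d(128, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Neck: fuse backbone output into a shared representation
        self.neck = nn.Conv1d(128, 128, 3, padding=1)
        # Head: per time step, each anchor predicts (center, width, objectness) + class scores
        self.head = nn.Conv1d(128, anchors_per_cell * (3 + num_classes), 1)

    def forward(self, mel):          # mel: (batch, n_mels=128, time)
        features = self.backbone(mel)
        features = torch.relu(self.neck(features))
        return self.head(features)   # (batch, anchors*(3+classes), time/4)

x = torch.randn(2, 128, 512)         # batch of mel-spectrograms
print(TinyYOHO()(x).shape)           # torch.Size([2, 39, 128])
```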
## Annotation Format

Annotations map each audio file to a list of events, each given as `[start_time, end_time, class_id, confidence]`:

```json
{
  "audio1.wav": [
    [1.2, 2.5, 0, 1.0],
    [3.1, 4.0, 2, 1.0]
  ],
  "audio2.wav": [
    [0.5, 1.8, 1, 1.0]
  ]
}
```
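Loading such a file needs only the standard library. A minimal sketch (the filename follows `example_annotations.json` from the project tree):

```python
import json

with open("example_annotations.json") as f:
    annotations = json.load(f)

# Flatten to one event list: (file, start, end, class_id, confidence)
for audio_file, events in annotations.items():
    for start, end, class_id, confidence in events:
        print(f"{audio_file}: class {int(class_id)} at "
              f"{start:.1f}-{end:.1f}s (conf {confidence})")
```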
## Supported Audio Features

- Mel-spectrograms (recommended)
- MFCCs with delta features
- Log-spectrograms
- Combined features (Mel + MFCC)
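For reference, mel-spectrogram features of this kind can be computed with torchaudio. This is a minimal sketch; the parameters are common defaults, not necessarily those used in `yoho/data.py`:

```python
import torchaudio
import torchaudio.transforms as T

waveform, sample_rate = torchaudio.load("test_audio.wav")

# Mel-spectrogram in decibels -- common parameter choices, not yoho/data.py's
mel = T.MelSpectrogram(sample_rate=sample_rate, n_fft=1024,
                       hop_length=256, n_mels=128)(waveform)
mel_db = T.AmplitudeToDB()(mel)
print(mel_db.shape)  # (channels, n_mels, time_frames)
```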
## Detection Capabilities

- Real-time capable on modern GPUs
- Multi-event detection in a single audio clip
- Temporal localization with start/end times
- Class confidence scores for each detection
## Training Configuration

Key training parameters:

```bash
--num_classes 10                 # Number of event classes
--batch_size 16                  # Training batch size
--lr 1e-4                        # Learning rate
--epochs 100                     # Training epochs
--feature_type mel_spectrogram   # Feature extraction method
```

## Evaluation Metrics

- Event-based F1 Score: Temporal matching with tolerance
- Precision/Recall: Standard detection metrics
- Temporal IoU: Intersection-over-Union for time segments
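Temporal IoU is the one-dimensional analogue of box IoU. A minimal implementation (not necessarily the one in `utils/evaluation.py`):

```python
def temporal_iou(a, b):
    """IoU of two time segments, each given as (start, end) in seconds."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0

print(temporal_iou((1.2, 2.5), (1.8, 3.0)))  # ~0.389
```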
## Applications

- 🎵 Music Analysis: Chord detection, beat tracking
- 🔊 Sound Event Detection: Environmental sounds, alarms
- 🎬 Audio Analysis: Scene segmentation, event tagging
- 🦻 Healthcare: Cough detection, heart sound analysis
- 🐾 Bioacoustics: Animal call detection
## Customization

Custom feature extractor (import paths below are assumed from the project layout):

```python
from yoho.data import AudioFeatureExtractor  # assumed module path

class CustomFeatureExtractor(AudioFeatureExtractor):
    def forward(self, waveform, feature_type='custom'):
        if feature_type == 'custom':
            # Your custom feature extraction
            custom_features = ...
            return custom_features
        return super().forward(waveform, feature_type)
```

Custom model backbone:

```python
from yoho.model import YOHO  # assumed module path

class CustomYOHO(YOHO):
    def _build_backbone(self):
        # Your custom backbone
        custom_backbone = ...
        return custom_backbone
```

## Citation

If you use YOHO in your research, please cite:
```bibtex
@software{yoho2024,
  title  = {YOHO: You Only Hear Once for Audio Event Detection},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/armanrasta/yoho}
}
```

## Contributing

We welcome contributions! Please see our Contributing Guidelines for details.
## License

This project is licensed under the MIT License; see the LICENSE file for details.
YOHO - Because you should only have to hear it once! 🎯