Empowering Environmental Sound Recognition for Hearing Accessibility
TimeScaleNet applies deep learning to raw audio waveforms using multi-resolution analysis. Introduced by researchers at the CNAM, it combines BiquadNet (a learnable passband IIR filter layer) and FrameNet (a residual network with depthwise separable convolutions) to analyze sounds at both the sample and frame scales. Our project optimizes TimeScaleNet for real-world deployment on portable devices, achieving significant accuracy improvements.
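BiquadNet's core operation is a second-order recursive (biquad) band-pass filter whose coefficients are learned during training. Below is a minimal NumPy sketch of the filtering step only, with fixed (non-learnable) coefficients following the standard Audio EQ Cookbook band-pass design; the center frequency, Q, and sample rate are illustrative values, not parameters from the paper:

```python
import numpy as np

def biquad_bandpass(x, f0, q, sr):
    """Apply a second-order band-pass IIR filter to a 1-D waveform x.

    Coefficients follow the Audio EQ Cookbook band-pass design
    (0 dB peak gain at the center frequency f0).
    """
    w0 = 2 * np.pi * f0 / sr
    alpha = np.sin(w0) / (2 * q)
    # Feed-forward (b) and feedback (a) coefficients, normalized by a0.
    a0 = 1 + alpha
    b = np.array([alpha, 0.0, -alpha]) / a0
    a = np.array([1.0, -2 * np.cos(w0) / a0, (1 - alpha) / a0])
    y = np.zeros_like(x)
    # Direct Form I recursion: each output depends on the two previous
    # inputs and the two previous outputs.
    for n in range(len(x)):
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[2] * y[n - 2]
    return y
```

In BiquadNet the coefficients (equivalently, each filter's center frequency and Q) are trainable parameters optimized by backpropagation; the sketch fixes them only to show the signal path.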
```
├── app/                          # Next.js frontend application
├── backend/                      # FastAPI backend for prediction and preprocessing
├── public/                       # Static files for the frontend
├── timescalenet_model.h5         # Optimized ESC-10 TimeScaleNet model
├── timescalenet_urbansound8k.h5  # Optimized UrbanSound8K model
```
- Frontend: Next.js
- Backend: FastAPI
- Deep Learning Framework: TensorFlow
- Audio Preprocessing: Librosa
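FrameNet operates on frame-scale representations of the waveform. Here is a minimal NumPy sketch of slicing raw audio into overlapping frames before the network sees it; the frame length and hop size are illustrative choices, not the project's actual preprocessing parameters:

```python
import numpy as np

def frame_waveform(x, frame_len=1024, hop=512):
    """Slice a 1-D waveform into overlapping frames.

    Returns an array of shape (n_frames, frame_len); trailing samples
    that do not fill a complete frame are dropped.
    """
    if len(x) < frame_len:
        raise ValueError("waveform shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Build a (n_frames, frame_len) index grid and gather in one step.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]
```

For example, one second of 16 kHz audio yields `frame_waveform(x).shape == (30, 1024)` with the defaults above.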
- Python 3.8+
- Node.js 14+
- TensorFlow and other dependencies (`pip install -r requirements.txt`)
- Clone the repository:

  ```bash
  git clone https://github.com/your-repo/TimeScaleNetApi.git
  cd TimeScaleNetApi
  ```
- Start the FastAPI backend:

  ```bash
  cd backend
  uvicorn main:app --reload
  ```
- Start the Next.js frontend:

  ```bash
  cd ../app
  npm install
  npm run dev
  ```
- Access the application at:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
- ESC-10: Environmental Sound Classification (10 classes).
  - Accuracy: 89%
  - F1 Score: 0.88
- UrbanSound8K: Urban sound recognition (10 classes).
  - Accuracy: 94%
- ESC-10: Boosted from 69% to 89% with enhanced contextual learning.
- UrbanSound8K: Achieved state-of-the-art performance with 94% accuracy.
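The accuracy and F1 figures above can be computed from model predictions in a few lines. This is a minimal sketch with toy labels, not the project's evaluation script (which would typically use scikit-learn's metrics):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); zero when the class is never hit.
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)
```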
- Researchers at CNAM for developing TimeScaleNet and laying the groundwork for our project.
- Our coach and mentors for their continuous support and guidance.
- Our team, whose collaboration and dedication drove this project to success.
- TimeScaleNet: A Multiresolution Approach for Raw Audio Recognition (https://ieeexplore.ieee.org/document/8682378)