This project evaluates spoken responses from users based on several linguistic parameters including:
- Fluency
- Vocabulary
- Grammar
- Topic Relevance
It uses OpenAI's Whisper for transcription, spaCy for NLP processing, and FastAPI to expose the functionality via a simple API.
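Under the hood, the flow is: Whisper transcribes the uploaded audio, and spaCy parses the transcript for the linguistic checks. A minimal sketch of that pipeline (the wrapper function name and the model size are assumptions; the project's actual logic lives in `audio_evaluator.py`):

```python
import whisper
import spacy

model = whisper.load_model("base")   # model size is an assumption
nlp = spacy.load("en_core_web_sm")   # installed in the setup steps below

def transcribe_and_parse(audio_path: str):
    # Whisper returns a dict whose "text" field holds the transcript
    transcript = model.transcribe(audio_path)["text"]
    # spaCy tokenizes and tags the transcript for the linguistic checks
    return transcript, nlp(transcript)
```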
Key features:

- 🎙 Upload audio responses (e.g., .mp3, .mp4, .wav)
- 🧠 Automatic transcription using Whisper
- ✍️ Evaluate linguistic features (fluency, vocabulary, etc.; see the scoring sketch after this list)
- 🚀 FastAPI-based backend for scalable deployment
- 📊 Returns structured JSON scores per parameter
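The scoring itself is implemented in `audio_evaluator.py`. Purely as an illustration of the kind of heuristic involved, a vocabulary score could be derived from the lexical diversity of the parsed transcript (this is not the repository's actual formula):

```python
def vocabulary_score(doc) -> float:
    """Toy type-token-ratio heuristic; illustrative only."""
    words = [tok.text.lower() for tok in doc if tok.is_alpha]
    if not words:
        return 0.0
    # Ratio of unique words to total words, scaled to a 0-10 score
    return round(len(set(words)) / len(words) * 10, 1)
```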
Project structure:

```
audio_abex/
├── audio_evaluator.py   # Core logic to analyze the transcription
├── main.py              # FastAPI server
├── utils/               # Helper functions and tools
├── venv/                # Virtual environment
└── requirements.txt     # Python dependencies
```
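`main.py` wires the evaluator into FastAPI. A minimal sketch of what that server might look like, assuming a POST /evaluate route and an `evaluate_audio` helper (both names are guesses; check the source and the Swagger UI for the real interface):

```python
import os
import tempfile

from fastapi import FastAPI, File, UploadFile

from audio_evaluator import evaluate_audio  # hypothetical function name

app = FastAPI()

@app.post("/evaluate")  # route path is an assumption; see /docs for the real one
async def evaluate(audio_file: UploadFile = File(...)):
    # Persist the upload so Whisper can read it from disk by path
    suffix = os.path.splitext(audio_file.filename or "")[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(await audio_file.read())
    return evaluate_audio(tmp.name)  # expected to return the JSON shown below
```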
To set up and run the server:

```bash
git clone https://github.com/vishalgoyal316/analyze_audio.git
cd analyze_audio
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
uvicorn main:app --reload
```

Visit http://localhost:8000/docs for Swagger API documentation.
Request:

- `audio_file`: form-data file upload (audio format, e.g., .mp3, .mp4, .wav)
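For example, from Python (this assumes the route is POST /evaluate, which may differ; the Swagger UI at /docs shows the actual path):

```python
import requests

with open("sample.wav", "rb") as f:  # sample.wav is a placeholder file
    resp = requests.post(
        "http://localhost:8000/evaluate",
        files={"audio_file": f},
    )
print(resp.json())
```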
Response:
```json
{
  "fluency": 8.5,
  "grammar": 9.0,
  "vocabulary": 7.8,
  "relevance": 8.2,
  "transcript": "This is the spoken content."
}
```

MIT License. See LICENSE file for details.
Pull requests are welcome. For major changes, please open an issue first to discuss what you'd like to change.