Audio/Video Transcription System

A robust web application for transcribing audio and video content with dual transcription engines, file upload capabilities, and a modern React frontend.

Features

Dual Transcription Engines: AssemblyAI cloud transcription when an API key is configured, with local speech_recognition as the fallback
Multiple Input Methods:
- Real-time audio recording
- File upload supporting various audio/video formats
Format Conversion: Automatically converts uploaded audio/video files in any format ffmpeg can decode to WAV
Robust Error Handling: Gracefully handles format issues and failed transcriptions
Modern UI: User-friendly interface with tabs, previews, and feedback mechanisms
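The Format Conversion feature relies on pydub with ffmpeg. Below is a minimal sketch of that step, assuming pydub is installed and ffmpeg is on PATH. The target settings (mono, 16 kHz, 16-bit PCM) and the helper names convert_to_wav and is_pcm_wav are illustrative assumptions and are not taken from the project's code. The standard-library wave module is used to confirm that the output is uncompressed PCM WAV, which is the WAV variant speech_recognition's AudioFile reader accepts.

```python
import wave


def convert_to_wav(src: str, dst: str, frame_rate: int = 16000) -> str:
    """Convert an ffmpeg-decodable audio/video file to mono 16-bit PCM WAV.

    Assumes pydub is installed and ffmpeg is on PATH. The mono/16 kHz/16-bit
    target is an assumption, not the project's confirmed setting.
    """
    # Imported lazily so is_pcm_wav() below still works without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src)  # ffmpeg decodes the input, including video containers
    audio = audio.set_channels(1).set_frame_rate(frame_rate).set_sample_width(2)
    exported = audio.export(dst, format="wav")
    exported.close()  # export() returns the open file handle it wrote to
    return dst


def is_pcm_wav(path: str) -> bool:
    """Return True if `path` opens as an uncompressed PCM WAV file."""
    try:
        # wave.open() raises wave.Error for non-RIFF files and non-PCM encodings.
        with wave.open(path, "rb") as wf:
            return wf.getnchannels() > 0 and wf.getsampwidth() > 0
    except (wave.Error, EOFError, OSError):
        return False
```

Running is_pcm_wav on the converted file before transcription lets a bad conversion surface as a clear format error instead of a confusing recognition failure.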

Technologies Used

Backend

Django / Django REST Framework
speech_recognition for local transcription
AssemblyAI API integration for cloud transcription
pydub and ffmpeg for audio/video conversion
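AssemblyAI transcription is asynchronous. The audio is uploaded, a transcript job is created, and the job is polled until its status is completed or error. The sketch below shows that flow against AssemblyAI's v2 REST endpoints and uses only the standard library. The project lists requests as a dependency, so its actual integration may look different. The function names and the polling interval and timeout values are assumptions.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional

API_BASE = "https://api.assemblyai.com/v2"


def _call(method: str, url: str, api_key: str, body: Optional[bytes] = None,
          content_type: str = "application/json") -> Dict:
    """Send one authenticated request and decode the JSON response."""
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("authorization", api_key)  # AssemblyAI expects the raw key in this header
    if body is not None:
        req.add_header("content-type", content_type)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def start_job(wav_path: str, api_key: str) -> str:
    """Upload the audio, create a transcript job, and return its id."""
    with open(wav_path, "rb") as f:
        upload = _call("POST", f"{API_BASE}/upload", api_key, f.read(),
                       "application/octet-stream")
    payload = json.dumps({"audio_url": upload["upload_url"]}).encode("utf-8")
    job = _call("POST", f"{API_BASE}/transcript", api_key, payload)
    return job["id"]


def wait_for_text(fetch: Callable[[], Dict], interval: float = 3.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch() until the job completes; raise on error or timeout."""
    waited = 0.0
    while True:
        job = fetch()
        status = job.get("status")  # "queued", "processing", "completed" or "error"
        if status == "completed":
            return job.get("text") or ""
        if status == "error":
            raise RuntimeError(f"AssemblyAI job failed: {job.get('error')}")
        if waited >= timeout:
            raise TimeoutError("transcription did not finish before the timeout")
        sleep(interval)
        waited += interval


# Example (needs network access and a real key):
# job_id = start_job("clip.wav", api_key)
# text = wait_for_text(lambda: _call("GET", f"{API_BASE}/transcript/{job_id}", api_key))
```

Passing the fetch and sleep functions in as parameters keeps the polling loop testable without network access.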

Frontend

React
MediaRecorder API
File handling and preview capabilities

Installation

Prerequisites

Python 3.7+
Node.js and npm
ffmpeg (for audio/video conversion)

Backend Setup

Clone the repository:

```
git clone <repository-url>
cd <project-directory>
```

Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install Python dependencies:

```
pip install django djangorestframework pydub SpeechRecognition requests
```

Install ffmpeg:
- Ubuntu/Debian: sudo apt-get install ffmpeg
- macOS: brew install ffmpeg
- Windows: Download from ffmpeg.org and add to PATH
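pydub locates ffmpeg through PATH, so a quick check after installation can save debugging later. The helper below is a hypothetical sketch (ffmpeg_version is not part of the project) that reports whether ffmpeg is reachable.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(binary: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<binary> -version`, or None if it is not on PATH."""
    path = shutil.which(binary)  # same PATH lookup a subprocess call would rely on
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    return first_line or None
```

If it returns None, the conversion step will fail until ffmpeg is installed and its directory is added to PATH.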

Configure Django settings:

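```
# In settings.py
import os  # needed for os.path and os.environ if not already imported

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

# Optional: AssemblyAI API key, read from the environment
ASSEMBLYAI_API_KEY = os.environ.get('ASSEMBLYAI_API_KEY', '')
```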

Run migrations:
```
python manage.py migrate
```

Frontend Setup

Navigate to the frontend directory:
```
cd app-frontend
```
Install dependencies:
```
npm install
```
Build the frontend:
```
npm run build
```

Usage

Start the Django server:
```
python manage.py runserver
```
Access the application at http://localhost:8000
Choose your input method:
- Record: Click the microphone icon to start recording, and click again to stop
- Upload: Switch to the upload tab, select a file, and click "Transcribe"
View the transcription results displayed on the page

API Endpoints

POST /api/transcriptions/: Submit audio for transcription
- Accepts multipart/form-data with an audio file
- Returns transcription text and status
GET /api/transcriptions/: List all transcriptions
- Returns a list of all transcription records
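Here is a minimal Python client for the two endpoints, using only the standard library. The multipart field name audio is an assumption because the form field the view expects is not documented here, so check the serializer or view before relying on it. The base URL matches the development server address from Usage.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Any, Dict, Tuple

BASE_URL = "http://localhost:8000/api/transcriptions/"
FIELD_NAME = "audio"  # ASSUMPTION: the actual upload field name is not documented


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> Tuple[bytes, str]:
    """Encode a single file as a multipart/form-data body; return (body, content_type)."""
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit_audio(path: str, url: str = BASE_URL, field: str = FIELD_NAME) -> Dict[str, Any]:
    """POST one file and return the decoded JSON response (transcription text and status)."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(field, os.path.basename(path), f.read(), uuid.uuid4().hex)
    req = urllib.request.Request(url, data=body, method="POST", headers={"Content-Type": content_type})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_transcriptions(url: str = BASE_URL) -> Any:
    """GET all records: a list, or a paginated object if DRF pagination is enabled."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A random boundary from uuid4 makes a collision with bytes inside the audio data extremely unlikely.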

Error Handling

The system handles several types of errors:

Unsupported file formats
Corrupted audio files
Network issues with AssemblyAI
Speech recognition failures

Each error is reported to the user interface with a descriptive message.
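One way to keep error reporting consistent is to map each error category above to a single user-facing message. The exception classes and messages below are hypothetical illustrations, not the project's actual types. The payload keys status and message are likewise assumptions about the response shape.

```python
from typing import Dict


# Hypothetical exception types, one per error category listed above.
class UnsupportedFormatError(Exception):
    """The upload is not a format ffmpeg can decode."""


class CorruptedAudioError(Exception):
    """The file has a known format but cannot be decoded."""


class TranscriptionServiceError(Exception):
    """A network or API failure while talking to AssemblyAI."""


class NoSpeechRecognizedError(Exception):
    """The engine returned no usable text."""


USER_MESSAGES = {
    UnsupportedFormatError: "This file type is not supported. Please upload an audio or video file.",
    CorruptedAudioError: "The file could not be decoded. It may be corrupted or incomplete.",
    TranscriptionServiceError: "The transcription service could not be reached. Please try again later.",
    NoSpeechRecognizedError: "No speech could be recognized in this recording.",
}


def error_payload(exc: Exception) -> Dict[str, str]:
    """Map an exception to a JSON-serializable error body for the UI."""
    for exc_type, message in USER_MESSAGES.items():
        if isinstance(exc, exc_type):
            return {"status": "error", "message": message}
    return {"status": "error", "message": "Transcription failed due to an unexpected error."}
```

In a view, errors raised by the SpeechRecognition library could be re-raised as these types before calling error_payload. For example, speech_recognition.UnknownValueError is raised when no speech is understood, and speech_recognition.RequestError is raised when a recognition service is unreachable.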

How It Works

Audio Input: The system accepts either recorded audio or uploaded files
Format Conversion: pydub (backed by ffmpeg) converts the input to WAV format
Transcription Engine Selection:
- If an AssemblyAI API key is configured (ASSEMBLYAI_API_KEY), the system uses cloud transcription
- Otherwise, it falls back to local speech_recognition
Transcription: The selected engine converts the WAV audio to text
Result Display: Transcription text is returned to the UI
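The engine selection step can be sketched as a small function that takes the two engines as callables. Reading the key from the ASSEMBLYAI_API_KEY environment variable mirrors the settings line in Backend Setup. In the Django app, the key would more likely come from settings.ASSEMBLYAI_API_KEY. The function name, the engine callables, and the returned engine labels are assumptions.

```python
import os
from typing import Callable, Dict, Optional

CloudEngine = Callable[[str, str], str]  # (wav_path, api_key) -> transcript text
LocalEngine = Callable[[str], str]       # wav_path -> transcript text


def transcribe(wav_path: str, cloud: CloudEngine, local: LocalEngine,
               api_key: Optional[str] = None) -> Dict[str, str]:
    """Use AssemblyAI when a key is configured, otherwise fall back to local recognition."""
    # An explicit api_key wins; otherwise read the same environment variable settings.py uses.
    key = api_key if api_key is not None else os.environ.get("ASSEMBLYAI_API_KEY", "")
    if key:
        return {"engine": "assemblyai", "text": cloud(wav_path, key)}
    return {"engine": "speech_recognition", "text": local(wav_path)}
```

Passing the engines in as callables lets the selection logic be tested without network access or audio libraries. In the real app, cloud could wrap the upload-and-poll flow, and local could wrap a speech_recognition Recognizer call.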

Future Improvements

Enhanced Transcription Quality:
- Implement noise reduction preprocessing
- Add speaker diarization (identify different speakers)
- Support for specialized vocabularies or domains
User Experience:
- Real-time transcription streaming during recording
- Progress indicators for longer files
- Interactive transcript editor for corrections
- Save and edit transcript history
Performance Optimizations:
- Background processing for large files
- Caching mechanism for previously processed audio
- Chunked processing for very large files
Additional Features:
- Multi-language support and language detection
- Timestamp generation for each sentence or paragraph
- Sentiment analysis integration
- Export options (TXT, SRT, DOCX, etc.)
- Audio/video bookmarking based on transcript content
Advanced Integrations:
- Multiple transcription API options (Google, Azure, etc.)
- Integration with content management systems
- Automated summarization of transcripts
- Keyword extraction and topic modeling
Security Enhancements:
- End-to-end encryption for sensitive audio content
- Enhanced access controls for shared transcriptions
- Compliance features for regulated industries
Deployment & Scaling:
- Containerization with Docker
- CI/CD pipeline setup
- Load balancing for high-traffic implementations
- Dedicated worker processes for transcription tasks

Repository: mutuajames/speech-to-text
