A robust web application for transcribing audio and video content with dual transcription engines, file upload capabilities, and a modern React frontend.
Features:
- Dual Transcription Engines: AssemblyAI cloud transcription when an API key is configured, with local speech_recognition as the fallback
- Multiple Input Methods:
  - Real-time audio recording
  - File upload supporting various audio/video formats
- Format Conversion: Automatically converts uploaded audio or video to WAV before transcription
- Robust Error Handling: Reports unsupported formats, corrupted files, and failed transcriptions to the user instead of failing silently
- Modern UI: Tabbed interface for recording and uploading, with file previews and status feedback
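
The format-conversion step can be sketched as follows. pydub relies on ffmpeg to decode and encode non-WAV formats; with pydub the equivalent call would be `AudioSegment.from_file(src).set_channels(1).set_frame_rate(16000).export(dst, format="wav")`. This sketch instead calls ffmpeg directly through `subprocess` so it has no third-party Python dependencies. The 16 kHz mono 16-bit PCM target and the helper names `wav_command`, `wav_path_for`, and `convert_to_wav` are assumptions for illustration, not the project's actual code.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed target format: 16 kHz, mono, 16-bit PCM WAV (a common choice for
# speech recognition; the project's real settings may differ).
SAMPLE_RATE = 16000
CHANNELS = 1


def wav_command(src: str, dst: str) -> List[str]:
    """Build the ffmpeg argument list that converts any audio/video file to WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input: any container/codec ffmpeg can decode
        "-vn",                     # drop video streams, keep audio only
        "-ac", str(CHANNELS),      # number of output channels
        "-ar", str(SAMPLE_RATE),   # output sample rate in Hz
        "-acodec", "pcm_s16le",    # uncompressed 16-bit little-endian PCM
        dst,
    ]


def wav_path_for(src: str) -> Path:
    """Choose an output path next to src, never overwriting src itself."""
    dst = Path(src).with_suffix(".wav")
    if dst == Path(src):
        dst = dst.with_name(dst.stem + ".converted.wav")
    return dst


def convert_to_wav(src: str) -> Path:
    """Convert src to WAV and return the new file's path."""
    dst = wav_path_for(src)
    # check=True raises CalledProcessError when ffmpeg cannot decode the input,
    # which the caller can report as a corrupted or unsupported file.
    subprocess.run(wav_command(src, str(dst)), check=True, capture_output=True)
    return dst
```

Normalizing every input to a single WAV layout means the transcription engines only ever see one format, whichever input method produced the audio.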

Backend:
- Django / Django REST Framework
- speech_recognition for local transcription
- AssemblyAI API integration for cloud transcription
- pydub and ffmpeg for audio/video conversion

Frontend:
- React
- MediaRecorder API for in-browser audio recording
- File handling and preview capabilities
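
The AssemblyAI integration can be sketched with the standard library alone, following the upload, create, and poll pattern of AssemblyAI's v2 REST API (`POST /v2/upload` returns an `upload_url`, `POST /v2/transcript` returns a job `id`, and `GET /v2/transcript/{id}` reports a `status` of `queued`, `processing`, `completed`, or `error`). The helper names `_request`, `start_transcript`, and `wait_for_transcript` are hypothetical, and the polling interval and timeout are assumptions. `get_json` and `sleep` are passed in so the polling logic can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.assemblyai.com/v2"
JsonGetter = Callable[[str], Dict[str, Any]]


def _request(url: str, api_key: str, data: Optional[bytes] = None,
             content_type: str = "application/json") -> Dict[str, Any]:
    """Send an authenticated GET (or POST when data is given) and decode the JSON reply."""
    headers = {"authorization": api_key, "content-type": content_type}
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def start_transcript(audio: bytes, api_key: str) -> str:
    """Upload raw audio bytes, queue a transcription job, and return its id."""
    upload = _request(f"{API_BASE}/upload", api_key, data=audio,
                      content_type="application/octet-stream")
    body = json.dumps({"audio_url": upload["upload_url"]}).encode()
    job = _request(f"{API_BASE}/transcript", api_key, data=body)
    return job["id"]


def wait_for_transcript(transcript_id: str, get_json: JsonGetter,
                        interval: float = 3.0, timeout: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the transcript until it completes or fails; return its text."""
    url = f"{API_BASE}/transcript/{transcript_id}"
    waited = 0.0
    while waited <= timeout:
        job = get_json(url)
        if job["status"] == "completed":
            return job["text"]
        if job["status"] == "error":
            raise RuntimeError(job.get("error") or "transcription failed")
        sleep(interval)  # still "queued" or "processing"
        waited += interval
    raise TimeoutError(f"transcript {transcript_id} not ready after {timeout} s")
```

A typical call sequence would be `tid = start_transcript(Path("clip.wav").read_bytes(), key)` followed by `wait_for_transcript(tid, lambda url: _request(url, key))`.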

Prerequisites:
- Python 3.7+
- Node.js and npm
- ffmpeg (for audio/video conversion)

Backend setup:

- Clone the repository:

      git clone <repository-url>
      cd <project-directory>

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install Python dependencies:

      pip install django djangorestframework pydub SpeechRecognition requests

- Install ffmpeg:
  - Ubuntu/Debian:

        sudo apt-get install ffmpeg

  - macOS:

        brew install ffmpeg

  - Windows: Download a build from ffmpeg.org and add its bin folder to PATH.

- Configure Django settings (in settings.py):

      import os

      MEDIA_URL = '/media/'
      MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

      # Optional: AssemblyAI API key, read from the environment
      ASSEMBLYAI_API_KEY = os.environ.get('ASSEMBLYAI_API_KEY', '')

- Run migrations:

      python manage.py migrate
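
Once the backend setup is done, a quick way to confirm that ffmpeg is reachable is to look it up on PATH, which is where pydub searches for it by default. The function names `find_ffmpeg` and `ffmpeg_version` are hypothetical helpers for this check, not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(name: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of the ffmpeg executable on PATH, or None."""
    return shutil.which(name)


def ffmpeg_version(name: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = find_ffmpeg(name)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Calling `ffmpeg_version()` from a Python shell inside the virtual environment should return the ffmpeg version banner; `None` means pydub will most likely not find ffmpeg either, so format conversion will fail.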

Frontend setup:

- Navigate to the frontend directory:

      cd frontend

- Install dependencies:

      npm install

- Build the frontend:

      npm run build

Running the application:

- Return to the project directory and start the Django server:

      cd ..
      python manage.py runserver

- Open http://localhost:8000 in your browser.

Usage:
- Choose your input method:
  - Record: Click the microphone icon to start recording, and click it again to stop.
  - Upload: Switch to the upload tab, select a file, and click "Transcribe".
- View the transcription results displayed on the page.

API endpoints:
- POST /api/transcriptions/: Submit audio for transcription
  - Accepts multipart/form-data with an audio file
  - Returns the transcription text and status
- GET /api/transcriptions/: List all transcriptions
  - Returns a list of all transcription records
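
A minimal client for these two endpoints can be written with the standard library, assuming the Django development server at http://localhost:8000. The form field name `audio` is a hypothetical placeholder (check the project's serializer or view for the real name), and the shape of the JSON reply depends on the project's serializer. `build_multipart` encodes the file by hand so the request format is visible.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Tuple

# Hypothetical name of the form field holding the file.
FIELD_NAME = "audio"
BASE_URL = "http://localhost:8000"  # Django development server


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex  # random, so it will not plausibly occur in the data
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit_audio(path: str, base_url: str = BASE_URL) -> Any:
    """POST a file to /api/transcriptions/ and return the decoded JSON reply."""
    file = Path(path)
    body, content_type = build_multipart(FIELD_NAME, file.name, file.read_bytes())
    req = urllib.request.Request(
        f"{base_url}/api/transcriptions/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def list_transcriptions(base_url: str = BASE_URL) -> Any:
    """GET /api/transcriptions/ and return the decoded JSON (the list of records)."""
    with urllib.request.urlopen(f"{base_url}/api/transcriptions/") as resp:
        return json.loads(resp.read())
```

With the server running, `submit_audio("meeting.mp3")` uploads a file and `list_transcriptions()` fetches the stored records.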
The system handles several types of errors:
- Unsupported file formats
- Corrupted audio files
- Network issues with AssemblyAI
- Speech recognition failures
Each error is reported to the user interface with a descriptive message.
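
One way to turn the error types above into user-facing messages is a small exception hierarchy plus a function that builds the JSON error body. All class and function names here, and the whitelist of extensions, are hypothetical. In a real view, library exceptions such as pydub's `CouldntDecodeError`, speech_recognition's `UnknownValueError`, or `requests.exceptions.RequestException` would be caught and re-raised as the matching class.

```python
from pathlib import Path

# Assumed whitelist of accepted extensions; the project's real list may differ.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".webm", ".flac", ".mp4", ".mov"}


class TranscriptionError(Exception):
    """Base class for failures that should be shown to the user."""
    user_message = "Transcription failed. Please try again."


class UnsupportedFormatError(TranscriptionError):
    user_message = "This file type is not supported."


class CorruptedAudioError(TranscriptionError):
    # e.g. raised after catching pydub.exceptions.CouldntDecodeError
    user_message = "The file could not be decoded; it may be corrupted."


class CloudServiceError(TranscriptionError):
    # e.g. raised after catching requests.exceptions.RequestException
    user_message = "Could not reach the AssemblyAI service. Please try again later."


class RecognitionError(TranscriptionError):
    # e.g. raised after catching speech_recognition.UnknownValueError
    user_message = "No speech could be recognized in the audio."


def check_format(filename: str) -> None:
    """Reject files whose extension is not on the whitelist."""
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise UnsupportedFormatError(filename)


def error_payload(exc: Exception) -> dict:
    """Turn any exception into the JSON body returned to the UI."""
    if isinstance(exc, TranscriptionError):
        message = exc.user_message
    else:
        message = TranscriptionError.user_message  # never leak internal details
    return {"status": "error", "message": message}
```

An extension check is only a cheap first filter; a file with a valid extension can still fail in ffmpeg, which is why decoding failures get their own class.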

How it works:
- Audio Input: The system accepts either recorded audio or an uploaded file.
- Format Conversion: pydub (using ffmpeg) converts the input to WAV.
- Transcription Engine Selection:
  - If an AssemblyAI API key is configured, the system uses cloud transcription.
  - Otherwise, it falls back to local speech_recognition.
- Processing: The selected engine converts the audio to text.
- Result Display: The transcription text is returned to the UI.
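
The engine-selection step can be sketched as below. The two engines are passed in as plain callables (hypothetical stand-ins for the AssemblyAI and speech_recognition code) so the selection logic can be shown without either library installed. The response keys `status`, `engine`, and `text` are assumptions about the API's payload.

```python
from typing import Callable, Dict, Optional, Tuple

# A transcriber takes the path of a WAV file and returns the recognized text.
Transcriber = Callable[[str], str]


def select_engine(api_key: Optional[str], cloud: Transcriber,
                  local: Transcriber) -> Tuple[str, Transcriber]:
    """Prefer AssemblyAI when a key is configured; otherwise use the local engine."""
    # settings.ASSEMBLYAI_API_KEY defaults to '', which is falsy, so an
    # unset environment variable selects the local engine.
    if api_key:
        return "assemblyai", cloud
    return "local", local


def transcribe(wav_path: str, api_key: Optional[str], cloud: Transcriber,
               local: Transcriber) -> Dict[str, str]:
    """Run the selected engine and build the response sent back to the UI."""
    name, engine = select_engine(api_key, cloud, local)
    return {"status": "completed", "engine": name, "text": engine(wav_path)}
```

Keeping the choice in one function means adding another provider later only changes `select_engine`, not the view that calls it.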

Planned improvements:
- Enhanced Transcription Quality:
  - Implement noise reduction preprocessing
  - Add speaker diarization (identify different speakers)
  - Support specialized vocabularies or domains
- User Experience:
  - Real-time transcription streaming during recording
  - Progress indicators for longer files
  - Interactive transcript editor for corrections
  - Save and edit transcript history
- Performance Optimizations:
  - Background processing for large files
  - Caching mechanism for previously processed audio
  - Chunked processing for very large files
- Additional Features:
  - Multi-language support and language detection
  - Timestamp generation for each sentence or paragraph
  - Sentiment analysis integration
  - Export options (TXT, SRT, DOCX, etc.)
  - Audio/video bookmarking based on transcript content
- Advanced Integrations:
  - Multiple transcription API options (Google, Azure, etc.)
  - Integration with content management systems
  - Automated summarization of transcripts
  - Keyword extraction and topic modeling
- Security Enhancements:
  - End-to-end encryption for sensitive audio content
  - Enhanced access controls for shared transcriptions
  - Compliance features for regulated industries
- Deployment & Scaling:
  - Containerization with Docker
  - CI/CD pipeline setup
  - Load balancing for high-traffic deployments
  - Dedicated worker processes for transcription tasks