A modern, feature-rich Text-to-Speech web application with multi-format file support, voice customization, and real-time audio controls.
This project demonstrates full-stack development expertise with a focus on modern web technologies, user experience, and real-time audio processing. It is built as a portfolio piece to showcase:
- 🎨 Modern UI/UX Design - Clean, intuitive interface with smooth animations powered by Framer Motion.
- 🏗️ Full-Stack Architecture - Robust React/TypeScript frontend paired with a high-performance Python FastAPI backend.
- 🎵 Real-time Audio Processing - Custom waveform visualization and seamless playback controls.
- Multi-Format Support - Intelligent parsing of PDF, DOCX, TXT, and Markdown files.
- Voice Customization - Diverse voice options with adjustable pitch, rate, and presets.
- ⚡ Performance Optimized - Fast development server and production builds using Vite.
- Voice Gallery - Browse and select from a wide range of TTS voices.
- Custom Voice Presets - Pre-configured celebrity/character voices with optimized settings.
- Real-time Controls - Adjust speech rate and pitch on the fly.
- Waveform Visualization - Visual feedback during playback.
- Audio Download - Export generated speech as MP3 files.
- Multi-Format Support - PDF, DOCX, TXT, Markdown.
- Smart Text Extraction - Preserves formatting and structure.
- Drag & Drop Upload - Intuitive file handling.
- Direct Text Input - Type or paste text directly.
- Responsive Design - Works seamlessly on desktop and mobile.
- Modern UI Components - Glassmorphism, smooth animations, and vibrant colors.
- Real-time Feedback - Loading states and progress indicators.
- Keyboard Shortcuts - Efficient workflow for power users.
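The rate and pitch controls ultimately have to be expressed as the signed strings that appear in the `/tts` request body, such as `"+0%"` and `"+0Hz"`. A minimal sketch of that conversion, assuming the sliders hold numeric offsets (the helpers `signedOffset`, `toRateString` and `toPitchString` are hypothetical and not taken from the project's source):

```typescript
// Hypothetical helpers (not from the project's source): turn numeric slider
// offsets into the signed strings used in the /tts body, e.g. "+0%", "+0Hz".

/** Rounds to an integer and prefixes "+" for zero and positive values. */
function signedOffset(value: number, unit: string): string {
  if (!Number.isFinite(value)) {
    throw new RangeError(`Offset must be a finite number, got ${value}`);
  }
  const rounded = Math.round(value);
  // Math.round(-0.4) yields -0; treat it as 0 so the result reads "+0".
  const n = rounded === 0 ? 0 : rounded;
  return `${n >= 0 ? "+" : ""}${n}${unit}`;
}

/** Speech-rate offset in percent: 25 -> "+25%", -10 -> "-10%". */
function toRateString(percentOffset: number): string {
  return signedOffset(percentOffset, "%");
}

/** Pitch offset in hertz: 0 -> "+0Hz", -5 -> "-5Hz". */
function toPitchString(hzOffset: number): string {
  return signedOffset(hzOffset, "Hz");
}

console.log(toRateString(25), toPitchString(-5)); // +25% -5Hz
```

Rounding keeps the values integral to match the `+0%` / `+0Hz` examples; this sketch does not assume anything about whether fractional offsets would also be accepted.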
- React 19 - Latest React features with functional components
- TypeScript - Type-safe development
- Vite - Next-generation build tool
- TailwindCSS - Utility-first CSS framework
- Framer Motion - Production-ready animation library
- Lucide React - Beautiful icon system
- FastAPI - High-performance Python web framework
- Edge-TTS - Microsoft Edge's TTS engine integration
- Uvicorn - Lightning-fast ASGI server
- PDF.js - PDF parsing and rendering
- Mammoth.js - DOCX to HTML conversion
- Marked - Markdown parser
- ESLint - Code quality and consistency
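Supporting several input formats largely comes down to routing each upload to the right parser before any text is extracted. The sketch below shows only that routing step, keyed on the file extension; the `detectFormat` function and `DocumentFormat` type are hypothetical, and it is an assumption that `src/utils/fileParsers.ts` dispatches this way. In the app, the `pdf` branch would be handled by PDF.js, `docx` by Mammoth.js and `md` by Marked.

```typescript
// Hypothetical routing step: decide which parser should handle an upload.
// Actual parsing (PDF.js, Mammoth.js, Marked) is deliberately left out.

type DocumentFormat = "pdf" | "docx" | "txt" | "md";

/** Returns the format for a file name, or null if it is unsupported. */
function detectFormat(fileName: string): DocumentFormat | null {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return null; // no extension at all
  const ext = fileName.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "pdf":
      return "pdf";
    case "docx":
      return "docx";
    case "txt":
      return "txt";
    case "md":
    case "markdown": // assumption: both common Markdown extensions are accepted
      return "md";
    default:
      return null;
  }
}

console.log(detectFormat("Report.PDF")); // pdf
```

A `switch` is used instead of an object lookup so that names like `file.constructor` cannot accidentally match an inherited property.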
Prerequisites:
- Node.js (v18 or higher)
- Python (v3.8 or higher)
- npm or pnpm
- Clone the repository: `git clone https://github.com/VijayAdithyaBK/text-reader.git`, then `cd text-reader`
- Install frontend dependencies from the project root: `npm install`
- Install backend dependencies: `cd backend`, then `pip install -r requirements.txt`
Option 1: Using the startup script (Windows): `./start_server.bat`

Option 2: Manual startup
- Start the backend server: `cd backend`, then `uvicorn main:app --reload` (uvicorn listens on http://127.0.0.1:8000 by default).
- In a new terminal, start the frontend from the project root: `npm run dev`
- Open your browser to http://localhost:5173
| Method | Endpoint | Description |
|---|---|---|
| GET | `/voices` | Fetch available TTS voices |
| POST | `/tts` | Generate speech from text |
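These two endpoints can be called from TypeScript along the following lines. This is a sketch under stated assumptions, not the project's actual client: `getVoices`, `synthesizeSpeech`, `FetchLike` and `DEFAULT_API_BASE` are hypothetical names, the base URL assumes uvicorn's default port 8000 (the app configures its own value in `src/App.tsx`), and `/tts` is assumed to answer with raw audio bytes. `fetch` is passed in as a parameter so the logic can be exercised without a running server.

```typescript
// Hypothetical client for GET /voices and POST /tts (not the app's own code).

interface TtsRequestBody {
  text: string;
  voice: string; // e.g. "en-US-GuyNeural"
  rate: string; // e.g. "+0%"
  pitch: string; // e.g. "+0Hz"
}

/** The subset of the Fetch API this sketch relies on. */
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<MinimalResponse>;

// Assumption: uvicorn's default address; the real value lives in src/App.tsx.
const DEFAULT_API_BASE = "http://localhost:8000";

/** Lists voices; the response shape is not documented here, so it stays unknown. */
async function getVoices(fetchFn: FetchLike, base: string = DEFAULT_API_BASE): Promise<unknown> {
  const res = await fetchFn(`${base}/voices`);
  if (!res.ok) throw new Error(`GET /voices failed with status ${res.status}`);
  return res.json();
}

/** Sends text to /tts and returns the audio bytes (assumed to be MP3). */
async function synthesizeSpeech(
  fetchFn: FetchLike,
  body: TtsRequestBody,
  base: string = DEFAULT_API_BASE,
): Promise<ArrayBuffer> {
  const res = await fetchFn(`${base}/tts`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST /tts failed with status ${res.status}`);
  return res.arrayBuffer();
}
```

In the browser you would pass the global `fetch`, for example `await synthesizeSpeech(fetch, { text: "Hello, world!", voice: "en-US-GuyNeural", rate: "+0%", pitch: "+0Hz" })`, then wrap the bytes in `new Blob([buffer], { type: "audio/mpeg" })` for playback or download (the MP3 MIME type is likewise an assumption based on the MP3 export feature).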
Request body for `/tts`:

```json
{
  "text": "Hello, world!",
  "voice": "en-US-GuyNeural",
  "rate": "+0%",
  "pitch": "+0Hz"
}
```

The application can be customized through:
- Voice Presets (`src/data/voicePresets.ts`) - Add custom voice configurations.
- Backend URL (`src/App.tsx`) - Configure the API endpoint.
- Tailwind Config - Customize design tokens.
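A voice preset essentially bundles a voice name with rate and pitch values in the same formats as the `/tts` body. The sketch below assumes a shape for entries in `src/data/voicePresets.ts` and shows how a preset could become a request body; `VoicePreset`, `EXAMPLE_PRESETS`, `presetToTtsBody` and `isValidPreset` are hypothetical names, and the example presets are illustrative rather than the ones the app ships.

```typescript
// Hypothetical preset shape; the real voicePresets.ts may use other fields.
interface VoicePreset {
  id: string;
  label: string;
  voice: string; // an edge-tts voice name
  rate: string; // signed percent, e.g. "-10%"
  pitch: string; // signed hertz, e.g. "-5Hz"
}

// Illustrative values only, not presets shipped with the app.
const EXAMPLE_PRESETS: VoicePreset[] = [
  { id: "default", label: "Default", voice: "en-US-GuyNeural", rate: "+0%", pitch: "+0Hz" },
  { id: "slow-narrator", label: "Slow Narrator", voice: "en-US-GuyNeural", rate: "-15%", pitch: "-5Hz" },
];

/** Checks that rate and pitch use the signed formats shown in the /tts body. */
function isValidPreset(preset: VoicePreset): boolean {
  return /^[+-]\d+%$/.test(preset.rate) && /^[+-]\d+Hz$/.test(preset.pitch);
}

/** Builds a /tts body from a preset id, falling back to the first preset. */
function presetToTtsBody(presets: VoicePreset[], presetId: string, text: string) {
  const preset = presets.find((p) => p.id === presetId) ?? presets[0];
  if (!preset) throw new Error("No voice presets configured");
  return { text, voice: preset.voice, rate: preset.rate, pitch: preset.pitch };
}
```

Falling back to the first preset keeps the UI usable if a saved preset id no longer exists; throwing instead would be an equally reasonable choice.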
```
text-reader/
├── src/
│   ├── components/              # React components
│   │   ├── Controls.tsx         # Audio control components
│   │   ├── FileUploader.tsx     # File upload handling
│   │   ├── TextInput.tsx        # Text input component
│   │   ├── VoiceGallery.tsx     # Voice selection UI
│   │   └── WaveformPlayer.tsx   # Audio visualization
│   ├── utils/
│   │   └── fileParsers.ts       # Multi-format file parsers
│   ├── data/
│   │   └── voicePresets.ts      # Voice configuration
│   └── App.tsx                  # Main application
├── backend/
│   ├── main.py                  # FastAPI server
│   └── requirements.txt         # Python dependencies
├── package.json
└── vite.config.ts
```
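`WaveformPlayer.tsx` handles the audio visualization. A common way to draw a waveform is to reduce the decoded samples (for instance the `Float32Array` returned by the Web Audio API's `AudioBuffer.getChannelData(0)`) to one peak amplitude per bar. The sketch below shows that reduction in isolation; whether the component works this way is an assumption, and `computePeaks` is a hypothetical name.

```typescript
// Hypothetical helper: reduce decoded samples (values in [-1, 1]) to
// `barCount` peak amplitudes in [0, 1], one per waveform bar.
function computePeaks(samples: ArrayLike<number>, barCount: number): number[] {
  if (barCount <= 0 || samples.length === 0) return [];
  const peaks: number[] = [];
  const samplesPerBar = samples.length / barCount;
  for (let bar = 0; bar < barCount; bar++) {
    const start = Math.floor(bar * samplesPerBar);
    // Every bar covers at least one sample, even when barCount > samples.length.
    const end = Math.max(start + 1, Math.floor((bar + 1) * samplesPerBar));
    let peak = 0;
    for (let i = start; i < end && i < samples.length; i++) {
      const amplitude = Math.abs(samples[i]);
      if (amplitude > peak) peak = amplitude;
    }
    peaks.push(Math.min(peak, 1)); // clamp slightly clipped samples
  }
  return peaks;
}

console.log(computePeaks([0.1, -0.5, 0.2, 0.3, -0.9, 0.0, 0.4, -0.4], 4)); // [ 0.5, 0.3, 0.9, 0.4 ]
```

Each peak can then be mapped to a bar height (for example `peak * canvasHeight`). Using the peak rather than the average keeps short, loud transients visible at low bar counts.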
- Separation of Concerns - Clean frontend/backend architecture.
- Type Safety - Full TypeScript coverage for maintainability.
- API Design - RESTful API with clear endpoints.
- State Management - React hooks for efficient state handling.
- Performance Optimizations - Lazy loading components, efficient blob handling, and tree-shaking.
- Code Quality - ESLint integration, modular component architecture, and graceful error handling.
- Build: Minified, tree-shaken production bundle generated by Vite
- First Contentful Paint: <1s
- Time to Interactive: <2s
- Multi-language support with i18n
- User authentication and saved preferences
- Cloud storage integration
- Batch processing for multiple files
- Advanced audio effects (reverb, echo, etc.)
- Voice cloning capabilities
- Progressive Web App (PWA) support
- Real-time collaboration features
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a pull request
This project is available for portfolio demonstration purposes.
Vijay Adithya B K
- 📧 Email: vijayadithyak@gmail.com
- 💼 LinkedIn: linkedin.com/in/vijayadithyabk
- Portfolio: vijayadithyabk.github.io/data-nexus/
- GitHub: @VijayAdithyaBK
- Microsoft Edge TTS for voice synthesis
- The React and FastAPI communities
⭐ If you find this project interesting, please consider giving it a star! ⭐
⚡ Crafted by Vijay Adithya B K