A modern, feature-rich Text-to-Speech web application built with React, TypeScript, and FastAPI. Supports multi-format file parsing (PDF, DOCX), real-time voice customization, and audio export.

VijayAdithyaBK/text-reader

🎙️ Text Reader - Advanced Text-to-Speech Platform

A modern, feature-rich Text-to-Speech web application with multi-format file support, voice customization, and real-time audio controls.

TypeScript React Vite FastAPI TailwindCSS

Report Bug • Request Feature


Table of Contents
  1. About The Project
  2. Tech Stack
  3. Getting Started
  4. Usage
  5. Architecture
  6. Roadmap
  7. Contributing
  8. License
  9. Contact
  10. Acknowledgments

🌟 About The Project

This project demonstrates full-stack development expertise with a focus on modern web technologies, user experience, and real-time audio processing. It is built as a portfolio piece to showcase:

  • 🎨 Modern UI/UX Design - Clean, intuitive interface with smooth animations powered by Framer Motion.
  • πŸ—οΈ Full-Stack Architecture - Robust React/TypeScript frontend paired with a high-performance Python FastAPI backend.
  • 🎡 Real-time Audio Processing - Custom waveform visualization and seamless playback controls.
  • πŸ“„ Multi-Format Support - Intelligent parsing of PDF, DOCX, TXT, and Markdown files.
  • 🎭 Voice Customization - Diverse voice options with adjustable pitch, rate, and presets.
  • ⚑ Performance Optimized - Lightning-fast development and production builds using Vite.

✨ Key Features

🎤 Voice & Audio

  • Voice Gallery - Browse and select from a wide range of TTS voices.
  • Custom Voice Presets - Pre-configured celebrity/character voices with optimized settings.
  • Real-time Controls - Adjust speech rate and pitch on the fly.
  • Waveform Visualization - Visual feedback during playback.
  • Audio Download - Export generated speech as MP3 files.

📁 File Processing

  • Multi-Format Support - PDF, DOCX, TXT, Markdown.
  • Smart Text Extraction - Preserves formatting and structure.
  • Drag & Drop Upload - Intuitive file handling.
  • Direct Text Input - Type or paste text directly.
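
The format support listed above implies a dispatch step that decides how an uploaded file should be parsed. The project's actual parsers live in src/utils/fileParsers.ts and are not shown here, so the following is only a language-neutral sketch written in Python. The `SUPPORTED_FORMATS` table and the `detect_format` helper are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the formats this README lists.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".md": "markdown",
    ".markdown": "markdown",
}


def detect_format(filename: str) -> str:
    """Return the format label for a filename, ignoring extension case."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        # Reject early so the caller can show a clear upload error.
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return SUPPORTED_FORMATS[suffix]
```

Rejecting unknown extensions before any parsing starts keeps the error message specific to the upload instead of surfacing later as a parser failure.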

🎨 User Experience

  • Responsive Design - Works seamlessly on desktop and mobile.
  • Modern UI Components - Glassmorphism, smooth animations, and vibrant colors.
  • Real-time Feedback - Loading states and progress indicators.
  • Keyboard Shortcuts - Efficient workflow for power users.

🛠️ Tech Stack

Frontend

  • React 19 - Latest React features with functional components
  • TypeScript - Type-safe development
  • Vite - Next-generation build tool
  • TailwindCSS - Utility-first CSS framework
  • Framer Motion - Production-ready animation library
  • Lucide React - Beautiful icon system

Backend

  • FastAPI - High-performance Python web framework
  • Edge-TTS - Microsoft Edge's TTS engine integration
  • Uvicorn - Lightning-fast ASGI server

Libraries & Tools

  • PDF.js - PDF parsing and rendering
  • Mammoth.js - DOCX to HTML conversion
  • Marked - Markdown parser
  • ESLint - Code quality and consistency

🚀 Getting Started

Prerequisites

  • Node.js (v18 or higher)
  • Python (v3.8 or higher)
  • npm or pnpm

Installation

  1. Clone the repository

    git clone https://github.com/VijayAdithyaBK/text-reader.git
    cd text-reader
  2. Install frontend dependencies

    npm install
  3. Install backend dependencies

    cd backend
    pip install -r requirements.txt

Running the Application

Option 1: Using the startup script (Windows)

./start_server.bat

Option 2: Manual startup

  1. From the project root, start the backend server:

    cd backend
    uvicorn main:app --reload
  2. In a new terminal, from the project root, start the frontend:

    npm run dev
  3. Open your browser to http://localhost:5173


⚡ Usage

🚦 API Endpoints

Method   Endpoint   Description
GET      /voices    Fetch available TTS voices
POST     /tts       Generate speech from text
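
As a rough illustration of the GET /voices endpoint, the sketch below fetches the voice list and filters it by locale using only the Python standard library. Several details are assumptions, not facts from this README: the backend address `http://localhost:8000` (uvicorn's default port), a JSON array as the response, and a `Locale` key on each voice entry as in edge-tts voice metadata. The helper names `fetch_voices` and `filter_by_locale` are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def fetch_voices(base_url: str = BASE_URL) -> List[Dict[str, Any]]:
    """Fetch the voice list; assumes the endpoint returns a JSON array."""
    with urlopen(f"{base_url}/voices", timeout=10) as response:
        return json.load(response)


def filter_by_locale(voices: List[Dict[str, Any]], prefix: str) -> List[Dict[str, Any]]:
    """Keep voices whose assumed "Locale" field starts with the given prefix."""
    return [voice for voice in voices if voice.get("Locale", "").startswith(prefix)]
```

With the backend running, `filter_by_locale(fetch_voices(), "en")` would return only the English voices, provided the response matches the assumed shape.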

Request Body for /tts:

{
  "text": "Hello, world!",
  "voice": "en-US-GuyNeural",
  "rate": "+0%",
  "pitch": "+0Hz"
}
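
The request body above can be built and sent from a short script. This is a minimal sketch and not the project's own client: it assumes the backend listens on `http://localhost:8000` (uvicorn's default) and that /tts responds with raw audio bytes. The helpers `signed`, `build_tts_payload`, and `synthesize` are hypothetical names.

```python
import json
from typing import Dict
from urllib.request import Request, urlopen


def signed(value: int, unit: str) -> str:
    """Format an integer with an explicit sign, e.g. 0 -> "+0%", -10 -> "-10%"."""
    return f"{value:+d}{unit}"


def build_tts_payload(text: str, voice: str = "en-US-GuyNeural",
                      rate_percent: int = 0, pitch_hz: int = 0) -> Dict[str, str]:
    """Build a body with the same four fields as the example request."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "voice": voice,
        "rate": signed(rate_percent, "%"),
        "pitch": signed(pitch_hz, "Hz"),
    }


def synthesize(payload: Dict[str, str], base_url: str = "http://localhost:8000") -> bytes:
    """POST the payload to /tts; assumes the response body is the audio itself."""
    request = Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=60) as response:
        return response.read()
```

With the backend running, the bytes returned by `synthesize(build_tts_payload("Hello, world!"))` could be written to a file such as `speech.mp3`, assuming the endpoint returns MP3 data as the Audio Download feature suggests.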

🔧 Configuration

The application can be customized through:

  • Voice Presets (src/data/voicePresets.ts) - Add custom voice configurations
  • Backend URL (src/App.tsx) - Configure API endpoint
  • Tailwind Config - Customize design tokens

πŸ—οΈ Architecture

πŸ“‚ Project Structure

text-reader/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ components/          # React components
β”‚   β”‚   β”œβ”€β”€ Controls.tsx     # Audio control components
β”‚   β”‚   β”œβ”€β”€ FileUploader.tsx # File upload handling
β”‚   β”‚   β”œβ”€β”€ TextInput.tsx    # Text input component
β”‚   β”‚   β”œβ”€β”€ VoiceGallery.tsx # Voice selection UI
β”‚   β”‚   └── WaveformPlayer.tsx # Audio visualization
β”‚   β”œβ”€β”€ utils/
β”‚   β”‚   └── fileParsers.ts   # Multi-format file parsers
β”‚   β”œβ”€β”€ data/
β”‚   β”‚   └── voicePresets.ts  # Voice configuration
β”‚   └── App.tsx              # Main application
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ main.py              # FastAPI server
β”‚   └── requirements.txt     # Python dependencies
β”œβ”€β”€ package.json
└── vite.config.ts

🎯 Technical Achievements

  • Separation of Concerns - Clean frontend/backend architecture.
  • Type Safety - Full TypeScript coverage for maintainability.
  • API Design - RESTful API with clear endpoints.
  • State Management - React hooks for efficient state handling.
  • Performance Optimizations - Lazy-loaded components, efficient blob handling, and tree-shaking.
  • Code Quality - ESLint integration, modular component architecture, and graceful error handling.

📊 Performance Metrics

  • Build Size: Optimized production bundle
  • First Contentful Paint: <1s
  • Time to Interactive: <2s
  • Lighthouse Score: not yet recorded

πŸ”œ Roadmap

  • Multi-language support with i18n
  • User authentication and saved preferences
  • Cloud storage integration
  • Batch processing for multiple files
  • Advanced audio effects (reverb, echo, etc.)
  • Voice cloning capabilities
  • Progressive Web App (PWA) support
  • Real-time collaboration features

🤝 Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is available for portfolio demonstration purposes.


👨‍💻 Contact

Vijay Adithya B K


🙏 Acknowledgments

  • Microsoft Edge TTS for voice synthesis
  • The React and FastAPI communities

⭐ If you find this project interesting, please consider giving it a star! ⭐

⚡ Crafted by Vijay Adithya B K