deepgram-devs/GPT5-Agent

VocalFlow 🎙️

Empowering Everyone to Build with Voice

This is technology as it should be: intuitive, accessible, and empowering. Your voice is your code, your ideas are your blueprint, and your imagination is the only limit.


A voice-first AI assistant that helps users design and build web applications through natural conversation. Simply describe your app idea by speaking, and watch as the AI generates a complete, professionally designed web application with real-time visual progress feedback.

🌟 Latest Enhancements

🧠 Enhanced AI Interaction

  • Problem-Focused AI: The AI now acts as an innovative problem-solving partner that challenges your assumptions and pushes for breakthrough solutions
  • Provocative Questions: AI asks challenging questions like "What assumption is everyone making that you could prove wrong?" and "What would make this 10x better than existing solutions?"
  • Faster Conversations: Reduced from 7-8 exchanges to just 3-4 focused exchanges
  • Smart Responsiveness: AI detects readiness signals like "let's build this" and "I'm ready" to move forward quickly
  • No Repetition: AI never repeats questions, keeping conversations fresh and efficient
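As a rough illustration of the readiness detection described above, the sketch below matches a normalized transcript against a short phrase list. The names `READY_PHRASES`, `normalizeUtterance`, and `isReadySignal` are hypothetical and not taken from the VocalFlow source; only the two example phrases come from this README.

```typescript
// Hypothetical helper: the phrase list and function names are illustrative,
// not the project's actual implementation.
const READY_PHRASES: readonly string[] = ["let's build this", "i'm ready"];

/** Lowercase, unify curly apostrophes, and collapse runs of whitespace. */
function normalizeUtterance(text: string): string {
  return text
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/\s+/g, " ")
    .trim();
}

/** True when the transcript contains any of the readiness phrases. */
function isReadySignal(transcript: string): boolean {
  const normalized = normalizeUtterance(transcript);
  return READY_PHRASES.some((phrase) => normalized.includes(phrase));
}
```

Plain substring matching is easy to reason about but can misfire on sentences such as "I'm ready to hear more", so a real agent would likely combine it with conversation context.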

📊 Visual Progress System

  • Real-Time Progress Bar: Elegant oval progress indicator (200px x 20px) positioned next to the VocalFlow branding
  • Dynamic Status Words: Rotating status words that change every 1.5 seconds per phase:
    • Ideation: "Discovering" → "Exploring" → "Analyzing" → "Investigating"
    • Prompt Review: "Designing" → "Planning" → "Structuring" → "Crafting"
    • Code Generation: "Building" → "Creating" → "Generating" → "Coding"
    • Voice Editing: "Refining" → "Polishing" → "Enhancing" → "Perfecting"
  • Accelerated Progress: 10x message multiplier with 30% maximum progress within each phase
  • Always Visible: Minimum 10% progress shown, floating below the header for constant visibility
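The rotation and clamping rules above could be sketched roughly as follows. This is one possible reading of the README, not the project's code: it assumes the 10x multiplier means each message adds 10 percentage points, that the 30% cap applies to progress gained inside one phase, and that the caller supplies the percentage at which a phase starts (`phaseStartPercent` is a hypothetical parameter).

```typescript
type Phase = "ideation" | "promptReview" | "codeGeneration" | "voiceEditing";

// Status words copied from the list above.
const STATUS_WORDS: Record<Phase, readonly string[]> = {
  ideation: ["Discovering", "Exploring", "Analyzing", "Investigating"],
  promptReview: ["Designing", "Planning", "Structuring", "Crafting"],
  codeGeneration: ["Building", "Creating", "Generating", "Coding"],
  voiceEditing: ["Refining", "Polishing", "Enhancing", "Perfecting"],
};

const ROTATION_MS = 1500;        // words change every 1.5 seconds
const MESSAGE_MULTIPLIER = 10;   // assumed: points gained per message
const MAX_PHASE_PROGRESS = 30;   // cap on progress gained within a phase
const MIN_VISIBLE_PROGRESS = 10; // the bar never shows less than this

/** Status word for a phase after `elapsedMs` milliseconds in that phase. */
function statusWordAt(phase: Phase, elapsedMs: number): string {
  const words = STATUS_WORDS[phase];
  const index = Math.floor(Math.max(0, elapsedMs) / ROTATION_MS) % words.length;
  return words[index];
}

/** Percentage to display, clamped to the range [10, 100]. */
function displayedProgress(phaseStartPercent: number, messageCount: number): number {
  const withinPhase = Math.min(
    Math.max(0, messageCount) * MESSAGE_MULTIPLIER,
    MAX_PHASE_PROGRESS,
  );
  return Math.min(100, Math.max(MIN_VISIBLE_PROGRESS, phaseStartPercent + withinPhase));
}
```

For example, `statusWordAt("ideation", 1500)` yields "Exploring", and `displayedProgress(0, 0)` yields the 10% floor.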

🔒 Advanced YAML Handling

  • Intelligent YAML Processing: System processes YAML specifications silently in the background
  • Clean User Experience: YAML content is completely hidden from users while still being processed by the system
  • Aggressive Filtering: Advanced detection patterns block any YAML content from being displayed or spoken
  • Seamless Flow: Users experience smooth transitions without technical interruptions
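The README does not document the detection patterns themselves. A minimal heuristic, assuming that YAML shows up either as a fenced block, a `---` document marker, or several `key:` lines, might look like the sketch below; `looksLikeYaml` and `filterForDisplay` are hypothetical names.

```typescript
// Heuristic patterns (assumptions, not the project's actual filter).
const YAML_FENCE = /`{3}ya?ml/i;          // opening of a fenced yaml block
const YAML_DOC_MARKER = /^---\s*$/m;      // YAML document start marker
const YAML_KEY_LINE = /^\s*(?:-\s+)?[A-Za-z_][\w-]*:(?:\s|$)/; // "key: value"

/** True when the text is likely to contain YAML. */
function looksLikeYaml(text: string): boolean {
  if (YAML_FENCE.test(text) || YAML_DOC_MARKER.test(text)) return true;
  const keyLines = text.split(/\r?\n/).filter((line) => YAML_KEY_LINE.test(line));
  // A single "Note: ..." line is common in prose, so require at least two.
  return keyLines.length >= 2;
}

/** Drops chunks that look like YAML before they are displayed or spoken. */
function filterForDisplay(chunks: string[]): string[] {
  return chunks.filter((chunk) => !looksLikeYaml(chunk));
}
```

Requiring two key-like lines trades some recall for fewer false positives on ordinary sentences that happen to contain a colon.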

🛠️ Complete Technical Stack

Core Technologies

  • Runtime: Node.js 18+ (JavaScript/TypeScript execution environment)
  • Frontend Framework: Next.js 14 (React-based full-stack framework)
  • UI Library: React 18 (Component-based user interface)
  • Language: TypeScript (Type-safe JavaScript)
  • Styling: Tailwind CSS (Utility-first CSS framework)
  • Icons: Lucide React (Modern icon library)

AI & Voice Processing

  • Language Model: OpenAI GPT-5 (Primary for conversation and code generation)
  • Speech-to-Text: Deepgram Nova-3 (Real-time speech recognition)
  • Text-to-Speech: Deepgram Aura-2 (Natural voice synthesis)
  • Audio Processing: Web Audio API (Browser-based audio handling)
  • Sample Rate: 24kHz (High-quality audio streaming)

Backend & Communication

  • WebSocket Server: ws (Real-time bidirectional communication)
  • HTTP Server: Node.js built-in (Static file serving)
  • Code Generation: OpenAI API (Dynamic application creation via GPT-5)
  • Local Preview: Child process spawning (Development server management)

Development Tools

  • Build System: Next.js SWC (Fast TypeScript/JavaScript compiler)
  • Linting: ESLint (Code quality and consistency)
  • CSS Processing: PostCSS + Autoprefixer (CSS optimization)
  • Package Manager: npm (Dependency management)

Critical OS Packages & Dependencies

System Requirements:

  • Operating System: macOS, Linux, or Windows 10+
  • Node.js: Version 18.0.0 or higher
  • npm: Version 8.0.0 or higher (comes with Node.js)
  • Memory: Minimum 4GB RAM (8GB recommended)
  • Storage: 500MB free space for project files

Browser Requirements:

  • Modern Browser: Chrome 88+, Firefox 85+, Safari 14+, Edge 88+
  • Microphone Access: Required for voice input
  • JavaScript: Must be enabled
  • WebSocket Support: Required for real-time communication

Network Requirements:

  • Internet Connection: Required for AI API calls
  • Firewall: Allow connections to OpenAI and Deepgram APIs
  • Ports: 3000, 3001, and dynamic ports 4000+ for generated apps
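To make the port layout concrete, here is a small sketch of how a preview port at 4000 or above might be chosen while skipping ports already handed out. The function `nextPreviewPort` and the bookkeeping set are hypothetical; the README only states the port numbers.

```typescript
const PREVIEW_PORT_START = 4000;                       // from the list above
const RESERVED_PORTS = new Set<number>([3000, 3001]);  // agent and frontend

/**
 * Returns the lowest port >= start that is neither reserved nor already
 * recorded as in use. This only consults the caller's bookkeeping; it does
 * not ask the operating system whether the port is actually free.
 */
function nextPreviewPort(
  inUse: ReadonlySet<number>,
  start: number = PREVIEW_PORT_START,
): number {
  for (let port = start; port <= 65535; port++) {
    if (!inUse.has(port) && !RESERVED_PORTS.has(port)) return port;
  }
  throw new Error("No free preview port available");
}
```

A real server would still need to handle a bind failure, since another process may hold the chosen port.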

🔄 Enhanced Workflow

Phase 1: Voice Ideation (Human ↔ AI Conversation)

User speaks → Deepgram STT → GPT-5 Processing → Deepgram TTS → User hears response
  1. Audio Capture: Browser captures microphone input at 24kHz
  2. Speech Recognition: Deepgram converts speech to text in real-time
  3. AI Processing: GPT-5 challenges assumptions and drives innovation
  4. Speech Synthesis: Deepgram converts AI responses back to speech
  5. Accelerated Flow: 3-4 focused exchanges with smart readiness detection
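Browser audio APIs deliver samples as 32-bit floats in the range -1 to 1, so step 1 usually involves a conversion before streaming. The sketch below assumes the agent forwards 16-bit linear PCM at 24kHz; the README does not state the wire encoding, and the function names are hypothetical.

```typescript
const SAMPLE_RATE_HZ = 24000; // sample rate stated in this README

/** Converts float samples in [-1, 1] to signed 16-bit PCM, clamping overshoot. */
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

/** Duration in milliseconds of a chunk with the given number of samples. */
function chunkDurationMs(sampleCount: number, sampleRateHz: number = SAMPLE_RATE_HZ): number {
  return (sampleCount * 1000) / sampleRateHz;
}
```

At 24kHz, a 2400-sample chunk covers 100 ms of audio, which is a convenient way to reason about streaming latency.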

Phase 2: Specification Generation (AI → Silent YAML)

Conversation Context → GPT-5 Analysis → Silent YAML Generation → Seamless Transition
  1. Context Analysis: AI reviews entire conversation history
  2. Requirement Extraction: Identifies key features, users, and technical needs
  3. Silent YAML Creation: Generates structured specification document in background
  4. Seamless Transition: Moves directly to code generation without user interruption

Phase 3: Code Generation (AI → Full Application)

YAML Specification → OpenAI GPT-5 Code Generation → File System Creation → Local Preview
  1. Specification Processing: AI analyzes YAML requirements silently
  2. Architecture Planning: Determines optimal file structure and components
  3. Code Generation: Creates complete Next.js application with TypeScript
  4. Visual Progress: Real-time progress bar with rotating status words
  5. File System Setup: Writes all files to local directory structure
  6. Development Server: Spawns local preview server for immediate testing

Phase 4: Preview & Iteration (Application → User)

Generated App → Local Server → Browser Preview → User Feedback → Refinements
  1. Server Startup: Launches Next.js development server
  2. Live Preview: Opens generated application in browser
  3. Real-time Updates: Hot reloading for any changes
  4. User Testing: Full interaction with generated application
  5. Voice Refinements: Natural language modifications and improvements

📊 Enhanced System Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Browser UI    │    │  Voice Agent    │    │   AI Services   │
│ (React/Next.js) │◄──►│  (Node.js/WS)   │◄──►│(OpenAI/Deepgram)│
│  Progress Bar   │    │  YAML Filter    │    │  GPT-5          │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Audio Stream   │    │  Code Generator │    │  Generated App  │
│ (WebAudio API)  │    │   (OpenAI API)  │    │  (Next.js App)  │
│ Status Updates  │    │ Progress Events │    │  Live Preview   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

✨ Features

  • 🗣️ Voice-First Interface: Speak naturally to describe your app idea
  • 🧠 Enhanced AI: GPT-5 conversation and code generation
  • 📊 Visual Progress: Real-time progress bar with dynamic status updates
  • 🤖 AI-Powered Generation: Uses OpenAI GPT-5 for understanding and code generation (Claude optional)
  • 🎨 Beautiful Design: Creates professionally designed applications, not plain templates
  • ⚡ Real-Time Preview: See your app come to life instantly with live preview
  • 📱 Responsive Design: Generated apps work perfectly on all devices
  • 🔧 Full-Stack: Generates complete Next.js applications with TypeScript
  • 🔄 Seamless Flow: Intelligent YAML processing without user interruption
  • 📥 Code Download: Download your generated applications as ZIP files

🚀 How It Works

VocalFlow follows an enhanced 4-phase workflow:

  1. 💡 Ideation Phase: Have a focused conversation with problem-solving AI (3-4 exchanges)
  2. 📝 Silent Processing: AI processes specifications in the background seamlessly
  3. ⚡ Code Generation: Watch real-time progress as AI generates your complete application
  4. 🎙️ Voice Refinement: Make natural language improvements and refinements

🛠️ Tech Stack

  • Frontend: Next.js 14, React 18, TypeScript, Tailwind CSS
  • Backend: Node.js, WebSocket (ws)
  • AI: OpenAI GPT-5 (Primary), Deepgram (STT & TTS), Anthropic Claude (optional)
  • Voice Processing: Real-time audio streaming with 24kHz sampling
  • Progress System: Custom React components with WebSocket event streaming

📋 Prerequisites

  • Node.js 18+
  • npm or yarn
  • OpenAI API key (GPT-5 access)
  • Deepgram API key
  • (Optional) Anthropic API key (for Claude fallback)

🔧 Setup

  1. Clone the repository

    git clone <your-repo-url>
    cd VoiceCreation
  2. Install dependencies

    npm install
  3. Set up environment variables. Create a .env file in the root directory:

    OPENAI_API_KEY=your_openai_api_key_here
    OPENAI_MODEL=gpt-5
    OPENAI_CODEGEN_MODEL=gpt-5
    DEEPGRAM_API_KEY=your_deepgram_api_key_here
    # Optional, only if using Claude for codegen fallback
    ANTHROPIC_API_KEY=your_anthropic_api_key_here
  4. Start the development server

    npm run dev

    This will start:

    • Voice agent server on http://localhost:3000
    • Frontend interface on http://localhost:3001
  5. Enable Code Download Feature (Optional)

    npm install archiver @types/archiver

    This adds the ability to download your generated applications as ZIP files for local development or deployment.
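Missing keys are a common cause of startup failures, so it can help to validate the environment once at boot. The sketch below is an assumption about how that might be done, not the project's code: `loadConfig` and `VocalFlowConfig` are hypothetical names, and the "gpt-5" defaults simply mirror the sample .env above.

```typescript
interface VocalFlowConfig {
  openaiApiKey: string;
  openaiModel: string;
  openaiCodegenModel: string;
  deepgramApiKey: string;
  anthropicApiKey?: string; // optional, only needed for the Claude fallback
}

type Env = Record<string, string | undefined>;

/** Builds a config object, throwing if a required key is absent or blank. */
function loadConfig(env: Env): VocalFlowConfig {
  const required = ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"];
  const missing = required.filter((name) => !(env[name] ?? "").trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env["OPENAI_API_KEY"] as string,
    openaiModel: env["OPENAI_MODEL"] || "gpt-5",
    openaiCodegenModel: env["OPENAI_CODEGEN_MODEL"] || "gpt-5",
    deepgramApiKey: env["DEEPGRAM_API_KEY"] as string,
    anthropicApiKey: env["ANTHROPIC_API_KEY"] || undefined,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` after the .env file has been loaded.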

🎯 Usage

  1. Open your browser and navigate to http://localhost:3001
  2. Click "Start Recording" to begin voice interaction
  3. Describe your app idea naturally
  4. Signal readiness by saying "Let's build this!"
  5. Watch real-time progress as the AI generates your complete application

📁 Project Structure

VoiceCreation/
├── agents/                 # AI agents for different phases
│   ├── ideation.ts        # Enhanced voice conversation agent with GPT-5
│   └── codeGen.ts         # OpenAI (GPT-5) code generation orchestrator
├── pages/                 # Next.js frontend pages
│   ├── _app.tsx          # App wrapper with global styles
│   └── index.tsx         # Main interface with visual progress system
├── utils/                 # Utility functions
│   ├── claudeCodegen.ts  # Claude code generation (optional)
│   ├── openaiCodegen.ts  # OpenAI GPT-5 code generation (default)
│   └── localPreview.ts   # Local development server
├── test/                  # Test files
│   └── testCodeGen.ts    # Code generation tests
├── generated/             # Generated project files (gitignored)
├── styles/               # Global styles
│   └── globals.css       # Tailwind CSS with progress bar styles
└── package.json          # Project dependencies

🧪 Testing

Run code generation tests:

npm run test:codegen

Run with custom YAML:

npm run test:codegen:custom

🎨 Generated App Features

Every generated application includes:

  • 🏠 Beautiful Landing Page: Professional hero section, features, testimonials
  • 📱 Responsive Design: Mobile-first approach with modern UI
  • 🎯 Conversion Focused: Clear CTAs and user journey
  • ⚡ Modern Tech Stack: Next.js 14, TypeScript, Tailwind CSS
  • 🔧 Ready to Deploy: Complete with package.json and config files
  • 🎭 Custom Design: Tailored to your app's target audience and purpose

🔧 Available Scripts

  • npm run dev - Start both agent and frontend in development mode
  • npm run agent - Start only the voice agent server
  • npm run frontend - Start only the frontend development server
  • npm run build - Build the TypeScript project
  • npm run start - Start the production server
  • npm run test:codegen - Test code generation functionality

🌟 Example Generated Apps

The system can generate various types of applications:

  • 📚 Educational Platforms: Learning management systems, tutoring marketplaces
  • 💼 Business Tools: CRM systems, project management, analytics dashboards
  • 🛒 E-commerce: Online stores, marketplaces, booking systems
  • 🎮 Entertainment: Gaming platforms, social apps, content creation tools
  • 🏥 Healthcare: Appointment booking, health tracking, telemedicine
  • 💰 Fintech: Payment systems, expense trackers, investment platforms

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Troubleshooting

Voice not working?

  • Check microphone permissions in your browser
  • Ensure you're using HTTPS or localhost
  • Verify Deepgram API key is set correctly

AI not challenging assumptions?

  • Ensure OpenAI API key is valid and has GPT-5 access
  • Check that the enhanced system prompts are loaded correctly
  • Verify the ideation agent is using the GPT-5 configuration

Progress bar not updating?

  • Check WebSocket connection in browser developer tools
  • Verify progress events are being sent from the backend
  • Ensure the progress multiplier is configured correctly (10x)

Code generation failing?

  • Verify OpenAI API key is valid and has sufficient credits
  • Check console logs for detailed error messages
  • Ensure all dependencies are installed
  • Verify YAML processing is working in the background

Preview not loading?

  • Check that ports 4000 and above are available for generated apps
  • Look for build errors in the generation logs
  • Verify the generated package.json has correct dependencies

🔄 Alternative Setup: Using Claude Codegen

By default, VocalFlow uses OpenAI GPT-5 for code generation. If you prefer Anthropic Claude instead, switch as follows:

Prerequisites for Claude Setup

  • Anthropic API key with Claude Sonnet access

Environment Variables

Add to your .env file:

ANTHROPIC_API_KEY=your_anthropic_api_key_here

Code Generation Changes

Change the import and call in agents/codeGen.ts:

// Change from:
import { runOpenAICodegen } from '../utils/openaiCodegen';

// To:
import { runClaudeCodegen } from '../utils/claudeCodegen';

Then update the function call:

// Change from:
const result = await runOpenAICodegen(yamlPrompt, sessionId, events);

// To:
const result = await runClaudeCodegen(yamlPrompt, sessionId, events);
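If editing the import by hand becomes tedious, one alternative is to select the generator at runtime. The sketch below is only an illustration: `pickCodegen` and the idea of a `CODEGEN_PROVIDER` setting are hypothetical and not part of VocalFlow, and the `Codegen` type merely mirrors the three-argument call shown above with the event and result types left generic.

```typescript
// Hypothetical function type matching the call shape shown above.
type Codegen<R> = (yamlPrompt: string, sessionId: string, events: unknown) => Promise<R>;

/** Returns the Claude generator when provider is "claude", else the OpenAI one. */
function pickCodegen<R>(
  provider: string | undefined,
  openai: Codegen<R>,
  claude: Codegen<R>,
): Codegen<R> {
  return provider?.trim().toLowerCase() === "claude" ? claude : openai;
}
```

With both generators imported, the call site could become `pickCodegen(provider, runOpenAICodegen, runClaudeCodegen)(yamlPrompt, sessionId, events)`, where `provider` is read from whatever setting the project chooses.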

Performance Notes

  • GPT-5 offers unified model usage for both voice and codegen
  • Claude Sonnet remains a solid alternative depending on preferences and cost

VocalFlow - Built with ❤️ using AI and voice technology

Transforming ideas into reality, one voice at a time.
