Empowering Everyone to Build with Voice
This is technology as it should be: intuitive, accessible, and empowering. Your voice is your code, your ideas are your blueprint, and your imagination is the only limit.
A voice-first AI assistant that helps users design and build web applications through natural conversation. Simply describe your app idea by speaking, and watch as the AI generates a complete, professionally designed web application with real-time visual progress feedback.
- Problem-Focused AI: The AI now acts as an innovative problem-solving partner that challenges your assumptions and pushes for breakthrough solutions
- Provocative Questions: AI asks challenging questions like "What assumption is everyone making that you could prove wrong?" and "What would make this 10x better than existing solutions?"
- Faster Conversations: Reduced from 7-8 exchanges to just 3-4 focused exchanges
- Smart Responsiveness: AI detects readiness signals like "let's build this" and "I'm ready" to move forward quickly
- No Repetition: AI never repeats questions, keeping conversations fresh and efficient
- Real-Time Progress Bar: Elegant oval progress indicator (200px x 20px) positioned next to the VocalFlow branding
- Dynamic Status Words: Rotating status words that change every 1.5 seconds per phase:
- Ideation: "Discovering" → "Exploring" → "Analyzing" → "Investigating"
- Prompt Review: "Designing" → "Planning" → "Structuring" → "Crafting"
- Code Generation: "Building" → "Creating" → "Generating" → "Coding"
- Voice Editing: "Refining" → "Polishing" → "Enhancing" → "Perfecting"
- Accelerated Progress: Each message advances the bar with a 10x multiplier, capped at 30% progress within each phase
- Always Visible: Minimum 10% progress shown, floating below the header for constant visibility
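The pacing described above can be sketched as two small pure helpers. The names (`progressFor`, `statusWordAt`) and phase keys are illustrative assumptions, not VocalFlow's actual implementation:

```typescript
// Illustrative sketch of the progress pacing; constants and names are
// assumptions, not VocalFlow's code.
const PHASE_SPAN = 30;    // each phase can contribute at most 30%
const MULTIPLIER = 10;    // 10x message multiplier
const MIN_PROGRESS = 10;  // the bar never shows less than 10%
const ROTATE_MS = 1500;   // status words rotate every 1.5 seconds

const STATUS_WORDS: Record<string, string[]> = {
  ideation: ['Discovering', 'Exploring', 'Analyzing', 'Investigating'],
  codegen: ['Building', 'Creating', 'Generating', 'Coding'],
};

// Map a phase's base offset plus the message count to a displayed percentage.
function progressFor(phaseBase: number, messageCount: number): number {
  const withinPhase = Math.min(messageCount * MULTIPLIER, PHASE_SPAN);
  return Math.max(MIN_PROGRESS, Math.min(100, phaseBase + withinPhase));
}

// Pick the status word to display for a phase at a given elapsed time.
function statusWordAt(phase: string, elapsedMs: number): string {
  const words = STATUS_WORDS[phase];
  return words[Math.floor(elapsedMs / ROTATE_MS) % words.length];
}
```

With these numbers, an empty session still renders 10%, and no single phase can run past its 30% budget no matter how chatty the backend is.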
- Intelligent YAML Processing: System processes YAML specifications silently in the background
- Clean User Experience: YAML content is completely hidden from users while still being processed by the system
- Aggressive Filtering: Advanced detection patterns block any YAML content from being displayed or spoken
- Seamless Flow: Users experience smooth transitions without technical interruptions
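One way the aggressive filtering might work is a small pass that scrubs YAML-looking spans from any text before it is displayed or spoken. The patterns below are an assumption, not the project's actual detection logic:

```typescript
// Hypothetical sketch of the YAML filter; VocalFlow's real detection
// patterns may be broader or stricter.
const YAML_PATTERNS: RegExp[] = [
  /```ya?ml[\s\S]*?```/g,                 // fenced ```yaml blocks
  /^---\n[\s\S]*?\n(?:---|\.\.\.)\s*$/m,  // bare yaml documents
];

// Remove any YAML-looking content before text reaches the UI or TTS.
function stripYaml(message: string): string {
  let cleaned = message;
  for (const pattern of YAML_PATTERNS) {
    cleaned = cleaned.replace(pattern, '');
  }
  return cleaned.trim();
}
```

The unfiltered message can still be handed to the backend for processing, so the specification survives even though the user never sees it.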
- Runtime: Node.js 18+ (JavaScript/TypeScript execution environment)
- Frontend Framework: Next.js 14 (React-based full-stack framework)
- UI Library: React 18 (Component-based user interface)
- Language: TypeScript (Type-safe JavaScript)
- Styling: Tailwind CSS (Utility-first CSS framework)
- Icons: Lucide React (Modern icon library)
- Language Model: OpenAI GPT-5 (Primary for conversation and code generation)
- Speech-to-Text: Deepgram Nova-3 (Real-time speech recognition)
- Text-to-Speech: Deepgram Aura-2 (Natural voice synthesis)
- Audio Processing: Web Audio API (Browser-based audio handling)
- Sample Rate: 24kHz (High-quality audio streaming)
- WebSocket Server: ws (Real-time bidirectional communication)
- HTTP Server: Node.js built-in (Static file serving)
- Code Generation: OpenAI API (Dynamic application creation via GPT-5)
- Local Preview: Child process spawning (Development server management)
- Build System: Next.js SWC (Fast TypeScript/JavaScript compiler)
- Linting: ESLint (Code quality and consistency)
- CSS Processing: PostCSS + Autoprefixer (CSS optimization)
- Package Manager: npm (Dependency management)
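The audio settings above (Web Audio API, 24kHz) might be requested from the browser via `getUserMedia` constraints like these. The exact constraint values are an assumption; browsers treat `sampleRate` as a hint, so the agent may still need to resample before streaming:

```typescript
// Assumed microphone constraints matching the 24 kHz pipeline above;
// not VocalFlow's actual capture configuration.
const CAPTURE_SAMPLE_RATE = 24_000;

function captureConstraints() {
  return {
    audio: {
      sampleRate: CAPTURE_SAMPLE_RATE, // 24 kHz, matching the streaming pipeline
      channelCount: 1,                 // mono is enough for speech recognition
      echoCancellation: true,          // assumed default for a voice UI
    },
  };
}

// In the browser this would be passed to:
// navigator.mediaDevices.getUserMedia(captureConstraints())
```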
System Requirements:
- Operating System: macOS, Linux, or Windows 10+
- Node.js: Version 18.0.0 or higher
- npm: Version 8.0.0 or higher (comes with Node.js)
- Memory: Minimum 4GB RAM (8GB recommended)
- Storage: 500MB free space for project files
Browser Requirements:
- Modern Browser: Chrome 88+, Firefox 85+, Safari 14+, Edge 88+
- Microphone Access: Required for voice input
- JavaScript: Must be enabled
- WebSocket Support: Required for real-time communication
Network Requirements:
- Internet Connection: Required for AI API calls
- Firewall: Allow connections to OpenAI and Deepgram APIs
- Ports: 3000, 3001, and dynamic ports 4000+ for generated apps
User speaks → Deepgram STT → GPT-5 Processing → Deepgram TTS → User hears response
- Audio Capture: Browser captures microphone input at 24kHz
- Speech Recognition: Deepgram converts speech to text in real-time
- AI Processing: GPT-5 challenges assumptions and drives innovation
- Speech Synthesis: Deepgram converts AI responses back to speech
- Accelerated Flow: 3-4 focused exchanges with smart readiness detection
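The readiness detection that ends the ideation loop could be as simple as a phrase match on the transcript. The signal list below is a guess based on the examples mentioned earlier, not the agent's actual trigger set:

```typescript
// Hypothetical readiness detection; the actual signal phrases VocalFlow
// listens for may differ.
const READY_SIGNALS = ["let's build this", "i'm ready", 'build it'] as const;

// True when the transcript contains a phrase signalling the user wants
// to move from ideation to code generation.
function userIsReady(transcript: string): boolean {
  const lower = transcript.toLowerCase();
  return READY_SIGNALS.some((signal) => lower.includes(signal));
}
```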
Conversation Context → GPT-5 Analysis → Silent YAML Generation → Seamless Transition
- Context Analysis: AI reviews entire conversation history
- Requirement Extraction: Identifies key features, users, and technical needs
- Silent YAML Creation: Generates structured specification document in background
- Seamless Transition: Moves directly to code generation without user interruption
YAML Specification → OpenAI GPT-5 Code Generation → File System Creation → Local Preview
- Specification Processing: AI analyzes YAML requirements silently
- Architecture Planning: Determines optimal file structure and components
- Code Generation: Creates complete Next.js application with TypeScript
- Visual Progress: Real-time progress bar with rotating status words
- File System Setup: Writes all files to local directory structure
- Development Server: Spawns local preview server for immediate testing
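The progress updates driving the visual bar would be pushed over the WebSocket as small JSON events. The field names below are illustrative, not VocalFlow's actual wire format:

```typescript
// Assumed shape of a progress event; field names are an illustration,
// not the project's real protocol.
interface ProgressEvent {
  type: 'progress';
  phase: 'ideation' | 'prompt_review' | 'codegen' | 'editing';
  percent: number;
  statusWord: string;
}

// Serialize a progress update for the WebSocket channel.
function progressEvent(
  phase: ProgressEvent['phase'],
  percent: number,
  statusWord: string,
): string {
  const event: ProgressEvent = { type: 'progress', phase, percent, statusWord };
  return JSON.stringify(event);
}
```

On the frontend, a message handler would parse events of `type: 'progress'` and feed `percent` and `statusWord` straight into the progress bar component.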
Generated App → Local Server → Browser Preview → User Feedback → Refinements
- Server Startup: Launches Next.js development server
- Live Preview: Opens generated application in browser
- Real-time Updates: Hot reloading for any changes
- User Testing: Full interaction with generated application
- Voice Refinements: Natural language modifications and improvements
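Spawning the preview server for a generated app might look like the sketch below. This is a rough assumption about what `utils/localPreview.ts` does; the real helper may manage ports, logging, and shutdown differently:

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

// Build the arguments for running the generated Next.js app on a given port.
function previewArgs(port: number): string[] {
  return ['run', 'dev', '--', '--port', String(port)];
}

// Launch `npm run dev` inside the generated app's directory.
// Hypothetical sketch, not the actual localPreview implementation.
function startPreview(appDir: string, port: number): ChildProcess {
  return spawn('npm', previewArgs(port), { cwd: appDir, stdio: 'inherit' });
}
```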
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Browser UI     │    │   Voice Agent    │    │   AI Services    │
│ (React/Next.js)  │───►│  (Node.js/WS)    │───►│ (OpenAI/Deepgram)│
│  Progress Bar    │    │   YAML Filter    │    │      GPT-5       │
└──────────────────┘    └──────────────────┘    └──────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Audio Stream   │    │  Code Generator  │    │  Generated App   │
│  (WebAudio API)  │    │   (OpenAI API)   │    │  (Next.js App)   │
│  Status Updates  │    │ Progress Events  │    │  Live Preview    │
└──────────────────┘    └──────────────────┘    └──────────────────┘
- Voice-First Interface: Speak naturally to describe your app idea
- Enhanced AI: GPT-5 conversation and code generation
- Visual Progress: Real-time progress bar with dynamic status updates
- AI-Powered Generation: Uses OpenAI GPT-5 for understanding and code generation (Claude optional)
- Beautiful Design: Creates professionally designed applications, not plain templates
- Real-Time Preview: See your app come to life instantly with live preview
- Responsive Design: Generated apps work perfectly on all devices
- Full-Stack: Generates complete Next.js applications with TypeScript
- Seamless Flow: Intelligent YAML processing without user interruption
- Code Download: Download your generated applications as ZIP files
VocalFlow follows an enhanced 4-phase workflow:
- Ideation Phase: Have a focused conversation with the problem-solving AI (3-4 exchanges)
- Silent Processing: AI processes specifications in the background seamlessly
- Code Generation: Watch real-time progress as the AI generates your complete application
- Voice Refinement: Make natural language improvements and refinements
- Frontend: Next.js 14, React 18, TypeScript, Tailwind CSS
- Backend: Node.js, WebSocket (ws)
- AI: OpenAI GPT-5 (Primary), Deepgram (STT & TTS), Anthropic Claude (optional)
- Voice Processing: Real-time audio streaming with 24kHz sampling
- Progress System: Custom React components with WebSocket event streaming
- Node.js 18+
- npm or yarn
- OpenAI API key (GPT-5 access)
- Deepgram API key
- (Optional) Anthropic API key (for Claude fallback)
- Clone the repository

  git clone <your-repo-url>
  cd VoiceCreation

- Install dependencies

  npm install

- Set up environment variables

  Create a .env file in the root directory:

  OPENAI_API_KEY=your_openai_api_key_here
  OPENAI_MODEL=gpt-5
  OPENAI_CODEGEN_MODEL=gpt-5
  DEEPGRAM_API_KEY=your_deepgram_api_key_here
  # Optional, only if using Claude for codegen fallback
  ANTHROPIC_API_KEY=your_anthropic_api_key_here

- Start the development server

  npm run dev

  This will start:
  - Voice agent server on http://localhost:3000
  - Frontend interface on http://localhost:3001

- Enable Code Download Feature (Optional)

  npm install archiver @types/archiver

  This adds the ability to download your generated applications as ZIP files for local development or deployment.
- Open your browser and navigate to http://localhost:3001
- Click "Start Recording" to begin voice interaction
- Describe your app idea naturally
- Signal readiness by saying "Let's build this!"
- Watch real-time progress as the AI generates your complete application
VoiceCreation/
├── agents/              # AI agents for different phases
│   ├── ideation.ts      # Enhanced voice conversation agent with GPT-5
│   └── codeGen.ts       # OpenAI (GPT-5) code generation orchestrator
├── pages/               # Next.js frontend pages
│   ├── _app.tsx         # App wrapper with global styles
│   └── index.tsx        # Main interface with visual progress system
├── utils/               # Utility functions
│   ├── claudeCodegen.ts # Claude code generation (optional)
│   ├── openaiCodegen.ts # OpenAI GPT-5 code generation (default)
│   └── localPreview.ts  # Local development server
├── test/                # Test files
│   └── testCodeGen.ts   # Code generation tests
├── generated/           # Generated project files (gitignored)
├── styles/              # Global styles
│   └── globals.css      # Tailwind CSS with progress bar styles
└── package.json         # Project dependencies
Run code generation tests:
npm run test:codegen
Run with custom YAML:
npm run test:codegen:custom
Every generated application includes:
- Beautiful Landing Page: Professional hero section, features, testimonials
- Responsive Design: Mobile-first approach with modern UI
- Conversion Focused: Clear CTAs and user journey
- Modern Tech Stack: Next.js 14, TypeScript, Tailwind CSS
- Ready to Deploy: Complete with package.json and config files
- Custom Design: Tailored to your app's target audience and purpose
- npm run dev - Start both agent and frontend in development mode
- npm run agent - Start only the voice agent server
- npm run frontend - Start only the frontend development server
- npm run build - Build the TypeScript project
- npm run start - Start the production server
- npm run test:codegen - Test code generation functionality
The system can generate various types of applications:
- Educational Platforms: Learning management systems, tutoring marketplaces
- Business Tools: CRM systems, project management, analytics dashboards
- E-commerce: Online stores, marketplaces, booking systems
- Entertainment: Gaming platforms, social apps, content creation tools
- Healthcare: Appointment booking, health tracking, telemedicine
- Fintech: Payment systems, expense trackers, investment platforms
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Voice not working?
- Check microphone permissions in your browser
- Ensure you're using HTTPS or localhost
- Verify Deepgram API key is set correctly
AI not challenging assumptions?
- Ensure OpenAI API key is valid and has GPT-5 access
- Check that the enhanced system prompts are loaded correctly
- Verify the ideation agent is using the GPT-5 configuration
Progress bar not updating?
- Check WebSocket connection in browser developer tools
- Verify progress events are being sent from the backend
- Ensure the progress multiplier is configured correctly (10x)
Code generation failing?
- Verify OpenAI API key is valid and has sufficient credits
- Check console logs for detailed error messages
- Ensure all dependencies are installed
- Verify YAML processing is working in the background
Preview not loading?
- Check that ports in the 4000+ range are available for generated apps
- Look for build errors in the generation logs
- Verify the generated package.json has correct dependencies
By default, VocalFlow uses OpenAI GPT-5 for code generation. If you prefer Anthropic Claude instead, switch as follows:
- Anthropic API key with Claude Sonnet access
Add to your .env file:
ANTHROPIC_API_KEY=your_anthropic_api_key_here
Change the import and call in agents/codeGen.ts:
// Change from:
import { runOpenAICodegen } from '../utils/openaiCodegen';
// To:
import { runClaudeCodegen } from '../utils/claudeCodegen';
Then update the function call:
// Change from:
const result = await runOpenAICodegen(yamlPrompt, sessionId, events);
// To:
const result = await runClaudeCodegen(yamlPrompt, sessionId, events);
- GPT-5 offers unified model usage for both voice and codegen
- Claude Sonnet remains a solid alternative depending on preferences and cost
VocalFlow - Built with ❤️ using AI and voice technology
Transforming ideas into reality, one voice at a time.