A production-ready AI-powered developer assistant built entirely on Cloudflare's edge computing platform. DevAssist combines state-of-the-art LLM capabilities with semantic search to help developers build, understand, and deploy Cloudflare Workers applications through natural language interaction.
Built for Cloudflare AI Use Cases - A comprehensive demonstration of Cloudflare's AI infrastructure, showcasing Workers AI, Agents SDK, Vectorize, Durable Objects, and Pages working together in a cohesive, production-grade application.
- Intelligent Code Generation: Transform natural language descriptions into complete, production-ready Cloudflare Workers code. The system generates multi-file project structures with proper TypeScript types, error handling, and Cloudflare best practices. Code blocks are intelligently parsed with filename extraction and context-aware ordering.
- Advanced RAG Pipeline: Implements a sophisticated Retrieval Augmented Generation system using Vectorize for semantic search over Cloudflare's documentation. The pipeline uses the BGE (BAAI General Embedding) model to generate 768-dimensional embeddings, enabling contextually relevant documentation retrieval that enhances LLM responses with accurate, up-to-date information.
- Stateful AI Agent Architecture: Built on Cloudflare's Agents SDK, the assistant maintains persistent state across sessions using SQLite. Each conversation preserves context, project state, and generated code history, enabling multi-turn interactions that build upon previous exchanges.
- Real-time Communication: WebSocket-based bidirectional communication with automatic HTTP fallback. Features include streaming responses, dynamic progress indicators, and typing animations that provide immediate user feedback during AI processing.
- Production-Grade Frontend: Modern, responsive UI built with React and Tailwind CSS. Features VS Code-style syntax highlighting, smooth animations, and an intuitive chat interface that scales seamlessly across devices.
- Model Management: Intelligent fallback system using Llama 3.3 70B (fp8-fast) as the primary model with automatic fallback to Llama 3.1 8B for broader availability. Dynamic token allocation based on query complexity (1024-1536 tokens for chat, 3072 for code generation).
- Performance Optimizations: Parallel execution of Vectorize queries and database operations, truncated context windows, and optimized embedding queries. Response times are typically 1-3 seconds for chat and 2-5 seconds for code generation.
- Edge-Native Architecture: Fully serverless deployment across Cloudflare's global network, ensuring low-latency responses from 300+ data centers worldwide.
DevAssist is architected as a distributed system leveraging Cloudflare's edge computing infrastructure. The application demonstrates enterprise-grade patterns including stateful agents, semantic search, and real-time communication.
Cloudflare Worker (Entry Point)
- Routes requests to the Durable Object agent
- Handles CORS and authentication
- Manages Vectorize population endpoint
- Provides health check and monitoring endpoints
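As a rough illustration, the entry point can be sketched as a standard `fetch` handler that dispatches on the URL path. The binding names match those documented in the Configuration section below; the routing details and the session-naming scheme are simplified assumptions, not the exact contents of `src/index.ts`.

```typescript
// Hypothetical simplification of the entry-point Worker.
export interface Env {
  AI: Ai;
  VECTORIZE_INDEX: VectorizeIndex;
  DEVELOPER_AGENT: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === "/health") {
      return Response.json({ status: "ok" });
    }

    if (url.pathname.startsWith("/agent")) {
      // One agent instance per session; the real code may derive the
      // name from a session or conversation ID instead.
      const id = env.DEVELOPER_AGENT.idFromName("default-session");
      return env.DEVELOPER_AGENT.get(id).fetch(request);
    }

    return new Response("Not found", { status: 404 });
  },
};
```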
Durable Object (DeveloperAssistantAgent)
- Extends Cloudflare's Agents SDK for stateful AI agent capabilities
- Manages WebSocket connections for real-time bidirectional communication
- Implements SQLite database for persistent conversation history and project state
- Handles chat message processing, code generation, and documentation search
- Maintains isolated state per agent instance with automatic scaling
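A minimal skeleton of such an agent, assuming the Agents SDK's `Agent` base class and its WebSocket hooks, and reusing the `Env` interface from the sketch above. The actual `src/agent.ts` is more elaborate, and the message shapes here are illustrative:

```typescript
import { Agent, Connection } from "agents";

export class DeveloperAssistantAgent extends Agent<Env> {
  // Invoked for each WebSocket message from the frontend.
  async onMessage(connection: Connection, message: string) {
    const { type, content } = JSON.parse(message);
    if (type === "chat") {
      const reply = await this.processChatMessage(content);
      connection.send(JSON.stringify({ type: "response", content: reply }));
    }
  }

  async processChatMessage(content: string): Promise<string> {
    // RAG pipeline: see the data-flow sketch later in this README.
    return "...";
  }
}
```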
Workers AI Integration
- LLM Models: Primary Llama 3.3 70B (fp8-fast) with intelligent fallback to Llama 3.1 8B
- Embedding Model: BGE (BAAI General Embedding) v1.5 for 768-dimensional vector generation
- Serverless GPU inference at the edge with automatic model selection and error handling
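For illustration, both kinds of call go through the `AI` binding's `run()` method; the model IDs are the ones named above, while the prompt contents are placeholders:

```typescript
async function exampleInference(env: Env) {
  // Embedding: BGE base v1.5 returns 768-dimensional vectors.
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: ["How do I use Durable Objects?"],
  });
  const embedding: number[] = data[0]; // 768 floats

  // Chat completion with the primary model; the fallback chain is
  // sketched in the Performance Optimizations section below.
  const answer = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
    messages: [
      { role: "system", content: "You are a Cloudflare Workers expert." },
      { role: "user", content: "Explain Durable Objects briefly." },
    ],
    max_tokens: 1024,
  });
  return { embedding, answer };
}
```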
Vectorize (Vector Database)
- Stores 768-dimensional embeddings of Cloudflare documentation
- Cosine similarity search for semantic retrieval
- Optimized queries with topK=3 for performance
- Metadata storage for title, content, and URL references
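A sketch of the store-and-query pattern this implies. The helper names are hypothetical, and the exact form of the `returnMetadata` option depends on your Vectorize API version:

```typescript
// Hypothetical helpers showing the Vectorize usage described above.
async function storeDoc(env: Env, id: string, text: string, title: string, url: string) {
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [text] });
  await env.VECTORIZE_INDEX.upsert([
    { id, values: data[0], metadata: { title, content: text, url } },
  ]);
}

async function searchDocs(env: Env, query: string) {
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
  const result = await env.VECTORIZE_INDEX.query(data[0], {
    topK: 3,               // reduced from 5 for latency (see Performance)
    returnMetadata: "all", // include title/content/url with each match
  });
  return result.matches;   // each match carries id, score, and metadata
}
```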
Cloudflare Pages (Frontend)
- Edge-hosted static site with Pages Functions for API proxying
- React-based UI with real-time WebSocket communication
- Automatic global CDN distribution
SQLite Database (via Agents SDK)
- `conversation_messages`: Persistent chat history with conversation grouping
- `project_state`: Project context and generated code history
- Indexed queries for fast conversation retrieval
- Automatic schema initialization and migrations
1. User Request → Frontend sends the message via WebSocket or HTTP
2. Worker Routing → Main Worker routes to the Durable Object instance
3. Parallel Processing → The agent simultaneously:
   - Generates a query embedding using the BGE model
   - Queries Vectorize for relevant documentation
   - Retrieves conversation history from SQLite
4. RAG Context Assembly → Documentation chunks are formatted and truncated for optimal token usage
5. LLM Inference → System prompt + context + history are sent to the Llama model
6. Response Processing → Generated code is parsed, ordered, and formatted
7. State Persistence → Conversation and project state are saved to SQLite
8. Real-time Delivery → Response is streamed back via WebSocket with progress updates
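Putting these steps together, one chat turn might look like the following sketch. It reuses the hypothetical `searchDocs` helper from earlier; `loadHistory` is likewise a stand-in for the agent's SQLite history query, not the real function name:

```typescript
type RoleMessage = { role: "system" | "user" | "assistant"; content: string };

// Stand-in for the agent's SQLite query (last 6 messages).
declare function loadHistory(conversationId: string): Promise<RoleMessage[]>;

async function handleChat(env: Env, conversationId: string, userMessage: string) {
  // Step 3: retrieval and history load run in parallel.
  const [docs, history] = await Promise.all([
    searchDocs(env, userMessage),
    loadHistory(conversationId),
  ]);

  // Step 4: assemble and truncate documentation context (~1000 chars).
  const context = docs
    .map((m) => `${m.metadata?.title}: ${m.metadata?.content}`)
    .join("\n")
    .slice(0, 1000);

  // Step 5: system prompt + context + history sent to the LLM.
  return env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
    messages: [
      { role: "system", content: `Answer using this documentation:\n${context}` },
      ...history,
      { role: "user", content: userMessage },
    ],
    max_tokens: 1536,
  });
}
```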
- Node.js 20+ and npm (or nvm for automatic version management)
- Cloudflare account (free accounts work fine)
- Workers AI enabled in your Cloudflare dashboard
- Vectorize enabled (may require account verification)
Note: This project uses a virtual environment approach (similar to Python's venv):
- The Node.js version is pinned in `.nvmrc` (automatically managed)
- Dependencies are installed locally in `node_modules/` (isolated from global packages)
- The setup script automatically handles Node.js version switching using nvm
The easiest way to get started is using the automated setup script:
```bash
./start.sh
```

For Windows users: If you don't have bash, install one of these or perform the manual setup below:
- Git Bash (recommended): Download from https://git-scm.com/downloads
- WSL (Windows Subsystem for Linux): Follow Microsoft's WSL installation guide
Then run ./start.sh in Git Bash or WSL.
This script will:
- Check prerequisites (Node.js, npm)
- Install dependencies locally
- Authenticate with Cloudflare (opens browser)
- Auto-detect your account ID (uses environment variable for security)
- Create Vectorize index if needed
- Deploy the Worker
- Populate Vectorize with documentation
- Start the local frontend development server
The frontend will be available at http://localhost:8788 (or the next available port if 8788 is in use) and will connect to your deployed Worker. The actual port will be shown in the terminal output.
Important Notes:
- The account ID is set via the `CLOUDFLARE_ACCOUNT_ID` environment variable (never committed to git)
- If deployment fails with a "workers.dev subdomain" error, visit https://dash.cloudflare.com → Workers & Pages → open the Workers menu to create your subdomain
If you prefer to set up manually:
```bash
git clone https://github.com/munish-shah/cf_ai_.git
cd cf_ai
npm install
npx wrangler login
```

This will open your browser for authentication. Free Cloudflare accounts work fine for this project.
Important: Never commit your account ID to git! The setup script will auto-detect it, or you can set it in a .env file.
Option 1: Use .env file (Recommended for manual setup)
Create a .env file in the project root:
Linux/macOS/Git Bash:

```bash
# Copy the example file
cp .env.example .env

# Edit .env and add your account ID
# CLOUDFLARE_ACCOUNT_ID=your-account-id-here
```

Windows PowerShell:

```powershell
# Copy the example file
copy .env.example .env

# Edit .env and add your account ID
# CLOUDFLARE_ACCOUNT_ID=your-account-id-here
```

Get your account ID:

```bash
npx wrangler whoami
```

The .env file is already in .gitignore, so it will never be committed to git.
Option 2: Environment variable (session only)

If you prefer to set it as an environment variable for just this session:

```bash
# Linux/macOS/Git Bash
export CLOUDFLARE_ACCOUNT_ID='your-account-id-here'
```

```powershell
# Windows PowerShell
$env:CLOUDFLARE_ACCOUNT_ID='your-account-id-here'
```

Note:
- The `.env` file persists across sessions (recommended)
- Environment variables are only active for your current terminal session
- Wrangler can also auto-detect your account ID from your authenticated session, so this step is optional
- The account ID is never written to `wrangler.toml`, to prevent accidentally committing it to git
Create a Vectorize index for storing documentation embeddings:
```bash
npx wrangler vectorize create cloudflare-docs --dimensions=768 --metric=cosine
```

Note: Vectorize is in beta and may require enabling in your Cloudflare dashboard or account verification.
Deploy the backend Worker:
```bash
npm run deploy
```

Important: If you get an error about a "workers.dev subdomain" (error code 10063), you need to create one first:

The setup script will automatically detect your account ID and provide a direct link. Or manually:
1. Get your account ID: `npx wrangler whoami` (look for the 32-character hex string)
2. Visit `https://dash.cloudflare.com/YOUR_ACCOUNT_ID/workers-and-pages`, replacing `YOUR_ACCOUNT_ID` with your actual account ID
3. Open the Workers menu for the first time (this creates your subdomain automatically)
4. Then run `npm run deploy` again
The deployment output will show your Worker URL (e.g., https://cf-ai-developer-assistant.your-subdomain.workers.dev).
After deployment, populate the index with Cloudflare documentation:
Linux/macOS/Git Bash:
```bash
curl -X POST "https://your-worker.your-subdomain.workers.dev/populate"
```

Windows PowerShell:

```powershell
Invoke-WebRequest -Uri "https://your-worker.your-subdomain.workers.dev/populate" -Method POST
```

Windows CMD:

```cmd
curl -X POST "https://your-worker.your-subdomain.workers.dev/populate"
```

Replace `your-worker.your-subdomain` with your actual Worker URL from step 5.
Start the local development server:
```bash
npm run pages:dev
```

The frontend will be available at http://localhost:8788 (or the next available port if 8788 is in use). The actual port will be shown in the terminal output.
The chat interface provides intelligent documentation search and Q&A capabilities:
- Open the deployed Pages URL or local development server (default: http://localhost:8788; check the terminal for the actual port)
- Ask questions about any Cloudflare service:
  - "How do I use Durable Objects for WebSocket coordination?"
  - "What's the best way to implement RAG with Vectorize?"
  - "How do I configure D1 database bindings in wrangler.toml?"
- The assistant performs semantic search over Cloudflare documentation using Vectorize, retrieves relevant context, and generates accurate, context-aware responses
- Conversation history is automatically maintained across messages, enabling follow-up questions and multi-turn discussions
Technical Details: Each query triggers a RAG pipeline that generates embeddings, searches Vectorize for top-3 relevant documentation chunks, formats context, and sends to the LLM with conversation history. Responses are streamed in real-time via WebSocket.
Transform natural language into production-ready Cloudflare Workers code:
1. Click the code generation button or describe what you want to build
2. Provide detailed descriptions like:
   - "Create a Workers API that stores user data in D1 with proper error handling"
   - "Generate a RAG application with Vectorize and Workers AI for semantic search"
   - "Build a real-time chat app using Durable Objects with WebSocket support"
3. The assistant generates complete, production-ready code including:
   - Multiple TypeScript files with proper structure and filenames
   - Complete `wrangler.toml` configuration with bindings
   - Type-safe implementations with proper error handling
   - Cloudflare best practices and edge computing optimizations
   - Code blocks displayed with VS Code-style syntax highlighting
4. Review generated code directly in the chat interface - code blocks are intelligently ordered with filenames, preserving the natural flow of explanations and code
Technical Details: Code generation uses an enhanced system prompt with documentation context and project state. The LLM generates markdown with code blocks, which are parsed to extract filenames, preserve order, and format for display. Generated code is stored in project state for future context.
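The parsing step can be approximated with a single regex pass over the markdown reply. The filename-after-language fence convention shown here is an assumption about the prompt format, not the exact logic in `src/agent.ts`:

```typescript
interface ParsedBlock {
  filename: string | null;
  language: string;
  code: string;
  position: number; // preserves the order blocks appear in the reply
}

// Extracts fenced code blocks, e.g. ones opened as "```typescript src/index.ts".
function parseCodeBlocks(markdown: string): ParsedBlock[] {
  const fence = /```(\w+)?[ \t]*([^\n`]*)\n([\s\S]*?)```/g;
  const blocks: ParsedBlock[] = [];
  let match: RegExpExecArray | null;
  let position = 0;
  while ((match = fence.exec(markdown)) !== null) {
    blocks.push({
      language: match[1] ?? "text",
      filename: match[2]?.trim() || null, // e.g. "wrangler.toml"
      code: match[3],
      position: position++,
    });
  }
  return blocks;
}
```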
```
.
├── src/
│   ├── index.ts              # Main Worker entry point, routing, CORS
│   ├── agent.ts              # DeveloperAssistantAgent Durable Object
│   ├── db-init.ts            # SQLite database schema initialization
│   └── populate.ts           # Vectorize population endpoint
├── frontend/
│   ├── index.html            # Frontend HTML with React components
│   └── _functions/           # Pages Functions for proxying
│       └── [[path]].ts       # Proxy function for Worker requests
├── scripts/
│   └── populate-vectorize.ts # Documentation data for Vectorize
├── start.sh                  # Automated setup and deployment script
├── wrangler.toml             # Wrangler configuration
├── package.json
├── tsconfig.json
└── README.md
```
Worker (API):
- `GET /health` - Health check
- `POST /populate` - Populate Vectorize index with documentation
- `POST /agent/chat` - Send chat message (HTTP fallback)
- `POST /agent/generate` - Generate code (HTTP fallback)
- `POST /agent/search` - Search documentation
- `WebSocket /agent` - Real-time chat and code generation

Pages (frontend):
- `GET /` - Frontend application
- All `/agent/*` routes are proxied to the Worker via Pages Functions
The wrangler.toml file configures:
- Worker name and entry point
- AI binding for Workers AI
- Vectorize index binding
- Durable Object binding and migrations
- Environment variables
The application uses these Cloudflare bindings:
- `AI` - Workers AI for LLM (Llama 3.3/3.1) and embeddings (BGE)
- `VECTORIZE_INDEX` - Vectorize index for documentation search
- `DEVELOPER_AGENT` - Durable Object for the agent instance
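A hedged sketch of what the corresponding `wrangler.toml` sections might look like, using the binding names above and the index name from the setup steps; the compatibility date and migration tag are illustrative:

```toml
name = "cf-ai-developer-assistant"
main = "src/index.ts"
compatibility_date = "2024-09-01"   # illustrative

[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORIZE_INDEX"
index_name = "cloudflare-docs"

[[durable_objects.bindings]]
name = "DEVELOPER_AGENT"
class_name = "DeveloperAssistantAgent"

[[migrations]]
tag = "v1"                          # illustrative
new_sqlite_classes = ["DeveloperAssistantAgent"]
```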
The agent uses SQLite (via Agents SDK) for persistent state management:
conversation_messages
- `id` (INTEGER PRIMARY KEY) - Auto-incrementing message ID
- `conversation_id` (TEXT) - Groups messages into conversations
- `role` (TEXT) - Message role: 'user' or 'assistant'
- `content` (TEXT) - Full message content
- `created_at` (INTEGER) - Unix timestamp
- Index on `conversation_id` for O(log n) conversation retrieval

project_state
- `id` (INTEGER PRIMARY KEY) - Single row for current project state
- `state` (TEXT) - JSON-encoded project state including:
  - Generated files with paths and content
  - Last generation timestamp
  - Project metadata
- `updated_at` (INTEGER) - Unix timestamp of last update
The schema is automatically initialized on first Durable Object instantiation via initializeDatabase().
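Expanding the agent skeleton from earlier, `initializeDatabase()` plausibly amounts to a few idempotent DDL statements via the Agents SDK's `this.sql` tagged template. This is a sketch mirroring the tables documented above, not the exact contents of `src/db-init.ts`:

```typescript
import { Agent } from "agents";

export class DeveloperAssistantAgent extends Agent<Env> {
  // Idempotent schema creation, run on first instantiation.
  initializeDatabase() {
    this.sql`
      CREATE TABLE IF NOT EXISTS conversation_messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        conversation_id TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL,
        created_at INTEGER NOT NULL
      )`;
    this.sql`
      CREATE INDEX IF NOT EXISTS idx_messages_conversation
        ON conversation_messages (conversation_id)`;
    this.sql`
      CREATE TABLE IF NOT EXISTS project_state (
        id INTEGER PRIMARY KEY,
        state TEXT NOT NULL,
        updated_at INTEGER NOT NULL
      )`;
  }
}
```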
The application implements several sophisticated optimizations to minimize latency and maximize throughput:
Query Processing
- Parallel execution of Vectorize semantic search and SQLite conversation history retrieval using `Promise.all()`
- Dynamic token allocation: 1024 tokens for simple queries, 1536 for complex queries, 3072 for code generation
- Context window truncation: Documentation context limited to 1000 chars, project state to 800 chars
- Conversation history limited to last 6 messages (3 exchanges) to reduce prompt size
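The token-allocation and truncation rules are simple enough to sketch directly. The numeric caps are the documented values; the complexity heuristic is a hypothetical placeholder:

```typescript
// Dynamic token budget: 3072 for code generation, 1536 for complex
// chat queries, 1024 otherwise.
function maxTokensFor(query: string, isCodeGen: boolean): number {
  if (isCodeGen) return 3072;
  const looksComplex = query.length > 200 || query.includes("```");
  return looksComplex ? 1536 : 1024;
}

// Context caps: documentation to 1000 chars, project state to 800.
function truncateContext(docs: string, projectState: string) {
  return {
    docs: docs.slice(0, 1000),
    project: projectState.slice(0, 800),
  };
}
```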
Vectorize Optimization
- Reduced topK from 5 to 3 for faster queries
- Content truncation to ~500 characters per result
- Efficient embedding generation using BGE model's optimized inference
Database Operations
- Parallel saves for user and assistant messages
- Indexed queries on `conversation_id` for O(log n) lookups
- Batch operations where possible
Model Selection
- Primary model (Llama 3.3 fp8-fast) for best quality
- Automatic fallback to Llama 3.1 8B for availability
- Final fallback to prompt-based format if message API fails
- Model selection logged and included in responses for transparency
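The fallback chain might be implemented as a simple loop over model IDs, as in this sketch (the prompt-based last resort is omitted for brevity):

```typescript
type RoleMessage = { role: "system" | "user" | "assistant"; content: string };

async function runWithFallback(env: Env, messages: RoleMessage[], maxTokens: number) {
  const models = [
    "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    "@cf/meta/llama-3.1-8b-instruct",
  ];
  for (const model of models) {
    try {
      const result = await env.AI.run(model, { messages, max_tokens: maxTokens });
      return { model, result }; // model name surfaced for transparency
    } catch (err) {
      console.warn(`Model ${model} failed, trying next`, err);
    }
  }
  throw new Error("All models failed");
}
```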
Response Streaming
- WebSocket-based streaming for immediate user feedback
- Progress messages during processing ("Searching documentation...", "Generating code...")
- Typing animations in frontend for perceived performance
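For example, a hypothetical progress wrapper over the Agents SDK connection (the message shapes are illustrative, not the app's actual wire protocol):

```typescript
import { Connection } from "agents";

// Sends a progress notice before the slow work, then the final answer.
async function withProgress(connection: Connection, label: string, work: () => Promise<string>) {
  connection.send(JSON.stringify({ type: "progress", text: label }));
  const answer = await work();
  connection.send(JSON.stringify({ type: "response", content: answer }));
}
```

A caller might use it as `withProgress(conn, "Searching documentation...", () => this.processChatMessage(msg))`.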
Ensure you've created the index and it matches the name in wrangler.toml:
```bash
npx wrangler vectorize list
```

If the index doesn't exist, create it:

```bash
npx wrangler vectorize create cloudflare-docs --dimensions=768 --metric=cosine
```

Ensure Workers AI is enabled in your Cloudflare dashboard:
- Go to https://dash.cloudflare.com
- Navigate to Workers & Pages > AI
- Enable Workers AI if not already enabled
Some models may require account verification or may not be available in all regions.
The application automatically falls back to HTTP if WebSocket fails. Check that Durable Objects are properly configured:
```bash
npx wrangler durable-objects list
```

Common issues:
- Account ID not set: Set `CLOUDFLARE_ACCOUNT_ID` in your `.env` file or environment (see Configuration above; it is never written to `wrangler.toml`)
- Workers AI not enabled: Enable it in the dashboard
- Vectorize not available: May require account verification (check email)
- Model not found: The app will automatically fall back to Llama 3.1 if 3.3 is unavailable
If running locally, the frontend connects directly to your deployed Worker URL. Ensure:
- The Worker is deployed successfully
- CORS headers are properly configured (they are by default)
- The Worker URL in the frontend matches your deployed URL
```bash
npm run dev
```

This starts the Worker locally. Note that Durable Objects and Workers AI require deployment to work fully.
```bash
npm run pages:dev
```

The frontend will be available at http://localhost:8788 (or the next available port if 8788 is in use) and will connect to your deployed Worker. The actual port will be shown in the terminal output.
Edit scripts/populate-vectorize.ts to add more Cloudflare documentation chunks. The script uses the BGE embedding model to create vectors.
Edit the system prompts in src/agent.ts:
- `processChatMessage()` - Chat system prompt
- `processCodeGeneration()` - Code generation system prompt
Modify frontend/index.html to customize the UI. The design uses Tailwind CSS for styling.
- All communication uses HTTPS/WSS
- No API keys stored in client code (uses Cloudflare bindings)
- State isolated per Durable Object instance
- Input validation on all endpoints
- CORS headers configured for cross-origin requests
- Response Time: Typically 2-5 seconds for code generation, 1-3 seconds for chat
- Concurrent Users: Scales automatically with Cloudflare's edge network
- Cost: Pay-per-use for Workers AI, Vectorize queries, and Durable Object invocations
- Free Tier: Cloudflare's free tier includes generous limits for Workers AI and Vectorize
This is a demonstration project for Cloudflare's platform capabilities. Feel free to fork and extend it!
MIT
Built with Cloudflare Workers, Workers AI, the Agents SDK, Vectorize, Durable Objects, Cloudflare Pages, React, and Tailwind CSS.
This project was developed with AI assistance for code generation, documentation, and optimization. See PROMPTS.md for details on AI prompts used during development.
Note: This project requires a Cloudflare account with Workers AI enabled. Some features may require account verification depending on your region and account status.