An AI-powered video processing platform that lets users upload instructional videos and ask questions about them in natural language. Features multimodal search across video content, frame analysis, and an intelligent chat interface.
- Video Upload: Drag-and-drop interface with real-time processing status
- AI Chat Interface: Ask questions about uploaded videos with streaming responses
- Multimodal Search: Search across transcripts, frames, and uploaded images
- Frame Analysis: Automatic scene detection and frame extraction with AI vision
- Authentication: Simple demo authentication system
- Real-time Processing: Live status updates during video processing
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Browser │─────▶│ Next.js App │─────▶│ PostgreSQL │
│ │ │ (video-qa) │ │ + pgvector │
└─────────────┘ └──────────────┘ └──────┬──────┘
│ │
│ writes │ polls
│ video + │ jobs
│ job │
▼ ▼
┌─────────────┐ ┌──────────────┐
│ data/ │ │ Worker │
│ uploads/ │◀─────│ (video- │
│ processed/ │ │ worker) │
│ frames/ │ └──────────────┘
└─────────────┘
- Node.js 18+ and pnpm
- Docker and Docker Compose
- OpenAI API key
# Clone both repositories
git clone <video-qa-repo>
git clone <video-worker-repo>
# Create environment file
cd video-qa
cat > .env.local << EOF
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/videoqa
OPENAI_KEY=your_openai_api_key_here
EOF

# Build worker image
cd ../video-qa-worker
docker build -t videoqa-worker:0.0.19 .
# Start database and worker
cd ../video-qa
docker-compose up -d
# Start Next.js app
pnpm install
pnpm dev

- Open http://localhost:3000
- Log in with demo credentials: demo / demo123
- Upload a video file (max 500MB)
- Monitor processing status
- Ask questions about your video
- File Upload: Video saved to `data/uploads/{id}_{name}.mp4`
- Database: Metadata stored in the `videos` table with `original_path`
- Job Queue: Processing job created in the `jobs` table
- Worker: Polls for jobs using `FOR UPDATE SKIP LOCKED`
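`FOR UPDATE SKIP LOCKED` lets several workers poll the same `jobs` table without ever claiming the same row twice. A minimal sketch of what such a claim query could look like (table and column names are assumptions based on the schema described in this README, not the project's verified source):

```typescript
// Hypothetical sketch of the worker's job-claim query. Column names
// (status, created_at, video_id) are assumptions.
const CLAIM_JOB_SQL = `
  UPDATE jobs
  SET status = 'running'
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- rows already locked by another worker are skipped
  )
  RETURNING id, video_id;
`;

// A worker poll loop would run this through a pg Pool, e.g.:
//   const { rows } = await pool.query(CLAIM_JOB_SQL);
//   if (rows.length === 0) { /* queue empty: sleep, then poll again */ }
```

Because the inner `SELECT` skips locked rows instead of blocking on them, adding more worker containers scales job throughput without any extra coordination.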
The worker processes videos through 6 stages:
Input: uploads/{id}_{name}.mp4
│
├─▶ [1. NORMALIZE] → processed/{id}/normalized.mp4
│ → processed/{id}/audio.wav
│
├─▶ [2. TRANSCRIBE] → transcript_segments table
│ → subs/{id}.srt
│
├─▶ [3. SCENES] → scenes table (t_start, t_end)
│
├─▶ [4. FRAMES] → frames/{id}/scene_*.jpg
│ → frames table (phash, path)
│
├─▶ [5. VISION] → frame_captions table (caption, entities)
│
└─▶ [6. EMBEDDINGS] → UPDATE embeddings (1536-dim vectors)
Output: video.status = 'ready'
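Because each stage writes its output to a distinct location, a re-run can skip stages that already completed, which is what makes processing idempotent. A hypothetical skeleton of that loop (stage names come from the diagram above; the completion check is an assumption about how the real worker decides to skip):

```typescript
// Hypothetical pipeline skeleton: run stages in order, skipping any stage
// whose output already exists, so re-running a failed job is safe.
type Stage = {
  name: string;
  isDone: (videoId: string) => boolean; // e.g. check output files / table rows
  run: (videoId: string) => void;
};

function runPipeline(videoId: string, stages: Stage[]): string[] {
  const executed: string[] = [];
  for (const stage of stages) {
    if (stage.isDone(videoId)) continue; // idempotency: skip finished work
    stage.run(videoId);
    executed.push(stage.name);
  }
  return executed; // the stages actually executed on this run
}

// Example: "normalize" already completed, so only the later stages run.
const done = new Set(["normalize"]);
const stages: Stage[] = ["normalize", "transcribe", "scenes", "frames", "vision", "embeddings"]
  .map((name) => ({
    name,
    isDone: () => done.has(name),
    run: () => { done.add(name); },
  }));
```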
- `videos`: Video metadata (original_path, normalized_path, status, duration)
- `jobs`: Processing queue with status tracking
- `scenes`: Scene boundaries detected in videos
- `frames`: Extracted frames with perceptual hashes
- `transcript_segments`: Audio transcription with embeddings
- `frame_captions`: Vision analysis with embeddings
videos (1) ──→ (many) jobs
videos (1) ──→ (many) scenes
scenes (1) ──→ (many) frames
frames (1) ──→ (1) frame_captions
videos (1) ──→ (many) transcript_segments
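Following those relationships, a video's visual context can be assembled with ordinary joins. A hypothetical query walking videos → scenes → frames → frame_captions (the foreign-key column names are assumptions, not the project's verified schema):

```typescript
// Hypothetical join down the chain described above. Assumed FK columns:
// scenes.video_id, frames.scene_id, frame_captions.frame_id.
const CAPTIONS_FOR_VIDEO_SQL = `
  SELECT f.path, fc.caption, s.t_start, s.t_end
  FROM scenes s
  JOIN frames f ON f.scene_id = s.id
  JOIN frame_captions fc ON fc.frame_id = f.id
  WHERE s.video_id = $1
  ORDER BY s.t_start;
`;
```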
- POST `/login` - Log in with demo credentials. Response: redirect to the upload page
- POST `/api/upload` - Upload a video file. Response: `{ id: string }`
- GET `/api/videos` - List all videos
- GET `/api/videos/[id]/status` - Get processing status
- GET `/api/videos/[id]/summary` - Get processing results
- POST `/api/ask` - Ask questions about videos. Response: streaming text response
- POST `/api/ask/upload-image` - Upload an image for multimodal search
- GET `/api/frames/[videoId]/[frameNum]` - Serve frame images. Response: JPEG image with caching headers
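From a client, the upload and ask endpoints could be called roughly as below. This is a sketch: only the `{ id: string }` upload response shape is documented above; the request field names (`file`, `videoId`, `question`) are assumptions.

```typescript
// Hypothetical client calls against the endpoints listed above (Node 18+,
// which ships fetch/FormData). Field names are assumptions.
const BASE = "http://localhost:3000";

function statusUrl(id: string): string {
  return `${BASE}/api/videos/${id}/status`;
}

async function uploadVideo(file: Blob, name: string): Promise<string> {
  const form = new FormData();
  form.append("file", file, name);
  const res = await fetch(`${BASE}/api/upload`, { method: "POST", body: form });
  const { id } = (await res.json()) as { id: string }; // documented response shape
  return id;
}

async function ask(videoId: string, question: string): Promise<string> {
  const res = await fetch(`${BASE}/api/ask`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ videoId, question }),
  });
  // /api/ask streams text; collect the chunks as they arrive.
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let answer = "";
  for (let r = await reader.read(); !r.done; r = await reader.read()) {
    answer += decoder.decode(r.value, { stream: true });
  }
  return answer;
}
```

A caller would poll `statusUrl(id)` after `uploadVideo` until the video reaches `status = 'ready'`, then start asking questions.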
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | ✅ | - | PostgreSQL connection string |
| `OPENAI_KEY` | ✅ | - | OpenAI API key for AI processing |
| `NODE_ENV` | ❌ | `development` | Environment mode |
- Stored in DB: Relative paths like `uploads/{id}_{name}.mp4`
- Resolved by Worker: `{DATA_DIR}/{relative_path}` → absolute path
- Benefits: Portable across environments, easy to move data
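The resolution rule above amounts to one path join. A small illustrative helper (the `DATA_DIR` name comes from this README; the function itself is hypothetical):

```typescript
import path from "node:path";

// Illustrative helper: the DB stores relative paths; the worker joins them
// onto DATA_DIR, so the whole data directory can live anywhere.
function resolveDataPath(dataDir: string, relativePath: string): string {
  if (path.isAbsolute(relativePath)) {
    // Absolute paths in the DB would defeat portability; reject them early.
    throw new Error(`expected a relative path, got: ${relativePath}`);
  }
  return path.join(dataDir, relativePath);
}

// e.g. resolveDataPath("/srv/videoqa/data", "uploads/42_demo.mp4")
```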
data/
├── uploads/ # Original uploaded videos
├── processed/ # Normalized videos and audio
│ └── {video_id}/
├── frames/ # Extracted frame images
│ └── {video_id}/
├── subs/ # SRT subtitle files
├── ask-uploads/ # User-uploaded images for chat
└── worker/ # Worker logs
└── log.log
video-qa/
├── src/app/api/ # API routes
│ ├── upload/ # Upload endpoint
│ ├── ask/ # Chat interface
│ ├── videos/ # Video management
│ └── frames/ # Frame image serving
├── src/app/(app)/ # Protected pages
│ ├── upload/ # Upload UI
│ └── ask/ # Chat interface
├── src/app/(auth)/ # Authentication
│ └── login/ # Login page
├── src/components/ # React components
│ ├── DashboardLayout # Main layout
│ ├── ChatMessage # Message rendering
│ └── ThemeProvider # MUI theming
├── lib/ # Shared utilities
│ ├── db.ts # Database functions
│ ├── rag.ts # RAG system
│ ├── vision.ts # Vision analysis
│ └── file.ts # File operations
└── postgres/ # Database schema
└── initdb/
- Authentication: Demo login system with cookie-based sessions
- Multimodal Search: RAG system with vector embeddings and image analysis
- Real-time Chat: Streaming AI responses with frame and timestamp references
- Material-UI: Modern, responsive interface with custom theming
- Idempotent Operations: Safe to re-run processing
- Error Handling: Comprehensive error logging and user feedback
- Worker can't find video files
  - Check that `DATA_DIR` is correctly mounted in docker-compose
  - Verify the file exists at the resolved path
- Database connection errors
  - Ensure PostgreSQL is running: `docker-compose ps`
  - Check the connection string in `.env.local`
- OpenAI API errors
  - Verify `OPENAI_KEY` is set correctly
  - Check that the API key has sufficient credits
- Path resolution issues
  - Ensure uploads use relative paths (`uploads/...`)
  - Check the `DATA_DIR` environment variable
- Worker logs: `data/worker/log.log`
- Database logs: `docker-compose logs postgres`
- Next.js logs: terminal output
docker-compose down
rm -rf data/postgres
docker-compose up -d

- ARCHITECTURE.md - Detailed system design
- QUICKSTART.md - 5-minute setup guide
- ../video-qa-worker/README.md - Worker documentation