Orion is a multi-tenant, context-aware customer support assistant. It blends hybrid retrieval, structured LLM reasoning, and adaptive confidence tracking to power human-grade chat experiences for small businesses.
Orion consists of two coordinated layers:
- **Frontend** (Vite + React + TanStack Router): structured chat interface, setup dashboard, and escalation console.
- **Backend** (Node.js + Prisma + Neon Postgres): multi-tenant API managing companies, sessions, FAQs, and messages; integrates Gemini / OpenAI for response generation.
The architecture is designed for clarity, not complexity: each request follows a transparent data path from user message → contextual reasoning → confidence judgment → structured rendering.
A production-ready, context-aware assistant that responds in structured JSON, rendered into a clean, trustworthy UI.
- Hybrid context retrieval (Company Profile + Semantic FAQ Top-K)
- Structured responses (summary, sections, confidence, escalation)
- Adaptive session confidence with EMA smoothing
- Multi-tenant company context (localStorage + API scoping; sketched after this list)
- Fast, modern stack (React 19, Vite 7, Tailwind v4, Prisma)
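
To make the tenant scoping concrete, here is a minimal sketch of a frontend fetch helper that attaches the stored company id to every API call. The localStorage key, header name, and `apiFetch` helper are illustrative assumptions, not Orion's actual code; only `VITE_API_BASE` comes from the quick start below.

```typescript
// Minimal sketch of tenant scoping; the localStorage key and header name
// are illustrative assumptions, not Orion's actual code.
const API_BASE = import.meta.env.VITE_API_BASE ?? "http://localhost:5000/api/v1";

async function apiFetch<T>(path: string, init: RequestInit = {}): Promise<T> {
  // The selected tenant lives in localStorage; the backend scopes queries by it.
  const companyId = localStorage.getItem("orion.companyId");
  const res = await fetch(`${API_BASE}${path}`, {
    ...init,
    headers: {
      "Content-Type": "application/json",
      ...(companyId ? { "X-Company-Id": companyId } : {}),
    },
  });
  if (!res.ok) throw new Error(`API error ${res.status}`);
  return res.json() as Promise<T>;
}

// Usage: every call is automatically scoped to the active company.
// const sessions = await apiFetch<Session[]>("/sessions");
```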
```
Orion/
├── Backend/                  # Express + Prisma API
│   ├── prisma/               # Schema + migrations
│   └── src/                  # Routes, services, LLM, confidence
│
└── Frontend/                 # React + Vite app
    ├── src/
    │   ├── Pages/ChatPage.tsx
    │   ├── Components/Frontend_ChatMessage_Component.tsx
    │   └── hooks/useChat.ts
    ├── README.md             # Frontend readme
    └── FRONTEND_GUIDE.md     # Focused guide (this project)
```
```bash
# 1) Backend
cd Backend
npm install
npx prisma migrate dev
npm run dev
```

```bash
# 2) Frontend (in a new terminal)
cd ../Frontend
npm install

# point to the backend if needed
echo "VITE_API_BASE=http://localhost:5000/api/v1" > .env.local

npm run dev
```

Open the chat at: http://localhost:5173/chat
```mermaid
flowchart LR
    A[User Message] --> B[Hybrid Context Retrieval]
    B -->|Embeddings + Company Profile + FAQs| C[Structured LLM Generation]
    C -->|JSON: title, summary, sections, confidence| D[Confidence Engine]
    D -->|EMA Smoothing + Escalation Logic| E["Session Store (Postgres)"]
    E -->|Persist messages + summaries| F[Frontend Renderer]
    F -->|Structured UI| A
```
| Component | Responsibility |
|---|---|
| Hybrid Context | Combines semantic FAQ retrieval + company profile summary. |
| Structured LLM | Produces validated JSON with confidence and tone metadata. |
| Confidence Engine | Smooths confidence with EMA and flags low-confidence turns for escalation. |
| Session Store | Persists all message history, summaries, and confidence traces. |
| Frontend Renderer | Visualizes structured replies, confidence badges, and escalation states. |
1. **User Message**: sent via `POST /sessions/:id/messages`
2. **Context Assembly**: Hybrid Context fetches the last 6 messages, the session summary, and the Top-K FAQs
3. **Generation**: the LLM returns structured JSON (title, sections, confidence)
4. **Evaluation**: the Confidence Engine applies EMA smoothing
5. **Decision**:
   - ≥ threshold → normal reply
   - < threshold → `status=escalated`
6. **Persistence**: message + metadata saved to Postgres
7. **Frontend Render**: displays sections, a confidence badge, and an escalation banner if flagged (see the lifecycle sketch below)
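
To tie the seven steps together, here is a compact TypeScript sketch of one turn through this lifecycle. Every helper is a stub standing in for the real retrieval, generation, and persistence services; the names, types, and the 0.5 threshold are assumptions, not Orion's actual code.

```typescript
type StructuredReply = { title: string; summary: string; sections: string[]; confidence: number };

const CONFIDENCE_THRESHOLD = 0.5; // assumed value for illustration

// Stubs standing in for the real retrieval, generation, smoothing, and persistence services.
const buildHybridContext = async (sessionId: string, text: string) => "last 6 messages + summary + Top-K FAQs";
const generateStructuredReply = async (text: string, ctx: string): Promise<StructuredReply> =>
  ({ title: "Reply", summary: text, sections: [], confidence: 0.9 });
const updateSessionConfidence = (sessionId: string, c: number) => c; // the real engine applies EMA here
const patchSession = async (sessionId: string, patch: { status: string }) => {};
const saveMessage = async (sessionId: string, msg: unknown) => {};

async function handleMessage(sessionId: string, text: string) {
  // 1–2) Assemble hybrid context: recent turns, session summary, Top-K FAQs.
  const context = await buildHybridContext(sessionId, text);
  // 3) Generate structured JSON (title, sections, confidence).
  const reply = await generateStructuredReply(text, context);
  // 4) Smooth the raw confidence across the session.
  const smoothed = updateSessionConfidence(sessionId, reply.confidence);
  // 5) Decide: below threshold flips the session to "escalated".
  const shouldEscalate = smoothed < CONFIDENCE_THRESHOLD;
  if (shouldEscalate) await patchSession(sessionId, { status: "escalated" });
  // 6) Persist the message plus its structured metadata.
  await saveMessage(sessionId, { ...reply, confidence: smoothed, shouldEscalate });
  // 7) The frontend renders sections, confidence badge, and escalation banner.
  return { ...reply, shouldEscalate };
}
```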
Simplified from the full schema:
Company → FAQ[] → Session[] → Message[]
- Company: Tenant boundary, owns FAQs & sessions
- FAQ: Question, answer, vector embedding
- Session: Tracks conversation state & escalation status
- Message: Text, confidence, and structured metadata (JSON)
No separate escalations table: escalation state is handled via `Session.status`.
See ESCALATIONS_ARCHITECTURE.md for reasoning behind this lightweight model.
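
For readers who prefer types to diagrams, a minimal TypeScript mirror of this simplified schema could look like the following; fields not listed above (roles, optionality, exact types) are assumptions.

```typescript
// Minimal TypeScript mirror of the simplified schema; fields beyond the
// README's descriptions are assumptions.
interface Company {
  id: string;
  name: string;
  faqs: FAQ[];        // tenant boundary: owns FAQs & sessions
  sessions: Session[];
}

interface FAQ {
  id: string;
  question: string;
  answer: string;
  embedding: number[]; // 768-dim Gemini vector
}

interface Session {
  id: string;
  status: "active" | "escalated"; // escalation lives here, not in a separate table
  summary?: string;
  messages: Message[];
}

interface Message {
  id: string;
  role: "user" | "assistant";
  text: string;
  confidence?: number;
  meta?: unknown; // structured JSON payload (type, title, sections, tone, confidence)
}
```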
- Backend: Node, Express, Prisma (PostgreSQL), Gemini API (@google/genai)
- Frontend: React 19, Vite 7, Tailwind v4, Radix UI, TanStack Router
Backend:

```bash
cd Backend
npm test
```

Frontend:

- Manual: `npm run dev` and exercise `/chat` and the dashboard routes
- Embeds query using Gemini 768-dim vectors
- Computes cosine similarity over stored FAQ embeddings (sketched after this list)
- Fuses retrieved FAQs + company profile into the LLM prompt
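
A minimal sketch of the Top-K step under these assumptions: plain cosine similarity computed in application code over the stored embeddings. Function names are illustrative.

```typescript
// Cosine similarity between a query embedding and a stored FAQ embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Rank all FAQs by similarity and keep the top K for the prompt.
function topKFaqs<T extends { embedding: number[] }>(query: number[], faqs: T[], k = 3): T[] {
  return faqs
    .map((faq) => ({ faq, score: cosineSimilarity(query, faq.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.faq);
}
```

Swapping this linear scan for pgvector or Pinecone, as noted under Design Principles, changes only where the similarity is computed.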
- Applies EMA smoothing (α = 0.2) across turns (sketched after this list)
- Classifies outcomes via a multi-threshold scheme:
  - Strong: ≥ 0.8
  - Weak: 0.5–0.8
  - Escalate: < 0.3
- Emits a `shouldEscalate` flag used by the UI and the session patcher
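
The EMA and the bands above translate directly into code. A sketch, assuming the engine keeps one running score per session; the unlabeled 0.3–0.5 gap is kept explicit rather than guessed at.

```typescript
// EMA smoothing as described above: α = 0.2 weights the newest turn.
const ALPHA = 0.2;

function smoothConfidence(previous: number | null, current: number): number {
  // First turn has no history; later turns blend new evidence into the running score.
  return previous === null ? current : ALPHA * current + (1 - ALPHA) * previous;
}

type Outcome = "strong" | "weak" | "escalate" | "uncertain";

function classify(confidence: number): Outcome {
  if (confidence >= 0.8) return "strong";
  if (confidence >= 0.5) return "weak";   // the 0.5–0.8 band
  if (confidence < 0.3) return "escalate";
  return "uncertain";                     // 0.3–0.5 is not specified in this README
}

// Example: one sudden low-confidence turn only partially drags the session score down.
const smoothed = smoothConfidence(0.85, 0.4); // 0.2*0.4 + 0.8*0.85 = 0.76
```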
- Enforces a JSON schema for LLM responses (a minimal validation guard is sketched below)
- Produces a `meta` payload (type, title, sections, tone, confidence)
- Enables rich frontend rendering and explainable responses
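
As a sketch of what schema enforcement can look like, the type below mirrors the listed `meta` fields, with a hand-rolled guard standing in for whatever validator the backend actually uses; the section shape is an assumption.

```typescript
// Assumed shape of the `meta` payload; the sections structure is illustrative.
interface ReplyMeta {
  type: string;
  title: string;
  sections: { heading: string; body: string }[];
  tone: string;
  confidence: number;
}

function isReplyMeta(value: unknown): value is ReplyMeta {
  const v = value as ReplyMeta;
  return (
    typeof v === "object" && v !== null &&
    typeof v.type === "string" &&
    typeof v.title === "string" &&
    Array.isArray(v.sections) &&
    typeof v.tone === "string" &&
    typeof v.confidence === "number"
  );
}

// Reject malformed LLM output before it reaches the renderer.
function parseLLMResponse(raw: string): ReplyMeta {
  const parsed = JSON.parse(raw);
  if (!isReplyMeta(parsed)) throw new Error("LLM response failed schema validation");
  return parsed;
}
```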
```mermaid
sequenceDiagram
    participant User
    participant Backend
    participant ConfidenceEngine
    participant Store
    participant Frontend

    User->>Backend: POST /sessions/:id/messages
    Backend->>ConfidenceEngine: Evaluate confidence
    alt Low confidence
        ConfidenceEngine-->>Store: Update session.status = "escalated"
        Store-->>Frontend: { shouldEscalate: true }
        Frontend-->>User: Show escalation banner
    else Normal
        ConfidenceEngine-->>Store: Save as active message
        Store-->>Frontend: Normal structured reply
    end
```
Escalations appear instantly in `/dashboard/escalations`, handled via session filters.
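
Because escalation is just a session status, the escalation console reduces to a filtered query. A hypothetical Prisma sketch; the model and field names are assumptions based on the simplified schema above.

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// The escalation console is a filtered session list: no separate table needed.
async function listEscalatedSessions() {
  return prisma.session.findMany({
    where: { status: "escalated" },      // Session.status carries the escalation
    orderBy: { updatedAt: "desc" },      // assumed timestamp field
  });
}
```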
- **Transparency**: Every response carries structured metadata.
- **Simplicity**: Escalations live inside the session lifecycle.
- **Context Preservation**: Short-term memory per session, not global state.
- **Graceful Degradation**: Works in mock mode without external LLMs (sketched after this list).
- **Scalability**: Swappable vector backend (e.g., Pinecone or pgvector).
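
As one way to realize the mock-mode principle, the backend could code against a small provider interface and fall back to a canned implementation when no real provider is configured; this is purely illustrative, not Orion's actual abstraction.

```typescript
// Graceful degradation sketch: code against an interface, degrade to a mock.
interface LLMProvider {
  generate(prompt: string): Promise<string>;
}

class MockProvider implements LLMProvider {
  async generate(_prompt: string): Promise<string> {
    // Deterministic canned reply keeps the full pipeline runnable offline.
    return JSON.stringify({
      type: "answer",
      title: "Mock reply",
      sections: [{ heading: "Note", body: "Running without an external LLM." }],
      tone: "neutral",
      confidence: 0.5,
    });
  }
}

// Pick the real provider when configured, otherwise degrade to the mock.
function selectProvider(real: LLMProvider | null): LLMProvider {
  return real ?? new MockProvider();
}
```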
| Extension | Benefit |
|---|---|
| Vector DB Migration | Enables sub-second semantic search at scale |
| Analytics Dashboard | Confidence trends, FAQ performance, escalation ratios |
| Agent Queueing | Human takeover assignment & SLA tracking |
| Multilingual Support | Translated FAQs and localized context retrieval |
- `FRONTEND_GUIDE.md`: Frontend setup, routing, and structured chat rendering
- Orion Backend API Reference (v1): Endpoint specifications
- `ESCALATIONS_ARCHITECTURE.md`: Session-based escalation design
- `IMPLEMENTATION_COMPLETE.md`: Full phase log & production readiness summary
Orion unifies retrieval, reasoning, and responsibility into one consistent system:
Hybrid context gives it memory, structured intelligence gives it form, confidence tracking gives it judgment.
Together, these make Orion a real, production-grade customer-support AI, not just a chatbot.