👉 Open App → satark-deepfake.vercel.app
Satark-AI is a production-grade, full-stack deepfake detection and speaker verification platform. Built as a scalable microservices monorepo, it combines advanced audio forensics (MFCC, Spectral Analysis, Zero Crossing Rate), multimodal vision AI via NVIDIA NIM (Llama 3.2-90B Vision), and deep learning speaker biometrics (ECAPA-TDNN) to identify synthetic media and verify speaker identities in real time — across audio, video, and image inputs.
- Wav2Vec2 Model: Transformer-based deep learning model fine-tuned for synthetic speech detection.
- Multi-Feature Forensics: Analyzes MFCC coefficients, Spectral Rolloff, and Zero Crossing Rate (ZCR) for composite risk scoring.
- Multi-Format Support: Upload MP3 or WAV files, or extract audio from MP4 video files via a `moviepy` fallback.
- Explainable AI (XAI): Returns structured `analysisDetails` with per-feature reasoning (e.g., "Anomalous zero crossing rate (0.214)").
- Confidence Scoring: 4-decimal precision confidence score returned per scan.
- Smart Deduplication: SHA-256 file hashing prevents redundant re-processing of identical files.
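Hash-based deduplication can be sketched as follows. This is a minimal illustration that uses an in-memory `Map` in place of the real `fileHash` lookup in PostgreSQL. `CachedScan`, `scanWithDedup` and `analyze` are hypothetical names, and the real service may also scope the lookup by `userId`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a cached scan result.
interface CachedScan {
  isDeepfake: boolean;
  confidenceScore: number;
}

// In-memory stand-in for the indexed `fileHash` column of the `scans` table.
const scansByHash = new Map<string, CachedScan>();

// Hex-encoded SHA-256 digest of the raw file bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Reuse the stored result for byte-identical files; run `analyze` only on a cache miss.
async function scanWithDedup(
  bytes: Uint8Array,
  analyze: (bytes: Uint8Array) => Promise<CachedScan>,
): Promise<{ result: CachedScan; cached: boolean }> {
  const hash = sha256Hex(bytes);
  const hit = scansByHash.get(hash);
  if (hit !== undefined) return { result: hit, cached: true };
  const result = await analyze(bytes);
  scansByHash.set(hash, result);
  return { result, cached: false };
}
```

Hashing the bytes rather than the filename means that renamed copies of the same file still hit the cache. A single changed byte produces a fresh scan.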
Image analysis runs on an entirely separate, serverless pipeline — independent of the Python engine.
- Model: `meta/llama-3.2-90b-vision-instruct` via the NVIDIA NIM API, a 90B multimodal vision-language model that analyzes spatial artifacts, blending edges, texture inconsistencies, asymmetric features, and lighting anomalies.
- Cloudflare Worker Proxy: A dedicated Cloudflare Worker (`satark-image-proxy`) sits between the frontend and NVIDIA's API, handling CORS, secret management, size enforcement, and timeout control.
- Input Formats: Supports `multipart/form-data` file upload or a raw binary body. The Worker converts each image to a base64 Data URI and forwards it to NVIDIA NIM.
- 5MB Size Limit: Enforced in two layers: via the `content-length` header before the body is read, and via the actual `byteLength` after reading.
- 30s Timeout: An `AbortController` cancels hung NVIDIA requests after 30 seconds and returns a `504` gracefully.
- Output Schema: `{ isDeepfake: boolean, confidenceScore: float (0–1), details: string }`, normalized and validated before it is returned to the client.
- Robust JSON Parsing: Strips markdown code fences, extracts the first valid `{ }` block, clamps `confidenceScore` to `[0, 1]`, and falls back gracefully if NVIDIA returns an unexpected format.
- CORS Whitelisting: Strict origin whitelist (`satark-deepfake.vercel.app`, `localhost:5173`, `localhost:3000`) with no wildcard `*`.
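The parsing steps above (strip fences, take the first `{ }` block, clamp the score, fall back) can be sketched roughly as follows. This is a hypothetical `extractJSON`, not the Worker's actual code. The fallback values are assumptions, and the brace counter does not handle unbalanced braces inside JSON string values.

```typescript
// Normalized output schema described in the README.
interface ImageVerdict {
  isDeepfake: boolean;
  confidenceScore: number; // clamped to [0, 1]
  details: string;
}

// Assumed fallback when the model output cannot be parsed.
const FALLBACK: ImageVerdict = {
  isDeepfake: false,
  confidenceScore: 0,
  details: "Model returned an unexpected format.",
};

function extractJSON(raw: string): ImageVerdict {
  // 1. Remove triple-backtick fences, with or without a "json" language tag.
  const unfenced = raw.replace(/`{3}(?:json)?/gi, "");
  // 2. Take the first "{" through its matching "}" by counting brace depth.
  const start = unfenced.indexOf("{");
  if (start === -1) return FALLBACK;
  let depth = 0;
  let end = -1;
  for (let i = start; i < unfenced.length; i++) {
    if (unfenced[i] === "{") depth++;
    else if (unfenced[i] === "}" && --depth === 0) {
      end = i;
      break;
    }
  }
  if (end === -1) return FALLBACK;
  let parsed: unknown;
  try {
    parsed = JSON.parse(unfenced.slice(start, end + 1));
  } catch {
    return FALLBACK;
  }
  if (typeof parsed !== "object" || parsed === null) return FALLBACK;
  const obj = parsed as Record<string, unknown>;
  // 3. Clamp the score into [0, 1]; a non-numeric score becomes 0.
  const score = Number(obj.confidenceScore);
  return {
    isDeepfake: obj.isDeepfake === true,
    confidenceScore: Number.isFinite(score) ? Math.min(1, Math.max(0, score)) : 0,
    details: typeof obj.details === "string" ? obj.details : "",
  };
}
```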
- Enrollment System: Enroll a speaker by uploading a reference audio sample. ECAPA-TDNN extracts a 192-dim voice embedding stored securely in PostgreSQL.
- Verification: Match an unknown voice against all enrolled speakers using Cosine Similarity (threshold: 0.75).
- Scoped Isolation: Users only verify against their own enrolled speakers — cross-user data access is prevented at the query level.
- Auto-History Logging: Every verification attempt is saved to the scan history table with identity details.
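Verification can be sketched as a best-match search over the caller's own enrollments. This example assumes that embeddings are plain `number[]` arrays, as stored in the JSON column. `EnrolledSpeaker`, `verifySpeaker` and `VerifyResult` are hypothetical names. The README describes the match rule both as `> 0.75` and `≥ 0.75`, so treating exactly 0.75 as a match is also an assumption here.

```typescript
interface EnrolledSpeaker {
  id: string;
  userId: string;
  name: string;
  embedding: number[]; // 192-dim ECAPA-TDNN vector in practice
}

interface VerifyResult {
  matched: boolean;
  name: string | null;
  score: number;
}

const MATCH_THRESHOLD = 0.75;

// cos(a, b) = (a · b) / (|a| |b|); returns 0 for a zero-length vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function verifySpeaker(userId: string, probe: number[], enrolled: EnrolledSpeaker[]): VerifyResult {
  let bestName: string | null = null;
  let bestScore = -Infinity;
  for (const speaker of enrolled) {
    // Scoped isolation: other users' voice prints are never compared.
    // (The README applies this filter in the DB query itself.)
    if (speaker.userId !== userId) continue;
    const score = cosineSimilarity(probe, speaker.embedding);
    if (score > bestScore) {
      bestScore = score;
      bestName = speaker.name;
    }
  }
  const matched = bestName !== null && bestScore >= MATCH_THRESHOLD;
  return { matched, name: matched ? bestName : null, score: bestName === null ? 0 : bestScore };
}
```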
- Real-Time Protection: Continuously captures microphone input and processes it in 5-second chunks.
- Instant Feedback: Each chunk is scanned and flagged as real or synthetic with a confidence score.
- Auto-Persistence: All detected threats are saved to the history database automatically.
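The 5-second chunking amounts to buffering arithmetic. The sketch below assumes that microphone audio arrives as 16 kHz mono `Float32Array` frames of arbitrary size. The 16 kHz rate is borrowed from the speaker pipeline and is an assumption for the live monitor. `ChunkAccumulator` is a hypothetical name, and the real component may instead rely on `MediaRecorder` time slices.

```typescript
// Assumption: captured audio is 16 kHz mono float PCM.
const SAMPLE_RATE = 16_000;
const CHUNK_SECONDS = 5;
const CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS; // 80,000 samples

class ChunkAccumulator {
  private readonly buffer = new Float32Array(CHUNK_SAMPLES);
  private filled = 0;
  private readonly onChunk: (chunk: Float32Array) => void;

  constructor(onChunk: (chunk: Float32Array) => void) {
    this.onChunk = onChunk;
  }

  // Accepts frames of any size and emits a copy each time 5 s of audio is buffered.
  push(frame: Float32Array): void {
    let offset = 0;
    while (offset < frame.length) {
      const take = Math.min(CHUNK_SAMPLES - this.filled, frame.length - offset);
      this.buffer.set(frame.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === CHUNK_SAMPLES) {
        this.onChunk(this.buffer.slice()); // copy so the buffer can be reused
        this.filled = 0;
      }
    }
  }
}
```

Leftover samples carry over into the next chunk, so no audio is dropped between scans.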
- Detection Ratio Chart: Donut chart (Recharts PieChart) visualizing Real vs. Fake scan breakdown.
- Confidence Bucketing: Bar chart grouping scans into High (>80%), Medium (50–80%), and Low (<50%) confidence bands.
- Summary Cards: Total Scans, Deepfakes Detected, Real Audio count, and Average Confidence — animated with Framer Motion.
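The bucketing and summary numbers can be derived in a single pass over the scan history. This sketch assumes that `confidenceScore` is stored as a 0 to 1 fraction. It follows the band edges above, so exactly 80% and exactly 50% both fall into Medium. `summarizeScans` is a hypothetical name.

```typescript
interface ScanSummary {
  isDeepfake: boolean;
  confidenceScore: number; // assumed 0-1 fraction
}

type Band = "High" | "Medium" | "Low";

// High: >80%, Medium: 50-80% inclusive, Low: <50%.
function confidenceBand(score: number): Band {
  if (score > 0.8) return "High";
  if (score >= 0.5) return "Medium";
  return "Low";
}

function summarizeScans(scans: ScanSummary[]) {
  const bands: Record<Band, number> = { High: 0, Medium: 0, Low: 0 };
  let deepfakes = 0;
  let confidenceSum = 0;
  for (const scan of scans) {
    bands[confidenceBand(scan.confidenceScore)] += 1;
    if (scan.isDeepfake) deepfakes += 1;
    confidenceSum += scan.confidenceScore;
  }
  return {
    totalScans: scans.length,
    deepfakesDetected: deepfakes,
    realAudio: scans.length - deepfakes, // the "Real" slice of the donut chart
    averageConfidence: scans.length > 0 ? confidenceSum / scans.length : 0,
    bands,
  };
}
```

The `bands` record maps directly onto bar-chart rows, and `deepfakesDetected`/`realAudio` map onto the two donut slices.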
`DeepfakeGame` component: an interactive challenge mode that tests the user's ability to distinguish real from AI-generated audio samples.
- Users can submit feedback on any scan via `FeedbackWidget`.
- Feedback is sent to `POST /scans/:id/feedback` and stored in the `scans.feedback` column.
- Progressive Web App: Installable on Android, iOS, and desktop via the `InstallPWA` component. Powered by a Workbox service worker with precaching and network-only strategies.
- Dark / Light Mode: Full theme toggle via `theme-provider` and `mode-toggle`.
- Multilingual Support: Language context (`LanguageContext.tsx`) with a language toggle component.
- History & Playback: Review all past scans, listen back to saved audio, and export detailed PDF reports.
Satark-AI is structured as a Turborepo monorepo with three independent microservices, an edge image proxy (Cloudflare Worker), and one shared package:
satark-ai/
├── apps/
│ ├── web/ → React + Vite (Frontend)
│ ├── api/ → Hono + Node.js (API Gateway)
│ └── engine/ → FastAPI + Python (AI Engine — Audio/Speaker)
├── packages/
│ └── shared/ → Shared Zod schemas & TypeScript types
├── cloudflare-worker/ → satark-image-proxy (NVIDIA NIM image proxy)
├── docker-compose.yml
└── turbo.json
| Service | Runtime | Role | Port |
|---|---|---|---|
| `apps/web` | React 18 + Vite | User interface, PWA shell | 5173 |
| `apps/api` | Node.js + Hono | Auth, DB, orchestration | 3000 |
| `apps/engine` | Python 3.11 + FastAPI | Audio deepfake + speaker inference | 8000 |
| `cloudflare-worker` | Cloudflare Workers (V8) | Image deepfake proxy → NVIDIA NIM | Edge |
┌─────────────────────────────────────────┐
│ Browser (React PWA) │
└──────────┬──────────────┬───────────────┘
│ │
Audio/ │ │ Image Upload
Speaker │ │
▼ ▼
┌──────────────┐ ┌──────────────────────┐
│ Hono API │ │ Cloudflare Worker │
│ Gateway │ │ (satark-image-proxy)│
│ (Node.js) │ └──────────┬───────────┘
└──────┬───────┘ │
│ ▼
▼ ┌──────────────────┐
┌──────────────┐ │ NVIDIA NIM API │
│ FastAPI │ │ Llama 3.2-90B │
│ AI Engine │ │ Vision Instruct │
│ (Python) │ └──────────────────┘
└──────┬───────┘
│
┌───────────┴──────────┐
▼ ▼
┌──────────────┐ ┌──────────────────┐
│ PostgreSQL │ │ PyTorch Models │
│ (Drizzle ORM)│ │ Wav2Vec2 + │
└──────────────┘ │ ECAPA-TDNN │
└──────────────────┘
Complete technical documentation for the Satark-AI platform:
| Document | Description |
|---|---|
| `docs/AI_DISCOVERABILITY_FRAMEWORKS.md` | AI discoverability & search optimization (AEO, GEO, LLMO, AISEO, E-E-A-T, SEO) |
| `docs/API.md` | Complete REST API reference (20+ endpoints across 3 services) |
| `docs/ARCHITECTURE.md` | System design, service map, request flows, C4 diagrams |
| `docs/DB_SCHEMA.md` | PostgreSQL schema with field-level documentation (Drizzle ORM) |
| `docs/DEPLOYMENT.md` | Production deployment guide (Vercel + Render + Docker) |
| `docs/EDGE_CASES.md` | Error handling, graceful degradation, failure modes |
| `docs/TECH_STACK.md` | Complete technology inventory (40+ packages) |
| `docs/WORKFLOW.md` | Development workflow, testing strategy, CI/CD pipeline |
| Signal | Feature Extracted | Anomaly Trigger |
|---|---|---|
| Raw waveform | Wav2Vec2 classifier | Model confidence > threshold |
| Frequency domain | Spectral Rolloff | Rolloff < 2500 Hz |
| Time domain | Zero Crossing Rate | ZCR > 0.12 |
| Combined | Composite risk score | Weighted multi-feature fusion |
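The two hand-crafted signals in this table can be sketched without any DSP library. In this sketch, ZCR is the fraction of adjacent sample pairs whose sign flips. Spectral rolloff is the lowest frequency below which 85% of the magnitude spectrum lies, and it is computed here with a naive DFT on a single frame. The 0.85 roll percent mirrors librosa's default and is an assumption about the engine, which computes these features per frame in Python. The fusion weights in `compositeRisk` are hypothetical, because the README only says "weighted multi-feature fusion".

```typescript
// Fraction of adjacent sample pairs whose sign differs.
function zeroCrossingRate(x: ArrayLike<number>): number {
  if (x.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < x.length; i++) {
    if ((x[i - 1] >= 0) !== (x[i] >= 0)) crossings++;
  }
  return crossings / (x.length - 1);
}

// Frequency (Hz) below which `rollPercent` of the magnitude spectrum lies.
// Naive O(n^2) DFT: fine for a short frame, not for whole recordings.
function spectralRolloff(frame: ArrayLike<number>, sampleRate: number, rollPercent = 0.85): number {
  const n = frame.length;
  const bins = Math.floor(n / 2) + 1;
  const mags: number[] = [];
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    mags.push(Math.hypot(re, im));
  }
  const total = mags.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  let cumulative = 0;
  for (let k = 0; k < bins; k++) {
    cumulative += mags[k];
    if (cumulative >= rollPercent * total) return (k * sampleRate) / n;
  }
  return sampleRate / 2;
}

// Anomaly triggers taken from the table above.
function forensicFlags(frame: ArrayLike<number>, sampleRate: number) {
  const zcr = zeroCrossingRate(frame);
  const rolloff = spectralRolloff(frame, sampleRate);
  return { zcr, rolloff, zcrAnomalous: zcr > 0.12, rolloffAnomalous: rolloff < 2500 };
}

// Hypothetical weights for illustration only; result clamped to [0, 1].
function compositeRisk(modelFakeProb: number, zcrAnomalous: boolean, rolloffAnomalous: boolean): number {
  const score = 0.7 * modelFakeProb + 0.15 * Number(zcrAnomalous) + 0.15 * Number(rolloffAnomalous);
  return Math.min(1, Math.max(0, score));
}
```

A pure 1 kHz tone sampled at 16 kHz puts all of its energy in a single bin, so its rolloff is 1000 Hz. That value is well under the 2500 Hz trigger.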
| Step | Technology | Detail |
|---|---|---|
| Audio loading | Librosa | Resampled to 16 kHz mono |
| Embedding extraction | SpeechBrain ECAPA-TDNN | 192-dimensional vector |
| Similarity scoring | Cosine Similarity (TypeScript) | Computed server-side in API |
| Match decision | Threshold (0.75) | score > 0.75 → Identity Confirmed |
| Step | Component | Detail |
|---|---|---|
| Request intake | Cloudflare Worker | Accepts multipart/form-data or raw binary |
| Size enforcement | Worker (double-check) | Pre-read via content-length, post-read via byteLength — 5MB cap |
| Image encoding | Worker | ArrayBuffer → Base64 → Data URI |
| Vision inference | NVIDIA NIM API | meta/llama-3.2-90b-vision-instruct analyzes artifacts, blending, texture |
| Response parsing | `extractJSON()` | Strips markdown fences, extracts `{}`, clamps score to [0,1] |
| Timeout control | `AbortController` | 30s hard timeout → 504 response |
| Output | Normalized JSON | { isDeepfake, confidenceScore, details } |
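The size-enforcement, encoding and timeout rows can be sketched as a single Worker-style handler built on the standard `Request`/`Response`/`AbortController` APIs. This sketch handles only the raw-binary body (not `multipart/form-data`) and injects the upstream call as a hypothetical `callModel` function. A real Worker would `fetch` NVIDIA NIM with `signal: controller.signal`, but the exact endpoint and payload are not shown in this README. The default MIME type and the 502 status for non-timeout failures are assumptions.

```typescript
const MAX_BYTES = 5 * 1024 * 1024; // 5MB cap
const TIMEOUT_MS = 30_000; // 30s hard timeout

// Hypothetical upstream call; it must honour `signal` to be cancellable.
type ModelCall = (dataUri: string, signal: AbortSignal) => Promise<string>;

function jsonResponse(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), { status, headers: { "Content-Type": "application/json" } });
}

// ArrayBuffer -> base64 via btoa, chunked so String.fromCharCode gets bounded argument lists.
function toBase64(buf: ArrayBuffer): string {
  const bytes = new Uint8Array(buf);
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + 0x8000)));
  }
  return btoa(binary);
}

async function handleRawImage(req: Request, callModel: ModelCall, timeoutMs: number = TIMEOUT_MS): Promise<Response> {
  // Layer 1: reject early on the declared size, before reading the body.
  const declared = Number(req.headers.get("content-length") ?? "0");
  if (declared > MAX_BYTES) return jsonResponse({ error: "Image exceeds 5MB" }, 413);

  // Layer 2: the header may be missing or wrong, so check the bytes actually read.
  const buf = await req.arrayBuffer();
  if (buf.byteLength > MAX_BYTES) return jsonResponse({ error: "Image exceeds 5MB" }, 413);

  const mime = req.headers.get("content-type") ?? "image/jpeg"; // assumed default
  const dataUri = `data:${mime};base64,${toBase64(buf)}`;

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const text = await callModel(dataUri, controller.signal);
    // Normalizing `text` into { isDeepfake, confidenceScore, details } is omitted here.
    return new Response(text, { status: 200, headers: { "Content-Type": "application/json" } });
  } catch {
    return controller.signal.aborted
      ? jsonResponse({ error: "Upstream timed out" }, 504)
      : jsonResponse({ error: "Upstream request failed" }, 502);
  } finally {
    clearTimeout(timer); // never leave the timer running after a response
  }
}
```

The header check is cheap but can be spoofed, and the post-read check is authoritative. Together they reject most oversized uploads without buffering them and still catch a lying client.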
src/
├── api/
│ └── client.ts → Typed API client (scanAudio, scanUpload, scanImage,
│ enrollSpeaker, verifySpeaker, getHistory, submitFeedback)
├── components/
│ ├── AnalyticsStats.tsx → Recharts pie + bar dashboard
│ ├── AudioUpload.tsx → File picker with drag-drop for audio
│ ├── AudioVisualizer.tsx → Real-time waveform canvas
│ ├── ConfidenceMeter.tsx → Animated confidence score bar
│ ├── DeepfakeGame.tsx → Interactive detection challenge game
│ ├── ErrorBoundary.tsx → React error boundary wrapper
│ ├── FakeHeatmap.tsx → Feature-level heatmap visualization
│ ├── FeatureChart.tsx → Per-feature forensic breakdown chart
│ ├── FeedbackWidget.tsx → User feedback submission UI
│ ├── Footer.tsx
│ ├── ImageUpload.tsx → Image deepfake upload → Cloudflare Worker → NVIDIA NIM
│ ├── InstallPWA.tsx → PWA install prompt handler
│ ├── LandingNavbar.tsx → Public landing page navigation
│ ├── language-toggle.tsx → i18n language switcher
│ ├── LiveMonitor.tsx → Real-time mic monitoring (5s chunks)
│ ├── mode-toggle.tsx → Dark/light theme switch
│ ├── Navbar.tsx → Authenticated app navigation
│ ├── ScanHistory.tsx → History list with audio playback
│ ├── SpeakerIdentity.tsx → Enrollment + verification UI
│ └── theme-provider.tsx → Global theme context
├── context/
│ └── LanguageContext.tsx → i18n context provider
├── lib/
│ └── utils.ts → Shared utility helpers
├── pages/
│ ├── History.tsx → Full scan history page
│ └── Landing.tsx → Public marketing landing
├── utils/
│ └── pdfGenerator.ts → jsPDF-powered report generation
├── App.tsx → Root router + Clerk provider
└── AuthenticatedShell.tsx → Protected app shell wrapper
Key Libraries:
| Library | Version | Purpose |
|---|---|---|
| React | 18 | Core UI framework |
| Vite | — | Build tool + HMR |
| TypeScript | 5.3 | Type safety |
| Tailwind CSS | — | Utility-first styling |
| Framer Motion | — | Animations |
| Clerk | — | Auth (JWT) |
| Recharts | — | Analytics charts |
| Lucide React | — | Icon set |
| Workbox | 7.3 | PWA / Service Worker |
src/
├── db/
│ ├── index.ts → Drizzle + pg connection pool (max 20, timeout 5s, idle 30s)
│ └── schema.ts → PostgreSQL schema definitions
├── middleware/
│ └── auth.ts → Clerk JWT verification middleware (authMiddleware + requireAuth)
├── routes/
│ └── speaker.ts → /speaker/enroll + /speaker/verify endpoints
└── index.ts → Main Hono app, all route registration
Database Schema:
// scans table
{
id: serial (PK),
userId: text (NOT NULL), // Clerk user ID
audioUrl: text (NOT NULL),
isDeepfake: boolean,
confidenceScore: float8,
fileHash: text, // SHA-256 for deduplication
audioData: text, // Base64 encoded audio (for playback)
analysisDetails: text, // Human-readable XAI output
createdAt: timestamp (default now),
feedback: text
// Indexes: userId, createdAt, fileHash
}
// speakers table
{
id: uuid (PK, random),
userId: text (NOT NULL), // Clerk user ID (scoped isolation)
name: text (NOT NULL),
embedding: json (NOT NULL), // 192-dim ECAPA-TDNN float array
createdAt: timestamp (NOT NULL)
// Index: userId
}
API Endpoints:
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/upload` | ✅ | Upload audio file for deepfake scan |
| `POST` | `/scan` | ✅ | Scan audio from URL |
| `GET` | `/scans` | ✅ | Get user's scan history |
| `GET` | `/audio/:id` | ✅ | Stream audio blob for playback |
| `POST` | `/scans/:id/feedback` | ✅ | Submit feedback on a scan |
| `POST` | `/speaker/enroll` | ✅ | Enroll speaker voice print |
| `POST` | `/speaker/verify` | ✅ | Verify speaker identity |
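A typed call to one of these authenticated endpoints might look like the following minimal sketch. It assumes that the Clerk session JWT is sent as a `Bearer` token and that the feedback body is `{ feedback: string }`. Neither detail is confirmed in this README, and `apiRequest`/`submitFeedback` are illustrative rather than the real `client.ts` exports. The `fetchImpl` parameter exists only so the transport can be swapped out in tests.

```typescript
// Minimal fetch signature so a fake transport can be injected.
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Sends an authenticated request to the API gateway; throws on non-2xx responses.
async function apiRequest(
  baseUrl: string,
  path: string,
  token: string,
  init: RequestInit,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  const headers = new Headers(init.headers);
  headers.set("Authorization", `Bearer ${token}`); // assumed auth scheme
  const res = await fetchImpl(`${baseUrl}${path}`, { ...init, headers });
  if (!res.ok) throw new Error(`${init.method ?? "GET"} ${path} failed: HTTP ${res.status}`);
  return res;
}

// POST /scans/:id/feedback with an assumed { feedback } JSON body.
async function submitFeedback(
  baseUrl: string,
  token: string,
  scanId: number,
  feedback: string,
  fetchImpl: FetchLike = fetch,
): Promise<void> {
  await apiRequest(
    baseUrl,
    `/scans/${scanId}/feedback`,
    token,
    { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ feedback }) },
    fetchImpl,
  );
}
```

The `userId` is never sent in the body: per the security model, the server derives it from the token.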
apps/engine/
├── main.py → FastAPI app, endpoint definitions, lifespan context
├── detect.py → Deepfake detection pipeline (Wav2Vec2 + spectral)
├── detect_image.py → Image deepfake detection
├── speaker.py → ECAPA-TDNN embedding generation + HF patches
├── schemas.py → Pydantic models (AudioUpload, ScanResult)
├── dummy_custom.py → SpeechBrain HuggingFace 404 fallback patch
├── requirements.txt → Pinned Python dependencies
└── Dockerfile → Python 3.11-slim, non-root user (appuser)
Engine Endpoints:
| Method | Path | Description |
|---|---|---|
| `GET` | `/` | Health check: `{"status": "AI Engine Running"}` |
| `POST` | `/scan` | Scan audio via URL (async download → analyze) |
| `POST` | `/scan-upload` | Scan uploaded audio file |
| `POST` | `/analyze` | Video/audio analysis with `moviepy` fallback |
| `POST` | `/embed` | Generate ECAPA-TDNN speaker embedding vector |
Performance Notes:
- Lazy Model Loading: Models load on the first request (not at startup) to prevent OOM crashes on free-tier Render instances.
- Thread Executor: CPU-bound inference runs in `loop.run_in_executor()` to keep the FastAPI async event loop non-blocking.
- Temp File Cleanup: All uploaded/extracted files are deleted in `finally` blocks, so no temp files leak to disk.
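The engine itself is Python, but the lazy-loading idea is language-neutral, and the following TypeScript sketch shows it. The loader runs on first use, and concurrent first callers share one in-flight promise instead of loading the model twice. A failed load is cleared so that a later request can retry. `lazy` is a hypothetical helper, not the engine's code; in Python the same effect typically needs an `asyncio.Lock` or equivalent guard.

```typescript
// Wraps an expensive async loader so it runs at most once per successful load.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = load().catch((err) => {
        pending = null; // allow a retry after a failed load (e.g. transient OOM)
        throw err;
      });
    }
    return pending;
  };
}
```

Memoizing the promise rather than the resolved value is what protects against the thundering-herd case, where several requests arrive before the first load finishes.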
Shared Zod validation schemas and TypeScript types (ScanResultType, AudioUploadSchema, etc.) consumed by both apps/api and apps/web.
GitHub Actions cron job that pings both Render services every 14 minutes to prevent cold starts on the free tier.
schedule: `cron: "*/14 * * * *"`

Pings:
- API: https://satark-ai-f5t7.onrender.com/
- Engine: https://satark-ai-es1v.onrender.com/
| Requirement | Version |
|---|---|
| Node.js | v18+ |
| Python | 3.11+ |
| PostgreSQL | 14+ |
| Docker (optional) | Latest |
1. Clone the Repository
git clone https://github.com/theunstopabble/Satark-AI.git
cd Satark-AI
2. Install Node.js Dependencies
npm install # installs dependencies for all workspaces
3. Install Python Dependencies (AI Engine)
cd apps/engine
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
4. Configure Environment Variables
Create `.env` files in each app directory:
apps/web/.env
VITE_CLERK_PUBLISHABLE_KEY=pk_test_xxxx
VITE_API_URL=http://localhost:3000
apps/api/.env
DATABASE_URL=postgresql://user:password@localhost:5432/satark_db
CLERK_SECRET_KEY=sk_test_xxxx
CLERK_PUBLISHABLE_KEY=pk_test_xxxx
ALLOWED_ORIGINS=http://localhost:5173
ENGINE_URL=http://localhost:8000
apps/engine/.env: no required variables; models download from HuggingFace on first run.
5. Run Database Migrations
cd apps/api
npx drizzle-kit push
6. Start All Services
| Terminal | Command |
|---|---|
| Terminal 1 — Frontend | cd apps/web && npm run dev |
| Terminal 2 — API Gateway | cd apps/api && npm run dev |
| Terminal 3 — AI Engine | cd apps/engine && uvicorn main:app --reload --port 8000 |
Or run everything at once from root:
npm run dev # Turborepo orchestrates all three concurrently

To run with Docker instead, copy and fill in your `.env` values first:
cp .env.example .env
docker-compose up --build

Services will start at:
- Frontend: http://localhost:5173
- API Gateway: http://localhost:3000
- AI Engine: http://localhost:8000
Docker security hardening included:
- Non-root user (`appuser`) in the engine container
- `no-new-privileges:true` security option on all services
- `tmpfs` mount for `/tmp` in the API and web containers
| Service | Platform | URL |
|---|---|---|
| Frontend (`apps/web`) | Vercel | satark-deepfake.vercel.app |
| API Gateway (`apps/api`) | Render | satark-ai-f5t7.onrender.com |
| AI Engine (`apps/engine`) | Render | satark-ai-es1v.onrender.com |
| Image Proxy (Worker) | Cloudflare Workers | satark-image-proxy.gautamkumar43421.workers.dev |
| Database | Supabase / Neon / Railway | PostgreSQL (SSL enabled) |
Vercel config (apps/web/vercel.json) — SPA routing rewrites all paths to index.html.
| Layer | Mechanism | Detail |
|---|---|---|
| Authentication | Clerk JWT | All protected routes verify token server-side |
| Authorization | Context-scoped userId | userId extracted from auth token — never trusted from request body |
| Speaker isolation | DB-level scoping | Verify queries filter by userId — no cross-user voice data access |
| Speaker threshold | Cosine similarity ≥ 0.75 | Strict match threshold prevents false identity confirmations |
| File handling | UUID-prefixed temp files | Uploaded files stored with random UUID prefix, deleted post-processing |
| Container | Non-root user | Engine runs as appuser — no root privileges inside Docker |
| Connection pool | pg Pool | Max 20 connections, 5s timeout, graceful error recovery |
| Image proxy CORS | Origin whitelist | Worker rejects requests from unlisted origins — no wildcard * |
| Image size limit | Double-layer check | Enforced via content-length header + actual byteLength post-read (5MB cap) |
| NVIDIA key isolation | Cloudflare Secrets | NVIDIA_API_KEY never exposed to frontend — stored in Worker environment only |
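The origin-whitelist check in this table can be sketched as an exact-match lookup that echoes the caller's origin back instead of `*`. The URL schemes (`https` for Vercel, `http` for localhost) and the allowed methods and headers are assumptions. `corsHeaders` and `handlePreflight` are hypothetical names rather than the Worker's actual exports.

```typescript
// Assumed full origins for the whitelisted hosts listed in the README.
const ALLOWED_ORIGINS = new Set([
  "https://satark-deepfake.vercel.app",
  "http://localhost:5173",
  "http://localhost:3000",
]);

// Returns CORS headers for a whitelisted origin, or null to reject.
function corsHeaders(origin: string | null): Record<string, string> | null {
  if (origin === null || !ALLOWED_ORIGINS.has(origin)) return null;
  return {
    "Access-Control-Allow-Origin": origin, // echo the exact origin, never "*"
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    Vary: "Origin", // responses differ per origin, so caches must key on it
  };
}

// Answers an OPTIONS preflight: 204 for allowed origins, 403 otherwise.
function handlePreflight(req: Request): Response {
  const headers = corsHeaders(req.headers.get("Origin"));
  if (headers === null) return new Response("Forbidden origin", { status: 403 });
  return new Response(null, { status: 204, headers });
}
```

Exact string matching avoids the classic mistake of suffix or substring checks, which would accept a lookalike origin such as `https://satark-deepfake.vercel.app.evil.example`.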
| Variable | App | Required | Description |
|---|---|---|---|
| `VITE_CLERK_PUBLISHABLE_KEY` | web | ✅ | Clerk frontend public key |
| `VITE_API_URL` | web | ✅ | Backend API base URL |
| `DATABASE_URL` | api | ✅ | PostgreSQL connection string |
| `CLERK_SECRET_KEY` | api | ✅ | Clerk backend secret key |
| `CLERK_PUBLISHABLE_KEY` | api | ✅ | Clerk public key (for validation) |
| `ALLOWED_ORIGINS` | api | ✅ | CORS allowed origins (comma-separated) |
| `ENGINE_URL` | api | ✅ | FastAPI engine base URL |
| `IMAGE_API_URL` | api | ✅ | Cloudflare Worker URL for image deepfake proxy |
| `NVIDIA_API_KEY` | cloudflare-worker | ✅ | NVIDIA NIM API key, set as a Cloudflare Worker Secret |
Note on `NVIDIA_API_KEY`: it is stored via `wrangler secret put NVIDIA_API_KEY` and never appears in source code or `.env` files. It lives exclusively in Cloudflare's encrypted secret store.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch (`git checkout -b feat/your-feature`)
- Commit your changes (`git commit -m 'feat: add your feature'`)
- Push to the branch (`git push origin feat/your-feature`)
- Open a Pull Request
Gautam Kumar — Lead Developer
This project is licensed under the MIT License — see the LICENSE file for details.
Satark-AI — Because the truth still matters.
| Project | Description | Link |
|---|---|---|
| Portfolio | Personal portfolio & developer profile | gautam-kr.vercel.app |
| InterviewMinds | Enterprise AI mock interview platform | interviewminds.vercel.app |
| SwadKart | Multi-vendor food delivery platform with AI chatbot | swadkart.vercel.app |
| TexFolio | AI-powered LaTeX resume builder with RBAC | texfolio.vercel.app |




