Satark-AI: Defending Truth in the Age of Generative AI 🛡️

🌐 Live Demo

👉 Open App → satark-deepfake.vercel.app

Satark-AI is a production-grade, full-stack deepfake detection and speaker verification platform, built as a microservices monorepo. It combines audio forensics (MFCC, spectral analysis, zero crossing rate), multimodal vision AI via NVIDIA NIM (Llama 3.2-90B Vision), and deep learning speaker biometrics (ECAPA-TDNN) to identify synthetic media and verify speaker identities in real time across audio, video, and image inputs.

📸 Screenshots

[Screenshots: Dashboard, Mobile View, Live Monitor, Speaker Identity]

🌟 Feature Overview

🕵️ Deepfake Audio Detection

Wav2Vec2 Model: Transformer-based deep learning model fine-tuned for synthetic speech detection.
Multi-Feature Forensics: Analyzes MFCC coefficients, Spectral Rolloff, and Zero Crossing Rate (ZCR) for composite risk scoring.
Multi-Format Support: Upload MP3 or WAV files, or MP4 video whose audio track is extracted through a moviepy fallback.
Explainable AI (XAI): Returns structured analysisDetails with per-feature reasoning (e.g., "Anomalous zero crossing rate (0.214)").
Confidence Scoring: 4-decimal precision confidence score returned per scan.
Smart Deduplication: SHA-256 file hashing prevents redundant re-processing of identical files.
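
The deduplication idea can be pictured with a minimal Python sketch. It assumes the hash is taken over the raw file bytes and checked before any inference runs. The `seen` dictionary stands in for a lookup on the `fileHash` column, and `scan_once` and `analyze` are hypothetical names, not the project's functions.

```python
import hashlib
from typing import Callable, Dict


def file_sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def scan_once(data: bytes, seen: Dict[str, dict],
              analyze: Callable[[bytes], dict]) -> dict:
    """Reuse a stored result when the same bytes were scanned before.

    `seen` is an in-memory stand-in for a database lookup by file hash
    (an assumption made for this sketch).
    """
    digest = file_sha256(data)
    if digest in seen:
        return seen[digest]      # identical file: skip re-processing
    result = analyze(data)       # hypothetical analysis callback
    seen[digest] = result
    return result
```

Two uploads with identical bytes produce the same digest, so the second call returns the cached result without calling `analyze` again.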

🖼️ Deepfake Image Detection — Powered by NVIDIA NIM

Image analysis runs on an entirely separate, serverless pipeline — independent of the Python engine.

Model: meta/llama-3.2-90b-vision-instruct via NVIDIA NIM API — a 90B multimodal vision-language model analyzing spatial artifacts, blending edges, texture inconsistencies, asymmetric features, and lighting anomalies.
Cloudflare Worker Proxy: A dedicated Cloudflare Worker (satark-image-proxy) sits between the frontend and NVIDIA's API — handling CORS, secret management, size enforcement, and timeout control.
Input Formats: Supports multipart/form-data file upload or raw binary body. The Worker converts the image to a base64 Data URI and forwards it to NVIDIA NIM.
5MB Size Limit: Enforced both via content-length header (pre-read) and actual byteLength post-read — double-layer enforcement.
30s Timeout: AbortController cancels hung NVIDIA requests after 30 seconds, returning 504 gracefully.
Output Schema: { isDeepfake: boolean, confidenceScore: float (0–1), details: string } — normalized and validated before returning to client.
Robust JSON Parsing: Strips markdown code fences, extracts the first valid { } block, clamps confidenceScore to [0, 1], and falls back gracefully if NVIDIA returns an unexpected format.
CORS Whitelisting: Strict origin whitelist (satark-deepfake.vercel.app, localhost:5173, localhost:3000) — no wildcard *.
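
The double-layer size check can be illustrated as follows. The Worker itself runs on Cloudflare's JavaScript runtime, so this Python version only sketches the logic. The function names are hypothetical, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption.

```python
from typing import Optional

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" read as 5 MiB


def declared_size_ok(content_length: Optional[str]) -> bool:
    """Pre-read check: reject early when the header already exceeds the cap.

    A missing or malformed header cannot be trusted either way, so it
    passes here and the body is measured after reading.
    """
    if content_length is None:
        return True
    try:
        return int(content_length) <= MAX_BYTES
    except ValueError:
        return True


def actual_size_ok(body: bytes) -> bool:
    """Post-read check: measure the bytes actually received."""
    return len(body) <= MAX_BYTES


def accept_upload(content_length: Optional[str], body: bytes) -> bool:
    """Apply both layers; the header alone is never sufficient."""
    return declared_size_ok(content_length) and actual_size_ok(body)
```

The second check matters because a client can send a small `content-length` value and a larger body; measuring the received bytes catches that case.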

🆔 Voice Biometrics — Speaker Identity

Enrollment System: Enroll a speaker by uploading a reference audio sample. ECAPA-TDNN extracts a 192-dim voice embedding stored securely in PostgreSQL.
Verification: Match an unknown voice against all enrolled speakers using Cosine Similarity (threshold: 0.75).
Scoped Isolation: Users only verify against their own enrolled speakers — cross-user data access is prevented at the query level.
Auto-History Logging: Every verification attempt is saved to the scan history table with identity details.

🎙️ Live Monitor

Real-Time Protection: Continuously captures microphone input and processes it in 5-second chunks.
Instant Feedback: Each chunk is scanned and flagged as real or synthetic with a confidence score.
Auto-Persistence: All detected threats are saved to the history database automatically.

📊 Analytics Dashboard

Detection Ratio Chart: Donut chart (Recharts PieChart) visualizing Real vs. Fake scan breakdown.
Confidence Bucketing: Bar chart grouping scans into High (>80%), Medium (50–80%), and Low (<50%) confidence bands.
Summary Cards: Total Scans, Deepfakes Detected, Real Audio count, and Average Confidence — animated with Framer Motion.
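
The confidence bands can be sketched in a few lines. The dashboard itself is a React component, so this Python version only illustrates the bucketing rule. It assumes scores arrive on a 0 to 1 scale and that the boundary values 0.80 and 0.50 both fall in the Medium band; `confidence_band` and `band_counts` are hypothetical names.

```python
from collections import Counter
from typing import Iterable


def confidence_band(score: float) -> str:
    """Map a 0-1 confidence score to High, Medium, or Low."""
    if score > 0.80:
        return "High"
    if score >= 0.50:
        return "Medium"
    return "Low"


def band_counts(scores: Iterable[float]) -> Counter:
    """Count scans per band, the shape a bar chart would consume."""
    return Counter(confidence_band(s) for s in scores)
```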

🎮 Deepfake Game (Interactive)

The DeepfakeGame component is an interactive challenge mode that tests the user's ability to distinguish real from AI-generated audio samples.

💬 Feedback System

Users can submit feedback on any scan via FeedbackWidget.
Feedback is stored in the scans.feedback column and submitted via POST /scans/:id/feedback.

📱 PWA & Accessibility

Progressive Web App: Installable on Android/iOS and Desktop via InstallPWA component. Powered by Workbox service worker with precaching and network-only strategies.
Dark / Light Mode: Full theme toggle via theme-provider and mode-toggle.
Multilingual Support: Language context (LanguageContext.tsx) with a language toggle component.
History & Playback: Review all past scans, listen back to saved audio, and export detailed PDF reports.

🏗️ Architecture

Satark-AI is structured as a Turborepo monorepo with three application services, one shared package, and a separate Cloudflare Worker for image analysis:

satark-ai/
├── apps/
│   ├── web/          → React + Vite  (Frontend)
│   ├── api/          → Hono + Node.js (API Gateway)
│   └── engine/       → FastAPI + Python (AI Engine — Audio/Speaker)
├── packages/
│   └── shared/       → Shared Zod schemas & TypeScript types
├── cloudflare-worker/ → satark-image-proxy (NVIDIA NIM image proxy)
├── docker-compose.yml
└── turbo.json

Service Responsibilities

Service	Runtime	Role	Port
`apps/web`	React 18 + Vite	User interface, PWA shell	5173
`apps/api`	Node.js + Hono	Auth, DB, orchestration	3000
`apps/engine`	Python 3.11 + FastAPI	Audio deepfake + speaker inference	8000
`cloudflare-worker`	Cloudflare Workers (V8)	Image deepfake proxy → NVIDIA NIM	Edge

Request Flow

                        ┌─────────────────────────────────────────┐
                        │           Browser (React PWA)           │
                        └──────────┬──────────────┬───────────────┘
                                   │              │
                          Audio/   │              │ Image Upload
                          Speaker  │              │
                                   ▼              ▼
                        ┌──────────────┐   ┌──────────────────────┐
                        │  Hono API    │   │  Cloudflare Worker   │
                        │  Gateway     │   │  (satark-image-proxy)│
                        │  (Node.js)   │   └──────────┬───────────┘
                        └──────┬───────┘              │
                               │                      ▼
                               ▼              ┌──────────────────┐
                        ┌──────────────┐      │  NVIDIA NIM API  │
                        │ FastAPI      │      │  Llama 3.2-90B   │
                        │ AI Engine    │      │  Vision Instruct │
                        │ (Python)     │      └──────────────────┘
                        └──────┬───────┘
                               │
                   ┌───────────┴──────────┐
                   ▼                      ▼
           ┌──────────────┐     ┌──────────────────┐
           │  PostgreSQL  │     │  PyTorch Models  │
           │ (Drizzle ORM)│     │ Wav2Vec2 +       │
           └──────────────┘     │ ECAPA-TDNN       │
                                └──────────────────┘

📚 Documentation

Complete technical documentation for the Satark-AI platform:

Document	Description
`docs/AI_DISCOVERABILITY_FRAMEWORKS.md`	AI discoverability & search optimization (AEO, GEO, LLMO, AISEO, E-E-A-T, SEO)
`docs/API.md`	Complete REST API reference (20+ endpoints across 3 services)
`docs/ARCHITECTURE.md`	System design, service map, request flows, C4 diagrams
`docs/DB_SCHEMA.md`	PostgreSQL schema with field-level documentation (Drizzle ORM)
`docs/DEPLOYMENT.md`	Production deployment guide (Vercel + Render + Docker)
`docs/EDGE_CASES.md`	Error handling, graceful degradation, failure modes
`docs/TECH_STACK.md`	Complete technology inventory (40+ packages)
`docs/WORKFLOW.md`	Development workflow, testing strategy, CI/CD pipeline

🧠 AI Models & Algorithms

Deepfake Detection Pipeline (`detect.py`)

Signal	Feature Extracted	Anomaly Trigger
Raw waveform	Wav2Vec2 classifier	Model confidence > threshold
Frequency domain	Spectral Rolloff	Rolloff < 2500 Hz
Time domain	Zero Crossing Rate	ZCR > 0.12
Combined	Composite risk score	Weighted multi-feature fusion
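
The two threshold triggers in the table can be sketched as below. This is an illustration under stated assumptions, not the contents of `detect.py`: the zero crossing rate here is a simple whole-signal definition (the project's feature extraction may differ, for example by computing it per frame), the rolloff value is taken as an input rather than computed, and the rolloff message text is hypothetical. The fusion weights are not given in this document, so the composite score is left out.

```python
from typing import List, Sequence

ROLLOFF_MIN_HZ = 2500.0  # from the table: rolloff below this is anomalous
ZCR_MAX = 0.12           # from the table: ZCR above this is anomalous


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def forensic_flags(zcr: float, rolloff_hz: float) -> List[str]:
    """Return a human-readable reason for each triggered anomaly."""
    reasons = []
    if zcr > ZCR_MAX:
        reasons.append(f"Anomalous zero crossing rate ({zcr:.3f})")
    if rolloff_hz < ROLLOFF_MIN_HZ:
        # Message wording is hypothetical.
        reasons.append(f"Low spectral rolloff ({rolloff_hz:.0f} Hz)")
    return reasons
```

A returned list of reasons maps naturally onto the per-feature explanations described under Explainable AI.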

Speaker Verification Pipeline (`speaker.py`)

Step	Technology	Detail
Audio loading	Librosa	Resampled to 16 kHz mono
Embedding extraction	SpeechBrain ECAPA-TDNN	192-dimensional vector
Similarity scoring	Cosine Similarity (TypeScript)	Computed server-side in API
Match decision	Threshold (0.75)	`score > 0.75` → Identity Confirmed
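
The similarity and match steps can be sketched as follows. The table states that the project computes cosine similarity in TypeScript inside the API, so this Python version only illustrates the arithmetic. Picking the single best-scoring speaker is an assumption, and `best_match` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional, Tuple

MATCH_THRESHOLD = 0.75  # from the table: score > 0.75 confirms identity


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(probe: List[float],
               enrolled: Dict[str, List[float]]) -> Tuple[Optional[str], float]:
    """Return (name, score) for the best speaker, or (None, score) below threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score > MATCH_THRESHOLD:
        return best_name, best_score
    return None, best_score
```

In the real pipeline both vectors would be 192-dimensional ECAPA-TDNN embeddings, and `enrolled` would hold only the calling user's speakers.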

Image Deepfake Pipeline (Cloudflare Worker)

Step	Component	Detail
Request intake	Cloudflare Worker	Accepts `multipart/form-data` or raw binary
Size enforcement	Worker (double-check)	Pre-read via `content-length`, post-read via `byteLength` — 5MB cap
Image encoding	Worker	`ArrayBuffer → Base64 → Data URI`
Vision inference	NVIDIA NIM API	`meta/llama-3.2-90b-vision-instruct` analyzes artifacts, blending, texture
Response parsing	`extractJSON()`	Strips markdown fences, extracts `{}`, clamps score to `[0,1]`
Timeout control	`AbortController`	30s hard timeout → 504 response
Output	Normalized JSON	`{ isDeepfake, confidenceScore, details }`
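
The response-parsing step can be sketched like this. The Worker's `extractJSON()` is JavaScript, so this Python version only illustrates the described behaviour: strip fence markers, take the first decodable object, clamp the score. The `FALLBACK` values and the function name `extract_json` are assumptions.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # markdown code-fence marker
FALLBACK = {     # hypothetical values returned when nothing parses
    "isDeepfake": False,
    "confidenceScore": 0.0,
    "details": "Could not parse model response",
}


def extract_json(text: str) -> Dict[str, Any]:
    """Pull the first decodable JSON object out of a model reply and normalize it."""
    # Remove fence markers together with an optional language tag.
    cleaned = re.sub(re.escape(FENCE) + r"[a-zA-Z]*", "", text)
    decoder = json.JSONDecoder()
    parsed = None
    # Try each "{" as a possible start of a JSON object.
    for match in re.finditer(r"\{", cleaned):
        try:
            candidate, _ = decoder.raw_decode(cleaned, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(candidate, dict):
            parsed = candidate
            break
    if parsed is None:
        return dict(FALLBACK)
    try:
        score = float(parsed.get("confidenceScore", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    return {
        "isDeepfake": parsed.get("isDeepfake") is True,
        "confidenceScore": min(1.0, max(0.0, score)),  # clamp to [0, 1]
        "details": str(parsed.get("details", "")),
    }
```

Normalizing at the proxy means the frontend can rely on the three-field schema regardless of how the model phrases its reply.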

🗂️ Codebase Deep Dive

`apps/web` — Frontend

src/
├── api/
│   └── client.ts            → Typed API client (scanAudio, scanUpload, scanImage,
│                               enrollSpeaker, verifySpeaker, getHistory, submitFeedback)
├── components/
│   ├── AnalyticsStats.tsx   → Recharts pie + bar dashboard
│   ├── AudioUpload.tsx      → File picker with drag-drop for audio
│   ├── AudioVisualizer.tsx  → Real-time waveform canvas
│   ├── ConfidenceMeter.tsx  → Animated confidence score bar
│   ├── DeepfakeGame.tsx     → Interactive detection challenge game
│   ├── ErrorBoundary.tsx    → React error boundary wrapper
│   ├── FakeHeatmap.tsx      → Feature-level heatmap visualization
│   ├── FeatureChart.tsx     → Per-feature forensic breakdown chart
│   ├── FeedbackWidget.tsx   → User feedback submission UI
│   ├── Footer.tsx
│   ├── ImageUpload.tsx      → Image deepfake upload → Cloudflare Worker → NVIDIA NIM
│   ├── InstallPWA.tsx       → PWA install prompt handler
│   ├── LandingNavbar.tsx    → Public landing page navigation
│   ├── language-toggle.tsx  → i18n language switcher
│   ├── LiveMonitor.tsx      → Real-time mic monitoring (5s chunks)
│   ├── mode-toggle.tsx      → Dark/light theme switch
│   ├── Navbar.tsx           → Authenticated app navigation
│   ├── ScanHistory.tsx      → History list with audio playback
│   ├── SpeakerIdentity.tsx  → Enrollment + verification UI
│   └── theme-provider.tsx   → Global theme context
├── context/
│   └── LanguageContext.tsx  → i18n context provider
├── lib/
│   └── utils.ts             → Shared utility helpers
├── pages/
│   ├── History.tsx          → Full scan history page
│   └── Landing.tsx          → Public marketing landing
├── utils/
│   └── pdfGenerator.ts      → jsPDF-powered report generation
├── App.tsx                  → Root router + Clerk provider
└── AuthenticatedShell.tsx   → Protected app shell wrapper

Key Libraries:

Library	Version	Purpose
React	18	Core UI framework
Vite	—	Build tool + HMR
TypeScript	5.3	Type safety
Tailwind CSS	—	Utility-first styling
Framer Motion	—	Animations
Clerk	—	Auth (JWT)
Recharts	—	Analytics charts
Lucide React	—	Icon set
Workbox	7.3	PWA / Service Worker

`apps/api` — API Gateway

src/
├── db/
│   ├── index.ts    → Drizzle + pg connection pool (max 20, timeout 5s, idle 30s)
│   └── schema.ts   → PostgreSQL schema definitions
├── middleware/
│   └── auth.ts     → Clerk JWT verification middleware (authMiddleware + requireAuth)
├── routes/
│   └── speaker.ts  → /speaker/enroll + /speaker/verify endpoints
└── index.ts        → Main Hono app, all route registration

Database Schema:

// scans table
{
  id: serial (PK),
  userId: text (NOT NULL),         // Clerk user ID
  audioUrl: text (NOT NULL),
  isDeepfake: boolean,
  confidenceScore: float8,
  fileHash: text,                  // SHA-256 for deduplication
  audioData: text,                 // Base64 encoded audio (for playback)
  analysisDetails: text,           // Human-readable XAI output
  createdAt: timestamp (default now),
  feedback: text
  // Indexes: userId, createdAt, fileHash
}

// speakers table
{
  id: uuid (PK, random),
  userId: text (NOT NULL),         // Clerk user ID (scoped isolation)
  name: text (NOT NULL),
  embedding: json (NOT NULL),      // 192-dim ECAPA-TDNN float array
  createdAt: timestamp (NOT NULL)
  // Index: userId
}

API Endpoints:

Method	Path	Auth	Description
`POST`	`/upload`	✅	Upload audio file for deepfake scan
`POST`	`/scan`	✅	Scan audio from URL
`GET`	`/scans`	✅	Get user's scan history
`GET`	`/audio/:id`	✅	Stream audio blob for playback
`POST`	`/scans/:id/feedback`	✅	Submit feedback on a scan
`POST`	`/speaker/enroll`	✅	Enroll speaker voice print
`POST`	`/speaker/verify`	✅	Verify speaker identity

`apps/engine` — AI Engine

apps/engine/
├── main.py           → FastAPI app, endpoint definitions, lifespan context
├── detect.py         → Deepfake detection pipeline (Wav2Vec2 + spectral)
├── detect_image.py   → Image deepfake detection
├── speaker.py        → ECAPA-TDNN embedding generation + HF patches
├── schemas.py        → Pydantic models (AudioUpload, ScanResult)
├── dummy_custom.py   → SpeechBrain HuggingFace 404 fallback patch
├── requirements.txt  → Pinned Python dependencies
└── Dockerfile        → Python 3.11-slim, non-root user (appuser)

Engine Endpoints:

Method	Path	Description
`GET`	`/`	Health check — `{"status": "AI Engine Running"}`
`POST`	`/scan`	Scan audio via URL (async download → analyze)
`POST`	`/scan-upload`	Scan uploaded audio file
`POST`	`/analyze`	Video/audio analysis with moviepy fallback
`POST`	`/embed`	Generate ECAPA-TDNN speaker embedding vector

Performance Notes:

Lazy Model Loading: Models load on first request (not at startup) to prevent OOM crashes on free-tier Render instances.
Thread Executor: CPU-bound inference runs in loop.run_in_executor() so that FastAPI's async event loop is not blocked.
Temp File Cleanup: All uploaded/extracted files are deleted in finally blocks — no disk leaks.
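
The executor and cleanup patterns can be combined in one minimal sketch. It is not the engine's code: `analyze_file` is a placeholder for the real inference call, `scan_bytes` is a hypothetical name, and the UUID-prefixed file name follows the file-handling convention described in the security section.

```python
import asyncio
import os
import tempfile
import uuid


def analyze_file(path: str) -> dict:
    """Placeholder for CPU-bound inference (hypothetical; reports file size only)."""
    return {"path": path, "bytes": os.path.getsize(path)}


async def scan_bytes(data: bytes) -> dict:
    """Write the upload to a temp file, analyze it off the event loop, then clean up."""
    path = os.path.join(tempfile.gettempdir(), f"{uuid.uuid4()}_upload.bin")
    try:
        with open(path, "wb") as fh:
            fh.write(data)
        loop = asyncio.get_running_loop()
        # Passing None selects the loop's default ThreadPoolExecutor.
        return await loop.run_in_executor(None, analyze_file, path)
    finally:
        if os.path.exists(path):
            os.remove(path)  # runs on success and on error alike
```

Calling `asyncio.run(scan_bytes(b"hello"))` returns the placeholder result, and the temp file is gone by the time the call returns. While the worker thread runs, the event loop stays free to accept other requests.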

`packages/shared`

Shared Zod validation schemas and TypeScript types (ScanResultType, AudioUploadSchema, etc.) consumed by both apps/api and apps/web.

`.github/workflows/keep-alive.yml`

GitHub Actions cron job that pings both Render services every 14 minutes to prevent cold starts on the free tier.

schedule:
  - cron: "*/14 * * * *"

Pings:

API: https://satark-ai-f5t7.onrender.com/
Engine: https://satark-ai-es1v.onrender.com/

🚀 Getting Started

Prerequisites

Requirement	Version
Node.js	v18+
Python	3.11+
PostgreSQL	14+
Docker (optional)	Latest

Option A — Manual Setup (3 Terminals)

1. Clone the Repository

git clone https://github.com/theunstopabble/Satark-AI.git
cd Satark-AI

2. Install Node.js Dependencies

npm install   # installs dependencies for all workspaces

3. Install Python Dependencies (AI Engine)

cd apps/engine
python -m venv venv
source venv/bin/activate       # Windows: venv\Scripts\activate
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

4. Configure Environment Variables

Create .env files in each app directory:

apps/web/.env

VITE_CLERK_PUBLISHABLE_KEY=pk_test_xxxx
VITE_API_URL=http://localhost:3000

apps/api/.env

DATABASE_URL=postgresql://user:password@localhost:5432/satark_db
CLERK_SECRET_KEY=sk_test_xxxx
CLERK_PUBLISHABLE_KEY=pk_test_xxxx
ALLOWED_ORIGINS=http://localhost:5173
ENGINE_URL=http://localhost:8000

apps/engine/.env

# No required vars — models download from HuggingFace on first run

5. Run Database Migrations

cd apps/api
npx drizzle-kit push

6. Start All Services

Terminal	Command
Terminal 1 — Frontend	`cd apps/web && npm run dev`
Terminal 2 — API Gateway	`cd apps/api && npm run dev`
Terminal 3 — AI Engine	`cd apps/engine && uvicorn main:app --reload --port 8000`

Or run everything at once from root:

npm run dev   # Turborepo orchestrates all three concurrently

Option B — Docker Compose

# Copy and fill in your .env values first
cp .env.example .env

docker-compose up --build

Services will start at:

Frontend: http://localhost:5173
API Gateway: http://localhost:3000
AI Engine: http://localhost:8000

Docker security hardening included:

Non-root user (appuser) in the engine container
no-new-privileges:true security option on all services
tmpfs mount for /tmp in API and web containers

☁️ Deployment

Service	Platform	URL
Frontend (`apps/web`)	Vercel	satark-deepfake.vercel.app
API Gateway (`apps/api`)	Render	`satark-ai-f5t7.onrender.com`
AI Engine (`apps/engine`)	Render	`satark-ai-es1v.onrender.com`
Image Proxy (Worker)	Cloudflare Workers	`satark-image-proxy.gautamkumar43421.workers.dev`
Database	Supabase / Neon / Railway	PostgreSQL (SSL enabled)

Vercel config (apps/web/vercel.json) — SPA routing rewrites all paths to index.html.

🔐 Security Architecture

Layer	Mechanism	Detail
Authentication	Clerk JWT	All protected routes verify token server-side
Authorization	Context-scoped userId	`userId` extracted from auth token — never trusted from request body
Speaker isolation	DB-level scoping	Verify queries filter by `userId` — no cross-user voice data access
Speaker threshold	Cosine similarity ≥ 0.75	Strict match threshold prevents false identity confirmations
File handling	UUID-prefixed temp files	Uploaded files stored with random UUID prefix, deleted post-processing
Container	Non-root user	Engine runs as `appuser` — no root privileges inside Docker
Connection pool	pg Pool	Max 20 connections, 5s timeout, graceful error recovery
Image proxy CORS	Origin whitelist	Worker rejects requests from unlisted origins — no wildcard `*`
Image size limit	Double-layer check	Enforced via `content-length` header + actual `byteLength` post-read (5MB cap)
NVIDIA key isolation	Cloudflare Secrets	`NVIDIA_API_KEY` never exposed to frontend — stored in Worker environment only

📦 Environment Variables Reference

Variable	App	Required	Description
`VITE_CLERK_PUBLISHABLE_KEY`	web	✅	Clerk frontend public key
`VITE_API_URL`	web	✅	Backend API base URL
`DATABASE_URL`	api	✅	PostgreSQL connection string
`CLERK_SECRET_KEY`	api	✅	Clerk backend secret key
`CLERK_PUBLISHABLE_KEY`	api	✅	Clerk public key (for validation)
`ALLOWED_ORIGINS`	api	✅	CORS allowed origins (comma-separated)
`ENGINE_URL`	api	✅	FastAPI engine base URL
`IMAGE_API_URL`	api	✅	Cloudflare Worker URL for image deepfake proxy
`NVIDIA_API_KEY`	cloudflare-worker	✅	NVIDIA NIM API key — set as Cloudflare Worker Secret

Note on NVIDIA_API_KEY: This is stored via wrangler secret put NVIDIA_API_KEY and is never in source code or .env files. It lives exclusively in Cloudflare's encrypted secret store.

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the repository
2. Create a feature branch (git checkout -b feat/your-feature)
3. Commit your changes (git commit -m 'feat: add your feature')
4. Push to the branch (git push origin feat/your-feature)
5. Open a Pull Request

👨‍💻 Author

Gautam Kumar — Lead Developer

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

Built with ❤️ in India 🇮🇳
Satark-AI: Because the truth still matters.

🌐 More Projects by Gautam Kumar

Project	Description	Link
Portfolio	Personal portfolio & developer profile	gautam-kr.vercel.app
InterviewMinds	Enterprise AI mock interview platform	interviewminds.vercel.app
SwadKart	Multi-vendor food delivery platform with AI chatbot	swadkart.vercel.app
TexFolio	AI-powered LaTeX resume builder with RBAC	texfolio.vercel.app
