End-to-end system for generating compliant ID photos from user uploads, featuring a production-style workflow from raw images → segmentation/matting → face-aligned cropping → background synthesis → standardized outputs via API + Web UI.
This project replicates the architecture used in real-world ID-photo services and demonstrates how to wrap open-source models such as HivisionIDPhotos / MODNet / RMBG / BiRefNet / RetinaFace into a modern frontend + backend + model-processing pipeline.
Screenshots: Generator Options, Processing, Before vs After, Two-Factor Authentication, User Profile, User History Photos.
- Designed a full-stack ID-photo generation service.
- Integrated open-source segmentation & face-detection models into a unified pipeline.
- Built Next.js UI + Node.js API with async job flow.
- Added CPU stub mode so anyone can clone & run locally.
- Deployed the original version on an industrial GPU cluster with remote inference.
- Structured the project like a real production system, not a classroom app.
Given a user portrait photo, the system must:
- Remove background accurately.
- Detect, crop, and align faces.
- Generate ID-compliant output sizes, including:
  - 2×2 inch
  - 35×45 mm
  - US Visa / Passport / Driving License, etc.
- Allow background color options (white / blue / red / custom).
- Provide both a REST API and a Next.js Web UI.
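The output sizes above are physical dimensions, so they have to be converted to pixels for a given print resolution. The following is a minimal sketch of that arithmetic; the `SizePreset` shape, the `PRESETS` table, and the function names are hypothetical and are not taken from the project's template schema.

```typescript
// Hypothetical preset shape; the project's real template schema may differ.
type Unit = "inch" | "mm";

interface SizePreset {
  name: string;
  width: number;
  height: number;
  unit: Unit;
}

const MM_PER_INCH = 25.4;

// Convert a physical length to whole pixels at the given DPI.
function toPixels(length: number, unit: Unit, dpi: number): number {
  const inches = unit === "inch" ? length : length / MM_PER_INCH;
  return Math.round(inches * dpi);
}

function presetToPixels(preset: SizePreset, dpi: number): { width: number; height: number } {
  return {
    width: toPixels(preset.width, preset.unit, dpi),
    height: toPixels(preset.height, preset.unit, dpi),
  };
}

// Two of the sizes listed above, used only for illustration.
const PRESETS: SizePreset[] = [
  { name: "2x2 inch", width: 2, height: 2, unit: "inch" },
  { name: "35x45 mm", width: 35, height: 45, unit: "mm" },
];

const pixelSizes = PRESETS.map((p) => ({ name: p.name, ...presetToPixels(p, 300) }));
```

At 300 DPI this gives 600×600 px for the 2×2 inch preset and 413×531 px for the 35×45 mm preset.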
- ✔ Full Production-Style Pipeline
  - Upload ingestion & file validation
  - Segmentation / matting
  - Face detection & alignment
  - Cropping & resizing
  - Background synthesis
  - Export to PNG/JPG/DPI presets
- ✔ Modular Model Integration
  - Supports plugging in various models: `MODNet`, `RMBG`, `BiRefNet`, `HivisionIDPhotos` pipeline
  - Face detection support: `MTCNN`, `RetinaFace`, `Face++`
- ✔ Full Stack Implementation
  - Next.js (TS) → Upload UI / templates / preview
  - Node.js API → Validation / processing orchestration
  - Optional Python bridge for GPU-accelerated models (`onnxruntime-gpu`)
- ✔ Developer-Friendly
  - Clear API contract
  - Easy `.env` setup
  - Optional Docker / Compose
  - Ability to run stub mode for demos without GPU
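The background synthesis step in the pipeline above amounts to alpha compositing: each output channel is `alpha * foreground + (1 - alpha) * background`, where `alpha` comes from the matting model. Below is a minimal per-pixel sketch of that idea. The function names and the RGB values assigned to the named colors are assumptions for illustration; the project itself may delegate this step to Sharp or to the model pipeline.

```typescript
type RGB = [number, number, number];

// Assumed RGB values for the named options; real ID-photo palettes may differ.
const NAMED_COLORS: Record<string, RGB> = {
  white: [255, 255, 255],
  blue: [0, 0, 255],
  red: [255, 0, 0],
};

// Accept a named color or a custom "#rrggbb" value.
function parseColor(input: string): RGB {
  const key = input.trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(NAMED_COLORS, key)) {
    return NAMED_COLORS[key] as RGB;
  }
  if (!/^#[0-9a-f]{6}$/.test(key)) {
    throw new Error(`Unsupported color: ${input}`);
  }
  const value = parseInt(key.slice(1), 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Blend one foreground pixel over the background; alpha is the matte value in [0, 1].
function blendPixel(foreground: RGB, alpha: number, background: RGB): RGB {
  const a = Math.min(1, Math.max(0, alpha));
  const mix = (fg: number, bg: number): number => Math.round(a * fg + (1 - a) * bg);
  const [fr, fgn, fb] = foreground;
  const [br, bgn, bb] = background;
  return [mix(fr, br), mix(fgn, bgn), mix(fb, bb)];
}
```

A fully opaque pixel (`alpha = 1`) keeps the subject's color, a fully transparent one (`alpha = 0`) takes the background color, and hair or edge pixels with fractional alpha get a mix of both.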
- Upload interface
- Background/color/template selection
- Output preview
- Downloads
- Handles only UI logic → no heavy processing
- Receives uploads
- Validates inputs (size, ratio, EXIF)
- Calls processing pipeline
- Returns job status + URLs
- Logs + error handling
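The input validation mentioned above (size, ratio, EXIF) could look roughly like the sketch below. The `UploadMeta` and `UploadLimits` shapes and every limit value are assumptions chosen for illustration, not the project's actual rules. The only external fact relied on is that EXIF orientation values 5 through 8 imply a 90° rotation, so width and height swap once the image is displayed upright.

```typescript
// Hypothetical metadata an upload handler might extract before processing.
interface UploadMeta {
  mimeType: string;
  sizeBytes: number;
  width: number;
  height: number;
  exifOrientation?: number; // EXIF orientation tag, 1..8
}

// Assumed limits; tune these to the real service.
interface UploadLimits {
  allowedTypes: string[];
  maxBytes: number;
  minShortSidePx: number;
  maxAspectRatio: number;
}

const DEFAULT_LIMITS: UploadLimits = {
  allowedTypes: ["image/jpeg", "image/png"],
  maxBytes: 10 * 1024 * 1024,
  minShortSidePx: 600,
  maxAspectRatio: 2,
};

// Orientations 5..8 are rotated by 90 degrees, so the displayed size is swapped.
function orientedSize(meta: UploadMeta): { width: number; height: number } {
  const o = meta.exifOrientation;
  const rotated = o !== undefined && o >= 5 && o <= 8;
  return rotated
    ? { width: meta.height, height: meta.width }
    : { width: meta.width, height: meta.height };
}

// Return a list of human-readable problems; an empty list means the upload is accepted.
function validateUpload(meta: UploadMeta, limits: UploadLimits = DEFAULT_LIMITS): string[] {
  const errors: string[] = [];
  if (limits.allowedTypes.indexOf(meta.mimeType) === -1) {
    errors.push(`unsupported type: ${meta.mimeType}`);
  }
  if (meta.sizeBytes > limits.maxBytes) {
    errors.push("file too large");
  }
  const { width, height } = orientedSize(meta);
  const shortSide = Math.min(width, height);
  const longSide = Math.max(width, height);
  if (shortSide < limits.minShortSidePx) {
    errors.push("resolution too low");
  } else if (longSide / shortSide > limits.maxAspectRatio) {
    errors.push("aspect ratio too extreme");
  }
  return errors;
}
```

Returning all problems at once, rather than throwing on the first one, lets the API report every issue in a single response.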
- Optional Python service if GPU models are used
- Unified interface: `processImage(inputPath, { bgColor, size, dpi })`
- Originals → `uploads/`
- Final images → `outputs/`
- Can be swapped for S3/GCS easily
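To illustrate how the unified `processImage(inputPath, { bgColor, size, dpi })` interface and the CPU stub mode might fit together, here is a simplified sketch. Only the call signature comes from this README. The stage names, the `PipelineContext` shape, the stub stages, and the output file naming are all assumptions, and a real implementation would most likely be asynchronous and call into the model pipeline.

```typescript
interface ProcessOptions {
  bgColor: string;
  size: string;
  dpi: number;
}

// Hypothetical context passed from stage to stage.
interface PipelineContext {
  inputPath: string;
  options: ProcessOptions;
  steps: string[];
  outputPath?: string;
}

type Stage = (ctx: PipelineContext) => PipelineContext;

// Stub stage: records that the step ran instead of invoking a model.
const stubStage = (name: string): Stage => (ctx) => ({
  ...ctx,
  steps: [...ctx.steps, name],
});

// Assumed stage order, mirroring the pipeline described in this README.
const STUB_STAGES: Stage[] = ["matting", "face-align", "crop-resize", "background"].map(stubStage);

function processImage(
  inputPath: string,
  options: ProcessOptions,
  stages: Stage[] = STUB_STAGES,
): PipelineContext {
  const initial: PipelineContext = { inputPath, options, steps: [] };
  const result = stages.reduce((ctx, stage) => stage(ctx), initial);
  // Illustrative naming only: originals live in uploads/, results go to outputs/.
  const file = inputPath.split("/").pop() ?? inputPath;
  const stem = file.replace(/\.[^.]+$/, "");
  return { ...result, outputPath: `outputs/${stem}_${options.size}_${options.dpi}dpi.png` };
}
```

Because the stages are passed in as a parameter, real model-backed stages can replace the stubs without changing callers, which is the point of keeping a single entry point.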
ai-photo-generator/
├─ ai_id_photo_backend_api/ # API service
├─ ai_id_photo_web_app/ # Next.js frontend
├─ docs/ # Architecture, API docs, diagrams
├─ demos/ # Sample inputs/outputs, demo video
├─ docker-compose.yml # (Optional) multi-service dev setup
└─ scripts/ # Helpers & tools
| Component | Technology |
|---|---|
| Frontend | Next.js (TypeScript), React, Tailwind / custom design |
| Backend | Node.js (Express), Multer (file upload), Sharp (image transform), Optional Python (GPU models) |
| Infra | Docker / Compose, .env config, Cross-platform (macOS/Linux/Windows) |
cd ai_id_photo_backend_api
npm install
nodemon server.js
# Default: http://localhost:4000

cd ai_id_photo_web_app
npm install
npm run dev
# Open: http://localhost:3000

Set up `.env.development`:
NEXT_PUBLIC_REACT_APP_BASE_API_URL=http://localhost:4000
NEXT_PUBLIC_GOOGLE_CLIENT_ID=your-google-client-id
docker compose up --build

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/generate` | Generate ID photo from upload. |
| GET | `/api/v1/jobs/{job_id}` | Async job tracking. |
| GET | `/api/v1/templates` | Available output sizes. |
| GET | `/health` | Health probe. |
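The split between `POST /api/v1/generate` and `GET /api/v1/jobs/{job_id}` implies an async job flow: the first call creates a job, and the second reads its status. A minimal in-memory sketch of that state tracking follows. The status names, the `Job` fields, and the `JobStore` class are assumptions for illustration and do not reflect the documented response schema.

```typescript
// Assumed lifecycle; the real API may use different status names.
type JobStatus = "queued" | "processing" | "done" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  outputUrl?: string;
  error?: string;
}

class JobStore {
  private jobs: Record<string, Job> = {};
  private nextId = 1;

  // What a generate handler might call after accepting an upload.
  create(): Job {
    const job: Job = { id: `job_${this.nextId++}`, status: "queued" };
    this.jobs[job.id] = job;
    return job;
  }

  // What a job-status handler might call; undefined maps naturally to a 404.
  get(id: string): Job | undefined {
    return Object.prototype.hasOwnProperty.call(this.jobs, id) ? this.jobs[id] : undefined;
  }

  start(id: string): Job {
    return this.update(id, { status: "processing" });
  }

  complete(id: string, outputUrl: string): Job {
    return this.update(id, { status: "done", outputUrl });
  }

  fail(id: string, error: string): Job {
    return this.update(id, { status: "failed", error });
  }

  private update(id: string, patch: Partial<Job>): Job {
    const current = this.get(id);
    if (current === undefined) {
      throw new Error(`Unknown job: ${id}`);
    }
    const updated: Job = { ...current, ...patch };
    this.jobs[id] = updated;
    return updated;
  }
}
```

A client would submit the upload, keep the returned job id, and poll the job endpoint until the status is `done` or `failed`. A production service would back this with a persistent store or queue instead of process memory.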
Store your screenshots & video here:
demos/
├─ sample_input.jpg
├─ sample_output.png
└─ demo_video.mp4
- `nodemon` for backend hot reload
- `npm run dev` for frontend
- `.env`, `.env.development`, `.env.production` supported
- Use stub mode for CPU-only demo
- GPU integration documented in `docs/deployment_gpu.md`
- Webhooks for async callbacks
- Batch generation / zip export
- Pose correction
- Country-specific templates (JP/KR/EU)
- Full GPU inference pipeline with BiRefNet + RetinaFace
- Access control + signed URLs
- Preprocessing API (EXIF fix, color balance)
- API Reference
  View all REST API endpoints, request parameters, and response examples.
- Docker GPU Deployment
  Guide on how to build and run the Docker container with NVIDIA GPU support.
- GPU Acceleration Guide
  Detailed steps for configuring your local environment with CUDA/cuDNN and `onnxruntime-gpu`.
- GPU Performance Benchmark
  Performance comparison data and charts for models running on CPU versus various GPUs.
This guide explains how to enable NVIDIA GPU acceleration for inference on your local machine or server. The speedup is most noticeable with heavy matting models such as BiRefNet.
Currently, the only model officially supporting GPU acceleration is: birefnet-v1-lite.
If you wish to use the GPU, please ensure your local environment meets the following prerequisites:
- An NVIDIA GPU (≥16 GB of VRAM is recommended for BiRefNet).
- Matching versions of:
  - CUDA Toolkit
  - cuDNN
- The version of `onnxruntime-gpu` that matches your CUDA installation.
- (Optional) The PyTorch build that matches your CUDA version.
GPU acceleration primarily benefits the following tasks:
- ID photo enhancement (matting + rotation + cropping flow).
- Generating high-definition matting results.
- Generating high-resolution six-inch layout photos (acceleration is most noticeable when using BiRefNet).
Please select the official installer packages based on the major version of CUDA you are using:
Note: CUDA supports a degree of backward compatibility. For example, a system with CUDA 12.6 installed can still typically use PyTorch's `cu121` wheel.
HivisionIDPhotos internally uses onnxruntime for ONNX model inference. To enable the GPU, simply install the onnxruntime-gpu package corresponding to your CUDA version:
pip install onnxruntime-gpu==1.18.0

License: MIT License
Thanks to:
- MODNet
- RMBG
- BiRefNet
- MTCNN / RetinaFace / Face++
- HivisionIDPhotos






