A high-performance web API for object detection, video analytics, live streaming, and gallery management using trained Faster R-CNN checkpoints (a stand-alone model and a hybrid model with a Kalman Filter).

Includes endpoints for single-image prediction, video analysis (sync & async), a live MJPEG stream, model comparison, and automatic COCO export. Optimized for CPU, CUDA, and Apple Silicon (MPS).
## Features

- FastAPI REST API with threaded live inference (MJPEG stream)
- Two models loaded from `.pth` checkpoints: Stand-Alone and Hybrid (choose per endpoint)
- Single-image JSON prediction + on-the-fly visualization
- Video analysis:
  - Async pipeline: `/start-analyze` → `/progress/{job_id}` → `/result/{job_id}` (+ saved to gallery)
  - Sync pipeline: `/analyze-video` (returns the processed MP4 immediately)
- Auto-saving of originals, detections, and COCO annotations
- Image & video galleries (with thumbnails, metadata, and deletion)
- Model comparison (side-by-side image visualization + JSON diffs)
- Built-in web UI pages (`/`, `/live`, `/comparison`) and OpenAPI docs (`/docs`)
- Docker-ready (mount `models/` and `results/` for persistence)
- CORS pre-configured for `https://gd-live.com`, `http://localhost:8080`, and `http://localhost:3000`
## Storage layout

- `/results/images` → original uploaded images
- `/results/json` → raw detections per image
- `/results/coco` → COCO-format annotations per image
- `/results/videos` → processed MP4 clips
- `/results/videos_json` → per-video metadata
- `/results/comparisons` → comparison image + JSON
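If you are wiring this up yourself, the layout can be created at startup in a few lines (a sketch; the app may handle this differently):

```python
from pathlib import Path

RESULTS_DIR = Path("results")

# Create every results subdirectory up front; no-op if they already exist
for sub in ("images", "json", "coco", "videos", "videos_json", "comparisons"):
    (RESULTS_DIR / sub).mkdir(parents=True, exist_ok=True)
```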
## Model checkpoints

This API expects two checkpoints:

- `model_1_path`, e.g. `models/sample_model_1.pth`
- `model_2_path`, e.g. `models/sample_model_2.pth`

(You can use your own weights trained on your dataset.)
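How a checkpoint might be loaded (a minimal sketch assuming a torchvision Faster R-CNN with a ResNet-101 FPN backbone, as `/predict-image` suggests; the wrapping `model_state_dict` key is an assumption about the checkpoint format):

```python
import torch
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

def load_checkpoint(path: str, num_classes: int = 7, device: str = "cpu") -> FasterRCNN:
    """Rebuild the detector architecture and load trained weights from a .pth file."""
    backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
    model = FasterRCNN(backbone, num_classes=num_classes)
    state = torch.load(path, map_location=device)
    if isinstance(state, dict) and "model_state_dict" in state:
        state = state["model_state_dict"]  # unwrap a training checkpoint (assumed key)
    model.load_state_dict(state)
    return model.to(device).eval()
```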
## Live demo

Hosted via LocalTunnel (example links):

- Image / Video Analysis: https://gd-live.loca.lt/
- Live Detection: https://gd-live.loca.lt/live
- API Docs: https://gd-live.loca.lt/docs
## Requirements

Python 3.10+

```text
fastapi==0.115.0
uvicorn[standard]==0.30.6
torch==2.4.1
torchvision==0.19.1
pillow==10.4.0
numpy==1.26.4
python-multipart==0.0.9
opencv-python==4.10.0.84
jinja2==3.1.4
pydantic==2.8.2
pydantic-settings==2.5.2
```

> If your `app/config.py` uses Pydantic Settings v1, pin `fastapi<0.110` and `pydantic<2`. Otherwise use the versions above.
## Quick start

```bash
# 1. Clone
git clone https://github.com/fglend/kalman-fastercnn.git
cd kalman-fastercnn

# 2. Create & activate venv
python -m venv venv
venv\Scripts\activate        # Windows (PowerShell)
# source venv/bin/activate   # macOS / Linux

# 3. Install deps
pip install -r requirements.txt
```

```powershell
# 4. (Optional) Set environment variables (PowerShell)
$env:MODEL_1_PATH="models/sample_1.pth"
$env:MODEL_2_PATH="models/sample_2.pth"
$env:RESULTS_DIR="results"
$env:NUM_CLASSES="7"
```

```bash
# 4. (Optional) Set environment variables (bash/zsh)
export MODEL_1_PATH=models/sample_1.pth
export MODEL_2_PATH=models/sample_2.pth
export RESULTS_DIR=results
export NUM_CLASSES=7
```

```bash
# 5. Run the API
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```

Open:

- Docs → http://localhost:8080/docs
- Live → http://localhost:8080/live
## Docker

```bash
# Build
docker build -t fasterrcnn-kalman-api .

# Run (basic)
docker run -p 8080:8080 fasterrcnn-kalman-api

# Run with mounted volumes (recommended) on macOS / Linux
docker run -p 8080:8080 \
  -v $(pwd)/models:/models \
  -v $(pwd)/results:/results \
  -e MODEL_1_PATH=/models/sample_1.pth \
  -e MODEL_2_PATH=/models/sample_2.pth \
  -e RESULTS_DIR=/results \
  -e NUM_CLASSES=7 \
  fasterrcnn-kalman-api
```

```powershell
# Windows PowerShell
docker run -p 8080:8080 `
  -v ${PWD}/models:/models `
  -v ${PWD}/results:/results `
  -e MODEL_1_PATH=/models/sample_1.pth `
  -e MODEL_2_PATH=/models/sample_2.pth `
  -e RESULTS_DIR=/results `
  -e NUM_CLASSES=7 `
  fasterrcnn-kalman-api
```
## Configuration

`app/config.py` (example):

```python
import torch

# Prefer CUDA, then Apple MPS, then CPU
DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
NUM_THREADS = 4
SCORE_THRESH = 0.5
```

Environment variables used in `app.main`:

- `MODEL_1_PATH` (default: `models/sample_1.pth`)
- `MODEL_2_PATH` (default: `models/sample_2.pth`)
- `RESULTS_DIR` (default: `/results`)
- `NUM_CLASSES` (default: `7`)
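For typed access to these variables, a Pydantic Settings v2 sketch (field names and defaults taken from the list above; the real `app/config.py` may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Pydantic reserves the "model_" prefix by default; clear it so
    # model_1_path / model_2_path are allowed as field names
    model_config = SettingsConfigDict(protected_namespaces=())

    model_1_path: str = "models/sample_1.pth"
    model_2_path: str = "models/sample_2.pth"
    results_dir: str = "/results"
    num_classes: int = 7

settings = Settings()  # env vars (MODEL_1_PATH, ...) override these defaults
```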
CORS is enabled for: https://gd-live.com, http://localhost:8080, http://localhost:3000.
## Endpoints

`GET /health` → returns runtime info:

```json
{ "status": "ok", "device": "cuda|mps|cpu", "models": ["standalone", "hybrid"] }
```

`POST /predict-image` → JSON detections (uses ResNet-101):

```bash
curl -X POST "http://localhost:8080/predict-image" -F "file=@sample.jpg"
```

Response (example):

```json
{
  "detections": [
    {"x_min": 12.3, "y_min": 45.6, "x_max": 123.4, "y_max": 234.5, "score": 0.93, "label_id": 3}
  ],
  "num_detections": 1
}
```

Saves the image, JSON, and COCO annotations in `/results/...` (done asynchronously).
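The same call from Python (a sketch using the `requests` package, which is not among the pinned dependencies):

```python
import requests

# Upload an image and print each detection from the documented response shape
with open("sample.jpg", "rb") as f:
    resp = requests.post("http://localhost:8080/predict-image", files={"file": f})
resp.raise_for_status()

for det in resp.json()["detections"]:
    print(f"label {det['label_id']}: score {det['score']:.2f}, "
          f"box ({det['x_min']:.0f}, {det['y_min']:.0f})-({det['x_max']:.0f}, {det['y_max']:.0f})")
```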
`POST /visualize-image` → JPEG with a grayscale, darkened background and red boxes:

```bash
curl -X POST "http://localhost:8080/visualize-image" -F "file=@sample.jpg" --output output.jpg
```
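The effect itself is easy to reproduce with OpenCV (a sketch; the 50% darkening factor and 2 px line width are assumptions, not the API's exact values):

```python
import cv2
import numpy as np

def visualize(frame: np.ndarray, boxes: list[tuple[float, float, float, float]]) -> np.ndarray:
    """Convert to grayscale, darken the whole frame, then draw red boxes on top."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    dark = cv2.cvtColor((gray * 0.5).astype(np.uint8), cv2.COLOR_GRAY2BGR)
    for x_min, y_min, x_max, y_max in boxes:
        cv2.rectangle(dark, (int(x_min), int(y_min)), (int(x_max), int(y_max)),
                      color=(0, 0, 255), thickness=2)  # red in BGR
    return dark
```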
**Video analysis (async)** (a polling-client sketch follows the notes below):

- `POST /start-analyze` → `{ "job_id": "<uuid>" }`
- `GET /progress/{job_id}` → `{ "status": "...", "progress": 42.0 }`
- `GET /result/{job_id}` → MP4 stream (adds an `X-Video-Timestamp` header)

Notes:

- Frames are processed with grayscale + darken, resized to 384×384 internally for speed, and run in FP16 on CUDA.
- Results are saved to `/results/videos`, with metadata in `/results/videos_json`.
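Driving the async pipeline end to end (a sketch with `requests`; the exact `status` strings are an assumption, so check `/docs` for the real values):

```python
import time
import requests

BASE = "http://localhost:8080"

# Start the job
with open("input.mp4", "rb") as f:
    job_id = requests.post(f"{BASE}/start-analyze", files={"file": f}).json()["job_id"]

# Poll for progress
while True:
    prog = requests.get(f"{BASE}/progress/{job_id}").json()
    print(f"{prog['status']}: {prog.get('progress', 0.0):.1f}%")
    if prog["status"] in ("done", "error"):  # assumed terminal statuses
        break
    time.sleep(2)

# Fetch the processed MP4
with open("out.mp4", "wb") as out:
    out.write(requests.get(f"{BASE}/result/{job_id}").content)
```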
**Video analysis (sync)**

`POST /analyze-video` → returns the processed MP4 immediately (no job tracking):

```bash
curl -X POST "http://localhost:8080/analyze-video" -F "file=@input.mp4" --output out.mp4
```
**Live detection**

- `GET /live` → HTML page
- `GET /video-feed` → MJPEG stream (for embedding); see the consumer sketch below
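For a web page, `<img src="/video-feed">` is enough. To consume the stream programmatically, OpenCV can typically open an MJPEG endpoint like a video file (a sketch, assuming OpenCV's default FFmpeg backend handles the stream):

```python
import cv2

# Open the MJPEG stream as if it were a video file
cap = cv2.VideoCapture("http://localhost:8080/video-feed")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("live detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```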
**Image gallery**

| Type | Endpoint | Description |
|---|---|---|
| List | `GET /gallery/list` | List saved images with counts |
| Original | `GET /gallery/image/{timestamp}` | Stream the original uploaded image |
| Visualized | `GET /gallery/visualize/{timestamp}` | On-the-fly boxes over the saved image |
| JSON | `GET /gallery/json/{timestamp}` | Raw detections |
| Delete | `DELETE /gallery/delete/{timestamp}` | Remove image/JSON/COCO |
**Video gallery**

| Type | Endpoint | Description |
|---|---|---|
| List | `GET /gallery/videos/list` | List analyzed videos |
| Stream | `GET /gallery/videos/stream/{timestamp}` | Stream the processed video |
| Metadata | `GET /gallery/videos/json/{timestamp}` | Video metadata JSON |
| Thumbnail | `GET /gallery/videos/thumbnail/{timestamp}` | JPEG thumbnail (first frame; see the sketch below) |
| Delete | `DELETE /gallery/videos/delete/{timestamp}` | Remove video + metadata |
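One plausible way a first-frame thumbnail can be produced with OpenCV (a sketch, not necessarily how the endpoint is implemented):

```python
import cv2

def first_frame_jpeg(video_path: str) -> bytes:
    """Grab the first frame of a video and encode it as JPEG bytes."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise ValueError(f"could not read a frame from {video_path}")
    ok, buf = cv2.imencode(".jpg", frame)
    if not ok:
        raise ValueError("JPEG encoding failed")
    return buf.tobytes()
```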
**Model comparison**

| Type | Endpoint | Description |
|---|---|---|
| JSON compare | `POST /compare-models` | Runs the Standalone and Hybrid models, returns both result sets |
| Visual compare | `POST /visualize-comparison` | Side-by-side annotated JPEG |
| UI page | `GET /comparison` | Comparison web interface |

Comparison artifacts are saved under `/results/comparisons`.
## Project structure

```text
.
├── app/
│   ├── main.py            # FastAPI + endpoints
│   ├── model.py           # Checkpoint loader
│   ├── predict_utils.py   # Preprocessing & filtering
│   ├── config.py          # Settings (DEVICE, NUM_THREADS, SCORE_THRESH)
│   └── templates/         # index.html, live.html, comparison.html
├── models/
│   ├── sample_1.pth
│   └── sample_2.pth
├── results/
│   ├── images/
│   ├── json/
│   ├── coco/
│   ├── videos/
│   ├── videos_json/
│   └── comparisons/
├── requirements.txt
├── Dockerfile
└── README.md
```
## Performance notes

- GPU: CUDA (with FP16) or Apple MPS recommended
- Image resize: 384×384 (video path) or 512×512 (image path); trades speed against accuracy
- Video: `skip_frames=5` by default; lower it for higher accuracy, raise it for more speed
- Model heads tuned at runtime (see the sketch below): `roi_heads.detections_per_img = 350`, `rpn.pre_nms_top_n_test = 2000`, `rpn.post_nms_top_n_test = 2000`
- Store `/results` on an SSD or a mounted Docker volume
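Applying those limits to a loaded torchvision Faster R-CNN looks roughly like this (note that at runtime torchvision keeps the test-time proposal counts in the RPN's `_pre_nms_top_n`/`_post_nms_top_n` dicts rather than as `*_test` attributes):

```python
from torchvision.models.detection import FasterRCNN

def tune_for_inference(model: FasterRCNN) -> FasterRCNN:
    """Raise detection and proposal limits on an already-built detector, in place."""
    model.roi_heads.detections_per_img = 350       # max final detections per image
    model.rpn._pre_nms_top_n["testing"] = 2000     # proposals kept before NMS at eval
    model.rpn._post_nms_top_n["testing"] = 2000    # proposals kept after NMS at eval
    return model
```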
## Author

Glend Dale Ferrer
Email: mgdferrer@tip.edu.ph

## License

MIT License © 2025 Glend Dale Ferrer


