A modular, plugin-based Stable Diffusion inference framework designed for learning and real-world deployment. Built with aggressive layer offloading to run production workloads locally on a GTX 1650 (4 GB VRAM).
- Run Stable Diffusion on 4GB GPUs (GTX 1650) without OOM crashes
- Plugin-based pipeline system (no messy if/else routing blocks)
- Fully supports community LoRAs, ControlNets, and Upscalers
- Built headless: wrap it instantly in Web, Desktop, or Mobile UIs
Quick Start · How & Why it Works · Benchmarks · Managing Pipelines · Building UIs
> [!NOTE]
> TL;DR: This is a minimalist, plugin-based Stable Diffusion 1.5 framework specifically optimized for 4 GB GPUs (like the GTX 1650). It teaches you how to cleanly manage 10+ different AI art styles using a Registry Pattern, how to aggressively optimize VRAM using layer offloading and subprocess isolation, and how to attach web, desktop, or mobile UIs to a headless Python backend.
If you've ever wondered how to move from messy Jupyter notebooks with hardcoded Stable Diffusion scripts to a production-grade architecture that can cleanly handle dozens of different art styles, models, and upscalers, this repository is your blueprint.
- Midjourney-Style Apps: Wrap the engine in FastAPI to power your own Discord bots or web dashboards.
- SaaS AI Tools: Use the headless engine as a scalable backend generation API.
- Low-Cost Local Hosting: Run full inference pipelines on cheap, low-end hardware without OOM crashing.
- Rapid Prototyping: Experiment with LoRAs, ControlNets, and Schedulers via the decoupled Registry Pattern.
It demonstrates:
- The Registry Pattern: How to manage multiple pipelines without massive `if/else` blocks.
- VRAM Mastery: How to squeeze 2 GB+ models, LoRAs, and ControlNets into a 4 GB GPU without crashing.
- Engine Separation: How to decouple the "What to generate" from the "How to execute it", making it trivial to attach a Web, Desktop, or Mobile UI.
Most open-source Stable Diffusion scripts start simple but quickly evolve into fragile monoliths packed with if/else statements for every new model, autoencoder, or upscaler.
This framework takes a different approach. It applies enterprise software engineering principles to AI inference, ensuring the codebase remains clean no matter how many pipelines, LoRAs, or SDKs you add.
```mermaid
graph LR
    U[Next.js Client] -->|React Three Fiber| F[Framer Motion UI]
    F -->|REST Async Polling| W[FastAPI Gateway]
    W -->|Subprocess Dispatch| R((Registry))
    subgraph Core Engine
        R -->|Load Pipeline| C[PipelineConfigs]
        C -->|Config Object| E[Diffusion Engine]
    end
    E -->|Lazy Load| SD[SD 1.5 Model]
    E -.->|Dynamic Swap| CN[ControlNet]
    SD -->|512x768| ES[Real-ESRGAN]
    ES -->|2k/4k| O[Final Output Image]
    style E fill:#f9f,stroke:#333,stroke-width:2px
    style W fill:#bbf,stroke:#333
    style U fill:#d4edda,stroke:#333
```
- Ultra-Premium UI (Apple-Tier Aesthetics): Included is a Next.js 14 (App Router) client featuring a dynamic WebGL GPU shader background (`@react-three/fiber`) that smoothly interpolates between 5 custom macOS themes (Sonoma, Monterey, Catalina, Big Sur, Sequoia).
- Physics-Based UI Morphs: The frontend uses `framer-motion` for layout animations and physics-based transitions when moving from "Awaiting Parameters" to "Generating Tensor Graph", with no jarring loading screens.
- The Registry Pattern (Zero-Friction Scaling): Adding a new style to the engine requires zero modifications to the core inference logic. You simply drop a new file into `image_gen/pipeline/`, decorate it with `@register_pipeline`, and the engine handles the rest.
- FastAPI Subprocess Isolation: PyTorch's CUDA memory allocator is notorious for holding onto VRAM even after `torch.cuda.empty_cache()` is called. This framework sets up a FastAPI microservice that spins up isolated Python subprocesses, guaranteeing 100% VRAM release back to the OS between generations.
- Headless by Design: Because the system is strictly decoupled, the AI engine can be queried asynchronously via the FastAPI server, so the Node.js/Next.js thread is never blocked during inference.
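The subprocess-isolation idea can be sketched in a few lines. This is an illustration of the mechanism rather than the framework's actual dispatcher, and the `image_gen.worker` module named in the comment is hypothetical:

```python
import subprocess
import sys

def run_in_subprocess(argv: list[str]) -> str:
    """Run a worker command in a fresh Python process and return its stdout.

    When the child exits, the OS reclaims all of its memory, including
    CUDA allocations that PyTorch's caching allocator would keep resident
    inside a long-lived server process.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# A dispatch from the FastAPI gateway might look like this
# (the worker module name is hypothetical):
# image_path = run_in_subprocess(
#     [sys.executable, "-m", "image_gen.worker",
#      "--prompt", prompt, "--style", style]
# )
```

The trade-off is process startup and model-load latency per request, which the framework accepts in exchange for deterministic VRAM release on 4 GB cards.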
The core innovation of this framework is the Pipeline Registry.
Instead of writing a massive main file that tries to load every model, this framework uses a decorator-based registration pattern.
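Conceptually, the registry is just a dictionary populated by a decorator at import time. A minimal sketch follows; the `@register_pipeline` and `get_pipeline` names mirror the framework's public API, but these internals are illustrative:

```python
# Illustrative internals of a decorator-based pipeline registry.
_PIPELINES: dict[str, dict] = {}

def register_pipeline(name: str, keywords=None, description: str = ""):
    """Decorator that records a config-builder function under a style name."""
    def decorator(config_fn):
        _PIPELINES[name] = {
            "get_config": config_fn,
            "keywords": keywords or [],
            "description": description,
        }
        return config_fn  # the decorated function stays usable as-is
    return decorator

def get_pipeline(name: str) -> dict:
    """Look up a registered style; the engine never needs an if/else chain."""
    if name not in _PIPELINES:
        raise ValueError(f"Unknown style '{name}'. Available: {sorted(_PIPELINES)}")
    return _PIPELINES[name]
```

Because registration happens as a side effect of importing the pipeline module, "adding a style" reduces to adding one import.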
Adding a new style pipeline takes one file and zero engine modifications.
- Create a file in `image_gen/pipeline/my_style.py`.
- Use the `@register_pipeline` decorator:
```python
from configs.paths import DIFFUSION_MODELS, IMAGE_GEN_OUTPUT_DIR
from image_gen.pipeline.pipeline_types import PipelineConfigs
from image_gen.pipeline.registry import register_pipeline

@register_pipeline(
    name="my_custom_style",
    keywords=["mystyle", "custom art"],
    description="Minimal example pipeline"
)
def get_config(prompt: str, **kwargs) -> PipelineConfigs:
    return PipelineConfigs(
        base_model=DIFFUSION_MODELS["dreamshaper"],
        output_dir=IMAGE_GEN_OUTPUT_DIR / "custom",
        prompt=f"masterpiece, best quality, {prompt}",
        neg_prompt="worst quality, blurry",
        vae="realistic",
        style_type="realistic",
        scheduler_name="dpm++_2m_karras",
        width=512, height=768, steps=25, cfg=7.0
    )
```
- Import it in `image_gen/pipeline/registry.py`:
```python
def discover_pipelines():
    from . import my_style
```
That's it. The CLI, API, and engine automatically know how to use it. The engine lazily loads the necessary components only when requested.
See docs/CUSTOM_PIPELINES.md for advanced tutorials on adding LoRAs and ControlNets to your pipelines.
Because this framework strictly separates configuration (PipelineConfigs) from execution (DiffusionEngine), attaching a frontend UI is incredibly straightforward.
Whether you're building a web app, a desktop tool, or a mobile client, the backend interaction is always the same three steps:
Create a server.py using FastAPI:
from fastapi import FastAPI, BackgroundTasks
from image_gen.engine import DiffusionEngine
from image_gen.pipeline.registry import discover_pipelines, get_pipeline
app = FastAPI()
discover_pipelines() # Load registry on startup
@app.post("/generate")
async def generate_image(prompt: str, style: str):
# 1. Get the pipeline configuration
config_fn = get_pipeline(style)["get_config"]
config = config_fn(prompt=prompt)
# 2. Execute
engine = DiffusionEngine()
saved_path = engine.generate(config)
engine.unload() # Crucial for freeing VRAM for the next request
return {"url": f"/static/{saved_path.name}"}For a local desktop app, run the engine in a QThread to keep the UI responsive:
```python
from PyQt6.QtCore import QThread, pyqtSignal
from image_gen.engine import DiffusionEngine

class GeneratorThread(QThread):
    finished = pyqtSignal(str)  # Emits the final image path

    def __init__(self, config):
        super().__init__()
        self.config = config

    def run(self):
        engine = DiffusionEngine()
        saved_path = engine.generate(self.config)
        engine.unload()
        self.finished.emit(str(saved_path))
```
If you're building a Flutter app, host the FastAPI server above, then call it using Dart's `http` package:
```dart
import 'package:http/http.dart' as http;
import 'dart:convert';

Future<String> generateArt(String prompt, String style) async {
  final response = await http.post(
    Uri.parse('http://your-server.local:8000/generate'),
    body: jsonEncode({'prompt': prompt, 'style': style}),
    headers: {'Content-Type': 'application/json'},
  );
  if (response.statusCode == 200) {
    return jsonDecode(response.body)['url'];
  } else {
    throw Exception('Failed to generate image');
  }
}
```
- Python 3.10+
- NVIDIA GPU with CUDA support (minimum: GTX 1650, 4 GB VRAM)
- PyTorch with CUDA (see pytorch.org)
```bash
git clone https://github.com/RajTewari01/image-gen.git
cd image-gen

# Install PyTorch with CUDA first (example for CUDA 12.1)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install as an editable package with AI and Vision dependencies
pip install -e .[ai,vision]
```
- Download models according to the Model Downloads Guide.
- Edit `configs/paths.py` to point to your `.safetensors` and `.pth` files.
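For reference, `configs/paths.py` might take roughly this shape; every directory and filename below is a placeholder for your own downloads:

```python
from pathlib import Path

# All locations below are examples -- point them at your own disk layout.
MODELS_DIR = Path.home() / "models"
STABLE_DIFFUSION_DIR = MODELS_DIR / "stable-diffusion"
UPSCALER_DIR = MODELS_DIR / "upscalers"

# Keys are the names pipelines reference via DIFFUSION_MODELS["..."].
DIFFUSION_MODELS = {
    "dreamshaper": STABLE_DIFFUSION_DIR / "dreamshaper_8.safetensors",
    "meinamix": STABLE_DIFFUSION_DIR / "meinamix.safetensors",
}

IMAGE_GEN_OUTPUT_DIR = Path("outputs") / "image_gen"
```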
Because this framework is installed as a pure Python package, you can run inference globally using the built-in CLI:
```bash
# List all registered styles
image-gen --list

# Anime style (automatically injects enhancers and negative prompts)
image-gen "beautiful anime girl with sword" --style anime

# Automotive engineering with sub-type specific LoRA injection
image-gen "midnight blue RX7 on mountain road" --style car --type rx7
```
Optimized explicitly for low-end hardware, specifically the ubiquitous Nvidia GTX 1650 (4 GB VRAM) laptop GPU.
- GPU: Nvidia GTX 1650 Ti (4GB GDDR6)
- RAM: 16GB DDR4
- Backend: CUDA 11.8 / PyTorch 2.1 / Diffusers 0.25+
- Precision: Forced `float32` (required to prevent the critical NaN/black-image issue native to the 1650 architecture).
This framework demonstrates exactly how to survive on a 4 GB GPU through aggressive layer management:
| Operation | Standard Diffusers | Image Gen Framework | Optimization Used |
|---|---|---|---|
| Load Base Model (SD 1.5) | 3.8 GB | 1.9 GB | enable_sequential_cpu_offload() |
| VAE Decoding (512x768) | OOM Crash | 2.6 GB | enable_vae_slicing() & tiling() |
| Attention Layers | +1.2 GB | +0.4 GB | enable_attention_slicing("max") |
| ControlNet (OpenPose) | OOM Crash | 3.8 GB | Dynamic swap + Layer offload |
| Real-ESRGAN Upscale | +2.1 GB | +0.6 GB | Tile size 256 + Half-precision prep |
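The optimizations in the table correspond to standard diffusers pipeline methods. As a minimal sketch, assuming an already-loaded `StableDiffusionPipeline` (the helper name is ours; the four `enable_*` calls are real diffusers APIs):

```python
def apply_low_vram_optimizations(pipe):
    """Apply the table's memory optimizations to a loaded diffusers pipeline.

    Call once, right after loading the checkpoint and before inference.
    """
    pipe.enable_sequential_cpu_offload()  # stream layers to the GPU one at a time
    pipe.enable_attention_slicing("max")  # compute attention in the smallest slices
    pipe.enable_vae_slicing()             # decode the latent batch slice by slice
    pipe.enable_vae_tiling()              # decode large images tile by tile
    return pipe
```

Sequential offloading trades speed for memory, which is why the generation times below are measured in tens of seconds rather than seconds.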
| Pipeline | Resolution | Steps | Time to Generation |
|---|---|---|---|
| Anime (Meinamix) | 512x768 | 20 (Euler A) | ~14 - 18 seconds |
| Realistic Portrait | 512x768 | 25 (DPM++ 2M Karras) | ~22 - 25 seconds |
| Watercolor Sketch | 768x512 | 26 (Euler A) | ~18 - 20 seconds |
| Upscale (Real-ESRGAN) | 2048x3072 | Post-process | ~4 - 6 seconds |
| Img2Img Hallucination | 768x1152 | 15 (DPM++ SDE) | ~28 - 32 seconds |
Every image below was generated locally on a single GTX 1650 (4 GB VRAM). No cloud APIs. No rented compute. Pure local inference.
34 Generations Across 13 Pipelines
Pipelines: anime · closeup_anime · ghost
Pipelines: hyperrealistic · difconsistency · ethnicity
Pipeline: cars (LoRA variants: RX7, F1, Sedan, Speedtail)
Pipelines: space · drawing · diffusionbrush
Pipelines: horror · zombie · papercut
Mixed pipelines, demonstrating engine versatility across art directions
> [!NOTE]
> All renders used sequential CPU offloading, VAE slicing, and subprocess isolation to guarantee zero OOM failures across a continuous 34-image batch on 4 GB VRAM.
The framework demonstrates how to string together multiple AI tools into a single pipeline.
- Diffusion Upscale (optional): Re-runs SD in img2img to hallucinate details.
- Real-ESRGAN (automatic): Selects between the standard 23-block model (`style_type="realistic"`) and the lightweight 6-block model (`style_type="anime"`).
- Lanczos (always): CPU-based sharpening and unsharp masking.
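The last two stages can be sketched with Pillow: the block-count selection rule for Real-ESRGAN plus the always-on Lanczos pass. This is a simplified illustration; the real framework wires these stages into the engine:

```python
from PIL import Image, ImageFilter

def esrgan_block_count(style_type: str) -> int:
    """Selection rule described above: the full 23-block RRDBNet for
    realistic styles, the lightweight 6-block variant for anime."""
    return 6 if style_type == "anime" else 23

def lanczos_finalize(image: Image.Image, scale: int = 2) -> Image.Image:
    """The always-on CPU stage: Lanczos resample followed by an unsharp mask."""
    upscaled = image.resize(
        (image.width * scale, image.height * scale), Image.LANCZOS
    )
    return upscaled.filter(
        ImageFilter.UnsharpMask(radius=2, percent=80, threshold=3)
    )
```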
See docs/UPSCALERS.md for the architecture diagram and download links.
This framework provides an end-to-end workflow for discovering a new model on CivitAI, learning how the community uses it, and deploying it as a permanent style pipeline.
- Download any Stable Diffusion v1.5 model `.safetensors` file from CivitAI.
- Place it in your model directory (e.g., `models/stable-diffusion/my_model.safetensors`).
- Add its path to your configuration file in `configs/paths.py`:
```python
DIFFUSION_MODELS = {
    "my_model": STABLE_DIFFUSION_DIR / "my_model.safetensors",
}
```
Instead of guessing which trigger words or "quality boosters" work best for your new model, scrape the top community generations using the API Scraper.
```bash
cp .env.example .env   # Add your CIVITAI_API_KEY
python scripts/api_scraper.py https://civitai.com/models/46294
```
This script downloads metadata from the highest-rated images for that model and saves a structured dataset to `assets/prompts/model_46294_prompts.json`. The JSON contains raw prompts, negative prompts, steps, CFG scales, and samplers.
Now, create your pipeline file. Under the hood, the Smart Prompt Enhancer reads the JSON dataset, counts the most frequent trigger keywords, LoRAs, and optimal settings (steps/CFG), and dynamically injects them into the user's prompt.
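The enhancer's core idea, counting which comma-separated prompt fragments the community uses most, can be sketched like this. It is a simplified illustration; the real `ModelPromptEnhancer` also learns steps, CFG, and sampler:

```python
import json
from collections import Counter

def top_keywords(dataset_path: str, n: int = 5) -> list[str]:
    """Return the n most frequent prompt fragments in a scraped dataset.

    Assumes the JSON is a list of records with a "prompt" field, as
    produced by the scraper step above.
    """
    with open(dataset_path, encoding="utf-8") as f:
        records = json.load(f)
    counts: Counter[str] = Counter()
    for record in records:
        for fragment in record.get("prompt", "").split(","):
            fragment = fragment.strip().lower()
            if fragment:
                counts[fragment] += 1
    return [fragment for fragment, _ in counts.most_common(n)]
```

The most frequent fragments are exactly the trigger words and quality boosters worth injecting into every user prompt for that model.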
Create `image_gen/pipeline/my_pipeline.py`:
```python
from configs.paths import DIFFUSION_MODELS, IMAGE_GEN_OUTPUT_DIR
from image_gen.pipeline.pipeline_types import PipelineConfigs
from image_gen.pipeline.registry import register_pipeline

# Import the enhancer we generated data for
from scripts.prompt_enhancer import ModelPromptEnhancer

@register_pipeline(
    name="my_new_style",
    keywords=["new style", "custom art"],
    description="Pipeline powered by scraped community data."
)
def get_config(prompt: str, **kwargs) -> PipelineConfigs:
    # 1. Let the enhancer read the JSON and determine the optimal settings
    enhancer = ModelPromptEnhancer(model_id=46294)
    learned_data = enhancer.enhance(prompt)

    # 2. Return the config object injected with community-learned parameters
    return PipelineConfigs(
        base_model=DIFFUSION_MODELS["my_model"],
        output_dir=IMAGE_GEN_OUTPUT_DIR / "custom_style",
        # Injected from the JSON analysis:
        prompt=learned_data.prompt,
        neg_prompt=learned_data.negative_prompt,
        steps=learned_data.steps,
        cfg=learned_data.cfg_scale,
        scheduler_name=learned_data.sampler,
        width=512, height=768,
        style_type="realistic"
    )
```
Finally, simply import this file in `image_gen/pipeline/registry.py`. You have just built a completely automated, data-driven inference pipeline.
Want to understand how it all works under the hood?
| Document | Description |
|---|---|
| Architecture Guide | How the engine lifecycle and VRAM management works |
| Custom Pipelines | Full tutorial for creating your own pipelines |
| Upscalers Guide | Upscaler mechanics, downloads, and troubleshooting |
| Model Downloads | Where to download all the weights used in this project |
This project is licensed under the MIT License.
