A modular, plugin-based Stable Diffusion inference framework designed for learning and real-world deployment. Built with aggressive layer offloading to run production workloads locally on a GTX 1650 (4 GB VRAM).
- Run Stable Diffusion on 4GB GPUs (GTX 1650) without OOM crashes
- Plugin-based pipeline system (no messy if/else routing blocks)
- Fully supports community LoRAs, ControlNets, and Upscalers
- Built headless: wrap it instantly in Web, Desktop, or Mobile UIs
Quick Start · How & Why it Works · Benchmarks · Managing Pipelines · Building UIs
> [!NOTE]
> TL;DR: This is a minimalist, plugin-based Stable Diffusion 1.5 framework specifically optimized for 4 GB GPUs (like the GTX 1650). It teaches you how to cleanly manage 10+ different AI art styles using a Registry Pattern, how to aggressively optimize VRAM using layer offloading and subprocess isolation, and how to attach web, desktop, or mobile UIs to a headless Python backend.
If you've ever wondered how to move from messy Jupyter notebooks with hardcoded Stable Diffusion scripts to a production-grade architecture that can cleanly handle dozens of different art styles, models, and upscalers, this repository is your blueprint.
- Midjourney-Style Apps: Wrap the engine in FastAPI to power your own Discord bots or web dashboards.
- SaaS AI Tools: Use the headless engine as a scalable backend generation API.
- Low-Cost Local Hosting: Run full inference pipelines on cheap, low-end hardware without OOM crashing.
- Rapid Prototyping: Experiment with LoRAs, ControlNets, and Schedulers via the decoupled Registry Pattern.
It demonstrates:
- The Registry Pattern: How to manage multiple pipelines without massive `if/else` blocks.
- VRAM Mastery: How to squeeze 2 GB+ models, LoRAs, and ControlNets into a 4 GB GPU without crashing.
- Engine Separation: How to decouple the "What to generate" from the "How to execute it", making it trivial to attach a Web, Desktop, or Mobile UI.
Most open-source Stable Diffusion scripts start simple but quickly evolve into fragile monoliths packed with if/else statements for every new model, autoencoder, or upscaler.
This framework takes a different approach. It applies enterprise software engineering principles to AI inference, ensuring the codebase remains clean no matter how many pipelines, LoRAs, or SDKs you add.
```mermaid
graph LR
    U[Next.js Client] -->|React Three Fiber| F[Framer Motion UI]
    F -->|REST Async Polling| W[FastAPI Gateway]
    W -->|Subprocess Dispatch| R((Registry))
    subgraph Core Engine
        R -->|Load Pipeline| C[PipelineConfigs]
        C -->|Config Object| E[Diffusion Engine]
    end
    E -->|Lazy Load| SD[SD 1.5 Model]
    E -.->|Dynamic Swap| CN[ControlNet]
    SD -->|512x768| ES[Real-ESRGAN]
    ES -->|2k/4k| O[Final Output Image]
    style E fill:#f9f,stroke:#333,stroke-width:2px
    style W fill:#bbf,stroke:#333
    style U fill:#d4edda,stroke:#333
```
- Ultra-Premium UI (Apple-Tier Aesthetics): Included is a Next.js 14 (App Router) client featuring a dynamic WebGL GPU shader background (`@react-three/fiber`) that smoothly interpolates between 5 custom macOS themes (Sonoma, Monterey, Catalina, Big Sur, Sequoia).
- Physics-Based UI Morphs: The frontend uses `framer-motion` for layout animations and physics-based transitions when moving from "Awaiting Parameters" to "Generating Tensor Graph", with no jarring loading screens.
- The Registry Pattern (Zero-Friction Scaling): Adding a new style to the engine requires zero modifications to the core inference logic. You simply drop a new file into `image_gen/pipeline/`, decorate it with `@register_pipeline`, and the engine handles the rest.
- FastAPI Subprocess Isolation: PyTorch's CUDA memory allocator is notorious for holding onto VRAM even after `torch.cuda.empty_cache()` is called. This framework sets up a FastAPI microservice that spins up isolated Python subprocesses, guaranteeing 100% VRAM release back to the OS between generations.
- Headless by Design: Because the system is strictly decoupled, the AI engine can be queried asynchronously via the FastAPI server, so the Node.js/Next.js thread is never blocked during inference.
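The subprocess-isolation idea can be sketched in a few lines. This is an illustration of the mechanism rather than the framework's actual dispatcher, and the `image_gen.worker` module named in the comment is hypothetical:

```python
import subprocess
import sys

def run_in_subprocess(argv: list[str]) -> str:
    """Run a worker command in a fresh Python process and return its stdout.

    When the child exits, the OS reclaims all of its memory, including
    CUDA allocations that PyTorch's caching allocator would keep resident
    inside a long-lived server process.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# A dispatch from the FastAPI gateway might look like this
# (the worker module name is hypothetical):
# image_path = run_in_subprocess(
#     [sys.executable, "-m", "image_gen.worker",
#      "--prompt", prompt, "--style", style]
# )
```

The trade-off is process startup and model-load latency per request, which the framework accepts in exchange for deterministic VRAM release on 4 GB cards.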
The core innovation of this framework is the Pipeline Registry.
Instead of writing a massive main file that tries to load every model, this framework uses a decorator-based registration pattern.
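Conceptually, the registry is just a dictionary populated by a decorator at import time. A minimal sketch follows; the `@register_pipeline` and `get_pipeline` names mirror the framework's public API, but these internals are illustrative:

```python
# Illustrative internals of a decorator-based pipeline registry.
_PIPELINES: dict[str, dict] = {}

def register_pipeline(name: str, keywords=None, description: str = ""):
    """Decorator that records a config-builder function under a style name."""
    def decorator(config_fn):
        _PIPELINES[name] = {
            "get_config": config_fn,
            "keywords": keywords or [],
            "description": description,
        }
        return config_fn  # the decorated function stays usable as-is
    return decorator

def get_pipeline(name: str) -> dict:
    """Look up a registered style; the engine never needs an if/else chain."""
    if name not in _PIPELINES:
        raise ValueError(f"Unknown style '{name}'. Available: {sorted(_PIPELINES)}")
    return _PIPELINES[name]
```

Because registration happens as a side effect of importing the pipeline module, "adding a style" reduces to adding one import.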
Adding a new style pipeline takes one file and zero engine modifications.
- Create a file in `image_gen/pipeline/my_style.py`.
- Use the `@register_pipeline` decorator:
```python
from configs.paths import DIFFUSION_MODELS, IMAGE_GEN_OUTPUT_DIR
from image_gen.pipeline.pipeline_types import PipelineConfigs
from image_gen.pipeline.registry import register_pipeline

@register_pipeline(
    name="my_custom_style",
    keywords=["mystyle", "custom art"],
    description="Minimal example pipeline"
)
def get_config(prompt: str, **kwargs) -> PipelineConfigs:
    return PipelineConfigs(
        base_model=DIFFUSION_MODELS["dreamshaper"],
        output_dir=IMAGE_GEN_OUTPUT_DIR / "custom",
        prompt=f"masterpiece, best quality, {prompt}",
        neg_prompt="worst quality, blurry",
        vae="realistic",
        style_type="realistic",
        scheduler_name="dpm++_2m_karras",
        width=512, height=768, steps=25, cfg=7.0
    )
```
- Import it in `image_gen/pipeline/registry.py`:
```python
def discover_pipelines():
    from . import my_style
```
That's it. The CLI, API, and engine automatically know how to use it. The engine lazily loads the necessary components only when requested.
See docs/CUSTOM_PIPELINES.md for advanced tutorials on adding LoRAs and ControlNets to your pipelines.
Because this framework strictly separates configuration (PipelineConfigs) from execution (DiffusionEngine), attaching a frontend UI is incredibly straightforward.
Whether you're building a web app, a desktop tool, or a mobile client, the backend interaction is always the same three steps:
Create a server.py using FastAPI:
from fastapi import FastAPI, BackgroundTasks
from image_gen.engine import DiffusionEngine
from image_gen.pipeline.registry import discover_pipelines, get_pipeline
app = FastAPI()
discover_pipelines() # Load registry on startup
@app.post("/generate")
async def generate_image(prompt: str, style: str):
# 1. Get the pipeline configuration
config_fn = get_pipeline(style)["get_config"]
config = config_fn(prompt=prompt)
# 2. Execute
engine = DiffusionEngine()
saved_path = engine.generate(config)
engine.unload() # Crucial for freeing VRAM for the next request
return {"url": f"/static/{saved_path.name}"}For a local desktop app, run the engine in a QThread to keep the UI responsive:
```python
from PyQt6.QtCore import QThread, pyqtSignal
from image_gen.engine import DiffusionEngine

class GeneratorThread(QThread):
    finished = pyqtSignal(str)  # Emits the final image path

    def __init__(self, config):
        super().__init__()
        self.config = config

    def run(self):
        engine = DiffusionEngine()
        saved_path = engine.generate(self.config)
        engine.unload()
        self.finished.emit(str(saved_path))
```
If you're building a Flutter app, host the FastAPI server above, then call it using Dart's `http` package:
```dart
import 'package:http/http.dart' as http;
import 'dart:convert';

Future<String> generateArt(String prompt, String style) async {
  final response = await http.post(
    Uri.parse('http://your-server.local:8000/generate'),
    body: jsonEncode({'prompt': prompt, 'style': style}),
    headers: {'Content-Type': 'application/json'},
  );
  if (response.statusCode == 200) {
    return jsonDecode(response.body)['url'];
  } else {
    throw Exception('Failed to generate image');
  }
}
```
- Python 3.10+
- NVIDIA GPU with CUDA support (minimum: GTX 1650, 4 GB VRAM)
- PyTorch with CUDA (see pytorch.org)
```bash
git clone https://github.com/RajTewari01/image-gen.git
cd image-gen

# Install PyTorch with CUDA first (example for CUDA 12.1)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install as an editable package with AI and Vision dependencies
pip install -e .[ai,vision]
```
- Download models according to the Model Downloads Guide.
- Edit `configs/paths.py` to point to your `.safetensors` and `.pth` files.
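For reference, `configs/paths.py` might take roughly this shape; every directory and filename below is a placeholder for your own downloads:

```python
from pathlib import Path

# All locations below are examples -- point them at your own disk layout.
MODELS_DIR = Path.home() / "models"
STABLE_DIFFUSION_DIR = MODELS_DIR / "stable-diffusion"
UPSCALER_DIR = MODELS_DIR / "upscalers"

# Keys are the names pipelines reference via DIFFUSION_MODELS["..."].
DIFFUSION_MODELS = {
    "dreamshaper": STABLE_DIFFUSION_DIR / "dreamshaper_8.safetensors",
    "meinamix": STABLE_DIFFUSION_DIR / "meinamix.safetensors",
}

IMAGE_GEN_OUTPUT_DIR = Path("outputs") / "image_gen"
```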
Because this framework is installed as a pure Python package, you can run inference globally using the built-in CLI:
```bash
# List all registered styles
image-gen --list

# Anime style (automatically injects enhancers and negative prompts)
image-gen "beautiful anime girl with sword" --style anime

# Automotive engineering with sub-type specific LoRA injection
image-gen "midnight blue RX7 on mountain road" --style car --type rx7
```
Optimized explicitly for low-end hardware, specifically the ubiquitous Nvidia GTX 1650 (4 GB VRAM) laptop GPU.
- GPU: Nvidia GTX 1650 Ti (4GB GDDR6)
- RAM: 16GB DDR4
- Backend: CUDA 11.8 / PyTorch 2.1 / Diffusers 0.25+
- Precision: Forced `float32` (required to prevent the critical NaN/black-image issue native to the 1650 architecture).
This framework demonstrates exactly how to survive on a 4 GB GPU through aggressive layer management:
| Operation | Standard Diffusers | Image Gen Framework | Optimization Used |
|---|---|---|---|
| Load Base Model (SD 1.5) | 3.8 GB | 1.9 GB | enable_sequential_cpu_offload() |
| VAE Decoding (512x768) | OOM Crash | 2.6 GB | enable_vae_slicing() & tiling() |
| Attention Layers | +1.2 GB | +0.4 GB | enable_attention_slicing("max") |
| ControlNet (OpenPose) | OOM Crash | 3.8 GB | Dynamic swap + Layer offload |
| Real-ESRGAN Upscale | +2.1 GB | +0.6 GB | Tile size 256 + Half-precision prep |
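The optimizations in the table correspond to standard diffusers pipeline methods. As a minimal sketch, assuming an already-loaded `StableDiffusionPipeline` (the helper name is ours; the four `enable_*` calls are real diffusers APIs):

```python
def apply_low_vram_optimizations(pipe):
    """Apply the table's memory optimizations to a loaded diffusers pipeline.

    Call once, right after loading the checkpoint and before inference.
    """
    pipe.enable_sequential_cpu_offload()  # stream layers to the GPU one at a time
    pipe.enable_attention_slicing("max")  # compute attention in the smallest slices
    pipe.enable_vae_slicing()             # decode the latent batch slice by slice
    pipe.enable_vae_tiling()              # decode large images tile by tile
    return pipe
```

Sequential offloading trades speed for memory, which is why the generation times below are measured in tens of seconds rather than seconds.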
| Pipeline | Resolution | Steps | Time to Generation |
|---|---|---|---|
| Anime (Meinamix) | 512x768 | 20 (Euler A) | ~14 - 18 seconds |
| Realistic Portrait | 512x768 | 25 (DPM++ 2M Karras) | ~22 - 25 seconds |
| Watercolor Sketch | 768x512 | 26 (Euler A) | ~18 - 20 seconds |
| Upscale (Real-ESRGAN) | 2048x3072 | Post-process | ~4 - 6 seconds |
| Img2Img Hallucination | 768x1152 | 15 (DPM++ SDE) | ~28 - 32 seconds |
Every image below was generated locally on a single GTX 1650 (4 GB VRAM). No cloud APIs. No rented compute. Pure local inference.
34 Generations Across 13 Pipelines
Pipelines: anime · closeup_anime · ghost
Pipelines: hyperrealistic · difconsistency · ethnicity
Pipeline: cars (LoRA variants: RX7, F1, Sedan, Speedtail)
Pipelines: space · drawing · diffusionbrush
Pipelines: horror · zombie · papercut
Mixed pipelines, demonstrating engine versatility across art directions
> [!NOTE]
> All renders used sequential CPU offloading, VAE slicing, and subprocess isolation to guarantee zero OOM failures across a continuous 34-image batch on 4 GB VRAM.
The framework demonstrates how to string together multiple AI tools into a single pipeline.
- Diffusion Upscale (optional): Re-runs SD in img2img to hallucinate details.
- Real-ESRGAN (automatic): Selects between the standard 23-block model (`style_type="realistic"`) and the lightweight 6-block model (`style_type="anime"`).
- Lanczos (always): CPU-based sharpening and unsharp masking.
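The last two stages can be sketched with Pillow: the block-count selection rule for Real-ESRGAN plus the always-on Lanczos pass. This is a simplified illustration; the real framework wires these stages into the engine:

```python
from PIL import Image, ImageFilter

def esrgan_block_count(style_type: str) -> int:
    """Selection rule described above: the full 23-block RRDBNet for
    realistic styles, the lightweight 6-block variant for anime."""
    return 6 if style_type == "anime" else 23

def lanczos_finalize(image: Image.Image, scale: int = 2) -> Image.Image:
    """The always-on CPU stage: Lanczos resample followed by an unsharp mask."""
    upscaled = image.resize(
        (image.width * scale, image.height * scale), Image.LANCZOS
    )
    return upscaled.filter(
        ImageFilter.UnsharpMask(radius=2, percent=80, threshold=3)
    )
```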
See docs/UPSCALERS.md for the architecture diagram and download links.
This framework provides an end-to-end workflow for discovering a new model on CivitAI, learning how the community uses it, and deploying it as a permanent style pipeline.
- Download any Stable Diffusion v1.5 model `.safetensors` file from CivitAI.
- Place it in your model directory (e.g., `models/stable-diffusion/my_model.safetensors`).
- Add its path to your configuration file in `configs/paths.py`:
```python
DIFFUSION_MODELS = {
    "my_model": STABLE_DIFFUSION_DIR / "my_model.safetensors",
}
```
Instead of guessing which trigger words or "quality boosters" work best for your new model, scrape the top community generations using the API Scraper.
```bash
cp .env.example .env   # Add your CIVITAI_API_KEY
python scripts/api_scraper.py https://civitai.com/models/46294
```
This script downloads metadata from the highest-rated images for that model and saves a structured dataset to `assets/prompts/model_46294_prompts.json`. The JSON contains raw prompts, negative prompts, steps, CFG scales, and samplers.
Now, create your pipeline file. Under the hood, the Smart Prompt Enhancer reads the JSON dataset, counts the most frequent trigger keywords, LoRAs, and optimal settings (steps/CFG), and dynamically injects them into the user's prompt.
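The enhancer's core idea, counting which comma-separated prompt fragments the community uses most, can be sketched like this. It is a simplified illustration; the real `ModelPromptEnhancer` also learns steps, CFG, and sampler:

```python
import json
from collections import Counter

def top_keywords(dataset_path: str, n: int = 5) -> list[str]:
    """Return the n most frequent prompt fragments in a scraped dataset.

    Assumes the JSON is a list of records with a "prompt" field, as
    produced by the scraper step above.
    """
    with open(dataset_path, encoding="utf-8") as f:
        records = json.load(f)
    counts: Counter[str] = Counter()
    for record in records:
        for fragment in record.get("prompt", "").split(","):
            fragment = fragment.strip().lower()
            if fragment:
                counts[fragment] += 1
    return [fragment for fragment, _ in counts.most_common(n)]
```

The most frequent fragments are exactly the trigger words and quality boosters worth injecting into every user prompt for that model.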
Create `image_gen/pipeline/my_pipeline.py`:
```python
from configs.paths import DIFFUSION_MODELS, IMAGE_GEN_OUTPUT_DIR
from image_gen.pipeline.pipeline_types import PipelineConfigs
from image_gen.pipeline.registry import register_pipeline

# Import the enhancer we generated data for
from scripts.prompt_enhancer import ModelPromptEnhancer

@register_pipeline(
    name="my_new_style",
    keywords=["new style", "custom art"],
    description="Pipeline powered by scraped community data."
)
def get_config(prompt: str, **kwargs) -> PipelineConfigs:
    # 1. Let the enhancer read the JSON and determine the optimal settings
    enhancer = ModelPromptEnhancer(model_id=46294)
    learned_data = enhancer.enhance(prompt)

    # 2. Return the config object injected with community-learned parameters
    return PipelineConfigs(
        base_model=DIFFUSION_MODELS["my_model"],
        output_dir=IMAGE_GEN_OUTPUT_DIR / "custom_style",
        # Injected from the JSON analysis:
        prompt=learned_data.prompt,
        neg_prompt=learned_data.negative_prompt,
        steps=learned_data.steps,
        cfg=learned_data.cfg_scale,
        scheduler_name=learned_data.sampler,
        width=512, height=768,
        style_type="realistic"
    )
```
Finally, simply import this file in `image_gen/pipeline/registry.py`. You have just built a completely automated, data-driven inference pipeline.
Want to understand how it all works under the hood?
| Document | Description |
|---|---|
| Architecture Guide | How the engine lifecycle and VRAM management works |
| Custom Pipelines | Full tutorial for creating your own pipelines |
| Upscalers Guide | Upscaler mechanics, downloads, and troubleshooting |
| Model Downloads | Where to download all the weights used in this project |
This project is licensed under the MIT License.
