Awesome Image Generation

A curated list of AI image generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.

Maintained by Backblaze.

Text-to-Image APIs

Commercial image-generation APIs with hosted inference and developer SDKs.

Adobe Firefly API – Image generation, editing, Photoshop automation, and Lightroom operations. Part of Firefly Services platform. Docs | SDK: JS/TS (official)
Amazon Titan Image Generator – Text-to-image via AWS Bedrock. Image conditioning, color palette guidance, background removal, and variations. Docs | SDK: Python (boto3), Java, PHP
Black Forest Labs (FLUX Pro) – FLUX 1.1 Pro and FLUX.2 (32B params) via REST API. Founded by researchers behind the original Stable Diffusion. Also available on Replicate, fal.ai, and Together AI. Docs
fal.ai – Serverless inference hosting 1000+ image models, including FLUX and SD variants. Vendor-claimed fastest diffusion inference engine. SOC 2 compliant. Docs | SDK: Python, JS
Google Gemini Image API – Native image generation via Gemini models (gemini-2.5-flash-image, gemini-3.1-flash-image-preview). Text-to-image, editing, multi-turn. Free tier via AI Studio. Docs | SDK: Python (google-genai), JS (google/generative-ai), Go, Java
Google Imagen (Vertex AI) – Imagen 4 via Vertex AI. Text-to-image, editing, outpainting, inpainting, customization. Docs | SDK: Python (google-cloud-aiplatform), Node
Ideogram – Known for high-quality text rendering in images. Ideogram 3.0 supports generation, remix, edit, and character reference. OpenAI-compatible interface. Docs
Leonardo AI – Text-to-image, image-to-image, and image-to-video. Webhooks, LoRA models, and "Get API Code" export from web UI. Docs | SDK: TypeScript, Python
Midjourney – Official API released late 2025. Enterprise/Pro plan holders only; no public self-service access. Docs
MiniMax Image API – Text-to-image and image-to-image REST API from MiniMax. image-01 model supports subject reference images, configurable aspect ratios, and batch generation up to 9 images per request. Docs
ModelsLab – REST API for 10k+ image models including FLUX, SDXL, SD 1.5, LoRA fine-tunes, and community variants. Text-to-image, img2img, inpainting, ControlNet. Docs | SDK: Python, JS/TS, PHP, Go, Dart
Novita AI – Serverless API with 200+ image models including FLUX.1 Kontext, Hunyuan Image 3, Seedream, and SD variants. Text-to-image, img2img, inpainting, ControlNet. Python and JS SDKs. Docs | SDK: Python
OpenAI GPT Image – gpt-image-1, gpt-image-1.5, gpt-image-1-mini. Natively multimodal generation, editing, and inpainting. DALL-E 2/3 deprecated May 2026. Docs | SDK: Python, Node
Pollinations.ai – Free open-source generative AI platform. Image generation via GET endpoint (FLUX, Turbo, Kontext). Anonymous tier; register for watermark-free access and higher rate limits. Docs
Prodia – Serverless image generation API with 50+ models including FLUX.1/2, Recraft V4, and SD variants. Advertises 190ms latency for FLUX. Sync and async endpoints. Docs | SDK: JS (prodia-js), Python
Recraft AI – Raster and vector image generation. V4 model (Feb 2026). Background removal, inpainting, outpainting, vectorization. OpenAI-compatible interface. Docs
Runware – Image generation API serving 400k+ models via proprietary Sonic Inference Engine. Text-to-image, inpainting, outpainting, upscaling. Pay-per-image pricing with $2 free trial. Docs | SDK: Python, JS
Segmind – Serverless REST API for 150+ image and video models including FLUX, GPT Image, Imagen 4, Kling, and Wan. PixelFlow workflow builder for chaining models. Free credits on signup. Docs
SiliconFlow – OpenAI-compatible inference API hosting FLUX, Qwen-Image, Kolors, and other open-weight image models. Text-to-image and image editing endpoints. Free tier available. Docs
Stability AI – Stable Diffusion 3.5 and Stable Image via REST API. Text-to-image, image-to-image, upscaling, inpainting. Docs
xAI Image Generation API – grok-imagine-image model via REST API. Text-to-image and image editing. Batch up to 10 images, 1k/2k resolution. OpenAI-compatible interface. Docs | SDK: Python (xai-sdk), JS (openai-compatible)
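
Several entries above advertise an OpenAI-compatible interface. As a rough sketch of what that usually means in practice, the snippet below builds (without sending) a request shaped like OpenAI's `POST /images/generations` call. The base URL, API key, and model id are placeholders, and the `n` and `size` fields follow OpenAI's images API; individual providers may accept only a subset, so check each provider's docs before relying on them.

```python
import json
import urllib.request


def build_image_request(base_url, api_key, model, prompt, n=1, size="1024x1024"):
    """Build (but do not send) a POST to an OpenAI-style images endpoint."""
    payload = {"model": model, "prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute a real provider base URL, key, and model id.
req = build_image_request(
    base_url="https://api.example.com/v1",
    api_key="YOUR_API_KEY",
    model="example-image-model",
    prompt="a lighthouse at dusk, watercolor",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it. Switching between compatible providers should then mostly be a matter of changing `base_url` and `model`.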

Open Source Models

Open-weight image-generation models you can run locally or self-host.

FLUX.1 [schnell] – 12B param rectified flow transformer. 1-4 step generation. Fully open for commercial use. Docs
FLUX.1 Kontext [dev] – 12B param instruction-based image editing model. Edit existing images via text prompts; character/style reference without finetuning. Non-commercial license. Docs
DeepFloyd IF – Cascaded pixel-space diffusion (64px → 256px → 1024px). Strong text rendering. Zero-Shot FID 6.66 on COCO.
LCM / LCM-LoRA – Latent Consistency Models enabling 2-4 step generation. LCM-LoRA is a lightweight adapter that plugs into SD 1.5 and SDXL checkpoints and their fine-tunes without retraining. Docs
PixArt-Alpha / PixArt-Sigma – DiT-based T2I at 10.8% of SD1.5 training cost. Near-commercial quality. Docs
Kandinsky 3 – Open-source T2I from AI Forever. 2x larger U-Net and 10x larger text encoder vs v2.x. Docs
Chroma – 8.9B FLUX-based open T2I model. Supports text-to-image, image-to-image, and inpainting via diffusers ChromaPipeline. Apache-2.0, designed as a finetuning base. Docs
ERNIE-Image – 8B single-stream DiT text-to-image model from Baidu. Strong text rendering for posters and infographics. Diffusers-native via ErnieImagePipeline. Turbo variant for 24GB VRAM. Apache-2.0. Docs
FLUX.1 [dev] – 12B param guidance-distilled model. High quality, competitive with closed-source. Non-commercial license.
FLUX.2 [dev] – 32B param model with generation, editing, and multi-reference combining.
GLM-Image – 16B hybrid autoregressive + diffusion model from Zhipu AI. Excels at text rendering inside images. Supports T2I and I2I. Runs via GlmImagePipeline in diffusers. Docs
HiDream-I1 – 17B sparse diffusion transformer for text-to-image. Three variants (Full, Dev, Fast). Top benchmark scores; diffusers-native via HiDreamImagePipeline. Docs
HunyuanImage 3.0 – 80B MoE T2I model from Tencent (13B params activated per token). Multimodal understanding and generation. Instruct-distilled variant released Jan 2026. Tencent Community License. Docs
Lumina-Image 2.0 – 2.6B DiT text-to-image model using Gemma-2-2B text encoder. Diffusers-native via Lumina2Pipeline. Supports fine-tuning and controllable generation. Apache-2.0. Docs
OmniGen2 – Unified T2I + instruction-based image editing model. Dual-path architecture with separate autoregressive LLM and diffusion transformer decoders. pip-installable, Apache-2.0. Docs
Playground v2.5 – Aesthetic-focused model built on the SDXL architecture.
Qwen-Image – Alibaba's open-weight T2I family. Qwen-Image-2512 (text-to-image) and Qwen-Image-Edit variants. Strong text rendering including Chinese. Diffusers-native, Apache 2.0. Docs
SANA – Efficient high-resolution T2I model from NVIDIA. Linear DiT architecture with DC-AE (32x compression). Generates images up to 4K; the 0.6B variant is reported as 20x smaller and 100x faster than Flux-12B. Diffusers-native via SanaPipeline. Apache-2.0. Docs
SDXL-Turbo – Adversarial distillation of SDXL enabling single-step generation.
Stable Diffusion 1.5 – 860M UNet, runs on consumer GPUs. Foundation for massive community ecosystem of LoRAs, fine-tunes, and extensions.
Stable Diffusion 3.5 Large – MMDiT architecture with three text encoders (including T5-XXL). Highest-quality Stability open model. Docs
Stable Diffusion XL (SDXL) – Native 1024x1024. Improved text-in-image and limb generation. Base + refiner pipeline.
Z-Image – 6B param T2I model family from Alibaba Tongyi-MAI. Variants include Turbo (sub-second inference), Omni-Base (gen+edit), and Edit. Diffusers-native, Apache-2.0. Docs
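
The parameter counts listed above give a quick first estimate of whether a model fits on a given GPU. The sketch below covers weights only: it multiplies parameter count by bytes per parameter and ignores text encoders, the VAE, activations, and framework overhead, so treat the result as a lower bound rather than a requirement.

```python
def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate memory for model weights alone, in decimal gigabytes.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for 8-bit, 0.5 for 4-bit.
    """
    return params_billions * bytes_per_param


# Parameter counts as given in the list above.
models = [
    ("Lumina-Image 2.0", 2.6),
    ("Z-Image", 6),
    ("FLUX.1 [dev]", 12),
    ("FLUX.2 [dev]", 32),
]
for name, size_b in models:
    print(
        f"{name}: ~{weight_memory_gb(size_b):.1f} GB in bf16, "
        f"~{weight_memory_gb(size_b, 0.5):.1f} GB at 4-bit"
    )
```

By this estimate a 12B model needs about 24 GB for bf16 weights alone, which is why quantized or offloaded variants are common on consumer cards.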

Open Source Frameworks and UIs

Graphical and programmatic interfaces for running diffusion pipelines.

AUTOMATIC1111 WebUI – Most widely used Gradio-based SD web UI. 161k+ stars. Extensive extension ecosystem. Docs
ComfyUI – Node-based graph UI and backend for diffusion models. Highly customizable, API-accessible. Supports SD, SDXL, Flux, and modern models. Docs
Fooocus – Midjourney-inspired SDXL UI. Prompt-only workflow, no manual parameter tweaking.
InvokeAI – Creative engine for SD models targeting professionals. Industry-leading WebUI. Docs
Forge – Fork of AUTOMATIC1111 with improved GPU memory management and performance. Compatible with A1111 extensions.
AI Toolkit (ostris) – All-in-one training suite for diffusion models. GUI and CLI. Trains FLUX.1/2, SDXL, SD 1.5, Qwen-Image, HiDream, and video models on consumer hardware.
comfy-pack – Toolkit for locking, packaging, and deploying ComfyUI workflow environments. Bundles custom nodes, model hashes, and Python deps into a .cpack.zip. Serves workflows as REST APIs. Docs
ComfyUI Deploy – Open-source deployment platform for ComfyUI workflows. Exposes versioned REST APIs for production and staging. Supports serverless GPU backends and self-hosting on Vercel/Neon. Docs
ComfyUI-Manager – Extension for ComfyUI that installs, updates, and manages 800+ custom nodes via a GUI or CLI. Auto-installed with ComfyUI Desktop. Docs
DiffSynth-Studio – Python diffusion engine by ModelScope. Inference and LoRA training for FLUX.1/2, Qwen-Image, Z-Image, and JoyAI-Image. Low-VRAM optimizations, ControlNet, IP-Adapter support.
diffusion-pipe – Pipeline-parallel training script for diffusion models across multiple GPUs. Supports FLUX.1/2, Chroma, SDXL, SD3, HiDream, Qwen-Image, Z-Image, and Hunyuan. LoRA, full fine-tuning, and multi-GPU via DeepSpeed.
Fluxgym – Minimal Gradio web UI for FLUX.1 LoRA training on low VRAM (12GB–20GB). Wraps Kohya scripts with automatic image resizing, HuggingFace publishing, and full Kohya feature access via the Advanced tab.
kohya_ss – Gradio-based GUI for Kohya's SD training scripts. Supports LoRA, DreamBooth, and fine-tuning for SD 1.5, SDXL, SD3, and FLUX.1.
OneTrainer – GUI and CLI training suite for diffusion models. Supports FLUX.1/2, Chroma, SD 1.5/2/3, SDXL, PixArt, HiDream, and Hunyuan Video.
SimpleTuner – General fine-tuning kit for diffusion models. Supports FLUX.1/2, SDXL, SD3, and more. Multi-GPU training, aspect bucketing, embedding caching, and a web UI. AGPL-3.0.
stable-diffusion.cpp – Diffusion model inference in pure C/C++ with no external dependencies. Runs SD 1.x/2.x/XL/3.5, FLUX.1/2, Chroma, Qwen-Image, and Z-Image. CPU/CUDA/Metal/Vulkan backends.
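
ComfyUI is described above as API-accessible. A minimal sketch of queueing a workflow over HTTP follows, assuming a local server on ComfyUI's default port 8188 and its `/prompt` route. The workflow shown is only a one-node fragment for illustration; a real graph has to be exported from the UI in API format.

```python
import json
import urllib.request
import uuid


def build_comfyui_prompt_request(workflow, host="127.0.0.1", port=8188, client_id=None):
    """Build (but do not send) the POST that queues a workflow on a ComfyUI server."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        url=f"http://{host}:{port}/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Fragment of an API-format workflow (node id -> class_type + inputs).
# Not runnable as a full graph: the referenced node "4" is omitted here.
workflow_fragment = {
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a red fox in snow", "clip": ["4", 1]},
    },
}
req = build_comfyui_prompt_request(workflow_fragment, client_id="demo-client")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` against a running server should return a JSON body containing an id for the queued job.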

Image Editing and Enhancement

Conditioning, adaptation, restoration, and upscaling tools.

GFPGAN – Face restoration from Tencent ARC. Restores facial details from degraded images. Often paired with Real-ESRGAN.
Real-ESRGAN – Image and video upscaler, up to 8x. Handles real-world blind super-resolution with noise/artifact removal. Docs
IP-Adapter – Lightweight adapter (~100MB) for image-based prompting. Adds decoupled cross-attention layers that condition generation on image features. Docs
chaiNNer – Node-based GUI for chaining image processing tasks. Supports PyTorch, NCNN, ONNX, and TensorRT upscaling models, background removal, and batch processing. Cross-platform with CUDA/ROCm/MPS backends.
ComfyUI Impact Pack – ComfyUI custom node pack for detection-based face detailing, iterative upscaling, segmentation masking, and regional sampling. FaceDetailer node is widely used for portrait refinement.
comfyui_controlnet_aux – ComfyUI custom node set for generating ControlNet hint images. Preprocessors include Canny, HED, depth estimation, pose detection, segmentation, and 20+ others. Apache-2.0.
ControlNet – Precise structural control for diffusion models via edge maps, depth, pose, normals. Available for SD1.5, SDXL, and Flux. Docs
Krita AI Diffusion – Plugin integrating diffusion-based generation into Krita. Inpaint, outpaint, upscale, and ControlNet workflows without leaving the canvas. Uses ComfyUI as backend. Supports FLUX, SD 1.5/XL, Z-Image, and Illustrious. Docs
Upscayl – Desktop GUI for AI image upscaling on Linux, macOS, and Windows. Uses Real-ESRGAN and other models; up to 16x upscale. Requires Vulkan GPU. Docs

SDKs and Developer Tooling

Libraries and client SDKs for integrating image generation into apps.

Gradio – Python library for building interactive ML demos and web UIs. Foundation for AUTOMATIC1111, Fooocus, and HuggingFace Spaces. Includes gradio-client for programmatic access. Docs | SDK: Python (pip install gradio)
HuggingFace Diffusers – The canonical PyTorch library for diffusion models. SD 1.5, SDXL, SD3, Flux, ControlNet, IP-Adapter, and more. Docs | SDK: Python (pip install diffusers)
Replicate SDK – Python/JS client for 50,000+ hosted ML models. Pay-per-second, no GPU management. Docs | SDK: Python (pip install replicate), Node (npm install replicate)
fal.ai SDK – Python and JS SDKs for serverless inference. Also a Vercel AI SDK provider. Docs | SDK: Python (pip install fal-client), Node (npm install @fal-ai/client)
Cog – Open-source tool for packaging ML models into Docker containers. Defines environment via cog.yaml, auto-generates a REST prediction API (Rust/Axum server), and deploys to Replicate or any Docker host. Docs
HuggingFace Inference Providers – Unified Python/JS client routing requests to fal, Replicate, Together, WaveSpeedAI, and others. Supports text-to-image tasks including FLUX models. Free tier; single HF token for all providers. Docs | SDK: Python (pip install huggingface_hub), JS (npm install @huggingface/inference)
OpenAI SDK – Official SDK for GPT Image generation and editing. client.images.generate() and client.images.edit(). SDK: Python (pip install openai), Node (npm install openai)
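
Image APIs in the OpenAI style commonly return generated images as base64 strings rather than URLs. The helper below is a small sketch for writing such a response to disk. It assumes the raw JSON shape `{"data": [{"b64_json": "..."}]}`, and the demo uses a fake payload instead of a real API call; adapt the attribute access if you are holding an SDK response object rather than a dict.

```python
import base64
import tempfile
from pathlib import Path


def save_b64_images(response, out_dir, stem="image", ext="png"):
    """Decode every `b64_json` item in an images response and write it to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(response["data"]):
        path = out / f"{stem}-{i}.{ext}"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Fake payload standing in for an API response (the bytes are not a real image).
fake_response = {
    "data": [{"b64_json": base64.b64encode(b"not a real image").decode("ascii")}]
}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_b64_images(fake_response, tmp)
    print(saved[0].name, len(saved[0].read_bytes()), "bytes")
```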

GPU Cloud Providers

Serverless and on-demand GPU platforms for running image models.

fal.ai (GPU) – Serverless GPU inference with 1000+ hosted models. Vendor-claimed fastest diffusion inference engine. Docs
Lambda Labs – On-demand A100 and H100 GPUs. Competitive pricing (~$1.10/hr A100 80GB). Docs
Modal – Serverless Python GPU cloud. Sub-second cold starts. Docs | SDK: Python (pip install modal)
Replicate – Serverless model hosting for open-source image models. Docs
RunPod – GPU pods and serverless endpoints. 48% of serverless cold starts under 200ms. Docs
Together AI – Inference API for 200+ open models. Docs
WaveSpeed AI – Serverless inference platform with 700+ image and video models. Sub-second cold starts for FLUX and other diffusion models. OpenAI-compatible REST API. Docs | SDK: Python, JS
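
Hourly GPU prices are easier to compare with per-image API pricing once converted to a cost per image. A back-of-the-envelope sketch follows; the 4 seconds per image and 50% utilization figures are assumptions for illustration, and only the ~$1.10/hr A100 rate comes from the list above.

```python
def cost_per_image(hourly_rate_usd, seconds_per_image, utilization=1.0):
    """USD per image on a GPU billed by the hour.

    utilization: fraction of billed time spent generating (1.0 = never idle).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate_usd * seconds_per_image / 3600 / utilization


# ~$1.10/hr A100 (from the list above); 4 s/image and 50% utilization are assumed.
busy = cost_per_image(1.10, 4)
half_idle = cost_per_image(1.10, 4, utilization=0.5)
print(f"fully utilized: ${busy:.5f}/image, half idle: ${half_idle:.5f}/image")
```

Utilization tends to dominate the result for on-demand instances, which is the main argument for serverless platforms when traffic is bursty.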

Image Storage and Delivery

Object stores and CDNs suited to generated-image workloads.

Backblaze B2 – S3-compatible object storage at low cost. Free egress via Cloudflare. Docs | B2 integration
Cloudflare Images – Image CDN on Cloudflare's global network. Pre-defined variants for transformations.
Cloudinary – Enterprise image/video CDN with AI-powered transformations. Docs | SDK: Python, Node, Ruby, PHP, Java, .NET
Imgix – Real-time image processing CDN. URL-parameter-based transforms. Connects to existing S3/GCS storage. Docs
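
Because Backblaze B2 is S3-compatible, generated images can be stored with any S3 client. The sketch below is one possible approach, not a prescribed layout: it derives a content-addressed key so identical images deduplicate, then uploads through any client that exposes boto3's `put_object` call. The key scheme, bucket name, and endpoint placeholder are all assumptions for illustration.

```python
import hashlib


def content_key(image_bytes, prefix="generated", ext="png"):
    """Derive a deterministic object key from the image's SHA-256 digest."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{prefix}/{digest[:2]}/{digest}.{ext}"


def upload_image(s3_client, bucket, image_bytes, content_type="image/png"):
    """Upload bytes through any client exposing boto3's `put_object` call."""
    key = content_key(image_bytes)
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=image_bytes, ContentType=content_type
    )
    return key


# Assumed wiring with boto3 (not executed here; all values are placeholders):
#   import boto3
#   s3 = boto3.client(
#       "s3",
#       endpoint_url="https://s3.<region>.backblazeb2.com",
#       aws_access_key_id="<keyID>",
#       aws_secret_access_key="<applicationKey>",
#   )
#   upload_image(s3, "my-bucket", png_bytes)
print(content_key(b"example image bytes"))
```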

Evaluation and Observability

Metrics, leaderboards, and quality tooling for generated images.

pytorch-fid – PyTorch FID (Fréchet Inception Distance) implementation. Measures distribution similarity between real and generated images. SDK: Python (pip install pytorch-fid)
IQA-PyTorch – Comprehensive image quality toolbox. PSNR, SSIM, LPIPS, FID, NIQE, MUSIQ, TOPIQ, NIMA, BRISQUE, and more.
CLIP Score – Measures semantic alignment between text prompts and generated images using CLIP embeddings. Available via torchmetrics.multimodal.CLIPScore.
ImageReward – First general-purpose human preference reward model for T2I (NeurIPS 2023). Trained on 137k expert comparison pairs. Docs
torch-fidelity – High-fidelity ISC, FID, KID, and PRC metrics. Supports InceptionV3, CLIP, DINOv2, VGG16 feature extractors. Docs | SDK: Python (pip install torch-fidelity)
Artificial Analysis Image Leaderboard – Elo-rating leaderboard for text-to-image models based on blind user comparisons. Covers quality, speed, and price. Separate rankings for open-weight and API-only models. Docs
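
To make the Elo-style rankings above more concrete, here is a minimal sketch of the classic Elo update applied to one blind pairwise comparison. This is the textbook formula with an assumed K-factor of 32; a given leaderboard may use a different variant or fit ratings in batch, so it illustrates the idea rather than any site's exact method.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return new (rating_a, rating_b) after one comparison.

    score_a: 1.0 if A's image was preferred, 0.5 for a tie, 0.0 if B's was.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two models start level; model A wins one blind comparison.
a, b = update_elo(1000, 1000, score_a=1.0)
print(a, b)
```

An upset win against a much higher-rated model moves both ratings further than an expected win does, which is how rankings settle as votes accumulate.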

Templates and Example Projects

Reference implementations, demos, and starter projects.

B2 Background Removal with Transformers.js – Browser-based background removal using Transformers.js with Backblaze B2 storage. B2 integration
B2 Image Generation Prompt Flow – Image generation pipeline with prompt flow and Backblaze B2 cloud storage integration. B2 integration
HuggingFace Diffusers Examples – Official scripts for DreamBooth, LoRA fine-tuning, ControlNet training, and more.
HuggingFace Spaces – Free hosting for Gradio and Streamlit ML demos. Thousands of image generation demos. Docs
OpenAI Cookbook (GPT Image) – Official notebooks for image generation and editing with gpt-image-1.
Replicate Text-to-Image Collection – Curated runnable models with inline API code examples.

Contributing

Contributions are welcome. See CONTRIBUTING.md. Submit one entry per PR: edit entries.yaml only and let the maintainers regenerate README.md.

Start building with Genblaze

Save on tokens with the Genblaze SDK, Backblaze's open-source Python SDK for AI-generated video, audio, and images. It orchestrates multi-provider generation pipelines with built-in, tamper-evident provenance and native Backblaze B2 storage.

License

Released under CC0 1.0 Universal. You may copy, modify, and redistribute without attribution.

About Backblaze B2

Backblaze B2 Cloud Storage is S3-compatible object storage designed for AI and media workloads. This list is maintained as part of our work making B2 a convenient storage layer for AI workflows.
