
backblaze-labs/awesome-image-generation

Awesome Image Generation

A curated list of AI image generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.

Maintained by Backblaze.

Related Lists


Text-to-Image APIs

Commercial image-generation APIs with hosted inference and developer SDKs.

  • Adobe Firefly API – Image generation, editing, Photoshop automation, and Lightroom operations. Part of Firefly Services platform. Docs | SDK: JS/TS (official)
  • Amazon Titan Image Generator – Text-to-image via AWS Bedrock. Image conditioning, color palette guidance, background removal, and variations. Docs | SDK: Python (boto3), Java, PHP
  • Black Forest Labs (FLUX Pro) – FLUX 1.1 Pro and FLUX.2 (32B params) via REST API. From the creators of FLUX and Stable Diffusion. Also on Replicate, fal.ai, Together AI. Docs
  • fal.ai – Serverless inference platform hosting 1000+ image models, including FLUX and SD. Markets its diffusion inference engine as the fastest available. SOC 2 compliant. Docs | SDK: Python, JS
  • Google Gemini Image API – Native image generation via Gemini models (gemini-2.5-flash-image, gemini-3.1-flash-image-preview). Text-to-image, editing, multi-turn. Free tier via AI Studio. Docs | SDK: Python (google-genai), JS (@google/genai), Go, Java
  • Google Imagen (Vertex AI) – Imagen 4 via Vertex AI. Text-to-image, editing, outpainting, inpainting, customization. Docs | SDK: Python (google-cloud-aiplatform), Node
  • Ideogram – Known for high-quality text rendering in images. Ideogram 3.0 supports generation, remix, edit, and character reference. OpenAI-compatible interface. Docs
  • Leonardo AI – Text-to-image, image-to-image, and image-to-video. Webhooks, LoRA models, and "Get API Code" export from web UI. Docs | SDK: TypeScript, Python
  • Midjourney – Official API released late 2025. Enterprise/Pro plan holders only; no public self-service access. Docs
  • MiniMax Image API – Text-to-image and image-to-image REST API from MiniMax. image-01 model supports subject reference images, configurable aspect ratios, and batch generation up to 9 images per request. Docs
  • ModelsLab – REST API for 10k+ image models including FLUX, SDXL, SD 1.5, LoRA fine-tunes, and community variants. Text-to-image, img2img, inpainting, ControlNet. Docs | SDK: Python, JS/TS, PHP, Go, Dart
  • Novita AI – Serverless API with 200+ image models including FLUX.1 Kontext, Hunyuan Image 3, Seedream, and SD variants. Text-to-image, img2img, inpainting, ControlNet. Docs | SDK: Python, JS
  • OpenAI GPT Image – gpt-image-1, gpt-image-1.5, gpt-image-1-mini. Natively multimodal generation, editing, and inpainting. DALL-E 2/3 deprecated May 2026. Docs | SDK: Python, Node
  • Pollinations.ai – Free open-source generative AI platform. Image generation via GET endpoint (FLUX, Turbo, Kontext). Anonymous tier; register for watermark-free access and higher rate limits. Docs
  • Prodia – Serverless image generation API with 50+ models including FLUX.1/2, Recraft V4, and SD variants. Advertises ~190ms latency for FLUX. Sync and async endpoints. Docs | SDK: JS (prodia-js), Python
  • Recraft AI – Raster and vector image generation. V4 model (Feb 2026). Background removal, inpainting, outpainting, vectorization. OpenAI-compatible interface. Docs
  • Runware – Image generation API serving 400k+ models via proprietary Sonic Inference Engine. Text-to-image, inpainting, outpainting, upscaling. Pay-per-image pricing with $2 free trial. Docs | SDK: Python, JS
  • Segmind – Serverless REST API for 150+ image and video models including FLUX, GPT Image, Imagen 4, Kling, and Wan. PixelFlow workflow builder for chaining models. Free credits on signup. Docs
  • SiliconFlow – OpenAI-compatible inference API hosting FLUX, Qwen-Image, Kolors, and other open-weight image models. Text-to-image and image editing endpoints. Free tier available. Docs
  • Stability AI – Stable Diffusion 3.5 and Stable Image via REST API. Text-to-image, image-to-image, upscaling, inpainting. Docs
  • xAI Image Generation API – grok-imagine-image model via REST API. Text-to-image and image editing. Batch up to 10 images, 1k/2k resolution. OpenAI-compatible interface. Docs | SDK: Python (xai-sdk), JS (openai-compatible)
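Several entries above (Ideogram, Recraft AI, SiliconFlow, xAI) describe an OpenAI-compatible interface. A minimal sketch, assuming a provider mirrors OpenAI's `POST {base_url}/images/generations` route and its `model`, `prompt`, `n`, and `size` fields: the hypothetical helper `build_image_request` builds that request using only the standard library. Base URLs, model names, and accepted `size` values differ by provider, so treat them as assumptions and check each provider's docs.

```python
import json
import urllib.request


def build_image_request(base_url: str, api_key: str, model: str, prompt: str,
                        n: int = 1, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /images/generations route.

    Hypothetical helper: base_url, model, and supported sizes are provider-specific.
    """
    payload = {"model": model, "prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_image_request(
        base_url="https://api.openai.com/v1",
        api_key="sk-...",  # placeholder, not a real key
        model="gpt-image-1",
        prompt="a watercolor lighthouse at dusk",
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the helper testable offline; switching between compatible providers should then mostly mean changing `base_url`, `api_key`, and `model`.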

Open Source Models

Open-weight image-generation models you can run locally or self-host.

  • FLUX.1 [schnell] – 12B param rectified flow transformer. 1-4 step generation. Fully open for commercial use. Docs
  • FLUX.1 Kontext [dev] – 12B param instruction-based image editing model. Edit existing images via text prompts; character/style reference without finetuning. Non-commercial license. Docs
  • DeepFloyd IF – Cascaded pixel-space diffusion (64px → 256px → 1024px). Strong text rendering. Zero-Shot FID 6.66 on COCO.
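  • LCM / LCM-LoRA – Latent Consistency Models enabling 2-4 step generation. LCM-LoRA packages this as a plug-in LoRA adapter (67.5M params for SD 1.5, 197M for SDXL) that works with fine-tuned SD 1.5 and SDXL checkpoints without extra training. Docs
  • PixArt-Alpha / PixArt-Sigma – DiT-based T2I models. PixArt-α was trained in 10.8% of SD 1.5's training time, with quality the authors report as competitive with commercial generators. Docs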
  • Kandinsky 3 – Open-source T2I from AI Forever. 2x larger U-Net and 10x larger text encoder vs v2.x. Docs
  • Chroma – 8.9B FLUX-based open T2I model. Supports text-to-image, image-to-image, and inpainting via diffusers ChromaPipeline. Apache-2.0, designed as a finetuning base. Docs
  • ERNIE-Image – 8B single-stream DiT text-to-image model from Baidu. Strong text rendering for posters and infographics. Diffusers-native via ErnieImagePipeline. Turbo variant for 24GB VRAM. Apache-2.0. Docs
  • FLUX.1 [dev] – 12B param guidance-distilled model. High quality, competitive with closed-source. Non-commercial license.
  • FLUX.2 [dev] – 32B param model with generation, editing, and multi-reference combining.
  • GLM-Image – 16B hybrid autoregressive + diffusion model from Zhipu AI. Excels at text rendering inside images. Supports T2I and I2I. Runs via GlmImagePipeline in diffusers. Docs
  • HiDream-I1 – 17B sparse diffusion transformer for text-to-image. Three variants (Full, Dev, Fast). Reported state-of-the-art benchmark scores at release; diffusers-native via HiDreamImagePipeline. Docs
  • HunyuanImage 3.0 – 80B MoE T2I model from Tencent (13B params activated per token). Multimodal understanding and generation. Instruct-distilled variant released Jan 2026. Tencent Community License. Docs
  • Lumina-Image 2.0 – 2.6B DiT text-to-image model using Gemma-2-2B text encoder. Diffusers-native via Lumina2Pipeline. Supports fine-tuning and controllable generation. Apache-2.0. Docs
  • OmniGen2 – Unified T2I + instruction-based image editing model. Dual-path architecture with separate autoregressive LLM and diffusion transformer decoders. pip-installable, Apache-2.0. Docs
  • Playground v2.5 – Aesthetic-focused model built on the SDXL architecture.
  • Qwen-Image – Alibaba's open-weight T2I family. Qwen-Image-2512 (text-to-image) and Qwen-Image-Edit variants. Strong text rendering including Chinese. Diffusers-native, Apache 2.0. Docs
  • SANA – Efficient high-resolution T2I model from NVIDIA. Linear DiT architecture with DC-AE (32x compression). Generates images up to 4096x4096; per the authors, Sana-0.6B is 20x smaller than FLUX-12B and over 100x faster in measured throughput. Diffusers-native via SanaPipeline. Apache-2.0. Docs
  • SDXL-Turbo – Adversarial distillation of SDXL enabling single-step generation.
  • Stable Diffusion 1.5 – 860M-param UNet; runs on consumer GPUs. Foundation for a massive community ecosystem of LoRAs, fine-tunes, and extensions.
  • Stable Diffusion 3.5 Large – 8.1B-param MMDiT with three text encoders (including T5-XXL). Stability's largest open-weight model. Docs
  • Stable Diffusion XL (SDXL) – Native 1024x1024. Improved text-in-image and limb generation. Base + refiner pipeline.
  • Z-Image – 6B param T2I model family from Alibaba Tongyi-MAI. Variants include Turbo (sub-second inference), Omni-Base (gen+edit), and Edit. Diffusers-native, Apache-2.0. Docs
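The SANA entry's DC-AE (32x compression) matters because a DiT's sequence length is fixed by the autoencoder's downsampling factor and the patch size. As a worked example, assuming a conventional f8 VAE with 2x2 patches (the layout used by FLUX- and SD3-style DiTs) versus the AE-F32C32P1 configuration (32x downsampling, patch size 1) reported in the SANA paper, the token count per image works out as follows. `latent_tokens` is a hypothetical helper for the arithmetic only.

```python
def latent_tokens(height: int, width: int, ae_downsample: int, patch_size: int) -> int:
    """Number of DiT tokens for one image.

    The autoencoder shrinks each spatial side by `ae_downsample`; the transformer
    then groups `patch_size` x `patch_size` latent pixels into one token.
    """
    step = ae_downsample * patch_size
    if height % step or width % step:
        raise ValueError("resolution must be divisible by ae_downsample * patch_size")
    lat_h, lat_w = height // ae_downsample, width // ae_downsample
    return (lat_h // patch_size) * (lat_w // patch_size)


# Compare the two latent layouts at 1K and 4K square resolutions.
for name, f, p in [("f8 VAE + 2x2 patches", 8, 2), ("DC-AE f32, patch 1", 32, 1)]:
    for side in (1024, 4096):
        print(f"{name:22s} {side}px -> {latent_tokens(side, side, f, p):>6} tokens")
```

At 4096x4096 the f32 layout gives 16,384 tokens versus 65,536 for f8 with 2x2 patches: a 4x shorter sequence, which under quadratic attention would mean roughly 16x fewer attention pairs (SANA additionally uses linear attention).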

Open Source Frameworks and UIs

Graphical and programmatic interfaces for running diffusion pipelines.

  • AUTOMATIC1111 WebUI – Most widely used Gradio-based SD web UI. 161k+ stars. Extensive extension ecosystem. Docs
  • ComfyUI – Node-based graph UI and backend for diffusion models. Highly customizable, API-accessible. Supports SD, SDXL, Flux, and modern models. Docs
  • Fooocus – Midjourney-inspired SDXL UI. Prompt-only workflow, no manual parameter tweaking.
  • InvokeAI – Creative engine for SD models targeting professionals, with a full-featured WebUI. Docs
  • Forge – Fork of AUTOMATIC1111 with improved GPU memory management and performance. Compatible with A1111 extensions.
  • AI Toolkit (ostris) – All-in-one training suite for diffusion models. GUI and CLI. Trains FLUX.1/2, SDXL, SD 1.5, Qwen-Image, HiDream, and video models on consumer hardware.
  • comfy-pack – Toolkit for locking, packaging, and deploying ComfyUI workflow environments. Bundles custom nodes, model hashes, and Python deps into a .cpack.zip. Serves workflows as REST APIs. Docs
  • ComfyUI Deploy – Open-source deployment platform for ComfyUI workflows. Exposes versioned REST APIs for production and staging. Supports serverless GPU backends and self-hosting on Vercel/Neon. Docs
  • ComfyUI-Manager – Extension for ComfyUI that installs, updates, and manages 800+ custom nodes via a GUI or CLI. Auto-installed with ComfyUI Desktop. Docs
  • DiffSynth-Studio – Python diffusion engine by ModelScope. Inference and LoRA training for FLUX.1/2, Qwen-Image, Z-Image, and JoyAI-Image. Low-VRAM optimizations, ControlNet, IP-Adapter support.
  • diffusion-pipe – Pipeline-parallel training script for diffusion models across multiple GPUs. Supports FLUX.1/2, Chroma, SDXL, SD3, HiDream, Qwen-Image, Z-Image, and Hunyuan. LoRA, full fine-tuning, and multi-GPU via DeepSpeed.
  • Fluxgym – Minimal Gradio web UI for FLUX.1 LoRA training on low VRAM (12GB–20GB). Wraps Kohya scripts with automatic image resizing, HuggingFace publish, and 100% Kohya feature access via Advanced tab.
  • kohya_ss – Gradio-based GUI for Kohya's SD training scripts. Supports LoRA, DreamBooth, and fine-tuning for SD 1.5, SDXL, SD3, and FLUX.1.
  • OneTrainer – GUI and CLI training suite for diffusion models. Supports FLUX.1/2, Chroma, SD 1.5/2.x/3, SDXL, PixArt, HiDream, and Hunyuan Video.
  • SimpleTuner – General fine-tuning kit for diffusion models. Supports FLUX.1/2, SDXL, SD3, and more. Multi-GPU training, aspect bucketing, embedding caching, and a web UI. AGPL-3.0.
  • stable-diffusion.cpp – Diffusion model inference in pure C/C++ with no external dependencies. Runs SD 1.x/2.x/XL/3.5, FLUX.1/2, Chroma, Qwen-Image, and Z-Image. CPU/CUDA/Metal/Vulkan backends.
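ComfyUI, and tools built on it such as comfy-pack and ComfyUI Deploy, is API-accessible: a running server accepts a workflow graph as JSON on its `/prompt` endpoint and responds with a `prompt_id`. A minimal sketch, assuming a local server on the default `127.0.0.1:8188` and a workflow exported in ComfyUI's API format; `queue_workflow_request` is a hypothetical helper, and the one-node `workflow` dict is a placeholder showing only the shape, not a graph the server would accept.

```python
import json
import uuid
import urllib.request
from typing import Optional


def queue_workflow_request(workflow: dict, host: str = "127.0.0.1:8188",
                           client_id: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /prompt request for a running ComfyUI server.

    `workflow` is an API-format graph: node id -> {"class_type": ..., "inputs": {...}}.
    A random client_id is generated when none is given.
    """
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder fragment for illustration; export a real graph from the ComfyUI UI.
workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 42, "steps": 20}}}
req = queue_workflow_request(workflow, client_id="demo")
print(req.full_url)
# With a server running:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["prompt_id"])
```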

Image Editing and Enhancement

Conditioning, adaptation, restoration, and upscaling tools.

  • GFPGAN – Face restoration from Tencent ARC. Restores facial details from degraded images. Often paired with Real-ESRGAN.
  • Real-ESRGAN – Image and video upscaler, up to 8x. Handles real-world blind super-resolution with noise/artifact removal. Docs
  • IP-Adapter – Lightweight adapter (~22M params) for image-based prompting. Adds decoupled cross-attention layers that condition generation on image features. Docs
  • chaiNNer – Node-based GUI for chaining image processing tasks. Supports PyTorch, NCNN, ONNX, and TensorRT upscaling models, background removal, and batch processing. Cross-platform with CUDA/ROCm/MPS backends.
  • ComfyUI Impact Pack – ComfyUI custom node pack for detection-based face detailing, iterative upscaling, segmentation masking, and regional sampling. FaceDetailer node is widely used for portrait refinement.
  • comfyui_controlnet_aux – ComfyUI custom node set for generating ControlNet hint images. Preprocessors include Canny, HED, depth estimation, pose detection, segmentation, and 20+ others. Apache-2.0.
  • ControlNet – Precise structural control for diffusion models via edge maps, depth, pose, normals. Available for SD1.5, SDXL, and Flux. Docs
  • Krita AI Diffusion – Plugin integrating diffusion-based generation into Krita. Inpaint, outpaint, upscale, and ControlNet workflows without leaving the canvas. Uses ComfyUI as backend. Supports FLUX, SD 1.5/XL, Z-Image, and Illustrious. Docs
  • Upscayl – Desktop GUI for AI image upscaling on Linux, macOS, and Windows. Uses Real-ESRGAN and other models; up to 16x upscale. Requires a Vulkan-compatible GPU. Docs
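ControlNet conditions generation on hint images such as edge maps, and comfyui_controlnet_aux ships Canny and HED preprocessors to produce them. As a simplified, dependency-free sketch of what an edge preprocessor does, the hypothetical `sobel_edge_map` below thresholds the Sobel gradient magnitude of a grayscale image. It is not Canny (no smoothing, non-maximum suppression, or hysteresis), and it is far slower than the OpenCV-based preprocessors used in practice.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale pixels, values in [0, 255]


def sobel_edge_map(img: Image, threshold: float = 128.0) -> List[List[int]]:
    """Binary edge map (0 or 255) from Sobel gradient magnitude; border pixels stay 0."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient kernel
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += kx[dy][dx] * p
                    gy += ky[dy][dx] * p
            out[y][x] = 255 if math.hypot(gx, gy) >= threshold else 0
    return out


if __name__ == "__main__":
    # Dark left columns, bright right columns: a single vertical edge.
    demo = [[0, 0, 255, 255, 255] for _ in range(5)]
    for row in sobel_edge_map(demo):
        print(row)
```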

SDKs and Developer Tooling

Libraries and client SDKs for integrating image generation into apps.

  • Gradio – Python library for building interactive ML demos and web UIs. Powers AUTOMATIC1111 and Fooocus, and is one of the main SDKs for HuggingFace Spaces. Includes gradio-client for programmatic access. Docs | SDK: Python (pip install gradio)
  • HuggingFace Diffusers – The canonical PyTorch library for diffusion models. SD 1.5, SDXL, SD3, Flux, ControlNet, IP-Adapter, and more. Docs | SDK: Python (pip install diffusers)
  • Replicate SDK – Python/JS client for 50,000+ hosted ML models. Pay-per-second, no GPU management. Docs | SDK: Python (pip install replicate), Node (npm install replicate)
  • fal.ai SDK – Python and JS SDKs for serverless inference. Also a Vercel AI SDK provider. Docs | SDK: Python (pip install fal-client), Node (npm install @fal-ai/client)
  • Cog – Open-source tool for packaging ML models into Docker containers. Defines environment via cog.yaml, auto-generates a REST prediction API (Rust/Axum server), and deploys to Replicate or any Docker host. Docs
  • HuggingFace Inference Providers – Unified Python/JS client routing requests to fal, Replicate, Together, WaveSpeedAI, and others. Supports text-to-image tasks including FLUX models. Free tier; single HF token for all providers. Docs | SDK: Python (pip install huggingface_hub), JS (npm install @huggingface/inference)
  • OpenAI SDK – Official SDK for GPT Image generation and editing. client.images.generate() and client.images.edit(). SDK: Python (pip install openai), Node (npm install openai)
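GPT Image models return generated images as base64 strings in `data[i].b64_json` rather than as URLs. A minimal sketch of the decoding step, assuming the response has been converted to a plain dict (for example with `result.model_dump()` on the Python SDK's response object); `save_b64_images` and its file-naming scheme are hypothetical.

```python
import base64
from pathlib import Path
from typing import List


def save_b64_images(response: dict, out_dir: str, stem: str = "image",
                    suffix: str = ".png") -> List[Path]:
    """Decode each data[i].b64_json entry of an Images API response and write it to disk.

    `suffix` defaults to .png, the default output format for GPT Image models;
    change it if you request another format.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(response.get("data", [])):
        path = out / f"{stem}-{i}{suffix}"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Typical use with the official SDK (needs OPENAI_API_KEY and network access):
#   from openai import OpenAI
#   client = OpenAI()
#   result = client.images.generate(model="gpt-image-1", prompt="a red fox in snow")
#   save_b64_images(result.model_dump(), "out")
```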

GPU Cloud Providers

Serverless and on-demand GPU platforms for running image models.

  • fal.ai (GPU) – Markets itself as the fastest diffusion inference engine. 1000+ hosted models. Docs
  • Lambda Labs – On-demand A100 and H100 GPUs. Competitive pricing (~$1.10/hr A100 80GB). Docs
  • Modal – Serverless Python GPU cloud. Sub-second cold starts. Docs | SDK: Python (pip install modal)
  • Replicate – Serverless model hosting for open-source image models. Docs
  • RunPod – GPU pods and serverless endpoints. RunPod reports that 48% of its serverless cold starts complete in under 200ms. Docs
  • Together AI – Inference API for 200+ open models. Docs
  • WaveSpeed AI – Serverless inference platform with 700+ image and video models. Sub-second cold starts for FLUX and other diffusion models. OpenAI-compatible REST API. Docs | SDK: Python, JS
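Hourly GPU pricing (such as the ~$1.10/hr A100 figure above) and per-second serverless billing are easiest to compare as cost per image. A back-of-the-envelope sketch; the 3.6 s per image is a hypothetical workload rather than a benchmark, and the hypothetical `cost_per_image` helper ignores cold starts, storage, and egress.

```python
def cost_per_image(hourly_rate_usd: float, seconds_per_image: float,
                   utilization: float = 1.0) -> float:
    """Approximate GPU cost of one image in USD.

    `utilization` is the fraction of paid GPU time spent generating; values
    below 1.0 spread the cost of idle time over the images actually produced.
    """
    if hourly_rate_usd < 0 or seconds_per_image <= 0:
        raise ValueError("rate must be >= 0 and seconds_per_image > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate_usd * seconds_per_image / 3600 / utilization


# Hypothetical workload: 3.6 s per image on an A100 at ~$1.10/hr.
for util in (1.0, 0.5, 0.1):
    print(f"utilization {util:.0%}: ${cost_per_image(1.10, 3.6, util):.4f} per image")
```

Utilization is usually the dominant lever: an on-demand instance that is busy 10% of the time costs ten times more per image than the same GPU kept fully loaded, which is the main argument for per-second billing on bursty workloads.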

Image Storage and Delivery

Object stores and CDNs suited to generated-image workloads.

  • Backblaze B2 – Low-cost, S3-compatible object storage. Free egress to Cloudflare via the Bandwidth Alliance. Docs | B2 integration
  • Cloudflare Images – Image CDN on Cloudflare's global network. Pre-defined variants for transformations.
  • Cloudinary – Enterprise image/video CDN with AI-powered transformations. Docs | SDK: Python, Node, Ruby, PHP, Java, .NET
  • Imgix – Real-time image processing CDN. URL-parameter-based transforms. Connects to existing S3/GCS storage. Docs
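Generated images are write-once blobs, so a common pattern with S3-compatible stores such as Backblaze B2 is to name objects by a hash of their content. A minimal sketch using SHA-256; the hypothetical `content_addressed_key` helper, the `generated/` prefix, the two-character sharding, and the commented boto3 upload (bucket name, region placeholder) are illustrative assumptions, not B2 requirements.

```python
import hashlib


def content_addressed_key(image_bytes: bytes, prefix: str = "generated",
                          ext: str = "png") -> str:
    """Object key derived from the image's SHA-256 digest.

    Identical images map to the same key (free deduplication), and the
    two-character shard keeps the number of keys under any one prefix smaller.
    """
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{prefix}/{digest[:2]}/{digest}.{ext}"


# Uploading to B2 through its S3-compatible API with boto3 (not run here; the
# B2 application key ID/key come from the usual AWS credential env vars):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://s3.<region>.backblazeb2.com")
#   key = content_addressed_key(png_bytes)
#   s3.put_object(Bucket="my-bucket", Key=key, Body=png_bytes, ContentType="image/png")
```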

Evaluation and Observability

Metrics, leaderboards, and quality tooling for generated images.

  • pytorch-fid – PyTorch FID (Fréchet Inception Distance) implementation. Measures distribution similarity between real and generated images. SDK: Python (pip install pytorch-fid)
  • IQA-PyTorch – Comprehensive image quality toolbox. PSNR, SSIM, LPIPS, FID, NIQE, MUSIQ, TOPIQ, NIMA, BRISQUE, and more.
  • CLIP Score – Measures semantic alignment between text prompts and generated images using CLIP embeddings. Available via torchmetrics.multimodal.CLIPScore.
  • ImageReward – First general-purpose human preference reward model for T2I (NeurIPS 2023). Trained on 137k expert comparison pairs. Docs
  • torch-fidelity – High-fidelity ISC, FID, KID, and PRC metrics. Supports InceptionV3, CLIP, DINOv2, VGG16 feature extractors. Docs | SDK: Python (pip install torch-fidelity)
  • Artificial Analysis Image Leaderboard – Elo-rating leaderboard for text-to-image models based on blind user comparisons. Covers quality, speed, and price. Separate rankings for open-weight and API-only models. Docs
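The CLIP Score entry reduces to a cosine similarity between CLIP image and text embeddings; torchmetrics documents it as max(100 * cos(E_I, E_C), 0). The sketch below shows only that scoring step on precomputed embedding vectors (plain Python lists standing in for real CLIP outputs); in practice `torchmetrics.multimodal.CLIPScore` also runs the CLIP encoders and averages over a batch.

```python
import math
from typing import Sequence


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """CLIP score for one image-caption pair: max(100 * cos(E_I, E_C), 0)."""
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    # Negative similarity is clipped to 0, so unrelated pairs never score below 0.
    return max(100.0 * dot / norm, 0.0)
```

Because the score is a scaled cosine, only the direction of each embedding matters; rescaling either vector leaves the score unchanged.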

Templates and Example Projects

Reference implementations, demos, and starter projects.


Contributing

Contributions are welcome. See CONTRIBUTING.md. One entry per PR — edit entries.yaml only and let the maintainers regenerate README.md.

Start building with Genblaze

Save on tokens by using the Genblaze SDK — Backblaze's open-source Python SDK for AI-generated video, audio, and images. It orchestrates multi-provider generation pipelines with built-in, tamper-evident provenance and native Backblaze B2 storage.

License

Released under CC0 1.0 Universal. You may copy, modify, and redistribute without attribution.

About Backblaze B2

Backblaze B2 Cloud Storage is S3-compatible object storage designed for AI and media workloads. This list is maintained as part of our work making B2 a convenient storage layer for AI workflows.

About

A curated list of AI image generation APIs, SDKs, and tools including text-to-image, image editing, diffusion models, generative art systems, and multimodal AI platforms. Covers commercial services, open source models with APIs, and scalable infrastructure for developers building visual applications.
