TinyDiffusion

Benchmarking and Optimizing Stable Diffusion for Edge Devices.

Introduction • Setup • Quick Start • Results • Analysis

Introduction

This repository benchmarks Stable Diffusion UNet inference performance across different runtimes. The focus is on comparing:

Native PyTorch UNet (GPU) execution
ONNXRuntime UNet (GPU) execution
ONNXRuntime UNet (CPU) execution

The goal is to understand the trade-offs in inference speed, CPU/GPU memory usage, and runtime stability when exporting Stable Diffusion components to ONNX and running them with onnxruntime.

Features

ONNX Export: Export the Stable Diffusion UNet model from Hugging Face’s diffusers library into an ONNX graph.

Flexible Inference: Run inference on CPU or GPU with onnxruntime or fall back to native PyTorch.

Benchmarking Suite: Collect detailed metrics including:

Average inference time & standard deviation
CPU memory usage (Resident Set Size)
GPU memory allocation
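
As a rough illustration of the timing metric above, the following sketch measures the mean and standard deviation of a callable's run time using only the standard library. The helper name `benchmark` and its `warmup`/`runs` parameters are assumptions made for this example, not the repository's actual API.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> Dict[str, float]:
    """Time fn() over several runs; return mean and sample std dev in seconds."""
    if runs < 2:
        raise ValueError("runs must be >= 2 to compute a standard deviation")
    # Warm-up calls are excluded from the statistics (caches, lazy init, etc.).
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "std_s": statistics.stdev(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would call the UNet forward pass here.
    stats = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"{stats['mean_s'] * 1e3:.2f} ms ± {stats['std_s'] * 1e3:.2f} ms")
```

When timing GPU work with PyTorch, CUDA kernels launch asynchronously, so the callable should end with `torch.cuda.synchronize()`; otherwise the clock may stop before the computation has finished.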

Results Logging: Save benchmarking results to CSV, with the ability to append new results across runs.
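
One possible way to append results across runs is sketched below with the standard `csv` module. The column names in `FIELDS` and the helper `append_result` are hypothetical; the repository's actual CSV schema may differ.

```python
import csv
from pathlib import Path
from typing import Dict, Union

# Hypothetical column names; the repository's actual CSV schema may differ.
FIELDS = ["backend", "device", "mean_s", "std_s", "cpu_rss_mb", "gpu_mem_mb"]


def append_result(csv_path: Union[str, Path], row: Dict[str, object]) -> None:
    """Append one benchmark row, writing the header only for a new or empty file."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        # Columns missing from `row` are written as empty cells.
        writer.writerow(row)
```

Opening the file in append mode keeps earlier runs intact, so results from different backends can accumulate in a single file.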

Visualization: Generate plots comparing performance across backends for quick insights.

Why ONNX?

ONNX allows exporting deep learning models into a framework-agnostic format. With onnxruntime, models can run on multiple backends (CPU, CUDA, TensorRT, DirectML, etc.) without depending on PyTorch. While this repo shows that ONNX on CPU can be useful for portability, we also observe that PyTorch often outperforms ONNX on GPU for Stable Diffusion UNet inference.
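
To make the backend choice concrete, here is a minimal sketch of selecting onnxruntime execution providers with a CPU fallback. The helpers `pick_providers` and `make_session` are names invented for this example; `get_available_providers` and `InferenceSession` are onnxruntime's own API.

```python
from typing import List, Sequence


def pick_providers(available: Sequence[str], prefer_gpu: bool = True) -> List[str]:
    """Order execution providers so onnxruntime tries CUDA first, then CPU."""
    chosen = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        chosen.append("CUDAExecutionProvider")
    # The CPU provider is always kept as the fallback.
    chosen.append("CPUExecutionProvider")
    return chosen


def make_session(model_path: str, prefer_gpu: bool = True):
    """Create an InferenceSession for model_path (a path you supply)."""
    # Imported lazily so pick_providers stays usable without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers(), prefer_gpu)
    return ort.InferenceSession(model_path, providers=providers)
```

Passing `prefer_gpu=False` forces CPU execution, which is how a CPU-only measurement could be taken on a machine that also has a GPU.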

Setup

Add the project root (the folder containing this README) to PYTHONPATH in whichever way you prefer. One option is to create a .env file containing the following line:

PYTHONPATH=\full\path\to\projectroot

Place this .env file in the project root. This works with VS Code.

Another option is to run $env:PYTHONPATH = "\full\path\to\projectroot" in PowerShell to set the environment variable for the current session, and then run the scripts.
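
To confirm the path setup took effect, a quick check along these lines may help. It assumes the `tinydiffusion` folder in the project root is importable as a package; `is_importable` is a helper written for this example.

```python
import importlib.util
import sys


def is_importable(package: str) -> bool:
    """Return True if the top-level `package` can be located on sys.path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # "tinydiffusion" is the package folder in this repository's root.
    if is_importable("tinydiffusion"):
        print("tinydiffusion found; PYTHONPATH looks correct")
    else:
        print("tinydiffusion not found; sys.path is:")
        for entry in sys.path:
            print("  ", entry)
```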

Install PyTorch and torchvision with pip, since Conda does not install the GPU version on Windows: pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu118
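
After installing, a short check like the one below can indicate whether the GPU build was picked up. The function name `torch_cuda_report` is made up for this sketch; it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
from typing import Dict


def torch_cuda_report() -> Dict[str, object]:
    """Summarize whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is None, a CPU-only wheel was installed and the pip command above should be rerun with the CUDA index URL.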

Sanity

Before committing changes, run pre-commit run --all-files or pre-commit run --files <file1> <file2> ...

Quick Start

Generating Benchmarks

Execute

notebooks/baseline_generation.ipynb

to benchmark the PyTorch version of Stable Diffusion's UNet from Hugging Face.

Run

python tinydiffusion/src/onnx_export.py

to export the UNet to an ONNX graph.
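
For readers unfamiliar with the export step, the sketch below shows the general shape of a `torch.onnx.export` call on a small stand-in module. It is not the contents of onnx_export.py: the `ToyDenoiser` class, the output file name, the opset, and the tensor shapes (chosen to resemble Stable Diffusion 1.x latents and text embeddings) are all assumptions for illustration.

```python
def export_toy_model(onnx_path: str = "toy_unet.onnx") -> str:
    """Export a tiny stand-in module with a UNet-like signature to ONNX."""
    # Imported lazily; requires PyTorch to be installed.
    import torch
    from torch import nn

    class ToyDenoiser(nn.Module):
        """Stand-in model; NOT the Stable Diffusion UNet."""

        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)
            self.proj = nn.Linear(768, 4)

        def forward(self, sample, timestep, encoder_hidden_states):
            cond = self.proj(encoder_hidden_states).mean(dim=1)  # (B, 4)
            scale = timestep.float().view(-1, 1, 1, 1) / 1000.0
            return self.conv(sample) + cond[:, :, None, None] * scale

    model = ToyDenoiser().eval()
    # Dummy inputs only fix the traced shapes; the values do not matter.
    dummy = (
        torch.randn(1, 4, 64, 64),   # latent sample (assumed shape)
        torch.tensor([10]),          # timestep
        torch.randn(1, 77, 768),     # text embeddings (assumed shape)
    )
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["sample", "timestep", "encoder_hidden_states"],
        output_names=["out_sample"],
        # Mark the batch dimension as dynamic so other batch sizes can be used.
        dynamic_axes={
            "sample": {0: "batch"},
            "encoder_hidden_states": {0: "batch"},
            "out_sample": {0: "batch"},
        },
        opset_version=17,
    )
    return onnx_path
```

The names passed in `input_names` become the keys of the input dictionary that onnxruntime expects at inference time, so they are worth choosing deliberately.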

Then execute

notebooks/onnxruntime_generation.ipynb

to benchmark the ONNXRuntime version of UNet on GPU and CPU.

Visualizing

Once all the benchmarking results have been written to results/benchmarks/benchmark_results.csv, run

python tinydiffusion/src/benchmark_visualizer.py

to generate the visualization plots for comparison.

Results

Analysis

For analysis, see Analysis.md.


Repository: DivyenduDutta/TinyDiffusion
