
TinyDiffusion


Benchmarking and Optimized Stable Diffusion for Edge Devices.


Introduction · Setup · Quick Start · Results · Analysis

Introduction

This repository benchmarks Stable Diffusion UNet inference performance across different runtimes. The focus is on comparing:

  • Native PyTorch UNet (GPU) execution
  • ONNXRuntime UNet (GPU) execution
  • ONNXRuntime UNet (CPU) execution

The goal is to understand the trade-offs in inference speed, CPU/GPU memory usage, and runtime stability when exporting Stable Diffusion components to ONNX and running them with onnxruntime.

Features

ONNX Export: Export the Stable Diffusion UNet model from Hugging Face’s diffusers library into an ONNX graph.

Flexible Inference: Run inference on CPU or GPU with onnxruntime or fall back to native PyTorch.

Benchmarking Suite: Collect detailed metrics including:

  • Average inference time & standard deviation
  • CPU memory usage (Resident Set Size)
  • GPU memory allocation
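The timing portion of those metrics can be illustrated with a minimal stdlib sketch (the helper name and returned keys are assumptions for illustration, not the repo's API; the CPU RSS and GPU memory probes are omitted here):

```python
import statistics
import time
from typing import Callable, Dict

def benchmark(fn: Callable[[], object], runs: int = 10, warmup: int = 2) -> Dict[str, float]:
    """Time fn over several runs, returning mean/std latency in milliseconds."""
    for _ in range(warmup):                 # warmup runs are excluded from the stats
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1e3)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "std_ms": statistics.stdev(timings_ms) if runs > 1 else 0.0,
    }

print(benchmark(lambda: sum(range(100_000)), runs=5))
```

A real suite would additionally sample process RSS and GPU memory around each run.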

Results Logging: Save benchmarking results to CSV, with the ability to append new results across runs.
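A minimal sketch of such append-across-runs logging with the stdlib csv module (the column names and numbers below are placeholders, not the repo's actual schema or measured results):

```python
import csv
import os
import tempfile

# Assumed schema for illustration only.
FIELDS = ["backend", "device", "mean_ms", "std_ms"]

def append_result(path: str, row: dict) -> None:
    """Append one benchmark row, writing the header only when the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

path = os.path.join(tempfile.mkdtemp(), "benchmark_results.csv")
append_result(path, {"backend": "pytorch", "device": "cuda", "mean_ms": 95.2, "std_ms": 3.1})
append_result(path, {"backend": "onnxruntime", "device": "cpu", "mean_ms": 812.4, "std_ms": 10.3})
```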

Visualization: Generate plots comparing performance across backends for quick insights.

Why ONNX?

ONNX allows exporting deep learning models into a framework-agnostic format. With onnxruntime, models can run on multiple backends (CPU, CUDA, TensorRT, DirectML, etc.) without depending on PyTorch. While this repo shows that ONNX on CPU can be useful for portability, we also observe that PyTorch often outperforms ONNX on GPU for Stable Diffusion UNet inference.
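Backend selection in onnxruntime is expressed as an ordered list of execution providers, tried left to right. A small sketch of that selection logic (the helper is hypothetical, but the provider names are real onnxruntime identifiers):

```python
from typing import List

def select_providers(use_gpu: bool, available: List[str]) -> List[str]:
    """Build an ordered execution-provider list; onnxruntime tries them left to right."""
    providers = []
    if use_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # always keep a CPU fallback
    return providers
```

In practice `available` would come from `onnxruntime.get_available_providers()`, and the result is passed as `providers=` to `onnxruntime.InferenceSession`.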

Setup

Add the project root (i.e., the folder containing this README) to PYTHONPATH in whichever way you prefer. One option is to create a .env file containing:

PYTHONPATH=\full\path\to\projectroot

Place this .env file in the project root; this works for VS Code.

Another option is to run $env:PYTHONPATH = "\full\path\to\projectroot" in PowerShell to set the environment variable, then run the scripts.

Install PyTorch and torchvision via pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu118 (Conda does not install the GPU build on Windows).

Sanity

Before committing changes, run pre-commit run --all-files or pre-commit run --files <file1> <file2> ...

Quick Start

Generating Benchmarks

Execute

notebooks/baseline_generation.ipynb

to benchmark the PyTorch version of Stable Diffusion's UNet from Hugging Face.

Run

python tinydiffusion/src/onnx_export.py

to export the UNet to an ONNX graph.

Then execute

notebooks/onnxruntime_generation.ipynb

to benchmark the ONNXRuntime version of UNet on GPU and CPU.

Visualizing

Once all the benchmarking results are populated in results/benchmarks/benchmark_results.csv, run

python tinydiffusion/src/benchmark_visualizer.py

to generate the visualization plots for comparison.
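A rough sketch of what such a comparison plot might look like with matplotlib (the backend names and numbers below are placeholders, not measured results; real values come from the CSV):

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder data purely for illustration.
backends = ["pytorch-gpu", "onnx-gpu", "onnx-cpu"]
mean_ms = [95.0, 140.0, 800.0]
std_ms = [3.0, 5.0, 12.0]

fig, ax = plt.subplots()
ax.bar(backends, mean_ms, yerr=std_ms, capsize=4)  # error bars show the std deviation
ax.set_ylabel("UNet inference time (ms)")
ax.set_title("Mean latency per backend")
out = os.path.join(tempfile.mkdtemp(), "latency.png")
fig.savefig(out)
```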

Results


Analysis

For a detailed analysis, see this.
