OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang
Learning and Vision Lab, National University of Singapore
OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models such as FLUX.
- Universal Control 🌐: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
- Minimal Design 🚀: Injects control signals while preserving the original model structure. Introduces only 0.1% additional parameters to the base model.
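To give a sense of scale for the 0.1% figure, here is a rough back-of-the-envelope sketch. It assumes a base model of about 12 billion parameters, which is the size publicly stated for FLUX.1 and is not given in this README; the function name `added_parameters` is hypothetical and used only for illustration.

```python
# Rough estimate of the parameter overhead quoted in the feature list.
# Assumption: the base model has about 12 billion parameters (the publicly
# stated size of FLUX.1). The README itself only gives the 0.1% ratio.
BASE_PARAMS = 12_000_000_000
OVERHEAD_RATIO = 0.001  # 0.1%, taken from the README


def added_parameters(base_params: int, ratio: float) -> int:
    """Return the approximate number of extra parameters for a given ratio."""
    return round(base_params * ratio)


extra = added_parameters(BASE_PARAMS, OVERHEAD_RATIO)
print(f"~{extra / 1e6:.0f}M additional parameters")
```

Under that assumption the added control weights amount to roughly twelve million parameters, which is small next to the base model.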
- Environment setup
conda create -n omini python=3.10
conda activate omini
- Requirements installation
pip install -r requirements.txt
- Subject-driven generation:
examples/subject.ipynb
- In-painting:
examples/inpainting.ipynb
- Canny edge to image, depth to image, colorization, deblurring:
examples/spatial.ipynb
Demos (Left: condition image; Right: generated image)
Text Prompts
- Prompt1: A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'
- Prompt2: A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.
- Prompt3: In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.
- Prompt4: In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.
- Image Inpainting (Left: original image; Center: masked image; Right: filled image)
- Prompt: The Mona Lisa is wearing a white VR headset with 'Omini' written on it.
- Prompt: A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.
- Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)
Subject-driven control:
Model | Base model | Description | Resolution |
---|---|---|---|
experimental / subject | FLUX.1-schnell | The model used in the paper. | (512, 512) |
omini / subject_512 | FLUX.1-schnell | The model has been fine-tuned on a larger dataset. | (512, 512) |
omini / subject_1024 | FLUX.1-schnell | The model has been fine-tuned on a larger dataset and accommodates higher resolution. (To be released) | (1024, 1024) |
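When scripting against these variants, the table above can be mirrored as a small lookup. The sketch below is only an illustration: the `Variant` type and the `released_variants` helper are hypothetical names, and the `released` flag is inferred from the "(To be released)" note rather than stated explicitly for each row.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    name: str
    base_model: str
    resolution: Tuple[int, int]
    released: bool  # inferred: False only where the table says "To be released"


# Rows copied from the subject-driven control table.
SUBJECT_VARIANTS = [
    Variant("experimental / subject", "FLUX.1-schnell", (512, 512), True),
    Variant("omini / subject_512", "FLUX.1-schnell", (512, 512), True),
    Variant("omini / subject_1024", "FLUX.1-schnell", (1024, 1024), False),
]


def released_variants(resolution: Tuple[int, int]) -> List[str]:
    """List the names of released variants that match the given resolution."""
    return [
        v.name
        for v in SUBJECT_VARIANTS
        if v.resolution == resolution and v.released
    ]


print(released_variants((512, 512)))
print(released_variants((1024, 1024)))
```

With the rows as listed, only the 512 x 512 variants are returned; the 1024 x 1024 query yields an empty list until that model is released.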
Spatially aligned control:
Model | Base model | Description | Resolution |
---|---|---|---|
experimental / <task_name> | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting | (512, 512) |
experimental / <task_name>_1024 | FLUX.1 | Supports higher resolution. (To be released) | (1024, 1024) |
@article{tan2024omini,
title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
author={Zhenxiong Tan and Songhua Liu and Xingyi Yang and Qiaochu Xue and Xinchao Wang},
journal={arXiv preprint arXiv:2411.15098},
year={2024}
}