
Commit 80f9c80

README + Docs
1 parent 22dd726

3 files changed (+47, −3)

Docs/Memory.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
## Memory Modes

Memory modes control **how models are placed across GPUs and CPU memory** during inference. They are designed to simplify setup while offering fine-grained control when needed.

| Mode | Description |
|------|-------------|
| **Auto** | Automatically selects the best memory strategy for the selected device(s). |
| **Balanced** | Distributes model weights across all available GPUs and the CPU (multi-GPU setups). |
| **Lowest** | Sequential CPU offload for minimum GPU memory usage (slowest, lowest VRAM). |
| **Low** | Model CPU offload with VAE slicing and tiling enabled. |
| **Medium** | Model CPU offload without VAE slicing or tiling. |
| **High** | All models loaded on the selected device, with VAE slicing and tiling enabled. |
| **Highest** | All models fully loaded on the selected device (fastest, highest VRAM usage). |

> **Tip:**
> If you’re unsure which mode to use, start with **Auto**; it handles most cases well.
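
For reference, here is a minimal sketch of how these modes could map onto standard `diffusers` pipeline APIs. The `apply_memory_mode` helper and the mode strings are hypothetical, not Diffuse's actual implementation; the `enable_*` methods and `device_map="balanced"` are real `diffusers` features.

```python
# Hypothetical mapping of memory modes onto diffusers APIs.
# Illustrative sketch only; Diffuse's real implementation may differ.
from diffusers import DiffusionPipeline

def apply_memory_mode(pipe: DiffusionPipeline, mode: str, device: str = "cuda") -> None:
    if mode == "lowest":
        # Stream individual submodules to the GPU one at a time.
        pipe.enable_sequential_cpu_offload()
    elif mode == "low":
        # Move each whole model to the GPU only while it runs.
        pipe.enable_model_cpu_offload()
        pipe.enable_vae_slicing()
        pipe.enable_vae_tiling()
    elif mode == "medium":
        pipe.enable_model_cpu_offload()
    elif mode == "high":
        pipe.to(device)
        pipe.enable_vae_slicing()
        pipe.enable_vae_tiling()
    elif mode == "highest":
        # Everything resident on the device: fastest, highest VRAM usage.
        pipe.to(device)
    # "Balanced" is a load-time choice in diffusers, e.g.
    # DiffusionPipeline.from_pretrained(..., device_map="balanced").
```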

Docs/Quantization.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
## Quantization

Diffuse supports **automatic INT8 quantization** during model load to reduce VRAM usage.

### Supported Backends

Diffuse supports two quantization backends:

1. **quanto**
   - Used in the default environments
   - Supports both **CUDA** and **ROCm**

2. **torchao**
   - Optional CUDA-only environment
   - Requires a custom environment build

### Key Notes

- Only **INT8 quantization** is currently supported
- Quantization is **automatic** and happens during model loading
- INT8 can reduce VRAM usage by **~30–40%**
- Inference may be **slightly slower** when quantization is enabled

> Quantization is best suited for memory-constrained systems where VRAM is more important than raw speed.
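
For illustration, here is a minimal sketch of INT8 weight quantization with the **quanto** backend, using the real `optimum-quanto` API. The pipeline, model ID, and the choice to quantize only the UNet are assumptions for the example, not Diffuse's internal loading code.

```python
# Illustrative INT8 weight quantization with optimum-quanto;
# not Diffuse's actual load path. Assumes a standard diffusers pipeline.
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, qint8, quantize

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# Quantize the UNet's weights to INT8, then freeze to materialize the
# quantized tensors and free the original float weights.
quantize(pipe.unet, weights=qint8)
freeze(pipe.unet)

pipe.to("cuda")
```

The torchao backend exposes a similar weight-only INT8 path (`torchao.quantization.quantize_` with `int8_weight_only()`).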

README.md

Lines changed: 7 additions & 3 deletions
@@ -69,17 +69,21 @@ Proof of concept, Focus on core functionality.
 - ~~Portable Python installation and management~~
 - ~~Device-specific virtual environments~~
 - ~~Minimal but functional Windows UI~~
-- B~~asic Diffusers pipeline support~~
+- ~~Basic Diffusers pipeline support~~
 
 ### Beta
 Focus on usability, stability, and feature expansion.
 - ~~Fully isolated Python execution~~
 - ~~Multiple virtual environments~~
 - ~~Installer and deployment tooling~~
+- ~~Upscaling and interpolation support~~
+- ~~Extractor pipeline support~~
 - Advanced UI and workflow options
 - ControlNet support
-- Upscaling and interpolation support
-- Extractor pipeline support
+- GGUF model support
+- Weighted prompt support
+- Inpaint/Outpaint processes
+- Model Manager, download queuing, online templates
 - Stability, performance, and reliability improvements
 
 ---
