Memory modes control **how models are placed across GPUs and CPU memory** during inference. They are designed to simplify setup while offering fine-grained control when needed.
| Mode | Description |
|------|-------------|
| **Auto** | Automatically selects the best memory strategy for the selected device(s). |
| **Balanced** | Distributes model weights across all available GPUs and the CPU (multi-GPU setups). |
| **Lowest** | Sequential CPU offload for minimum GPU memory usage (slowest, lowest VRAM). |
| **Low** | Model CPU offload with VAE slicing and tiling enabled. |
| **Medium** | Model CPU offload without VAE slicing or tiling. |
| **High** | All models loaded on the selected device, with VAE slicing and tiling enabled. |
| **Highest** | All models fully loaded on the selected device (fastest, highest VRAM usage). |

> **Tip:**
> If you're unsure which mode to use, start with **Auto**; it handles most cases well.
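To make the trade-offs concrete, here is an illustrative sketch of how each mode could map to placement settings. The function name and the setting keys (`offload`, `vae_slicing`, `vae_tiling`) are hypothetical, not part of any real API; the values simply restate the table above.

```python
# Hypothetical mapping from memory mode to placement settings.
# The setting names below are illustrative only; they mirror the
# table: offload strategy, plus whether VAE slicing/tiling is on.

def memory_mode_settings(mode: str) -> dict:
    """Return illustrative placement settings for a memory mode."""
    modes = {
        # Distribute weights across all available GPUs and the CPU.
        "balanced": {"offload": "balanced", "vae_slicing": False, "vae_tiling": False},
        # Sequential CPU offload: lowest VRAM usage, slowest inference.
        "lowest": {"offload": "sequential_cpu", "vae_slicing": False, "vae_tiling": False},
        # Model-level CPU offload with VAE slicing and tiling enabled.
        "low": {"offload": "model_cpu", "vae_slicing": True, "vae_tiling": True},
        # Model-level CPU offload, full-resolution VAE decode.
        "medium": {"offload": "model_cpu", "vae_slicing": False, "vae_tiling": False},
        # Everything on the selected device, but slice/tile the VAE.
        "high": {"offload": None, "vae_slicing": True, "vae_tiling": True},
        # Everything on the selected device: fastest, highest VRAM.
        "highest": {"offload": None, "vae_slicing": False, "vae_tiling": False},
    }
    if mode == "auto":
        # "Auto" would inspect available memory and pick one of the
        # strategies above; defaulting to "low" here is an assumption.
        return modes["low"]
    return modes[mode]

print(memory_mode_settings("lowest")["offload"])  # sequential_cpu
```

Note how **Lowest** and **Low** differ only in offload granularity: sequential offload moves individual submodules on and off the GPU, while model CPU offload swaps whole models, trading some VRAM savings for speed.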