examples/advanced_diffusion_training/README_flux.md
> `pip install wandb`
> Alternatively, you can use other tools / train without reporting by modifying the flag `--report_to="wandb"`.
### LoRA Rank and Alpha

Two key LoRA hyperparameters are LoRA rank and LoRA alpha.
- `--rank`: Defines the dimension of the trainable LoRA matrices. A higher rank means more expressiveness and capacity to learn (and more parameters).
- `--lora_alpha`: A scaling factor for the LoRA's output. The LoRA update is scaled by `lora_alpha / lora_rank`.
- `lora_alpha` vs. `rank`: this ratio dictates the LoRA's effective strength (see the sketch below):
  - `lora_alpha == rank`: scaling factor is 1, so the LoRA is applied with its learned strength (e.g., alpha=16, rank=16).
  - `lora_alpha < rank`: scaling factor < 1, which reduces the LoRA's impact. Useful for subtle changes or to prevent overpowering the base model (e.g., alpha=8, rank=16).
  - `lora_alpha > rank`: scaling factor > 1, which amplifies the LoRA's impact and lets a lower-rank LoRA have a stronger effect (e.g., alpha=32, rank=16).
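To make the `lora_alpha / rank` scaling concrete, here is a minimal sketch of how it enters a LoRA forward pass. It is illustrative plain PyTorch with arbitrary layer sizes, not the training script's actual implementation:

```python
import torch

# Illustrative sizes only; real attention projections are much larger.
in_features, out_features = 64, 64
rank, lora_alpha = 16, 16                    # try lora_alpha = 8 or 32 to see the effect

W = torch.randn(out_features, in_features)   # frozen base weight
A = torch.randn(rank, in_features) * 0.01    # trainable LoRA down-projection
B = torch.zeros(out_features, rank)          # trainable LoRA up-projection (zero init)

scaling = lora_alpha / rank                  # == 1.0 when lora_alpha == rank

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Base output plus the scaled low-rank update: x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + scaling * (x @ A.T @ B.T)

x = torch.randn(4, in_features)
print(lora_forward(x).shape)                 # torch.Size([4, 64])
```

Halving or doubling `lora_alpha` only rescales the same low-rank update; it does not change the number of trainable parameters.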
> [!TIP]
> A common starting point is to set `lora_alpha` equal to `rank`.
> Some also set `lora_alpha` to be twice the `rank` (e.g., lora_alpha=32 for lora_rank=16)
> to give the LoRA updates more influence without increasing parameter count.
> If you find your LoRA is "overcooking" or learning too aggressively, consider setting `lora_alpha` to half of `rank`
> (e.g., lora_alpha=8 for rank=16). Experimentation is often key to finding the optimal balance for your use case.
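Under the hood, `--rank` and `--lora_alpha` typically feed into a `peft` `LoraConfig`. The sketch below shows roughly how that mapping looks; the `target_modules` names are illustrative attention projections, not necessarily the exact set the script targets:

```python
from peft import LoraConfig

# Rough sketch of how --rank / --lora_alpha map onto a peft LoraConfig.
lora_config = LoraConfig(
    r=16,                       # --rank: dimension of the low-rank matrices
    lora_alpha=16,              # --lora_alpha: update is scaled by lora_alpha / r
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # illustrative only
)
```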
### Target Modules
When LoRA was first adapted from language models to diffusion models, it was applied to the cross-attention layers in the Unet that relate the image representations with the prompts that describe them.
More recently, SOTA text-to-image diffusion models replaced the Unet with a diffusion Transformer (DiT). With this change, we may also want to explore