
Soft min SNR gamma #1068

Open · wants to merge 3 commits into main
Conversation

@rockerBOO (Contributor) commented Jan 24, 2024:

In "Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers" they came up with Soft min SNR gamma which smooths out the transition area.

[Screenshots: soft-min-SNR formulation from the Hourglass Diffusion Transformers paper, arXiv 2401.11605]

[W&B loss curves: run 31 = soft_min_snr_gamma = 5, run 32 = min_snr_gamma = 5]

Note: I think the math is correct, but I could be wrong, so if anyone wants to correct it I can update.

@@ -68,6 +68,13 @@ def apply_snr_weight(loss, timesteps, noise_scheduler, gamma, v_prediction=False
return loss


def apply_soft_snr_weight(loss, timesteps, noise_scheduler, gamma, v_prediction=False):
    snr = torch.stack([noise_scheduler.all_snr[t] for t in timesteps])
    soft_min_snr_gamma_weight = 1 / (torch.pow(snr if v_prediction is False else snr + 1, 2) + (1 / float(gamma)))
@feffy380 (Contributor) commented Jan 24, 2024:

The math here is incorrect. SNR is equal to the whole expression 1/sigma**2, not sigma (at least based on the fact that here they use min(1/sigma**2, gamma) while the min-SNR paper uses min(SNR, gamma); the variable names are inconsistent between the papers, so I don't blame you for getting them confused).
The correct weight should be:

sigma2 = 1 / snr
# weight = 1 / (sigma2 + 1/gamma)
#        = 1 / (1/snr + 1/gamma)
# simplified:
weight = snr * gamma / (snr + gamma)

Finally, the given formulation for soft-min-SNR is for x_0-prediction. We use epsilon- or v-prediction, which according to the original min-SNR paper means we need to divide by SNR or SNR+1 respectively, so the final weight calculation should be:

snr_weight = (snr * gamma / (snr + gamma)).float().to(loss.device)
if v_prediction:
    snr_weight /= snr + 1
else:
    snr_weight /= snr
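
For intuition, a quick numeric check (my own, illustrative values only, not from the thread): the simplified weight tracks min(SNR, gamma) away from the transition and only diverges near snr ≈ gamma.

gamma = 5.0
for snr in (0.1, 5.0, 100.0):
    print(snr, min(snr, gamma), snr * gamma / (snr + gamma))
# snr=0.1:   min 0.1, soft ~0.098  (agree at low SNR)
# snr=5.0:   min 5.0, soft 2.5     (largest gap, at the transition)
# snr=100.0: min 5.0, soft ~4.76   (agree at high SNR)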

@rockerBOO (Contributor, Author) commented Jan 24, 2024:

Tried this formula:

def apply_soft_snr_weight(loss, timesteps, noise_scheduler, gamma, v_prediction=False):
    snr = torch.stack([noise_scheduler.all_snr[t] for t in timesteps])
    snr_weight = (snr * gamma / (snr + gamma)).float().to(loss.device)
    if v_prediction:
        snr_weight /= snr + 1
    else:
        snr_weight /= snr
    loss = loss * snr_weight
    return loss

It produced the same loss curve as the current implementation. The paper says soft-min-SNR should match min-SNR everywhere except near the transition, so similar loss curves are expected; still seeing a difference from the min-SNR version, though.

[W&B loss curves: run 38 = weight = snr * gamma / (snr + gamma), run 37 = the current PR version, run 35 = min-SNR version. Runs 38 and 37 overlap in the graph.]

Maybe there's something else that is missing in these calculations.

Here are some example snr,gamma pairs to use with the test script below:
snr.txt

And this code to test the formulas:

import math

with open("snr.txt", "r") as f:
    lines = f.readlines()

    print("snr min_snr soft soft2")
    for line in lines:
        snr, gamma = line.split(",")

        snr = float(snr)
        gamma = float(gamma)

        # min-SNR and soft-min-SNR as written in the current PR
        min_snr = min(1 / math.pow(snr, 2), gamma)
        soft_min_snr_gamma = 1 / (math.pow(snr, 2) + (1 / gamma))

        # feffy380's simplified soft-min-SNR weight
        snr_weight = snr * gamma / (snr + gamma)

        print(f"{snr:10.4f} {min_snr:4.4f}, {soft_min_snr_gamma:4.4f}, {snr_weight:4.4f}")

I don't really know what I'm doing with the math, I think, but I'm trying to learn.

@Birch-san commented:

here's how to formulate it as an EDM target:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/layers.py#L65

here's how to formulate it as an x0 loss weighting:
https://github.com/Birch-san/k-diffusion/blob/9bce54aec1e596548cf73f56f4842c11aa6271c6/k_diffusion/layers.py#L160

here's an alternative style for expressing it as an EDM target, where we use the x0 loss weighting and apply a correction to adapt it for EDM:
https://github.com/Birch-san/k-diffusion/blob/9bce54aec1e596548cf73f56f4842c11aa6271c6/k_diffusion/layers.py#L250
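
For anyone following along without opening the links, a minimal sketch of the x0 loss weighting in sigma terms, based on the 1/(sigma**2 + 1/gamma) form discussed above (the function name is mine; the linked k-diffusion code is the authoritative version):

import torch

def soft_min_snr_weight_x0(sigma: torch.Tensor, gamma: float) -> torch.Tensor:
    # soft-min-SNR weight for x0 prediction: 1 / (sigma^2 + 1/gamma).
    # With snr = 1 / sigma^2 this equals snr * gamma / (snr + gamma).
    return 1.0 / (sigma ** 2 + 1.0 / gamma)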

"--soft_min_snr_gamma",
type=float,
default=None,
help="gamma for reducing the weight of high loss timesteps. Lower numbers have stronger effect. 5 is recommended by paper. / 低いタイムステップでの高いlossに対して重みを減らすためのgamma値、低いほど効果が強く、論文では5が推奨",

@Birch-san commented on this line:

we don't recommend gamma=5, we recommend gamma=sigma_data**-2.
for our pixel-space dataset, ImageNet, we declared sigma_data=0.5 and hence gamma=4.
for latent datasets, you probably want sigma_data=1.0 and hence gamma=1. because you are not training on the raw latents, you are first multiplying by 0.13025 to standardize their std to 1.

sigma_data is the standard deviation of your dataset's pixels (or latents).
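
In code form, a small sketch of this recommendation (names mine):

sigma_data = 0.5    # std of raw pixels, HDiT's ImageNet setting -> gamma = 4
# sigma_data = 1.0  # std of latents after the 0.13025 scaling -> gamma = 1
gamma = sigma_data ** -2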

@rockerBOO (Contributor, Author) replied:

Updated to recommend 1. I had mistakenly copied the help text from the min_snr_gamma option. Thank you!

@rockerBOO (Contributor, Author) commented:

> here's how to formulate it as an EDM target: https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/layers.py#L65
>
> here's how to formulate it as an x0 loss weighting: https://github.com/Birch-san/k-diffusion/blob/9bce54aec1e596548cf73f56f4842c11aa6271c6/k_diffusion/layers.py#L160
>
> here's an alternative style for expressing it as an EDM target, where we use the x0 loss weighting and apply a correction to adapt it for EDM: https://github.com/Birch-san/k-diffusion/blob/9bce54aec1e596548cf73f56f4842c11aa6271c6/k_diffusion/layers.py#L250

Thanks for sharing these. I'm not sure what EDM or x0 loss weighting are, so I don't know what makes sense to use here. I foolishly tried to implement it without understanding the underlying math.

I will work through the implementations you shared to see if I can figure out how to implement this properly in this code, also integrating what feffy380 has suggested.

@kohya-ss (Owner) commented Apr 2, 2024:

Sorry for the delay in merging. It would be nice if this PR could be improved; I am also trying to understand the math, but it is quite difficult...

Please let me know when the PR is ready.
