Add support for Flux1.D #9
Conversation
I'm not opposed to merging this, but the model should be good enough to be useful. There seem to be quite large artifacts in the resized image. How much denoising is needed in the second stage to get rid of them?

The scaling factor is for normalizing the latent standard deviation; it's not strictly necessary. SDXL resizer quality was lower than SD1.5's because the VAE is trained more and it compresses more. I'd guess the same thing happens with Flux: it might be difficult for this small model to resize the latent effectively.

I have a slightly better model locally. I didn't update it here because I didn't want to version the models. I pushed some training changes to the dev branch. The biggest change was making the model predict the original latent from the downsampled latent instead of the opposite. There were also some other loss-function-related changes, but I don't think those were as important.
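For anyone following along, here is a minimal sketch of that change of direction (train the resizer to reconstruct the original latent from a downsampled copy). The names `resizer` and `latents` and the plain MSE loss are placeholder assumptions for illustration, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def training_step(resizer, latents, optimizer):
    # Build the low-resolution input by downsampling the ground-truth latent.
    low_res = F.interpolate(latents, scale_factor=0.5, mode="bilinear",
                            align_corners=False)
    # Predict the original-resolution latent from the downsampled one
    # ("superresolution" direction, instead of the opposite).
    pred = resizer(low_res)
    loss = F.mse_loss(pred, latents)  # placeholder reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```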
Hey, thank you for your answer! And sorry for the long delay on my side; I'm quite swamped with work these days. 🫠
Actually, I think it is not that bad. The model was a bit undertrained, and after training it some more on the COCO dataset, it has improved further. But even before that, it was not that bad. My previous screenshot was not very clear, so maybe you were looking at the image showing upscaling with bislerp? You can see it adds some blurriness (the scaled image is the left side in the image comparer node), but it doesn't destroy the image like scaling with vanilla interpolation does. It's easier to see in a video, I suppose: shrooms.mp4
Actually, I'm having a bit of a problem with this, and I have no idea why. I was hoping you might have an answer. I can't get rid of the blurriness by doing a second pass with low denoise... It makes no sense to me, but that's what happens. If I upscale via interpolation and use a denoise of ~0.6 for 10 steps (euler, beta), I get a sharp image, albeit changed a bit. But if I scale via the NN and then denoise in the second pass, my output image still comes out blurry, even with a high denoise value of ~0.6.

I have tried injecting noise back into the latent, with the hypothesis that, without the noise, the denoiser would have nothing to work with... But then I just end up with a noisy blurry image. Even using things like Detail Daemon, the Lying Sigma Sampler, or whatever, only gives me blurry, noisy images. Obviously, if I crank up the denoise, it will change the whole image and get me a sharp image back, but then there is no point...

Do you have any idea at all why this would be happening? I'm up for running some tests if you need more information. This is driving me crazy lol
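For clarity, this is roughly what I mean by "injecting noise back", in sigma terms. A rough sketch, not ComfyUI's actual internals; `start_sigma` here stands for whatever sigma the partial pass starts at:

```python
import torch

def noise_for_partial_pass(latent: torch.Tensor, start_sigma: float) -> torch.Tensor:
    # A denoise < 1.0 pass starts mid-schedule at `start_sigma`; the sampler
    # expects the latent to already carry that much noise, so add it manually.
    return latent + torch.randn_like(latent) * start_sigma
```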
Ok, cool. In my training, I've used the value found in the Flux VAE's config.
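In case it's useful, a minimal sketch of pulling that value from a local copy of the VAE config; the path and key layout here assume a standard diffusers-style config.json, which may not match everyone's setup:

```python
import json

# Read the latent scaling factor from a locally downloaded Flux VAE config.
with open("flux_vae/config.json") as f:
    vae_cfg = json.load(f)

scaling_factor = vae_cfg["scaling_factor"]
print(scaling_factor)
```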
Would you recommend increasing the model's parameter count? I suppose we'd just need more data to scale the training along? 🤔
Nice! Thanks for letting me know. I'll try training a new model with the changes and check whether the results improve.
Slight blurring is probably unavoidable with this NN architecture. It would be easier for the neural net if it only had to learn one or just a few scaling factors. Superresolution training (predicting high resolution from a downscaled input) improves it slightly, but doesn't completely get rid of it. As for why a second pass can't remove the blurriness: in my opinion, the diffusion model first predicts the low-frequency, large-scale details and then improves the fine details at later steps. If the latent is blurry at the middle or late steps, the model won't get rid of it anymore.
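To make that concrete: a partial denoise only runs the tail of the sigma schedule, after the steps that form large-scale structure, so blur that's already in the latent survives. An illustrative sketch, not ComfyUI's actual scheduling code:

```python
import torch

def sub_schedule(sigmas: torch.Tensor, denoise: float) -> torch.Tensor:
    # denoise=1.0 runs the full schedule; denoise=0.6 skips the early,
    # high-sigma steps where low-frequency structure would be (re)formed.
    total = len(sigmas) - 1
    steps = int(total * denoise)
    return sigmas[total - steps:]
```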
Hey!
I've trained a network for use with Flux1.D.
I've trained it on the pseudo-camera-10k dataset for 37k steps at fp16, resolution 512, and batch_size 1 (so it fits on my RTX 3090).
It works somewhat fine:

Since I don't know exactly which split of the COCO dataset you used for validation, I can't add the stats to the readme.
I also have a bunch of questions, like: why have you used

scaling_factor = 0.13025

everywhere? Is this critical?

I've also noticed that my upscaled images are a bit blurrier than the original ones before upscaling (no sampling, just vae encode -> upscale -> vae decode). Compared with your SDXL network's results, the edges are a bit blurrier, but the textures are a bit better. I'm not sure if the network is undertrained or something else. Do you have any ideas?
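For reference, here is a minimal sketch of how I understand such a scaling factor being applied around the resizer (it normalizes the latent's standard deviation, as the reply above notes). `resize_latent` and `resizer` are placeholder names, not the repo's actual API:

```python
import torch

def resize_latent(resizer, latent: torch.Tensor, scale: float = 0.13025) -> torch.Tensor:
    normalized = latent * scale      # bring the latent's std close to 1
    upscaled = resizer(normalized)   # the NN operates on the normalized latent
    return upscaled / scale          # undo the normalization afterwards
```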
Note: I know there are unnecessary changes in the PR (formatting and whatnot). I wasn't expecting it to work so well, so I didn't take much care while messing with it. If you want to take the contribution, I can clean it up before merging.
So, let me know what you think, and thank you very much for the node and training code!
Cheers!