[Performance 6/6] Add --precision half option to avoid casting during inference #15820
Conversation
Will force-fp16 mode conflict with the fp8 unet?
I'm not sure if this is related to using dynamic LoRA weights. Wondering if it's related to this.
Enabling …
Found the offending line. The original forward pass has `h = x.type(self.dtype)`, while the fix comments the cast out and uses `h = x` instead.
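A minimal toy sketch of that change (illustrative module, not the repository's actual UNet code):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy stand-in for the real UNet; names are illustrative."""
    def __init__(self):
        super().__init__()
        self.dtype = torch.float16
        self.scale = nn.Parameter(torch.ones(4, dtype=torch.float16))

    def forward(self, x):
        # Offending line: casts the input on every call, even when it is
        # already fp16:
        # h = x.type(self.dtype)
        h = x  # fix: keep the incoming dtype; with --precision half, x is already fp16
        return h * self.scale

x = torch.randn(1, 4, dtype=torch.float16)
print(TinyUNet()(x).dtype)  # torch.float16
```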
I don't know if it's the appropriate place to put it, but setting …
Something like this?
This does fix the dtype mismatch error.
Thanks for digging out the solution! Verified that it works.
I'm still getting the following runtime error with both SDXL and SD15 models:
Seems to be related to …
Can you share which model you used? I am not sure whether, if you load a full-precision model, the weights are cast to fp16 before inference. The models I tested are already half precision.
Sure, I tried a few:
Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:
I'll write back if I figure out the cause.
I’ve tested this on a 6700 XT and there is a performance improvement. However, I think that this should not disallow setting …
Another report of the fp8 issue.
Using an FP16 VAE I got almost double the speed compared to `--no-half-vae`. Nice, an FP16 VAE is mandatory.
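A rough sketch of the two configurations being compared (toy decoder; `--no-half-vae` keeps the webui VAE in fp32):

```python
import torch
import torch.nn as nn

class ToyVAEDecoder(nn.Module):
    """Illustrative stand-in for the real VAE decoder."""
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(3, 1, 1))

    def forward(self, latents):
        return latents[:, :3] * self.gain  # placeholder for the real decode

latents = torch.randn(1, 4, 64, 64, dtype=torch.float16)

# --no-half-vae: the decoder stays in fp32, so latents must be upcast on
# every decode.
vae = ToyVAEDecoder().float()
image = vae(latents.float())

# fp16-safe VAE: the decode runs in half precision end to end (the ~2x
# decode speedup reported above).
vae = ToyVAEDecoder().half()
image = vae(latents)
print(image.dtype)  # torch.float16
```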
Description
According to lllyasviel/stable-diffusion-webui-forge#716 (comment), casting during inference is a major source of performance overhead. ComfyUI and Forge do fp16 inference by default without any casting, i.e. all tensors are already fp16 before inference. The casting overhead is ~50 ms/it.
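For context, a minimal sketch of the two modes in plain PyTorch (illustrative model and shapes; assumes a CUDA device):

```python
import torch
import torch.nn as nn

device = "cuda"  # fp16 inference assumes a GPU
model = nn.Conv2d(4, 4, 3, padding=1).to(device)    # fp32 weights
latents = torch.randn(1, 4, 64, 64, device=device)  # fp32 input

# Mixed-precision path: autocast casts each op's inputs to fp16 on the
# fly, so every step of the sampling loop pays for dtype conversions.
with torch.no_grad(), torch.autocast(device, dtype=torch.float16):
    out = model(latents)

# --precision half path (what ComfyUI/Forge do): cast once up front and
# run the whole loop in fp16, with no autocast and no per-op casts.
model.half()
latents = latents.half()
with torch.no_grad():
    out = model(latents)
print(out.dtype)  # torch.float16
```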
This PR adds an option `--precision half` to disable autocasting and use all-fp16 values during inference.

Screenshots/videos:
Checklist: