WIP: update stable-diffusion.cpp to 5900ef6605c6 (new API) #1669
Conversation
Thanks. I think most of the files should be relatively easy to upgrade, except for

The current code seems to be working for basic gens + LoRAs, except for an upstream issue with img2img + flash attention (leejet/stable-diffusion#756). I still need to test the more extensively changed code paths (Kontext, Photomaker, TAESD, ...). I've tried to move some local changes from stable-diffusion.cpp to sdtype_adapter.cpp, to facilitate future updates. The Chroma detection got a bit more complex, so let me know if you prefer the previous approach.

For removing the LoRAs from the prompt, I've opted for emptying the LoRA list instead, to make the diff cleaner. That also makes a nice place to print a warning, in case the user tries to include a LoRA through the prompt.

It may be worth changing the Kontext and Photomaker image lists to use a single image reference list, instead of one list per model. That could avoid further interface changes to support other edit models in the future, like Instruct-Pix2pix and CosXL. But of course, that could be done after this update.
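A minimal sketch of that LoRA-handling idea, assuming made-up type and function names (the real code in sdtype_adapter.cpp is organized differently): if prompt parsing produced any LoRA entries, warn and clear the list instead of rewriting the prompt string.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical structure standing in for whatever the adapter parses
// out of "<lora:name:strength>" tags in the prompt.
struct PromptLora {
    std::string name;
    float strength;
};

// Instead of stripping the LoRA tags out of the prompt text (a larger diff),
// discard the parsed list and tell the user why it was ignored.
static void discard_prompt_loras(std::vector<PromptLora> & loras) {
    for (const PromptLora & l : loras) {
        fprintf(stderr,
                "warning: LoRA '%s' in the prompt is ignored; load LoRAs through the regular parameters instead\n",
                l.name.c_str());
    }
    loras.clear();
}
```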
Force-pushed from f39ada1 to aabe701.
Chroma isn't working with flash attention (black images), but it could be that same upstream issue. Maybe the safest option would be a separate flash attention control for image generation... I can add one later, along with configs for the new conv2d_direct optimizations. I've also included here the fix for #1672, to facilitate the tests. Apart from that: SD1.5, SDXL, SD3.5, Chroma, Flux, Kontext, Photomaker, inpaint, LoRAs and TAESD seem to be working so far (Linux + Vulkan).
Force-pushed from 6ebede9 to faad964.
I decided to test flash attention on Chroma in v1.97.1 (latest release) with CUDA, and I also got a black square, so at least that appears to be the case for now.
I've rebased onto 1.97.1 and folded missing stuff into the original commit, still keeping the refactors separate in case we need to tweak them. I could add a workaround to disable flash attention for Chroma. For img2img / inpaint it's trickier, since the flag is set at model loading time; but maybe it could be changed inside the ctx at inference time, like the VAE tiling flag.
We can do enhancements as a separate PR; let's get full feature parity and no regressions first.
I took a look at enabling flash attention at inference time. It's doable, but we'd need to touch a lot of objects in the model tree to propagate the flag. Unless upstream is willing to switch to that approach (as was done for the enable_conv2d_direct flag), it may be best to simply disable flash attention for image gen until leejet/stable-diffusion.cpp#756 moves forward. So, I believe the current code is ready to be reviewed.
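A rough sketch of the kind of load-time workaround being discussed; the function and parameter names here are assumptions, not the adapter's actual detection logic.

```cpp
#include <cstdio>

// Decide whether flash attention should actually be enabled at model load
// time. `requested` is the user setting; the two booleans stand in for
// however the adapter detects the affected code paths.
static bool effective_flash_attn(bool requested, bool is_chroma, bool may_do_img2img) {
    if (requested && (is_chroma || may_do_img2img)) {
        // leejet/stable-diffusion.cpp#756: flash attention currently yields
        // black images on these paths, so force it off until that is fixed.
        fprintf(stderr, "warning: flash attention disabled for this configuration (upstream issue)\n");
        return false;
    }
    return requested;
}
```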
Force-pushed from e50114a to 9bf5e31, then to 5f104d2, then to 4d71dac.
I've fixed the pretty_progress throttling, added workarounds for the flash attention issues with Chroma and img2img, and cleaned up the workaround for clip_skip that got fixed upstream. I don't have much time right now to do full tests (Flux is really slow here :-) ), but I'll do them later.
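For reference, the general throttling idea looks roughly like this; a sketch only, with illustrative names and interval, not the actual pretty_progress code.

```cpp
#include <chrono>
#include <cstdio>

// Emit progress output at most every `kMinIntervalMs`, but never skip the
// final step, so the display always ends at 100%.
static void throttled_progress(int step, int steps, float time_per_step) {
    using clock = std::chrono::steady_clock;
    static clock::time_point last_print;  // epoch on first call, so the first update always prints
    constexpr int kMinIntervalMs = 250;

    const clock::time_point now = clock::now();
    const auto elapsed_ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(now - last_print).count();
    if (elapsed_ms < kMinIntervalMs && step != steps) {
        return;  // too soon since the last update; skip this one
    }
    last_print = now;
    printf("\rstep %d/%d (%.2fs per step)", step, steps, time_per_step);
    fflush(stdout);
}
```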
Thanks for everything so far. |
Force-pushed from 4d71dac to 1bd0b46, then to c433877.
I redid the tests with the CLIP weights change, and just found a minor bug in the pretty_progress throttling. ROCm is working fine, too. As far as I can tell, it's ready!
Tested SD1.5, SDXL, Flux, Kontext; they seem OK so far. Are you able to use SD3.5 files? I think some time back there was a regression that messed up SD3.5 for me; I also get a black square on the latest release, so it's not the fault of this PR. Testing with this file: https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors It did work back in 1.78 when it was introduced. Edit: looks like it broke at 2a890ec
At least SD3.5 medium is working for me (sd3.5m_turbo-Q4_K_M.gguf):
My sd3.5-large gguf is failing right now, but it's a failure to load, not garbage rendering; probably a messed-up config. Edit: using this VAE (I don't remember why I picked this one; I just found it again by the SHA256 hash).
I can't get SD3.5-Large-Turbo-GGUF-mixed-sdcpp to load on Koboldcpp, and it's working fine on sd.cpp:
I think Koboldcpp was falling back to another previously working configuration during my tests :-( (also downloading the
When I use TAESD I get a brown square; otherwise I get a black square. Edit: this was on medium.
Approving first; let's see if we can figure it out. If not, we can still merge it, since all other stuff seems good to go.
Okay, looking at your gguf example, I think the fallback code is failing, so the tensor names are appended twice. I think the solution is to check for such tensor names before allowing the prefix to be appended.
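Something along these lines for the prefix check; illustrative only, the actual fix in the loader may look different.

```cpp
#include <string>

// Prepend `prefix` to a tensor name only when it is not already there, so a
// fallback pass over the weights cannot end up applying it twice
// (producing names like "model.diffusion_model.model.diffusion_model.foo").
static std::string prefix_tensor_name(const std::string & name, const std::string & prefix) {
    if (name.size() >= prefix.size() && name.compare(0, prefix.size(), prefix) == 0) {
        return name;  // already prefixed
    }
    return prefix + name;
}
```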
I did a few tests with sd3.5_medium_incl_clips_t5xxlfp8scaled on sd.cpp:
The external t5xxl comes from t5xxl-Q4_K.gguf. Edit: t5xxl-Q8_0.gguf works fine too.
@wbruna Alright, after I added my fix for the tensor names, I was able to load the model. However, the results are very weird. I don't know if I am doing it wrong or if I used bad files, since I loaded my own T5 and CLIP, using your SD3.5-Large-Turbo-GGUF-mixed-sdcpp. Can you merge my changes from 7b5cf71 and check if it works fine for you?
I think maybe I have a faulty T5, CLIP or VAE, but if it works for you I think that's good enough. Please merge and try my fix.
Could you also check in sdcpp
It works! This was with the just-pushed merge 9c039b2, on Vulkan:
Alright, if there are no other issues I will merge this now.
Black image for me too. |
@LostRuins, the target
Alright, I fixed the issues I could find, and it compiles fine for me. Please pull the latest experimental. Does it work for you now?
Yeah, it builds fine now; thanks!
Btw, I've been keeping my eye on the upstream Qwen Image and WAN developments. Thanks to your PR, kcpp is now in sync with the latest API, so merging it shouldn't be too hard. However, we'd probably want to think of a good approach for the API and frontend side, since video files will no doubt be massive. AVI is probably not the best format due to its size, though transcoding without FFmpeg (which cannot be used) will be tricky. Open to suggestions if you have any.
Yeah... The main issue is the needed ggml support; I don't know how close leejet is to upstreaming those changes. And I'm already missing fixes on that branch (he's working on a version with that slow Vulkan build time 😢). Unrelated breakage could be an issue, too... But I suspect leejet already had Wan in mind for that big refactor, so it could be less of an issue now.

And I'm considering again that old idea of keeping an sd branch as an 'upstream' for Kobold. Two, actually: one branch just to pick still-not-applied upstream PRs, another with the specific tweaks for Koboldcpp. I kinda did that locally for the VAE tiling fixes, but making it 'official' would make the changes easier to track (since it doesn't look like we'll have those applied to sd master any time soon...).

Koboldcpp can't embed libavcodec? (Honest question; I know very little about video encoding. I confess I'm afraid of even trying video gen on my poor card 😅)
It's really big because it supports dozens of codecs. And it has multiple mixed licenses; LGPL is technically usable within AGPL, but I'd prefer to avoid it if I can. I only need ONE single codec that can be played in a browser, that's it.
Would it be possible to use the ffmpeg CLI if it's available, or would that be too indirect? I already have ffmpeg on my PATH anyway, for yt-dlp, for example.
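If the CLI route were taken, it could be as simple as shelling out when ffmpeg is found on PATH. A rough sketch: the file names and codec choice are just examples, and real code would need proper escaping and error handling.

```cpp
#include <cstdlib>
#include <string>

// Try to transcode an MJPEG AVI into a browser-friendly MP4 by invoking the
// ffmpeg CLI. Returns false if ffmpeg is missing or the conversion fails.
static bool transcode_with_ffmpeg(const std::string & in_avi, const std::string & out_mp4) {
    // -y overwrites the output; yuv420p keeps the file playable in browsers.
    const std::string cmd =
        "ffmpeg -y -i \"" + in_avi + "\" -c:v libx264 -pix_fmt yuv420p \"" + out_mp4 + "\"";
    return std::system(cmd.c_str()) == 0;
}
```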
Most likely, if I cannot find something suitable, the output will remain as MJPEG, which the clients can then convert on their side to whatever format they prefer.

Maybe OpenH264? It's BSD-licensed, and the binaries seem relatively small:
Wan support just got merged into master. There are a ton of changes, so it may be reasonable to just update the sd.cpp code for now and add video gen support later. Code-wise, the main issue is that the VAE tiling fixes conflict with some of the Wan changes; I asked stduhpf to update the PR. main.cpp also got a lot of changes, but the API itself didn't change that much, so sdtype_adapter may not need too many fixes right now.
Alright, cool.
I think I've found a way to better track those changes. I'm keeping a branch with the upstream code + pending PRs, which will act as a vendor branch: https://github.com/wbruna/stable-diffusion.cpp/tree/koboldcpp_sd_base

And another branch with the needed local changes: https://github.com/wbruna/stable-diffusion.cpp/tree/koboldcpp_sd_changes

The idea is to keep

Right now, that merge is just a first try; I'll review it later and open a PR. Possibly including those FA changes that just arrived on master...





This updates the embedded stable-diffusion.cpp code and applies the necessary changes to the API adapter.
I've tried to preserve the local changes to the codebase:
I've applied other minor changes, like formatting, to minimize the diff against sd.cpp mainline.
Right now, the code merely builds successfully, but I'll keep working on it over the next few days.