additional options for image generation #1765
Conversation
For 1, I'm fine with this, but I think that Euler actually makes a good default in all cases - I have noted instability with Euler A sampling even in SDXL models. Is there ANY model that does not work well with Euler? If not, I think we can just use that as a default. In fact, I think that's the default for the webUI and the API if unspecified too. I'm ok with either.

For 2, not really a fan of this approach. The prompt field should be considered "sandboxed" and not be overloadable with anything that affects generation behavior (indeed, that's partially why I didn't want the lora switching there either). If you need additional gen-time parameters, I think it's best to add them to the generation API (which will be well known).
I don't think it's a bad default. I only know about an admittedly very obscure corner case: SDXL DMD2 models (which are intended to be used with LCM) usually work OK with Euler A, but tend to make noisy outputs with Euler.

Aside: along the same lines, I miss a way to reset the interface to "known to be good" default parameters: say, choosing "LCM, 10 steps, CFG 1" for LCM models. But that's a bit too much front-end development for me...

The backend's default itself is Euler A right now (forcibly changed to Euler for SD3/Flux); I don't remember if koboldcpp.py sets its own default. My main argument would be from a least-surprises point of view: if the user knows the "default" is good for mainline sd.cpp, that same string (or not specifying it) will give them similar results on Koboldcpp. The same goes for development on our side: "default" would simply mean "sd.cpp's responsibility" (like it is right now with the default scheduler), even if a new model comes up with completely different parameters for "good" results. So a "library default" would be a bit more future-proof (e.g. for Qwen Image).

The same argument suggests we shouldn't forcibly change Euler A to Euler, because users see "Euler A" on the interface, and could then learn that "Euler A is OK for Flux". So, with either Euler or "sd.cpp default" as the default sampler, I think it's best to disable that forceful change; I can add a commit to this PR to do that.
Fair enough. I don't disagree that the prompt isn't really the best place to do it. It's just very convenient 🙂

Main motivation right now: NitroFusion models are very fast (under 2s on my card), and give much better results than other 1-step models like SDXL Turbo, so they'd be ideal for scene generation in an adventure chat. But they need a different scheduler (like sgm_uniform) and an additional parameter (timestep-shift), which right now can't be specified either at model loading time or on the chat's media settings tab. And the prompt is already there...

The scheduler arguably goes hand-in-hand with the sampler, and could even be specified in the same field if we managed to keep the number of combinations under control ("Euler A sgm_uniform", "Heun SmoothStep", "Euler Karras", etc.). But timestep-shift is both obligatory for NitroFusion models (and the same value doesn't work for different models), and not really needed in most other cases (although it can be specified for non-NitroFusion models, with interesting results). So... how do we specify it on the web interface without making it cluttered for all other uses?

Would you consider OK a specialized "parameter prompt", which only receives white-listed key-value pairs like the code here? I don't care much about the syntax (could be like this, JSON, "KEY:VALUE", even command-line-like), but the whole add-a-separate-control-for-each-field approach simply doesn't scale.
Yeah, if needed I suppose adding an extra "meta" field would work for that. Though, did you need this in the UI as well, or only in the API? Also, would it be possible to simply hard-code good values for each architecture type? (If possible, that would be the easiest option.)
It's almost only useful at the UI level. If we had another way of specifying arbitrary key+value fields on the UI, we could simply make it set different API fields. The only reason for API-level support would be to help simplify the UI for that implementation.

Sometimes. In the NitroFusion case, for instance, the scheduler doesn't matter that much, as long as you choose from a specific subset; in principle, we could just hardcode "sgm_uniform". Problem is, they look like normal SDXL models, so we'd have to find a way to identify them. The timestep-shift value is even worse, since that seemingly depends only on model training. Setting defaults at loading time would work... but then we get the same "UI fields" problem, on Tk 🙂
Just digging a little bit deeper into another point:

The LoRA support is a bit of a worst offender, since it can load arbitrary files (leejet/stable-diffusion.cpp#830) on the server. But for some other cases, I'm not entirely sure. For instance, you can specify … at the prompt level. And if the model understood that, something like … would be even better. And this is not that different from …, as long as we ensure there are no "malicious" values (say, a negative or INFINITY causing a server crash - which is something API fields and token weight parsing already need to handle), and it can't be used for circumventing parsers or filters (a convincing example is a bit hard here, but someone could make use of the parameter removal to smuggle a forbidden word past a filter, like "NS(cfg_scale:10)FW"). And since the result will just fill up some specific image generation fields, the worst that could happen would be a bad generation.

Of course, we may not want the user having access to those fields; this should be processed before that restriction is applied. A separate 'meta' prompt field could help here.
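For concreteness, here is a minimal sketch (not this PR's actual code) of how such whitelisted key-value pairs could be pulled out of a prompt string and sanity-checked. The Perchance-style (PARAMETER:::VALUE) syntax matches the one described in the PR, but the parameter names, bounds, and clamping policy below are illustrative assumptions:

```python
import math
import re

# Hypothetical whitelist: parameter name -> (converter, min, max).
# Names and bounds are illustrative, not the PR's actual values.
PARAM_WHITELIST = {
    "scheduler": (str, None, None),
    "timestep-shift": (float, 0.0, 1000.0),
    "cfg_scale": (float, 1.0, 30.0),
}

PARAM_RE = re.compile(r"\(([\w-]+):::([^)]*)\)")


def extract_gen_params(prompt):
    """Pull whitelisted (key:::value) pairs out of a prompt string.

    Returns the cleaned prompt plus a dict of accepted parameters;
    unknown keys, malformed values and non-finite numbers are dropped.
    """
    params = {}

    def consume(match):
        key, raw = match.group(1), match.group(2).strip()
        spec = PARAM_WHITELIST.get(key)
        if spec is not None:
            conv, low, high = spec
            try:
                value = conv(raw)
            except ValueError:
                value = None
            if isinstance(value, float) and not math.isfinite(value):
                value = None  # reject NaN / INFINITY
            if isinstance(value, float) and low is not None:
                value = min(max(value, low), high)  # clamp to sane bounds
            if value is not None:
                params[key] = value
        return ""  # strip the tag from the prompt either way

    cleaned = PARAM_RE.sub(consume, prompt).strip()
    return cleaned, params


# Example, using the NitroFusion values mentioned in this thread:
# extract_gen_params("a foggy harbor (scheduler:::sgm_uniform) (timestep-shift:::250)")
# -> ("a foggy harbor", {"scheduler": "sgm_uniform", "timestep-shift": 250.0})
```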
Another reason for avoiding generation config in the prompt is also that the input might come from unknown or unsanitized sources. For example, the img prompt could be based on the output of an LLM, or fed forward from some search results. The user might not want such sources affecting gen params (imagine if your LLM could affect step count from the prompt by saying "render 5000 more steps").
(force-pushed 68c4a8f to d3b7138)
I kept the prompt parsing because it's the only way right now to test the new fields, but moved it to the Python level on the last revision, so it can be easily discarded. Note it's not ready for inclusion even if you'd agree to it, since I didn't whitelist the fields; I'll turn it into a proper implementation if/once we add a meta prompt. I still need to figure out how to properly add them to the API.

Another thing: do we have a ready-to-use script or curl command to test the API directly?

(By the way, I rebased this onto #1769 just to make it easier to test with Flux and Chroma; that PR should be uncontroversial enough to be included first.)
I agree parameters that directly affect resource usage (like step count and resolution) should be kept out of it anyway. For the normal prompt, I'd simply whitelist parameters known to only affect image results (cfg_scale and sampler would be borderline according to this, since they could change the step duration).
If you build koboldcpp, you can test it via the swagger API page docs at http://localhost:5001/api. Adding a new field is as simple as reading it inside the …

I'm a bit confused though, because you mentioned you wanted (arbitrary?) extra field support through a "meta" parameter, but now you're adding them as individual named items again? What is …? I thought you were going for …, whereafter the input.meta field gets parsed C++-side like you did originally.
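For a ready-to-use starting point, a hedged test script (not an official one) that POSTs to the local instance shown on that swagger page, assuming the usual A1111-compatible /sdapi/v1/txt2img endpoint and standard field names; any new options from this PR would be added to the same payload under whatever names the docs end up listing:

```python
import base64
import json
import urllib.request

# Assumed A1111-style payload; check http://localhost:5001/api for the
# exact field names exposed by the running build.
payload = {
    "prompt": "a lighthouse at dusk",
    "negative_prompt": "",
    "steps": 4,
    "cfg_scale": 1.0,
    "width": 512,
    "height": 512,
    "sampler_name": "Euler a",
}

req = urllib.request.Request(
    "http://localhost:5001/sdapi/v1/txt2img",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# A1111-compatible responses return base64-encoded images.
with open("test.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```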
Thanks, I'll look into it.
That syntax (with timestep-shift renamed to shifted_timesteps, for Python dictionary reasons) is still supported, just applied at the Python level. I've kept it at the main prompt just because a new meta_prompt can't be tested right now. The move is mainly for keeping the functional part (generation parameters) separate from the user- and API-facing part (separate fields / prompt / meta prompt), because rebasing and changing the C++ code to adapt to that approach is costly (I have to fully rebuild Koboldcpp every time there is a Vulkan shader change). Separate C++ fields may also make sense if we decide to change the approach: for instance, setting a fixed timestep-shift value at model loading time. In a nutshell, my targets are: …

The prompt parsing thing is just an approach to reach that: I need some way to change the timestep-shift and scheduler through the Koboldcpp UI, to see the resulting images on the Koboldcpp interface. We have several separate questions at this point: …

(An aside: from a technical POV (say, if I were designing it from the ground up), it doesn't make any sense to cram parameters into a new syntax at the API level: if there's software at the other end, it's simpler and less error-prone to use separate fields. That prompt syntax is a user thing; it should be done at the UI level. But I'm not sure my non-existent Vue/Javascript skills would be enough for that approach.)
(force-pushed d3b7138 to 2044088)
Yeah, the idea with the single "meta" field is that it can, like you said, use a simple text box for its UI. It's like a parameter garbage bin: you just throw all the stuffing inside and the backend sorts out what it can interpret, ignoring what it cannot, while keeping the main prompt clean.
I included an additional config field to set default image generation options, as a simpler way to be able to customize the parameters for the adventure+nitro case. If you think it's a good feature, I can work on polishing it first, and leave the meta_prompt field (UI and all) for later, possibly a follow-up PR.

Edit: to be clearer, my idea is it'd work as a "default meta prompt", with the same syntax. The meta prompt values would override these (field by field), and any dedicated UI control or API field would override both. I don't mind it that much, but the syntax is... kinda weird, if we don't need to separate the parameters from prompt tokens. We could use instead: …

What do you think?
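A small sketch of that precedence, purely for illustration (the function and field names are hypothetical): config-file defaults are overridden field by field by the meta prompt, and any dedicated UI control or API field overrides both:

```python
def resolve_gen_params(config_defaults, meta_params, request_fields):
    """Merge the three layers; later layers win, missing keys fall through."""
    resolved = dict(config_defaults)  # e.g. {"scheduler": "sgm_uniform", "timestep-shift": 250}
    resolved.update(meta_params)      # values parsed from the meta prompt
    # Dedicated UI controls / explicit API fields override everything else.
    resolved.update({k: v for k, v in request_fields.items() if v is not None})
    return resolved
```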
See how …
(force-pushed 92df280 to f282362)

(force-pushed 14e0b80 to 1e1c23e)
I did the change to support JSON syntax on the …

The displayed image parameters still need some polishing, since those values currently come from the requested fields, not the values actually sent to the backend (that issue was already present for e.g. the seed value). I can fix that in a separate PR, if you wish to speed things up.

EDIT: moved the prompt hack to https://github.com/wbruna/llama.cpp/tree/kcpp_sd_backend_options_hack , if needed for testing.
Okay, looks mostly fine. Can we ditch the LambdaPrinter and just add it to the metadata like normal?
(force-pushed 1e1c23e to f41a0ec)
Done. I took a look at the displayed image metadata, and it seems to me it'd need to be done both at the API level (to return the true image parameters) and the UI level (to consider the replaced parameters)?

By the way, I did a bunch of tests with Qwen-Image on Vulkan; working fine so far.

The clip_l -> clip_1 change got me thinking: what if we received all, or most, parameters on the C++ API inside a single JSON string? It'd be a bit of work at first to parse it on the C++ side, but we already have access to a JSON parser, and multiple-field issues like this would be far easier to support. The Python side would get much simpler too. I'm even considering proposing something like this upstream, to help out other web interfaces.
Passing JSON to C++ is not ideal. The Python JSON parser is much better than the C++ one, and handling errors in Python is much easier too. Especially when debugging; doing it in C++ is a pain (like you mentioned before).

btw I might have caused a conflict when shifting a variable around
(force-pushed 724b262 to e802a79)
Not raw JSON from the web parameters, just as a replacement for the inputs struct. We'd still have to validate everything at the Python level, even if just to check defaults against the …

Another option that could work for the models is a pair of arrays (model type, model filename), with the type specified either as an enum or a fixed string. I'll experiment a bit with that (for another PR, though 🙂).
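A rough sketch of that (type, filename) idea, with hypothetical names and an example set of component types; the real type strings would depend on what the backend loader accepts:

```python
# Hypothetical set of recognized component types, for illustration only.
KNOWN_MODEL_TYPES = {"vae", "clip_l", "clip_g", "t5xxl", "lora"}


def build_model_args(models):
    """Validate (type, filename) pairs before handing them to the backend."""
    validated = []
    for model_type, path in models:
        if model_type not in KNOWN_MODEL_TYPES:
            raise ValueError(f"unknown model type: {model_type}")
        validated.append((model_type, path))
    return validated


# build_model_args([("clip_l", "clip_l.safetensors"), ("t5xxl", "t5xxl_fp8.safetensors")])
```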
I noticed you removed this … We can't use Euler A on Flux, it breaks completely. Since the frontend (e.g. SDUI) does not know this, I think we should use Euler even if the user intentionally picks Euler A in their UI, in order to avoid broken generations.
I thought you had it switched to Euler? If the default is still Euler A, I agree it's best to keep it. I'll leave it to another PR. |
(force-pushed e802a79 to b583890)
I did, but my point is, anyone can have their own frontend. There might be third-party UIs out there we don't even know exist. Anyway, looking mostly good now, so I will merge shortly.
@LostRuins , when loading a config file through the admin interface: if the file doesn't specify a parameter (for instance, because it was created with a previous version), is the parameter supposed to retain its current value? I have noticed that values set by the …

I don't think this behavior is necessarily wrong, at least not for all config fields (the admin password, for instance, could then be set on a single config file or command line, and kept out of most configs), but I'm not sure what would be the best way to handle this case.
Hmm no, the config file should be independent and not carry over any settings from a previous config. Do you notice this issue with any other parameter? As far as I've seen, they are all correctly reset on load/unload of a new file.
Where is it supposed to happen? Looking at the code, I see …
Now that I think about it, several weeks ago I had a similar issue while alternating between Vulkan and ROCm. I assumed I had hit a corner case related to my setup (I was used to resetting everything anyway because I needed to change env vars), but it could have been exactly this: ROCm (or Vulkan, I don't remember which) getting "stuck" if not explicitly disabled by the new config, because of a parameter being renamed.
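To illustrate the suspected failure mode (this is not koboldcpp's actual loading code, and the option names are made up): if a new config is applied by updating the existing arguments in place, any key the file doesn't mention - because it was renamed, or the file predates the option - silently keeps its previous value instead of reverting to its default:

```python
import json

# Made-up option names, purely for illustration.
DEFAULTS = {"backend": "none", "admin_password": "", "sd_model": ""}


def apply_config_in_place(current_args, config_path):
    """Problematic pattern: keys absent from the file keep their old values."""
    with open(config_path) as f:
        current_args.update(json.load(f))
    return current_args


def apply_config_from_defaults(config_path):
    """Safer pattern: start from the defaults, so unspecified keys are reset."""
    args = dict(DEFAULTS)
    with open(config_path) as f:
        args.update(json.load(f))
    return args
```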
You are correct, let me think of a better solution. |
Back to the "Euler A isn't really Euler A" subject: could we at least switch the default sampler to the string "default", both on the interface and at the API level, even if we still keep Euler as the default sampler in koboldcpp.py? This would prevent the same Euler A issue with Euler, if any future model happens to not work well with it. It would also allow a default sampler to be set by the sgdefaults field - LCM, for instance. |
Hey @wbruna , I'm gonna add the "Scheduler" dropdown to stableui since it seems quite a few people want to adjust the scheduler. For the sampler_name we can use "default" in the list now, but I'm still unsure if putting that in first is ideal, since it does break existing cases and isn't a1111-recognized.
Actually, maybe I will keep your original name.
What if, for the "default" sampler, the sampler_name field was simply not defined (even if it appears as "default" to the user)? I'd expect most (all?) APIs would use a default value for any omitted field. |
Actually I don't know lol, that's why I am trying to err on the safe side. I know for sure kobold would handle it correctly, but I've already ditched a1111 so I don't know about those (whether it's a mandatory field). I think it should be safe though, so we can try that.
alright adjusted |


The first commit adds support for the "default" sampler, letting sd.cpp choose according to the model (Euler for SD3 / Flux / Chroma, Euler A otherwise).
The other two are more of an experiment: they add support for specifying flags through the prompt string, using the Perchance syntax (PARAMETER:::VALUE). This way, we don't need specific support from the web interface to test them. I added support for the scheduler and timestep-shift flags, enough to make NitroFusion models work (with (timestep-shift:::250) for NitroSD Realism, (timestep-shift:::400) for Vibrant, and (scheduler:::sgm_uniform) for both).

That syntax works well because it can't really happen in a normal prompt without resulting in an incorrect prompt weighting. Another option could be following the LoRA general syntax, with <param:PARAMETER:VALUE> or something like that.