Generate caption/TEnc-only LoRAs easily for influencing model CLIP bias / creating associations (e.g. removing girls, gaining SFW results, reducing lowres bias), proof of concept + current method included #295
SwiftIllusion started this conversation in Ideas
Replies: 1 comment
-
Got to do another experiment, this time seeing what working against the "lowres" TEnc bias (training via the above process on a "lowres" caption) could do, to prove the concept further. Comparisons: "portrait of man waving, wearing casual clothing" without a negative prompt in 'LifelikeDiffusion', another comparison with negatives, and "doorstep" with another model with negatives.
-
Through experiments I've found a method to improve people's outputs and control over a model without having to influence the positive prompt/output by attempting to sway it with negative prompts (which can then be left to affect the result in more meaningful ways), hopefully increasing the utility and control people have over their generations.
However, the current method involves a lot of steps and tools to get the final LoRA, and though I don't understand what's required to generate these LoRAs, an easier method could make this a new, accessible tool for people to use.
Demonstrations of potential without ever adjusting the negative prompt:
Removing female bias
"night sky" in Anythingv3.1, without LoRAs / with "girl"-captioned LoRA 0.8 / with "girl"-captioned LoRA 0.8 + "girls"-captioned LoRA 1.1
"man, flower garden" in Anythingv3.1, without LoRAs / with "girl"-captioned LoRA 0.1 + "girls"-captioned LoRA 0.3
Favoring SFW output by creating associations
I put a black bar over their parts, but they're still NSFW, so I'm linking to an imgur gallery of the images instead for these:
https://imgur.com/a/R5zJrbE
In order of comparison, with a LoRA trained on multiple captions: "man" and "woman" (to try to isolate their concept/dilute it away from all the captions that likely pair man/woman with naked), and "person-wearing-clothes" at double the repeats (to build a stronger association that people should more commonly be wearing clothes); a sketch of that folder layout follows after these comparisons.
"sexy woman, beautiful, standing next to desk" in ChilloutMix, without LoRAs / with LoRA 1.4
"elegant woman" in ChilloutMix, without LoRAs / with LoRA 1.2
"muscular man, standing" in ChilloutMix, without LoRAs / with LoRA 1.3
"woman standing" in LifeLikeDiffusion, without LoRAs / with LoRA 0.8
Distancing associations
"rabbit" in MareAcernis (the most female-biased model I've used), without LoRAs / with "person" LoRA -0.3 + "people" LoRA -0.25 + "girl" LoRA 1
Current method:
At the moment it takes many steps and going back and forth between having the kohya_ss GUI running and the automatic1111 GUI running. The instructions are simplified just to explain what I'm doing and aren't meant to teach.
I did this with just 50 repeats and 1 epoch, and that was much more than necessary every time.
Now you have a LoRA you can use to influence your output. Most importantly, it only influences what your prompt is searching for/what it thinks is relevant; it does not take away the quality or style of anything that does end up in the image (that still, of course, depends on the UNet training of the model to produce it). This can be used on any model, and the only difference will be the weight values you choose, since they depend on the bias of each individual model.
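As a sanity check on that claim, here is a minimal sketch that inspects an extracted LoRA (hypothetical filename, kohya safetensors format assumed) and reports how much of it sits on the text encoder versus the UNet, via the "lora_te_"/"lora_unet_" key prefixes.

```python
# Sketch: count text-encoder vs. UNet tensors in a kohya-format LoRA file.
# A caption/TEnc-only LoRA should be dominated by (or contain only) lora_te_ keys.
from safetensors import safe_open

with safe_open("girl_caption_lora.safetensors", framework="pt") as f:
    keys = list(f.keys())

te_keys = [k for k in keys if k.startswith("lora_te_")]
unet_keys = [k for k in keys if k.startswith("lora_unet_")]
print(f"text-encoder tensors: {len(te_keys)}")
print(f"unet tensors:         {len(unet_keys)}")
```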
Idea:
The finetuning of the model takes less time than generating an image, and extracting the LoRA difference takes a little more than that, but the whole process of moving between all the interfaces and steps, preparing/changing the captions to train on in the directories, etc., when I'm still not always clear on which caption will get me to my end goal, is a journey.
I don't understand what is required for a LoRA to be built (would it require a reference point if it's just influencing the CLIP, etc.?), so I'm not sure how much can be accomplished.
However, if there were a way to generate this directly, especially in sd-webui-additional-networks (noting again that the 'training' part of this process took less time than an image generation), that would be incredible. Even just being able to generate a LoRA in isolation on the TEnc (with adjustable repeats to make the starting strength flexible), so you don't have to go through all the steps required to prepare training and extract a model difference, would make the process a lot more approachable; a rough sketch of that idea is below.
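To make the idea concrete, here is a rough sketch of what "a LoRA in isolation on the TEnc" could boil down to: take the weight difference between a briefly finetuned text encoder and the original, and low-rank factor it into LoRA up/down matrices, never touching the UNet. Everything here is an assumption for illustration (the module filtering, the key naming, the hypothetical finetuned-encoder path), not how sd-scripts actually implements extraction.

```python
# Sketch: build a text-encoder-only LoRA from the difference between a base
# CLIP text encoder and a briefly finetuned copy. Key naming follows kohya's
# "lora_te_..." convention but is illustrative, not a drop-in implementation.
import torch
from safetensors.torch import save_file
from transformers import CLIPTextModel

RANK = 8  # LoRA dimension for the low-rank factorisation

base = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
tuned = CLIPTextModel.from_pretrained("path/to/finetuned-text-encoder")  # hypothetical path

state = {}
for (name, w_base), (_, w_tuned) in zip(base.named_parameters(), tuned.named_parameters()):
    # Only the 2-D linear weights (attention/MLP projections) get LoRA pairs.
    if w_base.dim() != 2 or "embeddings" in name:
        continue
    delta = (w_tuned.detach() - w_base.detach()).float()
    # Low-rank approximation of the weight change: delta ~ up @ down.
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    up = (U[:, :RANK] * S[:RANK]).contiguous()   # (out_features, rank)
    down = Vh[:RANK, :].contiguous()             # (rank, in_features)
    key = "lora_te_" + name.replace(".weight", "").replace(".", "_")
    state[f"{key}.lora_up.weight"] = up
    state[f"{key}.lora_down.weight"] = down
    state[f"{key}.alpha"] = torch.tensor(float(RANK))

save_file(state, "caption_bias_te_only.safetensors")
```

If something along these lines ran right after the quick caption finetune, the GUI round trips and the full model-difference extraction could be skipped.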