add masked loss implementation #589
Conversation
Thank you for this! The implementation is simple and wonderful. However, as I mentioned here: #236 (comment) , I would like to integrate the ControlNet dataset with the mask dataset. This is because in the future I would like to be able to handle some additional images in addition to ControlNet and masks in a generic way in the dataset. Also, it is redundant to have different processes for ControlNet and masks considering bucketing, caching (memory and disk), cropping etc... I intend to extend these processes to mask loss after the ControlNet PR #551 is complete. If it seems that pull request #551 will take a long time to complete, I think it's possible that I could merge this pull request first. Please give me a little time to consider. Also, please understand that there is a possibility I may close this PR without merging it. |
No worries, if you decide to merge the other PR first I will rebase again |
I'm getting some errors with
As well as line 1037:
Is there a better way to ensure compatibility with bucketing? Thanks! |
Hi, I moved the initialization of Seems to work in my (brief) testing. EDIT: Hmm, it's blowing out the exposure on my DreamBooth models. Not sure if it's an issue with DB or my bucketing fix. Is it normal for masking to cause the model to trend toward white? |
I've been using this with square images and masks of various sizes. The important part is that the mask must be in grayscale format, with dimensions exactly matching the corresponding training input. |
I haven't observed anything like this, do you have an example you could share? |
@Elevory It is possible that Perhaps try without? |
Hi @AI-Casanova , Thank you for the suggestion. Unfortunately, removing the Masked: I may try preparing another dataset with images at a single resolution to see if bucketing is somehow responsible, but I find it unlikely. Do you think regularization images might be at fault here? DreamBooth requires hundreds of pictures of the object class, and I haven't created masks for those. Are they perhaps having a stronger-than-intended effect on training? EDIT: I made a couple interesting discoveries. First, if I use Second, I found out that my custom token--which was gibberish--had a tendency to produce black-and-white images in the base model. I switched to a different token and, so far, the exposure problem is somewhat reduced. Still need to run more tests. EDIT 2: Tried disabling some other things to no avail: |
I think I got it working! When I moved the initialization of the I'm attaching my updated copies of |
any example images? |
I've just reimplemented this on top of the sdxl branch: https://github.com/briansemrau/kohya-ss-sd-scripts/tree/sdxl_lora_mask Feel free to pull it in |
Should probably make a pull request |
implement mask loading from mask folder (ea6ac42 to 7de0550)
Update:
I've tested this with LoRA for SD 1.5 and SDXL. I have not yet tested other training methods; if someone would volunteer, it would be great :) |
I have removed the mask rescaling by mean value. I am concerned that it can make the loss magnitude fluctuate in a hard-to-predict manner, because the loss would be scaled differently for each image and cause an "uneven" learning process. Rescaling would increase the loss magnitude, but I think we can achieve a similar effect just by adjusting the learning rate. |
So what's the dataset configuration for masks? What about multiple concepts:
Is this fine? For some reason I'm running into this error when using this configuration.
|
It will look for a folder named "mask" inside the dataset folder. Each mask image should have the same name as the training image, but end with `.png`. Do you have this error when using a single dataset or multiple datasets? Do you see any message related to mask files in the console? If a mask fails to load, it should print something. |
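For anyone wiring this up by hand, a small sanity-check script along those lines might look like the sketch below (illustrative only; the exact file-naming rule is whatever the branch actually implements):

```python
from pathlib import Path
from PIL import Image

def check_mask(image_path: str) -> None:
    # Convention described above: masks live in a "mask" sub-folder inside the
    # dataset folder, share the training image's file name, and are saved as PNG.
    img_path = Path(image_path)
    mask_path = img_path.parent / "mask" / (img_path.stem + ".png")
    if not mask_path.exists():
        print(f"no mask found for {img_path.name}")
        return
    img = Image.open(img_path)
    mask = Image.open(mask_path)
    if mask.mode != "L":
        print(f"{mask_path.name}: expected grayscale mode 'L', got {mask.mode}")
    if mask.size != img.size:
        print(f"{mask_path.name}: size {mask.size} does not match image size {img.size}")
```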
I somewhat figured it out: that error occurred when caching latents. There were other errors about tensor size mismatches when I used all kinds of resolutions for pictures and masks (the resolutions of corresponding pictures and masks matched), but they disappeared when I rescaled them all to 768,768. It's like it was trying to hook up a different mask and got a resolution mismatch? Maybe that one occurred because I use a batch size of 3 instead of 1? Also, does the training resolution have to match the resolution of the dataset? Because when I tried to train at 512,512 with a 768,768 dataset I ran into an error. So, to start a training run I had to:
|
Right now it does not work with disk caching of latents, but in-memory caching should work. Also, I've been testing with a dataset that has images of different sizes and it is working fine. When the mask does not exactly match the corresponding image, it should resize the mask automatically. |
One suggestion: when using masks, consider dividing the loss by the mean of the mask. The idea is that you don't want a masked image's loss, relative to the other samples, to be affected by how much of the image is masked out. For the loss, you really only care about how close the predicted noise got to the actual noise for the kept pixels. Right now, the more of an image is masked out, the more its loss is damped, because of all the zero-MSE pixels included in the calculation. In the current implementation, the mask "knocks out" the contribution to the loss from everything but the kept pixels, which means that, all else equal, images with less masking will have higher loss. By dividing the loss by the mean of the mask, you boost the observed loss by the same amount it was reduced by zeroing out the masked pixels. In my empirical tests, this vastly improved the results of my training. My implementation looks like this:

```python
loss_div = 1.0
if args.masked_loss and batch["masks"] is not None:
    mask = get_latent_masks(batch["masks"], noise_pred.shape, noise_pred.device)
    noise_pred = noise_pred * mask
    target = target * mask
    loss_div = mask.mean()
    if loss_div == 0:
        loss_div = 1.0

loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none")
loss = loss.mean([1, 2, 3])
loss_weights = batch["loss_weights"]  # per-sample weight
loss = loss * loss_weights / loss_div
```
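The `get_latent_masks` helper referenced above isn't shown in the thread; a rough sketch of what such a helper could do (hypothetical, assuming masks arrive as a (B, H, W) float tensor in [0, 1]) is to downsample the pixel-space masks to the latent resolution and broadcast them across channels:

```python
import torch
import torch.nn.functional as F

def get_latent_masks(masks: torch.Tensor, latent_shape, device) -> torch.Tensor:
    # masks: (B, H, W) in pixel space; latent_shape: (B, C, h, w) of the noise prediction.
    b, c, h, w = latent_shape
    m = masks.unsqueeze(1).to(device=device, dtype=torch.float32)  # (B, 1, H, W)
    m = F.interpolate(m, size=(h, w), mode="bilinear", align_corners=False)
    return m.expand(b, c, h, w)
```

This sketch also reflects the automatic-resize behavior mentioned earlier: a mask that doesn't exactly match its image still gets interpolated down to the latent size.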
This is something I had in a previous commit but decided to roll back. I am not entirely sure dividing by the mean is the best "ratio"; I guess we need further experimentation. |
My thought on dividing by the mean: sure, the overall loss will be lower with masks, but the goal is to speed up convergence by eliminating extraneous input, and the step size can be compensated for by raising the overall learning rate. Besides, adjusting a squared loss by a linear factor is far from perfect. |
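For reference, the scaling both sides are describing can be written out explicitly (assuming a binary mask for simplicity; $\hat\epsilon$ and $\epsilon$ denote predicted and actual noise):

$$
\frac{1}{N}\sum_{i=1}^{N} m_i\,(\hat\epsilon_i-\epsilon_i)^2
\;=\; p \cdot \frac{1}{pN}\sum_{i:\,m_i=1} (\hat\epsilon_i-\epsilon_i)^2,
\qquad p = \bar m,
$$

i.e. the reduced loss equals the MSE over the kept pixels scaled by the kept fraction $p$. Dividing by $\bar m$ recovers the per-kept-pixel average for each image, whereas raising the learning rate can only compensate for the average amount of masking across the whole dataset, not per image.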
How exactly does the mask work? Is it like this: |
Precisely. Though I now think the mask should be applied to the loss itself instead of the noise, because |
We can apply the mask to the noise; we just have to use the square root of the mask. For gray regions we're just reducing the loss magnitude in those regions, so in practice what we have is kind of a dynamic learning rate. That is, a gray value of 0.5 means we're cutting the LR in half. |
Yes, we can sqrt the mask. But I'm looking at a clean implementation. We can just take the unreduced loss, and call a |
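A minimal sketch of the two variants being discussed, assuming a mask tensor already broadcast to the latent shape with values in [0, 1] (names are illustrative, not the PR's actual code):

```python
import torch.nn.functional as F

def masked_loss_via_sqrt(noise_pred, target, mask):
    # Scaling both inputs by sqrt(mask) scales the squared error by mask,
    # so gray values end up weighting the loss linearly as intended.
    m = mask.sqrt()
    loss = F.mse_loss(noise_pred * m, target * m, reduction="none")
    return loss.mean([1, 2, 3])

def masked_loss_on_unreduced(noise_pred, target, mask):
    # Cleaner variant: weight the unreduced loss directly; no sqrt needed.
    loss = F.mse_loss(noise_pred, target, reduction="none")
    return (loss * mask).mean([1, 2, 3])
```

Both give the same result, since multiplying the inputs by sqrt(m) multiplies the squared difference by m.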
I recommend doing this kind of mask processing outside the training code; it is easy to do with a small Python script. If we are going to add a new parameter every time we need a new operation, the list would grow indefinitely. |
I've been testing masks with an all-black background and it seems to be working fine. However, I do provide lighting "cues" in my captions, which is probably helping. |
I do agree with this approach, but in practice it's prohibitively complicated. We'd make masks for the things we want, but would then need to create new masks for the different minimums, which requires modifying directories or putting the originals elsewhere. It gets more complicated if you have many subset directories and need to modify each one separately. It can also cause false positives if you don't know whether you set all the masks to the same value for testing purposes. I use upwards of 30 different dataset subsets, which can make individually modifying all the masks much more complicated and error prone. Scripting against my dataset_config toml is an option, though I would still need to manage different minimum masks for each subset. I believe Kohya already supports many options for different things to allow a lot of experimentation to happen. If we head down a path of indefinite growth of arguments, there is the option of specific masked arguments. In this case I'm recommending one option that gives a clear, testable, reproducible result and reduces complication, false positives, and other mistakes that can confuse the trainer. Provided the code works properly, of course. Ultimately it's a suggestion, so I'd be down with whatever is chosen. Thanks. |
I understand your concern, in fact it is something I also struggled with... that is why I am building a set of tools to make preparing and modifying training data very easy. This is something I am planning to open-source soon. |
Am I wrong or not? In a RunPod notebook my dataset is organized like this: Does that mean that my masks should be organized like: |
correct |
Hello, I thought I'd throw in my opinion here. Which is easy for me to do, as I'm not the person doing any of the work. And I really appreciate that you people are actually doing the work here, so please don't mind me saying stuff. I also think that the masks subdirectory seems like it'll be a burden to work with. I do realize that the idea of 'use the alpha channel of the training image' had difficulties when it came to knowing whether premultiplied alpha was being used in the source .png files or not. But premultiplied alpha does not affect any pixel that has an alpha of 0.0 or an alpha of 1.0. I doubt many people will care about alpha values that are not 0 or 1; they mostly want to just delete noisy objects from their training images. All you need is alpha=0 for the parts of the image that are to be deleted, and alpha=1 for the parts you want to keep. If you just assumed premultiplied alpha was either present in the .png files or not, either way would be fine, because alpha=0 and alpha=1 work the same in both cases. Then a command line parameter could be added to override that default if it was actually important to someone, but I imagine no one would ever care enough to use that flag. Thank you for listening to my suggestion. And thank you for the work you've all done so far. |
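If the alpha-channel route were ever taken, deriving a grayscale mask image from the alpha channel is only a few lines per file; a rough sketch (hypothetical helper, not part of this PR) that treats alpha = 0 as "ignore" and anything else as "keep", so premultiplication indeed doesn't matter:

```python
import numpy as np
from PIL import Image

def alpha_to_mask(image_path: str, mask_path: str) -> None:
    # Read the alpha channel and binarize it: 0 -> black (ignored), >0 -> white (kept).
    alpha = np.asarray(Image.open(image_path).convert("RGBA"))[:, :, 3]
    mask = np.where(alpha > 0, 255, 0).astype(np.uint8)
    Image.fromarray(mask, mode="L").save(mask_path)
```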
I apologize for the long delay in responding to this PR. I've implemented the masked loss functionality in the Currently, I would be happy to assist with testing. |
Great to see this moving ahead, I feel it's a valuable feature for the training scripts. Thanks for getting this working. :) Do you know if it is difficult to port it to sdxl_train.py as well, for fine tuning instead of generating a LoRA? |
I've updated the branch to support masked loss with |
I ran a test with SDXL, but the fine-tuned model produces (fully) black images. I'll try the same config without masking today to see if this is an issue with my config, but I'm posting this as a heads-up in the meantime. I'm using the following command line (formatted for easier reading):
and
It could be that I've mis-configured the dataset.toml, as this is the first time I've used one, but training did appear to be proceeding as expected, with W&B showing a reasonable average loss at 1k steps, though there was a very high initial spike that I haven't seen before. |
Testing today, I spotted the mix of Adafactor/DAdaptation settings I had in the above command line ( SDXL support appears to be working well - training is still running now, but early checkpoints clearly show learning of masked areas only. 👍 |
Hi, I'm testing it out now for SDXL fine tuning. One thing I think might help people get used to the feature is adding something to the debug output. Maybe an INFO line of 'no mask images' or '<x> mask images found', maybe next to the line about there being no regularization images found. Nice work getting it in. And that future ControlNet-guided training sounds interesting too. |
So I gave this code a much stronger test today, and placed specific objects in a series of 10 test images, and then masked them out. And I found that they were being learned, so my masks were not working. This was because I was not passing --masked_loss as a parameter to sdxl_train.py. But when I added this parameter, I just got an error message that didn't explain what to do:
I then read @jordoh 's setup, above. It turns out that if you want to use the masked images, it seems that you must use a .toml file to set up the conditioning_data_dir, which should be set to the mask image directory. With that in place, the specific objects I had masked out stopped showing up in my sample output, so the masks do work. I don't generally use .toml files, so would it be possible to add a command line option, --conditioning_data_dir, to achieve the same result? I don't think bmaltais's webui supports .toml files. Or, just look in the 'mask' subdirectory by default for the masks? That's how I thought it worked when I was doing training yesterday, but my masks weren't being used at all. It's easy to talk yourself into believing that they're having an effect, even when they're not. I really do think some debug INFO lines are important to clarify if masks have been found or not. |
After that, I tried copying that .toml from my directory with 10 test images to my real training dataset. I updated the paths inside the .toml for the images and masks, but when I run sdxl_train.py, I just get:
I have no idea what this error means. It goes away if I delete the conditioning_data_dir = "[...]" bit from my .toml file, but then the masks won't work of course. Any thoughts what the cause / fix for this error message is? |
Can you show your dataset toml? Here's an example of what's working for me:
You could then add |
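The attached example didn't survive the copy here, but a minimal dataset config along the lines being discussed would look roughly like this (paths are placeholders, and the exact key set depends on the branch, so treat it as a sketch rather than a reference):

```toml
[general]
enable_bucket = true

[[datasets]]
resolution = 1024
batch_size = 1

  [[datasets.subsets]]
  image_dir = "/path/to/train/images"
  caption_extension = ".txt"
  num_repeats = 1
  conditioning_data_dir = "/path/to/train/masks"
```

With something like that in place, the `--masked_loss` flag still has to be passed on the command line, as noted further down in the thread.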
Thanks for asking, @cheald . Here's my dataset.toml:
I've tried it both with and without the trailing slashes on the ends of the paths. Also, I've copied and pasted those directories into Ubuntu Dolphin, to check they definitely exist, and they do. This same .toml worked for my test dataset of 10 images with 10 mask images. It just doesn't work for my actual dataset, which has far more images. I don't have masks for all of them yet, but that's not supposed to be necessary. |
If it's not expecting The trainer will expect masks for each of your inputs (and no extras), but you'll get an error message about that once you get past the schema checks. |
Aha yep that's it. I had that --masked_loss parameter set for my 10 image test, but I forgot to copy it across to my real training set. Thanks, @cheald . It might be worth upgrading that error message, or having --masked_loss set implicitly when people pass in a conditioning_data_dir. |
I found an issue that if the training image is a .jpg file, it seems that the mask image also has to be a .jpg, which is not likely to be something you want to do. You get this error message if you have a .jpg for the training image and a .png for the mask image:
|
Thank you for letting me know. I will fix it soon. |
Is it possible to use masked loss with a regularization subset? Ideally masks for regularization images also, but for now I'd be happy if it worked just for the dataset when there are also regularization images. Calling the test script (from a notebook):
Toml:
When the reg subset has a
When
It runs when the entire reg subset is removed. |
I've done some tests using the I will close this PR once the feature lands in |
Unfortunately the dataset with |
Thanks for clearing this up! I once tried to use reg images as training images and the results were pretty different from a run with them as reg, but I didn't alter repeats. When you say to set repeats so they're the same, do you mean that if I originally have something like 150 images in 2_trainingImages (150 * 2 = 300), then to achieve the closest result (and be able to use masks!) I must turn them into: Dumb example with weird unnecessary repeats, I know; I just want to make sure I understood it correctly. |
This is mostly a rebase of #236.
Relevant differences:
- Instead of `.mask` files, it will now look for a matching PNG file in the `mask` sub-directory