forked from bmaltais/kohya_ss
Merge pull request bmaltais#1428 from bmaltais/dev2
v21.8.8
Showing 63 changed files with 4,968 additions and 815 deletions.
Version file: `v21.8.7` → `v21.8.8`
# About ControlNet-LLLite

__This is an extremely experimental implementation and may change significantly in the future.__

## Overview

ControlNet-LLLite is a lightweight version of [ControlNet](https://github.com/lllyasviel/ControlNet). The name means "LoRA Like Lite": a lightweight ControlNet whose structure is inspired by LoRA. Currently only SDXL is supported.

## Sample weight files and inference

Sample weight files are available here: https://huggingface.co/kohya-ss/controlnet-lllite

A custom node for ComfyUI is available: https://github.com/kohya-ss/ControlNet-LLLite-ComfyUI

Generated samples are at the end of this page.
## Model structure

A single LLLite module consists of a conditioning image embedding that maps the control image (hereafter, conditioning image) into a latent space, and a small network with a structure somewhat similar to LoRA. As with LoRA, LLLite modules are added to the Linear and Conv layers of the U-Net. See the source code for details.

Due to limitations of the inference environment, modules are currently added only to CrossAttention (q/k/v of attn1 and q of attn2).

## Model training

### Preparing the dataset

In addition to the normal dataset, store the conditioning images in the directory specified by `conditioning_data_dir`. A conditioning image must have the same basename as its training image, and it is automatically resized to the same size as the training image.
```toml
[[datasets.subsets]]
image_dir = "path/to/image/dir"
caption_extension = ".txt"
conditioning_data_dir = "path/to/conditioning/image/dir"
```

As a current limitation, `random_crop` cannot be used.

For training data, it seems best to use images generated with the original model as the training images and processed versions of them as the conditioning images. If the training images have a style different from the original model's, the model has to learn that style in addition to the control. Because ControlNet-LLLite has little capacity, it is not well suited to learning styles.

If you use something other than generated images as training images, increase the dimensions described below.
### Training

To train with the script, run `sdxl_train_control_net_lllite.py`. You can specify the dimension of the conditioning image embedding with `--cond_emb_dim` and the rank of the LoRA-like module with `--network_dim`. The other options follow `sdxl_train_network.py`, but `--network_module` does not need to be specified.

Training uses a large amount of memory, so enable memory-saving options such as caching and gradient checkpointing. Using BFloat16 with the `--full_bf16` option is also effective (it requires an RTX 30 series or later GPU). Training has been confirmed to work with 24GB of VRAM.
For the sample Canny model, the dimension of the conditioning image embedding is set to 32, and the rank of the LoRA-like module is 64. Adjust them to the characteristics of the conditioning images you are targeting.

(The sample Canny task is probably fairly difficult. For depth and the like, about half these values may be sufficient.)
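For reference, a minimal invocation might look like the following sketch. Only `--cond_emb_dim` and `--network_dim` are specific to this script; `accelerate launch` and the model, dataset-config, and output options are assumed to follow the usual `sdxl_train_network.py` style, and all paths are placeholders.

```
# sketch: train a Canny LLLite with the dimensions mentioned above (paths are placeholders)
accelerate launch sdxl_train_control_net_lllite.py \
  --pretrained_model_name_or_path=path/to/sdxl_model.safetensors \
  --dataset_config=path/to/dataset_config.toml \
  --output_dir=path/to/output --output_name=my_lllite \
  --cond_emb_dim=32 --network_dim=64
```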
### Inference

To generate images with a script, run `sdxl_gen_img.py`. You can specify the LLLite model file with `--control_net_lllite_models`; the dimensions are read automatically from the model file.

Specify the conditioning image for inference with `--guide_image_path`. Note that no preprocessing is performed, so for Canny, for example, specify an image that has already been processed with Canny (white lines on a black background). `--control_net_preps`, `--control_net_weights`, and `--control_net_ratios` are not supported.
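As a rough sketch, an invocation might look like this. Only `--control_net_lllite_models` and `--guide_image_path` are LLLite-specific; `--ckpt`, `--prompt`, and `--outdir` are assumed to be the script's usual generation options, and all paths and the prompt are placeholders.

```
# sketch: generate with a trained LLLite model and a pre-processed Canny guide image
python sdxl_gen_img.py \
  --ckpt path/to/sdxl_model.safetensors \
  --control_net_lllite_models path/to/my_lllite.safetensors \
  --guide_image_path path/to/canny_guide.png \
  --prompt "a dog running in a park" \
  --outdir outputs
```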
## Acknowledgments

I would like to thank lllyasviel, the author of ControlNet, furusu, who provided advice on the implementation and helped resolve problems, and ddPn08, who implemented the ControlNet dataset.
## Samples

Canny

![kohya_ss_girl_standing_at_classroom_smiling_to_the_viewer_class_78976b3e-0d4d-4ea0-b8e3-053ae493abbc](https://github.com/kohya-ss/sd-scripts/assets/52813779/37e9a736-649b-4c0f-ab26-880a1bf319b5)

![im_20230820104253_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/c8896900-ab86-4120-932f-6e2ae17b77c0)

![im_20230820104302_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/b12457a0-ee3c-450e-ba9a-b712d0fe86bb)

![im_20230820104310_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/8845b8d9-804a-44ac-9618-113a28eac8a1)
# About ControlNet-LLLite

__This is an extremely experimental implementation and may change significantly in the future.__

The Japanese version is [here](./train_lllite_README-ja.md).

## Overview

ControlNet-LLLite is a lightweight version of [ControlNet](https://github.com/lllyasviel/ControlNet). The name stands for "LoRA Like Lite": it is inspired by LoRA and has a lightweight structure. Currently, only SDXL is supported.

## Sample weight files and inference

Sample weight files are available here: https://huggingface.co/kohya-ss/controlnet-lllite

A custom node for ComfyUI is available: https://github.com/kohya-ss/ControlNet-LLLite-ComfyUI

Sample images are at the end of this page.
## Model structure

A single LLLite module consists of a conditioning image embedding that maps a conditioning image into a latent space, and a small network with a structure similar to LoRA. As with LoRA, LLLite modules are added to the Linear and Conv layers of the U-Net. Please refer to the source code for details.

Due to the limitations of the inference environment, modules are currently added only to CrossAttention (q/k/v of attn1 and q of attn2).

## Model training

### Preparing the dataset

In addition to the normal dataset, store the conditioning images in the directory specified by `conditioning_data_dir`. A conditioning image must have the same basename as its training image and is automatically resized to the same size as the training image. An example directory layout is shown after the configuration snippet below.
```toml
[[datasets.subsets]]
image_dir = "path/to/image/dir"
caption_extension = ".txt"
conditioning_data_dir = "path/to/conditioning/image/dir"
```
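For example, the layout might look like this (hypothetical file names); each conditioning image is matched to a training image by its basename:

```
path/to/image/dir/
├── 0001.png   # training image
├── 0001.txt   # caption
├── 0002.png
└── 0002.txt
path/to/conditioning/image/dir/
├── 0001.png   # conditioning image for 0001.png (e.g. Canny edges)
└── 0002.png
```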

At the moment, `random_crop` cannot be used.

For training data, it seems best to use images generated with the original model as the training images and processed versions of them as the conditioning images. If the training images have a style different from the original model's, the model has to learn that style in addition to the control. Because ControlNet-LLLite has little capacity, it is not well suited to style learning.

If you use something other than generated images as training images, increase the dimensions described below.
### Training

Run `sdxl_train_control_net_lllite.py`. You can specify the dimension of the conditioning image embedding with `--cond_emb_dim` and the rank of the LoRA-like module with `--network_dim`. The other options follow `sdxl_train_network.py`, but `--network_module` does not need to be specified.

A large amount of memory is used during training, so please enable memory-saving options such as caching and gradient checkpointing. Using BFloat16 with the `--full_bf16` option is also effective (it requires an RTX 30 series or later GPU). Training has been confirmed to work with 24GB of VRAM.
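As a rough sketch (not a verified recipe), a training invocation with the memory-saving options above and the sample Canny dimensions discussed below might look like the following. Only `--cond_emb_dim` and `--network_dim` are specific to this script; the remaining flags are assumed to be the usual `sdxl_train_network.py` options, and all paths are placeholders.

```
# sketch: Canny LLLite training with latent caching, gradient checkpointing and full bf16
accelerate launch sdxl_train_control_net_lllite.py \
  --pretrained_model_name_or_path=path/to/sdxl_model.safetensors \
  --dataset_config=path/to/dataset_config.toml \
  --output_dir=path/to/output --output_name=canny_lllite \
  --cond_emb_dim=32 --network_dim=64 \
  --cache_latents --gradient_checkpointing \
  --mixed_precision=bf16 --full_bf16
```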
For the sample Canny model, the dimension of the conditioning image embedding is 32 and the rank of the LoRA-like module is 64. Adjust them according to the characteristics of the conditioning images you are targeting.

(The sample Canny task is probably quite difficult. It may be fine to use about half these values for depth and the like.)
### Inference

To generate images with a script, run `sdxl_gen_img.py`. You can specify the LLLite model file with `--control_net_lllite_models`; the dimensions are read automatically from the model file.

Specify the conditioning image to be used for inference with `--guide_image_path`. Since no preprocessing is performed, for Canny, for example, specify an image that has already been processed with Canny (white lines on a black background). `--control_net_preps`, `--control_net_weights`, and `--control_net_ratios` are not supported.
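A sketch of such an invocation, assuming the script's usual generation options (`--ckpt`, `--prompt`, `--W`/`--H`, `--outdir`); only `--control_net_lllite_models` and `--guide_image_path` are LLLite-specific, and all paths and the prompt are placeholders.

```
# sketch: SDXL generation controlled by an LLLite model and a Canny guide image
python sdxl_gen_img.py \
  --ckpt path/to/sdxl_model.safetensors \
  --control_net_lllite_models path/to/canny_lllite.safetensors \
  --guide_image_path path/to/canny_guide.png \
  --prompt "a dog running in a park" \
  --W 1024 --H 1024 --outdir outputs
```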
## Credit

I would like to thank lllyasviel, the author of ControlNet, furusu, who provided me with advice on implementation and helped me solve problems, and ddPn08, who implemented the ControlNet dataset.
## Sample

Canny

![kohya_ss_girl_standing_at_classroom_smiling_to_the_viewer_class_78976b3e-0d4d-4ea0-b8e3-053ae493abbc](https://github.com/kohya-ss/sd-scripts/assets/52813779/37e9a736-649b-4c0f-ab26-880a1bf319b5)

![im_20230820104253_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/c8896900-ab86-4120-932f-6e2ae17b77c0)

![im_20230820104302_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/b12457a0-ee3c-450e-ba9a-b712d0fe86bb)

![im_20230820104310_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/8845b8d9-804a-44ac-9618-113a28eac8a1)