Releases: Sxela/WarpFusion
v0.24
Changelog:
- add FreeU Hack from https://huggingface.co/papers/2309.11497
- add option to apply FreeU before or after controlnet outputs
- add inpaint-softedge and temporal-depth controlnet models
- auto-download inpaint-softedge and temporal-depth checkpoints
- fix sd21 lineart model not working
- refactor get_controlnet_annotations a bit
- add inpaint-softedge and temporal-depth controlnet preprocessors
- fix controlnet preview (next_frame error)
- fix dwpose 'final_boxes' error for frames with no people
- move width_height to video init cell to avoid people forgetting to run it to update width_height
- fix xformers version
- fix flow preview error for fewer than 10 frames
- fix pillow errors (UnidentifiedImageError: cannot identify image file)
- fix timm import error (isDirectory error)
- deprecate v2_depth model (use depth controlnet instead)
- fix pytorch dependencies error
- fix zoe depth error
- move installers to github repo
FreeU
GUI - misc - apply_freeu_after_control, do_freeunet
This hack lowers the effect of the Stable Diffusion UNet residual skip-connections, prioritizing the core concepts of the image over fine high-frequency details. As you can see in the video, with FreeU on the image looks less cluttered, but still has enough high-frequency detail. apply_freeu_after_control applies the hack after getting the input from controlnets, which for me produced slightly worse results.
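For reference, here's a rough sketch of the FreeU idea (not WarpFusion's exact implementation; the scaling factors and channel split below are illustrative):

import torch
import torch.fft as fft

def freeu_scale(backbone: torch.Tensor, skip: torch.Tensor, b: float = 1.2, s: float = 0.9):
    # amplify part of the backbone (decoder) features
    half = backbone.shape[1] // 2
    backbone = backbone.clone()
    backbone[:, :half] *= b
    # damp the low-frequency band of the skip-connection features via FFT
    spec = fft.fftshift(fft.fftn(skip.float(), dim=(-2, -1)), dim=(-2, -1))
    h, w = spec.shape[-2:]
    mask = torch.ones_like(spec.real)
    mask[..., h // 4:3 * h // 4, w // 4:3 * w // 4] = s  # central region = low frequencies
    skip = fft.ifftn(fft.ifftshift(spec * mask, dim=(-2, -1)), dim=(-2, -1)).real
    return backbone, skip.to(backbone.dtype)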
Inpaint-softedge controlnet
I've experimented with mixed-input controlnets. This one works the same way the inpaint controlnet does, plus it uses softedge input for the inpainted area, so it relies not only on the masked area's surroundings but also on the softedge filter output for the masked area, which gives a little more control.
Temporal-depth controlnet
This one takes the previous frame + current frame depth + next frame depth as its inputs.
These controlnets are experimental; you can try replacing some of your controlnets with them, e.g. depth with temporal-depth, or inpaint with inpaint-softedge.
v0.23
Changelog:
- add dw pose estimator from https://github.com/IDEA-Research/DWPose
- add onnxruntime-gpu install (update env for dw_pose)
- add dw_pose model downloader
- add controlnet preview, kudos to #kytr.ai, idea - https://discord.com/channels/973802253204996116/1124468917671309352
- add temporalnet sdxl - v1 (3-channel)
- add prores thanks to #sozeditit https://discord.com/channels/973802253204996116/1149027955998195742
- make width_height accept 1 number to resize the frame to that size keeping the aspect ratio
- add cc masked template for content-aware scheduling
- add reverse frames extraction
- move looped image to video init cell as video source mode
- fix settings not being loaded via the button
- fix bug when cc_masked_diffusion == 0
- add a message on missing audio during video export / mute exception
- go back to root dir after running rife
- add gdown install
- add deflicker for win
- add experimental deflicker from https://video.stackexchange.com/questions/23384/remove-flickering-due-to-artificial-light-with-ffmpeg
- fix linear and none blend modes for video export
- detect init_video fps and pass it down to video export with respect to the nth frame
- do not reload already loaded controlnets
- rename upscaler model path variable
- make mask use image folder correctly as a mask source
- fix gui for non controlnet-modes (getattr error in gui)
- fix video_out not defined
- fix dwpose 'final_boxes' error for frames with no people
- fix xformers version
- fix flow preview error for fewer than 10 frames
- fix pillow errors (UnidentifiedImageError: cannot identify image file)
- fix timm import error (isDirectory error)
- deprecate v2_depth model (use depth controlnet instead)
- fix pytorch dependencies error
- fix zoe depth error
- move installers to github repo
DW Pose
To select dw_pose, go to gui -> controlnet -> pose_detector -> dw_pose
Download the new install.bat, or install manually via !pip install onnxruntime-gpu gdown
Width_height max size
You can now specify a single number as the width_height setting, and it will define the max size of the frame, fitting the output inside that size while keeping the aspect ratio. For example, if you have a 1920x1080 video, width_height=1280 will downscale it to 1280x720.
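A rough sketch of the fit logic (illustrative helper, not the notebook's exact code):

def fit_max_size(w, h, max_size=1280):
    # scale the longer side down to max_size, keeping the aspect ratio
    scale = max_size / max(w, h)
    return int(w * scale), int(h * scale)

fit_max_size(1920, 1080)  # -> (1280, 720)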
Controlnet Preview
Added controlnet annotations (detections) preview. Thanks to #kytr.ai for the idea.
To enable, check only_preview_controlnet in Do the Run cell. It will take the 1st frame in frame_range from your video and generate controlnet previews for it.
Prores
add prores codec thanks to #sozeditit https://discord.com/channels/973802253204996116/1149027955998195742
Create video cell -> output_format -> prores_mov
It has better quality than h264_mp4, and a smaller size than qtrle_mov
Video Init Update
Moved looped image init to video init settings. To use init image as video init, select video_source -> looped_init_image
Added reverse option
Extract frames in reverse. For example, if a person moves into the frame, it's easier to reverse the video so that the video starts with the person inside of the frame.
Detect fps
fps will now be detected from your init_video, divided by the extract nth frame value, and used in your video export if you set your video export fps to -1.
Example: your video has 24 fps and you extract every 2nd frame. The suggested output fps will be 24/2 = 12 fps. If you set your video export fps to -1, it will use the predicted fps of 12.
v0.22
Changelog:
- fix pytorch dependencies error
- fix zoe depth error
- move installers to github repo
- fix "TypeError: eval() arg 1..." error when loading non-existent settings on the initial run
- add error message for model version mismatch
- download dummypkl automatically
- fix venv install real-esrgan model folder not being created
- fix samtrack site-packages url
- fix samtrack missing groundingdino config
- make samtrack save separate bg mask
- fix rife imports
- fix samtrack imports
- fix samtrack not saving video
- add rife
- fix samtrack imports
- fix rife imports
- fix samtrack local install for windows
- fix samtrack incorrect frame indexing if starting not from 1st frame
- fix schedules not loading
- fix ---> 81 os.chdir(f'{root_dir}/Real-ESRGAN') file not found error thanks to Leandro Dreger
- hide "warp_mode","use_patchmatch_inpaiting","warp_num_k","warp_forward","sat_scale" from gui as deprecated
- clean up gui settings setters/getters
- fix controlnet not updating in gui sometimes
SAMTrack
SAMTrack should now download prebuilt binaries for torch v2 / CUDA 11.x. If you have a different setup, you will need to get VS Build Tools: https://stackoverflow.com/questions/64261546/how-to-solve-error-microsoft-visual-c-14-0-or-greater-is-required-when-inst
Then install the CUDA toolkit for your GPU drivers/OS: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64
Then it should install itself just fine.
RIFE
Interpolates frames. Results example - in this post's video.
Settings are simple:
- exponent: the power of 2 to which to increase the fps. 1 - x2, 2 - x4, 3 - x8, etc.
- video_path: input video to interpolate. can be a folder with frames, but then you need to specify the fps manually.
- nth_frame: extracts only nth frame before interpolation
- fps: output fps (for image folder input only, fps for video will be based on input video's fps to keep the same video duration after interpolation)
If you have a high-fps output video (like 60fps), you can also try skipping frames to reduce high-frequency flicker. If you have already used the nth frame during your video render, skipping frames here may produce weird results.
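Illustrative numbers for these settings (video input assumed; the values below are made up):

input_fps = 30                          # detected from the input video
exponent = 3                            # 2**3 = x8 interpolation
output_fps = input_fps * 2 ** exponent  # 240 fps worth of frames, same duration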
v0.21
Changelog:
- add v1.x qr controlnet
- add v2.x controlnets: qr, depth, scribble, openpose, normalbae, lineart, softedge, canny, seg
- upcast custom controlnet sources to RGB from Grayscale
- add v2, v2_768 control_multi modes
- fix gitpull error
- add dummy model init for sdxl - won't download unnecessary stuff. For this to work drop control_multi_sdxl.dummypkl into warp root folder. Note that visually results will differ.
- disable dummy for non-sdxl
- fix dummy init to behave closer to the original model (redownload the dummypkl)
- fix pytorch dependencies error
- fix zoe depth error
- move installers to GitHub repo
v0.20
Changelog:
- fix pytorch dependencies error
- fix zoe depth error
- move installers to github repo
- disable dummy init for non-sdxl (they load fast enough on their own)
- add dummy model init for sdxl - won't download unnecessary stuff.
- fix flow preview
- temporarily disable reference for sdxl
- fix ModuleNotFound: safetensors error
- fix cv2.error ssize.empty error for face controlnet
- fix clip import error
- add controlnet tile
- remove single controlnets from model versions
- fix guidance for controlnet_multi (now works with much higher clamp_max)
- fix instructpix2pix not working in newer versions (kudos to #stabbyrobot)
- fix AttributeError: 'OpenAIWrapper' object has no attribute 'get_dtype' error
- add control-lora loader from ComfyUI
- add stability ai control-loras: depth, softedge, canny
- refactor controlnet code a bit
- fix tiled vae for sdxl
- stop on black frames, print a comprehensive error message
- add cell execution check (kudos to #soze)
- add skip diffusion switch to generate video only (kudos to #soze)
- add error msg when creating video from 0 frames
- fix rec noise for control multi sdxl mode
- fix control mode
- fix control_multi annotator folder error
- add sdxl diffusers controlnet loader from ComfyUI
- add sdxl controlnets
- save annotators to controlnet folder
- hide sdxl model load spam
- fix sdxl tiled vae errors (still gives black output on sdxl with vanilla vae)
- fix cc_masked_diffusion not loaded from GUI
SDXL Controlnets
SDXL controlnets are downloaded automatically from Hugging Face. For now, 5 are supported:
"control_sdxl_canny":"https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.fp16.safetensors",
"control_sdxl_depth":"https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0/resolve/main/diffusion_pytorch_model.fp16.safetensors",
"control_sdxl_softedge":"https://huggingface.co/SargeZT/controlnet-sd-xl-1.0-softedge-dexined/resolve/main/controlnet-sd-xl-1.0-softedge-dexined.safetensors",
"control_sdxl_seg":"https://huggingface.co/SargeZT/sdxl-controlnet-seg/resolve/main/diffusion_pytorch_model.bin",
"control_sdxl_openpose":"https://huggingface.co/thibaud/controlnet-openpose-sdxl-1.0/resolve/main/OpenPoseXL2.safetensors"
Also added depth, canny, and sketch Stability-ai control-loras from here: https://huggingface.co/stabilityai/control-lora
v0.19
Changelog:
- add extra per-controlnet settings: source, mode, resolution, preprocess
- add global and per-controlnet settings to gui
- add beep sounds by Infinitevibes
- add "normalize controlnet weights" toggle
- fix the beep error
- bring back init scale, fix deflicker init scale error thanks to #rebirthai
- make cc_masked_diffusion a schedule
- fix control_source error for sdxl
- fix cc_masked_diffusion not loaded from GUI
- fix ModuleNotFound: safetensors error
- fix cv2.error ssize.empty error for face controlnet
Per-controlnet Settings
gui -> controlnet
Allows finer control over controlnet settings. Note that some controlnets don't have preprocessor or resolution settings, as they use raw images at render output resolution as input.
mode:
balanced - mode used in previous versions, balanced between prompt and controlnet
controlnet - pay more attention to controlnet (seems to be buggy at the moment)
prompt - pay more attention to prompt
global - use global settings
source:
stylized - use current frame init (stylized warped previous frame)
cond_video - use cond_video frame
raw_frame - use raw init frame
color_video - use color_video frame
custom source - use frames from a custom path
global - use global settings from cond_image_src variable
for inpaint model, the default is "stylized"
resolution:
sets controlnet annotator resolution
-1 = use global settings
preprocess:
apply annotator to controlnet source.
for example, you need to disable this for depth controlnet if you have a custom depth video.
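A minimal sketch of how the "global" fallback can be thought of (hypothetical helper, not the GUI's actual code):

def resolve(per_cn_value, global_value):
    # "global" (or -1 for resolution) means "inherit the global setting"
    if per_cn_value in ("global", -1):
        return global_value
    return per_cn_value

resolve(-1, 512)                  # resolution -> 512 (global value)
resolve("raw_frame", "stylized")  # source -> "raw_frame" (per-controlnet override)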
Controlnet Global Settings
Manages global controlnet settings. The same logic applies, except the source is limited to init (raw_frame), stylized, and cond video.
Normalize Controlnet Weights
gui -> controlnet -> normalize_cn_weights
By default, controlnet weights are normalized to add up to 1. You can disable this to have higher controlnet weights.
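For example, normalizing the weights to add up to 1 works like this (illustrative helper; the controlnet names and weights are made up):

def normalize_cn_weights(weights):
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

normalize_cn_weights({"depth": 1.0, "softedge": 0.5, "openpose": 0.5})
# -> {"depth": 0.5, "softedge": 0.25, "openpose": 0.25}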
v0.18
Changelog:
- add sdxl_base support
- add sdxl refiner support
- temporarily disable loras/lycoris/init_scale for sdxl
- bring back xformers :D
- add weighted keywords support for sdxl
- clear gpu vram on render interrupt
- add sdxl lora support
- fix load settings file = -1 not getting latest file
- cutoff prompt at 77 tokens
- refactor lora support
- add other lora-like models support from automatic1111
- fix loras not being unloaded correctly
- disable deflicker scale for sdxl
Basic sdxl support for sdxl_base model
Most features work, like latent scale, cc masked diffusion.
No controlnets yet, though there are some already available.
Doesn't work with embeddings yet
The results are meh; need to try the refiner model, as it seems to work much better with img2img in ComfyUI.
Usage:
define SD + K functions, load model -> model_version -> sdxl_base
I'd suggest trying non-vanilla XL checkpoints like DreamShaperXL; they seem to work a bit better in both img2img and text2img modes (vanilla may look like a bad RealESRGAN upscale without a further refiner pass).
You'll also have to remember the old non-controlnet settings :D
style_strength [0.5,0.3] or something.
Sdxl refiner
define SD + K functions, load model -> model_version -> sdxl_refiner
Same limitations apply, but it works better as img2img, and also better at lower resolutions.
Extra networks
Added support for other lora-like models from automatic1111. You can now mix different lora-like networks in one run, like lora+lycoris; the call syntax is the same as before for all types of networks - lora:filename:weight
Also fixed loras not being unloaded when removed from prompt.
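For example, a single prompt pulling in two different network types might look like this (the filenames and weights below are made up):

{0: ['a portrait of a woman, lora:styleLora_v1:0.6 lora:detailLyco_v2:0.4']}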
v0.17
Changelog:
- add SAMTrack from https://github.com/z-x-yang/Segment-and-Track-Anything
- fix bug with 2+ masks
- print error when reloading gui fails
- renamed default_settings_path to settings_path and load_default_settings to load_settings_from_file
- fix default_settings_path error by #sozeditit
- add lycoris
- add lycoris/lora selector
- fix SAMTrack ckpt folder error ty to laplaceswzz
- fix SAMTrack video export error
- fix pillow error
Segment and track anything
Made a CLI wrapper around https://github.com/z-x-yang/Segment-and-Track-Anything to export separate alpha masks for each object.
Only tested to run in colab/docker, may require extra steps to build and install Segment-and-Track-Anything on windows.
Scroll down to Extras - Masking and tracking. Run the install cell. Wait for it to finish, then restart the notebook and run the next cell - Detection setup.
This cell is used to tweak detection on a single frame.
After you're satisfied with the detection results, run the next cell to track the whole video. Outputs will be saved to root dir / videoFrames / _masks /
You can restart runtime after that and use masks as usual.
LyCORIS
Can be used the same way as LORAs, using the lora:loraname:loraweight syntax.
You can put them in the same folder. They can't be mixed together with loras, so you'll need to switch to either lora or lycoris and only use one type of them in your prompts.
v0.16
Changelog:
- add prompt weight parser/splitter
- update lora parser to support multiple prompts. if duplicate loras are used in more than 1 prompt, last prompt lora weights will be used
- unload unused loras
- add multiple prompts support
- add max batch size
- add prompt weights
- add support for different prompt number per frame
- add prompt weight blending between frames
- bring back protobuf install
- add universal frame loader
- add masked prompts support for controlnet_multi-internal mode
- add controlnet low vram mode
- add masked prompts for other modes
- fix undefined mask error
- fix consistency error between identical frames
- add ffmpeg deflicker option to video export (dfl postfix)
- export video with inv postfix for inverted mask video
- add sd_batch_size, normalize_prompt_weights, mask_paths, deflicker_scale, deflicker_latent_scale to gui/saved settings
- fix compare settings not working in new run
- fix reference controlnet not working with multiprompt
- disable ffmpeg deflicker for local install
- fix torchmetrics version thx to tomatoslasher
- fix pillow error
- fix safetensors error
Multiple prompts
You can now use multiple prompts per frame. Just like this:
{0:['a cat','a dog']}
In this case, with no weights specified, each prompt will get a weight of 1.
You can specify weights like this: {0:['a cat:2','a dog:0.5']}
The weights should be at the end of the prompt.
normalize_prompt_weights: enable to normalize weights to add up to 1.
For example this prompt {0:['a cat:2','a dog:0.5']} with normalize_prompt_weights on will effectively have weights {0:['a cat:0.8','a dog:0.2']}
Prompt weights can be animated, but the weight is applied to the prompt's position in the list, not its exact text. So {0:['prompt1:1', 'prompt2:0'], 5: ['prompt1:0', 'prompt3:1']} will blend the weights but not the prompts: you will have prompt2 until frame 5, then it will be replaced with prompt3, while the weights are animated, so a prompt for a frame between 0 and 5 will look like ['prompt1:0.5', 'prompt2:0.5']
You can have a different number of prompts per frame, but the weights for prompts missing in a frame will be set to 0
For example, if you have:
{0:['a cat:1', 'a landscape:0'], 5: ['a cat:0', 'a landscape:1'], 10:['a cat:0', 'a landscape:1', 'a galaxy:1']}
The 'a galaxy' prompt will have 0 weight for all frames where it's missing, and weight 1 at frame 10
Each additional prompt adds +50% to vram usage and diffusion render times.
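A small sketch of the weight handling described above (illustrative, not the notebook's exact code):

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

normalize([2, 0.5])     # -> [0.8, 0.2], matching the example above

def blend(w_start, w_end, frame, start_frame, end_frame):
    # linear blend of one prompt slot's weight between two keyframes
    t = (frame - start_frame) / (end_frame - start_frame)
    return w_start * (1 - t) + w_end * t

blend(1, 0, 2.5, 0, 5)  # slot 1 weight halfway between frames 0 and 5 -> 0.5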
Masked prompts
You can now use masks for your prompts. The logic is a bit complicated, but I hope you'll get the idea.
You can use masks if you have more than one prompt.
The first prompt is always the background prompt, you don't need a mask for it.
If you decide to use masks, you will need to provide them for every prompt other than the 1st one. Each next prompt+mask is placed on top of the previous one, and only white areas of the mask are preserved. For example, if your 2nd prompt's mask completely covers the 1st prompt's mask, you will not see the 1st prompt in the output, as it will be fully covered by the 2nd prompt's mask.
You need to specify the path to your mask frames/video in the mask_paths variable. For 2 prompts you will need 1 mask, for 3 prompts - 2 masks, etc.
Leave mask_paths=[] to disable prompt masks. Enabling prompt masks will effectively disable prompt weights.
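For example, with 3 prompts you'd provide 2 masks (the paths below are made up):

mask_paths = ['/content/masks/person_frames', '/content/masks/sky.mp4']
# prompt 1 = background (no mask), prompt 2 uses the 1st mask, prompt 3 the 2nd
# mask_paths = []  # disables prompt masks (prompt weights are used instead)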
Max_batch_size
By default, your image is diffused with a batch size of 2, consisting of a conditioned and an unconditioned image (positive and negative prompt). When we add more prompts, we need to diffuse more images, one extra image per extra prompt.
Depending on your gpu vram, you can decide to increase batch size to process more than 2 prompts at a time.
You can set batch size to 1 to reduce VRAM usage even with 1 prompt.
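Roughly, the number of forward passes per diffusion step works out like this (illustrative helper):

import math

def passes_per_step(num_prompts, batch_size):
    images = num_prompts + 1  # one conditioned image per prompt + 1 unconditioned
    return math.ceil(images / batch_size)

passes_per_step(1, 2)  # 1 pass: the classic cond + uncond pair
passes_per_step(3, 2)  # 2 passes
passes_per_step(1, 1)  # 2 passes, lowest VRAM usage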
Controlnet low vram fix
Enable to juggle controlnets between CPU and GPU on each call. It is very slow but saves a lot of VRAM. Right now, all controlnets are loaded to GPU RAM once per frame and offloaded afterwards, so they are only kept on the GPU during diffusion.
With controlnet_low_vram=True, all controlnets will stay offloaded to CPU and only be loaded to GPU when called during diffusion, then offloaded back to CPU, each diffusion step.
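Conceptually, the low-vram path looks like this (pseudocode sketch; the call signature is hypothetical, not the notebook's actual code):

import torch

def run_controlnets_low_vram(controlnets, latent, hints):
    # each controlnet visits the GPU only for its own forward pass,
    # then goes straight back to CPU RAM, every diffusion step
    outputs = []
    for cn, hint in zip(controlnets, hints):
        cn.to('cuda')
        with torch.no_grad():
            outputs.append(cn(latent, hint))  # hypothetical call signature
        cn.to('cpu')                          # offload before the next controlnet
    return outputs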
Fixes
Unused loras should now be kept unloaded in a new run that doesn't use loras.
v0.15
Changelog:
- add alpha masked diffusion
- add inverse alpha mask diffusion
- save settings to EXIF
- backup existing settings on resume run
- load settings from PNG Exif
- add beep
- move consistency mask dilation to render settings
- hide edge width/dilation from generate flow cell
- add pattern replacement (filtering) for prompt
- fix constant bitrate error during video export causing noise in high-res videos
- fix typo in cell 1.5
- fix full-screen consistency mask error (also bumped missed consistency dilation to 2)
- fix keep_audio not working for relative video path thanks to louis.jeck#2502
- fix consistency error between identical frames
- fix torchmetrics version thx to tomatoslasher
Alpha masked diffusion
Same as masked diffusion, which was using a consistency mask before, but this time it uses an alpha mask. masked_diffusion has been renamed to cc_masked_diffusion.
It works this way:
If the current diffusion step is before both masked_diffusion values, both masks are used: their masked areas (black) are diffused, and unmasked areas (white) are kept fixed.
If the current diffusion step is between the masked_diffusion values, only the rightmost mask (the one with the higher masked_diffusion value) is used.
If the current diffusion step is above both masked_diffusion values, the whole frame is diffused.
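The three rules above can be read roughly like this (illustrative pseudocode; the step and both thresholds are fractions of the total steps):

def active_masks(step, cc_val, alpha_val, cc_mask, alpha_mask):
    low, high = sorted([cc_val, alpha_val])
    if step < low:
        return [cc_mask, alpha_mask]  # both masks apply: black diffused, white fixed
    if step < high:
        # only the mask with the higher masked_diffusion value still applies
        return [alpha_mask] if alpha_val > cc_val else [cc_mask]
    return []                         # past both values: the whole frame is diffused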
Beep
Check Beep to beep. Useful to signal when the render is over.
Save & load settings update
Settings are now saved to the frame's EXIF data. You can load them from a frame by specifying a path to it instead of a txt file.
If you change your settings during a resume run, the existing settings.txt will be backed up (previously it was lost).
Pattern replacement (prompt filter)
You can now automatically replace certain words or phrases.
For example, you can have a lot of animated scene descriptions across numerous frames but want to test different styles. You can set a keyword {style} somewhere in your prompt and replace it with a whole new style without touching your prompts at all to iterate quickly.
It works like this:
go to GUI - Replace patterns and use the following notation:
{0: {"keyword or phrase": "replacement"}}
it can also be scheduled:
{0: {"keyword or phrase": "replacement"}, 10:{"another keyword or phrase": "another replacement"}}
This will replace "keyword or phrase" with "replacement" at frame 0 and "another keyword or phrase" with "another replacement" at frame 10.
You can use multiple filters per frame:
{0: {"keyword or phrase": "replacement", "another keyword or phrase": "another replacement"}}
This is applied after captions, so you can use it to filter out unnecessary words, for example "cat" if you already have that in your prompt, or to replace "cat" with "dog" in your captions.
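A minimal sketch of what the replacement does to a frame's prompt (illustrative helper, not the notebook's exact code):

def apply_patterns(prompt, patterns):
    # patterns is the replace-patterns dict scheduled for the current frame
    for keyword, replacement in patterns.items():
        prompt = prompt.replace(keyword, replacement)
    return prompt

apply_patterns('a cat sitting in {style}', {'{style}': 'a watercolor painting'})
# -> 'a cat sitting in a watercolor painting'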