Still no 512 or 768 pre-trained model? #88
I'm trying to train at 512 or higher resolution, but I ran into some challenges getting a 512 dataset. |
Do you have a model I can test? |
Hey @nieweiqiang , you could use the Talking Head 1KH dataset. It is good for faces and has a lot of videos at 512 or 768 (along with a lot of others, so you might want to use a script to resize the videos). |
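For the resize step mentioned above, a minimal sketch, assuming ffmpeg is installed and that the input/output folder names (cropped_clips, resized_clips) and the 512x512 target are placeholders of your choosing:

```python
# Hedged sketch: batch-resize downloaded clips to a fixed size with ffmpeg.
# Folder names and the 512x512 target are assumptions, not anything the dataset
# or the training code mandates.
import subprocess
from pathlib import Path

SRC = Path("cropped_clips")      # wherever your downloaded clips live (assumed name)
DST = Path("resized_clips")
DST.mkdir(exist_ok=True)

for clip in SRC.glob("*.mp4"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "scale=512:512",   # stretches non-square clips; add crop/pad if aspect ratio matters
         "-c:a", "copy",
         str(DST / clip.name)],
        check=True,
    )
```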
Do you have any pre-trained model I can test? Or any tutorial on how to train it ourselves? I can get GPU power. |
I don't have a pre-trained model (yet) as it is still training, and I can't give you a full tutorial as I'm not the author and did not go into all the details, but I can share what I did to get the training going. First of all you need a dataset to train on; I used this one: https://github.com/tcwang0509/TalkingHead-1KH The repo only includes the scripts, but it is quite easy to use. Just a few things to notice:
- Once you have your dataset, don't try to extract the frames from the videos. I tried that and you would need more than 10 TB of storage (over 50 million frames extracted). Even if you have the time and the storage, despite what the documentation of Thin Plate mentions, the scripts are meant to ingest mp4 videos, not frames (some parts seem to handle frames, but the main script only looks for mp4 files).
- The script expects a hierarchy of folders as input that is not the same as the Talking Head (TH) dataset. You will have to create a new folder (call it whatever you like, this will be your source folder) with two subfolders: train and test. Copy (or move) the content of the folder train/cropped_clips from TH to the train folder, and copy the content of the folder val/cropped_clips from TH to the test folder (see the sketch after this list).
- The hardest part is knowing what to put in the yaml file of the config.
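A minimal sketch of that copy step, assuming the downloaded dataset sits in a folder named TalkingHead-1KH and that the destination folder name (my_dataset) is arbitrary:

```python
# Hedged sketch: arrange the Talking Head 1KH clips into a train/test layout
# with mp4 files directly under train/ and test/, as described above.
import shutil
from pathlib import Path

TH_ROOT = Path("TalkingHead-1KH")   # where the downloaded dataset lives (assumed name)
OUT = Path("my_dataset")            # your new source folder, any name works

for src_sub, dst_sub in [("train/cropped_clips", "train"),
                         ("val/cropped_clips", "test")]:
    dst = OUT / dst_sub
    dst.mkdir(parents=True, exist_ok=True)
    for clip in (TH_ROOT / src_sub).glob("*.mp4"):
        shutil.copy2(clip, dst / clip.name)   # use shutil.move instead if disk space is tight
```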
The last change is that you want to train on a specific size (512x512 in my case), so you have to make sure the videos are resized to that size. From what I can read in the script file, you should do that by setting the frame_shape setting in your yaml file (under the dataset_params section). However, I did not find the correct format for that setting; in the script it is defined as (256,256,3), but it was not working in the YAML file when I used that value. Then you should be good to go! Just run the run.py script in the folder, passing in your config file, and voilà! Note that I'm not an expert, and I'm still trying to get a trained model, so I don't guarantee these are the best steps, only what I did to get it working so far. |
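On the frame_shape question, a small hedged sanity check; the `[512, 512, 3]` list syntax and the config path are assumptions based on how YAML lists map to Python, not something verified against this repo:

```python
# Hedged sketch: check that dataset_params.frame_shape in your config parses into
# something the dataset code can consume. In YAML, "frame_shape: [512, 512, 3]"
# loads as a Python list; code written around a (256, 256, 3) tuple usually
# accepts a list too, but converting explicitly avoids surprises.
import yaml

with open("config/my-512.yaml") as f:        # your copied/edited config (assumed path)
    config = yaml.safe_load(f)

frame_shape = tuple(config["dataset_params"]["frame_shape"])
assert len(frame_shape) == 3, "expected (height, width, channels)"
print("frame_shape =", frame_shape)
```

If I recall the repo's README correctly, training is then launched with something along the lines of `python run.py --config config/my-512.yaml --device_ids 0`, but double-check the exact flags there.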
@EdgarMaucourant awesome man, so much explanation! If you already have checkpoints, can you send me the latest one? If you don't want to share publicly, you can email me: furkangozukara@gmail.com |
Hey @FurkanGozukara, I will share them when I have them, but for now it is still training... It will probably take several more days to train as the dataset is large. |
Awesome, looking forward to it. You are doing an amazing job. |
Anything peculiar coming up while training on higher resolutions? I'm going to follow this |
@EdgarMaucourant How is the model training going? I trained it for more than ten hours, 200 epochs, image size 384*384, but the results are not very good. I plan to continue training. |
Actually the training failed after 65 hours without any output :'( |
Sad. Looking forward to results. |
I changed the same things for 512 training. The dataset I used is VoxCeleb2. I resized the dataset to 512 and converted the mp4 format to png. It takes about 11 TB (only a part of the dataset). If I use mp4 for training, it costs about 10 hours per epoch, but in png format it costs about 1 hour per epoch. In total, about 3 days. |
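A rough sketch of that mp4-to-png step, in case it helps someone reproduce it; the folder-per-clip layout and folder names are assumptions, so check what the loader in this repo actually expects before committing the disk space:

```python
# Hedged sketch: dump every video as 512x512 png frames, one folder per clip.
# This is the storage-hungry path described above (~11 TB for part of VoxCeleb2).
import subprocess
from pathlib import Path

SRC = Path("voxceleb2_mp4")      # assumed input folder of mp4 clips
DST = Path("voxceleb2_png_512")

for clip in SRC.rglob("*.mp4"):
    out_dir = DST / clip.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "scale=512:512",
         str(out_dir / "%07d.png")],   # ffmpeg numbers the extracted frames
        check=True,
    )
```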
Can I see your log.txt? My training is normal. |
Hi @nieweiqiang , probably the code that generates that vis is hardcoded to 256x256; I did not look at the code but I would suspect that. On my end I'm giving up. Sorry guys, I was doing this in my spare time, and whatever I tried it fails at some point because I'm lacking memory or space on my computer (32 GB of RAM is not enough I think, or maybe it is the GPU RAM). I tried to reduce the number of repeats and the number of items in the dataset, but whatever I do it fails at some point, and I'm lacking the time to look into this further. I hope that what I shared above for the yaml file was insightful, and I wish you all the best for training a model! |
so sad to hear :( |
@FurkanGozukara Do you plan to continue the work of @EdgarMaucourant ? |
I have no idea right now how to prepare the dataset and start training. |
|
Tip if you want to use the Talking Head 1KH dataset: use ffmpeg to extract two frames of the video onto disk, then read those as you would images. Training speed is almost exactly the same as using pre-processed image sequences. |
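A minimal sketch of that trick, assuming ffmpeg is on the path; the frame-picking logic is illustrative and not taken from this repo:

```python
# Hedged sketch: instead of pre-extracting every frame, pull just two random frames
# from a clip with ffmpeg and load them as images (e.g. inside a Dataset __getitem__).
import random
import subprocess
import tempfile
from pathlib import Path

import imageio


def two_random_frames(video_path, num_frames):
    """Extract two random frames from video_path; num_frames must be known (e.g. via ffprobe)."""
    idx = sorted(random.sample(range(num_frames), 2))
    frames = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, frame_no in enumerate(idx):
            out = Path(tmp) / f"frame_{i}.png"
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(video_path),
                 "-vf", f"select=eq(n\\,{frame_no})",   # keep only the chosen frame
                 "-vframes", "1", str(out)],
                check=True, capture_output=True,
            )
            frames.append(imageio.imread(out))
    return frames
```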
With LivePortrait, Thin-Plate-Spline-Motion-Model is now obsolete.
- Windows LivePortrait Tutorial: Animate Static Photos into Talking Videos with LivePortrait AI, Compose Perfect Expressions Fast
- Cloud LivePortrait Tutorial (Massed Compute, RunPod & Kaggle): LivePortrait, No-GPU Cloud Tutorial - RunPod, MassedCompute & Free Kaggle Account - Animate Images
|
256 is working, but the resolution is just too bad.
relative_find_best_frame_true_square_aspect_ratio_vox.mp4
relative_find_best_frame_false_org_aspect_ratio_vox.mp4