Still no 512 or 768 pre-trained model? #88

FurkanGozukara opened this issue Sep 20, 2023 · 23 comments


@FurkanGozukara

256 is working, but the resolution is just too low.

relative_find_best_frame_true_square_aspect_ratio_vox.mp4
relative_find_best_frame_false_org_aspect_ratio_vox.mp4
@Qia98

Qia98 commented Sep 26, 2023

I'm trying to train at 512 or higher resolution, but I've run into some challenges getting a 512 dataset.

@FurkanGozukara

I'm trying to train at 512 or higher resolution, but I've run into some challenges getting a 512 dataset.

Do you have a model I can test?

@EdgarMaucourant

Hey @nieweiqiang,

You could use the TalkingHead-1KH dataset; it is good for faces and has a lot of videos at 512 or 768 (along with a lot of other resolutions, so you might want to use a script to resize the videos).

@FurkanGozukara

Hey @nieweiqiang,

You could use the TalkingHead-1KH dataset; it is good for faces and has a lot of videos at 512 or 768 (along with a lot of other resolutions, so you might want to use a script to resize the videos).

Do you have any pre-trained model I can test, or any tutorial on how to train it ourselves? I can get GPU power.

@EdgarMaucourant

EdgarMaucourant commented Oct 2, 2023

I don't have a pre-trained model (yet) as it is still training, and I can't give you a full tutorial as I'm not the author and did not go into all the details, but I can share what I did to get training running.

First of all you need a dataset to train on, I used this one: https://github.com/tcwang0509/TalkingHead-1KH

The repo only includes the scripts, but they are quite easy to use. Just a few things to note:

  • You need around 2 TB of free disk space to download the full dataset. The scripts will scrape a bunch of videos from YouTube, then crop the videos to the face and extract the interesting parts into smaller videos. At the end you will have around 500K videos ranging from 10 frames to 800 frames.
  • The scripts in this repo are meant to be used in a Linux environment. I tried to turn the bash scripts into BAT scripts for Windows, but as I was short on time I abandoned that idea and ended up using WSL2 (Linux on Windows). So either create BAT scripts, or install WSL2 and use your Windows partitions (automatically mounted into the Linux distro) as storage. For WSL2 see https://learn.microsoft.com/en-us/windows/wsl/install
  • The videos cropped from the originals don't all have the same dimensions (though they seem to be square), so you will have to resize them or exclude the resolutions you don't want to use. I trained for a 512x512 result, so I resized them to that size using the Thin Plate training script (see below). You can resize them before training if you have the script/software to do it; I went for the easy path and used the training script instead.

Once you have your dataset, don't try to extract the frames from the videos. I tried that and you would need more than 10 TB of storage (over 50 million frames extracted), and even if you have the time and the storage, despite what the Thin Plate documentation mentions, the scripts are meant to ingest mp4 videos, not frames (some parts seem to handle frames, but the main script only looks for mp4 files).

The script expects a folder hierarchy as input that is not the same as the Talking Head (TH) dataset's. So you will have to create a new folder (call it whatever you like; this will be your source folder). Create two subfolders: train and test. Copy (or move) the content of the train/cropped_clips folder from TH to the train folder, and copy the content of the val/cropped_clips folder from TH to the test folder.
Also, the script seems to generate a bunch of invalid videos that will make the training fail, so I just removed all files under 20 KB in size and that solved it (around 15,000 videos removed).
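
For reference, here is a rough sketch of that folder reshuffle and cleanup (the paths are placeholders and the 20 KB threshold is just the value that worked for me; adjust both to your setup):

```python
import shutil
from pathlib import Path

TH_ROOT = Path("/data/TalkingHead-1KH")  # where the TalkingHead-1KH clips live (placeholder path)
SOURCE = Path("/data/tps_source")        # the new source folder referenced in the config

# Recreate the train/test layout the training script expects.
for split, th_subdir in [("train", "train/cropped_clips"), ("test", "val/cropped_clips")]:
    dst = SOURCE / split
    dst.mkdir(parents=True, exist_ok=True)
    for clip in (TH_ROOT / th_subdir).glob("*.mp4"):
        shutil.copy2(clip, dst / clip.name)  # use shutil.move instead to avoid duplicating terabytes

# Drop the tiny/invalid clips that make training fail (20 KB threshold from this thread).
removed = 0
for clip in SOURCE.rglob("*.mp4"):
    if clip.stat().st_size < 20 * 1024:
        clip.unlink()
        removed += 1
print(f"Removed {removed} undersized clips")
```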

The hardest part is knowing what to put in the config YAML file.
First of all, in the config folder, copy/paste one of the existing config files. I used vox-256.yaml as it was the closest to my dataset (talking faces). In the file I made the following changes (sketched as a YAML excerpt after the list):

  • In dataset_params:
    • Change root_dir to the path of the source folder you created before (make sure you use the source folder path, not train or test).
  • In train_params:
    • Change num_epochs to 2 (100 is large; you want to test first on a small number of epochs and raise the number if needed).
    • Change num_repeats to 2 (the dataset already has a very large number of videos as input). This repeats training on the same videos multiple times, in this case 2 times.
    • Change epoch_milestones to [1,2] because you only have 2 epochs.
    • Change batch_size to 5. This number is difficult to estimate and depends on your GPU memory. If it is too large, the training will fail quite quickly (within 2 to 5 minutes) and the message will clearly state that torch tried to allocate more memory than available; if that happens, lower the number until it passes. 5 works fine on my RTX 3090 with 24 GB of VRAM.
    • Change dataloader_workers to 0. This should not be necessary; I lowered that number while trying to solve the GPU memory issue above and forgot to set it back to 12, so feel free to keep it at 12.
    • Change checkpoint_freq to 1 (because you don't have many epochs).
    • Change bg_start to 1.
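
For concreteness, here is a sketch of those edits in a copy of vox-256.yaml. Only the keys mentioned above are shown with the values I used; everything else in the file stays untouched, and the root_dir path is a placeholder:

```yaml
dataset_params:
  root_dir: /path/to/source_folder  # the folder that contains train/ and test/

train_params:
  num_epochs: 2            # start small, raise later if the run looks healthy
  num_repeats: 2           # the dataset is already very large
  epoch_milestones: [1, 2]
  batch_size: 5            # depends on GPU memory; 5 fit a 24 GB RTX 3090
  dataloader_workers: 12   # 0 was only a debugging workaround, 12 should be fine
  checkpoint_freq: 1
  bg_start: 1
```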

The last change is that you want to train at a specific size (512x512 in my case), so you have to make sure the videos are resized to that size. From what I can read in the script file, you should be able to do that by setting the frame_shape setting in your yaml file (under the dataset_params section). However, I could not find the correct format for that setting: in the script it is defined as (256,256,3), but that value did not work when I used it in the YAML file.
So I went the easy way and hardcoded the value in the script directly. You can do this by replacing line 70 in frames_dataset.py, changing self.frame_shape = frame_shape to self.frame_shape = (512,512,3).
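
In code, the change is a single hardcoded line (a sketch of the edit described above, not an official patch; the exact line number may differ in your copy):

```python
# frames_dataset.py, in the dataset class __init__ (around line 70 in my copy)
# original line:
#     self.frame_shape = frame_shape
# hardcoded replacement for 512x512 training:
self.frame_shape = (512, 512, 3)  # (height, width, channels)
```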

Then you should be good to go! Just run the run.py script in the repo folder, passing in your config file, and voilà!

Note that I'm not an expert, and I'm still trying to get a trained model, so I don't guarantee these are the best steps, only what I did to get it working so far.

@FurkanGozukara

FurkanGozukara commented Oct 2, 2023

@EdgarMaucourant awesome man, thanks so much for the explanation

If you already have checkpoints, can you send me the latest one?

if you don't want to publicly share you can email me : furkangozukara@gmail.com

@EdgarMaucourant

Hey @FurkanGozukara,

I will share them when I have them, but for now it is still training... It will probably take several more days to train as the dataset is large.

@FurkanGozukara

Hey @FurkanGozukara,

I will share them when I have them, but for now it is still training... It will probably take several more days to train as the dataset is large.

Awesome, looking forward to it. You are doing an amazing job.

@skyler14

skyler14 commented Oct 7, 2023

Anything peculiar coming up while training at higher resolutions? I'm going to follow this.

@ak01user

ak01user commented Oct 7, 2023

@EdgarMaucourant How is the model training going? I trained for more than ten hours, 200 epochs, at an image size of 384x384, but the results are not very good. I plan to continue training.

@EdgarMaucourant

Actually the training failed after 65 hours without any output :'(
I did not have time to relaunch it until now, so I started with a much smaller dataset and will see how it goes.

@FurkanGozukara

Actually the training failed after 65 hours without any output :'( I did not have time to relaunch it until now, so I started with a much smaller dataset and will see how it goes.

sad

Looking forward to results

@Qia98

Qia98 commented Oct 8, 2023

(quoting @EdgarMaucourant's full training walkthrough above)

I changed the same things for 512 training. The dataset I used is VoxCeleb2. I resized the dataset to 512 and converted the mp4 files to png frames. That takes about 11 TB (and it is only a part of the dataset). If I use mp4 for training, it takes about 10 hours per epoch, but in png format it takes about 1 hour per epoch, so about 3 days in total.
The config of my training is:
num_epochs: 100
num_repeats: 200 (the dataset is only a part, so I increased num_repeats)
batch_size: 8
Other parameters are the same as vox-256.
Also, in frames_dataset.py I changed the image size by hardcoding it.
But I didn't get a good checkpoint out of it.

@Qia98

Qia98 commented Oct 8, 2023

Actually the training failed after 65 hours without any output :'( I did not have time to relaunch it until now, so I started with a much smaller dataset and will see how it goes.

Can I see your log.txt? My training is normal; the loss is stable and converging.
It went from perceptual - 99.74809; equivariance_value - 0.39179; warp_loss - 5.25956; bg - 0.25512 to perceptual - 68.15993; equivariance_value - 0.15263; warp_loss - 0.67301; bg - 0.03551.

@Qia98

Qia98 commented Oct 9, 2023

image-20231009 When training the 512 model, I noticed that the visualized picture appears to have been cropped.

Has anyone ever encountered this problem? I want to know whether there's something wrong with my frame_dataset.py or the dataset format.

@EdgarMaucourant

Hi @nieweiqiang,

Probably the code that generates that visualization is hardcoded to 256x256; I did not look at the code, but I would suspect that.

On my end, I'm giving up. Sorry guys, I was doing this in my spare time, and whatever I tried failed at some point because I'm lacking memory or space on my computer (32 GB of RAM is not enough I think, or maybe it is the GPU RAM). I tried reducing the number of repeats and the number of items in the dataset, but whatever I do it fails at some point, and I don't have the time to look into this any further.

I hope that what I shared above for the YAML file was insightful, and I wish you all the best for training a model!

@FurkanGozukara

Hi @nieweiqiang,

Probably the code that generates that visualization is hardcoded to 256x256; I did not look at the code, but I would suspect that.

On my end, I'm giving up. Sorry guys, I was doing this in my spare time, and whatever I tried failed at some point because I'm lacking memory or space on my computer (32 GB of RAM is not enough I think, or maybe it is the GPU RAM). I tried reducing the number of repeats and the number of items in the dataset, but whatever I do it fails at some point, and I don't have the time to look into this any further.

I hope that what I shared above for the YAML file was insightful, and I wish you all the best for training a model!

so sad to hear :(

@thhung

thhung commented Oct 12, 2023

@FurkanGozukara Do you plan to continue the work of @EdgarMaucourant ?

@FurkanGozukara

@FurkanGozukara Do you plan to continue the work of @EdgarMaucourant ?

I have no idea right now how to prepare the dataset and start training.

@ak01user

image-20231009 When training the 512 model, I noticed that the visualized picture appears to have been cropped.
Has anyone ever encountered this problem? I want to know whether there's something wrong with my frame_dataset.py or the dataset format.

This phenomenon occurs when I interrupt the program during saving.

@huangxin168

Qia98 commented Oct 9, 2023

Have you solved the problem? I also want to train a 512 model.

@JingchengYang4

A tip if you want to use the TalkingHead-1KH dataset: use ffmpeg to extract two frames of the video onto disk, then read them as you would images. Training speed is almost exactly the same as using pre-processed image sequences.
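
A rough sketch of that idea, assuming ffmpeg is on the PATH (the frame selection and output naming here are illustrative only):

```python
import random
import subprocess
from pathlib import Path

def extract_two_frames(video: Path, out_dir: Path, num_frames: int) -> list[Path]:
    """Dump two randomly chosen frames of `video` to PNG files and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    picks = sorted(random.sample(range(num_frames), 2))
    outputs = []
    for i, frame_idx in enumerate(picks):
        out_path = out_dir / f"{video.stem}_{i}.png"
        # select a single frame by index and write it to disk
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(video),
             "-vf", f"select=eq(n\\,{frame_idx})", "-vframes", "1", str(out_path)],
            check=True, capture_output=True,
        )
        outputs.append(out_path)
    return outputs
```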

@FurkanGozukara

With LivePortrait, Thin-Plate-Spline-Motion-Model is now obsolete.

Windows LivePortrait Tutorial

https://youtu.be/FPtpNrmuwXk

Animate Static Photos into Talking Videos with LivePortrait AI Compose Perfect Expressions Fast


Cloud LivePortrait Tutorial : Massed Compute, RunPod & Kaggle

https://youtu.be/wG7oPp01COg

LivePortrait: No-GPU Cloud Tutorial - RunPod, MassedCompute & Free Kaggle Account - Animate Images


Windows LivePortrait Tutorial Video Chapters

  • 0:00 Introduction to LivePortrait: A cutting-edge open-source application for image-to-animation conversion
  • 2:20 Step-by-step guide for downloading and installing the LivePortrait Gradio application on your device
  • 3:27 System requirements and installation process for LivePortrait
  • 4:07 Verifying the successful installation of required components
  • 5:02 Confirming installation completion and preserving installation logs
  • 5:37 Initiating the LivePortrait application post-installation
  • 5:57 Showcase of supplementary resources: Portrait images, driving videos, and rendered outputs
  • 7:28 Navigating the LivePortrait application interface
  • 8:06 VRAM consumption analysis for generating a 73-second animation
  • 8:33 Commencing the animation process for the initial image
  • 8:50 Monitoring the animation generation progress
  • 10:10 Completion of the first animated video render
  • 10:24 Discussion on the resolution specifications of rendered animations
  • 10:45 Examining the native output resolution of LivePortrait
  • 11:27 Overview of custom enhancements and features implemented beyond the official demo
  • 11:51 Default storage location for generated animated videos
  • 12:35 Exploring the impact of the Relative Motion feature
  • 13:41 Analyzing the effects of the Do Crop option
  • 14:17 Understanding the functionality of the Paste Back feature
  • 15:01 Demonstrating the influence of the Target Eyelid Open Ratio parameter
  • 17:02 Instructions for joining the SECourses Discord community

Cloud LivePortrait Tutorial Video Chapters

  • 0:00 Introduction to LivePortrait: A cloud-based tutorial for state-of-the-art image-to-animation open-source application
  • 2:26 Installation and utilization guide for LivePortrait on MassedCompute, featuring an exclusive discount code
  • 4:28 Instructions for applying the special 50% discount coupon on MassedCompute
  • 4:50 Configuration of ThinLinc client for accessing and operating the MassedCompute virtual machine
  • 5:33 Setting up ThinLinc client's synchronization folder for file transfer between local and MassedCompute systems
  • 6:20 Transferring installer files to the MassedCompute sync folder
  • 6:39 Connecting to the initialized MassedCompute virtual machine and installing the LivePortrait application
  • 9:22 Launching and operating LivePortrait on MassedCompute post-installation
  • 10:20 Initiating a second instance of LivePortrait on the additional GPU in MassedCompute
  • 12:20 Locating generated animation videos and bulk downloading to local storage
  • 13:23 LivePortrait installation process on RunPod cloud service
  • 14:54 Selecting the appropriate RunPod template
  • 15:20 Configuring RunPod proxy access ports
  • 16:21 Uploading installer files to RunPod's JupyterLab interface and commencing installation
  • 17:07 Launching LivePortrait on RunPod post-installation
  • 17:17 Starting a second LivePortrait instance on the additional GPU
  • 17:31 Accessing LivePortrait via RunPod's proxy connection
  • 17:55 Animating the initial image on RunPod using a 73-second driving video
  • 18:27 Analysis of processing time for a 73-second video animation (highlighting the application's impressive speed)
  • 18:41 Troubleshooting input upload errors with an example case
  • 19:17 One-click download feature for all generated animations on RunPod
  • 20:28 Monitoring the progress of animation generation
  • 21:07 Guide to installing and utilizing LivePortrait for free on a Kaggle account, emphasizing its remarkable speed
  • 24:10 Generating the first animation on Kaggle post-installation and launch
  • 24:22 Importance of complete image and video uploads to avoid errors
  • 24:35 Tracking animation status and progress on Kaggle
  • 24:45 Resource utilization analysis: GPU, CPU, RAM, VRAM, and animation processing speed on Kaggle
  • 25:05 Implementing the one-click download feature for all generated animations on Kaggle
  • 26:12 Restarting LivePortrait on Kaggle without reinstallation
  • 26:36 Instructions for joining the SECourses Discord community for support and discussions
