Still no 512 or 768 pre-trained model? #88
I'm trying to train at 512 or higher resolution, but I ran into some challenges getting a 512 dataset. |
Do you have a model I can test? |
Hey @nieweiqiang , you could use the Talking Head 1KH dataset. It is good for faces and has a lot of videos at 512 or 768 (along with a lot of others, so you might want to use a script to resize the videos). |
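For the resize step mentioned above, a minimal sketch, assuming ffmpeg is installed and that the input/output folder names (cropped_clips, resized_clips) and the 512x512 target are placeholders of your choosing:

```python
# Hedged sketch: batch-resize downloaded clips to a fixed size with ffmpeg.
# Folder names and the 512x512 target are assumptions, not anything the dataset
# or the training code mandates.
import subprocess
from pathlib import Path

SRC = Path("cropped_clips")      # wherever your downloaded clips live (assumed name)
DST = Path("resized_clips")
DST.mkdir(exist_ok=True)

for clip in SRC.glob("*.mp4"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "scale=512:512",   # stretches non-square clips; add crop/pad if aspect ratio matters
         "-c:a", "copy",
         str(DST / clip.name)],
        check=True,
    )
```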
Do you have any pre-trained model I can test? Or any tutorial on how to train it ourselves? I can get GPU power. |
I don't have a pre-trained model (yet) as it is still training, and I can't give you a full tutorial as I'm not the author and did not go into all the details, but I can share what I did to get the training going. First of all you need a dataset to train on; I used this one: https://github.com/tcwang0509/TalkingHead-1KH The repo only includes the scripts, but it is quite easy to use. Just a few things to notice:
- Once you have your dataset, don't try to extract the frames from the videos. I tried that and you would need more than 10 TB of storage (over 50 million frames extracted). Even if you have the time and the storage, despite what the documentation of Thin Plate mentions, the scripts are meant to ingest mp4 videos, not frames (some parts seem to handle frames, but the main script only looks for mp4 files).
- The script expects a hierarchy of folders as input that is not the same as the Talking Head (TH) dataset. You will have to create a new folder (call it whatever you like, this will be your source folder) with two subfolders: train and test. Copy (or move) the content of the folder train/cropped_clips from TH to the train folder, and copy the content of the folder val/cropped_clips from TH to the test folder (see the sketch after this list).
- The hardest part is knowing what to put in the yaml file of the config.
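A minimal sketch of that copy step, assuming the downloaded dataset sits in a folder named TalkingHead-1KH and that the destination folder name (my_dataset) is arbitrary:

```python
# Hedged sketch: arrange the Talking Head 1KH clips into a train/test layout
# with mp4 files directly under train/ and test/, as described above.
import shutil
from pathlib import Path

TH_ROOT = Path("TalkingHead-1KH")   # where the downloaded dataset lives (assumed name)
OUT = Path("my_dataset")            # your new source folder, any name works

for src_sub, dst_sub in [("train/cropped_clips", "train"),
                         ("val/cropped_clips", "test")]:
    dst = OUT / dst_sub
    dst.mkdir(parents=True, exist_ok=True)
    for clip in (TH_ROOT / src_sub).glob("*.mp4"):
        shutil.copy2(clip, dst / clip.name)   # use shutil.move instead if disk space is tight
```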
The last change is that you want to train on a specific size (512x512 in my case), so you have to make sure the videos are resized to that size. From what I can read in the script file, you should do that by setting the frame_shape setting in your yaml file (under the dataset_params section). However, I did not find the correct format for that setting; in the script it is defined as (256,256,3), but it was not working in the YAML file when I used that value. Then you should be good to go! Just run the run.py script in the folder, passing in your config file, and voilà! Note that I'm not an expert, and I'm still trying to get a trained model, so I don't guarantee these are the best steps, only what I did to get it working so far. |
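On the frame_shape question, a small hedged sanity check; the `[512, 512, 3]` list syntax and the config path are assumptions based on how YAML lists map to Python, not something verified against this repo:

```python
# Hedged sketch: check that dataset_params.frame_shape in your config parses into
# something the dataset code can consume. In YAML, "frame_shape: [512, 512, 3]"
# loads as a Python list; code written around a (256, 256, 3) tuple usually
# accepts a list too, but converting explicitly avoids surprises.
import yaml

with open("config/my-512.yaml") as f:        # your copied/edited config (assumed path)
    config = yaml.safe_load(f)

frame_shape = tuple(config["dataset_params"]["frame_shape"])
assert len(frame_shape) == 3, "expected (height, width, channels)"
print("frame_shape =", frame_shape)
```

If I recall the repo's README correctly, training is then launched with something along the lines of `python run.py --config config/my-512.yaml --device_ids 0`, but double-check the exact flags there.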
@EdgarMaucourant awesome man, so much explanation! If you already have checkpoints, can you send me the latest one? If you don't want to share publicly, you can email me: furkangozukara@gmail.com |
Hey @FurkanGozukara, I will share them when I have them, but for now it is still training... It will probably take several more days to train as the dataset is large. |
Awesome, looking forward to it. You are doing an amazing job. |
Anything peculiar coming up while training on higher resolutions? I'm going to follow this |
@EdgarMaucourant How is the model training going? I trained it for more than ten hours, 200 epochs, image size 384*384, but the results are not very good. I plan to continue training. |
Actually the training failed after 65 hours without any output :'( |
Sad. Looking forward to results. |
I changed the same things for 512 training. The dataset I used is VoxCeleb2. I resized the dataset to 512 and converted the mp4 format to png. It takes about 11 TB (only a part of the dataset). If I use mp4 for training, it costs about 10 hours per epoch, but in png format it costs about 1 hour per epoch. In total, about 3 days. |
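A rough sketch of that mp4-to-png step, in case it helps someone reproduce it; the folder-per-clip layout and folder names are assumptions, so check what the loader in this repo actually expects before committing the disk space:

```python
# Hedged sketch: dump every video as 512x512 png frames, one folder per clip.
# This is the storage-hungry path described above (~11 TB for part of VoxCeleb2).
import subprocess
from pathlib import Path

SRC = Path("voxceleb2_mp4")      # assumed input folder of mp4 clips
DST = Path("voxceleb2_png_512")

for clip in SRC.rglob("*.mp4"):
    out_dir = DST / clip.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "scale=512:512",
         str(out_dir / "%07d.png")],   # ffmpeg numbers the extracted frames
        check=True,
    )
```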
Can I see your log.txt? My training is normal. |
Hi @nieweiqiang , probably the code that generates that vis is hardcoded to 256x256; I did not look at the code but I would suspect that. On my end I'm giving up. Sorry guys, I was doing this in my spare time, and whatever I tried it fails at some point because I'm lacking memory or space on my computer (32 GB of RAM is not enough I think, or maybe it is the GPU RAM). I tried to reduce the number of repeats and the number of items in the dataset, but whatever I do it fails at some point, and I'm lacking the time to look into this further. I hope that what I shared above for the yaml file was insightful, and I wish you all the best for training a model! |
so sad to hear :( |
@FurkanGozukara Do you plan to continue the work of @EdgarMaucourant ? |
I have no idea right now how to prepare the dataset and start training. |
|
Tip if you want to use the Talking Head 1KH dataset: use ffmpeg to extract two frames of the video onto disk, then read those as you would images. Training speed is almost exactly the same as using pre-processed image sequences. |
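A minimal sketch of that trick, assuming ffmpeg is on the path; the frame-picking logic is illustrative and not taken from this repo:

```python
# Hedged sketch: instead of pre-extracting every frame, pull just two random frames
# from a clip with ffmpeg and load them as images (e.g. inside a Dataset __getitem__).
import random
import subprocess
import tempfile
from pathlib import Path

import imageio


def two_random_frames(video_path, num_frames):
    """Extract two random frames from video_path; num_frames must be known (e.g. via ffprobe)."""
    idx = sorted(random.sample(range(num_frames), 2))
    frames = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, frame_no in enumerate(idx):
            out = Path(tmp) / f"frame_{i}.png"
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(video_path),
                 "-vf", f"select=eq(n\\,{frame_no})",   # keep only the chosen frame
                 "-vframes", "1", str(out)],
                check=True, capture_output=True,
            )
            frames.append(imageio.imread(out))
    return frames
```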
With LivePortrait, Thin-Plate-Spline-Motion-Model is now obsolete.
- Windows LivePortrait Tutorial: Animate Static Photos into Talking Videos with LivePortrait AI, Compose Perfect Expressions Fast
- Cloud LivePortrait Tutorial (Massed Compute, RunPod & Kaggle): LivePortrait, No-GPU Cloud Tutorial - RunPod, MassedCompute & Free Kaggle Account - Animate Images
|
256 is working, but the resolution is just too bad.
relative_find_best_frame_true_square_aspect_ratio_vox.mp4
relative_find_best_frame_false_org_aspect_ratio_vox.mp4