Unofficial implementation of Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
- Pre-trained model: Stable Diffusion 1.5
- Resolution: 512
- Batch size: 2
- GPU: single A6000 (48 GB)
Still training...
So far, after 180,000 training steps, this unofficial implementation still seems unable to learn the human-skeleton conditioning correctly. Sometimes it even fails to generate a normal human figure, producing only a background, and that background tends to resemble the style of the reference image.
😄😄🚀🚀 Since the official source code has not been released, this unofficial implementation has not been thoroughly validated, and many details remain to be verified. We welcome collaboration from the community to implement and refine this algorithm together!!!
This repo re-implements AnimateAnyone on top of the official ControlNet repository.
- AnimateAnyone: Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
- Linux or macOS
- NVIDIA GPU + CUDA cuDNN
- Python 3
- Clone the repository:
git clone https://github.com/MingtaoGuo/AnimateAnyone_unofficial.git
cd AnimateAnyone_unofficial
- Dependencies:
We recommend running this repository with Anaconda. All dependencies needed to define the environment are provided in environment.yaml.
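For example, the environment can be created and activated like this (the environment name is defined inside environment.yaml; the name below is a placeholder):
conda env create -f environment.yaml
conda activate <env-name-from-environment.yaml>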
- Download the pre-trained Stable Diffusion checkpoint v1-5-pruned.ckpt into ./models/.
- Extracting CLIP Vision Embedder Weights
python tool_get_visionclip.py
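AnimateAnyone embeds the reference image with a CLIP image encoder rather than the usual CLIP text encoder. A minimal sketch of what this extraction step might look like, assuming the vision tower of openai/clip-vit-large-patch14 (the CLIP model whose text encoder SD 1.5 uses); the output path is hypothetical:
import torch
from transformers import CLIPVisionModel

# Download the CLIP vision tower and save only its weights for later use.
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
torch.save(vision.state_dict(), "./models/clip_vision.ckpt")  # hypothetical output path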
- Copying Weights from the Pretrained Stable Diffusion Model to ReferenceNet
python tool_add_reference.py ./models/v1-5-pruned.ckpt ./models/reference_sd15_ini.ckpt
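This mirrors ControlNet's tool_add_control.py: the pretrained SD UNet weights are duplicated under ReferenceNet's parameter names so that ReferenceNet starts from the same initialization as the denoising UNet. A minimal sketch of the idea; the "reference_model." key prefix is a guess, not necessarily what tool_add_reference.py actually uses:
import torch

# Sketch only: duplicate SD 1.5 UNet weights under ReferenceNet-style key names.
ckpt = torch.load("./models/v1-5-pruned.ckpt", map_location="cpu")
sd = ckpt.get("state_dict", ckpt)

new_sd = dict(sd)
for k, v in sd.items():
    if k.startswith("model.diffusion_model."):
        # The "reference_model." prefix is hypothetical; check the script for the real names.
        new_sd["reference_model." + k[len("model."):]] = v.clone()

torch.save({"state_dict": new_sd}, "./models/reference_sd15_ini.ckpt")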
- Preprocessing Video Dataset (Video Decoding and Human Skeleton Extraction)
python tool_get_pose.py --mp4_path Dataset/fashion_mp4/ \
--save_frame_path Dataset/fashion_png/ \
--save_pose_path Dataset/fashion_pose/
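Conceptually, this step decodes each video into frames and renders one skeleton image per frame. A minimal sketch of that pipeline; the use of the controlnet_aux OpenposeDetector here is an assumption, as tool_get_pose.py may rely on the ControlNet repo's own annotator instead:
import os
import cv2
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

def preprocess(mp4_path, frame_dir, pose_dir):
    for name in os.listdir(mp4_path):
        # One sub-directory per video, named after the video file.
        os.makedirs(os.path.join(frame_dir, name), exist_ok=True)
        os.makedirs(os.path.join(pose_dir, name), exist_ok=True)
        cap = cv2.VideoCapture(os.path.join(mp4_path, name))
        idx = 1
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(os.path.join(frame_dir, name, f"{idx}.png"), frame)
            # The detector returns a PIL image of the rendered skeleton.
            pose = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            pose.save(os.path.join(pose_dir, name, f"{idx}.png"))
            idx += 1
        cap.release()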
Dataset Organization Structure
Dataset
├── fashion_mp4
│   ├── 1.mp4
│   ├── 2.mp4
│   └── ...
├── fashion_png
│   ├── 1.mp4
│   │   ├── 1.png
│   │   ├── 2.png
│   │   └── ...
│   ├── 2.mp4
│   │   ├── 1.png
│   │   ├── 2.png
│   │   └── ...
│   └── ...
└── fashion_pose
    ├── 1.mp4
    │   ├── 1.png
    │   ├── 2.png
    │   └── ...
    ├── 2.mp4
    │   ├── 1.png
    │   ├── 2.png
    │   └── ...
    └── ...
- Training 🚀
python tutorial_train_animate.py
- Custom Dataset
import os

import cv2
import numpy as np
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, path="Dataset/"):
        self.path = path
        self.videos = os.listdir(os.path.join(path, "fashion_png"))

    def __len__(self):
        # Each epoch draws 10 random frame pairs per video.
        return len(self.videos) * 10

    def __getitem__(self, idx):
        # Sample a video, then two distinct frames: one reference, one target.
        video_name = np.random.choice(self.videos)
        frame_dir = os.path.join(self.path, "fashion_png", video_name)
        frames = np.random.choice(os.listdir(frame_dir), 2, replace=False)
        ref_frame, tgt_frame = frames[0], frames[1]

        # Reference and target frames, normalized to [-1, 1].
        ref_bgr = cv2.imread(os.path.join(frame_dir, ref_frame))
        ref_rgb = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1.0
        tgt_bgr = cv2.imread(os.path.join(frame_dir, tgt_frame))
        tgt_rgb = cv2.cvtColor(tgt_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1.0

        # Skeleton image of the target frame, normalized to [0, 1].
        skt_bgr = cv2.imread(os.path.join(self.path, "fashion_pose", video_name, tgt_frame))
        skt_rgb = cv2.cvtColor(skt_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

        return dict(target=tgt_rgb, vision=ref_rgb, reference=ref_rgb, skeleton=skt_rgb)
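A quick sanity check of the dataset, assuming the directory layout above; note that, ControlNet-style, the arrays come back in HWC order:
from torch.utils.data import DataLoader

dataset = MyDataset("Dataset/")
loader = DataLoader(dataset, batch_size=2, shuffle=True)  # batch size 2, as in the setup above
batch = next(iter(loader))
print(batch["target"].shape, batch["skeleton"].shape)  # (2, H, W, 3) each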
Mingtao Guo, e-mail: gmt798714378 at hotmail dot com
We are very grateful to the authors of the official ControlNet repository.
[1]. Hu, Li, et al. "Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation." arXiv preprint arXiv:2311.17117 (2023).