GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Comparison_A.basketball.free.falls.in.the.air.mp4
We introduce GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis.
Jiaxi Lv*, Yi Huang*, Mingfu Yan*, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, VIVO AI Lab
- 2024-06-18 Congratulations! GPT4Motion received the Best Paper Runner-Up Award at the CVPR 2024 PBDL workshop.
- 2024-04-16 We released the code for GPT4Motion.
- 2024-04-09 GPT4Motion was accepted by the CVPR 2024 PBDL workshop!
- 2023-11-28 GPT4Motion was covered by Synced.
- 2023-11-22 GPT4Motion was recommended by AK and included in Hugging Face's daily papers.
- 2023-11-21 The GPT4Motion paper was uploaded to arXiv.
First, the user prompt is inserted into our designed prompt template. Then, the Python script generated by GPT-4 drives the Blender physics engine to simulate the corresponding motion, producing sequences of edge maps and depth maps. Finally, two ControlNets are employed to constrain the physical motion of video frames generated by Stable Diffusion, where a temporal consistency constraint is designed to enforce the coherence among frames.
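As a rough illustration of the temporal consistency idea, the sketch below shows one common form of cross-frame attention: every frame's queries attend to the keys and values of a single reference frame (here the first), so all frames draw their appearance from a shared source. This is an assumption made for illustration and not necessarily the exact constraint implemented in `Cross_Frame_Attention.py`; the function names and toy dimensions are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention on lists of token vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out

def cross_frame_attention(frames_q, frames_k, frames_v, ref=0):
    """Each frame's queries attend to the keys/values of the reference frame."""
    return [attention(q, frames_k[ref], frames_v[ref]) for q in frames_q]

# Toy example (hypothetical data): 3 frames, 2 tokens per frame, feature dimension 2.
frames_q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
frames_k = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[5.0, 5.0], [5.0, 5.0]],
    [[-1.0, 2.0], [3.0, 0.5]],
]
frames_v = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[9.0, 9.0], [9.0, 9.0]],
    [[0.0, 0.0], [7.0, 7.0]],
]
out = cross_frame_attention(frames_q, frames_k, frames_v)
```

Because the keys and values of frames 1 and 2 are ignored in favor of frame 0, frames with identical queries produce identical outputs, which is the coherence effect this kind of constraint aims for.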
Comparison_A.white.flag.flaps.in.the.wind.mp4
Comparison of the video results generated by different text-to-video models with the prompt "A white flag flaps in the wind".
Comparison_Water.flows.into.a.white.mug.on.a.table.top-down.view.mp4
Comparison of the video results generated by different text-to-video models with the prompt "Water flows into a white mug on a table, top-down view".
GPT4Motion_Basketball.drop.and.collision.mp4
GPT4Motion's results on basketball drop and collision.
GPT4Motion_A.white.flag.flags.in.light.or.the.or.strong.wind.mp4
GPT4Motion's results on "A white flag flaps in light/the/strong wind".
GPT4Motion_A.white.T-shirt.flutters.in.light.or.the.or.strong.wind.mp4
GPT4Motion's results on "A white T-shirt flutters in light/the/strong wind".
GPT4Motion_Water.or.Viscous.or.Very.viscous.flows.into.a.white.mug.on.a.table.top-down.view.mp4
GPT4Motion's results on "Water/Viscous/Very viscous flows into a white mug on a table, top-down view".
For ease of reading, we list our directory structure.
├── data
│   └── basketball
│       └── A basketball free falls in the air
│           ├── depth
│           │   ├── depth_0000.png
│           │   └── ... (more depth images)
│           └── freestyle
│               ├── canny_0000.png
│               └── ... (more canny images)
├── PhysicsGeneration
│   ├── BlenderTool
│   │   ├── assets
│   │   │   └── basketball.obj
│   │   ├── __init__.py
│   │   └── utils.py
│   ├── prompt_for_GPT4.txt
│   └── script.py
├── README.MD
└── VideoGeneration
    ├── config
    │   └── basketball.yaml
    ├── main.py
    ├── requirements.txt
    └── utils
        ├── Cross_Frame_Attention.py
        ├── __init__.py
        └── utils_all.py
- data: This directory stores the data used in the project.
  - basketball: A subdirectory specifically for basketball-related data.
    - A basketball free falls in the air: Contains data for a scenario where a basketball is free-falling.
      - depth: Contains depth images.
      - freestyle: Contains canny images.
- PhysicsGeneration: This directory contains the complete prompt for GPT-4 and the code to obtain depth maps and edge maps rendered through Blender.
- VideoGeneration: This directory contains the code for generating a video using depth maps and edge maps generated by Blender.
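To make the data layout concrete, here is a minimal sketch that pairs each depth map with the edge map of the same frame index. It assumes the naming shown in the directory structure (`depth/depth_0000.png`, `freestyle/canny_0000.png`) with zero-padded indices; the helper name `list_frame_pairs` is hypothetical and not part of the repository.

```python
from pathlib import Path

def list_frame_pairs(scene_dir):
    """Return (depth_path, canny_path) pairs matched by frame index."""
    scene = Path(scene_dir)
    # Map the zero-padded index (e.g. "0000") to each file.
    depth = {p.stem.split("_")[-1]: p for p in (scene / "depth").glob("depth_*.png")}
    canny = {p.stem.split("_")[-1]: p for p in (scene / "freestyle").glob("canny_*.png")}
    # Any index present in only one folder means the two sequences are out of sync.
    unmatched = sorted(set(depth) ^ set(canny))
    if unmatched:
        raise ValueError(f"Frames without a matching depth/edge map: {unmatched}")
    # Zero-padded indices sort correctly as strings.
    return [(depth[i], canny[i]) for i in sorted(depth)]
```

For example, `list_frame_pairs("data/basketball/A basketball free falls in the air")` would return one pair per frame, or raise `ValueError` if one folder has a frame the other lacks.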
Please install Blender 3.6 according to https://www.blender.org/download/.
cd PhysicsGeneration
Please copy the prompt from "prompt_for_GPT4.txt" to GPT-4 and add the following prefix to the Python code generated by GPT-4:
import bpy
import os
import math
import random
from random import uniform
import mathutils
from sys import path
path.append(bpy.path.abspath("./"))
from BlenderTool.utils import *
ASSETS_PATH = 'BlenderTool/assets/'
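The prefixing step above can be automated with a few lines of Python. The following is a minimal sketch, not part of the repository: `build_script` is a hypothetical helper, and it assumes the GPT-4 reply has already been saved or pasted as plain Python text without Markdown fences.

```python
from pathlib import Path

# Prefix copied from the instructions above; it must come before the GPT-4 output.
PREFIX = """import bpy
import os
import math
import random
from random import uniform
import mathutils
from sys import path
path.append(bpy.path.abspath("./"))
from BlenderTool.utils import *
ASSETS_PATH = 'BlenderTool/assets/'
"""

def build_script(gpt_code, out_path="script.py"):
    """Write the prefix followed by the GPT-4 generated code to out_path."""
    out = Path(out_path)
    out.write_text(PREFIX + "\n" + gpt_code.strip() + "\n", encoding="utf-8")
    return out
```

Calling `build_script(generated_code)` from inside `PhysicsGeneration` writes `script.py` next to `BlenderTool`, which is where the relative paths in the prefix expect it to be.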
You will get a Python file like "script.py". Use the following command to generate the edge and depth maps:
blender -b -P script.py
The generated edge maps and depth maps are saved in the "../data/new/" folder.
Please move to the "VideoGeneration" folder and install the corresponding environment:
cd ../VideoGeneration/
conda create -n GPT4Motion python=3.9
conda activate GPT4Motion
pip install -r requirements.txt
You can generate videos based on our pre-existing depth and edge maps by following the instructions below:
python main.py config/basketball.yaml
The generated results are shown below: