MotionGPT3 is a bimodal motion-language framework designed to address the challenges of unified motion understanding and generation.
Technical details
Though recent advances in multimodal models have demonstrated strong capabilities and opportunities in unified understanding and generation, the development of unified motion-language models remains underexplored. To enable such models with high-fidelity human motion, two core challenges must be addressed: first, the reconstruction gap between the continuous motion modality and its discrete representation in an autoregressive framework; second, the degradation of language intelligence during unified training. Inspired by Mixture-of-Experts, we propose MotionGPT3, a bimodal motion-language model that treats human motion as a second modality, decoupling motion modeling via separate model parameters and enabling both effective cross-modal interaction and efficient multimodal training at scale. To preserve language intelligence, the text branch retains the original structure and parameters of the pretrained language model, while a new motion branch is integrated via a shared attention mechanism, enabling bidirectional information flow between the two modalities. We first employ a motion Variational Autoencoder (VAE) to encode raw human motion into latent representations. Based on this continuous latent space, the motion branch predicts motion latents directly from intermediate hidden states using a diffusion head, bypassing discrete tokenization. Extensive experiments show that our approach achieves competitive performance on both motion understanding and generation tasks while preserving strong language capabilities, establishing a unified bimodal motion-diffusion framework within an autoregressive paradigm.
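The shared-attention idea above can be sketched in a few lines: each branch keeps its own projection weights, but text and motion tokens attend jointly in a single attention pass before being split back to their branches. This is an illustrative numpy sketch, not the actual implementation; all weight names and sizes here are made up for the example.

```python
# Illustrative sketch of shared attention between a text branch and a motion
# branch: separate per-branch Q/K/V projections, one joint attention pass.
import numpy as np

rng = np.random.default_rng(0)
d = 8                      # shared hidden size (hypothetical)
T_text, T_mot = 4, 3       # token counts per branch (hypothetical)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Each branch owns its projection weights (decoupled parameters)
Wq_t, Wk_t, Wv_t = (rng.standard_normal((d, d)) for _ in range(3))
Wq_m, Wk_m, Wv_m = (rng.standard_normal((d, d)) for _ in range(3))

text_h = rng.standard_normal((T_text, d))   # text-branch hidden states
mot_h = rng.standard_normal((T_mot, d))     # motion-branch hidden states

# Project per branch, then concatenate so all tokens attend together,
# which is what allows bidirectional cross-modal information flow.
Q = np.vstack([text_h @ Wq_t, mot_h @ Wq_m])
K = np.vstack([text_h @ Wk_t, mot_h @ Wk_m])
V = np.vstack([text_h @ Wv_t, mot_h @ Wv_m])

attn = softmax(Q @ K.T / np.sqrt(d))        # (7, 7) joint attention map
out = attn @ V

# Split the joint output back into the two branches
text_out, mot_out = out[:T_text], out[T_text:]
print(text_out.shape, mot_out.shape)        # (4, 8) (3, 8)
```

In the full model, the motion-branch output would then feed a diffusion head that predicts motion latents in the VAE latent space; this sketch only shows the cross-modal attention step.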
- [2025/06/20] Initial project upload
Setup and download
conda create python=3.11 --name mgpt
conda activate mgpt
Install the packages in requirements.txt and install PyTorch 2.0
pip install -r requirements.txt
python -m spacy download en_core_web_sm
We test our code on Python 3.11.11 and PyTorch 2.0.0.
Run the scripts to download dependency materials:
bash prepare/download_smpl_model.sh
bash prepare/prepare_gpt2.sh
For text-to-motion evaluation:
bash prepare/download_t2m_evaluators.sh
Run the script to download the pre-trained models:
bash prepare/download_pretrained_models.sh
Visit the Google Drive to download the above dependencies.
Visit XXX to download the pretrained models.
Batch demo
We support txt file input; the output motions are npy files and the output texts are txt files. Please check configs/assets.yaml for path configuration; TEST.FOLDER sets the output folder.
Then, run the following script:
python demo.py --cfg ./configs/MoT_vae_stage3.yaml --example ./demos/t2m.txt
Some parameters:
--example=./demos/t2m.txt: input file of text prompts
--task=t2m: evaluation tasks including t2m, m2t, pred, inbetween
The outputs:
npy file: the generated motions with the shape of (nframe, 22, 3)
txt file: the input text prompt or text output
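Each output npy file holds 22 joint positions in 3D per frame. A minimal sketch for inspecting one (the file path here is simulated, since actual output names depend on your prompts and TEST.FOLDER):

```python
import os
import tempfile

import numpy as np

# Stand-in for one generated motion file: the demo writes arrays of shape
# (nframe, 22, 3), i.e. 22 joints in 3D per frame. The path is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "sample0.npy")
np.save(path, np.zeros((120, 22, 3)))

motion = np.load(path)
nframe, njoint, ndim = motion.shape
print(nframe, njoint, ndim)   # prints: 120 22 3
```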
Training guidance
- Please refer to HumanML3D for text-to-motion dataset setup.
- Put the instructions data in prepare/instructions into the same folder as the HumanML3D dataset.
Please first check the parameters in configs/MoT_vae_stage1_t2m.yaml, e.g. NAME, instruction_type, lm_ablation, DEBUG.
Then, run the following command:
python gen_mot_gpt.py
python -m train --cfg configs/MoT_vae_stage1_t2m.yaml --nodebug
Please update the parameters in configs/MoT_vae_stage2_instruct.yaml and configs/MoT_vae_stage2_all.yaml, e.g. NAME, instruction_type, lm_ablation, DEBUG, PRETRAINED_VAE (set this to your latest checkpoint path from the previous step).
Then, run the following command:
python -m train --cfg configs/MoT_vae_stage2_all.yaml --nodebug
python -m train --cfg configs/MoT_vae_stage2_instruct.yaml --nodebug
Please update the parameters in configs/MoT_vae_stage3.yaml, e.g. NAME, instruction_type, lm_ablation, DEBUG, PRETRAINED (set this to your latest checkpoint path from the previous step).
Then, run the following command:
python -m train --cfg configs/MoT_vae_stage3.yaml --nodebug
Please first put the trained model checkpoint path into TEST.CHECKPOINT in the config files, e.g. configs/MoT_vae_stage3.yaml.
Then, run the following command:
python -m test --cfg configs/MoT_vae_stage3.yaml --task t2m
Some parameters:
--task: evaluation tasks including t2m (text-to-motion), m2t (motion-to-text translation), pred (motion prediction), inbetween (motion in-betweening)
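Since all four tasks share the same config, they can be run back to back with a simple loop; a sketch (echoing the commands here rather than executing them, so it can be previewed safely):

```shell
# Dry run over every evaluation task (drop the echo to actually execute)
for task in t2m m2t pred inbetween; do
  echo python -m test --cfg configs/MoT_vae_stage3.yaml --task "$task"
done
```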
Render SMPL
Refer to TEMOS-Rendering motions for Blender setup, then install the following dependencies:
YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt
Run the following command using blender:
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
(Optional) Run the following command to first fit SMPL meshes from the generated motions:
python -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda
This outputs:
mesh npy file: the generated SMPL vertices with the shape of (nframe, 6893, 3)
ply files: the ply mesh files for Blender or MeshLab
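For a quick look at one frame of the vertices without Blender, an ASCII point-cloud PLY can be written by hand; a minimal sketch (the helper name, file name, and tiny vertex count are illustrative — real frames carry thousands of vertices, and faces are omitted, so viewers will show points only):

```python
import numpy as np

def write_ply_points(path, verts):
    """Write one frame of vertices as an ASCII point-cloud PLY
    (no faces; MeshLab can still display the points)."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(verts)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for x, y, z in verts:
            f.write(f"{x} {y} {z}\n")

# Stand-in for one frame sliced from the mesh npy described above
frame = np.zeros((10, 3))
write_ply_points("frame0.ply", frame)
```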
Run the following command to render SMPL using blender:
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
optional parameters:
--mode=video: render an mp4 video
--mode=sequence: render the whole motion in a single png image
Question-and-Answer
If you find our code or paper helps, please consider citing:
<!-- todo bibtex -->
Thanks to MotionGPT, Motion-latent-diffusion, HumanML3D and MAR; our code is partially borrowed from them.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own respective licenses, which must also be followed.