Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang
EgoGen: a scalable synthetic data generation system for egocentric perception tasks, with rich multi-modal data and accurate annotations. We simulate camera rigs for head-mounted devices (HMDs) and render from the perspective of the camera wearer with various sensors. Top to bottom: middle and right camera sensors in the rig. Left to right: photo-realistic RGB image, RGB with simulated motion blur, depth map, surface normal, segmentation mask, and world position for fisheye cameras widely used in HMDs.
We will release all code before the CVPR 2024 conference.
- Motion model eval code in Replica room0
- Motion model training code (two-stage RL in crowded scenes)
- Motion model eval code for crowd motion synthesis
- Motion model training code (models for dynamic evaluations)
- Motion primitive C-VAE training code
- Egocentric human mesh recovery code (RGB/depth images as input)
- EgoBody synthetic data (RGB/depth)
- EgoBody synthetic data generation script (incl. automated clothing simulation)
- EgoGen rendering pipeline code
Download the packed conda environment here. Note: because some packages in the environment have been modified, we do not provide a `req.txt` or `env.yml`. Please install `conda-pack` to unpack the environment:
mkdir -p egogen
tar -xzf egogen.tar.gz -C egogen
source egogen/bin/activate
The code is tested on Ubuntu 22.04, CUDA 11.7.
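After activating the environment, a quick sanity check along these lines can confirm the setup; it assumes the packed environment includes PyTorch (CUDA 11.7 build) and the smplx package, so adjust the imports if your unpack differs:

```python
# Quick sanity check after activating the packed environment.
# Assumes PyTorch and smplx are shipped with it; adjust if your setup differs.
import torch
import smplx  # noqa: F401

print("CUDA available:", torch.cuda.is_available())
print("PyTorch CUDA build:", torch.version.cuda)
```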
- SMPL-X body model and VPoser
- Precomputed SDF for Replica room0
- Pretrained marker regressor and predictor model (C-VAE)
- Pretrained policy model for motion synthesis in Replica room0
- Pretrained policy model for motion synthesis in dynamic settings
- Static box scenes for policy training
- SAMP Mocap dataset
Organize them as follows:
EgoGen
├── motion
│   ├── crowd_ppo/
│   ├── data/
│   │   ├── smplx/
│   │   │   └── models/
│   │   │       ├── smplx/
│   │   │       │   ├── SMPLX_MALE.npz
│   │   │       │   └── ...
│   │   │       └── vposer_v1_0/
│   │   │           ├── snapshots/TR00_E096.pt
│   │   │           └── ...
│   │   ├── room0_sdf.pkl
│   │   ├── checkpoint_87.pth
│   │   ├── checkpoint_best.pth
│   │   ├── scenes/
│   │   │   ├── random_box_obstacle_new/
│   │   │   └── random_box_obstacle_new_names.pkl
│   │   ├── samp/*_stageII.pkl  # original SAMP dataset
│   │   └── ...
│   └── results/  # C-VAE pretrained models
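A minimal sketch to verify the layout above, run from inside `EgoGen/motion/`; the paths mirror the tree, and the `smplx.create` call assumes the standard smplx Python package is available in the environment:

```python
# Minimal check that the data layout above is in place (run from EgoGen/motion/).
import os
import smplx

required = [
    "data/smplx/models/smplx/SMPLX_MALE.npz",
    "data/smplx/models/vposer_v1_0/snapshots/TR00_E096.pt",
    "data/room0_sdf.pkl",
    "data/checkpoint_87.pth",
    "data/checkpoint_best.pth",
]
missing = [p for p in required if not os.path.exists(p)]
print("missing:", missing or "none")

# Instantiating SMPL-X confirms the body model files are readable.
body_model = smplx.create("data/smplx/models", model_type="smplx",
                          gender="male", ext="npz")
print("SMPL-X model loaded:", type(body_model).__name__)
```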
The SAMP dataset is processed into the motion primitive format with the following two commands:
python exp_GAMMAPrimitive/utils/utils_canonicalize_samp.py 1
python exp_GAMMAPrimitive/utils/utils_canonicalize_samp.py 10
cp -r data/samp/Canonicalized-MP/data/locomotion data/
Processed files will be located at `data/samp/Canonicalized-MP*/`. The `cp` command above copies the locomotion data used to sample initial motion seeds during policy training.
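As a quick check that the preprocessing and the copy step succeeded, something like the following can be run from `EgoGen/motion/` (the glob patterns follow the paths mentioned above; the `.pkl` contents themselves are not assumed):

```python
# Count canonicalized motion-primitive files and confirm the locomotion copy.
import glob
import os

canonicalized = glob.glob("data/samp/Canonicalized-MP*/data/**/*.pkl", recursive=True)
print(f"{len(canonicalized)} canonicalized motion primitive files")
print("data/locomotion exists:", os.path.isdir("data/locomotion"))
```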
cd motion
python -W ignore crowd_ppo/main_ppo.py --resume-path=data/checkpoint_87.pth --watch --deterministic-eval
This will generate a virtual human walking in Replica room0 between a sampled (start, target) location pair. Generated motion sequences are saved in `log/eval_results/` and can be visualized with:
python vis.py --path motion_seq_path
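For convenience, a small helper can pick the newest generated sequence and pass it to `vis.py`; this assumes sequences are written as `.pkl` files under `log/eval_results/` (as referenced in the Blender notes later) and that you run it from `EgoGen/motion/`:

```python
# Visualize the most recently generated motion sequence (assumes .pkl output).
import glob
import os
import subprocess

seqs = sorted(glob.glob("log/eval_results/**/*.pkl", recursive=True),
              key=os.path.getmtime)
if seqs:
    subprocess.run(["python", "vis.py", "--path", seqs[-1]], check=True)
else:
    print("No generated sequences found under log/eval_results/")
```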
The pretrained model is trained in scenes with a single static box obstacle, yet it generalizes directly to dynamic settings. We release code for four humans swapping locations; it can easily be modified for two- or eight-human crowd motion synthesis.
cd motion
python crowd_ppo/main_crowd_eval.py [--deterministic-eval]
The `--deterministic-eval` flag is optional; omit it to synthesize more diverse motions. You may also randomly sample the initial motion seed to further increase diversity. Generated motion sequences are saved in `log/eval_results/crowd-4human/*`.
python vis_crowd.py --path 'log/eval_results/crowd-4human/*'
cd motion
python -W ignore crowd_ppo/main_ppo.py
We selected `checkpoint_113.pth` as the best pretrained model.
Principles for choosing the model: (1) the reward should be high and the KLD loss small; (2) among models with similar rewards, prefer the one with the smaller epoch number. These principles ensure the learned action space does not deviate too far from the prior, and as a result produce more natural motions.
python -W ignore crowd_ppo/main_ppo.py --resume-path=/path/to/pretrained/checkpoint_113.pth --logdir=log/finetune/ --finetune
This produces `log/finetune/checkpoint_87.pth`, the checkpoint you downloaded earlier. The best model should have (1) a high reward and (2) a small KLD loss.
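To make the selection principles concrete, here is a hypothetical helper that ranks checkpoints from (epoch, reward, KLD) statistics you collect from the training logs; the numbers and the dictionary format are illustrative only, not what the training script prints:

```python
# Hypothetical checkpoint selection following the principles above:
# prefer high reward and small KLD, and break near-ties in reward by
# taking the smaller epoch number. Inputs below are made-up examples.
def pick_checkpoint(stats, reward_tol=0.2):
    """stats: list of dicts like {'epoch': 100, 'reward': 10.0, 'kld': 0.05}."""
    best_reward = max(s["reward"] for s in stats)
    # Keep checkpoints whose reward is within tolerance of the best.
    candidates = [s for s in stats if best_reward - s["reward"] <= reward_tol]
    # Among those, prefer small KLD, then the smaller epoch number.
    return min(candidates, key=lambda s: (s["kld"], s["epoch"]))

example = [
    {"epoch": 80,  "reward": 9.1,  "kld": 0.02},
    {"epoch": 120, "reward": 10.0, "kld": 0.03},
    {"epoch": 200, "reward": 10.1, "kld": 0.08},
]
print(pick_checkpoint(example))  # -> the epoch-120 entry under these numbers
```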
Our egocentric-perception-driven motion primitives exhibit remarkable generalizability: the model is trained in static scenes only, yet it can be used to synthesize crowd motions. To train such a model:
cd motion
python crowd_ppo/main_ppo_box.py
This produces models with performance similar to `checkpoint_best.pth` (test reward: 10.22), which corresponds to the trained `log/log_box/checkpoint_164.pth`. Using this model, you can synthesize human motions in dynamic settings.
Our action space is the 128-D Gaussian latent space of this C-VAE. Make sure you have completed the "Data preparation" section. The body marker predictor (history markers + action -> future markers) can be trained as follows:
python exp_GAMMAPrimitive/train_GAMMAPredictor.py --cfg MPVAE_samp20_2frame
python exp_GAMMAPrimitive/train_GAMMAPredictor.py --cfg MPVAE_samp20_2frame_rollout --resume_training 1
# The above command will raise FileExistsError. Copy the last ckpt from MPVAE_samp20_2frame to MPVAE_samp20_2frame_rollout as epoch 0:
cp results/exp_GAMMAPrimitive/MPVAE_samp20_2frame/checkpoints/epoch-300.ckp results/exp_GAMMAPrimitive/MPVAE_samp20_2frame_rollout/checkpoints/epoch-000.ckp
# And run it again:
python exp_GAMMAPrimitive/train_GAMMAPredictor.py --cfg MPVAE_samp20_2frame_rollout --resume_training 1
The final trained model `results/exp_GAMMAPrimitive/MPVAE_samp20_2frame_rollout/checkpoints/epoch-400.ckp` corresponds to the pretrained `results/crowd_ppo/MPVAE_samp20_2frame_rollout/checkpoints/epoch-400.ckp`.
For the body marker regressor (markers -> body mesh), we use the pretrained model from GAMMA.
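To make the role of the action space concrete, a rollout step can be pictured roughly as below: the policy emits a 128-D latent action, the C-VAE marker predictor decodes it (conditioned on the history markers of the motion seed) into future markers, and the marker regressor turns markers into SMPL-X body parameters. The function names and tensor shapes are schematic assumptions, not the repo's actual API:

```python
import torch

# Schematic rollout step (names and shapes are illustrative, not the repo's API).
# history_markers: [B, T_hist, M, 3] body markers from the motion seed
#                  (M is the marker count, e.g. 67 for GAMMA's SSM2 marker set),
# z:               [B, 128] latent action produced by the policy.
def rollout_step(marker_predictor, marker_regressor, history_markers, z):
    # C-VAE decoder: (history markers, action) -> future marker trajectory.
    future_markers = marker_predictor.decode(z, history_markers)   # [B, T_fut, M, 3]
    # Marker regressor: markers -> SMPL-X body parameters (and hence the mesh).
    body_params = marker_regressor(future_markers)
    return future_markers, body_params

# Sampling from the 128-D standard-normal prior instead of the policy's
# Gaussian yields motions close to the C-VAE prior.
z = torch.randn(1, 128)
```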
Refer to the EgoBody synthetic data generation script. We use Pyrender in this script for faster rendering, which may reduce photorealism. If you require higher quality, please check the EgoGen rendering module.
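For reference, a minimal Pyrender offscreen rendering sketch looks like this; the mesh path, camera intrinsics, and lighting below are placeholder values, not the settings used in the script:

```python
import numpy as np
import trimesh
import pyrender

# Minimal Pyrender offscreen render of a body mesh (placeholder path/intrinsics).
tm = trimesh.load("body_mesh.obj")                      # hypothetical single-mesh file
scene = pyrender.Scene(bg_color=[0, 0, 0, 0], ambient_light=[0.3, 0.3, 0.3])
scene.add(pyrender.Mesh.from_trimesh(tm))

camera = pyrender.IntrinsicsCamera(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
cam_pose = np.eye(4)
cam_pose[2, 3] = 2.5                                    # move camera back 2.5 m
scene.add(camera, pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)
color, depth = renderer.render(scene)                   # RGB image and depth map
renderer.delete()
```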
Please download the Blender file here. Some notes on the code:
- When importing `.pkl` motion sequences into Blender using the script, click "Scene Collection" first, then run the script.
- `render.py`: converts the `.npz` files generated by Blender to images.
- `vid.sh`: converts images to videos.
- Blender dependencies: the SMPL-X Blender add-on and vision_blender.

Tested on Blender 3.4.1 Linux x64.
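If you want to inspect the Blender output before running `render.py`, the `.npz` ground-truth files can be loaded directly with NumPy; the file name and the `depth_map` key below are assumptions about the vision_blender output, so check the actual keys first:

```python
import numpy as np

# Inspect one ground-truth .npz written by the Blender/vision_blender step.
gt = np.load("0001.npz")          # hypothetical per-frame file name
print(list(gt.keys()))            # see which annotation maps are stored

# Example: normalize a depth-like array for a quick preview (key name is assumed).
if "depth_map" in gt:
    depth = gt["depth_map"]
    preview = (255 * (depth - depth.min()) / np.ptp(depth)).astype(np.uint8)
```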
- Third-party software and datasets employ their respective licenses. Here are some examples:
  - The SMPL-X body model follows its own license.
  - The AMASS dataset follows its own license.
  - Blender and its SMPL-X add-on employ their respective licenses.
- The rest employs the Apache 2.0 license.
@inproceedings{li2024egogen,
title={{EgoGen: An Egocentric Synthetic Data Generator}},
author={Li, Gen and Zhao, Kaifeng and Zhang, Siwei and Lyu, Xiaozhong and Dusmanu, Mihai and Zhang, Yan and Pollefeys, Marc and Tang, Siyu},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}