This is the script to process video data for SCARF training.
SCARF needs an input image, a subject mask, a clothing mask, and an initial SMPL-X estimate for each frame. Specifically, we use
- FasterRCNN to detect the subject and crop the image (see the sketch after this list)
- RobustVideoMatting to remove background
- cloth-segmentation to segment clothing
- PIXIE to estimate SMPL-X parameters
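As an illustration of the first step, here is a minimal sketch of detection and cropping, assuming torchvision's off-the-shelf FasterRCNN; the function name crop_subject, the margin value, and the full-frame fallback are illustrative assumptions, not the actual logic of process_video.py.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    # off-the-shelf detector weights; the actual script fetches its own assets
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    @torch.no_grad()
    def crop_subject(image, margin=0.2):
        """Detect the most confident person and return a padded crop.

        `image` is an HxWx3 uint8 array; `margin` is an illustrative choice.
        """
        pred = detector([to_tensor(image)])[0]
        person = pred["labels"] == 1                 # COCO class 1 is "person"
        if not person.any():
            return image                             # fall back to the full frame
        box = pred["boxes"][person][pred["scores"][person].argmax()]
        x0, y0, x1, y1 = box.tolist()
        w, h = x1 - x0, y1 - y0
        # enlarge the box by `margin` on each side and clamp to the image bounds
        x0 = max(int(x0 - margin * w), 0)
        y0 = max(int(y0 - margin * h), 0)
        x1 = min(int(x1 + margin * w), image.shape[1])
        y1 = min(int(y1 + margin * h), image.shape[0])
        return image[y0:y1, x0:x1]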
When using the processing script, you must agree to the terms of their licenses and properly cite them in your work.
- Clone submodule repositories:
git submodule update --init --recursive
- Download the required data and models:
bash fetch_asset_data.sh
If the script fails, please check their websites and download the models manually.
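Some of the models can also be fetched directly from their upstream repositories. For example, RobustVideoMatting publishes its pretrained weights through torch.hub; this is the upstream repository's documented entry point, independent of fetch_asset_data.sh:

    import torch

    # downloads the code and pretrained weights from the upstream repository
    matting_model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval()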
Put your data list into ./lists/subject_list.txt; each entry can be a path to a video or to a folder of images.
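For reference, the file might look like this (the paths are hypothetical):

    /data/videos/subject_01.mp4
    /data/frames/subject_02/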
Then run
python process_video.py --crop --ignore_existing
Processing time depends on the number of frames and the video resolution; for the mpiis-scarf video (400 frames at 1028x1920), it takes around 12 minutes.
The script has been verified to work on the following datasets:
a. mpiis-scarf (recorded video for this paper)
b. People Snapshot Dataset (https://graphics.tu-bs.de/people-snapshot)
c. SelfRecon dataset (https://jby1993.github.io/SelfRecon/)
d. iPER dataset (https://svip-lab.github.io/dataset/iPER_dataset.html)
To get optimal results for your own video, it is recommended to capture it with settings similar to the datasets above: keep the camera static, record the subject from many viewpoints, and use uniform lighting. It is also better to keep the video under 1000 frames for training; a subsampling sketch is given below. For more information, please refer to the limitations section of SCARF.
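To stay under roughly 1000 frames, a long video can be subsampled before processing. Below is a minimal sketch using OpenCV; the function name, stride logic, and output layout are illustrative assumptions, and the resulting image folder can then be listed in ./lists/subject_list.txt:

    import os
    import cv2

    def subsample_video(video_path, out_dir, max_frames=1000):
        """Keep every stride-th frame so at most ~max_frames images are saved."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        stride = max((total + max_frames - 1) // max_frames, 1)  # ceil division
        idx = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % stride == 0:
                cv2.imwrite(os.path.join(out_dir, f"{saved:06d}.png"), frame)
                saved += 1
            idx += 1
        cap.release()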