Official repository of "Action-conditioned contrastive learning for 3D human pose and shape estimation in videos" (CVIU)

Action-conditioned Contrastive Learning for 3D Human Pose and Shape Estimation in Videos

Abstract


The aim of this research is to estimate 3D human pose and shape in videos, a challenging task due to the complex nature of the human body and the wide range of possible pose and shape variations. A satisfactory solution is also difficult to find because of the trade-off between the accuracy and the temporal consistency of the estimated 3D pose and shape; previous works have therefore prioritized one objective over the other. In contrast, we propose a novel approach, the action-conditioned mesh recovery (ACMR) model, which improves accuracy without compromising temporal consistency by leveraging human action information. ACMR outperforms existing methods that prioritize temporal consistency in terms of accuracy, while achieving temporal consistency comparable to other state-of-the-art methods. Notably, the action-conditioned learning occurs only during training, so no additional resources are required at inference time, enhancing performance without increasing computational demands.
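To give intuition for action-conditioned contrastive learning, the sketch below implements a generic supervised-contrastive (InfoNCE-style) loss in which clip embeddings sharing an action pseudo-label are treated as positives. This is an illustrative toy in pure Python, not the paper's exact loss formulation; all function and variable names are hypothetical.

```python
import math

def action_contrastive_loss(features, actions, temperature=0.1):
    """Toy supervised-contrastive loss: embeddings of clips that share an
    action pseudo-label are pulled together, all others pushed apart.
    Illustrative sketch only; not the paper's exact objective."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(a):
        n = math.sqrt(dot(a, a))
        return [x / n for x in a]

    feats = [normalize(f) for f in features]
    total, count = 0.0, 0
    for i, fi in enumerate(feats):
        others = [j for j in range(len(feats)) if j != i]
        # temperature-scaled similarities to every other sample
        sims = [math.exp(dot(fi, feats[j]) / temperature) for j in others]
        denom = sum(sims)
        for s, j in zip(sims, others):
            if actions[j] == actions[i]:  # positive pair: same action label
                total += -math.log(s / denom)
                count += 1
    return total / max(count, 1)
```

With embeddings clustered by action, the loss is low; shuffling the labels so dissimilar clips become positives raises it, which is the signal the action conditioning exploits.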

Introduction

This repository provides the PyTorch implementation of our paper Action-conditioned contrastive learning for 3D human pose and shape estimation in videos. The implementation builds upon the excellent work of TCMR.

Dataset Preparation

Please follow the dataset preparation steps from TCMR. For action pseudo-labels, we used SlowFast through MMAction2. You can either:

  • Use our pre-processed `.pt` files that include action labels, or
  • Set up MMAction2 and utilize the demo code in trainer to generate pseudo action labels
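If you generate your own pseudo-labels, the sketch below shows one plausible way to attach recognizer outputs to a preprocessed clip record. The field names and record schema here are illustrative assumptions, not the repository's exact format:

```python
# Hedged sketch: attach action pseudo-labels (e.g. from SlowFast via
# MMAction2) to a preprocessed clip record. Field names are hypothetical.
def attach_action_labels(clip_record, action_scores, top_k=1):
    """action_scores: {action_name: confidence} from an action recognizer.
    Stores the top-k labels and scores as pseudo-labels on the record."""
    ranked = sorted(action_scores.items(), key=lambda kv: kv[1], reverse=True)
    clip_record["action_labels"] = [name for name, _ in ranked[:top_k]]
    clip_record["action_scores"] = [score for _, score in ranked[:top_k]]
    return clip_record

clip = {"frames": ["000001.jpg", "000002.jpg"]}
clip = attach_action_labels(clip, {"running": 0.91, "walking": 0.07})
# clip["action_labels"] -> ["running"]
```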

Demo

  1. Download the ACMR weights from here
  2. Run demo.py with your input frame folder

Citation

If you find this work useful, please consider citing:

@article{song2024action,
  title={Action-conditioned contrastive learning for 3D human pose and shape estimation in videos},
  author={Song, Inpyo and Ryu, Moonwook and Lee, Jangwon},
  journal={Computer Vision and Image Understanding},
  volume={249},
  pages={104149},
  year={2024},
  publisher={Elsevier}
}

Acknowledgments

This work builds upon several excellent previous works. We sincerely thank the authors of TCMR, SlowFast, and MMAction2.
