The EgoBody dataset is a novel large-scale dataset for egocentric 3D human pose, shape and motion under interactions in complex 3D scenes.
[Project page] [Paper] [Dataset] [EgoBody challenge]
[December 04, 2023] Text descriptions for motions are provided by Motion-X Dataset.
[October 20, 2022] All modalities of EgoBody are released (including third-person view RGBD, 3D scene, eye gaze/hand/head tracking, 3D human shape and motion annotations for the camera wearer, etc.)!
[July 17, 2022] The EgoBody challenge is released! The first phase of the challenge will end on October 1st. Participants are welcome to submit a 2-4 page abstract to our ECCV workshop.
[July 08, 2022] The EgoSet (egocentric RGB subset of EgoBody) is released! Other modalities (third-person view RGBD, 3D scene, eye gaze, etc.) will come soon.
[June 01, 2022] The EgoBody dataset will be part of the ECCV2022 workshop: Human Body, Hands, and Activities from Egocentric and Multi-view Cameras. Please check out our workshop website for more information.
Please register, sign the dataset license and download the dataset at https://egobody.inf.ethz.ch.
The EgoBody dataset contains 125 sequences, 36 subjects and 15 indoor scenes. Please find more detailed statistics in our paper.
frames | train | val | test | total |
---|---|---|---|---|
MVSet | 116630 | 29140 | 73961 | 219731 |
EgoSet | 105388 | 25416 | 68307 | 199111 |
EgoSet_interactee | 90124 | 23332 | 62155 | 175611 |
- MVSet: synchronized frames captured from the Azure Kinects, from multiple third-person views
- EgoSet: egocentric RGB frames captured from the HoloLens, calibrated and synchronized with the Kinect frames
- EgoSet_interactee: frames where the interactee is visible in the egocentric view
EgoBody
├── data_info_release.csv
├── data_splits.csv
├── kinect_cam_params
│ ├── kinect_master/kinect_sub_1/kinect_sub_2/kinect_sub_3/kinect_sub_4
│ │ ├── Color.json
│ │ ├── IR.json
├── calibrations
│ ├── RECORDING_NAME
│ │ ├── kinect12_to_world/$scene_name$.json
│ │ ├── holo_to_kinect12.json
│ │ ├── kinect_11to12_color.json
│ │ ├── kinect_13to12_color.json
│ │ ├── (kinect_14to12_color.json)
│ │ ├── (kinect_15to12_color.json)
- data_info_release.csv: basic information for all sequences
  - recording_name: name for each sequence, named as recording_202xxxxx_Sxx_Sxx_xx. 202xxxxx is the capture date, and Sxx_Sxx refers to subjectID(camera_wearer)_subjectID(interactee)
  - body_idx_0: gender of the body with index 0 in each sequence
  - body_idx_1: gender of the body with index 1 in each sequence
  - body_idx_fpv: body index and gender of the interactee in each sequence
  - start_frame/end_frame: the frame ID of the starting/ending frame for each sequence
  - scene_name: name of the 3D scene for each sequence
- data_splits.csv: train/validation/test sequence splits
- calibrations: extrinsics between kinects/hololens/3D scene for each sequence
  - Note that the kinects are labelled as: 12 (master), 11 (sub_1), 13 (sub_2), 14 (sub_3), 15 (sub_4)
  - kinect12_to_world/$scene_name$.json: extrinsics between master kinect RGB camera and the 3D scene mesh
  - holo_to_kinect12.json: extrinsics between master kinect RGB camera and the hololens world coordinate system
  - kinect_11to12_color.json: extrinsics between RGB camera of the master kinect and sub_1 kinect
  - kinect_13to12_color.json: extrinsics between RGB camera of the master kinect and sub_2 kinect
  - kinect_14to12_color.json: extrinsics between RGB camera of the master kinect and sub_3 kinect, only exists for data captured in 2022
  - kinect_15to12_color.json: extrinsics between RGB camera of the master kinect and sub_4 kinect, only exists for data captured in 2022
- kinect_cam_params:
  - kinect_master/kinect_sub_1/kinect_sub_2/kinect_sub_3/kinect_sub_4: intrinsics and extrinsics of the color/depth camera for each kinect
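The naming rules above can be turned into a small lookup helper. The following is a minimal sketch, not part of the released code: the function names are hypothetical, the regular expression assumes that every `x` in `recording_202xxxxx_Sxx_Sxx_xx` is a digit, and the recording names used with it are only illustrative.

```python
import re

# Kinect labels as listed above: 12 is the master, 11/13/14/15 are the subs.
KINECT_LABELS = {"master": 12, "sub_1": 11, "sub_2": 13, "sub_3": 14, "sub_4": 15}

# Assumed pattern: recording_<8-digit date>_<camera wearer>_<interactee>_<index>
_NAME_RE = re.compile(r"^recording_(\d{8})_(S\d+)_(S\d+)_(\d+)$")


def parse_recording_name(recording_name):
    """Split a recording name into capture date, subject IDs and index."""
    match = _NAME_RE.match(recording_name)
    if match is None:
        raise ValueError(f"unexpected recording name: {recording_name!r}")
    date, wearer, interactee, index = match.groups()
    return {
        "date": date,
        "camera_wearer": wearer,
        "interactee": interactee,
        "index": index,
    }


def calibration_file(recording_name, view):
    """Return the extrinsics file name for a Kinect view, or None if there is none."""
    label = KINECT_LABELS[view]
    if label == 12:
        return None  # the master Kinect is the reference camera
    year = parse_recording_name(recording_name)["date"][:4]
    if view in ("sub_3", "sub_4") and year != "2022":
        return None  # these files only exist for data captured in 2022
    return f"kinect_{label}to12_color.json"
```

For example, `calibration_file("recording_20210907_S02_S01_01", "sub_1")` yields `kinect_11to12_color.json`, while the same call with `"sub_3"` yields `None` because the capture date is not in 2022.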
EgoBody
├── egocentric_color
│ ├── RECORDING_NAME
│ │ ├── 202x-xx-xx-xxxxxx
│ │ │ ├── PV
│ │ │ ├── 202x-xx-xx-xxxxxx_pv.txt
│ │ │ ├── keypoints.npz
│ │ │ ├── valid_frame.npz
├── egocentric_depth
│ ├── RECORDING_NAME
│ │ ├── 202x-xx-xx-xxxxxx
│ │ │ ├── ...
├── egocentric_gaze
│ ├── RECORDING_NAME
│ │ ├── 202x-xx-xx-xxxxxx
│ │ │ ├── 202x-xx-xx-xxxxxx_head_hand_eye.csv
- egocentric_color: egocentric RGB images and hololens camera information
  - RECORDING_NAME: recording_name
  - PV: egocentric RGB frames of the current sequence, named as timestamp_frame_xxxxx.jpg, where frame_xxxxx is the ID for each frame
  - 202x-xx-xx-xxxxxx_pv.txt:
    - row 1: RGB(PV) camera cx, cy, w, h of the current sequence
    - row >=2: timestamp, fx, fy, pv2world_transform of each RGB(PV) frame. pv2world_transform is the extrinsics between the RGB(PV) camera of each frame and the hololens world coordinate system of the current sequence (each hololens sequence has a consistent world coordinate system for the whole sequence).
  - keypoints.npz:
    - imgname: egocentric PV image paths (e.g., egocentric_color/RECORDING_NAME/202x-xx-xx-xxxxxx/PV/timestamp_frame_xxxxx.jpg)
    - center: center of the bounding box (to crop the person out) for each PV frame
    - scale: scale of the bounding box for each PV frame
    - keypoints: openpose body joints (BODY_25 format) of the person (interactee) for each PV frame
    - gender: gender of the interactee for each PV frame
  - valid_frame.npz:
    - imgname: egocentric PV image paths (e.g., egocentric_color/RECORDING_NAME/202x-xx-xx-xxxxxx/PV/timestamp_frame_xxxxx.jpg)
    - valid: True/False, True indicates that at least 6 openpose body joints are detected for the interactee in that PV frame
  - Note: keypoints.npz, valid_frame.npz and 202x-xx-xx-xxxxxx_pv.txt can contain frame IDs outside of the range of [start_frame, end_frame], please ignore those frames.
- egocentric_depth: egocentric depth recordings
  - RECORDING_NAME: recording_name
  - each sequence contains recorded depth, lookup table, and depth camera extrinsics (please refer here for more information)
- egocentric_gaze: egocentric eye gaze recordings
  - RECORDING_NAME: recording_name
  - 202x-xx-xx-xxxxxx_head_hand_eye.csv: each row includes timestamp and head/hand/eye gaze tracking for the current timestamp, please refer to load_head_hand_eye_data() in utils.py for details.
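The row layout of `202x-xx-xx-xxxxxx_pv.txt` described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: values are taken to be comma-separated, and `pv2world_transform` is taken to be 16 numbers forming a row-major 4x4 matrix. Neither detail is specified here, so check them against an actual file. The function names are hypothetical.

```python
import numpy as np


def parse_pv_text(text):
    """Parse the contents of a *_pv.txt file into intrinsics and per-frame data."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    # Row 1: cx, cy, w, h of the PV camera for the whole sequence.
    cx, cy, w, h = (float(v) for v in lines[0].split(","))
    header = {"cx": cx, "cy": cy, "w": int(w), "h": int(h)}
    frames = []
    for ln in lines[1:]:
        # Rows >= 2: timestamp, fx, fy, then the pv2world transform.
        vals = ln.split(",")
        frames.append({
            "timestamp": int(vals[0]),
            "fx": float(vals[1]),
            "fy": float(vals[2]),
            # Assumption: 16 values, row-major 4x4.
            "pv2world": np.array(vals[3:19], dtype=float).reshape(4, 4),
        })
    return header, frames


def intrinsics_matrix(header, frame):
    """Build a 3x3 pinhole intrinsics matrix for one PV frame."""
    return np.array([
        [frame["fx"], 0.0, header["cx"]],
        [0.0, frame["fy"], header["cy"]],
        [0.0, 0.0, 1.0],
    ])
```

Because `fx` and `fy` are stored per frame, the intrinsics matrix is built per frame rather than once per sequence.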
EgoBody
├── kinect_color
│ ├── RECORDING_NAME
│ │ ├── master/sub_1/sub_2(/sub_3/sub_4)
│ │ │ ├── frame_xxxxx.jpg
├── kinect_depth
│ ├── RECORDING_NAME
│ │ ├── master/sub_1/sub_2(/sub_3/sub_4)
│ │ │ ├── frame_xxxxx.png
- kinect_color: multi-view third-person view RGB images captured by Kinect cameras
  - RECORDING_NAME: recording_name
  - master/frame_xxxxx.jpg: RGB frame for master kinect
  - sub_1/frame_xxxxx.jpg: RGB frame for sub_1 kinect
  - ...
  - Note that here frame ID frame_xxxxx is synchronized with the corresponding egocentric RGB frame timestamp_frame_xxxxx.jpg of the same sequence.
- kinect_depth: multi-view third-person view depth images captured by Kinect cameras
  - RECORDING_NAME: recording_name
  - master/frame_xxxxx.png: depth frame for master kinect
  - sub_1/frame_xxxxx.png: depth frame for sub_1 kinect
  - ...
  - Note that here frame ID frame_xxxxx is synchronized with the kinect RGB frame with frame ID frame_xxxxx
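Since the frame ID is shared between the egocentric and Kinect streams, matching frames reduces to string handling. The sketch below shows one way to do it. The helper names are hypothetical, and `start_frame`/`end_frame` are assumed to be integers read from data_info_release.csv.

```python
import re
from pathlib import PurePosixPath

_FRAME_RE = re.compile(r"frame_(\d+)")


def frame_token(image_name):
    """Return the digits of the frame ID from a name such as 'timestamp_frame_xxxxx.jpg'."""
    match = _FRAME_RE.search(PurePosixPath(image_name).name)
    if match is None:
        raise ValueError(f"no frame ID in {image_name!r}")
    return match.group(1)


def kinect_frame_paths(recording_name, pv_image_name, view="master"):
    """Map an egocentric PV image to the Kinect color/depth frames with the same ID."""
    stem = f"frame_{frame_token(pv_image_name)}"
    color = f"kinect_color/{recording_name}/{view}/{stem}.jpg"
    depth = f"kinect_depth/{recording_name}/{view}/{stem}.png"
    return color, depth


def within_sequence(image_name, start_frame, end_frame):
    """True if the frame ID lies inside [start_frame, end_frame]."""
    return start_frame <= int(frame_token(image_name)) <= end_frame
```

`within_sequence` can be used to drop entries of keypoints.npz or valid_frame.npz whose frame IDs fall outside the annotated range of a sequence.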
EgoBody
├── scene_mesh
│ ├── $scene_name$
│ │ ├── $scene_name$.obj
- $scene_name$.obj: 3D scene mesh for scene $scene_name$
EgoBody
├── smplx_interactee_train
│ ├── RECORDING_NAME/body_idx_x/results/frame_xxxxx/000.pkl
├── smplx_interactee_val
├── smplx_camera_wearer_train
│ ├── RECORDING_NAME/body_idx_x/results/frame_xxxxx/000.pkl
├── smplx_camera_wearer_val
├── smpl_interactee_train
│ ├── RECORDING_NAME/body_idx_x/results/frame_xxxxx/000.pkl
├── smpl_interactee_val
├── smpl_camera_wearer_train
│ ├── RECORDING_NAME/body_idx_x/results/frame_xxxxx/000.pkl
├── smpl_camera_wearer_val
- smplx_interactee_train/val and smplx_camera_wearer_train/val: SMPL-X body parameters for each frame of the interactee/camera wearer in training/val set
  - always in the coordinate system of the master kinect RGB camera
  - body_idx_x is the body index of the interactee/camera wearer in the current sequence, and frame_xxxxx is the ID for each frame.
- smpl_interactee_train/val and smpl_camera_wearer_train/val: SMPL body parameters for each frame of the interactee/camera wearer
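Because the body parameters live in the master kinect RGB camera frame, placing a body in the 3D scene requires applying the kinect12_to_world extrinsics. The sketch below illustrates the arithmetic only. It assumes the extrinsics are available as a 4x4 homogeneous matrix that maps master-camera coordinates to scene coordinates; the JSON layout and the direction of the stored transform are not specified here and should be checked against the data. The matrices in the usage note are made up.

```python
import numpy as np


def transform_points(points, transform):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    points = np.asarray(points, dtype=float)
    transform = np.asarray(transform, dtype=float)
    rotation, translation = transform[:3, :3], transform[:3, 3]
    # Row-vector form of R @ p + t for every point p.
    return points @ rotation.T + translation


def compose(outer, inner):
    """Return the transform that applies `inner` first and `outer` second."""
    return np.asarray(outer, dtype=float) @ np.asarray(inner, dtype=float)
```

As a usage example, with a hypothetical `master_to_world` that rotates 90 degrees about the z axis and shifts by one unit along x, the point (1, 0, 0) maps to (1, 1, 0). Under the same direction assumption, `compose(master_to_world, sub_to_master)` would take points from a sub kinect frame directly to the scene.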
Text descriptions for motions are provided by Motion-X Dataset.
Render SMPL-X/SMPL bodies from the egocentric (hololens) view:
python release_renderer_fpv_gaze.py --release_data_root=PATH/TO/DATASET --save_root=PATH/TO/SAVE/RESULTS --recording_name RECORDING_NAME --scene_name SCENE_NAME
Available options:
- model_type: smpl/smplx, render SMPL-X or SMPL bodies
- plot_2d_joints: if set to True, plot 2D joints of openpose detections and the projected 2D joints of ground truth SMPL-X/SMPL bodies
- plot_gaze: if set to True, plot 2D projection of the camera wearer's gaze point on egocentric view images
- rendring_mode: body renders 3D body mesh projected on the RGB images, 3d renders 3D body mesh in 3D scenes, both renders both options
- model_folder: the path to SMPL-X/SMPL models
Render SMPL-X/SMPL bodies from the third-person (kinect) view:
python release_renderer_kinect.py --release_data_root=PATH/TO/DATASET --save_root=PATH/TO/SAVE/RESULTS --recording_name RECORDING_NAME --scene_name SCENE_NAME
Available options:
- model_type: smpl/smplx, render SMPL-X or SMPL bodies
- view: from which view of kinect to render the body (options: master/sub_1/sub_2/sub_3/sub_4)
- rendring_mode: body renders 3D body mesh projected on the RGB images, 3d renders 3D body mesh in 3D scenes, both renders both options
- model_folder: the path to SMPL-X/SMPL models
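To render one recording from several kinect views, the command above can be assembled in a loop. This is a minimal sketch: the helper is hypothetical, the paths and names are placeholders, and passing the `view` option as `--view VALUE` is an assumption about the script's argument parser.

```python
import shlex


def kinect_render_command(data_root, save_root, recording_name, scene_name, view):
    """Build the argument list for release_renderer_kinect.py for one view."""
    return [
        "python", "release_renderer_kinect.py",
        f"--release_data_root={data_root}",
        f"--save_root={save_root}",
        "--recording_name", recording_name,
        "--scene_name", scene_name,
        "--view", view,  # assumed flag spelling for the `view` option
    ]


commands = [
    kinect_render_command(
        "PATH/TO/DATASET", "PATH/TO/SAVE/RESULTS", "RECORDING_NAME", "SCENE_NAME", view
    )
    for view in ("master", "sub_1", "sub_2")
]

for command in commands:
    # Print the shell form; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Only master, sub_1 and sub_2 are used here because sub_3 and sub_4 are not available for every recording.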
Visualize point clouds from kinect RGB/depth and the 3d scene mesh together:
python release_vis_kinect_scene.py --release_data_root=PATH/TO/DATASET --recording_name RECORDING_NAME --scene_name SCENE_NAME
Available options:
- vis_frame_id: which frame to visualize, in the format of xxxxx, for example, 03000
Visualize point clouds from kinect RGB/depth of all kinect views together:
python release_vis_kinect_pcd.py --release_data_root=PATH/TO/DATASET --recording_name RECORDING_NAME
Available options:
- vis_frame_id: which frame to visualize, in the format of xxxxx, for example, 03000
To read the depth, head/hand tracking data:
Please refer to HoloLens2ForCV for details.
- To read hololens depth and convert to point clouds, use:
HoloLens2ForCV/Samples/StreamRecorder/StreamRecorderConverter/save_pclouds.py
- To read head/hand tracking data and project onto the egocentric image, use:
HoloLens2ForCV/Samples/StreamRecorder/StreamRecorderConverter/project_hand_eye_to_pv.py
Method | MPJPE | PA-MPJPE | V2V | PA-V2V |
---|---|---|---|---|
CMR | 200.7 | 109.6 | 218.7 | 136.8 |
SPIN | 182.8 | 116.6 | 187.3 | 123.8 |
LGD | 158.0 | 99.9 | 168.3 | 106.0 |
METRO | 153.1 | 98.4 | 164.6 | 106.5 |
PARE | 123.0 | 83.8 | 131.4 | 89.7 |
EFT | 123.9 | 78.4 | 135.0 | 86.0 |
SPIN-ft | 106.5 | 67.1 | 120.9 | 78.3 |
METRO-ft | 98.5 | 66.9 | 110.5 | 76.8 |
EFT-ft | 102.1 | 64.8 | 116.1 | 74.8 |
- Here '-ft' denotes results of fine-tuning SPIN, METRO and EFT on our training set.
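For readers unfamiliar with the column names, MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and the PA- variants first align the prediction to the ground truth with a similarity transform (Procrustes alignment). V2V applies the same distance to mesh vertices. The sketch below is a common formulation of these metrics, not the benchmark's evaluation code: the joint set, any root alignment and the units used for the table are not specified here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding (N, 3) points."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best scale, rotation and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    diag = np.array([1.0, 1.0, d])
    rotation = u @ np.diag(diag) @ vt
    scale = (s * diag).sum() / (p ** 2).sum()
    return scale * p @ rotation + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Both functions return values in the same unit as their inputs. A prediction that differs from the ground truth only by a global rotation, scale and offset has a PA-MPJPE of zero, which is why PA-MPJPE is always at most MPJPE in the table.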
@inproceedings{Zhang:ECCV:2022,
title = {EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices},
author = {Zhang, Siwei and Ma, Qianli and Zhang, Yan and Qian, Zhiyin and Kwon, Taein and Pollefeys, Marc and Bogo, Federica and Tang, Siyu},
booktitle = {European conference on computer vision (ECCV)},
month = oct,
year = {2022}
}
This work was supported by the Microsoft Mixed Reality & AI Zurich Lab PhD scholarship. Qianli Ma is partially funded by the Max Planck ETH Center for Learning Systems. We sincerely thank Francis Engelmann, Korrawe Karunratanakul, Theodora Kontogianni, Qi Ma, Marko Mihajlovic, Sergey Prokudin, Matias Turkulainen, Rui Wang, Shaofei Wang and Samokhvalov Vyacheslav for helping with the data capture and processing, Xucong Zhang for the discussion of data collection and Jonas Hein for the discussion of the hardware setup. Siyu Tang acknowledges the SNF grant 200021 204840.
The motion reconstruction pipeline benefits from:
Learning Motion Priors for 4D Human Body Capture in 3D Scenes (ICCV 2021 (Oral))
Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys and Siyu Tang
Resolving 3D Human Pose Ambiguities with 3D Scene Constraints (ICCV 2019)
Mohamed Hassan, Vassilis Choutas, Dimitrios Tzionas and Michael J. Black