86 changes: 86 additions & 0 deletions demos/python_demos/human_pose_estimation_3d_demo/README.md
@@ -81,7 +81,93 @@ The application uses OpenCV to display found poses and current inference performance

![](./data/human_pose_estimation_3d_demo.jpg)



# 2D Human Pose Estimation Python* Demo

This demo shows how to run 2D Human Pose Estimation models using OpenVINO™. The following pre-trained models can be used:

* `human-pose-estimation-0001`.

For more information about the pre-trained models, refer to the [model documentation](../../../models/public/index.md).

> **NOTE**: Only batch size of 1 is supported.

## How It Works

The demo application expects a 2D human pose estimation model in the Intermediate Representation (IR) format.

As input, the demo application can take:
* a path to a video file or a device node of a web-camera.
* a list of image paths.

The demo workflow is the following:

1. The demo application reads video frames one by one and estimates 2D human poses in a given frame.
2. The demo visualizes the results in a graphical window, overlaying the estimated 2D poses on the input image (see the sketch below).

> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html).
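
The workflow above can be sketched with plain OpenCV calls. The snippet below is only an illustration, not the demo source: `estimate_and_draw_poses` is a hypothetical placeholder for the model inference, pose parsing and drawing steps.

```python
import cv2

def estimate_and_draw_poses(frame):
    """Hypothetical placeholder for network inference, pose parsing and drawing."""
    return frame

cap = cv2.VideoCapture('video_name.mp4')  # a video path or an integer camera id
while cap.isOpened():
    read_ok, frame = cap.read()
    if not read_ok:
        break
    frame = estimate_and_draw_poses(frame)         # 1. estimate 2D poses in the frame
    cv2.imshow('2D Human Pose Estimation', frame)  # 2. show the poses overlaid on the image
    if cv2.waitKey(1) == 27:  # Esc
        break
cap.release()
cv2.destroyAllWindows()
```
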
## Prerequisites

This demo application requires a native Python extension module to be built before you can run it.
Refer to [Using Open Model Zoo demos](../../README.md) for instructions on how to build it and prepare the environment for running the demo.
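
If the extension has been built and is available on the Python path, importing it should succeed. A minimal sanity check (a sketch that only assumes the `pose_extractor` module name imported by the demo sources):

```python
# Check that the native pose_extractor extension required by the demo is importable.
try:
    from pose_extractor import extract_poses  # noqa: F401
    print('pose_extractor extension is available')
except ImportError as err:
    raise SystemExit('pose_extractor is not built or not on PYTHONPATH: {}'.format(err))
```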

## Running

Run the application with the `-h` option to see the following usage message:

```
usage: human_pose_estimation_2d_demo.py [-h] -m MODEL [-i INPUT [INPUT ...]]
[-d DEVICE]
[--height_size HEIGHT_SIZE]
[--extrinsics_path EXTRINSICS_PATH]
[--fx FX] [--no_show]
[-u UTILIZATION_MONITORS]

Lightweight 2D human pose estimation demo. Press esc to exit, "p" to (un)pause
video or process next image.

Options:
-h, --help Show this help message and exit.
-m MODEL, --model MODEL
Required. Path to an .xml file with a trained model.
-i INPUT [INPUT ...], --input INPUT [INPUT ...]
Required. Path to input image, images, video file or
camera id.
-d DEVICE, --device DEVICE
Optional. Specify the target device to infer on: CPU,
GPU, FPGA, HDDL or MYRIAD. The demo will look for a
suitable plugin for device specified (by default, it
is CPU).
--height_size HEIGHT_SIZE
Optional. Network input layer height size.
--extrinsics_path EXTRINSICS_PATH
Optional. Path to file with camera extrinsics.
--fx FX Optional. Camera focal length.
--no_show Optional. Do not display output.
-u UTILIZATION_MONITORS, --utilization_monitors UTILIZATION_MONITORS
Optional. List of monitors to show initially.
```

Running the application with an empty list of options yields the short version of the usage message and an error message.

To run the demo, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](../../../tools/downloader/README.md) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).

> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (`*.xml` + `*.bin`) using the [Model Optimizer tool](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html).

To run the demo, provide the path to the model in the IR format and to an input video or image(s):
```bash
python human_pose_estimation_2d_demo.py \
-m /home/user/human-pose-estimation-0001.xml \
-i /home/user/video_name.mp4
```

## Demo Output

The application uses OpenCV to display found poses and current inference performance.

![](./data/human_pose_estimation_2d_demo.jpg)

## See Also
* [Using Open Model Zoo demos](../../README.md)
* [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
* [Model Downloader](../../../tools/downloader/README.md)

(This file in the diff could not be displayed.)
@@ -0,0 +1,113 @@
#!/usr/bin/env python
"""
Copyright (c) 2019 Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

from argparse import ArgumentParser, SUPPRESS
import json
import os

import cv2
import numpy as np

from modules.inference_engine import InferenceEngine
from modules.input_reader import InputReader
from modules.draw import Plotter3d, draw_poses
from modules.parse_2d_poses import parse_poses

if __name__ == '__main__':
parser = ArgumentParser(description='Lightweight 2D human pose estimation demo. '
'Press esc to exit, "p" to (un)pause video or process next image.',
add_help=False)
args = parser.add_argument_group('Options')
args.add_argument('-h', '--help', action='help', default=SUPPRESS,
help='Show this help message and exit.')
args.add_argument('-m', '--model',
help='Required. Path to an .xml file with a trained model.',
type=str, required=True)
args.add_argument('-i', '--input',
help='Required. Path to input image, images, video file or camera id.',
nargs='+', default='')
args.add_argument('-d', '--device',
help='Optional. Specify the target device to infer on: CPU, GPU, FPGA, HDDL or MYRIAD. '
'The demo will look for a suitable plugin for device specified '
'(by default, it is CPU).',
type=str, default='CPU')
args.add_argument('--height_size', help='Optional. Network input layer height size.', type=int, default=256)
args.add_argument('--extrinsics_path',
help='Optional. Path to file with camera extrinsics.',
type=str, default=None)
args.add_argument('--fx', type=np.float32, default=-1, help='Optional. Camera focal length.')
args.add_argument('--no_show', help='Optional. Do not display output.', action='store_true')
args = parser.parse_args()

if args.input == '':
raise ValueError('Please, provide input data.')

stride = 8
inference_engine = InferenceEngine(args.model, args.device, stride)
canvas_2d = np.zeros((720, 1280, 3), dtype=np.uint8)
plotter = Plotter3d(canvas_2d.shape[:2])
canvas_2d_window_name = 'Canvas 2D'

frame_provider = InputReader(args.input)
is_video = frame_provider.is_video
base_height = args.height_size
fx = args.fx

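# playback controls: Esc exits, 'p' pauses; while paused (or for image inputs),
# pressing 'p' or Space continues and Esc exits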
delay = 1
esc_code = 27
p_code = 112
space_code = 32
mean_time = 0
for frame in frame_provider:
current_time = cv2.getTickCount()
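# resize the frame so that its height matches the network input height (aspect ratio is kept)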
input_scale = base_height / frame.shape[0]
scaled_img = cv2.resize(frame, dsize=None, fx=input_scale, fy=input_scale)
if fx < 0: # Focal length is unknown
fx = np.float32(0.8 * frame.shape[1])

inference_result = inference_engine.infer(scaled_img)

poses_2d = parse_poses(inference_result, input_scale, stride, fx, is_video)

draw_poses(frame, poses_2d)
current_time = (cv2.getTickCount() - current_time) / cv2.getTickFrequency()
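# smooth the per-frame time with an exponential moving average before reporting FPS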
if mean_time == 0:
mean_time = current_time
else:
mean_time = mean_time * 0.95 + current_time * 0.05
cv2.putText(frame, 'FPS: {}'.format(int(1 / mean_time * 10) / 10),
(40, 80), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255))
if args.no_show:
continue
cv2.imshow('2D Human Pose Estimation', frame)

key = cv2.waitKey(delay)
if key == esc_code:
break
if key == p_code:
if delay == 1:
delay = 0
else:
delay = 1
if delay == 0 or not is_video:
key = 0
while (key != p_code
and key != esc_code
and key != space_code):

key = cv2.waitKey(33)
if key == esc_code:
break
else:
delay = 1
@@ -0,0 +1,122 @@
#!/usr/bin/env python
"""
Copyright (c) 2019 Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import numpy as np

from modules.pose import Pose, propagate_ids
from pose_extractor import extract_poses

AVG_PERSON_HEIGHT = 180

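# remap the 18 detected keypoints to the 19-keypoint panoptic ordering;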
# pelvis (body center) is missing, id == 2
map_id_to_panoptic = [1, 0, 9, 10, 11, 3, 4, 5, 12, 13, 14, 6, 7, 8, 15, 16, 17, 18]

limbs = [[18, 17, 1],
[16, 15, 1],
[5, 4, 3],
[8, 7, 6],
[11, 10, 9],
[14, 13, 12]]


def get_root_relative_poses(inference_results):
# inference results consist of keypoint heatmaps and part affinity field (PAF) maps
heatmap, paf_map = inference_results

upsample_ratio = 4
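# group keypoints from the heatmaps into per-person poses using the part affinity fields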
found_poses = extract_poses(heatmap[0:-1], paf_map, upsample_ratio)
# scale coordinates to features space
found_poses[:, 0:-1:3] /= upsample_ratio
found_poses[:, 1:-1:3] /= upsample_ratio

poses_2d = []
num_kpt_panoptic = 19
num_kpt = 18
for pose_id in range(found_poses.shape[0]):
if found_poses[pose_id, 5] == -1: # skip the pose if the neck keypoint was not found
continue
pose_2d = np.ones(num_kpt_panoptic * 3 + 1, dtype=np.float32) * -1 # +1 for pose confidence
for kpt_id in range(num_kpt):
if found_poses[pose_id, kpt_id * 3] != -1:
x_2d, y_2d, conf = found_poses[pose_id, kpt_id * 3:(kpt_id + 1) * 3]
pose_2d[map_id_to_panoptic[kpt_id] * 3] = x_2d # just repacking
pose_2d[map_id_to_panoptic[kpt_id] * 3 + 1] = y_2d
pose_2d[map_id_to_panoptic[kpt_id] * 3 + 2] = conf
pose_2d[-1] = found_poses[pose_id, -1]
poses_2d.append(pose_2d)
poses_2d = np.array(poses_2d)

keypoint_threshold = 0.1
poses_3d = np.ones((len(poses_2d), num_kpt_panoptic * 4), dtype=np.float32) * -1
for pose_id in range(poses_3d.shape[0]):
if poses_2d[pose_id, 2] <= keypoint_threshold:
continue

neck_2d = poses_2d[pose_id, 0:2].astype(np.int32)

# refine keypoints coordinates at corresponding limbs locations
for limb in limbs:
for kpt_id_from in limb:
if poses_2d[pose_id, kpt_id_from * 3 + 2] <= keypoint_threshold:
continue
for kpt_id_where in limb:
kpt_from_2d = poses_2d[pose_id, kpt_id_from * 3:kpt_id_from * 3 + 2].astype(np.int32)
break

return poses_2d


previous_poses_2d = []


def parse_poses(inference_results, input_scale, stride, fx, is_video=False):
global previous_poses_2d
poses_2d = get_root_relative_poses(inference_results)
poses_2d_scaled = []
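# map keypoint coordinates from the network output grid back to the original image:
# multiply by the network stride and undo the input resize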
for pose_2d in poses_2d:
num_kpt = (pose_2d.shape[0] - 1) // 3
pose_2d_scaled = np.ones(pose_2d.shape[0], dtype=np.float32) * -1
for kpt_id in range(num_kpt):
if pose_2d[kpt_id * 3] != -1:
pose_2d_scaled[kpt_id * 3] = pose_2d[kpt_id * 3] * stride / input_scale
pose_2d_scaled[kpt_id * 3 + 1] = pose_2d[kpt_id * 3 + 1] * stride / input_scale
pose_2d_scaled[kpt_id * 3 + 2] = pose_2d[kpt_id * 3 + 2]
pose_2d_scaled[-1] = pose_2d[-1]
poses_2d_scaled.append(pose_2d_scaled)

if is_video: # track poses ids
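# match current detections with poses from the previous frame so that pose ids stay stable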
current_poses_2d = []
for pose_2d_scaled in poses_2d_scaled:
pose_keypoints = np.ones((Pose.num_kpts, 2), dtype=np.int32) * -1
for kpt_id in range(Pose.num_kpts):
if pose_2d_scaled[kpt_id * 3] != -1.0: # keypoint was found
pose_keypoints[kpt_id, 0:2] = pose_2d_scaled[kpt_id * 3:kpt_id * 3 + 2].astype(np.int32)
pose = Pose(pose_keypoints, pose_2d_scaled[-1])
current_poses_2d.append(pose)
propagate_ids(previous_poses_2d, current_poses_2d)
previous_poses_2d = current_poses_2d

# translate poses
for pose_id in range(poses_2d.shape[0]):
pose_2d = poses_2d[pose_id][0:-1].reshape((-1, 3)).transpose()
num_valid = np.count_nonzero(pose_2d[2] != -1)
pose_2d_valid = np.zeros((2, num_valid), dtype=np.float32)
valid_id = 0
for kpt_id in range(pose_2d.shape[0]):
if pose_2d[2, kpt_id] == -1:
continue
pose_2d_valid[:, valid_id] = pose_2d[0:2, kpt_id]
valid_id += 1

return np.array(poses_2d_scaled)