86 changes: 86 additions & 0 deletions demos/python_demos/human_pose_estimation_3d_demo/README.md
@@ -81,7 +81,93 @@ The application uses OpenCV to display found poses and current inference performance

![](./data/human_pose_estimation_3d_demo.jpg)



# 2D Human Pose Estimation Python* Demo

This demo shows how to run 2D Human Pose Estimation models using OpenVINO™. The following pre-trained models can be used:

* `human-pose-estimation-0001`.

For more information about the pre-trained models, refer to the [model documentation](../../../models/public/index.md).

> **NOTE**: Only batch size of 1 is supported.

## How It Works

The demo application expects a 2D human pose estimation model in the Intermediate Representation (IR) format.

As input, the demo application can take:
* a path to a video file or a device node of a web-camera.
* a list of image paths.

The demo workflow is the following:

1. The demo application reads video frames one by one and estimates 2D human poses in a given frame.
2. The demo visualizes the results in a graphical window, overlaying the estimated 2D poses on the input image (see the sketch below).

> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html).
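
The workflow above can be sketched with plain OpenCV calls. The snippet below is only an illustration, not the demo source: `estimate_and_draw_poses` is a hypothetical placeholder for the model inference, pose parsing and drawing steps.

```python
import cv2

def estimate_and_draw_poses(frame):
    """Hypothetical placeholder for network inference, pose parsing and drawing."""
    return frame

cap = cv2.VideoCapture('video_name.mp4')  # a video path or an integer camera id
while cap.isOpened():
    read_ok, frame = cap.read()
    if not read_ok:
        break
    frame = estimate_and_draw_poses(frame)         # 1. estimate 2D poses in the frame
    cv2.imshow('2D Human Pose Estimation', frame)  # 2. show the poses overlaid on the image
    if cv2.waitKey(1) == 27:  # Esc
        break
cap.release()
cv2.destroyAllWindows()
```
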
## Prerequisites

This demo application requires a native Python extension module to be built before you can run it.
Refer to [Using Open Model Zoo demos](../../README.md) for instructions on how to build it and prepare the environment for running the demo.
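
If the extension has been built and is available on the Python path, importing it should succeed. A minimal sanity check (a sketch that only assumes the `pose_extractor` module name imported by the demo sources):

```python
# Check that the native pose_extractor extension required by the demo is importable.
try:
    from pose_extractor import extract_poses  # noqa: F401
    print('pose_extractor extension is available')
except ImportError as err:
    raise SystemExit('pose_extractor is not built or not on PYTHONPATH: {}'.format(err))
```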

## Running

Run the application with the `-h` option to see the following usage message:

```
usage: human_pose_estimation_2d_demo.py [-h] -m MODEL [-i INPUT [INPUT ...]]
[-d DEVICE]
[--height_size HEIGHT_SIZE]
[--extrinsics_path EXTRINSICS_PATH]
[--fx FX] [--no_show]
[-u UTILIZATION_MONITORS]

Lightweight 2D human pose estimation demo. Press esc to exit, "p" to (un)pause
video or process next image.

Options:
-h, --help Show this help message and exit.
-m MODEL, --model MODEL
Required. Path to an .xml file with a trained model.
-i INPUT [INPUT ...], --input INPUT [INPUT ...]
Required. Path to input image, images, video file or
camera id.
-d DEVICE, --device DEVICE
Optional. Specify the target device to infer on: CPU,
GPU, FPGA, HDDL or MYRIAD. The demo will look for a
suitable plugin for device specified (by default, it
is CPU).
--height_size HEIGHT_SIZE
Optional. Network input layer height size.
--extrinsics_path EXTRINSICS_PATH
Optional. Path to file with camera extrinsics.
--fx FX Optional. Camera focal length.
--no_show Optional. Do not display output.
-u UTILIZATION_MONITORS, --utilization_monitors UTILIZATION_MONITORS
Optional. List of monitors to show initially.
```

Running the application with an empty list of options yields the short version of the usage message and an error message.

To run the demo, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](../../../tools/downloader/README.md) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).

> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (`*.xml` + `*.bin`) using the [Model Optimizer tool](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html).

To run the demo, provide the path to the model in the IR format and to an input video or image(s):
```bash
python human_pose_estimation_2d_demo.py \
-m /home/user/human-pose-estimation-0001.xml \
-i /home/user/video_name.mp4
```

## Demo Output

The application uses OpenCV to display found poses and current inference performance.

![](./data/human_pose_estimation_2d_demo.jpg)

## See Also
* [Using Open Model Zoo demos](../../README.md)
* [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
* [Model Downloader](../../../tools/downloader/README.md)

(This file in the diff could not be displayed.)
@@ -0,0 +1,113 @@
#!/usr/bin/env python
"""
Copyright (c) 2019 Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

from argparse import ArgumentParser, SUPPRESS
import json
import os

import cv2
import numpy as np

from modules.inference_engine import InferenceEngine
from modules.input_reader import InputReader
from modules.draw import Plotter3d, draw_poses
from modules.parse_2d_poses import parse_poses

if __name__ == '__main__':
parser = ArgumentParser(description='Lightweight 2D human pose estimation demo. '
'Press esc to exit, "p" to (un)pause video or process next image.',
add_help=False)
args = parser.add_argument_group('Options')
args.add_argument('-h', '--help', action='help', default=SUPPRESS,
help='Show this help message and exit.')
args.add_argument('-m', '--model',
help='Required. Path to an .xml file with a trained model.',
type=str, required=True)
args.add_argument('-i', '--input',
help='Required. Path to input image, images, video file or camera id.',
nargs='+', default='')
args.add_argument('-d', '--device',
help='Optional. Specify the target device to infer on: CPU, GPU, FPGA, HDDL or MYRIAD. '
'The demo will look for a suitable plugin for device specified '
'(by default, it is CPU).',
type=str, default='CPU')
args.add_argument('--height_size', help='Optional. Network input layer height size.', type=int, default=256)
args.add_argument('--extrinsics_path',
help='Optional. Path to file with camera extrinsics.',
type=str, default=None)
args.add_argument('--fx', type=np.float32, default=-1, help='Optional. Camera focal length.')
args.add_argument('--no_show', help='Optional. Do not display output.', action='store_true')
args = parser.parse_args()

if args.input == '':
raise ValueError('Please, provide input data.')

stride = 8
inference_engine = InferenceEngine(args.model, args.device, stride)
canvas_2d = np.zeros((720, 1280, 3), dtype=np.uint8)
plotter = Plotter3d(canvas_2d.shape[:2])
canvas_2d_window_name = 'Canvas 2D'

frame_provider = InputReader(args.input)
is_video = frame_provider.is_video
base_height = args.height_size
fx = args.fx

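# playback controls: Esc exits, 'p' pauses; while paused (or for image inputs),
# pressing 'p' or Space continues and Esc exits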
delay = 1
esc_code = 27
p_code = 112
space_code = 32
mean_time = 0
for frame in frame_provider:
current_time = cv2.getTickCount()
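# resize the frame so that its height matches the network input height (aspect ratio is kept)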
input_scale = base_height / frame.shape[0]
scaled_img = cv2.resize(frame, dsize=None, fx=input_scale, fy=input_scale)
if fx < 0: # Focal length is unknown
fx = np.float32(0.8 * frame.shape[1])

inference_result = inference_engine.infer(scaled_img)

poses_2d = parse_poses(inference_result, input_scale, stride, fx, is_video)

draw_poses(frame, poses_2d)
current_time = (cv2.getTickCount() - current_time) / cv2.getTickFrequency()
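# smooth the per-frame time with an exponential moving average before reporting FPS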
if mean_time == 0:
mean_time = current_time
else:
mean_time = mean_time * 0.95 + current_time * 0.05
cv2.putText(frame, 'FPS: {}'.format(int(1 / mean_time * 10) / 10),
(40, 80), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255))
if args.no_show:
continue
cv2.imshow('2D Human Pose Estimation', frame)

key = cv2.waitKey(delay)
if key == esc_code:
break
if key == p_code:
if delay == 1:
delay = 0
else:
delay = 1
if delay == 0 or not is_video:
key = 0
while (key != p_code
and key != esc_code
and key != space_code):

key = cv2.waitKey(33)
if key == esc_code:
break
else:
delay = 1
@@ -0,0 +1,122 @@
#!/usr/bin/env python
"""
Copyright (c) 2019 Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import numpy as np

from modules.pose import Pose, propagate_ids
from pose_extractor import extract_poses

AVG_PERSON_HEIGHT = 180

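# remap the 18 detected keypoints to the 19-keypoint panoptic ordering;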
# pelvis (body center) is missing, id == 2
map_id_to_panoptic = [1, 0, 9, 10, 11, 3, 4, 5, 12, 13, 14, 6, 7, 8, 15, 16, 17, 18]

limbs = [[18, 17, 1],
[16, 15, 1],
[5, 4, 3],
[8, 7, 6],
[11, 10, 9],
[14, 13, 12]]


def get_root_relative_poses(inference_results):
# inference results consist of keypoint heatmaps and part affinity field (PAF) maps
heatmap, paf_map = inference_results

upsample_ratio = 4
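# group keypoints from the heatmaps into per-person poses using the part affinity fields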
found_poses = extract_poses(heatmap[0:-1], paf_map, upsample_ratio)
# scale coordinates to features space
found_poses[:, 0:-1:3] /= upsample_ratio
found_poses[:, 1:-1:3] /= upsample_ratio

poses_2d = []
num_kpt_panoptic = 19
num_kpt = 18
for pose_id in range(found_poses.shape[0]):
if found_poses[pose_id, 5] == -1: # skip the pose if the neck keypoint was not found
continue
pose_2d = np.ones(num_kpt_panoptic * 3 + 1, dtype=np.float32) * -1 # +1 for pose confidence
for kpt_id in range(num_kpt):
if found_poses[pose_id, kpt_id * 3] != -1:
x_2d, y_2d, conf = found_poses[pose_id, kpt_id * 3:(kpt_id + 1) * 3]
pose_2d[map_id_to_panoptic[kpt_id] * 3] = x_2d # just repacking
pose_2d[map_id_to_panoptic[kpt_id] * 3 + 1] = y_2d
pose_2d[map_id_to_panoptic[kpt_id] * 3 + 2] = conf
pose_2d[-1] = found_poses[pose_id, -1]
poses_2d.append(pose_2d)
poses_2d = np.array(poses_2d)

keypoint_threshold = 0.1
poses_3d = np.ones((len(poses_2d), num_kpt_panoptic * 4), dtype=np.float32) * -1
for pose_id in range(poses_3d.shape[0]):
if poses_2d[pose_id, 2] <= keypoint_threshold:
continue

neck_2d = poses_2d[pose_id, 0:2].astype(np.int32)

# refine keypoints coordinates at corresponding limbs locations
for limb in limbs:
for kpt_id_from in limb:
if poses_2d[pose_id, kpt_id_from * 3 + 2] <= keypoint_threshold:
continue
for kpt_id_where in limb:
kpt_from_2d = poses_2d[pose_id, kpt_id_from * 3:kpt_id_from * 3 + 2].astype(np.int32)
break

return poses_2d


previous_poses_2d = []


def parse_poses(inference_results, input_scale, stride, fx, is_video=False):
global previous_poses_2d
poses_2d = get_root_relative_poses(inference_results)
poses_2d_scaled = []
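# map keypoint coordinates from the network output grid back to the original image:
# multiply by the network stride and undo the input resize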
for pose_2d in poses_2d:
num_kpt = (pose_2d.shape[0] - 1) // 3
pose_2d_scaled = np.ones(pose_2d.shape[0], dtype=np.float32) * -1
for kpt_id in range(num_kpt):
if pose_2d[kpt_id * 3] != -1:
pose_2d_scaled[kpt_id * 3] = pose_2d[kpt_id * 3] * stride / input_scale
pose_2d_scaled[kpt_id * 3 + 1] = pose_2d[kpt_id * 3 + 1] * stride / input_scale
pose_2d_scaled[kpt_id * 3 + 2] = pose_2d[kpt_id * 3 + 2]
pose_2d_scaled[-1] = pose_2d[-1]
poses_2d_scaled.append(pose_2d_scaled)

if is_video: # track poses ids
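# match current detections with poses from the previous frame so that pose ids stay stable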
current_poses_2d = []
for pose_2d_scaled in poses_2d_scaled:
pose_keypoints = np.ones((Pose.num_kpts, 2), dtype=np.int32) * -1
for kpt_id in range(Pose.num_kpts):
if pose_2d_scaled[kpt_id * 3] != -1.0: # keypoint was found
pose_keypoints[kpt_id, 0:2] = pose_2d_scaled[kpt_id * 3:kpt_id * 3 + 2].astype(np.int32)
pose = Pose(pose_keypoints, pose_2d_scaled[-1])
current_poses_2d.append(pose)
propagate_ids(previous_poses_2d, current_poses_2d)
previous_poses_2d = current_poses_2d

# translate poses
for pose_id in range(poses_2d.shape[0]):
pose_2d = poses_2d[pose_id][0:-1].reshape((-1, 3)).transpose()
num_valid = np.count_nonzero(pose_2d[2] != -1)
pose_2d_valid = np.zeros((2, num_valid), dtype=np.float32)
valid_id = 0
for kpt_id in range(pose_2d.shape[0]):
if pose_2d[2, kpt_id] == -1:
continue
pose_2d_valid[:, valid_id] = pose_2d[0:2, kpt_id]
valid_id += 1

return np.array(poses_2d_scaled)