From 9cb1c24b2804542a20fd818626540d3543b1d366 Mon Sep 17 00:00:00 2001
From: Beth-Kosis <117772456+Beth-Kosis@users.noreply.github.com>
Date: Fri, 21 Apr 2023 09:27:31 -0400
Subject: [PATCH] Update README.md (#932)
Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
---
examples/vit_pose/README.md | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/examples/vit_pose/README.md b/examples/vit_pose/README.md
index 905d789492..86e5e718c4 100644
--- a/examples/vit_pose/README.md
+++ b/examples/vit_pose/README.md
@@ -14,39 +14,41 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
-# Exploration for including the ViT Pose model into the DeepSparse pipeline
+# Exploration for Including the ViT Pose Model into the DeepSparse Pipeline
Source: https://github.com/ViTAE-Transformer/ViTPose
## Installation
Follow the instructions in the `ViTPose` README file. Note:
-- installing one of the dependencies, `mmcv` takes a lot of time and may look often like it is stuck. Be patient (or run with `-v` if helps), it will eventually terminate successfully.
-- after the setup completes, it is also advisable to downgrade the default torch version from `1.3` to `1.2` to avoid CUDA errors (as I am writing this, we are internally supporting `torch==1.2.1`)
+- Installing one of the dependencies, `mmcv`, takes a lot of time and may often appear to be stuck. Be patient (or run with `-v` to see progress); it will eventually terminate successfully.
+- After the setup completes, it is advisable to downgrade the default torch version from `1.3` to `1.2` to avoid CUDA errors (currently, we are internally supporting `torch==1.2.1`).
## Export
-Before running the onnx export script, (manually) install `timm`, `onnx` and `onnxruntime`. Then, launch the [export script](https://github.com/ViTAE-Transformer/ViTPose/blob/main/tools/deployment/pytorch2onnx.py):
+Before running the ONNX export script, (manually) install `timm`, `onnx`, and `onnxruntime`. Then, launch the [export script](https://github.com/ViTAE-Transformer/ViTPose/blob/main/tools/deployment/pytorch2onnx.py):
```bash
python tools/deployment/pytorch2onnx.py /home/ubuntu/damian/ViTPose/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py /home/ubuntu/damian/ViTPose/vitpose-b.pth
```
-The first argument is a config file (for ViTpose B) the second argument is the `.pth` checkpoint (weights). Both can be found on the main site of the repository:
+The first argument is a configuration file (for ViTPose-B) and the second argument is the `.pth` checkpoint (weights). Both can be found on the main page of the repository:
+
-The resulting model is about 400mb.
+The resulting model is about 400 MB.
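To sanity-check the export before moving on, something like the following works (a minimal sketch; the output filename `vitpose-b.onnx` and the 256x192 input shape are assumptions based on the config above, so adjust them to the actual export output):

```python
# Sketch: validate the exported ONNX graph and run one dummy forward pass.
# `vitpose-b.onnx` is a placeholder for the file produced by pytorch2onnx.py.
import numpy as np
import onnx
import onnxruntime

model_path = "vitpose-b.onnx"

onnx.checker.check_model(onnx.load(model_path))  # structural validation

session = onnxruntime.InferenceSession(model_path)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 256, 192).astype(np.float32)  # (batch, C, H, W)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])  # heatmaps, expected (1, num_keypoints, h, w)
```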
## Benchmarking in DeepSparse:
-Naive benchmarking shows that for the dense model, the engine is roughly x2 faster than ORT:
+Naive benchmarking shows that the engine is roughly 2x faster than ORT for the dense model:
+
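For reference, the naive comparison can be reproduced with a timing loop along these lines (a sketch, not the exact script behind the numbers above; `vitpose-b.onnx` and the input shape are the same assumptions as in the export section):

```python
# Naive benchmark sketch: wall-clock the same random input through the
# DeepSparse engine and through ONNX Runtime.
import time

import numpy as np
import onnxruntime
from deepsparse import compile_model

model_path = "vitpose-b.onnx"  # placeholder for the exported model
dummy = np.random.rand(1, 3, 256, 192).astype(np.float32)
iterations = 50

engine = compile_model(model_path, batch_size=1)
start = time.perf_counter()
for _ in range(iterations):
    engine.run([dummy])
deepsparse_s = time.perf_counter() - start

session = onnxruntime.InferenceSession(model_path)
input_name = session.get_inputs()[0].name
start = time.perf_counter()
for _ in range(iterations):
    session.run(None, {input_name: dummy})
ort_s = time.perf_counter() - start

print(f"DeepSparse: {deepsparse_s / iterations * 1e3:.2f} ms/iter")
print(f"ONNX Runtime: {ort_s / iterations * 1e3:.2f} ms/iter")
```

The `deepsparse.benchmark` CLI gives more controlled numbers than a loop like this.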
## Postprocessing
-ViT-Pose might be our first candidate for a "composed" deepsparse pipeline.
-It is a top-down pose estimation approach i.e. we first detect `n` people in the image, and then we estimate their poses individually (using bounding-box-cropped images).
+ViT-Pose might be our first candidate for a "composed" DeepSparse pipeline.
+It is a top-down pose estimation approach: we first detect `n` people in the image and then estimate their poses individually (using bounding-box-cropped images).
We pass the cropped bounding boxes to ViT to get an array `(batch, no_keypoints, h, w)`. To decode this array, according to the original paper,
we need some simple composition of transposed convolutions.
-What I do naively for now: I "squash" the array to `(h,w)` and then overlay it on the original image. We can see that the heatmap roughly coincides with the joints of the model.
+If "squash" the array to `(h,w)` and then overlay it on the original image, we can see that the heatmap roughly coincides with the joints of the model.