From 9cb1c24b2804542a20fd818626540d3543b1d366 Mon Sep 17 00:00:00 2001
From: Beth-Kosis <117772456+Beth-Kosis@users.noreply.github.com>
Date: Fri, 21 Apr 2023 09:27:31 -0400
Subject: [PATCH] Update README.md (#932)

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
---
 examples/vit_pose/README.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/examples/vit_pose/README.md b/examples/vit_pose/README.md
index 905d789492..86e5e718c4 100644
--- a/examples/vit_pose/README.md
+++ b/examples/vit_pose/README.md
@@ -14,39 +14,41 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
-# Exploration for including the ViT Pose model into the DeepSparse pipeline
+# Exploration for Including the ViT Pose Model into the DeepSparse Pipeline
Source: https://github.com/ViTAE-Transformer/ViTPose
## Installation
Follow the instructions in the `ViTPose` README file.
Note:
-- installing one of the dependencies, `mmcv` takes a lot of time and may look often like it is stuck. Be patient (or run with `-v` if helps), it will eventually terminate successfully.
-- after the setup completes, it is also advisable to downgrade the default torch version from `1.3` to `1.2` to avoid CUDA errors (as I am writing this, we are internally supporting `torch==1.2.1`)
+- Installing one of the dependencies, `mmcv`, takes a lot of time and may often appear to pause. Be patient (or run with `-v`). Eventually, it will terminate successfully.
+- After the setup completes, it is advisable to downgrade the default torch version from `1.3` to `1.2` to avoid CUDA errors (currently, we are internally supporting `torch==1.2.1`).
## Export
-Before running the onnx export script, (manually) install `timm`, `onnx` and `onnxruntime`. Then, launch the [export script](https://github.com/ViTAE-Transformer/ViTPose/blob/main/tools/deployment/pytorch2onnx.py):
+Before running the ONNX export script, (manually) install `timm`, `onnx`, and `onnxruntime`. Then, launch the [export script](https://github.com/ViTAE-Transformer/ViTPose/blob/main/tools/deployment/pytorch2onnx.py):
```bash
python tools/deployment/pytorch2onnx.py /home/ubuntu/damian/ViTPose/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py /home/ubuntu/damian/ViTPose/vitpose-b.pth
```
-The first argument is a config file (for ViTpose B) the second argument is the `.pth` checkpoint (weights). Both can be found on the main site of the repository:
+The first argument is a configuration file (for ViTPose-B) and the second argument is the `.pth` checkpoint (weights). Both can be found on the main site of the repository:
+
image
-The resulting model is about 400mb.
+The resulting model is about 400 MB.
## Benchmarking in DeepSparse:
-Naive benchmarking shows that for the dense model, the engine is roughly x2 faster than ORT:
+Naive benchmarking shows that the engine is roughly 2x faster than ORT for the dense model:
+
Zrzut ekranu 2022-11-23 o 13 06 23
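For context, a naive benchmark of this kind could look like the sketch below, comparing the DeepSparse engine against ONNX Runtime with manual timing. The model path `vitpose-b.onnx` and the `(1, 3, 256, 192)` input shape are assumptions based on the ViTPose-B 256x192 configuration, not the exact setup behind the screenshot above.

```python
# Naive latency comparison: DeepSparse engine vs. ONNX Runtime.
# Assumptions: the exported model is saved as "vitpose-b.onnx" (hypothetical
# path) and takes a single (1, 3, 256, 192) float32 input.
import time

import numpy as np
import onnxruntime
from deepsparse import compile_model

MODEL_PATH = "vitpose-b.onnx"
BATCH_SIZE = 1
dummy_input = np.random.rand(BATCH_SIZE, 3, 256, 192).astype(np.float32)

def time_fn(fn, warmup=5, iters=50):
    """Return average seconds per call after a short warmup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# DeepSparse engine: expects a list of numpy arrays as input.
engine = compile_model(MODEL_PATH, batch_size=BATCH_SIZE)
deepsparse_latency = time_fn(lambda: engine([dummy_input]))

# ONNX Runtime session: expects a dict keyed by the model's input name.
session = onnxruntime.InferenceSession(MODEL_PATH)
input_name = session.get_inputs()[0].name
ort_latency = time_fn(lambda: session.run(None, {input_name: dummy_input}))

print(f"DeepSparse:   {deepsparse_latency * 1000:.2f} ms/iter")
print(f"ONNX Runtime: {ort_latency * 1000:.2f} ms/iter")
```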
## Postprocessing
-ViT-Pose might be our first candidate for a "composed" deepsparse pipeline.
-It is a top-down pose estimation approach i.e. we first detect `n` people in the image, and then we estimate their poses individually (using bounding-box-cropped images).
+ViT-Pose might be our first candidate for a "composed" DeepSparse pipeline.
+As a top-down pose estimation approach, we first detect `n` people in the image and then estimate their poses individually (using bounding box-cropped images).
We pass the cropped bounding boxes to ViT to get an array `(batch, no_keypoints, h, w)`. To decode this array, according to the original paper, we need some simple composition of transposed convolutions.
-What I do naively for now: I "squash" the array to `(h,w)` and then overlay it on the original image. We can see that the heatmap roughly coincides with the joints of the model.
+If we "squash" the array to `(h,w)` and then overlay it on the original image, we can see that the heatmap roughly coincides with the joints of the model.
image
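A minimal sketch of such a "squash and overlay" step is shown below: the `(batch, no_keypoints, h, w)` output is collapsed to a single `(h, w)` map, resized to the crop, and blended over the image. The file names and the max-over-keypoints reduction are assumptions for illustration, not the repository's actual postprocessing.

```python
# Naive heatmap visualization: "squash" the (batch, no_keypoints, h, w) output
# to a single (h, w) map and overlay it on the cropped person image.
# Assumptions: a single-person crop saved as "person_crop.jpg" and the raw
# model output saved as "vitpose_output.npy"; both names are hypothetical.
import cv2
import numpy as np

def overlay_heatmap(image: np.ndarray, heatmaps: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    # (batch, no_keypoints, h, w) -> (h, w): take the max response per pixel.
    squashed = heatmaps[0].max(axis=0)
    # Normalize to [0, 255] so the map can be colorized.
    squashed = squashed - squashed.min()
    squashed = (255 * squashed / (squashed.max() + 1e-8)).astype(np.uint8)
    # Resize the coarse heatmap up to the crop's resolution (width, height).
    squashed = cv2.resize(squashed, (image.shape[1], image.shape[0]))
    colored = cv2.applyColorMap(squashed, cv2.COLORMAP_JET)
    # Blend the colorized heatmap over the original crop.
    return cv2.addWeighted(image, 1 - alpha, colored, alpha, 0)

image = cv2.imread("person_crop.jpg")        # hypothetical cropped bounding box
heatmaps = np.load("vitpose_output.npy")     # hypothetical saved model output
cv2.imwrite("overlay.jpg", overlay_heatmap(image, heatmaps))
```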