See the License for the specific language governing permissions and
limitations under the License.
-->

# Exploration for Including the ViT Pose Model into the DeepSparse Pipeline
Source: https://github.com/ViTAE-Transformer/ViTPose

## Installation
Follow the instructions in the `ViTPose` README file. Note:

- Installing one of the dependencies, `mmcv`, takes a lot of time and may often appear to pause. Be patient (or run with `-v`); it will eventually terminate successfully.
- After the setup completes, it is advisable to downgrade the default torch version from `1.3` to `1.2` to avoid CUDA errors (currently, we internally support `torch==1.2.1`).

## Export

Before running the ONNX export script, (manually) install `timm`, `onnx`, and `onnxruntime`. Then, launch the [export script](https://github.com/ViTAE-Transformer/ViTPose/blob/main/tools/deployment/pytorch2onnx.py):

```bash
python tools/deployment/pytorch2onnx.py /home/ubuntu/damian/ViTPose/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py /home/ubuntu/damian/ViTPose/vitpose-b.pth
```
The first argument is a configuration file (for ViTPose-B) and the second argument is the `.pth` checkpoint (weights). Both can be found on the main page of the repository:

<img width="876" alt="image" src="https://user-images.githubusercontent.com/97082108/203548069-7239c758-8332-4d1d-b4d4-94774a4fcdef.png">

The resulting model is about 400 MB.
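
To sanity-check the export, a minimal sketch using the `onnx` and `onnxruntime` packages installed above (the `vitpose-b.onnx` path and the `1x3x256x192` input shape are assumptions, the latter inferred from the config name):

```python
# Minimal sanity check of the exported model (a sketch): `vitpose-b.onnx` is
# assumed to be the path the export script wrote, and the 1x3x256x192 input
# shape is inferred from the ViTPose_base_coco_256x192 config name.
import numpy as np
import onnx
import onnxruntime

model_path = "vitpose-b.onnx"

# Validate the ONNX graph structure.
onnx.checker.check_model(onnx.load(model_path))

# Run a dummy forward pass through onnxruntime.
session = onnxruntime.InferenceSession(model_path)
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 256, 192).astype(np.float32)
(heatmaps,) = session.run(None, {input_name: dummy_input})
print(heatmaps.shape)  # expected: (1, no_keypoints, h, w)
```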

## Benchmarking in DeepSparse

Naive benchmarking shows that the engine is roughly 2x faster than ORT for the dense model:

<img width="1116" alt="Zrzut ekranu 2022-11-23 o 13 06 23" src="https://user-images.githubusercontent.com/97082108/203562298-3a96c653-58ef-4471-ab4a-faeb222c24b3.png">

## Postprocessing
ViT-Pose might be our first candidate for a "composed" deepsparse pipeline.
It is a top-down pose estimation approach i.e. we first detect `n` people in the image, and then we estimate their poses individually (using bounding-box-cropped images).
ViT-Pose might be our first candidate for a "composed" DeepSparse pipeline.
As a top-down pose estimation approach, we first detect `n` people in the image and then estimate their poses individually (using bounding box-cropped images).
We pass the cropped bounding boxes to ViT to get an array of shape `(batch, no_keypoints, h, w)`. To decode this array, according to the original paper,
we need a simple composition of transposed convolutions.
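
A minimal sketch of what such a composed flow could look like (the `detect_people` and `estimate_pose` callables are hypothetical placeholders for a person detector and the ViTPose model, e.g. the exported ONNX running in DeepSparse):

```python
# Minimal sketch of the composed top-down flow described above; `detect_people`
# and `estimate_pose` are hypothetical placeholders for a person detector and
# the ViTPose model.
import numpy as np

def top_down_poses(image: np.ndarray, detect_people, estimate_pose):
    heatmaps_per_person = []
    for x1, y1, x2, y2 in detect_people(image):   # n person bounding boxes
        crop = image[y1:y2, x1:x2]                # bounding-box-cropped image
        heatmaps = estimate_pose(crop)            # (1, no_keypoints, h, w)
        heatmaps_per_person.append(heatmaps)
    return heatmaps_per_person
```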

If we "squash" the array to `(h, w)` and then overlay it on the original image, we can see that the heatmap roughly coincides with the joints of the model.

<img width="585" alt="image" src="https://user-images.githubusercontent.com/97082108/204554128-a12deb08-6f6c-4383-aafc-ea5fee754e0e.png">
