Whole Application Acceleration (WAA) examples demonstrate how Xilinx® Vitis Vision library functions can be integrated with a deep neural network (DNN) accelerator to achieve complete application acceleration. These applications focus on accelerating the pre- and post-processing involved in the inference of detection and classification networks.
```
Whole-Application-Acceleration
├── README.md
├── plugins
│   ├── nms                 # Post-processing accelerator source files
│   ├── blobfromimage       # Pre-processing accelerator source files
│   ├── of_tvl1             # TVL1 optical flow accelerator source files
│   ├── jpeg_decoder        # JPEG decoder accelerator xo file
│   ├── cost_volume         # cost_volume accelerator xo file
│   └── include             # lib files for HW accelerators
│       ├── aie             # lib files for AIE accelerators
│       │   ├── common
│       │   ├── imgproc
│       │   └── lib
│       └── pl              # lib files for PL accelerators
│           ├── common
│           ├── core
│           ├── dnn
│           ├── features
│           ├── imgproc
│           └── video
└── apps
    ├── adas_detection      # ADAS detection application
    │   ├── README.md
    │   ├── app_build.sh
    │   ├── app_test.sh
    │   ├── src
    │   ├── build_flow      # Hardware build using source files
    │   │   ├── DPUCADF8H_u200
    │   │   ├── DPUCAHX8H_u50_u280
    │   │   ├── DPUCVDX8G_vck190
    │   │   ├── DPUCZDX8G_zcu102
    │   │   └── DPUCZDX8G_zcu104
    │   └── pre_built_flow  # Hardware build using pre-built DPU files
    │       └── DPUCZDX8G_zcu102
    ├── psmnet              # PSMNet application
    │   ├── README.md
    │   ├── build_flow      # Hardware build using source files
    │   │   └── DPUCVDX8G_vck190
    │   ├── build.sh
    │   ├── cost_volume.cpp
    │   ├── cost_volume.hpp
    │   ├── cpu_op.cpp
    │   ├── cpu_op.hpp
    │   ├── demo_psmnet.cpp
    │   ├── dpu_resize.cpp
    │   ├── dpu_resize.hpp
    │   ├── dpu_sfm.cpp
    │   ├── dpu_sfm.hpp
    │   ├── Makefile
    │   ├── my_xrt_bo.cpp
    │   ├── my_xrt_bo.hpp
    │   ├── performance.sh
    │   ├── psmnet_benchmark.hpp
    │   ├── psmnet.cpp
    │   ├── psmnet.hpp
    │   ├── test_cost_volume.cpp
    │   ├── test_dpu_task.cpp
    │   ├── test_performance_psmnet.cpp
    │   ├── test_resize_xrt.cpp
    │   ├── test_xrt_bo_alloc.cpp
    │   ├── vai_aie_task_handler.hpp
    │   └── vai_graph.hpp
    ├── resnet50            # ResNet-50 application
    │   ├── README.md
    │   ├── app_build.sh
    │   ├── app_test.sh
    │   ├── src
    │   ├── build_flow      # Hardware build using source files
    │   │   ├── DPUCADF8H_u200
    │   │   ├── DPUCAHX8H_u50_u280
    │   │   ├── DPUCVDX8G_vck190
    │   │   ├── DPUCVDX8H_vck5000
    │   │   ├── DPUCZDX8G_zcu102
    │   │   └── DPUCZDX8G_zcu104
    │   └── pre_built_flow  # Hardware build using pre-built DPU files
    │       └── DPUCZDX8G_zcu102
    ├── resnet50_jpeg       # ResNet-50 JPEG application
    │   ├── README.md
    │   ├── app_build.sh
    │   ├── app_test.sh
    │   ├── src
    │   ├── libs            # libs for jpeg-decoder
    │   └── build_flow      # Hardware build using source files
    │       └── DPUCZDX8G_zcu102
    ├── resnet50_waa_aie    # resnet50_waa_aie application
    │   ├── README.md
    │   ├── app_build.sh
    │   ├── app_test.sh
    │   ├── src
    │   └── build_flow      # Hardware build using source files
    │       └── DPUCVDX8G_vck190
    ├── fall_detection
    ├── ssd_mobilenet
    └── segmentation
```
Input images are pre-processed before being fed for inference to different deep neural networks. The pre-processing steps vary from network to network. For example, for classification networks like ResNet-50, the input image is resized to 224 x 224 and channel-wise mean subtraction is performed before feeding the data to the DNN accelerator. For detection networks like YOLOv3, the input image is resized to 256 x 512 using letterbox before feeding the data to the DNN accelerator. Similarly, some networks like SSD-MobileNet also require the inference output to be processed to filter out unnecessary data. Non-Maximum Suppression (NMS) is an example of a post-processing function.
The Vitis Vision library provides functions optimized for FPGA devices that are drop-in replacements for standard OpenCV library functions. The designs in this repository demonstrate how Vitis Vision library functions can be used to accelerate the complete application.
The table below lists examples demonstrating complete acceleration of various applications, where the pre/post-processing of several networks is accelerated on different target platforms.
No. | Application | Backbone Network | Accelerated Part(s) | H/W Accelerated Functions | DPU Supported (% Improvement Over Non-WAA App) |
---|---|---|---|---|---|
1 | adas_detection | Yolo v3 | Pre-process | resize, letter box, scale, DPU | ZCU102 (44.53%), ZCU104 (68.11%), VCK190 (103.26%), ALVEO-U50 (36.04%), ALVEO-U200 (13.97%), ALVEO-U280 (21.79%) |
2 | fall_detection | VGG16 | Pre-process | TVL1 Optical Flow | ALVEO-U200 (560.07%) |
3 | PSMNET | PSMnet | Intra-process | Cost Volume Generation | VCK190 (99.46%) |
4 | resnet50 | resnet-50 | Pre-process | resize, mean subtraction, scale, DPU | ZCU102 (49.95%), ZCU104 (50.47%), VCK190 (115.77%), ALVEO-U50 (29.07%), ALVEO-U200 (29.69%), ALVEO-U280 (38.39%), VCK5000 (82.73%) |
5 | resnet50_jpeg | resnet-50 | Pre-process | JPEG decoder, resize, mean subtraction, scale, DPU | ZCU102 (*109.04%) |
6 | resnet50_waa_aie | resnet-50 | Pre-process (AIE) | resize, mean subtraction, scale, DPU | VCK190 (54.98%) |
7 | segmentation | fcn8 | pre-process & post-process | resize, normalization, argmax | VCK190 (115%) |
8 | ssd_mobilenet | mobilenet | pre-process & post-process | resize, BGR2RGB, scale, sort, NMS, DPU | ALVEO-U280 (**74.23%) |
\* Includes imread/jpeg-decoder latency in both the WAA and non-WAA apps.

\*\* Impact shown is on pre/post-processing acceleration.
📌 Note: Non-WAA applications use equivalent OpenCV functions. As OpenCV's Lucas-Kanade optical flow is sparse, the equivalent non-WAA application uses OpenCV's Farneback optical flow algorithm with 10 threads.