DepthAI Pipeline Builder Gen2 #136

Luxonis-Brandon · 2020-06-26T16:01:16Z

Start with the `why`:

Several of the real-world applications that are desired of the DepthAI platform are actually series or parallel (or both) combinations of neural networks with regions of interest (ROI) passed from one network to one or more subsequent networks.

The Myriad X hardware is capable of multi-stage neural inference in parallel with computer vision functions, disparity depth, video encoding, etc., but no system exists that makes this functionality easy to use for solving real-world problems. If a user can modularly piece these together (i.e. in a pipeline builder), this gives super-interesting capabilities, an example of which is below for sports filming:

Detecting action in a scene (neural inference, say detecting where a soccer ball is)
Automatically tracking the action (say tracking the ball)
Automatically digitally zooming (Digital PTZ Support (Lossless Zoom) #135) using the 12MP camera dynamically (lossless zoom up to 6x while producing 1080p encoded video). (say running motion detection and only encoding the subset of the video that has the motion… in sports, no motion probably means no action)
Running parallel neural on ball/player detection and tracking them in 3D space - to produce game statistics of total distance traveled of the ball in miles, each player, etc.
Running re-identification (neural inference-based) on players as they move (and occlude each other) so that each player is tracked individually.
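To make the "total distance traveled" statistic above concrete, here is a minimal sketch. It assumes the tracker reports one (x, y, z) position in metres per frame; the function name and the sample track are made up for illustration and are not part of DepthAI.

```python
import math

METRES_PER_MILE = 1609.344

def total_distance_m(track):
    """Sum the straight-line distances between consecutive (x, y, z) points, in metres."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))

# Hypothetical ball positions in metres, one per frame.
ball_track = [(0.0, 0.0, 10.0), (3.0, 0.0, 14.0), (3.0, 0.0, 20.0)]

metres = total_distance_m(ball_track)
miles = metres / METRES_PER_MILE
print(f"{metres:.1f} m = {miles:.4f} mi")
```

In practice the positions would likely need smoothing first, since per-frame depth noise adds up when summed over a whole game.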

So this is just an example of how the pipeline builder can be used to string together really interesting functionalities. The core value of the builder is that it would allow many hardware/firmware capabilities to be strung together in series/parallel combinations to solve real-world problems easily:

Neural inference (e.g. Object detection, image classification of the ROI of a detected object, etc.)
3D object localization (both monocular object detection plus stereo depth and stereo neural inference supported)
Object tracking
Stereo depth (initial Gen2 example, here)
h.264/h.265 encoding
Digital zoom (leveraging the full 12MP sensor resolution... which is 6x full 1080p streams)
Background subtraction
Feature tracking
Motion estimation
Arbitrary crop/rescale/reformat and ROI return
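As a rough illustration of the digital zoom item above, the sketch below computes a 1080p crop window that follows a tracked point across the full sensor. The 4056x3040 sensor size is an assumption (a typical 12MP layout), and `zoom_crop` is a hypothetical helper, not a DepthAI call.

```python
# Assumed 12MP sensor layout; adjust to the actual sensor in use.
SENSOR_W, SENSOR_H = 4056, 3040
OUT_W, OUT_H = 1920, 1080

def zoom_crop(center_x, center_y, sensor_w=SENSOR_W, sensor_h=SENSOR_H,
              out_w=OUT_W, out_h=OUT_H):
    """Return (left, top, width, height) of a crop centred on the point,
    shifted as needed so the window stays inside the sensor."""
    left = min(max(center_x - out_w // 2, 0), sensor_w - out_w)
    top = min(max(center_y - out_h // 2, 0), sensor_h - out_h)
    return left, top, out_w, out_h

# How many 1080p frames' worth of pixels the full sensor holds.
pixel_ratio = (SENSOR_W * SENSOR_H) / (OUT_W * OUT_H)
print(round(pixel_ratio, 2))
print(zoom_crop(2028, 1520))
```

Under these assumed dimensions the pixel-count ratio comes out just under 6, which is where the "6x" figure comes from.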

In many of these multi-node pipeline flows, there is a need for custom rules and logic between nodes (e.g. filtering which ROIs 'make the cut' for the next stage). In many cases the pipeline is not doable without these rules, as they are often a key implementation of a-priori knowledge by the designer, without which the solution is not tractable.

As such, support for custom code/functions/etc. to enable rules is a critical feature, and it is equally necessary whether DepthAI is used with or without a host.
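A minimal sketch of such a rule, assuming detections arrive as label, confidence, and a normalized bounding box. The `Detection` class, `make_the_cut` function, and thresholds are all hypothetical and only illustrate the kind of logic that sits between two network stages.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self):
        """Box area as a fraction of the frame (coordinates are normalized 0..1)."""
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

def make_the_cut(detections, label, min_confidence=0.5, min_area=0.01, max_rois=2):
    """Keep only confident, large-enough detections of one class,
    best first, capped at what the next stage can afford to process."""
    keep = [d for d in detections
            if d.label == label
            and d.confidence >= min_confidence
            and d.area >= min_area]
    keep.sort(key=lambda d: d.confidence, reverse=True)
    return keep[:max_rois]

detections = [
    Detection("person", 0.9, 0.1, 0.1, 0.5, 0.5),
    Detection("person", 0.3, 0.1, 0.1, 0.5, 0.5),    # too uncertain
    Detection("car", 0.95, 0.2, 0.2, 0.9, 0.9),      # wrong class
    Detection("person", 0.8, 0.0, 0.0, 0.05, 0.05),  # too small
    Detection("person", 0.7, 0.2, 0.2, 0.6, 0.6),
]
selected = make_the_cut(detections, "person")
print([d.confidence for d in selected])
```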

DepthAI used with host

When using DepthAI and megaAI with a host, having the capability to implement these rules/functions/etc. on the host is very convenient, as the engineer can then leverage the full convenience of the host for running rules, functions, and even CV capabilities.

To facilitate this most flexibly, a key capability of the pipeline builder is that every node (including the camera node(s)) can optionally send its output to the host and optionally receive data from the host.

Importantly, such a capability for each node to send/receive information from the host also enables easier development work-flows:

Debugging (testing each node for accuracy/performance by itself)
QA (capability to test thousands (or millions) of images through the whole pipeline, or parts of it, from existing datasets)
Model refinement and accuracy testing (being able to test the node accuracy fully on the hardware, after conversion, in a quantitative way)
Visualization (being able to see on a computer the output of each stage to easily see how things are looking in each stage)
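The send/receive idea behind these workflows can be sketched with a toy node graph. This is not the DepthAI API: the `Node` class, `host_tap` attribute, and the lambda "models" are hypothetical stand-ins, meant only to show how a per-node host tap and host-side injection help with debugging and QA.

```python
class Node:
    """Toy pipeline node: applies fn, optionally reports to the host, then forwards."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.downstream = []
        self.host_tap = None  # optional callback(name, output) on the host side

    def link(self, other):
        self.downstream.append(other)
        return other  # allows chaining: a.link(b).link(c)

    def push(self, data):
        out = self.fn(data)
        if self.host_tap is not None:
            self.host_tap(self.name, out)
        for node in self.downstream:
            node.push(out)

host_log = []

def record(name, out):
    host_log.append((name, out))

camera = Node("camera", lambda frame: frame)
detector = Node("detector", lambda frame: [px for px in frame if px > 100])  # stand-in for a NN
counter = Node("counter", len)

camera.link(detector).link(counter)
detector.host_tap = record  # visualization / debugging of an intermediate stage
counter.host_tap = record

camera.push([10, 200, 150])  # normal flow, starting at the camera
detector.push([120, 5])      # host injects dataset data mid-pipeline (QA)
print(host_log)
```

Injecting at `detector` bypasses the camera entirely, which is the property that lets existing datasets be run through part of a pipeline.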

UPDATE 20 Nov. 2020: The first example of this host-integrated use-case is here: https://github.com/luxonis/depthai-experiments/blob/master/gaze-estimation

DepthAI used without host (i.e. embedded use-case)

When there is no host present - for example when DepthAI is running completely standalone and directly actuating IO or communicating over SPI/UART/I2C - it is still equally necessary to allow such rules/custom code/etc.

To support this, the capability for the user to run arbitrary code on DepthAI (as nodes) is critical.

It is worth noting that when using DepthAI without a host in deployment, one could still use the with-host capability above for debugging while running the full embedded flow.

The `how`:

To support such arbitrary pipeline builds in both with-host and without-host use-cases, we architect the pipeline builder to support every node to send data to/from the host and for CPython code to be run directly as nodes.

Integrating this, we have settled on the following approach, which breaks into 3 modalities of nodes that are used in the pipeline builder to solve embedded CV/AI problems and to leverage this information to interact with the physical world.

Node modalities:

1. Fast, easy, limited flexibility: The accelerated blocks listed above, like neural inference, 3D object localization, etc. These come pre-packaged and are trivial to make use of. But they often need application-specific logic between them, hence modality 2. And if your CV algorithm isn't on that list (or maybe you've invented your own proprietary one and need it to run performantly on DepthAI), see modality 3.
2. Slow, easy, quite flexible: CPython bindings for scripts running directly on DepthAI as a node (issue Scripting Support on DepthAI #207). This allows you to have custom rules on metadata from neural inference results, write custom protocols that run on-chip as part of the pipeline, communicate with sensors/actuators or other systems over SPI, UART, I2C, etc. based on pipeline results, and so on. For example, you can make rules that make sense of neural-inference metadata, which then control performant crop/resize/reformat to connect layers of accelerated CV functions.
3. Fast, hard, quite flexible: OpenCL (here), G-API (more details soon) and ML frameworks for vectorized math are used to compile custom computer vision functions to run performantly on the SHAVES in DepthAI. So you can take your computer vision function, write it in OpenCL, G-API, or, say, PyTorch, and drop it in as a node in the pipeline builder. This allows custom algorithms, including proprietary ones, to be hardware accelerated in the pipeline as a node, and the pipeline builder leverages the hardware-accelerated crop/rescale/reformat to match inputs and outputs. This could even be used for non-CV functions, for example to run custom arbitrary mathematical functions on audio data brought in via CPython over I2C. For an EXCELLENT example of how to run custom CV code on depthai using PyTorch, see this guide by Rahul Ravikumar.
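As an illustration of the scripting modality, here is a minimal sketch of a rule that turns neural-inference metadata into a crop request for the next accelerated stage. The function name, the normalized bounding-box format, and the padding policy are assumptions for illustration, not an existing DepthAI interface.

```python
def roi_to_crop(bbox, frame_w, frame_h, pad=0.1):
    """Convert a normalized (xmin, ymin, xmax, ymax) box into a pixel crop
    (x0, y0, x1, y1), padded by a fraction of the box size per side and
    clamped to the frame."""
    xmin, ymin, xmax, ymax = bbox
    pad_x = (xmax - xmin) * pad
    pad_y = (ymax - ymin) * pad

    def clamp(value):
        return min(max(value, 0.0), 1.0)

    x0 = round(clamp(xmin - pad_x) * frame_w)
    y0 = round(clamp(ymin - pad_y) * frame_h)
    x1 = round(clamp(xmax + pad_x) * frame_w)
    y1 = round(clamp(ymax + pad_y) * frame_h)
    return x0, y0, x1, y1

# A face box in the middle of a 300x300 preview frame.
print(roi_to_crop((0.25, 0.25, 0.75, 0.75), 300, 300))
# A box touching the top-left corner: padding is clamped at the frame edge.
print(roi_to_crop((0.0, 0.0, 0.5, 0.5), 300, 300))
```

Padding the box slightly is a common choice because second-stage networks are often trained on crops with some context around the object.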

The `what`:

If we support the following with our pipeline builder it seems it would be sufficiently flexible.
So implement a pipeline builder which can be used to implement the flows below.

UPDATE 26 December 2021: The docs for Gen2 are materializing here: https://docs.luxonis.com/projects/api/en/gen2_develop/

Example Neural Pipelines To support:

The OpenVINO security barrier demo (here).
- This does vehicle detection, followed by two parallel networks that operate on the ROI of the vehicle:
  - 1 x NN for vehicle color and vehicle type
  - 1 x NN for license plate detection
    - Then the ROI from the plate detection is passed to another NN which outputs region and OCRs the plate.
Update 26 January 2021: Github issue for this example pipeline is Gen2 License Plate Detection and OCR depthai-experiments#47
Interactive Face Detection Demo (here)
- Does face detection and allows running the following as secondary networks run on the ROI of the face
  - Age + Gender recognition
  - Facial Expression estimation (incorrectly called ‘emotion’ in their doc)
  - Facial Landmarks
  - UPDATE 16 MARCH 2020 ArduCam produced this example using the Gen2 Pipeline builder, here
Interactive Face Recognition Demo (here)
- Detects faces and runs landmarks and face-reidentification to recognize the people.
- Github issue for Gen2 Example implementation here
- UPDATE 16 MARCH 2020 ArduCam produced this example using the Gen2 Pipeline builder, here
Cross Road Camera Demo (here)
- Detects people, vehicles, and bikes, and then runs person attributes and person re-identification on the ROI of detected people.
- UPDATE 16 March 2020: ArduCam actually implemented this, here and we have our WIP version here. (We started before we realized ArduCam had already produced this example!)
Pedestrian Tracker (i.e. Person ReID here)
- Detects people, re-identification runs on the ROI from person detection
- UPDATE Nov 23 2020: Initially implemented in Gen2, here
Text Detection and Recognition (OCR) (here)
- Detects regions that contain text, then runs OCR on these regions.
- UPDATE 30 Dec 2020: Initially implemented in Gen2, here
Gaze Estimation (here and here)
- Does face detection, ROI of which goes to both head pose estimation and facial landmarks.
  - The outputs of head pose estimation and facial landmarks are passed to the gaze estimation model
- UPDATE Oct 23 2020: Initially implemented in Gen2, here
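The flows in this list share one shape: a first-stage detector, one or more networks run on each ROI, and sometimes a stage that merges the parallel results. A minimal sketch of the gaze estimation topology, with the networks passed in as plain callables, could look like this. All names here are hypothetical, and the stub "models" in the usage below just do arithmetic on pixel values.

```python
def crop(frame, roi):
    """Crop a frame (list of pixel rows) to roi = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    return [row[x0:x1] for row in frame[y0:y1]]

def run_gaze_pipeline(frame, detect_faces, head_pose, landmarks, gaze):
    """Series -> parallel -> merge: face ROIs feed two networks,
    and both of their outputs feed the gaze model."""
    results = []
    for roi in detect_faces(frame):
        face = crop(frame, roi)
        pose = head_pose(face)    # parallel branch 1
        marks = landmarks(face)   # parallel branch 2
        results.append(gaze(pose, marks))
    return results

# Stub models so the flow can be exercised without any real networks.
frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
gaze_results = run_gaze_pipeline(
    frame,
    detect_faces=lambda f: [(0, 0, 2, 2)],
    head_pose=lambda face: sum(sum(row) for row in face),
    landmarks=lambda face: sum(len(row) for row in face),
    gaze=lambda pose, marks: (pose, marks),
)
print(gaze_results)
```

The two branches run sequentially here; on the device they could run concurrently, which is what the pipeline builder is meant to arrange.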

Of the examples on the OpenVINO repository, the following seems like it should not be implemented, as it’s the only one that does series, parallel, and output of parallel back to a single model. So it seems much more specialized.

This will then cover the following items which were previously independently on the DepthAI roadmap:

Get two-stage face detection and following age-gender or emotion working (prototype here)
Person detection, tracking, and reidentification.
Add capability to run multiple neural networks in parallel (prototype here)
Integrate face detection and identification AP with Python API (e.g. here)
- First step: without depth
- Second step: with depth.
- Most common complement to object detection
Be able to run multiple models in sequence (e.g. facial detection -> facial landmark -> landmark tracking) (prototype here)
- This is different than multiple-output tensor. (which is already implemented, PR here)
Text Detection and OCR Support (OCR Support on DepthAI (and megaAI) #124)

To keep in mind, but maybe not support initially:

This smart motion (DepthAI and megaAI 'SmartMotion' Feature #132) sort of pipeline, here, which uses motion detection to determine what subset of a scene to pass into object detection, followed by object tracking on the detected objects
Utilizing onboard storage with Pipeline Builder (Utilizing onboard storage with Pipeline Builder #134)
Option to return the depth map for just the ROI of the detected object (Option to return the depth map for just the ROI of the detected object. #125)


MXGray · 2020-06-26T17:08:13Z

@Luxonis-Brandon,

A pipeline builder can make things quicker and more straightforward to piece up! :)
Some things I'm about to try:

Default mobilenet SSD (Coco) with depth:

- If person
  - Run face recognition and face reidentification
  - If stranger
    - Run age / gender estimator
    - Run facial expression estimator
    - Run action classifier
    - Output: 09:00. Person. Male. 20 to 25 years old. Looking happy. Standing. 2 meters away.
  - If not stranger
    - Run facial expression estimator
    - Run action classifier
    - Output: 09:00. Marx. Looking happy. Standing. 2 meters away.
- If not person
  - Run OCR detection
  - If text detected
    - Run text recognition
    - Output: Dead center. Monitor. 1 meter away. Text reads, " Warning: Aliens Spotted Near You ".
  - If no text
    - Pass
    - Output: Dead center. Monitor. 1 meter away.

:D
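The branching above could be sketched as a single rule function where each network is a callable that only runs when its branch is taken. Everything here is hypothetical: the detection dict layout, the model names, and the stub return values are illustrative and only mirror the example outputs listed above.

```python
def describe(det, models, clock="09:00"):
    """Run only the networks the detection calls for and build the spoken output."""
    metres = det["distance_m"]
    distance = f"{metres} meter{'' if metres == 1 else 's'} away"

    if det["label"] == "person":
        name = models["face_id"](det)  # None means the face is not recognized
        if name is None:
            who = ["Person", models["age_gender"](det)]
        else:
            who = [name]
        parts = [clock, *who, models["expression"](det), models["action"](det), distance]
    else:
        parts = [det["position"], det["label"].capitalize(), distance]
        text = models["ocr"](det)  # None means no text was detected
        if text is not None:
            parts.append(f'Text reads, "{text}"')
    return ". ".join(parts) + "."

# Stub models standing in for the real networks.
stub_models = {
    "face_id": lambda det: det.get("known_as"),
    "age_gender": lambda det: "Male. 20 to 25 years old",
    "expression": lambda det: "Looking happy",
    "action": lambda det: "Standing",
    "ocr": lambda det: det.get("text"),
}

stranger = {"label": "person", "distance_m": 2}
friend = {"label": "person", "distance_m": 2, "known_as": "Marx"}
monitor = {"label": "monitor", "distance_m": 1, "position": "Dead center"}

for det in (stranger, friend, monitor):
    print(describe(det, stub_models))
```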

Luxonis-Brandon · 2020-06-26T17:33:16Z

Great feedback @MXGray ! Discussing internally now how difficult such results-based dynamic pipelines would be to implement. I definitely see how useful this would be... now to investigate the relative difficulty/feasibility.

Luxonis-Brandon · 2020-10-21T14:49:45Z

The initial Gaze estimation example is implemented here: luxonis/depthai-experiments#8

Luxonis-Brandon · 2021-03-29T20:32:57Z

This is now implemented and mainlined. Most things that were possible in Gen1 API are now possible in Gen2. See below for resources:

The official documentation here: https://docs.luxonis.com/en/latest/
Some examples of using the Gen2 API here: https://github.com/luxonis/depthai-experiments
API Documentation: https://docs.luxonis.com/projects/api/en/latest/references/python/
Code Samples here

Luxonis-Brandon added the enhancement New feature or request label Jun 26, 2020

Luxonis-Brandon mentioned this issue Jun 30, 2020

SPI Interface for DepthAI API #140

Closed

This was referenced Sep 23, 2020

BW1092: DepthAI ESP32 Reference Design | Embedded DepthAI Reference Design luxonis/depthai-hardware#10

Closed

Scripting Support on DepthAI #207

Closed

Gaze Estimation Example - Gen2 Pipeline Builder First Example Use-Case #208

Closed

Luxonis-Brandon mentioned this issue Oct 2, 2020

Video Encoding of left and right in parallel #122

Closed

This was referenced Oct 13, 2020

Error / incorrect values when calling Device methods #233

Closed

Error / incorrect values returned from FrameMetadata methods #234

Closed

Luxonis-Brandon mentioned this issue Oct 14, 2020

Support OpenVINO Toolkit 2021 #236

Closed

Luxonis-Brandon added the Gen2 label Oct 14, 2020

This was referenced Oct 26, 2020

Issue with frame width and frame height luxonis/depthai-python#43

Closed

Cameras - How to handle things like - Headlight Glare luxonis/depthai-hardware#25

Closed

Luxonis-Brandon mentioned this issue Nov 11, 2020

External code during device inference #273

Closed

This was referenced Dec 11, 2020

UVC Output as a Pipeline Node #283

Open

Create Gen2 examples luxonis/depthai-docs-website#102

Closed

OCR Support on DepthAI (and megaAI) #124

Closed

This was referenced Dec 21, 2020

Resource Allocation in Gen2 #288

Closed

Watchdog Rewrite to Support Python Interactive Shell #291

Closed

This was referenced Dec 21, 2020

Depth Calculator Node in Gen2 #292

Closed

Object Tracker Node in Gen2 #293

Closed

RTSP Output as a Pipeline Node #294

Open

Luxonis-Brandon mentioned this issue Jan 26, 2021

Person re-identification and License Plate Recognition #320

Open

Luxonis-Brandon mentioned this issue Feb 17, 2021

Interactive Face Recognition Demo #334

Closed

Luxonis-Brandon mentioned this issue Mar 11, 2021

Bilateral Filtering Directly on DepthAI #215

Closed

Luxonis-Brandon mentioned this issue Mar 29, 2021

New Gen2 Documentation Structure Implementation luxonis/depthai-docs-website#163

Closed

Luxonis-Brandon closed this as completed Mar 29, 2021

Luxonis-Brandon mentioned this issue Sep 18, 2021

[Feature-Request] Run multiple models in pipeline #483

Closed
