DepthAI Pipeline Builder Gen2 #136
A pipeline builder would make it quicker and more straightforward to piece things together! :) For example, the default MobileNet-SSD (COCO) with depth:
:D
Great feedback @MXGray! Discussing internally now how difficult such results-based dynamic pipelines would be to implement. I definitely see how useful this would be... now to investigate the relative difficulty/feasibility.
The initial Gaze estimation example is implemented here: luxonis/depthai-experiments#8
This is now implemented and mainlined. Most things that were possible in the Gen1 API are now possible in Gen2. See below for resources:
Start with the why: Several of the real-world applications desired of the DepthAI platform are actually series or parallel (or both) combinations of neural networks, with regions of interest (ROIs) passed from one network to one or more subsequent networks.
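To make "series or parallel (or both)" concrete, here is a minimal host-side sketch, assuming two hypothetical stub models: a stage-1 detector emits ROIs (series), and each ROI is fanned out to two stage-2 models whose results are merged (parallel). `detect_faces`, `estimate_age`, `estimate_emotion`, and `run_two_stage` are illustrative names only, not DepthAI functions, and `ThreadPoolExecutor` merely stands in for branches that DepthAI would run on-device.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Roi:
    # Normalized [0, 1] box plus the stage-1 label and confidence.
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    label: str
    confidence: float


def detect_faces(frame):
    # Hypothetical stage-1 network stub: always returns two ROIs.
    return [Roi(0.1, 0.1, 0.3, 0.4, "face", 0.9),
            Roi(0.6, 0.2, 0.8, 0.5, "face", 0.8)]


def estimate_age(frame, roi):
    # Hypothetical stage-2 stub (parallel branch A).
    return {"age": 30}


def estimate_emotion(frame, roi):
    # Hypothetical stage-2 stub (parallel branch B).
    return {"emotion": "neutral"}


def run_two_stage(frame, second_stages):
    """Series: stage 1 feeds ROIs to stage 2. Parallel: every stage-2
    model runs on the same ROI and their outputs are merged per ROI."""
    results = []
    with ThreadPoolExecutor() as pool:
        for roi in detect_faces(frame):
            futures = [pool.submit(stage, frame, roi) for stage in second_stages]
            merged = {}
            for fut in futures:
                merged.update(fut.result())
            results.append((roi, merged))
    return results


results = run_two_stage(frame=None, second_stages=[estimate_age, estimate_emotion])
for roi, attrs in results:
    print(roi.label, attrs)
```

Each ROI ends up with the merged attributes from both second-stage branches; a pipeline builder's job is to express exactly this wiring without hand-written glue.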
The Myriad X hardware is capable of multi-stage neural inference in parallel with computer vision functions, disparity depth, video encoding, etc., but no system exists that makes it easy to use this functionality to solve real-world problems. If a user can modularly piece these together (i.e., in a pipeline builder), this gives super-interesting capabilities, an example of which is below for sports filming:
This is just one example of how the pipeline builder can be used to string together really interesting functionalities. The core value of the builder is that it would allow many hardware/firmware capabilities to be strung together in series/parallel combinations to solve real-world problems easily:
In many of these multi-node pipeline flows, custom rules and logic are needed between nodes (e.g., filtering which ROIs 'make the cut' for the next stage). In many cases the pipeline is not doable without these rules, as they often encode the designer's a-priori knowledge, without which the solution is not tractable.
As such, support for custom code/functions/etc. to enable these rules is a critical feature, and it is equally necessary whether DepthAI is used with or without a host.
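As an illustration of such a rule, the sketch below filters stage-1 detections before they are passed on, encoding a-priori knowledge as "only confident, reasonably large people, at most three, largest first". The `Detection` type, `make_the_cut`, and all thresholds are hypothetical choices for this example, not part of any DepthAI API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    # Normalized [0, 1] box coordinates.
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self):
        # Fraction of the frame covered by the box.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def make_the_cut(detections, label="person", min_conf=0.5, min_area=0.01, max_rois=3):
    """Keep confident, reasonably large detections of one class,
    largest first, capped at max_rois for the next stage."""
    kept = [d for d in detections
            if d.label == label and d.confidence >= min_conf and d.area >= min_area]
    kept.sort(key=lambda d: d.area, reverse=True)
    return kept[:max_rois]


dets = [
    Detection("person", 0.90, 0.10, 0.10, 0.50, 0.90),  # large, confident: kept
    Detection("person", 0.40, 0.60, 0.10, 0.90, 0.80),  # low confidence: dropped
    Detection("car",    0.95, 0.00, 0.00, 0.50, 0.50),  # wrong class: dropped
    Detection("person", 0.80, 0.70, 0.70, 0.75, 0.75),  # too small: dropped
]
print([round(d.area, 4) for d in make_the_cut(dets)])
```

The same function could run on the host or, in principle, as on-device custom code; only where it executes changes.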
DepthAI used with host
When using DepthAI and megaAI with a host, it is very convenient to be able to implement these rules/functions/etc. on the host, since the engineer can then leverage the full convenience of the host for running rules, functions, and even CV capabilities.
To facilitate this as flexibly as possible, a key capability of such a pipeline builder is that every node (including the camera node(s)) can optionally send its output to the host and optionally receive input from the host.
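The idea of every node optionally sending its output to the host, and optionally receiving data back, can be modeled with a toy dataflow graph. This is a hypothetical pure-Python sketch of the architecture, not the DepthAI API: `ToyPipeline`, `send_to_host`, and `host_rule` are invented names, and the detector is a stub.

```python
from collections import defaultdict


class ToyPipeline:
    """Toy dataflow graph: nodes are plain functions, links forward each
    node's output to its consumers, and any node can optionally mirror its
    output to the host and/or round-trip it through a host-side rule."""

    def __init__(self):
        self.nodes = {}                 # name -> fn(input) -> output
        self.links = defaultdict(list)  # producer -> [consumers]
        self.to_host = {}               # name -> outputs mirrored to the host
        self.host_rules = {}            # name -> host fn applied to the output

    def add_node(self, name, fn, send_to_host=False, host_rule=None):
        self.nodes[name] = fn
        if send_to_host:
            self.to_host[name] = []
        if host_rule is not None:
            self.host_rules[name] = host_rule

    def link(self, src, dst):
        self.links[src].append(dst)

    def run(self, source, value):
        """Push one input from `source` through the (acyclic) graph."""
        pending = [(source, value)]
        while pending:
            name, data = pending.pop()
            out = self.nodes[name](data)
            if name in self.host_rules:
                # Output goes to the host, the rule edits it, result comes back.
                out = self.host_rules[name](out)
            if name in self.to_host:
                self.to_host[name].append(out)
            for dst in self.links[name]:
                pending.append((dst, out))


p = ToyPipeline()
p.add_node("camera", lambda frame: frame, send_to_host=True)
# Stub detector emitting confidences; the host rule drops weak ones.
p.add_node("detector", lambda frame: [0.9, 0.3, 0.7],
           host_rule=lambda confs: [c for c in confs if c >= 0.5])
p.add_node("tracker", lambda dets: len(dets), send_to_host=True)
p.link("camera", "detector")
p.link("detector", "tracker")
p.run("camera", "frame0")
print(p.to_host)  # {'camera': ['frame0'], 'tracker': [2]}
```

Because the host taps are per-node flags, the same graph can be debugged with many taps enabled and later deployed with none.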
Importantly, such a capability for each node to send/receive information from the host also enables easier development workflows:
UPDATE 20 Nov. 2020: The first example of this host-integrated use-case is here: https://github.com/luxonis/depthai-experiments/blob/master/gaze-estimation
DepthAI used without host (i.e. embedded use-case)
When there is no host present - for example when DepthAI is running completely standalone and directly actuating IO or communicating over SPI/UART/I2C - it is still equally necessary to allow such rules/custom code/etc.
To support this, the capability for the user to run arbitrary code on DepthAI (as nodes) is critical.
It is worth noting that when using DepthAI without a host in deployment, one could still use the with-host approach above for debugging, while still running the full embedded flow.
The how: To support such arbitrary pipeline builds in both with-host and without-host use-cases, we architect the pipeline builder so that every node can send data to and receive data from the host, and so that CPython code can be run directly as nodes.
Integrating this, we have settled on the following approach, which breaks into 3 modalities of nodes used in the pipeline builder to solve embedded CV/AI problems and to leverage the results to interact with the physical world.
Node modalities:
1. Fast, easy, limited flexibility: the accelerated blocks listed above, like neural inference, 3D object localization, etc. These come pre-packaged and are trivial to use, but they often need application-specific logic between them, hence modality 2. And if your CV algorithm isn't on that list (or maybe you've invented your own proprietary one and need it to run performantly on DepthAI), see modality 3.
2. Slow, easy, quite flexible: CPython bindings for scripts running directly on DepthAI as a node (issue Scripting Support on DepthAI #207).
   This allows you to apply custom rules to metadata from neural inference results, write custom protocols that run on-chip as part of the pipeline, and communicate with sensors/actuators or other systems over SPI, UART, I2C, etc. based on pipeline results. For example, you can write rules that make sense of neural-inference metadata and then control the performant crop/resize/reformat blocks that connect layers of accelerated CV functions.
3. Fast, hard, quite flexible: OpenCL (here), G-API (more details soon), and ML frameworks for vectorized math are used to compile custom compute functions to run performantly on the SHAVEs in DepthAI. So you can take your computer vision function, write it in OpenCL, G-API, or, say, PyTorch, and drop it into the pipeline builder as a node. This allows custom algorithms, including proprietary ones, to be hardware-accelerated in the pipeline as nodes, and the pipeline builder leverages the hardware-accelerated crop/rescale/reformat to match inputs and outputs. This could even be used for non-CV functions, for example to run custom arbitrary mathematical functions on audio data brought in via CPython over I2C. For an EXCELLENT example of how to run custom CV code on DepthAI using PyTorch, see this guide by Rahul Ravikumar.
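The modality 2 pattern (a rule reads neural-inference metadata, then drives the accelerated crop/resize) can be sketched as the small computation such a script would perform: turn a normalized detection box into a padded, frame-clamped pixel crop for the next network. `roi_to_crop` and the 10% padding are assumptions for illustration; the actual crop/resize would be done by the hardware block, not by this code.

```python
def roi_to_crop(bbox, frame_w, frame_h, pad=0.1):
    """Expand a normalized (xmin, ymin, xmax, ymax) box by `pad` of its
    width/height on each side, clamp it to the frame, and return integer
    pixel coordinates (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    dw, dh = (xmax - xmin) * pad, (ymax - ymin) * pad
    # Clamp in normalized space so the crop never leaves the frame.
    xmin, ymin = max(0.0, xmin - dw), max(0.0, ymin - dh)
    xmax, ymax = min(1.0, xmax + dw), min(1.0, ymax + dh)
    return (round(xmin * frame_w), round(ymin * frame_h),
            round(xmax * frame_w), round(ymax * frame_h))


# A centered detection on a 640x480 frame, padded by 10% per side.
print(roi_to_crop((0.25, 0.25, 0.75, 0.75), 640, 480))
```

Padding gives the second-stage network some context around the object; clamping matters for objects at the frame edge.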
The what: If our pipeline builder supports the following, it should be sufficiently flexible.
So, implement a pipeline builder that can be used to implement the flows below.
UPDATE 26 December 2021: The docs for Gen2 are materializing here: https://docs.luxonis.com/projects/api/en/gen2_develop/
Example neural pipelines to support:
The OpenVINO security barrier demo (here).
Update 26 January 2021: Github issue for this example pipeline is Gen2 License Plate Detection and OCR depthai-experiments#47
Interactive Face Detection Demo (here)
Interactive Face Recognition Demo (here)
Cross Road Camera Demo (here)
Pedestrian Tracker (i.e. Person ReID here)
Text Detection and Recognition (OCR) (here)
Gaze Estimation (here and here)
Of the examples in the OpenVINO repository, the following seems like it should not be implemented, as it's the only one that does series, parallel, and output of the parallel branches back into a single model, so it seems much more specialized.
This will then cover the following items which were previously independently on the DepthAI roadmap:
To keep in mind, but maybe not support initially:
Smart-motion-style pipelines (DepthAI and megaAI 'SmartMotion' Feature #132; here), which use motion detection to determine what subset of a scene to pass into object detection, followed by object tracking on the detected objects.
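As a rough sketch of the motion-detection step, the code below uses simple frame differencing (one common technique, not necessarily what #132 uses) to find the bounding box of changed pixels, which could then be cropped and handed to object detection. Frames are plain lists of grayscale rows to stay dependency-free; `motion_bbox` and the threshold of 25 are hypothetical choices.

```python
def motion_bbox(prev, curr, threshold=25):
    """Return the inclusive pixel box (xmin, ymin, xmax, ymax) enclosing all
    pixels whose absolute grayscale difference exceeds `threshold`,
    or None if nothing changed."""
    xs, ys = [], []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


prev = [[0] * 8 for _ in range(6)]
curr = [row[:] for row in prev]
curr[1][2] = 200  # something moved at (x=2, y=1)
curr[3][4] = 200  # ...and at (x=4, y=3)

box = motion_bbox(prev, curr)
x0, y0, x1, y1 = box
# Only this sub-image would be passed on to object detection.
crop = [row[x0:x1 + 1] for row in curr[y0:y1 + 1]]
print(box, len(crop), len(crop[0]))
```

Restricting detection to the moving region lets a small-input network see the object at higher effective resolution, which is the point of the smart-motion flow.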