Implementation:
Introduction
SIMD brings video abilities to MCUs.
Take advantage of vector extensions and software to process video data on targets with very little RAM (~1 MByte, which is small for video) by introducing libraries that help with this.
Problem description
Color image sensors produce Bayer data, which grows 3x in size once converted to RGB (one byte per pixel in, three bytes per pixel out). Under these conditions, an MCU with 1 MByte of RAM cannot store both the input and the output VGA frame: 640 * 480 * (1 + 3) = 1228800 bytes, which is more than 1 MiB (1048576 bytes).
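A rough memory budget makes the gap concrete. This is a minimal sketch that compares the whole-frame approach with a line-based one. The line counts are illustrative assumptions, not figures from this proposal: 3 Bayer lines (enough context for a 3x3 debayer window) and 8 RGB24 lines (one row of 8x8 JPEG blocks). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for 'lines' rows of 'width' pixels at 'bytes_per_pixel'. */
static size_t buffer_bytes(size_t width, size_t lines, size_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	return width * lines * bytes_per_pixel;
}

/* Whole-frame approach: full Bayer input (1 B/px) plus full RGB24 output (3 B/px). */
static size_t full_frame_budget(size_t width, size_t height)
{
	return buffer_bytes(width, height, 1) + buffer_bytes(width, height, 3);
}

/* Line-based approach (assumed line counts): 3 Bayer lines for a 3x3
 * debayer window plus 8 RGB24 lines for one row of 8x8 JPEG blocks. */
static size_t line_based_budget(size_t width)
{
	return buffer_bytes(width, 3, 1) + buffer_bytes(width, 8, 3);
}
```

For VGA this gives 1228800 bytes for the whole-frame approach and 17280 bytes for the assumed line-based pipeline.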
Proposed change
By inserting a ring buffer between line-processing functions, the flow becomes very linear:
convert_1line_awb() --(gearbox)--> convert_2lines_debayer_3x3() --(gearbox)--> convert_8lines_jpeg()
Such a gearbox is proposed on top of ring_buf and turned into an API.
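This is a minimal sketch of what such a gearbox could do, assuming a fixed-size ring of equal-length lines. The producer pushes one line at a time. Once the consumer's required number of lines is buffered, the gearbox passes them over as an array of line pointers and then releases them. The names gearbox, gearbox_init(), gearbox_put_line(), GB_MAX_LINES, GB_MAX_PITCH and the demo consumer are all hypothetical and are not the API of <zephyr/pixel/stream.h>. The proposal builds on Zephyr's ring_buf, whereas this sketch uses its own small ring so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GB_MAX_LINES 8  /* ring capacity in lines (hypothetical) */
#define GB_MAX_PITCH 64 /* kept small for the sketch; a VGA RGB24 line is 1920 bytes */

/* Called by the gearbox once 'n_lines' lines are buffered. */
typedef void (*gb_consumer_fn)(const uint8_t *const lines[], size_t n_lines,
			       size_t pitch, void *ctx);

struct gearbox {
	uint8_t storage[GB_MAX_LINES][GB_MAX_PITCH];
	size_t pitch;          /* bytes per line */
	size_t lines_per_call; /* lines the consumer expects per call */
	size_t head;           /* slot of the oldest buffered line */
	size_t count;          /* lines currently buffered */
	gb_consumer_fn consumer;
	void *ctx;
};

static void gearbox_init(struct gearbox *gb, size_t pitch, size_t lines_per_call,
			 gb_consumer_fn consumer, void *ctx)
{
	assert(pitch <= GB_MAX_PITCH);
	assert(lines_per_call > 0 && lines_per_call <= GB_MAX_LINES);
	memset(gb, 0, sizeof(*gb));
	gb->pitch = pitch;
	gb->lines_per_call = lines_per_call;
	gb->consumer = consumer;
	gb->ctx = ctx;
}

/* Copy one line in. When enough lines are buffered, hand them to the consumer
 * and release them. The bookkeeping cost is paid once per line, not per pixel. */
static void gearbox_put_line(struct gearbox *gb, const uint8_t *line)
{
	const uint8_t *view[GB_MAX_LINES];
	size_t tail = (gb->head + gb->count) % GB_MAX_LINES;

	assert(gb->count < GB_MAX_LINES);
	memcpy(gb->storage[tail], line, gb->pitch);
	gb->count++;

	if (gb->count < gb->lines_per_call) {
		return;
	}
	for (size_t i = 0; i < gb->lines_per_call; i++) {
		view[i] = gb->storage[(gb->head + i) % GB_MAX_LINES];
	}
	gb->consumer(view, gb->lines_per_call, gb->pitch, gb->ctx);
	gb->head = (gb->head + gb->lines_per_call) % GB_MAX_LINES;
	gb->count -= gb->lines_per_call;
}

/* Demo consumer standing in for a multi-line step such as a debayer:
 * it counts calls and lines, and records the first byte of the first line. */
struct gb_stats {
	size_t calls;
	size_t lines;
	uint8_t first_byte;
};

static void count_lines_consumer(const uint8_t *const lines[], size_t n_lines,
				 size_t pitch, void *ctx)
{
	struct gb_stats *st = ctx;

	(void)pitch;
	st->calls++;
	st->lines += n_lines;
	st->first_byte = lines[0][0];
}
```

Slots are addressed modulo GB_MAX_LINES, so a consumer that takes 3 lines per call works just as well as one that takes 2 or 8. Pushing 7 lines into a 3-lines-per-call gearbox triggers two consumer calls and leaves one line buffered.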
Detailed RFC
Processing the image end-to-end one line at a time works. For instance:
convert_1line_awb() --+--> convert_2lines_debayer_3x3() --+--> convert_8lines_jpeg()
convert_1line_awb() --'                                   |
convert_1line_awb() --+--> convert_2lines_debayer_3x3() --+
convert_1line_awb() --'                                   |
convert_1line_awb() --+--> convert_2lines_debayer_3x3() --+
convert_1line_awb() --'                                   |
convert_1line_awb() --+--> convert_2lines_debayer_3x3() --'
convert_1line_awb() --'
But this becomes complex when the number of lines is not a power of two, and it is tedious to stitch by hand.
In addition, a library that automates the process makes it possible to define "stream processors" that are independent of their context:
- Facilitates writing new line conversion functions, and stitch them into streams (gstreamer style).
- This will be extended in the future to cover a complete ISP pipeline for Zephyr (libcamera style)
- Most image pre-processing algorithms can be implemented in a line-based fashion (opencv style).
- A driver using this to automatically convert data between input and output formats is provided (ffmpeg style)
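To make the idea of a context-independent line processor concrete, here is a minimal sketch of a one-line white-balance step on RGGB8 Bayer data, in the spirit of convert_1line_awb() from the diagrams. The function name awb_1line_rggb8(), the struct awb_gains type and the Q8 fixed-point gains (256 = 1.0) are assumptions for illustration and are not the proposal's signatures. The function needs only the current line and its row index, so it can be placed anywhere in a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q8 fixed-point gains: 256 means 1.0. */
struct awb_gains {
	uint16_t r;
	uint16_t g;
	uint16_t b;
};

/* Multiply a pixel by a Q8 gain and saturate to 8 bits. */
static uint8_t apply_gain_q8(uint8_t px, uint16_t gain)
{
	uint32_t v = ((uint32_t)px * gain) >> 8;

	return v > 255 ? 255 : (uint8_t)v;
}

/* White-balance one RGGB8 line.
 * Even rows are R,G,R,G,... and odd rows are G,B,G,B,... */
static void awb_1line_rggb8(const uint8_t *src, uint8_t *dst, size_t width,
			    size_t row, const struct awb_gains *g)
{
	uint16_t even_gain = (row % 2 == 0) ? g->r : g->g;
	uint16_t odd_gain = (row % 2 == 0) ? g->g : g->b;

	assert(width % 2 == 0);
	for (size_t x = 0; x < width; x += 2) {
		dst[x] = apply_gain_q8(src[x], even_gain);
		dst[x + 1] = apply_gain_q8(src[x + 1], odd_gain);
	}
}
```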
Performance-wise:
- Having this extra gearbox library between line conversion functions adds some overhead, but only once per fully converted line: for instance, once every 640 * 3 = 1920 bytes for a VGA RGB24 line, which is considered low impact.
- This plays well with SIMD instructions, which rarely process data one column at a time but instead work on contiguous bytes.
- This allows the transfer of the converted image to start as soon as the first line of data is converted, without waiting for the full conversion.
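The following sketch illustrates the kind of line function the gearbox connects. It is a plain scalar RGB24 to RGB565 conversion of one line. The loop walks contiguous bytes, which is the access pattern that SIMD extensions (or compiler auto-vectorization) can exploit. No SIMD intrinsics are used here. The name rgb24_to_rgb565_line() is hypothetical. Storing native-endian uint16_t values is also an assumption: real sensors and displays may expect a specific byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one line of RGB24 (R, G, B bytes) into RGB565 stored as
 * native-endian uint16_t: 5 bits red, 6 bits green, 5 bits blue. */
static void rgb24_to_rgb565_line(const uint8_t *src, uint16_t *dst, size_t width)
{
	assert(src != NULL && dst != NULL);
	for (size_t x = 0; x < width; x++) {
		const uint8_t *px = &src[x * 3];

		dst[x] = (uint16_t)(((px[0] >> 3) << 11) | ((px[1] >> 2) << 5) | (px[2] >> 3));
	}
}
```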
Proposed change (Detailed)
No SIMD support for now:
- <zephyr/pixel/stream.h>: the gearbox library that stitches stream processors together.
- <zephyr/pixel/formats.h>: format conversion with support for RGB24, RGB565 and YUYV (from/to).
- <zephyr/pixel/bayer.h>: format conversion for Bayer input: RGGB8, BGGR8, GRBG8, GBRG8.
- <zephyr/pixel/resize.h>: resize a full frame using fast but low-quality subsampling.
- <zephyr/pixel/stats.h>: collect statistics from a frame: RGB or Y (luma) channel averages or histograms.
- <zephyr/pixel/print.h>: utilities calling printf() to display a hexdump of the data, the actual colored image, and histograms.
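Subsampling of the kind listed for resizing can be expressed one line at a time, which lets it sit inside a stream. This is a minimal nearest-neighbour sketch that assumes packed pixels of bytes_per_pixel bytes. subsample_line() and subsample_src_row() are hypothetical names and are not the header's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nearest-neighbour horizontal subsampling of one line of packed pixels. */
static void subsample_line(const uint8_t *src, size_t src_width, uint8_t *dst,
			   size_t dst_width, size_t bytes_per_pixel)
{
	assert(src_width > 0 && dst_width > 0);
	for (size_t x = 0; x < dst_width; x++) {
		size_t sx = x * src_width / dst_width;

		memcpy(&dst[x * bytes_per_pixel], &src[sx * bytes_per_pixel], bytes_per_pixel);
	}
}

/* Source row feeding output row y under the same nearest-neighbour rule,
 * so a line-based stream can skip the input lines it does not need. */
static size_t subsample_src_row(size_t y, size_t src_height, size_t dst_height)
{
	assert(dst_height > 0);
	return y * src_height / dst_height;
}
```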
Dependencies
None (mostly <stdint.h> and <sys/util.h>).
Concerns and Unresolved Questions
Should this be part of Zephyr or be put in a separate repo as a module?
Alternatives
Let the application process everything.
Use the vendor-specific libraries directly. However, CMSIS-CV is almost empty compared to e.g. OpenCV. Other libraries have a simple API, but there is no standard one available (which this PR strives to deliver). Also, CPU architectures other than ARM have vector extensions too, so an ARM-specific library would not serve as a generic front-end for them.
Open to suggestions!
Example of what an end-to-end flow looks like on native_sim:
P.S. thanks @VynDragon (MASSDRIVER EI) for the help!