Description
Motivation
MediaPipe is a collection of ML models for streaming data. The official website provides Python, iOS, Android, and TFLite-JS SDKs for using these models. As WasmEdge is increasingly used in data streaming applications, we would like to build a Rust library crate that makes it easy to integrate MediaPipe models into WasmEdge applications.
Details
Each MediaPipe model has a description page that describes its input and output tensors. The models are available in TensorFlow Lite format, which is supported by the WasmEdge TensorFlow Lite plugin.
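Because each model page lists its tensors, it can help to copy those descriptions into Rust data and validate buffers against them before running inference. The following is a minimal std-only sketch; `TensorSpec`, `ElemType`, and the example 1 x 224 x 224 x 3 float32 shape are hypothetical illustrations, not taken from any specific MediaPipe model page or from the crate's API.

```rust
/// Element types a model tensor may use (hypothetical subset for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ElemType {
    U8,
    F32,
}

impl ElemType {
    /// Size of one element in bytes.
    fn size_in_bytes(self) -> usize {
        match self {
            ElemType::U8 => 1,
            ElemType::F32 => 4,
        }
    }
}

/// Shape and element type of one tensor, as copied from a model description page.
#[derive(Debug, Clone)]
struct TensorSpec {
    shape: Vec<usize>,
    elem: ElemType,
}

impl TensorSpec {
    /// Total number of elements: the product of all dimensions.
    fn num_elements(&self) -> usize {
        self.shape.iter().product()
    }

    /// Total size of the tensor's raw buffer in bytes.
    fn byte_len(&self) -> usize {
        self.num_elements() * self.elem.size_in_bytes()
    }

    /// Checks that a raw input buffer has exactly the size the model expects.
    fn check_input(&self, bytes: &[u8]) -> Result<(), String> {
        if bytes.len() == self.byte_len() {
            Ok(())
        } else {
            Err(format!("expected {} bytes, got {}", self.byte_len(), bytes.len()))
        }
    }
}

fn main() {
    // Hypothetical NHWC image input: batch 1, 224 x 224 pixels, 3 channels, float32.
    let input = TensorSpec { shape: vec![1, 224, 224, 3], elem: ElemType::F32 };
    println!("{} elements, {} bytes", input.num_elements(), input.byte_len());
    let buf = vec![0u8; input.byte_len()];
    assert!(input.check_input(&buf).is_ok());
}
```

Catching a size mismatch this early gives a clearer error than a failure deep inside the inference backend.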
We need at least one set of library functions for each model in MediaPipe. Each library function takes a media object as input and returns the inference result. The function performs the following tasks:
- Process the input media object (e.g., the byte array of a JPEG image) into a tensor for the model. For example, the Rust imageproc crate could be used to convert the image into a vector.
- Use WasmEdge NN to run inference of the input tensor on the model.
- Collect and interpret the result tensor.
- The function should at least return a struct containing the output parameters described on the model description page. For example, a face detection function should return a vector of structs, where each struct contains the coordinates of a detected face.
- The function should also return a visual representation of the inference results. For example, we should overlay detected face boundaries and landmarks on the original image. As an example, the draw_hollow_rect_mut() in imageproc could be used to draw detected boundaries.
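The pre-processing, result-collection, and overlay steps listed above can be sketched with the standard library alone, leaving out the actual inference call through WasmEdge NN. This is a minimal sketch under stated assumptions: the image is already decoded to RGB8 (real code would decode the JPEG first), the model is assumed to take a flat NHWC float32 tensor normalized to [0, 1], and the output is assumed to be N boxes as normalized `[ymin, xmin, ymax, xmax]` plus N scores. `RgbImage`, `Detection`, `to_input_tensor`, `decode_detections`, and `draw_hollow_rect` are hypothetical names, not the crate's API; `draw_hollow_rect` only plays the role that `draw_hollow_rect_mut()` would play in imageproc.

```rust
/// A decoded RGB8 image: `width * height * 3` bytes, row-major.
struct RgbImage {
    width: usize,
    height: usize,
    data: Vec<u8>,
}

/// One detection in pixel coordinates (hypothetical result struct).
#[derive(Debug, Clone, PartialEq)]
struct Detection {
    x: usize,
    y: usize,
    width: usize,
    height: usize,
    score: f32,
}

/// Pre-processing: nearest-neighbor resize to `dst_w x dst_h`, then scale each
/// channel from 0..=255 to [0, 1]. Returns a flat NHWC tensor of length dst_w * dst_h * 3.
fn to_input_tensor(img: &RgbImage, dst_w: usize, dst_h: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(dst_w * dst_h * 3);
    for y in 0..dst_h {
        let sy = y * img.height / dst_h; // source row for this output row
        for x in 0..dst_w {
            let sx = x * img.width / dst_w; // source column for this output column
            let base = (sy * img.width + sx) * 3;
            for c in 0..3 {
                out.push(img.data[base + c] as f32 / 255.0);
            }
        }
    }
    out
}

/// Post-processing: keep boxes whose score reaches `min_score` and convert the
/// assumed normalized [ymin, xmin, ymax, xmax] layout to pixel coordinates.
fn decode_detections(boxes: &[f32], scores: &[f32], img_w: usize, img_h: usize, min_score: f32) -> Vec<Detection> {
    boxes
        .chunks_exact(4)
        .zip(scores)
        .filter(|(_, s)| **s >= min_score)
        .map(|(b, &s)| {
            // Clamp to [0, 1] so out-of-range model output stays inside the image.
            let to_px = |v: f32, max: usize| (v.clamp(0.0, 1.0) * max as f32) as usize;
            let (y0, x0) = (to_px(b[0], img_h), to_px(b[1], img_w));
            let (y1, x1) = (to_px(b[2], img_h), to_px(b[3], img_w));
            Detection { x: x0, y: y0, width: x1.saturating_sub(x0), height: y1.saturating_sub(y0), score: s }
        })
        .collect()
}

/// Visualization: draws a 1-pixel hollow rectangle in `color`, clipped to the image.
fn draw_hollow_rect(img: &mut RgbImage, d: &Detection, color: [u8; 3]) {
    if d.width == 0 || d.height == 0 {
        return;
    }
    let x1 = (d.x + d.width - 1).min(img.width.saturating_sub(1));
    let y1 = (d.y + d.height - 1).min(img.height.saturating_sub(1));
    let mut put = |x: usize, y: usize| {
        if x < img.width && y < img.height {
            let i = (y * img.width + x) * 3;
            img.data[i..i + 3].copy_from_slice(&color);
        }
    };
    for x in d.x..=x1 {
        put(x, d.y); // top edge
        put(x, y1); // bottom edge
    }
    for y in d.y..=y1 {
        put(d.x, y); // left edge
        put(x1, y); // right edge
    }
}

fn main() {
    // A tiny 8x6 gray image standing in for a decoded JPEG.
    let mut img = RgbImage { width: 8, height: 6, data: vec![128; 8 * 6 * 3] };
    let tensor = to_input_tensor(&img, 4, 4);
    println!("tensor len = {}", tensor.len()); // 4 * 4 * 3 = 48

    // Made-up values standing in for the model's two output tensors.
    let boxes = [0.0, 0.0, 0.5, 0.5, 0.2, 0.2, 0.4, 0.4];
    let scores = [0.9, 0.1];
    let faces = decode_detections(&boxes, &scores, img.width, img.height, 0.5);
    for f in &faces {
        draw_hollow_rect(&mut img, f, [255, 0, 0]);
    }
    println!("{:?}", faces);
}
```

Keeping the three steps as separate functions means each model only needs its own tensor-layout assumptions swapped in, while the result struct and the overlay step can be shared across vision tasks.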
Milestones
- Create a list of models and, for each model, list the pre- and post-processing functions needed.
- Implement the tasks: image classification (no video support) and object detection (no video support). (1 week)
- Implement the tasks: text classification and audio classification. (2 weeks)
- Find the functions we need in OpenCV and try to implement video support for the vision tasks. (2 weeks)
- Implement all other vision tasks, such as hand landmark detection. (2 weeks)
- Build a new TFLite library that includes the MediaPipe custom operators. (1 week)
- Try to implement GPU support for MediaPipe models. (1 week)
- Write the documentation, then publish the library to crates.io. (1 week)
Repository URL: originally https://github.com/yanghaku/mediapipe-rs-dev; it is being transferred to https://github.com/WasmEdge/mediapipe-rs
MediaPipe tasks progress:
- Object Detection
- Image Classification
- Image Segmentation
- Gesture Recognition
- Hand Landmark Detection
- Image Embedding
- Face Detection
- Audio Classification
- Text Classification
Appendix
feat: A Rust library crate for MediaPipe models for WasmEdge NN