
LFX Workspace: A Rust library crate for mediapipe models for WasmEdge NN #2355

Closed
@yanghaku

Description

Motivation

MediaPipe is a collection of ML models for streaming data. The official website provides Python, iOS, Android, and TFLite-JS SDKs for using those models. As WasmEdge is increasingly used in data streaming applications, we would like to build a Rust library crate that enables easy integration of MediaPipe models in WasmEdge applications.

Details

Each MediaPipe model has a description page that describes its input and output tensors. The models are available in TensorFlow Lite format, which is supported by the WasmEdge TensorFlow Lite plugin.
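
For illustration, here is a minimal sketch of loading a MediaPipe TFLite model through the WASI-NN interface, assuming the wasi-nn Rust crate's GraphBuilder API and a placeholder model file name (exact type and variant names are an assumption and may differ between crate versions):

```rust
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding};

fn main() {
    // Placeholder model file; any MediaPipe .tflite model would be loaded the same way.
    let model_bytes = std::fs::read("face_detection_short_range.tflite")
        .expect("failed to read the model file");

    // Build a graph from the raw TFLite flatbuffer, targeting the CPU backend of
    // the WasmEdge TensorFlow Lite plugin.
    let graph = GraphBuilder::new(GraphEncoding::TensorflowLite, ExecutionTarget::CPU)
        .build_from_bytes([&model_bytes])
        .expect("failed to build the graph");

    // Inputs and outputs are set on an execution context created from the graph.
    let mut _ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");
}
```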

We need at least one set of library functions for each model in MediaPipe. Each library function takes in a media object and returns the inference result. The function performs the following tasks:

  • Process the input media object (e.g., a byte array for a JPEG image) into a tensor for the model. As an example, you could use the Rust imageproc crate to process the image into a vector.
  • Use WasmEdge NN to run inference of the input tensor on the model.
  • Collect and interpret the result tensor.
    • The function should at least return a struct containing the output parameters described on the model's description page. For example, a face detection function should return a vector of structs, where each struct contains the coordinates of a detected face.
    • The function should also return a visual representation of the inference results. For example, we should overlay detected face boundaries and landmarks on the original image. The draw_hollow_rect_mut() function in imageproc could be used to draw the detected boundaries (see the sketch after this list).
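
Putting the three steps together, here is a minimal sketch of such a library function for a face detection model. It assumes the image and imageproc crates for decoding and drawing, and uses a hypothetical run_inference closure to stand in for the WasmEdge NN call; the 192x192 input size and the normalized output coordinates are placeholders, not the real model specification:

```rust
use image::{imageops::FilterType, Rgb, RgbImage};
use imageproc::drawing::draw_hollow_rect_mut;
use imageproc::rect::Rect;

/// One detected face, mirroring the output parameters on the model's description page.
/// Coordinates are assumed to be normalized to [0, 1] (a placeholder convention).
pub struct Detection {
    pub x_min: f32,
    pub y_min: f32,
    pub width: f32,
    pub height: f32,
    pub score: f32,
}

/// Decode an input image, run a (hypothetical) face detection model, and return
/// both the structured results and an annotated copy of the original image.
pub fn detect_faces(
    image_bytes: &[u8],
    run_inference: impl Fn(&[f32]) -> Vec<Detection>, // stands in for the WasmEdge NN call
) -> (Vec<Detection>, RgbImage) {
    // 1. Pre-process: decode the bytes, resize to the model input size
    //    (192x192 is a placeholder), and scale pixel values to [0, 1].
    let original = image::load_from_memory(image_bytes)
        .expect("failed to decode the input image")
        .to_rgb8();
    let resized = image::imageops::resize(&original, 192, 192, FilterType::Triangle);
    let input_tensor: Vec<f32> = resized
        .pixels()
        .flat_map(|p| p.0)
        .map(|v| v as f32 / 255.0)
        .collect();

    // 2. Run inference (set_input/compute/get_output via wasi-nn in the real crate).
    let detections = run_inference(&input_tensor);

    // 3. Post-process: overlay each detected bounding box on the original image.
    let mut annotated = original.clone();
    let (w, h) = (annotated.width() as f32, annotated.height() as f32);
    for d in &detections {
        let rect = Rect::at((d.x_min * w) as i32, (d.y_min * h) as i32)
            .of_size(((d.width * w) as u32).max(1), ((d.height * h) as u32).max(1));
        draw_hollow_rect_mut(&mut annotated, rect, Rgb([255u8, 0, 0]));
    }
    (detections, annotated)
}
```

In the real crate, the run_inference parameter would be replaced by the WasmEdge NN calls against the execution context created above, and the Detection fields would follow the model's description page.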

Milestones

  • Create a list of models, and for each model, list the pre- and post-processing functions needed.
  • Implement the tasks: image classification (no video support) and object detection (no video support). (1 week)
  • Implement the tasks: text classification and audio classification. (2 weeks)
  • Find the functions we need in OpenCV, and try to implement video support for the vision tasks. (2 weeks)
  • Implement all other vision tasks, such as hand landmark detection. (2 weeks)
  • Build a new TFLite library that includes the MediaPipe custom operators. (1 week)
  • Try to implement GPU support for MediaPipe models. (1 week)
  • Write the documentation, then publish the library to crates.io. (1 week)

Repository URL: originally https://github.com/yanghaku/mediapipe-rs-dev; it will now be transferred to https://github.com/WasmEdge/mediapipe-rs.

MediaPipe tasks progress:

  • Object Detection
  • Image Classification
  • Image Segmentation
  • Gesture Recognition
  • Hand Landmark Detection
  • Image Embedding
  • Face Detection
  • Audio Classification
  • Text Classification

Appendix

feat: A Rust library crate for MediaPipe models for WasmEdge NN
