Chinese version

About mediapipe

MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. – GitHub/MediaPipe

Introducing mediapipe to Snap!

Goals

We want all the work of using mediapipe to be done entirely inside the Snap! IDE!

This has the following benefits:

There is no need to update the Snap! platform and no developer intervention is required: all the work happens in the user environment (it is just a Snap! project), which means ordinary users can keep extending these capabilities on their own. (This is an example of end-user programming.)

We can fully utilize the liveness of Snap! and enjoy an efficient and pleasant development experience.

Dynamic import

mediapipe provides a JavaScript API, so it seems we can bring mediapipe into Snap! through the JavaScript function block.

import() can asynchronously and dynamically load ECMAScript modules, which is exactly what we need.

Let’s try to import the mediapipe library. mediapipe has a rich set of features; the example we intend to create is Gesture Recognition (click to experience). I referred to the mediapipe documentation and the use case.

Project link (Click to Run)

The following introduces the key parts.

First, use import() to dynamically import the module:

// Avoid duplicate imports
if (window.FilesetResolver) {
  return;
}

// https://developers.google.com/mediapipe/api/solutions/js/tasks-vision
url = "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.13"; // @latest
import(url).then((vision) => {
  // Put the functions of the module into global variables
  window.FilesetResolver = vision.FilesetResolver;
  window.GestureRecognizer = vision.GestureRecognizer;
  window.DrawingUtils = vision.DrawingUtils;
});

After importing, you can use the functions of the module:

// https://codepen.io/mediapipe-preview/details/zYamdVd
// The resources in the url can be hosted domestically (in China)
if (window.gestureRecognizer) {
  return;
}
const createGestureRecognizer = async () => {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.13/wasm"
  );
  const gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
    baseOptions: {
      // "https://scratch3-files.just4fun.site/gesture_recognizer.task"
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task",
      delegate: "GPU",
    },
    numHands: 2,
  });
  window.gestureRecognizer = gestureRecognizer;
  // demosSection.classList.remove("invisible");
};
createGestureRecognizer();

After that, you can use gestureRecognizer to recognize gestures in an image or video frame.
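
For reference, here is a minimal sketch of what that call could look like in another JavaScript function block. This is not code from the project: videoElement stands for whatever image, canvas, or video element you feed in, and because the recognizer above was created in the default image mode, recognize() is used rather than recognizeForVideo().

// Minimal usage sketch; "videoElement" is a hypothetical element you provide.
const results = window.gestureRecognizer.recognize(videoElement);

// results.gestures holds one list of candidate categories per detected hand, best first.
if (results.gestures.length > 0) {
  const best = results.gestures[0][0];
  console.log(best.categoryName, best.score); // e.g. "Thumb_Up", 0.92
}

// results.landmarks holds 21 normalized (x, y, z) points per detected hand.
return results;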

Drawing hand landmarks

The example not only provides gesture results but also draws landmarks:

Details of landmarks:

The example code shows that the drawing is done with drawingUtils. Instead of reading the drawingUtils code (which is usually very tedious), I prefer to draw in Snap! based on the raw data; this way, we can fully enjoy the benefits of Snap!’s liveness. Snap!’s pen feature is easy to use and powerful, and we can use it to draw:

Specifically, we use Snap!’s pen to draw the coordinates contained in results.landmarks. In the process we need to convert from mediapipe’s coordinate system (origin at the top-left corner, y-axis pointing down, coordinates normalized to [0, 1]) to the Snap! stage coordinate system (origin at the center, y-axis pointing up).
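
A rough sketch of that conversion in JavaScript, assuming Snap!’s default 480 × 360 stage (the helper name below is mine, not from the project):

// mediapipe landmarks: x and y normalized to [0, 1], origin at the top-left corner, y pointing down.
// Snap! stage: origin at the center, x in [-240, 240], y in [-180, 180], y pointing up.
const STAGE_WIDTH = 480;  // default Snap! stage size
const STAGE_HEIGHT = 360;

function toStageCoords(landmark) {
  return {
    x: (landmark.x - 0.5) * STAGE_WIDTH,
    y: (0.5 - landmark.y) * STAGE_HEIGHT,
  };
}

The same arithmetic can of course be written with ordinary Snap! reporter blocks before moving the pen.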

During the drawing process, Snap!, as an heir to the vision of interactive personal computing, offers features (such as liveness) that are particularly helpful for exploring data, letting users observe the data interactively:

When you are interested in the current frame, you can pause the program and freeze the system at that moment:

It is as if time were frozen: after exploring the data of that time slice, you can let the process continue.

Project link (Click to Run)

In this project we only plotted the data of one hand, but results.landmarks may contain the data of both hands, as sketched below.
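
Building on the two sketches above, drawing every detected hand just means looping over the outer list (again an illustration, not the project’s actual blocks):

// Each entry of results.landmarks is one hand with 21 landmark points.
for (const hand of results.landmarks) {
  for (const point of hand) {
    const stagePoint = toStageCoords(point);
    // ...move the Snap! pen to stagePoint.x, stagePoint.y and draw...
  }
}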

Interoperability with MicroBlocks

mediapipe can make full use of the computing power of edge devices (mobile phones and tablets) to provide AI features. Edge devices keep getting more powerful: Apple’s recently launched M4 chip has the fastest Neural Engine in Apple’s history, capable of up to 38 trillion operations per second. Apple’s AI strategy seems to be to run inference and learning on edge devices.

As WebGPU becomes standardized, Snap! will be able to make full use of these capabilities. Since Snap! can also connect to MicroBlocks devices over BLE, it seems we can use a mobile phone as a computing module for a microcontroller. With frameworks like mediapipe, it seems quite easy to turn a phone into a wireless AI camera (similar to the HuskyLens camera).

References