# Weixin Mini Program AI Usage Guide
# Start
The Weixin Mini Program general AI interface is a set of general-purpose AI model inference capabilities provided officially by the Mini Program platform. It is built on a highly optimized, self-developed inference engine and supports inference on CPU, GPU, and NPU devices. Mini Program developers do not need to concern themselves with the internal implementation or with model conversion: they simply provide a trained ONNX model, and the Mini Program framework automatically converts it into the format recognized by the self-developed inference engine and runs the inference.
This guide demonstrates, from scratch, how to use the Weixin Mini Program AI inference capability to complete a classification task. Data collected in real time by the camera is converted into AI inference input through simple preprocessing; after inference completes, a simple post-processing step is applied to the model output to produce the final classification result, which is then displayed on the page.
The example uses the MobileNetV2 model provided by the official ONNX model zoo. The model can be obtained from the official GitHub repository, and the pre- and post-processing used in the example are consistent with the official imagenet_validation provided by ONNX.
# 1 Create a session
First we need to create a session for inference. Here we use a floating-point model downloaded from the official GitHub repository and select precisionLevel 0, so that when the session runs, fp16 is automatically used to store intermediate tensors, computation is also performed in fp16, Winograd is enabled for the fp16 computation, and approximate math computation is enabled. We choose not to use quantized inference and not to use the NPU. None of these settings is required to create a session; the only required parameter is model, which specifies the path of the ONNX model.
In general, the lower the precisionLevel, the faster the inference, but there may be some loss of precision. It is therefore recommended that developers prefer a lower precision level whenever the results meet their requirements, to speed up inference and save power.
In addition to creating the session with the wx.createInferenceSession() interface, here we attach two event listeners to the session to be notified of errors and of creation completing. In the onLoad() callback we set an isReady variable to record that the session has finished initializing and can be used for inference.
```js
// Here modelPath is the path of the required ONNX model. Note that only
// files with the .onnx suffix are currently recognized as the model parameter.
const modelPath = `${wx.env.USER_DATA_PATH}/mobilenetv2-12.onnx`;
this.session = wx.createInferenceSession({
  model: modelPath,
  /* 0: lowest precision;  floats stored as fp16, computed in fp16, Winograd also computed in fp16, approximate math enabled
     1: lower precision;   floats stored as fp16, computed in fp16, Winograd disabled, approximate math enabled
     2: medium precision;  floats stored as fp16, computed in fp32, Winograd enabled, approximate math enabled
     3: higher precision;  floats stored as fp32, computed in fp32, Winograd enabled, approximate math enabled
     4: highest precision; floats stored as fp32, computed in fp32, Winograd enabled, approximate math disabled
     Higher precision levels usually take longer to complete inference.
  */
  precisionLevel: 0,
  allowNPU: false,      // whether to use NPU for inference; effective on iOS only
  allowQuantize: false, // whether to generate a quantized model
});
// Listen for error events
this.session.onError((error) => {
  console.error(error);
});
// Listen for the model-load-complete event
this.session.onLoad(() => {
  this.isReady = true; // the session is initialized and can be used for inference
  console.log('session load');
});
```
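When the session is no longer needed, it should also be released so the engine can free its native resources. A minimal sketch, assuming the session's destroy() interface is called in the page's onUnload lifecycle method:

```js
// Sketch: release the inference session when the page unloads so the
// engine can free the native resources it holds.
onUnload() {
  if (this.session) {
    this.session.destroy();
    this.session = null;
  }
}
```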
# 2 Session inference
# 2.1 Obtaining the data collected by the camera
First we create a camera context and call onCameraFrame to capture frames. The classifier here encapsulates the inference-session calls; the complete code can be found in the demo example, and a rough sketch follows the snippet below. onCameraFrame continuously collects camera frames, and whenever our session has been initialized successfully and the previous inference task has finished, it hands the camera data to a new inference task.
```js
const context = wx.createCameraContext(this);
const listener = context.onCameraFrame(frame => {
  const fps = this.fpsHelper.getAverageFps();
  console.log(`fps=${fps}`);
  // Only start a new inference when the session is ready and the
  // previous frame has finished processing.
  if (this.classifier && this.classifier.isReady() && !this.predicting) {
    this.executeClassify(frame);
  }
});
// The listener must be started before frames are delivered.
listener.start();
```
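executeClassify itself ships with the demo; a hypothetical sketch of its shape, assuming the preProcess, session.run, and argMax pieces developed in sections 2.2 to 2.4 (runInference and postProcess are illustrative helper names, not part of the official API):

```js
// Hypothetical sketch: guard against re-entry with this.predicting, then
// chain the steps from sections 2.2-2.4.
async executeClassify(frame) {
  this.predicting = true;
  try {
    await this.preProcess(frame, dstInput);        // preprocessing, section 2.2
    const res = await this.runInference(dstInput); // wraps session.run, section 2.3
    this.postProcess(res);                         // argMax + display, section 2.4
  } finally {
    this.predicting = false; // allow the next camera frame to start
  }
}
```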
# 2.2 Pre-processing of data collected by the camera
The frame returned by onCameraFrame contains the attributes width, height, and data, which are the width, height, and pixel data of the two-dimensional image respectively. data is an ArrayBuffer whose elements are Uint8 values stored in RGBA order. For more information about onCameraFrame, see CameraContext.onCameraFrame.
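For illustration, the RGBA pixel at column x and row y (hypothetical coordinates) can be read from the buffer like this:

```js
// Each pixel occupies 4 consecutive bytes (r, g, b, a) and rows are
// stored one after another, so the stride of a row is width * 4.
const pixels = new Uint8Array(frame.data);
const offset = (y * frame.width + x) * 4;
const [r, g, b, a] = pixels.subarray(offset, offset + 4);
```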
For each frame, we first perform a preprocessing step that converts it into the model input.
Opening the ONNX file with Netron, we can see the description of MobileNet's input and output. The input shape of this model is [1,3,224,224] and the data type is float32.

To convert the frame captured by the camera into the data required by the model, we need to discard the alpha channel, convert the layout from NHWC to NCHW, resize the frame to 224 x 224, and apply normalization.
The following code performs all of the pre-processing in JavaScript, converting the camera frame into the model input dstInput.

```js
// modelWidth, modelHeight and modelChannel describe the model input [1,3,224,224]
const modelWidth = 224, modelHeight = 224, modelChannel = 3;
// frame is the data collected by the camera
var dstInput = new Float32Array(modelChannel * modelHeight * modelWidth);
/* The original input is RGBA uint8 data; the target is NCHW float32 data.
   Scale the camera frame down to the model input size, convert uint8 to
   float32, and convert the layout from NHWC to NCHW.
*/
preProcess(frame, dstInput) {
  return new Promise((resolve, reject) => {
    const origData = new Uint8Array(frame.data);
    const hRatio = frame.height / modelHeight;
    const wRatio = frame.width / modelWidth;
    const origHStride = frame.width * 4; // 4 bytes (rgba) per source pixel
    const origWStride = 4;
    const mean = [0.485, 0.456, 0.406];
    // Reciprocals of std = [0.229, 0.224, 0.225]
    const reverse_div = [4.367, 4.464, 4.444];
    const ratio = 1 / 255.0;
    const normalized_div = [ratio * reverse_div[0], ratio * reverse_div[1], ratio * reverse_div[2]];
    const normalized_mean = [mean[0] * reverse_div[0], mean[1] * reverse_div[1], mean[2] * reverse_div[2]];
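    /* Per channel c, the loop below computes
         val = pixel * (1 / (255 * std[c])) - mean[c] / std[c]
             = (pixel / 255 - mean[c]) / std[c],
       i.e. the standard ImageNet normalization, folded into a single
       multiply and subtract per pixel. */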
    var idx = 0;
    for (var c = 0; c < modelChannel; ++c) {
      for (var h = 0; h < modelHeight; ++h) {
        // Nearest-neighbour resize: map each destination row/column back
        // to a source pixel in the camera frame.
        const origH = Math.round(h * hRatio);
        const origHOffset = origH * origHStride;
        for (var w = 0; w < modelWidth; ++w) {
          const origW = Math.round(w * wRatio);
          const origIndex = origHOffset + origW * origWStride + c;
          const val = origData[origIndex] * normalized_div[c] - normalized_mean[c];
          dstInput[idx] = val;
          idx++;
        }
      }
    }
    resolve();
  });
}
```
# 2.3 Model inference
After this simple pre-processing of the camera data, we can use the result to set the model input and run inference. We construct an xinput from the dstInput produced by the preprocessing and pass it to session.run as the input for model inference.
```js
const xinput = {
  shape: [1, 3, 224, 224], // input shape, in NCHW order
  data: dstInput.buffer,   // an ArrayBuffer
  type: 'float32',         // input data type
};
this.session.run({
  // "input" here must exactly match the model input name in the ONNX file
  "input": xinput,
})
.then((res) => {
  // Read the result via res.outputname.data, where outputname must exactly
  // match the model output name in the ONNX file
  let num = new Float32Array(res.output.data);
});
```
The result of the run is obtained through res.output.
Note that the names input and output are not fixed across all models; they must correspond exactly to the input and output names in the specific ONNX file.
Returning to the description of the MobileNet model we saw in Netron earlier: the model has an input named "input" and an output named "output". So when we set the input with session.run({"input": xinput}), "input" is the input name in the ONNX model. When there is more than one input, we use session.run({"input1": xxx, "input2": xxx}) to set the data for the model inputs named "input1" and "input2" respectively. Similarly, when we read the model output, res.output refers to the output named "output".
Every model input or output tensor is an Object containing three attributes: shape, type, and data, where data is an ArrayBuffer.
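For example, a hypothetical model with two inputs could be fed as follows (the names input1 and input2 are illustrative and must match the actual ONNX graph), and the attributes of each output tensor can be inspected on the result:

```js
// Illustrative only: tensor names must match the ONNX graph, and
// buf1/buf2 are ArrayBuffers prepared by the caller.
this.session.run({
  "input1": { shape: [1, 3, 224, 224], data: buf1, type: 'float32' },
  "input2": { shape: [1, 10], data: buf2, type: 'float32' },
}).then((res) => {
  const out = res.output;                    // output tensor named "output"
  console.log(out.shape, out.type);          // e.g. [1, 1000] 'float32'
  const scores = new Float32Array(out.data); // data is an ArrayBuffer
});
```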

# 2.4 Post-processing
The post-processing in this example is relatively simple. After obtaining the model output, we use an argMax operation to find the index of the class with the highest score, and then convert this index to the class it represents.
```js
let num = new Float32Array(res.output.data);
// argMax over the class scores
var maxVar = num[0];
var index = 0;
for (var i = 1; i < num.length; ++i) {
  if (maxVar < num[i]) {
    maxVar = num[i];
    index = i;
  }
}
this.getClass(index);
```
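getClass maps the index to a human-readable label. A minimal sketch, assuming a hypothetical imagenetClasses array holding the 1000 ImageNet label strings (for example loaded from a bundled JSON file):

```js
// Minimal sketch: imagenetClasses is an assumed array of the 1000
// ImageNet label strings; it is not provided by the framework.
getClass(index) {
  const label = imagenetClasses[index];
  this.setData({ result: label }); // display the classification on the page
}
```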
# 3 Running results
Scan the QR code below, then tap Interface - General AI inference capability - mobileNet to view the running result.

Run the demo: while the camera is capturing, the classification result is written back to the bottom of the page in real time.

For the complete demo, please refer to the official GitHub Mini Program examples.
# Operator support list
For the operators currently supported, please refer to the operator support list.