# 8-bit Quantization Usage Guide

# Introduction

The Mini Program general AI inference interface is an officially provided, general-purpose AI model inference solution. It supports Int8 quantized model inference, which significantly improves inference performance while reducing model storage and compute overhead.

This guide shows how to use this technique to optimize the [floating-point classification demo](https://github.com/wechat-miniprogram/miniprogram-demo/tree/master/miniprogram/packageAPI/pages/ai/MobileNet).
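At its core, Int8 quantization maps floating-point tensors to 8-bit integers through a scale and zero point, and inference then runs on the integer representation. The self-contained NumPy sketch below only illustrates that affine quantize/dequantize round trip; it is not part of the Mini Program toolchain, and all names in it are made up for the example:

    import numpy as np

    def quantize_int8(x):
        """Affine (asymmetric) Int8 quantization: float array -> int8 array plus scale/zero point."""
        qmin, qmax = -128, 127
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(round(qmin - x.min() / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
        return q, scale, zero_point

    def dequantize_int8(q, scale, zero_point):
        """Map int8 values back to approximate float values."""
        return (q.astype(np.float32) - zero_point) * scale

    x = np.random.randn(4, 4).astype(np.float32)
    q, scale, zp = quantize_int8(x)
    print("max abs error:", np.abs(x - dequantize_int8(q, scale, zp)).max())

Storing int8 weights takes a quarter of the space of float32 weights, which is where the storage saving mentioned above comes from.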

# 1. Preparation

  • Download the [model quantization tool](https://github.com/wechat-miniprogram/xnet-miniprogram/tree/main/nncs) and install its dependencies:

    git clone https://github.com/wechat-miniprogram/xnet-miniprogram.git && cd xnet-miniprogram/nncs && pip install -r requirements.txt

  • Download the ImageNet dataset, or alternatively ImageNet-mini (a quick way to verify the dataset layout is sketched after the directory tree below).
  • Download the pre-trained [MobileNetV2](https://github.com/wechat-miniprogram/xnet-miniprogram/blob/main/models/mobilenet-v2-71dot82.onnx) model. The expected directory layout is:

    ImageNet
    |---train
    |     |---n01440764
    |     |---n01443537
    |     |---...
    |     |---n15075141
    |---val
    |     |---n01440764
    |     |---n01443537
    |     |---...
    |     |---n15075141
    nncs
    |---nncs
    |---demo
    |     |---imagenet_classification
    |---requirements.txt
    |---README.md
    mobilenet-v2-71dot82.onnx
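As a quick sanity check that the dataset is unpacked in the layout shown above, you can point torchvision's `ImageFolder` at the `train/` and `val/` folders; it expects exactly this class-subfolder structure. This is only a sketch and assumes a PyTorch/torchvision environment; the root path is a placeholder for wherever your ImageNet (or ImageNet-mini) copy lives:

    from torchvision import datasets, transforms

    root = "/path/to/ImageNet"  # placeholder; use your own dataset location

    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    # ImageFolder expects the n01440764/... class-subfolder layout shown above.
    train_set = datasets.ImageFolder(root + "/train", transform=tfm)
    val_set = datasets.ImageFolder(root + "/val", transform=tfm)
    print(len(train_set.classes), "classes,", len(train_set), "train images,", len(val_set), "val images")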

# 2. Quantization Training Example

  • Reference code: demo/imagenet_classification/train_imagenet_onnx.py
  • Modify the data source and ONNX model paths:

    ...
    args.train_data = "/data/yangkang/datasets/ImageNet"
    args.val_data = "/data/yangkang/datasets/ImageNet"
    ...
    model = "mobilenet-v2-71dot82.onnx"

  • Run quantization training:

    cd demo/imagenet_classification && python train_imagenet_onnx.py

  • Example log: demo/imagenet_classification/nncs_onnx_lr1e-5.log. The float model accuracy is 71.82, and the QAT fine-tuned accuracy is 71.52.
  • Export the quantized model as mobilenetv2_qat.onnx (a quick sanity check of the exported file is sketched after this list):

    python deploy.py

  • The quantization toolkit supports both quantization-aware training (QAT) and post-training quantization (PTQ).
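Before packaging the exported model into the Mini Program, it can be worth checking that it loads and produces a prediction. The sketch below is a convenience check, not part of the nncs toolkit: it assumes onnxruntime is installed and that the exported file is named mobilenetv2_qat.onnx as above, and it reads the input name and shape from the model instead of hard-coding them:

    import numpy as np
    import onnxruntime as ort

    # Load the quantized model exported by deploy.py (filename taken from the step above).
    sess = ort.InferenceSession("mobilenetv2_qat.onnx", providers=["CPUExecutionProvider"])

    inp = sess.get_inputs()[0]
    print("input:", inp.name, inp.shape, inp.type)

    # Feed one random dummy image; replace any dynamic dimension with 1.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dummy = np.random.rand(*shape).astype(np.float32)

    outputs = sess.run(None, {inp.name: dummy})
    print("output shape:", outputs[0].shape, "argmax class:", int(np.argmax(outputs[0])))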

# 3. Mini Program Demo

The quantized classification demo is based on the [floating-point classification demo](https://github.com/wechat-miniprogram/miniprogram-demo/tree/master/miniprogram/packageAPI/pages/ai/MobileNet). The difference to note is:

this.session = wx.createInferenceSession({
    model: modelPath,
    precisionLevel: 0,
    allowNPU: false,
    allowQuantize: true, // must be set to true to enable quantized inference
})

# 4. Running the Demo

Scan the QR code below and go to Interface - General AI inference capability - mobileNetInt8 to see the demo running.



When the demo runs, the camera keeps capturing frames while the real-time classification result is written back to the bottom of the page.



For the complete demo, please refer to the [official Mini Program examples on GitHub](https://github.com/wechat-miniprogram/miniprogram-demo/tree/master/miniprogram/packageAPI/pages/ai).

# 5. Enabling the Timing Test

  data: {
    predClass: "None",
    classifier: null,
    enableSpeedTest: true,  // set to true to enable the timing test
    avgTime: 110.0,
    minTime: 110.0
  },

On an iPhone 13 Pro Max, the floating-point classification demo takes about 10 ms, while the quantized classification demo takes about 5 ms.