# VisionKit

Starting with base library version 2.20.0, Weixin Mini Program provides the ability to develop AR features through VisionKit. VisionKit bundles AR capabilities together with visual algorithms, so before developing AR features for a Mini Program we first need to understand VisionKit.

# VKSession

The core of VisionKit is VKSession, the VisionKit session object. We can create a VKSession instance with `wx.createVKSession`. The instance is a singleton per page and is strongly tied to the page lifecycle: VKSession instances on different pages do not run at the same time, which guarantees that a Weixin Mini Program has at most one active VKSession instance at any given moment. The demo below uses version v2:

const session = wx.createVKSession({
  track: {
    plane: {mode: 3},
  },
  version: 'v2', 
})

The `start` method starts the VKSession instance:

session.start(err => {
  if (err) return console.error('VK error: ', err)

  // do something
})

Next, we will build a 3D world and render it.

# Rendering

AR is meant to augment reality: in plain terms, it blends virtual objects into the real world, such as placing a virtual robot on a real tabletop.

To achieve this effect in a Weixin Mini Program, we first need to draw the real-world picture on the screen, which relies on the camera. Of course, the picture is not static, so we have to continuously upload the frames captured by the camera, much like rendering a 3D world with WebGL, frame by frame:

session.start(err => {
  if (err) return console.error('VK error: ', err)

  const onFrame = timestamp => {
    const frame = session.getVKFrame(canvas.width, canvas.height)
    if (frame) {
      renderFrame(frame)
    }

    session.requestAnimationFrame(onFrame)
  }
  session.requestAnimationFrame(onFrame)
})
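The `canvas` used in `getVKFrame` above (and later passed to three.js) is the page's WebGL canvas node. The original text does not show how it is obtained; below is a minimal sketch, assuming a `<canvas type="webgl" id="webgl">` node in the page's WXML and a full-screen layout (the id and sizing strategy are assumptions for illustration):

wx.createSelectorQuery()
  .select('#webgl')
  .node()
  .exec(res => {
    const canvas = res[0].node

    // Size the drawing buffer to the screen, in device pixels
    const info = wx.getSystemInfoSync()
    canvas.width = info.windowWidth * info.pixelRatio
    canvas.height = info.windowHeight * info.pixelRatio

    // `canvas` can now be used for getVKFrame and, later, for initializing three.js
  })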

Inside the familiar `requestAnimationFrame` callback of the VKSession instance, we can obtain a frame object through the `getVKFrame` method. The frame object contains the picture we need to put on the screen. We pass the width and height of the canvas when calling `getVKFrame` because we are going to render the frame with WebGL. Next, let's see what happens inside `renderFrame`:

function renderFrame(frame) {
  renderGL(frame)

  // do something
}

function renderGL(frame) {
  const { yTexture, uvTexture } = frame.getCameraTexture(gl, 'yuv')
  const displayTransform = frame.getDisplayTransform()

  // draw to the screen
}

Through `getCameraTexture` we can get the YUV textures of the camera image. These textures are uncropped, so we also need to get the texture adjustment matrix via `getDisplayTransform` and use it to adjust the texture when drawing to the screen. The `gl` in the code here is a WebGLRenderingContext instance.
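The Y texture carries the luminance channel and the UV texture carries the two chrominance channels. For reference, the fragment shader we write in the next section converts them back to RGB with the standard BT.601-style formulas (after shifting U and V so they are centered on 0, which is the `- 0.5` in the shader):

R = Y + 1.402 * V
G = Y - 0.344 * U - 0.714 * V
B = Y + 1.772 * U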

# WebGL & three.js

So how do we actually draw the frame to the screen? This requires some WebGL knowledge: in this demo we write our own shaders to render the camera image onto the canvas, and use three.js to render the 3D model.

First, initialize the three.js part:

import { createScopedThreejs } from 'threejs-miniprogram'
import { registerGLTFLoader } from './loaders/gltf-loader'

const THREE = createScopedThreejs(canvas)
registerGLTFLoader(THREE)

// camera
const camera = new THREE.Camera()

// scene
const scene = new THREE.Scene()

// light source
const light1 = new THREE.HemisphereLight(0xffffff, 0x444444) // hemisphere light
light1.position.set(0, 0.2, 0)
scene.add(light1)
const light2 = new THREE.DirectionalLight(0xffffff) // directional light
light2.position.set(0, 0.2, 0.1)
scene.add(light2)

// Renderer
const renderer = new THREE.WebGLRenderer({antialias: true, alpha: true})
renderer.gammaOutput = true
renderer.gammaFactor = 2.2

// Robot Models
const loader = new THREE.GLTFLoader()
let model
loader.load('https://dldir1.qq.com/weixin/miniprogram/RobotExpressive_aa2603d917384b68bb4a086f32dabe83.glb', gltf => {
  model = {
    scene: gltf.scene,
    animations: gltf.animations,
  }
})
const clock = new THREE.Clock()
let mixer // animation mixer, created in onTouchEnd when the robot model is placed

The `threejs-miniprogram` package used here is a build of three.js specially packaged to be compatible with the Weixin Mini Program environment; developers can substitute any other WebGL engine that runs in a Mini Program, three.js is only used as an example. `registerGLTFLoader` registers the loader used to load the 3D model. This is only a simple demo of three.js; interested readers can consult the official three.js documentation to learn more.

Next, initialize the WebGL part:

const gl = renderer.getContext()

// Write the shaders
const currentProgram = gl.getParameter(gl.CURRENT_PROGRAM)
const vs = `
  attribute vec2 a_position;
  attribute vec2 a_texCoord;
  uniform mat3 displayTransform;
  varying vec2 v_texCoord;
  void main() {
    vec3 p = displayTransform * vec3(a_position, 0);
    gl_Position = vec4(p, 1);
    v_texCoord = a_texCoord;
  }
`
const fs = `
  precision highp float;

  uniform sampler2D y_texture;
  uniform sampler2D uv_texture;
  varying vec2 v_texCoord;
  void main() {
    vec4 y_color = texture2D(y_texture, v_texCoord);
    vec4 uv_color = texture2D(uv_texture, v_texCoord);

    float Y, U, V;
    float R ,G, B;
    Y = y_color.r;
    U = uv_color.r - 0.5;
    V = uv_color.a - 0.5;
    
    R = Y + 1.402 * V;
    G = Y - 0.344 * U - 0.714 * V;
    B = Y + 1.772 * U;
    
    gl_FragColor = vec4(R, G, B, 1.0);
  }
`
const vertShader = gl.createShader(gl.VERTEX_SHADER)
gl.shaderSource(vertShader, vs)
gl.compileShader(vertShader)

const fragShader = gl.createShader(gl.FRAGMENT_SHADER)
gl.shaderSource(fragShader, fs)
gl.compileShader(fragShader)

const program = gl.createProgram()
gl.attachShader(program, vertShader)
gl.attachShader(program, fragShader)
gl.deleteShader(vertShader)
gl.deleteShader(fragShader)
gl.linkProgram(program)
gl.useProgram(program)

const uniformYTexture = gl.getUniformLocation(program, 'y_texture')
gl.uniform1i(uniformYTexture, 5)
const uniformUVTexture = gl.getUniformLocation(program, 'uv_texture')
gl.uniform1i(uniformUVTexture, 6)

const dt = gl.getUniformLocation(program, 'displayTransform')
gl.useProgram(currentProgram)

// Initialize the VAO
const ext = gl.getExtension('OES_vertex_array_object')
const currentVAO = gl.getParameter(gl.VERTEX_ARRAY_BINDING)
const vao = ext.createVertexArrayOES()

ext.bindVertexArrayOES(vao)

const posAttr = gl.getAttribLocation(program, 'a_position')
const pos = gl.createBuffer()
gl.bindBuffer(gl.ARRAY_BUFFER, pos)
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([1, 1, -1, 1, 1, -1, -1, -1]), gl.STATIC_DRAW)
gl.vertexAttribPointer(posAttr, 2, gl.FLOAT, false, 0, 0)
gl.enableVertexAttribArray(posAttr)
vao.posBuffer = pos

const texcoordAttr = gl.getAttribLocation(program, 'a_texCoord')
const texcoord = gl.createBuffer()
gl.bindBuffer(gl.ARRAY_BUFFER, texcoord)
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([1, 1, 0, 1, 1, 0, 0, 0]), gl.STATIC_DRAW)
gl.vertexAttribPointer(texcoordAttr, 2, gl.FLOAT, false, 0, 0)
gl.enableVertexAttribArray(texcoordAttr)
vao.texcoordBuffer = texcoord

ext.bindVertexArrayOES(currentVAO)

This part is standard WebGL knowledge and will not be covered in detail here; interested readers can look up the relevant material. With it in place, we can now complete the `renderGL` method from earlier and finish drawing the camera frame to the screen:

function renderGL(frame) {
  const gl = renderer.getContext()
  gl.disable(gl.DEPTH_TEST)
  
  // Get the camera textures and the display transform matrix
  const {yTexture, uvTexture} = frame.getCameraTexture(gl, 'yuv')
  const displayTransform = frame.getDisplayTransform()

  if (yTexture && uvTexture) {
    const currentProgram = gl.getParameter(gl.CURRENT_PROGRAM)
    const currentActiveTexture = gl.getParameter(gl.ACTIVE_TEXTURE)
    const currentVAO = gl.getParameter(gl.VERTEX_ARRAY_BINDING)

    gl.useProgram(program)
    ext.bindVertexArrayOES(vao)

    // Pass in the display transform matrix
    gl.uniformMatrix3fv(dt, false, displayTransform)
    gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1)

    // Pass in the Y-channel texture
    gl.activeTexture(gl.TEXTURE0 + 5)
    const bindingTexture5 = gl.getParameter(gl.TEXTURE_BINDING_2D)
    gl.bindTexture(gl.TEXTURE_2D, yTexture)

    // Pass in the UV-channel texture
    gl.activeTexture(gl.TEXTURE0 + 6)
    const bindingTexture6 = gl.getParameter(gl.TEXTURE_BINDING_2D)
    gl.bindTexture(gl.TEXTURE_2D, uvTexture)

    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4)

    gl.bindTexture(gl.TEXTURE_2D, bindingTexture6)
    gl.activeTexture(gl.TEXTURE0 + 5)
    gl.bindTexture(gl.TEXTURE_2D, bindingTexture5)

    gl.useProgram(currentProgram)
    gl.activeTexture(currentActiveTexture)
    ext.bindVertexArrayOES(currentVAO)
  }
}

At this point, the camera image is drawn onto the screen as the background, and on the phone it looks as if the camera has simply been turned on.

# Placing 3D models

Admittedly, what we have so far can hardly be called AR by its effect alone. Next we will implement something like this: tap the screen, and a robot model is placed at the corresponding position in the 3D world. For example, tap the table shown on the screen, and a robot model is put on that table.

This is exactly what the three.js introduced earlier is for. Popular WebGL engines nowadays encapsulate many easy-to-use interfaces, such as lighting and model loading; the previous code has already demonstrated this, so it will not be repeated here.

What we need here is the `hitTest` interface of VKSession. Its main purpose is to convert 2D coordinates into 3D world coordinates, i.e. (x, y) into (x, y, z). The screen is 2D, so when we touch it we get a 2D coordinate (x, y); `hitTest` converts it into a coordinate (x, y, z) in the 3D world coordinate system, whose origin is the position where the camera session started:

function onTouchEnd(evt) {
  const touches = evt.changedTouches.length ? evt.changedTouches : evt.touches

  // Place a robot model at the touch position
  if (touches.length === 1) {
    const touch = touches[0]
    if (session && scene && model) {
      // Call hitTest; width and height are the displayed width and height of the canvas
      const hitTestRes = session.hitTest(touch.x / width, touch.y / height)
      if (hitTestRes.length) {
        model.scene.scale.set(0.05, 0.05, 0.05)

        // Animation mixer (assigned to the module-level mixer used in renderFrame)
        mixer = new THREE.AnimationMixer(model.scene)
        for (let i = 0; i < model.animations.length; i++) {
          const clip = model.animations[i]
          if (clip.name === 'Dance') {
            const action = mixer.clipAction(clip)
            action.play()
          }
        }
        
        // Wrap the model in a container and place it at the hit position
        const cnt = new THREE.Object3D()
        cnt.add(model.scene)
        cnt.matrixAutoUpdate = false
        cnt.matrix.fromArray(hitTestRes[0].transform)
        scene.add(cnt)
      }
    }
  }
}

As you can see, `hitTest` is not given raw coordinate values, but values obtained by dividing the touch position by the width and height of the canvas. The parameters it accepts are coordinates relative to the canvas, ranging over [0, 1], where 0 is the left/top edge and 1 is the right/bottom edge. `hitTest` returns a matrix containing the position, rotation, and scale in 3D world coordinates. This matrix can be used directly with three.js, which is one of the reasons this demo chose three.js: it encapsulates many complicated implementation details and greatly simplifies the code.
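The matrix in `hitTestRes[0].transform` is applied directly above via `fromArray`, but if you need the position, rotation, or scale separately, three.js can decompose it. A small sketch, assuming the `hitTestRes` from the code above:

const matrix = new THREE.Matrix4().fromArray(hitTestRes[0].transform)
const position = new THREE.Vector3()
const quaternion = new THREE.Quaternion()
const scale = new THREE.Vector3()
matrix.decompose(position, quaternion, scale)
// position now holds the hit point in 3D world coordinates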

Then we call the three.js rendering interfaces to draw the robot model onto the screen. Here we continue to refine the earlier `renderFrame` method:

function renderFrame(frame) {
  renderGL(frame)

  const frameCamera = frame.camera

  // Update the animation (only after a model has been placed)
  const delta = clock.getDelta()
  if (mixer) mixer.update(delta)

  // Camera
  if (camera) {
    camera.matrixAutoUpdate = false
    camera.matrixWorldInverse.fromArray(frameCamera.viewMatrix)
    camera.matrixWorld.getInverse(camera.matrixWorldInverse)

    const projectionMatrix = frameCamera.getProjectionMatrix(NEAR, FAR)
    camera.projectionMatrix.fromArray(projectionMatrix)
    camera.projectionMatrixInverse.getInverse(camera.projectionMatrix)
  }

  renderer.autoClearColor = false
  renderer.render(scene, camera)
  renderer.state.setCullFace(THREE.CullFaceNone)
}

Here we get the view matrix from the frame object's `camera.viewMatrix` and the projection matrix from its `getProjectionMatrix` method, and pass both to the three.js camera object. This ensures that the three.js camera has the correct position and orientation, so that the 3D world is rendered the way we would see it with our own eyes.
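Note that `NEAR` and `FAR` passed to `getProjectionMatrix` are the near and far clipping planes of the projection. The original text does not define them, so the values below are only an illustrative assumption:

// Near/far clipping planes passed to getProjectionMatrix (illustrative values)
const NEAR = 0.001
const FAR = 1000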

At this point, the effect of tapping the screen and placing a robot model at the corresponding position in the 3D world is complete.

# Plane Detection

Now that we understand how to implement an AR feature in a Weixin Mini Program, we may want to extend it to more scenarios, for example detecting planes in the 3D world.

Each plane that VisionKit recognizes is provided to us as an anchor object. VKSession provides three convenient events for this: `addAnchors`, `updateAnchors`, and `removeAnchors`. Through these events we can listen for changes to the anchor list:

session.on('addAnchors', anchors => {
  // anchor.id - unique identifier of the anchor
  // anchor.type - anchor type; 0 means a plane anchor
  // anchor.transform - matrix containing position, rotation, and scale, in column-major order
  // anchor.size - size of the plane
  // anchor.alignment - orientation of the plane

  // do something
})
session.on('updateAnchors', anchors => {
  // do something
})
session.on('removeAnchors', anchors => {
  // do something
})
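As an example of what the `do something` parts might look like, here is a minimal sketch (an illustration, not part of the original demo) that visualizes each detected plane as a small semi-transparent square. It reuses the `THREE` and `scene` objects from earlier and assumes, as in ARKit/ARCore, that a plane anchor's surface lies in its local XZ plane:

const planeMeshes = new Map()

session.on('addAnchors', anchors => {
  anchors.forEach(anchor => {
    if (anchor.type !== 0) return // only handle plane anchors

    // PlaneGeometry lies in the XY plane by default; rotate it so it lies flat
    const geometry = new THREE.PlaneGeometry(0.2, 0.2)
    geometry.rotateX(-Math.PI / 2)
    const material = new THREE.MeshBasicMaterial({
      color: 0x00ff00,
      transparent: true,
      opacity: 0.3,
      side: THREE.DoubleSide,
    })
    const mesh = new THREE.Mesh(geometry, material)

    // Place the mesh using the anchor's transform matrix
    mesh.matrixAutoUpdate = false
    mesh.matrix.fromArray(anchor.transform)
    scene.add(mesh)
    planeMeshes.set(anchor.id, mesh)
  })
})

session.on('updateAnchors', anchors => {
  anchors.forEach(anchor => {
    const mesh = planeMeshes.get(anchor.id)
    if (mesh) mesh.matrix.fromArray(anchor.transform)
  })
})

session.on('removeAnchors', anchors => {
  anchors.forEach(anchor => {
    const mesh = planeMeshes.get(anchor.id)
    if (mesh) {
      scene.remove(mesh)
      planeMeshes.delete(anchor.id)
    }
  })
})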