# VisionKit

Starting with base library version 2.20.0, Weixin Mini Program provides the ability to develop AR features through VisionKit. VisionKit bundles AR capabilities together with visual algorithms, so before developing AR features for a Mini Program we first need to understand VisionKit.

# VKSession

The core of VisionKit is VKSession, the VisionKit session object. We can create a VKSession instance with `wx.createVKSession`. The instance is a singleton per page and is strongly tied to the page lifecycle: VKSession instances on different pages do not run at the same time, which guarantees that a Weixin Mini Program has at most one active VKSession instance at any given moment. The demo below uses version v2:

const session = wx.createVKSession({
  track: {
    plane: {mode: 3},
  },
  version: 'v2', 
})

The `start` method starts the VKSession instance:

session.start(err => {
  if (err) return console.error('VK error: ', err)

  // do something
})

Next, we will build a 3D world and render it.

# Rendering

AR is meant to augment reality: in plain terms, it blends virtual objects into the real world, such as placing a virtual robot on a real tabletop.

To achieve this effect in a Weixin Mini Program, we first need to draw the real-world picture on the screen, which relies on the camera. Of course, the picture is not static, so we have to continuously upload the frames captured by the camera, much like rendering a 3D world with WebGL, frame by frame:

session.start(err => {
  if (err) return console.error('VK error: ', err)

  const onFrame = timestamp => {
    const frame = session.getVKFrame(canvas.width, canvas.height)
    if (frame) {
      renderFrame(frame)
    }

    session.requestAnimationFrame(onFrame)
  }
  session.requestAnimationFrame(onFrame)
})
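The `canvas` used in `getVKFrame` above (and later passed to three.js) is the page's WebGL canvas node. The original text does not show how it is obtained; below is a minimal sketch, assuming a `<canvas type="webgl" id="webgl">` node in the page's WXML and a full-screen layout (the id and sizing strategy are assumptions for illustration):

wx.createSelectorQuery()
  .select('#webgl')
  .node()
  .exec(res => {
    const canvas = res[0].node

    // Size the drawing buffer to the screen, in device pixels
    const info = wx.getSystemInfoSync()
    canvas.width = info.windowWidth * info.pixelRatio
    canvas.height = info.windowHeight * info.pixelRatio

    // `canvas` can now be used for getVKFrame and, later, for initializing three.js
  })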

Inside the familiar `requestAnimationFrame` callback of the VKSession instance, we can obtain a frame object through the `getVKFrame` method. The frame object contains the picture we need to put on the screen. We pass the width and height of the canvas when calling `getVKFrame` because we are going to render the frame with WebGL. Next, let's see what happens inside `renderFrame`:

function renderFrame(frame) {
  renderGL(frame)

  // do something
}

function renderGL(frame) {
  const { yTexture, uvTexture } = frame.getCameraTexture(gl, 'yuv')
  const displayTransform = frame.getDisplayTransform()

  // draw to the screen
}

Through `getCameraTexture` we can get the YUV textures of the camera image. These textures are uncropped, so we also need to get the texture adjustment matrix via `getDisplayTransform` and use it to adjust the texture when drawing to the screen. The `gl` in the code here is a WebGLRenderingContext instance.
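The Y texture carries the luminance channel and the UV texture carries the two chrominance channels. For reference, the fragment shader we write in the next section converts them back to RGB with the standard BT.601-style formulas (after shifting U and V so they are centered on 0, which is the `- 0.5` in the shader):

R = Y + 1.402 * V
G = Y - 0.344 * U - 0.714 * V
B = Y + 1.772 * U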

# WebGL & three.js

So how do we actually draw the frame to the screen? This requires some WebGL knowledge: in this demo we write our own shaders to render the camera image onto the canvas, and use three.js to render the 3D model.

First, initialize the three.js part:

import { createScopedThreejs } from 'threejs-miniprogram'
import { registerGLTFLoader } from './loaders/gltf-loader'

const THREE = createScopedThreejs(canvas)
registerGLTFLoader(THREE)

// camera
const camera = new THREE.Camera()

// scene
const scene = new THREE.Scene()

// light source
const light1 = new THREE.HemisphereLight(0xffffff, 0x444444) // hemisphere light
light1.position.set(0, 0.2, 0)
scene.add(light1)
const light2 = new THREE.DirectionalLight(0xffffff) // directional light
light2.position.set(0, 0.2, 0.1)
scene.add(light2)

// Renderer
const renderer = new THREE.WebGLRenderer({antialias: true, alpha: true})
renderer.gammaOutput = true
renderer.gammaFactor = 2.2

// Robot Models
const loader = new THREE.GLTFLoader()
let model
loader.load('https://dldir1.qq.com/weixin/miniprogram/RobotExpressive_aa2603d917384b68bb4a086f32dabe83.glb', gltf => {
  model = {
    scene: gltf.scene,
    animations: gltf.animations,
  }
})
const clock = new THREE.Clock()
let mixer // animation mixer, created in onTouchEnd when the robot model is placed

The `threejs-miniprogram` package used here is a build of three.js specially packaged to be compatible with the Weixin Mini Program environment; developers can substitute any other WebGL engine that runs in a Mini Program, three.js is only used as an example. `registerGLTFLoader` registers the loader used to load the 3D model. This is only a simple demo of three.js; interested readers can consult the official three.js documentation to learn more.

Next, initialize the WebGL part:

const gl = renderer.getContext()

// Write the shaders
const currentProgram = gl.getParameter(gl.CURRENT_PROGRAM)
const vs = `
  attribute vec2 a_position;
  attribute vec2 a_texCoord;
  uniform mat3 displayTransform;
  varying vec2 v_texCoord;
  void main() {
    vec3 p = displayTransform * vec3(a_position, 0);
    gl_Position = vec4(p, 1);
    v_texCoord = a_texCoord;
  }
`
const fs = `
  precision highp float;

  uniform sampler2D y_texture;
  uniform sampler2D uv_texture;
  varying vec2 v_texCoord;
  void main() {
    vec4 y_color = texture2D(y_texture, v_texCoord);
    vec4 uv_color = texture2D(uv_texture, v_texCoord);

    float Y, U, V;
    float R ,G, B;
    Y = y_color.r;
    U = uv_color.r - 0.5;
    V = uv_color.a - 0.5;
    
    R = Y + 1.402 * V;
    G = Y - 0.344 * U - 0.714 * V;
    B = Y + 1.772 * U;
    
    gl_FragColor = vec4(R, G, B, 1.0);
  }
`
const vertShader = gl.createShader(gl.VERTEX_SHADER)
gl.shaderSource(vertShader, vs)
gl.compileShader(vertShader)

const fragShader = gl.createShader(gl.FRAGMENT_SHADER)
gl.shaderSource(fragShader, fs)
gl.compileShader(fragShader)

const program = gl.createProgram()
gl.attachShader(program, vertShader)
gl.attachShader(program, fragShader)
gl.deleteShader(vertShader)
gl.deleteShader(fragShader)
gl.linkProgram(program)
gl.useProgram(program)

const uniformYTexture = gl.getUniformLocation(program, 'y_texture')
gl.uniform1i(uniformYTexture, 5)
const uniformUVTexture = gl.getUniformLocation(program, 'uv_texture')
gl.uniform1i(uniformUVTexture, 6)

const dt = gl.getUniformLocation(program, 'displayTransform')
gl.useProgram(currentProgram)

// Initialize the VAO
const ext = gl.getExtension('OES_vertex_array_object')
const currentVAO = gl.getParameter(gl.VERTEX_ARRAY_BINDING)
const vao = ext.createVertexArrayOES()

ext.bindVertexArrayOES(vao)

const posAttr = gl.getAttribLocation(program, 'a_position')
const pos = gl.createBuffer()
gl.bindBuffer(gl.ARRAY_BUFFER, pos)
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([1, 1, -1, 1, 1, -1, -1, -1]), gl.STATIC_DRAW)
gl.vertexAttribPointer(posAttr, 2, gl.FLOAT, false, 0, 0)
gl.enableVertexAttribArray(posAttr)
vao.posBuffer = pos

const texcoordAttr = gl.getAttribLocation(program, 'a_texCoord')
const texcoord = gl.createBuffer()
gl.bindBuffer(gl.ARRAY_BUFFER, texcoord)
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([1, 1, 0, 1, 1, 0, 0, 0]), gl.STATIC_DRAW)
gl.vertexAttribPointer(texcoordAttr, 2, gl.FLOAT, false, 0, 0)
gl.enableVertexAttribArray(texcoordAttr)
vao.texcoordBuffer = texcoord

ext.bindVertexArrayOES(currentVAO)

This part is standard WebGL knowledge and will not be covered in detail here; interested readers can look up the relevant material. With it in place, we can now complete the `renderGL` method from earlier and finish drawing the camera frame to the screen:

function renderGL(frame) {
  const gl = renderer.getContext()
  gl.disable(gl.DEPTH_TEST)
  
  // Get the camera textures and the display transform matrix
  const {yTexture, uvTexture} = frame.getCameraTexture(gl, 'yuv')
  const displayTransform = frame.getDisplayTransform()

  if (yTexture && uvTexture) {
    const currentProgram = gl.getParameter(gl.CURRENT_PROGRAM)
    const currentActiveTexture = gl.getParameter(gl.ACTIVE_TEXTURE)
    const currentVAO = gl.getParameter(gl.VERTEX_ARRAY_BINDING)

    gl.useProgram(program)
    ext.bindVertexArrayOES(vao)

    // Pass in the display transform matrix
    gl.uniformMatrix3fv(dt, false, displayTransform)
    gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1)

    // Pass in the Y-channel texture
    gl.activeTexture(gl.TEXTURE0 + 5)
    const bindingTexture5 = gl.getParameter(gl.TEXTURE_BINDING_2D)
    gl.bindTexture(gl.TEXTURE_2D, yTexture)

    // Pass in the UV-channel texture
    gl.activeTexture(gl.TEXTURE0 + 6)
    const bindingTexture6 = gl.getParameter(gl.TEXTURE_BINDING_2D)
    gl.bindTexture(gl.TEXTURE_2D, uvTexture)

    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4)

    gl.bindTexture(gl.TEXTURE_2D, bindingTexture6)
    gl.activeTexture(gl.TEXTURE0 + 5)
    gl.bindTexture(gl.TEXTURE_2D, bindingTexture5)

    gl.useProgram(currentProgram)
    gl.activeTexture(currentActiveTexture)
    ext.bindVertexArrayOES(currentVAO)
  }
}

At this point, the camera image is drawn onto the screen as the background, and on the phone it looks as if the camera has simply been turned on.

# Placing 3D models

Admittedly, what we have so far can hardly be called AR by its effect alone. Next we will implement something like this: tap the screen, and a robot model is placed at the corresponding position in the 3D world. For example, tap the table shown on the screen, and a robot model is put on that table.

This is exactly what the three.js introduced earlier is for. Popular WebGL engines nowadays encapsulate many easy-to-use interfaces, such as lighting and model loading; the previous code has already demonstrated this, so it will not be repeated here.

What we need here is the `hitTest` interface of VKSession. Its main purpose is to convert 2D coordinates into 3D world coordinates, i.e. (x, y) into (x, y, z). The screen is 2D, so when we touch it we get a 2D coordinate (x, y); `hitTest` converts it into a coordinate (x, y, z) in the 3D world coordinate system, whose origin is the position where the camera session started:

function onTouchEnd(evt) {
  const touches = evt.changedTouches.length ? evt.changedTouches : evt.touches

  // Place a robot model at the touch position
  if (touches.length === 1) {
    const touch = touches[0]
    if (session && scene && model) {
      // Call hitTest; width and height are the displayed width and height of the canvas
      const hitTestRes = session.hitTest(touch.x / width, touch.y / height)
      if (hitTestRes.length) {
        model.scene.scale.set(0.05, 0.05, 0.05)

        // Animation mixer (assigned to the module-level mixer used in renderFrame)
        mixer = new THREE.AnimationMixer(model.scene)
        for (let i = 0; i < model.animations.length; i++) {
          const clip = model.animations[i]
          if (clip.name === 'Dance') {
            const action = mixer.clipAction(clip)
            action.play()
          }
        }
        
        // Wrap the model in a container and place it at the hit position
        const cnt = new THREE.Object3D()
        cnt.add(model.scene)
        cnt.matrixAutoUpdate = false
        cnt.matrix.fromArray(hitTestRes[0].transform)
        scene.add(cnt)
      }
    }
  }
}

As you can see, `hitTest` is not given raw coordinate values, but values obtained by dividing the touch position by the width and height of the canvas. The parameters it accepts are coordinates relative to the canvas, ranging over [0, 1], where 0 is the left/top edge and 1 is the right/bottom edge. `hitTest` returns a matrix containing the position, rotation, and scale in 3D world coordinates. This matrix can be used directly with three.js, which is one of the reasons this demo chose three.js: it encapsulates many complicated implementation details and greatly simplifies the code.
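The matrix in `hitTestRes[0].transform` is applied directly above via `fromArray`, but if you need the position, rotation, or scale separately, three.js can decompose it. A small sketch, assuming the `hitTestRes` from the code above:

const matrix = new THREE.Matrix4().fromArray(hitTestRes[0].transform)
const position = new THREE.Vector3()
const quaternion = new THREE.Quaternion()
const scale = new THREE.Vector3()
matrix.decompose(position, quaternion, scale)
// position now holds the hit point in 3D world coordinates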

Then we call the three.js rendering interfaces to draw the robot model onto the screen. Here we continue to refine the earlier `renderFrame` method:

function renderFrame(frame) {
  renderGL(frame)

  const frameCamera = frame.camera

  // Update the animation (only after a model has been placed)
  const delta = clock.getDelta()
  if (mixer) mixer.update(delta)

  // Camera
  if (camera) {
    camera.matrixAutoUpdate = false
    camera.matrixWorldInverse.fromArray(frameCamera.viewMatrix)
    camera.matrixWorld.getInverse(camera.matrixWorldInverse)

    const projectionMatrix = frameCamera.getProjectionMatrix(NEAR, FAR)
    camera.projectionMatrix.fromArray(projectionMatrix)
    camera.projectionMatrixInverse.getInverse(camera.projectionMatrix)
  }

  renderer.autoClearColor = false
  renderer.render(scene, camera)
  renderer.state.setCullFace(THREE.CullFaceNone)
}

Here we get the view matrix from the frame object's `camera.viewMatrix` and the projection matrix from its `getProjectionMatrix` method, and pass both to the three.js camera object. This ensures that the three.js camera has the correct position and orientation, so that the 3D world is rendered the way we would see it with our own eyes.
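Note that `NEAR` and `FAR` passed to `getProjectionMatrix` are the near and far clipping planes of the projection. The original text does not define them, so the values below are only an illustrative assumption:

// Near/far clipping planes passed to getProjectionMatrix (illustrative values)
const NEAR = 0.001
const FAR = 1000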

At this point, the effect of tapping the screen and placing a robot model at the corresponding position in the 3D world is complete.

# Plane Detection

Now that we understand how to implement an AR feature in a Weixin Mini Program, we may want to extend it to more scenarios, for example detecting planes in the 3D world.

Each plane that VisionKit recognizes is provided to us as an anchor object. VKSession provides three convenient events for this: `addAnchors`, `updateAnchors`, and `removeAnchors`. Through these events we can listen for changes to the anchor list:

session.on('addAnchors', anchors => {
  // anchor.id - unique identifier of the anchor
  // anchor.type - anchor type; 0 means a plane anchor
  // anchor.transform - matrix containing position, rotation, and scale, in column-major order
  // anchor.size - size of the plane
  // anchor.alignment - orientation of the plane

  // do something
})
session.on('updateAnchors', anchors => {
  // do something
})
session.on('removeAnchors', anchors => {
  // do something
})
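As an example of what the `do something` parts might look like, here is a minimal sketch (an illustration, not part of the original demo) that visualizes each detected plane as a small semi-transparent square. It reuses the `THREE` and `scene` objects from earlier and assumes, as in ARKit/ARCore, that a plane anchor's surface lies in its local XZ plane:

const planeMeshes = new Map()

session.on('addAnchors', anchors => {
  anchors.forEach(anchor => {
    if (anchor.type !== 0) return // only handle plane anchors

    // PlaneGeometry lies in the XY plane by default; rotate it so it lies flat
    const geometry = new THREE.PlaneGeometry(0.2, 0.2)
    geometry.rotateX(-Math.PI / 2)
    const material = new THREE.MeshBasicMaterial({
      color: 0x00ff00,
      transparent: true,
      opacity: 0.3,
      side: THREE.DoubleSide,
    })
    const mesh = new THREE.Mesh(geometry, material)

    // Place the mesh using the anchor's transform matrix
    mesh.matrixAutoUpdate = false
    mesh.matrix.fromArray(anchor.transform)
    scene.add(mesh)
    planeMeshes.set(anchor.id, mesh)
  })
})

session.on('updateAnchors', anchors => {
  anchors.forEach(anchor => {
    const mesh = planeMeshes.get(anchor.id)
    if (mesh) mesh.matrix.fromArray(anchor.transform)
  })
})

session.on('removeAnchors', anchors => {
  anchors.forEach(anchor => {
    const mesh = planeMeshes.get(anchor.id)
    if (mesh) {
      scene.remove(mesh)
      planeMeshes.delete(anchor.id)
    }
  })
})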