Pose Tracking

PoseProcessor estimates 3D pose keypoints. It locates the person/pose region of interest (ROI) and predicts the pose keypoints, providing smooth, stable, and accurate pose estimation. Keypoints are provided in two forms.

2D pixel pose keypoints are points in the screen coordinate space. X and Y are normalized screen coordinates (scaled by the width and height of the input image). Z is depth within orthographic projection space: it has the same scale as the X coordinate (normalized by image width), and 0 is at the center of the hips. These points can be used for 2D pose overlays or when using orthographic projection. The Z estimate is not very accurate, so we recommend using only XY for 2D effects.

3D metric points are points within the 3D space of a perspective camera located at the space origin and pointed in the negative direction of the Z-axis. These points can be used for 3D avatar overlays or virtual try-on. Rigged and skinned models can be rendered on top of the pose by aligning skeleton/armature joints with the 3D keypoints.

3D and 2D points are perfectly aligned: projections of the 3D points coincide with the 2D pixel coordinates within the perspective camera.
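The two coordinate conventions can be illustrated with a minimal sketch. The Point3 shape, the toPixels and projectToNormalized helpers, the vertical field of view parameter, and the top-left screen origin are all assumptions made for illustration; they are not part of the SDK, whose actual result types and camera parameters may differ.

```typescript
// Hypothetical point shape for illustration; the SDK's result types may differ.
interface Point3 { x: number; y: number; z: number; }

// Convert a normalized 2D keypoint to pixels. X and Z scale by image width,
// Y scales by image height, as described in the text.
function toPixels(p: Point3, width: number, height: number): Point3 {
    return { x: p.x * width, y: p.y * height, z: p.z * width };
}

// Project a 3D metric point through a perspective camera located at the
// origin and looking down the negative Z-axis. fovY (vertical field of view
// in radians) is an assumed parameter. The result is in normalized screen
// coordinates, assuming a top-left origin with Y pointing down.
function projectToNormalized(
    p: Point3, fovY: number, aspect: number): { x: number; y: number } {
    const t = Math.tan(fovY / 2);
    const depth = -p.z; // positive distance in front of the camera
    const ndcX = p.x / (depth * t * aspect);
    const ndcY = p.y / (depth * t);
    return { x: 0.5 + 0.5 * ndcX, y: 0.5 - 0.5 * ndcY };
}

// A keypoint at the horizontal center, a quarter of the way down the frame
const hip = toPixels({ x: 0.5, y: 0.25, z: 0 }, 1920, 1080);
console.log(hip.x, hip.y); // 960 270

// A point on the optical axis projects to the center of the screen
const center = projectToNormalized({ x: 0, y: 0, z: -2 }, Math.PI / 3, 16 / 9);
console.log(center.x, center.y); // 0.5 0.5
```

If the alignment described above holds, projecting a 3D metric point with the camera parameters actually used by the renderer should land on the corresponding 2D keypoint.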

PoseProcessor can also estimate an accurate and stable segmentation mask. The segmentation mask is a monochrome image in which every pixel has a value in the range [0..1] denoting the probability that it belongs to the foreground. The mask is provided for a normalized rect region of the original image; it has a fixed size in pixels and should be scaled to image space. Optional temporal smoothing of the segmentation mask may be enabled. The estimated mask may be used for background substitution, effects like bokeh or focal blur, advanced occluder materials utilizing a mask, regional patchers, and other foreground/background shader effects.
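Scaling the mask to image space can be sketched as follows. The Mask and NormalizedRect shapes, the row-major Float32Array storage, and the sampleMask and blend helpers are hypothetical; the SDK may expose the mask differently (for example as a texture), and a real application would typically do this in a shader.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface NormalizedRect { x: number; y: number; width: number; height: number; }
interface Mask {
    data: Float32Array;   // row-major foreground probabilities in [0..1]
    width: number;        // fixed mask size in pixels
    height: number;
    rect: NormalizedRect; // region of the original image covered by the mask
}

// Look up the foreground probability for an image pixel (nearest neighbor).
function sampleMask(mask: Mask, px: number, py: number,
                    imageWidth: number, imageHeight: number): number {
    // Normalized position of the pixel center within the image
    const u = (px + 0.5) / imageWidth;
    const v = (py + 0.5) / imageHeight;
    // Position relative to the rect covered by the mask
    const mu = (u - mask.rect.x) / mask.rect.width;
    const mv = (v - mask.rect.y) / mask.rect.height;
    // Pixels outside the rect are treated as background
    if (mu < 0 || mu >= 1 || mv < 0 || mv >= 1)
        return 0;
    const mx = Math.floor(mu * mask.width);
    const my = Math.floor(mv * mask.height);
    return mask.data[my * mask.width + mx] ?? 0;
}

// Background substitution for one color channel: alpha is the mask value
function blend(foreground: number, background: number, alpha: number): number {
    return alpha * foreground + (1 - alpha) * background;
}

// 2x2 mask covering the central half of an 8x8 image
const mask: Mask = {
    data: new Float32Array([0, 1, 0.5, 0.25]),
    width: 2, height: 2,
    rect: { x: 0.25, y: 0.25, width: 0.5, height: 0.5 },
};
console.log(sampleMask(mask, 5, 2, 8, 8)); // 1
console.log(blend(255, 0, sampleMask(mask, 2, 5, 8, 8))); // 127.5
```

Nearest-neighbor lookup is used here only to keep the sketch short; bilinear filtering gives smoother mask edges.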

PoseProcessor emits PoseResult objects storing the results of pose tracking, which are passed to the Renderer. PoseEngine is a straightforward specialization of Engine for PoseProcessor.

Below is a simple application utilizing PoseProcessor. It adds runtime switching between the front and rear cameras. Note how the button is disabled during the switch to prevent concurrent pipeline state changes.

import { PoseEngine } from "@geenee/bodyprocessors";
import { CustomRenderer } from "./customrenderer";
import "./index.css";

let rear = false;
const engine = new PoseEngine();
const token = location.hostname === "localhost" ?
    "localhost_sdk_token" : "prod.url_sdk_token";

async function main() {
    const container = document.getElementById("root");
    if (!container)
        return;
    const renderer = new CustomRenderer(
        container, "crop", !rear, "model.glb");

    const cameraSwitch = document.getElementById(
        "camera-switch") as HTMLButtonElement | null;
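    if (cameraSwitch) {
        cameraSwitch.onclick = async () => {
            // Disable the button so a second click cannot change
            // the pipeline state while a switch is in progress
            cameraSwitch.disabled = true;
            try {
                rear = !rear;
                await engine.setup({ size: { width: 1920, height: 1080 }, rear });
                await engine.start();
                renderer.setMirror(!rear);
            } finally {
                // Re-enable the button even if setup or start fails
                cameraSwitch.disabled = false;
            }
        };
    }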

    await Promise.all([
        engine.addRenderer(renderer),
        engine.init({ token: token })]);
    await engine.setup({ size: { width: 1920, height: 1080 }, rear });
    await engine.start();
}
main();
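
The button-disabling pattern in the example can also be expressed as a small reusable guard that ignores calls while a previous one is still running. This is a sketch only: the exclusive helper is hypothetical and not part of the SDK.

```typescript
// Wrap an async task so that overlapping invocations are ignored.
// Returns undefined when a call is skipped because one is in flight.
function exclusive<T>(task: () => Promise<T>): () => Promise<T | undefined> {
    let busy = false;
    return async () => {
        if (busy)
            return undefined;
        busy = true;
        try {
            return await task();
        } finally {
            // Release the guard even if the task throws
            busy = false;
        }
    };
}

// Usage: a stand-in for the camera switch; only the first of two
// back-to-back calls actually runs the task.
let switches = 0;
const switchCamera = exclusive(async () => {
    switches += 1;
    await Promise.resolve(); // placeholder for engine.setup() and engine.start()
});
void switchCamera();
void switchCamera();
console.log(switches); // 1
```

Disabling the button additionally gives the user visual feedback, so in a UI the two approaches complement each other.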

Documentation of the following packages provides details on how to build more extensive applications with custom logic: