Application of 3D range camera in virtual human-computer interfaces

In this work, we propose new ways to employ 3D-ranging systems in advanced human-computer interfaces and show that they can be used for precise hand tracking aimed at virtual keyboard or mouse applications. We first implement a Structured Light (SL) based 3D-ranging system as the testbed for acquiring depth information. The ranging system combines an off-the-shelf projector with an industrial color camera: the projector casts specially designed line patterns onto the scene, and the camera captures the light reflected from the scene and the target. Depth is recovered from the deformation of the line pattern, using the pre-calibrated projector-camera geometry. With this system we can capture dense depth maps at moderate resolution and high frame rate, which serve as the simulated input to the hand-tracking system. We then explore an approach to 3D hand matching that combines a detailed 3D hand model (skeletal + polygonal model) with a Physical-based Model Fitting technique (PMF). Virtual forces are assigned between the depth measurements and the polygonal vertices of the model. The matching problem is formulated as minimizing the distances between the measurements and the model surface under the guidance of these virtual forces. This formulation maps 3D information simply and directly onto the parameter space and yields good results under challenging conditions. To demonstrate the effectiveness of the approach, we show that we can track hand motions such as typing or grasping frame by frame. The major limitations of this method are that it requires a good initialization of the hand model and that it takes a long time to converge to a good fit. We therefore investigate more complex hand models and feature-based matching techniques. A three-level hand model (skeletal + primitive + polygonal model) is proposed for the Multi-Level Feature Matching approach (MLFM).
The matching is carried out from the bottom level to the top level of the hand representations: at each level, distinctive features extracted from both the measurements and the model are matched using the Scaled Conjugate Gradient (SCG) method. The matching result from a lower level is used as the initial state of the model parameters for matching at the next higher level. This hierarchical structure speeds up the optimization and improves the precision of the matching result. We demonstrate the approach on the same hand motions as in the previous method and compare the performance of the two methods. We show that the MLFM method is faster and more robust to hand occlusions than the PMF method. Finally, we present a more powerful approach based on the Back-Constrained Gaussian Process Latent Variable Model (BC-GPLVM), which imports prior information about a category of hand motion such as typing. BC-GPLVM provides a low-dimensional embedding of hand motion data, with a density function that assigns higher probability to gestures and motions close to normal typing motion. The model is learned from a small set of clusters drawn from a large body of captured motion data. The prior model is then combined with the previous multi-level models and features to form a unified cost function for optimization. We present tracking results to demonstrate that this method can be used for human-computer interfaces requiring precise 3D hand tracking.
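The coarse-to-fine initialization behind the multi-level matching can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this work: the "hand" is reduced to a planar two-joint finger with hypothetical link lengths, the two levels use made-up features (fingertip only, then joint plus fingertip), the target features are synthesized from known angles instead of being extracted from depth data, and plain fixed-step gradient descent with numerical gradients stands in for SCG.

```python
import math

# Hypothetical link lengths of a planar two-joint finger (illustration only).
LINK_LENGTHS = (1.0, 0.8)


def keypoints(theta):
    """Return joint and fingertip positions for joint angles theta = (t1, t2)."""
    t1, t2 = theta
    l1, l2 = LINK_LENGTHS
    joint = (l1 * math.cos(t1), l1 * math.sin(t1))
    tip = (joint[0] + l2 * math.cos(t1 + t2),
           joint[1] + l2 * math.sin(t1 + t2))
    return joint, tip


def coarse_features(theta):
    """Lower level (assumed): fingertip position only."""
    _, tip = keypoints(theta)
    return list(tip)


def fine_features(theta):
    """Higher level (assumed): joint and fingertip positions."""
    joint, tip = keypoints(theta)
    return list(joint) + list(tip)


def cost(features, theta, target):
    """Sum of squared differences between model features and target features."""
    return sum((a - b) ** 2 for a, b in zip(features(theta), target))


def minimize(features, target, theta0, step=0.1, iters=500, h=1e-6):
    """Fixed-step gradient descent with central-difference gradients.

    This is a simple stand-in for SCG, not an implementation of it.
    """
    theta = list(theta0)
    for _ in range(iters):
        grad = []
        for i in range(len(theta)):
            up, down = theta[:], theta[:]
            up[i] += h
            down[i] -= h
            grad.append((cost(features, up, target)
                         - cost(features, down, target)) / (2 * h))
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


def fit_hierarchically(true_theta, theta0=(0.0, 0.0)):
    """Match level by level; each result initializes the next level."""
    theta = list(theta0)
    for features in (coarse_features, fine_features):
        # Synthetic target; a real system would extract it from depth maps.
        target = features(true_theta)
        theta = minimize(features, target, theta)
    return theta


if __name__ == "__main__":
    estimate = fit_hierarchically((0.6, 0.5))
    print("estimated joint angles:", estimate)
```

The point of the sketch is the loop in `fit_hierarchically`: the lower level uses few features to get near the solution, and the higher level starts from that estimate and refines it with more features.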

