Abstract

Vision-based hand pose estimation is important in human-computer interaction. While many recent works focus on full degree-of-freedom hand pose estimation, robust estimation of the global hand pose remains a challenging problem. This paper presents a novel algorithm that optimizes the leaf weights in a Hough forest to assist global hand pose estimation with a single depth camera. Unlike the traditional Hough forest, we propose to learn the vote weights stored at the leaf nodes of a forest in a principled way to minimize the average pose prediction error, so that ambiguous votes are largely suppressed during prediction fusion. Experiments show that the optimized leaf weights substantially improve pose estimation accuracy on both synthetic and real datasets, and that the proposed method performs favorably compared to state-of-the-art convolutional neural network-based methods. On real-world depth videos, the proposed method demonstrates improved robustness compared to several other recent hand tracking systems from both industry and academia. Moreover, we use the proposed method to build virtual/augmented reality applications that allow users to manipulate and examine virtual objects with bare hands.

Details