Robust Eye Tracking Based on Adaptive Fusion of Multiple Cameras

Eye and gaze movements play an essential role in identifying individuals' emotional states, cognitive activities, interests, and attention, among other behavioral traits. Moreover, they are natural, fast, and implicitly reveal the targets of interest, which makes them a highly valuable input modality in human-computer interfaces. Tracking gaze movements, in other words eye tracking, is therefore of great interest to a large number of disciplines, including human behavior research, neuroscience, medicine, and human-computer interaction. Tracking gaze movements accurately is a challenging task, especially under unconstrained conditions. Over the last two decades, significant advances have been made in improving gaze estimation accuracy; however, these improvements have been achieved mostly under controlled settings. Meanwhile, several concerns have arisen, such as the complexity, inflexibility, and cost of the setups, increased user effort, and high sensitivity to varying real-world conditions. Despite various attempts and promising enhancements, existing eye tracking systems are still inadequate to overcome most of these concerns, which prevents them from being widely used. In this thesis, we revisit these concerns and introduce a novel multi-camera eye tracking framework. The proposed framework achieves high estimation accuracy while requiring minimal user effort and a non-intrusive, flexible setup. In addition, it provides improved robustness to large head movements, illumination changes, use of eye wear, and eye type variations across users. We develop a novel real-time gaze estimation framework based on the adaptive fusion of multiple single-camera systems, in which gaze estimation relies on projective geometry. Furthermore, to ease the user calibration procedure, we investigate several methods to model the subject-specific estimation bias and, consequently, propose a novel approach based on weighted regularized least squares regression.
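To illustrate the kind of calibration model this refers to, the following is a minimal sketch of weighted regularized (ridge) least squares regression in closed form. The linear feature expansion, sample weights, and toy calibration data are assumptions for illustration only, not the exact formulation developed in the thesis.

```python
import numpy as np

def weighted_ridge_fit(X, y, sample_weights, lam=1e-3):
    """Closed-form weighted regularized least squares:
    beta = (X^T W X + lam*I)^{-1} X^T W y."""
    W = np.diag(sample_weights)
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ W @ y)

def features(raw):
    """Hypothetical feature expansion: bias term plus raw gaze coordinates."""
    u, v = raw[:, 0], raw[:, 1]
    return np.column_stack([np.ones_like(u), u, v])

# Toy calibration set: raw gaze estimates vs. true on-screen targets,
# with a constant subject-specific bias of 0.05 in both axes.
raw = np.array([[0.1, 0.2], [0.5, 0.6], [0.9, 0.3], [0.4, 0.8]])
targets = raw + 0.05
weights = np.ones(len(raw))  # could down-weight unreliable calibration samples

beta = weighted_ridge_fit(features(raw), targets, weights, lam=1e-6)
corrected = features(raw) @ beta  # bias-corrected gaze estimates
```

The regularization term keeps the fit stable when only a few calibration points are available, while the per-sample weights allow noisy calibration samples to contribute less to the model.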
The proposed method provides better calibration modeling than state-of-the-art methods, particularly when using low-resolution and limited calibration data. Being able to operate on low-resolution data also enables a large field-of-view setup, so that large head movements are allowed. To address the aforementioned robustness concerns, we propose to leverage multiple eye appearances simultaneously acquired from various views. Compared with the conventional single-view approach, the main benefit of our approach is that gaze features are detected more reliably under challenging conditions, especially when they are obstructed by large head poses or movements, or by eyeglasses. We further propose an adaptive fusion mechanism to effectively combine the gaze outputs obtained from the multi-view appearances. To this end, our mechanism first determines the estimation reliability of each gaze output and then performs a reliability-based weighted fusion to compute the overall point of regard. In addition, to improve robustness to illumination changes and eye type variations, the setup is built upon active illumination, and robust feature detection methods are developed. The proposed framework and methods are validated through extensive simulations and user experiments featuring 20 subjects. The results demonstrate that our framework provides not only a significant improvement in gaze estimation accuracy but also notable robustness to real-world conditions, making it suitable for a large spectrum of applications.
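The reliability-based weighted fusion described above can be sketched as follows. The reliability scores, camera count, and sample values are hypothetical; how the thesis actually computes per-camera reliability is not shown here.

```python
import numpy as np

def fuse_gaze(estimates, reliabilities):
    """Reliability-weighted fusion of per-camera points of regard.

    estimates:     (n_cameras, 2) array of gaze points on the screen
    reliabilities: (n_cameras,) non-negative reliability scores
    Returns the overall point of regard as a weighted average.
    """
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one
    return w @ np.asarray(estimates, dtype=float)

# Hypothetical example: three cameras observe the same fixation; the third
# view is occluded (e.g. by glasses glare), so its reliability score is low
# and its outlying estimate contributes little to the fused result.
est = np.array([[400.0, 300.0], [410.0, 305.0], [900.0, 50.0]])
rel = np.array([0.9, 0.8, 0.05])
fused = fuse_gaze(est, rel)
```

Because the weights are normalized, a camera with a near-zero reliability score is effectively excluded without any hard thresholding, which is what makes the fusion adaptive to per-frame conditions.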
