Sequential Topic Models for Mining Recurrent Activities and their Relationships: Application to long term video recordings

In this thesis, we address the analysis of activities from long term data logs with an emphasis on video recordings. Starting from simple words from video, we progressively build methods to infer higher level scene semantics. The main strategies used to achieve this are: the use of simple low-level visual features that can be readily extracted, and of probabilistic topic models that come with powerful learning and inference tools. In the initial part of the thesis, we investigate the use of a simple topic model called Probabilistic Latent Semantic Analysis (PLSA) for video scene analysis. By quantizing location, optical flow direction and foreground blob size into words, and considering short video clips as documents, we discover topics from PLSA that represent recurrent activities in the scene. We then demonstrate how the topics can be used to analyze the scene activities, segment the scene into homogeneous activity regions and detect abnormalities. The topics from PLSA have no temporal structure and hence do not represent activities well. To address this issue, we develop a novel sequential topic model called Probabilistic Latent Sequential Motifs (PLSM) which automatically discovers sequential patterns called motifs that include temporal information from videos. To address the problem of observations caused by multiple activities in the scene, the PLSM formulation uses explicit random variables to represent time at different levels: at a higher level to determine when a motif starts in the video, and at a lower level to know the order of words within the motif. Using a sparsity constraint on the event start times, and MAP priors on the temporal axis of the motifs, we designed an inference algorithm. When applied to surveillance videos, the model captures motifs that resemble trajectories. The model provides more information than PLSA, giving clues about when and where an activity starts, when it ends and how it is executed. As in many unsupervised topic models, deciding the most appropriate number of topics is a difficult problem. To address this, we reformulate PLSM using principles of Bayesian non-parametrics. The new method called Hierarchical Dirichlet Latent Sequential Motifs (HDLSM) uses Dirichlet processes at multiple levels to select a suitable number of motifs and identify their occurrences in the data. The final objective is to analyze how events in a scene are organized. At a global level, a scene can be thought of as undergoing a sequence of phases, each with distinct characteristics. At a more local level, the individual activities can exhibit dependencies that are possibly causal in nature. Following this, we propose a new graphical model called Mixed Event Relationship (MER) model, that incorporates the learning of both local rules and global states simultaneously from a binary event matrix. Learning these scene semantics is achieved using an iterative Gibbs sampling procedure. While the global scene states recover traffic cycles, the local rules provide information about single and multi-object activity interactions. We validate the proposed methods with elaborate experiments on nine different challenging datasets with a wide variety of activity content. The results prove the general applicability of the different methods proposed in this thesis. We believe that they can have wider applications on data coming from sensor logs of other modalities too.

Related material