Dynamic Bayesian multi-target tracking for behavior and interaction detection

Marcenaro, Lucio; Regazzoni, Carlo; Soto, Mauricio

doi:10.1201/b14839

In this chapter, a joint human tracking and human-to-human interaction recognition system is proposed. A Bayesian algorithm for rigid/non-rigid 2D visual object tracking based on sparse image features is described. The tracking algorithm is inspired by the way the human visual cortex segments and tracks different moving objects within its field of view (FOV) by constructing dynamical nonretinotopic layers. Within the Probabilistic Graphical Model (PGM) framework, the method can be described as a recursive algorithm between time slices (inter-slice) combined with forward-backward message passing within every time slice (intra-slice). Object tracking and interaction recognition are usually performed separately; this chapter shows that tracking performance can be improved when the two tasks are carried out jointly. In particular, the proposed Bayesian tracking algorithm is coupled with a bio-inspired interaction analysis framework. The motion patterns of moving entities provided by the tracker are analyzed to recognize the current situation, and causal relationships between interacting individuals in the environment are formulated as probability distributions that cue the tracker in a closed loop. The effectiveness of the proposed approach is demonstrated on a variety of image sequences.
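For readers less familiar with recursive Bayesian tracking, the recursion between time slices follows the familiar predict/update cycle: the belief about a target's state is propagated forward with a motion model and then corrected with the likelihood of the current observation. The sketch below illustrates only that generic cycle as a discrete (grid-based) Bayes filter on a hypothetical 1D grid of positions. It is not the authors' sparse-feature, layered tracker, and it omits the intra-slice forward-backward message passing; the grid size, transition matrix and likelihood values are illustrative assumptions.

```python
import numpy as np


def predict(belief: np.ndarray, transition: np.ndarray) -> np.ndarray:
    """Inter-slice step: p(x_t | z_{1:t-1}) = sum_i p(x_t | x_{t-1}=i) p(x_{t-1}=i | z_{1:t-1})."""
    # transition[i, j] = p(x_t = j | x_{t-1} = i), so each row sums to 1.
    return belief @ transition


def update(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Correct the prediction with the current observation and renormalize."""
    posterior = prior * likelihood
    return posterior / posterior.sum()


def track(initial: np.ndarray, transition: np.ndarray, likelihoods) -> list:
    """Run the predict/update recursion over a sequence of per-frame likelihoods."""
    belief = np.asarray(initial, dtype=float)
    history = []
    for lik in likelihoods:
        belief = update(predict(belief, transition), np.asarray(lik, dtype=float))
        history.append(belief)
    return history


# Hypothetical example: 5 grid cells; the target moves one cell right with
# probability 0.8 or stays put with probability 0.2 (the last cell absorbs).
n = 5
T = np.zeros((n, n))
for i in range(n):
    if i + 1 < n:
        T[i, i], T[i, i + 1] = 0.2, 0.8
    else:
        T[i, i] = 1.0


def observation_at(k: int) -> np.ndarray:
    """Toy likelihood: a detector that fires at cell k with some ambiguity."""
    lik = np.full(n, 0.1)
    lik[k] = 0.6
    return lik


initial = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # target known to start in cell 0
history = track(initial, T, [observation_at(1), observation_at(2)])
for t, belief in enumerate(history, start=1):
    print(t, np.round(belief, 3), "MAP cell:", int(np.argmax(belief)))
```

Here the posterior after the first frame concentrates on cell 1 (probability 0.96) and moves to cell 2 after the second. The full method replaces this single-variable belief with a richer graphical model in each time slice.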

Visual tracking is a fundamental processing step in most video analytics for surveillance applications, where the aim is to automatically understand the actions performed by the objects present in the monitored scene [1-3]. The basic tracking task consists of following a target frame by frame, labeling it, and estimating its trajectory. Although this problem has been widely investigated over the last decades, a solution that is valid in general situations, without tight constraints and assumptions about the complexity of the guarded scene (the number of moving objects and the degree of overlap among objects and between objects and environmental obstacles), has yet to be found. Crowded, unconstrained public areas such as roads, railway stations, and airports are a challenging scenario in which state-of-the-art tracking algorithms fail to correctly track every detected target. More effective approaches (in terms of target detection and trajectory estimation precision, preservation of object identity, and robustness to highly cluttered environments) are therefore needed to perform the tracking task correctly in these scenarios under different environmental conditions (e.g., lighting changes and nonstatic backgrounds). Research is also focusing on trackers [4] that enrich the available tracks with additional features such as scale, pose, and shape in the object description, with the aim of accomplishing advanced scene interpretation tasks. Furthermore, observations from trackers at different resolution levels can help improve system performance and provide a better understanding of the monitored scene.