Related Work



Cohen, Bremond, Medioni and Nevatia (University of Southern California), in DARPA's VSAM, focus on the recognition of events involving vehicles and humans ([12] and [6]). The distinctive feature of this work is that the videos are filmed by a non-fixed camera. They use models of the environment to place aerial images in an a priori known map. They use a property net to compute events and states, which control the evolution of a predefined automaton describing situations. Herzog (VITRA) proposes a system able to dynamically describe scenes with humans. The originality of his work lies in the application environment, a soccer stadium ([1] and [8]), and in the inference method, which is based on time interval logic and describes temporal sequences of events that are computed and typed separately. Intille and Bobick (MIT Media Lab), in a similar environment, focus on the analysis of American football scenes. Their aim is the recognition of particular strategies in the complex interactions between players ([9] and [10]). The main point is that these activities are not just individual human behaviors but human group behaviors. Shah (University of Central Florida) is interested in the dynamic description of human behaviors in office environments ([2] and [7]). Even though the problem is the recognition of long-duration activities, the authors insist on the importance of recognizing "key instants", which are the conditions for changing state in an automaton representing the global behaviors. A key instant is generated when certain conditions are met. Thonnat and Rota (INRIA Sophia) ([14]) propose a method based on both n-ary trees to declare events and temporal logic to declare scenarios in the context of visual surveillance. Tessier (ONERA), in the PERCEPTION project, proposes an original method to describe behaviors: Petri nets are used to represent the dynamic evolution of a car park scene with humans and vehicles ([13] and [4]).
Buxton and Gong (University of Sussex) made an important contribution to the domain with the VIEWS project ([3]). The system was able to deal with humans and vehicles on roads, in streets or in car parks. A high-level representation based on Bayesian networks was computed. This work points out the need to deal with uncertainty and to use contextual information to enhance detection and tracking results. In the same vein, Ivanov and Grimson (MIT) work on the detection of human and vehicle behaviors in car parks. The interest of this research lies in its method for combining events ([11]). A behavior is represented by a set of rules based on a stochastic context-free grammar, which allows certain combinations of simple constant predicates.
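
Several of the approaches above represent a behavior as an automaton whose state changes only when a recognised event (a "key instant") occurs. The following is a minimal illustrative sketch of that general idea, not the implementation of any of the cited systems. The state names, the event labels and the `BehaviorAutomaton` class are all hypothetical and were chosen only for the example.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current state, key-instant label) -> next state.
# Neither the states nor the labels come from the cited work.
TRANSITIONS = {
    ("outside", "enters_room"): "walking",
    ("walking", "sits_down"): "seated",
    ("seated", "stands_up"): "walking",
    ("walking", "leaves_room"): "outside",
}


@dataclass
class BehaviorAutomaton:
    """Automaton that changes state only on valid key instants."""
    state: str = "outside"
    history: list = field(default_factory=list)

    def on_key_instant(self, frame, label):
        """Fire a transition if `label` is valid in the current state."""
        next_state = TRANSITIONS.get((self.state, label))
        if next_state is None:
            return False  # label is not a key instant for this state: ignore it
        self.history.append((frame, self.state, next_state))
        self.state = next_state
        return True


def run(events):
    """Feed a sequence of (frame number, label) pairs to a fresh automaton."""
    automaton = BehaviorAutomaton()
    for frame, label in events:
        automaton.on_key_instant(frame, label)
    return automaton
```

For instance, `run([(10, "enters_room"), (57, "sits_down")])` leaves the automaton in the `seated` state and records the two transitions with their frame numbers. Labels that are not valid in the current state are ignored, which is one simple way to keep a long-duration activity description stable between key instants.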


Nathanael Rota
2000-11-06