Conclusion
In this paper we have shown that high-level video understanding can be performed using images taken from a single static camera, with simple perception methods running almost in real time. This has been made possible by using two sets of a priori information: first, contextual information describing the 3D geometry of the observed scene, together with semantic information on the static objects and areas of interest; second, general knowledge in the form of predefined scenarios valid for an application domain. We have proposed a formalism to represent these two types of a priori information and have explained how to use them for video understanding. We have also proposed a formalism for event recognition based on state models. This formalism is independent of any particular application domain and bridges the gap between the perception data and the scenario models.

The current video understanding framework nevertheless has several limitations. One type of problem is imprecision and uncertainty in the detection and location of mobile objects; most of these low-level detection errors are due to reflections, shadows, or occlusions. A solution to these problems is to relax our second hypothesis and not restrict ourselves to the use of a single camera. Another, more general problem is that, like every vision system, this framework requires the values of numerical parameters to be set for each perception method and each interpretation method during a configuration phase. One solution is to use learning techniques to help find the best parameter values for a given application. Another interesting research direction we want to address is the use of program supervision techniques to improve the flexibility of the system, both in terms of method adaptation and of parametrization.
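To make the idea of state-based event recognition concrete, the following is a minimal, hypothetical sketch, not the formalism actually proposed in the paper: a scenario is modeled as an ordered sequence of states, each satisfied by a predicate over a perception observation, and the recognizer advances through the states as matching observations arrive. All class names, field names, and the example scenario are illustrative assumptions.

```python
# Hypothetical sketch of state-model event recognition (not the paper's
# actual formalism): a scenario is an ordered list of states, each with
# a predicate over one perception observation.

class ScenarioModel:
    """A scenario as an ordered list of (state_name, predicate) pairs."""
    def __init__(self, name, states):
        self.name = name
        self.states = states  # list of (state_name, predicate)

class ScenarioRecognizer:
    """Advances through the scenario's states as observations arrive."""
    def __init__(self, model):
        self.model = model
        self.index = 0  # index of the next state still to be satisfied

    def observe(self, observation):
        """Feed one perception observation; return True once the whole
        scenario has been recognized."""
        if self.index < len(self.model.states):
            _, predicate = self.model.states[self.index]
            if predicate(observation):
                self.index += 1
        return self.index == len(self.model.states)

# Illustrative scenario: a tracked person reaches a counter zone,
# then reaches the exit zone (zone names are invented for this sketch).
model = ScenarioModel("enter_and_leave", [
    ("at_counter", lambda o: o["zone"] == "counter"),
    ("at_exit", lambda o: o["zone"] == "exit"),
])

rec = ScenarioRecognizer(model)
stream = [{"zone": "entrance"}, {"zone": "counter"}, {"zone": "exit"}]
results = [rec.observe(o) for o in stream]
print(results)  # → [False, False, True]
```

In a real system the predicates would be evaluated against tracked mobile objects located in the 3D scene model, and the recognizer would maintain one state machine instance per (scenario, tracked object) pair.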