Internship Proposal


Unsupervised learning of behaviour patterns in cognitive vision problems



Video understanding, Machine Learning, Behaviour recognition

Work to be Done

One of the most challenging problems in computer vision and artificial intelligence is video interpretation. Research in this area concentrates mainly on methods for analysing visual data in order to extract information about the behaviour in a scene. This complex process comprises several analysis stages. In a first step, the video stream is processed to extract primitive visual cues, such as edges and optic flow vectors, which are then analysed and combined to obtain information about the objects of interest: their motion, shape, texture, etc. ([Forsyth & Ponce 2003]). In a higher stage, these characteristics are used to infer abstract descriptions of the objects' behaviour ([Buxton 2002]) and to present these descriptions in a suitable, human-understandable form ([Gerber & Nagel 1998]). The concept of a cognitive vision system and the realization of its components have been studied extensively over the last couple of decades ([Hu et al. 2004]). However, most existing systems are tied to a specific application domain and rely heavily on assumptions about that domain, on heuristics and on background knowledge.

The topic of this internship proposal is the unsupervised learning of behaviour patterns in cognitive vision problems. We aim to infer behaviour definitions automatically and in a domain-independent manner, and thereby to take a step towards a general, easy-to-manage cognitive vision system suitable for analysing large-scale visual data from a variety of application domains. Achieving this goal requires addressing a broad spectrum of problems.

First, a general framework is needed that incorporates the major notions of a video interpretation process. Therefore, the guiding principles of visual data understanding should be studied from the point of view of generality and learnability, and the different possibilities for realizing them in a computer system should be analysed. Special emphasis will be put on the architecture, which, while general, must be easily adaptable to different domains. The role of the different components will also be examined thoroughly, together with their suitability for different application domains. Up to now, these problems have been addressed only in specific domains, and a more general and flexible approach grounded in the cognitive sciences is needed. We believe that this new view of the problem would lead to more broadly applicable and more easily manageable systems.

Another major issue is the development of learning techniques for the automatic extraction of behaviours. Although unsupervised learning has been studied broadly over the last couple of decades ([Ghahramani 2004]), only a few systems apply it to video interpretation. During this internship we will study different learning techniques and try to adapt them broadly to behaviour analysis. These problems will be investigated partially using VSIP (Video Surveillance Interpretation Platform) [ORION 2004], a system for object tracking and scenario recognition developed by the research team ORION, INRIA Sophia Antipolis. For behaviour description, a generic scenario description language and a system for the efficient recognition of predefined states and events were developed [Van-Thinh Vu 2002]. In this formalism, a behaviour can be represented as a sequence of states and events subject to temporal constraints. The topic of this internship can therefore be seen as the unsupervised learning of such temporal patterns, using different criteria for the interestingness of a pattern, such as its frequency. To this end, efficient and robust techniques for finding frequent temporal motifs ([Agrawal & Srikant 1995], [Dousson & Duong 1999]) should be developed, and a variety of issues should be addressed, such as similarity measures between state and event sequences, low-level feature extraction and selection, interpretation of the spatial and temporal attributes of a state or event, flexible incorporation of background knowledge into the search process, and quick adaptation of the system to new domains. In particular, we aim to develop generally applicable, domain-independent techniques while remaining flexible enough to use domain-dependent knowledge where it helps to achieve good results.
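To make the notion of a frequent temporal pattern more concrete, here is a minimal Python sketch; it is not the VSIP formalism itself. It assumes a simplified representation in which each observation sequence is a time-sorted list of (timestamp, event label) pairs, a pattern is an ordered tuple of event labels, the only temporal constraint is a hypothetical maximum gap `max_gap` between consecutive matched events, and the interestingness criterion is support (the fraction of sequences containing the pattern). Patterns are grown level by level in the spirit of [Agrawal & Srikant 1995]: only frequent patterns are extended, which is safe here because any sequence containing a pattern also contains its prefix under the same gap constraint. All function names, event labels and timestamps are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (not the VSIP format): an observation sequence is
# a time-sorted list of (timestamp_in_seconds, event_label) pairs.
Sequence = List[Tuple[float, str]]
Pattern = Tuple[str, ...]


def occurs(pattern: Pattern, seq: Sequence, max_gap: float) -> bool:
    """Return True if the labels of `pattern` appear in `seq` in order, with at
    most `max_gap` seconds between consecutive matched events."""

    def match(p_idx: int, start: int, last_t: float) -> bool:
        if p_idx == len(pattern):
            return True
        for i in range(start, len(seq)):
            t, label = seq[i]
            if p_idx > 0 and t - last_t > max_gap:
                break  # seq is time-sorted, so later events are even farther away
            # Backtrack if the rest of the pattern cannot be matched from here.
            if label == pattern[p_idx] and match(p_idx + 1, i + 1, t):
                return True
        return False

    return match(0, 0, 0.0)


def support(pattern: Pattern, db: List[Sequence], max_gap: float) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(occurs(pattern, s, max_gap) for s in db) / len(db)


def frequent_patterns(db: List[Sequence], min_support: float,
                      max_gap: float, max_len: int = 3) -> Dict[Pattern, float]:
    """Level-wise (Apriori-like) search for frequent event patterns."""
    labels = sorted({label for s in db for _, label in s})
    frequent: Dict[Pattern, float] = {}
    level: List[Pattern] = [(label,) for label in labels]
    while level:
        kept = []
        for p in level:
            s = support(p, db, max_gap)
            if s >= min_support:
                frequent[p] = s
                kept.append(p)
        if not kept or len(kept[0]) >= max_len:
            break
        # Only frequent patterns are extended: a pattern whose prefix is
        # infrequent cannot be frequent under the same max_gap constraint.
        level = [p + (label,) for p in kept for label in labels]
    return frequent


# Toy example with made-up event labels and timestamps (seconds).
db = [
    [(0, "enter_room"), (5, "sit_on_bed"), (60, "lie_down")],
    [(0, "enter_room"), (8, "sit_on_bed"), (300, "leave_room")],
    [(0, "enter_room"), (400, "sit_on_bed")],
]
for pattern, sup in frequent_patterns(db, min_support=0.6, max_gap=30.0).items():
    print(pattern, round(sup, 2))
```

On this toy data, `("enter_room", "sit_on_bed")` is found with support 2/3, because in the third sequence the two events are 400 seconds apart, which exceeds `max_gap`. The formalism of [Van-Thinh Vu 2002] is richer (states with durations, more general temporal constraints), and exact label matching would in practice be replaced by a similarity measure between state and event sequences, as discussed above.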

The work will be evaluated on video data captured over several days or weeks, containing mainly human activities. We believe that unsupervised techniques for detecting interesting activity patterns will lower the deployment costs of surveillance systems and make them easier to manage, and as a consequence will lead to a more widespread application of vision systems in different domains. One suitable application of the results of this internship is, for instance, the analysis of behaviour in hospitals, where robust and easy-to-manage systems of this kind could provide efficient, high-quality, 24-hour surveillance of patients.

Timetable:

Time period


1 month

Introduction to the problem:

1. Study of existing solutions.

2. Study of similar problems (knowledge discovery, psychological experiments, etc.).

3. Study of existing machine learning techniques, esp. for unsupervised learning.

1 month

Development of a concept for a generic vision system framework that meets the requirements of the problem.

1 month

Development of a frequent pattern recognition system using the VSIP formalism; study of the appropriateness of this formalism and of its adaptation to the problem.

1 month

Selection of low-level visual cues that are of interest for generic behaviour detection, in particular the unsupervised selection and extraction of an appropriate subset of features from a predefined set, and their combination and use for detecting primitive states and events.

1 month

Study of the generality and applicability of the developed techniques. Experiments with different domains, such as large scale video data from a hospital.

1 month

Concluding work; preparation of a report.




Agrawal & Srikant 1995: Rakesh Agrawal and Ramakrishnan Srikant: Mining Sequential Patterns. Proceedings of the 11th International Conference on Data Engineering (ICDE 95), 6-10 March 1995, Taipei, Taiwan; IEEE Computer Society Press, 1995, pp. 3-14.

Buxton 2002: H. Buxton: Learning and understanding dynamic scene activity. European Conference on Computer Vision, Generative Model Based Vision Workshop, 2002, Copenhagen.

Dousson & Duong 1999: Christophe Dousson and Thang Vu Duong: Discovering Chronicles with Numerical Time Constraints from Alarm Logs for Monitoring Dynamic Systems. Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI 99), 31 July-6 August 1999, Stockholm, Sweden; Thomas Dean (Ed.), Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 1999, pp. 620-626.

Forsyth & Ponce 2003: D. Forsyth and J. Ponce: Computer Vision: A Modern Approach. Prentice Hall, Upper Saddle River, NJ, USA, 2003.

Ghahramani 2004: Z. Ghahramani: Unsupervised Learning. In: O. Bousquet, G. Raetsch and U. von Luxburg (Eds.): Advanced Lectures on Machine Learning, LNAI 3176, Springer-Verlag, 2004.

Gerber & Nagel 1998: R. Gerber and H.-H. Nagel: (Mis-?)Using DRT for Generation of Natural Language Text from Image Sequences. In: Proc. Fifth European Conference on Computer Vision ECCV 98, 2-6 June 1998, Freiburg, Germany; H. Burkhardt and B. Neumann (Eds.), Lecture Notes in Computer Science LNCS 1407 (Vol. II), Springer-Verlag, Berlin, Heidelberg, New York, 1998, pp. 255-270.

Hu et al. 2004: W. Hu, T. Tan, L. Wang, S. Maybank: A Survey on Visual Surveillance of Object Motion and Behaviors. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 34, No. 3, August 2004, pp. 334-352.

Van-Thinh Vu 2002: Van-Thinh Vu, Francois Bremond, and Monique Thonnat: Temporal Constraints for Video Interpretation. Proceedings of the 15th European Conference on Artificial Intelligence (ECAI'2002), Lyon, France, 21-26 July 2002; F. van Harmelen (Ed.), IOS Press, 2002.

ORION 2004: Intelligent Environments for Problem Solving by Autonomous Systems, Orion, INRIA, 2004 research project activity report.


Required Background and Skills

Strong background in C++, computer vision and artificial intelligence, particularly machine learning.

Location and Duration

6 months full-time within the ORION group of INRIA Sophia-Antipolis, FRANCE.

Francois BREMOND
Projet ORION, INRIA-Sophia Antipolis
2004 route des Lucioles BP 93
06902 Sophia Antipolis Cedex, FRANCE

Tel: (+33) 4 92 38 76 59