Participants: Marcos David Zúñiga Barraza, Francois Bremond, Monique Thonnat.
The goal of this thesis is to propose a general video understanding approach for learning and recognising events occurring in videos, for real-world applications. This video understanding framework is composed of four tasks:
First, at each video frame, a segmentation task detects the moving regions, represented by bounding boxes enclosing them.
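The text does not specify which segmentation technique is used. A minimal sketch of this kind of step, assuming a simple background-difference threshold on grayscale frames followed by 4-connected component labelling, could look as follows. The function `moving_regions` and its `threshold` and `min_area` parameters are hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def moving_regions(frame: List[List[int]], background: List[List[int]],
                   threshold: int = 30, min_area: int = 2) -> List[BBox]:
    """Hypothetical sketch: bounding boxes of 4-connected foreground blobs."""
    h, w = len(frame), len(frame[0])
    # Foreground mask: pixels that differ enough from the background model.
    fg = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(w)]
          for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[BBox] = []
    for y in range(h):
        for x in range(w):
            if not fg[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the 4-neighbourhood of (x, y).
            queue = deque([(x, y)])
            seen[y][x] = True
            xs, ys = [], []
            while queue:
                cx, cy = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Discard tiny blobs, which are usually noise.
            if len(xs) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


if __name__ == "__main__":
    bg = [[0] * 6 for _ in range(5)]
    frame = [row[:] for row in bg]
    frame[1][1] = frame[1][2] = frame[2][1] = 100  # a small moving blob
    frame[4][5] = 100                              # an isolated noisy pixel
    print(moving_regions(frame, bg))  # [(1, 1, 2, 2)]
```

Real systems typically use an adaptive background model rather than a fixed reference frame; the fixed background here only keeps the sketch short.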
Second, a new 3D classifier associates with each moving region an object class label (e.g. person, vehicle) and a 3D parallelepiped described by its width, height, length, position and orientation, together with visual reliability measures for these attributes. The next images show examples of classification results. Red bounding boxes represent moving regions classified as a person, while the brown bounding box represents a moving region classified as a vehicle.
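To illustrate how reliability measures can enter a classification decision, the following sketch scores a parallelepiped against per-class dimension models and weights each attribute by its visual reliability. This is not the thesis classifier: the class `Parallelepiped`, the function `classify`, the Gaussian-shaped similarity and the `CLASS_MODELS` values (in metres) are all hypothetical assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Parallelepiped:
    # Hypothetical 3D descriptor: dimensions in metres, orientation in radians,
    # plus a visual reliability in [0, 1] for each dimension.
    width: float
    height: float
    length: float
    x: float
    y: float
    orientation: float
    reliability: Dict[str, float]


# Hypothetical class models: (mean, std) of each dimension, in metres.
CLASS_MODELS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "person": {"width": (0.5, 0.2), "height": (1.7, 0.2), "length": (0.3, 0.15)},
    "vehicle": {"width": (1.8, 0.3), "height": (1.5, 0.3), "length": (4.5, 0.8)},
}


def classify(p: Parallelepiped) -> Tuple[str, float]:
    """Return the best label and its reliability-weighted score in [0, 1].

    Only the dimensions are compared; position and orientation are ignored.
    """
    best_label, best_score = "unknown", 0.0
    for label, model in CLASS_MODELS.items():
        num = den = 0.0
        for attr, (mu, sigma) in model.items():
            r = p.reliability.get(attr, 0.0)
            # Gaussian-shaped similarity, weighted by how visible the attribute is.
            num += r * math.exp(-0.5 * ((getattr(p, attr) - mu) / sigma) ** 2)
            den += r
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    p = Parallelepiped(0.55, 1.75, 0.35, 2.0, 3.0, 0.0,
                       {"width": 0.8, "height": 1.0, "length": 0.4})
    print(classify(p)[0])  # person
```

Weighting by reliability means that a poorly visible dimension (for example, a length hidden by occlusion) has little influence on the decision, and an object with no reliable attributes is left as "unknown".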
Third, a new multi-object tracking algorithm uses these object descriptions to generate tracking hypotheses about the objects evolving in the scene. Reliability measures associated with the object features are used to select the most valuable information when building and updating these hypotheses.
The next image shows tracking results for a road video. Blue lines represent the trajectory of the tracked bounding-box centroid, while red lines represent the trajectory of the centroid of the parallelepiped base.
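One way to let reliability drive the update of a tracked object is to fuse each new measurement with the current estimate in proportion to its reliability, while older evidence is gradually discounted. The sketch below assumes that scheme together with greedy nearest-neighbour association. The thesis tracker maintains multiple tracking hypotheses, whereas this sketch keeps a single one; `Detection`, `Track`, `step`, `decay` and `gate` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    x: float            # hypothetical ground-plane position (metres)
    y: float
    reliability: float  # visual reliability of the position, in [0, 1]


@dataclass
class Track:
    track_id: int
    x: float
    y: float
    acc_reliability: float  # accumulated (decayed) reliability of the estimate
    trajectory: List[Tuple[float, float]] = field(default_factory=list)

    def update(self, d: Detection, decay: float = 0.9) -> None:
        # Older evidence is discounted by `decay`; the new measurement
        # contributes in proportion to its reliability.
        w_old = self.acc_reliability * decay
        w_new = d.reliability
        total = w_old + w_new
        if total > 0:
            self.x = (w_old * self.x + w_new * d.x) / total
            self.y = (w_old * self.y + w_new * d.y) / total
        self.acc_reliability = total
        self.trajectory.append((self.x, self.y))


def step(tracks: List[Track], detections: List[Detection],
         gate: float = 1.0) -> List[Track]:
    """Greedy nearest-neighbour association; unmatched detections start tracks."""
    unmatched = list(detections)
    for t in tracks:
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda d: math.hypot(d.x - t.x, d.y - t.y))
        # Only associate detections that fall inside the gating distance.
        if math.hypot(nearest.x - t.x, nearest.y - t.y) <= gate:
            t.update(nearest)
            unmatched.remove(nearest)
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for i, d in enumerate(unmatched):
        tracks.append(Track(next_id + i, d.x, d.y, d.reliability, [(d.x, d.y)]))
    return tracks
```

With this update rule, a detection whose reliability is zero is still associated with its track but leaves the estimated position unchanged, so unreliable frames do not corrupt the trajectory.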
Finally, a new incremental event learning algorithm aggregates, on-line, the attributes and reliability information of the tracked objects to learn a hierarchy of concepts describing the events occurring in the scene. Reliability measures are used to focus the learning process on the most valuable information. Simultaneously, the event learning approach recognises the events associated with the objects evolving in the scene.
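The on-line aggregation step can be illustrated with reliability-weighted incremental statistics: each concept keeps a running mean and variance per attribute, where each observation counts in proportion to its reliability (West's weighted incremental algorithm). The sketch below assumes a fixed, hand-built hierarchy and recognises an observation by descending to the closest child while updating every concept on the path. It omits the creation, merging and splitting of concepts that a full incremental hierarchy learner needs; `AttrStats`, `Concept`, `recognise` and `min_var` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttrStats:
    # Reliability-weighted running mean/variance (West's incremental algorithm).
    w_sum: float = 0.0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, value: float, reliability: float) -> None:
        if reliability <= 0:
            return  # unreliable observations do not change the concept
        self.w_sum += reliability
        delta = value - self.mean
        self.mean += (reliability / self.w_sum) * delta
        self.m2 += reliability * delta * (value - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.w_sum if self.w_sum > 0 else 0.0


@dataclass
class Concept:
    name: str
    stats: Dict[str, AttrStats] = field(default_factory=dict)
    children: List["Concept"] = field(default_factory=list)

    def learn(self, obs: Dict[str, float], rel: Dict[str, float]) -> None:
        for attr, value in obs.items():
            self.stats.setdefault(attr, AttrStats()).add(value, rel.get(attr, 0.0))

    def distance(self, obs: Dict[str, float], rel: Dict[str, float],
                 min_var: float = 0.01) -> float:
        # Reliability-weighted, variance-normalised squared distance.
        num = den = 0.0
        for attr, value in obs.items():
            s = self.stats.get(attr)
            r = rel.get(attr, 0.0)
            if s is None or s.w_sum == 0 or r <= 0:
                continue
            num += r * (value - s.mean) ** 2 / max(s.variance, min_var)
            den += r
        return num / den if den > 0 else math.inf


def recognise(root: Concept, obs: Dict[str, float],
              rel: Dict[str, float]) -> List[str]:
    """Descend to the closest child at each level, learning along the path."""
    node = root
    node.learn(obs, rel)
    path = [node.name]
    while node.children:
        node = min(node.children, key=lambda c: c.distance(obs, rel))
        node.learn(obs, rel)
        path.append(node.name)
    return path
```

Because learning and recognition share the same traversal, each new observation both labels the object's current event and refines the statistics of the concepts that describe it.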
The next image shows an overview of the complete approach.
The tracking approach has been validated on publicly accessible video-surveillance benchmark videos. The complete video understanding framework has been evaluated with videos from a real elderly care application. With minimal configuration effort, the framework successfully learned events related to trajectory (e.g. changes in 3D position and velocity), posture (e.g. standing up, crouching) and object interaction (e.g. a person approaching a table), among others.