Perception
In this section we briefly present the perception component we have used. As shown in figure 1, this component is composed of four main sub-parts: motion detection, person detection, person tracking and smoothing. The goal of this module is to incrementally provide a history of the persons who have been detected in the scene. Each of its sub-parts contains different alternative methods which are selected in a configuration phase.
*Figure 1: the perception component and its four sub-parts: motion detection, person detection, person tracking and smoothing.*
The goal is to extract from each image a set of primitives which indicate the presence of motion. A function $f$ is defined for each pixel of the image: the value 1 means that the pixel is mobile, 0 means that the pixel is static. Connected regions are obtained by grouping neighbouring pixels with a mobile label ($f = 1$); these regions are named blobs. Three alternative motion detection methods are defined:

$$f_1(p) = T(|I_t(p) - B(p)|)$$
$$f_2(p) = T(|I_t(p) - I_{t-1}(p)|)$$
$$f_3(p) = T(\max(|I_t(p) - B(p)|,\, |I_t(p) - I_{t-1}(p)|))$$

where $I_t(p)$ is the value of the pixel $p$ of the image at time $t$, $B(p)$ is the value of the point $p$ for an empty-scene image (a scene without mobile objects), and $T$, $|\cdot|$, $\max$ and $-$ are respectively thresholding, absolute value, maximum and difference functions.
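To make the three alternatives concrete, the following is a minimal sketch in Python/NumPy, assuming grey-level images and a single fixed threshold; the function names (`motion_mask`, `extract_blobs`) and the default threshold value are illustrative, not part of the original system.

```python
import numpy as np
from scipy import ndimage  # used only for connected-component labelling

def motion_mask(I_t, I_prev, B, tau=25, method="combined"):
    """Return a binary mask f(p) in {0, 1}: 1 = mobile pixel, 0 = static pixel.

    I_t    : current grey-level image at time t
    I_prev : image at time t-1
    B      : empty-scene (background) image
    tau    : threshold used by the thresholding function T (assumed value)
    """
    d_bg = np.abs(I_t.astype(int) - B.astype(int))       # |I_t(p) - B(p)|
    d_fr = np.abs(I_t.astype(int) - I_prev.astype(int))  # |I_t(p) - I_{t-1}(p)|
    if method == "background":
        d = d_bg
    elif method == "frame":
        d = d_fr
    else:  # "combined": maximum of the two differences
        d = np.maximum(d_bg, d_fr)
    return (d > tau).astype(np.uint8)                    # thresholding T

def extract_blobs(mask):
    """Group 8-connected mobile pixels (f = 1) into labelled regions (blobs)."""
    labels, n_blobs = ndimage.label(mask, structure=np.ones((3, 3)))
    return labels, n_blobs
```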
The person detection algorithm splits the set of blobs into $n$ subsets corresponding to potential persons in the scene. Both 2D image criteria and 3D scene criteria are used. The former are based on the 2D distance between blobs in the image: the goal is to merge the closest blobs. The latter are constraints on the 3D height and width of a person. The 3D measures are obtained by a linear projection of the image plane onto the ground plane of the scene.
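One possible reading of these criteria is sketched below; the greedy merge strategy, the helper names and the numeric bounds (maximum 2D gap in pixels, admissible 3D height and width in metres) are assumptions for illustration, not the values used by the system.

```python
def box_distance(a, b):
    """2D gap between two bounding boxes (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx ** 2 + dy ** 2) ** 0.5

def union_box(a, b):
    """Smallest box enclosing both a and b."""
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]

def merge_close_blobs(boxes, max_dist_px=15):
    """2D criterion: greedily merge blob boxes closer than max_dist_px pixels."""
    merged = [list(b) for b in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if box_distance(merged[i], merged[j]) < max_dist_px:
                    merged[i] = union_box(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged

def is_person(box, to_ground_plane, min_h=1.2, max_h=2.2, max_w=1.0):
    """3D criterion: keep a merged blob only if its projected height/width
    (in metres) are plausible for a person. `to_ground_plane` is an assumed
    helper projecting the image box onto the scene ground plane."""
    h3d, w3d = to_ground_plane(box)
    return min_h <= h3d <= max_h and w3d <= max_w
```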
The goal of person tracking is to update the set of previous trajectories. For that purpose, the persons detected in the current image must be matched with those detected in the previous ones. This matching can be defined as a function from the set $P_{t-1}$ of persons detected at time $t-1$ into the set $P_t$ of persons detected at time $t$. We use three alternative methods: a method based on the amount of overlap in the 2D image, a method based on the proximity of the persons in the 3D scene, and a more restrictive method also based on the proximity of the persons in the 3D scene. The first method states that two persons detected at two consecutive times are the same real person if the percentage of overlap of their bounding boxes is greater than a threshold. The second method matches a person of $P_t$ with a person of $P_{t-1}$ if their 3D distance is below a threshold. The last method is similar to the second one, but the matching function must be either an injection or a surjection.
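The first two matching methods could be implemented as in the following sketch; the person representation (objects carrying a `box` and a `pos3d` attribute) and the thresholds (0.5 overlap ratio, 0.5 m ground-plane distance) are assumed for illustration only.

```python
def overlap_ratio(a, b):
    """Fraction of the smaller bounding box covered by the intersection of a and b."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return inter / area if area > 0 else 0.0

def ground_distance(p, q):
    """Distance between two 3D positions on the ground plane."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def match_2d(persons_prev, persons_now, min_overlap=0.5):
    """Method 1: same person if the 2D bounding-box overlap exceeds a threshold."""
    matches = {}
    for cur in persons_now:
        best = max(persons_prev, default=None,
                   key=lambda prev: overlap_ratio(prev.box, cur.box))
        if best is not None and overlap_ratio(best.box, cur.box) > min_overlap:
            matches[cur] = best
    return matches

def match_3d(persons_prev, persons_now, max_dist=0.5):
    """Method 2: same person if the 3D ground-plane distance is below a threshold."""
    matches = {}
    for cur in persons_now:
        best = min(persons_prev, default=None,
                   key=lambda prev: ground_distance(prev.pos3d, cur.pos3d))
        if best is not None and ground_distance(best.pos3d, cur.pos3d) < max_dist:
            matches[cur] = best
    return matches
```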
The goal of the smoothing step is twofold. The first goal is to correct errors made in the previous perception steps on the different 3D parameters of a person: the position $(p_{x3D}, p_{y3D})$ on the ground plane, the height $h_{img}$ and the width $l_{img}$. The second goal is to estimate the instantaneous speed of the persons. Three smoothing methods are used. The first method uses a standard Kalman filter [15]. The state vector is defined by $(p_{x3D}, p_{y3D}, v_{x3D}, v_{y3D})$. The linear dynamic model is based on the hypothesis of a constant speed. The second and third methods are respectively median and mean filtering with a window size of 3, 5 or 7. The speed is initialized by differencing consecutive positions; then each of the four values $p_{x3D}$, $p_{y3D}$, $v_{x3D}$ and $v_{y3D}$ is filtered.
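A minimal constant-speed Kalman filter over the state $(p_{x3D}, p_{y3D}, v_{x3D}, v_{y3D})$, observing only the 3D position, could look like the sketch below; the time step and the noise covariances are placeholders, not values given here.

```python
import numpy as np

class ConstantSpeedKalman:
    """Kalman filter with a constant-speed linear dynamic model.
    State x = (px3D, py3D, vx3D, vy3D); only the 3D position is measured."""

    def __init__(self, dt=0.04, q=1e-2, r=1e-1):
        self.F = np.array([[1, 0, dt, 0],   # position advances by speed * dt
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],   # speed is assumed constant
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],    # only the ground-plane position is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)              # process noise (assumed value)
        self.R = r * np.eye(2)              # measurement noise (assumed value)
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def step(self, measured_pos):
        # Predict with the constant-speed model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the measured (px3D, py3D).
        z = np.asarray(measured_pos, dtype=float)
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x  # smoothed position and estimated instantaneous speed
```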
The perception methods we have described are deliberately simple in order to comply with the real-time constraint. Their role is to provide enough information to the interpretation methods for video understanding.