Could early visual processes be sufficient to label motions?

INRIA Research Report 5240

Ivan Dimov, Pierre Kornprobst, Thierry Viéville






Introduction

Biological visual classification is a well-known and very common, but still intriguing, fact (see Fig. 1 and Fig. 2). A recent series of experiments has shed light on this biological mechanism: data classification can be realized in the human visual cortex with latencies of about 150 ms. Given the latencies of the visual pathway itself, such fast classification is only compatible with a very specific processing architecture and mechanism.
It has been hypothesized that the underlying neural mechanism is based on a rank order coding scheme: the neural information is coded by the relative order in which these neurons fire. Spiking networks of neurons are quite different from usual neural networks.
Surprisingly enough, this experimental evidence is consistent with algorithms derived from statistical learning theory, following the work of Vapnik.
Following this track, we would like to revisit the Giese and Poggio neural model for biological movement recognition (in fact a subset of it considering processing in V1 and MT) and use integrate-and-fire neural models in order to discriminate motions.
The underlying question is: for biological movement recognition, do we need to consider long-term features, i.e. trajectories (global motion), or is it sufficient to work with short-term trajectories, i.e. local motion operators?
In this comparative study we consider local motion cues only. This choice is inspired by Rubin's work on segmentation, which shows that early visual processes could be sufficient to perform that task: could early visual processes also be sufficient to label visually perceived motions?
We simulate this situation here to help understand the question.
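
The two ingredients mentioned above, integrate-and-fire neurons and rank order coding, can be illustrated with a minimal sketch. It assumes a leaky integrate-and-fire neuron driven by a constant input and integrated with a simple Euler step; the function names and the parameter values (`tau`, `threshold`, `dt`, `max_steps`) are illustrative assumptions, not values taken from the report.

```python
def first_spike_latency(current, dt=1.0, tau=10.0, threshold=1.0, max_steps=200):
    """Leaky integrate-and-fire neuron under a constant input current.

    Returns the time of the first spike, or None if the membrane
    potential never reaches the threshold within max_steps.
    All parameter values are illustrative assumptions.
    """
    v = 0.0
    for step in range(max_steps):
        # Euler step of dv/dt = -v / tau + current
        v += dt * (-v / tau + current)
        if v >= threshold:
            return step * dt
    return None


def rank_order_code(currents):
    """Indices of the neurons that fired, ordered from earliest to latest spike."""
    latencies = [first_spike_latency(c) for c in currents]
    fired = [i for i, t in enumerate(latencies) if t is not None]
    return sorted(fired, key=lambda i: latencies[i])


print(rank_order_code([0.2, 0.5, 0.05, 0.3]))  # → [1, 3, 0]
```

A stronger input drives the neuron to threshold sooner, so the firing order alone carries the ranking of the inputs. Neuron 2 receives too weak an input to fire and is simply absent from the code.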

Fig. 1: Point light stimuli showing different actions (Demo by R. Blake)
Fig. 2: In spite of strong blurring the action in this video can be recognized (Demo: L. Davis & A. Bobick)





System overview

The computational problem that is addressed by this work is the recognition of biological motion in image sequences. We focus on a biologically plausible mechanism considering the architecture of the brain system solving this.

The general problem is subdivided into two main stages, feature extraction followed by feature vector classification (see Fig. 3).

The motion recognition simulation was performed on a set of 40 biological motion image sequences (video samples) in 2 classes, walking and marching, similar to the one shown in Fig. 3.

The feature vector classification was implemented with a support vector machine (SVM) classifier, while the feature extraction block was based on a spiking neural net model and was implemented as shown in Fig. 4.
[Figure: motion recognition diagram; input motion video sample: march]
Fig. 3 -- The biological movement recognition problem is subdivided into two main stages.

[Figure: feature extraction block]
Fig. 4 -- The feature extraction block, based on the spiking neural net model, is decomposed into its sub-processes. The red spots on the images indicate the locations of the top spike frequencies after local inhibition.
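
The last step named in the Fig. 4 caption, keeping the top spike frequencies after local inhibition, can be sketched as follows. The report does not specify how the inhibition is implemented, so this sketch assumes a greedy winner-take-all rule: the strongest location is kept and suppresses every location within a square neighbourhood of `radius` cells. The function name, the 2D-list representation of the spike-frequency map and the `radius` parameter are all hypothetical.

```python
def top_spike_locations(rates, k, radius=1):
    """Pick up to k (row, col) locations with the highest spike frequency.

    Assumed local inhibition: once a location is selected, no other
    location within `radius` cells (Chebyshev distance) can be selected.
    """
    candidates = sorted(
        (
            (rates[r][c], r, c)
            for r in range(len(rates))
            for c in range(len(rates[0]))
            if rates[r][c] > 0  # silent locations never compete
        ),
        key=lambda item: (-item[0], item[1], item[2]),  # strongest first
    )
    selected = []
    for _, r, c in candidates:
        if len(selected) == k:
            break
        inhibited = any(
            max(abs(r - sr), abs(c - sc)) <= radius for sr, sc in selected
        )
        if not inhibited:
            selected.append((r, c))
    return selected


# Toy spike-frequency map (made-up numbers, for illustration only).
rates = [
    [0, 1, 0, 0, 0],
    [1, 9, 2, 0, 0],
    [0, 3, 0, 0, 7],
    [0, 0, 0, 6, 8],
    [5, 0, 0, 0, 0],
]
print(top_spike_locations(rates, k=3))  # → [(1, 1), (3, 4), (4, 0)]
```

In the toy map the values 7 and 6 are strong but lie next to the stronger value 8, so they are inhibited and the weaker but isolated value 5 is selected instead.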





Results

Using the feature extraction block (Fig. 4), feature vectors were generated for walking and marching motion samples. For each category, feature vectors taking into account the top 10, 20, 30, 40 and 50 spike locations were generated.
The locations of the top spike neurons over the whole motion sequence are shown in Fig. 5. From these point-light figures it is easy to recognise the movement. The biological motion classification results obtained with these feature vectors are shown in Fig. 6.
[Figure panels, Motion Class: Walking -- top 10, 20, 30 and 40 spike locations]
[Figure panels, Motion Class: Marching -- top 10, 20, 30 and 40 spike locations]

Fig. 5 -- This figure shows the locations of the top spike neurons, extracted automatically from the raw input motion video samples, that compose the feature vectors which are used for motion classification. In this figure only one sample from each class is shown.
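
One possible way to turn the top spike locations into feature vectors of the sizes listed above (top 10, 20, 30, 40 and 50) is sketched below. The exact layout of the feature vectors is not given here, so this sketch assumes that the (row, col) coordinates are simply concatenated in rank order and padded with zeros when fewer locations are available; `ranked_locations` and both function names are hypothetical.

```python
SPIKE_COUNTS = (10, 20, 30, 40, 50)


def build_feature_vector(ranked_locations, k):
    """Flatten the k best (row, col) locations into a vector of length 2 * k.

    Assumption: locations are already sorted from strongest to weakest.
    Missing locations are padded with (0, 0) so the length is fixed.
    """
    top = list(ranked_locations[:k])
    top += [(0, 0)] * (k - len(top))
    return [coordinate for location in top for coordinate in location]


def feature_vectors_for_all_counts(ranked_locations):
    """One feature vector per spike count used in the experiments."""
    return {k: build_feature_vector(ranked_locations, k) for k in SPIKE_COUNTS}


# Made-up ranked locations, for illustration only.
ranked = [(i, 2 * i) for i in range(50)]
vectors = feature_vectors_for_all_counts(ranked)
print({k: len(v) for k, v in vectors.items()})
# → {10: 20, 20: 40, 30: 60, 40: 80, 50: 100}
```

A fixed vector length per spike count matters because the classifier compares samples dimension by dimension.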

Fig. 6 -- Classification performance: classifier error rates obtained by discriminating between walk and march motions. The percentage error rate (vertical axis) is plotted against the number of samples used to train the classifier. The RAW Giese curve is the error rate obtained when the feature vectors are composed of the smoothed trajectories of the main joints of an actor performing the motions. These trajectories were extracted manually in Giese's lab and were used as a reference for the optimal classification performance. The remaining classification performance curves were obtained automatically using the methods described previously.
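
The kind of curve plotted in Fig. 6, error rate against the number of training samples, can be computed along the following lines. This is only a sketch under stated assumptions: it uses a small linear SVM trained by batch subgradient descent on the hinge loss (a stand-in, not the SVM implementation used for the report), and the data are synthetic two-dimensional points rather than spike-location feature vectors. The numbers it prints therefore do not reproduce Fig. 6; all function names and hyperparameters are hypothetical.

```python
import random


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM via batch subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. Hyperparameters are illustrative assumptions.
    """
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [reg * wi for wi in w], 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def error_rate(model, samples, labels):
    """Percentage of misclassified samples."""
    w, b = model
    wrong = 0
    for x, y in zip(samples, labels):
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
        wrong += predicted != y
    return 100.0 * wrong / len(samples)


def learning_curve(train_x, train_y, test_x, test_y, sizes):
    """Test error rate after training on the first n samples, for each n."""
    return [
        (n, error_rate(train_linear_svm(train_x[:n], train_y[:n]), test_x, test_y))
        for n in sizes
    ]


def make_toy_data(n_per_class, rng):
    """Synthetic, interleaved two-class data (a stand-in for walk vs. march)."""
    samples, labels = [], []
    for _ in range(n_per_class):
        for label, centre in ((1, 2.0), (-1, -2.0)):
            samples.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            labels.append(label)
    return samples, labels


rng = random.Random(0)
train_x, train_y = make_toy_data(12, rng)  # 24 training samples
test_x, test_y = make_toy_data(8, rng)     # 16 held-out samples
for n, err in learning_curve(train_x, train_y, test_x, test_y, (4, 8, 16, 24)):
    print(f"{n:2d} training samples: {err:.1f}% error")
```

The classes are interleaved so that every even-sized training prefix contains the same number of samples from each class, which keeps the comparison across training set sizes fair.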