Program of the day
Lessons from the primate visual system
The primate visual system can perform an amazing array of tasks, matched by the large portion
of the cortex devoted to analyzing retinal signals. It is potentially a source of inspiration for computer vision,
although, with a few exceptions, progress has been very slow. One profound problem is that we still do not have
an exhaustive list of what vision achieves in humans, and that experimental studies are generally restricted to
motion, a few object categories, and the control of a few actions such as reaching or saccades.
Here I will review how we integrated several experimental techniques to solve a question that arose from
interactions with computer vision scientists more than fifteen years ago: the extraction of 3D surfaces
(Orban GA, Annu Rev Neurosci 2011). This process is achieved by a new type of higher-order visual neuron:
the gradient-selective neuron. Neurons selective for speed gradients were first discovered in motion-
processing areas, such as MT/V5, MSTd, and FST. Subsequently, neurons selective for disparity gradients
were discovered in shape-processing areas, such as TEs and AIP. By combining these single-cell studies
with fMRI in humans and awake monkeys, we were able to localize similar neurons in human cortical areas.
In a second part I will address the present challenge of understanding the visual signals related to
the actions of conspecifics, which is perhaps the ultimate challenge of motion processing, yet receives
surprisingly little attention in vision. The understanding of actions exemplifies my claim that
visual signals, to be useful, have to leave the visual system: indeed, the signals related to biological
motion in the STS are sent to parietal regions involved in the control of diverse actions
(Nelissen et al., J Neurosci 2011), where they can be understood as actions.
Neural Mechanisms of Form and Motion Detection and Integration:
Biology meets Machine Vision
General-purpose vision systems, whether biological or technical, rely on the robust processing of
the visual data that impinges on the sensor array. Such systems need to adapt their
processing capabilities to varying conditions during image acquisition, have to deal
with noise, and need to learn task-relevant representations as a consequence of
functional adaptation. Here I describe our group's research on modeling mechanisms
of early and mid-level vision in primates for form and motion processing. In visual
cortex, core principles recur at various stages of the cortical hierarchy of areas,
namely (i) the bottom-up hierarchical processing along different stages building
representations with increasing feature selectivity and spatial scale,
(ii) the selective amplification, or modulation, of bottom-up activations in feature
representations by feedback that utilizes context information, and
(iii) the automatic gain control via center-surround competitive interaction and activity
normalization in a pool of neurons. These principles are interesting for designing
computational vision algorithms to gain improved flexibility and functionality
for, e.g., the self-adjustment of responsiveness to various feature contrast conditions
and the incorporation of higher-order object information and attention for feature
selection and enhancement. We define the dynamics of model neurons via coupled ordinary
differential equations. Within this framework we investigate the processing of static
form configurations as well as motion detection and integration. We demonstrate how these
models replicate experimental findings in neuroscience. In addition, the
models successfully cope with natural images and video sequences, e.g., grouping
configurations of input items into boundary representations, or estimating the motions
generated by opaque or transparent surfaces, the latter requiring the encoding and
processing of multiple motions at a single spatial position. Our modeling work may provide
a basis to tie biologically inspired models to approaches in computer vision and further
the development of neurotechnology for vision.
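The coupled-ODE framework mentioned above can be illustrated with a minimal rate-model sketch combining the three recurring principles (feedforward drive, modulatory feedback, divisive normalization). The equation, parameter values, and function name below are illustrative assumptions, not the authors' actual model:

```python
import numpy as np

def simulate_pool(inputs, feedback, dt=0.01, steps=500, tau=0.1, sigma=0.5):
    """Euler integration of a small pool of rate neurons:
        tau * du_i/dt = -u_i + input_i * (1 + feedback_i) / (sigma^2 + sum_j [u_j]_+)
    Feedback multiplicatively amplifies existing bottom-up drive, and the
    pooled activity in the denominator implements automatic gain control.
    """
    u = np.zeros_like(inputs, dtype=float)
    for _ in range(steps):
        drive = inputs * (1.0 + feedback)             # feedback modulates, never creates
        norm = sigma**2 + np.sum(np.maximum(u, 0.0))  # pooled activity for normalization
        u = u + dt * (-u + drive / norm) / tau
    return np.maximum(u, 0.0)
```

With equal inputs, the unit receiving feedback ends up with a proportionally larger share of the normalized response, while the pooled normalization keeps total activity bounded across contrast conditions.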
Neural fields models of visual areas: principles, successes, and caveats
I discuss how the notion of neural fields, a phenomenological averaged
description of spatially distributed populations of neurons, can be used to build models of
how visual information is represented and processed in the visual areas of primates. I
describe in a pedestrian way one of the basic principles of operation of these neural fields
equations which is closely connected to the idea of a bifurcation of their solutions. I then
apply this concept to several visual features (edges, textures, and motion) and show that it
can account very simply for a number of experimental facts as well as suggest new
experiments. I outline several outstanding open problems and sketch out briefly interesting
connections with computer vision and machine learning.
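The neural field equations alluded to are typically of the Wilson-Cowan/Amari type; a standard textbook form, not necessarily the exact variant used in the talk, is

```latex
\tau \frac{\partial u(x,t)}{\partial t} = -u(x,t)
  + \int_{\Omega} w(x,y)\, S\bigl(u(y,t)\bigr)\,\mathrm{d}y + I(x,t),
```

where u(x,t) is the averaged activity of the population at cortical position x, w the connectivity kernel, S a sigmoidal firing-rate function, and I the external (visual) input. The bifurcation referred to above occurs when, as a gain or input parameter is varied, the spatially homogeneous solution loses stability and spatially patterned solutions branch off.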
James A. Bednar
Visual cortex as a general-purpose information-processing device
Decades of work in neuroscience have established a set of basic principles of operation
and organization of neurons in the primary visual cortex (V1) of monkeys, cats, and other
higher mammals: (1) V1 neurons respond to localized patches of images, (2) responses are
specific to the orientation, spatial frequency, motion direction, color, horizontal
disparity, and eye of origin of a visual stimulus, (3) selectivity is preserved across wide
ranges of stimulus brightness and contrast, (4) neurons are arranged systematically across
the surface of the cortex with smooth mapping for each of these different
selectivities, and (5) neurons interact across the cortical surface in complex but
systematic ways, leading to surprising surround modulation effects and visual illusions.
In this talk I show how each of these properties can arise from general-purpose local
learning rules in relatively simple model neurons, with the goal of explaining how a
functioning, adaptive visual system can be constructed automatically from a simple
specification and a set of basic but biologically plausible primitives including
unsupervised Hebbian learning of afferent and lateral connections, homeostatic regulation
of excitability, and lateral interactions at each processing level. The model neurons are
initially unspecific and capable of processing any type of input data, but through a
general-purpose self-organization process driven by subcortical circuitry and the external
environment they become specialized for visual processing. Given the structural similarity
between V1 and other cortical areas, these results suggest that it may be possible to
devise a relatively simple architecture for building a high-performance system for
processing visual and other real-world data.
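The combination of unsupervised Hebbian learning with a homeostatic constraint can be sketched in a few lines; divisive weight normalization stands in here for the homeostatic regulation of excitability, and all names and parameter values are illustrative rather than taken from the model itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_step(w, x, lr=0.1):
    """One Hebbian update on a single model neuron's afferent weights,
    followed by divisive normalization (a crude homeostatic constraint
    that keeps the weight vector from growing without bound)."""
    y = np.dot(w, x)              # linear response of the model neuron
    w = w + lr * y * x            # Hebb: correlated activity strengthens
    return w / np.linalg.norm(w)  # homeostasis: fixed total synaptic strength

# Initially unspecific weights become selective for the dominant
# input pattern simply through repeated exposure.
w = rng.normal(size=3)
w /= np.linalg.norm(w)
pattern = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    w = hebbian_step(w, pattern + 0.1 * rng.normal(size=3))
```

After training, w points (up to sign) along the dominant input direction: the same principle by which feature selectivity can self-organize from the statistics of visual experience.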
Reading out the synaptic echoes of low-level perception
The field of neuromorphic computation has grown from the idea that
inspiration for future computational architectures can be gained from a better understanding
of information processing in biological neural networks. Information coding in our brain is
both digital, in terms of output spike timing, and analogue, produced by the slower,
subthreshold changes in membrane voltage resulting from the ongoing or evoked barrage of
synaptic inputs. These small and ever-changing voltage fluctuations in the membrane
potential of the single neuron control its excitability and, ultimately, the propagation of
information throughout the network. These echoes also signal, at any point in time, the
functional effective connectivity of the contextual network within which each cell is embedded.
The focus of this talk is to understand to what extent emerging macroscopic levels of
organisation in early sensory areas (orientation maps, Gestalt and motion flow related
percepts) can be predicted from more microscopic levels of neural integration (conductance
dynamics, synaptic integration and receptive field organization). I will center my
presentation on recent research from my group showing how the read-out of synaptic activity
in a single cell in the mammalian visual cortex can be used to predict global percepts
during low-level perception and extract generic principles of perceptual binding.
Work supported by CNRS, ANR (NatStats and V1-complex) and the European Community
Learning invariant feature hierarchies
Fast visual recognition in the mammalian cortex seems to be a hierarchical process by which
the representation of the visual world is transformed in multiple stages from low-level
retinotopic features to high-level, global and invariant features, and to object categories.
Every single step in this hierarchy seems to be subject to learning. How does the visual
cortex learn such hierarchical representations by just looking at the world? How could
computers learn such representations from data? Computer vision models that are weakly
inspired by the visual cortex will be described. A number of unsupervised learning
algorithms to train these models will be presented, which are based on the sparse
auto-encoder concept. The effectiveness of these algorithms for learning invariant
feature hierarchies will be demonstrated with a number of practical tasks such as scene
parsing, pedestrian detection, and object classification.
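The sparse auto-encoder concept can be illustrated with a tiny tied-weight version; the architecture, loss, and parameter values below are a generic sketch of the idea, not the specific models of the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

class SparseAutoencoder:
    """Minimal tied-weight auto-encoder trained with SGD on the loss
    0.5 * ||W.T @ relu(W @ x) - x||^2 + lam * ||relu(W @ x)||_1,
    i.e. reconstruction error plus an L1 sparsity penalty on the code."""

    def __init__(self, n_in, n_hidden, lam=0.01):
        self.W = 0.1 * rng.normal(size=(n_hidden, n_in))
        self.lam = lam

    def encode(self, x):
        return np.maximum(self.W @ x, 0.0)     # sparse rectified code

    def step(self, x, lr=0.05):
        h = self.encode(x)
        err = self.W.T @ h - x                 # reconstruction residual
        active = (h > 0).astype(float)         # relu gate for the encoder path
        grad = (np.outer(h, err)                        # decoder path
                + np.outer(active * (self.W @ err), x)  # encoder path
                + self.lam * np.outer(active, x))       # sparsity path
        self.W -= lr * grad
        return 0.5 * err @ err
```

Trained on image patches, such a code tends to become sparse while still reconstructing the input; stacking layers of this kind is one way to build the feature hierarchies referred to above.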
Event-based silicon retinas and applications
Conventional machine vision for the past 40 years has been based on sequences of image
frames that are pulled from the camera and then processed on computers.
Frame-based image sensors allow for tiny pixels and are highly evolved,
but they have fundamental drawbacks, including limited dynamic range, limited sampling rate,
and the necessity for expensive post-processing. Biology teaches us that the outputs of
the eye are asynchronously pushed to the brain in the form of digital spikes based on
local decisions involving spatiotemporal context. Recent developments in building
asynchronous vision sensors that offer this same form of spike output have shown that
they offer unique advantages in terms of latency, dynamic range, temporal resolution,
and especially post-processing cost. This talk will discuss these developments and
show demonstrations of the unique capabilities of a dynamic vision sensor silicon
retina for machine vision.
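The local decision a single event-based pixel makes can be sketched as follows; the threshold value and the sampled-intensity interface are simplifications of what a real dynamic vision sensor (which operates asynchronously in continuous time) actually does:

```python
import math

def dvs_pixel(samples, theta=0.15):
    """Emit ON (+1) / OFF (-1) events whenever the log-intensity at one
    pixel moves more than a contrast threshold theta away from the last
    event's reference level -- the core principle of a dynamic vision
    sensor, here heavily simplified to discrete samples."""
    events = []
    ref = math.log(samples[0])
    for t, intensity in enumerate(samples[1:], start=1):
        diff = math.log(intensity) - ref
        while abs(diff) >= theta:            # one event per threshold crossing
            polarity = 1 if diff > 0 else -1
            events.append((t, polarity))
            ref += polarity * theta          # reference tracks the signal
            diff = math.log(intensity) - ref
        # changes smaller than theta produce no output at all
    return events
```

Because the reference is logarithmic, the pixel responds to relative contrast rather than absolute brightness, which is where the wide dynamic range comes from; a static scene generates no data whatsoever.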
Spike-based Image Processing: Can we reproduce biological vision in hardware?
Over the past 15 years, we have developed software image processing systems
that attempt to reproduce the sorts of spike-based processing strategies used in biological vision.
The basic idea is that sophisticated visual processing can be achieved with a single wave of spikes
by using the relative timing of spikes in different neurons as an efficient code.
While software simulations are certainly an option, it is now becoming clear that it may well be
possible to reproduce the same sorts of ideas in dedicated hardware. First, several groups have now
developed spiking retina chips in which the pixel elements send the equivalent of spikes in response
to particular events such as increases or decreases in local luminance. Importantly, such chips
are fully asynchronous, allowing image processing to break free of the standard frame-based approach.
We have recently shown how simple neural network architectures can use the output of such dynamic
spiking retinas to perform sophisticated tasks by using a biologically inspired learning rule based
on Spike-Time Dependent Plasticity (STDP). Such systems can learn to detect meaningful patterns that
repeat in a purely unsupervised way. For example, after just a few minutes of training, a network
composed of a first layer of 60 neurons and a second layer of 10 neurons was able to form neurons
that could effectively count the number of cars going by on the different lanes of a freeway.
For the moment, this work has just used simulations. However, there is a real possibility that
the same processing strategies could be implemented in memristor-based hardware devices. If so,
it will become possible to build intelligent image processing systems capable of learning to
recognize significant events without the need for conventional computational hardware.
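A minimal form of the STDP rule referred to above uses exponential windows; the time constants and amplitudes below are illustrative, not those of the study:

```python
import math

def stdp_dw(delta_t, a_plus=0.05, a_minus=0.055, tau=20.0):
    """Weight change for a single pre/post spike pair.
    delta_t = t_post - t_pre in ms: pre-before-post (delta_t > 0)
    potentiates, post-before-pre (delta_t < 0) depresses, and the
    magnitude decays exponentially with the pairing interval."""
    if delta_t >= 0:
        return a_plus * math.exp(-delta_t / tau)   # causal pairing: potentiate
    return -a_minus * math.exp(delta_t / tau)      # acausal pairing: depress
```

Applied to the spike trains coming out of a silicon retina, a rule of this kind strengthens exactly those synapses whose inputs reliably precede a neuron's spike, which is how unsupervised detection of repeating patterns can emerge without any labels.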