Program of the day
Lessons from the primate visual system
The primate visual system can perform an amazing array of tasks, matched by the large portion
of the cortex devoted to analyzing retinal signals. It is potentially a source of inspiration for computer vision,
although, with a few exceptions, progress has been very slow. One profound problem is that we still do not have
an exhaustive list of what vision achieves in humans, and that experimental studies are generally restricted to
motion, a few object categories, and the control of a few actions such as reaching or saccades.
Here I will review how we integrated several experimental techniques to solve a question that arose from
interactions with computer vision scientists more than fifteen years ago: the extraction of 3D surfaces
(Orban GA, Annu Rev Neurosci 2011). This process is achieved by a new type of higher-order visual neuron:
the gradient-selective neuron. Neurons selective for speed gradients were first discovered in motion-
processing areas, such as MT/V5, MSTd, and FST. Subsequently, neurons selective for disparity gradients
were discovered in shape-processing areas, such as TEs and AIP. By combining these single-cell studies
with fMRI in humans and awake monkeys, we were able to localize similar neurons in human cortical areas.
In a second part I will address the present challenge of understanding the visual signals related to
the actions of conspecifics, which is perhaps the ultimate challenge of motion processing, yet receives
surprisingly little attention in vision. The understanding of actions exemplifies my claim that
visual signals, to be useful, have to leave the visual system: indeed, the signals related to biological
motion in the STS are sent to parietal regions involved in the control of diverse actions
(Nelissen et al., J Neurosci 2011), where they can be understood as actions.
Neural Mechanisms of Form and Motion Detection and Integration:
Biology meets Machine Vision
General-purpose vision systems, whether biological or technical, rely on the robust processing of
the visual data that impinges on the sensor array. Such systems need to adapt their
processing capabilities to varying conditions during image acquisition, have to deal
with noise, and need to learn task-relevant representations as a consequence of
functional adaptation. Here I describe our group's research on modeling mechanisms
of early and mid-level vision in primates for form and motion processing. In visual
cortex, core principles recur at various stages of the cortical hierarchy of areas,
namely (i) the bottom-up hierarchical processing along different stages building
representations with increasing feature selectivity and spatial scale,
(ii) the selective amplification, or modulation, of bottom-up activations in feature
representations by feedback that utilizes context information, and
(iii) the automatic gain control via center-surround competitive interaction and activity
normalization in a pool of neurons. These principles are interesting for designing
computational vision algorithms to gain improved flexibility and functionality
for, e.g., the self-adjustment of responsiveness to various feature contrast conditions
and the incorporation of higher-order object information and attention for feature
selection and enhancement. We define the dynamics of model neurons via coupled ordinary
differential equations. Within this framework we investigate the processing of static
form configurations as well as motion detection and integration. We demonstrate how these
models replicate experimental findings in neuroscience. In addition, the
models successfully cope with natural images and video sequences, e.g., grouping
configurations of input items into boundary representations, or estimating the motions
generated by opaque or transparent surfaces, the latter requiring the encoding and
processing of multiple motions at a single spatial position. Our modeling work may provide
a basis to tie biologically inspired models to approaches in computer vision and further
the development of neurotechnology for vision.
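The coupled-ODE framework mentioned above can be illustrated with a minimal rate-model sketch combining the three recurring principles (feedforward drive, modulatory feedback, divisive normalization). The equation, parameter values, and function name below are illustrative assumptions, not the authors' actual model:

```python
import numpy as np

def simulate_pool(inputs, feedback, dt=0.01, steps=500, tau=0.1, sigma=0.5):
    """Euler integration of a small pool of rate neurons:
        tau * du_i/dt = -u_i + input_i * (1 + feedback_i) / (sigma^2 + sum_j [u_j]_+)
    Feedback multiplicatively amplifies existing bottom-up drive, and the
    pooled activity in the denominator implements automatic gain control.
    """
    u = np.zeros_like(inputs, dtype=float)
    for _ in range(steps):
        drive = inputs * (1.0 + feedback)             # feedback modulates, never creates
        norm = sigma**2 + np.sum(np.maximum(u, 0.0))  # pooled activity for normalization
        u = u + dt * (-u + drive / norm) / tau
    return np.maximum(u, 0.0)
```

With equal inputs, the unit receiving feedback ends up with a proportionally larger share of the normalized response, while the pooled normalization keeps total activity bounded across contrast conditions.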
Neural fields models of visual areas: principles, successes, and caveats
I discuss how the notion of neural fields, a phenomenological averaged
description of spatially distributed populations of neurons, can be used to build models of
how visual information is represented and processed in the visual areas of primates. I
describe in a pedestrian way one of the basic principles of operation of these neural fields
equations which is closely connected to the idea of a bifurcation of their solutions. I then
apply this concept to several visual features (edges, textures, and motion) and show that it
can account very simply for a number of experimental facts as well as suggest new
experiments. I outline several outstanding open problems and sketch out briefly interesting
connections with computer vision and machine learning.
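The neural field equations alluded to are typically of the Wilson-Cowan/Amari type; a standard textbook form, not necessarily the exact variant used in the talk, is

```latex
\tau \frac{\partial u(x,t)}{\partial t} = -u(x,t)
  + \int_{\Omega} w(x,y)\, S\bigl(u(y,t)\bigr)\,\mathrm{d}y + I(x,t),
```

where u(x,t) is the averaged activity of the population at cortical position x, w the connectivity kernel, S a sigmoidal firing-rate function, and I the external (visual) input. The bifurcation referred to above occurs when, as a gain or input parameter is varied, the spatially homogeneous solution loses stability and spatially patterned solutions branch off.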
James A. Bednar
Visual cortex as a general-purpose information-processing device
Decades of work in neuroscience have established a set of basic principles of operation
and organization of neurons in the primary visual cortex (V1) of monkeys, cats, and other
higher mammals: (1) V1 neurons respond to localized patches of images, (2) responses are
specific to the orientation, spatial frequency, motion direction, color, horizontal
disparity, and eye of origin of a visual stimulus, (3) selectivity is preserved across wide
ranges of stimulus brightness and contrast, (4) neurons are arranged systematically across
the surface of the cortex with smooth mapping for each of these different
selectivities, and (5) neurons interact across the cortical surface in complex but
systematic ways, leading to surprising surround modulation effects and visual illusions.
In this talk I show how each of these properties can arise from general-purpose local
learning rules in relatively simple model neurons, with the goal of explaining how a
functioning, adaptive visual system can be constructed automatically from a simple
specification and a set of basic but biologically plausible primitives including
unsupervised Hebbian learning of afferent and lateral connections, homeostatic regulation
of excitability, and lateral interactions at each processing level. The model neurons are
initially unspecific and capable of processing any type of input data, but through a
general-purpose self-organization process driven by subcortical circuitry and the external
environment they become specialized for visual processing. Given the structural similarity
between V1 and other cortical areas, these results suggest that it may be possible to
devise a relatively simple architecture for building a high-performance system for
processing visual and other real-world data.
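The combination of unsupervised Hebbian learning with a homeostatic constraint can be sketched in a few lines; divisive weight normalization stands in here for the homeostatic regulation of excitability, and all names and parameter values are illustrative rather than taken from the model itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_step(w, x, lr=0.1):
    """One Hebbian update on a single model neuron's afferent weights,
    followed by divisive normalization (a crude homeostatic constraint
    that keeps the weight vector from growing without bound)."""
    y = np.dot(w, x)              # linear response of the model neuron
    w = w + lr * y * x            # Hebb: correlated activity strengthens
    return w / np.linalg.norm(w)  # homeostasis: fixed total synaptic strength

# Initially unspecific weights become selective for the dominant
# input pattern simply through repeated exposure.
w = rng.normal(size=3)
w /= np.linalg.norm(w)
pattern = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    w = hebbian_step(w, pattern + 0.1 * rng.normal(size=3))
```

After training, w points (up to sign) along the dominant input direction: the same principle by which feature selectivity can self-organize from the statistics of visual experience.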
Reading out the synaptic echoes of low-level perception
The field of neuromorphic computation has grown from the idea that
inspiration for future computational architectures can be gained from a better understanding
of information processing in biological neural networks. Information coding in our brain is
both digital, in terms of output spike timing, and analogue, produced by the slower,
subthreshold changes in membrane voltage resulting from the ongoing or evoked barrage of
synaptic inputs. These small and ever-changing voltage fluctuations in the membrane
potential of the single neuron control its excitability and, ultimately, the propagation of
information throughout the network. These echoes also signal, at any point in time, the
functional effective connectivity of the contextual network within which each cell is embedded.
The focus of this talk is to understand to what extent emerging macroscopic levels of
organisation in early sensory areas (orientation maps, Gestalt and motion flow related
percepts) can be predicted from more microscopic levels of neural integration (conductance
dynamics, synaptic integration and receptive field organization). I will center my
presentation on recent research from my group showing how the read-out of synaptic activity
in a single cell in the mammalian visual cortex can be used to predict global percepts
during low-level perception and extract generic principles of perceptual binding.
Work supported by CNRS, ANR (NatStats and V1-complex) and the European Community
Learning invariant feature hierarchies
Fast visual recognition in the mammalian cortex seems to be a hierarchical process by which
the representation of the visual world is transformed in multiple stages from low-level
retinotopic features to high-level, global and invariant features, and to object categories.
Every single step in this hierarchy seems to be subject to learning. How does the visual
cortex learn such hierarchical representations by just looking at the world? How could
computers learn such representations from data? Computer vision models that are weakly
inspired by the visual cortex will be described. A number of unsupervised learning
algorithms to train these models will be presented, which are based on the sparse
auto-encoder concept. The effectiveness of these algorithms for learning invariant
feature hierarchies will be demonstrated with a number of practical tasks such as scene
parsing, pedestrian detection, and object classification.
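The sparse auto-encoder concept can be illustrated with a tiny tied-weight version; the architecture, loss, and parameter values below are a generic sketch of the idea, not the specific models of the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

class SparseAutoencoder:
    """Minimal tied-weight auto-encoder trained with SGD on the loss
    0.5 * ||W.T @ relu(W @ x) - x||^2 + lam * ||relu(W @ x)||_1,
    i.e. reconstruction error plus an L1 sparsity penalty on the code."""

    def __init__(self, n_in, n_hidden, lam=0.01):
        self.W = 0.1 * rng.normal(size=(n_hidden, n_in))
        self.lam = lam

    def encode(self, x):
        return np.maximum(self.W @ x, 0.0)     # sparse rectified code

    def step(self, x, lr=0.05):
        h = self.encode(x)
        err = self.W.T @ h - x                 # reconstruction residual
        active = (h > 0).astype(float)         # relu gate for the encoder path
        grad = (np.outer(h, err)                        # decoder path
                + np.outer(active * (self.W @ err), x)  # encoder path
                + self.lam * np.outer(active, x))       # sparsity path
        self.W -= lr * grad
        return 0.5 * err @ err
```

Trained on image patches, such a code tends to become sparse while still reconstructing the input; stacking layers of this kind is one way to build the feature hierarchies referred to above.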
Event-based silicon retinas and applications
Conventional machine vision for the past 40 years has been based on sequences of image
frames that are pulled from the camera and then processed on computers.
Frame-based image sensors allow for tiny pixels and are highly evolved,
but they have fundamental drawbacks, including limited dynamic range, limited sampling rate,
and the necessity for expensive post-processing. Biology teaches us that the outputs of
the eye are asynchronously pushed to the brain in the form of digital spikes based on
local decisions involving spatiotemporal context. Recent developments in building
asynchronous vision sensors that offer this same form of spike output have shown that
they offer unique advantages in terms of latency, dynamic range, temporal resolution,
and especially post-processing cost. This talk will discuss these developments and
show demonstrations of the unique capabilities of a dynamic vision sensor silicon
retina for machine vision.
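The local decision a single event-based pixel makes can be sketched as follows; the threshold value and the sampled-intensity interface are simplifications of what a real dynamic vision sensor (which operates asynchronously in continuous time) actually does:

```python
import math

def dvs_pixel(samples, theta=0.15):
    """Emit ON (+1) / OFF (-1) events whenever the log-intensity at one
    pixel moves more than a contrast threshold theta away from the last
    event's reference level -- the core principle of a dynamic vision
    sensor, here heavily simplified to discrete samples."""
    events = []
    ref = math.log(samples[0])
    for t, intensity in enumerate(samples[1:], start=1):
        diff = math.log(intensity) - ref
        while abs(diff) >= theta:            # one event per threshold crossing
            polarity = 1 if diff > 0 else -1
            events.append((t, polarity))
            ref += polarity * theta          # reference tracks the signal
            diff = math.log(intensity) - ref
        # changes smaller than theta produce no output at all
    return events
```

Because the reference is logarithmic, the pixel responds to relative contrast rather than absolute brightness, which is where the wide dynamic range comes from; a static scene generates no data whatsoever.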
Spike-based Image Processing: Can we reproduce biological vision in hardware?
Over the past 15 years, we have developed software image processing systems
that attempt to reproduce the sorts of spike-based processing strategies used in biological vision.
The basic idea is that sophisticated visual processing can be achieved with a single wave of spikes
by using the relative timing of spikes in different neurons as an efficient code.
While software simulations are certainly an option, it is now becoming clear that it may well be
possible to reproduce the same sorts of ideas in dedicated hardware. First, several groups have now
developed spiking retina chips in which the pixel elements send the equivalent of spikes in response
to particular events such as increases or decreases in local luminance. Importantly, such chips
are fully asynchronous, allowing image processing to break free of the standard frame-based approach.
We have recently shown how simple neural network architectures can use the output of such dynamic
spiking retinas to perform sophisticated tasks by using a biologically inspired learning rule based
on Spike-Time Dependent Plasticity (STDP). Such systems can learn to detect meaningful patterns that
repeat in a purely unsupervised way. For example, after just a few minutes of training, a network
composed of a first layer of 60 neurons and a second layer of 10 neurons was able to form neurons
that could effectively count the number of cars going by on the different lanes of a freeway.
For the moment, this work has just used simulations. However, there is a real possibility that
the same processing strategies could be implemented in memristor-based hardware devices. If so,
it will become possible to build intelligent image processing systems capable of learning to
recognize significant events without the need for conventional computational hardware.
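A minimal form of the STDP rule referred to above uses exponential windows; the time constants and amplitudes below are illustrative, not those of the study:

```python
import math

def stdp_dw(delta_t, a_plus=0.05, a_minus=0.055, tau=20.0):
    """Weight change for a single pre/post spike pair.
    delta_t = t_post - t_pre in ms: pre-before-post (delta_t > 0)
    potentiates, post-before-pre (delta_t < 0) depresses, and the
    magnitude decays exponentially with the pairing interval."""
    if delta_t >= 0:
        return a_plus * math.exp(-delta_t / tau)   # causal pairing: potentiate
    return -a_minus * math.exp(delta_t / tau)      # acausal pairing: depress
```

Applied to the spike trains coming out of a silicon retina, a rule of this kind strengthens exactly those synapses whose inputs reliably precede a neuron's spike, which is how unsupervised detection of repeating patterns can emerge without any labels.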