Direction
des Relations et Internationales (DRI)
Programme
INRIA "Equipes Associées"
I. DEFINITION
EQUIPE ASSOCIEE
|
TGDA |
sélection |
2008 |
Equipe-Projet INRIA : Géometrica |
Organisme étranger partenaire : Stanford University |
Centre de recherche INRIA : Futurs / Saclay
Thème INRIA : SYM (systèmes symboliques) |
Pays : USA |
|
Coordinateur
français |
Coordinateur
étranger |
Nom, prénom |
OUDOT, Steve Y. |
GUIBAS, Leonidas J. |
Grade/statut |
CR / Chargé de Recherches |
Full Professor |
Organisme d'appartenance
(précisez le département et/ou le laboratoire) |
INRIA Futurs |
Stanford University, Computer Science Department |
Adresse postale |
Parc Orsay Université
ZAC des Vignes
4, rue Jacques Monod - Bât. P
91893 ORSAY Cedex, France |
Computer Science Department
Gates Building
Stanford University
Stanford, CA 94305, USA |
URL |
tba |
http://geometry.stanford.edu/member/guibas/ |
Téléphone |
(+33) 174 854 216 |
(+1) 650 723 0304 |
Télécopie |
(+33) 174 854 249 |
(+1) 650 723 0033 |
Courriel |
steve.oudot[at]inria.fr |
guibas[at]cs.stanford.edu |
La proposition
en bref / Overview of the Proposal
Collaboration title: Topological and Geometric Data Analysis / Analyse topologique et géométrique de données
|
Description (about
10 lines) :
TGDA is an associate team between the Géometrica group at
INRIA and the Geometric Computation group at Stanford University,
focused on the topological and geometric analysis of possibly
high-dimensional point cloud data that are difficult to handle with
standard linear or non-linear dimensionality reduction
techniques. Such data appear in as diverse areas as machine
learning, dynamical systems, structural biology, image processing, or
sensor networks. They are typically obtained by sampling from
highly-curved manifolds with non-trivial topology, embedded in
high-dimensional Euclidean spaces. Our main goal will be to devise
suitable methods to handle this type of data, based on recent
advances in computational geometry and algebraic topology, including
(but not restricted to) persistent homology, witness complexes, and
distance functions. Both the Géometrica group at INRIA and the
Geometric Computation group at Stanford University have actively
contributed to these advances in the recent years, and we think their
collaboration is now mature enough to take advantage of the associate
team framework. To validate our approach, we intend to set our
methods against real-life data sets coming from other scientific
fields, thereby developing new collaborations with other groups at
INRIA and at Stanford University.
|
II. BILAN 2008
Rapport
scientifique de l'année 2008
The research program for 2008 is accessible here
Throughout the first year of existence of TGDA, priority has been
given to the development of new collaborations between the
two research groups. The aim was to create opportunities for members
from both groups to interact with one another. From a practical point
of view, this strategy encouraged visits and student exchanges
greatly. From a scientific point of view, it created a real synergy
between the two groups, whose respective expertises in the area of
point cloud data analysis turned out to be quite complementary. In
terms of scientific outcomes, emphasis has been put on the following
theoretical questions raised by or closely related to the 2008
research program:
- Revisiting the stability of persistence diagrams.
Persistent homology provides an effective way of encoding the
qualitative and quantitative behavior of real-valued functions defined
over topological spaces. This encoding, known as persistence diagram
or persistence barcode, has been extensively used in the context of
topological data analysis, where its stability properties enable to
infer robust topological information on the input data sets. However,
to this date, the stability of persistence diagrams was proven only
for continuous functions defined over triangulable spaces, which
restricted greatly its range of applications. Using a purely algebraic
approach, Frédéric Chazal, Steve Oudot and David Cohen-Steiner
collaborated with Leonidas Guibas to extend the notion of persistence
diagrams to a broader context where functions may be discontinuous and
spaces non-triangulable. Within this framework, they proved new
stability results for persistence diagrams that open the gate to many
more applications of the concept of persistence. Two of them are given
below. This work is detailed in the INRIA research
report RR-6568.
- Analyzing Scalar Fields defined over sampled spaces. Given
a real-valued function f defined over some metric
space X, is it possible to recover some structural information
about f from the sole information of its values at a finite
set L of points sampled from X, whose pairwise geodesic
distances in X are given? Taking advantage of the extended
stability results for persistence diagrams mentioned above, Frédéric
Chazal and Steve Oudot collaborated with Leonidas Guibas and his
student Primoz Skraba to introduce a novel algebraic construction, based
on a pair of nested families of simplicial complexes built on top of
the point cloud L, from which the persistence diagram
of f can be faithfully approximated. They also derived from
this construction a series of algorithms for the analysis of scalar
fields from point cloud data. These algorithms are simple and easy to
implement, have reasonable complexities, and come with theoretical
guarantees. This work is to be presented at the next venue of the
SIAM Symposium on Discrete Algorithms (SODA'09). The full version is
available as INRIA research
report RR-6576.
- Defining Gromov-Hausdorff stable descriptors for finite metric
spaces. In the context of shape segmentation and classification,
it is important to rely on local or global descriptors that are stable
under small perturbations of the shape, but also under rigid or
almost-rigid transforms. More generally, it is desirable to have
descriptors that remain stable under Gromov-Hausdorff perturbations of
the data. Since this data is usually given under the form of a point
cloud sampling the underlying shape, the case of finite metric spaces
is of practical interest. Motivated by applications in shape matching,
Frédéric Chazal, Steve Oudot and David Cohen-Steiner collaborated with
Leonidas Guibas and Facundo Memoli (a post-doctoral fellow in the
Mathematics Department at Stanford University) to define new
descriptors for finite metric spaces and prove their
stability. Roughly speaking, these descriptors are the persistence
diagrams of a set of intrinsic functions defined over the
metric space. By intrinsic is meant that the definition of the
functions depends only on the intrinsic pairwise distances between the
points, and therefore it does not vary much under small
Gromov-Hausdorff perturbations of the space. This work is still under
progress and is expected to lead to a research report around early
2009.
- Estimating integral curvature tensors and extracting features
from point cloud data. Learning the geometric properties of a
shape from a finite sampling is at the core of our project. In 2007,
Quentin Mérigot and his advisors Frédéric Chazal and David
Cohen-Steiner devised a Monte-Carlo approach to the estimation of
curvature measures of compact sets from finite point samples. During
his stay at Stanford, Quentin collaborated with Maksim Ovsjanikovs, a
student from the Geometric Computation group, to develop a variant of
this approach that can not only estimate curvature measures, but also
curvature tensors. This enables for instance to recover the second
fundamental form of a surface in 3-D from a finite point sampling,
without the requirement that an intermediate mesh of the surface be
built. The method comes with theoretical guarantees, and it is
efficient enough to be applied to real-life data sets. This work is
detailed in a paper submitted to Eurographics 2009.
In parallel, some members of TGDA have investigated more algorithmic
questions related to the research project, among which the following
one is of major practical interest:
- Memory-efficient persistent homology computation. The
persistence algorithm, originally designed by Edelsbrunner, Letscher
and Zomorodian, is a commonly used tool in data analysis techniques,
including ours. Given a nested
family {Fi}i of simplicial complexes,
called a filtration, the algorithm can compute the persistence diagram
associated with {Fi}i. From a purely
algorithmic point of view, it can be viewed as a mere Gaussian
elimination on the matrix of the boundary operator that maps every
k-dimensional simplex of {Fi}i to
a formal alternate sum of (k-1)-dimensional simplices
representing its boundary. Although the complexity of a Gaussian
elimination can be up to cubic in the number of columns and rows of
the matrix (i.e. in the total number of simplices
in {Fi}i), it has been observed that the
persistence algorithm has an almost-linear behavior on most practical
data sets. As a consequence, the bottleneck of the approach is not so
much its time complexity as its space complexity, since the sizes of
the simplicial complexes considered can be huge. Steve Oudot and
Primoz Skraba designed a memory-efficient variant of the persistence
algorithm, where the idea is to process the matrix of the boundary
operator by blocks when doing the Gaussian elimination. Their
algorithm satisfies the invariant that no more than two blocks are
loaded in main memory at the same time, so that main memory usage can
be effectively controlled, while the overall time complexity remains
roughly the same. The practical impact of this new approach is
expected to be huge, since it allows to build much bigger families of
simplicial complexes, and therefore to study much larger data sets.
The code, written in C++, has been tested and optimized. It should be
integrated to
the PLEX
library in 2009. A draft with all the proofs of correctness and
complexity bounds is currently being written: it is expected to be
turned into a research report around early 2009.
Rapport financier 2008
Certaines dépenses sont encore prévisionnelles à
ce stade. Leurs montants exacts seront ajustés en fonction du budget
restant sur TGDA.
1. Dépenses EA (effectuées sur les crédits de l'Equipe Associée) |
Montant dépensé |
Invitations des partenaires |
11,626.22 |
Missions INRIA | 8,517.72 |
Total
| 20,143.94 |
2. Dépenses externes (effectuées
sur des financements hors EA) |
Montant dépensé |
Région PACA: |
Invitations des partenaires |
0 |
Missions INRIA vers le partenaire |
1,800 (bourse de 600 euros/mois allouée à Quentin Mérigot) |
Total
| 1,800 |
NSF grants: |
Invitations des partenaires |
0 |
Missions INRIA vers le partenaire |
18,000 |
Total
| 18,000 |
NSF REUSSI program: |
Invitations des partenaires |
1,500 |
Missions INRIA vers le partenaire |
0 |
Total
| 1,500 |
DRI (INRIA): |
Invitations des partenaires |
1,500 |
Missions INRIA vers le partenaire |
0 |
Total
| 1,500 |
Total des financements externes dépensés
|
22,800
|
Total des financements EA et externes dépensés
|
42,943.94 |
Bilan des échanges effectués en 2008
As both parties were willing to develop new collaborations, exchanges
within the TGDA project have been pretty numerous and productive
during this first year, both at junior and senior levels:
-
Steve Oudot spent 5 months at Stanford University as a visiting
scientist. During this time, he coordinated with Frédéric Chazal the
works on extended stability results for persistence diagrams and on
the analysis of scalar fields from point cloud data. He also initiated
the work on memory-efficient persistence computation with Primoz
Skraba. In parallel, he started a collaboration with Aneesh Sharma and
David Arthur, two PhD students in the Theoretical Computer Science
laboratory at Stanford, on effective reverse proximity queries in high
dimensions, to be completed in 2009.
- Quentin Mérigot spent 3 months at Stanford University as a
visiting student, during which he collaborated with Maksim Ovsjanikovs
on the estimation of curvature tensors from point cloud data.
- Frédéric Chazal and David Cohen-Steiner visited Stanford for a
couple of weeks in July, during which the collaboration with Leonidas
Guibas and Facundo Memoli on Gromov-Hausdorff stability started.
- Leonidas Guibas will visit INRIA for a week in November. He will
also give a talk at the workshop Emerging Trends in Visual
Computing held at the École Polytechnique. This will be an
opportunity for having a high-level discussion about longer-term
aspects of our collaboration.
- Since early October and until mid-December, Maksim Ovsjanikovs
and Primoz Skraba are visiting students at INRIA. The purpose of their
visits is to initiate the work on various aspects of our research
program for 2009.
After his visit, Primoz Skraba will stay in the Geometrica group
for a year as a post-doctoral fellow.
1. Chercheurs Seniors
Nom |
statut (1) |
provenance |
destination |
objet (2) |
durée (3) |
Coût (si financement EA) |
Coût (si financement externe) |
Steve Oudot |
CR |
INRIA |
Stanford |
visite |
5 mois |
1,069.79 |
13,760 |
Steve Oudot |
CR |
Stanford |
INRIA |
réunion de travail |
7 jours |
1,156.19 |
0 |
Frédéric Chazal |
DR |
INRIA |
Stanford |
visite |
14 jours |
1,166.14 |
1,600 |
David Cohen-Steiner |
CR |
Duke |
Stanford |
visite |
7 jours |
500 |
840 |
Leonidas Guibas |
Professeur |
ETH |
INRIA |
visite |
7 jours |
1,200 |
0 |
Steve Oudot |
CR |
INRIA |
New York |
conférence SODA |
7 jours |
1,900 |
0 |
Total des durées |
6 mois 11 jours |
(1) DR / CR / professeur
(2) colloque, thèse, stage, visite....
(3) précisez l'unité (mois, semaine..)
2. Juniors
Nom |
statut (1) |
provenance |
destination |
objet (2) |
durée (3) |
Coût (si financement EA) |
Coût (si financement externe) |
Quentin Mérigot |
doctorant |
INRIA |
Stanford |
visite |
3 mois |
2,725.60 |
3,600 |
Maksim Ovsjanikovs |
doctorant |
Stanford |
INRIA |
visite |
2 mois et demi |
3,750 |
3,000 |
Primoz Skraba |
doctorant |
Stanford |
INRIA |
visite |
2 mois et demi |
6,676.22 |
0 |
(1) post-doc / doctorant / stagiaire
(2) colloque, thèse, stage, visite....
(3) précisez l'unité (mois, semaine..)
III. PREVISIONS 2009
Programme
de travail
While the collaboration between the two research groups was initiated
around the more theoretical aspects of the 2008 research project, a
great deal of our efforts in 2009 will be geared towards more
practical questions, as we believe the tools developped in 2008 have a
high potential for applications. To this end, we will need to develop
methods that can perform some of the steps of our algorithms
efficiently, including finding reverse nearest neighbors to update our
data structures, or computing persistent homology to get their
associated persistence barcodes. Meanwhile, we intend to pursue our
efforts towards developing a geometric and topological inference
theory for sampled spaces, in order to strengthen the positions of
both groups as leaders in the area.
1. Applications
- Persistence-based clustering. One of the by-products of our
scalar field analysis approach is a very simple technique for
segmenting a given point cloud according to the basins of attraction
of the maxima of some given function. One potential application of
this technique is in unsupervised learning, where it could cluster the
data points according to the basins of attraction of the underlying
density function, provided that the latter can be estimated
reliably. Although this approach turns out to be very sensitive to
noise in general, it is not in our case thanks to the use of
topological persistence, which enables to detect which peaks are more
prominent and how the basins of attractions of the other, less
prominent, peaks must be merged. As a result, guarantees on the number
of clusters obtained can be derived. We intend to investigate this
direction further and determine which density estimators would be most
relevant to plug into our scalar field analyzer, depending on the
nature of the input data.
- Shape matching using Gromov-Hausdorff stable signatures.
We believe our work on designing signatures for shapes that are stable
under small Gromov-Hausdorff perturbations is likely to have
applications in the context of shape matching and retrieval, where the
goal is to find among a database of shapes the one that matches best
with a given query shape. One of the major difficulties of this
problem stems from the fact that two objects belonging to a same class
may differ from each other, or they may lie in different coordinates
systems, or both. It is therefore important to compute shape
descriptors that are stable under small perturbations of the objects,
as well as under rigid or almost-rigid transforms. By describing each
object o as a cloud of points sampled from o together
with their pairwise geodesic distances within o, we use an
object representation that does not require any mesh nor any prior
knowledge of the dimension or geometric regularity of o. We
are now interested in studying the effect of our Gromov-Hausdorff
stable descriptors on the quality of the matchings.
- Shape segmentation combining curvature tensor estimation and
scalar fields analysis. A somewhat related problem is to partition
a sampled shape into natural parts, without any prior knowledge
of the intrinsic structure of the shape. Building upon the above idea
of partitioning a point cloud according to the basins of attraction of
the maxima or minima of a given function, we want to design functions
over point clouds whose basins of attraction will coincide with
the natural parts of the underlying objects. With the special
case of CAD models in mind, our idea is to use the curvature tensor
estimator designed by Quentin Mérigot and Maksim Ovsjanikovs to derive
a scalar field whose local minima will be located at the center of the
faces of the sampled model, so that the output clustering of the point
cloud will match with the segmentation of the model into its various
faces.
2. Algorithmic aspects
- Distributed persistence calculation. The memory-efficient
variant of the persistence algorithm, designed by Primoz Skraba and
Steve Oudot, could potentially lead to a fully parallel method for
computing persistence. Indeed, it turns out that a large number of
blocks in the matrix of the boundary operator can actually be
processed independently from each other, and could therefore be sent
to different processing units. However, there remains to find an
efficient way of gathering and combining the results obtained
separately on different blocks. We intend to investigate this question
and see whether the computations can be distributed in such a way that
the algorithm can be run on a cluster of PCs, and ultimately on a
wireless sensor network.
- Reverse nearest neighbor search based on Locality-Sensitive
Hashing. The algebraic constructions used in topological data
analysis algorithms mostly rely on comparisons between distances,
simply because more elaborate geometric predicates tend to be
difficult (if at all possible) to implement in high dimensions. An
example of commonly used algebraic construction is the so-called
witness complex: given an input point cloud P and a
subset Q of the points of P, the witness complex is
built by comparing the distances of the points of P to all the
points of Q. This resorts to making a linear number of
proximity queries and reverse proximity queries on the data points. It
is therefore important to pre-process these points so as to optimize
the query time. While a lot of attention has been devoted to the
design of provably effective methods for nearest neighbor search in
high dimensions, the reverse problem has hardly been considered from a
theoretical point of view. The collaboration initiated between Steve
Oudot, Aneesh Sharma and David Arthur will constitute a first step
towards filling in this gap.
3. Theoretical aspects
- Zig-Zag persistence. Gunnar Carlsson and Vin de Silva
recently generalized the theory of persistent homology (which applies
to the behaviour of filtrations, i.e. nested 1-parameter families of
spaces) to a more general theory of zig-zag persistence, which
applies to the behaviour of arbitrary 1-parameter families of
spaces. One of the main goals of Vin de Silva's 6-months visit to
INRIA in 2009 will be to collaborate with the Geometrica group in
developing the geometric side of this theory, for instance by
providing new topological or geometric approximation results based on
more elaborate algebraic constructions than filtrations. At length,
these more elaborate constructions may enable to capture more subtle
topological or geometric features of the sampled spaces.
- Manifold reconstruction. One aspect of our
initial research program was to develop the theoretical and
algorithmic aspects of manifold reconstruction from point
samples. Here, the goal is not only to infer topological invariants or
geometric features of the manifold M underlying the input point
cloud, but also to build a faithful piecewise-linear representation
of M. Although there is a rich literature on the subject in 2-d
and 3-d, the higher-dimensional case has hardly been investigated so
far, despite its potential applications in high-dimensional data
rendering and processing. Building on previous work by Frédéric Chazal
and Steve Oudot on the use of persistence in the context of
reconstruction, we intend to push this aspect of the project further
in 2009.
Programme d'échanges avec
budget prévisionnel
1. Echanges
Interactions and collaborations within the TGDA project have been very
active throughout 2008. We will try
our best to keep them at a comparable level in 2009:
- Quentin Mérigot will spend another 2 months at Stanford, to
pursue his work with Maksim Ovsjanikovs on shape segmentation. His
local expenses will be covered in part by the PACA region.
- Another PhD student from Geometrica will spend 3 months in
Stanford, to reinforce the collaboration at junior level.
- Primoz Skraba, who will hold a post-doc position at INRIA in 2009,
will go back to Stanford for a short visit at some point, in order to
maintain his scientific connections there (in particular within the Geometric
Computation and Applied Topology groups).
- Symmetrically, Geometrica is willing to invite one to two PhD
students from the Geometric Computation group, each for a period of 3
months.
- In addition, Dmitriy Morozov, a former PhD student at Duke who
applied to post-doc positions both at INRIA and at Stanford in 2008
and finally chose to go to Stanford, is willing to spend some time in
the Geometrica group. He is therefore likely to make one or two short
visits (1 week each), or one longer visit (1 month).
- At senior level, Steve Oudot will make another 4-to-6-weeks visit
at Stanford, Frédéric Chazal another 2-weeks visit, and David
Cohen-Steiner another 1-week visit. In addition, Jean-Daniel
Boissonnat will make a 3-weeks visit.
- Symmetrically, Leonidas Guibas will visit INRIA for 2 to 3
weeks. Also, Gunnar Carlsson (and perhaps other senior researchers
involved in the Topological Data Analysis project at Stanford) will
come to INRIA for 1 week.
In addition to the above exchanges, Vin de Silva will visit the
Geometrica group for a 6-months period, from January to June 2009 (his
expenses will be covered by other grants). His visit will strengthen
the associate team and help greatly in the investigation of the more
theoretical aspects of our research program. Moreover, as one of the
pioneers of the Computational Topology
library PLEX,
developped and maintained by
the Applied Topology group
at Stanford, Vin de Silva will act as a cornerstone in the
establishment of a long-term collaboration between this group and TGDA
around the development of the library, which in the longer run is
likely to become a reference in the field.
Finally, TGDA intends to organize a mini-workshop, to be held in
Saclay and/or Paris around early July 2009. This event is motivated by
the promising results that were obtained during the first year of the
collaboration, and by their expected outcomes in terms of
applications. The format of the workshop is still to be discussed, but
organizers agree on a 2+1 days structure: 2 days for formal talks and
demos to a broad audience, plus 1 day for internal discussions among
the members of TGDA about future directions and code development
issues.
1. ESTIMATION
DES DÉPENSES EN MISSIONS INRIA VERS LE PARTENAIRE |
Nombre de personnes
|
Coût estimé
|
Chercheurs confirmés |
4 |
12,000 |
Post-doctorants
|
1 |
1,500 |
Doctorants |
2 |
10,000 |
Stagiaires
|
|
|
Autre (précisez) :
|
|
|
Total
|
|
23,500 |
2. ESTIMATION
DES DÉPENSES EN INVITATIONS DES PARTENAIRES |
Nombre de personnes
|
Coût estimé
|
Chercheurs confirmés |
2 |
6,000 |
Post-doctorants
|
1 |
2,500 |
Doctorants |
2 |
10,000 |
Stagiaires
|
|
|
Mini-workshop :
|
logistique sur place + invitations |
10,000 |
Total
|
|
28,500 |
2. Cofinancement
- The PACA region will provide around 1,000 euros for
Quentin Mérigot's visit to Stanford,
- The France-Stanford Funds will provide 9,000 dollars (6,000
euros) for the mini-workshop,
- The Geometric Computation group will provide funds for up to 20,000
euros to cover local expenses of visitors from INRIA, an amount that
is comparable to the one provided in 2008,
- Concerning the visits of PhD students to INRIA, we
hope to be able to benefit from the REUSSI program at NSF and/or the
Intership program at INRIA, which should reduce the cost of these
visits by 5,000 euros.
3. Demande
budgétaire
Indiquez, dans le tableau ci-dessous, le coût global estimé de
la proposition et le budget demandé à la DRI dans le cadre de
cette Equipe Associée.
(maximum
20 K€ pour une prolongation en 2e année et 10 K€ pour une
3e année).
Commentaires
|
Montant
|
A. Coût global de la proposition (total
des tableaux 1 et 2 : invitations, missions, ...) |
52,000 |
B. Cofinancements utilisés (financements
autres que Equipe Associée) |
32,000 |
Financement "Équipe Associée" demandé (A.-B.)
(maximum
20 K€ pour une 2e année et 10 K€ pour une 3e année)
|
20,000 |
© INRIA - mise à jour le 16/10/2008