Direction des Relations et Internationales (DRI)

Programme INRIA "Equipes Associées"

I. DEFINITION

EQUIPE ASSOCIEE	TGDA
sélection	2008

Equipe-Projet INRIA : Géometrica	Organisme étranger partenaire : Stanford University
Centre de recherche INRIA : Futurs / Saclay Thème INRIA : SYM (systèmes symboliques)	Pays : USA

	Coordinateur français	Coordinateur étranger
Nom, prénom	OUDOT, Steve Y.	GUIBAS, Leonidas J.
Grade/statut	CR / Chargé de Recherches	Full Professor
Organisme d'appartenance (précisez le département et/ou le laboratoire)	INRIA Futurs	Stanford University, Computer Science Department
Adresse postale	Parc Orsay Université ZAC des Vignes 4, rue Jacques Monod - Bât. P 91893 ORSAY Cedex, France	Computer Science Department Gates Building Stanford University Stanford, CA 94305, USA
URL	tba	http://geometry.stanford.edu/member/guibas/
Téléphone	(+33) 174 854 216	(+1) 650 723 0304
Télécopie	(+33) 174 854 249	(+1) 650 723 0033
Courriel	steve.oudot[at]inria.fr	guibas[at]cs.stanford.edu

La proposition en bref / Overview of the Proposal

Collaboration title: Topological and Geometric Data Analysis / Analyse topologique et géométrique de données

Description (about 10 lines) :

TGDA is an associate team between the Géometrica group at INRIA and the Geometric Computation group at Stanford University, focused on the topological and geometric analysis of possibly high-dimensional point cloud data that are difficult to handle with standard linear or non-linear dimensionality reduction techniques. Such data appear in as diverse areas as machine learning, dynamical systems, structural biology, image processing, or sensor networks. They are typically obtained by sampling from highly-curved manifolds with non-trivial topology, embedded in high-dimensional Euclidean spaces. Our main goal will be to devise suitable methods to handle this type of data, based on recent advances in computational geometry and algebraic topology, including (but not restricted to) persistent homology, witness complexes, and distance functions. Both the Géometrica group at INRIA and the Geometric Computation group at Stanford University have actively contributed to these advances in the recent years, and we think their collaboration is now mature enough to take advantage of the associate team framework. To validate our approach, we intend to set our methods against real-life data sets coming from other scientific fields, thereby developing new collaborations with other groups at INRIA and at Stanford University.

II. BILAN 2008

Rapport scientifique de l'année 2008

The research program for 2008 is accessible here

Throughout the first year of existence of TGDA, priority has been given to the development of new collaborations between the two research groups. The aim was to create opportunities for members from both groups to interact with one another. From a practical point of view, this strategy encouraged visits and student exchanges greatly. From a scientific point of view, it created a real synergy between the two groups, whose respective expertises in the area of point cloud data analysis turned out to be quite complementary. In terms of scientific outcomes, emphasis has been put on the following theoretical questions raised by or closely related to the 2008 research program:

Revisiting the stability of persistence diagrams. Persistent homology provides an effective way of encoding the qualitative and quantitative behavior of real-valued functions defined over topological spaces. This encoding, known as persistence diagram or persistence barcode, has been extensively used in the context of topological data analysis, where its stability properties enable to infer robust topological information on the input data sets. However, to this date, the stability of persistence diagrams was proven only for continuous functions defined over triangulable spaces, which restricted greatly its range of applications. Using a purely algebraic approach, Frédéric Chazal, Steve Oudot and David Cohen-Steiner collaborated with Leonidas Guibas to extend the notion of persistence diagrams to a broader context where functions may be discontinuous and spaces non-triangulable. Within this framework, they proved new stability results for persistence diagrams that open the gate to many more applications of the concept of persistence. Two of them are given below. This work is detailed in the INRIA research report RR-6568.
Analyzing Scalar Fields defined over sampled spaces. Given a real-valued function f defined over some metric space X, is it possible to recover some structural information about f from the sole information of its values at a finite set L of points sampled from X, whose pairwise geodesic distances in X are given? Taking advantage of the extended stability results for persistence diagrams mentioned above, Frédéric Chazal and Steve Oudot collaborated with Leonidas Guibas and his student Primoz Skraba to introduce a novel algebraic construction, based on a pair of nested families of simplicial complexes built on top of the point cloud L, from which the persistence diagram of f can be faithfully approximated. They also derived from this construction a series of algorithms for the analysis of scalar fields from point cloud data. These algorithms are simple and easy to implement, have reasonable complexities, and come with theoretical guarantees. This work is to be presented at the next venue of the SIAM Symposium on Discrete Algorithms (SODA'09). The full version is available as INRIA research report RR-6576.
Defining Gromov-Hausdorff stable descriptors for finite metric spaces. In the context of shape segmentation and classification, it is important to rely on local or global descriptors that are stable under small perturbations of the shape, but also under rigid or almost-rigid transforms. More generally, it is desirable to have descriptors that remain stable under Gromov-Hausdorff perturbations of the data. Since this data is usually given under the form of a point cloud sampling the underlying shape, the case of finite metric spaces is of practical interest. Motivated by applications in shape matching, Frédéric Chazal, Steve Oudot and David Cohen-Steiner collaborated with Leonidas Guibas and Facundo Memoli (a post-doctoral fellow in the Mathematics Department at Stanford University) to define new descriptors for finite metric spaces and prove their stability. Roughly speaking, these descriptors are the persistence diagrams of a set of intrinsic functions defined over the metric space. By intrinsic is meant that the definition of the functions depends only on the intrinsic pairwise distances between the points, and therefore it does not vary much under small Gromov-Hausdorff perturbations of the space. This work is still under progress and is expected to lead to a research report around early 2009.
Estimating integral curvature tensors and extracting features from point cloud data. Learning the geometric properties of a shape from a finite sampling is at the core of our project. In 2007, Quentin Mérigot and his advisors Frédéric Chazal and David Cohen-Steiner devised a Monte-Carlo approach to the estimation of curvature measures of compact sets from finite point samples. During his stay at Stanford, Quentin collaborated with Maksim Ovsjanikovs, a student from the Geometric Computation group, to develop a variant of this approach that can not only estimate curvature measures, but also curvature tensors. This enables for instance to recover the second fundamental form of a surface in 3-D from a finite point sampling, without the requirement that an intermediate mesh of the surface be built. The method comes with theoretical guarantees, and it is efficient enough to be applied to real-life data sets. This work is detailed in a paper submitted to Eurographics 2009.

In parallel, some members of TGDA have investigated more algorithmic questions related to the research project, among which the following one is of major practical interest:

Memory-efficient persistent homology computation. The persistence algorithm, originally designed by Edelsbrunner, Letscher and Zomorodian, is a commonly used tool in data analysis techniques, including ours. Given a nested family {F_i}_i of simplicial complexes, called a filtration, the algorithm can compute the persistence diagram associated with {F_i}_i. From a purely algorithmic point of view, it can be viewed as a mere Gaussian elimination on the matrix of the boundary operator that maps every k-dimensional simplex of {F_i}_i to a formal alternate sum of (k-1)-dimensional simplices representing its boundary. Although the complexity of a Gaussian elimination can be up to cubic in the number of columns and rows of the matrix (i.e. in the total number of simplices in {F_i}_i), it has been observed that the persistence algorithm has an almost-linear behavior on most practical data sets. As a consequence, the bottleneck of the approach is not so much its time complexity as its space complexity, since the sizes of the simplicial complexes considered can be huge. Steve Oudot and Primoz Skraba designed a memory-efficient variant of the persistence algorithm, where the idea is to process the matrix of the boundary operator by blocks when doing the Gaussian elimination. Their algorithm satisfies the invariant that no more than two blocks are loaded in main memory at the same time, so that main memory usage can be effectively controlled, while the overall time complexity remains roughly the same. The practical impact of this new approach is expected to be huge, since it allows to build much bigger families of simplicial complexes, and therefore to study much larger data sets. The code, written in C++, has been tested and optimized. It should be integrated to the PLEX library in 2009. A draft with all the proofs of correctness and complexity bounds is currently being written: it is expected to be turned into a research report around early 2009.

Rapport financier 2008

Certaines dépenses sont encore prévisionnelles à ce stade. Leurs montants exacts seront ajustés en fonction du budget restant sur TGDA.

1. Dépenses EA (effectuées sur les crédits de l'Equipe Associée)	Montant dépensé
Invitations des partenaires	11,626.22
Missions INRIA	8,517.72
Total	20,143.94

2. Dépenses externes (effectuées sur des financements hors EA)	Montant dépensé
Région PACA:
Invitations des partenaires	0
Missions INRIA vers le partenaire	1,800 (bourse de 600 euros/mois allouée à Quentin Mérigot)
Total	1,800

NSF grants:
Invitations des partenaires	0
Missions INRIA vers le partenaire	18,000
Total	18,000

NSF REUSSI program:
Invitations des partenaires	1,500
Missions INRIA vers le partenaire	0
Total	1,500

DRI (INRIA):
Invitations des partenaires	1,500
Missions INRIA vers le partenaire	0
Total	1,500

Total des financements externes dépensés

22,800

Total des financements EA et externes dépensés

42,943.94

Bilan des échanges effectués en 2008

As both parties were willing to develop new collaborations, exchanges within the TGDA project have been pretty numerous and productive during this first year, both at junior and senior levels:

Steve Oudot spent 5 months at Stanford University as a visiting scientist. During this time, he coordinated with Frédéric Chazal the works on extended stability results for persistence diagrams and on the analysis of scalar fields from point cloud data. He also initiated the work on memory-efficient persistence computation with Primoz Skraba. In parallel, he started a collaboration with Aneesh Sharma and David Arthur, two PhD students in the Theoretical Computer Science laboratory at Stanford, on effective reverse proximity queries in high dimensions, to be completed in 2009.
Quentin Mérigot spent 3 months at Stanford University as a visiting student, during which he collaborated with Maksim Ovsjanikovs on the estimation of curvature tensors from point cloud data.
Frédéric Chazal and David Cohen-Steiner visited Stanford for a couple of weeks in July, during which the collaboration with Leonidas Guibas and Facundo Memoli on Gromov-Hausdorff stability started.
Leonidas Guibas will visit INRIA for a week in November. He will also give a talk at the workshop Emerging Trends in Visual Computing held at the École Polytechnique. This will be an opportunity for having a high-level discussion about longer-term aspects of our collaboration.
Since early October and until mid-December, Maksim Ovsjanikovs and Primoz Skraba are visiting students at INRIA. The purpose of their visits is to initiate the work on various aspects of our research program for 2009.
After his visit, Primoz Skraba will stay in the Geometrica group for a year as a post-doctoral fellow.

1. Chercheurs Seniors

Nom	statut (1)	provenance	destination	objet (2)	durée (3)	Coût (si financement EA)	Coût (si financement externe)
Steve Oudot	CR	INRIA	Stanford	visite	5 mois	1,069.79	13,760
Steve Oudot	CR	Stanford	INRIA	réunion de travail	7 jours	1,156.19	0
Frédéric Chazal	DR	INRIA	Stanford	visite	14 jours	1,166.14	1,600
David Cohen-Steiner	CR	Duke	Stanford	visite	7 jours	500	840
Leonidas Guibas	Professeur	ETH	INRIA	visite	7 jours	1,200	0
Steve Oudot	CR	INRIA	New York	conférence SODA	7 jours	1,900	0

Total des durées

6 mois 11 jours

(1) DR / CR / professeur
(2) colloque, thèse, stage, visite....
(3) précisez l'unité (mois, semaine..)

2. Juniors

Nom	statut (1)	provenance	destination	objet (2)	durée (3)	Coût (si financement EA)	Coût (si financement externe)
Quentin Mérigot	doctorant	INRIA	Stanford	visite	3 mois	2,725.60	3,600
Maksim Ovsjanikovs	doctorant	Stanford	INRIA	visite	2 mois et demi	3,750	3,000
Primoz Skraba	doctorant	Stanford	INRIA	visite	2 mois et demi	6,676.22	0

Total des durées

8 mois

(1) post-doc / doctorant / stagiaire
(2) colloque, thèse, stage, visite....
(3) précisez l'unité (mois, semaine..)

III. PREVISIONS 2009

Programme de travail

While the collaboration between the two research groups was initiated around the more theoretical aspects of the 2008 research project, a great deal of our efforts in 2009 will be geared towards more practical questions, as we believe the tools developped in 2008 have a high potential for applications. To this end, we will need to develop methods that can perform some of the steps of our algorithms efficiently, including finding reverse nearest neighbors to update our data structures, or computing persistent homology to get their associated persistence barcodes. Meanwhile, we intend to pursue our efforts towards developing a geometric and topological inference theory for sampled spaces, in order to strengthen the positions of both groups as leaders in the area.

1. Applications

Persistence-based clustering. One of the by-products of our scalar field analysis approach is a very simple technique for segmenting a given point cloud according to the basins of attraction of the maxima of some given function. One potential application of this technique is in unsupervised learning, where it could cluster the data points according to the basins of attraction of the underlying density function, provided that the latter can be estimated reliably. Although this approach turns out to be very sensitive to noise in general, it is not in our case thanks to the use of topological persistence, which enables to detect which peaks are more prominent and how the basins of attractions of the other, less prominent, peaks must be merged. As a result, guarantees on the number of clusters obtained can be derived. We intend to investigate this direction further and determine which density estimators would be most relevant to plug into our scalar field analyzer, depending on the nature of the input data.
Shape matching using Gromov-Hausdorff stable signatures. We believe our work on designing signatures for shapes that are stable under small Gromov-Hausdorff perturbations is likely to have applications in the context of shape matching and retrieval, where the goal is to find among a database of shapes the one that matches best with a given query shape. One of the major difficulties of this problem stems from the fact that two objects belonging to a same class may differ from each other, or they may lie in different coordinates systems, or both. It is therefore important to compute shape descriptors that are stable under small perturbations of the objects, as well as under rigid or almost-rigid transforms. By describing each object o as a cloud of points sampled from o together with their pairwise geodesic distances within o, we use an object representation that does not require any mesh nor any prior knowledge of the dimension or geometric regularity of o. We are now interested in studying the effect of our Gromov-Hausdorff stable descriptors on the quality of the matchings.
Shape segmentation combining curvature tensor estimation and scalar fields analysis. A somewhat related problem is to partition a sampled shape into natural parts, without any prior knowledge of the intrinsic structure of the shape. Building upon the above idea of partitioning a point cloud according to the basins of attraction of the maxima or minima of a given function, we want to design functions over point clouds whose basins of attraction will coincide with the natural parts of the underlying objects. With the special case of CAD models in mind, our idea is to use the curvature tensor estimator designed by Quentin Mérigot and Maksim Ovsjanikovs to derive a scalar field whose local minima will be located at the center of the faces of the sampled model, so that the output clustering of the point cloud will match with the segmentation of the model into its various faces.

2. Algorithmic aspects

Distributed persistence calculation. The memory-efficient variant of the persistence algorithm, designed by Primoz Skraba and Steve Oudot, could potentially lead to a fully parallel method for computing persistence. Indeed, it turns out that a large number of blocks in the matrix of the boundary operator can actually be processed independently from each other, and could therefore be sent to different processing units. However, there remains to find an efficient way of gathering and combining the results obtained separately on different blocks. We intend to investigate this question and see whether the computations can be distributed in such a way that the algorithm can be run on a cluster of PCs, and ultimately on a wireless sensor network.
Reverse nearest neighbor search based on Locality-Sensitive Hashing. The algebraic constructions used in topological data analysis algorithms mostly rely on comparisons between distances, simply because more elaborate geometric predicates tend to be difficult (if at all possible) to implement in high dimensions. An example of commonly used algebraic construction is the so-called witness complex: given an input point cloud P and a subset Q of the points of P, the witness complex is built by comparing the distances of the points of P to all the points of Q. This resorts to making a linear number of proximity queries and reverse proximity queries on the data points. It is therefore important to pre-process these points so as to optimize the query time. While a lot of attention has been devoted to the design of provably effective methods for nearest neighbor search in high dimensions, the reverse problem has hardly been considered from a theoretical point of view. The collaboration initiated between Steve Oudot, Aneesh Sharma and David Arthur will constitute a first step towards filling in this gap.

3. Theoretical aspects

Zig-Zag persistence. Gunnar Carlsson and Vin de Silva recently generalized the theory of persistent homology (which applies to the behaviour of filtrations, i.e. nested 1-parameter families of spaces) to a more general theory of zig-zag persistence, which applies to the behaviour of arbitrary 1-parameter families of spaces. One of the main goals of Vin de Silva's 6-months visit to INRIA in 2009 will be to collaborate with the Geometrica group in developing the geometric side of this theory, for instance by providing new topological or geometric approximation results based on more elaborate algebraic constructions than filtrations. At length, these more elaborate constructions may enable to capture more subtle topological or geometric features of the sampled spaces.
Manifold reconstruction. One aspect of our initial research program was to develop the theoretical and algorithmic aspects of manifold reconstruction from point samples. Here, the goal is not only to infer topological invariants or geometric features of the manifold M underlying the input point cloud, but also to build a faithful piecewise-linear representation of M. Although there is a rich literature on the subject in 2-d and 3-d, the higher-dimensional case has hardly been investigated so far, despite its potential applications in high-dimensional data rendering and processing. Building on previous work by Frédéric Chazal and Steve Oudot on the use of persistence in the context of reconstruction, we intend to push this aspect of the project further in 2009.

Programme d'échanges avec budget prévisionnel

1. Echanges

Interactions and collaborations within the TGDA project have been very active throughout 2008. We will try our best to keep them at a comparable level in 2009:

Quentin Mérigot will spend another 2 months at Stanford, to pursue his work with Maksim Ovsjanikovs on shape segmentation. His local expenses will be covered in part by the PACA region.
Another PhD student from Geometrica will spend 3 months in Stanford, to reinforce the collaboration at junior level.
Primoz Skraba, who will hold a post-doc position at INRIA in 2009, will go back to Stanford for a short visit at some point, in order to maintain his scientific connections there (in particular within the Geometric Computation and Applied Topology groups).
Symmetrically, Geometrica is willing to invite one to two PhD students from the Geometric Computation group, each for a period of 3 months.
In addition, Dmitriy Morozov, a former PhD student at Duke who applied to post-doc positions both at INRIA and at Stanford in 2008 and finally chose to go to Stanford, is willing to spend some time in the Geometrica group. He is therefore likely to make one or two short visits (1 week each), or one longer visit (1 month).
At senior level, Steve Oudot will make another 4-to-6-weeks visit at Stanford, Frédéric Chazal another 2-weeks visit, and David Cohen-Steiner another 1-week visit. In addition, Jean-Daniel Boissonnat will make a 3-weeks visit.
Symmetrically, Leonidas Guibas will visit INRIA for 2 to 3 weeks. Also, Gunnar Carlsson (and perhaps other senior researchers involved in the Topological Data Analysis project at Stanford) will come to INRIA for 1 week.

In addition to the above exchanges, Vin de Silva will visit the Geometrica group for a 6-months period, from January to June 2009 (his expenses will be covered by other grants). His visit will strengthen the associate team and help greatly in the investigation of the more theoretical aspects of our research program. Moreover, as one of the pioneers of the Computational Topology library PLEX, developped and maintained by the Applied Topology group at Stanford, Vin de Silva will act as a cornerstone in the establishment of a long-term collaboration between this group and TGDA around the development of the library, which in the longer run is likely to become a reference in the field.

Finally, TGDA intends to organize a mini-workshop, to be held in Saclay and/or Paris around early July 2009. This event is motivated by the promising results that were obtained during the first year of the collaboration, and by their expected outcomes in terms of applications. The format of the workshop is still to be discussed, but organizers agree on a 2+1 days structure: 2 days for formal talks and demos to a broad audience, plus 1 day for internal discussions among the members of TGDA about future directions and code development issues.

1. ESTIMATION DES DÉPENSES EN MISSIONS INRIA VERS LE PARTENAIRE	Nombre de personnes	Coût estimé
Chercheurs confirmés	4	12,000
Post-doctorants	1	1,500
Doctorants	2	10,000
Stagiaires
Autre (précisez) :
Total		23,500

2. ESTIMATION DES DÉPENSES EN INVITATIONS DES PARTENAIRES	Nombre de personnes	Coût estimé
Chercheurs confirmés	2	6,000
Post-doctorants	1	2,500
Doctorants	2	10,000
Stagiaires
Mini-workshop :	logistique sur place + invitations	10,000
Total		28,500

2. Cofinancement

The PACA region will provide around 1,000 euros for Quentin Mérigot's visit to Stanford,
The France-Stanford Funds will provide 9,000 dollars (6,000 euros) for the mini-workshop,
The Geometric Computation group will provide funds for up to 20,000 euros to cover local expenses of visitors from INRIA, an amount that is comparable to the one provided in 2008,
Concerning the visits of PhD students to INRIA, we hope to be able to benefit from the REUSSI program at NSF and/or the Intership program at INRIA, which should reduce the cost of these visits by 5,000 euros.

3. Demande budgétaire

Indiquez, dans le tableau ci-dessous, le coût global estimé de la proposition et le budget demandé à la DRI dans le cadre de cette Equipe Associée.
(maximum 20 K€ pour une prolongation en 2e année et 10 K€ pour une 3e année).

Commentaires	Montant
A. Coût global de la proposition (total des tableaux 1 et 2 : invitations, missions, ...)	52,000
B. Cofinancements utilisés (financements autres que Equipe Associée)	32,000
Financement "Équipe Associée" demandé (A.-B.) (maximum 20 K€ pour une 2e année et 10 K€ pour une 3e année)	20,000