GeomStats Associated Team
Asclepios, Inria-Sophia
Holmes' Lab, Stanford, USA

Geometric Statistics in Computational Anatomy:
Non-linear Subspace Learning Beyond the Riemannian Structure.



 

Presentation

The scientific goal of the associated team is to develop the field of geometric statistics with key applications in computational anatomy. Computational anatomy is an emerging discipline at the interface of geometry, statistics, image analysis and medicine that aims at analyzing and modeling the biological variability of organ shapes at the population level. An important application in neuroimaging is the spatial normalization of subjects, which is necessary to compare anatomies and functions through images in populations with different clinical conditions. The research directions are organized into three axes, the first two being methodologically driven and the last one being application driven. The first axis aims at generalizing the statistical framework from Riemannian to more general geometric structures and even to non-manifold spaces (e.g. stratified spaces). The goal is to understand what is gained or lost with each geometric structure. The second axis aims at developing subspace learning methods in non-linear manifolds. This objective contrasts with most manifold learning methods, which assume that subspaces are embedded in a large enough Euclidean space. The third scientific direction is application driven, with cross-sectional and longitudinal brain neuroimaging studies. The goal will be to extract reduced models of the brain anatomy that best describe and discriminate the populations under study.

Members

Principal investigators

  • Xavier Pennec, Senior Research Scientist (Directeur de recherche), Asclepios team, Inria Sophia-Antipolis, France. Xavier Pennec's research axes concern statistics on geometric data, in particular for medical image analysis, image registration and computational anatomy (statistics on the shapes of normal and abnormal organs across populations). This proposal is at the heart of these research axes.
  • Susan Holmes, Professor, Statistics Department, Stanford University, USA. Prof. Holmes has been teaching at Stanford since 1998; before that, she was a tenured researcher at INRA in Montpellier, France, an Associate Professor in Biometry at Cornell University and a visiting Professor in Applied Mathematics at MIT. Her main research focus is the application of nonparametric multivariate methods to biological data. She is best known for her work on the use of topology and geometry to create a useful metric on the space of phylogenetic trees. She has also published many papers on MCMC methods and their applications in biology. Her interest in this proposal lies in the conjunction of geometry with statistics. She has more than ten collaborations with groups in the Stanford medical school and recently won an NIH Transformative Research (High Risk High Reward) award.

Other participants

  • Nina Miolane, Post-doctoral fellow, Stanford, Statistics Dpt. Nina Miolane graduated in October 2013 from the Ecole Polytechnique, Palaiseau, France, with a major in Mathematics applied to Physics, and from the Master in Theoretical & Mathematical Physics (Science Quantum Fields & Fundamental Forces) at Imperial College London, UK. Her PhD at Inria Sophia Antipolis (2013-2016) was carried out in the context of the associated team GeomStats under the supervision of Xavier Pennec and Susan Holmes. Nina was awarded a L'Oreal-Unesco For Women in Science National fellowship (2016) and obtained one of the Inria@SiliconValley post-doc fellowships in 2016. Due to visa issues, the start of her post-doc was postponed to 2017. In the meantime, to better understand the industrial needs in medical imaging, she worked as an engineer in the startup Bay Labs on statistics on rigid-body poses of ultrasound probes. Since 2018, she has held a post-doc / lecturer position in the Statistics Department at Stanford.
  • Yann Thanwerdas, PhD Student, Epione team. Yann Thanwerdas did his end-of-studies internship from Ecole Centrale-Supelec in fall 2018 on the geometry of positive semi-definite precision matrices, with applications to the statistical detection of anatomical and functional networks in brain images. He then started a PhD under the supervision of Xavier Pennec in January 2019, with a Cote d'Azur University excellence fellowship, on statistical dimension reduction in non-linear manifolds for brain shape analysis, connectomics and brain-computer interfaces.
  • Nicolas Guigui, PhD Student, Epione team. After a diploma from the Ecole Centrale-Supelec in 2016 and a Master in Mathematical Statistics from the University of Cambridge in 2017, Nicolas Guigui started a PhD in October 2018 under the supervision of Xavier Pennec on statistical estimation in Riemannian and affine symmetric spaces, with applications to the statistical survey of brain anatomy.
  • Marco Lorenzi, PhD, Research Scientist (Chargé de recherche), Epione team. The research activity of Dr. Lorenzi concerns the development and study of statistical learning methods for the analysis of biomedical data, with application to the modeling of neurodegenerative disorders. His work also concerns the development of statistical tools for the analysis of data from clinical trials of disease-modifying drugs. Marco will contribute to Axis 3 of the project, building on his recent work on the meta-analysis of imaging-genetics data in distributed clinical cohorts.

Former participants

  • Christof Seiler, PhD. Dr. Seiler was a postdoctoral scholar in the Department of Statistics at Stanford, mentored by Prof. Holmes, until 2018. He obtained his joint Ph.D. degree in biomedical engineering and image processing from the University of Sophia Antipolis, France, and the University of Bern, Switzerland, co-advised by Dr. Pennec and Dr. Reyes (University of Bern). In 2011, he won a prestigious MICCAI young scientist award in Toronto, Canada, for his work on tree deformations for medical images. Christof Seiler left Stanford and the associated team in 2018 for an assistant professor position at the Department of Data Science and Knowledge Engineering, Maastricht University, The Netherlands.
  • Loic Devillier, PhD Student, Asclepios/Epione team. Loic Devillier started his PhD in fall 2015 under the supervision of Xavier Pennec and Stéphanie Allassonnière (CMAP, Ecole Polytechnique, Paris) on statistics on quotient spaces in the infinite-dimensional setting. His expertise is in analysis in infinite dimensions. This work was also done in collaboration with Alain Trouvé (ENS Cachan). Loic Devillier defended his PhD in November 2017. Since September 2018, he has been a professor of preparatory classes (PSI/PCSI) at Lycee Essouriau, France.
  • Boris Gutman, former Postdoctoral Scholar Research Associate at USC, Institute for Neuroimaging and Informatics, Los Angeles, CA, USA. Boris Gutman's research interests concern brain anatomy and function, neurodegenerative disorders and their relationship with genetic factors. He is also interested in geometric methods for processing the related data (deformations for registration, diffusion imaging with DTI or higher-order models, etc.). He earned his PhD at LONI, UCLA, in 2013. He has published more than 39 journal papers and was awarded a Michael J. Fox and Alzheimer's Association fellowship for his project on Biomarkers Across Neurodegenerative Diseases, 2015-2017. Boris Gutman moved to an associate professor position in Biomedical Engineering at the Illinois Institute of Technology, Armour College of Engineering.

Program and results

Computational anatomy is an emerging discipline at the interface of geometry, statistics, image analysis and medicine that aims at analyzing and modeling the biological variability of organ shapes at the population level. The goal is to model the mean anatomy and its normal variation among a population and to discover morphological differences between normal and pathological populations. In the computational anatomy framework formalized by Grenander & Miller, the observed organ shapes are modeled as random diffeomorphic deformations of an unknown template. Thus one needs to perform statistics on infinite-dimensional manifolds and Lie groups: the template can be seen as the mean shape, and the variability is encoded by the variability of the deformations.

An important application in neuroscience is the spatial normalization of subjects (mapping all the anatomies into a common reference system), which is used to compare anatomies and functions through images in populations and to identify whether a specific subject tends towards one of the populations under study. Statistical anatomical models are also used for the segmentation of patient images (atlas-based segmentation) through the registration of generic atlases to patient-specific images (personalized atlases). In this context, there has been a gradual shift in the community from the use of a single atlas to the use of multiple atlases (multi-atlas segmentation methods), with important improvements in segmentation quality. The goal of this project is to understand under which conditions the computation of the template (or, more generally, of the mean) is well posed, and to extend it to the estimation of a low-dimensional shape manifold that would mimic the multi-atlas method. From a theoretical point of view, this means quantifying the statistical efficiency (bias and variance) of the mean (however it is defined) and developing methods for subspace learning in manifolds.

The scientific goal of the associated team is to develop the field of geometric statistics with key applications in computational anatomy. The research directions are organized into three axes, the first two being methodologically driven and the last one being application driven. The first axis aims at generalizing the statistical framework from Riemannian to more general geometric structures and even to non-manifold spaces (e.g. stratified spaces). The goal is to understand what is gained or lost with each geometric structure. The second axis aims at developing subspace learning methods in non-linear manifolds. This objective contrasts with most manifold learning methods, which learn (locally linear but globally non-linear) subspaces embedded in a large enough Euclidean space. The third scientific direction is application driven, with cross-sectional and longitudinal brain neuroimaging studies. The goal will be to extract reduced models of the brain anatomy that best describe and discriminate the populations under study.

Statistics on manifolds beyond the Riemannian framework

In Computational Anatomy, organ shapes may be modeled as deformations of a template, i.e. as elements of a Lie group (a manifold with a group structure), or as the equivalence classes of their 3D configurations under the action of transformations, i.e. as elements of a quotient space (a manifold with a stratification). Medical images can be modeled as manifolds with a horizontal distribution. These are all manifolds with additional properties. Defining a metric or a distance to perform statistics on these manifolds might not be enough, as the metric or distance might not consistently account for these additional structures. We develop Geometric Statistics beyond the now classical Riemannian and metric geometries in order to account for these additional structures. The PhD thesis of Nina Miolane gathers these developments and puts them in perspective.

First, organ shapes are often modeled as deformations of a reference shape, i.e., as elements of a Lie group. To analyze the variability of the human anatomy in this framework, we need to perform statistics on Lie groups (manifolds with a consistent group structure). Statistics on Riemannian manifolds have been well studied, but to use the statistical Riemannian framework on Lie groups, one needs to define a Riemannian metric compatible with the group structure: a bi-invariant metric. However, it is known that a connected Lie group admits a bi-invariant metric only if it is the direct product of a compact group and an abelian group. In [Miolane and Pennec, Entropy 2015], we investigated whether the weaker structure of a bi-invariant pseudo-Riemannian metric could be sufficient for most of the groups used in Computational Anatomy. Our contribution is two-fold. First, we present an algorithm that constructs bi-invariant pseudo-metrics on a given Lie group, when they exist. Then, by running the algorithm on commonly used Lie groups, we show that most of them do not admit any bi-invariant (pseudo-)metric. We thus conclude that the (pseudo-)Riemannian setting is too limited for the definition of consistent statistics on general Lie groups.
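The existence question can be illustrated numerically. For a connected Lie group, a bi-invariant (pseudo-)metric amounts to a non-degenerate symmetric bilinear form B on the Lie algebra that is ad-invariant, B([X,Y],Z) + B(Y,[X,Z]) = 0 for all X, Y, Z. The minimal numpy sketch below (function names are hypothetical, and it is not the algorithm of [Miolane and Pennec, Entropy 2015]) solves this linear condition for a matrix Lie algebra given by a basis, then tests whether a generic solution is non-degenerate.

```python
import numpy as np


def structure_constants(basis):
    """C[i, j, k] such that [e_i, e_j] = sum_k C[i, j, k] e_k for a matrix Lie algebra."""
    d = len(basis)
    flat = np.stack([e.ravel() for e in basis], axis=1)  # columns: flattened basis matrices
    C = np.zeros((d, d, d))
    for i in range(d):
        for j in range(d):
            bracket = basis[i] @ basis[j] - basis[j] @ basis[i]
            # coordinates of the bracket in the basis (exact when the algebra is closed)
            C[i, j] = np.linalg.lstsq(flat, bracket.ravel(), rcond=None)[0]
    return C


def invariant_symmetric_forms(C, tol=1e-10):
    """Basis of the symmetric matrices B with ad_i^T B + B ad_i = 0 for every basis vector e_i."""
    d = C.shape[0]
    ads = [C[i].T for i in range(d)]  # column j of ad_i holds the coordinates of [e_i, e_j]
    sym_basis = []
    for p in range(d):
        for q in range(p, d):
            E = np.zeros((d, d))
            E[p, q] = E[q, p] = 1.0
            sym_basis.append(E)
    # one column per symmetric basis matrix: its stacked ad-invariance residuals
    M = np.stack([np.concatenate([(A.T @ E + E @ A).ravel() for A in ads])
                  for E in sym_basis], axis=1)
    _, s, vt = np.linalg.svd(M, full_matrices=False)
    rank = int(np.sum(s > tol))
    # the null space of M gives the ad-invariant symmetric forms
    return [sum(c * E for c, E in zip(v, sym_basis)) for v in vt[rank:]]


def has_biinvariant_pseudometric(basis, seed=0):
    """True if a generic ad-invariant symmetric form on the algebra is non-degenerate."""
    forms = invariant_symmetric_forms(structure_constants(basis))
    if not forms:
        return False
    rng = np.random.default_rng(seed)
    B = sum(rng.standard_normal() * F for F in forms)
    return bool(np.min(np.abs(np.linalg.eigvalsh(B))) > 1e-8)


# so(3): infinitesimal rotations (skew-symmetric 3x3 matrices)
so3 = [np.array(m, dtype=float) for m in (
    [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
    [[0, -1, 0], [1, 0, 0], [0, 0, 0]])]
# se(2): infinitesimal rotation and translations in homogeneous coordinates
se2 = [np.array(m, dtype=float) for m in (
    [[0, -1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]])]

print(has_biinvariant_pseudometric(so3))  # True: multiples of the Killing form
print(has_biinvariant_pseudometric(se2))  # False: every invariant form is degenerate
```

A random combination of the invariant forms is tested because, if any non-degenerate invariant form exists, a generic combination is non-degenerate (its determinant is a polynomial in the coefficients that is not identically zero).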

A second important generalization of the Riemannian framework that we have investigated concerns quotient spaces, in which shape data live once the influence of some hidden transformation parameters has been removed. Statistics on shapes (and more generally on observations belonging to quotient spaces) have been studied since the 1980s. However, most theories model the variability in the shapes but do not take into account the noise on the observations themselves. In [Miolane and Pennec, GSI 2015], we show with a simple finite-dimensional example that statistics on quotient spaces may be biased and even inconsistent when one takes the noise into account. This is the case, for instance, for one of the most widely used algorithms for template estimation in Computational Anatomy. Our development gives a first theoretical geometric understanding of this phenomenon. In view of the applications, we geometrically characterize the situations in which the bias can be neglected and those in which it must be corrected. The results on our toy shape spaces suggest that the main control variable is a signal-to-noise ratio, where the signal is measured as the distance to the singularity. In [Miolane et al., SIIMS 2016], we gave a Taylor expansion of this bias with respect to the variance of the noise on the objects, for a general finite-dimensional Riemannian manifold. We proposed two bootstrap procedures that quantify the bias and correct it, if needed. In [Miolane et al., ISBA 2016], we considered the problem of learning the orbit of the template as a manifold learning problem, and we showed how the Bayesian framework enables a correction of the Maximum Likelihood estimator in pathological cases.
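The toy setting behind this analysis can be reproduced in a few lines. The sketch below assumes the simplest quotient, R^2 under the rotation group SO(2), whose orbits are circles and whose singularity is the origin, with isotropic Gaussian noise of known standard deviation σ; the function names are hypothetical. Aligning the observations and averaging them amounts to averaging their norms, which overestimates the template radius r: at r = 0 the estimate converges to σ√(π/2) ≈ 1.25σ instead of 0, while the bias becomes small when r/σ is large. The bootstrap step is a simple parametric correction chosen for illustration, not necessarily one of the procedures of [Miolane et al., SIIMS 2016].

```python
import numpy as np


def observe(r_true, sigma, n, rng):
    """Noisy copies of the template (r_true, 0), each moved by a hidden random rotation."""
    noisy = np.array([r_true, 0.0]) + sigma * rng.standard_normal((n, 2))
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    c, s = np.cos(angle), np.sin(angle)
    return np.column_stack([c * noisy[:, 0] - s * noisy[:, 1],
                            s * noisy[:, 0] + c * noisy[:, 1]])


def quotient_mean(obs):
    """Frechet mean in R^2 / SO(2): an orbit is a circle, identified with its radius r >= 0,
    and the quotient distance between orbits is |r1 - r2|, so the mean is the mean radius."""
    return float(np.linalg.norm(obs, axis=1).mean())


def bootstrap_corrected_mean(obs, sigma, n_sim=200_000, seed=1):
    """Parametric bootstrap (sigma assumed known): simulate from the estimate,
    measure the bias of the estimator there, and subtract it."""
    rng = np.random.default_rng(seed)
    r_hat = quotient_mean(obs)
    bias_hat = quotient_mean(observe(r_hat, sigma, n_sim, rng)) - r_hat
    return r_hat - bias_hat


rng = np.random.default_rng(0)
for r in (0.0, 1.0, 10.0):
    obs = observe(r, sigma=1.0, n=50_000, rng=rng)
    print(r, quotient_mean(obs), bootstrap_corrected_mean(obs, sigma=1.0))
```

The bias is largest near the singularity (small r/σ), which is consistent with a signal-to-noise ratio acting as the control variable.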

Finally, we investigate the consequences of this statistical behavior on quotient spaces for template shape estimation in neuroimaging. Template shape estimation is interpreted as the estimation of the unique healthy anatomical brain shape of the population. However, the existence of a unique healthy brain shape is debatable: for example, the distribution of the sulci on the brain surface varies significantly from one healthy subject to another. Assuming a unique anatomical brain structure may lead to a biased template. In the ongoing work [Miolane et al. 2016], we present a methodology that spatially quantifies the brain template bias. This leads us to investigate the topology of the intensity level sets of the template, represented by its Morse-Smale complex. We propose a topologically constrained adaptation of the template computation that constructs a hierarchical template with bounded bias. We apply our method to the analysis of a brain template built from 136 T1-weighted MR images of the Open Access Series of Imaging Studies (OASIS) database.

A third generalization of the Riemannian framework that we have investigated is sub-Riemannian geometry, where the metric is defined only on a subspace (usually non-integrable) of the tangent space. Such spaces have been used in theoretical physics as an alternative to dark matter. In computational anatomy, this would be a natural setting for modeling structures such as laminar sheets, whose distribution is not integrable. More generally, a sub-Riemannian structure appears in curve, surface or image interpolation and diffusion when one wants to take first-order derivatives into account. This idea has been explored in computer vision over the last years with Neuro-geometry, which aims to model the human visual cortex and to explain human visual phenomena through modern differential geometry. This has resulted in new image processing algorithms for image completion, crossing-preserving smoothing, curve fitting and boundary completion, among others, where the object (curve or image intensity) is lifted to a larger space of positions plus orientations for the processing. Generalizing these 2D algorithms to 3D medical images (and potentially higher dimensions) is mathematically not straightforward. In [Miolane and Pennec, MCV 2015], we analyzed the different structures involved in 2D (fiber bundle, group action, principal bundle) to distinguish the different notions of horizontality, verticality and geodesics used in the literature, and we showed that a second level of lifting (from oriented positions to the special Euclidean motion group) is necessary to deal with 3D curves, surfaces and images.
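The lifting of a planar curve to positions plus orientations can be sketched as follows: each point of a polyline receives the orientation θ of its tangent, giving a curve in R^2 × S^1 (the underlying manifold of SE(2)), and a lifted curve is horizontal when its displacement follows its own orientation, i.e. when the contact form −sin θ dx + cos θ dy vanishes along it. This is a minimal numpy illustration with hypothetical function names, not one of the cited Neuro-geometry algorithms.

```python
import numpy as np


def lift_curve(points):
    """Lift a planar polyline (N x 2) to R^2 x S^1 by attaching to each vertex the
    orientation of its outgoing segment (the last vertex reuses the previous one)."""
    d = np.diff(points, axis=0)
    theta = np.arctan2(d[:, 1], d[:, 0])
    return np.column_stack([points, np.append(theta, theta[-1])])


def horizontality_residual(lifted):
    """Largest |-sin(theta) dx + cos(theta) dy| / |(dx, dy)| over the segments of a curve
    (x, y, theta) in R^2 x S^1; it is 0 for horizontal curves, 1 for purely sideways motion."""
    d = np.diff(lifted[:, :2], axis=0)
    th = lifted[:-1, 2]
    contact = -np.sin(th) * d[:, 0] + np.cos(th) * d[:, 1]
    return float(np.max(np.abs(contact) / np.linalg.norm(d, axis=1)))


t = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.column_stack([np.cos(t), np.sin(t)])
print(horizontality_residual(lift_curve(circle)))  # numerically zero: the lift is horizontal

# moving along x while "looking" along y is not a horizontal curve
sideways = np.column_stack([np.linspace(0.0, 1.0, 50), np.zeros(50), np.full(50, np.pi / 2)])
print(horizontality_residual(sideways))  # approximately 1.0
```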

Geometric subspace learning

In manifold learning, one often embeds data in a large enough Hilbert space before learning a manifold structure. However, when a geometric structure is already known for the data, or when one wants to reduce the dimensionality of the data manifold in a hierarchical way, one needs to find a submanifold structure in a manifold, which is a harder problem.

One way to address this problem is to generalise Principal Component Analysis to manifolds by exploring alternatives to Principal Geodesic Analysis (PGA) or tangent PCA that remain well defined for other types of geometric structures (affine connection geometry, stratified spaces). In Riemannian manifolds, tangent PCA is often sufficient for analysing data that are sufficiently centred around a central value (unimodal or Gaussian-like data), but fails for multimodal or large-support distributions (e.g. uniform on closed compact subspaces). Instead of analysing a covariance matrix, Principal Geodesic Analysis (PGA) and Geodesic PCA (GPCA) minimize the distance to Geodesic Subspaces (GS), which are spanned by the geodesics going through a point with tangent vectors in a restricted linear subspace of the tangent space. Other methods, like Principal Nested Spheres (PNS), restrict to simpler manifolds but emphasize the need for nestedness of the resulting principal subspaces.
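As an illustration of the baseline that these methods generalise, the sketch below performs tangent PCA on the unit sphere S^2 with numpy only: it computes the Fréchet mean by the usual fixed-point iteration p ← exp_p(mean of log_p(x_i)), maps the data to the tangent plane with the Riemannian logarithm, and runs an ordinary PCA there. The helper names are hypothetical and are not the geomstats API; the sketch assumes data concentrated in a small region, the regime in which tangent PCA is adequate.

```python
import numpy as np


def sphere_exp(p, v):
    """Riemannian exponential on the unit sphere at p, for a tangent vector v (v . p = 0)."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    q = np.cos(t) * p + np.sin(t) * v / t
    return q / np.linalg.norm(q)  # guard against round-off drift


def sphere_log(p, X):
    """Riemannian logarithms log_p(x) of the rows of X (points assumed non-antipodal to p)."""
    cos = np.clip(X @ p, -1.0, 1.0)
    U = X - cos[:, None] * p  # components orthogonal to p
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    theta = np.arccos(cos)[:, None]
    return np.where(norms > 1e-12, theta * U / np.maximum(norms, 1e-12), 0.0)


def frechet_mean(X, n_iter=50):
    """Fixed-point iteration p <- exp_p(mean_i log_p(x_i)), started at the normalized
    Euclidean mean."""
    p = X.mean(axis=0)
    p /= np.linalg.norm(p)
    for _ in range(n_iter):
        p = sphere_exp(p, sphere_log(p, X).mean(axis=0))
    return p


def tangent_pca(X):
    """PCA of the data mapped to the tangent space at the Frechet mean."""
    mu = frechet_mean(X)
    V = sphere_log(mu, X)
    _, s, vt = np.linalg.svd(V - V.mean(axis=0), full_matrices=False)
    var = s ** 2 / len(X)
    return mu, vt, var / var.sum()


rng = np.random.default_rng(0)
lon = rng.normal(0.0, 0.5, 500)   # spread along the equator
lat = rng.normal(0.0, 0.05, 500)  # small spread in latitude
X = np.column_stack([np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])
mu, components, ratio = tangent_pca(X)
print(mu, components[0], ratio)
```

Here the first tangent component follows the equator and carries almost all the variance; for data spread over a large part of the sphere, the logarithm map distorts distances and this construction degrades, which is the limitation mentioned above.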

In [Pennec, GSI 2015], we propose a new and more general family of subspaces in manifolds that we call barycentric subspaces. They are implicitly defined as the locus of points which are weighted means of reference points. The affine span is then defined as the metric completion in the embedding manifold. As these definitions rely on points and not on tangent vectors, they can also be extended to geodesic spaces which are not Riemannian. For instance, they naturally allow principal subspaces that span several strata in stratified spaces, which is not the case with PGA. We showed that affine spans locally define submanifolds which generalize geodesic subspaces. Like PGA subspaces, affine spans can naturally be nested, which allows the inductive construction of forward nested subspaces that approximate the data points and contain the Fréchet mean. However, they also allow the construction of backward flags which may not contain the mean. In [Pennec MFCA 2015], we show that the affine span is a natural definition on spheres that also generalizes principal nested spheres. We also rephrase PCA in Euclidean spaces as an optimization on flags of linear subspaces (hierarchies of properly embedded linear subspaces of increasing dimension). To this end, we propose an extension of the unexplained variance criterion that generalizes nicely to flags of affine spans in Riemannian manifolds. This results in a particularly appealing generalization of PCA on manifolds, which we call Barycentric Subspace Analysis (BSA).
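The weighted-mean characterization can be made concrete on the unit sphere S^2. A weighted mean p of reference points x_0, ..., x_k, with weights λ_i summing to 1, satisfies Σ_i λ_i log_p(x_i) = 0 (the critical-point condition of the weighted variance). The minimal numpy sketch below (hypothetical helper names, not the BSA implementation of the cited papers) computes, for given non-negative weights, the corresponding point by a fixed-point iteration and checks this equation. Negative weights, which are needed to reach the whole affine span, are outside the scope of this simple solver.

```python
import numpy as np


def sphere_exp(p, v):
    """Riemannian exponential on the unit sphere at p, for a tangent vector v."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    q = np.cos(t) * p + np.sin(t) * v / t
    return q / np.linalg.norm(q)


def sphere_log(p, X):
    """Riemannian logarithms log_p(x) of the rows of X (points assumed non-antipodal to p)."""
    cos = np.clip(X @ p, -1.0, 1.0)
    U = X - cos[:, None] * p
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return np.where(norms > 1e-12, np.arccos(cos)[:, None] * U / np.maximum(norms, 1e-12), 0.0)


def weighted_mean(refs, weights, n_iter=100):
    """Point p solving sum_i w_i log_p(x_i) = 0, by the fixed-point iteration
    p <- exp_p(sum_i w_i log_p(x_i)); only meant for non-negative weights summing
    to 1 and reference points that are not too far apart."""
    w = np.asarray(weights, dtype=float)
    p = refs[np.argmax(w)].astype(float)  # start from the most weighted reference point
    for _ in range(n_iter):
        p = sphere_exp(p, w @ sphere_log(p, refs))
    return p


def barycentric_residual(p, refs, weights):
    """Norm of sum_i w_i log_p(x_i): zero when p is the barycentre of refs with these weights."""
    return float(np.linalg.norm(np.asarray(weights, dtype=float) @ sphere_log(p, refs)))


refs = np.eye(3)  # three reference points on S^2
# Weights on two points: the barycentre lies on the geodesic from refs[0] to refs[1],
# 30% of the way along it; sweeping the weight traces that geodesic segment.
p = weighted_mean(refs, [0.7, 0.3, 0.0])
print(p, np.arctan2(p[1], p[0]) / (np.pi / 2))  # third coordinate 0, ratio close to 0.3
# Weights on three points: a point inside the spherical triangle spanned by refs.
q = weighted_mean(refs, [0.5, 0.3, 0.2])
print(q, barycentric_residual(q, refs, [0.5, 0.3, 0.2]))
```

Because the construction only uses the reference points and the logarithm map, the same equation could in principle be written for other spaces where a logarithm (or a distance gradient) is available, which is the point made above about going beyond tangent vectors at a single base point.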

Practical implementation of geometric statistics: the geomstats library

The Python package geomstats is an open-source package for Riemannian geometry computations in machine learning. The package gathers several codes from the associated team and aims at becoming the reference library for implementing geometric statistics procedures, seamlessly accessible from a natural Scikit-learn-style machine learning environment. Nina Miolane created the library in 2018, in the framework of the associated team GeomStats. Because it is a unique tool to promote reproducible research in geometric statistics, we decided in 2019 to reorient all the efforts of the associated team towards its development and to make this library the flagship result of our collaboration. This library is also becoming the main development environment of the ERC G-Statistics research group. The ambition is to make “geomstats” one of the go-to packages in our field.

In 2020, we organized two one-week hackathons, in January and in March (the latter moved online due to the lockdown), which gathered about 12 researchers from different communities in France, Germany and the USA. This demonstrated the interest and motivation of the developers. The code is available on GitHub (https://github.com/geomstats/). It counts more than 45000 lines of code relying on three backends (numpy, pytorch and tensorflow) to enable efficient GPU processing. It implements the atomic operations for statistical computing on more than 15 manifolds, and a few generic high-level statistical learning algorithms based on them. Contributions are processed through the usual GitHub pull-request workflow, and continuous integration is done with Travis. PEP8 / numpy standards are enforced for the code / docstrings using the flake8 tool.

The project was presented at several international conferences (GSI, Toulouse, 2019 and SciPy, Austin, 2020), and a paper in the software section of the Journal of Machine Learning Research is currently in press. The library is also used as support for scientific publications in statistics and machine learning (B. Hou; N. Miolane et al., MIDL 2019; Chevallier and Guigui, Entropy 2020; X. Pennec, submitted 2020).

Exchanges

  • May 2020: One-week visit of N. Guigui and Y. Thanwerdas (PhD, Inria) planned at Stanford for the third GeomStat Hackathon [canceled due to the Covid-19 lockdown].
  • January 13-17, 2020: One-week visit of N. Miolane (Post-doc Stanford) at Inria Sophia for the first GeomStat Hackathon.
  • June 9-11, 2019: Meeting of X. Pennec and N. Miolane during the Informal Workshop on New Directions in Shape Analysis (Math in the Desert, Cedar Break, Utah, USA).
  • April 8-10, 2019: Y. Thanwerdas visited the Statistics Department at Stanford.
  • March-December 2018: 10-month stay of Nina Miolane at the Stanford Statistics Department.
  • May 2018: One-week visit of Boris Gutman to Sophia Antipolis.
  • October/November 2017: Short-term visits of Nina Miolane at the Stanford Statistics Department.
  • July 2017: Short visit of X. Pennec to Boris Gutman and Paul Thompson at LONI, USC, Los Angeles.
  • December 2016: Visit of Susan Holmes at Inria for Nina Miolane's PhD defense.
  • April to June 2016: 3-month visit of Nina Miolane at the Stanford Statistics Department.
  • April to June and August to December 2015: 8-month stay of Nina Miolane at the Stanford Statistics Department.
  • April to June 2015: 3-month visit of Xavier Pennec at the Stanford Statistics Department.

Events

Workshop and conferences organized

  • October 19-22, 2020: Geomstats coding week in Sophia Antipolis.
  • Annual mathematical workshop on the geometry of shapes 2020 (Math in the Cloud), planned at La Minière de Vallauria, Tende, 06 France, and moved online due to the pandemic. June 29 to July 10, 2020 (X. Pennec organizer, 25 participants, including talks by Y. Thanwerdas, N. Miolane and N. Guigui).
  • March 30-April 3, 2020: Second GeomStats Hackathon in Paris [held virtually], gathering 10 people.
  • January 13-17, 2020: First GeomStats Hackathon in Sophia Antipolis, gathering 12 people.
  • Geometric Statistics workshop, Toulouse, FR, Aug. 30 to Sept. 5, 2019 (X. Pennec organizer; X. Pennec, S. Holmes, N. Miolane and Y. Thanwerdas speakers; N. Guigui participant).
  • Geometric Sciences of Information GSI'2019, Toulouse, FR, Aug. 27-29, 2019 (X. Pennec and N. Miolane program committee members and session chairs; N. Guigui, Y. Thanwerdas and N. Miolane speakers).
  • Geometric Sciences of Information GSI'2017, Paris, FR, Nov 2017. (X. Pennec, S. Holmes and N. Miolane members of the Scientific Committee, X. Pennec session chair and speaker).
  • Topological and Geometric Structure of Information (TGSI 2017), Luminy, FR (X. Pennec co-organizer)
  • Mathematical Foundations of Computational Anatomy MFCA 2017, Quebec, CA. (X. Pennec Chair).
  • Workshop on Computational methods for the better understanding of human cognition and health at BIS'2017, UC Berkeley, California, June 2017 (N. Miolane organizer).
  • Geometry of Shapes Workshop (Math in the Mine), June 26 - July 2 2016, Miniere de Vallauria, Alpes Maritimes, FR (Xavier Pennec Organizer, Nina Miolane and Loic Devillier participants).
  • Geometric Sciences of Information GSI'2015 (Xavier Pennec and Susan Holmes members of the Scientific Committee, Xavier Pennec session chair, Xavier Pennec and Nina Miolane speakers).
  • Mathematical Foundations of Computational Anatomy MFCA 2015 (Xavier Pennec General Chair, Susan Holmes program committee member).
  • AMS Special Session on Differential Geometry and Statistics, 2015 Joint Mathematics Meetings (AMS/MAA), San Antonio, Texas, January 2015. Session organized by Susan Holmes with Xavier Pennec as Speaker.

Scientific mediation

Publications

PhD theses

  • Nina Miolane. Geometric Statistics for Computational Anatomy. PhD Thesis. Defended on December 16, 2016. Jury: Ian Dryden (U. Nottingham, UK), Alain Trouvé (ENS Cachan, FR), Sarang Joshi (U. Utah, USA), Stephan Huckemann (U. Goettingen, DE), Susan Holmes (U. Stanford, USA), Nicholas Ayache (Inria, FR), Xavier Pennec (Inria, FR).
  • Loic Devillier. Consistency of statistics in quotient spaces of infinite dimension. Defended on November 20, 2017. Jury: Stéphanie Allassonnière (U. Paris Descartes, FR), Marc Arnaudon (U. Bordeaux, FR), Charles Bouveyron (U. Cote d'Azur, FR), Stephan Huckemann (U. Goettingen, DE), Xavier Pennec (Inria, FR), Stefan Sommer (U. Copenhagen, DK), Alain Trouvé (ENS Cachan, FR).
  • Hadj-Hamou, M. Beyond Volumetry in Longitudinal Deformation-Based Morphometry: Application to Sexual Dimorphism during Adolescence. December 14, 2016. Jury: O. Colliot (ICM, Paris), Ch. Barillot (IRISA, Rennes), J.-L. Martinot (Inserm U1000, Orsay), N. Ayache (Inria, FR), X. Pennec (Inria FR), R. Deriche (Inria, FR).

Books and Proceedings

Journal papers

Book chapters

Conferences with proceedings

Software