Scientific Data Management

Latest News
Jan. 1, 2011: Creation of Zenith

Contact:
Patrick Valduriez (Firstname.Lastname@inria.fr)



Keywords: scientific data, uncertain data, data processing, data analysis, social-based data sharing, scientific workflows, data integration, content-based information retrieval, P2P, cloud.

Scientific data management is now on the agenda of a very active research community composed of scientists from different disciplines and data management researchers. For instance, the SciDB organization is building an open-source database system for scientific data analytics.

Our approach is to capitalize on the principles of distributed data management. In particular, we plan to exploit: high-level languages as the basis for data independence and automatic optimization; data semantics (taxonomies, folksonomies, ontologies, …) to improve information retrieval and automate data integration; declarative languages (algebra, calculus) to manipulate data and workflows, with user-defined functions; and user (social) profiles and relationships between participants to support recommendation. Furthermore, we will exploit highly distributed environments, in particular P2P for data sharing between participants and parallel processing to scale up in the cloud. To reflect this approach, we organize our research program into three complementary themes:

  1. Data and metadata management. This theme addresses the problems of managing and integrating data and metadata with uncertainty, in particular n-way schema matching and distributed probabilistic query processing.
  2. Data and process sharing. This theme addresses the problems of sharing scientific data and processes in highly distributed and parallel environments, in particular social-based P2P data sharing and scientific workflow management.
  3. Scalable data analysis. Data production is growing faster than computing power, which puts our ability to analyze these data at stake. This theme addresses the scalability problem by investigating new data mining and content-based retrieval techniques that exploit parallelism in the cloud.
