Patrick Valduriez

Inria
Campus Saint-Priest - Bâtiment 5
860 rue de St Priest
34095 Montpellier Cedex 5
France

Firstname.Lastname@inria.fr
Tel : +33 4 67 14 97 26
Fax : +33 4 67 41 85 00

Data integration

  • CloudMdsQL Polystore (2015-2018). Transforms queries expressed in a common SQL-like query language into an optimized query execution plan to be executed over multiple cloud data stores (SQL, NoSQL, HDFS, etc.) through a query engine. The compiler/optimizer is implemented in C++ and uses the Boost.Spirit framework for parsing context-free grammars. CloudMdsQL has been validated on relational, document and graph data stores in the context of the CoherentPaaS European project. It has been transferred to LeanXcale.
  • WebSmatch - Web Schema Matching (2011-2014). A flexible, open environment for discovering and matching complex schemas from many heterogeneous data sources over the Web. It provides three basic functions: (1) metadata extraction from data sources; (2) schema matching, and (3) schema clustering. It is delivered through Web services, to be used directly by data integrators or other tools, with RIA clients. Implemented in Java, delivered as Open Source Software (under LGPL), it has been used by Data Publica and CIRAD.
  • DISCO - Distributed Information Search Component (1997-1999). Disco is a data integration system for Internet data sources, developed in the Rodin group (in the context of the Dyade joint venture with Bull) between 1995 and 1999. Disco was transferred to the Kelkoo company (now the number one Internet buying guide in Europe) in 1999.
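
As an illustration of the schema matching function listed for WebSmatch, here is a minimal sketch that pairs attribute names from two schemas by string similarity. It is not WebSmatch's algorithm: the function names, the 0.6 threshold and the use of Python's difflib are assumptions made for the example, and real matchers also use data types, instances and structure.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_schemas(source, target, threshold=0.6):
    """Pair each source attribute with its most similar target attribute.

    Hypothetical helper: `threshold` is an arbitrary cut-off chosen for
    this sketch, not a value taken from WebSmatch.
    """
    if not target:
        return []
    matches = []
    for s in source:
        # Best candidate in the target schema for this source attribute.
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches
```

Attributes whose best candidate scores below the threshold are left unmatched, which is where a data integrator would step in.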

Scientific workflow management

  • DfAnalyzer (2017 -). A tool for monitoring, debugging, steering, and analysis of dataflows generated by scientific applications. It works by capturing strategic domain data and registering provenance and execution data to enable queries at runtime. It provides lightweight dataflow monitoring components to be invoked by HPC applications. It can be plugged into scripts or Spark applications, in the same way users already plug in visualization library components.
  • OpenAlea (2012 -). OpenAlea is an open source project primarily aimed at the plant research community. It is a distributed collaborative effort to develop Python libraries and tools that address the needs of current and future work in plant architecture modeling. It includes modules to analyze, visualize and model the functioning and growth of plant architecture. It was formerly developed in the Inria VirtualPlants team. OpenAlea is used heavily by INRA for the analysis of phenotyping data.
  • SciFloware (2013-2020). A middleware for the execution of scientific workflows in a distributed and parallel way. SciFloware provides a development environment and a runtime environment for scientific workflows, interoperable with existing systems. We validate SciFloware with workflows for analyzing biological data provided by our partners CIRAD, INRA and IRD.

Distributed data management

  • SAVIME - Simulation And Visualization IN-Memory (2017 -). A multi-dimensional array DBMS for scientific applications. SAVIME supports a novel data model called TARS (Typed ARray Schema), which supports typed arrays. In TARS, the support of application dependent data characteristics, such as data visualization and UQ computation, is provided through the definition of TAR objects, ready to be manipulated by TAR operators. This approach provides much flexibility for capturing internal data layouts through mapping functions, which makes data ingestion independent of how simulation data has been produced.
  • Triton End-to-end Graph Mapper (2017-2020). A server for managing graph data and applications for mobile social networks. The server is built on top of the OrientDB graph DBMS and a distributed middleware. It provides an End-to-end Graph Mapper (EGM) for modeling the application as (i) a set of graphs representing the business data, the in-memory data structure maintained by the application and the user interface (tree of graphical components), and (ii) a set of standardized mapping operators that map these graphs to each other.
  • P2Prec (2010-2013). P2Prec is a recommendation service for P2P content sharing systems that exploits users' social data. To manage users' social data, we rely on Friend-Of-A-Friend (FOAF) descriptions. P2Prec has a hybrid P2P architecture to work on top of any P2P content sharing system. It combines efficient DHT indexing to manage the users' FOAF files with gossip robustness to disseminate the topics of expertise between friends.

Software Engineering

  • VersionClimber (2018 -). VersionClimber is an automated system to help update the package and data infrastructure of a software application based on priorities that the user has indicated (e.g. I care more about having a recent version of this package than that one). The system does a systematic and heuristically efficient exploration (using bounded upward compatibility) of a version search space in a sandbox environment (Virtual Env or conda env), finally delivering a lexicographically maximum configuration based on the user-specified priority order. It works for Linux and Mac OS on the cloud.
  • Hadoop_g5k (2014 -). Apache Hadoop provides an open-source framework for reliable, scalable, parallel computing. It can be deployed and used in large-scale platforms such as Grid'5000. However, its configuration and management are very difficult, especially given the dynamic nature of clusters. Therefore, we built Hadoop_g5k (Hadoop easy deployment in clusters), a tool that makes it easier to manage Hadoop clusters and prepare reproducible experiments. Hadoop_g5k offers a set of scripts to be used in command-line interfaces and a Python interface. It is currently used by Grid'5000 users, and saves them much time when doing their experiments with MapReduce.
  • ATL - Atlas Transformation Language (2004-2006). ATL is a transformation-based model management framework, with metadata management and data mapping as the main applications. It comes with a library of more than 100 transformation components. ATL was registered with the APP in 2004 (by Inria, TNI-Software and University of Nantes). ATL is released as Open Source Software under the Eclipse Public License and available as an Eclipse plugin. The average number of downloads is 675 per month. There is now an active community of more than 100 user sites, including research labs and major companies (Airbus, NASA, Ilog, Sodius, TNI, etc.). In early 2007, ATL was recognized as a standard Eclipse component for model transformation.
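
To make the notion of a "lexicographically maximum configuration" from the VersionClimber entry concrete, here is a minimal brute-force sketch. It is not VersionClimber's implementation, which explores the search space heuristically; the function name, the package names and the `works` callback (standing in for a build-and-test run in a sandbox) are all hypothetical.

```python
from itertools import product


def lexicographic_max_config(candidates, works):
    """Return the best working configuration, or None if none works.

    candidates: list of (package, versions) pairs, highest priority
                first, each version list sorted from oldest to newest.
    works:      hypothetical callback that takes a {package: version}
                dict and returns True if the application runs with it.
    """
    names = [name for name, _ in candidates]
    newest_first = [list(reversed(versions)) for _, versions in candidates]
    # product() varies the last package fastest, so configurations are
    # visited in descending lexicographic order of the priority list:
    # the first one that works is the lexicographic maximum.
    for combo in product(*newest_first):
        config = dict(zip(names, combo))
        if works(config):
            return config
    return None


# Usage with made-up packages: the newest pkg_a breaks with the newest pkg_b.
candidates = [("pkg_a", ["1.0", "1.1", "1.2"]), ("pkg_b", ["0.9", "1.0"])]
best = lexicographic_max_config(
    candidates,
    lambda c: not (c["pkg_a"] == "1.2" and c["pkg_b"] == "1.0"),
)
```

Because pkg_a has the higher priority, the search keeps its newest version and steps pkg_b back, rather than the reverse.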
