SciFloware

Scientific Workflow Middleware

Use Cases

Context

The goal of the collaboration with the "Phenome" project is large-scale phenotyping.

Because we need to process a large quantity of data and many different tasks, we chose specialized software: "iRODS" (http://irods.org/) to handle the data and "DIRAC" (http://diracgrid.org/) to handle the jobs. We will run on a grid kindly shared with the "Phenome" project (https://www.phenome-fppn.fr/) by "FranceGrilles" (http://www.france-grilles.fr).

Many different actors are involved in this project: INRIA (SciFloware-Zenith, OpenAlea-VirtualPlants), INRA (Phenome, PHIS), and FranceGrilles. These actors are structured as follows:

Each actor has its own role. As stated above, FranceGrilles will provide iRODS and DIRAC to handle the data and jobs. SciFloware will be responsible for creating the algebraic workflow and for parallelizing and distributing the subworkflows. OpenAlea will provide and execute the subworkflows and algorithms we need. Finally, Phenome and PHIS will provide the raw data.

SciFloware, OpenAlea and iRODS/DIRAC will communicate through web services and/or through DIRAC jobs and iRODS operations.

Users

As mentioned before, SciFloware works on data-driven workflows. These workflows are designed by the user and use the subworkflows of OpenAlea or of other systems.

The user will design the workflow either in an XML formalism or through a user interface, and each operation will be associated with the URI of its data. In our case, we assume that the data has first been put on iRODS and that the OpenAlea engines are installed and ready. From there, SciFloware will use the available engines to distribute the tasks.

Example

One simple example of an execution is a basic map-reduce operation with SciFloware. The workflow consists of two components, map and reduce, and the dataset consists of images of plants provided by Phenome.

Here is an example of the workflow in XML:

The subworkflow will be provided by OpenAlea; its execution estimates the surface of a plant by counting green pixels.

SciFloware hands each element of the dataset to an available OpenAlea engine, which processes it with the subworkflow; a reduce operation then follows.
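
The map-reduce execution described above can be illustrated with a minimal Python sketch. It is not the SciFloware or OpenAlea implementation: the function names, the in-memory image model (rows of RGB tuples), the greenness test, and the choice of a sum as the reduce operation are all assumptions made for illustration. A thread pool stands in for the pool of available engines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Assumption: an image is a list of rows, each row a list of (R, G, B) tuples.


def is_green(pixel):
    """Hypothetical greenness test: the green channel strictly dominates."""
    r, g, b = pixel
    return g > r and g > b


def count_green_pixels(image):
    """Map step: a stand-in for the subworkflow applied to one image."""
    return sum(1 for row in image for pixel in row if is_green(pixel))


def run_map_reduce(images, n_engines=2):
    """Apply the map step to every image in parallel, then reduce.

    The thread pool only imitates a set of available engines; the reduce
    operation (a sum of the per-image counts) is an assumption.
    """
    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        per_image = list(pool.map(count_green_pixels, images))
    total = reduce(lambda a, b: a + b, per_image, 0)
    return per_image, total


if __name__ == "__main__":
    plant_a = [[(10, 200, 10), (200, 10, 10)], [(20, 180, 30), (0, 0, 0)]]
    plant_b = [[(5, 90, 5)]]
    print(run_map_reduce([plant_a, plant_b]))  # prints ([2, 1], 3)
```

In this sketch each image is independent of the others, which is what allows the map step to be distributed; only the reduce step needs all the per-image results.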

