HOSCAR project home page
High performance cOmputing and SCientific dAta management
dRiven by highly demanding applications
Research activities
Driving applications
The project focuses on providing high-performance computational and mathematical tools to address the different aspects involved in the exploitation of natural resources and its impact on the environment, as well as data management in astronomy and scientific simulations. Regarding the former, several problems of current interest in the oil and gas industry exhibit complex behavior over a wide range of length and time scales. Seismic imaging, basin analysis, reservoir simulation, and the environmental impact of oil exploitation are among those problems. They are described by (usually) non-linear partial differential equations that must be solved while accounting for possibly discontinuous multiscale or high-contrast coefficients, such as permeability fields and faults. On the other hand, astronomy projects are pushing the development of scalable data stores to cope with the data deluge produced by new digital telescope technology.
Resource prospection
Computational challenges in geoseismics span a wide range of disciplines and have significant scientific and societal implications. Two important topics are the mitigation of seismic hazards and the discovery of economically recoverable petroleum resources. The capacity to image the Earth's subsurface accurately, on land and below the sea floor, is a challenging problem with significant economic applications in resource management, in the identification of new energy reservoirs and storage sites, and in their monitoring through time. As recoverable deposits of petroleum become harder to find and the costs of drilling and extraction increase, the need for more detailed imaging of underground geological structures has become obvious. Recent progress in seismic acquisition, through dense networks of sensors and improved data analysis, now makes it possible to extract new information from fine structures of the recorded signals associated with strongly diffracted waves. In the context of the present project, the objective is to provide new insight into emerging numerical methodologies for the geosciences, particularly seismic imaging. The partners will investigate core technologies for the forward modeling of full elastic wave propagation and parallel mesh generation for complex multiscale geoscience applications.
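To make the forward-modeling idea concrete, the following is a minimal sketch (not the project's actual codes) of a 1D acoustic wave forward model with a high-contrast subsurface layer; grid sizes, velocities, and the source position are illustrative assumptions, and the full project targets 3D elastic waves on unstructured meshes.

```python
import numpy as np

# Minimal 1D acoustic forward model: u_tt = c(x)^2 u_xx, solved with a
# second-order explicit finite-difference scheme. The heterogeneous
# velocity field c(x) stands in for a high-contrast subsurface interface.
# All parameter values below are illustrative, not from the project.

def forward_model(nx=200, nt=400, dx=10.0, dt=1e-3):
    c = np.full(nx, 2000.0)          # background velocity (m/s)
    c[nx // 2:] = 3500.0             # faster layer below an interface
    u_prev = np.zeros(nx)
    u = np.zeros(nx)
    u[nx // 4] = 1.0                 # impulsive source
    r2 = (c * dt / dx) ** 2          # squared CFL number (must stay < 1)
    for _ in range(nt):
        u_next = np.zeros(nx)        # zero Dirichlet boundaries
        u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                        + r2[1:-1] * (u[2:] - 2 * u[1:-1] + u[:-2]))
        u_prev, u = u, u_next
    return u

wavefield = forward_model()
```

The velocity jump at the interface produces the reflected and transmitted waves that seismic imaging exploits; an inversion workflow would compare such synthetic wavefields against recorded data.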
Reservoir simulation
The numerical simulation of fluid flow in porous media, as found in saline aquifers or petroleum reservoirs, is of fundamental importance for managing water resources and oil extraction. Regarding the latter, the nature of the fluid inside the reservoir strongly depends on the current stage of oil recovery. Primary recovery is usually modeled as a single-phase flow, whereas secondary recovery uses a two-phase immiscible flow to account for the injection of water into some wells. However, inefficiencies arising from saturation during secondary recovery have led engineers to seek miscibility by injecting CO2 gas, thereby enhancing oil recovery. Recently, this strategy has attracted particular attention because a reservoir may also serve as a storage site to sequester the gas indefinitely, with possible benefits for the environment.
Each stage of oil recovery is driven by a distinct set of partial differential equations with highly heterogeneous coefficients and embedded high-contrast interfaces. Among them is the mixed form of the Darcy equation, which appears in all modeling stages and establishes the velocity of the fluid through a linear relationship with the pressure gradient. Coupled with such a fluid flow model, non-linear transport equations describe the interaction among the different phases (oil, water, gas) present in the reservoir.
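The linear velocity-pressure relationship can be sketched on a 1D grid as follows; the permeability values, viscosity, and pressure profile are illustrative assumptions, and a real reservoir simulator would solve the coupled mixed formulation rather than evaluate the velocity from a known pressure field.

```python
import numpy as np

# Sketch of the Darcy relation u = -(K / mu) * grad(p) on a 1D grid,
# with a discontinuous (high-contrast) permeability field K.
# All names and values are illustrative, not a specific reservoir model.

def darcy_velocity(pressure, permeability, viscosity, dx):
    grad_p = np.gradient(pressure, dx)       # central-difference gradient
    return -(permeability / viscosity) * grad_p

dx = 1.0
x = np.arange(0.0, 100.0, dx)
p = 1.0e7 - 5.0e4 * x                        # linearly decreasing pressure (Pa)
K = np.where(x < 50.0, 1e-13, 1e-15)         # permeability jump at x = 50 (m^2)
mu = 1e-3                                    # water viscosity (Pa*s)
u = darcy_velocity(p, K, mu, dx)             # fluid velocity (m/s)
```

The two-orders-of-magnitude jump in K produces the same jump in velocity under a uniform pressure gradient, which is exactly the kind of high-contrast behavior the numerical methods must resolve robustly.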
More realistic models may involve even more complexity, as they account for fluid-solid interactions occurring inside the reservoir (in which case poro-elasticity equations must be adopted) or for the interaction between the reservoir itself and the different layers situated above the cap rock (usually modeled by a three-dimensional elasticity model). The latter becomes important when modeling faults in order to predict potential damage to the reservoir.
Ecological modeling
The obligation to provide an environmental impact report (EIR) to regulatory bodies before any action or development that might harm the environment has stimulated numerous studies on the structural and functional characteristics of ecosystems. The purpose of the EIR is to make available to society and to the competent authorities reliable and accurate medium- and long-term forecasts of possible negative impacts on the ecosystem. In this context, the modeling and numerical solution of coupled fluid-ecosystem models are fundamental to predicting possible impacts on a region's biosystem. The project therefore intends to develop ecological models, which usually correspond to very large systems of PDEs, together with numerical algorithms adapted to solve them. To this end, mathematical models of the trophic chain are coupled to fluid flow models. More specifically, the biological system is likely to involve a large set of non-linear coupled parabolic PDEs of reaction-advection-diffusion type, one for each species, while the physical models rely on the shallow-water or the incompressible Navier-Stokes equations.
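A single species of such a system can be sketched with an explicit 1D reaction-advection-diffusion scheme, as below; the logistic reaction term, the prescribed flow velocity, and all parameter values are illustrative assumptions standing in for one equation of a coupled trophic-chain model driven by a resolved flow field.

```python
import numpy as np

# Illustrative 1D sketch of one species concentration c(x, t) obeying a
# reaction-advection-diffusion equation
#   c_t + v c_x = D c_xx + r c (1 - c)
# with a logistic growth term. In the coupled ecological model, v would
# come from a shallow-water or Navier-Stokes solver; here it is constant.

def step(c, v, D, r, dx, dt):
    adv = -v * (c[1:-1] - c[:-2]) / dx                 # first-order upwind (v > 0)
    diff = D * (c[2:] - 2 * c[1:-1] + c[:-2]) / dx**2  # central diffusion
    react = r * c[1:-1] * (1.0 - c[1:-1])              # logistic growth
    c_new = c.copy()                                   # boundaries held fixed
    c_new[1:-1] += dt * (adv + diff + react)
    return c_new

nx, dx, dt = 100, 1.0, 0.1
c = np.zeros(nx)
c[:10] = 1.0                                           # species patch at inflow
for _ in range(200):
    c = step(c, v=0.5, D=0.1, r=0.2, dx=dx, dt=dt)
```

With one such equation per species, all coupled through their reaction terms and advected by the same flow, the resulting system grows quickly, which is why adapted parallel numerical algorithms are a project goal.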
Astronomy data management
The Dark Energy Survey project is expected to produce and analyze tens of petabytes of information during the next five years. Scientists use scientific workflows to reduce telescope-captured images and to analyze them with the aim of identifying celestial bodies. The output of this process is loaded into a sky catalogue that is then explored by analytical workflows. The current release of the catalogue, which includes only simulated data, comprises 1 TB of data and is modeled as a single relation with 900 columns, most of them of double data type, and around 900 million tuples. From the scientific workflow designer's point of view, a single relation is an efficient model for building applications that select the part of the sky they are interested in, using spatial predicates in addition to more than ten predicates on other columns, such as telescope calibration, brightness level, and so on. Implementing such an abstraction with current DBMS technology clearly does not scale to the envisioned petabyte data volume.
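The access pattern in question can be sketched as follows with an in-memory SQLite relation; the column names (ra, dec, mag) and the tiny sample are hypothetical and do not reflect the actual DES catalogue schema, which has 900 columns and billions of tuples where this pattern stops scaling.

```python
import sqlite3

# Toy sketch of the single-relation access pattern: a sky-region window
# expressed as spatial predicates on coordinates, combined with a filter
# on another catalogue column. Schema and data are purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (ra REAL, dec REAL, mag REAL)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [(10.1, -30.2, 21.5),   # inside the window, bright enough
     (10.3, -30.4, 24.9),   # inside the window, too faint
     (50.0, 10.0, 20.0)])   # outside the spatial window

rows = conn.execute(
    "SELECT ra, dec FROM catalog "
    "WHERE ra BETWEEN 10 AND 11 AND dec BETWEEN -31 AND -30 "
    "AND mag < 24").fetchall()
```

On a 900-column, petabyte-scale relation, every such query must prune both the spatial window and the column set efficiently, which is what motivates the scalable data-store research above.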
Simulation data management
The quality of in-silico simulations depends on computing increasingly frequent space-time variations of observable physical quantities. Moreover, due to the complexity of the simulated domain, multi-scale modeling must be taken into account. Thus, a multi-dimensional data model is needed for adequate simulation data representation. Some initiatives, such as SciDB, offer multi-dimensional array data models. Nevertheless, due to the massive data volume produced by simulations, intrinsic parallelism is required to scale the system to reasonable response times on HPC systems. In the specific case of hemodynamics modeling - a research topic within the National Institute of Science and Technology for Computer-Aided Medicine (INCT-MAC), which is coordinated by LNCC with participation from UFC - simulated data is output to a visualization tool for real-time analysis of a patient's circulatory system. Supporting online visualization is increasingly challenging with respect to the simulation time period and scope, and managing data parallelism is paramount for enhancing the quality of the visualization and its medical use. Another challenging example is the management of data produced by simulations in the PELD project, where complex analyses may be expressed on the consequences of an oil spill in Guanabara Bay. Distributed spatio-temporal index structures must be investigated to support large topology meshes and large temporal data variation.
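The chunking idea behind such array data models can be sketched as follows; the chunk shapes, the tiny field, and the function name are illustrative assumptions, not SciDB's actual storage layer, but they show how fixed-size chunks keyed by chunk index enable distribution and parallel processing.

```python
import numpy as np

# Toy sketch in the spirit of SciDB-style multi-dimensional array models:
# a space-time field is split into fixed-size chunks keyed by their chunk
# index, so each chunk can be placed on a different node and processed in
# parallel. Shapes and sizes below are illustrative.

def chunk_array(field, chunk_shape):
    chunks = {}
    cx, cy, ct = chunk_shape
    for i in range(0, field.shape[0], cx):
        for j in range(0, field.shape[1], cy):
            for k in range(0, field.shape[2], ct):
                key = (i // cx, j // cy, k // ct)     # chunk coordinates
                chunks[key] = field[i:i + cx, j:j + cy, k:k + ct]
    return chunks

# A tiny (x, y, time) field standing in for simulation output.
field = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
chunks = chunk_array(field, (2, 2, 4))
```

A distributed spatio-temporal index would then map a query window (a spatial region over a time interval) to the few chunk keys it intersects, instead of scanning the whole array.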