From C2S@Exa - Computer and Computational Sciences at Exascale

ResearchActivities: Research and development activities

Research and development activities undertaken in the project are organized along 5 thematic poles.

Pole 1: Numerical linear algebra
Coordinators: Jean-Yves L’Excellent (Roma project-team) and Luc Giraud (Hiepacs project-team)

The design of the extreme-scale computing platforms that are expected to become available in the forthcoming decade will represent a convergence of technological trends and the boundary conditions imposed by over half a century of algorithm and application software development. These platforms will be hierarchical, providing coarse-grain parallelism between nodes and fine-grain parallelism within each node. They are also expected to be very heterogeneous, since multicore chips and accelerators have completely different architectures and potentials. Such a degree of complexity will entail radical changes to the software infrastructure for large-scale scientific applications. Central numerical kernels such as fast transforms or numerical linear algebra solvers (dense or sparse) are used intensively in many large-scale computer simulations in science and engineering, where they often account for the most time-consuming part of the computations. It is widely recognized that no single strategy outperforms all the others for a large class of problems. To address these challenges, we consider in the project numerical kernels that exhibit a natural hierarchical dependency, which can be summarized in a bottom-up description as follows:

The combination of the above-mentioned techniques will make it possible to exploit many levels of parallelism, at various granularities, suited to an effective use of the forthcoming heterogeneous extreme-scale platforms. These methodological advances will benefit in particular, but not exclusively, the solution of the systems of differential equations that characterize the use cases considered in the project.

Pole 2: Numerical schemes for PDE models
Coordinators: Jocelyne Erhel (Sage project-team) and Philippe Helluy (Calvi project-team)

The availability of massively parallel systems with theoretical floating point performance in the petascale range, and in the exascale range in the near future, will enable the numerical treatment of more challenging problems. These involve, on the one hand, discretized models with higher spatial resolution and, on the other hand, access to longer time scales and to more complex physical models, possibly encompassing multiple space and time scales. However, the simulation of such complex physical phenomena will require very accurate and efficient numerical schemes that will ideally be able to switch automatically between arbitrarily high order accuracy in regions where the problem solution is smooth, and low order accuracy combined with local adaptivity of the discretization mesh in less regular regions. Moreover, the targeted simulation tools may have to couple several mathematical models in a multiscale space-time setting. From the algorithmic point of view, the implementation of the proposed numerical schemes will have to be optimized to maximize both single-core computational performance and scalability, with a view to exploiting massive parallelism. In this context, the objective of the activities undertaken in this thematic pole is to address these needs for some general, widely used types of differential equations, as well as to develop complex solvers tuned for some specific challenging physical problems.

The expertise of the project-teams that are active in this thematic pole lies in the mathematical analysis and numerical study of differential equations modeling various types of physical problems. Nuclear energy production and radioactive waste management are the two application domains addressed in the first place. The corresponding scientific and engineering use cases deal with transport, diffusion and convection-diffusion problems. These use cases are proposed by the two external partners that are participating in the project at its start: ANDRA (French National Agency for Radioactive Waste Management) and CEA (French Alternative Energies and Atomic Energy Commission). In addition to these two central application challenges, other physical problems (such as seismic wave propagation and electromagnetic wave propagation) are considered by the project partners for the purpose of demonstrating the benefits of the methodological contributions of this thematic pole (and of the other thematic poles as well).

Pole 3: Optimization of performances of numerical solvers
Coordinators: François Pellegrini (Bacchus project-team) and Olivier Aumage (Runtime project-team)

The research and development activities at the heart of thematic poles 1 and 2 deal with core ingredients of the numerical treatment of the systems of differential equations modeling the computational physics problems considered in the project. A feature shared by the techniques adopted for the discretization of these systems of differential equations is that they rely on unstructured meshes and modern principles for building high order, possibly auto-adaptive (i.e. hp-adaptive) finite element or finite volume type methods. At the discrete level, these methods result in large sparse or hybrid sparse-dense linear systems of equations, for which we propose to study solution strategies that combine, in a hierarchical way, iterative and direct building blocks. The activities undertaken in these two poles thus aim at designing differential equation solvers and associated numerical kernels that exploit as far as possible the architectural characteristics of modern massively parallel computing platforms. These solvers will also be characterized by complex data structures and highly irregular data access patterns, which will drastically impact sustained computational performance, in particular if one does not take into consideration crucial complementary questions related to the mapping of data sets and numerical algorithms to the targeted computing platforms. This is precisely the objective of thematic pole 3, which is concerned with optimizing the performance of numerical solvers by considering topics that link, or sit at the interface between, computer science techniques and tools for exploiting high performance computing systems and application software. In doing so, two architectural considerations must be dealt with: the hierarchical organization of the memory and the heterogeneity of processing units and networks. The topics to be addressed in this thematic pole are the following:

Pole 4: Programming models
Coordinators: Thierry Gautier (Moais project-team) and Christian Perez (Avalon project-team)

Exascale computing will offer programmers the opportunity to use millions of cores. Cores will certainly be heterogeneous: a single chip may contain numerous simple cores and a few complex cores (CPU), and memory will be hierarchically organized. Moreover, as exascale computing will provide more computing and storage capabilities, it will enable new kinds of applications. For example, multiphysics applications will be able to integrate more phenomena. It is unthinkable that programmers be exposed directly to these hardware and software complexities, which must be handled by the software stack between the application and the hardware rather than by the programmers themselves. In this thematic pole, programming models will be investigated to reduce the previously mentioned complexities. The common idea is to structure the parallelism and its description using known skeletons or patterns with (guaranteed) good performance on multiple architectures. The absence of consensus on how to write or compose HPC applications motivates research into alternative ways to program them. Two research directions have been identified:

The key points here are providing a correct abstraction of the underlying architecture, finding good ways to describe the application parallelism or to identify computational patterns, and providing a scheduling algorithm that maps the logical parallelism of the application to the physically available hardware resources. The right features and design decisions reduce the overhead inherent in architecture abstraction. This is why thematic pole 4 is strongly connected with thematic pole 3, which addresses runtime tools and algorithms for exploiting parallel architectures. Resource managers, scheduling algorithms, graph partitioning and load balancing algorithms that are studied in the context of thematic pole 3 are viewed as basic parallel building blocks that will be coordinated at an upper level by taking into account both architecture abstraction and structural properties of parallel programs.
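As a purely illustrative sketch of the three ingredients named above (an architecture abstraction, a description of the application parallelism, and a scheduler mapping one onto the other), the following Python fragment places a small task graph on heterogeneous resources with a greedy earliest-finish-time rule. It is not the method of any project-team: the `Resource` class, the `schedule` function, the task names and the speed figures are all hypothetical, and communication costs are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical architecture abstraction: a name and a relative speed."""
    name: str
    speed: float  # work units processed per second (assumed, not measured)


def schedule(tasks, deps, resources):
    """Greedy list scheduling of a task graph on heterogeneous resources.

    tasks:     dict mapping task name -> amount of work
    deps:      dict mapping task name -> list of predecessor task names
    resources: list of Resource
    Returns a dict mapping task name -> (resource name, start, finish).
    """
    free_at = {r.name: 0.0 for r in resources}  # when each resource is idle
    placed = {}
    remaining = dict(tasks)
    while remaining:
        # Tasks whose predecessors have all been placed are ready to run.
        ready = sorted(
            t for t in remaining
            if all(p in placed for p in deps.get(t, ()))
        )
        if not ready:
            raise ValueError("dependency cycle or unknown predecessor")
        for t in ready:
            # A task cannot start before its last predecessor finishes.
            earliest = max((placed[p][2] for p in deps.get(t, ())), default=0.0)
            work = remaining[t]
            # Pick the resource on which this task would finish first.
            best = min(
                resources,
                key=lambda r: max(free_at[r.name], earliest) + work / r.speed,
            )
            start = max(free_at[best.name], earliest)
            finish = start + work / best.speed
            placed[t] = (best.name, start, finish)
            free_at[best.name] = finish
            del remaining[t]
    return placed


if __name__ == "__main__":
    plan = schedule(
        {"assemble": 4.0, "factorize": 4.0, "solve": 2.0},
        {"solve": ["assemble", "factorize"]},
        [Resource("cpu", 1.0), Resource("gpu", 4.0)],
    )
    for task, (where, start, finish) in plan.items():
        print(f"{task}: {where} [{start}, {finish}]")
```

The application code only states tasks and dependencies; which resource runs what is decided entirely by the scheduler, which is the separation of concerns this pole argues for. A realistic runtime would also account for data movement and memory hierarchy.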

Pole 5: Resilience for exascale computing
Coordinators: Frédéric Vivien (Roma project-team) and Laura Grigori (Alpines project-team)

Simulation is developing as a third methodological pillar for many science disciplines, in addition to theory and experiments. Simulations, which are run on high performance computers, are slowed down by the preventive and corrective actions used to mask hardware or software component faults. At extreme scale (such as the PRACE systems, NSF Track 1 and Track 2 systems or the Department of Energy Leadership computing systems), where high performance computers are pushed to their limits in scale and technologies, faults are so frequent that they not only challenge the robustness of the mechanisms that are supposed to ensure reliable executions, but also force these mechanisms to be used much more frequently, thereby significantly decreasing simulation performance. With the advent of exascale computing, involving 10,000,000 processor cores or more, experts in fault tolerance for HPC systems consider that the time between two consecutive faults will be shorter than the time required for a checkpoint-restart cycle. At this point, no actual progress of the application can be expected. An important objective is to develop new algorithmic techniques to solve the exascale resilience problem. Solving this problem implies a break from current approaches, and calls for yet-to-be-discovered algorithms, protocols and software tools. In the project the following research directions will be considered:
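The checkpoint-restart argument made in this pole's description can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the project. It assumes independent failures, so that the system mean time between failures (MTBF) is the per-node MTBF divided by the node count, and it uses Young's classical first-order approximation for the checkpoint period together with a first-order waste estimate. All function names and numerical values are hypothetical.

```python
import math


def system_mtbf(node_mtbf_s, n_nodes):
    """System MTBF in seconds, assuming independent node failures."""
    return node_mtbf_s / n_nodes


def young_interval(checkpoint_s, mtbf_s):
    """Young's first-order approximation of the best checkpoint period."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


def efficiency(checkpoint_s, restart_s, mtbf_s):
    """First-order estimate of the fraction of time spent on useful work.

    Waste = time spent checkpointing per period
          + expected lost work and restart time per failure.
    The model is only meaningful while the waste stays well below 1;
    the result is clamped at 0 to signal "no progress".
    """
    tau = young_interval(checkpoint_s, mtbf_s)
    waste = checkpoint_s / tau + (tau / 2.0 + restart_s) / mtbf_s
    return max(0.0, 1.0 - waste)


if __name__ == "__main__":
    node_mtbf = 10 * 365 * 24 * 3600.0   # assumed: 10 years per node
    checkpoint = restart = 600.0         # assumed: 10 minutes each
    for n in (1_000, 100_000, 1_000_000):
        mu = system_mtbf(node_mtbf, n)
        eff = efficiency(checkpoint, restart, mu)
        print(f"{n:>9} nodes: MTBF {mu:10.1f} s, efficiency {eff:.2f}")
```

Under these assumed figures, once the system MTBF falls below the checkpoint time the estimated efficiency drops to zero, which is the situation described above where no actual progress of the application can be expected.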

Retrieved from http://www-sop.inria.fr/c2s_at_exa/pmwiki-2.2.38/pmwiki.php/ResearchActivities/ResearchActivities
Page last modified on June 23, 2013, at 09:20 PM