Hybrid MIMD/SIMD high order DGTD solver
(:linebreaks:)

The DGTD method can be considered as a finite element method where the continuity constraint at the element interfaces is released. While it keeps almost all the advantages of the finite element method (large spectrum of applications, complex geometries, etc.), the DGTD method has other nice properties which explain the renewed interest it has gained in various domains of scientific computing:

- It is naturally adapted to a high order approximation of the unknown field. Moreover, one may increase the degree of the approximation in the whole mesh as easily as for spectral methods but, with a DGTD method, this can also be done locally, i.e. at the mesh cell level. In most cases the approximation relies on a polynomial interpolation method, but the method also offers the flexibility of applying local approximation strategies that best fit the intrinsic features of the modeled physical phenomena.
- When the discretization in space is coupled to an explicit time integration method, the DG method leads to a block diagonal mass matrix independently of the form of the local approximation (e.g. the type of polynomial interpolation). This is a striking difference with classical, continuous FETD formulations. Moreover, the mass matrix is diagonal if an orthogonal basis is chosen.
- It easily handles complex meshes. The grid may be a classical conforming finite element mesh, a non-conforming one, or even a hybrid mesh made of various element types (tetrahedra, prisms, hexahedra, etc.). The DGTD method has been shown to work well with highly locally refined meshes. This property makes the DGTD method well suited to the design of an hp-adaptive solution strategy (i.e. where the characteristic mesh size h and the interpolation degree p change locally wherever needed).
- It is flexible with regard to the choice of the time stepping scheme. One may combine the discontinuous Galerkin spatial discretization with any global or local explicit time integration scheme, or even an implicit one, provided the resulting scheme is stable.
- It is naturally adapted to parallel computing. As long as an explicit time integration scheme is used, the DGTD method is easily parallelized. Moreover, the compact nature of the method favors a high computation to communication ratio, especially when the interpolation order is increased (a minimal sketch of the resulting element-local update is given after this list).
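The last two properties are the ones exploited in this work. As a minimal illustration (plain C with OpenMP directives, not taken from the actual solver; the array layout and all names are hypothetical), the block diagonal mass matrix turns each time step into independent element-local updates, which can be threaded over elements and vectorized over the nodes of each element:

[@
/* Minimal sketch, not the project's code: element-local explicit update made
 * possible by the block diagonal mass matrix of the DG discretization.
 * n_elem, n_dof, n_comp, inv_mass, rhs, field are hypothetical names/layouts. */
#include <stddef.h>

void dg_explicit_update(size_t n_elem,          /* number of tetrahedra               */
                        int n_dof,              /* nodes per element: 4 (P1), 10 (P2) */
                        int n_comp,             /* field components stored per node   */
                        const double *inv_mass, /* n_elem blocks of n_dof x n_dof     */
                        const double *rhs,      /* local right-hand sides             */
                        double dt,
                        double *field)          /* unknowns, updated in place         */
{
    /* Fine grain parallelism: every element is independent, so the loop over
       elements can be distributed over threads without any global solve. */
    #pragma omp parallel for
    for (size_t e = 0; e < n_elem; ++e) {
        const double *M = inv_mass + e * (size_t)(n_dof * n_dof);
        for (int c = 0; c < n_comp; ++c) {
            const double *r = rhs   + (e * n_comp + c) * (size_t)n_dof;
            double       *u = field + (e * n_comp + c) * (size_t)n_dof;
            for (int i = 0; i < n_dof; ++i) {
                double acc = 0.0;
                /* SIMD over the nodes of the element (the fine grain level). */
                #pragma omp simd reduction(+:acc)
                for (int j = 0; j < n_dof; ++j)
                    acc += M[i * n_dof + j] * r[j];
                u[i] += dt * acc;
            }
        }
    }
}
@]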
In this study, we focus on the last of these features and develop a hybrid coarse grain/fine grain parallelization of a high order DGTD solver formulated on unstructured tetrahedral meshes for the simulation of light interaction with nanometer scale metallic structures.

Indeed, the compact nature of the method (the polynomial interpolation of the physical fields is performed at the element level) is particularly appealing for harnessing the processing capabilities of manycore CPUs or accelerator chips. We are concerned here with the study of a hybrid coarse grain/fine grain parallelization strategy for a high order DGTD solver for the system of Maxwell equations coupled to a physical dispersion model. Practical modeling settings of interest to our study are the system of Maxwell equations coupled to a Debye dispersion model for the simulation of microwave interaction with biological tissues, and the system of Maxwell equations coupled to a Drude dispersion model for the simulation of light interaction with nanometer scale metallic structures.
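To make the coarse grain/fine grain split concrete, the sketch below shows the structure of one time step, assuming a classical domain decomposition with one subdomain per MPI rank and non-blocking exchange of the interface values. This is not the project's actual code; the subdomain type and the pack/update routines are hypothetical placeholders.

[@
/* Structural sketch of the hybrid strategy: MPI ranks own mesh subdomains
 * (coarse grain, MIMD), while threads and SIMD lanes work on the elements
 * inside each subdomain (fine grain), as in the previous sketch. */
#include <mpi.h>

typedef struct subdomain subdomain_t;  /* local mesh, fields, buffers (hypothetical) */

/* Hypothetical placeholders; in a real solver these do the actual work. */
static void pack_interface_fields(subdomain_t *sd)     { (void)sd; }
static void update_interior_elements(subdomain_t *sd)  { (void)sd; }
static void update_interface_elements(subdomain_t *sd) { (void)sd; }

void time_step(subdomain_t *sd, int n_neighbors, const int *neighbor_rank,
               double **sendbuf, double **recvbuf, const int *buf_len)
{
    MPI_Request req[2 * 64];           /* assumes n_neighbors <= 64 in this sketch */
    int r = 0;

    /* Coarse grain: exchange interface (ghost) field values with the neighbors. */
    pack_interface_fields(sd);
    for (int n = 0; n < n_neighbors; ++n) {
        MPI_Irecv(recvbuf[n], buf_len[n], MPI_DOUBLE, neighbor_rank[n], 0,
                  MPI_COMM_WORLD, &req[r++]);
        MPI_Isend(sendbuf[n], buf_len[n], MPI_DOUBLE, neighbor_rank[n], 0,
                  MPI_COMM_WORLD, &req[r++]);
    }

    /* Fine grain: overlap the communication with the update of the interior
       elements (a threaded/vectorized loop over elements). */
    update_interior_elements(sd);

    MPI_Waitall(r, req, MPI_STATUSES_IGNORE);
    update_interface_elements(sd);     /* needs the freshly received ghost values */
}
@]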
We conducted a parallel performance evaluation in terms of a strong scalability analysis. For that purpose, we selected a use case typical of optical guiding applications: a Y-shaped waveguide consisting of a chain of nanospheres embedded in vacuum. The constructed tetrahedral mesh is made of 520,704 vertices and 2,988,103 elements. The high order discontinuous finite element method designed for the solution of the system of time-domain Maxwell equations, coupled to a Drude model for the dispersion of noble metals at optical frequencies, is formulated on this tetrahedral mesh. Within each element (tetrahedron) of the mesh, the components of the electric and magnetic fields, as well as the components of the electric polarization, are approximated by a nodal (Lagrange type) interpolation method. The unknowns of the problem are thus the values of these physical quantities at the nodes of the polynomial interpolation. For instance, for a linear (i.e. P1) interpolation of the fields, the number of DoFs (degrees of freedom) within a tetrahedron is 6x4 if the element is located in vacuum and 9x4 if it is located in the metallic structure. For a quadratic (i.e. P2) interpolation, the corresponding figures are 6x10 and 9x10, and so on for higher interpolation degrees. The global number of DoFs is then the sum of these per-element figures over all elements of the mesh, as illustrated by the small example below.
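As a compact way to see this bookkeeping (a sketch only; the element counts used in main are hypothetical placeholders, not the actual vacuum/metal split of the mesh), the per-element and global DoF counts can be computed as follows:

[@
/* Small worked example of the DoF counts described above (illustrative only). */
#include <stdio.h>

/* Nodes of a Pp Lagrange interpolation on a tetrahedron: (p+1)(p+2)(p+3)/6. */
static long nodes_per_tet(int p) { return (long)(p + 1) * (p + 2) * (p + 3) / 6; }

/* 6 components per node (E, H) in vacuum; 9 (E, H and the polarization) in metal. */
static long dofs_vacuum_elem(int p) { return 6 * nodes_per_tet(p); }
static long dofs_metal_elem(int p)  { return 9 * nodes_per_tet(p); }

static long global_dofs(int p, long n_vacuum, long n_metal)
{
    return dofs_vacuum_elem(p) * n_vacuum + dofs_metal_elem(p) * n_metal;
}

int main(void)
{
    long n_vacuum = 1000, n_metal = 200;   /* hypothetical placeholder counts */
    for (int p = 1; p <= 4; ++p)
        printf("P%d: %ld DoFs per vacuum element, %ld per metallic element, "
               "%ld in total for the placeholder mesh\n",
               p, dofs_vacuum_elem(p), dofs_metal_elem(p),
               global_dofs(p, n_vacuum, n_metal));
    return 0;
}
@]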
(:table border='0' width='100%' align='center' cellspacing='1px':) (:cellnr align='center':) http://www-sop.inria.fr/nachos/pics/results/hpc/Yguide-1.png (:cell align='center':) http://www-sop.inria.fr/nachos/pics/results/hpc/Yguide-2.png (:tableend:) Contour lines of the amplitude of the discrete Fourier transform of the electric field

(:linebreaks:)
The strong scalability analysis has been conducted on the thin nodes of the Curie system. Each run has been made with 8 OpenMP threads per socket and 2 sockets per node.
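The plots below report this analysis in the usual strong scaling terms. The following helper (a sketch; it assumes one thread per core, i.e. 2 sockets x 8 threads = 16 cores per thin node, and takes the measured wall-clock times as input rather than hard-coding any results) converts two timings into speedup and parallel efficiency:

[@
/* Minimal helper for reading the strong scalability plots: speedup and
 * parallel efficiency relative to a reference run. No measured values are
 * hard-coded; the one-thread-per-core mapping is an assumption. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Usage: scaling n_nodes_ref t_ref n_nodes t */
    if (argc != 5) {
        fprintf(stderr, "usage: %s n_nodes_ref t_ref n_nodes t\n", argv[0]);
        return 1;
    }
    double nodes_ref = atof(argv[1]), t_ref = atof(argv[2]);
    double nodes     = atof(argv[3]), t     = atof(argv[4]);
    double cores_ref = 16.0 * nodes_ref;   /* 2 sockets x 8 threads per node */
    double cores     = 16.0 * nodes;

    double speedup    = t_ref / t;          /* gain relative to the reference run */
    double ideal      = cores / cores_ref;  /* ideal strong-scaling gain          */
    double efficiency = speedup / ideal;    /* 1.0 means perfect scaling          */

    printf("cores: %.0f -> %.0f, speedup: %.2f, efficiency: %.2f\n",
           cores_ref, cores, speedup, efficiency);
    return 0;
}
@]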
(:table border='0' width='100%' align='center' cellspacing='1px':) (:cellnr align='center':) http://www-sop.inria.fr/nachos/pics/results/hpc/dgtd_p2.jpg (:cellnr align='center':) http://www-sop.inria.fr/nachos/pics/results/hpc/dgtd_p3.jpg (:cellnr align='center':) http://www-sop.inria.fr/nachos/pics/results/hpc/dgtd_p4.jpg (:tableend:) Strong scalability analysis of the DGTD solver
(:linebreaks:)
This study has been conducted in the context of a PRACE Preparatory Access project (17th cut-off date, June 2014, project #2010PA2452).
Related publications
(:linebreaks:)
S. Lanteri, R. Léger, C. Scheid, J. Viquerat, T. Cabel and G. Hautreux
Hybrid MIMD/SIMD high order DGTD solver for the numerical modeling of light/matter interaction on the nanoscale
PRACE White Paper (2015)