... Wornom1
INRIA, 2004 Route des Lucioles, BP. 93, 06902 Sophia-Antipolis, France
... (Work)1
The maximum local and global communication times may occur on different processors. Therefore the Communication/Work ratios shown here, computed from these per-quantity maxima, are only upper estimates and are larger than the actual values. The most recent version of the AERO-F code computes these ratios correctly; see APPENDIX F.
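To make the upper-bound explicit: if $L_p$ and $G_p$ denote the local and global communication times on processor $p$ (notation introduced here only for illustration), then
\[
\max_p \,(L_p + G_p) \;\le\; \max_p L_p \;+\; \max_p G_p ,
\]
with equality only when both maxima occur on the same processor, which is why combining the separately reported maxima can only overestimate the true ratio.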
... solver2
The explicit solver was approximately a factor of 38 slower for this problem. However, the explicit solver can be optimized for steady-state problems; this was not done here as it was beyond the scope of this investigation.
... Communication/Work3
The Communication/Work ratio is the total communication time divided by the Work (Work = total computational time - total communication time).
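Written as a formula, with $T_{\rm comp}$ the total computational time and $T_{\rm comm}$ the total communication time (symbols introduced here only for illustration),
\[
{\rm Work} = T_{\rm comp} - T_{\rm comm}, \qquad
\frac{\rm Communication}{\rm Work} = \frac{T_{\rm comm}}{T_{\rm comp} - T_{\rm comm}} .
\]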
... processors4
This anomaly was noted during the writing of the report. Since the 568K mesh is approximately twice as large as the 262K mesh, and 16 processors were used rather than 8, one would expect approximately the same Communication/Work ratios. For both the 262K and 568K meshes, 10 time steps were used for the comparisons.
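A rough per-processor load estimate, treating mesh size as a proxy for computational work, illustrates the expectation:
\[
\frac{262{\rm K}}{8} \approx 33{\rm K}
\qquad {\rm versus} \qquad
\frac{568{\rm K}}{16} \approx 35.5{\rm K}
\]
mesh points per processor, so the work per processor is nearly the same in the two runs and comparable Communication/Work ratios would be expected.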
... used5
The Makefile shows that the -O1 option was used to compile AERO-F for the 262K mesh and the -O3 option for the 568K mesh. The -O1 option increases the processor work by a factor of approximately 4 relative to the -O3 option.
... unchanged6
In hindsight, for comparison purposes, the -O1 compile option should have been retained for the 568K and the 1.03M meshes. The -O1 compile option was used in the AGARD test case.
... ratios7
Examination of Tables 12-13 shows that the local communication times are approximately the same for both the -O1 and -O3 compile options, whereas the global communication times for the -O1 option are on the order of 3-6 times larger than the -O3 times.
... CRate8
CRate should not be confused with megaflops.
... below9
Noted by Basset [basset].
... CEMEF10
The CEMEF cluster has 62 CPUs available but is configured so that the maximum number of CPUs available to Globus users is 24. Note that users of the INRIA and IUSTI clusters request the number of CPUs, whereas users of the CEMEF cluster request the number of Nodes by submitting to one of the queues q2, q4, q6, q8, q16, q24 and q32. Each Node has 2 CPUs, and the default ppn (processors per Node) is 1. q32 requests 32 Nodes, but as only 31 Nodes are available, q32 jobs never run. Thus the largest usable queue is q24 with the default ppn=1, hence a maximum of 24 processors. Hopefully this anomaly will be corrected in the near future. For Globus users, the globusrun script must be modified by the local system administrator to permit the user to set the ppn parameter in the OpenPBS script written and submitted by globusrun.
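As a minimal sketch of the resource request involved (standard OpenPBS/Torque directives; the queue name, Node count and walltime below are illustrative and are not taken from the actual globusrun-generated script), allowing the user to set ppn=2 would use both CPUs of each Node:
\begin{verbatim}
#PBS -q q24                # CEMEF queue: selects the number of Nodes
#PBS -l nodes=24:ppn=2     # ppn=2 would use both CPUs of each Node (48 CPUs)
#PBS -l walltime=01:00:00  # illustrative time limit
# ... launch of the parallel job on the allocated processors ...
\end{verbatim}
With the current default ppn=1, the same q24 request yields only 24 processors, which is the limitation described above.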
... large11
The AERO-F code prints the following error message if the buffer size is too small:
``MESSAGE_IS_TOO_LONG_FOR_BUFFER."
... tried.12
Based on item 4 of the last section, it would have been prudent to increase the buffer size so as to compare the Communication/Work time ratios for the 568K and the 1.03M meshes compiled with the -O3 option for a fixed number of CPUs.
... sent/received13
by 64, as AEDIF is compiled with the -r8 option and therefore uses 64 bits for real data.
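For reference, the conversion behind that factor is simply that one -r8 real occupies 64 bits:
\[
{\rm bits\ transferred} = 64 \times N_{\rm reals} ,
\]
where $N_{\rm reals}$ (notation introduced here) is the number of real values sent or received.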
... CPUs14
Basset used 2 CPUs for his inter-cluster tests and 4 CPUs for the intra-cluster tests.
... CPUs)15
The CEMEF cluster has 62 CPUs, but only 24 are available to Globus users.
... application16
The Communication/Work times for the 262K mesh using 8 CPUs were much smaller than those for the 568K mesh with 16 CPUs. This may, in part, be due to a decomposition that resulted in more optimal message passing, and should be evaluated in future studies.
... animations