
568K mesh: Globus performance on individual clusters

After the 262K study was completed, the AEDIF code was restructured to remove all unnecessary tables and subroutines, permitting larger meshes with smaller executables than would otherwise be possible.


Table 12: 568K mesh: Globus performance on individual clusters - 16 CPUs, -O3 option

Run type                   Globus      Globus     Globus     Globus
Name of cluster            INRIA-nina  IUSTI      INRIA-pf   CEMEF
Processor speed            2 GHz       2 GHz      1 GHz      1 GHz
LAN speed                  1 Gbps      100 Mbps   100 Mbps   100 Mbps
Cache                      512 KB      512 KB     256 KB     256 KB
RAM/CPU                    1/2 GB      1 GB       1/4 GB     1/4 GB
Executable size            237 MB      237 MB     237 MB     237 MB
Number of processors       16          16         16         16
Total computational time   104.0       94.4       195.6      189.7
Local inter-comm. time     1.8         13.3       12.2       13.4
Global inter-comm. time    51.5        39.1       55.9       81.5
Computational ratio        1.0         0.91       1.9        1.9
Communication/Work         1.0         1.2        0.5        1.0


Shown in Table 12 are the performances on the individual clusters using 16 processors. The performances on the INRIA-pf and CEMEF clusters are quite good (computational ratio < 2). Note that the IUSTI cluster is 20 percent faster than the INRIA-nina cluster, an unexpected result. However, the Communication/Work ratios for the 568K mesh with 16 processors are much larger than for the 262K mesh using 8 processors. Examination of the computational times for the 262K and 568K runs showed that different compile options were used, which explains the differences in the Communication/Work ratios. Therefore, for the same mesh, the more efficient the code (less work per processor), the larger the Communication/Work ratio, since the communication times depend on the LAN speeds, which remain unchanged.
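
As a check on how the tabulated ratios are obtained, the values in Table 12 can be reproduced under the assumption that the work time is the total computational time minus the local and global inter-communication times, and that the computational ratio is normalized by the INRIA-nina total time. The short Python sketch below is an illustration of this bookkeeping only; it is not part of the AEDIF code.

    # Recompute the Table 12 (-O3) ratios from the tabulated times.
    # Assumption (not stated explicitly in the report):
    #   work = total - (local + global inter-comm.),
    #   computational ratio = total / INRIA-nina total.
    clusters = {
        # name:        (total, local inter-comm., global inter-comm.)
        "INRIA-nina": (104.0,  1.8, 51.5),
        "IUSTI":      ( 94.4, 13.3, 39.1),
        "INRIA-pf":   (195.6, 12.2, 55.9),
        "CEMEF":      (189.7, 13.4, 81.5),
    }

    reference_total = clusters["INRIA-nina"][0]

    for name, (total, local, global_) in clusters.items():
        comm = local + global_                # total inter-communication time
        work = total - comm                   # time spent computing
        comp_ratio = total / reference_total  # "Computational ratio" row
        comm_work = comm / work               # "Communication/Work" row
        print(f"{name:10s}  computational ratio {comp_ratio:4.2f}"
              f"  Communication/Work {comm_work:4.2f}")

Run as written, this reproduces the last two rows of Table 12 to within a few percent, which supports the assumed definitions.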


Table 13: 568K mesh: Globus performance on individual clusters - 16 CPUs, -O1 option

Run type                   Globus      Globus     Globus     Globus
Name of cluster            INRIA-nina  IUSTI      INRIA-pf   CEMEF
Processor speed            2 GHz       2 GHz      1 GHz      1 GHz
LAN speed                  1 Gbps      100 Mbps   100 Mbps   100 Mbps
Cache                      512 KB      512 KB     256 KB     256 KB
RAM/CPU                    1/2 GB      1 GB       1/4 GB     1/4 GB
Executable size            871 MB      871 MB     871 MB     871 MB
Number of processors       8-8         8-8        8-8        8-8
Total computational time   547.3       449.9      740.9      1039.8
Local inter-comm. time     2.3         12.8       9.4        12.5
Global inter-comm. time    280.8       178.9      288.7      277.3
Computational ratio        1.00        0.82       1.35       1.90
Communication/Work         1.07        0.74       0.67       0.39


Table 13 shows the performances on the individual clusters using the -O1 option. Comparing Table 12 (-O3 option) with Table 13 (-O1 option) shows that compiling with the -O1 option reduces the Communication/Work ratios.
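
This is consistent with the reasoning above: the communication time is set by the LAN and does not benefit from compiler optimization, so a slower executable (more work per processor) yields a smaller Communication/Work ratio and a faster executable a larger one. The sketch below illustrates the trend with purely hypothetical numbers; the speedup factors and times are not taken from the report.

    # Illustrative model only: hold the communication time fixed (LAN-bound)
    # and scale the per-processor work time to mimic different optimization
    # levels. All numbers below are hypothetical.
    comm_time = 50.0   # communication time, assumed unchanged by compile options
    base_work = 250.0  # work time of a hypothetical unoptimized build

    for speedup in (1.0, 2.0, 5.0):
        work = base_work / speedup
        print(f"work speedup {speedup:3.1f}x  ->"
              f"  Communication/Work = {comm_time / work:.2f}")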


Table 14: 568K mesh: inter-cluster performance - 16 CPUs, -O1 option

Run type                   Globus   Globus    Globus       Globus       Globus    Globus
Name of cluster            nina     nina-pf   nina-iusti   nina-cemef   pf-cemef  iusti-cemef
Processor speed            2 GHz    2/1 GHz   2/2 GHz      2/1 GHz      1/1 GHz   2/1 GHz
Executable size            343 MB   343 MB    343 MB       343 MB       343 MB    343 MB
Number of processors       16       8-8       8-8          8-8          8-8       8-8
Total computational time   547.3    702.3     1207.9       1322.4       1323.4    2041.6
Local inter-comm. time     2.3      10.5      496.3        190.8        181.9     554.0
Global inter-comm. time    280.8    279.5     449.2        411.4        411.4     449.4
Computational ratio        1.00     1.28      2.21         2.42         2.41      3.73
Communication/Work         1.07     0.70      3.61         0.83         0.81      0.97


Shown in Table 14 are some of the inter-cluster performances with 16 processors. It is noted that the local communication times for the nina-cemef and pf-cemef combinations are roughly a factor of two to three smaller than those for the nina-iusti and iusti-cemef combinations. This surprising observation remains unexplained.

