Next: Influence of the "0"
Up: 568K mesh
Previous: 568K mesh
After the 262K study was completed, the AEDIF code was restructured to remove
all unnecessary tables and subroutines, permitting larger meshes with smaller
executables than otherwise possible.
Table 12:
568K mesh: Globus performances on individual clusters - 16 CPUs, -O3 option

Run type                 | Globus     | Globus   | Globus   | Globus
Name of cluster          | INRIA-nina | IUSTI    | INRIA-pf | CEMEF
Processor speed          | 2 GHz      | 2 GHz    | 1 GHz    | 1 GHz
LAN speed                | 1 Gbps     | 100 Mbps | 100 Mbps | 100 Mbps
Cache                    | 512 KB     | 512 KB   | 256 KB   | 256 KB
RAM/CPU                  | 1/2 GB     | 1 GB     | 1/4 GB   | 1/4 GB
Executable size          | 237 MB     | 237 MB   | 237 MB   | 237 MB
Number of processors     | 16         | 16       | 16       | 16
Total computational time | 104.0      | 94.4     | 195.6    | 189.7
Local inter-comm. time   | 1.8        | 13.3     | 12.2     | 13.4
Global inter-comm. time  | 51.5       | 39.1     | 55.9     | 81.5
Computational ratio      | 1.0        | 0.91     | 1.9      | 1.9
Communication/Work       | 1.0        | 1.2      | 0.5      | 1.0
Shown in Table 12 are the performances on the different individual clusters
using 16 processors. The performances on the INRIA-pf and the CEMEF clusters
are quite good (computational ratio < 2). Note that the IUSTI cluster is
roughly 10 percent faster than the INRIA-nina cluster (computational ratio
0.91), an unexpected result.
However, the Communication/Work ratios for the 568K mesh with 16 processors
are much larger than for the 262K mesh using 8 processors. In examining the
computational times for the 262K and 568K runs, it was found that different
compile options were used, which explains the differences in the
Communication/Work ratios. Therefore, for the same mesh, the more efficient
the code (less Work per processor), the larger the Communication/Work ratio,
since the communication times depend on the LAN speeds, which remain
unchanged!
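The Communication/Work ratios in Table 12 can be reproduced with a short
sketch. The formula is an assumption inferred from the table values: Work is
taken as the total computational time minus the local and global
inter-communication times, and the ratio is total communication over Work.

```python
# Reproduce the Communication/Work ratios of Table 12 (-O3 run, 16 CPUs).
# Assumption (inferred from the table, not stated in the text):
#   Work = total time - local comm - global comm
#   Communication/Work = (local + global) / Work

table12 = {
    # cluster:     (total, local_comm, global_comm)
    "INRIA-nina": (104.0,  1.8, 51.5),
    "IUSTI":      ( 94.4, 13.3, 39.1),
    "INRIA-pf":   (195.6, 12.2, 55.9),
    "CEMEF":      (189.7, 13.4, 81.5),
}

for cluster, (total, local, glob) in table12.items():
    comm = local + glob
    work = total - comm            # compute-only time under the assumption above
    print(f"{cluster:10s}  comm={comm:5.1f}  work={work:5.1f}  "
          f"Comm/Work={comm / work:4.2f}")
# -> 1.05, 1.25, 0.53, 1.00: consistent with the table's rounded
#    1.0, 1.2, 0.5, 1.0
```

The agreement with the published ratios suggests the assumed decomposition is
the one used in the report.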
Table 13:
568K mesh: Globus performances on individual clusters - 16 CPUs, -O1 option

Run type                 | Globus     | Globus   | Globus   | Globus
Name of cluster          | INRIA-nina | IUSTI    | INRIA-pf | CEMEF
Processor speed          | 2 GHz      | 2 GHz    | 1 GHz    | 1 GHz
LAN speed                | 1 Gbps     | 100 Mbps | 100 Mbps | 100 Mbps
Cache                    | 512 KB     | 512 KB   | 256 KB   | 256 KB
RAM/CPU                  | 1/2 GB     | 1 GB     | 1/4 GB   | 1/4 GB
Executable size          | 871 MB     | 871 MB   | 871 MB   | 871 MB
Number of processors     | 8-8        | 8-8      | 8-8      | 8-8
Total computational time | 547.3      | 449.9    | 740.9    | 1039.8
Local inter-comm. time   | 2.3        | 12.8     | 9.4      | 12.5
Global inter-comm. time  | 280.8      | 178.9    | 288.7    | 277.3
Computational ratio      | 1.00       | 0.82     | 1.35     | 1.90
Communication/Work       | 1.07       | 0.74     | 0.67     | 0.39
Table 13 shows the performances on the individual clusters using the -O1
option. Comparing Table 12 (-O3 option) with Table 13 (-O1 option) shows that
compiling with the -O1 option reduces the Communication/Work ratios on the
IUSTI, INRIA-pf, and CEMEF clusters (with only a slight increase, 1.0 to 1.07,
on INRIA-nina).
Table 14:
568K mesh: inter-cluster performance - 16 CPUs, -O1 option

Run type                 | Globus | Globus  | Globus     | Globus     | Globus   | Globus
Name of cluster          | nina   | nina-pf | nina-iusti | nina-cemef | pf-cemef | iusti-cemef
Processor speed          | 2 GHz  | 2/1 GHz | 2/2 GHz    | 2/1 GHz    | 1/1 GHz  | 2/1 GHz
Executable size          | 343 MB | 343 MB  | 343 MB     | 343 MB     | 343 MB   | 343 MB
Number of processors     | 16     | 8-8     | 8-8        | 8-8        | 8-8      | 8-8
Total computational time | 547.3  | 702.3   | 1207.9     | 1322.4     | 1323.4   | 2041.6
Local inter-comm. time   | 2.3    | 10.5    | 496.3      | 190.8      | 181.9    | 554.0
Global inter-comm. time  | 280.8  | 279.5   | 449.2      | 411.4      | 411.4    | 449.4
Computational ratio      | 1.00   | 1.28    | 2.21       | 2.42       | 2.41     | 3.73
Communication/Work       | 1.07   | 0.70    | 3.61       | 0.83       | 0.81     | 0.97
Shown in Table 14 are some of the inter-cluster performances with 16
processors. Note that the local communication times for nina-cemef and
pf-cemef (roughly 185) are between two and three times smaller than those for
nina-iusti and iusti-cemef (roughly 500-550). This surprising observation
remains unexplained.
Stephen Wornom
2004-09-10