

Globus performance on a 1.03M-vertex mesh

This section examines MecaGRID performance for a large number of processors using a mesh containing 1.03M vertices.

Tables 17-18 show MecaGRID performance using 60-64 CPUs for the 1.03M mesh, where Ts/Hr is the number of time steps computed per hour and CRate = (number of vertices * nstages)/(computational time)/(number of time steps). The performances are for 10 time steps. The first column indicates the type of run (g = Globus, ng = non-Globus). The second column gives the number of partitions (P) and the number of processors (CPU) used. T1 is the computational time and T2 the communication time. W is the work, W = T1 - T2, and includes the time to write the solution files. Sav is the number of times the solution files were written. The times shown in Tables 17-18 are average CPU times. For the 64/48 entry, a total of 48 processors is used for the 64-partition mesh (see Load balancing by processor speed (LB-2) in Section 12). Table 17 shows that for non-Globus computations on the INRIA clusters one can compute at a rate of approximately 200 time steps per hour with the 1.03M-vertex mesh when the solution files are written every 10th time step. When the solution files are written every two time steps using nina and pf processors, Table 18 shows a Globus computational rate on the order of 50 time steps per hour. When inter-cluster configurations are used, the rate drops to the order of 10 time steps per hour.
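Written out explicitly, the two metrics are as follows; the CRate expression restates the definition above, while the Ts/Hr expression is an assumption consistent with the tabulated values (it uses T1, the computational time in seconds, and nsteps, the number of time steps, here 10):

\[
  \mathrm{CRate} \;=\; \frac{N_{\mathrm{vertices}} \times \mathrm{nstages}}{T_1 \times \mathrm{nsteps}},
  \qquad
  \mathrm{Ts/Hr} \;\approx\; \frac{3600 \times \mathrm{nsteps}}{T_1}.
\]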


Table 17: 1.03M results, Sav = 1

                     Processor distribution
 Typ  P/CPU   nina   pf   cemef  iusti   Sav   T1/T2       CRate  Ts/Hr  T/W
 ng   64/64    32    32     -      -      1    165/108      1870   218   1.9
 ng   64/48    16    32     -      -      1    185/113      1674   195   1.6
 ------------------------------ inter-cluster ------------------------------
 g    60/60    32     -     -     28      1    3523/2452      88    10   2.3
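As a consistency check, assuming nstages = 3 (not stated explicitly in this section, but consistent with the tabulated values), the first non-Globus row of Table 17 gives

\[
  \mathrm{CRate} = \frac{1.03\times 10^{6} \times 3}{165 \times 10} \approx 1870,
  \qquad
  \mathrm{Ts/Hr} \approx \frac{3600 \times 10}{165} \approx 218,
\]

which matches the tabulated entries.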



Table 18: 1.03M results, Sav = 5

                     Processor distribution
 Typ  P/CPU   nina   pf   cemef  iusti   Sav   T1/T2       CRate  Ts/Hr  T/W
 ng   64/64    32    32     -      -      5    681/470       454    53   2.2
 g    64/64    32    32     -      -      5    655/442       472    55   2.1
 g    62/62    32     -     -      -      5    637/417       485    57   1.9
 ------------------------------ inter-cluster ------------------------------
 g    64/64    32    16     -     16      5    2796/1950     111    13   2.3
 g    64/64    16     4     8     24      5    3520/2537      88    10   2.6


Four full Globus production runs (800 time steps) with the 1.03M mesh were attempted using 62 processors (32 nina CPUs + 30 iusti CPUs) and 60 processors (32 nina CPUs + 28 iusti CPUs). Three of the four runs failed when one of the requested CPUs failed to start execution. This problem of failing CPUs has existed for at least six months. It occurs at random; the job remains active, blocking the CPUs, until it is killed. Two of the failed runs blocked the system for nina and IUSTI users for five and eight hours, respectively, before being killed. Failures caused by dying CPUs have also been reported by other Globus users on the Globus users e-mail list, to which all Globus users can subscribe. Globus is evolving software, open source and free: users download it, install it, and test it. Bugs are found and usually reported on the Globus users e-mail lists, often together with fixes the users have found, or they are simply brought to the attention of the Globus Alliance developers, who work to resolve them. It is possible that this problem is solved in newer versions of the Globus software.


