The long-term goal is for the MecaGRID clusters to function as a single computer, much as the INRIA clusters do. INRIA users can submit a job to either or both of the nina and pf clusters. Four possible scripts for heterogeneous partitioning are:
    bsub -n $2 -m linux-nina mpijob meshmig_grille $1
    bsub -n $2 -m linux-pf mpijob meshmig_grille $1
    bsub -n $2 -m "linux-nina linux-pf" mpijob meshmig_grille $1

or

    bsub -n $2 -m MyMachinefile.LINUX mpijob meshmig_grille $1

where MyMachinefile.LINUX is

    linux-nina
    linux-pf
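For illustration, the variants above could be collected in a small wrapper script. This is only a sketch: the function name and the dry-run echo are assumptions, not part of the MecaGRID tooling, and the final echo would be removed to actually submit through bsub.

```shell
#!/bin/sh
# Sketch of a wrapper around the bsub variants above (names assumed).
# submit_job <input-file> <nprocs> [machine-spec]
# Echoes the bsub command instead of running it (dry run).
submit_job() {
  input=$1
  nprocs=$2
  machines=${3:-"linux-nina linux-pf"}   # default: either INRIA cluster
  echo bsub -n "$nprocs" -m "$machines" mpijob meshmig_grille "$input"
}

submit_job bubble.dat 16
```

Invoked with only an input file and a processor count, the sketch falls back to offering the job to both INRIA clusters, which mirrors the third variant above.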
Carrying this idea over to the MecaGRID, the user would submit a job in a similar manner:
    bsub -n $2 -machinefile MyMachinefile.LINUX mpijob aedif.x $1

where MyMachinefile.LINUX would look something like

    linux-nina
    linux-pf
    m3h-cluster
    sarek-cluster
The Globus job manager would query the job managers of the MecaGRID clusters until the requested number of processors is available and then submit the job.
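The intended scheduling step can be sketched as follows. Here `free_procs` is a stub standing in for whatever availability query each cluster's job manager would actually answer, and the per-cluster counts are invented for the example; only the cluster names come from the MecaGRID.

```shell
#!/bin/sh
# Sketch of the intended scheduling check: ask each cluster's job manager
# how many processors are free, and submit once the aggregate meets the
# request (otherwise poll again later).
# free_procs is a stub; the counts it returns are invented for this example.
free_procs() {
  case $1 in
    nina)          echo 12 ;;
    pf)            echo 20 ;;
    m3h-cluster)   echo 16 ;;
    sarek-cluster) echo  8 ;;
  esac
}

wanted=32
total=0
for cluster in nina pf m3h-cluster sarek-cluster; do
  n=$(free_procs "$cluster")
  total=$((total + n))
done

if [ "$total" -ge "$wanted" ]; then
  echo "submit: $total processors available for $wanted requested"
else
  echo "wait and poll again"
fi
```

In a real scheduler this check would sit inside a polling loop; the sketch shows a single iteration of the aggregation.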
At present, the MecaGRID does not function as described above. Instead, the user must specify in advance the individual clusters to be used and the number of processors to be used on each cluster. This is done with an RSL (Resource Specification Language) script. The obvious disadvantage of this approach is that the global availability of processors is not taken into account: processors may be requested on a cluster that is fully saturated while another cluster with many free processors goes unused. An example RSL script for 64 processors is shown below, requesting 40 processors on the INRIA clusters, 8 processors at the CEMEF, and 16 processors on the IUSTI cluster:
    +
    ( &(resourceManagerContact="cluster.inria.fr")
       (label="subjob 0")
       (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                    (LD_LIBRARY_PATH /usr/local/globus/lib)
                    (PGI /usr/local/pgi))
       (directory=/net/home/swornom/Bubble_3d/64-proc)
       (executable=/net/home/swornom/Bubble_3d/64-proc/aerodia_globus_wornom.x)
       (stdout=globus_sophia_40_cemef_8_iusti_16.out)
       (stderr=globus_sophia_40_cemef_8_iusti_16.err)
       (count=40)
       (jobtype=multiple)
       (MaxWallTime=15) )
    ( &(resourceManagerContact="m3h-cluster.polytech.univ-mrs.fr")
       (label="subjob 1")
       (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
                    (LD_LIBRARY_PATH /usr/lib/:/home/swornom/pgi/:/usr/local/globus/lib/))
       (directory=/home/swornom/Bubble_3d/64-proc)
       (executable=/home/swornom/Bubble_3d/64-proc/aerodia_globus_wornom.x)
       (stdout=globus_sophia_40_cemef_8_iusti_16.out)
       (stderr=globus_sophia_40_cemef_8_iusti_16.err)
       (count=16)
       (MaxWallTime=15) )
    ( &(resourceManagerContact="sarek-cluster.cma.fr")
       (queue=q8)
       (label="subjob 2")
       (environment=(GLOBUS_DUROC_SUBJOB_INDEX 2)
                    (LD_LIBRARY_PATH /mecagrid/tmp/packages_globus/globus_RH7.1/lib:/mecagrid/nivet/pgi:/mecagrid/wornom/pgi))
       (directory=/mecagrid/wornom/Bubble_3d/64-proc)
       (executable=/mecagrid/wornom/Bubble_3d/64-proc/aerodia_globus_wornom.x)
       (stdout=globus_sophia_40_cemef_8_iusti_16.out)
       (stderr=globus_sophia_40_cemef_8_iusti_16.err)
       (count=8)
       (MaxTime=15) )

where count is the number of processors (Nodes at the CEMEF).
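Since the subjob counts in the RSL must match the total processor count the mesh was partitioned for, a simple sanity check before submission can catch mismatches. The sketch below assumes the counts from the RSL above; the RSL file name is an assumption, and the globusrun command is echoed rather than executed.

```shell
#!/bin/sh
# Sanity-check sketch: the per-subjob counts in the RSL above must sum to
# the total processor count (64 here). Counts taken from the RSL; the file
# name bubble_64proc.rsl is hypothetical, and the submission command is
# echoed rather than executed.
inria=40    # subjob 0, cluster.inria.fr
iusti=16    # subjob 1, m3h-cluster.polytech.univ-mrs.fr
cemef=8     # subjob 2, sarek-cluster.cma.fr
total=$((inria + iusti + cemef))

if [ "$total" -eq 64 ]; then
  echo globusrun -f bubble_64proc.rsl
else
  echo "count mismatch: $total != 64" >&2
fi
```

A multi-request RSL of this form is normally saved to a file and handed to globusrun in a single command, as echoed above.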