The long-term goal is for the MecaGRID clusters to function as a single computer, much as the INRIA clusters do. INRIA users can submit a job to either or both of the nina and pf clusters. Four possible scripts for heterogeneous partitioning are:
    bsub -n $2 -m linux-nina mpijob meshmig_grille $1
    bsub -n $2 -m linux-pf mpijob meshmig_grille $1
    bsub -n $2 -m "linux-nina linux-pf" mpijob meshmig_grille $1

or

    bsub -n $2 -m MyMachinefile.LINUX mpijob meshmig_grille $1

where MyMachinefile.LINUX is

    linux-nina
    linux-pf
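For illustration, the variants above could be collected in a small wrapper script. This is only a sketch: the function name and the dry-run echo are assumptions, not part of the MecaGRID tooling, and the final echo would be removed to actually submit through bsub.

```shell
#!/bin/sh
# Sketch of a wrapper around the bsub variants above (names assumed).
# submit_job <input-file> <nprocs> [machine-spec]
# Echoes the bsub command instead of running it (dry run).
submit_job() {
  input=$1
  nprocs=$2
  machines=${3:-"linux-nina linux-pf"}   # default: either INRIA cluster
  echo bsub -n "$nprocs" -m "$machines" mpijob meshmig_grille "$input"
}

submit_job bubble.dat 16
```

Invoked with only an input file and a processor count, the sketch falls back to offering the job to both INRIA clusters, which mirrors the third variant above.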
Carrying this idea over to the MecaGRID, the user would submit a job in a similar manner:
    bsub -n $2 -machinefile MyMachinefile.LINUX mpijob aedif.x $1

where MyMachinefile.LINUX would look something like

    linux-nina
    linux-pf
    m3h-cluster
    sarek-cluster
The Globus job manager would query the job managers of the MecaGRID clusters until the requested number of processors is available and then submit the job.
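The intended scheduling step can be sketched as follows. Here `free_procs` is a stub standing in for whatever availability query each cluster's job manager would actually answer, and the per-cluster counts are invented for the example; only the cluster names come from the MecaGRID.

```shell
#!/bin/sh
# Sketch of the intended scheduling check: ask each cluster's job manager
# how many processors are free, and submit once the aggregate meets the
# request (otherwise poll again later).
# free_procs is a stub; the counts it returns are invented for this example.
free_procs() {
  case $1 in
    nina)          echo 12 ;;
    pf)            echo 20 ;;
    m3h-cluster)   echo 16 ;;
    sarek-cluster) echo  8 ;;
  esac
}

wanted=32
total=0
for cluster in nina pf m3h-cluster sarek-cluster; do
  n=$(free_procs "$cluster")
  total=$((total + n))
done

if [ "$total" -ge "$wanted" ]; then
  echo "submit: $total processors available for $wanted requested"
else
  echo "wait and poll again"
fi
```

In a real scheduler this check would sit inside a polling loop; the sketch shows a single iteration of the aggregation.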
At present, the MecaGRID does not function as described above. Instead, the user must specify in advance the individual clusters to be used and the number of processors to be used on each cluster. This is done with an RSL (Resource Specification Language) script. The obvious disadvantage of this approach is that the global availability of processors is not taken into account: processors may be requested on a cluster that is fully saturated while another cluster with many free processors goes unused. An example RSL script for 64 processors is shown below, requesting 40 processors on the INRIA clusters, 8 processors at the CEMEF, and 16 processors on the IUSTI cluster:
    +
    ( &(resourceManagerContact="cluster.inria.fr")
       (label="subjob 0")
       (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                    (LD_LIBRARY_PATH /usr/local/globus/lib)
                    (PGI /usr/local/pgi))
       (directory=/net/home/swornom/Bubble_3d/64-proc)
       (executable=/net/home/swornom/Bubble_3d/64-proc/aerodia_globus_wornom.x)
       (stdout=globus_sophia_40_cemef_8_iusti_16.out)
       (stderr=globus_sophia_40_cemef_8_iusti_16.err)
       (count=40)
       (jobtype=multiple)
       (MaxWallTime=15) )
    ( &(resourceManagerContact="m3h-cluster.polytech.univ-mrs.fr")
       (label="subjob 1")
       (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
                    (LD_LIBRARY_PATH /usr/lib/:/home/swornom/pgi/:/usr/local/globus/lib/))
       (directory=/home/swornom/Bubble_3d/64-proc)
       (executable=/home/swornom/Bubble_3d/64-proc/aerodia_globus_wornom.x)
       (stdout=globus_sophia_40_cemef_8_iusti_16.out)
       (stderr=globus_sophia_40_cemef_8_iusti_16.err)
       (count=16)
       (MaxWallTime=15) )
    ( &(resourceManagerContact="sarek-cluster.cma.fr")
       (queue=q8)
       (label="subjob 2")
       (environment=(GLOBUS_DUROC_SUBJOB_INDEX 2)
                    (LD_LIBRARY_PATH /mecagrid/tmp/packages_globus/globus_RH7.1/lib:/mecagrid/nivet/pgi:/mecagrid/wornom/pgi))
       (directory=/mecagrid/wornom/Bubble_3d/64-proc)
       (executable=/mecagrid/wornom/Bubble_3d/64-proc/aerodia_globus_wornom.x)
       (stdout=globus_sophia_40_cemef_8_iusti_16.out)
       (stderr=globus_sophia_40_cemef_8_iusti_16.err)
       (count=8)
       (MaxTime=15) )

where count is the number of processors (Nodes at the CEMEF).
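Since the subjob counts in the RSL must match the total processor count the mesh was partitioned for, a simple sanity check before submission can catch mismatches. The sketch below assumes the counts from the RSL above; the RSL file name is an assumption, and the globusrun command is echoed rather than executed.

```shell
#!/bin/sh
# Sanity-check sketch: the per-subjob counts in the RSL above must sum to
# the total processor count (64 here). Counts taken from the RSL; the file
# name bubble_64proc.rsl is hypothetical, and the submission command is
# echoed rather than executed.
inria=40    # subjob 0, cluster.inria.fr
iusti=16    # subjob 1, m3h-cluster.polytech.univ-mrs.fr
cemef=8     # subjob 2, sarek-cluster.cma.fr
total=$((inria + iusti + cemef))

if [ "$total" -eq 64 ]; then
  echo globusrun -f bubble_64proc.rsl
else
  echo "count mismatch: $total != 64" >&2
fi
```

A multi-request RSL of this form is normally saved to a file and handed to globusrun in a single command, as echoed above.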