Parallel version of ALIAS-Maple
Most of the algorithms of the ALIAS-C++ library have a
structure that allows a parallel implementation.
Hence ALIAS-Maple offers a parallel version of most of its problem-solving
procedures.
The general principle of the parallel versions of the procedures
available in ALIAS-Maple is to create a master
program that processes the initial search domain until a fixed
number of boxes remains to be processed. This master program
runs on the computer on which the Maple session is run.
The master program, which maintains a list of unprocessed input
intervals, then dispatches the input intervals to be
processed to slave programs that run on different computers.
Each slave program runs a bisection process until some stop
criterion is fulfilled; at this point the slave program sends back
to the master program the list of unprocessed input intervals and,
possibly, the solutions that have been found.
As soon as all the available computers have received an input interval
to process, the master process checks whether any slave computer has
terminated its job: if this is the case, the master retrieves the
unprocessed boxes from the slave and adds them to the list
of boxes to be processed. New boxes are then sent to the free computers.
If none of the slaves has terminated its job, the master program
processes the current box while monitoring whether a slave has
returned its result. As soon as this computation is performed, the
master program again checks whether any slave computer has terminated
its computation: if not, it processes the next box in its list.
This process is repeated until all the boxes in the master list
have been processed and all the slaves are free.
The master and slave programs are created by writing an initial,
problem-dependent part and concatenating to it a fixed code that
depends only on the procedure that is used. The ALIAS
distribution includes generic fixed codes (named Master and Slave followed by the Maple procedure name, e.g.
MasterGeneralSolve.C, SlaveGeneralSolve.C).
For some procedures you may customize the fixed code by assigning the
variables `ALIAS/master_program` and `ALIAS/slave_program`
to a string that gives the name of your C++ code. To determine whether you
have this possibility, look at the global variables in the
on-line help for the procedure.
Stopping an ALIAS-C++ procedure
To use the previous mechanism it
is necessary to be able to stop an ALIAS-C++ procedure and recover
the current state of the process, i.e. the solutions, if any, and the
remaining unprocessed boxes.
There are different mechanisms to stop the bisection process of an
algorithm:
- the number of boxes still to be processed
is greater than a given
threshold (this is obtained by setting the global ALIAS-C++ variable
ALIAS_Parallel_Max_Bisection
to this threshold). This is the mechanism that is used when the master starts
processing the initial search box.
- as it may be necessary to perform a large number of bisections
before obtaining boxes, a safety mechanism is needed
that limits the number of bisections (otherwise a
slave may be stuck in a lengthy computation while the other slaves
are free). The
maximal number of bisections may be indicated by setting the global
ALIAS-C++ variable ALIAS_Parallel_Max_Split,
and the algorithm will return if this
number is reached and the number of unprocessed boxes
is lower than ALIAS_Parallel_Max_Bisection (otherwise the
process will continue until this number is reached).
- the number of performed bisections is greater than a given
threshold, whatever the number of boxes still to be processed:
this is obtained by setting the global variable
ALIAS_Parallel_Max_Bisection accordingly
- the number of performed bisections is greater than a given
threshold and the number of boxes still to be processed
is lower than a given
threshold: this is obtained by setting the global ALIAS-C++ variable
ALIAS_Parallel_Max_Bisection
to the first threshold and the global ALIAS-C++ variable
ALIAS_Parallel_Max_Box
to the second. A
safety mechanism is enforced in that case to avoid that only one of the
slave computers performs all the computation: if a multiple of the allowed
number of bisections has been done and the number of boxes still to be
processed exceeds the threshold, the algorithm will return the error code
-1. The multiplying factor is given by the global C++ variable
ALIAS_Safety_Factor, with a default value of
2
- if we are using the reverse storage mode, we may indicate that
the number of performed bisections must be greater than a given
threshold and the number of boxes still to be processed
lower than another given
threshold: this is obtained by setting the global ALIAS-C++ variables
ALIAS_Parallel_Max_Bisection
and
ALIAS_Parallel_Max_Reverse
to these thresholds
- the slave computation time has exceeded a fixed amount of time,
which probably indicates that it would be preferable to distribute the
treatment of the processed box among different slaves. This can be
done using the ALIAS-C++ variable ALIAS_TimeOut, which indicates
the maximum
amount of time (in minutes) during which a slave may run (this amount
will be respected only approximately).
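The criteria above can be summarized in a small predicate. The following is a hypothetical sketch, not the actual ALIAS-C++ test (which is internal to the library): it only restates the rules of the list in code form, and in particular the convention that a negative value of ALIAS_Parallel_Max_Bisection selects the bisection-count criteria is an assumption.

```cpp
// Hypothetical sketch of how the stop criteria listed above might combine.
// The real test is internal to ALIAS-C++; the names come from the text.
struct StopParams {
    int max_bisection;   // ALIAS_Parallel_Max_Bisection
    int max_split;       // ALIAS_Parallel_Max_Split (safety bound)
    int max_box;         // ALIAS_Parallel_Max_Box
};

// Return true if the bisection process should stop and send its current
// state (solutions + unprocessed boxes) back to the master.
bool should_stop(const StopParams& p, int n_bisections, int n_unprocessed) {
    if (p.max_bisection > 0) {
        // enough boxes remain to be dispatched to the slaves
        if (n_unprocessed > p.max_bisection) return true;
        // safety bound: too many bisections while the box list stays small
        if (n_bisections >= p.max_split && n_unprocessed < p.max_bisection)
            return true;
    } else if (p.max_bisection < 0) {
        // assumed convention: a negative value requests a stop on the
        // bisection count, combined with the box-count bound when set
        if (n_bisections > -p.max_bisection &&
            (p.max_box == 0 || n_unprocessed < p.max_box))
            return true;
    }
    return false;
}
```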
Using these stop criteria we may implement a parallel version of the
ALIAS-Maple procedures.
For a parallel implementation it is
hence necessary to have a message passing mechanism that enables one
to send a box to a slave program on another computer and
to receive data from the slave computers. For the parallel
implementation we use the message passing mechanism
pvm.
A very simple master
program using pvm
may be found in the ALIAS distribution under the name
MasterGeneralSolve.C, while the corresponding slave program
is SlaveGeneralSolve.C: these programs are used by Maple
to create the parallel implementation of the GeneralSolve procedure.
Here we use text message passing:
- the master program sends a box by writing in a text buffer the
keyword B followed by the
reals that are the extremities of the box ranges for each
unknown.
- then it appends to this message the value N of
ALIAS_Parallel_Max_Bisection with the keyword SP (hence
SP N is appended to the buffer). If N is negative then the
value M of ALIAS_Parallel_Max_Box is appended to the
buffer by appending the text P I M (the value of
ALIAS_Parallel_Max_Bisection is computed by the master program:
if the box is large a positive value is chosen, while for a
small box a negative value is given).
This buffer is then sent to a slave computer with the pvm_pkstr
procedure. As soon as a buffer is received by the slave computer, the
box to be processed is extracted from the
message using the ALIAS procedure ALIAS_Read_Buffer
(which also updates the values of ALIAS_Parallel_Max_Bisection
and ALIAS_Parallel_Max_Box). As soon as the slave computer has
terminated its computation, it sends back to the master in a text buffer:
- the boxes still to be processed (using the keyword
B for each box, or N if no box remains
to be processed)
- the solutions that have possibly been found (using the keyword
S followed by the reals)
- a failure code if the slave has not been able to process the
box (using the keyword F followed by the error code)
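The box message described above can be built and decoded roughly as follows. This is a sketch inferred from the description, covering only the B ... SP N part; the exact layout is defined by ALIAS_Read_Buffer in the library, and the function names here are illustrative, not ALIAS API.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Illustrative names: the real encoding/decoding is done by the master
// program and by the ALIAS procedure ALIAS_Read_Buffer.
struct Box { std::vector<double> low, high; };

// Master side: keyword B, then the range extremities for each unknown,
// then SP followed by the value of ALIAS_Parallel_Max_Bisection.
std::string pack_box(const Box& b, int max_bisection) {
    std::ostringstream out;
    out << "B";
    for (std::size_t i = 0; i < b.low.size(); ++i)
        out << " " << b.low[i] << " " << b.high[i];
    out << " SP " << max_bisection;
    return out.str();
}

// Slave side: recover the box (n unknowns) and the transmitted value of
// ALIAS_Parallel_Max_Bisection; return false on a malformed buffer.
bool unpack_box(const std::string& msg, int n, Box& b, int& max_bisection) {
    std::istringstream in(msg);
    std::string key;
    if (!(in >> key) || key != "B") return false;
    b.low.assign(n, 0); b.high.assign(n, 0);
    for (int i = 0; i < n; ++i)
        if (!(in >> b.low[i] >> b.high[i])) return false;
    return (in >> key >> max_bisection) && key == "SP";
}
```

A two-unknown box [0,2] x [1,3] with N = 30 would thus travel as the text "B 0 2 1 3 SP 30", which the slave parses back into the same box and bound.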
For the algorithms using the derivatives, the jacobian matrix is also
transferred to the slaves. For a large system this may slow down
considerably the communication between the slaves and the
master. Hence it may be a good policy to inhibit the transfer of the
jacobian by setting the variable `ALIAS/transmit_gradient` to 0.
The message passing mechanism that has been chosen
is pvm. Before using the
parallel procedures it is necessary to run pvm. This is usually
done by first setting the environment variable PVM_ROOT to the
name of the directory where you have installed pvm. Typically
this variable should contain something that looks like:
/u/PVM/pvm3
Then you create a file, called the hostfile, in which you
indicate the names of the slave machines that you intend to use in the
parallel computation. A useful option is to indicate the full
path of the pvm daemon program that will be run by the
slaves. This option is indicated by the keyword dx=path. A
typical hostfile will look like:
cygnusx1 dx=/u/PVM/pvm3/lib/pvmd
penfret dx=/u/PVM/pvm3/lib/pvmd
camarat dx=/u/PVM/pvm3/lib/pvmd
Note that you may have as slave computers a mixture of SUNs and PCs. As
soon as this hostfile has been created, you run pvm with:
pvm hostfile
After some time you will get a prompt and you may verify that all the
slave computers have been registered with the command conf. You
may then exit from the pvm console program with the command quit (you may also kill the pvm program with the command halt).
Most of the procedures available in the ALIAS-Maple library have a
parallel implementation. They have the same name as the non-parallel
procedures except that they are prefixed by the word Parallel. Hence ParallelGeneralSolve is the name of the
parallel implementation of the procedure GeneralSolve.
There is one exception, ParallelContinuationSimplex, which
exists only in a parallel implementation.
The parallel procedures
have the same initial arguments as the normal procedures, except that they
have two additional arguments at the end:
- a list of strings: this list gives the names of the slave
computers. For example ["cygnusx1","molotova"] indicates that
the slave computers will be cygnusx1 and molotova.
- a list of pairs of integers like [[N1,M1],[N2,M2]]. There
must be as many pairs of integers as slave
computers. The first number in a pair indicates that the
corresponding slave will return approximately N1 new boxes,
provided that the maximal width of the box given as input
to the slave is greater
than
`ALIAS/diam_switch` and that the mean diameter is larger than
`ALIAS/mean_diam_switch`: the exact number of returned boxes
is such that, right after a bisection step, the number of unprocessed
boxes is equal to or larger than N1. Note that there is a
safety mechanism to avoid a slave having too large a workload: if
the number of bisections that have been performed is larger than
`ALIAS/max_split`
(default value: 10000) and the number of unprocessed boxes is lower
than or equal to N1, then the slave will return. A good policy is to
increase the value of `ALIAS/diam_switch` if you observe that
all the computation is done by the slave computers.
Otherwise, if
the maximal width of the box given as input
to the slave is lower
than
`ALIAS/diam_switch`,
the slave program will perform at
least M bisections (provided that there are still boxes
to process after this number of bisections)
and will return at most `ALIAS/bisection_slave` boxes
(default value: 30). If M*S bisections have been done and
the number of unprocessed boxes is still larger than
`ALIAS/bisection_slave`, then the slave will stop its
processing and return an error code. In that case
the master program will bisect the range having the largest
width in the initial box, and the two new boxes
will be created and added to the list of unprocessed boxes. The value
of S is defined by `ALIAS/safety_factor`,
with a default value of 2.
Clearly the efficiency of the parallel implementation depends
heavily on the values of N and M and of `ALIAS/diam_switch`.
Another way to stop the computation of a slave is to set the flag
`ALIAS/time_out` to the maximum number of minutes that a slave
may run before sending the processed boxes to the master. By default
this value is set to 0 which means that a slave calculation will never
be stopped by a time-out signal.
Before using the parallel implementation it is compulsory to set some
variables:
- `ALIAS/bisection_master`: the number of
unprocessed boxes
that the master has to generate before dispatching the boxes
to the slaves (default value: 30). It is also the bound on
the number of
unprocessed boxes that the master will compute when all the
slaves are busy, before checking again whether any slave is free (within the
master algorithm there is a mechanism to monitor whether a slave has
completed its calculation, but stopping the master algorithm ensures
that not all the computation is done by the master). This
number should be small so that the master does not do any heavy
computation while the slaves are waiting for boxes. This variable
corresponds to the C++ variable ALIAS_Parallel_Max_Bisection
- `ALIAS/working_directory`: the full path of the directory
in which the C++ master and slave programs will be
created
- `ALIAS/pvm`: the path of the include files for the pvm
library e.g. "/u/caphorn/PVM/pvm3/include"
- `ALIAS/libpvm`, `ALIAS/libpvm_linux`: the paths of
the pvm library for SUN and Linux PCs respectively. You should
typically have something looking like
"/u/caphorn/PVM/pvm3/lib/SUN4SOL2" and "/u/caphorn/PVM/pvm3/lib/LINUX"
- `ALIAS/gpp_sun`, `ALIAS/make_sun`: the full paths
of the C++ compiler and the make program for SUN
workstations. Typically "/usr/local/bin/g++" and "/usr/ccs/bin/make"
- `ALIAS/gpp_linux`, `ALIAS/make_linux`: the same for
PCs under Linux.
Hence a typical Maple file for solving a system of equations EQ in the variables VAR within the ranges S on a mixture
of SUN and PC computers will be:
`ALIAS/profilN`:="/net/coprin/intervalles/Profil-linux":
`ALIAS/libN`:="/net/coprin/intervalles/Lib-linux":
`ALIAS/working_directory`:="/u/Interval/Maple":
`ALIAS/bisection_master`:=2:
`ALIAS/diam_switch`:=0.3:
`ALIAS/bisection_slave`:=15:
ParallelHessianSolve(ef2,VAR,S,["otway","molotova"],[[30,4000],[30,4000]]);
As soon as the Maple procedure runs, two C++ programs will be generated:
one containing the keyword MASTER, the other one the keyword
SLAVE. The MASTER program will be compiled on the machine
that runs Maple and will control the process. The program that will run on
the slaves is the SLAVE program: one or two such programs will
be generated according to the architecture of the slave computers
(slave programs running on a Sun workstation will have their name ending
in SUN, while the programs running under Linux will end in
LINUX).
The solving parameters described in section 3.5 are the
ones that will be used by the algorithm run by the slaves.
Hence it is necessary to specify the parameters for the algorithm that
is run by the master.
There is a simple rule: the parameters for the master program have the
same name with _master appended. Hence `ALIAS/maxkraw_master` is the parameter `ALIAS/maxkraw` that
will be used by the master, while the slaves will use the value of
`ALIAS/maxkraw`.
There is one exception to this rule: the parameters for the 3B method
have _Master appended.
Table 10.1 lists the parameters that appear in
the parallel version of the procedures, their
meaning, their default value and the corresponding name in the C++
library.
The parameters that play a role in the calculation are the same for
the slave program as for the non-parallel version, except for
the maximal number of boxes for the slave program, which is
defined by `ALIAS/maxbox_slave`.
Table 10.1:
Parameters for parallel procedures

Parameter name | Meaning | C++ equivalent
`ALIAS/bisection_master` | maximal number of boxes that a master program may create before checking for free slaves | ALIAS_Parallel_Max_Bisection
`ALIAS/bisection_slave` | maximal number of boxes returned by a slave | ALIAS_Parallel_Max_Box
`ALIAS/diam_switch` | maximal width of a box before switching to direct storage mode in a slave | Diam_Switch
`ALIAS/3B_Master` | 1 if 3B is activated for the master |
`ALIAS/Full3B_Master` | full 3B for the master |
`ALIAS/Full3B_Change_Master` | minimal change for the full 3B for the master |
`ALIAS/Delta3B_Master` | delta for the master 3B |
`ALIAS/Delta3B_ARRAY_Master` | individual delta for the master 3B |
`ALIAS/Max3B_Master` | maximal box width for applying 3B for the master |
`ALIAS/Max3B_ARRAY_Master` | individual maximal box width for applying 3B for the master |
`ALIAS/Use_Simp_3B_Master` | 0: don't use the simplification procedure in the master |
`ALIAS/gpp_sun` | location of the C++ compiler for Sun |
`ALIAS/gpp_linux` | location of the C++ compiler for Linux |
`ALIAS/libpvm` | location of the pvm library for Sun Os |
`ALIAS/libpvm_linux` | location of the pvm library for Linux |
`ALIAS/make_sun` | location of the make program for Sun Os |
`ALIAS/make_linux` | location of the make program for Linux |
`ALIAS/maxgradient_master` | maxgradient flag for the master program |
`ALIAS/maxkraw_master` | maxkraw flag for the master program |
`ALIAS/maxnewton_master` | maxnewton flag for the master program |
`ALIAS/no_hessian_master` | don't use the hessian for evaluation in the master |
`ALIAS/allows_n_new_boxes_master` | allows_n_new_boxes for the master |
`ALIAS/max_reverse` | see section 10.2 | ALIAS_Parallel_Max_Reverse
`ALIAS/max_split` | see section 10.2 | ALIAS_Parallel_Max_Split
`ALIAS/max_split_master` | max_split for the master |
`ALIAS/profilN` | location of the BIAS/Profil library for slaves having a different architecture than the master |
`ALIAS/pvm` | location of the pvm include files |
`ALIAS/safety_factor` | see section 10.2 | ALIAS_Safety_Factor
`ALIAS/store_gradient_slave` | gradient storage activated for the slave | ALIAS_Store_Gradient
`ALIAS/time_out` | maximal computation time of a slave | ALIAS_TimeOut
`ALIAS/working_directory` | directory where the slave program is located |
For the
calculations of the master involving the simplex method
you may define the parameters of the simplex using one of the
following names:
`ALIAS/min_improve_simplex_master`,
`ALIAS/full_simplex_master`.