    Parallel version of ALIAS-Maple

    Introduction

    Clearly most of the algorithms of the ALIAS-C++ library have a structure that allows a parallel implementation. Hence ALIAS-Maple offers a parallel version of most of its problem-solving procedures.

    The general principle of the parallel versions of the ALIAS-Maple procedures is to create a master program that processes the initial search domain until a fixed number $N$ of boxes remains to be processed. This master program runs on the computer that hosts the Maple session. The master program, which maintains a list of unprocessed input intervals, then dispatches these intervals through an appropriate mechanism to slave programs running on different computers. A slave program runs a bisection process until some stop criterion is fulfilled; at this point the slave program sends back to the master program the list of its unprocessed input intervals and, possibly, the solutions that have been found.

    As soon as all the available computers have received an input interval to process, the master checks whether any slave computer has finished its job: if so, the master retrieves the unprocessed boxes from the slave, adds them to the list of boxes to be processed, and sends new boxes to the free computers. If none of the slaves has finished, the master program processes the current box itself while monitoring whether a slave has returned its result. As soon as this computation is done, the master program again checks whether any slave computer has finished its computation: if not, it processes the next box in its list.

    This process is repeated until all the boxes in the master list have been processed and all the slaves are free.

    The master and slave programs are created by writing an initial, problem-dependent part and concatenating to it a fixed code that depends only on the procedure being used. The ALIAS distribution includes generic fixed codes (named Master or Slave followed by the Maple procedure name, e.g. MasterGeneralSolve.C, SlaveGeneralSolve.C).

    For some procedures you may customize the fixed code by assigning to the variables `ALIAS/master_program` and `ALIAS/slave_program` a string giving the name of your C++ code. To determine whether a procedure offers this possibility, look at the global variables listed in its on-line help.


    Stopping an ALIAS-C++ procedure

    To use the previous mechanism it is necessary to be able to stop an ALIAS-C++ procedure and recover the current state of the process, i.e. the solutions, if any, and the remaining unprocessed boxes.

    There are several mechanisms for stopping the bisection process of an algorithm:

    1. the number of boxes still to be processed is greater than a given threshold $N$ (obtained by setting the global ALIAS-C++ variable ALIAS_Parallel_Max_Bisection to $N-1$). This is the mechanism used when the master starts processing the initial search box.
    2. as a large number of bisections may be needed before obtaining $N$ boxes, a safety mechanism is required to limit the number of bisections (otherwise a slave may be stuck in a lengthy computation while the other slaves are free). The maximal number of bisections may be indicated by setting the global ALIAS-C++ variable ALIAS_Parallel_Max_Split; the algorithm returns when this number is reached, provided the number of unprocessed boxes is lower than ALIAS_Parallel_Max_Bisection (otherwise the process continues until that bound is reached).
    3. the number of performed bisections is greater than a given threshold $M$, regardless of the number of boxes still to be processed: this is obtained by setting the global variable ALIAS_Parallel_Max_Bisection to $-M$.
    4. the number of performed bisections is greater than a given threshold $M$ and the number of boxes still to be processed is lower than a given threshold $N$: this is obtained by setting the global ALIAS-C++ variable ALIAS_Parallel_Max_Bisection to $-M$ and the global ALIAS-C++ variable ALIAS_Parallel_Max_Box to $N$. A safety mechanism is enforced in that case to avoid one slave computer performing all the computation: if $M \times S$ bisections have been done and there are more than $N$ boxes still to be processed, the algorithm returns the error code -1. The value of $S$ is given by the global C++ variable ALIAS_Safety_Factor, with a default value of 2.
    5. if we are using the reverse storage mode, we may require that the number of performed bisections be greater than a given threshold $M$ and the number of boxes still to be processed be lower than a given threshold $N$: this is obtained by setting the global ALIAS-C++ variable ALIAS_Parallel_Max_Bisection to $N$ and the global ALIAS-C++ variable ALIAS_Parallel_Max_Reverse to $M$.
    6. the slave computation time has exceeded a fixed amount of time, which probably indicates that it is preferable to distribute the treatment of the processed box among several slaves. This is done using the ALIAS-C++ variable ALIAS_TimeOut, which indicates the maximal amount of time (in minutes) during which a slave may run (this bound is respected only approximately).

    Using these stop criteria we may implement a parallel version of the ALIAS-Maple procedures.

    Message passing mechanism and pvm

    For a parallel implementation it is hence necessary to have a message passing mechanism that enables sending a box to a slave program on another computer and receiving data from the slave computers. For the parallel implementation we use the message passing mechanism pvm.

    Example of message passing mechanism

    A very simple master program using pvm may be found in the ALIAS distribution under the name
    MasterGeneralSolve.C, while the corresponding slave program is SlaveGeneralSolve.C: these programs are used by Maple to create the parallel implementation of the GeneralSolve procedure. Here we use text message passing.

    This buffer is packed with the pvm_pkstr procedure and then sent to a slave computer. As soon as the buffer is received by the slave computer, the box to be processed is extracted from the message using the ALIAS procedure ALIAS_Read_Buffer (which also updates the values of ALIAS_Parallel_Max_Bisection and ALIAS_Parallel_Max_Box). As soon as the slave computer has finished its computation, it sends its results back to the master in a text buffer.

    For the algorithms using derivatives, the Jacobian matrix is also transferred to the slaves. For large systems this may considerably slow down the communication between the slaves and the master. Hence it may be a good policy to inhibit the transfer of the Jacobian by setting the variable `ALIAS/transmit_gradient` to 0.

    pvm

    The message passing mechanism that has been chosen is pvm. Before using the parallel procedures it is necessary to run pvm. This is usually done by first setting the environment variable PVM_ROOT to the name of the directory where pvm is installed. Typically this variable contains something like:

     
    /u/PVM/pvm3
    
    Then you create a file, called the hostfile, in which you indicate the names of the slave machines that you intend to use in the parallel computation. A useful option is to give the full path of the pvm daemon program that will be run by the slaves; this option is indicated by the keyword dx=path. A typical hostfile looks like:
     
    cygnusx1 dx=/u/PVM/pvm3/lib/pvmd
    penfret dx=/u/PVM/pvm3/lib/pvmd
    camarat dx=/u/PVM/pvm3/lib/pvmd
    
    Note that the slave computers may be a mixture of Sun workstations and PCs. As soon as this hostfile has been created you run pvm with:
     
    pvm hostfile
    
    After some time you will get a prompt, and you may verify that all the slave computers have been registered with the command conf. You may then exit from the pvm console program with the command quit (you may also kill the pvm program with the command halt).

    Parallel procedures in Maple

    Most of the procedures available in the ALIAS-Maple library have a parallel implementation. They have the same name as the non-parallel procedures except that they are prefixed by the word Parallel. Hence ParallelGeneralSolve is the name of the parallel implementation of the procedure GeneralSolve. One exception is ParallelContinuationSimplex, which exists only in a parallel implementation.

    The parallel procedures have the same initial arguments as the normal procedures except that they take two additional arguments at the end:

    Minimal parameters setting

    Before using the parallel implementation it is compulsory to set some variables:

    Hence a typical Maple file for solving a system of equations EQ in the variables VAR within the ranges S, on a mixture of Sun workstations and PCs, will be:

     
    `ALIAS/profilN`:="/net/coprin/intervalles/Profil-linux":
    `ALIAS/libN`:="/net/coprin/intervalles/Lib-linux":
    `ALIAS/working_directory`:="/u/Interval/Maple":
    `ALIAS/bisection_master`:=2:
    `ALIAS/diam_switch`:=0.3:
    `ALIAS/bisection_slave`:=15:
    ParallelHessianSolve(EQ,VAR,S,["otway","molotova"],[[30,4000],[30,4000]]);
    
    As soon as the Maple procedure runs, two C++ programs will be generated: one containing the keyword MASTER, the other the keyword SLAVE. The MASTER program will be compiled on the machine that runs Maple and will control the process. The SLAVE program is the one run by the slaves: one or two such programs will be generated according to the architectures of the slave computers (slave programs running on a Sun workstation have names ending in SUN, while those running under Linux have names ending in LINUX).

    Specific parameters for the parallel implementation

    The solving parameters described in section 3.5 are the ones used by the algorithm run by the slaves. Hence it is necessary to specify the parameters for the algorithm run by the master.

    There is a simple rule: the parameters for the master program have the same name with _master appended. Hence `ALIAS/maxkraw_master` is the parameter `ALIAS/maxkraw` that will be used by the master while the slaves will use the value of `ALIAS/maxkraw`. There is an exception to this rule: the parameters for the 3B method have _Master appended.

    Table 10.1 lists the parameters that appear in the parallel version of the procedures, their meaning, their default value and the corresponding name in the C++ library.

    The parameters that play a role in the calculation are the same for the slave program as for the non-parallel version, except for the maximal number of boxes for the slave program, which is defined by `ALIAS/maxbox_slave`.


    Table 10.1: Parameters for parallel procedures

    Parameter name | Meaning | C++ equivalent
    `ALIAS/bisection_master` | maximal number of boxes that a master program may create before checking for free slaves | ALIAS_Parallel_Max_Bisection
    `ALIAS/bisection_slave` | maximal number of boxes returned by a slave | ALIAS_Parallel_Max_Box
    `ALIAS/diam_switch` | maximal width of a box before switching to direct storage mode in a slave | Diam_Switch
    `ALIAS/3B_Master` | 1 if 3B is activated for the master |
    `ALIAS/Full3B_Master` | full 3B for the master |
    `ALIAS/Full3B_Change_Master` | minimal change for the full 3B for the master |
    `ALIAS/Delta3B_Master` | delta for master 3B |
    `ALIAS/Delta3B_ARRAY_Master` | individual delta for master 3B |
    `ALIAS/Max3B_Master` | maximal box width for applying 3B for the master |
    `ALIAS/Max3B_ARRAY_Master` | individual maximal box width for applying 3B for the master |
    `ALIAS/Use_Simp_3B_Master` | 0: don't use simplification procedure in the master |
    `ALIAS/gpp_sun` | location of the C++ compiler for Sun |
    `ALIAS/gpp_linux` | location of the C++ compiler for Linux |
    `ALIAS/libpvm` | location of the pvm library for SunOS |
    `ALIAS/libpvm_linux` | location of the pvm library for Linux |
    `ALIAS/make_sun` | location of the make program for SunOS |
    `ALIAS/make_linux` | location of the make program for Linux |
    `ALIAS/maxgradient_master` | maxgradient flag for the master program |
    `ALIAS/maxkraw_master` | maxkraw flag for the master program |
    `ALIAS/maxnewton_master` | maxnewton flag for the master program |
    `ALIAS/no_hessian_master` | don't use hessian for evaluation in the master |
    `ALIAS/allows_n_new_boxes_master` | allows_n_new_boxes for the master |
    `ALIAS/max_reverse` | see section 10.2 | ALIAS_Parallel_Max_Reverse
    `ALIAS/max_split` | see section 10.2 | ALIAS_Parallel_Max_Split
    `ALIAS/max_split_master` | max_split for the master |
    `ALIAS/profilN` | location of the BIAS/Profil library for slaves having a different architecture than the master |
    `ALIAS/pvm` | location of the pvm include file |
    `ALIAS/safety_factor` | see section 10.2 | ALIAS_Safety_Factor
    `ALIAS/store_gradient_slave` | gradient storage activated for the slave | ALIAS_Store_Gradient
    `ALIAS/time_out` | maximal computation time of a slave | ALIAS_TimeOut
    `ALIAS/working_directory` | directory where the slave program is located |


    For the calculations of the master involving the simplex method you may define the parameters of the simplex using one of the following names: `ALIAS/min_improve_simplex_master`, `ALIAS/full_simplex_master`.



    jean-pierre merlet
    2018-07-25