Chapter 9. The nbody example

9.1. Using facilities provided by ProActive on a complete example

9.1.1. Rationale and overview

This section of the guided tour goes through the different steps that you would take in writing an application with ProActive, from a simple design, to a more complicated structure. This is meant to help you get familiar with the Group facilities offered by ProActive. Please take note that this page tries to take you through the progression, step by step. You may find some more information, mainly on the design, on the web page of the applications/examples of ProActive. This is a snapshot of the ProActive nbody example running on 3 hosts with 8 bodies:

NBody screenshot, with 3 hosts and 8 bodies

Figure 9.1. NBody screenshot, with 3 hosts and 8 bodies

NBody screenshot, with the application GUI and Java3D installed

Figure 9.2. NBody screenshot, with the application GUI and Java3D installed

n-body is a classic problem. It consists in working out the position of bodies in space, which depend only on the gravitational forces that apply to them. A good introduction to the problem is given here. You may find a detailled explanation of the underlying mathematics here. Different ways of finding numerical solutions are given here.

In short, one considers several bodies (sometimes called particles) in space, where the only force is due to gravity. When only two bodies are at hand, this is expressed as

Fp->b is the force that p applies on b, G is the gravitational constant, mpmb describe the mass of the bodies, r is the distance between p and b, andu is a unit vector in the direction going from p to b. When we consider all the forces that apply to one given body, we have to sum up the contribution of all the other bodies:

This should be read as: the total force on the body b is the sum of all the forces applied to b, generated by all the other bodies in the system.

This is the force that has to be computed for every body in the system. With this force, using the usual physics formulae, (Newton's second Law)

one may now compute the movement of a particle for a given time step (a the acceleration, v the velocity, x the position, t the time):

9.1.2. Usage

With script located in the folder ProActive/script/[unix|windows] do:

$ nbody.[bat|sh] [-nodisplay | -displayft | -3d | -3dft] totalNbBodies maxIter
  • No parameter starting in default mode (2D).

  • -nodisplay starting in console mode.

  • -displayft starting with fault-tolerance configuration.

  • -3d starting GUI in 3D, must have Java3d (≥ 1.4) installed and also must have ProActive compiled with it installed.

  • -3dft same as above with fault-tolerance configuration.

  • totalNbBodies is the total number of bodies, default is 4 bodies.

  • maxIter is the maximun number of iterations, default is 10,000 iterations.

Right after starting the application, users have to choose one algorithm for computing. The choice is between:

  • Simplest version, one-to-one communication and master.

  • Group communication and master.

  • Group communication, odd-even-synchronization.

  • Group communication, oospmd synchronization.

  • Barnes-Hut.

Mouse controls with the 3D GUI:

  • Left click: rotating.

  • Right click: moving the scene.

  • Scroll whell: zoom in/out

9.1.3. Source files: ProActive/src/org/objectweb/proactive/examples/nbody

This guided tour is based on the files you may find in the directory ProActive/src/org/objectweb/proactive/examples/nbody. You'll find the following tree:

The nbody directory structure

Figure 9.3. The nbody directory structure

The common directory contains files reused through the different versions. 'simple' is the simplest example, 'groupcom' is the first example with Group communication, and 'groupdistrib' and 'groupoospmd' are two enhancements based on different synchronization schemes. 'barneshut' is a bit special, in that it contains a different algorithm to solve the nbody problem.

9.1.4. Common files

The files contained in 'common' are those that are reused throughout the different versions. Let's see what they do:

  • First of all there are the two files called Displayer.java and NBodyFrame.java. These handle the graphical output of the bodies, as they move about in space. They are not particularly of interest, as the GUI is not the point of this tutorial. Nonetheless, please note that the important method here is:

    public void drawBody(int x, int y, int vx, int vy, int weight, int d, int id) ;

    Taking position, velocity, diameter and a unique identifier of the body, it updates the display window.

  • Then, we have the files Force.java and Planet.java. They are used to compute the interaction between two distant bodies in the universe. Since they are in the common directory, they can be modified to include other forces (for example, collision) in a simple manner, which would be spread to all the examples. A Planet is no more than a point in space, with velocity and mass - the diameter expresses the size to use for the display:

             public class Planet implements Serializable{
                 public double mass;
                 public double x,y,vx,vy;
                     // position and velocity
                 public double diameter;
                       // diameter of the body, used by the Displayer
                       ...
    

    Please take note that it implements Serializable because it will be sent as parameter to method calls on Active Objects, but it is good practice to have all your ProActive classes implement Serializable. For example, migration requires everything to implement it, and the same with fault-tolerance....

    The Force class is just the implementation of what a physical force really is. It is the implementation of a 3D vector, with the method "add" following the physics rules.

    The equation of the force between two bodies

    Figure 9.4. The equation of the force between two bodies

  • Point3D.java and Cube.java are helper files. They simply implement what a point in space looks like, and what a region of space is. Of course, they were created as being Serializable.

  • And finally, the Start.java acts as the wrapper for the main() method. There is a part which reads command line parameters, counting bodies and iterations, and constructing the optional Displayer. Before choosing which example to run, it creates the nodes required by the simulation:

             // Construct deployment-related variables: pad & nodes
             descriptorPad = null;
             VirtualNode vnode;
             try { descriptorPad = ProActive.getProactiveDescriptor(xmlFileName); }
             catch (ProActiveException e) { abort(e); }
             descriptorPad.activateMappings();
             vnode = descriptorPad.getVirtualNode('Workers');
             Node[] nodes = null;
             try { nodes = vnode.getNodes(); }
             catch (NodeException e) { abort(e);
             }
    

    The Node [] nodes are the different JVMs that were created on possibly different machines. They are used for Active Object creation. They were specified in the descriptor used to deploy the application. You may find more information on these in Chapter 21, XML Deployment Descriptors, while Active Object creation is explained in Chapter 13, Active Objects: creation and advanced concepts. Just as an example, in the simple package, the Maestro is created on the first of these JVMs, and takes three parameters, a Domain [], an Integer, and a Start (it will be detailed later):

             Object [] constructorParams ;
             constructorParams = {domainArray, new Integer(maxIter), killsupport} ;
             maestro = (Maestro) ProActive.newActive 
               ( Maestro.class.getName(), constructorParams , nodes[0] ) ;
    

The files contained in the other directories, 'simple', 'groupcom', 'groupdistrib' , 'groupoospmd' detail steps of increasing complexity, making the application use different concepts. 'barneshut' contains the final implementation, featuring the Barnes-Hut algorithm. But let's not go too fast. Let's have a look at the insides of the simplest implementation of the n-body problem.

9.1.5. Simple Active Objects

This is the implementation of the simplest example of nbody. We defined the Planet to be a passive object, and it does nothing. It is a container for position, velocity and mass, as we've seen in the description given higher up. The real actors are the Domains, they do all the work. Every Planet in the universe is associated with a Domain, which is an Active Object. This Domain contains the code to manage the communication of the possitions of the Planets during the simulation. They are created in the Start.java file:

         Rectangle universe = new Rectangle (-100,-100,100,100);
         Domain [] domainArray = new Domain [totalNbBodies];
         for (int  i = 0 ; i < totalNbBodies ; i++)  {
             Object [] constructorParams = new Object [] {
                      new Integer(i),
                      new Planet (universe)
                 };
             try {
                 // Create all the Domains used in the simulation
                 domainArray[i] = (Domain) ProActive.newActive(
                         Domain.class.getName(),
                         constructorParams,
                         nodes[(i+1) % nodes.length]
                 );
             }
             catch (ActiveObjectCreationException e) { killsupport.abort(e); }
             catch (NodeException e) {  killsupport.abort(e); }
         }

See how the call to ProActive.newActive creates one new Active Object, a Domain, at each iteration of the loop. The array nodes contains all the nodes on which an Active Object may be deployed; at each iteration, one given node, ie one JVM, is selected. The constructorParams are the parameters that are to be passed to the constructor of Domain, and since it's an Object [] , the parameters may only be Objects (don't try to build constructors using ints in their constructor - this explains the use of the class Integer).

The Domains, once created, are initialized, and then they synchronize themselves by all pinging the maestro, with the notifyFinished call:

         // init workers, from the Start class
         for (int i=0 ; i < totalNbBodies ; i ++)
             domainArray[i].init(domainArray, displayer, maestro);
          // init method, defined within each worker
         
          public void init(Domain [] domainArray, Displayer dp, Maestro master) {
                   this.neighbours = domainArray;
                   .....
                   maestro.notifyFinished();  // say we're ready to start
                   }
         public void notifyFinished() {
             this.nbFinished ++;
             if (this.nbFinished == this.domainArray.length) {
                 this.iter ++; 
                 if (this.iter==this.maxIter)
                      this.killsupport.quit();
                 this.nbFinished = 0 ;
                   for (int i= 0 ; i < domainArray.length ; i++)
                       this.domainArray[i].sendValueToNeighbours();
           }
           }

Notice how domainArray is passed to all the Domains, when calling init. This is the value assigned to the local field neighbours, which later on serves to communicate with all the other Domains of the simulation.

The synchronization is done by the Maestro, which counts the number of Domains that have finished, and then asks them to go on to the next iteration. While in their execution, the Domains gather information concerning the position of all the other bodies, which need to be known to move the local Planet, at every time step. This is done using a push scheme. Instead of explicitly asking for information, this information is automatically issued:

       public void sendValueToNeighbours() {
            for (int i = 0 ; i < this.neighbours.length ; i ++)
               if (i != this.identification) // don't notify self!
                   this.neighbours[i].setValue(this.info, this.identification);
             .....  
            }
       
        public void setValue(Planet inf, int id) {
           this.values [id] = inf;
           this.nbReceived ++ ;
           if (this.nbReceived > this.nbvalues)  // This is a bad sign!
               System.err.println('Domain ' + identification + ' received too many answers');
           if (this.nbReceived == this.nbvalues) {
               this.maestro.notifyFinished();
               moveBody();
           }
           }  

This means that each Domain sends its information to all the other Domains, and then waits until it has received all the positions it is waiting for. The other Domains are stored as an array, which is called neighbours. You may find another view of this example on this web page.

9.1.6. Groups of Active objects

This is a simple improvement, which results in faster communication. You may have noticed the Group capabilities of ProActive. They give us the ability to call an operation on an object which is a Group, and have it sent to all the members of the Group. We can use them in this framework: first, create a Group (instead of having independant Active Objects) :

           // in the Start class
            Object [][] params = ...
            Domain  domainGroup = null;
            try {
                // Create all the Domains as part of a Group
                domainGroup = (Domain) ProActiveGroup.newGroup ( Domain.class.getName(), params,
 nodes);
            }
            catch ....>

The double array params stores the parameters passed to the constructors of the Domains we're creating. Domain 0 will have params[0][] passed as arguments, Domain 1 params[1][], and so on. The nodes are the Nodes on which to create these Active Objects. Do notice the try... catch construction which is needed around any creation of Active Objects because it may raise exceptions. In this previous bit of code, a Group containing new Active Objects has been created and all these Objects belong to the group . You may have noticed that the type of the Group is Domain. It's a bit strange at first, and you may think this reference points to only one Active Object at once, but that's not true. We're accesssing all the objects in the group, and to be able to continue using the methods of the Domain class, the group is typed as Domain, and that's the reason why it's called a typed Group.

Then this group is passed as a parameter to all the members of the Group in just one call:

         // Still in the Start class
            domainGroup.init(domainGroup, displayer, maestro);
           

This method sets the local field as a copy of the passed parameter, and as such is unique. We can play around with it without affecting the others. So let's remove the local Domain from the Group, to avoid having calls on self:

               public void init(Domain domainGroup, Displayer dp, Maestro master) {
                 this.neighbours = domainGroup;
                 Group g = ProActiveGroup.getGroup(neighbours);
                 g.remove(ProActive.getStubOnThis()); // no need to send information to self
               .....
              

Remember that in the previous example, the neighbours where stored in an array, and each was accessed in turn:

         for (int i = 0 ; i < this.neighbours.length ; i ++)
               if (i != this.identification) // don't notify self!
               this.neighbours[i].setValue(this.info, this.identification);

Well, that's BAAAAD, or at least inefficient! Replace this by the following code, because it works faster:

this.neighbours.setValue(this.info, this.identification);

This has the following meaning: call the method setValue, with the given parameters, on all the members of the Group neighbours. In one line of code, the method setValue is called on all the Active Objects in the group.

You may find another view of this example on this web page.

9.1.7. groupdistrib

Now, do we like the idea that the synchronization is centralized on one entity, the Maestro? I don't and it's the bottleneck of the application anyway: once a Domain has finished, it sends the notifyFinshed, and then sits idle. A way of making this better is to remove this bottleneck completely! This is done by using an odd-even scheme: if a Domain receives information from a distant Domain too early (ie in the wrong iteration), this information is stored, and will get used at the next iteration. In the meantime, the local Domain does not change its iteration, because it is still waiting for more results, in the current iteration.

         public void setValue(Planet inf, int receivedIter) {
           if (this.iter == receivedIter) {
             this.currentForce.add(info, inf);
             this.nbReceived ++ ;
             if (this.nbReceived == this.nbvalues)
                 moveBody();
         }
         else {
             this.prematureValues.add(new Carrier (inf, receivedIter));
         }
         }

Also notice how the computation is done incrementally when the result is received (this.currentForce.add(info, inf);), instead of when all the results have arrived. This allows for less time spent idle. Indeed, waiting for all the results before computing might leave idle time between setValue requests. And then, just before computing the new position of the body, the sum of all the forces has to be computed. It's better to have this sum ready when needed.

The prematureValues Vector is the place where we put the values that arrive out of sync. When a value is early, it is queued there, and dequeued as soon as this Domain changes iteration.

         public void sendValueToNeighbours()  {
                   reset();                  
                   this.iter++;
                   if (this.iter < this.maxIter) {                           
                       neighbours.setValue(this.info, this.iter);                      
                       ... // display related code
                       treatPremature();                  
                       }                  
                    ... // JVM destruction related code             
             }   
            

The treatPremature() method simply treats the values that were early as if they had just arrived, by calling the setValue method with the parameters stored.

You may find another view of this example on this web page.

9.1.8. Object Oriented SPMD Groups

This is another way to improve the groupcom example. It also removes the master, but this time by inserting oospmd barriers, that can be thought as behaving like the maestro class, but faster. To create functional OOspmd Groups, there is a special instruction, which takes the same parameters as a newGroup instruction:

         Object [][] params =  ...
         Domain domainGroup = null;
         try {
              domainGroup = (Domain) ProSPMD.newSPMDGroup( Domain.class.getName(), params, nodes);
             }
             catch ...

Now, to use this OOspmd group properly, we want to use the barrier() methods. We put these in the Domains code, to do the synchronization. What happens is that each Domain hits the barrier call, and then waits for all the others to have reached it, before reading its request queue again.

 public void sendValueToNeighbours() {
    this.neighbours.setValue(this.info, this.identification);       
    ProSPMD.barrier('barrier' + this.iter);
    this.iter++;
    this.asyncRefToSelf.moveBody();    
  ....
 

Beware, the stop-and-wait is not just after the barrier call, but instead blocks the request queue. So if there is code after that barrier, it will get executed. In fact, the barrier should be seen as a prioritary request on the queue. This explains why we had to put the code after the barrier as a method placed on an asynchronous refernce to self. If we hadn't done it that way, but just appended the code of that method just after the barrier, the call to moveBody() would be executed before the barrier execution, which is exactly what we don't want!

You may find another view of this example on this web page.

9.1.9. Barnes-Hut

This way to construct the nbody simulation is based on a very different algorithm. This is inserted to show how one can express this algorithm in ProActive, but breaks off from the previous track, having such a different approach to solving the problem. Here's how it works:

To avoid broadcasting to every active object the new position of every particle, a tree implementation can simplify the problem by agglomerating sets of particles as a single particle, with a mass equal to the sum of masses of the all the particles:. This is the core of the Barnes-Hut algorithm. References on this can be found for example here, and here. This method allows us to have a complexity brought down to O(N log N).

In our parallel implementation, we have defined an Active Object called Domain, which represents a volume in space, and which contains Planets. It is either subdivided into smaller Domains, or is a leaf of the total tree, and then only contains Planets. A Planet is still an Object with mass, velocity and position, but is no longer on a one-to-one connection with a Domain. We have cut down communications to the biggest Domains possible : when a Planet is distant enough, its interactions are not computed, but it is grouped with its local neighbours to a bigger particle. Here is an example of the Domains which would be known by the Domain drawn in red:

The Domain in the lower left hand-corner, drawn in blue, is also divided into sub-Domains, but this needs not be known by the Domain in red: it assumes all the particles in the blue Domain are only one big one, centered at the center of mass of all the particles within the blue.

In this version, the Domains communicate with a reduced set of other Domains, spanning on volumes of different sizes. Synchronization is achieved by sending explicitely iteration numbers, and returning when needed older positions. You may notice that some Domains seem desynchronized with other ones, having several iterations inbetween. That is no problem because if they then need to be synchronized and send each other information, a mechanism saving the older positions permits to send them when needed.

You may find another view of this example on this web page.

9.1.10. Conclusion

In this guided tour, we tried to show different facilities provided by ProActive, based on a real problem (nbody). We first saw how to deploy the application, then tuned it by adding Group communication, then removed a bottleneck ( due to the hard synchronization ) . Finally, given is the code associated to a different algorithm, which cumbersomely shows how to get Active Objects deployed along a tree structure to communicate. Remember that there is another explanation of all this on the web.