PicsouGrid/Grid5000 Benchmark Results 2007-03-07


Description

On 7 March 2007 I submitted 15 multi-node tasks to all the available Grid5000 clusters (Nancy was unavailable due to an upgrade), each task using at least 30 nodes. Eight clusters at six sites ran their jobs within a few hours; Lyon and Grenoble either still had the task queued the next day or had failed or cancelled it. After resubmission attempts over the next few days, four more clusters also completed the test.

Each task was a set of simple Monte Carlo simulations, each of which takes about 90 seconds on a "standard" desktop machine. Once the task started on the cluster, it spread to all nodes in the OAR_NODELIST and then forked one process per CPU on each node.
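The per-node step described above (one process per CPU) can be sketched roughly as follows. This is only an illustration, not the benchmark code itself: the function names are hypothetical, the node list is assumed to be plain text with one hostname per line, the toy pi estimate merely stands in for the real Monte Carlo simulation, and the remote launch onto each node is not shown.

```python
import os
import random
from multiprocessing import Pool


def parse_nodelist(text):
    """Return unique hostnames in order (assumes one hostname per line)."""
    hosts = []
    for line in text.splitlines():
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def simulate(seed, samples=100_000):
    """Toy stand-in for one simulation: estimate pi from random points."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_one_per_cpu(base_seed=0):
    """Start one worker process per CPU on this node and collect the results."""
    n_cpus = os.cpu_count() or 1
    with Pool(processes=n_cpus) as pool:
        return pool.map(simulate, range(base_seed, base_seed + n_cpus))


if __name__ == "__main__":
    # Each node in the reservation would run this once after the task spreads.
    print(run_one_per_cpu())
```

Using a process pool sized to the CPU count keeps every core busy with exactly one simulation at a time, which matches the "one process per CPU" layout described here.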

The graphs below show a blue box for the cluster occupancy, which is the time from the task starting on a worker node until the last node in the nodelist has returned its results. The grey boxes signify either the task queue time or the task data stage-out time. The red boxes show the worker node occupancy, for each worker node. The black lines show the life line of the core Monte Carlo algorithm.

Images

(note: these are high resolution, in order to see the life-lines for individual cores)

Overall Results

Benchmark results. Red=simulation code. Blue=node occupancy.
Queue, Execute, and Clean, by site.

Cluster and Site Results

Bordeaux. NTP not configured properly on all nodes (clock skew).
Lille. NTP not configured properly on all nodes (clock skew).
Orsay. Surprising execution times.
Rennes Parasol
Rennes Paravent
Sophia Azur
Sophia Sol
Toulouse

Discussion

The results show that NTP configuration matters a great deal when collecting timestamps from the many components of a large distributed system. A number of sites, clusters, and workers are clearly misconfigured.
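One simple way to flag suspect nodes from the timestamps alone is an ordering check: a worker cannot start before the task starts on the cluster, or finish after the cluster occupancy ends. The sketch below assumes timestamps in seconds and uses hypothetical names. It is not how the problem was identified in these results, and it only catches skew large enough to break the ordering.

```python
def find_skewed_nodes(cluster_start, cluster_end, node_intervals, tolerance=1.0):
    """Return hostnames whose reported (start, end) times violate ordering.

    node_intervals maps hostname -> (start, end) as reported by that node's
    own clock. tolerance is the slack in seconds allowed at either edge.
    """
    skewed = []
    for host, (start, end) in node_intervals.items():
        starts_too_early = start < cluster_start - tolerance
        ends_too_late = end > cluster_end + tolerance
        runs_backwards = end < start
        if starts_too_early or ends_too_late or runs_backwards:
            skewed.append(host)
    return sorted(skewed)


if __name__ == "__main__":
    # Made-up example values, not measured data.
    reported = {"n1": (10.0, 50.0), "n2": (-5.0, 40.0), "n3": (12.0, 130.0)}
    print(find_skewed_nodes(0.0, 100.0, reported))
```

A node that passes this check may still have a skewed clock, so the check is a lower bound on the number of misconfigured workers.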

The blue areas visible on the graphs show wasted time: I held the reservation for the nodes but was not using all of them for my computation. sim_eff is the total time of the simulation life lines divided by the total core reservation time (sum of black lines divided by blue box). node_eff is the total node occupancy time divided by the total core reservation time (sum of red boxes divided by blue box).
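As a rough sketch, the two ratios could be computed from the interval data like this. The function and parameter names are hypothetical. It also assumes an interpretation that the text does not spell out: the "blue box" is taken to be the cluster occupancy duration multiplied by the number of cores for sim_eff, and by the number of nodes for node_eff, so that both ratios fall between 0 and 1.

```python
def efficiencies(cluster_interval, node_intervals, sim_intervals, cores_per_node):
    """Return (sim_eff, node_eff) from (start, end) intervals in seconds.

    cluster_interval: the blue box, from task start to last result returned.
    node_intervals:   one red box per worker node.
    sim_intervals:    one black life line per simulation process.
    """
    cluster_start, cluster_end = cluster_interval
    occupancy = cluster_end - cluster_start
    n_nodes = len(node_intervals)

    node_time = sum(end - start for start, end in node_intervals)
    sim_time = sum(end - start for start, end in sim_intervals)

    # Assumed scaling: reserved time per core for sim_eff, per node for node_eff.
    core_reservation = occupancy * n_nodes * cores_per_node
    node_reservation = occupancy * n_nodes

    return sim_time / core_reservation, node_time / node_reservation


if __name__ == "__main__":
    # Made-up example: two dual-core nodes, one of which finishes early.
    sim_eff, node_eff = efficiencies(
        (0.0, 100.0),
        [(0.0, 100.0), (0.0, 50.0)],
        [(0.0, 90.0), (0.0, 90.0), (0.0, 45.0), (0.0, 45.0)],
        cores_per_node=2,
    )
    print(sim_eff, node_eff)
```

Under these assumptions, the gap between node_eff and 1 is time reserved but idle, and the gap between sim_eff and node_eff is per-node overhead outside the simulation itself.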

Something strange was going on at Orsay. Perhaps they run multiple jobs per node.

Follow up

I plan to re-run these tests once the NTP problems have been resolved, and also to run them on the other clusters and sites not represented here. I also plan to capture CPU model details, and to compare "single task per node" vs. "all cores per node". If you have any questions or would like the raw data or images in a better format, email me at Ian.Stokes-Rees _AT_ inria.fr.