PicsouGrid/Grid5000 Benchmark Results 2007-03-07
Other Results
Description
On 7 March 2007 I submitted 15 multi-node tasks to all the available Grid5000
clusters (Nancy was unavailable due to an upgrade), each one using at least 30
nodes. 8 clusters at 6 sites were able to run the jobs within a few hours
(Lyon and Grenoble either still had the task queued the next day, or had
failed/cancelled the task). After resubmission attempts over the next few
days, 4 more clusters also completed the test.
Each task was a set of simple Monte Carlo simulations, each of which takes
about 90 seconds on a "standard" desktop machine. Once the task started on the
cluster, it spread to all nodes in the OAR_NODELIST and then forked one process
per CPU on each node.
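The per-node step described above (one worker process per CPU) could be sketched roughly as follows. This is an illustration only, not the PicsouGrid code: simulate is a stand-in Monte Carlo kernel (a pi estimate), run_one_per_cpu is a hypothetical helper name, and spreading the task across the nodes in the node list is not shown.

```python
import os
import random
from concurrent.futures import ProcessPoolExecutor


def simulate(seed, n_samples=100_000):
    """Stand-in Monte Carlo kernel: estimate pi by sampling the unit square."""
    rng = random.Random(seed)  # private generator, so each process is independent
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def run_one_per_cpu(n_cpus=None):
    """Run one simulation per CPU in a pool sized to one worker process per CPU."""
    n_cpus = n_cpus or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=n_cpus) as pool:
        # Distinct seeds so the processes do not repeat the same sample stream.
        return list(pool.map(simulate, range(n_cpus)))
```

On platforms that start child processes by spawning rather than forking, run_one_per_cpu() should be called from under an if __name__ == "__main__": guard.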
The graphs below show a blue box for the cluster occupancy, which is
the time from the task starting on a worker node until the last node in the
nodelist has returned its results. The grey boxes signify either the task
queue time or the task data stage-out time. The red boxes show the
worker node occupancy, for each worker node. The black lines
show the life line of the core Monte Carlo algorithm.
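The boxes described above can be derived from a handful of timestamps per task. The sketch below assumes four hypothetical timestamps (the field names are not taken from the benchmark scripts) and assumes that stage-out begins once the last node has returned its results.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    """Hypothetical per-task timestamps, in seconds since a common epoch."""
    submitted: float       # task handed to the batch scheduler
    started: float         # task began on a worker node
    last_node_done: float  # last node in the node list returned its results
    finished: float        # output data staged out, task complete


def graph_intervals(t):
    """Return the (start, end) spans drawn as boxes for one task."""
    return {
        "queue": (t.submitted, t.started),                   # grey box
        "cluster_occupancy": (t.started, t.last_node_done),  # blue box
        "stage_out": (t.last_node_done, t.finished),         # grey box
    }


def duration(span):
    """Length of a (start, end) span in seconds."""
    start, end = span
    return end - start
```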
Images
(note: these are high resolution, in order to see the life-lines for individual cores)
Overall Results
Benchmark results. Red=simulation code. Blue=node occupancy.
Queue, Execute, and Clean, by site.
Cluster and Site Results
Bordeaux. NTP not configured properly on all nodes (clock skew).
Lille. NTP not configured properly on all nodes (clock skew).
Orsay. Surprising execution times.
Rennes Parasol
Rennes Paravent
Sophia Azur
Sophia Sol
Toulouse
Discussion
The results show that correct NTP configuration is important when taking
timestamps from the various components of a large distributed system. A number
of sites/clusters/workers are clearly misconfigured; Bordeaux and Lille both
show clock skew.
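One simple way to flag candidate nodes is to look for timestamps that cannot be true within a single task. The sketch below is an assumption about how such a check might look, not the method used for these graphs: find_skew_suspects and its arguments are hypothetical, and it only catches nodes that report starting before the task did or that report a negative-length interval. A clock that runs ahead would need a different check.

```python
def find_skew_suspects(task_start, node_spans, tolerance=1.0):
    """Return names of nodes whose timestamps are inconsistent with the task.

    task_start -- time the task started on the cluster (reference clock)
    node_spans -- dict of node name -> (start, end) as reported by that node
    tolerance  -- seconds of disagreement accepted before flagging a node
    """
    suspects = []
    for node, (start, end) in sorted(node_spans.items()):
        starts_too_early = start < task_start - tolerance  # worker ran before the task existed
        negative_length = end < start                      # worker finished before it began
        if starts_too_early or negative_length:
            suspects.append(node)
    return suspects
```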
The blue areas visible on the graphs show wasted time: periods when I held the
reservation for the nodes but was not actually using all of them for my
computation. sim_eff is the total time of the simulation life lines divided by
the total core reservation time (sum of black lines divided by blue box).
node_eff is the total node occupancy time divided by the total core reservation
time (sum of red boxes divided by blue box).
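The two ratios can be computed from the interval data behind the graphs. The sketch below is one possible reading of the definitions, with a normalisation that is my assumption rather than something stated above: the reserved capacity is taken as the blue box duration multiplied by the number of cores (for sim_eff) or by the number of nodes (for node_eff), so that both ratios fall between 0 and 1. The function and parameter names are hypothetical.

```python
def efficiencies(reservation, n_nodes, cores_per_node, node_spans, sim_spans):
    """Compute (sim_eff, node_eff) for one cluster run.

    reservation -- (start, end) of the blue box
    node_spans  -- list of (start, end), one red box per worker node
    sim_spans   -- list of (start, end), one black life line per core
    """
    def total(spans):
        return sum(end - start for start, end in spans)

    box = reservation[1] - reservation[0]
    # Assumption: reserved capacity is the box duration times the resource count.
    core_time = box * n_nodes * cores_per_node
    node_time = box * n_nodes
    sim_eff = total(sim_spans) / core_time
    node_eff = total(node_spans) / node_time
    return sim_eff, node_eff
```

For example, two dual-core nodes reserved for 100 seconds, occupied for 80 and 90 seconds, with four 50-second simulations, would give a sim_eff of 0.5 and a node_eff of 0.85 under this reading.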
Something strange was going on at Orsay. Perhaps they run multiple jobs per node.
Follow up
I plan to re-run these tests once the NTP problems have been resolved, and also to run them on the other clusters/sites not represented here. I also plan to capture CPU model details, and to compare "single task per node" vs. "all cores per node". If you have any questions or would like the raw data or images in a better format, email me at Ian.Stokes-Rees _AT_ inria.fr.