QMIPS Deliverable D18

State of the Art and Advances in Load Balancing

This deliverable contains five papers dealing with problems of static load balancing and optimization. The questions that are examined concern both the distribution of demand among different servers and the sharing of servers among different demands. Analytical and numerical results are obtained.

Introduction to Deliverable D18

The paper Routing Among Different Nodes Where Servers Break Down Without Losing Jobs, is concerned with a model where jobs are generated by a single Poisson source and can be routed through N alternative parallel M/M/1 nodes. The routing decisions are Bernoulli, with probabilities depending on the operative state of the nodes but not on the queue sizes. The N servers have different characteristics. Each of them is subject to random breakdowns which leave the corresponding queue intact, but may affect the routing of jobs during the subsequent repair periods. For example, jobs need not be routed to a broken node if an operative one is available. The marginal equilibrium queue size distributions are determined by spectral expansion. This can be done, at least in principle, for any number of queues. Several routing strategies are evaluated and compared empirically. There is experimental evidence that some heuristics are near-optimal. In the special case N=2, it is also possible to find the joint distribution of the numbers of jobs in the two queues.

The paper Optimization of static traffic allocation policies considers the following traffic allocation problem: arriving jobs have to be assigned to one of a group of processors. The aim is to optimize system performance measures, such as mean waiting time of a job or total number of jobs in the system, under a given static allocation policy. Two static allocation policies are studied: probabilistic assignment and allocation according to a fixed pattern. For these two policies, general properties as well as optimization aspects are discussed.

The third paper, Static Optimization of Queueing Systems, discusses some recent developments in the optimal allocation of resources in queues. Special attention is given to three classes of problems: The first concerns the optimal assignment of a given number of servers, or a given service capacity, to queues in a network. The second involves the optimal scheduling of the visits of a single server to several queues (optimal polling). The third problem is related to the optimal splitting of a single arrival stream among several single server queues. This is again a problem of optimal Bernoulli routing. Now the servers are reliable, but the service times have general distributions. An exact characterisation of the optimal policy is obtained.

In the fourth paper, Failure detection algorithms for a reliable execution of parallel programs, a different kind of load balancing issue is discussed. Here, servers may break down during the execution of a parallel program. A key problem to study the tradeoff in the choice of failure detection algorithms: checking often for failures increases the chances of completing a program but puts more overhead on the system, whereas checking too seldom causes programs to fail and restart, thus increasing the average execution time.

Finally, the fifth paper, Approximate Solution Methods for Open Queueing Networks with Breakdowns and Repairs, analyzes a system where processors fail but jobs are not lost neither re-routed. This is a point of reference, allowing the comparison with systems where dynamic load balancing takes place, at the expense of more system overhead.


Contents of the Deliverable

D18-1
Routing Among Different Nodes Where Servers Break Down Without Losing Jobs, by Nigel Thomas and Isi Mitrani. Procs., IPDS'95, Erlangen, 1995 (front page). Also in Quantitative Methods in Parallel Systems, eds. F. Baccelli, A. Jean-Marie and I. Mitrani, Springer, Berlin, 1995. Report included.

D18-2
Optimization of static traffic allocation policies, M.B. Combe and O.J. Boxma. This paper appears in Chapter 6 of the proceedings of the Erlangen Workshop (Deliverable D1).

D18-3
Static Optimization of Queueing Systems, by O.J. Boxma. CWI Report BS-R9506, February 1995, Amsterdam; to appear in R.P. Agarwal (ed.), Recent Trends in Optimization Theory and Applications. Report included (front page).

D18-4
Failure detection algorithms for a reliable execution of parallel programs, by S. Chabridon and E. Gelenbe. 14th Symposium on Reliable Distributed Systems (SRDS-14), Bad Neuenahr, September 13-15, 1995, pp.229-238. Report included.

D18-5
Approximate Solution Methods for Open Queueing Networks with Breakdowns and Repairs, by R. Chakka and I. Mitrani. Stochastic Networks Workshop, Edinburgh, 1995; to appear in the book edited by S. Zachary and F. Kelly. Report included (front page).



Page Web by Alain Jean-Marie (Alain.Jean-Marie@sophia.inria.fr) Last Modified: 25 Oct 1995.