QMIPS Deliverable D9

Fixed-Point Methods


Introduction to Deliverable D9

i/ The paper Modelling and Evaluation of Cache Coherence Protocols in Multiprocessor Systems, by E. Ametistova and I. Mitrani, is concerned with the modelling and evaluation of systems where a number of processors, together with their associated caches, are connected to each other and to main memory by a bus. Parallel execution of programs in such systems causes interference between the processors and raises the question of maintaining the coherent state of common data held in several caches. Of particular interest here is the performance of the cache coherence protocols employed in the recently introduced R4000 processor. That processor was a prime candidate for the prototype fault-tolerant multiprocessor being developed at the time by the ESPRIT project FASST. This study resulted from a collaboration between QMIPS and FASST at Newcastle.

Two distinct versions of the cache coherence protocol are considered, both resembling the Berkeley protocol but also differing from it in some important respects. The difference between them lies in the way they handle a write hit access to a shared cache line: one version broadcasts a signal which causes all other caches to invalidate their copies, whereas the other broadcasts the new content of the line and all other caches update their copies. There are non-trivial trade-offs involved in choosing one or the other of the two variants, and the model can be used to assess the effect of various system parameters.
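To make the trade-off concrete, the following toy Python sketch compares the bus cost of a single write hit to a shared line under the two variants. All names and parameter values (t_signal, t_line, t_miss, p_reuse) are illustrative placeholders and are not taken from the paper; the actual model weighs these effects through its own system parameters.

    def write_hit_bus_cost(variant, n_sharers, p_reuse,
                           t_signal=1.0, t_line=8.0, t_miss=20.0):
        """
        Rough bus cost of one write hit to a shared line (illustrative only).
          t_signal - time to broadcast an invalidation signal
          t_line   - time to broadcast the new contents of the line
          t_miss   - time for a remote cache to re-fetch an invalidated line
          p_reuse  - probability a remote copy is read again before replacement
        """
        if variant == "invalidate":
            # one short signal now, plus expected re-fetches later at other caches
            return t_signal + n_sharers * p_reuse * t_miss
        elif variant == "update":
            # the full line is broadcast now; remote copies remain valid
            return t_line
        raise ValueError("variant must be 'invalidate' or 'update'")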

An exact solution of the model is unattainable, due to the size of the state space. However, a two-stage approximate analysis yields the desired results. First, the behaviour of an individual cache line is modelled by a finite-state Markov process. Its parameters are the number of processors, the ratio of read to write accesses and the ratio of hit to miss accesses. In order to reflect properly the interactions between processors, the generator matrix of that Markov process is expressed in terms of its own stationary distribution. Thus the problem is reduced to a fixed-point equation for an unknown probability vector.
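As an illustration of this first stage, the sketch below iterates the map pi -> stationary(Q(pi)) until it converges. The function build_generator, which encodes how the cache-line transition rates depend on the distribution pi and on parameters such as the number of processors and the read/write and hit/miss ratios, is a hypothetical placeholder for the paper's actual construction.

    import numpy as np

    def stationary(Q):
        """Stationary distribution of a continuous-time Markov chain with generator Q."""
        n = Q.shape[0]
        A = np.vstack([Q.T, np.ones(n)])   # solve pi Q = 0 together with sum(pi) = 1
        b = np.zeros(n + 1)
        b[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pi

    def solve_cache_line_model(build_generator, n_states, tol=1e-10, max_iter=1000):
        """Fixed-point iteration: pi_{k+1} = stationary(Q(pi_k))."""
        pi = np.full(n_states, 1.0 / n_states)   # start from the uniform vector
        for _ in range(max_iter):
            pi_next = stationary(build_generator(pi))
            converged = np.max(np.abs(pi_next - pi)) < tol
            pi = pi_next
            if converged:
                break
        return pi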

The solution of the cache line model, together with the various access time characteristics of the hardware, provides the relevant bus traffic parameters: rate of bus requests per processor and average service time per request. In the second stage of the analysis, the bus is modelled as a single-server, finite-source queue, in order to determine the system power and other performance measures.
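A minimal sketch of the second stage follows, assuming exponential inter-request and service times (the classical machine-repairman model) purely to keep the code short; the paper's own bus model need not make these assumptions.

    from math import factorial

    def bus_performance(N, lam, mu):
        """
        Single-server, finite-source bus model: N processors, each issuing
        bus requests at rate lam while running, served by the bus at rate mu.
        """
        rho = lam / mu
        # unnormalised probability of n outstanding requests: N!/(N-n)! * rho**n
        weights = [factorial(N) // factorial(N - n) * rho**n for n in range(N + 1)]
        total = sum(weights)
        p = [w / total for w in weights]
        utilisation = 1.0 - p[0]                  # fraction of time the bus is busy
        throughput = mu * utilisation             # bus requests completed per unit time
        mean_at_bus = sum(n * p[n] for n in range(N + 1))
        response_time = mean_at_bus / throughput  # Little's law
        power = throughput / response_time        # one common definition of system power
        return utilisation, throughput, response_time, power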

ii/ Fixed-point equations also arise in A Uniform-Memory Access Model of a Distributed Coherent Cache System, by A.J. Field, P.G. Harrison and N. Lehovetsky. This paper deals with a system in which processors are connected in a ring. A segment of main memory and a local cache reside with each processor, which may need to access the cache or memory segment of any remote processor as well as its own.

Coherency in this system is achieved by maintaining a linked list of shared cache memory locations, with a pointer to the head of the list from the associated main memory block. The list is distributed, with pointers attached to each of the participating cache lines. In order to maintain memory consistency, messages must be sent to transfer data between processing nodes and to update the list.
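The following much-simplified Python sketch shows the shape of such a sharing list. In the real protocol the list is distributed across the processing nodes and every pointer manipulation involves messages; here, purely for illustration, everything lives in one address space and the list is singly linked.

    class CacheLine:
        """A cached copy of a memory block: one element of the sharing list."""
        def __init__(self, node_id):
            self.node_id = node_id
            self.next = None      # pointer to the next sharer's cache line
            self.valid = True

    class MemoryBlock:
        """A main-memory block holding a pointer to the head of its sharing list."""
        def __init__(self):
            self.head = None

        def add_sharer(self, node_id):
            # A node obtaining a copy links itself at the head of the list;
            # in the real protocol this step requires messages to the home node.
            line = CacheLine(node_id)
            line.next = self.head
            self.head = line
            return line

        def invalidate_sharers(self):
            # A writer walks the list, invalidating every remote copy;
            # each step corresponds to a message between processing nodes.
            line = self.head
            while line is not None:
                line.valid = False
                line = line.next
            self.head = None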

The modelling approach is in two phases, each of which generates a fixed-point equation. In the first phase, the state of each line is modelled as a Markov process which is solved directly, since it has only a small number of states. However, under the approximations made to keep the problem tractable, the state transition probabilities depend on the steady-state probabilities being sought. The resulting fixed-point equation is solved iteratively.

In the second phase, the offered message traffic on the network is determined from the equilibrium probabilities estimated in phase one. Each processing node is represented - in the simplest, current version - by an M/G/1 queue, with priority given to ring traffic over new traffic emanating from the node. If nodes have to await a response before continuing processing, a further fixed-point equation results from modelling the network like a conventional token ring: the service time, i.e. the time from initiating a memory access until processing recommences, is a function of the delay experienced at all nodes on the ring, which are statistically identical. This fixed-point equation is again solved by iteration. It should be noted that when significant pipelining is used, a processor can generate several requests without stopping processing; in this case the constraint leading to the fixed-point equation is absent and a simpler model results.
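As a rough illustration of this second fixed point, the sketch below iterates a response-time equation R = f(R) for a ring of statistically identical nodes, assuming each access visits every node once and using a crude exponential-service delay per node. All parameter names and formulas are illustrative stand-ins, not the paper's actual expressions.

    def ring_response_time(N, think_time, service_per_node, tol=1e-9, max_iter=100000):
        """
        Response-time fixed point for a ring of N statistically identical nodes.
        A processor 'thinks' for think_time, issues a memory access, and stops
        until the response returns after time R; R in turn determines the load.
        """
        R = N * service_per_node                         # initial guess: no queueing
        for _ in range(max_iter):
            lam = 1.0 / (think_time + R)                 # request rate of one processor
            rho = min(N * lam * service_per_node, 0.999) # load offered to each node
            R_next = N * service_per_node / (1.0 - rho)  # delay summed over the ring
            if abs(R_next - R) < tol:
                return R_next
            R = 0.5 * (R + R_next)                       # damped update for stability
        return R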


Contents of the Deliverable

D9-1
Modelling and Evaluation of Cache Coherence Protocols in Multiprocessor Systems, by E. Ametistova and I. Mitrani (front page).

D9-2
A Uniform-Memory Access Model of a Distributed Coherent Cache System, by A.J. Field, P.G. Harrison and N. Lehovetsky (front page).

