1 Introduction
1.1 Concurrent Programming
Concurrent programming is the use of several programs running together in the same application. It is opposed to traditional sequential programming, in which there is one unique program which thus coincides with the whole application. Concurrent programming is usually considered to increase modularity, as one can often naturally decompose a complex application into several cooperating programs which can be run concurrently. This is for example the case of servers, in which each user request can quite logically be mapped to a specific program processing it. With concurrent programming, there is no longer any need to code a complex application made of many distinct parts as one single program, as is the case in sequential programming.
There are two main variants of concurrent programming: one in which the programs, called threads, share a common memory; and one in which the programs are basically processes which have their own memory. The presence of a shared memory eases communication between threads, as threads can directly use shared data to communicate, while processes have to use specialized means, for example sockets, for the same purpose. The counterpart is that shared data may have to be protected from concurrent accesses by distinct threads. Of course, this is not necessary with processes, in which, by definition, data are fully protected from being accessed by other processes.
Parallelism is the possibility to simultaneously use several computing resources. It is available with multiprocessor machines or, more generally, when several machines are distributed over a network. It is well-known that parallelism is not the best solution in all cases, and there are systems which run more efficiently on a single-processor machine than on a multiprocessor one. Parallelism is now directly available, with no special effort, in standard OSes such as Linux, which can operate hardware based on Symmetric Multi-Processor (SMP) architectures. Parallelism appears quite natural in concurrent programming, as there is generally no logical need for concurrent programs to be run by one unique processor.
Concurrent programming is widely recognized as being, in the general case, much more difficult than sequential programming. Some problems, such as deadlocks, are specific to it. Moreover, concurrent programs are most of the time nondeterministic, and thus more difficult to debug. For these reasons, concurrent programming is often considered as an activity which must be left to specialists. This seems of course largely justified in domains such as the programming of OS kernels. But the question remains of designing a set of concurrency primitives which are not too difficult or painful to use, and of providing general users with it.
Concurrency at Language Level
Most of the time, programming languages do not offer any primitive for concurrent programming. This is the case of C and C++, in which concurrent programming is only available using system calls (for example fork or exec) or through thread libraries (for example, the pthread library). Parallelism is also, most of the time, not covered by programming language primitives. The way to benefit from parallelism is either to use threads in an SMP context, or to design distributed systems where programs (often considered as distributed objects) communicate and synchronize through the network.
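For example, a minimal use of the pthread library in C could look as follows (a sketch; error handling omitted):
#include <pthread.h>
#include <stdio.h>

/* code run by the new thread */
void *task (void *arg)
{
   printf ("hello from %s\n", (char *)arg);
   return NULL;
}

int main ()
{
   pthread_t t;
   pthread_create (&t, NULL, task, "a thread");   /* start the thread */
   pthread_join (t, NULL);                        /* wait for its termination */
   return 0;
}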
Amongst the languages that have primitives to deal with concurrency, some use message passing, such as Erlang; others propose "rendez-vous", such as Ada and Occam; and several, such as CAML, directly include threads. The most accomplished proposal of such an inclusion seems to be Java [8], in which threads are first-class objects and data can be protected using locks and monitors, as defined by E.W. Dijkstra in the sixties (see [14] for reference papers on concurrent programming). In Java, threads are supposed to be used in almost every context (the simplest applet involves several threads, one of them being for example dedicated to graphics). However, Java does not give particular means to ease thread programming, which remains difficult and error-prone (see [23] for a discussion on this point). Concerning parallelism, Java proposes the RMI mechanism to define distributed objects, whose methods can be invoked through the network. Remote method invocations basically use specialized threads dedicated to communications.
Synchronous languages [19] propose concurrent programming primitives in the special context of reactive systems [20]. In this context, which is roughly the one of embedded systems, concurrent programs are deterministic and can be formally verified. However, synchronous languages suffer from several limitations which restrict their use. Amongst these are the limitation to static systems, in which the number of concurrent programs is known at compile time, and to mono-processor architectures.
A proposal to extend the synchronous approach to dynamic systems (systems which are not static) is called the reactive approach. It is described in full detail in [6]. Several languages belonging to this approach have been designed; amongst them are Reactive-C [11], which implements the approach in C, and SugarCubes [13], which implements it in Java. Reactive programs can be implemented very efficiently as they are basically cooperative. However, up to now, they have not been able to get direct benefit from multiprocessor machines.
In this text, one proposes a language for concurrent programming inspired by the reactive approach and based on special threads, called fair threads. One major point is that the language contains primitives that allow multiprocessing. The rest of the section considers threads and presents the basics of the language proposal.
In sequential programming, a program is basically a list of
instructions run by the processor. The program has access to the
memory in which it can read and write data. Amongst the instructions
are tests and jumps which are the way to control the execution
flow. In this standard computing model, a
program-counter points to
the current instruction executed by the processor.
The model of threads extends the previous sequential model by allowing the presence of several programs, the threads, sharing the same memory. There are thus several program-counters, one per thread, and a new component, the scheduler, which is in charge of dispatching the processors to the threads. The transfer of a processor from one thread to another is called a context-switch, because it implies that the program-counter of the thread which is left must be saved, in order to be restored later, when the thread regains the processor. When several threads can be executed, the scheduler can choose arbitrarily between them, depending on implementation details, and it is thus generally impossible to predict which one will finally get the processor. This is a source of nondeterminism in multi-threaded programming.
The strategy of the scheduler is said to be preemptive when it is able to force a running thread to release the processor executing it, in which case one says that the thread is preempted. With a preemptive scheduler, situations where a non-cooperative thread prevents the others from any execution are not possible. A preemptive scheduler can withdraw a processor from a running thread for priority reasons (there exist threads with higher priorities in the system), or for other reasons. This is also a source of nondeterminism. In a preemptive context, to communicate or to synchronize generally implies the need to protect some shared data involved in the communication or in the synchronization. Locks are often used for this purpose, but they have a cost and are error-prone as they introduce possibilities of deadlocks, which are situations where two threads cannot progress because each one owns a lock needed by the other.
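A classical deadlock sketch with POSIX mutexes: each thread takes the two locks in opposite orders, and both may block forever (the sleep calls just make the bad interleaving likely):
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

void *t1 (void *arg)
{
   pthread_mutex_lock (&a);
   sleep (1);                   /* give t2 time to take b */
   pthread_mutex_lock (&b);     /* blocks forever if t2 owns b and waits for a */
   pthread_mutex_unlock (&b);
   pthread_mutex_unlock (&a);
   return NULL;
}

void *t2 (void *arg)
{
   pthread_mutex_lock (&b);
   sleep (1);                   /* give t1 time to take a */
   pthread_mutex_lock (&a);
   pthread_mutex_unlock (&a);
   pthread_mutex_unlock (&b);
   return NULL;
}

int main ()
{
   pthread_t th1, th2;
   pthread_create (&th1, NULL, t1, NULL);
   pthread_create (&th2, NULL, t2, NULL);
   pthread_join (th1, NULL);    /* never returns: both threads are deadlocked */
   return 0;
}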
Preemptive threads are generally considered to be well adapted for systems made of heavy-computing tasks needing few communications and synchronizations. This is typically the case of Web servers, in which a thread (usually picked out from a pool of available threads) is associated with each new user request. Advantages are clear: first, modularity is increased, because requests do not have to be split into several small parts, as would be the case with sequential programming. Second, servers can be run by multiprocessor machines without any change, and for example they can immediately benefit from SMP architectures. Finally, blocking I/Os, for example reading from a file, do not need special attention. Indeed, as the scheduler is preemptive, there is no risk that a thread blocked forever on an I/O operation will also block the rest of the system.
The strategy of the scheduler is said to be cooperative if the scheduler has no way to force a thread to release the processor it owns. With this strategy, a running thread must cooperate and release the processor by itself from time to time, in order to give the other threads the possibility to get it. Cooperation is of course absolutely mandatory if there is only one processor: a running thread which never released the processor would indeed forbid any further execution of the other threads. Cooperative threads, sometimes called green-threads, are threads that are supposed to run under the control of a cooperative scheduler. Cooperative threads are adapted to tasks that need a lot of communications between them. Indeed, data protection is no longer needed, and one can thus avoid the use of locks. Cooperative threads can be efficiently implemented at user level, but they cannot benefit from multiprocessor machines. Moreover, they need special means to deal with blocking I/Os. Thus, purely cooperative threads are not sufficient to implement important applications such as servers.
It is often claimed that reuse of sequential code is made easier by using threads. What is meant is that a sequential program can be easily embedded in one thread, and then added into a system to be run concurrently with the other programs present in it.
However, one generally cannot be sure that a sequential program never enters a loop in which actions leading to releasing the processor, such as I/Os, are never performed. Under a cooperative scheduler, such a non-cooperative program would block the whole system (supposing that the processor is unique). In a way, this is a major justification for preemptive scheduling: a non-cooperating program is no longer a problem because it can now be preempted by the scheduler.
This is however only one aspect of the problem, because a sequential program generally cannot be used directly as is, even in a preemptive context. The first reason is that some data may have to be protected from being accessed by other threads, which requires locks that were of course absent from the sequential code. A second reason is that in some cases, for example libraries, the sequential code must be made reentrant, because several threads can now make concurrent calls to it. Finally, the granularity of the execution, which is under the control of the scheduler and of it only, can be too large and can require the introduction of supplementary cooperation points in the initial code. In conclusion, it seems that in all cases reuse of sequential code is largely an illusion in the context of multi-threading.
Portability is very difficult to achieve for multi-threaded systems. In the broadest sense, portability means that a system should run in the same way, independently of the scheduling strategy. This is of course a quite impossible challenge in the general case: data should be protected, for the case of a preemptive scheduler, but cooperation points should also be introduced, for the case of a cooperative one, and the conjunction of these two requirements would lead to very inefficient code. Portability is thus often considered as a goal that can only be partially achieved with multi-threaded systems. Note that this is a major problem in Java, which claims to be portable but does not specify the scheduling strategy that should be used for multi-threaded applications.
Determinism and Debugging
The choices made by the scheduler are a major source of the nondeterminism which makes thread programming complex. In particular, debugging is more difficult than in sequential programming, because one may have to take into account the choices of the scheduler when tracking a bug. Moreover, the replay of a faulty situation may be difficult to achieve, as it may actually depend on the scheduler choices. For example, it may happen that the introduction of printing instructions to trace an execution changes the scheduling and makes the bug disappear.
Nondeterminism seems inherent to preemption, as it is very difficult to conceive a multi-threading framework that is both deterministic and preemptive. For the reader who is not convinced, imagine two threads running in a preemptive context, one which cyclically increments a counter, and one which prints the value of the same counter. What should be the (unique, because of determinism) printed value, and how can one simply find it?
However, it should be noted that it is possible to design completely deterministic frameworks based on cooperative threads, with a clear and simple semantics. This is actually the case of the fair threads which are introduced later.
The way resources are consumed is a major issue for threads. Several
aspects are relevant to this matter.
First, as any standard sequential program, each thread needs a private memory stack to hold its local variables and the parameters and intermediate results of executed procedures. The use of a stack is actually mandatory in the context of a preemptive scheduler, because the moments at which context-switches occur are independent of the thread, which thus has no possibility to save its local data before the release of the processor. Note that the situation is different in a cooperative context, where context-switches are predictable. The necessity of a private stack for each thread certainly contributes to wasting memory.
Second, user threads are often mapped to kernel threads which are under the supervision of the OS. Of course, this is only meaningful with a preemptive OS, in order to avoid the risk of freezing the whole machine when a non-cooperative thread is encountered. The need to perform context-switches at kernel level is generally costly in CPU cycles. Another problem that can occur with kernel threads is the limitation which generally exists on the number of such threads (typically, the number of allowed threads can vary on a Linux system from 256 to about 4 thousand). Several techniques can be used to bypass these problems, especially when large numbers of short-lived components are needed. Among these techniques are thread-pooling, to limit the number of created threads, and the decomposition of tasks into small pieces of code, sometimes called "chores" or "chunks", which can be executed in a simpler way than threads are.
Preemptive threads (threads designed to be run by a preemptive scheduler) are also subject to another problem: some unnecessary context-switches can occur, which are time consuming. Actually, threads have no way at all to prevent a context-switch. Unnecessary context-switches cause a loss of efficiency, but this is of course the price to pay for preemptive scheduling.
Actually, programming with threads is difficult because threads generally have very "loose" semantics, which strongly depend on the scheduling strategy in use. This is particularly true with preemptive threads. Moreover, the semantics of threads may also depend on many other aspects, for example the way thread priorities are mapped at the kernel level. As a consequence, portability of multi-threaded systems is very hard to achieve. Moreover, multi-threaded systems are most of the time nondeterministic, which makes debugging difficult.
Cooperative threads have several advantages over preemptive ones: it is possible to give them a precise and simple semantics which integrates the semantics of the scheduler; they can be implemented very efficiently; and they lead to a simple programming approach which limits the need for protecting data. However, purely cooperative systems are not usable as is, and preemption facilities have to be provided in one way or another. The question is thus to define a framework providing both cooperative and preemptive threads, and allowing programmers to use these two kinds of threads according to their needs.
Loft basically gives users the choice of the context, cooperative or preemptive, in which threads are executed. More precisely, Loft defines schedulers, which are cooperative contexts to which threads can dynamically link or unlink. All threads linked to the same scheduler are executed in a cooperative way, and at the same pace. Threads which are not linked to any scheduler are executed by the OS in a preemptive way, at their own pace. An important point is that Loft offers programming constructs for linking and unlinking threads. The main characteristics of Loft are:
- It is a language, based on C and compatible with it.
- Programs can benefit from multiprocessor
machines. Indeed, schedulers and unlinked threads can be run in real
parallelism, on distinct processors.
- Users can stay in a purely cooperative context by
linking all the threads to the same scheduler, in which case systems
are completely deterministic and have a simple and clear semantics.
- Blocking I/Os can be implemented in a very simple way, using
unlinked native threads.
- There exist instants shared by all the threads linked
to the same scheduler. Thus, all threads linked to the same scheduler
execute at the same pace, and there is an automatic synchronization at
the end of each instant.
- Events can be defined by users. They are
instantaneously broadcast to all the threads linked to a
scheduler; events are a modular and powerful means for threads to
synchronize and communicate.
- It can be efficiently implemented on various platforms.
Implementation of the whole language needs the full power of native threads.
However, the cooperative part of the language can be implemented without any
native thread support. This leads to implementations suitable for
platforms with few resources (PDAs, for example).
Loft is strongly linked to an API of threads called FairThreads [12]: actually, Loft stands for Language Over Fair Threads. This API mixes threads with the reactive approach, by introducing instants and broadcast events in the context of threads. Loft can be implemented in a very direct way by a translation into FairThreads (this implementation is described in section 7.1).
1.4 Structure of the Document
Section 2 contains the description of the language. First, an overview of Loft is given. Then, modules and threads are considered. Atomic and non-atomic instructions are described. Finally, native modules are considered. A first series of examples is given in section 3. Amongst them is the coding of a small reflex-game which shows the reactive programming style. The case of data-flow programming is also considered. Various sieves are coded, showing how the presence of instants can benefit code reuse. Section 4 contains a discussion of the programming style implicitly supposed by Loft. Section 5 contains the formal semantics of the cooperative part of Loft, by means of a set of rewriting rules. FairThreads is described in section 6. Several implementations of Loft are described in section 7. One consists of a direct translation into FairThreads. Another is a direct implementation of the semantics rules of section 5. One implementation is suited for embedded systems. Section 8 contains a second series of examples. Amongst them are some examples which can benefit from multiprocessor architectures. A prey/predator demo running on an embedded system (the Game Boy Advance of Nintendo) is also described. Related work is described in section 9, before the conclusion. A first annex gives the API of FairThreads. A second one is the Reference Manual of Loft. The third one is a quick description of the compiling command.
2 The Language LOFT
One first gives an overview of the language. Then, the basic
notions of modules, threads, schedulers, events, and atomic and non-atomic
instructions are presented. Finally, native modules are considered.
2.1 Overview of the Language
Modules are the basic units of the language. The syntax is based on C
and modules are defined in files that can also contain C code.
Threads are created from modules. A thread created from a module is
called an instance of it. A module can have parameters which
define corresponding parameters of its instances. Arguments provided
when an instance is created are associated to these parameters. A
module can also have local variables, new fresh instances of which are
automatically created for each thread created from it.
The body of a module is basically a sequence of instructions executed
by its instances. There are two types of instructions: atomic
instructions and non-atomic ones. Atomic instructions are run
in one single step by the same processor. Usual terminating C code belongs to this kind of instruction. Another example of an atomic instruction is the generation of an event, described later, which is broadcast to all the threads.
Execution of non-atomic instructions may need several steps up to
completion. This is typically the case of the instruction which waits
for an event to be generated. Execution steps of non-atomic
instructions are interleaved, and can thus interfere during
execution. Note that the execution steps of a non-atomic instruction are not necessarily all executed by the same processor.
The basic task of a scheduler is to control execution of the threads
that are linked to it. The scheduling is basically
cooperative: linked threads have to return the control to the
scheduler to which they are linked, in order to let other threads
execute. Leaving the control can be either explicit, with the cooperate instruction, or implicit, by waiting for an event which is not present.
All linked threads are cyclically considered in turn by the scheduler until all of them have reached a cooperation point (a cooperate or a waiting instruction). Then, and only then, a new cycle can start. Cycles are called instants. Note that the same thread can receive the control from the scheduler several times during the same instant; this is for example the case when the thread waits for an event which is generated by another thread later in the same instant. In this case, the thread receives the control a first time and then blocks, waiting for the event. The control goes to the other threads, and returns back to the first thread after the generation of the event.
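As a sketch (using the event constructs detailed in the next sections; go is an event assumed to be created in the implicit scheduler, and waiter is assumed to be created before emitter):
event_t go;

module waiter ()
   await (go);                   /* blocks: go is not yet present */
   printf ("got the event\n");
end module

module emitter ()
   generate (go);                /* makes go present for the whole instant */
end module
At the instant where both threads run, waiter is scheduled first and blocks on the await; emitter then generates go, and waiter is re-scheduled and prints during that very same instant.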
A scheduler defines a kind of automatic synchronization which forces the threads linked to it to run at the same pace: all the threads must have finished their execution for the current instant before the next instant can start.
The order in which threads are cyclically considered is always the
same: it is the order in which they have been linked to the
scheduler. This leads to deterministic executions, which are
reproducible. Determinism is of course an important point for
debugging.
At creation, each thread is linked to one scheduler. There exists an
implicit scheduler to which threads are linked by default when no
specific scheduler is specified. Several schedulers can be defined
and actually running in the same program. Schedulers thus define synchronized areas in which threads execute in cooperation.
Schedulers can run autonomously, in a preemptive way under the
supervision of the OS.
During their execution, threads created from special modules, called
native modules, can unlink from the scheduler to which they
are currently linked, and become free from any scheduler
synchronization. Such free threads are run by kernel threads of the
system. Of course, native modules and native threads have a meaning
only in the context of a preemptive OS.
Three kinds of variables exist in
Loft:
- Global variables that are C global variables shared by all the
threads. These variables are placed in the shared memory in which the
program executes.
- Local variables that are instances of local variables of
modules. These variables are local to threads. Their specific
characteristic is that their value is preserved across instants: they thus have to be stored in the heap.
- Automatic variables which are defined in atomic instructions,
and which are automatically destroyed when the atomic instruction in
which they appear is left. The value of an automatic variable is
not preserved from one instant to the next. As in C, automatic
variables can be stored in registers or in the stack.
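As a sketch (using the module syntax detailed in the next sections), the three kinds can be contrasted as follows:
int global_count = 0;            /* global: shared by all instances */

module demo ()
   local int my_count;           /* local: one copy per instance, kept across instants */
   {local(my_count) = 0;}
   while (1) do
      {
      int sum = global_count + local(my_count);   /* automatic: lost when the atom ends */
      printf ("%d\n", sum);
      global_count++;
      local(my_count)++;
      }
      cooperate;
   end
end module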
Note that non-native threads need a stack only for the execution of their atomic instructions. Indeed, atomic instructions cannot be preempted during execution, by definition. Thus, in this case, the stack is always in the same state at the beginning and at the end of the execution of each atomic instruction, and its content need not be saved when a context switch occurs. The situation is of course different for native threads: because they can be preempted arbitrarily by the OS, they need a specific stack, provided by the native thread in charge of running them.
The simplest way for threads to communicate is of course to use shared variables. For example, a thread can set a boolean variable to indicate that a condition is set, and other threads can test the variable to know the status of the condition. This basic pattern works if all threads accessing the variable are linked to the same scheduler. Indeed, in this case, atomicity of the accesses to the variable is guaranteed by the cooperativeness of the scheduler. A general way to protect data from concurrent accesses is thus to associate them with one unique scheduler, to which threads willing to access the data should first link.
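As a sketch, using the link instruction described later (data_sched and shared_counter are hypothetical names, assumed to be defined elsewhere):
module incr ()
   local scheduler_t save;
   {local(save) = current_scheduler ();}
   link (data_sched);            /* from now on, accesses are cooperative, thus atomic */
   {shared_counter++;}
   link (local(save));           /* return to the original scheduler */
end module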
However, this solution does not work if some of the threads are
non-native and belong to different schedulers or are unlinked. This is
actually the standard situation in concurrent programming, where
protection is basically obtained with locks. Standard POSIX mutexes
can be used for this purpose.
Events, described in the next section, give threads another means of communication.
Events are synchronizing data basically used by threads to avoid busy-waiting on conditions. An event is created in a scheduler which is in charge of it during its whole lifetime. An event is either present or absent during each instant of the scheduler which manages it. It is present if it is generated by the scheduler at the beginning of the instant, or if it is generated by one of the threads executed by the scheduler during the instant; it is absent otherwise. The presence or absence of an event can change from one instant to another, but it cannot change during the course of an instant: all the threads always "see" the presence or absence of an event in the same way, independently of the order in which they are scheduled. This is what is meant by saying that events are broadcast.
Values can be associated to event generations; they are collected
during each instant and are available only during it.
The basic implementation of Loft is built on top of a C library of native threads. Each scheduler and each instance of a native module is implemented as a native thread. Threads which are instances of non-native modules do not need the full power of a native thread to execute. Actually, they do not need any specific stack, as they can use the one of the native thread of the scheduler to which they are linked. In this way, the number of native threads can be limited to a minimum. This is important when a large number of concurrent activities is needed, as the number of native threads that can be created in systems is usually rather low.
Unlinked threads and schedulers can be run in full parallelism, by distinct processors. Programs can thus benefit from SMP architectures, and be sped up by multiprocessors.
Note that one gets program structures that conform to the so-called M:N architectures: M linked threads and N schedulers running them. An important point is that these architectures become programmable directly at the language level, and not through the use of a library (as, for example, in the Solaris system of Sun).
Modules are templates from which threads are created. A thread created
from a module is called an instance of it. A thread is always created
in a scheduler which is initially in charge of executing it. Modules
are made from atomic and non-atomic instructions that are defined in
the next sections.
A module can have parameters which define corresponding parameters of
its instances. Arguments provided when an instance is created are
associated to these parameters. A module can also have local
variables, new fresh instances of which are automatically created for
each thread created from it.
The syntax of
Loft is based on C and modules are defined in files
that can also contain C code. The syntax of modules is:
module <kind> <name> ( <params> )
<locals>
<body>
<finalizer>
end module
- <kind> is the optional native keyword. The case of native modules is considered in section 2.5.
- <name> is the module name; it can be any C identifier.
- <params> is the list of parameters; it is empty if there is no parameter at all.
- <locals> is the optional list of local variable declarations. This list begins with the keyword local.
- <body> is a sequence of instructions defining the module body.
- <finalizer> is an optional atomic instruction executed in case of forced termination (by the stop instruction). This part begins with the keyword finalize.
As an example, consider the module trace, defined by:
module trace (char *s)
while (1) do
printf ("%s\n",local(s));
cooperate;
end
end module
This module is a non-native one, and it has a parameter which is a character string. The body is reduced to a single while loop (actually, an infinite one), considered in section 2.4.4. At each instant, the loop prints the message given in argument and cooperates. Note the presence of the do and end keywords around the loop body; indeed, as "{" and "}" are used in Loft to delimit atomic instructions (see section 2.3.4), one needs a different notation for loop bodies.
The parameter s is actually a local variable which is passed at creation to the instances of trace. In Loft, all accesses to local variables, and thus also to parameters, must be of the form local(...) (local is actually a macro).
Threads can have local variables which keep their values across instants. These are to be distinguished from static variables, which are global to all the threads, and from automatic variables, which lose their values across instants.
Let us return to the previous module trace and suppose that one wants to also print the instant number. A first idea would be to define a variable inside the atomic instruction:
module trace (char *s)
while (1) do
{
int i = 0;
printf ("%s (%d)\n",local(s),i++);
}
cooperate;
end
end module
This is not correct, because the variable i is allocated on the stack at the beginning of the atomic instruction and vanishes at the end of it. Thus, a new instance of the variable is used at each instant.
A solution would be to declare a global C variable to store the
instant number:
int i = 0;
module trace (char *s)
while (1) do
printf ("%s (%d)\n",local(s),i++);
cooperate;
end
end module
This solution produces a correct output. However, i is a global variable shared by all instances of trace. A local variable should be used if one wishes each instance to own a distinct variable:
module trace (char *s)
local int i;
{local(i) = 0;}
while (1) do
printf ("%s (%d)\n",local(s),local(i)++);
cooperate;
end
end module
Note that local variables must always be accessed using the form local(...). This makes it possible to access, in the same context, a local variable and a global or automatic one having the same name.
Threads are instances of modules. Threads which are instances of a module m are created using the function m_create, which returns a new thread. Threads are of the pointer type thread_t. Note that, as in many C APIs, all types defined in Loft have their name terminated by "_t". Arguments given to m_create are passed to the created instance (as in C, arguments are passed by value).
For example, the following atomic instruction creates two instances of the previous module trace:
{
trace_create ("first thread");
trace_create ("second thread");
}
The output is:
first thread (0)
second thread (0)
first thread (1)
second thread (1)
first thread (2)
second thread (2)
...
Several points are important:
- Threads are created in the implicit scheduler, which is automatically created and started in every program (see section 7.2.4).
- Threads are automatically started. This is a difference with Java, in which threads have to be explicitly started.
- Threads are always executed in the same order, which is the order of their creation. Thus, the previous output is the only one possible.
It is possible to create a thread in a specific scheduler, using a creation function of the form m_create_in. For example, an instance of trace is created in the scheduler sched by:
trace_create_in (sched,"a thread");
Threads are always incorporated in the scheduler at the beginning of
an instant, in order to avoid interferences with already running
threads. Moreover, if the current scheduler and the one in which the
thread is created are the same, then this instant is the next one.
The running thread is returned by the function self()
.
The finalizer of a module is an atom executed when an instance of the module is forced to terminate by a stop instruction (defined in section 2.3.3). The finalizer is only executed if the stopped thread has not already terminated. A typical use of finalizers is for deallocation purposes, as in:
module m (...)
local resource_type resource;
{local(resource) = allocate ();}
...
{deallocate (local(resource));}
finalize
{deallocate (local(resource));}
end module
The allocated resource is deallocated in case of normal termination, at the end of the module body. It is also deallocated if termination is forced, as in this case the finalizer is executed. An important point to note is that the thread executing a finalizer is unspecified, as the instance considered is stopped. This means that self cannot be safely used in finalizers.
The entry point of a program is a module named main. An instance of it is automatically created in the implicit scheduler. For example, the previous output could be produced by the execution of a program with the following module main:
module main ()
trace_create ("first thread");
trace_create ("second thread");
end module
As in C, arguments can be given to the module main, following the standard argc/argv convention. The global program does not terminate when the main thread terminates: it is the programmer's responsibility to end the program (typically, by calling the C exit function).
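For example, a main module which creates a worker and ends the whole program after 100 instants could be sketched as follows (assuming the module trace of the previous sections, and using the repeat loop described later):
module main ()
   trace_create ("worker");
   repeat (100) do
      cooperate;
   end
   {exit (0);}                   /* explicitly end the whole program */
end module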
In this section, one considers the atomic instructions, which terminate at the same instant they start. We have already seen two examples in the previous sections: calls to the C function printf, and the creation of instances of a module m with the m_create and m_create_in functions.
Creation
Schedulers can be created arbitrarily with the scheduler_create atomic instruction. They are of the type scheduler_t.
scheduler_t sched = scheduler_create ();
One instant of a scheduler is obtained by executing the scheduler_react atomic instruction.
scheduler_react (sched);
The scheduler_react function allows users to build hierarchies of schedulers and to make them react following arbitrary strategies.
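For example, a thread can drive a sub-scheduler at the pace of its own scheduler, one instant per instant (a sketch; sub is assumed to be a scheduler created elsewhere):
module drive_sub ()
   while (1) do
      scheduler_react (sub);     /* trigger one instant of the sub-scheduler */
      cooperate;
   end
end module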
Implicit and Current Schedulers
A scheduler, called the implicit scheduler, always exists in every program. Two atomic instructions are available to get the implicit scheduler and the current scheduler, which is the one to which the executing thread is linked (the NULL value is returned if the thread is not linked to any scheduler).
scheduler_t implicit = implicit_scheduler ();
scheduler_t current = current_scheduler ();
Events are basically variables with three possible values: present, absent, or unknown. An event is created in a scheduler and managed by it during its whole lifetime. The value of an event is automatically set to unknown at the beginning of each instant. There is a way for the programmer to set an event to present (namely, to generate it), but there is no way to set it to absent: only the system is able to decide that an event is absent. Actually, the system automatically decides that unknown events are absent when all threads have reached a cooperation point and no thread is waiting for an event which has been previously generated. The language definition assures that, once unknown events are declared absent, all is finished for the current instant (in particular, no event can be generated anymore during the same instant). As a consequence, one can consider that:
- Events are non-persistent data which lose their presence status at the beginning of each instant.
- Events are broadcast: all threads always "see" their presence/absence status in the same way, independently of the order in which they are scheduled.
- Reaction to the absence of an event is always postponed to the next instant.
The last point is a major difference with the synchronous approach, in particular with the Esterel [9] language (see section 9 for details).
Creation
An event is created in a scheduler whose instants determine its presence/absence status. event_create creates an event in the implicit scheduler; event_create_in creates an event in the scheduler given as argument. Events are of the type event_t.
event_t evt1 = event_create ();
event_t evt2 = event_create_in (sched);
The first event is created in the implicit scheduler, while the second one is created in the scheduler sched.
Generation
Generation of an event makes it present in the scheduler in charge of it (the one in which the event has been created). If the thread performing the generation is linked to the scheduler of the event, then the generation becomes immediately effective. In this case, all the threads waiting for the event will be re-scheduled during the same instant, and thus will see the event as present. If the scheduler of the event and the one of the thread are different, then the order to generate the event is sent to the scheduler of the event. In this case, the event will be generated later, at the beginning of the next instant of the target scheduler. Note that this can occur at an arbitrary moment if the target scheduler is run asynchronously by a dedicated native thread.
A value can be associated to an event generation; in this case one speaks of a valued generation of the event; in the other case, the generation is said to be pure. The value must be of a pointer type, or of a type that can be cast to such a type. All the values generated for the same event are collected during the instant and are available during this instant, using the get_value non-atomic instruction (see section 2.4.8).
A pure generation of a present (that is, already generated) event has no effect. This is of course different for a valued generation: in that case, the value is appended to the list of values of the event. Note that, as for the presence/absence status, the list of values of an event is automatically reset to the empty list at the beginning of each instant.
generate (evt);
generate_value (evt,val);
The first call is a pure generation of the event evt. The second call generates the same event, but with the value val. Note that a valued generation with a null value is not equivalent to a generation without any value.
2.3.3 Control over Threads
Three means for controlling thread execution are provided: a way to stop the execution of a thread, which is thus forced to terminate definitively, and ways to suspend and resume the execution of a thread. With these means, the user can design his own scheduling strategies for controlling threads.
Stop
Execution of a thread is stopped with a call to the stop instruction. To stop a thread means that its execution is definitively abandoned. The effect of the stop instruction is to send to the scheduler of the thread given as argument the order to stop the thread. The order is always processed at the beginning of a new instant, in order to avoid interferences with the other threads. If the executing thread and the stopped thread are both linked to the same scheduler, then the thread will be stopped at the beginning of the next instant. As an example, a thread can stop itself by:
stop (self ());
Note that the thread terminates its execution for the current instant
and is only effectively stopped at the next instant; the
preemption is actually a "weak" one.
Suspend and Resume
Threads can be suspended and resumed using the two atomic instructions suspend and resume. Suspension of a thread means that its execution is suspended until it is resumed. Suspending and resuming are orders given to the scheduler running the considered thread, and are always processed at the beginning of new instants. As for stop, if the executing thread and the concerned thread are both linked to the same scheduler, then the thread execution status will be changed at the beginning of the next instant.
suspend (main_thread);
resume (other_thread);
Orders are processed in turn. For example, the following sequence of instructions has no effect, as the thread is immediately resumed after being suspended:
suspend (thread);
resume (thread);
Basic atomic instructions are standard C blocks, made of C statements enclosed by "{" and "}". Of course, the C statements must terminate, and one must keep in mind that they should not take too much time to execute. Indeed, the other threads belonging to the same scheduler cannot execute while the current thread has not finished executing its atomic instruction (see the discussion on this point in section 4.1).
Almost any C code can syntactically appear in atomic instructions, between "{" and "}". It is the programmer's responsibility to avoid the use of any statement that would prevent the complete execution of an atomic instruction. For example, setjmp/longjmp should absolutely be avoided. However, some forbidden statements are detected and lead to program rejection. This is the case of the C return statement. break statements not enclosed in a C block are also forbidden, as are goto statements.
Procedure calls are considered as atomic instructions and can thus be used directly, without being put inside a C block. This is however just a syntactic facility, and the semantics of an atomic action made of a C procedure call is exactly the one of the block containing it. For example, the module main of section 12.1.2 could be equivalently written:
module main ()
{
trace_create ("first thread");
trace_create ("second thread");
}
end module
2.4 Non-atomic Instructions
In this section, one considers the non-atomic instructions. As opposed to atomic instructions, the execution of non-atomic instructions can last several instants.
The basic non-atomic instruction is the cooperate instruction, which returns the control back to the scheduler. When receiving the control after a cooperate instruction, the scheduler knows that the executing thread has finished its execution for the current instant, and thus that it is not necessary to give it back the control another time during the instant. When the thread receives the control in a future instant, the cooperate instruction terminates and passes the control in sequence. Execution of a cooperate instruction thus needs two instants to complete.
Execution of an await instruction suspends the executing thread until the event is generated. There is of course no waiting at all if the event is already present. Otherwise, the waiting can take just a portion of the current instant, if the awaited event is generated later in the same instant by a thread scheduled later; the waiting can also last several instants, or can even be infinite, if the awaited event is never generated.
await (event);
There is a way to limit the time during which the thread is suspended waiting for an event. The limit is expressed as a number of instants, and the thread is resumed when it is reached. Of course, the waiting ends, as previously, as soon as the event is generated before the limit is reached. For example, the following instruction suspends the executing thread for at most 10 instants, waiting for the event e.
await (e,10);
After resuming, one can know whether the limit was exceeded or the event was generated before the limit was reached, by examining the value returned by the predefined function return_code ():
- ETIMEOUT means that the limit has been reached.
- OK means normal termination, because the awaited event was generated.
For example, the following code tests the presence of an event during
one instant:
await (e,1);
{
if (return_code () == ETIMEOUT)
printf ("was absent");
else
printf ("is present");
}
Note that the message "was absent
" is printed at the instant
that follows the starting of the waiting, while "is present
"
is printed at the same instant. Reaction to the absence of an event
(here, the printing action) cannot be immediate, but is always
postponed to the next instant.
The value returned by return_code () concerns the last non-atomic instruction executed. For example, in the following code, the printed messages concern f, not e:
await (e,1);
await (f,1);
{
if (return_code () == ETIMEOUT)
printf ("f was absent");
else
printf ("f is present");
}
The join instruction suspends the executing thread until another one, given as argument, has terminated; this is called joining the thread. Note that there are actually two ways for a thread to terminate: either because its execution has returned, or because the thread has been stopped.
As for await, there is a possibility to limit the time during which the thread is suspended, and the return_code function is used in the same way to detect whether the limit was reached.
join (th1);
join (th2,10);
The first instruction waits unconditionally for the termination of th1, while the second waits at most 10 instants to join th2.
There are two kinds of loops: while loops, which cycle while a boolean condition is true, and repeat loops, which cycle a fixed number of times. There is no equivalent of a for loop, as such a loop would need in most cases an associated local variable; for loops are thus, when needed, to be explicitly coded, as in the sketch below.
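For example, a loop printing its index during n instants can be coded with a while loop and a local counter (a sketch):
module count (int n)
   local int i;
   {local(i) = 0;}
   while (local(i) < local(n)) do
      printf ("i = %d\n", local(i));
      {local(i)++;}
      cooperate;
   end
end module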
While
A while loop executes its body while a boolean condition is true.
while (1) do
printf ("loop!");
cooperate;
end
The value of the condition is only considered at the first instant and
when the body terminates. For example, the following loop never
terminates, despite the fact that the condition is false at some
instants:
{local(i) = 1;}
while (local(i)) do
{local(i) = 0;}
cooperate;
printf ("loop!");
cooperate;
{local(i) = 1;}
end
Instantaneous loops are loops with a non-cooperating body. This is for
example the case of:
while (1) do
printf ("loop!");
end
Instantaneous loops should generally be avoided, as they forbid the execution of the other threads (see section 4.1). Nevertheless, they can be useful in unlinked threads, or when the executing thread is the only one run by the scheduler (which is a rather special situation, indeed).
Repeat
A repeat loop executes its body a fixed number of times. It could be coded with a while loop and a counter implemented as a local variable; the repeat instruction avoids the use of such a local variable.
repeat (10) do
printf ("loop!");
cooperate;
end
The expression defining the number of cycles is evaluated when the
control reaches the loop for the first time.
The if statement can be used in atomic instructions like any other standard C statement. However, when it is used in this context, both the then and else branches must be atomic instructions. A non-atomic version of the if is available. Its syntax is:
if (<expression>) then <instruction> else <instruction> end
For example, the instruction of section 12.9.4, which tests for the presence of e during the current instant, can be equivalently written as:
await (e,1);
if (return_code () == ETIMEOUT) then
printf ("was absent");
else
printf ("is present");
end
Both the then and else branches are optional. The boolean expression is evaluated once, when the control reaches the if for the first time. Its value determines which branch is chosen for execution. The chosen branch is then executed up to completion, which can take several instants, as it can be non-atomic.
For example, the following instruction prints at the next instant the value exp has at the current instant:
if (exp) then
cooperate;
printf ("true!");
else
cooperate;
printf ("false!");
end
The return statement of C cannot be used in an atomic instruction, as it would prevent the instruction from terminating. However, it can be used as a non-atomic instruction, to force the termination of the executing thread. For example, the following module terminates when the boolean expression exp becomes true:
module m ()
while (1) do
if (exp) then return; end
cooperate;
end
end module
Note that, as the return statement is forbidden in atomic instructions, the following module is incorrect:
module bug ()
while (1) do
{if (exp) return;}
cooperate;
end
end module
The halt instruction never terminates. It is equivalent to:
while (1) do cooperate; end
Its basic use is to block the execution flow, as in:
if (!exp) then
generate (error);
halt;
end
...
In this code, the if instruction tests exp and forbids the control to pass in sequence if it is false. In this case, an event is generated, which should be used by another thread to recover from this situation.
The get_value instruction is the means to get the values associated to an event by its valued generations. The values are indexed and, when available, returned in a variable of type void**. The instruction get_value (e,n,r) is an attempt to get the value of index n generated for the event e during the current instant. If available, the value is assigned to r during the current instant; otherwise, r will be set to NULL at the next instant, and the function return_code () will return the value ENEXT.
For example, the following instruction is an attempt to get the first value of evt (for simplicity, values are supposed to be of type int):
get_value (evt,0,res);
{
if (return_code () == ENEXT) printf ("there was no value! ");
else printf ("first value is %d ", (int)(*res));
}
The following module awaits an event and prints all the values
generated for it:
module print_all_values (event_t evt)
local int i,int res;
await (local(evt));
{local(i) = 0;}
while (1) do
get_value (local(evt), local(i), (void**)&local(res));
if (return_code () == ENEXT) then return;
else
{
printf ("value #%d: %d\n", local(i), local(res));
local(i)++;
}
end
end
end module
Note that the module always terminates at the next instant. This should not be surprising: one must wait for the end of the current instant to be sure that all the values have effectively been collected.
A module can run another one, using the run non-atomic instruction. The calling thread suspends execution when encountering a run instruction. Then, an instance of the called module is created with the arguments provided. Finally, the calling thread is resumed when the instance of the called module terminates. Moreover, a thread which is stopped retransmits the stop order to the thread it is running, if there is one. Thus, the instance of a called module is automatically stopped when the calling thread is.
run mod1 ();
run mod2 ("mod2 called");
In this sequence, an instance of mod1 is first executed. When it terminates (if it does), an instance of mod2 is executed, with a string argument. Finally, the sequence terminates when the last instance does.
When created, a thread is always linked to a scheduler. During execution, the thread can link to other schedulers, using the link instruction. The effect of executing the instruction link (sched) is to extract the executing thread from the current scheduler and to add it to sched, in the state in which it has left the current scheduler. Thus, after re-linking, the thread will resume execution in the new scheduler. This can be seen as a restricted form of thread migration.
For example, the following module cycles between two schedulers. On each scheduler, it awaits an event and then prints a message. Then, if the two events are generated at each instant, the program prints "Ping Pong" forever:
module play ()
while (1) do
link (sched1);
await (evt1);
printf ("Ping ");
link (sched2);
await (evt2);
printf ("Pong\n");
end
end module
Native modules only have meaning in the context of a preemptive OS, as instances of native modules should be run by native threads belonging to the kernel. The presence of the native keyword just after the keyword module defines the module as native. The unlink non-atomic instruction unlinks the instance of a native module from the scheduler to which it was previously linked. After the unlink instruction, the thread is free from any synchronization. However, an unlinked thread can always re-link to a scheduler, using the link non-atomic instruction (see section 12.7.2).
The unlink instruction is specific to native modules: programs where unlink instructions appear in non-native modules are rejected at compile time.
When unlinked, instances of native modules are autonomous, which means that they execute independently of any scheduler, at their own pace. For example, consider the following program, which creates two native threads, instances of the same native module pr:
module native pr (char *s)
unlink;
while (1) do
printf (local(s));
end
end module
module main ()
pr_create ("hello ");
pr_create ("world!
");
end module
The two lists of messages produced are merged in an unpredictable way in the output: nondeterminism is introduced by unlinked instances of native modules. Note that the loop in pr is instantaneous; this is not a problem, as the thread is unlinked. The granularity of each thread is, however, under the control of the OS, which can be unacceptable in some situations (see section 4.2).
Native modules are important for using standard blocking I/Os in a cooperative context. For example, the following module uses the getchar function, which blocks the execution of the calling thread until a character is read on the standard input:
module native AnalyseInput ()
unlink;
...
while (1) do
{
switch (getchar ()){
...
}
}
end
end module
The first instruction unlinks the thread from the current scheduler. Then, the getchar function can be safely called, without any risk of blocking the other threads.
The following module read_module implements a cooperative read I/O, using the standard blocking read function. The thread first unlinks from its scheduler, then performs the read, and finally re-links to the scheduler:
module native read_module (int fd,void *buf,size_t count,ssize_t *res)
local scheduler_t sched;
{local(sched) = current_scheduler ();}
unlink;
{(*local(res)) = read (local(fd),local(buf),local(count));}
link (local(sched));
end module
Note that the pattern suggested by this example is a very general one, useful for reusing code which was not designed to be run in a cooperative context.
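For example, the same pattern can be used to wrap the blocking getchar function seen previously (a sketch; getchar_module is a hypothetical name):
module native getchar_module (int *res)
   local scheduler_t sched;
   {local(sched) = current_scheduler ();}
   unlink;
   {(*local(res)) = getchar ();}
   link (local(sched));
end module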
3 Examples - 1
One considers several examples of programming in Loft. First, some basic examples are given in section 3.1. The complete programming of a little reflex game is considered in section 3.2. Sieve algorithms are coded in section 3.3. Finally, data-flow programming is considered in section 3.4.
One considers several basic examples which highlight some aspects of the language. Section 3.1.1 shows the benefit of a precise semantics. A notification-based communication mechanism is implemented in section 3.1.2. Barriers are considered in section 3.1.3. A readers/writers example is coded in section 3.1.4. Finally, a producers/consumers example is considered in section 3.1.5.
Let us consider a system made of two threads implementing two variants of a service, say a fast one and a slow one. Two events are used to start each of the variants. After a variant is chosen, the other one should become unavailable; that is, each variant should be able to stop the other. The coding of such an example is straightforward. First, the two threads fast and slow, and the two events start_fast and start_slow, are declared:
thread_t fast, slow;
event_t start_fast, start_slow;
The thread fast awaits start_fast to start. Then, before serving, it stops the slow variant. It is an instance of the following module fast_variant:
module fast_variant ()
await (start_fast);
stop (slow);
< fast service >
end module
The thread slow is an instance of slow_variant, defined similarly by:
module slow_variant ()
await (start_slow);
stop (fast);
< slow service >
end module
The program consists in creating the two threads and the two events:
....
fast = fast_variant_create ();
slow = slow_variant_create ();
start_fast = event_create ();
start_slow = event_create ();
....
The question is now: what happens if both start_fast and start_slow are simultaneously present? This should not happen, but the question remains of what happens if, by accident, it is the case. The answer is clear and precise, according to the semantics: the two variants are executed during only one instant, and then they both terminate at the next instant. Note that inserting a cooperate just after the stop in the two modules would prevent both threads from starting servicing.
Now, suppose that the same example is coded using standard pthreads, replacing events by condition variables and stop by pthread_cancel. The resulting program is deeply nondeterministic. Actually, one of the two threads could cancel the other and run up to completion. But the situation where both threads cancel each other is also possible in a multiprocessor context (where each thread is run by a distinct processor); in this case, moreover, both variants execute simultaneously as long as they are not canceled, which produces unpredictable results.
Note that the possibility of mutual stopping exhibited in this example
cannot be expressed in a synchronous language, because it would be
rejected at compile time as having a causality cycle.
3.1.2 Wait-Notify
One considers thread synchronization using conditions, which basically correspond to (simplified) condition variables of POSIX. A thread can wait for a condition to be set by another thread. The waiting thread is said to be notified when the condition is set. In a very first, naive implementation, conditions are simply boolean variables (initially false). To notify a condition means to set the variable, and to wait for the condition means to test it:
int condition = 0;
void notify (int *condition)
{
(*condition) = 1;
}
module wait (int *cond)
while ((*local(cond)) == 0) do
cooperate;
end
{(*local(cond)) = 0;}
end module
Note that no mutex is needed: because of the cooperative model, simple boolean shared variables are sufficient to implement the atomic test-and-set operations needed. There is however a major drawback: the module wait performs busy-waiting, and is thus wasting the CPU resource.
3.1.2.1 Use of Events
Of course, events are the means to avoid busy-waiting. One now
considers conditions as made of a boolean variable with an associated
event:
typedef struct condition_t
{
int condition;
event_t satisfied;
}
*condition_t;
The notification sets the condition variable and generates
the associated event:
void notify (condition_t cond)
{
cond->condition = 1;
generate (cond->satisfied);
}
The wait module awaits the event while the condition is not set:
module wait (condition_t cond)
while (local(cond)->condition == 0) do
await (local(cond)->satisfied);
if (local(cond)->condition == 0) then cooperate; end
end
{local(cond)->condition = 0;}
end module
Note that once the event is received, the condition must be tested again, because another thread waiting for the same condition could have been scheduled before. In this case, the wakening is not productive, and the thread just has to cooperate. Without the second test, an instantaneous loop would occur in this situation.
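As a usage sketch (condition_create, which would allocate the structure and create its event, is an assumption, not defined in the text):
condition_t cond = condition_create ();
...
wait_create (cond);     /* this thread blocks until notified */
...
{notify (cond);}        /* later, in an atom of another thread */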
The solution proposed does not allow single notifications, where only one thread is notified at a time. Actually, all threads waiting for the same condition are simultaneously awakened, and only one of them will be able to proceed, the other ones returning to their previous state. This can be inefficient when several threads are often simultaneously waiting for the same condition.
3.1.2.2 Single Notification
To be able to notify threads one by one, an event is associated to each waiting thread. Thus, a condition now holds a list of events, managed in a FIFO way, storing the events of the waiting threads. There are two notifications: notify_one, which notifies a unique thread, and notify_all which, as previously, notifies all the threads. The notify_one function gets the first event (get) and generates it:
void notify_one (condition_t cond)
{
event_t event = get (cond);
if (event == NULL) return;
cond->condition = 1;
generate (event);
}
The function get returns NULL if no thread is currently waiting on the condition; the notification is lost in this case (this conforms to the POSIX specification). The notify_all function notifies all the threads in one single loop:
void notify_all (condition_t cond)
{
int i, k = cond->length;
for (i = 0; i < k; i++) notify_one (cond);
}
The module wait now begins with the creation of a new event, which is stored (put) in the condition:
module wait (condition_t cond)
local event_t event;
while (local(cond)->condition == 0) do
{
local(event) = event_create_in (current_scheduler ());
put (local(cond),local(event));
}
await (local(event));
if (local(cond)->condition == 0) then cooperate; end
end
{local(cond)->condition = 0;}
end module
The notification of a unique thread does not imply the competition of all the other threads waiting for the same condition, because they are actually waiting on distinct events. Note that the choice of a FIFO strategy to manage events in conditions is quite arbitrary (while reasonable). This is a question which is inherent to the notify_one primitive: how is the notified thread chosen? Either which thread is notified is left unspecified, which introduces some nondeterminism, or the way threads are managed is specified, which can be felt as an over-specification. This is actually an issue which can only be solved by application programmers, according to their needs.
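The helpers get and put, and the length field used by notify_all, are not defined in the text. A minimal sketch, assuming the condition structure is extended with a linked FIFO of events (the event_t type coming from the Loft API), could be:
#include <stdlib.h>
typedef struct event_cell_t
{
event_t event;
struct event_cell_t *next;
} *event_cell_t;
typedef struct condition_t
{
int condition;
int length;               /* number of waiting threads */
event_cell_t first, last; /* FIFO of events, one per waiter */
} *condition_t;
/* append an event at the end of the FIFO */
void put (condition_t cond,event_t event)
{
event_cell_t cell = malloc (sizeof (*cell));
cell->event = event; cell->next = NULL;
if (cond->length == 0) cond->first = cond->last = cell;
else { cond->last->next = cell; cond->last = cell; }
cond->length++;
}
/* remove and return the first event; NULL if the FIFO is empty */
event_t get (condition_t cond)
{
event_cell_t head = cond->first;
event_t event;
if (cond->length == 0) return NULL;
event = head->event;
cond->first = head->next;
free (head);
cond->length--;
return event;
}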
3.1.3 Synchronization Points
A synchronization point is a given point in a program where different threads must wait until a certain number of threads have reached it. The type of synchronization points is defined by:
typedef struct sync_point_t
{
int sync_count;
int threshold;
event_t go;
}
*sync_point_t;
If the threshold is reached, then the counter is reset to 0 and the event go is generated; thus, all the waiting threads can immediately proceed. If the threshold is not reached, then the counter is incremented and the thread awaits go to continue:
module sync (sync_point_t cond)
if (local(cond)->threshold > local(cond)->sync_count) then
{local(cond)->sync_count++;}
await (local(cond)->go);
else
{
local(cond)->sync_count = 0;
generate (local(cond)->go);
}
end
end module
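Note that, with this coding, it is the arrival of the thread that exceeds the threshold which releases the others: for N threads to pass the point together, the threshold must thus be set to N-1. A usage sketch (sync_point_create, which would allocate the structure, set the threshold, and create the go event, is an assumption):
sync_point_t point = sync_point_create (2); /* threshold 2: a 3rd thread releases the barrier */
...
run sync (point);  /* executed in each of the three threads */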
3.1.4 Readers/Writers
One considers several threads that are reading and writing a shared resource. The writers have priority over the readers. Several readers can simultaneously read the resource, while a writer must have exclusive access to it (no other writer nor reader) while writing. One adopts the terminology of locks, and notes rwlock_t the type of control structures:
typedef struct rwlock_t
{
int lock_count;
int waiting_writers;
event_t read_go;
event_t write_go;
}
*rwlock_t;
The convention for lock_count is the following: 0 means that the lock is held by nobody; when held by a writer, lock_count has value -1; when positive, lock_count is the number of readers currently reading.
3.1.4.1 Writers
In order to write, a writer must first run the module write_lock:
module write_lock (rwlock_t rw)
{local(rw)->waiting_writers++;}
while (local(rw)->lock_count) do
await (local(rw)->write_go);
if (local(rw)->lock_count) then cooperate; end
end
{
local(rw)->lock_count = -1;
local(rw)->waiting_writers--;
}
finalize
{
local(rw)->waiting_writers--;
}
end module
When writing is finished, the writer must call the write_unlock function:
void write_unlock (rwlock_t rw)
{
rw->lock_count = 0;
if (!rw->waiting_writers) generate (rw->read_go);
else generate (rw->write_go);
}
write_unlock should also be called in the finalizer of the writer, in order to release the lock when the writer is forced to terminate.
3.1.4.2 Readers
A reader can read if no writer is currently writing (that is, lock_count is not negative) and if there is no waiting writer:
module read_lock (rwlock_t rw)
while ((local(rw)->lock_count < 0) || (local(rw)->waiting_writers)) do
await (local(rw)->read_go);
if ((local(rw)->lock_count < 0) || (local(rw)->waiting_writers)) then cooperate; end
end
{local(rw)->lock_count++;}
end module
After reading, or when stopped, a reader should call the function read_unlock:
void read_unlock (rwlock_t rw)
{
rw->lock_count--;
if (!rw->lock_count) generate (rw->write_go);
}
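Putting the pieces together, a writer thread could be structured as follows (a sketch; <writing> stands for the actual write, and a complete version should also call write_unlock from a finalizer, as advised above, when the writer can be stopped while holding the lock):
module writer (rwlock_t rw)
while (1) do
run write_lock (local(rw));
< writing >
{write_unlock (local(rw));}
cooperate;
end
end module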
3.1.5 Producers/Consumers
3.1.5.1 Unique Area
One implements the simplest form of a producers/consumers system, where several threads are processing values placed in a buffer. Each thread gets a value from the buffer, processes it, and then puts the result back in the buffer. To simplify, one considers that values are integers which are decremented each time they are processed. Moreover, the processing terminates when 0 is reached. First, a buffer implemented as a list of type channel, and an event, are defined:
channel buffer;
event_t new_input;
The processing module cyclically gets a value from the buffer, tests if it is zero, and if it is not, processes the value. When it is finished, the value, decremented by one, is put back in the buffer. The event new_input is used to avoid busy-waiting while the buffer is empty (one considers that the buffer is rarely empty; so, a unique event is preferred to the single notifications of section Wait-Notify (3.1.2)):
module process ()
local int v;
while (1) do
while (buffer->length == 0) do
await (new_input);
if (buffer->length == 0) then cooperate; end
end
{local (v) = get (buffer);}
if (local(v) == 0) then return; end
< processing the value >
{local(v)--;}
put (buffer,local(v));
generate (new_input);
end
end module
All accesses to the shared buffer are atomic, under the condition that all instances of process are created in the same scheduler. This is of course the case when the implicit scheduler is the unique scheduler in use.
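A possible setup (a sketch; the initial filling of the buffer and the number of workers are arbitrary choices, not taken from the text):
module main ()
{
buffer = channel_create ();
new_input = event_create ();
put (buffer,10);       /* one initial value to process */
process_create ();     /* two processing threads */
process_create ();
}
end module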
3.1.5.2 Two Areas
One now considers a situation where there are two buffers, in and out, and a pool of threads that take data from in, process them, and then put the results in out. A distinct scheduler is associated to each buffer:
channel in, out;
scheduler_t in_sched, out_sched;
event_t new_input, new_output;
The module process gets values from in, avoiding busy-waiting by using the event new_input. A new thread, instance of process_value, is run for each value:
module process ()
while (1) do
if (in->length > 0) then
run process_value (get (in));
else
await (new_input);
if (in->length == 0) then cooperate; end
end
end
end module
The process_value module processes its parameter, and then links to out_sched to deliver the result in out. At delivery, the event new_output is generated, to awake threads possibly waiting for out to be filled:
module process_value (int v)
< processing the value >
link (out_sched);
put (out,local(v));
generate (new_output);
end module
This code shows a way to manage shared data using schedulers. The use of standard locks is actually replaced by linking operations. Of course, locks still exist in the implementation and are used when linking operations are performed, but they are totally hidden from the programmer, who can thus reason in a more abstract way, in terms of linking actions to schedulers, and not in terms of low-level lock taking and releasing actions.
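The surrounding setup could look as follows (a sketch; the number of workers is arbitrary, and running the two schedulers, either synchronously or by dedicated native threads, is not shown):
module main ()
{
in_sched = scheduler_create ();
out_sched = scheduler_create ();
in = channel_create_in (in_sched);
out = channel_create_in (out_sched);
new_input = event_create_in (in_sched);
new_output = event_create_in (out_sched);
process_create_in (in_sched);   /* the pool: here, two workers */
process_create_in (in_sched);
}
end module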
3.2 Reflex Game Example
In this section, one considers the example of a little game for measuring the reactivity of users. This example, issued from Esterel, shows how Loft can be used for basic reactive programming.
The purpose of the game is to measure the reflexes of the user. Four keys are used. The c key means "put a coin", the r key means "I'm ready", the e key means "end the measure", and the q key means "quit the game". After putting a coin, the user signals the game that he is ready; then, he waits for the GO! prompt; from that moment, the measure starts, and it lasts until the user ends the measure. After a series of four measures, the game outputs the average score of the user. The game is over when an error situation is encountered. There are actually two such situations: when the player takes too much time to press a button (the player has abandoned), or when e is pressed before GO! (this is considered as a cheating attempt).
3.2.1 Preemption
First, one codes a preemption mechanism which will be used for processing error situations. One considers the preemption of a thread by an event, which means that the thread must be stopped when the event becomes present. This can be obtained using the following module killer:
module killer (event_t event,thread_t thread)
await (local(event));
stop (local(thread));
end module
However, this definition is not completely satisfactory because, when the controlled thread terminates normally, that is, without being preempted, the instance of killer should also be stopped. One thus encapsulates it in the following module until:
module until (thread_t thread,event_t event)
local thread_t kill;
{local(kill) = killer_create (local(event),local(thread));}
join (local(thread));
stop (local(kill));
end module
Then, the instance of killer is stopped when the controlled thread terminates, as in this case it is joined. The preemption of a thread t by the event e thus takes the form:
run until (t,e);
Several variables, functions, and events are first introduced:
- Several integer variables: pause_length, limit_time, total_time, and measure_number.
- A function print_out to print messages.
- Five events: coin_evt is generated when c is pressed; ready_evt is generated when r is pressed; end_evt is generated when e is pressed; tilt_evt is generated by the game when an error is detected; the value of display_evt holds the measured times to be printed.
int pause_length, limit_time, total_time, measure_number;
void print_out (char*);
event_t coin_evt, ready_evt, end_evt, display_evt, tilt_evt;
Phase 1
The first phase consists in waiting for the notification that the user is ready. The waiting lasts at most limit_time instants, and if the timeout is reached, an error is produced and tilt_evt is generated. During the waiting, mistakes are signaled by a beep sound: a mistake here means that the user presses the e key instead of the r key. Beeps are produced by the following module beep_on:
module beep_on (event_t event)
while (1) do
await (local(event));
print_out ("\07");
cooperate;
end
end module
The first phase starts by creating an instance of beep_on which reacts on end_evt. Then, ready_evt is awaited during at most limit_time instants. If the timeout is reached, then tilt_evt is generated and the execution blocks. In all cases, the previously created instance of beep_on is stopped:
module phase1 ()
local thread_t beeper;
print_out ("press r when ready\r\n");
{local(beeper) = beep_on_create (end_evt);}
await (ready_evt,limit_time);
stop (local(beeper));
if (return_code () == ETIMEOUT) then
generate (tilt_evt);
halt;
end
end module
Phase 2
In phase 2, a prompt message is output after a random number of instants, and then a new measure starts. Cheating attempts are detected: the player is cheating if he tries to press e in advance, trying to anticipate the prompt. In this case, an error is detected and the game terminates. The module detector detects the presence of an event, and then generates another event and stops a thread:
module detector (event_t trigger,event_t produced,thread_t thread)
await (local(trigger));
stop (local(thread));
generate (local(produced));
end module
The module phase2 begins by creating a cheating detector, which generates tilt_evt as soon as e is pressed. Then, one waits for a random number of instants (returned by the function myrandom) before printing the prompt message. At that moment, the cheating detector is stopped:
module phase2 ()
local thread_t cheat_detector;
print_out ("wait...\r\n");
{local(cheat_detector) = detector_create (end_evt,tilt_evt,self ());}
repeat (myrandom ()) do cooperate; end
print_out ("GO!\r\n");
stop (local(cheat_detector));
end module
Phase 3
Phase 3 consists in measuring the number of instants taken by the player to press e. An abandon (e is not pressed during limit_time instants) terminates the game. Moreover, mistakes (here, pressing r instead of e) are detected and signaled. Finally, the measured value is associated to display_evt to be printed.
An auxiliary module increm for counting instants is first defined. Once started, it increments, at each instant, a global variable whose reference is passed as parameter:
module increm (int *where)
while (1) do
{(*local(where))++;}
cooperate;
end
end module
The phase3 module is defined by:
module phase3 ()
local int count, thread_t counter, thread_t beeper;
{
local(count) = 0;
local(counter) = increm_create (&local(count));
local(beeper) = beep_on_create (ready_evt);
}
await (end_evt,limit_time);
stop (local(counter));
stop (local(beeper));
if (return_code () == ETIMEOUT) then
generate (tilt_evt);
halt;
else
{total_time += local(count);}
generate_value (display_evt, (void*)local(count));
end
end module
Reflex Game Module
The final_display module waits during pause_length instants before generating display_evt with a value which is the average of the measured times:
module final_display ()
repeat (pause_length) do cooperate; end
print_out ("**** final ");
generate_value (display_evt, (void*)(total_time / measure_number));
end module
A measure consists in running in sequence the three previous phases. The module list_of_measures executes measure_number measures, and finally runs an instance of final_display:
module list_of_measures ()
repeat (measure_number) do
run phase1 ();
run phase2 ();
run phase3 ();
end
run final_display ();
end module
The reflex_game module waits for c and then executes a list of measures. The measures are preempted by tilt_evt, which is generated in case of errors. The preemption results from the use of the until module defined in Preemption (3.2.1):
module reflex_game ()
print_out ("A reflex game ... press c to start, q to stop.\r\n");
print_out ("You must press `e' as fast as possible after GO!.\r\n");
while (1) do
{total_time = 0;}
await (coin_evt);
run until (list_of_measures_create (),tilt_evt);
print_out ("game over. Press c to restart, q to stop.\r\n");
end
end module
One now considers the executing environment needed to use the reflex game. Two functions are first defined, to put the terminal in raw mode (to be able to directly get the keys pressed by the player), and to restore the initial mode and terminate the game:
void the_beginning (void){
system ("stty raw -echo");
}
void the_end (void){
print_out ("It's more fun to compete ...\r\n");
system ("stty -raw echo");
exit (0);
}
Input
The input of the program is produced from the keys pressed. The keys are returned by the getchar function, which is not cooperating. In order to use it, one thus defines the native module analyze_input which, after unlinking, cyclically calls getchar and generates the appropriate events:
module native analyze_input ()
unlink;
the_beginning ();
while (1) do
{
switch(getchar()){
case 'c': generate (coin_evt); break;
case 'r': generate (ready_evt); break;
case 'e': generate (end_evt); break;
case 'q': the_end ();
}
}
end
end module
Output
The results of the measures are the values generated with display_evt. The way to get them is to use the get_value instruction. Note that only one value can be generated at a time; thus, only the value of index 0 has to be considered. The module analyze_display is:
module analyze_display ()
local int res;
while (1) do
await (display_evt);
get_value (display_evt,0, (void**)&local(res));
fprintf (stdout,"score: %d\r\n",local(res));
cooperate;
end
end module
A message with a beep must be issued when tilt_evt is generated; this is done by the module analyze_tilt:
module analyze_tilt ()
while (1) do
await (tilt_evt);
print_out ("\07TILT!!\r\n");
cooperate;
end
end module
Finally, the analysis of the output consists in creating two threads, one instance of analyze_display and one of analyze_tilt:
module analyze_output ()
analyze_display_create ();
analyze_tilt_create ();
end module
Main Module
The main module creates the five events and the three threads needed:
module main ()
{
ready_evt = event_create ();
coin_evt = event_create ();
end_evt = event_create ();
display_evt = event_create ();
tilt_evt = event_create ();
analyze_input_create ();
reflex_game_create ();
analyze_output_create ();
}
end module
Two points are to be noticed. First, the order in which the threads are created corresponds to the intuitive sequence: input processing, then reaction, then output processing. Second, the values of pause_length and limit_time, and the definition of myrandom, must be adapted to the executing platform, in order to get a playable system.
3.3 Sieve Examples
In this section, the focus is put on the dynamic creation of threads. One considers sieves, which are algorithms for producing numbers with a given characteristic. The most well-known sieve is of course the sieve of Eratosthenes, for prime numbers. Actually, there are two types of sieves: the number of elements of bounded sieves is statically fixed, while in unbounded sieves the number of elements is potentially infinite. Only unbounded sieves are considered here.
3.3.1 Standard Eratosthenes Sieve
The basic version of the unbounded sieve of Eratosthenes can be implemented by the following C program:
int main ()
{
list_t primes = init_list ();
int current = 3, create = 1;
while (1) {
cell_t cell = primes->first;
while (cell != NULL) {
if (multiple_of (current,cell->val)) {
create = 0; break;
}
cell = cell->next;
}
if (create) {
add_list (primes,current);
output (current);
} else create = 1;
current += 2;
}
return 0;
}
The type list_t is the type of lists made of elements of type cell_t, which contain an integer field val and a pointer next to the next cell. The definition of these two types is standard, and not given here.
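For completeness, a minimal sketch of these types and of the helpers used above (an assumption matching the calls in the program, not the text's own definition):
#include <stdlib.h>
typedef struct cell_t
{
int val;
struct cell_t *next;
} *cell_t;
typedef struct list_t
{
cell_t first;
} *list_t;
list_t init_list (void)
{
list_t l = malloc (sizeof (*l));
l->first = NULL;
return l;
}
/* add an element in front of the list (order does not matter here) */
void add_list (list_t l,int v)
{
cell_t c = malloc (sizeof (*c));
c->val = v;
c->next = l->first;
l->first = c;
}
int multiple_of (int a,int b) { return a % b == 0; }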
3.3.2 Eratosthenes Sieve in Loft
Dynamic memory allocation is somewhere needed in unbounded sieves. Actually, in Loft, memory allocation can be masked by the creation of threads. The Eratosthenes sieve can thus basically be implemented in Loft without the need of any user-defined data type. One first defines a module which generates the odd numbers, one per instant. The odd numbers are associated to an event given as parameter:
module producer (event_t start)
local int i;
{local(i) = 1;}
while (1) do
{local(i) += 2;}
generate_value (local(start), (void*)local(i));
cooperate;
end
end module
First, a macro is defined for waiting for an event evt and assigning its first value to a variable res:
#define get_first_val(evt,res) await(local(evt));get_value(local(evt),0, (void**)&local(res))
An instance of filter_prime is created for each prime number, passed as the value associated to the event given in parameter:
module filter_prime (event_t me)
local int prime, int res, event_t next;
get_first_val(me,prime);
{
output (local(prime));
filter_prime_create (local(next = event_create ()));
}
while (1) do
cooperate;
get_first_val(me,res);
{
if (local(res) % local(prime))
generate_value (local(next), (void*)local(res));
}
end
end module
The condition local(res) % local(prime) is true if local(res) is not a multiple of local(prime); in this case, local(res) is transmitted to the next filter; otherwise, nothing is done.
The main module creates a producer and a first filter, linked
by the same event:
module main ()
{
event_t start = event_create ();
producer_create (start);
filter_prime_create (start);
}
end module
The beginning of the output is:
3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73
79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167
173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263
269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367
373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541 547 557 ...
3.3.3 Lucky Numbers
Lucky numbers [18] are generated by a sieve which is very close to the one of Eratosthenes. The production of lucky numbers is defined by the following algorithm: write all the odd numbers: 1,3,5,7,9,11,... The first number greater than 1 is 3, so delete every third number: 1,3,7,9,13,15,19,... The first number greater than 3 in the list is 7, so delete every seventh number: 1,3,7,9,13,15,21,25,31,... And so on, without ending. The numbers that remain are called lucky numbers.
The sieve to produce lucky numbers is based on a filtering function which filters numbers according to their indexes, and not to their values as in the Eratosthenes sieve:
module filter_lucky (int i,event_t trigger,event_t out)
local int base, int res, int k, event_t next;
get_first_val(trigger,base);
{
generate_value (local(out), (void*)local(base));
filter_lucky_create (local(k)= local(i)+1,
local(next) = event_create (),
local(out));
}
while (1) do
cooperate;
get_first_val(trigger,res);
{
local(k)++;
if (local(k) % local(base))
generate_value (local(next), (void*)local(res));
}
end
end module
The produced numbers are generated as values of the event out given in parameter. Moreover, the index of the filtered values is given as first parameter to filter_lucky. The sieve for lucky numbers is defined by:
module main ()
{
event_t go = event_create ();
event_t result = event_create ();
out1_create (result);
producer_create (go);
filter_lucky_create (1,go,result);
}
end module
The module out1 prints all the values associated to its parameter, which is an event. The list of lucky numbers starts with:
3 7 9 13 15 21 25 31 33 37 43 49 51 63 67 69 73 75 79 87
93 99 105 111 115 127 129 133 135 141 151 159 163 169 171 189 193 195
201 205 211 219 223 231 235 237 241 259 261 267 273 283 285 289 297
303 307 319 321 327 331 339 349 357 361 367 385 391 393 399 409 415
421 427 429 433 451 463 475 477 483 487 489 495 511 517 519 529 535
537 541 553 559 577 579 583 591 601 613 615 619 ...
3.3.4 Lucky Prime Numbers
Let us now consider the numbers that are simultaneously prime and lucky. The direct production of these lucky prime numbers is not so clear using a unique sieve (what would be the filtering function?). However, a possible solution is to reuse the two previous sieves, and to build a system made of two sub-systems, one producing the prime numbers and the other producing the lucky numbers. The values produced by the two sub-systems are coming from the same event, and are output by the module out2, which outputs a value when two generations of it are received during the same instant, meaning that the value has passed all the prime filters but also all the lucky filters. The definition of out2 is:
module out2 (event_t event)
local int k, int res, int all;
while (1) do
await (local(event));
{local(k) = 0; local(all) = 0;}
while (!local(all)) do
get_value (local(event),local(k), (void**)&local(res));
{
if (return_code () == ENEXT) {
local(all) = 1;
if (local(k) == 2) output (local(res));
}
local(k)++;
}
end
end
end module
The module filter_prime is changed to let it produce its results as values of an event. The system is the following:
module main ()
{
event_t go = event_create ();
event_t result = event_create ();
out2_create (result);
producer_create (go);
filter_lucky_create (1,go,result);
filter_prime_create (go,result);
}
end module
The list of lucky prime numbers starts with:
3 7 13 31 37 43 67 73 79 127 151 163 193 211 223 241 283
307 331 349 367 409 421 433 463 487 541 577 601 613 619 631 643 673
727 739 769 787 823 883 937 991 997 1009 1021 1039 1087 1093 1117 1123
1201 1231 1249 1291 1303 1459 1471 1543 1567 1579 1597 1663 1693 1723
1777 1801 1831 1879 1933 1987 2053 2083 2113 2221 2239 2251 2281 2311
2467 2473 2557 2593 2647 2671 2689 2797 2851 2887 2953 2971 3037 3049
3109 3121 3163 3187 3229 3259 3301 3307 3313 ...
This example shows how code reuse can be made easier by the existence
of common instants shared by all the threads.
3.4 Data-Flow Programming
In this section, one considers data-flow programming, and shows how to implement it in Loft.
One considers programs made of processes connected through channels. Channels are basically unbounded FIFO files. Processes can put values in channels (this is always possible, as channels are unbounded) and get values from them. Each channel has at most one producer (a process that fills it) and at most one consumer (a process that empties it). Thus, there is no concurrency on channel accesses, except between producers and consumers. These programs are called data-flow programs, because processes are basically creating and transforming flows of values. They were first introduced in [24], and they are actually the initial computing model at the basis of dataflow synchronous languages.
There are two variants: in Nets of Sequential Processes, the consumer of an empty channel remains stuck until some value is (possibly) produced by the producer of the channel. In Nets of Reactive Processes, all processes share the same instants, and a process can thus detect, during one instant, that a channel is empty (and thus avoid getting stuck on it). In what follows, one does not distinguish between the two variants.
Channels are basically lists of cells containing values, with an access to the first cell and to the last cell. Moreover, an event is associated to each channel, for signaling that a new value is put in it. Cells hold integers of type value_t:
typedef struct channel_t {
cell_t first;
cell_t last;
int length;
event_t new_value;
} *channel_t;
The function to put a value into a channel is:
void put (channel_t chan,value_t v)
{
if (chan->length == 0) {
chan->last = cell_create (v);
chan->first = chan->last;
generate (chan->new_value);
} else {
chan->last->next = cell_create (v);
chan->last = chan->last->next;
}
chan->length++;
}
Note the generation of the event held by the channel, to signal to the consumer that the channel is no longer empty. The extraction of a value from a channel is performed by the function extract:
value_t extract (channel_t chan)
{
cell_t head;
value_t res;
if (chan->length == 0) return NULL;
head = chan->first;
chan->first = head->next;
res = head->val;
free (head);
chan->length--;
return res;
}
The module get blocks execution until a first value can be extracted out of a channel:
module get (channel_t in,value_t *res)
while (local(in)->length == 0) do
await (local(in)->new_value);
if (local(in)->length == 0) then cooperate; end
end
{(*local(res)) = extract (local(in));}
end module
If the channel is not empty, the value is immediately extracted from it and assigned to the result variable. Otherwise, the event associated to the channel is awaited. Note that a process can extract several values from a channel during the same instant. Thus, the event associated to a channel may be present while the channel is empty, because the channel has already been emptied during the same instant. There are two consequences:
- One must always test if the channel is empty after receiving the event.
- In case the channel is empty (which means that all the values in it have been previously extracted), the module should cooperate, to avoid cycling.
A macro is defined to replace runs of the get module:
#define GET(chan) \
while (chan->length == 0) do\
await (chan->new_value);\
if (chan->length == 0) then cooperate; end\
end
Processes are modules. One just presents here two basic modules: the first one produces integer values on its output channel, and the second one prints the values received on its input channel. The producer, which suspends its production when the output channel already holds 10 values, is:
module producer_process (value_t from,value_t incr,channel_t out)
local value_t val;
{local(val) = local(from);}
while (1) do
if (local(out)->length < 10) then
{
put (local(out),local(val));
local(val) += local(incr);
}
end
cooperate;
end
end module
The out_process process is:
module out_process (channel_t in)
while (1) do
GET (local(in))
printf ("%d ",extract (local(in)));
fflush(stdout);
end
end module
Programs are built by connecting processes with channels. Each channel should have at most one producer and at most one consumer. For example, the following program is made of a channel (created with the function channel_create) connecting an instance of producer_process to an instance of out_process:
module main()
{
channel_t c = channel_create ();
producer_process_create (0,1,c);
out_process_create (c);
}
end module
The program prints the list of integers.
One now programs, in the dataflow style, the Eratosthenes sieve producing the prime numbers. A list of filters, corresponding to the already produced prime numbers, is put in sequence and connected through channels. A number that is able to pass all the filters is prime. A new filter, which rejects the multiples of it, is introduced for each new prime number. The filter module is:
module filter_process (int index,int base,channel_t in,channel_t out,filter_t filter)
while (1) do
GET (local(in))
{
int v = extract (local(in));
local(index)++;
if (local(filter) (local(index),local(base),v))
put(local(out),v);
}
end
end module
The list of filters is constructed by the module sift_process:
module sift_process (channel_t in,channel_t out,filter_t filter)
local int k;
{local(k) = 1;}
while (1) do
GET (local(in))
{
channel_t intern = channel_create_in (current_scheduler ());
int v = extract (local(in));
put(local(out),v);
local(k)++;
filter_process_create (local(k),v,local(in),intern,local(filter));
local(in) = intern;
}
end
end module
Finally, the prime numbers are produced by the following program, where the filtering function primes (of type filter_t) filters the multiples of a base value:
int primes (int index,int base,int value)
{
return value%base;
}
module main()
{
channel_t c1 = channel_create ();
channel_t c2 = channel_create ();
producer_process_create (3,2,c1);
sift_process_create (c1,c2,primes);
out_process_create (c2);
}
end module
4 Programming Style
The specificities of Loft lead to a programming style which can be summed up by the following rules of thumb: don't forget to cooperate; define a good granularity of time for instants; don't lose events; choose a smart decomposition into distinct synchronous areas. One now considers these rules in turn.
4.1 Necessity of Cooperation
Fair threads are basically cooperative, that is, they have to periodically leave the control and return it to the scheduler they are linked to, in order to let the other threads execute. There are basically two ways for a thread to be non-cooperative: to fall into an instantaneous loop, or to execute a never-terminating atom. It must be noticed that recursion cannot, by itself, lead to a lack of cooperation. One now considers these points in more detail.
4.1.1 Instantaneous Loops
An instantaneous loop is a loop that never terminates and cycles forever during the same instant. Thus, it never cooperates, and prevents the other threads from being executed.
The first way to get an instantaneous loop is to forget an explicit cooperate statement, as in:
while (1) do
printf ("loop!");
end
The use of implicit cooperation points is not always the solution to
break instantaneous loops. For example, consider the case where the
waiting of an event triggers the execution of the loop body:
while (1) do
await (evt);
printf ("loop!");
end
Then, the infinite cycle does not start immediately, but it does when the event becomes present. Indeed, from that moment, the await statement will always consider the event as present, and the loop then becomes instantaneous.
To falsely assume that two events are never simultaneous can also lead
to an instantaneous loop. Consider:
while (1) do
await (evt1);
await (evt2);
{ ... }
end
This code cannot lead to an instantaneous loop, provided evt1 and evt2 are never simultaneous. However, there is an infinite cycle if both events are present at the same instant during the loop execution. Note that this may occur if evt1 and evt2 actually denote the same event.
To produce a potentially infinite number of values is also possible in an indirect way, through the get_value primitive. For example, consider:
while (search) do
get_value (evt,i,&res);
{
if (return_code () == ENEXT) {
search = 0;
} else {
printf ("value #%d: %d ", i,res);
i++;
generate_value (evt,i);
}
}
end
This piece of code can lead to an infinite loop (and also to the generation of an infinite number of values). Indeed, getting the i-th value leads to generating the (i+1)-th one. Thus, an instantaneous loop appears as soon as the first value is generated.
4.1.2 Non-terminating Atoms
A non-terminating atom (which is really badly named in this case) also leads to an absence of cooperation. The purest form of a non-terminating atom is:
{
for (;;) {}
}
There are, of course, many other ways for an atom to be non-terminating. Note that, in addition to terminating, an atom also has to do it properly: it is forbidden to jump, in a way or another, for example with a goto, outside the atom, as outlined in C Statements (2.3.4).
There is no possibility to recursively create infinitely many threads
during the same instant, as thread creation is always postponed to the
next instant. Thus, the following code, while not useful, will not
prevent cooperation:
module m ()
m_create ();
end module
Of course, one can exhaust the system memory using recursion, by
creating infinitely many new threads, as in:
module m ()
m_create ();
halt;
end module
Recursive creation of threads is also possible using the run primitive, as for example in:
module m ()
run m ();
end module
However, as with an explicit creation of threads, there is no possibility, using run, to cycle forever during the same instant.
4.2 Granularity of Instants
Supposing that there is no instantaneous loop to prevent the passing of instants, the problem still remains to get a correct granularity for instant durations. Indeed, if instants take too much time, the reactivity of the whole system could be lost. This problem is actually central in the context of Loft.
The granularity of unlinked threads is not under the control of the user, but only of the scheduler. Of course, standard methods can then be used, for example based on the use of the yield primitive, but with all their standard problems too. One does not consider this case in more detail here.
With non-native threads, users can have some control on the granularity of instants, based on cooperation points, either explicit (cooperate) or implicit (await). For example, consider the following atom, which calls a function f 2*N times during the same instant:
{
int i;
for (i=0; i<2*N; i++) {
f ();
}
}
It can be split into two instants, just by introducing a cooperate statement:
{
int i;
for (i=0; i<N; i++) {
f ();
}
}
cooperate;
{
int i;
for (i=0; i<N; i++) {
f ();
}
}
Then, the complete execution of the series of calls takes two instants instead of one, leaving the other threads the possibility to execute after the first half of the calls has been performed. At the extreme, f can be called once per instant:
repeat (2*N) do
f ();
cooperate;
end
Introducing supplementary cooperation points in a program gives the other programs more opportunities to execute. The point, however, is that the introduction of supplementary cooperation points must remain compatible with the use of events. Remember indeed that events are not persistent data, and are automatically reset as absent at the beginning of each new instant.
A major use of events is to avoid busy-waiting on a condition (just like condition variables with standard pthreads). For example, consider a system where one thread sets a boolean variable X to true to trigger a processing done by another thread. A possibility is that the processing thread tests X at each instant:
module processing_thread ()
while (!X) do cooperate end;
....
end
Note that no race condition on X can occur, as the framework is cooperative. This solution is however unsatisfactory, because the processing thread tests X at each instant while it is false, which is of course inefficient.
The correct solution is to wait for an event whose generation triggers the processing thread, which then becomes:
module processing_thread ()
await (X);
....
end
Now, the processing thread does not consume any resources while waiting for X. As a side effect of using events, one is sure that the processing starts at the very instant X is generated. This is to be compared with the previous solution using a boolean variable, where the processing is delayed to the next instant if X is tested before being set to true.
A rather delicate point with events is that they may be awaited or generated at the wrong moment. For example, let us return to the previous example. If the processing thread is created after the generation of the event, then the processing will not be done, as the thread will never receive the event. For example, the following code is not correct (more precisely, the processing will not be done immediately, despite the generation of X):
generate (X);
processing_thread_create ();
Indeed, the processing thread will only be created at the next instant, and thus will not see X as being present. Actually, the generation of X has been lost.
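A sketch of a correct ordering (assuming this code runs in the body of a module, so that cooperate can be used to let the creation take effect):
processing_thread_create ();
cooperate;      /* the new thread starts at the next instant */
generate (X);   /* the generation is now observed */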
As there are no locks, deadlocks, considered in the original meaning of the term, cannot occur. However, it is possible to make some analogies with situations in Loft. Actually, the closest analogy with a deadlock would be a situation made of two threads, each of them preventing the other from closing the current instant. This would be the case if one thread tests the variable X with the atom:
{
while (!X) {}
Y=1;
}
while the other tests the variable Y with:
{
while (!Y) {}
X=1;
}
This is however an incorrect situation, because the two atoms considered are actually non-terminating. Provided that all atoms always terminate, such a situation should never appear.
Another analogy with a deadlock would be a situation where each thread prevents the other from passing a certain point of its execution. This is the case if one thread executes the non-atomic instruction:
while (!X) do cooperate; end
{Y=1;}
while the other executes:
while (!Y) do cooperate; end
{X=1;}
Then, the two threads remain blocked, but this does not in any way prevent the passing of instants. Using events instead of boolean variables avoids busy-waiting. The two instructions would be:
await (X);
generate (Y);
and:
await (Y);
generate (X);
Actually, such a situation could be called a "communication deadlock". Real deadlocks are at the implementation level: they appear because some data are to be protected by locks. A real deadlock introduces the risk that a data item becomes inaccessible forever, because some lock used to protect it is never released. This problem concerning data does not appear with communication deadlocks, which can be seen as occurring at a higher, more "logical", level.
Architectural design is of major concern in Loft. Indeed, the language allows programmers to decompose a system into several sub-systems, implemented as distinct schedulers. These schedulers can be run either synchronously or asynchronously.
4.4.1 Synchronous Schedulers
Synchronous schedulers are schedulers whose instants are related. Many different situations exist in which the presence of several synchronous schedulers can be useful.
Let us, for example, consider graphical objects which should be painted only after they have all moved. A simple solution consists in defining two threads for each object, one for moving it and one for painting it, and in creating all the moving threads before all the painting ones. However, this forbids the dynamic creation of new objects, as their moving threads would be executed after the painting threads of the old objects. The solution is to use two schedulers, one to which the moving threads are linked, and the other for the painting threads:
scheduler_t move_scheduler, paint_scheduler;
main ()
{
move_scheduler = scheduler_create ();
paint_scheduler = scheduler_create ();
....
while (1) {
scheduler_react (move_scheduler);
scheduler_react (paint_scheduler);
}
}
The function to create an object is:
void object_creation (graphical_object obj)
{
move_create_in (move_scheduler,obj);
paint_create_in (paint_scheduler,obj);
}
One can also imagine many situations where several instants of one scheduler should actually correspond to one instant of another. Loft gives the possibility to code these various situations very easily.
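For example, here is a minimal sketch in which the move scheduler runs two instants for each instant of the paint scheduler (reusing the schedulers declared above):
while (1) {
scheduler_react (move_scheduler);
scheduler_react (move_scheduler);  /* two move instants... */
scheduler_react (paint_scheduler); /* ...for one paint instant */
}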
4.4.2 Asynchronous Schedulers
Schedulers that are run asynchronously define reactive areas in which threads share the same instants and the same events. Asynchronous schedulers are especially interesting in the context of multiprocessing, as each of them can be run by a dedicated native thread, on a distinct processor. The example of the sieve in Lucky Prime Numbers (8.1.2) shows how this can be achieved quite simply.
However, one has to be very cautious using unlinked threads, as then all the problems of standard threads come back to the foreground. For example, unlinked threads should never access data used by linked threads, because these data are usually not protected. Data shared by two unlinked threads should be protected by locks, exactly as if they were standard threads (actually, they are standard threads!).
The discipline that should be obeyed using unlinked threads has to be very strict, and it seems good practice to adopt the following rules:
- Any shared data should only be accessed by threads linked to the same scheduler, in charge of the data. In particular, any access to shared data by an unlinked thread should be forbidden: the unlinked thread must first link to the scheduler in charge of the data before accessing it.
- Any communication between distinct schedulers running asynchronously should use either some event, or a native thread acting as a go-between for transferring data.
Of course, to decompose a system in several areas can be a
difficult question, with several possible different answers.
Unnecessary uses of native modules should be avoided. Indeed, each
instance of a native module requires a native thread to execute. This
is required because such an instance can be unlinked, and in this case
it behaves exactly as a standard pthread.
Instances of non-native modules do not need the full power of a native thread to execute, and they can be simply implemented as automata. The reason is that, when making a thread react, the stack state is the same at the beginning and at the end of the reaction. Actually, the stack is only needed for atoms, which by definition should always terminate instantly. Thus, only the state of the execution and the local variables have to be preserved between instants, which can be done using the heap. Instances of non-native modules can thus be simply run by the thread of the scheduler to which they are linked. Actually, there is no reason why a module in which the unlink instruction is never used should be native.
One also has to keep in mind that there exist partial implementations of Loft which do not have any support for native modules (see Direct Implementation, section 7.5). This is for example the case when the targets are embedded systems with limited resources (memory and CPU).
5 Semantics
One considers the formal semantics of the cooperative part of Loft. To simplify, one supposes that there is only one scheduler, to which all the created threads are linked. One first considers the basic semantics, called Rewrite, then two variants of it, called Replace and Optim. Replace puts the focus on memory use, by limiting the number of intermediate instructions that are created during the execution of an instant. The Optim variant puts the focus on CPU usage, by avoiding busy-waiting on absent events.
The Rewrite variant is defined in the first three sections. The basics of the semantics are introduced in Semantics Basics (5.1). Then, one describes in Instructions in REWRITE (5.2) the semantics of expressions and of instructions. The semantics of threads and schedulers is defined in Threads and Schedulers in REWRITE (5.3). The Replace variant is considered in REPLACE Semantics (5.4), and the Optim variant in OPTIM Semantics (5.5). Finally, Conclusion (7.6) concludes the formal description of the language.
5.1 Semantics Basics
The semantics is expressed in the structural operational semantics (SOS) format defined in [30]. First, one defines the data structures corresponding to schedulers and threads; then, an intuitive description of the semantics is given, to make the reading of the rest of the semantics easier; then, one defines the various formats of the rewriting rules; finally, the BNF syntax of the instructions is described.
5.1.1 Schedulers
A scheduler sched is a structure made of the following components:
- sched.linked is the list of threads linked to the scheduler.
- sched.current is the thread currently executed.
- sched.events is the set of generated events.
- sched.values is a table that associates to events the list of values generated during the current instant.
- sched.eoi is a boolean flag that indicates the end of the current instant.
- sched.move is a boolean flag that indicates the need to postpone the end of the current instant.
- sched.to_broadcast is the list of events and of couples (event, value) that must be generated at the beginning of the next instant.
- sched.orders is the list of orders (stop, suspend, resume, or link a thread) that must be executed at the beginning of the next instant. Orders are built with the function order, which takes the kind of order as first parameter: STOP_ORDER, SUSP_ORDER, RESUME_ORDER, or LINK_ORDER.
- sched.to_finalize is the list of threads that must be finalized.
One adopts the following notations:
- sched<f = v> changes to v the value of the field f of sched. For example, s<eoi = true> is equal to s, except that s.eoi is set to true.
- l += v adds the value v to the list l, and sched<f += v> adds v to the field f (which must be a list or a set) of sched. In the same way, l -= v removes the value v from l, and sched<f -= v> removes v from f in sched.
- The list (or table) with no element in it is noted Ø. The n-th element of a list l is noted l(n), and the number of elements of l is noted length(l).
5.1.2 Threads
A thread th is a structure made of the following components:
- th.run is the instruction run by the thread.
- th.finalizer is the atomic instruction executed when the thread is stopped.
- th.suspended is a boolean which is true when the thread is suspended.
- th.state is the execution state of the thread:
- TERM: the thread is completely terminated.
- COOP: the thread cooperates.
- CONT: the thread needs to be continued during the current instant. This is the initial state, and the one of a thread waiting for an event which is not present, or trying to get a non-available event value.
- th.signal is an event used to signal the final termination of the thread.
- th.code is the return code set by some non-atomic instructions. There are 3 possible codes:
- OK: the instruction has completed normally.
- ETIMEOUT: the limit has been reached (for await and join).
- ENEXT: not available value (for get_value).
- th.called denotes the thread created and executed when a run non-atomic instruction is executed. The termination of this thread is awaited by the calling thread.
An instance of a module is a new structure th. At creation, th is such that:
- th.suspended is false.
- th.state is CONT.
- th.signal is a new event.
- th.run is a sequence made of the module body, followed by a generation of th.signal.
- th.finalizer is the module finalizer.
- th.code is OK.
- th.called is NULL.
5.1.3 Overview of the Semantics
In this section, one gives an intuitive overview of the semantics. Let us consider a scheduler sched and an environment env. The final goal of the semantics is to show, without ambiguity, how sched and env evolve as time passes.
Actually, evolution is decomposed into a sequence of instants; the first instant is:
sched, env => sched1, env1
which means that, starting from sched and env, one gets sched1 and env1 at the end of the instant.
Threads are not started as soon as they are created, in order to avoid interferences with the currently running threads. Actually, all threads created during one instant are stored as link orders in sched.orders, and are effectively started at the beginning of the next instant. In the same way, events broadcast from the outside world are stored in sched.to_broadcast, and are incorporated in the system at the beginning of the next instant. Stop, suspend, and resume orders are processed in the same way. In case of stop orders, the finalizers of the stopped threads are executed. Thus, the evolution of the scheduler as time passes is a sequence of the form:
sched1', env1' => sched2, env2
sched2', env2' => sched3, env3
...
where schedi' incorporates in schedi all the orders collected during instant i, and envi' is obtained from envi after execution of the finalizers of the stopped threads.
Now, let us decompose instants: each instant consists in cyclically running phases, in which all the threads are run in turn. Running a thread t means resuming the execution of its associated instruction t.run. A thread which needs to be continued is a non-suspended thread that has not already cooperated during the instant. The instant is over when there is no such thread.
Threads which are started are placed in the linked list of the scheduler. At the beginning of each instant, threads that are terminated (TERM) and threads stopped during the previous instant are removed from linked.
Note that the order in which the threads are executed always remains the same during an instant. Note also that suspensions and resumptions of threads never change the order in which the other threads are executed. The scheduler behavior consists in cyclically running the threads in linked, while there exist threads needing to be continued.
The number of phases depends on the events generated. For example, consider a situation where linked = (t1,t2) and t1 awaits the event e generated by t2. Then, after the execution of t2, a new phase is needed to resume t1. Otherwise, t1 would not consider e as present, despite the fact that it is generated; in such a situation, one could not say that e is broadcast. Note that a third phase would be necessary if, after generating e, t2 were waiting for another event, generated by t1. Actually, new phases are needed until one reaches a situation where no new event is generated; then, the end of the current instant can be safely decided.
The scheduler uses two boolean flags to manage instants:
- The flag move is set each time a new event is generated; it is reset by the scheduler at the beginning of each phase. The scheduler does not decide the end of the current instant when move is true, in order to give the threads waiting for newly generated events the possibility to react to them.
- The flag eoi is set by the scheduler when the end of the current instant is decided; it is reset at the beginning of each new instant. It is used to inform the threads which are waiting for events that these events are definitely absent. Then, these threads can cooperate, and the next instant can safely take place.
Awaiting an event which is absent blocks a thread up to the end of the current instant. This forbids immediate (that is, during the same instant) reaction to the absence of an event; the reaction, if any, is thus postponed to the next instant. This is important to avoid situations where one could react to the absence of an event during one instant by generating it during the very same instant, which would contradict the fact that the event is absent. These kinds of contradictions, known as "causality problems" in synchronous languages [19], do not exist here. In the same way, trying to get a non-available generated value blocks a thread up to the end of the current instant.
5.1.4 Rewriting Rules Formats
In this section, one gives the various formats of the rewriting rules used in the semantics.
Instructions
The format of the rewritings of instructions is:
inst, sched, env →s inst', sched', env'
In such a rewriting, the initial instruction is inst, the scheduler is sched, and the environment of data is env. As a result, the rewriting produces a status s, a new instruction inst', a new scheduler sched', and a new environment env'.
There are 4 possible status:
- TERM: the instruction is terminated.
- COOP: the instruction cooperates.
- CONT: the instruction needs to be continued. It is the status of an instruction waiting for an event which is not present, or trying to get a non-available event value.
- RETURN: a return instruction has been reached.
Expressions
The notation for evaluating expressions is:
    exp |= sched, env → val, sched', env'
The evaluation of exp, in the scheduler sched and in the environment env, produces the value val, and leads to the new scheduler sched' and to the new environment env'.
Several expressions can be evaluated in one step. The notation is:
    (exp0,...,expn) |= sched, env → (val0,...,valn), sched', env'
The evaluation of expi produces the value vali. The final scheduler is sched' and the final environment is env'. The order in which the expressions are evaluated is left unspecified.
Threads and Schedulers
The micro-step execution of a thread is noted:
    th, sched, env → th', sched', env'
At scheduler level, the execution of the thread number n is noted:
    sched, n, env → sched', env'
A phase of the scheduler consists in executing in turn all the threads linked in it. The format is:
    sched, env → sched', env'
Finally, an instant of the scheduler consists in cyclically executing phases up to a point where no thread needs to be continued. The format is:
    sched, env => sched', env'
5.1.5 Syntax of Instructions
|
One gives the syntax of instructions considered in the semantics. The syntax is very close to the real syntax of programs. However, we feel free to make some simplifications when there is no deep impact on the semantics (for example, in if instructions, the else branch is not optional). Some extensions are also included in the syntax to ease the writing of the semantics rules (actually, these extensions never appear in programs). The syntax of instructions is:
inst =
inst ... inst
| atomic
| non-atomic
Atomic instructions execute completely in one single instant:
atomic =
{<c-code>}
| stop (exp)
| suspend (exp)
| resume (exp)
| generate (exp)
| generate (exp, exp)
Some special expressions are added to the ones of C:
exp =
special_exp
| <c-exp>
special_exp =
<module>_create (exp, ..., exp)
| return_code ()
| self ()
Execution of non-atomic instructions can last several instants:
non-atomic =
cooperate
| halt
| return
| await (exp)
| await (exp, exp)
| join (exp)
| join (exp,exp)
| get_value (exp, exp, exp)
| if (exp) then inst else inst end
| while (exp) do inst end
| repeat (exp) do inst end
| run <module> (exp, ..., exp)
| extensions
Some extensions are defined. They do not appear in real programs, but
are introduced as intermediary instructions:
extensions =
nothing
| repeat* (n, inst) do inst end
| while* (exp, inst) do inst end
| await* (event)
| await* (event, n)
| get_value* (event, n, v)
5.2 Instructions in REWRITE
|
One now considers the evaluation of expressions and the execution of instructions in the basic semantics Rewrite.
Expressions are evaluated in the standard way of C. However, the evaluation of some special expressions has an effect on the scheduler (for example, the creation of threads) or uses it. Thus, the general form of evaluation of expressions is:
    exp |= sched, env → value, sched', env'
In this rewriting, value is the value obtained by evaluating exp. The resulting scheduler is sched' and the resulting environment is env'.
One now gives the semantics of special expressions.
Creation of Threads
New thread instances of a module m are created by calling the function m_create.
Rule 1: Create |
(e1,...,en) | |= | sched, env | → | (v1,...,vn), sched', env' |
|
|
m_create(e1,...,en) | |= | sched, env | → | t, sched'', env' |
|
sched'' = sched'<orders += order (LINK_ORDER,t)> |
|
|
The thread t is a new thread created from the module m, with (v1,...,vn) as arguments. It will be started only at the next instant.
Thread Code
A call to return_code returns the code of the current thread.
Rule 2: Return Code |
return_code() | |= | sched, env | → | sched.current.code, sched, env |
|
|
Self
A call to self returns the executing thread, which is actually the current thread of the scheduler.
Rule 3: Self |
self() | |= | sched, env | → | sched.current, sched, env |
|
|
5.2.2 Sequence of Instructions
|
The semantics of sequences is contained in three rules. In the first
one, the execution of the first element of the sequence is not
terminated and thus only this element is changed in the resulting
instruction:
Rule 4: Sequence - Not Term |
| s | |
i0, sched, env | → | i'0, sched', env' |
|
s ≠ TERM |
|
| s | |
i0...in, sched, env | → | i'0...in, sched', env' |
|
|
|
The execution goes to the next component as soon as the previous one
terminates:
Rule 5: Sequence - Term |
| TERM | |
i0, sched, env | → | i'0, sched', env' |
|
| s | |
i1...in, sched', env' | → | j, sched'', env'' |
|
|
| s | |
i0...in, sched, env | → | j, sched'', env'' |
|
|
|
Finally, for consistency reasons, one also has to consider the degenerate case where the list is empty:
Rule 6: Sequence - Empty |
| TERM | |
Ø, sched, env | → | Ø, sched, env |
|
|
5.2.3 Atomic Instructions
|
C Code
Standard C code enclosed in "{" and "}" is an atomic instruction,
executed in the standard way. This leads to rewritings of the form:
| TERM | |
{<c-code>}, sched, env | → | nothing, sched', env' |
Note that the scheduler can also be changed by execution of standard C
code because special expressions can appear in it.
Stop, Suspend, Resume
Execution of stop(t) adds the order to stop t to the list of orders.
Rule 7: Stop |
e | |= | sched, env | → | t, sched', env' |
|
|
| TERM | |
stop(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<orders += order (STOP_ORDER,t)> |
|
|
Execution of suspend(t) adds the order to suspend t to the list of orders.
Rule 8: Suspend |
e | |= | sched, env | → | t, sched', env' |
|
|
| TERM | |
suspend(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<orders += order (SUSPEND_ORDER,t)> |
|
|
Execution of resume(t) adds the order to resume t to the list of orders.
Rule 9: Resume |
e | |= | sched, env | → | t, sched', env' |
|
|
| TERM | |
resume(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<orders += order (RESUME_ORDER,t)> |
|
|
Generate
A generate statement adds the generated event to the event set of the scheduler and immediately terminates; moreover, the flag move is set to indicate that something new happened:
Rule 10: Generate - Pure |
e | |= | sched, env | → | v, sched', env' |
|
|
| TERM | |
generate(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<move = true><events += v> |
|
|
Moreover, when a value is associated with the generation, it is added at the end of the table of values associated with the event.
Rule 11: Generate - Valued |
(e1, e2) | |= | sched, env | → | (v1,v2), sched1, env1 |
|
| TERM | |
generate(v1), sched1, env1 | → | nothing, sched2, env2 |
|
|
| TERM | |
generate(e1, e2), sched, env | → | nothing, sched3, env2 |
|
sched3 = sched2<val(v1) += v2> |
|
|
5.2.4 Non-Atomic Instructions
|
Cooperate, Halt, Nothing, Return
The cooperate statement finishes execution for the current instant and returns COOP. Nothing remains to be done at the next instant (thus, execution at the next instant will resume in sequence from the cooperate instruction):
Rule 12: Cooperate |
| COOP | |
cooperate, sched, env | → | nothing, sched, env |
|
|
The halt instruction never terminates:
Rule 13: Halt |
| COOP | |
halt, sched, env | → | halt, sched, env |
|
|
The following rule is introduced for completeness; actually, the nothing instruction does not appear in real programs.
Rule 14: Nothing |
| TERM | |
nothing, sched, env | → | nothing, sched, env |
|
|
The return instruction forces the termination of the executing thread by returning the status RETURN. It also generates the event which signals the thread termination:
Rule 15: Return |
| TERM | |
generate (sched.current.signal), sched, env | → | nothing, sched', env' |
|
|
| RETURN | |
return, sched, env | → | nothing, sched', env' |
|
|
|
Await
An await instruction first determines which event is awaited and then behaves as await*:
Rule 16: Await |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
await* (v), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e), sched, env | → | i, sched'', env'' |
|
|
|
The waiting terminates when the awaited event is present:
Rule 17: Await - Present |
v ∈ sched.events |
|
| TERM | |
await* (v), sched, env | → | nothing, sched, env |
|
|
|
Execution needs to be continued during the same instant when the
awaited event is unknown (neither present nor absent):
Rule 18: Await - Unknown |
v not in sched.events |
sched.eoi = false |
|
| CONT | |
await* (v), sched, env | → | await* (v), sched, env |
|
|
|
The await instruction cooperates when the awaited event is absent. The absence is detected because the event has not been generated while the end of the current instant has been decided:
Rule 19: Await - Absent |
v not in sched.events |
sched.eoi = true |
|
| COOP | |
await* (v), sched, env | → | await* (v), sched, env |
|
|
|
Limited Await
With a limited await, the waiting is limited to a fixed number of instants:
Rule 20: Limited Await |
(e1,e2) | |= | sched, env | → | (v,n), sched', env' |
|
| s | |
await* (v,n), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e1,e2), sched, env | → | i, sched'', env'' |
|
|
|
Rules for the limited await are very similar to the ones of the standard await, except that tests for the delay are introduced.
The instruction immediately terminates when the delay is over, and in
this case, the return code of the executing thread is set to ETIMEOUT.
Rule 21: Limited Await - Over |
n ≤ 0 |
|
| TERM | |
await* (v,n), sched, env | → | nothing, sched', env |
|
sched' = sched<current.code = ETIMEOUT> |
|
|
The instruction also immediately terminates when the awaited event is present; in this case, the return code of the executing thread is set to OK.
Rule 22: Limited Await - Present |
n > 0 |
v ∈ sched.events |
|
| TERM | |
await* (v,n), sched, env | → | nothing, sched', env |
|
sched' = sched<current.code = OK> |
|
|
The executing thread needs to be continued during the current instant while the awaited event is unknown.
Rule 23: Limited Await - Unknown |
n > 0 |
v not in sched.events |
sched.eoi = false |
|
| CONT | |
await* (v,n), sched, env | → | await* (v,n), sched, env |
|
|
|
The executing thread cooperates when the awaited event is absent. This
can only happen at the end of the instant. In this case, the waiting
delay is decremented.
Rule 24: Limited Await - Absent |
n > 0 |
v not in sched.events |
sched.eoi = true |
|
| COOP | |
await* (v,n), sched, env | → | await* (v,n-1), sched, env |
|
|
|
An important point must be noticed: the previous rules describe a busy-waiting mechanism: actually, the await* instruction must be activated at each micro-step of each instant, until the awaited event is present (except, of course, during instants where the executing thread is suspended).
If
The if instruction chooses a branch according to the value of a boolean expression, and executes it up to completion (which may take several instants; note that the expression is evaluated only once):
Rule 25: If - True |
e | |= | sched, env | → | true, sched', env' |
|
| s | |
i, sched', env' | → | i', sched'', env'' |
|
|
| s | |
if (e) then i else j end, sched, env | → | i', sched'', env'' |
|
|
|
Rule 26: If - False |
e | |= | sched, env | → | false, sched', env' |
|
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
if (e) then i else j end, sched, env | → | j', sched'', env'' |
|
|
|
While
A while loop first evaluates the associated boolean expression and terminates if the returned value is false:
Rule 27: While - False |
e | |= | sched, env | → | false, sched', env' |
|
|
| TERM | |
while (e) do i end, sched, env | → | nothing, sched', env' |
|
|
|
Otherwise, the loop behaves as the auxiliary while* loop defined below:
Rule 28: While - True |
e | |= | sched, env | → | true, sched', env' |
|
j = while* (e, i) do i end |
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
while (e) do i end, sched, env | → | j', sched'', env'' |
|
|
|
The auxiliary while* instruction runs the loop body up to termination, without re-evaluating the boolean condition; then, it rewrites as a standard while loop which evaluates the condition to determine if the body has to be run one more time.
Rule 29: While - Not Term |
| s | |
i, sched, env | → | i', sched', env' |
|
s ≠ TERM |
|
| s | |
while* (e, init) do i end, sched, env | → | j, sched', env' |
|
j = while* (e, init) do i' end |
|
|
Rule 30: While - Term |
| TERM | |
i, sched, env | → | i', sched', env' |
|
| s | |
while (e) do init end, sched', env' | → | j, sched'', env'' |
|
|
| s | |
while* (e, init) do i end, sched, env | → | j, sched'', env'' |
|
|
|
Repeat
A repeat loop executes its body a limited number of times. This number is the value of an expression which is first evaluated:
Rule 31: Repeat |
e | |= | sched, env | → | n, sched', env' |
|
j = repeat* (n, i) do i end |
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
repeat (e) do i end, sched, env | → | j', sched'', env'' |
|
|
|
The loop terminates when the number of cycles to be performed is not positive:
Rule 32: Repeat - Negative |
n ≤ 0 |
|
| TERM | |
repeat* (n, init) do i end, sched, env | → | nothing, sched, env |
|
|
|
Otherwise, the loop behaves as its body, as long as the body does not terminate:
Rule 33: Repeat - Not Term |
0 < n |
| s | |
i, sched, env | → | i', sched', env' |
|
s ≠ TERM |
|
| s | |
repeat* (n, init) do i end, sched, env | → | j, sched', env' |
|
j = repeat* (n, init) do i' end |
|
|
When the body terminates, the number of cycles to be performed is
decremented and the loop is immediately restarted.
Rule 34: Repeat - Term |
0 < n |
| TERM | |
i, sched, env | → | i', sched', env' |
|
j = repeat* (n-1, init) do init end |
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
repeat* (n, init) do i end, sched, env | → | j', sched'', env'' |
|
|
|
Get Value
The get_value instruction returns the value with index n (if it exists) associated to a given event. First, three expressions are evaluated: the event, the index n, and the place where the result should be stored.
Rule 35: Get Value |
(e1,e2,e3) | |= | sched, env | → | (v,n,r), sched', env' |
|
| s | |
get_value* (v,n,r), sched', env' | → | i, sched'', env'' |
|
|
| s | |
get_value (e1,e2,e3), sched, env | → | i, sched'', env'' |
|
|
|
The instruction immediately terminates if the value is available. In
this case, the value is assigned to the result variable and the return
code of the executing thread is set to OK.
Rule 36: Get Value - Available |
length(sched.values(v)) > n |
|
| TERM | |
get_value* (v,n,r), sched, env | → | nothing, sched', env' |
|
sched' = sched<current.code = OK> |
env' = env<r = (sched.values(v))(n)> |
|
|
The instruction has to be continued if the value is not available
while the current instant is not terminated:
Rule 37: Get Value - Unknown |
length(sched.values(v)) ≤ n |
sched.eoi = false |
|
| CONT | |
get_value* (v,n,r), sched, env | → | get_value* (v,n,r), sched, env |
|
|
|
The instruction cooperates if the value is still not available at the end of the current instant. In this case, the return code is set to ENEXT, which will indicate at the next instant that the value has not been obtained.
Rule 38: Get Value - Not Available |
length(sched.values(v)) ≤ n |
sched.eoi = true |
|
| COOP | |
get_value* (v,n,r), sched, env | → | nothing, sched', env |
|
sched' = sched<current.code = ENEXT> |
|
|
Join
Nothing is done if the thread to be joined is already terminated:
Rule 39: Join - Terminated |
e | |= | sched, env | → | t, sched', env' |
|
t.state = TERM |
|
| TERM | |
join (e), sched, env | → | nothing, sched', env' |
|
|
|
Otherwise, the instruction waits for the event which is generated when the thread terminates (see section 7.2.3):
Rule 40: Join - Not Terminated |
e | |= | sched, env | → | t, sched', env' |
|
t.state ≠ TERM |
| s | |
await* (t.signal), sched', env' | → | i, sched'', env'' |
|
|
| s | |
join (e), sched, env | → | i, sched'', env'' |
|
|
|
Limited Join
Limited join is similar to join, except that the delay is also tested for termination. The instruction immediately terminates if the thread to be joined is already terminated. In this case, the return code of the executing thread is set to OK.
Rule 41: Limited Join |
(e1,e2) | |= | sched, env | → | (t,n), sched', env' |
|
t.state = TERM |
|
| TERM | |
join (e1, e2), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<current.code = OK> |
|
|
If the thread to be joined is not already terminated, then the termination
event is awaited.
Rule 42: Limited Join - Waiting |
(e1,e2) | |= | sched, env | → | (t,n), sched', env' |
|
t.state ≠ TERM |
| s | |
await* (t.signal, n), sched', env' | → | i, sched'', env'' |
|
|
| s | |
join (e1, e2), sched, env | → | i, sched'', env'' |
|
|
|
Run
To run a module means to create a new instance of the module and to join it. Moreover, the instance is stored in the component called of the calling thread, in order to be stopped when the calling thread is.
Rule 43: Run |
m_create (e1,..., en) | |= | sched, env | → | t, sched', env' |
|
sched1 = sched'<current.called = t> |
| s | |
join (t), sched1,env' | → | i, sched2, env2 |
|
|
| s | |
run m (e1,..., en), sched, env | → | i, sched2, env2 |
|
|
|
5.3 Threads and Schedulers in REWRITE
|
The semantics of threads and schedulers in
Rewrite is now
considered. One defines some auxiliary functions using a very
simple self-explanatory syntax.
First, one defines the firing of a thread. A thread is
fireable if it is not suspended and if its state is CONT:
fireable (thread)
return thread.suspended = false and thread.state = CONT
To fire a fireable thread means to rewrite its body and to force the
state to TERM if the rewriting produces RETURN.
Rule 44: Fireable Thread |
fireable (t) = true |
| s | |
t.run, sched, env | → | i, sched', env' |
|
|
| | |
t, sched, env | → | t', sched', env' |
|
t' = t<run = i>
<state = (if s = RETURN then TERM else s)> |
|
|
To fire a thread which is not fireable has no effect.
Rule 45: Not Fireable Thread |
fireable (t) = false |
|
| | |
t, sched, env | → | t, sched, env |
|
|
|
During a phase, the scheduler performs a sequence of steps in which it fires all the threads linked to it. The nth step consists in firing the nth thread, considering it as the current thread of the scheduler:
Rule 46: Nth Thread |
length(sched.linked) > n |
t = sched.linked(n) |
| | |
t, sched<current = t>, env | → | t', sched', env' |
|
|
| | |
sched, n, env | → | sched'<linked(n) = t'>, env' |
|
|
|
One now switches to the rules describing phases. The first rule
corresponds to an empty scheduler. In this case, nothing is done
during the phase:
Rule 47: Scheduler - Empty |
length(sched.linked) = 0 |
|
sched, env | → | sched, env |
|
|
Otherwise, when the scheduler is not empty, all the threads are fired
in turn:
Rule 48: Scheduler - Not Empty |
length(sched.linked) = n+1 |
| | |
sched, 0, env | → | sched1, env1 |
|
| | |
sched1, 1, env1 | → | sched2, env2 |
|
... |
| | |
schedn, n, envn | → | sched', env' |
|
|
sched, env | → | sched', env' |
|
|
Note that threads are considered in a fixed order, which is the order of appearance in linked.
Instants are macro-steps of schedulers. Basically, an instant is a sequence of phases leading to a stable situation, where no thread remains to be continued. One first defines the function stable that determines whether a scheduler is in a stable situation or not.
stable (sched)
forall thread in sched.linked
if (fireable (thread)) return false
return true
The first rule corresponds to a phase leading to a stable
situation. In this case, execution for the current instant is over.
Rule 49: Instant - Finished |
sched, env | → | sched', env' |
stable (sched') = true |
|
sched, env | => | sched', env' |
|
|
If the phase leads to an unstable situation, then the scheduler cyclically re-schedules all the threads. If no event is generated during a phase (move is still false at the end of the phase), then the scheduler decides that the current instant can be finished by setting the flag eoi to true.
Rule 50: Instant - Not Finished |
sched, env | → | sched', env' |
stable (sched') = false |
make_decision(sched'), env' | => | sched'', env'' |
|
|
sched, env | => | sched'', env'' |
|
|
|
where the function make_decision is:
make_decision (sched)
if (sched.move = true) return sched<move = false>
else return sched<eoi = true>
5.3.4 Inter Instants Processing
|
One finally considers the chaining of instants and, more specifically, the actions performed between two successive instants. Actually, these actions are grouped at the beginning of instants and are specified by the function start_instant.
The call start_instant(sched,env) resets the two flags eoi and move of the scheduler and incorporates in sched the events and orders that have been produced (by the program or by the external world) during the previous instant; then it removes the terminated threads and resets those that are not suspended; finally, it finalizes the stopped threads:
start_instant (sched,env)
sched.eoi = false
sched.move = false
transfer_events (sched)
process_orders (sched,env)
reset_threads (sched)
finalize_stopped_threads (sched)
Transfer Events
The transfer_events function sets the events and the table of values associated to them, according to the broadcast orders.
transfer_events (sched)
sched.events = get_events (sched.to_broadcast)
sched.val = get_values (sched.to_broadcast)
sched.to_broadcast = Ø
The call get_events(x) returns the list of events extracted from x; get_values(x) returns the table built from the couples (event,value) belonging to x.
Note that the previous elements of events are lost; this is coherent, as events are always reset at the beginning of each new instant.
Processing Orders
The various orders (other than the broadcast ones) are processed in turn by the function process_orders.
process_orders (sched,env)
forall order in sched.orders
if (order.kind = SUSPEND_ORDER)
order.thread.suspended = true
else if (order.kind = RESUME_ORDER)
order.thread.suspended = false
else if (order.kind = STOP_ORDER)
sched.to_finalize += order.thread
order.thread.state = TERM
else if (order.kind = LINK_ORDER)
sched.linked += order.thread
sched.orders = Ø
Stopped threads are placed into the to_finalize list in order to be processed later. Thus, orders issued from finalizer execution will be performed at the next instant, and not during the current one. As a consequence, no instantaneous loop can occur from a finalizer sending orders to the scheduler running it.
Reset Threads
The function reset_threads eliminates from linked the threads that are terminated or unlinked, and sets to CONT the status of the remaining threads that are not suspended.
reset_threads (sched)
forall thread in sched.linked
if (thread.state = TERM)
sched.linked -= thread
else if (thread.suspended = false and thread.state = COOP)
thread.state = CONT
Finalization
To finalize a thread means to set its state to TERM, to add the
signaling event in the list of events, to call the finalizer of the
thread, and finally to recursively finalize the called thread, if
there is one.
finalize (sched,env,thread)
if (thread.state = TERM) return
thread.state = TERM
sched.events += thread.signal
sched = sched'
env = env'
if (thread.called != NULL) finalize (sched,env,thread.called)
where sched',env' results from the execution of thread.finalizer in sched,env:
    thread.finalizer, sched, env → nothing, sched', env'    (status TERM)
All threads which must be finalized are processed by the scheduler in
turn:
finalize_stopped_threads (sched)
forall thread in sched.to_finalize
finalize (sched,env,thread)
sched.to_finalize = Ø
This ends the description of the inter-instant actions that are performed by the scheduler at each beginning of instant.
Chaining Instants
Finally, the execution across instants, starting from an environment env, of a scheduler sched is a chain of rewritings of the form:
    start_instant(sched1,env1) => sched2, env2
    start_instant(sched2,env2) => sched3, env3
    ...
This concludes the description of the semantics
Rewrite.
One now considers the Replace semantics, which is a variant of the basic semantics Rewrite, designed to limit the creation of new instructions during the rewriting process. Actually, no new instruction is created by Replace during rewritings, and to rewrite an instruction basically means to change its state.
The first instruction to be considered is the sequence, which receives a state indicating which element is currently executed. One explicitly introduces the sequence operator seq in order to assign a state to it. The state is an integer which can be added as an index to the sequence operator. The execution begins with the first component (with index 0):
Rule 51: Sequence - Init |
| s | |
seq0 (i0, ..., in), sched, env | → | i, sched', env' |
|
|
| s | |
seq (i0, ..., in), sched, env | → | i, sched', env' |
|
|
|
Execution stays in the same component while it is not terminated:
Rule 52: Sequence - Stay |
| s | |
ik, sched, env | → | i'k, sched', env' |
|
s ≠ TERM |
|
| s | |
seqk (i0, ..., ik, ..., in), sched, env | → | j, sched', env' |
|
j = seqk (i0, ..., i'k, ..., in) |
|
|
Execution goes to the next component as soon as the current one terminates:
Rule 53: Sequence - Next |
| TERM | |
ik, sched, env | → | i'k, sched', env' |
|
| s | |
seqk+1 (i0, ..., in), sched', env' | → | i'', sched'', env'' |
|
|
| s | |
seqk (i0, ..., in), sched, env | → | i'', sched'', env'' |
|
|
|
Finally, execution terminates when the last component does:
Rule 54: Sequence - Term |
| TERM | |
seqn+1 (i0, ..., in), sched, env | → | seqn+1 (i0, ..., in), sched, env |
|
|
5.4.2 Atomic Instructions
|
Atomic instructions have no state but always rewrite into themselves, producing the status TERM. Actually, the execution which passes through an atomic instruction is under the control of the instruction which encloses it (typically, a sequence). One just gives the rule for C code:
| TERM | |
{<c-code>}, sched, env | → | {<c-code>}, sched', env' |
Note that now the atomic instruction no longer rewrites into nothing, but into the same instruction.
5.4.3 Non-Atomic Instructions
|
Non-atomic instructions have states. We only give the semantics of some of them, to show the introduction of states and the way they are managed. The semantics of the other instructions should be straightforward.
Cooperate
The cooperate instruction has a state which indicates whether or not control has passed through the instruction. Actually, a simple tag is needed. The cooperate statement returns COOP and sets its state so as to return TERM at the next instant. One now has two rules for cooperate:
Rule 55: Cooperate |
| COOP | |
cooperate, sched, env | → | cooperatetrue, sched, env |
|
|
Rule 56: Cooperate - Term |
| TERM | |
cooperatetrue, sched, env | → | cooperatetrue, sched, env |
|
|
Await
An await instruction has a state which stores the awaited event. First, the awaited event is stored in the state:
Rule 57: Await (State) |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
awaitv (e), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e), sched, env | → | i, sched'', env'' |
|
|
|
The waiting terminates when the awaited event is present:
Rule 58: Await - Present (State) |
v ∈ sched.events |
|
| TERM | |
awaitv (e), sched, env | → | awaitv (e), sched, env |
|
|
|
Execution must be continued when the awaited event is unknown (neither
present nor absent):
Rule 59: Await - Unknown (State) |
v not in sched.events |
sched.eoi = false |
|
| CONT | |
awaitv (e), sched, env | → | awaitv (e), sched, env |
|
|
|
Finally, the instruction cooperates when the awaited event is absent:
Rule 60: Await - Absent (State) |
v not in sched.events |
sched.eoi = true |
|
| COOP | |
awaitv (e), sched, env | → | awaitv (e), sched, env |
|
|
|
Note that the difference with the corresponding rules of Rewrite actually consists in replacing the await* syntax extension by a state.
If
The if instruction chooses a branch, according to the value of a boolean expression, and executes it up to completion. A boolean state is added to store which branch is chosen:
Rule 61: If |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
ifv(e) then i else j end, sched', env' | → | k, sched'', env'' |
|
|
| s | |
if (e) then i else j end, sched, env | → | k, sched'', env'' |
|
|
|
Rule 62: If - True |
| s | |
i, sched, env | → | i', sched', env' |
|
|
| s | |
iftrue (e) then i else j end, sched, env | → | k, sched', env' |
|
k = iftrue (e) then i' else j end |
|
|
Rule 63: If - False |
| s | |
j, sched, env | → | j', sched', env' |
|
|
| s | |
iffalse (e) then i else j end, sched, env | → | k, sched', env' |
|
k = iffalse (e) then i else j' end |
|
|
While
One major difference of Replace with Rewrite concerns the loops. In Replace, loop bodies are not duplicated as in Rewrite (using the extended instructions repeat* and while*), but are reset at the end of each cycle, in order to be ready for the next cycle. One only considers here while loops.
A while loop first evaluates the associated boolean expression. A state is introduced to store whether the expression has been evaluated or not; the state is actually a simple tag. The loop terminates if the value of the expression is false:
Rule 64: While - False |
e | |= | sched, env | → | false, sched', env' |
|
|
| TERM | |
while (e) do i end, sched, env | → | while (e) do i end, sched', env' |
|
|
|
Otherwise, the loop sets its state:
Rule 65: While - True |
e | |= | sched, env | → | true, sched', env' |
|
| s | |
whiletrue (e) do i end, sched', env' | → | j, sched'', env'' |
|
|
| s | |
while (e) do i end, sched, env | → | j, sched'', env'' |
|
|
|
The body is run up to termination, without re-evaluating the boolean
condition.
Rule 66: While - Not Term |
| s | |
i, sched, env | → | i', sched', env' |
|
s ≠ TERM |
|
| s | |
whiletrue (e) do i end, sched, env | → | whiletrue (e) do i' end, sched', env' |
|
|
|
When the body terminates, it is reset using the reset function defined in section 5.4.4, in order to be re-executed from the beginning:
Rule 67: While - Reset |
| TERM | |
i, sched, env | → | i', sched', env' |
|
| s | |
while (e) do reset(i) end, sched', env' | → | j, sched'', env'' |
|
|
| s | |
whiletrue (e) do i end, sched, env | → | j, sched'', env'' |
|
|
|
Join
The thread to be joined is first determined and stored in the instruction state:
Rule 68: Join |
e | |= | sched, env | → | t, sched', env' |
|
| s | |
joint (e), sched', env' | → | i, sched'', env'' |
|
|
| s | |
join (e), sched, env | → | i, sched'', env'' |
|
|
|
Nothing is done if the thread is already terminated:
Rule 69: Join - Terminated |
t.status = TERM |
|
| TERM | |
joint (e), sched, env | → | joint (e), sched, env |
|
|
|
Otherwise, the instruction waits for the termination signal:
Rule 70: Join - Waiting |
t.status ≠ TERM |
| s | |
awaitt.signal (t.signal), sched, env | → | i, sched', env' |
|
|
| s | |
joint (e), sched, env | → | joint (e), sched', env' |
|
|
|
5.4.4 Reset Function
|
The reset function is used by loops to reset their body at the beginning of each cycle. To reset a term actually means to erase all the indices appearing in it. The recursive definition of reset is straightforward and is left to the reader.
For example, consider:
repeat3 (5) do
seq0 (
awaite (event),
cooperatetrue
)
end
The result of resetting the previous instruction is:
repeat (5) do
seq(
await (event),
cooperate
)
end
A strong property holds concerning the reset function: the basis of an instruction, which is the instruction without any indices in it, never changes during the rewriting process. That is, if
    i, sched, env → i', sched', env'    (with any status s)
then reset(i) = reset(i').
In Replace, the rewriting is thus just a change of instruction states. This is the reason why it can be implemented more efficiently than Rewrite, without creation of new instructions during execution (except of course when new threads are created).
One now considers a variant of the previous semantics Replace, called Optim, in which threads are not busy-waiting for events. This variant basically uses queues, associated to events, in which waiting threads are stored. A special queue to store the threads that are submitted to deadlines (because they execute limited forms of await and join) is also defined. There are few changes with respect to the previous semantics, and only the major ones are given in the sequel.
First, one adds the new status WAIT to the possible ones returned by the rewriting of instructions. This status corresponds to a registration in a waiting queue. Second, one adds three new fields to schedulers:
- A table, named waiting (initially empty), which associates to an event the list of threads waiting for it.
- An integer instant which holds the current instant number (initially 0).
- A list of threads, named timer (initially empty), whose elements are the threads waiting for a deadline.
Finally, a new field deadline is added to threads to store the maximal instant to be considered by limited instructions.
5.5.1 Atomic Instructions
|
Amongst the atomic instructions, only the generate instruction has to be changed. Indeed, in addition to what was previously done, a generate statement awakes all the threads waiting for the generated event:
Rule 71: Generate - Pure |
e | |= | sched, env | → | v, sched', env' |
|
|
| TERM | |
generate(e), sched, env | → | nothing, sched'', env' |
|
sched'' = awake_all (sched',v)<move = true><events += v> |
|
|
The awake_all function awakes all the threads waiting for the generated event:
awake_all (sched, event)
forall thread in sched.waiting (event)
awake (sched,thread,event)
return sched
To awake a thread means to change its state to CONT and to unregister it from the queues where it is registered. Of course, this only applies to threads that are effectively waiting for the event and that are not suspended.
awakeable (thread)
return thread.suspended = false and thread.state = WAIT
The awake function is defined by:
awake (sched,thread,event)
if (awakeable (thread))
thread.state = CONT
sched.waiting(event) -= thread
sched.timer -= thread
5.5.2 Non-Atomic Instructions
|
One only considers the changes concerning the await statement (the changes for the other instructions are straightforward).
Await
An await instruction first determines the awaited event and stores it in its state:
Rule 72: Await |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
awaitv(e), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e), sched, env | → | i, sched'', env'' |
|
|
|
The waiting terminates when the awaited event is present:
Rule 73: Await - Present |
v ∈ sched.events |
|
| TERM | |
awaitv(e), sched, env | → | awaitv(e), sched, env |
|
|
|
The executing thread is put in the state WAIT and registered in the
waiting list when the awaited event is not present:
Rule 74: Await - Unknown |
v not in sched.events |
|
| WAIT | |
awaitv(e), sched, env | → | awaitv(e), sched', env |
|
sched' = sched<waiting(v) += sched.current> |
|
|
Limited Await
With a limited await, the waiting is limited to a fixed number of instants:
Rule 75: Limited Await |
(e1,e2) | |= | sched, env | → | (v,n), sched', env' |
|
sched1 = sched'<current.deadline = n + sched.instant> |
| s | |
awaitv(e1,e2), sched1, env' | → | i, sched2, env2 |
|
|
| s | |
await (e1,e2), sched, env | → | i, sched2, env2 |
|
|
|
Rule 76: Limited Await - Over |
sched.current.deadline ≤ sched.instant |
|
| TERM | |
awaitv(e1,e2), sched, env | → | awaitv(e1,e2), sched', env |
|
sched' = sched<current.code = ETIMEOUT> |
|
|
Rule 77: Limited Await - Present |
sched.current.deadline > sched.instant |
v ∈ sched.events |
|
| TERM | |
awaitv(e1,e2), sched, env | → | awaitv(e1,e2), sched', env |
|
sched' = sched<current.code = OK> |
|
|
When the waiting is effective, the current thread is registered in the
timer and in the waiting list associated to the awaited event:
Rule 78: Limited Await - Unknown |
t = sched.current |
t.deadline > sched.instant |
v not in sched.events |
|
| WAIT | |
awaitv(e1,e2), sched, env | → | awaitv(e1,e2), sched', env |
|
sched' = sched<waiting(v) += t><timer += t> |
|
|
The change in the scheduler concerns the processing of the timer, which is introduced in start_instant:
start_instant (sched,env)
sched.eoi = false
sched.move = false
timer_processing (sched)
transfer_events (sched)
process_orders (sched,env)
reset_threads (sched)
finalize_stopped_threads (sched)
where the function timer_processing is defined by:
timer_processing (sched)
sched.instant = sched.instant + 1
forall thread in sched.timer
if (awakeable (thread) and thread.deadline ≤ sched.instant)
awake (thread)
A point must be noted: performing supplementary calls to the awake function is not dangerous, in the sense that it cannot change the meaning of a program. Indeed, awaking a thread which does not have to be awoken just reintroduces it into the game, and does not change the actions that the thread has to perform. This basically means that spurious awake actions are not a problem. The consequence is that one can choose to never unregister threads from the timer, without any risk of perturbing the execution later. This can actually be useful for optimizing implementations.
One has precisely defined the semantics of the cooperative part of Loft using rewriting rules.
In a first step, one has described the semantics Rewrite with a set of about 50 rules covering all the aspects of the cooperative part of the language. The rules are simple enough to be understood one by one, and one can be rather confident that they capture the essence of the language. Actually, a way to strengthen this confidence is to implement the semantics as is, and to execute tests covering the largest possible set of aspects of the language. This is of course possible because the semantics is basically deterministic, which allows one to implement it in a straightforward way.
A direct implementation of the semantics is however unsatisfactory when considered in terms of efficiency, for at least the following reasons:
- A huge number of instructions appear as intermediary results in the rules. From an implementation point of view, they waste memory and should not be created at all.
- The threads are busy-waiting on events, which wastes CPU cycles. Note that a thread which is waiting for an event tests for it at each micro-step of each instant, up to the possible generation of it.
- Sequencing of instructions has to be explicitly executed, and is not just implicit as it is, for example, in standard C.
- All the threads linked to a scheduler are considered in turn during each phase. Thus, determining the next thread to execute is not constant in time, but linear in the number of threads linked to the scheduler.
Actually, the main interest of Rewrite is to serve as a reference for teaching and implementation purposes. Two improved semantics are defined which model more efficient implementations.
The variant Replace of Rewrite is defined to deal with the first point above. In it, the construction of new instructions is replaced by the introduction of states in instructions: instead of creating a new instruction from an old one, one just changes the state.
Replace is then transformed to avoid busy-waiting (point 2 above), using queues in which the threads that are waiting for events are stored. This leads to the variant Optim, which is closer to an efficient implementation of Loft than the previous semantics.
The two last points are not covered at the semantics level but will be considered later in section 7, where the implementation of Loft is discussed.
To end with semantics concerns, one should notice that the non-cooperative part of Loft is not considered by any of the previous semantics. To cover the whole language, a way that should be explored is the chemical abstract machine paradigm [10] to deal with the nondeterministic aspects arising in this case.
6 FairThreads in C
FairThreads is an alternative to standard threads which is very close to Loft. FairThreads is first presented; then some small examples are given; finally, FairThreads is compared with Loft and with standard POSIX threads.
6.1 FairThreads Overview
|
The API of FairThreads is overviewed in this section. All functions are presented, but some details, such as error codes, are for simplicity not considered here. The API is summarized in the annex.
Schedulers
FairThreads explicitly introduces schedulers, of type ft_scheduler_t. Before being used, a scheduler must be created by calling the function ft_scheduler_create.
In order to be executed, a scheduler must be started by a call to the function ft_scheduler_start. Note that several schedulers can be used simultaneously in the same program without any problem. Each scheduler is run by a dedicated native thread which gives the control in turn to the threads linked to the scheduler.
Threads
Fair threads are of type ft_thread_t. The call ft_thread_create(s,r,c,a) creates a thread in the scheduler s. The thread is run by a dedicated native thread. The creation becomes actual at the beginning of the next instant of s. The thread is automatically started as soon as it is created. The function r is executed by the thread, and the parameter a is transmitted to it. If stopped (by ft_scheduler_stop), the thread switches execution to the function c, to which a is also transmitted.
Events
Events are of the type ft_event_t. An event is created by calling the function ft_event_create, which takes as parameter the scheduler in charge of it.
Automata
Automata are fair threads of the type ft_thread_t created with the function ft_automaton_create. The thread returned by ft_automaton_create(s,r,c,a) is executed as an automaton by the scheduler s, which means that it is run by the native thread of the scheduler. The function r executed by the automaton must be defined with the macro DEFINE_AUTOMATON.
Control of Threads
The call ft_scheduler_stop(t) gives to the scheduler which executes the thread t the order to stop it. The stop will become actual at the beginning of the next instant of the scheduler, in order to assure that t is in a stable state when stopped.
The call ft_scheduler_suspend(t) gives to the scheduler which executes t the order to suspend t. The suspension will become actual at the beginning of the next instant of the scheduler. The function ft_scheduler_resume is used to resume execution of suspended threads.
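As an illustration, a controlling thread can freeze another thread for one instant (a hypothetical fragment; worker is assumed to be a thread linked to the same scheduler):
/* Hypothetical fragment: suspend a worker for one instant. */
void controller (void *arg)
{
   ft_thread_t worker = (ft_thread_t)arg;
   ft_scheduler_suspend (worker);  /* actual at the next instant        */
   ft_thread_cooperate ();         /* let the current instant terminate */
   ft_scheduler_resume (worker);   /* actual at the following instant   */
}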
Broadcast of Events
The function ft_scheduler_broadcast(e) gives to the scheduler of the event e the order to broadcast it to all the threads running in the scheduler. The event will actually be generated at the beginning of the next instant of the scheduler. The call ft_scheduler_broadcast_value(e,v) associates the value v to e (v can be read using ft_thread_get_value).
Cooperation
The call ft_thread_cooperate() is the explicit way for the calling thread to return control to the scheduler running it. The call ft_thread_cooperate_n(i) is equivalent to a sequence of i calls of ft_thread_cooperate().
Termination
The call ft_thread_join(t) suspends the execution of the calling thread until the thread t terminates (either normally or because it is stopped). Note that t need not be linked to the scheduler of the calling thread. With ft_thread_join_n(t,i), the suspension takes at most i instants.
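For instance, a thread can create a helper and wait for its termination during a bounded number of instants (a hypothetical fragment; helper stands for an arbitrary thread function):
/* Hypothetical fragment: bounded join on a freshly created thread. */
void spawn_and_wait (void *arg)
{
   ft_scheduler_t sched = ft_thread_scheduler ();
   ft_thread_t t = ft_thread_create (sched,helper,NULL,NULL);
   ft_thread_join_n (t,10);   /* wait for t during at most 10 instants */
}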
Generating Events
The call ft_thread_generate(e) generates the event e in the scheduler which was associated to it when it was created. The call ft_thread_generate_value(e,v) adds v to the list of values associated to e during the current instant (these values can be read using ft_thread_get_value).
Awaiting Events
The call ft_thread_await(e) suspends the execution of the calling thread until the event e is generated. Execution is resumed as soon as the event is generated. With ft_thread_await_n(e,i), the waiting takes at most i instants.
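For example, two threads can synchronize on an event (a hypothetical fragment; the event is assumed to be created in their common scheduler and passed as argument, and stdio.h to be included):
/* Hypothetical fragment: synchronization through an event. */
void producer (void *arg)
{
   ft_thread_generate ((ft_event_t)arg);  /* present during this instant */
}
void consumer (void *arg)
{
   ft_thread_await ((ft_event_t)arg);     /* blocks until the generation */
   fprintf (stderr,"event received\n");
}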
Selecting Events
The call ft_thread_select(k,array,mask) suspends the execution of the calling thread until the generation of one element of array, which is an array of k events. Then, mask, which is an array of k boolean values, is set accordingly. With ft_thread_select_n(k,array,mask,i), the waiting takes at most i instants.
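For instance, to wait for the first of two events (a hypothetical fragment; e1 and e2 are assumed to be events of the calling thread's scheduler, and int is used as a stand-in for the exact type of the mask elements):
/* Hypothetical fragment: wait until e1 or e2 is generated. */
ft_event_t array[2] = { e1, e2 };
int mask[2];
ft_thread_select (2,array,mask);
if (mask[0]) { /* e1 was generated during the instant */ }
if (mask[1]) { /* e2 was generated during the instant */ }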
Getting Events Values
The call ft_thread_get_value(e,i,r) is an attempt to get the i-th value associated to the event e during the current instant. If such a value exists, it is returned in r and the call immediately terminates. Otherwise, the value NULL is returned at the next instant.
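For example, an attempt to read the first value generated with an event could look as follows (a hypothetical fragment; evt is an event, process an arbitrary function, and the value is assumed to be returned at the address passed as r):
/* Hypothetical fragment: attempt to read the first value of evt. */
void *v = NULL;
ft_thread_get_value (evt,0,&v);
if (v != NULL)
   process (v);   /* a value was generated with evt this instant */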
Link and Unlink
The call ft_thread_unlink() unlinks the calling thread t from the scheduler in which it was previously running. Then, t will no longer synchronize, instant after instant, with the other threads linked to the scheduler. Actually, after unlinking, t behaves as a standard native thread.
The call ft_thread_link(s) links the calling thread to the scheduler s. The calling thread must be unlinked when executing the call. The linkage becomes actual at the beginning of the next instant of s.
Locks
In presence of unlinked threads, locks can be needed to protect data shared between unlinked and linked threads. Standard mutexes are used for this purpose. The call ft_thread_mutex_lock(p), where p is a mutex, suspends the calling thread until the moment where p can be locked. The lock is released using ft_thread_mutex_unlock. Locks owned by a thread are automatically released when the thread terminates definitively or when it is stopped.
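For instance, assuming a counter shared with unlinked threads (a hypothetical fragment; m is a mutex and shared_counter a shared variable):
/* Hypothetical fragment: protect data shared with unlinked threads. */
ft_thread_mutex_lock (m);
shared_counter++;              /* no concurrent access can occur here */
ft_thread_mutex_unlock (m);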
Automata are coded using macros. Here are the macros to define the automaton structure:
- AUTOMATON(aut) declares the automaton aut.
- DEFINE_AUTOMATON(aut) starts the definition of the automaton aut.
- BEGIN_AUTOMATON starts the list of states.
- END_AUTOMATON ends the list of states.
A state with number num starts with one of the following macros:
- STATE(num) introduces a standard state.
- STATE_AWAIT(num,event) and STATE_AWAIT_N(num,event,delay) are states to await event.
- STATE_JOIN(num,thread) and STATE_JOIN_N(num,thread,delay) are states to join thread.
- STATE_STAY(num,n) is a state which keeps execution in it for n instants.
- STATE_GET_VALUE(num,event,n,result) is a state to get the n-th value associated to event.
- STATE_SELECT(num,n,array,mask) and STATE_SELECT_N(num,n,array,mask,delay) generalize STATE_AWAIT and STATE_AWAIT_N to an array of events of length n.
Going from one state to another one is done with the macros:
- GOTO(num) blocks execution for the current instant and sets the state for the next instant to be state num.
- GOTO_NEXT blocks execution for the current instant and sets the state for the next instant to be the next state in the list of states.
- IMMEDIATE(num) forces execution to jump to state num, which is immediately executed.
- RETURN immediately terminates the automaton.
Finally, the following macros define some special variables:
- SELF is the automaton.
- LOCAL is the local data of the automaton.
- SET_LOCAL(data) sets the local data of the automaton.
- ARGS is the argument which is passed at creation to the automaton.
- RETURN_CODE is the error code set by the macros run during automaton execution.
Current Thread
The calling thread is returned by ft_thread_self().
Current Scheduler
The scheduler of the calling thread is returned by ft_thread_scheduler().
Pthread
The call ft_pthread(t) returns the native pthread which executes the fair thread t. This function gives direct access to the Pthreads implementation of FairThreads. In the rest of the paper, native thread and pthread will be considered as synonymous.
Exiting
The function ft_exit is equivalent to pthread_exit. The basic use of ft_exit is to terminate the pthread which is running the function main, without exiting from the process running the whole program.
One only gives two small examples, just to show the programming style in FairThreads. The first example is a complete executable program, and the second is an automaton.
The following code is a complete example of a FairThreads program, made of two threads linked to the same scheduler.
#include "fthread.h"
#include <stdio.h>
void h (void *id)
{
while (1) {
fprintf (stderr,"Hello ");
ft_thread_cooperate ();
}
}
void w (void *id)
{
while (1) {
fprintf (stderr,"World!\n");
ft_thread_cooperate ();
}
}
int main (void)
{
ft_scheduler_t sched = ft_scheduler_create ();
ft_thread_create (sched,h,NULL,NULL);
ft_thread_create (sched,w,NULL,NULL);
ft_scheduler_start (sched);
ft_exit ();
return 0;
}
The program cyclically outputs Hello World!. Note the call of ft_exit to prevent the program from terminating before the two threads are executed. Execution of linked fair threads is round-robin and deterministic: the two messages Hello and World! are always printed in this order.
The following automaton switches control between two threads, according to the presence of an event. The automaton switch_aut has three states. The first state resumes the first thread to run (initially, one assumes that both threads are suspended). The switching event is awaited in the second state, and the threads are switched when the event becomes present. The third state is similar to the second, except that the threads are exchanged.
DEFINE_AUTOMATON (switch_aut)
{
void **args = ARGS;
ft_event_t event = args[0];
ft_thread_t thread1 = args[1];
ft_thread_t thread2 = args[2];
BEGIN_AUTOMATON
STATE (0)
{
ft_scheduler_resume (thread1);
}
STATE_AWAIT (1,event)
{
ft_scheduler_suspend (thread1);
ft_scheduler_resume (thread2);
GOTO(2);
}
STATE_AWAIT (2,event)
{
ft_scheduler_suspend (thread2);
ft_scheduler_resume (thread1);
GOTO(1);
}
END_AUTOMATON
}
The automaton behavior is actually the one of a thread
running the following function:
void switch_aut (void *arg)
{
void **args = arg;
ft_event_t event = args[0];
ft_thread_t thread1 = args[1];
ft_thread_t thread2 = args[2];
ft_scheduler_resume (thread1);
while (1) {
ft_thread_await (event);
ft_scheduler_suspend (thread1);
ft_scheduler_resume (thread2);
ft_thread_cooperate ();
ft_thread_await (event);
ft_scheduler_suspend (thread2);
ft_scheduler_resume (thread1);
ft_thread_cooperate ();
}
}
The two main differences between Loft and FairThreads concern the ease of programming and the expressiveness given by the existence of private stacks.
6.3.1 Ease of Programming
|
The first main difference lies of course in the fact that Loft is a language while FairThreads is an API. This indeed implies two very different styles of programming. In Loft, one writes reactive programs which are linked with standard C code through atoms. In FairThreads, one basically writes C code in which some function calls are reactive primitives. Thus, the reactive structure of Loft programs certainly appears more clearly than in FairThreads programs.
Actually, in FairThreads native threads are the rule and automata are the exception, while it is the converse in Loft. In FairThreads, the passage from a thread to an automaton can require a complete rewriting of the code in order to fit the automata format. In Loft, the passage from an automaton to a thread, or vice-versa, is extremely simplified, and is actually concentrated in the keyword native. For example, the Loft module that corresponds to the previous switch_aut automaton is:
module switch_aut (ft_event_t event,ft_thread_t thread1,ft_thread_t thread2)
resume (local(thread1));
while (1) do
await (local(event));
suspend (local(thread1));
resume (local(thread2));
cooperate;
await (local(event));
suspend (local(thread2));
resume (local(thread1));
cooperate;
end
end module
It produces code which is very similar to the previous automaton. However, by adding the native keyword, the produced code would be very close to the one of the corresponding fair thread.
The linking mechanism of Loft is a simplification of the one of FairThreads, as there is no need in Loft to be unlinked in order to be able to link to another scheduler. Actually, when linking to a target scheduler, the unlinking from the source scheduler is implicit in Loft. The important point is that this does not imply that the thread stays unlinked, which would require the use of a native thread.
The arguments of threads in FairThreads have to be passed through a unique parameter of type void*. This complicates things when several arguments have to be passed to the same thread. Actually, these arguments have to be packed in a new structure whose reference is passed to the thread. In the case of an automaton, the arguments are accessed with the macro ARGS.
An implicit scheduler is always present in Loft, which is not the case in FairThreads, where there is no implicit main function which would create it.
Local variables have to be explicitly declared as local in Loft, while they coincide with local C variables in FairThreads. This is an aspect that is simpler in FairThreads than in Loft.
FairThreads offers possibilities that do not exist in Loft and which are related to stacks. Actually, in FairThreads all the threads, except automata, use their own private stack, as they are basically pthreads. The opposite holds in Loft, where threads, except instances of native modules, do not need a private stack to execute. The existence of a private stack makes it possible to code functions that are not tail-recursive. For example, consider the following recursive function, which is not tail-recursive because a printing action occurs after the recursive call:
void sum_aux (int n)
{
if (n == 0) { printf ("0"); return; }
ft_thread_cooperate_n (n);
sum_aux (n - 1);
printf ("+%d",n);
}
The execution exits after a call to the previous function:
void sum (void *k)
{
   sum_aux (atoi ((char*)k));  /* k is the textual argument of the program */
   printf (" = instant number\n");
   exit(0);
}
One also defines a function to trace the instants:
void trace_instants (void *n)
{
int i = 0;
while (1){
printf ("\ninstant %d: ", i++);
ft_thread_cooperate();
}
}
Finally, the main function is defined by:
int main (int argc, char **argv)
{
   ft_scheduler_t sched = ft_scheduler_create ();
   ft_thread_create (sched,trace_instants,NULL,NULL);
   ft_thread_create (sched,sum,NULL,argv[1]);
   ft_scheduler_start(sched);
   ft_exit();
   return 0;
}
Given the argument n, the program exits at the instant number s, where s is the sum of the n first integers. For example, with n equal to 4, one gets:
instant 0:
instant 1:
instant 2:
instant 3:
instant 4:
instant 5:
instant 6:
instant 7:
instant 8:
instant 9:
instant 10: 0+1+2+3+4 = instant number
Thus, the thread executing sum absolutely needs a stack to store the partial results of the recursive calls, because several instants pass before the final result is computed.
There is no counterpart in Loft of the ft_thread_select functions. Thus, the waiting of one amongst several events must be coded indirectly in Loft. Next versions of Loft should certainly propose a primitive similar to the ft_thread_select of FairThreads.
6.4 Comparison with POSIX Threads
|
When comparing FairThreads with POSIX, the first point to note is of course that FairThreads is basically cooperative while POSIX is basically preemptive. More precisely, threads linked to the same scheduler are executed cooperatively. As a consequence, data protection is considered very differently in the two contexts: in POSIX, mutexes are basically used for that purpose, while in FairThreads schedulers are used for the same purpose. If one is sure that a datum can only be accessed by the threads linked to a given scheduler, one can safely deduce that no race condition occurs for that datum. If accessing the data needs more than simple atomic instructions, a boolean variable can be used to lock and unlock the variable, as sketched below. In this case, however, one is in a situation very similar to the one using mutexes, with all their identified problems.
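A minimal sketch of such a boolean lock, only valid among threads linked to the same scheduler (hypothetical names):
/* Sketch: cooperative "lock" among the threads of one scheduler. */
int locked = 0;                /* hypothetical shared flag */

void take (void)
{
   while (locked)
      ft_thread_cooperate ();  /* retry after the other threads have run */
   locked = 1;                 /* no preemption between test and set     */
}

void release (void)
{
   locked = 0;
}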
The second point to note when comparing FairThreads and POSIX is that FairThreads proposes the notion of an automaton, which does not need the full power of a kernel thread to execute. Using automata, one can get around the limitations of POSIX implementations in which threads are mapped onto kernel threads.
One ends the comparison with POSIX by detailing some other important differences.
The POSIX notion of a thread attribute and the related functions pthread_attr_* do not exist in FairThreads, in which threads are more simply of two types: standard fair threads or automata.
Like POSIX (and unlike Java), there is no primitive to start a fair thread: a thread is automatically run after its creation by ft_thread_create.
Equality of fair threads is standard pointer equality, as the type ft_thread_t is a pointer type (there is no equivalent of pthread_equal).
Data local to a thread can be defined in POSIX with the notion of keys
and of thread-specific data (functions pthread_key_*
and
pthread_getspecific
, and pthread_setspecific
).
There is no direct counterpart in FairThreads. However, the POSIX functions for thread-specific data can be used in FairThreads by means of the ft_pthread function.
The pthread_once function of POSIX can be simply implemented with a boolean in FairThreads, as one is in a cooperative context.
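A minimal sketch of such an implementation, assuming a cooperative context (init_once is a hypothetical name, not part of FairThreads):
static int initialized = 0;

/* Cooperative equivalent of pthread_once: the test and the update of
the flag cannot be interleaved with another linked thread. */
void init_once (void (*init_routine) (void))
{
if (!initialized) {
initialized = 1;
init_routine ();
}
}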
The functions related to POSIX scheduling policies (for example
pthread_setschedparam
) are of no use in FairThreads where only one
policy is available: the fair one. Note that this policy more or less
corresponds to the SCHED_FIFO
policy of POSIX.
POSIX priorities do not exist in FairThreads. Priority-based systems can be coded in FairThreads using several schedulers to manage the various priority levels. This is another reason why FairThreads is simpler than POSIX.
Mutex and Condition Variables
|
All the functions related to mutexes (pthread_mutex_* and pthread_mutexattr_*) have no counterpart in FairThreads, where locks are not needed as long as one stays in its cooperative part.
As stated before, schedulers in
FairThreads can be used to structure
applications around protected data which are only accessed by linked
threads; in this respect, schedulers are playing a role very
similar to the one of mutexes.
Events in
FairThreads basically correspond to condition variables of POSIX.
To generate an event corresponds to
pthread_cond_broadcast
and to await it corresponds to
pthread_cond_wait
. The
limited variant
pthread_cond_timedwait
is the counterpart of
ft_thread_await_n
except that in
FairThreads deadlines are counted
in instants. There is no direct counterpart for the nondeterministic pthread_cond_signal of POSIX, which must be coded in FairThreads in a way similar to the one presented in section 3.1.2.2 (Single Notification).
The notion of a detached thread (pthread_detach
) has no
counterpart in FairThreads where termination of all threads can be tested at
any moment by ft_thread_join
.
Joining a thread in FairThreads and in POSIX is very similar, except that no value is returned by terminating fair threads.
Killing a thread is done in FairThreads with the ft_scheduler_stop function. In POSIX, the cancellation mechanism with the related functions (for example, pthread_setcancelstate) can be used for that purpose. However, it implies that the cancelled thread is in the correct mode and has set meaningful cancellation points. In FairThreads, thread cancellation can naturally only occur at the beginning of instants. In this respect, FairThreads is much simpler than POSIX.
The suspend-resume mechanism of FairThreads has no direct counterpart in POSIX, where it must be coded indirectly. Note that the equivalent functionality has disappeared from the recent versions of Java (it is deprecated).
7 Implementations
One considers now the question of implementing Loft. Actually, five implementations are described:
- The basic implementation translates Loft programs into FairThreads. It is a complete implementation which can deal with non-native modules as well as with native ones.
- Three implementations correspond to the previous Rewrite,
Replace and Optim semantics.
- Finally, the last implementation is suited for embedded
systems. Actually, it can be seen as a mix of Optim with the
automata-based part of the translation in FairThreads.
7.1 Implementation in FairThreads
|
The basic implementation of Loft uses FairThreads in C, which is very close to it (let us recall that Loft means Language Over Fair Threads) and which was introduced in section 6.1 (FairThreads Overview). FairThreads is supposed to be executed in the context of a preemptive OS. In this context, unlinked fair threads are run by dedicated native threads, at the kernel level. This is exactly what is needed to implement native modules of Loft. The implementation described in this section is complete, as opposed to the implementations considered later, which deal only with the cooperative part of Loft.
7.1.1 Translation in FairThreads
|
The implementation of
Loft basically consists in a translation of
modules in
FairThreads. The native modules are translated into
standard fair threads, while the non-native ones are translated into
automata. The translation is rather direct:
- Types
event_t, scheduler_t, thread_t
of
Loft are directly mapped to the corresponding types ft_event_t, ft_scheduler_t, ft_thread_t
of FairThreads.
- Orders of Loft are directly translated into corresponding
orders of FairThreads. For example,
stop (t)
is translated
into ft_scheduler_stop (t)
.
- The
generate
instruction of Loft translates into
ft_thread_generate
or into ft_scheduler_broadcast
,
depending on the linking of the executing thread.
- The instruction cooperate is translated into ft_thread_cooperate when a native module is considered, and into GOTO_NEXT when it is not native.
- The instruction await of Loft translates into ft_thread_await or into STATE_AWAIT. Other non-atomic instructions are translated into their equivalent statements of FairThreads.
However, some points are specific to the implementation. They
basically concern schedulers, deallocation, and the main module.
- Schedulers are always run as standard autonomous
pthreads, using the function
ft_scheduler_start
. Actually,
the scheduler_react
function of Loft becomes useless and
is not implemented.
- The main module must always be defined. Thus, the function
loft_initialize
becomes useless.
- Deallocation functions are not implemented. It is left to the
programmer to define means to deallocate objects created in the
heap.
Each instance of a module is mapped onto a fair thread of type ft_thread_t. This thread is either a standard fair thread in case of a native module, or an automaton in case of a non-native one. Thus, only instances of native modules are run autonomously as standard pthreads. The translation will be described through the example of a module.
The example considered is the following module that waits for the
event
go
and then prints a message 100 times:
module native m (char *s)
local int i;
await (go);
{local(i) = 0;}
repeat (100) do
printf ("instant %d: %s\n",local(i)++,local(s));
cooperate;
end
exit (0);
end module
One first describes the translation of m
, then one considers
the change where m
is a non-native module. In both cases,
the module translates into a data structure for local variables, a
function that describes the module behavior, and two functions to
create new instances of the module.
7.1.2.1 Native Modules
The translation of the module m first defines the data structure m_data_ to store the parameters and local variables of m.
typedef struct m_data_
{
char* s;
int i;
} *m_data_;
Second, a function m_finalize is defined to process the finalizing atom, which in this case is absent.
void m_finalize (void *_args)
{
m_data_ _local_vars = _args;
}
Third, a function m_fun
is built directly from the module
body (one has manually indented the code in order to make it more
readable).
void m_fun (void *_args)
{
thread_t _self = ft_thread_self ();
m_data_ _local_vars = (m_data_)_args;
_set_thread_return_code (_self,ft_thread_await (go));
{(_local_vars->i ) = 0;}
{
int _i0 = 100;
for(;_i0>0;_i0--){
{
printf ("instant %d: %s\n",
(_local_vars-> i ) ++,
(_local_vars-> s ) );
}
ft_thread_cooperate ();
}
}
{exit (0);}
}
Several points can be noticed in the translation:
- The executing thread returned by
ft_thread_self
is assigned to the variable _self
in order
to be used later, in the function body.
- The local data of the thread is obtained through the argument _args, which is thus cast to the correct type. The macro local used in the module body is expanded into an access to the _local_vars variable.
- The result of the call of
ft_thread_await
is passed
to a special function that makes it available by calling the
return_code
function.
- The repeat loop is translated into a standard for loop controlled by an automatic variable (produced by the compiler with a unique name).
Fourth, the m_create_in function is defined. It first allocates a structure _locals for the parameters and local variables of the returned thread _thread. A special allocation function, called mymalloc, is used. Actually, it just adds some tracing possibilities to the standard malloc function of C. The parameters are copied into _locals. Then, a fair thread is created using ft_thread_create. Finally, the new pthread is detached (using the pthread_detach function of Pthreads), in order to avoid the limitation on the number of pthreads that can be attached in an application. Actually, the join primitive of FairThreads is not implemented using pthread_join, which supposes that threads are attached.
thread_t m_create_in (scheduler_t _sched ,char* s)
{
thread_t _thread;
m_data_ _locals = mymalloc (sizeof (struct m_data_));
_locals->s = s;
_thread = ft_thread_create (_sched,m_fun,m_finalize,_locals);
pthread_detach (ft_pthread (_thread));
return _thread;
}
Finally, the m_create
function is defined as a call to
m_create_in
with the implicit scheduler as parameter.
thread_t m_create (char* s)
{
return m_create_in (implicit_scheduler (),s);
}
7.1.2.2 Non-native modules
One now considers non-native modules, and changes the previous module m into a non-native one. Syntactically, this just means suppressing the native keyword in the definition of m.
Non-native modules are naturally translated into automata as they do
not need the full power of a dedicated pthread to execute.
As the auxiliary variables used in the translation, such as the previous variable _i0, can no longer be implemented as automatic variables, they are introduced in the local data structure produced. The structure m_data_ becomes (_i0 is renamed _loc0):
typedef struct m_data_
{
int _loc0;
char* s;
int i;
} *m_data_;
The automaton produced by the translation of m
has 9 states
and is defined by a function which is as previously called m_fun
. The first state is used to await the event go
.
State 1 contains initialization of the local variable i
. States 2, 3, and 6 implement the repeat loop, using the local variable _loc0, which is initialized in state 2, tested in state 3, and decremented in state 6. State 4 contains the atomic
printing action. State 5 corresponds to the cooperate
instruction. State 7 is an auxiliary state (which should probably be
removed). Finally, state 8 contains the call to the exit
function.
DEFINE_AUTOMATON(m_fun)
{
m_data_ _local_vars = (m_data_)_get_thread_args (_self);
BEGIN_AUTOMATON
STATE_AWAIT (0,go) {}
STATE (1) {local(i) = 0;}
STATE (2) {local(_loc0) = 100;}
STATE (3) {if (local(_loc0)<=0) IMMEDIATE(7);}
STATE (4) {printf ("instant %d: %s\n",local(i)++,local(s));}
STATE (5) {GOTO_NEXT;}
STATE (6) {local(_loc0)--; IMMEDIATE (3); }
STATE (7) {}
STATE (8) {exit (0);}
END_AUTOMATON
}
The m_create_in
function creates an automaton using ft_automaton_create
:
thread_t m_create_in (scheduler_t sched ,char* s)
{
thread_t _thread;
m_data_ locals = mymalloc (sizeof (struct m_data_));
locals->s = s;
_thread = ft_automaton_create (sched,m_fun,m_finalize,locals);
pthread_detach (ft_pthread (_thread));
return _thread;
}
7.1.2.3 Main Module
The main function creates the implicit scheduler, then creates the
main thread in it, and finally starts the implicit scheduler.
static scheduler_t _implicit;
int main (int n,char **args)
{
_implicit = scheduler_create ();
_main_create_in (_implicit,n,args);
ft_scheduler_start (_implicit);
ft_exit ();
return 0;
}
The pthread running the main function is exited, using the ft_exit function, thus leaving the implicit scheduler running autonomously.
Efficiency of the implementation is basically based on two points:
- There is no busy-waiting on absent events, because fair threads waiting for an event are placed into a list associated to it, and then become transparent for the scheduler. Transparent fair threads return to the visible state as soon as the event they are waiting for is generated. Thus, one gets an implementation using queues to store waiting threads, which actually implements the method described in section 5.5 (OPTIM Semantics).
- The use of automata for non-native threads is very
efficient. Actually, an automaton is implemented as a
switch
instruction which dispatches the control to the appropriate
state. Moreover, there is no limit on the number of automata that can
be created, which is not true for native threads.
However, the implementation in FairThreads has a major drawback: it absolutely needs the presence of pthreads (even if they are not started, with non-native modules), and thus suffers from the problems related to pthreads. Amongst them, one can cite the need of a private stack for each pthread, which is memory consuming, and the presence of uncontrolled context-switches, which is inherent to the preemptive basis of pthreads. This last characteristic is clearly time consuming. Because pthreads are mandatory, the described implementation cannot be used for embedded systems with limited resources; this is the reason why the implementation described in section 7.5 (Direct Implementation) is proposed.
7.2 Implementation of REWRITE
|
One now switches to implementations of the cooperative part of Loft. In this section, one directly implements the semantics rules of section 5.2 (Instructions in REWRITE). Like the semantics, this implementation is called Rewrite, to emphasize the fact that it mimics the execution of the rewriting rules. The main purpose of Rewrite is to serve as a reference for the other implementations.
Instructions are of type instruction_t. Here is the definition of the type:
typedef struct _instruction_t
{
char type;
} *instruction_t;
The type instruction_t is used to define the types of specific instructions. They all contain a field parent of type struct _instruction_t, whose type field stores the actual type of the instruction. For example, the type of sequences (to simplify, one only considers binary sequences), sequence_t, is defined by:
typedef struct _sequence_t
{
struct _instruction_t parent;
instruction_t left;
instruction_t right;
}
*sequence_t;
The make_sequence
function returns new sequences:
instruction_t make_sequence (instruction_t left,instruction_t right)
{
sequence_t res = mymalloc (sizeof (struct _sequence_t));
res->parent.type = SEQ;
res->left = left;
res->right = right;
return (instruction_t)res;
}
Types corresponding to other instructions are built in the same way and, for simplicity, are not all considered here.
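As an illustration, here is a plausible sketch, by analogy with sequences, of the cooperate instruction type and its constructor make_cooperate (which is used below); the tag name COOPERATE is an assumption, not taken from the original code:
typedef struct _cooperate_t
{
struct _instruction_t parent;   /* type tag, as for sequences */
} *cooperate_t;

instruction_t make_cooperate (void)
{
cooperate_t res = mymalloc (sizeof (struct _cooperate_t));
res->parent.type = COOPERATE;   /* assumed tag name */
return (instruction_t)res;
}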
The scheduler and the environment which are transformed by the
rewriting rules are considered as global variables in the
implementation. The prototype of the rewriting function is:
rewrite_result_t rewrite (instruction_t);
The result of a rewriting (of type rewrite_result_t
) is a
structure made of two components: the status (TERM, CONT,
COOP, or RETURN) and the residual instruction. For example, the
implementation of the rewriting of a sequence is:
rewrite_result_t sequence_rewrite (instruction_t inst)
{
sequence_t s = (sequence_t)inst;
rewrite_result_t res = rewrite (s->left);
if (res->status != TERM) {
return
make_rewrite_result (res->status,make_sequence(res->inst,s->right));
}
return rewrite(s->right);
}
Note that a new sequence is systematically built by make_sequence when the left branch is not terminated.
The implementation of modules is close to the one presented in section 7.1 (Implementation in FairThreads), except that pthreads are no longer used, being replaced by elements of type instruction_t. The translation of a module m creates several items:
- A structure
m_data_
which defines the local data
associated to the module (parameters and local variables). The structure m_data_ is constructed exactly as in section 7.1 (Implementation in FairThreads).
- The function
m_selector
which contains all the atomic
instructions called in the module.
- The function
m
which returns the instruction of type
instruction_t
built from the module body.
- The two functions
m_create_in
and m_create
which return new instances of m. m_create is the same as in section 7.1 (Implementation in FairThreads).
As an example, consider the following (non-native) module m (also considered in section 7.1):
module m (char *s)
local int i;
await (go);
{local(i) = 0;}
repeat (100) do
printf ("instant %d: %s\n",local(i)++,local(s));
cooperate;
end
exit (0);
end module
The function m_selector lists all the atoms and executes them according to an integer parameter:
static void *m_selector (int _n,m_data_ _local_vars)
{
switch (_n) {
case 0: {void *res = (void*)(go); return res;}
case 1: {local(i) = 0;} return NULL;
case 2: {printf("instant %d: %s\n",local(i)++,local(s));} return NULL;
case 3: {void *res = (void*)(100); return res;}
case 4: {exit(0);} return NULL;
}
return NULL;
}
The m
function returns an instruction which actually
corresponds to the syntax tree of the module:
instruction_t m (m_data_ _local_vars)
{
return
make_sequence (
make_sequence (
make_sequence (
make_await (make_wrapper (m_selector,0)),
make_atom (m_selector,1)),
make_repeat (make_wrapper (m_selector,3),
make_sequence (make_atom (m_selector,2),make_cooperate ()))),
make_atom (m_selector,4));
}
The function make_wrapper returns a wrapper, which is a structure containing a selector, such as the previous m_selector, and an index. The execution of an atom made from a wrapper returns the evaluation of the selector case corresponding to the index. Thus, the selector is evaluated at execution time and not when the thread is constructed.
Finally, the function m_create_in
which creates threads is
defined by:
thread_t m_create_in (scheduler_t _sched, char* s)
{
thread_t _thread;
m_data_ _locals = mymalloc (sizeof (struct m_data_));
_locals->s = s;
_thread = make_thread (_sched,m (_locals),m_finalize,_locals);
add_to_link (_sched,_thread);
return _thread;
}
Threads are structures defined in section 5.1 (Semantics Basics). The type
thread_t
is implemented by:
typedef struct _thread_t
{
instruction_t run;
finalizer_t finalizer;
void *locals;
char state;
char suspended;
event_t signal;
struct _thread_t *called;
char code;
scheduler_t scheduler;
}*thread_t;
The field run
is the instruction executed by the thread.
locals
points to the thread local variables. The state of
the thread is contained in state
. The thread is suspended
when suspended
is different from 0. The event signal
is generated when the thread terminates its execution. The
field called
is set while a thread is under execution
through a run
instruction. The field code
holds
the return code of some instructions (namely, await
, join
, and get_value
). Finally, scheduler
denotes the scheduler in which the thread is currently executed.
The make_thread function returns a new thread in which the termination signal is generated at the end of the executed instruction:
thread_t make_thread (scheduler_t sched,
instruction_t inst,
finalizer_t final,
void *locals)
{
thread_t res = mymalloc (sizeof (struct _thread_t));
res->signal = event_create_in (sched);
res->run =
make_sequence (
inst,
make_generate (make_const_expression (res->signal))
);
res->finalizer = final;
res->state = _CONT;
res->suspended = 0;
res->called = NULL;
res->code = OK;
res->locals = locals;
res->scheduler = sched;
return res;
}
The reaction of a thread consists in a rewriting of the instruction
embedded in it:
void fire_thread (thread_t thread)
{
rewrite_result_t res;
thread->scheduler->current = thread;
res = rewrite (thread->run);
return_processing (thread,res->status);
thread->run = res->inst;
}
The return_processing
function basically replaces the
RETURN status by TERM.
Schedulers are structures defined in section 5.1 (Semantics Basics) and implemented by:
typedef struct _scheduler_t
{
item_list_t linked;
thread_t current;
item_list_t events;
item_list_t orders;
item_list_t to_broadcast;
char eoi;
char move;
}*scheduler_t;
One just describes the way the scheduler reacts, corresponding
to the instant
function:
void instant (scheduler_t sched)
{
start_instant (sched);
while (1) {
fire_all_threads (sched);
if (stable (sched)) break;
make_decision (sched);
}
}
The start_instant
function called at the beginning of
each new instant is defined by:
static void start_instant (scheduler_t sched)
{
sched->move = sched->eoi = 0;
transfer_threads (sched);
transfer_all_events (sched);
thread_changes (sched);
reset_scheduler (sched);
}
During a phase, the threads in linked
are all fired in turn:
static void fire_all_threads (scheduler_t sched)
{
item_cell_t cell = sched->linked->first;
while (cell) {
thread_t th = cell->item;
if (fireable (th)) fire_thread (th);
cell = cell->next;
}
}
The stability of the scheduler is tested by the function stable, defined by:
static int stable (scheduler_t sched)
{
item_cell_t cell = sched->linked->first;
while (cell) {
if (fireable ((thread_t)cell->item)) return 0;
cell = cell->next;
}
return 1;
}
Finally, the end of the current instant is decided when the make_decision
function is called while the move
flag is
false:
static void make_decision (scheduler_t sched)
{
if (sched->move) sched->move = 0; else sched->eoi = 1;
}
7.3 Implementation of REPLACE
|
One now considers the variant Replace of Rewrite. The basic idea is to introduce states in instructions, and to change these states during the rewriting process, in order to limit as much as possible the creation of new instructions. One just gives here the main changes with respect to the previous Rewrite implementation, concerning cooperate, sequences, and loops.
The
cooperate
instruction has a state which is set to
indicate that the instruction has been executed:
char cooperate_rewrite (instruction_t inst)
{
cooperate_t coop = (cooperate_t) inst;
if (coop->state) return TERM;
coop->state = 1;
return COOP;
}
A function is defined which resets the state of the instruction:
void cooperate_reset (instruction_t inst)
{
cooperate_t coop = (cooperate_t) inst;
coop->state = 0;
}
A state is also introduced in sequences (only binary ones are
considered) which is set when the left branch of the sequence is
terminated:
char sequence_rewrite (instruction_t inst)
{
sequence_t s = (sequence_t)inst;
if (0 == s->state) {
int res = rewrite (s->left);
if (res != TERM) return res;
s->state = 1;
}
return rewrite(s->right);
}
Note that the rewrite function now directly returns the execution status, because no new instruction is created as in Rewrite. The reset function for sequences resets the state and recursively resets the branches of the sequence:
void sequence_reset (instruction_t inst)
{
sequence_t s = (sequence_t)inst;
s->state = 0;
reset_instruction (s->left);
reset_instruction (s->right);
}
The
while
loop has a state which is set to 1 when the
expression is evaluated. When the loop body terminates, it is reset
and the state is reset to 0 to force the expression to be
re-evaluated:
char while_rewrite (instruction_t inst)
{
while_t w = (while_t)inst;
char r;
while (1) {
if (0 == w->state) {
int b = (int)evaluate (w->exp);
if (!b) return TERM;
w->state = 1;
}
r = rewrite (w->body);
if (r != TERM) return r;
reset_instruction (w->body);
w->state = 0;
}
}
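The repeat loop used in the examples can be handled in the same style. Here is a plausible sketch, assuming a repeat_t type with a state, a counter, the count expression exp, and the body (these field names are assumptions, not taken from the original code):
char repeat_rewrite (instruction_t inst)
{
repeat_t r = (repeat_t)inst;
char res;
while (1) {
if (0 == r->state) {
r->counter = (int)evaluate (r->exp);  /* evaluated only once */
r->state = 1;
}
if (r->counter <= 0) return TERM;
res = rewrite (r->body);
if (res != TERM) return res;
reset_instruction (r->body);             /* body will be re-run */
r->counter--;
}
}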
7.4 Implementation of OPTIM
|
The semantics Optim of section 5.5 (OPTIM Semantics) can now be implemented as a variant of the previous implementation Replace, to avoid busy-waiting on events.
One has made a little change with respect to the semantics of section 5.5, by implementing the global table waiting of the scheduler with a list included in each event.
Two functions are defined to register threads returning the WAIT
status:
register_in_event (e)
registers the executing thread
(actually
self ()
) in the waiting list associated to the
event
e
, and
register_in_timer (s)
registers the
executing thread in the timer list of the scheduler
s
.
One now considers the implementation of the await instruction (in both the limited and the unlimited forms). If the awaited event is not present, then the executing thread is registered in the waiting queue of the event and WAIT is returned.
char await_rewrite (instruction_t inst)
{
await_t a = (await_t)inst;
if (0 == a->state) {
a->event = evaluate (a->exp);
a->state = 1;
}
if (is_present (a->event)) return TERM;
register_in_event (a->event);
return WAIT;
}
Now, the statuses CONT and COOP are not returned. Either the instruction terminates because the event is present (and the returned status is TERM), or the thread is registered in the event.
For a limited await, the thread is registered both in the waiting queue of the event and in the scheduler timer, before WAIT is returned.
char limited_await_rewrite (instruction_t inst)
{
limited_await_t a = (limited_await_t)inst;
scheduler_t sched = current_scheduler ();
if (0 == a->state) {
a->event = evaluate (a->evt_exp);
a->limit = (int)evaluate (a->limit_exp);
self ()->deadline = a->limit + sched->instant;
a->state = 1;
}
if (self ()->deadline <= sched->instant) {
self ()->code = ETIMEOUT;
return TERM;
}
if (is_present (a->event)) {
self ()->code = OK;
return TERM;
}
register_in_event (a->event);
register_in_timer (sched);
return WAIT;
}
The generate function now calls the awake function, in order to awake the threads waiting for the generated event. Awaking a thread makes it once again executable by the scheduler, by setting its state to CONT:
static void awake (event_t event)
{
item_cell_t cell = event->waiting->first;
while (cell) {
thread_t th = cell->item;
if (!th->suspended && th->state == WAIT) th->state = CONT;
cell = cell->next;
}
reset_item_list (event->waiting);
}
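For illustration, here is a minimal sketch of what generate could look like in this implementation; the present field and the setting of the move flag are assumptions, only the call to awake is taken from the text:
void generate (event_t event)
{
event->present = 1;                 /* assumed presence flag */
current_scheduler ()->move = 1;     /* something happened this phase */
awake (event);                      /* awake the registered threads */
}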
The processing of deadlines uses an auxiliary function which returns 1 if the deadline is reached, and 0 otherwise:
static int deadline_reached (void *item)
{
thread_t thread = item;
if (thread->deadline > current_scheduler ()->instant) return 0;
if (thread->state == WAIT) thread->state = CONT;
return 1;
}
The awaking of threads with deadlines reached is done at the beginning
of each instant, and the start_instant
function becomes:
static void start_instant (scheduler_t sched)
{
sched->instant++;
sched->move = sched->eoi = 0;
awake_deadline_reached (sched);
transfer_threads (sched);
transfer_all_events (sched);
thread_changes (sched);
}
where the awake_deadline_reached function calls deadline_reached on all the elements in timer, removing them when 1 is returned.
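A possible sketch of this function, assuming a generic list helper remove_if that applies a predicate to each element and removes the matching ones (both the helper and the timer field name are assumptions):
static void awake_deadline_reached (scheduler_t sched)
{
/* deadline_reached awakes the thread as a side effect, and returns 1
when the element must be removed from the timer list. */
remove_if (sched->timer, deadline_reached);
}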
7.5 Direct Implementation
|
One considers now an implementation of Loft which mixes the previous implementation of the Optim semantics, described in section 7.4 (Implementation of OPTIM), and the automata-based approach of the implementation in FairThreads of section 7.1 (Implementation in FairThreads), but without the need of pthreads anymore. One calls this implementation Direct, as it consists in a rather direct translation in C, without the need of any particular library. Moreover, Direct uses an efficient algorithm to access the next thread needing execution.
One gets an efficient implementation with low memory consumption. This implementation is suited for using Loft on embedded systems with limited resources. An experiment described in section 8.3 (Prey-Predator Demo) has been made with the GBA (Game Boy Advance) platform of Nintendo.
Event queues are used to avoid busy-waiting, not only during one instant, but also across several instants. A thread which is awaiting an event that is not present registers itself in a queue associated to the event. At this point, the thread falls into a new WAIT state, and will not be considered for execution by the scheduler to which it is linked until the event is generated. When an event is generated, the elements of the queue associated to it all return to the normal execution state. Note that the order in which the threads are executed is not perturbed by this mechanism, which is actually just a way to suspend the execution of threads awaiting non-generated events. In this case, suspension and resumption are immediate, i.e. they become effective in the current instant and not in the next one, as is the case with the suspend/resume instructions.
Let us consider one more time the module m of the section Modules. The translation makes use of the following macros, which are very similar to the ones of FairThreads:
#define BEGIN_AUTOMATON\
while (1) {\
switch (_local_vars->_state)\
{
#define STATE(n)\
case n: SET_STATE(n);
#define END_AUTOMATON\
default: THE_END\
}\
}
#define SET_STATE(n) _local_vars->_state=n
#define GOTO_NEXT {SET_STATE(_local_vars->_state+1); return COOP;}
#define GOTO(n) {SET_STATE(n); return COOP;}
#define IMMEDIATE(n) {SET_STATE(n); break;}
#define THE_END {SET_STATE(-1); generate (self()->signal); return TERM;}
The code is actually very close to the one obtained with the translation in FairThreads. However, the state of the automaton (_state) now becomes explicit and is considered as a local variable. The local data structure becomes:
typedef struct m_data_
{
event_t _loc0;
int _loc1;
char* s;
int i;
int _state;
} *m_data_;
The module body is translated into the automaton m_automaton
which is the function:
char m_automaton (m_data_ _local_vars)
{
BEGIN_AUTOMATON
STATE(0) {local(_loc0)=go;}
STATE(1) {
event_t event = local(_loc0);
if (event->scheduler != current_scheduler ()) {
self()->code = EBADLINK;
} else if (!is_present (event)) {
register_in_event (event);
return WAIT;
}
}
STATE(2) {local(i) = 0;}
STATE(3) {local(_loc1) = 100;}
STATE(4) {if (local(_loc1)<=0) IMMEDIATE(8);}
STATE(5) {printf ("instant %d: %s\n",local(i)++,local(s));}
STATE(6) {GOTO_NEXT;}
STATE(7) {local(_loc1)--; IMMEDIATE (4);}
STATE(8)
STATE(9) {exit (0);};
END_AUTOMATON
}
The code of state 1 implements the await (go) instruction. If go is not present, then the executing thread (returned by self) is registered in the event, and the returned status is WAIT. Thus, the thread will not be considered for execution until go is generated. When this happens (if it happens), the thread status returns to CONT and the execution resumes. An important point to note is that the order in which threads are executed always remains the same and is not changed by awaiting events.
The function which creates a thread is:
thread_t m_create_in (scheduler_t _sched,char* s)
{
thread_t _thread;
m_data_ _locals = mymalloc (sizeof (struct m_data_));
_locals->s = s;
_thread = make_thread (_sched,m_automaton,m_finalize,_locals);
_locals->_state = 0;
add_to_link (_sched,_thread);
return _thread;
}
Note the initialization of the automaton state. The function make_thread returns a thread executing the function passed as first parameter, with the finalizer as second parameter and the local data as third parameter. The new thread is added to the scheduler by the call to add_to_link.
The default main function is defined by:
int main (int argc,char **argv)
{
loft_initialize ();
_main_create_in (implicit_scheduler (),argc,argv);
while (1) scheduler_react (implicit_scheduler ());
return 0;
}
This function has to be adapted if one does not want all the CPU to be
consumed by it.
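As an illustration, here is one possible adaptation, pacing the reaction loop with a sleep; the 50 ms period and the use of the POSIX usleep function are arbitrary choices, not taken from the original:
#include <unistd.h>

int main (int argc,char **argv)
{
loft_initialize ();
_main_create_in (implicit_scheduler (),argc,argv);
while (1) {
scheduler_react (implicit_scheduler ());
usleep (50000); /* pace the instants at about 20 per second */
}
return 0;
}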
7.5.4 Access to the Next Thread
|
The threads placed in linked are considered in turn for execution. When a thread is in the state WAIT, it should not be considered for execution, and the scheduler should pass over it without performing any other action. Of course, the state can later return to CONT when the thread is awaken, and in this case the scheduler will have to consider it again.
To avoid considering threads that are in state WAIT, one uses a masking technique: threads in the state WAIT are masked and disappear from linked. However, they are not definitively extracted from the list, and can be reintroduced in it at the same place. This of course happens when a masked thread is awaken. In this way, the efficiency of the scheduler does not depend anymore on the threads that are registered in events or in the scheduler timer.
To implement masking, linked is implemented as a list of cells with two series of pointers. The first series describes the structure of the list, which does not change except when a thread terminates or when new threads are linked to the scheduler. Note that this can only happen in-between instants. The second series of pointers gives the next and previous unmasked cells. These pointers are changed when a thread returns WAIT and when it is awaken. Using the structure pointers allows a thread to be reinserted at its previous position when awaken.
The cost of finding the next unmasked thread to execute is constant. The cost of a masking action is also constant. However, the cost of unmasking depends on the number of threads (masked and unmasked) present in the list. This is basically because threads are reintroduced in linked at their initial place. One thus gets a very efficient mechanism, especially useful when a large number of threads is in use while, at each instant, few of them are really executed, the other ones waiting for absent events. A sketch of the cell structure is given below.
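Here is a minimal sketch of such a cell structure and of the two operations, under assumed field names (next/prev for the structural series, next_on/prev_on for the unmasked series); it is not the actual code of the implementation:
typedef struct _cell_t
{
thread_t thread;
int masked;                         /* 1 when the thread is in WAIT */
struct _cell_t *next, *prev;        /* structural series: stable */
struct _cell_t *next_on, *prev_on;  /* series over unmasked cells */
} *cell_t;

/* Masking: unlink the cell from the unmasked series only; constant time. */
void mask (cell_t c)
{
c->masked = 1;
if (c->prev_on) c->prev_on->next_on = c->next_on;
if (c->next_on) c->next_on->prev_on = c->prev_on;
}

/* Unmasking: walk the structural series to find the closest unmasked
neighbours, hence a cost proportional to the length of the list. */
void unmask (cell_t c)
{
cell_t p = c->prev, n = c->next;
while (p && p->masked) p = p->prev;
while (n && n->masked) n = n->next;
c->masked = 0;
c->prev_on = p; c->next_on = n;
if (p) p->next_on = c;
if (n) n->prev_on = c;
}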
One has considered five implementations of Loft.
The first implementation consists in a direct translation into
FairThreads, which is an API based on a computing model very close to
Loft. From this point of view, Loft can be seen as a means to replace an API by a language, in order to ease the programming of fair threads. The implementation is complete, as native modules are taken into account. Actually, instances of native modules are implemented as
standard kernel threads, which are provided by the pthread library.
Then implementations of the three semantics proposed have been
considered.
Rewrite is a direct implementation of the basic rewriting rules
defining the semantics of Loft. This direct coding is possible
because the semantics is deterministic and structurally defined. In
Rewrite, new terms are built at each micro-step, which is clearly an unsatisfactory solution. Moreover, the memory is never released (there is no GC!).
In the variant Replace of Rewrite, the construction of new terms
is limited to the minimum. Basically, in Replace, new allocation
occurs only when new threads are added in the system. Thus, the memory
remains stable while no new thread is created.
Both Rewrite and Replace are using a polling strategy for
awaiting events. Actually, a thread executing await (e)
tests for the presence of e
at each micro-step and
during all instants, until e
is generated (if it
is). This is clearly inefficient. The Optim implementation avoids
busy-waiting for absent events, using queues in which waiting threads
are placed.
Finally, the fifth implementation Direct is suited for embedded systems. Like Optim, it uses queues to store the threads awaiting an event, instead of busy-waiting. Like the implementation in FairThreads, it translates non-native modules into automata. However, no library of kernel threads is needed in this case. Moreover, Direct uses a masking technique to get direct access to the next thread to be executed. One thus gets a very efficient implementation, with low memory consumption, of the cooperative part of Loft.
8 Examples - 2
One continues the presentation of examples started in chapter 3 (Examples - 1). First, one considers in section 8.1 (Multiprocessing) examples which benefit from multiprocessing. Section 8.2 (Direct Access) considers the masking optimization included in the Direct implementation. Finally, section 8.3 (Prey-Predator Demo) describes the code of an embedded system which runs on a platform with very limited resources (the GBA of Nintendo).
8.1 Multiprocessing
|
In this section, one considers examples in which the presence of several processors clearly improves the computation efficiency.
Let us return to the producers/consumers example of section 3.1.5 (Producers/Consumers), and let us suppose that the actual processing of values is made by some thread-safe C function (a re-entrant function that can be called simultaneously by more than one kernel thread). Thus, each processing thread instance of the process module can unlink from the scheduler during the computation of the result, and re-link to it to return the result. Then, several instances of process can be simultaneously unlinked and running on distinct processors, which would certainly speed up the overall computation. The process module becomes:
module native process ()
local int v;
while (1) do
while (buffer->length == 0) do
await (new_input);
if (buffer->length == 0) then cooperate; end
end
if ((local(v) = get (buffer)) == 0) then return; end
unlink;
< call of a thread-safe C function >
link (implicit_scheduler ());
{local(v)--;}
put (buffer,local(v));
generate (new_input);
end
end module
In this case, to benefit from multiprocessor machines, one basically
has to define the process
module as native.
8.1.2 Lucky Prime Numbers
|
One now considers the production of lucky prime numbers. The system is decomposed into two uncoupled sub-systems, one producing prime numbers and one producing lucky numbers, that can be run by distinct kernel threads on distinct processors. The sub-systems naturally correspond to schedulers. As the two sub-systems do not share the same instants, the results produced have to be stored elsewhere. For this purpose, one uses two fifo channels (of type channel_t, defined in section 3.4 (Data-Flow Programming)). Finally, a comparison thread is defined which goes between the schedulers to detect equal values and output them.
scheduler_t lucky_sched, prime_sched;
event_t lucky_go, prime_go;
channel_t lucky_result, prime_result;
8.1.2.1 Compare
One uses a macro which extracts values from a channel chan as much as possible, until a value greater than the one in hold is encountered. Moreover, a value equal to hold is output:
#define COMP(chan,go) \
{local(sw) = 0;} \
while (!local(sw)) do \
GET(chan,go) \
{\
int n = extract (chan);\
if (n == local(hold)) output (n);\
if (local(hold) <= n) {\
local(hold) = n;\
local(sw) = 1;\
}\
}\
end
The module that cyclically goes from one scheduler to the other and
compares values is defined by:
module compare ()
local int hold, int sw;
while (1) do
link (lucky_sched);
COMP(lucky_result,lucky_go);
link (prime_sched);
COMP(prime_result,prime_go);
end
end module
8.1.2.2 Main
The system produces prime numbers in the channel
prime_result
,
lucky numbers in
lucky_result
, and it creates an instance of
compare
in the implicit scheduler:
module main (int argc, char **argv)
{
lucky_sched = scheduler_create ();
lucky_go = event_create_in (lucky_sched);
lucky_result = init_channel ();
scheduler_start (lucky_sched);
producer_create_in (lucky_sched,lucky_go);
filter_create_in (lucky_sched,lucky_filtering,1,lucky_go,lucky_result);
prime_sched = scheduler_create ();
prime_go = event_create_in (prime_sched);
prime_result = init_channel ();
scheduler_start (prime_sched);
producer_create_in (prime_sched,prime_go);
filter_create_in (prime_sched,prime_filtering,1,prime_go,prime_result);
compare_create ();
}
end module
The filter module has been slightly changed to store the produced values in a channel instead of, as previously, generating them. Note that the scheduler in which the instance of compare is created is actually irrelevant; the instance could just as well be created in any of the two schedulers lucky_sched or prime_sched.
8.1.2.3 Results
One executes the previous program on a SMP (Symmetric Multi-Processor) bi-processor machine made of 2 PIII 1GHz processors running on Linux 2.2, with 512MB of memory. One uses the implementation based on FairThreads to produce the first 300 lucky-prime numbers. The previous program is compared with a variant in which there is only one scheduler (both lucky_sched and prime_sched are mapped to the implicit scheduler). The time taken by the initial system with 2 schedulers is 10.5s, while it is 19.2s with the variant with a single scheduler. This shows that the presence of two processors is fully exploited by the program.
8.2 Direct Access
|
One now illustrates the efficiency of the masking technique used in the Direct implementation to get a constant-time access to the next thread.
One first considers an example in which the gain of direct access is
maximal. The considered system is made of two active threads and of a
large number of inactive threads, which thus can be masked. First,
one defines a module
blocked
which just awaits an event and
terminates:
module blocked (event_t event)
await (local(event));
end module
The module active prints a message a certain number of times and then exits the program:
module active (char *s)
repeat (CYCLES) do
printf ("%s ",local(s));
cooperate;
end
exit (0);
end module
Finally, the main module creates two active threads and a large number of threads blocked on the same event:
module main ()
{
int i;
event_t go = event_create ();
active_create ("1");
for (i = 0; i < THREADS; i++) blocked_create (go);
active_create ("2");
}
end module
At the beginning of the second instant, the linked list of the implicit scheduler contains THREADS+2 elements. All but the active threads are registered in the waiting list of the event go during the second instant. Then, at the beginning of the third instant, linked contains only the 2 active threads, the other ones being masked. Thus, the execution of the system does not depend any more on the THREADS number of blocked threads, and can be extremely fast, as only 2 threads are run from that moment. To give an idea, it takes about 0.7 seconds to run the program on a PIII 850MHz with 256MB of memory, with THREADS equal to 100000 and CYCLES equal to 100. With CYCLES equal to 1000, the time becomes 0.9s. This is not surprising, as CPU is mainly consumed during the second instant.
Note that, without the masking technique implemented by Direct, one should get about 6s for CYCLES equal to 100, but 57s for CYCLES equal to 1000!
Now let us consider the worst case for the masking technique, when
threads are actually masked and unmasked at each instant. One changes
blocked
to get a cyclic behavior:
module blocked (event_t event)
while (1) do
await (local(event));
cooperate;
end
end module
Then, one generates the triggering event go at each instant. Now, in the same conditions as previously (THREADS equal to 100000 and CYCLES equal to 100), the time becomes 27s with the masking technique, and 26s without it. With CYCLES equal to 1000, the time is 4m29s with masking and 4m6s without. Thus, in this case, the masking approach leads to a loss of efficiency (actually a small one). Of course, intermediate results can be obtained which depend on the frequency of the generations of go.
From the previous, rather academic, examples, one sees that the gain increases as there are more and more threads that are rarely awaken. In real situations, the gain is of course less important than the one of the best case. This is the case with the sieve of section 3.3 (Sieve Examples) for producing prime numbers. For producing the first 3000 prime numbers, it takes 13.7s with the masking technique, and 17.4s without it. For producing 5000 prime numbers, it takes 38s with masking and 1m without it. For 10000 numbers, with masking: 2m37s, and without masking: 4m24s. Thus, in this example the gain obtained by using masking is significant.
8.3 Prey-Predator Demo
|
One describes the main parts of the code of a demo of a small prey-predator system implemented for the GameBoy Advance (GBA) platform of Nintendo.
A predator moves using the arrow keys. Preys are running away from
predators and are killed when a predator is sufficiently close to
them. New preys are automatically created. A predator is put in an
automatic chasing mode with key 'a' and returns in the manual mode
with 'b'. More predators can be created with 'l', and more preys with
'r'. Here is a screenshot of the demo running on the VisualBoyAdvance
emulator.
The predator is the ghost and the preys are the pacmans. This demo is
available at the Web site
http://www.gbadev.org.
One calls
sprites the basic moving graphical objects to which
reactive behaviors are associated. Sprites are in limited number in
the GBA (actually, 128). The type of sprites
sprite_t
is:
typedef struct
{
s16 x;
s16 y;
s16 sx;
s16 sy;
u8 index;
thread_t *behaviors;
u8 behav_num;
event_t destroy;
u8 dead;
}
struct_sprite_t, *sprite_t;
The types s16
and u8
are integer types (actually,
signed 16 bits, and unsigned 8 bits). Fields x
and y
are the sprite coordinates. sx
and sy
define
the speed vector of the sprite. index
is the index of the
sprite in the table where sprites are stored. behaviors is an array of threads in which the elementary behaviors of the sprite can be stored. The size of behaviors
is stored in
behav_num
. The event destroy
is used to signal the
destruction of the sprite. Finally, the boolean dead
is used
to note that the sprite is destroyed.
Destruction
Destruction of a sprite is performed using the following module
destroy_sprite
:
module destroy_sprite (sprite_t s)
{
int i;
sprite_t s = local(s);
for (i=0;i<s->behav_num;i++) stop (s->behaviors[i]);
}
cooperate;
{
int i;
sprite_t s = local(s);
suppress_sprite (s->index);
for (i=0;i<s->behav_num;i++) thread_deallocate (s->behaviors[i]);
event_deallocate (s->destroy);
free (s->behaviors);
free (s);
thread_deallocate (self ());
}
end module
First, all components of behaviors
are stopped. Then, at the
next instant, they are deallocated (by thread_deallocate
)
with the sprite structure itself. Finally, the executing thread
(returned by self
) is deallocated.
Because of the cooperate, threads are actually stopped at instant N+1, and are deallocated at instant N+2 (remember that calls to stop and to thread_deallocate produce orders which are not immediately processed, but are delayed to the next instant).
Basically, keys are converted into events. The function
create_key_events
creates the 8 events needed:
void create_key_events (void)
{
a_pressed = event_create ();
b_pressed = event_create ();
up_pressed = event_create ();
down_pressed = event_create ();
left_pressed = event_create ();
right_pressed = event_create ();
r_pressed = event_create ();
l_pressed = event_create ();
}
The purpose of the module key_handler is to convert, at each instant, the key presses (obtained through the function pressed_key) into the corresponding events (remark that several keys can be simultaneously processed).
module key_handler ()
while (1) do
{
if (pressed_key (KEY_A)) generate (a_pressed);
if (pressed_key (KEY_B)) generate (b_pressed);
if (pressed_key (KEY_UP)) generate (up_pressed);
if (pressed_key (KEY_DOWN)) generate (down_pressed);
if (pressed_key (KEY_LEFT)) generate (left_pressed);
if (pressed_key (KEY_RIGHT)) generate (right_pressed);
if (pressed_key (KEY_R)) generate (r_pressed);
if (pressed_key (KEY_L)) generate (l_pressed);
}
cooperate;
end
end module
The module key_processing associates a callback (of type void (*callback_t) ()) to a key. Ten instants pass when a key press is detected, in order to give enough time for the key to be released.
module key_processing (event_t key_pressed,callback_t callback)
while(1) do
await (local(key_pressed));
{
local(callback) ();
}
repeat (10) do cooperate; end
end
end module
8.3.3 Elementary Behaviors
|
One describes several elementary behaviors which will be used by
sprites: inertia, moves in the 4 directions, and collision.
Inertia
A sprite gets an inertial behavior with the
inertia
module,
in which the sprite given in parameter increments at each instant its
coordinates according to its speed.
module inertia (sprite_t s)
while (1) do
{
sprite_t s = local(s);
s->x += s->sx/INERTIA_FACTOR;
s->y += s->sy/INERTIA_FACTOR;
}
cooperate;
end
end module
Moves
With the module
move_right
, a sprite moves when the event
right_pressed
is present:
void go_right (sprite_t s,s16 dist)
{
s->x += dist;
if (s->x > MAX_BORDER_X) s->x = MAX_BORDER_X;
}
module move_right (sprite_t s,int dist)
while (1) do
await (right_pressed);
go_right (local(s),local(dist));
cooperate;
end
end module
Similar modules are defined for the 3 other directions.
Collisions
A function which performs elementary collision between two sprites is
first defined:
static void collision (sprite_t s1,sprite_t s2)
{
s16 dx = s2->x - s1->x; /* signed: the difference may be negative */
s16 dy = s2->y - s1->y;
if (dist2 (s1,s2) > min_collision) return;
s1->sx = -s1->sx;
s1->sy = -s1->sy;
if (dx>0) go_left (s1,dx); else go_right (s1,-dx);
if (dy>0) go_up (s1,dy); else go_down (s1,-dy);
}
Then, the collision behavior of a sprite is described in the module collide. Collision detection is related to a special event which is generated, at each instant, by all the sprites subject to collision.
module collide (sprite_t me,event_t collide_evt)
local sprite_t other, int i, int exit;
while (1) do
{
local(i) = 0;
local(exit) = 0;
}
generate_value(local(collide_evt), (void*)local(me));
while(!local(exit)) do
get_value (local(collide_evt),local(i), (void**)&local(other));
{
if (return_code () == OK) {
if (local(me) != local(other)) collision (local(me),local(other));
local(i)++;
} else {
local(exit) = 1;
}
}
end
end module
The body of the module collide is an infinite loop in which the collision event collide_evt is generated at each instant. The sprite which generates the event is passed as value to the generate_value function. Then, all the generations of collide_evt are considered in turn, and the function collision is called for all the generated values, that are the other sprites (except of course the sprite itself). The return code becomes different from OK (more precisely: ENEXT) when all the sprites are processed. This is detected at the next instant, and there is thus no need of an explicit cooperate to prevent the loop from being instantaneous.
Actually, the collision function implements a very basic algorithm which leads to some unrealistic moves; it can of course be replaced by more realistic (but more expensive!) collision treatments. Moreover, each sprite considers all the other sprites at each instant, which is inefficient (the complexity is quadratic in the number of sprites). The algorithm could certainly also be improved in this respect.
The basic behavior of a predator is defined as the module
predator
:
module predator (sprite_t me,event_t pred_evt,event_t prey_evt)
local sprite_t other, sprite_t closest,
int min, int i, int finished;
while (1) do
{
local(i) = 0;
local(finished) = 0;
local(min) = predator_visibility;
generate_value (local(pred_evt), (void*)local(me));
}
while (!local(finished)) do
get_value (local(prey_evt),local(i), (void**)&local(other));
if (local(me)->dead) then return; end
{
if (return_code () == OK) {
int d2 = dist2 (local(me),local(other));
if (local(me) != local(other) && d2 < local(min)) {
local(min) = d2;
local(closest) = local(other);
}
local(i)++;
} else {
local(finished) = 1;
if (local(min) < predator_visibility && !local(closest)->dead)
chase (local(me),local(closest));
}
}
end
end
end module
At each instant, the event pred_evt is generated by each predator. It is a valued generation, in which the sprite is passed as value. Then, all the values associated to the event prey_evt are considered in turn. When a new value is available, one tests if the sprite is dead (it could have been killed just before); the behavior returns if it is the case. Otherwise, the return code is considered:
- if it is OK, then
other
denotes the other
sprite. The distance to it is computed (actually, the square of it,
returned by dist2
), and if the other sprite is distinct from
the sprite and closer than the one in the field closest
,
then closest
is changed.
- otherwise, the closest prey is chased, provided it can be seen by the predator (according to the global variable predator_visibility) and provided it is still alive (according to its dead field). The destroy event of the prey is generated when it is killed.
The module defining a predator sprite is called predator_sprite
:
module predator_sprite ()
local thread_t inertia;
{
sprite_t s = new_sprite (0,GHOST);
predator_create (s,pred_evt,prey_evt);
sync_sprite_create (s);
move_right_create (s,pred_run_step);
move_left_create (s,pred_run_step);
move_up_create (s,pred_run_step);
move_down_create (s,pred_run_step);
bounce_on_borders_create (s);
collide_create (s,pred_evt);
local(inertia) = inertia_create (s);
suspend (local(inertia));
}
while (1) do
await (a_pressed);
resume (local(inertia));
cooperate;
await (b_pressed);
suspend (local(inertia));
cooperate;
end
end module
The local variable inertia stores the inertia thread giving the inertial behavior to the sprite. Initially, inertia is suspended and the predator is thus immobile. When receiving the event a_pressed, inertia is resumed, and it is suspended again with b_pressed. At creation, an atomic action is first executed which performs the following actions:
- A new sprite
s
is first created with a
zero speed vector and a ghost aspect.
- An instance of the module
predator
is created. The
sprite and the events pred_evt
and prey_evt
are
passed to it.
- Several threads are created, all of them taking
s
as
parameter: an instance of sync_sprite
to perform
synchronisation of the sprite with the corresponding entry in the
table of sprites; four instances of move
to move the
sprite in the 4 directions, according to 4 corresponding events;
an instance of bounce_on_borders
to let the sprite bounce on
the borders of the applet; an instance of collide
to let
the sprite collide with other predators.
- Finally, an instance of the module
inertia
is created
and assigned to the local variable with the same name. The instance is
immediately suspended, using the suspend
function.
Note that the various threads created are not stored in the behaviors field of the sprite. Actually, only inertia will be accessed after creation, and one uses a special local variable to store it.
The prey sprite is defined in the module
prey_sprite
:
module prey_sprite ()
local sprite_t s;
{
sprite_t s = new_sprite (0,PACMAN);
alloc_behaviors (s,5);
s->behaviors[0] = inertia_create (s);
s->behaviors[1] = bounce_on_borders_create (s);
s->behaviors[2] = prey_create (s,pred_evt,prey_evt);
s->behaviors[3] = sync_sprite_create (s);
s->behaviors[4] = collide_create (s,prey_evt);
local(s) = s;
}
await (local(s)->destroy);
run destroy_sprite (local(s));
thread_deallocate (self ());
end module
The overall behavior is made of several components: the specific one,
called prey
, an inertia behavior, a bouncing one, a
synchronisation behavior and a collision one. All these elementary
behaviors are stored in behaviors
because they must be
destroyed when the prey is killed. The definition of the module prey is very close to the one of predator, with 2 changes: first, the events generated and observed are inverted; second, the chase function is replaced by a function that makes the prey run away from the predator.
After the creation of the elementary behaviors, the destroy event is awaited (let us recall that it is generated when the prey is killed by the predator). Then, an instance of the destroy_sprite module is run in order to deallocate the various behaviors of the prey, and finally the executing thread is deallocated.
8.3.6 Automatic Creation of Preys
|
The module
automatic
automatically creates new preys, after
a while, when there is none in the system.
module automatic (int delay)
local sprite_t other;
while(1) do
get_value (prey_evt,0, (void**)&local(other));
if (return_code () == ENEXT) then
repeat (local(delay)) do cooperate; end
repeat (NUMBER_OF_PREYS) do prey_sprite_create (); end
cooperate;
end
cooperate;
end
end module
At each instant, the event generated by preys (prey_evt
) is
observed, using the get_value
function. If no value with
index 0 is available, which means that no prey was present during the
previous instant, then new preys are created. Otherwise, the thread
simply cooperates.
In the context of the GBA, the main module is called
GBA_main
.
module GBA_main()
{
prey_evt = event_create ();
pred_evt = event_create ();
create_key_events ();
key_handler_create ();
automatic_create (CREATION_DELAY);
predator_sprite_create ();
key_processing_create (l_pressed,predator_sprite_create);
key_processing_create (r_pressed,prey_sprite_create);
max_pred_speed = 10;
kill_dist2 = 200;
predator_visibility = 40000;
pred_run_step = 2;
max_prey_speed = 30;
prey_visibility = 8000;
prey_run_step = 3;
}
end module
First, the two events generated at each instant by preys and predators are created. Then, the events corresponding to keys are created, with an instance of the module key_handler which converts key presses into events. An instance of automatic and one of predator_sprite are created. Finally, some global variables are set to correctly parametrize the demo.
9 Related Work
One considers three domains related to
Loft. The first is the one
of threads which either appear as libraries or are incorporated in
existing languages. The second domain is the synchronous-reactive
approach from which
Loft takes its major constructs. The third
domain contains approaches in which computations are decomposed into
small pieces that can be combined to build complex behaviors.
9.1 Thread Based Formalisms
|
Several thread libraries exist for C. Among them, the Pthreads library [29] implements the POSIX standard for preemptive threads. LinuxThreads [4] is an implementation of Pthreads for Linux; it is based on native (kernel-level) threads. Quick Threads [25] provides programmers with a minimal support for multi-threading at user-space level. Basically, it implements context-switching in assembly code, and is thus a low-level solution to multi-threading.
Gnu Portable Threads[
17] (GNU Pth) is a library of purely
cooperative threads which has portability as main objective. The Next
Generation POSIX Threading project[
5] proposes to
extend GNU Pth to the M:N model (M user threads, N native threads),
with Linux SMP machines as target.
Java introduces threads at the language level. Actually, threads are generally heavily used in Java, for example when graphics or networking is involved. No assumption is made on the way threads are scheduled (cooperative and preemptive schedulings are both possible), which makes Java multi-threaded systems difficult to program and to port [23]. This difficulty is highlighted by the removal, in recent versions (1.2) of the language, of the primitives giving fine control over threads [3]. These primitives actually correspond to the stop, suspend, and resume of Loft.
The use of Java threads with multiprocessor architectures raises several difficulties which are actually under study in the Java community. One of these difficulties is called Double-Checked Locking (DCL); it concerns the Java memory model (JMM) and is rather tricky (see ??? for details). These studies should lead to some changes of the Java thread model in the next version (1.5) of Java.
It should be mentioned that the initial version of FairThreads was proposed in the context of the Java language [12] in order to simplify concurrent programming in Java; this version was however limited to cooperative threads.
Threads in Functional Languages
Threads are used in several ML-based languages such as CML [31]. CML is preemptively scheduled, and thread communication is synchronous and based on channels. Threads are also introduced in CAML [2]; they are implemented by time-sharing on a single processor, and thus cannot benefit from multiprocessor machines.
FairThreads has recently been introduced in the Bigloo [1] implementation of Scheme. The present version does not support unlinked threads, but special constructs are introduced to deal with non-blocking cooperative I/Os.
FairThreads and Loft actually belong to the so-called reactive approach [6], which stems from synchronous languages. One first compares synchronous languages and the reactive approach before describing several reactive programming languages.
Synchronous Languages vs. Reactive Programming
To the family of synchronous languages [19] belong several programming languages which all share the same notion of an instant, supposed to be of zero duration. In this context, the output of a program at a given instant is synchronous with its input; the synchronous characterization of these languages comes from this hypothesis.
Lustre and Signal are two data-flow synchronous languages in which one programs nets of operators, in a style very close to the one of section 3.4 (Data-Flow Programming). At the basis of these dataflow languages are the nets of processes of [24].
Following the StateCharts of D. Harel, several systems have been
proposed for synchronous graphical programming, among which are
SyncCharts and Argos.
The Esterel synchronous language [9] introduces broadcast events, called signals, in an imperative style. However, in Esterel, the absence of an event can be decided immediately, and consequences of this absence can take place in the very same instant. This leads to "causality problems", as for example in present S else emit S end. In this program, indeed, the signal S is emitted if it is absent during the instant, which is of course contradictory.
As opposed to Esterel, in reactive programming the absence of an event during one instant cannot be decided before the end of this very instant. As a consequence, the reaction to the absence of an event is delayed to the next instant. This is a way to solve the causality problems which are obstacles to modularity.
All synchronous languages are static in the sense that they disallow the creation at run time of new parallel components and of new events. Moreover, they do not give users any way to deal with multiprocessor machines. However, synchronous languages generally put the focus on hardware circuits, in which dynamic creation does not appear. This is a major difference with reactive programming, which is limited to software systems.
A major claim of synchronous languages is that they make possible
proofs and validations of programs. This is a consequence of the
existence of a formal semantics and of the static characteristics of
these languages. Program proofs and validations have not yet been considered in reactive programming.
The Reactive-C [11] language was the first proposal for reactive programming in C; in this respect, Loft can be considered as a descendant of it. Reactive-C proposes instants and the merge operator, which implements deterministic parallelism. A reaction of the instruction merge i1 i2 consists in a reaction of i1 followed, in the same instant, by a reaction of i2. Thus, one has a deterministic left-right operator with which parallelism can be introduced at any level. Reactive-C does not define events, as they are defined in Loft. Reactive-C is basically implemented as a straightforward preprocessor of C.
Reactive-C introduces primitives for remote execution of reactive programs over the network. Remote execution is based on the use of the Remote Procedure Call (RPC) mechanism. Distribution is not yet possible in either Loft or FairThreads.
Reactive Programming in Java
The reactive approach has been implemented in Java in several ways. The first one is the SugarCubes [13] framework, which is a set of classes for reactive programming in Java. Several related formalisms have been designed, among which are Junior [21] and Rejo [7].
The definition of Rewrite and Replace in section 7 (Implementations) is strongly linked to the implementations with the same names of the Java Junior framework [22]. The use of waiting queues first appeared in an implementation of Junior proposed by Laurent Hazard to deal with large numbers of parallel components and events. The Direct semantics is strongly inspired by it.
Rejo is a language proposed for high-level reactive programming based on Java. Rejo is implemented on top of Junior and thus can benefit from the various implementations of it. As opposed to SugarCubes and Junior, standard Java code can be freely mixed with reactive code. Rejo also introduces primitives for migration of reactive code, implemented using the serialization and RMI mechanisms of Java.
9.3 Approaches based on Decomposition
Chores [16] and filaments [27] are small pieces of code that do not have a private stack and are never preempted. Chores and filaments are designed for fine-grain parallel programming on shared-memory machines. Chores and filaments always run to completion and can be neither suspended nor resumed. Generally, a pool of threads is devoted to executing them. Chores and chunk-based techniques are described in detail in the context of the Java language in [15] and [23]. Automata in FairThreads are close to chores and filaments, but give programmers more freedom for direct coding of state-based algorithms. Automata are also related to mode automata [28], in which states capture the notion of a running mode, in the context of the synchronous language Lustre [19].
Cohorts and Staged Computation
Cohort scheduling [26] dynamically reorganizes series of computations on items in an input stream, so that similar computations on different items execute consecutively. Staged computation intends to replace threads. In the staged model, a program is constructed from a collection of stages, and each stage has scheduling autonomy to control the order in which operations are executed. Stages are thus very close to the instants of FairThreads, and cohort scheduling looks very much like cooperative scheduling. In the staged model, emphasis is put on the way to exploit program locality by grouping similar operations in cohorts that are executed at the same stage; in this way, cohorts and staged computations fall into the family of data-flow models.
10 Conclusion
One has defined Loft, a language which extends C to concurrent and parallel programming in a shared memory context. The main questions to which Loft proposes answers can be listed as follows: how can synchronous and asynchronous executions be mixed together in a unified framework? How can system architectures which are basically cooperative benefit from the parallelism offered by multiprocessor machines? How can concurrency and determinism be reconciled? What compromises should be accepted in order to get a powerful enough programming language, while staying in a simple and relatively safe context?
The answers in Loft rely on several aspects related to the underlying programming model, to semantics concerns, to implementation issues, and to design choices.
In Loft, the units of concurrency are threads, which are defined as instances of modules. There are basically two kinds of threads: threads which are linked to a scheduler and threads which are unlinked. A thread cannot be linked simultaneously to several schedulers, but it can link to distinct schedulers during its execution.
All the threads linked to the same scheduler are run in a cooperative mode, under the supervision of the scheduler. Moreover, they share events. The threads are run cyclically until all have either cooperated or reacted to events which are present. The current instant is finished when a stable situation is reached, in which all the events which are not present can be safely considered as absent. The scheduler can then proceed to the next instant, after having incorporated in the system the orders sent to it during the instant. These orders can be orders to stop, suspend, or resume threads, or to add newly created threads.
Several schedulers can be simultaneously present in the same
Loft
program. Each scheduler defines a synchronous area in which threads
run at the same pace, because they share the same instants, and in
which events are broadcast, because all threads have exactly the same
information on the presence and the absence of events.
It is possible for some threads, instances of native modules, to run autonomously, unlinked from any scheduler. In this case, these threads behave exactly as standard native threads scheduled by the preemptive operating system. However, unlinked threads still have the possibility to link to a scheduler, and then to behave as cooperative threads. Of course, they can later return to the unlinked state if needed.
Several advantages of the programming model of
Loft are to be
pointed out:
- Loft offers a unified framework in which both cooperative and preemptive approaches can be used jointly. Actually, in Loft, users have control over the way threads are scheduled. Loft provides users with programming primitives allowing threads to dynamically link to schedulers and to dynamically unlink from them. This framework is flexible, as the structure of schedulers can evolve dynamically. It actually proposes N:M architectures, with the advantage that there exists a syntax to deal with them (as opposed, for example, to the threads of Sun's Solaris system).
- Cooperative systems can be coded in a simple way, without the need of locks to protect data. Moreover, instants give automatic synchronizations that can greatly simplify programming in certain situations. Linked threads have a precise and clear semantics. The point is that systems exclusively made of threads linked to one unique scheduler are completely deterministic. This simplifies reasoning about programs. It also makes debugging easier. Thus, in the cooperative context, one gets a simple language for concurrent programming which is deterministic and has a precise semantics.
- When a thread unlinks from a scheduler, it becomes an autonomous native thread which can be run in real parallelism, on a distinct processor. Schedulers can also be executed by native threads, on distinct processors. Thus, Loft gives the user access to multiprocessing in a very simple way.
Loft has a formal semantics. Actually, the semantics is an operational one which describes the steps that should be executed to reach a result. The presence of a formal semantics is not anecdotal, and is useful for at least two reasons: first, it gives the exact meaning of programs, which can be much more complex than in traditional programming, as a result of concurrency and of the new dimension introduced by instants. The purpose of the semantics is to define exactly what should be the execution of a program made of several concurrent threads along the passing of instants. Second, variants of the semantics are defined in order to model various aspects of the execution. More precisely, a series of semantics variants defines relatively small steps, starting from the basic Rewrite semantics and reaching the Direct semantics, which is designed to be extremely efficient. One can get strong confidence in each of these steps, and thus see the whole process as a way to validate the implementation. The presence of a formal semantics is in this respect a means for safer programming.
The language has several implementations. The first one is a direct translation in FairThreads, which is an API of threads closely related to Loft. The second implementation, Rewrite, which serves as a reference for the cooperative part of the language, is a direct translation of the rewriting rules of the semantics. The Replace implementation optimizes memory use, while the Optim implementation optimizes the waiting of events, to avoid busy-waiting, and can be used for embedded systems with few resources.
To sum up, the main characteristics of Loft are:
- Concurrent language based on C.
- Both cooperative and preemptive approaches mixable in the same
framework.
- Possibility of real parallelism (use of multiprocessors).
- Determinism of programs belonging to the cooperative part of
the language.
- Precise (formal) semantics defined for the cooperative part.
- Efficient implementation, usable for embedded systems.
Finally, one would say that the external syntax of
Loft could be
subject to change in the future. This should not be shocking as, after
all, syntax is rather secondary compared to the programming model and to
the semantics.
One now considers possible topics for future work.
In Loft, as in FairThreads, parallelism is only at top-level. Indeed, there is no parallel construct which can be used at any level, as in Esterel for example. Consider for example a statement of the form (A || B);C, where || is the parallel operator. In it, the parallel operator is not at the root of the syntax tree, but on the left branch of a sequence instruction. In Loft, one must use indirect constructions for expressing parallelism at arbitrary levels of nesting. The previous example could for instance be coded using join instructions to wait for the termination of both A and B, as shown below. A possible extension of Loft would be to introduce a deterministic parallel operator, as for example the one of Reactive-C. Other constructs could also be added to the language; for example, the until module of section 12.9.10 (Run) could be directly introduced in the language as a preemption instruction. One could also consider local events whose extent is defined syntactically by a language construct, as is the case in standard reactive programming, such as Junior.
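For instance, assuming three hypothetical modules A, B, and C, a possible Loft coding of (A || B);C is the following sketch:
module par_then_c ()
local thread_t ta, thread_t tb;
{
   local(ta) = A_create ();    /* fork an instance of A */
   local(tb) = B_create ();    /* fork an instance of B */
}
join (local(ta));              /* wait for the termination of A */
join (local(tb));              /* wait for the termination of B */
run C ();                      /* then, and only then, execute C */
end module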
A rather natural extension of Loft would be to interface it with
the network. Thus, distributed synchronous areas could be defined,
running on distinct machines. The difficulty is to give a way for a
thread to link to a distant machine. This is a kind of thread
migration that would open programming to systems based on agents.
Some open problems are raised by such an approach: which threads exactly can migrate, and how do they deal with the memory? What is the syntax for migration? How can migrating threads dynamically use the resources of a distant site? How can a site be protected from dangerous migrating threads (for example, non-cooperating ones)?
Loft could be rather easily extended to C++. This would simplify
the code produced by the Loft translator as classes encapsulating
the local data of threads could be generated. Of course, other target
languages could also be considered.
As the other formalisms for reactive programming, Loft and FairThreads are available under the GNU public license at http://www.inria.fr/mimosa/rp.
11 Annex A: API of FairThreads
ft_scheduler_t ft_scheduler_create (void);
ft_thread_t ft_thread_create (ft_scheduler_t scheduler,
void (*runnable)(void*),
void (*cleanup)(void*),
void *args);
ft_thread_t ft_automaton_create (ft_scheduler_t,
void (*automaton)(ft_thread_t),
void (*cleanup)(void*),
void *args);
ft_event_t ft_event_create (ft_scheduler_t scheduler);
int ft_scheduler_start (ft_scheduler_t scheduler);
int ft_scheduler_stop (ft_thread_t thread);
int ft_scheduler_suspend (ft_thread_t thread);
int ft_scheduler_resume (ft_thread_t thread);
int ft_scheduler_broadcast (ft_event_t event);
int ft_scheduler_broadcast_value (ft_event_t event,void *value);
int ft_thread_cooperate (void);
int ft_thread_cooperate_n (int num);
int ft_thread_join (ft_thread_t thread);
int ft_thread_join_n (ft_thread_t thread,int timeout);
int ft_thread_generate (ft_event_t event);
int ft_thread_generate_value (ft_event_t event,void *value);
int ft_thread_await (ft_event_t event);
int ft_thread_await_n (ft_event_t event,int timeout);
int ft_thread_select (int len,ft_event_t *array,int *mask);
int ft_thread_select_n (int len,ft_event_t *array,int *mask,int timeout);
int ft_thread_get_value (ft_event_t event,int num,void **result);
int ft_thread_link (ft_scheduler_t scheduler);
int ft_thread_unlink (void);
Current Thread and Scheduler
ft_thread_t ft_thread_self (void);
ft_scheduler_t ft_thread_scheduler (void);
int ft_thread_mutex_lock (pthread_mutex_t *mutex);
int ft_thread_mutex_unlock (pthread_mutex_t *mutex);
pthread_t ft_pthread (ft_thread_t thread);
AUTOMATON(name)
DEFINE_AUTOMATON(name)
BEGIN_AUTOMATON
END_AUTOMATON
STATE(num)
STATE_AWAIT(num,event)
STATE_AWAIT_N(num,event,delay)
STATE_GET_VALUE(num,event,n,result)
STATE_STAY(num,delay)
STATE_JOIN(num,thread)
STATE_JOIN_N(num,thread,delay)
STATE_SELECT(num,n,array,mask)
STATE_SELECT_N(num,n,array,mask,delay)
GOTO(num)
GOTO_NEXT
IMMEDIATE(num)
RETURN
SELF
SET_LOCAL(data)
LOCAL
ARGS
RETURN_CODE
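As an illustration, here is a minimal producer/consumer program written with this API. It is only a sketch: the header name fthread.h, and the use of pthread_exit to keep the process alive after ft_scheduler_start, are assumptions.
#include <stdio.h>
#include <pthread.h>
#include "fthread.h"   /* assumed header name for the FairThreads API */

ft_event_t evt;

void producer (void *args)
{
   int i;
   for (i = 0; i < 3; i++) {
      ft_thread_generate (evt);   /* broadcast evt to the scheduler's threads */
      ft_thread_cooperate ();     /* finish the current instant */
   }
}

void consumer (void *args)
{
   int i;
   for (i = 0; i < 3; i++) {
      ft_thread_await (evt);      /* blocks until evt is present */
      printf ("event received\n");
      ft_thread_cooperate ();
   }
}

int main (void)
{
   ft_scheduler_t sched = ft_scheduler_create ();
   evt = ft_event_create (sched);
   ft_thread_create (sched, producer, NULL, NULL);
   ft_thread_create (sched, consumer, NULL, NULL);
   ft_scheduler_start (sched);   /* the scheduler reacts cyclically from now on */
   pthread_exit (NULL);          /* let the scheduler and its threads run on */
}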
12 Annex B: LOFT Reference Manual
Loft is an extension of C for concurrent and parallel programming.
Basic units of concurrency are threads which are created as instances
of modules and are run under the control of schedulers. Threads can
synchronize and communicate using broadcast events. Threads linked to
the same scheduler run in a cooperative way, leading to deterministic
systems. Threads can dynamically change their linking to
schedulers. Moreover, special native threads have the possibility to stay unlinked from any scheduler, thus becoming autonomous. Unlinked threads can be executed in real parallelism.
Loft code is a mix of standard C code and of modules. Loft code is first preprocessed (typically by the gcc -E command), then translated into C, and finally compiled by the C compiler in order to get executable code.
Modules are templates from which threads, called instances, are defined. Module definitions start with the keyword module and terminate with end module. The syntax of modules is:
module_definition:
module kind module_name (params)
locals
inst
final
end module
A module is native when the keyword native follows module. Native modules should only be used in programs run by a preemptive operating system (Linux, for example). Instances of a native module must be implemented as kernel threads. Instances of non-native modules do not need specific kernel threads to execute (they can use the one of the scheduler to which they are linked).
kind: native | /*empty*/
The module name is an identifier or the keyword main (this case is considered in section 12.1.2, Main Module).
module_name: identifier | main
Parameters in params and variables defined in locals are considered in section 12.2 (Variables).
The body inst of the module is an instruction (usually, a sequence of instructions) which is executed by the thread instances of the module. Instructions are considered in section 12.6 (Instructions).
The finalizer is an optional atomic instruction which is executed in case of forced termination; the thread executing finalizers is left unspecified.
finalizer: finalizer atomic | /*empty*/
Modules defined in other files, or later in the same file, can be declared using clauses of the form:
extern_modules: module identifier, ... , identifier ;
A program containing a module named main can be directly executed. The parameters of main must follow the argc/argv conventions of C. The translation of the main module first creates a new instance of the module and an implicit scheduler, then adds the new thread to the implicit scheduler, and finally makes the implicit scheduler react cyclically.
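As a minimal illustration (a hypothetical program, not taken from the distribution), a complete Loft program can thus consist of a single main module:
#include <stdio.h>

module main (int argc, char **argv)
{
   printf ("instant 1\n");     /* atomic C code, executed at the first instant */
}
cooperate;
{
   printf ("instant 2\n");     /* executed at the second instant */
}
end module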
When no main module is defined, a C function named main must be defined, as usual in C. In this case, it must call the function loft_initialize in order to create the implicit scheduler.
loft_initialize: loft_initialize () ;
There are 3 distinct kinds of variables:
- Global variables, declared outside module definitions. These variables are the standard global variables of C.
- Automatic variables, declared in blocks of C instructions (of the form {...}). These variables can appear in declarations of C functions, or in atomic instructions, in module definitions. Automatic variables are allocated when the control enters the block in which they are declared, and destroyed when the block is left.
- Parameters and local variables, declared in modules. Each instance of a module has its own copy of these variables. Their values are preserved during the passing of instants.
Thus, concerning memory management, Loft basically differs from C by the presence of variables of kind 3, which are local to threads and maintain their values between instants.
12.2.1 Types of Local Variables and Parameters
Types of parameters and local variables can be:
- Basic types (for example, int).
- Named types (defined using typedef).
- Pointer or double pointer types on the previous types.
params: defvar, ..., defvar | /*empty*/
defvar:
type identifier
| type * identifier
| type * * identifier
type: basic_type | identifier
locals: local defvar, ... , defvar | /*empty*/ ;
Remark: the restriction on the types of parameters and local variables
is introduced to simplify the implementation and should be removed in
future versions of the language.
12.2.2 Access to Local Variables and Parameters
Local variables and parameters must always be explicitly tagged by the keyword local, to distinguish them from global or automatic variables with the same names: all uses of a local variable or parameter x must be of the form local(x).
Threads are concurrent entities created from modules and run by schedulers. Threads are of the type thread_t.
New thread instances of a module m are returned by the C function m_create, which has the same parameters as m. The new threads are linked to the implicit scheduler. With the function m_create_in, newly created threads are linked to a scheduler given as first parameter (the other parameters are passed to the thread in the standard way).
thread_creation:
identifier_create (exp, ... , exp)
| identifier_create_in (exp, exp, ... , exp)
Threads linked to a scheduler during one instant start running only at
the beginning of the next instant.
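For example, using the hypothetical count module sketched above:
scheduler_t sched = scheduler_create ();
thread_t t1 = count_create (1);            /* linked to the implicit scheduler */
thread_t t2 = count_create_in (sched, 2);  /* linked to sched */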
The state of a thread is composed of two parts: the execution state
and the binding state. The possible values for the execution state of
a thread are:
- Ready to execute.
- Terminated. A thread becomes terminated when it has finished executing its body. A thread can also be forced to terminate when it is stopped.
- Suspended. A suspended thread must be resumed before it can continue to execute.
The possible values for the binding state are:
- Linked to a scheduler. When linked to a scheduler, a thread must cooperate with the other threads linked to the same scheduler, and its execution is under the control of the scheduler.
- Unlinked. This is only possible for instances of native modules. Unlinked threads are autonomous and run at their own pace.
Initially, that is when created, a thread is ready to execute and
linked to the scheduler in which it is created.
Unlinked threads are supposed to be executed by kernel threads. An unlinked thread runs autonomously, until it reaches a link instruction which changes its linking state.
A linked thread resumes execution of its body when it receives the control from the scheduler to which it is linked. As the scheduling is cooperative, the resumption is always supposed to finish after a finite delay. At the end of the resumption, the thread returns the control back to the scheduler and signals it with one of the following values:
- Terminated. The thread terminates when it completes the last instruction it has to run.
- Blocked. The thread is blocked either because it is awaiting an event whose presence is still unknown (await), or because it tries to get an event value which is not available (get_value).
- Cooperating. The thread is not terminated but has finished executing for the current instant.
12.3.4 Return Code and Self
Threads have a return code set during the execution of certain non-atomic instructions (namely, await, join, and get_value). The return code can be OK, ETIMEOUT, or ENEXT. The return code of the executing thread, set by the last executed non-atomic instruction, is returned by the return_code function:
thread_return_code: return_code ()
The executing thread is returned by the self function:
self: self ()
12.3.5 Thread Deallocation
The memory allocated to a thread can be recovered using the thread_deallocate function:
thread_deallocation: thread_deallocate (exp) ;
It is the programmer's responsibility to perform correct deallocation of threads.
A scheduler defines instants during which all ready threads linked to it are run in a cooperative way. Schedulers are of type scheduler_t. New schedulers can be created with the function scheduler_create:
scheduler_creation: scheduler_create ()
An implicit scheduler exists in every program. It is returned by the function implicit_scheduler.
During execution, a linked thread can have access to the current scheduler running it, with the function current_scheduler.
scheduler_access:
implicit_scheduler ()
| current_scheduler ()
A phase of a scheduler consists in giving the control, in turn, to all the ready threads which are linked to it. The order in which the threads receive the control is the order in which they have been linked to the scheduler. During an instant, the scheduler cyclically executes a series of phases, up to a situation where no thread remains blocked. Then, the scheduler decides that the current instant is finished, and it can proceed to the next instant. Thus, at the end of each instant, all ready threads have either terminated or cooperated.
The scheduler decides that the current instant is finished when, at
the end of a phase, there is no possibility for any blocked thread to
progress: no awaited event has been generated, and no new value has
been produced which could unblock a thread trying to get it.
One instant of a scheduler is executed by the scheduler_react function.
scheduler_react: scheduler_react (exp) ;
In scheduler_react(exp), exp should evaluate to a scheduler previously created by scheduler_create.
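For example, when no main module is defined, a hand-written C main function could drive the implicit scheduler as follows (a sketch; count is the hypothetical module used above):
int main (int argc, char **argv)
{
   loft_initialize ();            /* creates the implicit scheduler */
   count_create (1);              /* the instance starts at the next instant */
   while (1) scheduler_react (implicit_scheduler ());
   return 0;
}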
12.4.2 Scheduler Deallocation
The memory allocated to a scheduler can be recovered using the scheduler_deallocate function:
scheduler_deallocation: scheduler_deallocate (exp) ;
It is the programmer's responsibility to perform correct deallocation of schedulers.
Events are used by threads to synchronize and to communicate. Events are of type event_t. An event is always created in a specific scheduler, which is in charge of it throughout its lifetime. The function event_create returns an event created in the implicit scheduler. The function event_create_in returns an event created in the scheduler given as argument.
event_creation:
event_create ()
| event_create_in (exp)
At each instant, events have a presence status which can be present, absent, or unknown. All events managed by a scheduler are automatically reset to unknown at the beginning of each new instant (thus, events are non-remanent data). All events which are unknown become absent when the scheduler decides to terminate the current instant (see section 12.4.1, Instants).
There are two ways for an event to be present during one instant:
- An order to generate it was received by the scheduler during the execution of the previous instant, coming from a thread which is not linked to the scheduler.
- It is generated during the current instant by one of the threads linked to the scheduler.
One can associate values to generations of events. All values associated to the generations of the same event during one instant are collected in a list, as they are produced. They are available only during the current instant, using the get_value instruction (section 12.9.6, Get Value).
generate:
generate (exp) ;
| generate_value (exp, exp) ;
The execution of generate(exp) starts by the evaluation of exp, which should return an event e. If the executing thread is unlinked, or if the scheduler sched in charge of e (the one in which e has been created) is different from the one of the thread, then the order to generate e at the beginning of its next instant is sent to sched. Otherwise, e is immediately generated in sched.
The execution of generate_value(exp,exp) starts by the evaluation of the two expressions in argument. The first one should return an event e, and the second one a value v of a pointer type (void*). The execution is the same as the one of the previous call, except that v is added as last element to the list of values associated to e at the instant where e is generated.
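For instance, a thread can associate a value with each generation of an event (a sketch; the module emitter is hypothetical):
module emitter (event_t e, char *msg)
while (1) do
   generate_value (local(e), (void*)local(msg));   /* one value per instant */
   cooperate;
end
end module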
12.5.2 Event Deallocation
The memory allocated to an event can be recovered using the event_deallocate function:
event_deallocation: event_deallocate (exp) ;
It is the programmer's responsibility to perform correct deallocation of events.
Instructions are binding instructions, atomic instructions, non-atomic instructions, or sequences of instructions:
inst:
bind
| atomic
| non_atomic
| inst ... inst
A thread starts to execute a component of a sequence as soon as the
execution of the previous component is completed.
12.7 Binding Instructions
There are two binding instructions:
bind:
unlink ;
| link (exp) ;
The unlink instruction should only appear in native modules. It has no effect if the executing thread is already unlinked. Otherwise, the thread executing the instruction returns the control back to the scheduler, signaling it that it cooperates. In this case, the thread is removed from the scheduler, which will thus never consider it again (except, of course, if the thread re-links to it later).
As soon as the thread has returned the control back to the scheduler, it starts running in an autonomous way.
A thread executing link(exp) first evaluates exp, which should return a scheduler sched. If the thread is already linked, it unlinks from the scheduler to which it is linked, and then waits for sched to give it the control. If the thread is unlinked, it just waits for the control from sched. In all cases, sched will resume execution of the thread, as it does for newly created threads, at the beginning of the next instant.
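For example, a native module can unlink in order to perform a blocking C call without freezing the other threads of its scheduler, and relink afterwards (a sketch; read_data is a hypothetical blocking function):
module native reader ()
local scheduler_t sched;
{
   local(sched) = current_scheduler ();   /* remember the current scheduler */
}
unlink;                                   /* become autonomous */
{
   read_data ();                          /* blocking call, run in parallel */
}
link (local(sched));                      /* return to cooperative mode */
end module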
Atomic instructions have the form of blocks of C code or of C function calls. They can be executed by linked and unlinked threads, with the same semantics, which is actually the one they have in standard C. However, when executed by a linked thread, the execution of an atomic instruction is instantaneous, which means that it terminates in the same instant it is started.
atomic:
{c-code}
| identifier (exp,..., exp) ;
| order
The previous calls of functions to create and manage threads,
schedulers, and events are considered as special cases of atomic
instructions.
Orders are given to schedulers to stop, suspend, or resume threads:
orders:
stop (exp) ;
| suspend (exp) ;
| resume (exp) ;
All orders received by a scheduler during one instant are
systematically processed at the beginning of the next instant (to
avoid interference with executing threads) in the order in which they
are issued.
The execution of stop(exp) starts by the evaluation of exp, which should return a thread th. The effect of the call is undefined if th is unlinked. Otherwise, the order to terminate th is sent to the scheduler to which it is linked.
The execution of suspend(exp) starts by the evaluation of exp, which should return a thread th. The effect of the call is undefined if th is unlinked. Otherwise, the order to suspend th is sent to the scheduler to which it is linked.
The execution of resume(exp) starts by the evaluation of exp, which should return a thread th. The effect of the call is undefined if th is unlinked. Otherwise, the order to resume th is sent to the scheduler to which it is linked. Resuming a thread which is not suspended has no effect.
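For example, a controlling thread can drive another thread through these orders (a hypothetical sketch; each order takes effect at the beginning of the next instant):
module controller (thread_t t)
repeat (10) do cooperate; end
suspend (local(t));          /* t is frozen from the next instant on */
repeat (10) do cooperate; end
resume (local(t));           /* t resumes at the next instant */
repeat (10) do cooperate; end
stop (local(t));             /* t is forced to terminate */
end module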
12.9 Non-atomic Instructions
Non-atomic instructions should only be executed by linked threads; their semantics is undefined when they are executed by unlinked threads. The execution of non-atomic instructions can take several instants to complete.
non_atomic:
cooperate;
| halt;
| return;
| await (exp);
| await (exp, exp);
| join (exp);
| join (exp, exp);
| get_value (exp, exp, variable);
| if (exp) then_branch else_branch end
| while (exp) do inst end
| repeat (exp) do inst end
| run identifier (exp, ..., exp);
then_branch: then inst | /*empty*/
else_branch: else inst | /*empty*/
A thread executing cooperate returns the control to the scheduler, signaling it that it cooperates. If the thread regains the control in the future, it will resume execution in sequence of the cooperate instruction.
A thread executing halt returns the control to the scheduler, signaling it that it cooperates. If the thread regains the control in the future, it will re-execute the same halt instruction. Thus, halt blocks the thread forever, without ever terminating it.
A thread executing return returns the control to the scheduler, signaling it that it terminates. Thus, it will never regain the control again from the scheduler.
A thread executing await(exp) first evaluates exp. The expression should evaluate to an event e. If e is present, then the thread proceeds in sequence of the await instruction. Otherwise, if the current instant is finished (which means that e is absent), then the thread returns the control to the scheduler, signaling it that it cooperates. If the thread regains the control in some future instant, execution will restart at the same place, waiting for the same event e. If the current instant is not finished, then the thread returns the control to the scheduler, signaling it that it is blocked. When the thread regains the control, it will, as previously, continue to test the presence of e.
A thread executing await(exp,exp) first evaluates the two arguments. The first one should evaluate to an event e, and the second one to an integer value k. Then, the thread behaves as for the previous instruction, but the waiting of e is limited to at most k instants. If e is generated during the next k instants, then the return code of the thread is set to OK. Otherwise, if e is still absent at the end of the kth instant, then the thread returns the control to the scheduler, signaling it that it cooperates; moreover, the return code of the thread is set to ETIMEOUT. In this case, the thread will proceed in sequence when receiving the control back (just as the cooperate instruction does).
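For instance, a thread can watch for an event with a deadline and test the return code afterwards (a sketch; go_evt is a hypothetical global event):
module watchdog ()
await (go_evt, 100);                      /* wait at most 100 instants */
if (return_code () == ETIMEOUT) then
   { printf ("no go signal after 100 instants\n"); }
else
   { printf ("go!\n"); }
end
end module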
A thread executing join(exp) first evaluates exp. The expression should evaluate to a thread th. If th is terminated, then the thread proceeds in sequence of the join instruction. Otherwise, the thread behaves as if it was awaiting a special event generated by the termination of th.
A thread executing join(exp,exp) first evaluates the two arguments. The first one should evaluate to a thread th, and the second one to an integer value k. Then, the thread behaves as for the previous instruction, but the waiting of the termination of th is limited to at most k instants. The return code of the thread is set to ETIMEOUT if the limit is reached, and it is set to OK otherwise.
A thread executing get_value(exp,exp,var) first evaluates the two expressions in arguments. The first should return an event e, and the second an integer k. Then, the thread tests if there are at least k values generated for e during the current instant. If it is the case, the kth value is assigned to var, the return code of the thread is set to OK, and the thread proceeds in sequence. Otherwise, if the current instant is not finished, then the thread returns the control to the scheduler, signaling it that it is blocked. When the thread regains the control, it will, as previously, continue to test the existence of the kth value. If the current instant is finished, then the return code of the thread is set to ENEXT, and the thread returns the control to the scheduler, signaling it that it cooperates. When the thread regains the control, it will proceed in sequence (just as a cooperate instruction would do).
A thread executing if(exp) then i else j end first evaluates exp. If the evaluation returns true (a non-zero value), then the thread switches to i, else it switches to j. If the chosen branch is empty, the thread proceeds in sequence immediately.
A thread executing while(exp) do i end first evaluates exp. If the evaluation returns false (zero), then the thread proceeds in sequence of the loop. Otherwise, the thread behaves as if it was executing i, with one difference: when the execution of i terminates, exp is re-evaluated and the same process restarts: execution is left if the evaluation returns false, and otherwise i is re-executed. Thus, the thread cyclically executes i while exp is true, exp being evaluated at the first instant and, after that, only when i terminates.
A thread executing repeat(exp) do i end first evaluates exp. The evaluation should return an integer value k. If k is negative or equal to 0, then the thread proceeds in sequence of the loop. Otherwise, the thread behaves as if it was executing a sequence of k instructions i.
A thread th executing run m(e1,...,en) first creates a new instance of the module m with e1,...,en as arguments, and then joins the newly created thread. Moreover, the newly created thread is stopped if th is.
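For instance, executing a module twice in sequence is simply a matter of chaining two run instructions (step being a hypothetical module):
module twice ()
run step ();    /* create an instance of step and join it */
run step ();    /* then do it again */
end module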
13 Annex C: the LOFT Compiler
The Loft compiler is a one-pass compiler that translates Loft programs into C code and then compiles it with the standard C compiler. It runs under Linux and is called loft.
There are 5 main options, corresponding to the 5 implementations of the language; -ft is the default one. All files containing Loft code should have names terminated by .loft or .LOFT. Standard C files can be given to loft and are processed in the standard way.
As an example, the command loft -direct test.loft produces an executable a.out from the file test.loft.
There are other options:
-v to trace the actions performed by the compiler.
-debug like -v, but the actions are not performed.
-code to only produce the C code.
-c to only produce the .o code (like the standard C compiler).
-o to rename the executable (like the standard C compiler).
-version to get the current version of the compiler.
For example, loft test.loft -debug produces something which looks like:
/bin/loft/loft2ft test.loft > test.loft.c
gcc -I/include/loft -I/include/ft_v1.0 test.loft.c \
-L/lib/loft -lloft_ft \
-L/lib/ft_v1.0 -lfthread -lpthread
and loft -direct test.loft -debug produces:
/bin/loft/loft2c test.loft > test.loft.c
gcc -I/include/loft test.loft.c -L/lib/loft -lloft_c
The loft command actually uses the following processors and libraries:
- loft2c, loft2ft, and loft2sem, which are the commands producing the C code corresponding to -direct, to -ft, and to the 3 implementations of the semantics.
- libloft_c.a, libloft_ft.a, libloft_rewrite.a, libloft_replace.a, libloft_optim.a, and libloft_gba.a, which are the various libraries for execution.
Some shell variables can be redefined for the command loft:
- C_COMPILER to set a different C compiler.
- LOFT_BINDIR, LOFT_INCLDIR, and LOFT_LIBDIR to change the directories where the executables, include files, and libraries used by loft are placed.
- FT_INCLDIR and FT_LIBDIR to change the directories where the include file and the library needed by the -ft option are placed.
Glossary
Atomic instruction: An atomic instruction runs in one shot and completes in the very instant it starts to execute.
Broadcast: Events are broadcast: all threads linked to the scheduler that manages an event see its presence or absence in exactly the same way.
Context switch: The passage of the control from one thread to another one. The local data and the stack associated with the thread which is left have to be stored.
Non-blocking I/O: An input/output operation which does not block the other threads until it terminates.
Cooperative scheduling: The strategy of a scheduler in which all threads are supposed to leave the control in a friendly way after a certain time.
Deadlock: A situation where two or more threads are blocked, each one owning a lock needed by another one to proceed.
Event: Events are non-persistent data used by threads to synchronise and to communicate. Events are managed by schedulers and at each instant they are either present or absent.
Finalizer: The finalizer of a thread is an atomic instruction which is executed (by another thread) when the thread is stopped while it is not terminated.
Instant: In one instant, a scheduler lets all the threads linked to it execute until they cooperate.
Instantaneous loop: A loop which cyclically executes its body during one single instant, and which thus does not cooperate with the other threads.
Kernel thread: Kernel threads are the basic units of concurrency in standard preemptive OS. They are not always available, for example in embedded systems with limited resources.
Linked thread: A cooperative thread which is linked to a scheduler, and which, thus, executes at the pace defined by the scheduler.
Lock: A way to restrict accesses to a data item: a thread that wants to access a locked data item blocks until it is unlocked.
Module: Modules are templates from which instances, called threads, are constructed.
Multiprocessing: The use of several processors.
Native module: Native modules are modules whose instances must be implemented by kernel threads.
Native thread: Native threads are instances of native modules. They are implemented as kernel threads.
Non-atomic instruction: Execution of a non-atomic instruction can take several instants to complete.
Preemptive OS: An operating system which has the ability to withdraw a processor from a thread or a process which is using it. Actually, all modern OS are preemptive.
Preemptive scheduling: The strategy of a scheduler having the ability to force a running thread to leave the control, in order to give it to another thread.
Scheduler: Schedulers control the threads linked to them in a cooperative and fair way.
Synchronous area: A set of threads, all linked to the same scheduler and thus running synchronously at the same pace and sharing the same broadcast events.
SMP: Stands for Symmetric Multi-Processor. An architecture of a multiprocessor machine in which the assignment of the processors to the threads is highly dynamic and depends on the operating system, and only on it.
Thread: Threads are the basic units of concurrency in the language. They are instances of modules.
Unlinked thread: A thread instance of a native module which is not linked to any scheduler. Thus, an unlinked thread runs asynchronously. An unlinked thread is implemented as a kernel thread.
Bibliography
[1] Bigloo Web Site.
[2] CAML Web Site.
[3] Java Web Site.
[4] LinuxThreads Web Site.
[5] Next Generation POSIX Threading Web Site.
[6] Reactive Programming Web Site.
[7] Acosta-Bermejo, Raul -- Rejo - Langage d'objets réactifs et d'agents -- Thèse de doctorat de l'ENSMP, October 2003.
[8] Arnold, Ken and Gosling, James -- The Java Programming Language -- Addison-Wesley, 1996.
[9] Berry, G. and Gonthier, G. -- The Esterel Synchronous Programming Language: Design, Semantics, Implementation -- Science of Computer Programming, 19(2), 1992, pp. 87-152.
[10] Boudol, G. and Berry, G. -- The Chemical Abstract Machine -- Theoretical Computer Science, 96, 1992.
[11] Boussinot, F. -- Reactive C: An Extension of C to Program Reactive Systems -- Software Practice and Experience, 21(4), April 1991, pp. 401-428.
[12] Boussinot, F. -- Java Fair Threads -- Inria research report, RR-4139, 2001.
[13] Boussinot, F. and Susini, J-F. -- The SugarCubes tool box - a reactive Java framework -- Software Practice and Experience, 28(14), December 1998, pp. 1531-1550.
[14] Brinch Hansen, P. -- The Origin of Concurrent Programming -- Springer, 2002.
[15] Christopher, Thomas W. and Thiruvathukal, George K. -- High Performance Java Platform Computing: Multithreaded and Networked Programming -- Sun Microsystems Press Java Series, Prentice Hall, 2001.
[16] Eager, Derek L. and Zahorjan, John -- Chores: Enhanced run-time support for shared memory parallel computing -- ACM Transactions on Computer Systems, 11(1), 1993.
[17] Engelschall, Ralf S. -- Portable Multithreading -- Proc. USENIX Annual Technical Conference, San Diego, California, 2000.
[18] Gardner, M. -- Lucky Numbers and 2187 -- Math. Intell., 19(26), 1997.
[19] Halbwachs, Nicolas -- Synchronous Programming of Reactive Systems -- Kluwer Academic Publishers, New York, 1993.
[20] Harel, D. and Pnueli, A. -- On the Development of Reactive Systems -- NATO ASI Series, Springer, 1985.
[21] Hazard, L. and Susini, J-F. and Boussinot, F. -- The Junior reactive kernel -- Inria Research Report, RR-3732, July 1999.
[22] Hazard, L. and Susini, J-F. and Boussinot, F. -- Programming with Junior -- Inria Research Report, RR-4027, 2000.
[23] Hollub, A. -- Taming Java Threads -- Apress, 2000.
[24] Kahn, G. and MacQueen, D.B. -- Coroutines and Networks of Parallel Processes -- Proceedings of the IFIP'74 Congress, 1974.
[25] Keppel, D. -- Tools and Techniques for Building Fast Portable Threads Packages -- Technical Report UWCSE 93-05-06, University of Washington, 1993.
[26] Larus, James R. and Parkes, Michael -- Using Cohort Scheduling to Enhance Server Performance -- Proc. of the USENIX Conference, Monterey, California, 2002, pp. 103-114.
[27] Lowenthal, David K. and Freeh, Vincent W. and Andrews, Gregory R. -- Efficient Support for Fine-Grain Parallelism on Shared-Memory Machines -- TR 96-1, University of Arizona, 1996.
[28] Maraninchi, F. and Remond, Y. -- Running-Modes of Real-Time Systems: A Case-Study with Mode-Automata -- Proc. 12th Euromicro Conference on Real-Time Systems, Stockholm, Sweden, 2000.
[29] Nichols, B. and Buttlar, D. and Proulx Farrell, J. -- Pthreads Programming -- O'Reilly, 1996.
[30] Plotkin, G. -- A Structural Approach to Operational Semantics -- Aarhus University Report DAIMI FN-19, 1981.
[31] Reppy, John H. -- Concurrent Programming in ML -- Cambridge University Press, 1999.