1 Introduction
1.1 Concurrent Programming
Concurrent programming is the use of several programs running together in the same application. It is opposed to traditional sequential programming, in which there is one unique program which thus coincides with the whole application. Concurrent programming is usually considered to increase modularity, as one can often naturally decompose a complex application into several cooperating programs which can be run concurrently. This is for example the case of servers, in which each user request can quite logically be mapped to a specific program processing it. With concurrent programming, there is no longer any need to code a complex application made of many distinct parts as one single program, as is the case in sequential programming.
There are two main variants of concurrent programming: one in which the programs, called threads, share a common memory; and one in which the programs are basically processes which have their own memory. The presence of a shared memory eases communication between threads, as threads can directly use shared data to communicate, while processes have to use specialized means, for example sockets, for the same purpose. The counterpart is that shared data may have to be protected from concurrent accesses by distinct threads. Of course, this is not necessary with processes, in which, by definition, data are fully protected from being accessed by other processes.
Parallelism is the possibility to simultaneously use several computing resources. It is available with multiprocessor machines or, more generally, when several machines are distributed over a network. It is well-known that parallelism is not the best solution in all cases, and there are systems which run more efficiently on a single-processor machine than on a multiprocessor one. Parallelism is now directly available, with no special effort, in standard OSes such as Linux, which can operate hardware based on Symmetric Multi-Processor (SMP) architectures. Parallelism appears quite natural in concurrent programming, as there is generally no logical need for concurrent programs to be run by one unique processor.
Concurrent programming is widely recognized as being, in the general case, much more difficult than sequential programming. Some problems, such as deadlocks, are specific to it. Moreover, concurrent programs are most of the time nondeterministic, and thus more difficult to debug. For these reasons, concurrent programming is often considered as an activity which must be left to specialists. This seems of course largely justified in domains such as the programming of OS kernels. But the question remains of designing a set of concurrency primitives which are not too difficult or painful to use, and of providing general users with it.
Concurrency at Language Level
Most of the time, programming languages do not offer any primitive for concurrent programming. This is the case of C and C++, in which concurrent programming is only available using system calls (for example fork or exec) or through thread libraries (for example, the pthread library). Parallelism is also, most of the time, not covered by programming language primitives. The way to benefit from parallelism is either to use threads in an SMP context, or to design distributed systems where programs (often considered as distributed objects) communicate and synchronize through the network.
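For example, a minimal use of the pthread library in C could look as follows (a sketch; error handling omitted):
#include <pthread.h>
#include <stdio.h>

/* code run by the new thread */
void *task (void *arg)
{
   printf ("hello from %s\n", (char *)arg);
   return NULL;
}

int main ()
{
   pthread_t t;
   pthread_create (&t, NULL, task, "a thread");   /* start the thread */
   pthread_join (t, NULL);                        /* wait for its termination */
   return 0;
}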
Amongst the languages that have primitives to deal with concurrency, some use message passing, such as Erlang; others propose "rendez-vous", such as Ada and Occam; and several, such as CAML, directly include threads. The most accomplished proposal of such an inclusion seems to be Java [8], in which threads are first-class objects and data can be protected using locks and monitors, as defined by E.W. Dijkstra in the sixties (see [14] for reference papers on concurrent programming). In Java, threads are supposed to be used in almost every context (the simplest applet involves several threads, one of them being for example dedicated to graphics). However, Java does not give particular means to ease thread programming, which remains difficult and error-prone (see [23] for a discussion on this point). Concerning parallelism, Java proposes the RMI mechanism to define distributed objects, whose methods can be invoked through the network. Remote method invocations basically use specialized threads dedicated to communications.
Synchronous languages [19] propose concurrent programming primitives in the special context of reactive systems [20]. In this context, which is roughly the one of embedded systems, concurrent programs are deterministic and can be formally verified. However, synchronous languages suffer from several limitations which restrict their use. Amongst these are the limitation to static systems, in which the number of concurrent programs is known at compile time, and to mono-processor architectures.
A proposal to extend the synchronous approach to dynamic systems (systems which are not static) is called the reactive approach. It is described in full detail in [6]. Several languages belonging to this approach have been designed; amongst them are Reactive-C [11], which implements the approach in C, and SugarCubes [13], which implements it in Java. Reactive programs can be implemented very efficiently as they are basically cooperative. However, up to now, they have not been able to get direct benefit from multiprocessor machines.
In this text, one proposes a language for concurrent programming inspired by the reactive approach and based on special threads, called fair threads. One major point is that the language contains primitives that allow multiprocessing. The rest of the section considers threads and presents the basics of the language proposal.
In sequential programming, a program is basically a list of
instructions run by the processor. The program has access to the
memory in which it can read and write data. Amongst the instructions
are tests and jumps which are the way to control the execution
flow. In this standard computing model, a
program-counter points to
the current instruction executed by the processor.
The model of threads extends the previous sequential model by allowing the presence of several programs, the threads, sharing the same memory. There are thus several program-counters, one per thread, and a new component, the scheduler, which is in charge of dispatching the processors to the threads. The transfer of a processor from one thread to another is called a context-switch, because it implies that the program-counter of the thread which is left must be saved, in order to be restored later, when the thread regains the processor. When several threads can be executed, the scheduler can choose arbitrarily between them, depending on implementation details, and it is thus generally impossible to predict which one will finally get the processor. This is a source of nondeterminism in multi-threaded programming.
The strategy of the scheduler is said to be preemptive when it is able to force a running thread to release the processor executing it, in which case one says that the thread is preempted. With a preemptive scheduler, situations where a non-cooperative thread prevents the others from any execution are not possible. A preemptive scheduler can withdraw a processor from a running thread for priority reasons (there exist threads with higher priorities in the system), or for other reasons. This is also a source of nondeterminism. In a preemptive context, to communicate or to synchronize generally implies the need to protect some shared data involved in the communication or in the synchronization. Locks are often used for this purpose, but they have a cost and are error-prone as they introduce possibilities of deadlocks, which are situations where two threads cannot progress because each one owns a lock needed by the other.
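A classical deadlock sketch with POSIX mutexes: each thread takes the two locks in opposite orders, and both may block forever (the sleep calls just make the bad interleaving likely):
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

void *t1 (void *arg)
{
   pthread_mutex_lock (&a);
   sleep (1);                   /* give t2 time to take b */
   pthread_mutex_lock (&b);     /* blocks forever if t2 owns b and waits for a */
   pthread_mutex_unlock (&b);
   pthread_mutex_unlock (&a);
   return NULL;
}

void *t2 (void *arg)
{
   pthread_mutex_lock (&b);
   sleep (1);                   /* give t1 time to take a */
   pthread_mutex_lock (&a);
   pthread_mutex_unlock (&a);
   pthread_mutex_unlock (&b);
   return NULL;
}

int main ()
{
   pthread_t th1, th2;
   pthread_create (&th1, NULL, t1, NULL);
   pthread_create (&th2, NULL, t2, NULL);
   pthread_join (th1, NULL);    /* never returns: both threads are deadlocked */
   return 0;
}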
Preemptive threads are generally considered to be well adapted for systems made of heavy-computing tasks needing few communications and synchronizations. This is typically the case of Web servers, in which a thread (usually picked out from a pool of available threads) is associated with each new user request. Advantages are clear: first, modularity is increased, because requests do not have to be split into several small parts, as would be the case with sequential programming. Second, servers can be run by multiprocessor machines without any change, and for example they can immediately benefit from SMP architectures. Finally, blocking I/Os, for example reading from a file, do not need special attention. Indeed, as the scheduler is preemptive, there is no risk that a thread blocked forever on an I/O operation will also block the rest of the system.
The strategy of the scheduler is said to be cooperative if the scheduler has no way to force a thread to release the processor it owns. With this strategy, a running thread must cooperate and release the processor by itself from time to time, in order to give the other threads the possibility to get it. Cooperation is of course absolutely mandatory if there is only one processor: a running thread which never released the processor would indeed forbid any further execution of the other threads. Cooperative threads, sometimes called green-threads, are threads that are supposed to run under the control of a cooperative scheduler. Cooperative threads are adapted to tasks that need a lot of communications between them. Indeed, data protection is no longer needed, and one can thus avoid the use of locks. Cooperative threads can be efficiently implemented at user level, but they cannot benefit from multiprocessor machines. Moreover, they need special means to deal with blocking I/Os. Thus, purely cooperative threads are not sufficient to implement important applications such as servers.
It is often claimed that reuse of sequential code is made easier by using threads. What is meant is that a sequential program can be easily embedded in one thread, and then added into a system to be run concurrently with the other programs present in it.
However, one generally cannot be sure that a sequential program never enters a loop in which actions leading to releasing the processor, such as I/Os, are never performed. Under a cooperative scheduler, such a non-cooperative program would block the whole system (supposing that the processor is unique). In a way, this is a major justification for preemptive scheduling: a non-cooperating program is no longer a problem because it can now be preempted by the scheduler.
This is however only one aspect of the problem, because a sequential program generally cannot be used directly as is, even in a preemptive context. The first reason is that some data may have to be protected from being accessed by other threads, which requires locks that were of course absent from the sequential code. A second reason is that in some cases, for example libraries, the sequential code must be made reentrant, because several threads can now make concurrent calls to it. Finally, the granularity of the execution, which is under the control of the scheduler and of it only, can be too large and can require the introduction of supplementary cooperation points in the initial code. In conclusion, it seems that in all cases reuse of sequential code is largely an illusion in the context of multi-threading.
Portability is very difficult to achieve for multi-threaded systems. In the broadest sense, portability means that a system should run in the same way, independently of the scheduling strategy. This is of course a quite impossible challenge in the general case: data should be protected, for the case of a preemptive scheduler, but cooperation points should also be introduced, for the case of a cooperative one, and the conjunction of these two requirements would lead to very inefficient code. Portability is thus often considered as a goal that can only be partially achieved with multi-threaded systems. Note that this is a major problem in Java, which claims to be portable but does not specify the scheduling strategy that should be used for multi-threaded applications.
Determinism and Debugging
The choices made by the scheduler are a major source of the nondeterminism which makes thread programming complex. In particular, debugging is more difficult than in sequential programming, because one may have to take into account the choices of the scheduler when tracking a bug. Moreover, the replay of a faulty situation may be difficult to achieve, as it may actually depend on the scheduler choices. For example, it may happen that the introduction of printing instructions to trace an execution changes the scheduling and makes the bug disappear.
Nondeterminism seems inherent to preemption, as it is very difficult to conceive a multi-threading framework that is both deterministic and preemptive. For the reader who is not convinced, imagine two threads running in a preemptive context, one which cyclically increments a counter, and one which prints the value of the same counter. What should be the (unique, because of determinism) printed value, and how can one simply find it?
However, it should be noted that it is possible to design completely deterministic frameworks based on cooperative threads, with a clear and simple semantics. This is actually the case of the fair threads which are introduced later.
The way resources are consumed is a major issue for threads. Several
aspects are relevant to this matter.
First, as any standard sequential program, each thread needs a private memory stack to hold its local variables and the parameters and intermediate results of executed procedures. The use of a stack is actually mandatory in the context of a preemptive scheduler, because the moments at which context-switches occur are independent of the thread, which thus has no possibility to save its local data before the release of the processor. Note that the situation is different in a cooperative context, where context-switches are predictable. The necessity of a private stack for each thread certainly contributes to wasting memory.
Second, user threads are often mapped to kernel threads which are under the supervision of the OS. Of course, this is only meaningful with a preemptive OS, in order to avoid the risk of freezing the whole machine when a non-cooperative thread is encountered. The need to perform context-switches at kernel level is generally costly in CPU cycles. Another problem that can occur with kernel threads is the limitation which generally exists on the number of such threads (typically, the number of allowed threads can vary on a Linux system from 256 to about 4 thousand). Several techniques can be used to bypass these problems, especially when large numbers of short-lived components are needed. Among these techniques are thread-pooling, to limit the number of created threads, and the decomposition of tasks into small pieces of code, sometimes called "chores" or "chunks", which can be executed in a simpler way than threads are.
Preemptive threads (threads designed to be run by a preemptive scheduler) are also subject to another problem: some unnecessary context-switches can occur, which are time consuming. Actually, threads have no way at all to prevent a context-switch. Unnecessary context-switches cause a loss of efficiency, but this is of course the price to pay for preemptive scheduling.
Actually, programming with threads is difficult because threads generally have very "loose" semantics, which strongly depend on the scheduling strategy in use. This is particularly true with preemptive threads. Moreover, the semantics of threads may also depend on many other aspects, for example the way thread priorities are mapped at the kernel level. As a consequence, portability of multi-threaded systems is very hard to achieve. Moreover, multi-threaded systems are most of the time nondeterministic, which makes debugging difficult.
Cooperative threads have several advantages over preemptive ones: it is possible to give them a precise and simple semantics which integrates the semantics of the scheduler; they can be implemented very efficiently; and they lead to a simple programming approach which limits the need for protecting data. However, purely cooperative systems are not usable as is, and preemption facilities have to be provided in one way or another. The question is thus to define a framework providing both cooperative and preemptive threads, and allowing programmers to use these two kinds of threads according to their needs.
Loft basically gives users the choice of the context, cooperative or preemptive, in which threads are executed. More precisely, Loft defines schedulers, which are cooperative contexts to which threads can dynamically link or unlink. All threads linked to the same scheduler are executed in a cooperative way, and at the same pace. Threads which are not linked to any scheduler are executed by the OS in a preemptive way, at their own pace. An important point is that Loft offers programming constructs for linking and unlinking threads. The main characteristics of Loft are:
- It is a language, based on C and compatible with it.
- Programs can benefit from multiprocessor
machines. Indeed, schedulers and unlinked threads can be run in real
parallelism, on distinct processors.
- Users can stay in a purely cooperative context by
linking all the threads to the same scheduler, in which case systems
are completely deterministic and have a simple and clear semantics.
- Blocking I/Os can be implemented in a very simple way, using
unlinked native threads.
- There exist instants shared by all the threads linked
to the same scheduler. Thus, all threads linked to the same scheduler
execute at the same pace, and there is an automatic synchronization at
the end of each instant.
- Events can be defined by users. They are
instantaneously broadcast to all the threads linked to a
scheduler; events are a modular and powerful means for threads to
synchronize and communicate.
- It can be efficiently implemented on various platforms.
Implementation of the whole language needs the full power of native threads.
However, the cooperative part of the language can be implemented without any
native thread support. This leads to implementations suitable for
platforms with few resources (PDAs, for example).
Loft is strongly linked to an API of threads called FairThreads [12]: actually, Loft stands for Language Over Fair Threads. This API mixes threads with the reactive approach, by introducing instants and broadcast events in the context of threads. Loft can be implemented in a very direct way by a translation into FairThreads (this implementation is described in section 7.1).
1.4 Structure of the Document
Section 2 contains the description of the language. First, an overview of Loft is given. Then, modules and threads are considered. Atomic and non-atomic instructions are described. Finally, native modules are considered. A first series of examples is given in section 3. Amongst them is the coding of a small reflex-game which shows the reactive programming style. The case of data-flow programming is also considered. Various sieves are coded, showing how the presence of instants can benefit code reuse. Section 4 contains a discussion of the programming style implicitly supposed by Loft. Section 5 contains the formal semantics of the cooperative part of Loft, by means of a set of rewriting rules. FairThreads is described in section 6. Several implementations of Loft are described in section 7. One consists of a direct translation into FairThreads. Another is a direct implementation of the semantics rules of section 5. One implementation is suited for embedded systems. Section 8 contains a second series of examples. Amongst them are some examples which can benefit from multiprocessor architectures. A prey/predator demo running on an embedded system (the Game Boy Advance of Nintendo) is also described. Related work is described in section 9, before the conclusion. A first annex gives the API of FairThreads. A second one is the Reference Manual of Loft. The third one is a quick description of the compiling command.
2 The Language LOFT
One first gives an overview of the language. Then, the basic
notions of modules, threads, schedulers, events, and atomic and non-atomic
instructions are presented. Finally, native modules are considered.
2.1 Overview of the Language
Modules are the basic units of the language. The syntax is based on C
and modules are defined in files that can also contain C code.
Threads are created from modules. A thread created from a module is
called an instance of it. A module can have parameters which
define corresponding parameters of its instances. Arguments provided
when an instance is created are associated to these parameters. A
module can also have local variables, new fresh instances of which are
automatically created for each thread created from it.
The body of a module is basically a sequence of instructions executed
by its instances. There are two types of instructions: atomic
instructions and non-atomic ones. Atomic instructions are run
in one single step by the same processor. Usual terminating C code belongs to this kind of instruction. Another example of an atomic instruction is the generation of an event, described later, which is broadcast to all the threads.
Execution of non-atomic instructions may need several steps up to
completion. This is typically the case of the instruction which waits
for an event to be generated. Execution steps of non-atomic
instructions are interleaved, and can thus interfere during
execution. Note that the execution steps of a non-atomic instruction are not necessarily all executed by the same processor.
The basic task of a scheduler is to control execution of the threads
that are linked to it. The scheduling is basically
cooperative: linked threads have to return the control to the
scheduler to which they are linked, in order to let other threads
execute. Leaving the control can be either explicit, with the cooperate instruction, or implicit, by waiting for an event which is not present.
All linked threads are cyclically considered in turn by the scheduler until all of them have reached a cooperation point (a cooperate or a waiting instruction). Then, and only then, a new cycle can start. Cycles are called instants. Note that the same thread can receive the control from the scheduler several times during the same instant; this is for example the case when the thread waits for an event which is generated by another thread later in the same instant. In this case, the thread receives the control a first time and then blocks, waiting for the event. The control goes to the other threads, and returns back to the first thread after the generation of the event.
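As a sketch (using the event constructs detailed in the next sections; go is an event assumed to be created in the implicit scheduler, and waiter is assumed to be created before emitter):
event_t go;

module waiter ()
   await (go);                   /* blocks: go is not yet present */
   printf ("got the event\n");
end module

module emitter ()
   generate (go);                /* makes go present for the whole instant */
end module
At the instant where both threads run, waiter is scheduled first and blocks on the await; emitter then generates go, and waiter is re-scheduled and prints during that very same instant.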
A scheduler defines a kind of automatic synchronization which forces the threads linked to it to run at the same pace: all the threads must have finished their execution for the current instant before the next instant can start.
The order in which threads are cyclically considered is always the
same: it is the order in which they have been linked to the
scheduler. This leads to deterministic executions, which are
reproducible. Determinism is of course an important point for
debugging.
At creation, each thread is linked to one scheduler. There exists an
implicit scheduler to which threads are linked by default when no
specific scheduler is specified. Several schedulers can be defined
and actually running in the same program. Schedulers thus define synchronized areas in which threads execute in cooperation.
Schedulers can run autonomously, in a preemptive way under the
supervision of the OS.
During their execution, threads created from special modules, called
native modules, can unlink from the scheduler to which they
are currently linked, and become free from any scheduler
synchronization. Such free threads are run by kernel threads of the
system. Of course, native modules and native threads have a meaning
only in the context of a preemptive OS.
Three kinds of variables exist in
Loft:
- Global variables that are C global variables shared by all the
threads. These variables are placed in the shared memory in which the
program executes.
- Local variables that are instances of local variables of
modules. These variables are local to threads. Their specific
characteristic is that their value is preserved across instants: they thus have to be stored in the heap.
- Automatic variables which are defined in atomic instructions,
and which are automatically destroyed when the atomic instruction in
which they appear is left. The value of an automatic variable is
not preserved from one instant to the next. As in C, automatic
variables can be stored in registers or in the stack.
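As a sketch (using the module syntax detailed in the next sections), the three kinds can be contrasted as follows:
int global_count = 0;            /* global: shared by all instances */

module demo ()
   local int my_count;           /* local: one copy per instance, kept across instants */
   {local(my_count) = 0;}
   while (1) do
      {
      int sum = global_count + local(my_count);   /* automatic: lost when the atom ends */
      printf ("%d\n", sum);
      global_count++;
      local(my_count)++;
      }
      cooperate;
   end
end module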
Note that non-native threads need a stack only for the execution of their atomic instructions. Indeed, atomic instructions cannot be preempted during execution, by definition. Thus, in this case, the stack is always in the same state at the beginning and at the end of the execution of each atomic instruction, and its content need not be saved when a context switch occurs. The situation is of course different for native threads: because they can be preempted arbitrarily by the OS, they need a specific stack, provided by the native thread in charge of running them.
The simplest way for threads to communicate is of course to use shared variables. For example, a thread can set a boolean variable to indicate that a condition is set, and other threads can test the variable to know the status of the condition. This basic pattern works if all threads accessing the variable are linked to the same scheduler. Indeed, in this case, atomicity of the accesses to the variable is guaranteed by the cooperativeness of the scheduler. A general way to protect data from concurrent accesses is thus to associate them with one unique scheduler, to which threads willing to access the data should first link.
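As a sketch, using the link instruction described later (data_sched and shared_counter are hypothetical names, assumed to be defined elsewhere):
module incr ()
   local scheduler_t save;
   {local(save) = current_scheduler ();}
   link (data_sched);            /* from now on, accesses are cooperative, thus atomic */
   {shared_counter++;}
   link (local(save));           /* return to the original scheduler */
end module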
However, this solution does not work if some of the threads are
non-native and belong to different schedulers or are unlinked. This is
actually the standard situation in concurrent programming, where
protection is basically obtained with locks. Standard POSIX mutexes
can be used for this purpose.
Events, described in the next section, give threads another means of communication.
Events are synchronizing data basically used by threads to avoid busy-waiting on conditions. An event is created in a scheduler which is in charge of it during its whole lifetime. An event is either present or absent during each instant of the scheduler which manages it. It is present if it is generated by the scheduler at the beginning of the instant, or if it is generated by one of the threads executed by the scheduler during the instant; it is absent otherwise. The presence or absence of an event can change from one instant to another, but it cannot change during the course of an instant: all the threads always "see" the presence or absence of an event in the same way, independently of the order in which they are scheduled. This is what is meant by saying that events are broadcast.
Values can be associated to event generations; they are collected
during each instant and are available only during it.
The basic implementation of Loft is built on top of a C library of native threads. Each scheduler and each instance of a native module is implemented as a native thread. Threads which are instances of non-native modules do not need the full power of a native thread to execute. Actually, they do not need any specific stack, as they can use the one of the native thread of the scheduler to which they are linked. In this way, the number of native threads can be limited to a minimum. This is important when a large number of concurrent activities is needed, as the number of native threads that can be created in systems is usually rather low.
Unlinked threads and schedulers can be run in full parallelism, by distinct processors. Programs can thus benefit from SMP architectures, and be sped up by multiprocessors.
Note that one gets program structures that conform to the so-called M:N architectures: M linked threads and N schedulers running them. An important point is that these architectures become programmable directly at the language level, and not through the use of a library (as, for example, in the Solaris system of Sun).
Modules are templates from which threads are created. A thread created
from a module is called an instance of it. A thread is always created
in a scheduler which is initially in charge of executing it. Modules
are made from atomic and non-atomic instructions that are defined in
the next sections.
A module can have parameters which define corresponding parameters of
its instances. Arguments provided when an instance is created are
associated to these parameters. A module can also have local
variables, new fresh instances of which are automatically created for
each thread created from it.
The syntax of
Loft is based on C and modules are defined in files
that can also contain C code. The syntax of modules is:
module <kind> <name> ( <params> )
<locals>
<body>
<finalizer>
end module
- <kind> is the optional native keyword. The case of native modules is considered in section 2.5.
- <name> is the module name; it can be any C identifier.
- <params> is the list of parameters; it is empty if there is no parameter at all.
- <locals> is the optional list of local variable declarations. This list begins with the keyword local.
- <body> is a sequence of instructions defining the module body.
- <finalizer> is an optional atomic instruction executed in case of forced termination (by the stop instruction). This part begins with the keyword finalize.
As an example, consider the module trace, defined by:
module trace (char *s)
while (1) do
printf ("%s\n",local(s));
cooperate;
end
end module
This module is a non-native one, and it has a parameter which is a character string. The body is reduced to a single while loop (actually, an infinite one), considered in section 2.4.4. At each instant, the loop prints the message given in argument and cooperates. Note the presence of the do and end keywords around the loop body; indeed, as "{" and "}" are used in Loft to delimit atomic instructions (see section 2.3.4), one needs a different notation for loop bodies.
The parameter s is actually a local variable which is passed at creation to the instances of trace. In Loft, all accesses to local variables, and thus also to parameters, must be of the form local(...) (local is actually a macro).
Threads can have local variables which keep their values across instants. These are to be distinguished from static variables, which are global to all the threads, and from automatic variables, which lose their values across instants.
Let us return to the previous module trace and suppose that one wants to also print the instant number. A first idea would be to define a variable inside the atomic instruction:
module trace (char *s)
while (1) do
{
int i = 0;
printf ("%s (%d)\n",local(s),i++);
}
cooperate;
end
end module
This is not correct, because the variable i is allocated on the stack at the beginning of the atomic instruction and vanishes at the end of it. Thus, a new instance of the variable is used at each instant.
A solution would be to declare a global C variable to store the
instant number:
int i = 0;
module trace (char *s)
while (1) do
printf ("%s (%d)\n",local(s),i++);
cooperate;
end
end module
This solution produces a correct output. However, i is a global variable shared by all instances of trace. A local variable should be used if one wishes each instance to own a distinct variable:
module trace (char *s)
local int i;
{local(i) = 0;}
while (1) do
printf ("%s (%d)\n",local(s),local(i)++);
cooperate;
end
end module
Note that local variables must always be accessed using the form local(...). This makes it possible to access, in the same context, a local variable and a global or automatic one having the same name.
Threads are instances of modules. Threads which are instances of a module m are created using the function m_create, which returns a new thread. Threads are of the pointer type thread_t. Note that, as in many C APIs, all types defined in Loft have their name terminated by "_t". Arguments given to m_create are passed to the created instance (as in C, arguments are passed by value).
For example, the following atomic instruction creates two instances of the previous module trace:
{
trace_create ("first thread");
trace_create ("second thread");
}
The output is:
first thread (0)
second thread (0)
first thread (1)
second thread (1)
first thread (2)
second thread (2)
...
Several points are important:
- Threads are created in the implicit scheduler, which is automatically created and started in every program (see section 7.2.4).
- Threads are automatically started. This is a difference with Java, in which threads have to be explicitly started.
- Threads are always executed in the same order, which is the order of their creation. Thus, the previous output is the only one possible.
It is possible to create a thread in a specific scheduler, using a creation function of the form m_create_in. For example, an instance of trace is created in the scheduler sched by:
trace_create_in (sched,"a thread");
Threads are always incorporated in the scheduler at the beginning of
an instant, in order to avoid interferences with already running
threads. Moreover, if the current scheduler and the one in which the
thread is created are the same, then this instant is the next one.
The running thread is returned by the function self()
.
The finalizer of a module is an atom executed when an instance of the module is forced to terminate by a stop instruction (defined in section 2.3.3). The finalizer is only executed if the stopped thread has not already terminated. A typical use of finalizers is for deallocation purposes, as in:
module m (...)
local resource_type resource;
{local(resource) = allocate ();}
...
{deallocate (local(resource));}
finalize
{deallocate (local(resource));}
end module
The allocated resource is deallocated in case of normal termination, at the end of the module body. It is also deallocated if termination is forced, as in this case the finalizer is executed. An important point to note is that the thread executing a finalizer is unspecified, as the instance considered is stopped. This means that self cannot be safely used in finalizers.
The entry point of a program is a module named main. An instance of it is automatically created in the implicit scheduler. For example, the previous output could be produced by the execution of a program with the following module main:
module main ()
trace_create ("first thread");
trace_create ("second thread");
end module
As in C, arguments can be given to the module main, following the standard argc/argv convention. The global program does not terminate when the main thread terminates: it is the programmer's responsibility to end the program (typically, by calling the C exit function).
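For example, a main module which creates a worker and ends the whole program after 100 instants could be sketched as follows (assuming the module trace of the previous sections, and using the repeat loop described later):
module main ()
   trace_create ("worker");
   repeat (100) do
      cooperate;
   end
   {exit (0);}                   /* explicitly end the whole program */
end module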
In this section, one considers the atomic instructions, which terminate at the same instant they start. We have already seen two examples in the previous sections: calls to the C function printf, and the creation of instances of a module m with the m_create and m_create_in functions.
Creation
Schedulers can be created arbitrarily with the scheduler_create atomic instruction. They are of the type scheduler_t.
scheduler_t sched = scheduler_create ();
One instant of a scheduler is obtained by executing the scheduler_react atomic instruction.
scheduler_react (sched);
The scheduler_react function allows users to build hierarchies of schedulers and to make them react following arbitrary strategies.
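For example, a thread can drive a sub-scheduler at the pace of its own scheduler, one instant per instant (a sketch; sub is assumed to be a scheduler created elsewhere):
module drive_sub ()
   while (1) do
      scheduler_react (sub);     /* trigger one instant of the sub-scheduler */
      cooperate;
   end
end module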
Implicit and Current Schedulers
A scheduler, called the implicit scheduler, always exists in every program. Two atomic instructions are available to get the implicit scheduler and the current scheduler, which is the one to which the executing thread is linked (the NULL value is returned if the thread is not linked to any scheduler).
scheduler_t implicit = implicit_scheduler ();
scheduler_t current = current_scheduler ();
Events are basically variables with three possible values: present, absent, or unknown. An event is created in a scheduler and managed by it during its whole lifetime. The value of an event is automatically set to unknown at the beginning of each instant. There is a way for the programmer to set an event to present (namely, to generate it), but there is no way to set it to absent: only the system is able to decide that an event is absent. Actually, the system automatically decides that unknown events are absent when all threads have reached a cooperation point and no thread is waiting for an event which has been previously generated. The language definition assures that, once unknown events are declared absent, all is finished for the current instant (in particular, no event can be generated anymore during the same instant). As a consequence, one can consider that:
- Events are non-persistent data which lose their presence status at the beginning of each instant.
- Events are broadcast: all threads always "see" their presence/absence status in the same way, independently of the order in which they are scheduled.
- Reaction to the absence of an event is always postponed to the next instant.
The last point is a major difference with the synchronous approach, in particular with the Esterel [9] language (see section 9 for details).
Creation
An event is created in a scheduler whose instants determine its presence/absence status. event_create creates an event in the implicit scheduler; event_create_in creates an event in the scheduler given as argument. Events are of the type event_t.
event_t evt1 = event_create ();
event_t evt2 = event_create_in (sched);
The first event is created in the implicit scheduler, while the second one is created in the scheduler sched.
Generation
Generation of an event makes it present in the scheduler in charge of it (the one in which the event has been created). If the thread performing the generation is linked to the scheduler of the event, then the generation becomes immediately effective. In this case, all the threads waiting for the event will be re-scheduled during the same instant, and thus will see the event as present. If the scheduler of the event and the one of the thread are different, then the order to generate the event is sent to the scheduler of the event. In this case, the event will be generated later, at the beginning of the next instant of the target scheduler. Note that this can occur at an arbitrary moment if the target scheduler is run asynchronously by a dedicated native thread.
A value can be associated to an event generation; in this case one speaks of a valued generation of the event; in the other case, the generation is said to be pure. The value must be of a pointer type, or of a type that can be cast to such a type. All the values generated for the same event are collected during the instant and are available during this instant, using the get_value non-atomic instruction (see section 2.4.8).
A pure generation of a present (that is, already generated) event has no effect. This is of course different for a valued generation: in that case, the value is appended to the list of values of the event. Note that, as for the presence/absence status, the list of values of an event is automatically reset to the empty list at the beginning of each instant.
generate (evt);
generate_value (evt,val);
The first call is a pure generation of the event evt. The second call generates the same event, but with the value val. Note that a valued generation with a null value is not equivalent to a generation without any value.
2.3.3 Control over Threads
Three means for controlling thread execution are provided: a way to stop the execution of a thread, which is thus forced to terminate definitively, and ways to suspend and resume the execution of a thread. With these means, the user can design his own scheduling strategies for controlling threads.
Stop
Execution of a thread is stopped with a call to the stop instruction. To stop a thread means that its execution is definitively abandoned. The effect of the stop instruction is to send to the scheduler of the thread given as argument the order to stop the thread. The order is always processed at the beginning of a new instant, in order to avoid interferences with the other threads. If the executing thread and the stopped thread are both linked to the same scheduler, then the thread will be stopped at the beginning of the next instant. As an example, a thread can stop itself by:
stop (self ());
Note that the thread terminates its execution for the current instant
and is only effectively stopped at the next instant; the
preemption is actually a "weak" one.
Suspend and Resume
Threads can be suspended and resumed using the two atomic instructions suspend and resume. Suspension of a thread means that its execution is suspended until it is resumed. Suspending and resuming are orders given to the scheduler running the considered thread, and are always processed at the beginning of new instants. As for stop, if the executing thread and the concerned thread are both linked to the same scheduler, then the thread execution status will be changed at the beginning of the next instant.
suspend (main_thread);
resume (other_thread);
Orders are processed in turn. For example, the following sequence of instructions has no effect, as the thread is immediately resumed after being suspended:
suspend (thread);
resume (thread);
Basic atomic instructions are standard C blocks, made of C statements enclosed by "{" and "}". Of course, the C statements must terminate, and one must keep in mind that they should not take too much time to execute. Indeed, the other threads belonging to the same scheduler cannot execute while the current thread has not finished executing its atomic instruction (see the discussion on this point in section 4.1).
Almost any C code can syntactically appear in atomic instructions, between "{" and "}". It is the programmer's responsibility to avoid the use of any statement that would prevent the complete execution of an atomic instruction. For example, setjmp/longjmp should absolutely be avoided. However, some forbidden statements are detected and lead to program rejection. This is the case of the C return statement. break statements not enclosed in a C block are also forbidden, as are goto statements.
Procedure calls are considered as atomic instructions and can thus be used directly, without being put inside a C block. This is however just a syntactic facility, and the semantics of an atomic action made of a C procedure call is exactly the one of the block containing it. For example, the module main of section 12.1.2 could be equivalently written:
module main ()
{
trace_create ("first thread");
trace_create ("second thread");
}
end module
2.4 Non-atomic Instructions
In this section, one considers the non-atomic instructions. As opposed to atomic instructions, the execution of non-atomic instructions can last several instants.
The basic non-atomic instruction is the cooperate instruction, which returns the control back to the scheduler. When receiving the control after a cooperate instruction, the scheduler knows that the executing thread has finished its execution for the current instant, and thus that it is not necessary to give it back the control another time during the instant. When the thread receives the control in a future instant, the cooperate instruction terminates and passes the control in sequence. Execution of a cooperate instruction thus needs two instants to complete.
Execution of an await instruction suspends the executing thread until the event is generated. There is of course no waiting at all if the event is already present. Otherwise, the waiting can take just a portion of the current instant, if the awaited event is generated later in the same instant by a thread scheduled later; the waiting can also last several instants, or can even be infinite, if the awaited event is never generated.
await (event);
There is a way to limit the time during which the thread is suspended waiting for an event. The limit is expressed as a number of instants, and the thread is resumed when it is reached. Of course, the waiting ends, as previously, as soon as the event is generated before the limit is reached. For example, the following instruction suspends the executing thread for at most 10 instants, waiting for the event e.
await (e,10);
After resuming, one can know whether the limit was exceeded or the event was generated before the limit was reached, by examining the value returned by the predefined function return_code ():
- ETIMEOUT means that the limit has been reached.
- OK means normal termination, because the awaited event was generated.
For example, the following code tests the presence of an event during
one instant:
await (e,1);
{
if (return_code () == ETIMEOUT)
printf ("was absent");
else
printf ("is present");
}
Note that the message "was absent
" is printed at the instant
that follows the starting of the waiting, while "is present
"
is printed at the same instant. Reaction to the absence of an event
(here, the printing action) cannot be immediate, but is always
postponed to the next instant.
The value returned by return_code () concerns the last non-atomic instruction executed. For example, in the following code, the printed messages concern f, not e:
await (e,1);
await (f,1);
{
if (return_code () == ETIMEOUT)
printf ("f was absent");
else
printf ("f is present");
}
The join instruction suspends the executing thread until another one, given as argument, has terminated; this is called joining the thread. Note that there are actually two ways for a thread to terminate: either because its execution has returned, or because the thread has been stopped.
As for await, there is a possibility to limit the time during which the thread is suspended, and the return_code function is used in the same way to detect whether the limit was reached.
join (th1);
join (th2,10);
The first instruction waits unconditionally for the termination of th1, while the second waits at most 10 instants to join th2.
There are two kinds of loops: while loops, which cycle while a boolean condition is true, and repeat loops, which cycle a fixed number of times. There is no equivalent of a for loop, as such a loop would need in most cases an associated local variable; for loops are thus, when needed, to be explicitly coded, as in the sketch below.
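For example, a loop printing its index during n instants can be coded with a while loop and a local counter (a sketch):
module count (int n)
   local int i;
   {local(i) = 0;}
   while (local(i) < local(n)) do
      printf ("i = %d\n", local(i));
      {local(i)++;}
      cooperate;
   end
end module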
While
A while loop executes its body while a boolean condition is true.
while (1) do
printf ("loop!");
cooperate;
end
The value of the condition is only considered at the first instant and
when the body terminates. For example, the following loop never
terminates, despite the fact that the condition is false at some
instants:
{local(i) = 1;}
while (local(i)) do
{local(i) = 0;}
cooperate;
printf ("loop!");
cooperate;
{local(i) = 1;}
end
Instantaneous loops are loops with a non-cooperating body. This is for
example the case of:
while (1) do
printf ("loop!");
end
Instantaneous loops should generally be avoided, as they forbid the execution of the other threads (see section 4.1). Nevertheless, they can be useful in unlinked threads, or when the executing thread is the only one run by the scheduler (which is a rather special situation, indeed).
Repeat
A repeat loop executes its body a fixed number of times. It could be coded with a while loop and a counter implemented as a local variable; the repeat instruction avoids the use of such a local variable.
repeat (10) do
printf ("loop!");
cooperate;
end
The expression defining the number of cycles is evaluated when the
control reaches the loop for the first time.
The if statement can be used in atomic instructions like any other standard C statement. However, when it is used in this context, both the then and else branches must be atomic instructions. A non-atomic version of the if is available. Its syntax is:
if (<expression>) then <instruction> else <instruction> end
For example, the instruction of section 12.9.4, which tests for the presence of e during the current instant, can be equivalently written as:
await (e,1);
if (return_code () == ETIMEOUT) then
printf ("was absent");
else
printf ("is present");
end
Both the then and else branches are optional. The boolean expression is evaluated once, when the control reaches the if for the first time. Its value determines which branch is chosen for execution. The chosen branch is then executed up to completion, which can take several instants, as it can be non-atomic.
For example, the following instruction prints at the next instant the value exp has at the current instant:
if (exp) then
cooperate;
printf ("true!");
else
cooperate;
printf ("false!");
end
The return statement of C cannot be used in an atomic instruction, as it would prevent the instruction from terminating. However, it can be used as a non-atomic instruction, to force the termination of the executing thread. For example, the following module terminates when the boolean expression exp becomes true:
module m ()
while (1) do
if (exp) then return; end
cooperate;
end
end module
Note that, as the return statement is forbidden in atomic instructions, the following module is incorrect:
module bug ()
while (1) do
{if (exp) return;}
cooperate;
end
end module
The halt instruction never terminates. It is equivalent to:
while (1) do cooperate; end
Its basic use is to block the execution flow, as in:
if (!exp) then
generate (error);
halt;
end
...
In this code, the if instruction tests exp and forbids the control to pass in sequence if it is false. In this case, an event is generated, which should be used by another thread to recover from this situation.
The get_value instruction is the means to get the values associated to an event by its valued generations. The values are indexed and, when available, returned in a variable of type void**. The instruction get_value (e,n,r) is an attempt to get the value of index n generated for the event e during the current instant. If available, the value is assigned to r during the current instant; otherwise, r will be set to NULL at the next instant, and the function return_code () will return the value ENEXT.
For example, the following instruction is an attempt to get the first value of evt (for simplicity, values are supposed to be of type int):
get_value (evt,0,res);
{
if (return_code () == ENEXT) printf ("there was no value! ");
else printf ("first value is %d ", (int)(*res));
}
The following module awaits an event and prints all the values
generated for it:
module print_all_values (event_t evt)
local int i,int res;
await (local(evt));
{local(i) = 0;}
while (1) do
get_value (local(evt), local(i), (void**)&local(res));
if (return_code () == ENEXT) then return;
else
{
printf ("value #%d: %d\n", local(i), local(res));
local(i)++;
}
end
end
end module
Note that the module always terminates at the next instant. This should not be surprising: one must wait for the end of the current instant to be sure that all the values have effectively been collected.
A module can run another one, using the run non-atomic instruction. The calling thread suspends execution when encountering a run instruction. Then, an instance of the called module is created with the arguments provided. Finally, the calling thread is resumed when the instance of the called module terminates. Moreover, a thread which is stopped retransmits the stop order to the thread it is running, if there is one. Thus, the instance of a called module is automatically stopped when the calling thread is.
run mod1 ();
run mod2 ("mod2 called");
In this sequence, an instance of mod1 is first executed. When it terminates (if it does), an instance of mod2 is executed, with a string argument. Finally, the sequence terminates when the last instance does.
When created, a thread is always linked to a scheduler. During execution, the thread can link to other schedulers, using the link instruction. The effect of executing the instruction link (sched) is to extract the executing thread from the current scheduler and to add it to sched, in the state in which it has left the current scheduler. Thus, after re-linking, the thread will resume execution in the new scheduler. This can be seen as a restricted form of thread migration.
For example, the following module cycles between two schedulers. On each scheduler, it awaits an event and then prints a message. Then, if the two events are generated at each instant, the program prints "Ping Pong" forever:
module play ()
while (1) do
link (sched1);
await (evt1);
printf ("Ping ");
link (sched2);
await (evt2);
printf ("Pong\n");
end
end module
Native modules only have meaning in the context of a preemptive OS, as instances of native modules should be run by native threads belonging to the kernel. The presence of the native keyword just after the keyword module defines the module as native. The unlink non-atomic instruction unlinks the instance of a native module from the scheduler to which it was previously linked. After the unlink instruction, the thread is free from any synchronization. However, an unlinked thread can always re-link to a scheduler, using the link non-atomic instruction (see section 12.7.2).
The unlink instruction is specific to native modules: programs where unlink instructions appear in non-native modules are rejected at compile time.
When unlinked, instances of native modules are autonomous, which means that they execute independently of any scheduler, at their own pace. For example, consider the following program, which creates two native threads, instances of the same native module pr:
module native pr (char *s)
unlink;
while (1) do
printf (local(s));
end
end module
module main ()
pr_create ("hello ");
pr_create ("world!
");
end module
The two lists of messages produced are merged in an unpredictable way in the output: nondeterminism is introduced by unlinked instances of native modules. Note that the loop in pr is instantaneous; this is not a problem, as the thread is unlinked. The granularity of each thread is, however, under the control of the OS, which can be unacceptable in some situations (see section 4.2).
Native modules are important for using standard blocking I/Os in a cooperative context. For example, the following module uses the getchar function, which blocks the execution of the calling thread until a character is read on the standard input:
module native AnalyseInput ()
unlink;
...
while (1) do
{
switch (getchar ()){
...
}
}
end
end module
The first instruction unlinks the thread from the current scheduler. Then, the getchar function can be safely called, without any risk of blocking the other threads.
The following module read_module implements a cooperative read I/O, using the standard blocking read function. The thread first unlinks from its scheduler, then performs the read, and finally re-links to the scheduler:
module native read_module (int fd,void *buf,size_t count,ssize_t *res)
local scheduler_t sched;
{local(sched) = current_scheduler ();}
unlink;
{(*local(res)) = read (local(fd),local(buf),local(count));}
link (local(sched));
end module
Note that the pattern suggested by this example is a very general one, useful for reusing code which was not designed to be run in a cooperative context.
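For example, the same pattern can be used to wrap the blocking getchar function seen previously (a sketch; getchar_module is a hypothetical name):
module native getchar_module (int *res)
   local scheduler_t sched;
   {local(sched) = current_scheduler ();}
   unlink;
   {(*local(res)) = getchar ();}
   link (local(sched));
end module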
3 Examples - 1
One considers several examples of programming in Loft. First, some basic examples are given in section 3.1. The complete programming of a little reflex game is considered in section 3.2. Sieve algorithms are coded in section 3.3. Finally, data-flow programming is considered in section 3.4.
One considers several basic examples which highlight some aspects of the language. Section 3.1.1 shows the benefit of a precise semantics. A notification-based communication mechanism is implemented in section 3.1.2. Barriers are considered in section 3.1.3. A readers/writers example is coded in section 3.1.4. Finally, a producers/consumers example is considered in section 3.1.5.
Let us consider a system made of two threads implementing two variants of a service, say a fast one and a slow one. Two events are used to start each of the variants. After a variant is chosen, the other one should become unavailable; that is, each variant should be able to stop the other. The coding of such an example is straightforward. First, the two threads fast and slow, and the two events start_fast and start_slow, are declared:
thread_t fast, slow;
event_t start_fast, start_slow;
The thread fast awaits start_fast to start. Then, before serving, it stops the slow variant. It is an instance of the following module fast_variant:
module fast_variant ()
await (start_fast);
stop (slow);
< fast service >
end module
The thread slow is an instance of slow_variant, defined similarly by:
module slow_variant ()
await (start_slow);
stop (fast);
< slow service >
end module
The program consists in creating the two threads and the two events:
....
fast = fast_variant_create ();
slow = slow_variant_create ();
start_fast = event_create ();
start_slow = event_create ();
....
The question is now: what happens if both start_fast and start_slow are simultaneously present? This should not happen, but the question remains of what happens if, by accident, it is the case. The answer is clear and precise, according to the semantics: the two variants are executed during only one instant, and then they both terminate at the next instant. Note that inserting a cooperate just after the stop in the two modules would prevent both threads from starting servicing.
Now, suppose that the same example is coded using standard pthreads, replacing events by condition variables and stop by pthread_cancel. The resulting program is deeply nondeterministic. Actually, one of the two threads could cancel the other and run up to completion. But the situation where both threads cancel each other is also possible in a multiprocessor context (where each thread is run by a distinct processor); in this case, moreover, both variants execute simultaneously as long as they are not canceled, which produces unpredictable results.
Note that the possibility of mutual stopping exhibited in this example
cannot be expressed in a synchronous language, because it would be
rejected at compile time as having a causality cycle.
3.1.2 Wait-Notify
One considers thread synchronization using conditions, which basically correspond to (simplified) condition variables of POSIX. A thread can wait for a condition to be set by another thread. The waiting thread is said to be notified when the condition is set. In a very first, naive implementation, conditions are simply boolean variables (initially false). To notify a condition means to set the variable, and to wait for the condition means to test it:
int condition = 0;
void notify (int *condition)
{
(*condition) = 1;
}
module wait (int *cond)
while ((*local(cond)) == 0) do
cooperate;
end
{(*local(cond)) = 0;}
end module
Note that no mutex is needed: because of the cooperative model, simple boolean shared variables are sufficient to implement the atomic test-and-set operations needed. There is however a major drawback: the module wait performs busy-waiting, and is thus wasting the CPU resource.
3.1.2.1 Use of Events
Of course, events are the means to avoid busy-waiting. One now
considers conditions as made of a boolean variable with an associated
event:
typedef struct condition_t
{
int condition;
event_t satisfied;
}
*condition_t;
The notification sets the condition variable and generates
the associated event:
void notify (condition_t cond)
{
cond->condition = 1;
generate (cond->satisfied);
}
The wait module awaits the event while the condition is not set:
module wait (condition_t cond)
while (local(cond)->condition == 0) do
await (local(cond)->satisfied);
if (local(cond)->condition == 0) then cooperate; end
end
{local(cond)->condition = 0;}
end module
Note that once the event is received, the condition must be tested again, because another thread waiting for the same condition could have been scheduled before. In this case, the wakening is not productive, and the thread just has to cooperate. Without the second test, an instantaneous loop would occur in this situation.
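As a usage sketch (condition_create, which would allocate the structure and create its event, is an assumption, not defined in the text):
condition_t cond = condition_create ();
...
wait_create (cond);     /* this thread blocks until notified */
...
{notify (cond);}        /* later, in an atom of another thread */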
The solution proposed does not allow single notifications, where only one thread is notified at a time. Actually, all threads waiting for the same condition are simultaneously awakened, and only one of them will be able to proceed, the other ones returning to their previous state. This can be inefficient when several threads are often simultaneously waiting for the same condition.
3.1.2.2 Single Notification
To be able to notify threads one by one, an event is associated to each waiting thread. Thus, a condition now holds a list of events, managed in a FIFO way, storing the events of the waiting threads. There are two notifications: notify_one, which notifies a unique thread, and notify_all which, as previously, notifies all the threads. The notify_one function gets the first event (get) and generates it:
void notify_one (condition_t cond)
{
event_t event = get (cond);
if (event == NULL) return;
cond->condition = 1;
generate (event);
}
The function get returns NULL if no thread is currently waiting on the condition; the notification is lost in this case (this conforms to the POSIX specification). The notify_all function notifies all the threads in one single loop:
void notify_all (condition_t cond)
{
int i, k = cond->length;
for (i = 0; i < k; i++) notify_one (cond);
}
The module wait now begins with the creation of a new event, which is stored (put) in the condition:
module wait (condition_t cond)
local event_t event;
while (local(cond)->condition == 0) do
{
local(event) = event_create_in (current_scheduler ());
put (local(cond),local(event));
}
await (local(event));
if (local(cond)->condition == 0) then cooperate; end
end
{local(cond)->condition = 0;}
end module
The notification of a unique thread does not imply the competition of all the other threads waiting for the same condition, because they are actually waiting on distinct events. Note that the choice of a FIFO strategy to manage events in conditions is quite arbitrary (while reasonable). This is a question which is inherent to the notify_one primitive: how is the notified thread chosen? Either which thread is notified is left unspecified, which introduces some nondeterminism, or the way threads are managed is specified, which can be felt as an over-specification. This is actually an issue which can only be solved by application programmers, according to their needs.
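The helpers get and put, and the length field used by notify_all, are not defined in the text. A minimal sketch, assuming the condition structure is extended with a linked FIFO of events (the event_t type coming from the Loft API), could be:
#include <stdlib.h>
typedef struct event_cell_t
{
event_t event;
struct event_cell_t *next;
} *event_cell_t;
typedef struct condition_t
{
int condition;
int length;               /* number of waiting threads */
event_cell_t first, last; /* FIFO of events, one per waiter */
} *condition_t;
/* append an event at the end of the FIFO */
void put (condition_t cond,event_t event)
{
event_cell_t cell = malloc (sizeof (*cell));
cell->event = event; cell->next = NULL;
if (cond->length == 0) cond->first = cond->last = cell;
else { cond->last->next = cell; cond->last = cell; }
cond->length++;
}
/* remove and return the first event; NULL if the FIFO is empty */
event_t get (condition_t cond)
{
event_cell_t head = cond->first;
event_t event;
if (cond->length == 0) return NULL;
event = head->event;
cond->first = head->next;
free (head);
cond->length--;
return event;
}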
3.1.3 Synchronization Points
A synchronization point is a given point in a program where different threads must wait until a certain number of threads have reached it. The type of synchronization points is defined by:
typedef struct sync_point_t
{
int sync_count;
int threshold;
event_t go;
}
*sync_point_t;
If the threshold is reached, then the counter is reset to 0 and the event go is generated; thus, all the waiting threads can immediately proceed. If the threshold is not reached, then the counter is incremented and the thread awaits go to continue:
module sync (sync_point_t cond)
if (local(cond)->threshold > local(cond)->sync_count) then
{local(cond)->sync_count++;}
await (local(cond)->go);
else
{
local(cond)->sync_count = 0;
generate (local(cond)->go);
}
end
end module
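Note that, with this coding, it is the arrival of the thread that exceeds the threshold which releases the others: for N threads to pass the point together, the threshold must thus be set to N-1. A usage sketch (sync_point_create, which would allocate the structure, set the threshold, and create the go event, is an assumption):
sync_point_t point = sync_point_create (2); /* threshold 2: a 3rd thread releases the barrier */
...
run sync (point);  /* executed in each of the three threads */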
3.1.4 Readers/Writers
One considers several threads that are reading and writing a shared resource. The writers have priority over the readers. Several readers can simultaneously read the resource, while a writer must have exclusive access to it (no other writer nor reader) while writing. One adopts the terminology of locks, and notes rwlock_t the type of control structures:
typedef struct rwlock_t
{
int lock_count;
int waiting_writers;
event_t read_go;
event_t write_go;
}
*rwlock_t;
The convention for lock_count is the following: 0 means that the lock is held by nobody; when held by a writer, lock_count has value -1; when positive, lock_count is the number of readers currently reading.
3.1.4.1 Writers
In order to write, a writer must first run the module write_lock:
module write_lock (rwlock_t rw)
{local(rw)->waiting_writers++;}
while (local(rw)->lock_count) do
await (local(rw)->write_go);
if (local(rw)->lock_count) then cooperate; end
end
{
local(rw)->lock_count = -1;
local(rw)->waiting_writers--;
}
finalize
{
local(rw)->waiting_writers--;
}
end module
When writing is finished, the writer must call the write_unlock function:
void write_unlock (rwlock_t rw)
{
rw->lock_count = 0;
if (!rw->waiting_writers) generate (rw->read_go);
else generate (rw->write_go);
}
write_unlock should also be called in the finalizer of the writer, in order to release the lock when the writer is forced to terminate.
3.1.4.2 Readers
A reader can read if no writer is currently writing (that is, lock_count is not negative) and if there is no waiting writer:
module read_lock (rwlock_t rw)
while ((local(rw)->lock_count < 0) || (local(rw)->waiting_writers)) do
await (local(rw)->read_go);
if ((local(rw)->lock_count < 0) || (local(rw)->waiting_writers)) then cooperate; end
end
{local(rw)->lock_count++;}
end module
After reading, or when stopped, a reader should call the function read_unlock:
void read_unlock (rwlock_t rw)
{
rw->lock_count--;
if (!rw->lock_count) generate (rw->write_go);
}
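Putting the pieces together, a writer thread could be structured as follows (a sketch; <writing> stands for the actual write, and a complete version should also call write_unlock from a finalizer, as advised above, when the writer can be stopped while holding the lock):
module writer (rwlock_t rw)
while (1) do
run write_lock (local(rw));
< writing >
{write_unlock (local(rw));}
cooperate;
end
end module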
3.1.5 Producers/Consumers
3.1.5.1 Unique Area
One implements the simplest form of a producers/consumers system, where several threads are processing values placed in a buffer. Each thread gets a value from the buffer, processes it, and then puts the result back in the buffer. To simplify, one considers that values are integers which are decremented each time they are processed. Moreover, the processing terminates when 0 is reached. First, a buffer implemented as a list of type channel, and an event, are defined:
channel buffer;
event_t new_input;
The processing module cyclically gets a value from the buffer, tests if it is zero, and if it is not, processes the value. When it is finished, the value, decremented by one, is put back in the buffer. The event new_input is used to avoid busy-waiting while the buffer is empty (one considers that the buffer is rarely empty; so, a unique event is preferred to the single notifications of section Wait-Notify (3.1.2)):
module process ()
local int v;
while (1) do
while (buffer->length == 0) do
await (new_input);
if (buffer->length == 0) then cooperate; end
end
{local (v) = get (buffer);}
if (local(v) == 0) then return; end
< processing the value >
{local(v)--;}
put (buffer,local(v));
generate (new_input);
end
end module
All accesses to the shared buffer are atomic, under the condition that all instances of process are created in the same scheduler. This is of course the case when the implicit scheduler is the unique scheduler in use.
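A possible setup (a sketch; the initial filling of the buffer and the number of workers are arbitrary choices, not taken from the text):
module main ()
{
buffer = channel_create ();
new_input = event_create ();
put (buffer,10);       /* one initial value to process */
process_create ();     /* two processing threads */
process_create ();
}
end module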
3.1.5.2 Two Areas
One now considers a situation where there are two buffers, in and out, and a pool of threads that take data from in, process them, and then put the results in out. A distinct scheduler is associated to each buffer:
channel in, out;
scheduler_t in_sched, out_sched;
event_t new_input, new_output;
The module process gets values from in, avoiding busy-waiting by using the event new_input. A new thread, instance of process_value, is run for each value:
module process ()
while (1) do
if (in->length > 0) then
run process_value (get (in));
else
await (new_input);
if (in->length == 0) then cooperate; end
end
end
end module
The process_value module processes its parameter, and then links to out_sched to deliver the result in out. At delivery, the event new_output is generated, to awake threads possibly waiting for out to be filled:
module process_value (int v)
< processing the value >
link (out_sched);
put (out,local(v));
generate (new_output);
end module
This code shows a way to manage shared data using schedulers. The use of standard locks is actually replaced by linking operations. Of course, locks still exist in the implementation and are used when linking operations are performed, but they are totally hidden from the programmer, who can thus reason in a more abstract way, in terms of linking actions to schedulers, and not in terms of low-level lock taking and releasing actions.
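The surrounding setup could look as follows (a sketch; the number of workers is arbitrary, and running the two schedulers, either synchronously or by dedicated native threads, is not shown):
module main ()
{
in_sched = scheduler_create ();
out_sched = scheduler_create ();
in = channel_create_in (in_sched);
out = channel_create_in (out_sched);
new_input = event_create_in (in_sched);
new_output = event_create_in (out_sched);
process_create_in (in_sched);   /* the pool: here, two workers */
process_create_in (in_sched);
}
end module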
3.2 Reflex Game Example
In this section, one considers the example of a little game for measuring the reactivity of users. This example, issued from Esterel, shows how Loft can be used for basic reactive programming.
The purpose of the game is to measure the reflexes of the user. Four keys are used. The c key means "put a coin", the r key means "I'm ready", the e key means "end the measure", and the q key means "quit the game". After putting a coin, the user signals the game that he is ready; then, he waits for the GO! prompt; from that moment, the measure starts, and it lasts until the user ends the measure. After a series of four measures, the game outputs the average score of the user. The game is over when an error situation is encountered. There are actually two such situations: when the player takes too much time to press a button (the player has abandoned), or when e is pressed before GO! (this is considered as a cheating attempt).
3.2.1 Preemption
First, one codes a preemption mechanism which will be used for processing error situations. One considers the preemption of a thread by an event, which means that the thread must be stopped when the event becomes present. This can be obtained using the following module killer:
module killer (event_t event,thread_t thread)
await (local(event));
stop (local(thread));
end module
However, this definition is not completely satisfactory because, when the controlled thread terminates normally, that is, without being preempted, the instance of killer should also be stopped. One thus encapsulates it in the following module until:
module until (thread_t thread,event_t event)
local thread_t kill;
{local(kill) = killer_create (local(event),local(thread));}
join (local(thread));
stop (local(kill));
end module
Then, the instance of killer is stopped when the controlled thread terminates, as in this case it is joined. The preemption of a thread t by the event e thus takes the form:
run until (t,e);
Several variables, functions, and events are first introduced:
- Several integer variables: pause_length, limit_time, total_time, and measure_number.
- A function print_out to print messages.
- Five events: coin_evt is generated when c is pressed; ready_evt is generated when r is pressed; end_evt is generated when e is pressed; tilt_evt is generated by the game when an error is detected; the value of display_evt holds the measured times to be printed.
int pause_length, limit_time, total_time, measure_number;
void print_out (char*);
event_t coin_evt, ready_evt, end_evt, display_evt, tilt_evt;
Phase 1
The first phase consists in waiting for the notification that the user is ready. The waiting lasts at most limit_time instants, and if the timeout is reached, an error is produced and tilt_evt is generated. During the waiting, mistakes are signaled by a beep sound: a mistake here means that the user presses the e key instead of the r key. Beeps are produced by the following module beep_on:
module beep_on (event_t event)
while (1) do
await (local(event));
print_out ("\07");
cooperate;
end
end module
The first phase starts by creating an instance of beep_on which reacts on end_evt. Then, ready_evt is awaited during at most limit_time instants. If the timeout is reached, then tilt_evt is generated and the execution blocks. In all cases, the previously created instance of beep_on is stopped:
module phase1 ()
local thread_t beeper;
print_out ("press r when ready\r\n");
{local(beeper) = beep_on_create (end_evt);}
await (ready_evt,limit_time);
stop (local(beeper));
if (return_code () == ETIMEOUT) then
generate (tilt_evt);
halt;
end
end module
Phase 2
In phase 2, a prompt message is output after a random number of instants, and then a new measure starts. Cheating attempts are detected: the player is cheating if he tries to press e in advance, trying to anticipate the prompt. In this case, an error is detected and the game terminates. The module detector detects the presence of an event, and then generates another event and stops a thread:
module detector (event_t trigger,event_t produced,thread_t thread)
await (local(trigger));
stop (local(thread));
generate (local(produced));
end module
The module phase2 begins by creating a cheating detector, which generates tilt_evt as soon as e is pressed. Then, one waits for a random number of instants (returned by the function myrandom) before printing the prompt message. At that moment, the cheating detector is stopped:
module phase2 ()
local thread_t cheat_detector;
print_out ("wait...\r\n");
{local(cheat_detector) = detector_create (end_evt,tilt_evt,self ());}
repeat (myrandom ()) do cooperate; end
print_out ("GO!\r\n");
stop (local(cheat_detector));
end module
Phase 3
Phase 3 consists in measuring the number of instants taken by the player to press e. An abandon (e is not pressed during limit_time instants) terminates the game. Moreover, mistakes (here, pressing r instead of e) are detected and signaled. Finally, the measured value is associated to display_evt to be printed.
An auxiliary module increm for counting instants is first defined. Once started, it increments, at each instant, a global variable whose reference is passed as parameter:
module increm (int *where)
while (1) do
{(*local(where))++;}
cooperate;
end
end module
The phase3 module is defined by:
module phase3 ()
local int count, thread_t counter, thread_t beeper;
{
local(count) = 0;
local(counter) = increm_create (&local(count));
local(beeper) = beep_on_create (ready_evt);
}
await (end_evt,limit_time);
stop (local(counter));
stop (local(beeper));
if (return_code () == ETIMEOUT) then
generate (tilt_evt);
halt;
else
{total_time += local(count);}
generate_value (display_evt, (void*)local(count));
end
end module
Reflex Game Module
The final_display module waits during pause_length instants before generating display_evt with a value which is the average of the measured times:
module final_display ()
repeat (pause_length) do cooperate; end
print_out ("**** final ");
generate_value (display_evt, (void*)(total_time / measure_number));
end module
A measure consists in running in sequence the three previous phases. The module list_of_measures executes measure_number measures, and finally runs an instance of final_display:
module list_of_measures ()
repeat (measure_number) do
run phase1 ();
run phase2 ();
run phase3 ();
end
run final_display ();
end module
The reflex_game module waits for c and then executes a list of measures. The measures are preempted by tilt_evt, which is generated in case of errors. The preemption results from the use of the until module defined in Preemption (3.2.1):
module reflex_game ()
print_out ("A reflex game ... press c to start, q to stop.\r\n");
print_out ("You must press `e' as fast as possible after GO!.\r\n");
while (1) do
{total_time = 0;}
await (coin_evt);
run until (list_of_measures_create (),tilt_evt);
print_out ("game over. Press c to restart, q to stop.\r\n");
end
end module
One now considers the executing environment needed to use the reflex game. Two functions are first defined, to put the terminal in raw mode (to be able to directly get the keys pressed by the player), and to restore the initial mode and terminate the game:
void the_beginning (void){
system ("stty raw -echo");
}
void the_end (void){
print_out ("It's more fun to compete ...\r\n");
system ("stty -raw echo");
exit (0);
}
Input
The input of the program is produced from the keys pressed. The keys are returned by the getchar function, which is not cooperating. In order to use it, one thus defines the native module analyze_input which, after unlinking, cyclically calls getchar and generates the appropriate events:
module native analyze_input ()
unlink;
the_beginning ();
while (1) do
{
switch(getchar()){
case 'c': generate (coin_evt); break;
case 'r': generate (ready_evt); break;
case 'e': generate (end_evt); break;
case 'q': the_end ();
}
}
end
end module
Output
The results of the measures are the values generated with display_evt. The way to get them is to use the get_value instruction. Note that only one value can be generated at a time; thus, only the value of index 0 has to be considered. The module analyze_display is:
module analyze_display ()
local int res;
while (1) do
await (display_evt);
get_value (display_evt,0, (void**)&local(res));
fprintf (stdout,"score: %d\r\n",local(res));
cooperate;
end
end module
A message with a beep must be issued when tilt_evt is generated; this is done by the module analyze_tilt:
module analyze_tilt ()
while (1) do
await (tilt_evt);
print_out ("\07TILT!!\r\n");
cooperate;
end
end module
Finally, the analysis of the output consists in creating two threads, one instance of analyze_display and one of analyze_tilt:
module analyze_output ()
analyze_display_create ();
analyze_tilt_create ();
end module
Main Module
The main module creates the five events and the three threads needed:
module main ()
{
ready_evt = event_create ();
coin_evt = event_create ();
end_evt = event_create ();
display_evt = event_create ();
tilt_evt = event_create ();
analyze_input_create ();
reflex_game_create ();
analyze_output_create ();
}
end module
Two points are to be noticed. First, the order in which the threads are created corresponds to the intuitive sequence: input processing, then reaction, then output processing. Second, the values of pause_length and limit_time, and the definition of myrandom, must be adapted to the executing platform, in order to get a playable system.
3.3 Sieve Examples
In this section, the focus is put on the dynamic creation of threads. One considers sieves, which are algorithms for producing numbers with a given characteristic. The most well-known sieve is of course the sieve of Eratosthenes, for prime numbers. Actually, there are two types of sieves: the number of elements of bounded sieves is statically fixed, while in unbounded sieves the number of elements is potentially infinite. Only unbounded sieves are considered here.
3.3.1 Standard Eratosthenes Sieve
The basic version of the unbounded sieve of Eratosthenes can be implemented by the following C program:
int main ()
{
list_t primes = init_list ();
int current = 3, create = 1;
while (1) {
cell_t cell = primes->first;
while (cell != NULL) {
if (multiple_of (current,cell->val)) {
create = 0; break;
}
cell = cell->next;
}
if (create) {
add_list (primes,current);
output (current);
} else create = 1;
current += 2;
}
return 0;
}
The type list_t is the type of lists made of elements of type cell_t, which contain an integer field val and a pointer next to the next cell. The definition of these two types is standard, and not given here.
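For completeness, a minimal sketch of these types and of the helpers used above (an assumption matching the calls in the program, not the text's own definition):
#include <stdlib.h>
typedef struct cell_t
{
int val;
struct cell_t *next;
} *cell_t;
typedef struct list_t
{
cell_t first;
} *list_t;
list_t init_list (void)
{
list_t l = malloc (sizeof (*l));
l->first = NULL;
return l;
}
/* add an element in front of the list (order does not matter here) */
void add_list (list_t l,int v)
{
cell_t c = malloc (sizeof (*c));
c->val = v;
c->next = l->first;
l->first = c;
}
int multiple_of (int a,int b) { return a % b == 0; }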
3.3.2 Eratosthenes Sieve in Loft
Dynamic memory allocation is somewhere needed in unbounded sieves. Actually, in Loft, memory allocation can be masked by the creation of threads. The Eratosthenes sieve can thus basically be implemented in Loft without the need of any user-defined data type. One first defines a module which generates the odd numbers, one per instant. The odd numbers are associated to an event given as parameter:
module producer (event_t start)
local int i;
{local(i) = 1;}
while (1) do
{local(i) += 2;}
generate_value (local(start), (void*)local(i));
cooperate;
end
end module
First, a macro is defined for waiting for an event evt and assigning its first value to a variable res:
#define get_first_val(evt,res) await(local(evt));get_value(local(evt),0, (void**)&local(res))
An instance of filter_prime is created for each prime number, passed as the value associated to the event given in parameter:
module filter_prime (event_t me)
local int prime, int res, event_t next;
get_first_val(me,prime);
{
output (local(prime));
filter_prime_create (local(next = event_create ()));
}
while (1) do
cooperate;
get_first_val(me,res);
{
if (local(res) % local(prime))
generate_value (local(next), (void*)local(res));
}
end
end module
The condition local(res) % local(prime) is true if local(res) is not a multiple of local(prime); in this case, local(res) is transmitted to the next filter; otherwise, nothing is done.
The main module creates a producer and a first filter, linked
by the same event:
module main ()
{
event_t start = event_create ();
producer_create (start);
filter_prime_create (start);
}
end module
The beginning of the output is:
3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73
79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167
173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263
269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367
373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541 547 557 ...
3.3.3 Lucky Numbers
Lucky numbers [18] are generated by a sieve which is very close to the one of Eratosthenes. The production of lucky numbers is defined by the following algorithm: write all the odd numbers: 1,3,5,7,9,11,... The first number greater than 1 is 3, so delete every third number: 1,3,7,9,13,15,19,... The first number greater than 3 in the list is 7, so delete every seventh number: 1,3,7,9,13,15,21,25,31,... And so on, without ending. The numbers that remain are called lucky numbers.
The sieve to produce lucky numbers is based on a filtering function which filters numbers according to their indexes, and not to their values as in the Eratosthenes sieve:
module filter_lucky (int i,event_t trigger,event_t out)
local int base, int res, int k, event_t next;
get_first_val(trigger,base);
{
generate_value (local(out), (void*)local(base));
filter_lucky_create (local(k)= local(i)+1,
local(next) = event_create (),
local(out));
}
while (1) do
cooperate;
get_first_val(trigger,res);
{
local(k)++;
if (local(k) % local(base))
generate_value (local(next), (void*)local(res));
}
end
end module
The produced numbers are generated as values of the event out given in parameter. Moreover, the index of the filtered values is given as first parameter to filter_lucky. The sieve for lucky numbers is defined by:
module main ()
{
event_t go = event_create ();
event_t result = event_create ();
out1_create (result);
producer_create (go);
filter_lucky_create (1,go,result);
}
end module
The module out1 prints all the values associated to its parameter, which is an event. The list of lucky numbers starts with:
3 7 9 13 15 21 25 31 33 37 43 49 51 63 67 69 73 75 79 87
93 99 105 111 115 127 129 133 135 141 151 159 163 169 171 189 193 195
201 205 211 219 223 231 235 237 241 259 261 267 273 283 285 289 297
303 307 319 321 327 331 339 349 357 361 367 385 391 393 399 409 415
421 427 429 433 451 463 475 477 483 487 489 495 511 517 519 529 535
537 541 553 559 577 579 583 591 601 613 615 619 ...
3.3.4 Lucky Prime Numbers
Let us now consider the numbers that are simultaneously prime and lucky. The direct production of these lucky prime numbers is not so clear using a unique sieve (what would be the filtering function?). However, a possible solution is to reuse the two previous sieves, and to build a system made of two sub-systems, one producing the prime numbers and the other producing the lucky numbers. The values produced by the two sub-systems are coming from the same event, and are output by the module out2, which outputs a value when two generations of it are received during the same instant, meaning that the value has passed all the prime filters but also all the lucky filters. The definition of out2 is:
module out2 (event_t event)
local int k, int res, int all;
while (1) do
await (local(event));
{local(k) = 0; local(all) = 0;}
while (!local(all)) do
get_value (local(event),local(k), (void**)&local(res));
{
if (return_code () == ENEXT) {
local(all) = 1;
if (local(k) == 2) output (local(res));
}
local(k)++;
}
end
end
end module
The module filter_prime is changed to let it produce its results as values of an event. The system is the following:
module main ()
{
event_t go = event_create ();
event_t result = event_create ();
out2_create (result);
producer_create (go);
filter_lucky_create (1,go,result);
filter_prime_create (go,result);
}
end module
The list of lucky prime numbers starts with:
3 7 13 31 37 43 67 73 79 127 151 163 193 211 223 241 283
307 331 349 367 409 421 433 463 487 541 577 601 613 619 631 643 673
727 739 769 787 823 883 937 991 997 1009 1021 1039 1087 1093 1117 1123
1201 1231 1249 1291 1303 1459 1471 1543 1567 1579 1597 1663 1693 1723
1777 1801 1831 1879 1933 1987 2053 2083 2113 2221 2239 2251 2281 2311
2467 2473 2557 2593 2647 2671 2689 2797 2851 2887 2953 2971 3037 3049
3109 3121 3163 3187 3229 3259 3301 3307 3313 ...
This example shows how code reuse can be made easier by the existence
of common instants shared by all the threads.
3.4 Data-Flow Programming
In this section, one considers data-flow programming, and shows how to implement it in Loft.
One considers programs made of processes connected through channels. Channels are basically unbounded FIFO files. Processes can put values in channels (this is always possible, as channels are unbounded) and get values from them. Each channel has at most one producer (a process that fills it) and at most one consumer (a process that empties it). Thus, there is no concurrency on channel accesses, except between producers and consumers. These programs are called data-flow programs, because processes are basically creating and transforming flows of values. They were first introduced in [24], and they are actually the initial computing model at the basis of dataflow synchronous languages.
There are two variants: in Nets of Sequential Processes, the consumer of an empty channel remains stuck until some value is (possibly) produced by the producer of the channel. In Nets of Reactive Processes, all processes share the same instants, and a process can thus detect, during one instant, that a channel is empty (and thus avoid getting stuck on it). In what follows, one does not distinguish between the two variants.
Channels are basically lists of cells containing values, with an access to the first cell and to the last cell. Moreover, an event is associated to each channel, for signaling that a new value is put in it. Cells hold integers of type value_t:
typedef struct channel_t {
cell_t first;
cell_t last;
int length;
event_t new_value;
} *channel_t;
The function to put a value into a channel is:
void put (channel_t chan,value_t v)
{
if (chan->length == 0) {
chan->last = cell_create (v);
chan->first = chan->last;
generate (chan->new_value);
} else {
chan->last->next = cell_create (v);
chan->last = chan->last->next;
}
chan->length++;
}
Note the generation of the event held by the channel, to signal to the consumer that the channel is no longer empty. The extraction of a value from a channel is performed by the function extract:
value_t extract (channel_t chan)
{
cell_t head;
value_t res;
if (chan->length == 0) return NULL;
head = chan->first;
chan->first = head->next;
res = head->val;
free (head);
chan->length--;
return res;
}
The module get blocks execution until a first value can be extracted out of a channel:
module get (channel_t in,value_t *res)
while (local(in)->length == 0) do
await (local(in)->new_value);
if (local(in)->length == 0) then cooperate; end
end
{(*local(res)) = extract (local(in));}
end module
If the channel is not empty, the value is immediately extracted from it and assigned to the result variable. Otherwise, the event associated to the channel is awaited. Note that a process can extract several values from a channel during the same instant. Thus, the event associated to a channel may be present while the channel is empty, because the channel has already been emptied during the same instant. There are two consequences:
- One must always test if the channel is empty after receiving the event.
- In case the channel is empty (which means that all the values in it have been previously extracted), the module should cooperate, to avoid cycling.
A macro is defined to replace runs of the get module:
#define GET(chan) \
while (chan->length == 0) do\
await (chan->new_value);\
if (chan->length == 0) then cooperate; end\
end
Processes are modules. One just presents here two basic modules: the first one produces integer values on its output channel, and the second one prints the values received on its input channel. The producer, which suspends its production when the output channel already holds 10 values, is:
module producer_process (value_t from,value_t incr,channel_t out)
local value_t val;
{local(val) = local(from);}
while (1) do
if (local(out)->length < 10) then
{
put (local(out),local(val));
local(val) += local(incr);
}
end
cooperate;
end
end module
The out_process process is:
module out_process (channel_t in)
while (1) do
GET (local(in))
printf ("%d ",extract (local(in)));
fflush(stdout);
end
end module
Programs are built by connecting processes with channels. Each channel should have at most one producer and at most one consumer. For example, the following program is made of a channel (created with the function channel_create) connecting an instance of producer_process to an instance of out_process:
module main()
{
channel_t c = channel_create ();
producer_process_create (0,1,c);
out_process_create (c);
}
end module
The program prints the list of integers.
One now programs, in the dataflow style, the Eratosthenes sieve producing the prime numbers. A list of filters, corresponding to the already produced prime numbers, is put in sequence and connected through channels. A number that is able to pass all the filters is prime. A new filter, which rejects the multiples of it, is introduced for each new prime number. The filter module is:
module filter_process (int index,int base,channel_t in,channel_t out,filter_t filter)
while (1) do
GET (local(in))
{
int v = extract (local(in));
local(index)++;
if (local(filter) (local(index),local(base),v))
put(local(out),v);
}
end
end module
The list of filters is constructed by the module sift_process:
module sift_process (channel_t in,channel_t out,filter_t filter)
local int k;
{local(k) = 1;}
while (1) do
GET (local(in))
{
channel_t intern = channel_create_in (current_scheduler ());
int v = extract (local(in));
put(local(out),v);
local(k)++;
filter_process_create (local(k),v,local(in),intern,local(filter));
local(in) = intern;
}
end
end module
Finally, the prime numbers are produced by the following program, where the filtering function primes (of type filter_t) filters the multiples of a base value:
int primes (int index,int base,int value)
{
return value%base;
}
module main()
{
channel_t c1 = channel_create ();
channel_t c2 = channel_create ();
producer_process_create (3,2,c1);
sift_process_create (c1,c2,primes);
out_process_create (c2);
}
end module
4 Programming Style
The specificities of Loft lead to a programming style which can be summed up by the following rules of thumb: don't forget to cooperate; define a good granularity of time for instants; don't lose events; choose a smart decomposition into distinct synchronous areas. One now considers these rules in turn.
4.1 Necessity of Cooperation
Fair threads are basically cooperative, that is, they have to periodically leave the control and return it to the scheduler they are linked to, in order to let the other threads execute. There are basically two ways for a thread to be non-cooperative: to fall into an instantaneous loop, or to execute a never-terminating atom. It must be noticed that recursion cannot, by itself, lead to a lack of cooperation. One now considers these points in more detail.
4.1.1 Instantaneous Loops
An instantaneous loop is a loop that never terminates and cycles forever during the same instant. Thus, it never cooperates, and prevents the other threads from being executed.
The first way to get an instantaneous loop is to forget an explicit cooperate statement, as in:
while (1) do
printf ("loop!");
end
The use of implicit cooperation points is not always the solution to
break instantaneous loops. For example, consider the case where the
waiting of an event triggers the execution of the loop body:
while (1) do
await (evt);
printf ("loop!");
end
Then, the infinite cycle does not start immediately, but it does when the event becomes present. Indeed, from that moment, the await statement will always consider the event as present, and the loop then becomes instantaneous.
To falsely assume that two events are never simultaneous can also lead
to an instantaneous loop. Consider:
while (1) do
await (evt1);
await (evt2);
{ ... }
end
This code cannot lead to an instantaneous loop, provided evt1 and evt2 are never simultaneous. However, there is an infinite cycle if both events are present at the same instant during the loop execution. Note that this may occur if evt1 and evt2 actually denote the same event.
To produce a potentially infinite number of values is also possible in an indirect way, through the get_value primitive. For example, consider:
while (search) do
get_value (evt,i,&res);
{
if (return_code () == ENEXT) {
search = 0;
} else {
printf ("value #%d: %d ", i,res);
i++;
generate_value (evt,i);
}
}
end
This piece of code can lead to an infinite loop (and also to the generation of an infinite number of values). Indeed, getting the i-th value leads to generating the (i+1)-th one. Thus, an instantaneous loop appears as soon as the first value is generated.
4.1.2 Non-terminating Atoms
A non-terminating atom (which is really badly named in this case) also leads to an absence of cooperation. The purest form of a non-terminating atom is:
{
for (;;) {}
}
There are, of course, many other ways for an atom to be non-terminating. Note that, in addition to terminating, an atom also has to do it properly: it is forbidden to jump, in a way or another, for example with a goto, outside the atom, as outlined in C Statements (2.3.4).
There is no possibility to recursively create infinitely many threads
during the same instant, as thread creation is always postponed to the
next instant. Thus, the following code, while not useful, will not
prevent cooperation:
module m ()
m_create ();
end module
Of course, one can exhaust the system memory using recursion, by
creating infinitely many new threads, as in:
module m ()
m_create ();
halt;
end module
Recursive creation of threads is also possible using the run primitive, as for example in:
module m ()
run m ();
end module
However, as with an explicit creation of threads, there is no possibility, using run, to cycle forever during the same instant.
4.2 Granularity of Instants
Supposing that there is no instantaneous loop to prevent the passing of instants, the problem still remains to get a correct granularity for instant durations. Indeed, if instants take too much time, the reactivity of the whole system could be lost. This problem is actually central in the context of Loft.
The granularity of unlinked threads is not under the control of the user, but only of the scheduler. Of course, standard methods can then be used, for example based on the use of the yield primitive, but with all their standard problems too. One does not consider this case in more detail here.
With non-native threads, users can have some control on the granularity of instants, based on cooperation points, either explicit (cooperate) or implicit (await). For example, consider the following atom, which calls a function f 2*N times during the same instant:
{
int i;
for (i=0; i<2*N; i++) {
f ();
}
}
It can be split into two instants, just by introducing a cooperate statement:
{
int i;
for (i=0; i<N; i++) {
f ();
}
}
cooperate;
{
int i;
for (i=0; i<N; i++) {
f ();
}
}
Then, the complete execution of the series of calls takes two instants instead of one, leaving the other threads the possibility to execute after the first half of the calls has been performed. At the extreme, f can be called once per instant:
repeat (2*N) do
f ();
cooperate;
end
Introducing supplementary cooperation points in a program gives the other programs more opportunities to execute. The point, however, is that the introduction of supplementary cooperation points must remain compatible with the use of events. Remember indeed that events are not persistent data, and are automatically reset as absent at the beginning of each new instant.
A major use of events is to avoid busy-waiting on a condition (just like condition variables with standard pthreads). For example, consider a system where one thread sets a boolean variable X to true to trigger a processing done by another thread. A possibility is that the processing thread tests X at each instant:
module processing_thread ()
while (!X) do cooperate end;
....
end
Note that no race condition on X can occur, as the framework is cooperative. This solution is however unsatisfactory, because the processing thread tests X at each instant while it is false, which is of course inefficient.
The correct solution is to wait for an event whose generation triggers the processing thread, which then becomes:
module processing_thread ()
await (X);
....
end
Now, the processing thread does not consume any resources while waiting for X. As a side effect of using events, one is sure that the processing starts at the very instant X is generated. This is to be compared with the previous solution using a boolean variable, where the processing is delayed to the next instant if X is tested before being set to true.
A rather delicate point with events is that they may be awaited or generated at the wrong moment. For example, let us return to the previous example. If the processing thread is created after the generation of the event, then the processing will not be done, as the thread will never receive the event. For example, the following code is not correct (more precisely, the processing will not be done immediately, despite the generation of X):
generate (X);
processing_thread_create ();
Indeed, the processing thread will only be created at the next instant, and thus will not see X as being present. Actually, the generation of X has been lost.
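A sketch of a correct ordering (assuming this code runs in the body of a module, so that cooperate can be used to let the creation take effect):
processing_thread_create ();
cooperate;      /* the new thread starts at the next instant */
generate (X);   /* the generation is now observed */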
As there are no locks, deadlocks, considered in the original meaning of the term, cannot occur. However, it is possible to make some analogies with situations in Loft. Actually, the closest analogy with a deadlock would be a situation made of two threads, each of them preventing the other from closing the current instant. This would be the case if one thread tests the variable X with the atom:
{
while (!X) {}
Y=1;
}
while the other tests the variable Y with:
{
while (!Y) {}
X=1;
}
This is however an incorrect situation, because the two atoms considered are actually non-terminating. Provided that all atoms always terminate, such a situation should never appear.
Another analogy with a deadlock would be a situation where each thread prevents the other from passing a certain point of its execution. This is the case if one thread executes the non-atomic instruction:
while (!X) do cooperate; end
{Y=1;}
while the other executes:
while (!Y) do cooperate; end
{X=1;}
Then, the two threads remain blocked, but this does not in any way prevent the passing of instants. Using events instead of boolean variables avoids busy-waiting. The two instructions would be:
await (X);
generate (Y);
and:
await (Y);
generate (X);
Actually, such a situation could be called a "communication deadlock". Real deadlocks are at the implementation level: they appear because some data are to be protected by locks. A real deadlock introduces the risk that a data item becomes inaccessible forever, because some lock used to protect it is never released. This problem concerning data does not appear with communication deadlocks, which can be seen as occurring at a higher, more "logical", level.
Architectural design is of major concern in Loft. Indeed, the language allows programmers to decompose a system into several sub-systems, implemented as distinct schedulers. These schedulers can be run either synchronously or asynchronously.
4.4.1 Synchronous Schedulers
Synchronous schedulers are schedulers whose instants are related. Many different situations exist in which the presence of several synchronous schedulers can be useful.
Let us, for example, consider graphical objects which should be painted only after they have all moved. A simple solution consists in defining two threads for each object, one for moving it and one for painting it, and in creating all the moving threads before all the painting ones. However, this forbids the dynamic creation of new objects, as their moving threads would be executed after the painting threads of the old objects. The solution is to use two schedulers, one to which the moving threads are linked, and the other for the painting threads:
scheduler_t move_scheduler, paint_scheduler;
main ()
{
move_scheduler = scheduler_create ();
paint_scheduler = scheduler_create ();
....
while (1) {
scheduler_react (move_scheduler);
scheduler_react (paint_scheduler);
}
}
The function to create an object is:
void object_creation (graphical_object obj)
{
move_create_in (move_scheduler,obj);
paint_create_in (paint_scheduler,obj);
}
One can also imagine many situations where several instants of one scheduler should actually correspond to one instant of another. Loft gives the possibility to code these various situations very easily.
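For example, here is a minimal sketch in which the move scheduler runs two instants for each instant of the paint scheduler (reusing the schedulers declared above):
while (1) {
scheduler_react (move_scheduler);
scheduler_react (move_scheduler);  /* two move instants... */
scheduler_react (paint_scheduler); /* ...for one paint instant */
}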
4.4.2 Asynchronous Schedulers
Schedulers that are run asynchronously define reactive areas in which threads share the same instants and the same events. Asynchronous schedulers are especially interesting in the context of multiprocessing, as each of them can be run by a dedicated native thread, on a distinct processor. The example of the sieve in Lucky Prime Numbers (8.1.2) shows how this can be achieved quite simply.
However, one has to be very cautious using unlinked threads, as then all the problems of standard threads come back to the foreground. For example, unlinked threads should never access data used by linked threads, because these data are usually not protected. Data shared by two unlinked threads should be protected by locks, exactly as if they were standard threads (actually, they are standard threads!).
The discipline that should be obeyed using unlinked threads has to be very strict, and it seems good practice to adopt the following rules:
- Any shared data should only be accessed by threads linked to the same scheduler, in charge of the data. In particular, any access to shared data by an unlinked thread should be forbidden: the unlinked thread must first link to the scheduler in charge of the data before accessing it.
- Any communication between distinct schedulers running asynchronously should use either some event, or a native thread acting as a go-between for transferring data.
Of course, to decompose a system in several areas can be a
difficult question, with several possible different answers.
Unnecessary uses of native modules should be avoided. Indeed, each
instance of a native module requires a native thread to execute. This
is required because such an instance can be unlinked, and in this case
it behaves exactly as a standard pthread.
Instances of non-native modules do not need the full power of a native thread to execute, and they can be simply implemented as automata. The reason is that, when making a thread react, the stack state is the same at the beginning and at the end of the reaction. Actually, the stack is only needed for atoms, which by definition should always terminate instantly. Thus, only the state of the execution and the local variables have to be preserved between instants, which can be done using the heap. Instances of non-native modules can thus be simply run by the thread of the scheduler to which they are linked. Actually, there is no reason why a module in which the unlink instruction is never used should be native.
One also has to keep in mind that there exist partial implementations of Loft which do not have any support for native modules (see Direct Implementation, section 7.5). This is for example the case when the targets are embedded systems with limited resources (memory and CPU).
5 Semantics
One considers the formal semantics of the cooperative part of Loft. To simplify, one supposes that there is only one scheduler, to which all the created threads are linked. One first considers the basic semantics, called Rewrite, then two variants of it, called Replace and Optim. Replace puts the focus on memory use, by limiting the number of intermediate instructions that are created during the execution of an instant. The Optim variant puts the focus on CPU usage, by avoiding busy-waiting on absent events.
The Rewrite variant is defined in the first three sections. The basics of the semantics are introduced in Semantics Basics (5.1). Then, one describes in Instructions in REWRITE (5.2) the semantics of expressions and of instructions. The semantics of threads and schedulers is defined in Threads and Schedulers in REWRITE (5.3). The Replace variant is considered in REPLACE Semantics (5.4), and the Optim variant in OPTIM Semantics (5.5). Finally, Conclusion (7.6) concludes the formal description of the language.
5.1 Semantics Basics
The semantics is expressed in the structural operational semantics (SOS) format defined in [30]. First, one defines the data structures corresponding to schedulers and threads; then, an intuitive description of the semantics is given, to make the reading of the rest of the semantics easier; then, one defines the various formats of the rewriting rules; finally, the BNF syntax of the instructions is described.
5.1.1 Schedulers
A scheduler sched is a structure made of the following components:
- sched.linked is the list of threads linked to the scheduler.
- sched.current is the thread currently executed.
- sched.events is the set of generated events.
- sched.values is a table that associates to events the list of values generated during the current instant.
- sched.eoi is a boolean flag that indicates the end of the current instant.
- sched.move is a boolean flag that indicates the need to postpone the end of the current instant.
- sched.to_broadcast is the list of events and of couples (event, value) that must be generated at the beginning of the next instant.
- sched.orders is the list of orders (stop, suspend, resume, or link a thread) that must be executed at the beginning of the next instant. Orders are built with the function order, which takes the kind of order as first parameter: STOP_ORDER, SUSP_ORDER, RESUME_ORDER, or LINK_ORDER.
- sched.to_finalize is the list of threads that must be finalized.
One adopts the following notations:
- sched<f = v> changes to v the value of the field f of sched. For example, s<eoi = true> is equal to s, except that s.eoi is set to true.
- l += v adds the value v to the list l, and sched<f += v> adds v to the field f (which must be a list or a set) of sched. In the same way, l -= v removes the value v from l, and sched<f -= v> removes v from f in sched.
- The list (or table) with no element in it is noted Ø. The n-th element of a list l is noted l(n), and the number of elements of l is noted length(l).
5.1.2 Threads
A thread th is a structure made of the following components:
- th.run is the instruction run by the thread.
- th.finalizer is the atomic instruction executed when the thread is stopped.
- th.suspended is a boolean which is true when the thread is suspended.
- th.state is the execution state of the thread:
- TERM: the thread is completely terminated.
- COOP: the thread cooperates.
- CONT: the thread needs to be continued during the current instant. This is the initial state, and the one of a thread waiting for an event which is not present, or trying to get a non-available event value.
- th.signal is an event used to signal the final termination of the thread.
- th.code is the return code set by some non-atomic instructions. There are 3 possible codes:
- OK: the instruction has completed normally.
- ETIMEOUT: the limit has been reached (for await and join).
- ENEXT: not available value (for get_value).
- th.called denotes the thread created and executed when a run non-atomic instruction is executed. The termination of this thread is awaited by the calling thread.
An instance of a module is a new structure th. At creation, th is such that:
- th.suspended is false.
- th.state is CONT.
- th.signal is a new event.
- th.run is a sequence made of the module body, followed by a generation of th.signal.
- th.finalizer is the module finalizer.
- th.code is OK.
- th.called is NULL.
5.1.3 Overview of the Semantics
In this section, one gives an intuitive overview of the semantics. Let us consider a scheduler sched and an environment env. The final goal of the semantics is to show, without ambiguity, how sched and env evolve as time passes.
Actually, evolution is decomposed into a sequence of instants; the first instant is:
sched, env => sched1, env1
which means that, starting from sched and env, one gets sched1 and env1 at the end of the instant.
Threads are not started as soon as they are created, in order to avoid interferences with the currently running threads. Actually, all threads created during one instant are stored as link orders in sched.orders, and are effectively started at the beginning of the next instant. In the same way, events broadcast from the outside world are stored in sched.to_broadcast, and are incorporated in the system at the beginning of the next instant. Stop, suspend, and resume orders are processed in the same way. In case of stop orders, the finalizers of the stopped threads are executed. Thus, the evolution of the scheduler as time passes is a sequence of the form:
sched1', env1' => sched2, env2
sched2', env2' => sched3, env3
...
where schedi' incorporates in schedi all the orders collected during instant i, and envi' is obtained from envi after execution of the finalizers of the stopped threads.
Now, let us decompose instants: each instant consists in cyclically running phases, in which all the threads are run in turn. Running a thread t means resuming the execution of its associated instruction t.run. A thread which needs to be continued is a non-suspended thread that has not already cooperated during the instant. The instant is over when there is no such thread.
Threads which are started are placed in the linked list of the scheduler. At the beginning of each instant, threads that are terminated (TERM) and threads stopped during the previous instant are removed from linked.
Note that the order in which the threads are executed always remains the same during an instant. Note also that suspensions and resumptions of threads never change the order in which the other threads are executed. The scheduler behavior consists in cyclically running the threads in linked, while there exist threads needing to be continued.
The number of phases depends on the events generated. For example, consider a situation where linked = (t1,t2) and t1 awaits the event e generated by t2. Then, after the execution of t2, a new phase is needed to resume t1. Otherwise, t1 would not consider e as present, despite the fact that it is generated; in such a situation, one could not say that e is broadcast. Note that a third phase would be necessary if, after generating e, t2 were waiting for another event, generated by t1. Actually, new phases are needed until one reaches a situation where no new event is generated; then, the end of the current instant can be safely decided.
The scheduler uses two boolean flags to manage instants:
- The flag move is set each time a new event is generated; it is reset by the scheduler at the beginning of each phase. The scheduler does not decide the end of the current instant when move is true, in order to give the threads waiting for newly generated events the possibility to react to them.
- The flag eoi is set by the scheduler when the end of the current instant is decided; it is reset at the beginning of each new instant. It is used to inform the threads which are waiting for events that these events are definitely absent. Then, these threads can cooperate, and the next instant can safely take place.
Awaiting an event which is absent blocks a thread up to the end of the current instant. This forbids immediate (that is, during the same instant) reaction to the absence of an event; the reaction, if any, is thus postponed to the next instant. This is important to avoid situations where one could react to the absence of an event during one instant by generating it during the very same instant, which would contradict the fact that the event is absent. These kinds of contradictions, known as "causality problems" in synchronous languages [19], do not exist here. In the same way, trying to get a non-available generated value blocks a thread up to the end of the current instant.
5.1.4 Rewriting Rules Formats
In this section, one gives the various formats of the rewriting rules used in the semantics.
Instructions
The format of the rewritings of instructions is:
inst, sched, env →s inst', sched', env'
In such a rewriting, the initial instruction is inst, the scheduler is sched, and the environment of data is env. As a result, the rewriting produces a status s, a new instruction inst', a new scheduler sched', and a new environment env'.
There are 4 possible status:
- TERM: the instruction is terminated.
- COOP: the instruction cooperates.
- CONT: the instruction needs to be continued. It is the status of an instruction waiting for an event which is not present, or trying to get a non-available event value.
- RETURN: a return instruction has been reached.
Expressions
The notation for evaluating expressions is:
    exp |= sched, env → val, sched', env'
The evaluation of exp, in the scheduler sched and in the environment env, produces the value val, and leads to the new scheduler sched' and to the new environment env'.
Several expressions can be evaluated in one step. The notation is:
    (exp0,...,expn) |= sched, env → (val0,...,valn), sched', env'
The evaluation of expi produces the value vali. The final scheduler is sched' and the final environment is env'. The order in which the expressions are evaluated is left unspecified.
Threads and Schedulers
The micro-step execution of a thread is noted:
    th, sched, env → th', sched', env'
At scheduler level, the execution of the thread number n is noted:
    sched, n, env → sched', env'
A phase of the scheduler consists in executing in turn all the threads linked in it. The format is:
    sched, env → sched', env'
Finally, an instant of the scheduler consists in cyclically executing phases up to a point where no thread needs to be continued. The format is:
    sched, env => sched', env'
5.1.5 Syntax of Instructions
|
One gives the syntax of instructions considered in the semantics. The syntax is very close to the real syntax of programs. However, we feel free to make some simplifications when there is no deep impact on the semantics (for example, in if instructions, the else branch is not optional). Some extensions are also included in the syntax to ease the writing of the semantics rules (actually, these extensions never appear in programs). The syntax of instructions is:
inst =
inst ... inst
| atomic
| non-atomic
Atomic instructions execute completely in one single instant:
atomic =
{<c-code>}
| stop (exp)
| suspend (exp)
| resume (exp)
| generate (exp)
| generate (exp, exp)
Some special expressions are added to the ones of C:
exp =
special_exp
| <c-exp>
special_exp =
<module>_create (exp, ..., exp)
| return_code ()
| self ()
Execution of non-atomic instructions can last several instants:
non-atomic =
cooperate
| halt
| return
| await (exp)
| await (exp, exp)
| join (exp)
| join (exp,exp)
| get_value (exp, exp, exp)
| if (exp) then inst else inst end
| while (exp) do inst end
| repeat (exp) do inst end
| run <module> (exp, ..., exp)
| extensions
Some extensions are defined. They do not appear in real programs, but
are introduced as intermediary instructions:
extensions =
nothing
| repeat* (n, inst) do inst end
| while* (exp, inst) do inst end
| await* (event)
| await* (event, n)
| get_value* (event, n, v)
5.2 Instructions in REWRITE
|
One now considers the evaluation of expressions and the execution of instructions in the basic semantics Rewrite.
Expressions are evaluated in the standard way of C. However, the evaluation of some special expressions has an effect on the scheduler (for example, the creation of threads) or uses it. Thus, the general form of evaluation of expressions is:
    exp |= sched, env → value, sched', env'
In this rewriting, value is the value obtained by evaluating exp. The resulting scheduler is sched' and the resulting environment is env'.
One now gives the semantics of special expressions.
Creation of Threads
New thread instances of a module m are created by calling the function m_create.
Rule 1: Create |
(e1,...,en) | |= | sched, env | → | (v1,...,vn), sched', env' |
|
|
m_create(e1,...,en) | |= | sched, env | → | t, sched'', env' |
|
sched'' = sched'<orders += order (LINK_ORDER,t)> |
|
|
The thread t is a new thread created from the module m, with (v1,...,vn) as arguments. It will be started only at the next instant.
Thread Code
A call to return_code returns the code of the current thread.
Rule 2: Return Code |
return_code() | |= | sched, env | → | sched.current.code, sched, env |
|
|
Self
A call to self returns the executing thread, which is actually the current thread of the scheduler.
Rule 3: Self |
self() | |= | sched, env | → | sched.current, sched, env |
|
|
5.2.2 Sequence of Instructions
|
The semantics of sequences is contained in three rules. In the first
one, the execution of the first element of the sequence is not
terminated and thus only this element is changed in the resulting
instruction:
Rule 4: Sequence - Not Term |
| s | |
i0, sched, env | → | i'0, sched', env' |
|
s ≠ TERM |
|
| s | |
i0...in, sched, env | → | i'0...in, sched', env' |
|
|
|
The execution goes to the next component as soon as the previous one
terminates:
Rule 5: Sequence - Term |
| TERM | |
i0, sched, env | → | i'0, sched', env' |
|
| s | |
i1...in, sched', env' | → | j, sched'', env'' |
|
|
| s | |
i0...in, sched, env | → | j, sched'', env'' |
|
|
|
Finally, for consistency reasons, one also has to consider the degenerate case where the list is empty:
Rule 6: Sequence - Empty |
| TERM | |
Ø, sched, env | → | Ø, sched, env |
|
|
5.2.3 Atomic Instructions
|
C Code
Standard C code enclosed in "{" and "}" is an atomic instruction,
executed in the standard way. This leads to rewritings of the form:
| TERM | |
{<c-code>}, sched, env | → | nothing, sched', env' |
Note that the scheduler can also be changed by execution of standard C
code because special expressions can appear in it.
Stop, Suspend, Resume
Execution of stop(t) adds the order to stop t to the list of orders.
Rule 7: Stop |
e | |= | sched, env | → | t, sched', env' |
|
|
| TERM | |
stop(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<orders += order (STOP_ORDER,t)> |
|
|
Execution of suspend(t) adds the order to suspend t to the list of orders.
Rule 8: Suspend |
e | |= | sched, env | → | t, sched', env' |
|
|
| TERM | |
suspend(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<orders += order (SUSPEND_ORDER,t)> |
|
|
Execution of resume(t) adds the order to resume t to the list of orders.
Rule 9: Resume |
e | |= | sched, env | → | t, sched', env' |
|
|
| TERM | |
resume(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<orders += order (RESUME_ORDER,t)> |
|
|
Generate
A generate statement adds the generated event to the event set of the scheduler and immediately terminates; moreover, the flag move is set to indicate that something new happened:
Rule 10: Generate - Pure |
e | |= | sched, env | → | v, sched', env' |
|
|
| TERM | |
generate(e), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<move = true><events += v> |
|
|
Moreover, when a value is associated with the generation, it is added at the end of the table of values associated with the event.
Rule 11: Generate - Valued |
(e1, e2) | |= | sched, env | → | (v1,v2), sched1, env1 |
|
| TERM | |
generate(v1), sched1, env1 | → | nothing, sched2, env2 |
|
|
| TERM | |
generate(e1, e2), sched, env | → | nothing, sched3, env2 |
|
sched3 = sched2<val(v1) += v2> |
|
|
5.2.4 Non-Atomic Instructions
|
Cooperate, Halt, Nothing, Return
The cooperate statement finishes execution for the current instant and returns COOP. Nothing remains to be done at the next instant (thus, execution at the next instant will resume in sequence from the cooperate instruction):
Rule 12: Cooperate |
| COOP | |
cooperate, sched, env | → | nothing, sched, env |
|
|
The halt instruction never terminates:
Rule 13: Halt |
| COOP | |
halt, sched, env | → | halt, sched, env |
|
|
The following rule is introduced for completeness; actually, the nothing instruction does not appear in real programs.
Rule 14: Nothing |
| TERM | |
nothing, sched, env | → | nothing, sched, env |
|
|
The return instruction forces the termination of the executing thread by returning the status RETURN. It also generates the event which signals the thread termination:
Rule 15: Return |
| TERM | |
generate (sched.current.signal), sched, env | → | nothing, sched', env' |
|
|
| RETURN | |
return, sched, env | → | nothing, sched', env' |
|
|
|
Await
An await instruction first determines which event is awaited and then behaves as await*:
Rule 16: Await |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
await* (v), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e), sched, env | → | i, sched'', env'' |
|
|
|
The waiting terminates when the awaited event is present:
Rule 17: Await - Present |
v ∈ sched.events |
|
| TERM | |
await* (v), sched, env | → | nothing, sched, env |
|
|
|
Execution needs to be continued during the same instant when the
awaited event is unknown (neither present nor absent):
Rule 18: Await - Unknown |
v not in sched.events |
sched.eoi = false |
|
| CONT | |
await* (v), sched, env | → | await* (v), sched, env |
|
|
|
The await instruction cooperates when the awaited event is absent. The absence is detected because the event has not been generated while the end of the current instant has been decided:
Rule 19: Await - Absent |
v not in sched.events |
sched.eoi = true |
|
| COOP | |
await* (v), sched, env | → | await* (v), sched, env |
|
|
|
Limited Await
With a limited await, the waiting is limited to a fixed number of instants:
Rule 20: Limited Await |
(e1,e2) | |= | sched, env | → | (v,n), sched', env' |
|
| s | |
await* (v,n), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e1,e2), sched, env | → | i, sched'', env'' |
|
|
|
Rules for the limited await are very similar to the ones of the standard await, except that tests for the delay are introduced.
The instruction immediately terminates when the delay is over, and in
this case, the return code of the executing thread is set to ETIMEOUT.
Rule 21: Limited Await - Over |
n ≤ 0 |
|
| TERM | |
await* (v,n), sched, env | → | nothing, sched', env |
|
sched' = sched<current.code = ETIMEOUT> |
|
|
The instruction also immediately terminates when the awaited event is present; in this case, the return code of the executing thread is set to OK.
Rule 22: Limited Await - Present |
n > 0 |
v ∈ sched.events |
|
| TERM | |
await* (v,n), sched, env | → | nothing, sched', env |
|
sched' = sched<current.code = OK> |
|
|
The executing thread needs to be continued during the current instant while the awaited event is unknown.
Rule 23: Limited Await - Unknown |
n > 0 |
v not in sched.events |
sched.eoi = false |
|
| CONT | |
await* (v,n), sched, env | → | await* (v,n), sched, env |
|
|
|
The executing thread cooperates when the awaited event is absent. This
can only happen at the end of the instant. In this case, the waiting
delay is decremented.
Rule 24: Limited Await - Absent |
n > 0 |
v not in sched.events |
sched.eoi = true |
|
| COOP | |
await* (v,n), sched, env | → | await* (v,n-1), sched, env |
|
|
|
An important point must be noticed: the previous rules describe a busy-waiting mechanism: actually, the await* instruction must be activated at each micro-step of each instant, until the awaited event is present (except, of course, during instants where the executing thread is suspended).
If
The if instruction chooses a branch according to the value of a boolean expression, and executes it up to completion (which may take several instants; note that the expression is evaluated only once):
Rule 25: If - True |
e | |= | sched, env | → | true, sched', env' |
|
| s | |
i, sched', env' | → | i', sched'', env'' |
|
|
| s | |
if (e) then i else j end, sched, env | → | i', sched'', env'' |
|
|
|
Rule 26: If - False |
e | |= | sched, env | → | false, sched', env' |
|
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
if (e) then i else j end, sched, env | → | j', sched'', env'' |
|
|
|
While
A while loop first evaluates the associated boolean expression and terminates if the returned value is false:
Rule 27: While - False |
e | |= | sched, env | → | false, sched', env' |
|
|
| TERM | |
while (e) do i end, sched, env | → | nothing, sched', env' |
|
|
|
Otherwise, the loop behaves as the auxiliary while* loop defined below:
Rule 28: While - True |
e | |= | sched, env | → | true, sched', env' |
|
j = while* (e, i) do i end |
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
while (e) do i end, sched, env | → | j', sched'', env'' |
|
|
|
The auxiliary while* instruction runs the loop body up to termination, without re-evaluating the boolean condition; then, it rewrites as a standard while loop which evaluates the condition to determine if the body has to be run one more time.
Rule 29: While - Not Term |
| s | |
i, sched, env | → | i', sched', env' |
|
s ≠ TERM |
|
| s | |
while* (e, init) do i end, sched, env | → | j, sched', env' |
|
j = while* (e, init) do i' end |
|
|
Rule 30: While - Term |
| TERM | |
i, sched, env | → | i', sched', env' |
|
| s | |
while (e) do init end, sched', env' | → | j, sched'', env'' |
|
|
| s | |
while* (e, init) do i end, sched, env | → | j, sched'', env'' |
|
|
|
Repeat
A repeat loop executes its body a limited number of times. This number is the value of an expression which is first evaluated:
Rule 31: Repeat |
e | |= | sched, env | → | n, sched', env' |
|
j = repeat* (n, i) do i end |
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
repeat (e) do i end, sched, env | → | j', sched'', env'' |
|
|
|
The loop terminates when the number of cycles to be performed is not positive:
Rule 32: Repeat - Negative |
n ≤ 0 |
|
| TERM | |
repeat* (n, init) do i end, sched, env | → | nothing, sched, env |
|
|
|
Otherwise, the loop behaves as its body, as long as the body does not terminate:
Rule 33: Repeat - Not Term |
0 < n |
| s | |
i, sched, env | → | i', sched', env' |
|
s ≠ TERM |
|
| s | |
repeat* (n, init) do i end, sched, env | → | j, sched', env' |
|
j = repeat* (n, init) do i' end |
|
|
When the body terminates, the number of cycles to be performed is
decremented and the loop is immediately restarted.
Rule 34: Repeat - Term |
0 < n |
| TERM | |
i, sched, env | → | i', sched', env' |
|
j = repeat* (n-1, init) do init end |
| s | |
j, sched', env' | → | j', sched'', env'' |
|
|
| s | |
repeat* (n, init) do i end, sched, env | → | j', sched'', env'' |
|
|
|
Get Value
The get_value instruction returns the value with index n (if it exists) associated to a given event. First, three expressions are evaluated: the event, the index n, and the place where the result should be stored.
Rule 35: Get Value |
(e1,e2,e3) | |= | sched, env | → | (v,n,r), sched', env' |
|
| s | |
get_value* (v,n,r), sched', env' | → | i, sched'', env'' |
|
|
| s | |
get_value (e1,e2,e3), sched, env | → | i, sched'', env'' |
|
|
|
The instruction immediately terminates if the value is available. In
this case, the value is assigned to the result variable and the return
code of the executing thread is set to OK.
Rule 36: Get Value - Available |
length(sched.values(v)) > n |
|
| TERM | |
get_value* (v,n,r), sched, env | → | nothing, sched', env' |
|
sched' = sched<current.code = OK> |
env' = env<r = (sched.values(v))(n)> |
|
|
The instruction has to be continued if the value is not available
while the current instant is not terminated:
Rule 37: Get Value - Unknown |
length(sched.values(v)) ≤ n |
sched.eoi = false |
|
| CONT | |
get_value* (v,n,r), sched, env | → | get_value* (v,n,r), sched, env |
|
|
|
The instruction cooperates if the value is still not available at the end of the current instant. In this case, the return code is set to ENEXT, which will indicate at the next instant that the value has not been obtained.
Rule 38: Get Value - Not Available |
length(sched.values(v)) ≤ n |
sched.eoi = true |
|
| COOP | |
get_value* (v,n,r), sched, env | → | nothing, sched', env |
|
sched' = sched<current.code = ENEXT> |
|
|
Join
Nothing is done if the thread to be joined is already terminated:
Rule 39: Join - Terminated |
e | |= | sched, env | → | t, sched', env' |
|
t.state = TERM |
|
| TERM | |
join (e), sched, env | → | nothing, sched', env' |
|
|
|
Otherwise, the instruction waits for the event which is generated when the thread terminates (see section 7.2.3):
Rule 40: Join - Not Terminated |
e | |= | sched, env | → | t, sched', env' |
|
t.state ≠ TERM |
| s | |
await* (t.signal), sched', env' | → | i, sched'', env'' |
|
|
| s | |
join (e), sched, env | → | i, sched'', env'' |
|
|
|
Limited Join
Limited join is similar to join, except that the delay is also tested for termination. The instruction immediately terminates if the thread to be joined is already terminated. In this case, the return code of the executing thread is set to OK.
Rule 41: Limited Join |
(e1,e2) | |= | sched, env | → | (t,n), sched', env' |
|
t.state = TERM |
|
| TERM | |
join (e1, e2), sched, env | → | nothing, sched'', env' |
|
sched'' = sched'<current.code = OK> |
|
|
If the thread to be joined is not already terminated, then the termination
event is awaited.
Rule 42: Limited Join - Waiting |
(e1,e2) | |= | sched, env | → | (t,n), sched', env' |
|
t.state ≠ TERM |
| s | |
await* (t.signal, n), sched', env' | → | i, sched'', env'' |
|
|
| s | |
join (e1, e2), sched, env | → | i, sched'', env'' |
|
|
|
Run
To run a module means to create a new instance of the module and to join it. Moreover, the instance is stored in the component called of the calling thread, in order to be stopped when the calling thread is.
Rule 43: Run |
m_create (e1,..., en) | |= | sched, env | → | t, sched', env' |
|
sched1 = sched'<current.called = t> |
| s | |
join (t), sched1,env' | → | i, sched2, env2 |
|
|
| s | |
run m (e1,..., en), sched, env | → | i, sched2, env2 |
|
|
|
5.3 Threads and Schedulers in REWRITE
|
The semantics of threads and schedulers in
Rewrite is now
considered. One defines some auxiliary functions using a very
simple self-explanatory syntax.
First, one defines the firing of a thread. A thread is
fireable if it is not suspended and if its state is CONT:
fireable (thread)
return thread.suspended = false and thread.state = CONT
To fire a fireable thread means to rewrite its body and to force the
state to TERM if the rewriting produces RETURN.
Rule 44: Fireable Thread |
fireable (t) = true |
| s | |
t.run, sched, env | → | i, sched', env' |
|
|
| | |
t, sched, env | → | t', sched', env' |
|
t' = t<run = i>
<state = (if s = RETURN then TERM else s)> |
|
|
To fire a thread which is not fireable has no effect.
Rule 45: Not Fireable Thread |
fireable (t) = false |
|
| | |
t, sched, env | → | t, sched, env |
|
|
|
During a phase, the scheduler performs a sequence of steps in which it fires all the threads linked to it. The nth step consists in firing the nth thread, considering it as the current thread of the scheduler:
Rule 46: Nth Thread |
length(sched.linked) > n |
t = sched.linked(n) |
| | |
t, sched<current = t>, env | → | t', sched', env' |
|
|
| | |
sched, n, env | → | sched'<linked(n) = t'>, env' |
|
|
|
One now switches to the rules describing phases. The first rule
corresponds to an empty scheduler. In this case, nothing is done
during the phase:
Rule 47: Scheduler - Empty |
length(sched.linked) = 0 |
|
sched, env | → | sched, env |
|
|
Otherwise, when the scheduler is not empty, all the threads are fired
in turn:
Rule 48: Scheduler - Not Empty |
length(sched.linked) = n+1 |
| | |
sched, 0, env | → | sched1, env1 |
|
| | |
sched1, 1, env1 | → | sched2, env2 |
|
... |
| | |
schedn, n, envn | → | sched', env' |
|
|
sched, env | → | sched', env' |
|
|
Note that threads are considered in a fixed order, which is the order of appearance in linked.
Instants are macro-steps of schedulers. Basically, an instant is a sequence of phases leading to a stable situation, where no thread remains to be continued. One first defines the function stable that determines whether a scheduler is in a stable situation or not.
stable (sched)
forall thread in sched.linked
if (fireable (thread)) return false
return true
The first rule corresponds to a phase leading to a stable
situation. In this case, execution for the current instant is over.
Rule 49: Instant - Finished |
sched, env | → | sched', env' |
stable (sched') = true |
|
sched, env | => | sched', env' |
|
|
If the phase leads to an unstable situation, then the scheduler cyclically re-schedules all the threads. If no event is generated during a phase (move is still false at the end of the phase), then the scheduler decides that the current instant can be finished by setting the flag eoi to true.
Rule 50: Instant - Not Finished |
sched, env | → | sched', env' |
stable (sched') = false |
make_decision(sched'), env' | => | sched'', env'' |
|
|
sched, env | => | sched'', env'' |
|
|
|
where the function make_decision is:
make_decision (sched)
if (sched.move = true) return sched<move = false>
else return sched<eoi = true>
5.3.4 Inter Instants Processing
|
One finally considers the chaining of instants and, more specifically, the actions performed between two successive instants. Actually, these actions are grouped at the beginning of instants and are specified by the function start_instant.
The call start_instant(sched,env) resets the two flags eoi and move of the scheduler and incorporates in sched the events and orders that have been produced (by the program or by the external world) during the previous instant; then it removes the terminated threads and resets those that are not suspended; finally, it finalizes the stopped threads:
start_instant (sched,env)
sched.eoi = false
sched.move = false
transfer_events (sched)
process_orders (sched,env)
reset_threads (sched)
finalize_stopped_threads (sched)
Transfer Events
The transfer_events function sets the events and the table of values associated to them, according to the broadcast orders.
transfer_events (sched)
sched.events = get_events (sched.to_broadcast)
sched.val = get_values (sched.to_broadcast)
sched.to_broadcast = Ø
The call get_events(x) returns the list of events extracted from x; get_values(x) returns the table built from the couples (event,value) belonging to x.
Note that the previous elements of events are lost; this is coherent, as events are always reset at the beginning of each new instant.
Processing Orders
The various orders (other than the broadcast ones) are processed in turn by the function process_orders.
process_orders (sched,env)
forall order in sched.orders
if (order.kind = SUSPEND_ORDER)
order.thread.suspended = true
else if (order.kind = RESUME_ORDER)
order.thread.suspended = false
else if (order.kind = STOP_ORDER)
sched.to_finalize += order.thread
order.thread.state = TERM
else if (order.kind = LINK_ORDER)
sched.linked += order.thread
sched.orders = Ø
Stopped threads are placed into the to_finalize list in order to be processed later. Thus, orders issued from finalizer execution will be performed at the next instant, and not during the current one. As a consequence, no instantaneous loop can occur from a finalizer sending orders to the scheduler running it.
Reset Threads
The function reset_threads eliminates from linked the threads that are terminated or unlinked, and sets to CONT the status of the remaining threads that are not suspended.
reset_threads (sched)
forall thread in sched.linked
if (thread.state = TERM)
sched.linked -= thread
else if (thread.suspended = false and thread.state = COOP)
thread.state = CONT
Finalization
To finalize a thread means to set its state to TERM, to add the
signaling event in the list of events, to call the finalizer of the
thread, and finally to recursively finalize the called thread, if
there is one.
finalize (sched,env,thread)
if (thread.state = TERM) return
thread.state = TERM
sched.events += thread.signal
sched = sched'
env = env'
if (thread.called != NULL) finalize (sched,env,thread.called)
where sched',env' results from the execution of thread.finalizer in sched,env:
    thread.finalizer, sched, env → nothing, sched', env'    (status TERM)
All threads which must be finalized are processed by the scheduler in
turn:
finalize_stopped_threads (sched)
forall thread in sched.to_finalize
finalize (sched,env,thread)
sched.to_finalize = Ø
This ends the description of the inter-instant actions that are performed by the scheduler at each beginning of instant.
Chaining Instants
Finally, the execution across instants, starting from an environment env, of a scheduler sched is a chain of rewritings of the form:
    start_instant(sched1,env1) => sched2, env2
    start_instant(sched2,env2) => sched3, env3
    ...
This concludes the description of the semantics
Rewrite.
One now considers the Replace semantics, which is a variant of the basic semantics Rewrite, designed to limit the creation of new instructions during the rewriting process. Actually, no new instruction is created by Replace during rewritings, and to rewrite an instruction basically means to change its state.
The first instruction to be considered is the sequence, which receives a state indicating which element is currently executed. One explicitly introduces the sequence operator seq in order to assign a state to it. The state is an integer which can be added as an index to the sequence operator. The execution begins with the first component (with index 0):
Rule 51: Sequence - Init |
| s | |
seq0 (i0, ..., in), sched, env | → | i, sched', env' |
|
|
| s | |
seq (i0, ..., in), sched, env | → | i, sched', env' |
|
|
|
Execution stays in the same component while it is not terminated:
Rule 52: Sequence - Stay |
| s | |
ik, sched, env | → | i'k, sched', env' |
|
s ≠ TERM |
|
| s | |
seqk (i0, ..., ik, ..., in), sched, env | → | j, sched', env' |
|
j = seqk (i0, ..., i'k, ..., in) |
|
|
Execution goes to the next component as soon as the current one terminates:
Rule 53: Sequence - Next |
| TERM | |
ik, sched, env | → | i'k, sched', env' |
|
| s | |
seqk+1 (i0, ..., in), sched', env' | → | i'', sched'', env'' |
|
|
| s | |
seqk (i0, ..., in), sched, env | → | i'', sched'', env'' |
|
|
|
Finally, execution terminates when the last component does:
Rule 54: Sequence - Term |
| TERM | |
seqn+1 (i0, ..., in), sched, env | → | seqn+1 (i0, ..., in), sched, env |
|
|
5.4.2 Atomic Instructions
|
Atomic instructions have no state but always rewrite into themselves, producing the status TERM. Actually, the execution which passes through an atomic instruction is under the control of the instruction which encloses it (typically, a sequence). One just gives the rule for C code:
| TERM | |
{<c-code>}, sched, env | → | {<c-code>}, sched', env' |
Note that now the atomic instruction no longer rewrites into nothing, but into the same instruction.
5.4.3 Non-Atomic Instructions
|
Non-atomic instructions have states. We only give the semantics of some of them, to show the introduction of states and the way they are managed. The semantics of the other instructions should be straightforward.
Cooperate
The cooperate instruction has a state which indicates whether or not control has passed through the instruction. Actually, a simple tag is needed. The cooperate statement returns COOP and sets its state so as to return TERM at the next instant. One now has two rules for cooperate:
Rule 55: Cooperate |
| COOP | |
cooperate, sched, env | → | cooperatetrue, sched, env |
|
|
Rule 56: Cooperate - Term |
| TERM | |
cooperatetrue, sched, env | → | cooperatetrue, sched, env |
|
|
Await
An await instruction has a state which stores the awaited event. First, the awaited event is stored in the state:
Rule 57: Await (State) |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
awaitv (e), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e), sched, env | → | i, sched'', env'' |
|
|
|
The waiting terminates when the awaited event is present:
Rule 58: Await - Present (State) |
v ∈ sched.events |
|
| TERM | |
awaitv (e), sched, env | → | awaitv (e), sched, env |
|
|
|
Execution must be continued when the awaited event is unknown (neither
present nor absent):
Rule 59: Await - Unknown (State) |
v not in sched.events |
sched.eoi = false |
|
| CONT | |
awaitv (e), sched, env | → | awaitv (e), sched, env |
|
|
|
Finally, the instruction cooperates when the awaited event is absent:
Rule 60: Await - Absent (State) |
v not in sched.events |
sched.eoi = true |
|
| COOP | |
awaitv (e), sched, env | → | awaitv (e), sched, env |
|
|
|
Note that the difference with the corresponding rules of Rewrite actually consists in replacing the await* syntax extension by a state.
If
The if instruction chooses a branch, according to the value of a boolean expression, and executes it up to completion. A boolean state is added to store which branch is chosen:
Rule 61: If |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
ifv(e) then i else j end, sched', env' | → | k, sched'', env'' |
|
|
| s | |
if (e) then i else j end, sched, env | → | k, sched'', env'' |
|
|
|
Rule 62: If - True |
| s | |
i, sched, env | → | i', sched', env' |
|
|
| s | |
iftrue (e) then i else j end, sched, env | → | k, sched', env' |
|
k = iftrue (e) then i' else j end |
|
|
Rule 63: If - False |
| s | |
j, sched, env | → | j', sched', env' |
|
|
| s | |
iffalse (e) then i else j end, sched, env | → | k, sched', env' |
|
k = iffalse (e) then i else j' end |
|
|
While
One major difference of Replace with Rewrite concerns the loops. In Replace, loop bodies are not duplicated as in Rewrite (using the extended instructions repeat* and while*), but are reset at the end of each cycle, in order to be ready for the next cycle. One only considers here while loops.
A while loop first evaluates the associated boolean expression. A state is introduced to store whether the expression has been evaluated or not; the state is actually a simple tag. The loop terminates if the value of the expression is false:
Rule 64: While - False |
e | |= | sched, env | → | false, sched', env' |
|
|
| TERM | |
while (e) do i end, sched, env | → | while (e) do i end, sched', env' |
|
|
|
Otherwise, the loop sets its state:
Rule 65: While - True |
e | |= | sched, env | → | true, sched', env' |
|
| s | |
whiletrue (e) do i end, sched', env' | → | j, sched'', env'' |
|
|
| s | |
while (e) do i end, sched, env | → | j, sched'', env'' |
|
|
|
The body is run up to termination, without re-evaluating the boolean
condition.
Rule 66: While - Not Term |
| s | |
i, sched, env | → | i', sched', env' |
|
s ≠ TERM |
|
| s | |
whiletrue (e) do i end, sched, env | → | whiletrue (e) do i' end, sched', env' |
|
|
|
When the body terminates, it is reset using the reset function defined in section 5.4.4, in order to be re-executed from the beginning:
Rule 67: While - Reset |
| TERM | |
i, sched, env | → | i', sched', env' |
|
| s | |
while (e) do reset(i) end, sched', env' | → | j, sched'', env'' |
|
|
| s | |
whiletrue (e) do i end, sched, env | → | j, sched'', env'' |
|
|
|
Join
The thread to be joined is first determined and stored in the instruction state:
Rule 68: Join |
e | |= | sched, env | → | t, sched', env' |
|
| s | |
joint (e), sched', env' | → | i, sched'', env'' |
|
|
| s | |
join (e), sched, env | → | i, sched'', env'' |
|
|
|
Nothing is done if the thread is already terminated:
Rule 69: Join - Terminated |
t.status = TERM |
|
| TERM | |
joint (e), sched, env | → | joint (e), sched, env |
|
|
|
Otherwise, the instruction waits for the termination signal:
Rule 70: Join - Waiting |
t.status ≠ TERM |
| s | |
awaitt.signal (t.signal), sched, env | → | i, sched', env' |
|
|
| s | |
joint (e), sched, env | → | joint (e), sched', env' |
|
|
|
5.4.4 Reset Function
|
The reset function is used by loops to reset their body at the beginning of each cycle. To reset a term actually means to erase all the indices appearing in it. The recursive definition of reset is straightforward and is left to the reader.
For example, consider:
repeat3 (5) do
seq0 (
awaite (event),
cooperatetrue
)
end
The result of resetting the previous instruction is:
repeat (5) do
seq(
await (event),
cooperate
)
end
A strong property holds concerning the reset function: the basis of an instruction, which is the instruction without any indices in it, never changes during the rewriting process. That is, if
    i, sched, env → i', sched', env'    (with any status s)
then reset(i) = reset(i').
In Replace, the rewriting is thus just a change of instruction states. This is the reason why it can be implemented more efficiently than Rewrite, without creation of new instructions during execution (except of course when new threads are created).
One now considers a variant of the previous semantics Replace, called Optim, in which threads are not busy-waiting for events. This variant basically uses queues, associated to events, in which waiting threads are stored. A special queue to store the threads that are submitted to deadlines (because they execute limited forms of await and join) is also defined. There are few changes with respect to the previous semantics, and only the major ones are given in the sequel.
First, one adds the new status WAIT to the possible ones returned by the rewriting of instructions. This status corresponds to a registration in a waiting queue. Second, one adds three new fields to schedulers:
- A table, named waiting (initially empty), which associates to an event the list of threads waiting for it.
- An integer instant which holds the current instant number (initially 0).
- A list of threads, named timer (initially empty), whose elements are the threads waiting for a deadline.
Finally, a new field deadline is added to threads to store the maximal instant to be considered by limited instructions.
5.5.1 Atomic Instructions
|
Amongst the atomic instructions, only the generate instruction has to be changed. Indeed, in addition to what was previously done, a generate statement awakes all the threads waiting for the generated event:
Rule 71: Generate - Pure |
e | |= | sched, env | → | v, sched', env' |
|
|
| TERM | |
generate(e), sched, env | → | nothing, sched'', env' |
|
sched'' = awake_all (sched',v)<move = true><events += v> |
|
|
The awake_all function awakes all the threads waiting for the generated event:
awake_all (sched, event)
forall thread in sched.waiting (event)
awake (sched,thread,event)
return sched
To awake a thread means to change its state to CONT and to unregister it from the queues where it is registered. Of course, this only applies to threads that are effectively waiting for the event and that are not suspended.
awakeable (thread)
return thread.suspended = false and thread.state = WAIT
The awake function is defined by:
awake (sched,thread,event)
if (awakeable (thread))
thread.state = CONT
sched.waiting(event) -= thread
sched.timer -= thread
5.5.2 Non-Atomic Instructions
|
One only considers the changes concerning the await statement (the changes for the other instructions are straightforward).
Await
An await instruction first determines the awaited event and stores it in its state:
Rule 72: Await |
e | |= | sched, env | → | v, sched', env' |
|
| s | |
awaitv(e), sched', env' | → | i, sched'', env'' |
|
|
| s | |
await (e), sched, env | → | i, sched'', env'' |
|
|
|
The waiting terminates when the awaited event is present:
Rule 73: Await - Present |
v ∈ sched.events |
|
| TERM | |
awaitv(e), sched, env | → | awaitv(e), sched, env |
|
|
|
The executing thread is put in the state WAIT and registered in the
waiting list when the awaited event is not present:
Rule 74: Await - Unknown |
v not in sched.events |
|
| WAIT | |
awaitv(e), sched, env | → | awaitv(e), sched', env |
|
sched' = sched<waiting(v) += sched.current> |
|
|
Limited Await
With a limited await, the waiting is limited to a fixed number of instants:
Rule 75: Limited Await |
(e1,e2) | |= | sched, env | → | (v,n), sched', env' |
|
sched1 = sched'<current.deadline = n + sched.instant> |
| s | |
awaitv(e1,e2), sched1, env' | → | i, sched2, env2 |
|
|
| s | |
await (e1,e2), sched, env | → | i, sched2, env2 |
|
|
|
Rule 76: Limited Await - Over |
sched.current.deadline ≤ sched.instant |
|
| TERM | |
awaitv(e1,e2), sched, env | → | awaitv(e1,e2), sched', env |
|
sched' = sched<current.code = ETIMEOUT> |
|
|
Rule 77: Limited Await - Present |
sched.current.deadline > sched.instant |
v ∈ sched.events |
|
| TERM | |
awaitv(e1,e2), sched, env | → | awaitv(e1,e2), sched', env |
|
sched' = sched<current.code = OK> |
|
|
When the waiting is effective, the current thread is registered in the
timer and in the waiting list associated to the awaited event:
Rule 78: Limited Await - Unknown |
t = sched.current |
t.deadline > sched.instant |
v not in sched.events |
|
| WAIT | |
awaitv(e1,e2), sched, env | → | awaitv(e1,e2), sched', env |
|
sched' = sched<waiting(v) += t><timer += t> |
|
|
The change in the scheduler concerns the processing of the timer, which is introduced in start_instant:
start_instant (sched,env)
sched.eoi = false
sched.move = false
timer_processing (sched)
transfer_events (sched)
process_orders (sched,env)
reset_threads (sched)
finalize_stopped_threads (sched)
where the function timer_processing is defined by:
timer_processing (sched)
sched.instant = sched.instant + 1
forall thread in sched.timer
if (awakeable (thread) and thread.deadline ≤ sched.instant)
awake (thread)
A point must be noted: performing supplementary calls to the awake function is not dangerous, in the sense that it cannot change the meaning of a program. Indeed, awaking a thread which does not have to be awoken just reintroduces it into the game, and does not change the actions that the thread has to perform. This basically means that spurious awake actions are not a problem. The consequence is that one can choose to never unregister threads from the timer, without any risk of perturbing the execution later. This can actually be useful for optimizing implementations.
One has precisely defined the semantics of the cooperative part of Loft using rewriting rules.
In a first step, one has described the semantics Rewrite with a set of about 50 rules covering all the aspects of the cooperative part of the language. The rules are simple enough to be understood one by one, and one can be rather confident that they capture the essence of the language. Actually, a way to strengthen this confidence is to implement the semantics as is, and to execute tests covering the largest possible set of aspects of the language. This is of course possible because the semantics is basically deterministic, which allows one to implement it in a straightforward way.
A direct implementation of the semantics is however unsatisfactory when considered in terms of efficiency, for at least the following reasons:
- A huge number of instructions appear as intermediary results in the rules. From an implementation point of view, they waste memory and should not be created at all.
- The threads are busy-waiting on events, which wastes CPU cycles. Note that a thread which is waiting for an event tests for it at each micro-step of each instant, up to the possible generation of it.
- Sequencing of instructions has to be explicitly executed, and is not just implicit as it is, for example, in standard C.
- All the threads linked to a scheduler are considered in turn during each phase. Thus, determining the next thread to execute is not constant in time, but linear in the number of threads linked to the scheduler.
Actually, the main interest of Rewrite is to serve as a reference for teaching and implementation purposes. Two improved semantics are defined which model more efficient implementations.
The variant Replace of Rewrite is defined to deal with the first point above. In it, the construction of new instructions is replaced by the introduction of states in instructions: instead of creating a new instruction from an old one, one just changes the state.
Replace is then transformed to avoid busy-waiting (point 2 above), using queues in which the threads that are waiting for events are stored. This leads to the variant Optim, which is closer to an efficient implementation of Loft than the previous semantics.
The two last points are not covered at the semantics level but will be considered later in section 7, where the implementation of Loft is discussed.
To end with semantics concerns, one should notice that the non-cooperative part of Loft is not considered by any of the previous semantics. To cover the whole language, a way that should be explored is the chemical abstract machine paradigm [10] to deal with the nondeterministic aspects arising in this case.
6 FairThreads in C
FairThreads is an alternative to standard threads which is very close to Loft. FairThreads is first presented; then some small examples are given; finally, FairThreads is compared with Loft and with standard POSIX threads.
6.1 FairThreads Overview
|
The API of FairThreads is overviewed in this section. All functions are presented, but some details, such as error codes, are for simplicity not considered here. The API is summarized in the annex.
Schedulers
FairThreads explicitly introduces schedulers, of type ft_scheduler_t. Before being used, a scheduler must be created by calling the function ft_scheduler_create.
In order to be executed, a scheduler must be started by a call to the function ft_scheduler_start. Note that several schedulers can be used simultaneously in the same program without any problem. Each scheduler is run by a dedicated native thread which gives the control in turn to the threads linked to the scheduler.
Threads
Fair threads are of type ft_thread_t. The call ft_thread_create(s,r,c,a) creates a thread in the scheduler s. The thread is run by a dedicated native thread. The creation becomes actual at the beginning of the next instant of s. The thread is automatically started as soon as it is created. The function r is executed by the thread, and the parameter a is transmitted to it. If stopped (by ft_scheduler_stop), the thread switches execution to the function c, to which a is also transmitted.
Events
Events are of the type ft_event_t. An event is created by calling the function ft_event_create, which takes as parameter the scheduler in charge of it.
Automata
Automata are fair threads of the type ft_thread_t created with the function ft_automaton_create. The thread returned by ft_automaton_create(s,r,c,a) is executed as an automaton by the scheduler s, which means that it is run by the native thread of the scheduler. The function r executed by the automaton must be defined with the macro DEFINE_AUTOMATON.
Control of Threads
The call ft_scheduler_stop(t) gives to the scheduler which executes the thread t the order to stop it. The stop will become actual at the beginning of the next instant of the scheduler, in order to assure that t is in a stable state when stopped.
The call ft_scheduler_suspend(t) gives to the scheduler which executes t the order to suspend t. The suspension will become actual at the beginning of the next instant of the scheduler. The function ft_scheduler_resume is used to resume execution of suspended threads.
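As an illustration, a controlling thread can freeze another thread for one instant (a hypothetical fragment; worker is assumed to be a thread linked to the same scheduler):
/* Hypothetical fragment: suspend a worker for one instant. */
void controller (void *arg)
{
   ft_thread_t worker = (ft_thread_t)arg;
   ft_scheduler_suspend (worker);  /* actual at the next instant        */
   ft_thread_cooperate ();         /* let the current instant terminate */
   ft_scheduler_resume (worker);   /* actual at the following instant   */
}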
Broadcast of Events
The function ft_scheduler_broadcast(e) gives to the scheduler of the event e the order to broadcast it to all the threads running in the scheduler. The event will actually be generated at the beginning of the next instant of the scheduler. The call ft_scheduler_broadcast_value(e,v) associates the value v to e (v can be read using ft_thread_get_value).
Cooperation
The call ft_thread_cooperate() is the explicit way for the calling thread to return control to the scheduler running it. The call ft_thread_cooperate_n(i) is equivalent to a sequence of i calls of ft_thread_cooperate().
Termination
The call ft_thread_join(t) suspends the execution of the calling thread until the thread t terminates (either normally or because it is stopped). Note that t need not be linked to the scheduler of the calling thread. With ft_thread_join_n(t,i), the suspension takes at most i instants.
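For instance, a thread can create a helper and wait for its termination during a bounded number of instants (a hypothetical fragment; helper stands for an arbitrary thread function):
/* Hypothetical fragment: bounded join on a freshly created thread. */
void spawn_and_wait (void *arg)
{
   ft_scheduler_t sched = ft_thread_scheduler ();
   ft_thread_t t = ft_thread_create (sched,helper,NULL,NULL);
   ft_thread_join_n (t,10);   /* wait for t during at most 10 instants */
}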
Generating Events
The call ft_thread_generate(e) generates the event e in the scheduler which was associated to it when it was created. The call ft_thread_generate_value(e,v) adds v to the list of values associated to e during the current instant (these values can be read using ft_thread_get_value).
Awaiting Events
The call ft_thread_await(e) suspends the execution of the calling thread until the event e is generated. Execution is resumed as soon as the event is generated. With ft_thread_await_n(e,i), the waiting takes at most i instants.
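For example, two threads can synchronize on an event (a hypothetical fragment; the event is assumed to be created in their common scheduler and passed as argument, and stdio.h to be included):
/* Hypothetical fragment: synchronization through an event. */
void producer (void *arg)
{
   ft_thread_generate ((ft_event_t)arg);  /* present during this instant */
}
void consumer (void *arg)
{
   ft_thread_await ((ft_event_t)arg);     /* blocks until the generation */
   fprintf (stderr,"event received\n");
}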
Selecting Events
The call ft_thread_select(k,array,mask) suspends the execution of the calling thread until the generation of one element of array, which is an array of k events. Then, mask, which is an array of k boolean values, is set accordingly. With ft_thread_select_n(k,array,mask,i), the waiting takes at most i instants.
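For instance, to wait for the first of two events (a hypothetical fragment; e1 and e2 are assumed to be events of the calling thread's scheduler, and int is used as a stand-in for the exact type of the mask elements):
/* Hypothetical fragment: wait until e1 or e2 is generated. */
ft_event_t array[2] = { e1, e2 };
int mask[2];
ft_thread_select (2,array,mask);
if (mask[0]) { /* e1 was generated during the instant */ }
if (mask[1]) { /* e2 was generated during the instant */ }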
Getting Events Values
The call ft_thread_get_value(e,i,r) is an attempt to get the i-th value associated to the event e during the current instant. If such a value exists, it is returned in r and the call immediately terminates. Otherwise, the value NULL is returned at the next instant.
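For example, an attempt to read the first value generated with an event could look as follows (a hypothetical fragment; evt is an event, process an arbitrary function, and the value is assumed to be returned at the address passed as r):
/* Hypothetical fragment: attempt to read the first value of evt. */
void *v = NULL;
ft_thread_get_value (evt,0,&v);
if (v != NULL)
   process (v);   /* a value was generated with evt this instant */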
Link and Unlink
The call ft_thread_unlink() unlinks the calling thread t from the scheduler in which it was previously running. Then, t will no longer synchronize, instant after instant, with the other threads linked to the scheduler. Actually, after unlinking, t behaves as a standard native thread.
The call ft_thread_link(s) links the calling thread to the scheduler s. The calling thread must be unlinked when executing the call. The linkage becomes actual at the beginning of the next instant of s.
Locks
In presence of unlinked threads, locks can be needed to protect data shared between unlinked and linked threads. Standard mutexes are used for this purpose. The call ft_thread_mutex_lock(p), where p is a mutex, suspends the calling thread until the moment where p can be locked. The lock is released using ft_thread_mutex_unlock. Locks owned by a thread are automatically released when the thread terminates definitively or when it is stopped.
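For instance, assuming a counter shared with unlinked threads (a hypothetical fragment; m is a mutex and shared_counter a shared variable):
/* Hypothetical fragment: protect data shared with unlinked threads. */
ft_thread_mutex_lock (m);
shared_counter++;              /* no concurrent access can occur here */
ft_thread_mutex_unlock (m);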
Automata are coded using macros. Here are the macros to define the automaton structure:
- AUTOMATON(aut) declares the automaton aut.
- DEFINE_AUTOMATON(aut) starts the definition of the automaton aut.
- BEGIN_AUTOMATON starts the list of states.
- END_AUTOMATON ends the list of states.
A state with number num starts with one of the following macros:
- STATE(num) introduces a standard state.
- STATE_AWAIT(num,event) and STATE_AWAIT_N(num,event,delay) are states to await event.
- STATE_JOIN(num,thread) and STATE_JOIN_N(num,thread,delay) are states to join thread.
- STATE_STAY(num,n) is a state which keeps execution in it for n instants.
- STATE_GET_VALUE(num,event,n,result) is a state to get the n-th value associated to event.
- STATE_SELECT(num,n,array,mask) and STATE_SELECT_N(num,n,array,mask,delay) generalize STATE_AWAIT and STATE_AWAIT_N to an array of events of length n.
Going from one state to another one is done with the macros:
- GOTO(num) blocks execution for the current instant and sets the state for the next instant to be state num.
- GOTO_NEXT blocks execution for the current instant and sets the state for the next instant to be the next state in the list of states.
- IMMEDIATE(num) forces execution to jump to state num, which is immediately executed.
- RETURN immediately terminates the automaton.
Finally, the following macros define some special variables:
- SELF is the automaton.
- LOCAL is the local data of the automaton.
- SET_LOCAL(data) sets the local data of the automaton.
- ARGS is the argument which is passed at creation to the automaton.
- RETURN_CODE is the error code set by the macros run during automaton execution.
Current Thread
The calling thread is returned by ft_thread_self().
Current Scheduler
The scheduler of the calling thread is returned by ft_thread_scheduler().
Pthread
The call ft_pthread(t) returns the native pthread which executes the fair thread t. This function gives direct access to the Pthreads implementation of FairThreads. In the rest of the paper, native thread and pthread will be considered as synonymous.
Exiting
The function ft_exit is equivalent to pthread_exit. The basic use of ft_exit is to terminate the pthread which is running the function main, without exiting from the process running the whole program.
One only gives two small examples, just to show the programming style in FairThreads. The first example is a complete executable program, and the second is an automaton.
The following code is a complete example of a FairThreads program, made of two threads linked to the same scheduler.
#include "fthread.h"
#include <stdio.h>
void h (void *id)
{
while (1) {
fprintf (stderr,"Hello ");
ft_thread_cooperate ();
}
}
void w (void *id)
{
while (1) {
fprintf (stderr,"World!\n");
ft_thread_cooperate ();
}
}
int main (void)
{
ft_scheduler_t sched = ft_scheduler_create ();
ft_thread_create (sched,h,NULL,NULL);
ft_thread_create (sched,w,NULL,NULL);
ft_scheduler_start (sched);
ft_exit ();
return 0;
}
The program cyclically outputs Hello World!. Note the call of ft_exit to prevent the program from terminating before the two threads are executed. Execution of linked fair threads is round-robin and deterministic: the two messages Hello and World! are always printed in this order.
The following automaton switches control between two threads, according to the presence of an event. The automaton switch_aut has three states. The first state resumes the first thread to run (initially, one assumes that both threads are suspended). The switching event is awaited in the second state, and the threads are switched when the event becomes present. The third state is similar to the second, except that the threads are exchanged.
DEFINE_AUTOMATON (switch_aut)
{
void **args = ARGS;
ft_event_t event = args[0];
ft_thread_t thread1 = args[1];
ft_thread_t thread2 = args[2];
BEGIN_AUTOMATON
STATE (0)
{
ft_scheduler_resume (thread1);
}
STATE_AWAIT (1,event)
{
ft_scheduler_suspend (thread1);
ft_scheduler_resume (thread2);
GOTO(2);
}
STATE_AWAIT (2,event)
{
ft_scheduler_suspend (thread2);
ft_scheduler_resume (thread1);
GOTO(1);
}
END_AUTOMATON
}
The automaton behavior is actually the one of a thread
running the following function:
void switch_aut (void *arg)
{
void **args = arg;
ft_event_t event = args[0];
ft_thread_t thread1 = args[1];
ft_thread_t thread2 = args[2];
ft_scheduler_resume (thread1);
while (1) {
ft_thread_await (event);
ft_scheduler_suspend (thread1);
ft_scheduler_resume (thread2);
ft_thread_cooperate ();
ft_thread_await (event);
ft_scheduler_suspend (thread2);
ft_scheduler_resume (thread1);
ft_thread_cooperate ();
}
}
The two main differences between Loft and FairThreads concern the ease of programming and the expressiveness given by the existence of private stacks.
6.3.1 Ease of Programming
|
The first main difference lies of course in the fact that Loft is a language while FairThreads is an API. This indeed implies two very different styles of programming. In Loft, one writes reactive programs which are linked with standard C code through atoms. In FairThreads, one basically writes C code in which some function calls are reactive primitives. Thus, the reactive structure of Loft programs certainly appears more clearly than in FairThreads programs.
Actually, in FairThreads native threads are the rule and automata are the exception, while it is the converse in Loft. In FairThreads, the passage from a thread to an automaton can require a complete rewriting of the code in order to fit the automata format. In Loft, the passage from an automaton to a thread, or vice-versa, is extremely simplified, and is actually concentrated in the keyword native. For example, the Loft module that corresponds to the previous switch_aut automaton is:
module switch_aut (ft_event_t event,ft_thread_t thread1,ft_thread_t thread2)
resume (local(thread1));
while (1) do
await (local(event));
suspend (local(thread1));
resume (local(thread2));
cooperate;
await (local(event));
suspend (local(thread2));
resume (local(thread1));
cooperate;
end
end module
It produces code which is very similar to the previous automaton. However, by adding the native keyword, the produced code would be very close to the one of the corresponding fair thread.
The linking mechanism of Loft is a simplification of the one of FairThreads, as there is no need in Loft to be unlinked in order to be able to link to another scheduler. Actually, when linking to a target scheduler, the unlinking from the source scheduler is implicit in Loft. The important point is that this does not imply that the thread stays unlinked, which would require the use of a native thread.
The arguments of threads in FairThreads have to be passed through a unique parameter of type void*. This complicates things when several arguments have to be passed to the same thread. Actually, these arguments have to be packed in a new structure whose reference is passed to the thread. In the case of an automaton, the arguments are accessed with the macro ARGS.
An implicit scheduler is always present in Loft, which is not the case in FairThreads, where there is no implicit main function which would create it.
Local variables have to be explicitly declared as local in Loft, while they coincide with local C variables in FairThreads. This is an aspect that is simpler in FairThreads than in Loft.
FairThreads offers possibilities that do not exist in Loft and which are related to stacks. Actually, in FairThreads all the threads, except automata, use their own private stack, as they are basically pthreads. The opposite holds in Loft, where threads, except instances of native modules, do not need a private stack to execute. The existence of a private stack makes it possible to code functions that are not tail-recursive. For example, consider the following recursive function, which is not tail-recursive because a printing action occurs after the recursive call:
void sum_aux (int n)
{
if (n == 0) { printf ("0"); return; }
ft_thread_cooperate_n (n);
sum_aux (n - 1);
printf ("+%d",n);
}
The execution exits after a call to the previous function:
void sum (void *k)
{
   sum_aux (atoi ((char*)k));  /* k is the textual argument of the program */
   printf (" = instant number\n");
   exit(0);
}
One also defines a function to trace the instants:
void trace_instants (void *n)
{
int i = 0;
while (1){
printf ("\ninstant %d: ", i++);
ft_thread_cooperate();
}
}
Finally, the main function is defined by:
int main (int argc, char **argv)
{
   ft_scheduler_t sched = ft_scheduler_create ();
   ft_thread_create (sched,trace_instants,NULL,NULL);
   ft_thread_create (sched,sum,NULL,argv[1]);
   ft_scheduler_start(sched);
   ft_exit();
   return 0;
}
Given the argument n, the program exits at the instant number s, where s is the sum of the n first integers. For example, with n equal to 4, one gets:
instant 0:
instant 1:
instant 2:
instant 3:
instant 4:
instant 5:
instant 6:
instant 7:
instant 8:
instant 9:
instant 10: 0+1+2+3+4 = instant number
Thus, the thread executing sum absolutely needs a stack to store the partial results of the recursive calls, because several instants pass before the final result is computed.
There is no counterpart in Loft of the ft_thread_select functions. Thus, the waiting of one amongst several events must be coded indirectly in Loft. Next versions of Loft should certainly propose a primitive similar to the ft_thread_select of FairThreads.
6.4 Comparison with POSIX Threads
|
When comparing FairThreads with POSIX, the first point to note is of course that FairThreads is basically cooperative while POSIX is basically preemptive. More precisely, threads linked to the same scheduler are executed cooperatively. As a consequence, data protection is considered very differently in the two contexts: in POSIX, mutexes are basically used for that purpose, while in FairThreads schedulers are used for the same purpose. If one is sure that a datum can only be accessed by the threads linked to a given scheduler, one can safely deduce that no race condition occurs for that datum. If accessing the data needs more than simple atomic instructions, a boolean variable can be used to lock and unlock the variable, as sketched below. In this case, however, one is in a situation very similar to the one using mutexes, with all their identified problems.
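A minimal sketch of such a boolean lock, only valid among threads linked to the same scheduler (hypothetical names):
/* Sketch: cooperative "lock" among the threads of one scheduler. */
int locked = 0;                /* hypothetical shared flag */

void take (void)
{
   while (locked)
      ft_thread_cooperate ();  /* retry after the other threads have run */
   locked = 1;                 /* no preemption between test and set     */
}

void release (void)
{
   locked = 0;
}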
The second point to note when comparing FairThreads and POSIX is that FairThreads proposes the notion of an automaton, which does not need the full power of a kernel thread to execute. Using automata, one can get around the limitations of POSIX implementations in which threads are mapped onto kernel threads.
One ends the comparison with POSIX by detailing some other important differences.
The POSIX notion of a thread attribute and the related functions pthread_attr_* do not exist in FairThreads, in which threads are more simply of two types: standard fair threads or automata.
Like POSIX (and unlike Java), there is no primitive to start a fair thread: a thread is automatically run after its creation by ft_thread_create.
Equality of fair threads is standard pointer equality, as the type ft_thread_t is a pointer type (there is no equivalent of pthread_equal).
Data local to a thread can be defined in POSIX with the notion of keys
and of thread-specific data (functions pthread_key_*
and
pthread_getspecific
, and pthread_setspecific
).
There is no direct counterpart in FairThreads. However, the POSIX functions for thread-specific data can be used in FairThreads by means of the ft_pthread function.
The pthread_once function of POSIX can be simply implemented with a boolean in FairThreads, as one is in a cooperative context.
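A minimal sketch of such an implementation, assuming a cooperative context (init_once is a hypothetical name, not part of FairThreads):
static int initialized = 0;

/* Cooperative equivalent of pthread_once: the test and the update of
the flag cannot be interleaved with another linked thread. */
void init_once (void (*init_routine) (void))
{
if (!initialized) {
initialized = 1;
init_routine ();
}
}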
The functions related to POSIX scheduling policies (for example
pthread_setschedparam
) are of no use in FairThreads where only one
policy is available: the fair one. Note that this policy more or less
corresponds to the SCHED_FIFO
policy of POSIX.
POSIX priorities do not exist in FairThreads. Priority-based systems can be coded in FairThreads using several schedulers to manage the various priority levels. This is another reason why FairThreads is simpler than POSIX.
Mutex and Condition Variables
|
All the functions related to mutexes (pthread_mutex_* and pthread_mutexattr_*) have no counterpart in FairThreads, where locks are not needed as long as one stays in its cooperative part.
As stated before, schedulers in
FairThreads can be used to structure
applications around protected data which are only accessed by linked
threads; in this respect, schedulers are playing a role very
similar to the one of mutexes.
Events in
FairThreads basically correspond to condition variables of POSIX.
To generate an event corresponds to
pthread_cond_broadcast
and to await it corresponds to
pthread_cond_wait
. The
limited variant
pthread_cond_timedwait
is the counterpart of
ft_thread_await_n
except that in
FairThreads deadlines are counted
in instants. There is no direct counterpart for the nondeterministic pthread_cond_signal of POSIX, which must be coded in FairThreads in a way similar to the one presented in section 3.1.2.2 (Single Notification).
The notion of a detached thread (pthread_detach
) has no
counterpart in FairThreads where termination of all threads can be tested at
any moment by ft_thread_join
.
Joining a thread in FairThreads and in POSIX is very similar, except that no value is returned by terminating fair threads.
Killing a thread is done in FairThreads with the ft_scheduler_stop function. In POSIX, the cancellation mechanism with the related functions (for example, pthread_setcancelstate) can be used for that purpose. However, it implies that the cancelled thread is in the correct mode and has set meaningful cancellation points. In FairThreads, thread cancellation can naturally only occur at the beginning of instants. In this respect, FairThreads is much simpler than POSIX.
The suspend-resume mechanism of FairThreads has no direct counterpart in POSIX, where it must be coded indirectly. Note that the equivalent functionality has disappeared from the recent versions of Java (it is deprecated).
7 Implementations
One considers now the question of implementing Loft. Actually, five implementations are described:
- The basic implementation translates Loft programs into FairThreads. It is a complete implementation which can deal with non-native modules as well as with native ones.
- Three implementations correspond to the previous Rewrite,
Replace and Optim semantics.
- Finally, the last implementation is suited for embedded
systems. Actually, it can be seen as a mix of Optim with the
automata-based part of the translation in FairThreads.
7.1 Implementation in FairThreads
|
The basic implementation of Loft uses FairThreads in C, which is very close to it (let us recall that Loft means Language Over Fair Threads) and which was introduced in section 6.1 (FairThreads Overview). FairThreads is supposed to be executed in the context of a preemptive OS. In this context, unlinked fair threads are run by dedicated native threads, at the kernel level. This is exactly what is needed to implement native modules of Loft. The implementation described in this section is complete, as opposed to the implementations considered later, which deal only with the cooperative part of Loft.
7.1.1 Translation in FairThreads
|
The implementation of
Loft basically consists in a translation of
modules in
FairThreads. The native modules are translated into
standard fair threads, while the non-native ones are translated into
automata. The translation is rather direct:
- Types
event_t, scheduler_t, thread_t
of
Loft are directly mapped to the corresponding types ft_event_t, ft_scheduler_t, ft_thread_t
of FairThreads.
- Orders of Loft are directly translated into corresponding
orders of FairThreads. For example,
stop (t)
is translated
into ft_scheduler_stop (t)
.
- The
generate
instruction of Loft translates into
ft_thread_generate
or into ft_scheduler_broadcast
,
depending on the linking of the executing thread.
- The instruction cooperate is translated into ft_thread_cooperate when a native module is considered, and into GOTO_NEXT when it is not native.
- The instruction await of Loft translates into ft_thread_await or into STATE_AWAIT. Other non-atomic instructions are translated into their equivalent statements of FairThreads.
However, some points are specific to the implementation. They
basically concern schedulers, deallocation, and the main module.
- Schedulers are always run as standard autonomous
pthreads, using the function
ft_scheduler_start
. Actually,
the scheduler_react
function of Loft becomes useless and
is not implemented.
- The main module must always be defined. Thus, the function
loft_initialize
becomes useless.
- Deallocation functions are not implemented. It is left to the
programmer to define means to deallocate objects created in the
heap.
Each instance of a module is mapped onto a fair thread of type ft_thread_t. This thread is either a standard fair thread in case of a native module, or an automaton in case of a non-native one. Thus, only instances of native modules are run autonomously as standard pthreads. The translation will be described through the example of a module.
The example considered is the following module that waits for the
event
go
and then prints a message 100 times:
module native m (char *s)
local int i;
await (go);
{local(i) = 0;}
repeat (100) do
printf ("instant %d: %s\n",local(i)++,local(s));
cooperate;
end
exit (0);
end module
One first describes the translation of m
, then one considers
the change where m
is a non-native module. In both cases,
the module translates into a data structure for local variables, a
function that describes the module behavior, and two functions to
create new instances of the module.
7.1.2.1 Native Modules
The translation of the module m first defines the data structure m_data_ to store the parameters and local variables of m.
typedef struct m_data_
{
char* s;
int i;
} *m_data_;
Second, a function m_finalize is defined to process the finalizing atom, which in this case is absent.
void m_finalize (void *_args)
{
m_data_ _local_vars = _args;
}
Third, a function m_fun
is built directly from the module
body (one has manually indented the code in order to make it more
readable).
void m_fun (void *_args)
{
thread_t _self = ft_thread_self ();
m_data_ _local_vars = (m_data_)_args;
_set_thread_return_code (_self,ft_thread_await (go));
{(_local_vars->i ) = 0;}
{
int _i0 = 100;
for(;_i0>0;_i0--){
{
printf ("instant %d: %s\n",
(_local_vars-> i ) ++,
(_local_vars-> s ) );
}
ft_thread_cooperate ();
}
}
{exit (0);}
}
Several points can be noticed in the translation:
- The executing thread returned by
ft_thread_self
is assigned to the variable _self
in order
to be used later, in the function body.
- The local data of the thread is obtained through the argument _args, which is thus cast to the correct type. The macro local used in the module body is expanded into an access to the _local_vars variable.
- The result of the call of
ft_thread_await
is passed
to a special function that makes it available by calling the
return_code
function.
- The repeat loop is translated into a standard for loop controlled by an automatic variable (produced by the compiler with a unique name).
Fourth, the m_create_in function is defined. It first allocates a structure _locals for the parameters and local variables of the returned thread _thread. A special allocation function, called mymalloc, is used. Actually, it just adds some tracing possibilities to the standard malloc function of C. The parameters are copied into _locals. Then, a fair thread is created using ft_thread_create. Finally, the new pthread is detached (using the pthread_detach function of Pthreads), in order to avoid the limitation on the number of pthreads that can be attached in an application. Actually, the join primitive of FairThreads is not implemented using pthread_join, which supposes that threads are attached.
thread_t m_create_in (scheduler_t _sched ,char* s)
{
thread_t _thread;
m_data_ _locals = mymalloc (sizeof (struct m_data_));
_locals->s = s;
_thread = ft_thread_create (_sched,m_fun,m_finalize,_locals);
pthread_detach (ft_pthread (_thread));
return _thread;
}
Finally, the m_create
function is defined as a call to
m_create_in
with the implicit scheduler as parameter.
thread_t m_create (char* s)
{
return m_create_in (implicit_scheduler (),s);
}
7.1.2.2 Non-native modules
One now considers non-native modules, and changes the previous module m into a non-native one. Syntactically, this just means suppressing the native keyword in the definition of m.
Non-native modules are naturally translated into automata as they do
not need the full power of a dedicated pthread to execute.
As the auxiliary variables used in the translation, such as the previous variable _i0, can no longer be implemented as automatic variables, they are introduced in the local data structure produced. The structure m_data_ becomes (_i0 is renamed _loc0):
typedef struct m_data_
{
int _loc0;
char* s;
int i;
} *m_data_;
The automaton produced by the translation of m
has 9 states
and is defined by a function which is as previously called m_fun
. The first state is used to await the event go
.
State 1 contains initialization of the local variable i
. States 2, 3, and 6 implement the repeat loop, using the local variable _loc0, which is initialized in state 2, tested in state 3, and decremented in state 6. State 4 contains the atomic
printing action. State 5 corresponds to the cooperate
instruction. State 7 is an auxiliary state (which should probably be
removed). Finally, state 8 contains the call to the exit
function.
DEFINE_AUTOMATON(m_fun)
{
m_data_ _local_vars = (m_data_)_get_thread_args (_self);
BEGIN_AUTOMATON
STATE_AWAIT (0,go) {}
STATE (1) {local(i) = 0;}
STATE (2) {local(_loc0) = 100;}
STATE (3) {if (local(_loc0)<=0) IMMEDIATE(7);}
STATE (4) {printf ("instant %d: %s\n",local(i)++,local(s));}
STATE (5) {GOTO_NEXT;}
STATE (6) {local(_loc0)--; IMMEDIATE (3); }
STATE (7) {}
STATE (8) {exit (0);}
END_AUTOMATON
}
The m_create_in
function creates an automaton using ft_automaton_create
:
thread_t m_create_in (scheduler_t sched ,char* s)
{
thread_t _thread;
m_data_ locals = mymalloc (sizeof (struct m_data_));
locals->s = s;
_thread = ft_automaton_create (sched,m_fun,m_finalize,locals);
pthread_detach (ft_pthread (_thread));
return _thread;
}
7.1.2.3 Main Module
The main function creates the implicit scheduler, then creates the
main thread in it, and finally starts the implicit scheduler.
static scheduler_t _implicit;
int main (int n,char **args)
{
_implicit = scheduler_create ();
_main_create_in (_implicit,n,args);
ft_scheduler_start (_implicit);
ft_exit ();
return 0;
}
The pthread running the main function is exited, using the ft_exit function, thus leaving the implicit scheduler running autonomously.
Efficiency of the implementation is basically based on two points:
- There is no busy-waiting on absent events, because fair threads waiting for an event are placed into a list associated to it, and then become transparent for the scheduler. Transparent fair threads return to the visible state as soon as the event they are waiting for is generated. Thus, one gets an implementation using queues to store waiting threads, which actually implements the method described in section 5.5 (OPTIM Semantics).
- The use of automata for non-native threads is very
efficient. Actually, an automaton is implemented as a
switch
instruction which dispatches the control to the appropriate
state. Moreover, there is no limit on the number of automata that can
be created, which is not true for native threads.
However, the implementation in FairThreads has a major drawback: it absolutely needs the presence of pthreads (even if they are not started, with non-native modules), and thus suffers from the problems related to pthreads. Amongst them, one can cite the need of a private stack for each pthread, which is memory consuming, and the presence of uncontrolled context-switches, which is inherent to the preemptive basis of pthreads. This last characteristic is clearly time consuming. Because pthreads are mandatory, the described implementation cannot be used for embedded systems with limited resources; this is the reason why the implementation described in section 7.5 (Direct Implementation) is proposed.
7.2 Implementation of REWRITE
|
One now switches to implementations of the cooperative part of Loft. In this section, one directly implements the semantics rules of section 5.2 (Instructions in REWRITE). Like the semantics, this implementation is called Rewrite, to emphasize the fact that it mimics the execution of the rewriting rules. The main purpose of Rewrite is to serve as a reference for the other implementations.
Instructions are of type instruction_t. Here is the definition of the type:
typedef struct _instruction_t
{
char type;
} *instruction_t;
The type instruction_t is used to define the types of specific instructions. They all contain a field parent of type struct _instruction_t, whose type field stores the actual type of the instruction. For example, the type of sequences (to simplify, one only considers binary sequences), sequence_t, is defined by:
typedef struct _sequence_t
{
struct _instruction_t parent;
instruction_t left;
instruction_t right;
}
*sequence_t;
The make_sequence
function returns new sequences:
instruction_t make_sequence (instruction_t left,instruction_t right)
{
sequence_t res = mymalloc (sizeof (struct _sequence_t));
res->parent.type = SEQ;
res->left = left;
res->right = right;
return (instruction_t)res;
}
Types corresponding to other instructions are built in the same way and, for simplicity, are not all considered here.
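As an illustration, here is a plausible sketch, by analogy with sequences, of the cooperate instruction type and its constructor make_cooperate (which is used below); the tag name COOPERATE is an assumption, not taken from the original code:
typedef struct _cooperate_t
{
struct _instruction_t parent;   /* type tag, as for sequences */
} *cooperate_t;

instruction_t make_cooperate (void)
{
cooperate_t res = mymalloc (sizeof (struct _cooperate_t));
res->parent.type = COOPERATE;   /* assumed tag name */
return (instruction_t)res;
}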
The scheduler and the environment which are transformed by the
rewriting rules are considered as global variables in the
implementation. The prototype of the rewriting function is:
rewrite_result_t rewrite (instruction_t);
The result of a rewriting (of type rewrite_result_t
) is a
structure made of two components: the status (TERM, CONT,
COOP, or RETURN) and the residual instruction. For example, the
implementation of the rewriting of a sequence is:
rewrite_result_t sequence_rewrite (instruction_t inst)
{
sequence_t s = (sequence_t)inst;
rewrite_result_t res = rewrite (s->left);
if (res->status != TERM) {
return
make_rewrite_result (res->status,make_sequence(res->inst,s->right));
}
return rewrite(s->right);
}
Note that a new sequence is systematically built by make_sequence when the left branch is not terminated.
The implementation of modules is close to the one presented in section 7.1 (Implementation in FairThreads), except that pthreads are no longer used, being replaced by elements of type instruction_t. The translation of a module m creates several items:
- A structure
m_data_
which defines the local data
associated to the module (parameters and local variables). The structure m_data_ is constructed exactly as in section 7.1 (Implementation in FairThreads).
- The function
m_selector
which contains all the atomic
instructions called in the module.
- The function
m
which returns the instruction of type
instruction_t
built from the module body.
- The two functions
m_create_in
and m_create
which return new instances of m. m_create is the same as in section 7.1 (Implementation in FairThreads).
As an example, consider the following (non-native) module m (also considered in section 7.1):
module m (char *s)
local int i;
await (go);
{local(i) = 0;}
repeat (100) do
printf ("instant %d: %s\n",local(i)++,local(s));
cooperate;
end
exit (0);
end module
The function m_selector lists all the atoms and executes them according to an integer parameter:
static void *m_selector (int _n,m_data_ _local_vars)
{
switch (_n) {
case 0: {void *res = (void*)(go); return res;}
case 1: {local(i) = 0;} return NULL;
case 2: {printf("instant %d: %s\n",local(i)++,local(s));} return NULL;
case 3: {void *res = (void*)(100); return res;}
case 4: {exit(0);} return NULL;
}
return NULL;
}
The m
function returns an instruction which actually
corresponds to the syntax tree of the module:
instruction_t m (m_data_ _local_vars)
{
return
make_sequence (
make_sequence (
make_sequence (
make_await (make_wrapper (m_selector,0)),
make_atom (m_selector,1)),
make_repeat (make_wrapper (m_selector,3),
make_sequence (make_atom (m_selector,2),make_cooperate ()))),
make_atom (m_selector,4));
}
The function make_wrapper returns a wrapper, which is a structure containing a selector, such as the previous m_selector, and an index. The execution of an atom made from a wrapper returns the evaluation of the selector case corresponding to the index. Thus, the selector is evaluated at execution time and not when the thread is constructed.
Finally, the function m_create_in
which creates threads is
defined by:
thread_t m_create_in (scheduler_t _sched, char* s)
{
thread_t _thread;
m_data_ _locals = mymalloc (sizeof (struct m_data_));
_locals->s = s;
_thread = make_thread (_sched,m (_locals),m_finalize,_locals);
add_to_link (_sched,_thread);
return _thread;
}
Threads are structures defined in section 5.1 (Semantics Basics). The type
thread_t
is implemented by:
typedef struct _thread_t
{
instruction_t run;
finalizer_t finalizer;
void *locals;
char state;
char suspended;
event_t signal;
struct _thread_t *called;
char code;
scheduler_t scheduler;
}*thread_t;
The field run
is the instruction executed by the thread.
locals
points to the thread local variables. The state of
the thread is contained in state
. The thread is suspended
when suspended
is different from 0. The event signal
is generated when the thread terminates its execution. The
field called
is set while a thread is under execution
through a run
instruction. The field code
holds
the return code of some instructions (namely, await
, join
, and get_value
). Finally, scheduler
denotes the scheduler in which the thread is currently executed.
The make_thread function returns a new thread in which the termination signal is generated at the end of the executed instruction:
thread_t make_thread (scheduler_t sched,
instruction_t inst,
finalizer_t final,
void *locals)
{
thread_t res = mymalloc (sizeof (struct _thread_t));
res->signal = event_create_in (sched);
res->run =
make_sequence (
inst,
make_generate (make_const_expression (res->signal))
);
res->finalizer = final;
res->state = _CONT;
res->suspended = 0;
res->called = NULL;
res->code = OK;
res->locals = locals;
res->scheduler = sched;
return res;
}
The reaction of a thread consists in a rewriting of the instruction
embedded in it:
void fire_thread (thread_t thread)
{
rewrite_result_t res;
thread->scheduler->current = thread;
res = rewrite (thread->run);
return_processing (thread,res->status);
thread->run = res->inst;
}
The return_processing
function basically replaces the
RETURN status by TERM.
Schedulers are structures defined in section 5.1 (Semantics Basics) and implemented by:
typedef struct _scheduler_t
{
item_list_t linked;
thread_t current;
item_list_t events;
item_list_t orders;
item_list_t to_broadcast;
char eoi;
char move;
}*scheduler_t;
One just describes the way the scheduler reacts, corresponding
to the instant
function:
void instant (scheduler_t sched)
{
start_instant (sched);
while (1) {
fire_all_threads (sched);
if (stable (sched)) break;
make_decision (sched);
}
}
The start_instant
function called at the beginning of
each new instant is defined by:
static void start_instant (scheduler_t sched)
{
sched->move = sched->eoi = 0;
transfer_threads (sched);
transfer_all_events (sched);
thread_changes (sched);
reset_scheduler (sched);
}
During a phase, the threads in linked
are all fired in turn:
static void fire_all_threads (scheduler_t sched)
{
item_cell_t cell = sched->linked->first;
while (cell) {
thread_t th = cell->item;
if (fireable (th)) fire_thread (th);
cell = cell->next;
}
}
The stability of the scheduler is tested by the function stable, defined by:
static int stable (scheduler_t sched)
{
item_cell_t cell = sched->linked->first;
while (cell) {
if (fireable ((thread_t)cell->item)) return 0;
cell = cell->next;
}
return 1;
}
Finally, the end of the current instant is decided when the make_decision
function is called while the move
flag is
false:
static void make_decision (scheduler_t sched)
{
if (sched->move) sched->move = 0; else sched->eoi = 1;
}
7.3 Implementation of REPLACE
|
One now considers the variant Replace of Rewrite. The basic idea is to introduce states in instructions, and to change these states during the rewriting process, in order to limit as much as possible the creation of new instructions. One just gives here the main changes with respect to the previous Rewrite implementation, concerning cooperate, sequences, and loops.
The
cooperate
instruction has a state which is set to
indicate that the instruction has been executed:
char cooperate_rewrite (instruction_t inst)
{
cooperate_t coop = (cooperate_t) inst;
if (coop->state) return TERM;
coop->state = 1;
return COOP;
}
A function is defined which resets the state of the instruction:
void cooperate_reset (instruction_t inst)
{
cooperate_t coop = (cooperate_t) inst;
coop->state = 0;
}
A state is also introduced in sequences (only binary ones are
considered) which is set when the left branch of the sequence is
terminated:
char sequence_rewrite (instruction_t inst)
{
sequence_t s = (sequence_t)inst;
if (0 == s->state) {
int res = rewrite (s->left);
if (res != TERM) return res;
s->state = 1;
}
return rewrite(s->right);
}
Note that the rewrite function now directly returns the execution status, because no new instruction is created as in Rewrite. The reset function for sequences resets the state and recursively resets the branches of the sequence:
void sequence_reset (instruction_t inst)
{
sequence_t s = (sequence_t)inst;
s->state = 0;
reset_instruction (s->left);
reset_instruction (s->right);
}
The
while
loop has a state which is set to 1 when the
expression is evaluated. When the loop body terminates, it is reset
and the state is reset to 0 to force the expression to be
re-evaluated:
char while_rewrite (instruction_t inst)
{
while_t w = (while_t)inst;
char r;
while (1) {
if (0 == w->state) {
int b = (int)evaluate (w->exp);
if (!b) return TERM;
w->state = 1;
}
r = rewrite (w->body);
if (r != TERM) return r;
reset_instruction (w->body);
w->state = 0;
}
}
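The repeat loop used in the examples can be handled in the same style. Here is a plausible sketch, assuming a repeat_t type with a state, a counter, the count expression exp, and the body (these field names are assumptions, not taken from the original code):
char repeat_rewrite (instruction_t inst)
{
repeat_t r = (repeat_t)inst;
char res;
while (1) {
if (0 == r->state) {
r->counter = (int)evaluate (r->exp);  /* evaluated only once */
r->state = 1;
}
if (r->counter <= 0) return TERM;
res = rewrite (r->body);
if (res != TERM) return res;
reset_instruction (r->body);             /* body will be re-run */
r->counter--;
}
}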
7.4 Implementation of OPTIM
|
The semantics Optim of section 5.5 (OPTIM Semantics) can now be implemented as a variant of the previous implementation Replace, to avoid busy-waiting on events.
One has made a little change with respect to the semantics of section 5.5, by implementing the global table waiting of the scheduler with a list included in each event.
Two functions are defined to register threads returning the WAIT
status:
register_in_event (e)
registers the executing thread
(actually
self ()
) in the waiting list associated to the
event
e
, and
register_in_timer (s)
registers the
executing thread in the timer list of the scheduler
s
.
One now considers the implementation of the await instruction (in both the limited and the unlimited forms). If the awaited event is not present, then the executing thread is registered in the waiting queue of the event and WAIT is returned.
char await_rewrite (instruction_t inst)
{
await_t a = (await_t)inst;
if (0 == a->state) {
a->event = evaluate (a->exp);
a->state = 1;
}
if (is_present (a->event)) return TERM;
register_in_event (a->event);
return WAIT;
}
Now, the statuses CONT and COOP are not returned. Either the instruction terminates because the event is present (and the returned status is TERM), or the thread is registered in the event.
For a limited await, the thread is registered both in the waiting queue of the event and in the scheduler timer, before WAIT is returned.
char limited_await_rewrite (instruction_t inst)
{
limited_await_t a = (limited_await_t)inst;
scheduler_t sched = current_scheduler ();
if (0 == a->state) {
a->event = evaluate (a->evt_exp);
a->limit = (int)evaluate (a->limit_exp);
self ()->deadline = a->limit + sched->instant;
a->state = 1;
}
if (self ()->deadline <= sched->instant) {
self ()->code = ETIMEOUT;
return TERM;
}
if (is_present (a->event)) {
self ()->code = OK;
return TERM;
}
register_in_event (a->event);
register_in_timer (sched);
return WAIT;
}
The generate function now calls the awake function, in order to awake the threads waiting for the generated event. Awaking a thread makes it once again executable by the scheduler, by setting its state to CONT:
static void awake (event_t event)
{
item_cell_t cell = event->waiting->first;
while (cell) {
thread_t th = cell->item;
if (!th->suspended && th->state == WAIT) th->state = CONT;
cell = cell->next;
}
reset_item_list (event->waiting);
}
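For illustration, here is a minimal sketch of what generate could look like in this implementation; the present field and the setting of the move flag are assumptions, only the call to awake is taken from the text:
void generate (event_t event)
{
event->present = 1;                 /* assumed presence flag */
current_scheduler ()->move = 1;     /* something happened this phase */
awake (event);                      /* awake the registered threads */
}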
The processing of deadlines uses an auxiliary function which returns 1 if the deadline is reached, and 0 otherwise:
static int deadline_reached (void *item)
{
thread_t thread = item;
if (thread->deadline > current_scheduler ()->instant) return 0;
if (thread->state == WAIT) thread->state = CONT;
return 1;
}
The awaking of threads with deadlines reached is done at the beginning
of each instant, and the start_instant
function becomes:
static void start_instant (scheduler_t sched)
{
sched->instant++;
sched->move = sched->eoi = 0;
awake_deadline_reached (sched);
transfer_threads (sched);
transfer_all_events (sched);
thread_changes (sched);
}
where the awake_deadline_reached function calls deadline_reached on all the elements in timer, removing them when 1 is returned.
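A possible sketch of this function, assuming a generic list helper remove_if that applies a predicate to each element and removes the matching ones (both the helper and the timer field name are assumptions):
static void awake_deadline_reached (scheduler_t sched)
{
/* deadline_reached awakes the thread as a side effect, and returns 1
when the element must be removed from the timer list. */
remove_if (sched->timer, deadline_reached);
}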
7.5 Direct Implementation
|
One considers now an implementation of Loft which mixes the previous implementation of the Optim semantics, described in section 7.4 (Implementation of OPTIM), and the automata-based approach of the implementation in FairThreads of section 7.1 (Implementation in FairThreads), but without the need of pthreads anymore. One calls this implementation Direct, as it consists in a rather direct translation in C, without the need of any particular library. Moreover, Direct uses an efficient algorithm to access the next thread needing execution.
One gets an efficient implementation with low memory consumption. This implementation is suited for using Loft on embedded systems with limited resources. An experiment described in section 8.3 (Prey-Predator Demo) has been made with the GBA (Game Boy Advance) platform of Nintendo.
Event queues are used to avoid busy-waiting, not only during one instant, but also across several instants. A thread which is awaiting an event that is not present registers itself in a queue associated to the event. At this point, the thread falls into a new WAIT state, and will not be considered for execution by the scheduler to which it is linked until the event is generated. When an event is generated, the elements of the queue associated to it all return to the normal execution state. Note that the order in which the threads are executed is not perturbed by this mechanism, which is actually just a way to suspend the execution of threads awaiting non-generated events. In this case, suspension and resumption are immediate, i.e. they become effective in the current instant and not in the next one, as is the case with the suspend/resume instructions.
Let us consider one more time the module m of the section Modules. The translation makes use of the following macros, which are very similar to the ones of FairThreads:
#define BEGIN_AUTOMATON\
while (1) {\
switch (_local_vars->_state)\
{
#define STATE(n)\
case n: SET_STATE(n);
#define END_AUTOMATON\
default: THE_END\
}\
}
#define SET_STATE(n) _local_vars->_state=n
#define GOTO_NEXT {SET_STATE(_local_vars->_state+1); return COOP;}
#define GOTO(n) {SET_STATE(n); return COOP;}
#define IMMEDIATE(n) {SET_STATE(n); break;}
#define THE_END {SET_STATE(-1); generate (self()->signal); return TERM;}
The code is actually very close to the one obtained with the translation in FairThreads. However, the state of the automaton (_state) now becomes explicit and is considered as a local variable. The local data structure becomes:
typedef struct m_data_
{
event_t _loc0;
int _loc1;
char* s;
int i;
int _state;
} *m_data_;
The module body is translated into the automaton m_automaton
which is the function:
char m_automaton (m_data_ _local_vars)
{
BEGIN_AUTOMATON
STATE(0) {local(_loc0)=go;}
STATE(1) {
event_t event = local(_loc0);
if (event->scheduler != current_scheduler ()) {
self()->code = EBADLINK;
} else if (!is_present (event)) {
register_in_event (event);
return WAIT;
}
}
STATE(2) {local(i) = 0;}
STATE(3) {local(_loc1) = 100;}
STATE(4) {if (local(_loc1)<=0) IMMEDIATE(8);}
STATE(5) {printf ("instant %d: %s\n",local(i)++,local(s));}
STATE(6) {GOTO_NEXT;}
STATE(7) {local(_loc1)--; IMMEDIATE (4);}
STATE(8)
STATE(9) {exit (0);};
END_AUTOMATON
}
The code of state 1 implements the await (go) instruction. If go is not present, then the executing thread (returned by self) is registered in the event, and the returned status is WAIT. Thus, the thread will not be considered for execution until go is generated. When this happens (if it happens), the thread status returns to CONT and the execution resumes. An important point to note is that the order in which threads are executed always remains the same and is not changed by awaiting events.
The function which creates a thread is:
thread_t m_create_in (scheduler_t _sched,char* s)
{
thread_t _thread;
m_data_ _locals = mymalloc (sizeof (struct m_data_));
_locals->s = s;
_thread = make_thread (_sched,m_automaton,m_finalize,_locals);
_locals->_state = 0;
add_to_link (_sched,_thread);
return _thread;
}
Note the initialization of the automaton state. The function make_thread returns a thread executing the function passed as first parameter, with the finalizer as second parameter and the local data as third parameter. The new thread is added to the scheduler by the call to add_to_link.
The default main function is defined by:
int main (int argc,char **argv)
{
loft_initialize ();
_main_create_in (implicit_scheduler (),argc,argv);
while (1) scheduler_react (implicit_scheduler ());
return 0;
}
This function has to be adapted if one does not want all the CPU to be
consumed by it.
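As an illustration, here is one possible adaptation, pacing the reaction loop with a sleep; the 50 ms period and the use of the POSIX usleep function are arbitrary choices, not taken from the original:
#include <unistd.h>

int main (int argc,char **argv)
{
loft_initialize ();
_main_create_in (implicit_scheduler (),argc,argv);
while (1) {
scheduler_react (implicit_scheduler ());
usleep (50000); /* pace the instants at about 20 per second */
}
return 0;
}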
7.5.4 Access to the Next Thread
|
The threads placed in linked are considered in turn for execution. When a thread is in the state WAIT, it should not be considered for execution, and the scheduler should pass over it without performing any other action. Of course, the state can later return to CONT when the thread is awaken, and in this case the scheduler will have to consider it again.
To avoid considering threads that are in state WAIT, one uses a masking technique: threads in the state WAIT are masked and disappear from linked. However, they are not definitively extracted from the list, and can be reintroduced in it at the same place. This of course happens when a masked thread is awaken. In this way, the efficiency of the scheduler does not depend anymore on the threads that are registered in events or in the scheduler timer.
To implement masking, linked is implemented as a list of cells with two series of pointers. The first series describes the structure of the list, which does not change except when a thread terminates or when new threads are linked to the scheduler. Note that this can only happen in-between instants. The second series of pointers gives the next and previous unmasked cells. These pointers are changed when a thread returns WAIT and when it is awaken. Using the structure pointers allows a thread to be reinserted at its previous position when awaken.
The cost of finding the next unmasked thread to execute is constant. The cost of a masking action is also constant. However, the cost of unmasking depends on the number of threads (masked and unmasked) present in the list. This is basically because threads are reintroduced in linked at their initial place. One thus gets a very efficient mechanism, especially useful when a large number of threads is in use while, at each instant, few of them are really executed, the other ones waiting for absent events. A sketch of the cell structure is given below.
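Here is a minimal sketch of such a cell structure and of the two operations, under assumed field names (next/prev for the structural series, next_on/prev_on for the unmasked series); it is not the actual code of the implementation:
typedef struct _cell_t
{
thread_t thread;
int masked;                         /* 1 when the thread is in WAIT */
struct _cell_t *next, *prev;        /* structural series: stable */
struct _cell_t *next_on, *prev_on;  /* series over unmasked cells */
} *cell_t;

/* Masking: unlink the cell from the unmasked series only; constant time. */
void mask (cell_t c)
{
c->masked = 1;
if (c->prev_on) c->prev_on->next_on = c->next_on;
if (c->next_on) c->next_on->prev_on = c->prev_on;
}

/* Unmasking: walk the structural series to find the closest unmasked
neighbours, hence a cost proportional to the length of the list. */
void unmask (cell_t c)
{
cell_t p = c->prev, n = c->next;
while (p && p->masked) p = p->prev;
while (n && n->masked) n = n->next;
c->masked = 0;
c->prev_on = p; c->next_on = n;
if (p) p->next_on = c;
if (n) n->prev_on = c;
}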
One has considered five implementations of Loft.
The first implementation consists in a direct translation into
FairThreads, which is an API based on a computing model very close to
Loft. From this point of view, Loft can be seen as a means to replace an API by a language, in order to ease the programming of fair threads. The implementation is complete, as native modules are taken into account. Actually, instances of native modules are implemented as
standard kernel threads, which are provided by the pthread library.
Then implementations of the three semantics proposed have been
considered.
Rewrite is a direct implementation of the basic rewriting rules
defining the semantics of Loft. This direct coding is possible
because the semantics is deterministic and structurally defined. In
Rewrite, new terms are built at each micro-step, which is clearly an unsatisfactory solution. Moreover, the memory is never released (there is no GC!).
In the variant Replace of Rewrite, the construction of new terms
is limited to the minimum. Basically, in Replace, new allocation
occurs only when new threads are added in the system. Thus, the memory
remains stable while no new thread is created.
Both Rewrite and Replace are using a polling strategy for
awaiting events. Actually, a thread executing await (e)
tests for the presence of e
at each micro-step and
during all instants, until e
is generated (if it
is). This is clearly inefficient. The Optim implementation avoids
busy-waiting for absent events, using queues in which waiting threads
are placed.
Finally, the fifth implementation Direct is suited for embedded systems. Like Optim, it uses queues to store the threads awaiting an event, instead of busy-waiting. Like the implementation in FairThreads, it translates non-native modules into automata. However, no library of kernel threads is needed in this case. Moreover, Direct uses a masking technique to get direct access to the next thread to be executed. One thus gets a very efficient implementation, with low memory consumption, of the cooperative part of Loft.
8 Examples - 2
One continues the presentation of examples started in chapter 3 (Examples - 1). First, one considers in section 8.1 (Multiprocessing) examples which benefit from multiprocessing. Section 8.2 (Direct Access) considers the masking optimization included in the Direct implementation. Finally, section 8.3 (Prey-Predator Demo) describes the code of an embedded system which runs on a platform with very limited resources (the GBA of Nintendo).
8.1 Multiprocessing
|
In this section, one considers examples in which the presence of several processors clearly improves the computation efficiency.
Let us return to the producers/consumers example of section 3.1.5 (Producers/Consumers), and let us suppose that the actual processing of values is made by some thread-safe C function (a re-entrant function that can be called simultaneously by more than one kernel thread). Thus, each processing thread instance of the process module can unlink from the scheduler during the computation of the result, and re-link to it to return the result. Then, several instances of process can be simultaneously unlinked and running on distinct processors, which would certainly speed up the overall computation. The process module becomes:
module native process ()
local int v;
while (1) do
while (buffer->length == 0) do
await (new_input);
if (buffer->length == 0) then cooperate; end
end
if ((local(v) = get (buffer)) == 0) then return; end
unlink;
< call of a thread-safe C function >
link (implicit_scheduler ());
{local(v)--;}
put (buffer,local(v));
generate (new_input);
end
end module
In this case, to benefit from multiprocessor machines, one basically
has to define the process
module as native.
8.1.2 Lucky Prime Numbers
|
One now considers the production of lucky prime numbers. The system is decomposed into two uncoupled sub-systems, one producing prime numbers and one producing lucky numbers, that can be run by distinct kernel threads on distinct processors. The sub-systems naturally correspond to schedulers. As the two sub-systems do not share the same instants, the results produced have to be stored elsewhere. For this purpose, one uses two fifo channels (of type channel_t, defined in section 3.4 (Data-Flow Programming)). Finally, a comparison thread is defined which goes between the schedulers to detect equal values and output them.
scheduler_t lucky_sched, prime_sched;
event_t lucky_go, prime_go;
channel_t lucky_result, prime_result;
8.1.2.1 Compare
One uses a macro which extracts values from a channel chan as much as possible, until a value greater than the one in hold is encountered. Moreover, a value equal to hold is output:
#define COMP(chan,go) \
{local(sw) = 0;} \
while (!local(sw)) do \
GET(chan,go) \
{\
int n = extract (chan);\
if (n == local(hold)) output (n);\
if (local(hold) <= n) {\
local(hold) = n;\
local(sw) = 1;\
}\
}\
end
The module that cyclically goes from one scheduler to the other and
compares values is defined by:
module compare ()
local int hold, int sw;
while (1) do
link (lucky_sched);
COMP(lucky_result,lucky_go);
link (prime_sched);
COMP(prime_result,prime_go);
end
end module
8.1.2.2 Main
The system produces prime numbers in the channel
prime_result
,
lucky numbers in
lucky_result
, and it creates an instance of
compare
in the implicit scheduler:
module main (int argc, char **argv)
{
lucky_sched = scheduler_create ();
lucky_go = event_create_in (lucky_sched);
lucky_result = init_channel ();
scheduler_start (lucky_sched);
producer_create_in (lucky_sched,lucky_go);
filter_create_in (lucky_sched,lucky_filtering,1,lucky_go,lucky_result);
prime_sched = scheduler_create ();
prime_go = event_create_in (prime_sched);
prime_result = init_channel ();
scheduler_start (prime_sched);
producer_create_in (prime_sched,prime_go);
filter_create_in (prime_sched,prime_filtering,1,prime_go,prime_result);
compare_create ();
}
end module
The filter module has been slightly changed to store the produced values in a channel instead of, as previously, generating them. Note that the scheduler in which the instance of compare is created is actually irrelevant; the instance could just as well be created in any of the two schedulers lucky_sched or prime_sched.
8.1.2.3 Results
One executes the previous program on a SMP (Symmetric Multi-Processor) bi-processor machine made of 2 PIII 1GHz processors running on Linux 2.2, with 512MB of memory. One uses the implementation based on FairThreads to produce the first 300 lucky-prime numbers. The previous program is compared with a variant in which there is only one scheduler (both lucky_sched and prime_sched are mapped to the implicit scheduler). The time taken by the initial system with 2 schedulers is 10.5s, while it is 19.2s with the variant with a single scheduler. This shows that the presence of two processors is fully exploited by the program.
8.2 Direct Access
|
One now illustrates the efficiency of the masking technique used in the Direct implementation to get a constant-time access to the next thread.
One first considers an example in which the gain of direct access is
maximal. The considered system is made of two active threads and of a
large number of inactive threads, which thus can be masked. First,
one defines a module
blocked
which just awaits an event and
terminates:
module blocked (event_t event)
await (local(event));
end module
The module active prints a message a certain number of times and then exits the program:
module active (char *s)
repeat (CYCLES) do
printf ("%s ",local(s));
cooperate;
end
exit (0);
end module
Finally, the main module creates two active threads and a large number of threads blocked on the same event:
module main ()
{
int i;
event_t go = event_create ();
active_create ("1");
for (i = 0; i < THREADS; i++) blocked_create (go);
active_create ("2");
}
end module
At the beginning of the second instant, the linked list of the implicit scheduler contains THREADS+2 elements. All but the active threads are registered in the waiting list of the event go during the second instant. Then, at the beginning of the third instant, linked contains only the 2 active threads, the other ones being masked. Thus, the execution of the system does not depend any more on the THREADS number of blocked threads, and can be extremely fast, as only 2 threads are run from that moment. To give an idea, it takes about 0.7 seconds to run the program on a PIII 850MHz with 256MB of memory, with THREADS equal to 100000 and CYCLES equal to 100. With CYCLES equal to 1000, the time becomes 0.9s. This is not surprising, as CPU is mainly consumed during the second instant.
Note that, without the masking technique implemented by Direct, one should get about 6s for CYCLES equal to 100, but 57s for CYCLES equal to 1000!
Now let us consider the worst case for the masking technique, when
threads are actually masked and unmasked at each instant. One changes
blocked
to get a cyclic behavior:
module blocked (event_t event)
while (1) do
await (local(event));
cooperate;
end
end module
Then, one generates the triggering event go at each instant. Now, in the same conditions as previously (THREADS equal to 100000 and CYCLES equal to 100), the time becomes 27s with the masking technique, and 26s without it. With CYCLES equal to 1000, the time is 4m29s with masking and 4m6s without. Thus, in this case, the masking approach leads to a loss of efficiency (actually a small one). Of course, intermediate results can be obtained which depend on the frequency of the generations of go.
From the previous, rather academic, examples, one sees that the gain increases as there are more and more threads that are rarely awaken. In real situations, the gain is of course less important than the one of the best case. This is the case with the sieve of section 3.3 (Sieve Examples) for producing prime numbers. For producing the first 3000 prime numbers, it takes 13.7s with the masking technique, and 17.4s without it. For producing 5000 prime numbers, it takes 38s with masking and 1m without it. For 10000 numbers, with masking: 2m37s, and without masking: 4m24s. Thus, in this example the gain obtained by using masking is significant.
8.3 Prey-Predator Demo
|
One describes the main parts of the code of a demo of a small prey-predator system implemented for the GameBoy Advance (GBA) platform of Nintendo.
A predator moves using the arrow keys. Preys are running away from
predators and are killed when a predator is sufficiently close to
them. New preys are automatically created. A predator is put in an
automatic chasing mode with key 'a' and returns in the manual mode
with 'b'. More predators can be created with 'l', and more preys with
'r'. Here is a screenshot of the demo running on the VisualBoyAdvance
emulator.
The predator is the ghost and the preys are the pacmans. This demo is
available at the Web site
http://www.gbadev.org.
One calls
sprites the basic moving graphical objects to which
reactive behaviors are associated. Sprites are in limited number in
the GBA (actually, 128). The type of sprites
sprite_t
is:
typedef struct
{
s16 x;
s16 y;
s16 sx;
s16 sy;
u8 index;
thread_t *behaviors;
u8 behav_num;
event_t destroy;
u8 dead;
}
struct_sprite_t, *sprite_t;
The types s16
and u8
are integer types (actually,
signed 16 bits, and unsigned 8 bits). Fields x
and y
are the sprite coordinates. sx
and sy
define
the speed vector of the sprite. index
is the index of the
sprite in the table where sprites are stored. behaviors is an array of threads in which the elementary behaviors of the sprite can be stored. The size of behaviors
is stored in
behav_num
. The event destroy
is used to signal the
destruction of the sprite. Finally, the boolean dead
is used
to note that the sprite is destroyed.
Destruction
Destruction of a sprite is performed using the following module
destroy_sprite
:
module destroy_sprite (sprite_t s)
{
int i;
sprite_t s = local(s);
for (i=0;i<s->behav_num;i++) stop (s->behaviors[i]);
}
cooperate;
{
int i;
sprite_t s = local(s);
suppress_sprite (s->index);
for (i=0;i<s->behav_num;i++) thread_deallocate (s->behaviors[i]);
event_deallocate (s->destroy);
free (s->behaviors);
free (s);
thread_deallocate (self ());
}
end module
First, all components of behaviors
are stopped. Then, at the
next instant, they are deallocated (by thread_deallocate
)
with the sprite structure itself. Finally, the executing thread
(returned by self
) is deallocated.
Because of the cooperate, threads are actually stopped at instant N+1, and are deallocated at instant N+2 (remember that calls to stop and to thread_deallocate produce orders which are not immediately processed, but are delayed to the next instant).
Basically, keys are converted into events. The function
create_key_events
creates the 8 events needed:
void create_key_events (void)
{
a_pressed = event_create ();
b_pressed = event_create ();
up_pressed = event_create ();
down_pressed = event_create ();
left_pressed = event_create ();
right_pressed = event_create ();
r_pressed = event_create ();
l_pressed = event_create ();
}
The purpose of the module key_handler is to convert, at each instant, the key presses (obtained through the function pressed_key) into the corresponding events (remark that several keys can be simultaneously processed).
module key_handler ()
while (1) do
{
if (pressed_key (KEY_A)) generate (a_pressed);
if (pressed_key (KEY_B)) generate (b_pressed);
if (pressed_key (KEY_UP)) generate (up_pressed);
if (pressed_key (KEY_DOWN)) generate (down_pressed);
if (pressed_key (KEY_LEFT)) generate (left_pressed);
if (pressed_key (KEY_RIGHT)) generate (right_pressed);
if (pressed_key (KEY_R)) generate (r_pressed);
if (pressed_key (KEY_L)) generate (l_pressed);
}
cooperate;
end
end module
The module key_processing associates a callback (of type void (*callback_t) ()) to a key. Ten instants pass when a key press is detected, in order to give enough time for the key to be released.
module key_processing (event_t key_pressed,callback_t callback)
while(1) do
await (local(key_pressed));
{
local(callback) ();
}
repeat (10) do cooperate; end
end
end module
8.3.3 Elementary Behaviors
|
One describes several elementary behaviors which will be used by
sprites: inertia, moves in the 4 directions, and collision.
Inertia
A sprite gets an inertial behavior with the
inertia
module,
in which the sprite given in parameter increments at each instant its
coordinates according to its speed.
module inertia (sprite_t s)
while (1) do
{
sprite_t s = local(s);
s->x += s->sx/INERTIA_FACTOR;
s->y += s->sy/INERTIA_FACTOR;
}
cooperate;
end
end module
Moves
With the module
move_right
, a sprite moves when the event
right_pressed
is present:
void go_right (sprite_t s,s16 dist)
{
s->x += dist;
if (s->x > MAX_BORDER_X) s->x = MAX_BORDER_X;
}
module move_right (sprite_t s,int dist)
while (1) do
await (right_pressed);
go_right (local(s),local(dist));
cooperate;
end
end module
Similar modules are defined for the 3 other directions.
Collisions
A function which performs elementary collision between two sprites is
first defined:
static void collision (sprite_t s1,sprite_t s2)
{
s16 dx = s2->x - s1->x; /* signed: the difference may be negative */
s16 dy = s2->y - s1->y;
if (dist2 (s1,s2) > min_collision) return;
s1->sx = -s1->sx;
s1->sy = -s1->sy;
if (dx>0) go_left (s1,dx); else go_right (s1,-dx);
if (dy>0) go_up (s1,dy); else go_down (s1,-dy);
}
Then, the collision behavior of a sprite is described in the module collide. Collision detection is related to a special event which is generated, at each instant, by all the sprites subject to collision.
module collide (sprite_t me,event_t collide_evt)
local sprite_t other, int i, int exit;
while (1) do
{
local(i) = 0;
local(exit) = 0;
}
generate_value(local(collide_evt), (void*)local(me));
while(!local(exit)) do
get_value (local(collide_evt),local(i), (void**)&local(other));
{
if (return_code () == OK) {
if (local(me) != local(other)) collision (local(me),local(other));
local(i)++;
} else {
local(exit) = 1;
}
}
end
end module
The body of the module collide is an infinite loop in which the collision event collide_evt is generated at each instant. The sprite which generates the event is passed as value to the generate_value function. Then, all the generations of collide_evt are considered in turn, and the function collision is called for all the generated values, that are the other sprites (except of course the sprite itself). The return code becomes different from OK (more precisely: ENEXT) when all the sprites are processed. This is detected at the next instant, and there is thus no need of an explicit cooperate to prevent the loop from being instantaneous.
Actually, the collision function implements a very basic algorithm which leads to some unrealistic moves; it can of course be replaced by more realistic (but more expensive!) collision treatments. Moreover, each sprite considers all the other sprites at each instant, which is inefficient (the complexity is quadratic in the number of sprites). The algorithm could certainly also be improved in this respect.
The basic behavior of a predator is defined as the module
predator
:
module predator (sprite_t me,event_t pred_evt,event_t prey_evt)
local sprite_t other, sprite_t closest,
int min, int i, int finished;
while (1) do
{
local(i) = 0;
local(finished) = 0;
local(min) = predator_visibility;
generate_value (local(pred_evt), (void*)local(me));
}
while (!local(finished)) do
get_value (local(prey_evt),local(i), (void**)&local(other));
if (local(me)->dead) then return; end
{
if (return_code () == OK) {
int d2 = dist2 (local(me),local(other));
if (local(me) != local(other) && d2 < local(min)) {
local(min) = d2;
local(closest) = local(other);
}
local(i)++;
} else {
local(finished) = 1;
if (local(min) < predator_visibility && !local(closest)->dead)
chase (local(me),local(closest));
}
}
end
end
end module
At each instant, the event pred_evt is generated by each predator. It is a valued generation, in which the sprite is passed as value. Then, all the values associated to the event prey_evt are considered in turn. When a new value is available, one tests if the sprite is dead (it could have been killed just before); the behavior returns if it is the case. Otherwise, the return code is considered:
- if it is OK, then
other
denotes the other
sprite. The distance to it is computed (actually, the square of it,
returned by dist2
), and if the other sprite is distinct from
the sprite and closer than the one in the field closest
,
then closest
is changed.
- otherwise, the closest prey is chased, provided it can be seen by the predator (according to the global variable predator_visibility) and provided it is still alive (according to its dead field). The destroy event of the prey is generated when it is killed.
The module defining a predator sprite is called predator_sprite
:
module predator_sprite ()
local thread_t inertia;
{
sprite_t s = new_sprite (0,GHOST);
predator_create (s,pred_evt,prey_evt);
sync_sprite_create (s);
move_right_create (s,pred_run_step);
move_left_create (s,pred_run_step);
move_up_create (s,pred_run_step);
move_down_create (s,pred_run_step);
bounce_on_borders_create (s);
collide_create (s,pred_evt);
local(inertia) = inertia_create (s);
suspend (local(inertia));
}
while (1) do
await (a_pressed);
resume (local(inertia));
cooperate;
await (b_pressed);
suspend (local(inertia));
cooperate;
end
end module
The local variable inertia stores the inertia thread giving the inertial behavior to the sprite. Initially, inertia is suspended and the predator is thus immobile. When receiving the event a_pressed, inertia is resumed, and it is suspended again with b_pressed. At creation, an atomic action is first executed which performs the following actions:
- A new sprite
s
is first created with a
zero speed vector and a ghost aspect.
- An instance of the module
predator
is created. The
sprite and the events pred_evt
and prey_evt
are
passed to it.
- Several threads are created, all of them taking
s
as
parameter: an instance of sync_sprite
to perform
synchronisation of the sprite with the corresponding entry in the
table of sprites; four instances of move
to move the
sprite in the 4 directions, according to 4 corresponding events;
an instance of bounce_on_borders
to let the sprite bounce on
the borders of the applet; an instance of collide
to let
the sprite collide with other predators.
- Finally, an instance of the module
inertia
is created
and assigned to the local variable with the same name. The instance is
immediately suspended, using the suspend
function.
Note that the various threads created are not stored in the behaviors field of the sprite. Actually, only inertia will be accessed after creation, and one uses a special local variable to store it.
The prey sprite is defined in the module
prey_sprite
:
module prey_sprite ()
local sprite_t s;
{
sprite_t s = new_sprite (0,PACMAN);
alloc_behaviors (s,5);
s->behaviors[0] = inertia_create (s);
s->behaviors[1] = bounce_on_borders_create (s);
s->behaviors[2] = prey_create (s,pred_evt,prey_evt);
s->behaviors[3] = sync_sprite_create (s);
s->behaviors[4] = collide_create (s,prey_evt);
local(s) = s;
}
await (local(s)->destroy);
run destroy_sprite (local(s));
thread_deallocate (self ());
end module
The overall behavior is made of several components: the specific one,
called prey
, an inertia behavior, a bouncing one, a
synchronisation behavior and a collision one. All these elementary
behaviors are stored in behaviors
because they must be
destroyed when the prey is killed. The definition of the module prey is very close to the one of predator, with 2 changes: first, the events generated and observed are inverted; second, the chase function is replaced by a function that makes the prey run away from the predator.
After the creation of the elementary behaviors, the destroy event is awaited (let us recall that it is generated when the prey is killed by the predator). Then, an instance of the destroy_sprite module is run in order to deallocate the various behaviors of the prey, and finally the executing thread is deallocated.
8.3.6 Automatic Creation of Preys
|
The module
automatic
automatically creates new preys, after
a while, when there is none in the system.
module automatic (int delay)
local sprite_t other;
while(1) do
get_value (prey_evt,0, (void**)&local(other));
if (return_code () == ENEXT) then
repeat (local(delay)) do cooperate; end
repeat (NUMBER_OF_PREYS) do prey_sprite_create (); end
cooperate;
end
cooperate;
end
end module
At each instant, the event generated by preys (prey_evt
) is
observed, using the get_value
function. If no value with
index 0 is available, which means that no prey was present during the
previous instant, then new preys are created. Otherwise, the thread
simply cooperates.
In the context of the GBA, the main module is called
GBA_main
.
module GBA_main()
{
prey_evt = event_create ();
pred_evt = event_create ();
create_key_events ();
key_handler_create ();
automatic_create (CREATION_DELAY);
predator_sprite_create ();
key_processing_create (l_pressed,predator_sprite_create);
key_processing_create (r_pressed,prey_sprite_create);
max_pred_speed = 10;
kill_dist2 = 200;
predator_visibility = 40000;
pred_run_step = 2;
max_prey_speed = 30;
prey_visibility = 8000;
prey_run_step = 3;
}
end module
First, the two events generated at each instant by preys and predators are created. Then, the events corresponding to keys are created, with an instance of the module key_handler which converts key presses into events. An instance of automatic and one of predator_sprite are created. Finally, some global variables are set to correctly parametrize the demo.
9 Related Work
One considers three domains related to
Loft. The first is the one
of threads which either appear as libraries or are incorporated in
existing languages. The second domain is the synchronous-reactive
approach from which
Loft takes its major constructs. The third
domain contains approaches in which computations are decomposed into
small pieces that can be combined to build complex behaviors.
9.1 Thread Based Formalisms
|
Several thread libraries exist for C. Among them, the Pthreads library [29] implements the POSIX standard for preemptive threads. LinuxThreads [4] is an implementation of Pthreads for Linux; it is based on native (kernel-level) threads. Quick Threads [25] provides programmers with a minimal support for multi-threading at user-space level. Basically, it implements context-switching in assembly code, and is thus a low-level solution to multi-threading.
Gnu Portable Threads[
17] (GNU Pth) is a library of purely
cooperative threads which has portability as main objective. The Next
Generation POSIX Threading project[
5] proposes to
extend GNU Pth to the M:N model (M user threads, N native threads),
with Linux SMP machines as target.
Java introduces threads at the language level. Actually, threads are generally heavily used in Java, for example when graphics or networking is involved. No assumption is made on the way threads are scheduled (cooperative and preemptive schedulings are both possible), which makes Java multi-threaded systems difficult to program and to port [23]. This difficulty is highlighted by the removal, in recent versions (1.2) of the language, of the primitives giving fine control over threads [3]. These primitives actually correspond to the stop, suspend, and resume of Loft.
The use of Java threads with multiprocessor architectures raises several difficulties which are actually under study in the Java community. One of these difficulties is called Double-Checked Locking (DCL); it concerns the Java memory model (JMM) and is rather tricky (see ??? for details). These studies should lead to some changes of the Java thread model in the next version (1.5) of Java.
It should be mentioned that the initial version of FairThreads was proposed in the context of the Java language [12] in order to simplify concurrent programming in Java; this version was however limited to cooperative threads.
Threads in Functional Languages
Threads are used in several ML-based languages such as CML [31]. CML is preemptively scheduled, and thread communication is synchronous and based on channels. Threads are also introduced in CAML [2]; they are implemented by time-sharing on a single processor, and thus cannot benefit from multiprocessor machines.
FairThreads has recently been introduced in the Bigloo [1] implementation of Scheme. The present version does not support unlinked threads, but special constructs are introduced to deal with non-blocking cooperative I/Os.
FairThreads and Loft actually belong to the so-called reactive approach [6], which stems from synchronous languages. One first compares synchronous languages and the reactive approach before describing several reactive programming languages.
Synchronous Languages vs. Reactive Programming
To the family of synchronous languages [19] belong several programming languages which all share the same notion of an instant, supposed to be of zero duration. In this context, the output of a program at a given instant is synchronous with its input; the synchronous characterization of these languages comes from this hypothesis.
Lustre and Signal are two data-flow synchronous languages in which one programs nets of operators, in a style very close to the one of section 3.4 (Data-Flow Programming). At the basis of these dataflow languages are the nets of processes of [24].
Following the StateCharts of D. Harel, several systems have been
proposed for synchronous graphical programming, among which are
SyncCharts and Argos.
The Esterel synchronous language [9] introduces broadcast events, called signals, in an imperative style. However, in Esterel, the absence of an event can be decided immediately, and consequences of this absence can take place in the very same instant. This leads to "causality problems", as for example in present S else emit S end. In this program, indeed, the signal S is emitted if it is absent during the instant, which is of course contradictory.
As opposed to Esterel, in reactive programming the absence of an event during one instant cannot be decided before the end of this very instant. As a consequence, the reaction to the absence of an event is delayed to the next instant. This is a way to solve the causality problems which are obstacles to modularity.
All synchronous languages are static in the sense that they disallow the creation at run time of new parallel components and of new events. Moreover, they do not give users any way to deal with multiprocessor machines. However, synchronous languages generally put the focus on hardware circuits, in which dynamic creation does not appear. This is a major difference with reactive programming, which is limited to software systems.
A major claim of synchronous languages is that they make possible
proofs and validations of programs. This is a consequence of the
existence of a formal semantics and of the static characteristics of
these languages. Program proofs and validations have not yet been considered in reactive programming.
The Reactive-C [11] language was the first proposal for reactive programming in C; in this respect, Loft can be considered as a descendant of it. Reactive-C proposes instants and the merge operator, which implements deterministic parallelism. A reaction of the instruction merge i1 i2 consists in a reaction of i1 followed, in the same instant, by a reaction of i2. Thus, one has a deterministic left-right operator with which parallelism can be introduced at any level. Reactive-C does not define events, as they are defined in Loft. Reactive-C is basically implemented as a straightforward preprocessor of C.
Reactive-C introduces primitives for remote execution of reactive programs over the network. Remote execution is based on the use of the Remote Procedure Call (RPC) mechanism. Distribution is not yet possible in either Loft or FairThreads.
Reactive Programming in Java
The reactive approach has been implemented in Java in several ways. The first one is the SugarCubes [13] framework, which is a set of classes for reactive programming in Java. Several related formalisms have been designed, among which are Junior [21] and Rejo [7].
The definition of Rewrite and Replace in section 7 (Implementations) is strongly linked to the implementations with the same names of the Java Junior framework [22]. The use of waiting queues first appeared in an implementation of Junior proposed by Laurent Hazard to deal with large numbers of parallel components and events. The Direct semantics is strongly inspired by it.
Rejo is a language proposed for high-level reactive programming based on Java. Rejo is implemented on top of Junior and thus can benefit from the various implementations of it. As opposed to SugarCubes and Junior, standard Java code can be freely mixed with reactive code. Rejo also introduces primitives for migration of reactive code, implemented using the serialization and RMI mechanisms of Java.
9.3 Approaches based on Decomposition
Chores [16] and filaments [27] are small pieces of code that do not have a private stack and are never preempted. Chores and filaments are designed for fine-grain parallel programming on shared-memory machines. Chores and filaments always run to completion and can be neither suspended nor resumed. Generally, a pool of threads is devoted to executing them. Chores and chunk-based techniques are described in detail in the context of the Java language in [15] and [23]. Automata in FairThreads are close to chores and filaments, but give programmers more freedom for direct coding of state-based algorithms. Automata are also related to mode automata [28], in which states capture the notion of a running mode, in the context of the synchronous language Lustre [19].
Cohorts and Staged Computation
Cohort scheduling [26] dynamically reorganizes series of computations on items in an input stream, so that similar computations on different items execute consecutively. Staged computation intends to replace threads. In the staged model, a program is constructed from a collection of stages, and each stage has scheduling autonomy to control the order in which operations are executed. Stages are thus very close to the instants of FairThreads, and cohort scheduling looks very much like cooperative scheduling. In the staged model, emphasis is put on the way to exploit program locality by grouping similar operations in cohorts that are executed at the same stage; in this way, cohorts and staged computations fall into the family of data-flow models.
10 Conclusion
One has defined Loft, a language which extends C to concurrent and parallel programming in a shared memory context. The main questions to which Loft proposes answers can be listed as follows: how can synchronous and asynchronous executions be mixed together in a unified framework? How can system architectures which are basically cooperative benefit from the parallelism offered by multiprocessor machines? How can concurrency and determinism be reconciled? What compromises should be accepted in order to get a powerful enough programming language, while staying in a simple and relatively safe context?
The answers in Loft rely on several aspects related to the underlying programming model, to semantics concerns, to implementation issues, and to design choices.
In Loft, the units of concurrency are threads, which are defined as instances of modules. There are basically two kinds of threads: threads which are linked to a scheduler and threads which are unlinked. A thread cannot be linked simultaneously to several schedulers, but it can link to distinct schedulers during its execution.
All the threads linked to the same scheduler are run in a cooperative mode, under the supervision of the scheduler. Moreover, they share events. The threads are run cyclically until all have either cooperated or reacted to events which are present. The current instant is finished when a stable situation is reached, in which all the events which are not present can be safely considered as absent. The scheduler can then proceed to the next instant, after having incorporated in the system the orders sent to it during the instant. These orders can be orders to stop, suspend, or resume threads, or to add newly created threads.
Several schedulers can be simultaneously present in the same
Loft
program. Each scheduler defines a synchronous area in which threads
run at the same pace, because they share the same instants, and in
which events are broadcast, because all threads have exactly the same
information on the presence and the absence of events.
It is possible for some threads, instances of native modules, to run autonomously, unlinked from any scheduler. In this case, these threads behave exactly as standard native threads scheduled by the preemptive operating system. However, unlinked threads still have the possibility to link to a scheduler, and then to behave as cooperative threads. Of course, they can later return to the unlinked state if needed.
Several advantages of the programming model of
Loft are to be
pointed out:
- Loft offers a unified framework in which both cooperative and preemptive approaches can be used jointly. Actually, in Loft, users have control over the way threads are scheduled. Loft provides users with programming primitives allowing threads to dynamically link to schedulers and to dynamically unlink from them. This framework is flexible, as the structure of schedulers can evolve dynamically. It actually proposes N:M architectures, with the advantage that there exists a syntax to deal with them (as opposed, for example, to the threads of Sun's Solaris system).
- Cooperative systems can be coded in a simple way, without the need of locks to protect data. Moreover, instants give automatic synchronizations that can greatly simplify programming in certain situations. Linked threads have a precise and clear semantics. The point is that systems exclusively made of threads linked to one unique scheduler are completely deterministic. This simplifies reasoning about programs. It also makes debugging easier. Thus, in the cooperative context, one gets a simple language for concurrent programming which is deterministic and has a precise semantics.
- When a thread unlinks from a scheduler, it becomes an autonomous native thread which can be run in real parallelism, on a distinct processor. Schedulers can also be executed by native threads, on distinct processors. Thus, Loft gives the user access to multiprocessing in a very simple way.
Loft has a formal semantics. Actually, the semantics is an operational one which describes the steps that should be executed to reach a result. The presence of a formal semantics is not anecdotal, and is useful for at least two reasons: first, it gives the exact meaning of programs, which can be much more complex than in traditional programming, as a result of concurrency and of the new dimension introduced by instants. The purpose of the semantics is to define exactly what should be the execution of a program made of several concurrent threads along the passing of instants. Second, variants of the semantics are defined in order to model various aspects of the execution. More precisely, a series of semantics variants defines relatively small steps, starting from the basic Rewrite semantics and reaching the Direct semantics, which is designed to be extremely efficient. One can get strong confidence in each of these steps, and thus see the whole process as a way to validate the implementation. The presence of a formal semantics is in this respect a means for safer programming.
The language has several implementations. The first one is a direct translation in FairThreads, which is an API of threads closely related to Loft. The second implementation, Rewrite, which serves as a reference for the cooperative part of the language, is a direct translation of the rewriting rules of the semantics. The Replace implementation optimizes memory use, while the Optim implementation optimizes the waiting of events, to avoid busy-waiting, and can be used for embedded systems with few resources.
To sum up, the main characteristics of Loft are:
- Concurrent language based on C.
- Both cooperative and preemptive approaches mixable in the same
framework.
- Possibility of real parallelism (use of multiprocessors).
- Determinism of programs belonging to the cooperative part of
the language.
- Precise (formal) semantics defined for the cooperative part.
- Efficient implementation, usable for embedded systems.
Finally, one would say that the external syntax of
Loft could be
subject to change in the future. This should not be shocking as, after
all, syntax is rather secondary compared to the programming model and to
the semantics.
One now considers possible topics for future work.
In Loft, as in FairThreads, parallelism is only at top-level. Indeed, there is no parallel construct which can be used at any level, as in Esterel for example. Consider for example a statement of the form (A || B);C, where || is the parallel operator. In it, the parallel operator is not at the root of the syntax tree, but on the left branch of a sequence instruction. In Loft, one must use indirect constructions for expressing parallelism at arbitrary levels of nesting. The previous example could for instance be coded using join instructions to wait for the termination of both A and B, as shown below. A possible extension of Loft would be to introduce a deterministic parallel operator, as for example the one of Reactive-C. Other constructs could also be added to the language; for example, the until module of section 12.9.10 (Run) could be directly introduced in the language as a preemption instruction. One could also consider local events whose extent is defined syntactically by a language construct, as is the case in standard reactive programming, such as Junior.
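For instance, assuming three hypothetical modules A, B, and C, a possible Loft coding of (A || B);C is the following sketch:
module par_then_c ()
local thread_t ta, thread_t tb;
{
   local(ta) = A_create ();    /* fork an instance of A */
   local(tb) = B_create ();    /* fork an instance of B */
}
join (local(ta));              /* wait for the termination of A */
join (local(tb));              /* wait for the termination of B */
run C ();                      /* then, and only then, execute C */
end module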
A rather natural extension of Loft would be to interface it with
the network. Thus, distributed synchronous areas could be defined,
running on distinct machines. The difficulty is to give a way for a
thread to link to a distant machine. This is a kind of thread
migration that would open programming to systems based on agents.
Some open problems are raised by such an approach: which threads exactly can migrate, and how do they deal with the memory? What is the syntax for migration? How can migrating threads dynamically use the resources of a distant site? How can a site be protected from dangerous migrating threads (for example, non-cooperating ones)?
Loft could be rather easily extended to C++. This would simplify
the code produced by the Loft translator as classes encapsulating
the local data of threads could be generated. Of course, other target
languages could also be considered.
As the other formalisms for reactive programming, Loft and FairThreads are available under the GNU public license at http://www.inria.fr/mimosa/rp.
11 Annex A: API of FairThreads
ft_scheduler_t ft_scheduler_create (void);
ft_thread_t ft_thread_create (ft_scheduler_t scheduler,
void (*runnable)(void*),
void (*cleanup)(void*),
void *args);
ft_thread_t ft_automaton_create (ft_scheduler_t,
void (*automaton)(ft_thread_t),
void (*cleanup)(void*),
void *args);
ft_event_t ft_event_create (ft_scheduler_t scheduler);
int ft_scheduler_start (ft_scheduler_t scheduler);
int ft_scheduler_stop (ft_thread_t thread);
int ft_scheduler_suspend (ft_thread_t thread);
int ft_scheduler_resume (ft_thread_t thread);
int ft_scheduler_broadcast (ft_event_t event);
int ft_scheduler_broadcast_value (ft_event_t event,void *value);
int ft_thread_cooperate (void);
int ft_thread_cooperate_n (int num);
int ft_thread_join (ft_thread_t thread);
int ft_thread_join_n (ft_thread_t thread,int timeout);
int ft_thread_generate (ft_event_t event);
int ft_thread_generate_value (ft_event_t event,void *value);
int ft_thread_await (ft_event_t event);
int ft_thread_await_n (ft_event_t event,int timeout);
int ft_thread_select (int len,ft_event_t *array,int *mask);
int ft_thread_select_n (int len,ft_event_t *array,int *mask,int timeout);
int ft_thread_get_value (ft_event_t event,int num,void **result);
int ft_thread_link (ft_scheduler_t scheduler);
int ft_thread_unlink (void);
Current Thread and Scheduler
ft_thread_t ft_thread_self (void);
ft_scheduler_t ft_thread_scheduler (void);
int ft_thread_mutex_lock (pthread_mutex_t *mutex);
int ft_thread_mutex_unlock (pthread_mutex_t *mutex);
pthread_t ft_pthread (ft_thread_t thread);
AUTOMATON(name)
DEFINE_AUTOMATON(name)
BEGIN_AUTOMATON
END_AUTOMATON
STATE(num)
STATE_AWAIT(num,event)
STATE_AWAIT_N(num,event,delay)
STATE_GET_VALUE(num,event,n,result)
STATE_STAY(num,delay)
STATE_JOIN(num,thread)
STATE_JOIN_N(num,thread,delay)
STATE_SELECT(num,n,array,mask)
STATE_SELECT_N(num,n,array,mask,delay)
GOTO(num)
GOTO_NEXT
IMMEDIATE(num)
RETURN
SELF
SET_LOCAL(data)
LOCAL
ARGS
RETURN_CODE
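As an illustration, here is a minimal producer/consumer program written with this API. It is only a sketch: the header name fthread.h, and the use of pthread_exit to keep the process alive after ft_scheduler_start, are assumptions.
#include <stdio.h>
#include <pthread.h>
#include "fthread.h"   /* assumed header name for the FairThreads API */

ft_event_t evt;

void producer (void *args)
{
   int i;
   for (i = 0; i < 3; i++) {
      ft_thread_generate (evt);   /* broadcast evt to the scheduler's threads */
      ft_thread_cooperate ();     /* finish the current instant */
   }
}

void consumer (void *args)
{
   int i;
   for (i = 0; i < 3; i++) {
      ft_thread_await (evt);      /* blocks until evt is present */
      printf ("event received\n");
      ft_thread_cooperate ();
   }
}

int main (void)
{
   ft_scheduler_t sched = ft_scheduler_create ();
   evt = ft_event_create (sched);
   ft_thread_create (sched, producer, NULL, NULL);
   ft_thread_create (sched, consumer, NULL, NULL);
   ft_scheduler_start (sched);   /* the scheduler reacts cyclically from now on */
   pthread_exit (NULL);          /* let the scheduler and its threads run on */
}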
12 Annex B: LOFT Reference Manual
Loft is an extension of C for concurrent and parallel programming.
Basic units of concurrency are threads which are created as instances
of modules and are run under the control of schedulers. Threads can
synchronize and communicate using broadcast events. Threads linked to
the same scheduler run in a cooperative way, leading to deterministic
systems. Threads can dynamically change their linking to
schedulers. Moreover, special native threads have the possibility to stay unlinked from any scheduler, thus becoming autonomous. Unlinked threads can be executed in real parallelism.
Loft code is a mix of standard C code and of modules. Loft code is first preprocessed (typically by the gcc -E command), then translated into C, and finally compiled by the C compiler in order to get executable code.
Modules are templates from which threads, called instances, are defined. Module definitions start with the keyword module and terminate with end module. The syntax of modules is:
module_definition:
module kind module_name (params)
locals
inst
final
end module
A module is native when the keyword native follows module. Native modules should only be used in programs run by a preemptive operating system (Linux, for example). Instances of a native module must be implemented as kernel threads. Instances of non-native modules do not need specific kernel threads to execute (they can use the one of the scheduler to which they are linked).
kind: native | /*empty*/
The module name is an identifier or the keyword main (this case is considered in section 12.1.2, Main Module).
module_name: identifier | main
Parameters in params and variables defined in locals are considered in section 12.2 (Variables).
The body inst of the module is an instruction (usually, a sequence of instructions) which is executed by the thread instances of the module. Instructions are considered in section 12.6 (Instructions).
The finalizer is an optional atomic instruction which is executed in case of forced termination; the thread executing finalizers is left unspecified.
finalizer: finalizer atomic | /*empty*/
Modules defined in other files, or later in the same file, can be declared using clauses of the form:
extern_modules: module identifier, ... , identifier ;
A program containing a module named main can be directly executed. The parameters of main must follow the argc/argv conventions of C. The translation of the main module first creates a new instance of the module and an implicit scheduler, then adds the new thread to the implicit scheduler, and finally makes the implicit scheduler react cyclically.
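As a minimal illustration (a hypothetical program, not taken from the distribution), a complete Loft program can thus consist of a single main module:
#include <stdio.h>

module main (int argc, char **argv)
{
   printf ("instant 1\n");     /* atomic C code, executed at the first instant */
}
cooperate;
{
   printf ("instant 2\n");     /* executed at the second instant */
}
end module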
When no main module is defined, a C function named main must be defined, as usual in C. In this case, it must call the function loft_initialize in order to create the implicit scheduler.
loft_initialize: loft_initialize () ;
There are 3 distinct kinds of variables:
- Global variables, declared outside module definitions. These variables are the standard global variables of C.
- Automatic variables, declared in blocks of C instructions (of the form {...}). These variables can appear in declarations of C functions, or in atomic instructions, in module definitions. Automatic variables are allocated when the control enters the block in which they are declared, and destroyed when the block is left.
- Parameters and local variables, declared in modules. Each instance of a module has its own copy of these variables. Their values are preserved during the passing of instants.
Thus, concerning memory management, Loft basically differs from C by the presence of variables of kind 3, which are local to threads and maintain their values between instants.
12.2.1 Types of Local Variables and Parameters
Types of parameters and local variables can be:
- Basic types (for example, int).
- Named types (defined using typedef).
- Pointer or double pointer types on the previous types.
params: defvar, ..., defvar | /*empty*/
defvar:
type identifier
| type * identifier
| type * * identifier
type: basic_type | identifier
locals: local defvar, ... , defvar | /*empty*/ ;
Remark: the restriction on the types of parameters and local variables
is introduced to simplify the implementation and should be removed in
future versions of the language.
12.2.2 Access to Local Variables and Parameters
Local variables and parameters must always be explicitly tagged by the keyword local, to distinguish them from global or automatic variables with the same names: all uses of a local variable or parameter x must be of the form local(x).
Threads are concurrent entities created from modules and run by schedulers. Threads are of the type thread_t.
New thread instances of a module m are returned by the C function m_create, which has the same parameters as m. The new threads are linked to the implicit scheduler. With the function m_create_in, newly created threads are linked to a scheduler given as first parameter (the other parameters are passed to the thread in the standard way).
thread_creation:
identifier_create (exp, ... , exp)
| identifier_create_in (exp, exp, ... , exp)
Threads linked to a scheduler during one instant start running only at
the beginning of the next instant.
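For example, using the hypothetical count module sketched above:
scheduler_t sched = scheduler_create ();
thread_t t1 = count_create (1);            /* linked to the implicit scheduler */
thread_t t2 = count_create_in (sched, 2);  /* linked to sched */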
The state of a thread is composed of two parts: the execution state
and the binding state. The possible values for the execution state of
a thread are:
- Ready to execute.
- Terminated. A thread becomes terminated when it has finished executing its body. A thread can also be forced to terminate when it is stopped.
- Suspended. A suspended thread must be resumed before it can continue to execute.
The possible values for the binding state are:
- Linked to a scheduler. When linked to a scheduler, a thread must cooperate with the other threads linked to the same scheduler, and its execution is under the control of the scheduler.
- Unlinked. This is only possible for instances of native modules. Unlinked threads are autonomous and run at their own pace.
Initially, that is when created, a thread is ready to execute and
linked to the scheduler in which it is created.
Unlinked threads are supposed to be executed by kernel threads. An unlinked thread runs autonomously, until it reaches a link instruction which changes its linking state.
A linked thread resumes execution of its body when it receives the control from the scheduler to which it is linked. As the scheduling is cooperative, the resumption is always supposed to finish after a finite delay. At the end of the resumption, the thread returns the control back to the scheduler and signals it with one of the following values:
- Terminated. The thread terminates when it completes the last instruction it has to run.
- Blocked. The thread is blocked either because it is awaiting an event whose presence is still unknown (await), or because it tries to get an event value which is not available (get_value).
- Cooperating. The thread is not terminated but has finished executing for the current instant.
12.3.4 Return Code and Self
Threads have a return code set during the execution of certain non-atomic instructions (namely, await, join, and get_value). The return code can be OK, ETIMEOUT, or ENEXT. The return code of the executing thread, set by the last executed non-atomic instruction, is returned by the return_code function:
thread_return_code: return_code ()
The executing thread is returned by the self function:
self: self ()
12.3.5 Thread Deallocation
The memory allocated to a thread can be recovered using the thread_deallocate function:
thread_deallocation: thread_deallocate (exp) ;
It is the programmer's responsibility to perform correct deallocation of threads.
A scheduler defines instants during which all ready threads linked to it are run in a cooperative way. Schedulers are of type scheduler_t. New schedulers can be created with the function scheduler_create:
scheduler_creation: scheduler_create ()
An implicit scheduler exists in every program. It is returned by the function implicit_scheduler.
During execution, a linked thread can have access to the current scheduler running it, with the function current_scheduler.
scheduler_access:
implicit_scheduler ()
| current_scheduler ()
A phase of a scheduler consists in giving the control, in turn, to all the ready threads which are linked to it. The order in which the threads receive the control is the order in which they have been linked to the scheduler. During an instant, the scheduler cyclically executes a series of phases, up to a situation where no thread remains blocked. Then, the scheduler decides that the current instant is finished, and it can proceed to the next instant. Thus, at the end of each instant, all ready threads have either terminated or cooperated.
The scheduler decides that the current instant is finished when, at
the end of a phase, there is no possibility for any blocked thread to
progress: no awaited event has been generated, and no new value has
been produced which could unblock a thread trying to get it.
One instant of a scheduler is executed by the scheduler_react function.
scheduler_react: scheduler_react (exp) ;
In scheduler_react(exp), exp should evaluate to a scheduler previously created by scheduler_create.
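For example, when no main module is defined, a hand-written C main function could drive the implicit scheduler as follows (a sketch; count is the hypothetical module used above):
int main (int argc, char **argv)
{
   loft_initialize ();            /* creates the implicit scheduler */
   count_create (1);              /* the instance starts at the next instant */
   while (1) scheduler_react (implicit_scheduler ());
   return 0;
}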
12.4.2 Scheduler Deallocation
The memory allocated to a scheduler can be recovered using the scheduler_deallocate function:
scheduler_deallocation: scheduler_deallocate (exp) ;
It is the programmer's responsibility to perform correct deallocation of schedulers.
Events are used by threads to synchronize and to communicate. Events are of type event_t. An event is always created in a specific scheduler, which is in charge of it throughout its lifetime. The function event_create returns an event created in the implicit scheduler. The function event_create_in returns an event created in the scheduler given as argument.
event_creation:
event_create ()
| event_create_in (exp)
At each instant, events have a presence status which can be present, absent, or unknown. All events managed by a scheduler are automatically reset to unknown at the beginning of each new instant (thus, events are non-remanent data). All events which are unknown become absent when the scheduler decides to terminate the current instant (see section 12.4.1, Instants).
There are two ways for an event to be present during one instant:
- An order to generate it was received by the scheduler during the execution of the previous instant, coming from a thread which is not linked to the scheduler.
- It is generated during the current instant by one of the threads linked to the scheduler.
One can associate values to generations of events. All values associated to the generations of the same event during one instant are collected in a list, as they are produced. They are available only during the current instant, using the get_value instruction (section 12.9.6, Get Value).
generate:
generate (exp) ;
| generate_value (exp, exp) ;
The execution of generate(exp) starts by the evaluation of exp, which should return an event e. If the executing thread is unlinked, or if the scheduler sched in charge of e (the one in which e has been created) is different from the one of the thread, then the order to generate e at the beginning of its next instant is sent to sched. Otherwise, e is immediately generated in sched.
The execution of generate_value(exp,exp) starts by the evaluation of the two expressions in argument. The first one should return an event e, and the second one a value v of a pointer type (void*). The execution is the same as the one of the previous call, except that v is added as last element to the list of values associated to e at the instant where e is generated.
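For instance, a thread can associate a value with each generation of an event (a sketch; the module emitter is hypothetical):
module emitter (event_t e, char *msg)
while (1) do
   generate_value (local(e), (void*)local(msg));   /* one value per instant */
   cooperate;
end
end module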
12.5.2 Event Deallocation
The memory allocated to an event can be recovered using the event_deallocate function:
event_deallocation: event_deallocate (exp) ;
It is the programmer's responsibility to perform correct deallocation of events.
Instructions are binding instructions, atomic instructions, non-atomic instructions, or sequences of instructions:
inst:
bind
| atomic
| non_atomic
| inst ... inst
A thread starts to execute a component of a sequence as soon as the
execution of the previous component is completed.
12.7 Binding Instructions
There are two binding instructions:
bind:
unlink ;
| link (exp) ;
The unlink instruction should only appear in native modules. It has no effect if the executing thread is already unlinked. Otherwise, the thread executing the instruction returns the control back to the scheduler, signaling it that it cooperates. In this case, the thread is removed from the scheduler, which will thus never consider it again (except, of course, if the thread re-links to it later).
As soon as the thread has returned the control back to the scheduler, it starts running in an autonomous way.
A thread executing link(exp) first evaluates exp, which should return a scheduler sched. If the thread is already linked, it unlinks from the scheduler to which it is linked, and then waits for sched to give it the control. If the thread is unlinked, it just waits for the control from sched. In all cases, sched will resume execution of the thread, as it does for newly created threads, at the beginning of the next instant.
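For example, a native module can unlink in order to perform a blocking C call without freezing the other threads of its scheduler, and relink afterwards (a sketch; read_data is a hypothetical blocking function):
module native reader ()
local scheduler_t sched;
{
   local(sched) = current_scheduler ();   /* remember the current scheduler */
}
unlink;                                   /* become autonomous */
{
   read_data ();                          /* blocking call, run in parallel */
}
link (local(sched));                      /* return to cooperative mode */
end module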
Atomic instructions have the form of blocks of C code or of C function calls. They can be executed by linked and unlinked threads, with the same semantics, which is actually the one they have in standard C. However, when executed by a linked thread, the execution of an atomic instruction is instantaneous, which means that it terminates in the same instant it is started.
atomic:
{c-code}
| identifier (exp,..., exp) ;
| order
The previous calls of functions to create and manage threads,
schedulers, and events are considered as special cases of atomic
instructions.
Orders are given to schedulers to stop, suspend, or resume threads:
orders:
stop (exp) ;
| suspend (exp) ;
| resume (exp) ;
All orders received by a scheduler during one instant are
systematically processed at the beginning of the next instant (to
avoid interference with executing threads) in the order in which they
are issued.
The execution of stop(exp) starts by the evaluation of exp, which should return a thread th. The effect of the call is undefined if th is unlinked. Otherwise, the order to terminate th is sent to the scheduler to which it is linked.
The execution of suspend(exp) starts by the evaluation of exp, which should return a thread th. The effect of the call is undefined if th is unlinked. Otherwise, the order to suspend th is sent to the scheduler to which it is linked.
The execution of resume(exp) starts by the evaluation of exp, which should return a thread th. The effect of the call is undefined if th is unlinked. Otherwise, the order to resume th is sent to the scheduler to which it is linked. Resuming a thread which is not suspended has no effect.
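For example, a controlling thread can drive another thread through these orders (a hypothetical sketch; each order takes effect at the beginning of the next instant):
module controller (thread_t t)
repeat (10) do cooperate; end
suspend (local(t));          /* t is frozen from the next instant on */
repeat (10) do cooperate; end
resume (local(t));           /* t resumes at the next instant */
repeat (10) do cooperate; end
stop (local(t));             /* t is forced to terminate */
end module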
12.9 Non-atomic Instructions
Non-atomic instructions should only be executed by linked threads; their semantics is undefined when they are executed by unlinked threads. The execution of non-atomic instructions can take several instants to complete.
non_atomic:
cooperate;
| halt;
| return;
| await (exp);
| await (exp, exp);
| join (exp);
| join (exp, exp);
| get_value (exp, exp, variable);
| if (exp) then_branch else_branch end
| while (exp) do inst end
| repeat (exp) do inst end
| run identifier (exp, ..., exp);
then_branch: then inst | /*empty*/
else_branch: else inst | /*empty*/
A thread executing cooperate returns the control to the scheduler, signaling it that it cooperates. If the thread regains the control in the future, it will resume execution in sequence of the cooperate instruction.
A thread executing halt returns the control to the scheduler, signaling it that it cooperates. If the thread regains the control in the future, it will re-execute the same halt instruction. Thus, halt blocks the thread forever, without ever terminating it.
A thread executing return returns the control to the scheduler, signaling it that it terminates. Thus, it will never regain the control again from the scheduler.
A thread executing await(exp) first evaluates exp. The expression should evaluate to an event e. If e is present, then the thread proceeds in sequence of the await instruction. Otherwise, if the current instant is finished (which means that e is absent), then the thread returns the control to the scheduler, signaling it that it cooperates. If the thread regains the control in some future instant, execution will restart at the same place, waiting for the same event e. If the current instant is not finished, then the thread returns the control to the scheduler, signaling it that it is blocked. When the thread regains the control, it will, as previously, continue to test the presence of e.
A thread executing await(exp,exp) first evaluates the two arguments. The first one should evaluate to an event e, and the second one to an integer value k. Then, the thread behaves as for the previous instruction, but the waiting of e is limited to at most k instants. If e is generated during the next k instants, then the return code of the thread is set to OK. Otherwise, if e is still absent at the end of the kth instant, then the thread returns the control to the scheduler, signaling it that it cooperates; moreover, the return code of the thread is set to ETIMEOUT. In this case, the thread will proceed in sequence when receiving the control back (just as the cooperate instruction does).
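For instance, a thread can watch for an event with a deadline and test the return code afterwards (a sketch; go_evt is a hypothetical global event):
module watchdog ()
await (go_evt, 100);                      /* wait at most 100 instants */
if (return_code () == ETIMEOUT) then
   { printf ("no go signal after 100 instants\n"); }
else
   { printf ("go!\n"); }
end
end module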
A thread executing join(exp) first evaluates exp. The expression should evaluate to a thread th. If th is terminated, then the thread proceeds in sequence of the join instruction. Otherwise, the thread behaves as if it was awaiting a special event generated by the termination of th.
A thread executing join(exp,exp) first evaluates the two arguments. The first one should evaluate to a thread th, and the second one to an integer value k. Then, the thread behaves as for the previous instruction, but the waiting of the termination of th is limited to at most k instants. The return code of the thread is set to ETIMEOUT if the limit is reached, and it is set to OK otherwise.
A thread executing get_value(exp,exp,var) first evaluates the two expressions in arguments. The first should return an event e, and the second an integer k. Then, the thread tests if there are at least k values generated for e during the current instant. If it is the case, the kth value is assigned to var, the return code of the thread is set to OK, and the thread proceeds in sequence. Otherwise, if the current instant is not finished, then the thread returns the control to the scheduler, signaling it that it is blocked. When the thread regains the control, it will, as previously, continue to test the existence of the kth value. If the current instant is finished, then the return code of the thread is set to ENEXT, and the thread returns the control to the scheduler, signaling it that it cooperates. When the thread regains the control, it will proceed in sequence (just as a cooperate instruction would do).
A thread executing if(exp) then i else j end first evaluates exp. If the evaluation returns true (a non-zero value), then the thread switches to i, else it switches to j. If the chosen branch is empty, the thread proceeds in sequence immediately.
A thread executing while(exp) do i end first evaluates exp. If the evaluation returns false (zero), then the thread proceeds in sequence of the loop. Otherwise, the thread behaves as if it was executing i, with one difference: when the execution of i terminates, exp is re-evaluated and the same process restarts: execution is left if the evaluation returns false, and otherwise i is re-executed. Thus, the thread cyclically executes i while exp is true, exp being evaluated at the first instant and, after that, only when i terminates.
A thread executing repeat(exp) do i end first evaluates exp. The evaluation should return an integer value k. If k is negative or equal to 0, then the thread proceeds in sequence of the loop. Otherwise, the thread behaves as if it was executing a sequence of k instructions i.
A thread th executing run m(e1,...,en) first creates a new instance of the module m with e1,...,en as arguments, and then joins the newly created thread. Moreover, the newly created thread is stopped if th is.
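For instance, executing a module twice in sequence is simply a matter of chaining two run instructions (step being a hypothetical module):
module twice ()
run step ();    /* create an instance of step and join it */
run step ();    /* then do it again */
end module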
13 Annex C: the LOFT Compiler
The Loft compiler is a one-pass compiler that translates Loft programs into C code and then compiles it with the standard C compiler. It runs under Linux and is called loft.
There are 5 main options, corresponding to the 5 implementations of the language; -ft is the default one. All files containing Loft code should have names terminated by .loft or .LOFT. Standard C files can be given to loft and are processed in the standard way.
As an example, the command loft -direct test.loft produces an executable a.out from the file test.loft.
There are other options:
-v to trace the actions performed by the compiler.
-debug like -v, but the actions are not performed.
-code to only produce the C code.
-c to only produce the .o code (like the standard C compiler).
-o to rename the executable (like the standard C compiler).
-version to get the current version of the compiler.
For example, loft test.loft -debug produces something which looks like:
/bin/loft/loft2ft test.loft > test.loft.c
gcc -I/include/loft -I/include/ft_v1.0 test.loft.c \
-L/lib/loft -lloft_ft \
-L/lib/ft_v1.0 -lfthread -lpthread
and loft -direct test.loft -debug produces:
/bin/loft/loft2c test.loft > test.loft.c
gcc -I/include/loft test.loft.c -L/lib/loft -lloft_c
The loft command actually uses the following processors and libraries:
- loft2c, loft2ft, and loft2sem, which are the commands producing the C code corresponding to -direct, to -ft, and to the 3 implementations of the semantics.
- libloft_c.a, libloft_ft.a, libloft_rewrite.a, libloft_replace.a, libloft_optim.a, and libloft_gba.a, which are the various libraries for execution.
Some shell variables can be redefined for the command loft:
- C_COMPILER to set a different C compiler.
- LOFT_BINDIR, LOFT_INCLDIR, and LOFT_LIBDIR to change the directories where the executables, include files, and libraries used by loft are placed.
- FT_INCLDIR and FT_LIBDIR to change the directories where the include file and the library needed by the -ft option are placed.
Glossary
Atomic instruction: An atomic instruction runs in one shot and completes in the very instant it starts to execute.
Broadcast: Events are broadcast: all threads linked to the scheduler that manages an event see its presence or absence in exactly the same way.
Context switch: The passage of the control from one thread to another one. The local data and the stack associated with the thread which is left have to be stored.
Non-blocking I/O: An input/output operation which does not block the other threads until it terminates.
Cooperative scheduling: The strategy of a scheduler in which all threads are supposed to leave the control in a friendly way after a certain time.
Deadlock: A situation where two or more threads are blocked, each one owning a lock needed by another one to proceed.
Event: Events are non-persistent data used by threads to synchronise and to communicate. Events are managed by schedulers and at each instant they are either present or absent.
Finalizer: The finalizer of a thread is an atomic instruction which is executed (by another thread) when the thread is stopped while it is not terminated.
Instant: In one instant, a scheduler lets all the threads linked to it execute until they cooperate.
Instantaneous loop: A loop which cyclically executes its body during one single instant, and which thus does not cooperate with the other threads.
Kernel thread: Kernel threads are the basic units of concurrency in standard preemptive OS. They are not always available, for example in embedded systems with limited resources.
Linked thread: A cooperative thread which is linked to a scheduler, and which, thus, executes at the pace defined by the scheduler.
Lock: A way to restrict accesses to a data item: a thread that wants to access a locked data item blocks until it is unlocked.
Module: Modules are templates from which instances, called threads, are constructed.
Multiprocessing: The use of several processors.
Native module: Native modules are modules whose instances must be implemented by kernel threads.
Native thread: Native threads are instances of native modules. They are implemented as kernel threads.
Non-atomic instruction: Execution of a non-atomic instruction can take several instants to complete.
Preemptive OS: An operating system which has the ability to withdraw a processor from a thread or a process which is using it. Actually, all modern OS are preemptive.
Preemptive scheduling: The strategy of a scheduler having the ability to force a running thread to leave the control, in order to give it to another thread.
Scheduler: Schedulers control the threads linked to them in a cooperative and fair way.
Synchronous area: A set of threads, all linked to the same scheduler and thus running synchronously at the same pace and sharing the same broadcast events.
SMP: Stands for Symmetric Multi-Processor. An architecture of a multiprocessor machine in which the assignment of the processors to the threads is highly dynamic and depends on the operating system, and only on it.
Thread: Threads are the basic units of concurrency in the language. They are instances of modules.
Unlinked thread: A thread instance of a native module which is not linked to any scheduler. Thus, an unlinked thread runs asynchronously. An unlinked thread is implemented as a kernel thread.
Bibliography
[1] Bigloo Web Site.
[2] CAML Web Site.
[3] Java Web Site.
[4] LinuxThreads Web Site.
[5] Next Generation POSIX Threading Web Site.
[6] Reactive Programming Web Site.
[7] Acosta-Bermejo, Raul -- Rejo - Langage d'objets réactifs et d'agents -- Thèse de doctorat de l'ENSMP, October 2003.
[8] Arnold, Ken and Gosling, James -- The Java Programming Language -- Addison-Wesley, 1996.
[9] Berry, G. and Gonthier, G. -- The Esterel Synchronous Programming Language: Design, Semantics, Implementation -- Science of Computer Programming, 19(2), 1992, pp. 87-152.
[10] Boudol, G. and Berry, G. -- The Chemical Abstract Machine -- Theoretical Computer Science, 96, 1992.
[11] Boussinot, F. -- Reactive C: An Extension of C to Program Reactive Systems -- Software Practice and Experience, 21(4), April 1991, pp. 401-428.
[12] Boussinot, F. -- Java Fair Threads -- Inria research report, RR-4139, 2001.
[13] Boussinot, F. and Susini, J-F. -- The SugarCubes tool box - a reactive Java framework -- Software Practice and Experience, 28(14), December 1998, pp. 1531-1550.
[14] Brinch Hansen, P. -- The Origin of Concurrent Programming -- Springer, 2002.
[15] Christopher, Thomas W. and Thiruvathukal, George K. -- High Performance Java Platform Computing: Multithreaded and Networked Programming -- Sun Microsystems Press Java Series, Prentice Hall, 2001.
[16] Eager, Derek L. and Zahorjan, John -- Chores: Enhanced run-time support for shared memory parallel computing -- ACM Transactions on Computer Systems, 11(1), 1993.
[17] Engelschall, Ralf S. -- Portable Multithreading -- Proc. USENIX Annual Technical Conference, San Diego, California, 2000.
[18] Gardner, M. -- Lucky Numbers and 2187 -- Math. Intell., 19(26), 1997.
[19] Halbwachs, Nicolas -- Synchronous Programming of Reactive Systems -- Kluwer Academic Publishers, New York, 1993.
[20] Harel, D. and Pnueli, A. -- On the Development of Reactive Systems -- NATO ASI Series, Springer, 1985.
[21] Hazard, L. and Susini, J-F. and Boussinot, F. -- The Junior reactive kernel -- Inria Research Report, RR-3732, July 1999.
[22] Hazard, L. and Susini, J-F. and Boussinot, F. -- Programming with Junior -- Inria Research Report, RR-4027, 2000.
[23] Hollub, A. -- Taming Java Threads -- Apress, 2000.
[24] Kahn, G. and MacQueen, D.B. -- Coroutines and Networks of Parallel Processes -- Proceedings of the IFIP'74 Congress, 1974.
[25] Keppel, D. -- Tools and Techniques for Building Fast Portable Threads Packages -- Technical Report UWCSE 93-05-06, University of Washington, 1993.
[26] Larus, James R. and Parkes, Michael -- Using Cohort Scheduling to Enhance Server Performance -- Proc. of the USENIX Conference, Monterey, California, 2002, pp. 103-114.
[27] Lowenthal, David K. and Freeh, Vincent W. and Andrews, Gregory R. -- Efficient Support for Fine-Grain Parallelism on Shared-Memory Machines -- TR 96-1, University of Arizona, 1996.
[28] Maraninchi, F. and Remond, Y. -- Running-Modes of Real-Time Systems: A Case-Study with Mode-Automata -- Proc. 12th Euromicro Conference on Real-Time Systems, Stockholm, Sweden, 2000.
[29] Nichols, B. and Buttlar, D. and Proulx Farrell, J. -- Pthreads Programming -- O'Reilly, 1996.
[30] Plotkin, G. -- A Structural Approach to Operational Semantics -- Aarhus University Report DAIMI FN-19, 1981.
[31] Reppy, John H. -- Concurrent Programming in ML -- Cambridge University Press, 1999.