1.1 Concurrent Programming
Concurrent programming is the use of several programs running
together in the same application. It is opposed to traditional
sequential programming, in which there is one unique program, which
thus coincides with the whole application. Concurrent programming is
usually considered to increase modularity, as one can often naturally
decompose a complex application into several cooperating programs which
can be run concurrently. This is for example the case of servers, in
which each user request can quite logically be mapped to a specific
program processing it. With concurrent programming, there is no longer
any need to code a complex application made of many distinct parts as
one single program, as is the case in sequential programming.
There are two main variants of concurrent programming: one in
which the programs, called threads, share a common
memory; and one in which the programs are basically processes, each
with its own memory. The presence of a shared
memory eases communication between threads, as threads can directly
use shared data to communicate, while processes have to use
specialized means, for example sockets, for the same purpose. The
counterpart, however, is that shared data may have to be protected from
concurrent accesses by distinct threads. Of course, this is not
necessary with processes, in which, by definition, data are fully
protected from being accessed by other processes.
Parallelism is the possibility of simultaneously using several computing
resources. It is available with multiprocessor machines or, more
generally, when several machines are distributed over a network. It
is well known that parallelism is not the best solution in all cases,
and there are systems which run more efficiently on a single-processor
machine than on a multiprocessor one. Parallelism is now
directly available, with no special effort, in standard operating
systems such as Linux, which can run on Symmetric
Multi-Processor (SMP) architectures. Parallelism appears quite
natural in concurrent programming, as there is generally no logical
reason why concurrent programs should be run by one unique processor.
Concurrent programming is widely recognized as being, in the general
case, much more difficult than sequential programming. Some problems,
such as deadlocks, are specific to it. Moreover, concurrent programs
are most of the time nondeterministic, and thus more difficult to
debug. For these reasons, concurrent programming is often considered
an activity which must be left to specialists. This seems of course
largely justified in domains such as the programming of OS
kernels. But the question remains of designing a set of concurrency
primitives that are not too difficult or painful to use, and of
providing general users with them.
Concurrency at Language Level
Most of the time, programming languages do not offer any primitive for
concurrent programming. This is the case of C and C++, in which
concurrent programming is only available through system calls (for
example fork or exec) or through thread libraries (for
example, the pthread library). Parallelism is, most of the time, also
not covered by programming language primitives. The way to
benefit from parallelism is either to use threads in an SMP context, or
to design distributed systems in which programs (often considered as
distributed objects) communicate and synchronize through the network.
Amongst the languages that have primitives to deal with concurrency,
some use message passing, such as Erlang; others propose "rendez-vous",
such as Ada and Occam; and several, such as CAML, directly include threads.
The most accomplished proposal of such an inclusion seems to be
Java [8], in which threads are first-class objects and
data can be protected using locks and monitors, as defined by
E.W. Dijkstra in the sixties (see [14] for
reference papers on concurrent programming). In Java, threads are
supposed to be used in almost every context (the simplest applet
involves several threads, one of them being for example dedicated to
graphics). However, Java does not provide particular means to ease
thread programming, which remains difficult and error-prone (see [22]
for a discussion of this point). Concerning
parallelism, Java proposes the RMI mechanism to define distributed
objects, whose methods can be invoked through the network. Remote
method invocations basically use specialized threads dedicated to
communications.
Synchronous languages [19] propose
concurrent programming primitives in the special context of reactive
systems [20]. In this context, which is roughly
that of embedded systems, concurrent programs are deterministic and
can be formally verified. However, synchronous languages suffer from
several limitations which restrict their use. Amongst these are the
limitation to static systems, in which the number of concurrent
programs is known at compile time, and to mono-processor
architectures.
A proposal to extend the synchronous approach to dynamic systems
(systems which are not static) is called the reactive
approach. It is described in full detail in [6]. Several languages
belonging to this approach have been designed;
amongst them are Reactive-C [11], which implements
the approach in C, and SugarCubes [13], which
implements it in Java. Reactive programs can be implemented very
efficiently, as they are basically cooperative. However, up to now,
they have not been able to benefit directly from multiprocessor machines.
In this text, we propose a language for concurrent programming
inspired by the reactive approach and based on special threads, called
fair threads. One major point is that the language contains
primitives that allow multiprocessing. The rest of the section
considers threads and presents the basics of the language proposal.
In sequential programming, a program is basically a list of
instructions run by the processor. The program has access to the
memory in which it can read and write data. Amongst the instructions
are tests and jumps which are the way to control the execution
flow. In this standard computing model, a program-counter points to
the current instruction executed by the processor.
The model of threads extends the previous sequential model by allowing
the presence of several programs, the threads, sharing the
same memory. There are thus several program counters, one per thread,
and a new component, the scheduler, which is in charge of
dispatching the processors to the threads. The transfer of a processor
from one thread to another is called a context-switch, because
it implies that the program counter of the thread which is left must
be saved, in order to be restored later, when the thread regains
the processor. When several threads can be executed, the scheduler
can choose arbitrarily between them, depending on implementation
details, and it is thus generally impossible to predict which one
will finally get the processor. This is a source of nondeterminism in
multi-threaded programming.
The strategy of the scheduler is said to be preemptive when it
is able to force a running thread to release the processor executing it,
in which case one says that the thread is preempted. With a
preemptive scheduler, situations where a non-cooperative thread
prevents the others from any execution are not possible. A preemptive
scheduler can withdraw a processor from a running thread for
priority reasons (there exist threads with higher priorities in the
system), or for other reasons. This is also a source of
nondeterminism. In a preemptive context, communicating or
synchronizing generally implies the need to protect some shared data
involved in the communication or in the synchronization. Locks are
often used for this purpose, but they have a cost and are error-prone,
as they introduce possibilities of deadlocks: situations
where two threads cannot progress because each one owns a lock needed by
the other.
Preemptive threads are generally considered well adapted for
systems made of heavy-computing tasks needing few communications
and synchronizations. This is typically the case of Web servers, in
which a thread (usually picked out from a pool of available threads)
is associated with each new user request. The advantages are clear: first,
modularity is increased, because requests do not have to be split into
several small parts, as would be the case with sequential
programming. Second, servers can be run by multiprocessor machines
without any change, and can for example immediately benefit from
SMP architectures. Finally, blocking I/Os, for example reading from a
file, do not need special attention. Indeed, as the scheduler is
preemptive, there is no risk that a thread blocked forever on an I/O
operation will also block the rest of the system.
The strategy of the scheduler is said to be cooperative if the
scheduler has no way to force a thread to release the processor it
owns. With this strategy, a running thread must cooperate and release
the processor by itself from time to time, in order to give the other
threads the possibility of getting it. Cooperation is of course absolutely
mandatory if there is only one processor: a running thread which
never released the processor would indeed forbid any further execution
of the other threads. Cooperative threads, sometimes called green
threads, are threads that are supposed to run under the
control of a cooperative scheduler. Cooperative threads are adapted
to tasks that need a lot of communication between them. Indeed, data
protection is no longer needed, and one can thus avoid using
locks. Cooperative threads can be efficiently implemented at user
level, but they cannot benefit from multiprocessor machines. Moreover,
they need special means to deal with blocking I/O. Thus, purely
cooperative threads are not sufficient to implement important
applications such as servers.
Reuse of Sequential Code
It is often claimed that reuse of sequential code is made easier by
using threads. What is meant is that a sequential program can easily
be embedded in one thread, and then added into a system to be run
concurrently with the other programs present in it.
However, one generally cannot be sure that a sequential program never
enters a loop in which no action leading to a release of the processor,
such as an I/O, is ever performed. Under a cooperative scheduler,
such a non-cooperative program would block the whole system (supposing
that the processor is unique). In a way, this is a major justification
for preemptive scheduling: a non-cooperative program is no longer a
problem, because it can now be preempted by the scheduler.
This is however only one aspect of the problem, because a sequential
program generally cannot be used directly as it is, even in a preemptive
context. The first reason is that some data may have to be protected
from being accessed by other threads, which requires locks that were of
course absent from the sequential code. A second reason is that in some
cases, for example libraries, the sequential code must be made
reentrant, because several threads can now make concurrent calls to
it. Finally, the granularity of the execution, which is under the control
of the scheduler and only of it, can be too large and may require the
introduction of supplementary cooperation points in the initial code.
In conclusion, it seems that in all cases reuse of sequential code is
largely an illusion in the context of multi-threading.
Portability
Portability is very difficult to achieve for multi-threaded systems.
In the broadest sense, portability means that a system should run in the
same way, independently of the scheduling strategy. This is of course
an almost impossible challenge in the general case: data should be
protected, for the case of preemptive scheduling, but cooperation
points should also be introduced, for the case of cooperative
scheduling, and the conjunction of these two requirements would lead
to very inefficient code. Portability is thus often considered a
goal that can only be partially achieved with multi-threaded
systems. Note that this is a major problem in Java, which claims to
be portable but does not specify the scheduling strategy that should
be used for multi-threaded applications.
Determinism and Debugging
The choices made by the scheduler are a major source of nondeterminism,
which makes thread programming complex. In particular, debugging is
more difficult than in sequential programming, because one may have to
take into account the choices of the scheduler when tracking a
bug. Moreover, replaying a faulty situation may be difficult to
achieve, as it may actually depend on the scheduler's choices. For
example, it may happen that the introduction of printing instructions
to trace an execution changes the scheduling and makes the bug
disappear.
Nondeterminism seems inherent to preemption, as it is very difficult to
conceive of a multi-threading framework that is both deterministic and
preemptive. For the reader who is not convinced: imagine two threads
running in a preemptive context, one which cyclically increments a
counter, and one which prints the value of the same counter. What
should be the (unique, because of determinism) printed value, and
how could one simply find it?
However, it should be noted that it is possible to design completely
deterministic frameworks based on cooperative threads, with a clear and
simple semantics. This is actually the case of the fair threads which are
introduced later.
Resource Issues
The way resources are consumed is a major issue for threads. Several
aspects are relevant to this matter.
First, like any standard sequential program, each thread needs a private
memory stack to hold its local variables and the parameters and
intermediate results of executed procedures. The use of a stack is
actually mandatory in the context of a preemptive scheduler, because
the moments at which context-switches occur are independent of the thread,
which thus has no possibility of saving its local data before the
release of the processor. Note that the situation is different in a
cooperative context, where context-switches are predictable. The
necessity of a private stack for each thread certainly contributes to
wasting memory.
Second, user threads are often mapped to kernel threads, which are
under the supervision of the OS. Of course, this is only meaningful with a
preemptive OS, in order to avoid the risk of freezing the whole
machine when a non-cooperative thread is encountered. The need to
perform context-switches at kernel level is generally costly in CPU
cycles. Another problem that can occur with kernel threads is the
limitation which generally exists on the number of such threads
(typically, the number of allowed threads can vary on a Linux system
from 256 to about 4 thousand). Several techniques can be used to
bypass these problems, especially when large numbers of short-lived
components are needed. Among these techniques are thread pooling, to
limit the number of created threads, and the decomposition of tasks
into small pieces of code, sometimes called "chores" or "chunks",
which can be executed in a simpler way than threads are.
Preemptive threads (threads designed to be run by a preemptive
scheduler) are also subject to another problem: unnecessary,
time-consuming context-switches can occur. Actually,
threads have no way at all to prevent a context-switch. Unnecessary
context-switches cause a loss of efficiency, but this is of course the
price to pay for preemptive scheduling.
Conclusion
Actually, programming with threads is difficult because threads
generally have very "loose" semantics, which strongly depend on
the scheduling strategy in use. This is particularly true of
preemptive threads. Moreover, the semantics of threads may also
depend on many other aspects, for example the way thread priorities
are mapped at the kernel level. As a consequence, portability
of multi-threaded systems is very hard to achieve. Moreover,
multi-threaded systems are most of the time nondeterministic, which
makes debugging difficult.
Cooperative threads have several advantages over preemptive ones: it
is possible to give them a precise and simple semantics, which
integrates the semantics of the scheduler. They can be implemented
very efficiently. They lead to a simple programming approach, which
limits the need for protecting data. However, purely cooperative
systems are not usable as is, and preemption facilities have to be
provided in one way or another. The question is thus to define a
framework providing both cooperative and preemptive threads, and
allowing programmers to use these two kinds of threads according to
their needs.
Loft basically gives users the choice of the context, cooperative
or preemptive, in which threads are executed. More precisely, Loft
defines schedulers which are cooperative contexts to which
threads can dynamically link or unlink. All threads linked to the
same scheduler are executed in a cooperative way, and at the same
pace. Threads which are not linked to any scheduler are executed by
the OS in a preemptive way, at their own pace. An important point is
that Loft offers programming constructs for linking and unlinking
threads. The main characteristics of Loft are:
- It is a language, based on C and compatible with it.
- Programs can benefit from multiprocessor
machines. Indeed, schedulers and unlinked threads can be run in real
parallelism, on distinct processors.
- Users can stay in a purely cooperative context by
linking all the threads to the same scheduler, in which case systems
are completely deterministic and have a simple and clear semantics.
- Blocking I/Os can be implemented in a very simple way, using
unlinked native threads.
- There exist instants shared by all the threads linked
to the same scheduler. Thus, all threads linked to the same scheduler
execute at the same pace, and there is an automatic synchronization at
the end of each instant.
- Events can be defined by users. They are
instantaneously broadcast to all the threads linked to a
scheduler; events are a modular and powerful means for threads to
synchronize and communicate.
- It can be efficiently implemented on various platforms.
Implementation of the whole language needs the full power of native threads.
However, the cooperative part of the language can be implemented without any
native thread support. This leads to implementations suitable for
platforms with few resources (PDAs, for example).
Loft is strongly linked to an API of threads called FairThreads [12]: actually, Loft stands for Language Over Fair
Threads. This API mixes threads with the reactive approach, by
introducing instants and broadcast events in the context of
threads. Loft can be implemented in a very direct way by a
translation into FairThreads (this implementation is described in section
Implementation in FairThreads).
1.4 Structure of the Document
Section The Language LOFT contains the description
of the language. First, an overview of Loft is given. Then, modules
and threads are considered. Atomic and non-atomic instructions are
described. Finally, native modules are considered. A first series of
examples is given in section Examples - 1. Amongst
them is the coding of a small reflex game which shows the reactive
programming style. The case of data-flow programming is also
considered. Various sieves are coded, showing how the presence of
instants can benefit code reuse. Section Programming Style contains
a discussion of the programming style
implicitly supposed by Loft. Section Semantics
contains the formal semantics of the cooperative part of Loft, by
means of a set of rewriting rules. FairThreads is described in section
FairThreads in C. Several implementations of
Loft are described in section Implementations. One consists of a
direct translation into
FairThreads. Another is a direct implementation of the semantics rules of
section Semantics. One implementation is suited to
embedded systems. Section Examples - 2 contains a
second series of examples. Amongst them are some examples which can
benefit from multiprocessor architectures. A prey/predator demo
running on an embedded system (the Game Boy Advance of Nintendo) is
also described. Related work is described in section Related
Work, before the conclusion. A first annex gives the API of
FairThreads. A second one is the Reference Manual of Loft. The third one
is a quick description of the compiling command.