1.1 Concurrent Programming
Concurrent programming is the use of several programs running
together in the same application. It is opposed to traditional
sequential programming, in which there is one unique program, which
thus coincides with the whole application. Concurrent programming is
usually considered to increase modularity, as one can often naturally
decompose a complex application into several cooperating programs which
can be run concurrently. This is for example the case of servers, in
which each user request can quite logically be mapped to a specific
program processing it. With concurrent programming, there is no longer
any need to code a complex application made of many distinct parts as
one single program, as is the case in sequential programming.
There are two main variants of concurrent programming: one in
which the programs, called threads, share a common
memory; and one in which the programs are basically processes, each
with its own memory. The presence of a shared
memory eases communication between threads, as threads can directly
use shared data to communicate, while processes have to use
specialized means, for example sockets, for the same purpose. The
counterpart, however, is that shared data may have to be protected from
concurrent accesses by distinct threads. Of course, this is not
necessary with processes, in which, by definition, data are fully
protected from being accessed by other processes.
Parallelism is the possibility of simultaneously using several computing
resources. It is available with multiprocessor machines or, more
generally, when several machines are distributed over a network. It
is well known that parallelism is not the best solution in all cases,
and there are systems which run more efficiently on a single-processor
machine than on a multiprocessor one. Parallelism is now
directly available, with no special effort, in standard operating
systems such as Linux, which can run on Symmetric
Multi-Processor (SMP) architectures. Parallelism appears quite
natural in concurrent programming, as there is generally no logical
reason why concurrent programs should be run by one unique processor.
Concurrent programming is widely recognized as being, in the general
case, much more difficult than sequential programming. Some problems,
such as deadlocks, are specific to it. Moreover, concurrent programs
are most of the time nondeterministic, and thus more difficult to
debug. For these reasons, concurrent programming is often considered
an activity which must be left to specialists. This seems of course
largely justified in domains such as the programming of OS
kernels. But the question remains of designing a set of concurrency
primitives that are not too difficult or painful to use, and of
providing general users with them.
Concurrency at Language Level
Most of the time, programming languages do not offer any primitive for
concurrent programming. This is the case of C and C++, in which
concurrent programming is only available through system calls (for
example fork or exec) or through thread libraries (for
example, the pthread library). Parallelism is, most of the time, also
not covered by programming language primitives. The way to
benefit from parallelism is either to use threads in an SMP context, or
to design distributed systems in which programs (often considered as
distributed objects) communicate and synchronize through the network.
Amongst the languages that have primitives to deal with concurrency,
some use message passing, such as Erlang; others propose "rendez-vous",
such as Ada and Occam; and several, such as CAML, directly include threads.
The most accomplished proposal of such an inclusion seems to be
Java [8], in which threads are first-class objects and
data can be protected using locks and monitors, as defined by
E.W. Dijkstra in the sixties (see [14] for
reference papers on concurrent programming). In Java, threads are
supposed to be used in almost every context (the simplest applet
involves several threads, one of them being for example dedicated to
graphics). However, Java does not provide particular means to ease
thread programming, which remains difficult and error-prone (see [22]
for a discussion of this point). Concerning
parallelism, Java proposes the RMI mechanism to define distributed
objects, whose methods can be invoked through the network. Remote
method invocations basically use specialized threads dedicated to
communications.
Synchronous languages [19] propose
concurrent programming primitives in the special context of reactive
systems [20]. In this context, which is roughly
that of embedded systems, concurrent programs are deterministic and
can be formally verified. However, synchronous languages suffer from
several limitations which restrict their use. Amongst these are the
limitation to static systems, in which the number of concurrent
programs is known at compile time, and to mono-processor
architectures.
A proposal to extend the synchronous approach to dynamic systems
(systems which are not static) is called the reactive
approach. It is described in full detail in [6]. Several languages
belonging to this approach have been designed;
amongst them are Reactive-C [11], which implements
the approach in C, and SugarCubes [13], which
implements it in Java. Reactive programs can be implemented very
efficiently, as they are basically cooperative. However, up to now,
they have not been able to benefit directly from multiprocessor machines.
In this text, we propose a language for concurrent programming
inspired by the reactive approach and based on special threads, called
fair threads. One major point is that the language contains
primitives that allow multiprocessing. The rest of the section
considers threads and presents the basics of the language proposal.
In sequential programming, a program is basically a list of
instructions run by the processor. The program has access to the
memory in which it can read and write data. Amongst the instructions
are tests and jumps which are the way to control the execution
flow. In this standard computing model, a program-counter points to
the current instruction executed by the processor.
The model of threads extends the previous sequential model by allowing
the presence of several programs, the threads, sharing the
same memory. There are thus several program counters, one per thread,
and a new component, the scheduler, which is in charge of
dispatching the processors to the threads. The transfer of a processor
from one thread to another is called a context-switch, because
it implies that the program counter of the thread which is left must
be saved, in order to be restored later, when the thread regains
the processor. When several threads can be executed, the scheduler
can choose arbitrarily between them, depending on implementation
details, and it is thus generally impossible to predict which one
will finally get the processor. This is a source of nondeterminism in
multi-threaded programming.
The strategy of the scheduler is said to be preemptive when it
is able to force a running thread to release the processor executing it,
in which case one says that the thread is preempted. With a
preemptive scheduler, situations where a non-cooperative thread
prevents the others from any execution are not possible. A preemptive
scheduler can withdraw a processor from a running thread for
priority reasons (there exist threads with higher priorities in the
system), or for other reasons. This is also a source of
nondeterminism. In a preemptive context, communicating or
synchronizing generally implies the need to protect some shared data
involved in the communication or in the synchronization. Locks are
often used for this purpose, but they have a cost and are error-prone,
as they introduce possibilities of deadlocks: situations
where two threads cannot progress because each one owns a lock needed by
the other.
Preemptive threads are generally considered well adapted for
systems made of heavy-computing tasks needing few communications
and synchronizations. This is typically the case of Web servers, in
which a thread (usually picked out from a pool of available threads)
is associated with each new user request. The advantages are clear: first,
modularity is increased, because requests do not have to be split into
several small parts, as would be the case with sequential
programming. Second, servers can be run by multiprocessor machines
without any change, and can for example immediately benefit from
SMP architectures. Finally, blocking I/Os, for example reading from a
file, do not need special attention. Indeed, as the scheduler is
preemptive, there is no risk that a thread blocked forever on an I/O
operation will also block the rest of the system.
The strategy of the scheduler is said to be cooperative if the
scheduler has no way to force a thread to release the processor it
owns. With this strategy, a running thread must cooperate and release
the processor by itself from time to time, in order to give the other
threads the possibility of getting it. Cooperation is of course absolutely
mandatory if there is only one processor: a running thread which
never released the processor would indeed forbid any further execution
of the other threads. Cooperative threads, sometimes called green
threads, are threads that are supposed to run under the
control of a cooperative scheduler. Cooperative threads are adapted
to tasks that need a lot of communication between them. Indeed, data
protection is no longer needed, and one can thus avoid using
locks. Cooperative threads can be efficiently implemented at user
level, but they cannot benefit from multiprocessor machines. Moreover,
they need special means to deal with blocking I/O. Thus, purely
cooperative threads are not sufficient to implement important
applications such as servers.
Reuse of Sequential Code
It is often claimed that reuse of sequential code is made easier by
using threads. What is meant is that a sequential program can easily
be embedded in one thread, and then added into a system to be run
concurrently with the other programs present in it.
However, one generally cannot be sure that a sequential program never
enters a loop in which no action leading to a release of the processor,
such as an I/O, is ever performed. Under a cooperative scheduler,
such a non-cooperative program would block the whole system (supposing
that the processor is unique). In a way, this is a major justification
for preemptive scheduling: a non-cooperative program is no longer a
problem, because it can now be preempted by the scheduler.
This is however only one aspect of the problem, because a sequential
program generally cannot be used directly as it is, even in a preemptive
context. The first reason is that some data may have to be protected
from being accessed by other threads, which requires locks that were of
course absent from the sequential code. A second reason is that in some
cases, for example libraries, the sequential code must be made
reentrant, because several threads can now make concurrent calls to
it. Finally, the granularity of the execution, which is under the control
of the scheduler and only of it, can be too large and may require the
introduction of supplementary cooperation points in the initial code.
In conclusion, it seems that in all cases reuse of sequential code is
largely an illusion in the context of multi-threading.
Portability
Portability is very difficult to achieve for multi-threaded systems.
In the broadest sense, portability means that a system should run in the
same way, independently of the scheduling strategy. This is of course
an almost impossible challenge in the general case: data should be
protected, for the case of preemptive scheduling, but cooperation
points should also be introduced, for the case of cooperative
scheduling, and the conjunction of these two requirements would lead
to very inefficient code. Portability is thus often considered a
goal that can only be partially achieved with multi-threaded
systems. Note that this is a major problem in Java, which claims to
be portable but does not specify the scheduling strategy that should
be used for multi-threaded applications.
Determinism and Debugging
The choices made by the scheduler are a major source of nondeterminism,
which makes thread programming complex. In particular, debugging is
more difficult than in sequential programming, because one may have to
take into account the choices of the scheduler when tracking a
bug. Moreover, replaying a faulty situation may be difficult to
achieve, as it may actually depend on the scheduler's choices. For
example, it may happen that the introduction of printing instructions
to trace an execution changes the scheduling and makes the bug
disappear.
Nondeterminism seems inherent to preemption, as it is very difficult to
conceive of a multi-threading framework that is both deterministic and
preemptive. For the reader who is not convinced: imagine two threads
running in a preemptive context, one which cyclically increments a
counter, and one which prints the value of the same counter. What
should be the (unique, because of determinism) printed value, and
how could one simply find it?
However, it should be noted that it is possible to design completely
deterministic frameworks based on cooperative threads, with a clear and
simple semantics. This is actually the case of the fair threads which are
introduced later.
Resource Issues
The way resources are consumed is a major issue for threads. Several
aspects are relevant to this matter.
First, like any standard sequential program, each thread needs a private
memory stack to hold its local variables and the parameters and
intermediate results of executed procedures. The use of a stack is
actually mandatory in the context of a preemptive scheduler, because
the moments at which context-switches occur are independent of the thread,
which thus has no possibility of saving its local data before the
release of the processor. Note that the situation is different in a
cooperative context, where context-switches are predictable. The
necessity of a private stack for each thread certainly contributes to
wasting memory.
Second, user threads are often mapped to kernel threads, which are
under the supervision of the OS. Of course, this is only meaningful with a
preemptive OS, in order to avoid the risk of freezing the whole
machine when a non-cooperative thread is encountered. The need to
perform context-switches at kernel level is generally costly in CPU
cycles. Another problem that can occur with kernel threads is the
limitation which generally exists on the number of such threads
(typically, the number of allowed threads can vary on a Linux system
from 256 to about 4 thousand). Several techniques can be used to
bypass these problems, especially when large numbers of short-lived
components are needed. Among these techniques are thread pooling, to
limit the number of created threads, and the decomposition of tasks
into small pieces of code, sometimes called "chores" or "chunks",
which can be executed in a simpler way than threads are.
Preemptive threads (threads designed to be run by a preemptive
scheduler) are also subject to another problem: unnecessary,
time-consuming context-switches can occur. Actually,
threads have no way at all to prevent a context-switch. Unnecessary
context-switches cause a loss of efficiency, but this is of course the
price to pay for preemptive scheduling.
Conclusion
Actually, programming with threads is difficult because threads
generally have very "loose" semantics, which strongly depend on
the scheduling strategy in use. This is particularly true of
preemptive threads. Moreover, the semantics of threads may also
depend on many other aspects, for example the way thread priorities
are mapped at the kernel level. As a consequence, portability
of multi-threaded systems is very hard to achieve. Moreover,
multi-threaded systems are most of the time nondeterministic, which
makes debugging difficult.
Cooperative threads have several advantages over preemptive ones: it
is possible to give them a precise and simple semantics, which
integrates the semantics of the scheduler. They can be implemented
very efficiently. They lead to a simple programming approach, which
limits the need for protecting data. However, purely cooperative
systems are not usable as is, and preemption facilities have to be
provided in one way or another. The question is thus to define a
framework providing both cooperative and preemptive threads, and
allowing programmers to use these two kinds of threads according to
their needs.
Loft basically gives users the choice of the context, cooperative
or preemptive, in which threads are executed. More precisely, Loft
defines schedulers which are cooperative contexts to which
threads can dynamically link or unlink. All threads linked to the
same scheduler are executed in a cooperative way, and at the same
pace. Threads which are not linked to any scheduler are executed by
the OS in a preemptive way, at their own pace. An important point is
that Loft offers programming constructs for linking and unlinking
threads. The main characteristics of Loft are:
- It is a language, based on C and compatible with it.
- Programs can benefit from multiprocessor
machines. Indeed, schedulers and unlinked threads can be run in real
parallelism, on distinct processors.
- Users can stay in a purely cooperative context by
linking all the threads to the same scheduler, in which case systems
are completely deterministic and have a simple and clear semantics.
- Blocking I/Os can be implemented in a very simple way, using
unlinked native threads.
- There exist instants shared by all the threads linked
to the same scheduler. Thus, all threads linked to the same scheduler
execute at the same pace, and there is an automatic synchronization at
the end of each instant.
- Events can be defined by users. They are
instantaneously broadcast to all the threads linked to a
scheduler; events are a modular and powerful means for threads to
synchronize and communicate.
- It can be efficiently implemented on various platforms.
Implementation of the whole language needs the full power of native threads.
However, the cooperative part of the language can be implemented without any
native thread support. This leads to implementations suitable for
platforms with few resources (PDAs, for example).
Loft is strongly linked to an API of threads called FairThreads [12]: actually, Loft stands for Language Over Fair
Threads. This API mixes threads with the reactive approach, by
introducing instants and broadcast events in the context of
threads. Loft can be implemented in a very direct way by a
translation into FairThreads (this implementation is described in section
Implementation in FairThreads).
1.4 Structure of the Document
Section The Language LOFT contains the description
of the language. First, an overview of Loft is given. Then, modules
and threads are considered. Atomic and non-atomic instructions are
described. Finally, native modules are considered. A first series of
examples is given in section Examples - 1. Amongst
them is the coding of a small reflex game which shows the reactive
programming style. The case of data-flow programming is also
considered. Various sieves are coded, showing how the presence of
instants can benefit code reuse. Section Programming Style contains
a discussion of the programming style
implicitly supposed by Loft. Section Semantics
contains the formal semantics of the cooperative part of Loft, by
means of a set of rewriting rules. FairThreads is described in section
FairThreads in C. Several implementations of
Loft are described in section Implementations. One consists of a
direct translation into
FairThreads. Another is a direct implementation of the semantics rules of
section Semantics. One implementation is suited to
embedded systems. Section Examples - 2 contains a
second series of examples. Amongst them are some examples which can
benefit from multiprocessor architectures. A prey/predator demo
running on an embedded system (the Game Boy Advance of Nintendo) is
also described. Related work is described in section Related
Work, before the conclusion. A first annex gives the API of
FairThreads. A second one is the Reference Manual of Loft. The third one
is a quick description of the compiling command.