This paper presents Fair Threads, a new model for concurrent programming. This multi-threading model combines preemptive and cooperative scheduling: user threads execute according to a cooperative strategy while service threads execute according to a preemptive strategy. User threads may request services from service threads in order to improve performance by exploiting hardware parallelism and in order to execute non-blocking operations. Fair Threads have been experimented with in the context of the functional programming language Scheme. This paper also presents their integration in this language, that is, a semantics for Scheme augmented with Fair Threads and the main characteristics of the implementation.
When shared memory is involved, the two most common ways to implement concurrency are event-driven architectures and multi-threaded architectures. From a programming point of view, we think that neither is fully satisfactory. Event-driven architectures enable relatively simple and efficient implementations but hardly support extensibility and modularity [24]. Threads à la Posix [23] or à la Java [12] support extensibility but they are a nightmare to program and to maintain [5,15].
In this introduction, we start by detailing the weaknesses of the two models from a programming language perspective. Then, we informally present our new model, the Fair Threads.
In the event-driven programming style, there is one execution
stream which is, generally, made of a loop waiting for events. When such
events are raised, the loop invokes registered handlers. The handlers
may or may not spawn new threads or new processes.
Event-driven programming does not permit parallelism by itself
(that is, true CPU concurrency). It does not support preemption, so if
an event handler falls into an infinite loop the whole program gets
stuck. Long-lasting handler executions are a problem too because they
make applications non-responsive. In order to ensure responsiveness,
one has to break up event-handlers. This makes event-loops difficult to
program because most programming languages do not provide convenient means
for saving and restoring states of computations [1].
Scheme, with its call-with-current-continuation function [17], is one exception.
In a famous provocative presentation [24], John Ousterhout, the author of Tcl/Tk, compares event-driven programming and multi-threaded programming. He strongly argues in favor of the former, mostly because preemptive threads are too difficult to program. We agree on this last point but we also think that the lack of modularity of event loops is a major concern.
The term "thread" is a rather loose-fitting description. It generally denotes concurrency with shared memory. Threads are also known as lightweight processes. They execute according to a scheduling strategy which can be either preemptive or cooperative.
A preemptive scheduler may suspend a running thread at any time to allocate the processor to another thread. It may then resume the suspended thread at any moment. Preemptive threads are generally considered to be well adapted for systems made of heavy-computing tasks needing few communications and synchronizations. This can be the case of Web servers, in which a thread (usually picked out from a pool of available threads) is associated with each request. In such contexts, the advantages of preemptive threads are clear.
Several difficulties, however, are related to programming with preemptive threads.
A cooperative scheduler does not suspend threads. It waits for threads to cooperate, that is, to yield the processor before allocating it to other threads. For systems made of tasks needing strong synchronizations or a lot of communications, cooperative threads (sometimes called green threads) may seem better suited than preemptive threads. Indeed, data protection is no longer needed with cooperative threads, and one can avoid the use of locks, leading to efficient implementations. Cooperative threads give the user better control over the way threads are scheduled. Cooperative systems do not rely on specificities of the scheduling strategy and are thus easier to program and to port. They are also easier to debug.
However, cooperative threads have several well-known drawbacks.
From the previous discussion, it seems rather clear that multi-threading is difficult to program. Amongst the important questions with non-trivial answers are: when and where are threads to be preferred to standard sequential code, how to get portable code, and how to reuse sequential code in a multi-threaded context?
Both preemptive and cooperative strategies have their pros and cons, and neither one is valid in all cases. Cooperative threads are sometimes perceived as an "old" solution, mainly because old operating systems (such as Windows 3.1 or MacOS 9) that relied on this technique have now been replaced by preemptive ones. It seems clear, however, that preemptive strategies are not universally valid, and that cooperative frameworks are better suited for certain contexts, for example when portability is a major concern. Cooperative strategies can also be useful in contexts where a precise semantics is mandatory.
The Fair Threads model proposes a thread-based concurrent programming framework which mainly differs from other frameworks by its scheduling strategy.
In Section Fair Threads we present the Fair Threads programming model. Then, in Section Scheme Fair Threads, we focus on the integration of these threads in the Scheme programming language. We present the Scheme API (Section API). We show how to program a Web server with support for concurrent request handling with threads (Section A Web Server with Fair Threads). We present the semantics of Scheme augmented with threads (Section Semantics of Scheme with Fair Threads). In Section Implementation, we present the main characteristics of the current implementation. In Section Related Work we present related work. Finally, in Section Weaknesses of Fair Threads and Future work we show some future directions for our work.
Fair Threads is a new framework that combines user threads and service threads. Intuitively, a user thread is a cooperative thread which can synchronize with other threads (either user threads or service threads). It can communicate with other threads by means of user-defined signals. It may start new user threads and new service threads. By contrast, a service thread is a preemptive thread which can only communicate with user threads, by means of signals.
The scheduler defines instants during which all user threads run up to their next cooperation point. User threads must behave fairly and cooperate with the other threads by returning control to the scheduler. A user thread cooperates by calling the yield function or by waiting for signals. Signals are non-persistent user-defined data which are reset at the beginning of each instant. The scheduler broadcasts signals to all user threads, which thus see the presence and the absence of signals in exactly the same way. Moreover, values associated with signals are also broadcast.
Signals are used when one wants one or more threads to wait for a condition, without polling a variable to determine when the condition is fulfilled. Broadcast is a means to get modularity, as the thread that generates a signal need know nothing about its potential receivers. Fairness in signal processing means that all threads waiting for a signal always receive it during the same instant: the instant in which the signal is generated. Thus, a user thread yielding control at a cooperation point does not risk losing a signal generated later during the same instant.
User thread fairness is ensured by the fair scheduling strategy that relies on instants. An instant consists of an iteration of user thread executions: all user threads are executed until they are blocked. A user thread can block either because it explicitly cooperates (by means of a yield operation) or because it is waiting for a signal that is not present, that is, a signal which has not been broadcast during the instant. Note that during one instant, a user thread may be elected several times by the scheduler because it may be waiting successively for different broadcast signals. Consider the following three user threads:
A B C
---------------- ----------------- -----------------
1: await sig1 1: broadcast sig1 1: await sig1
2: await sig2 2: yield 2: broadcast sig2
3: yield 3: broadcast sig3 3: await sig3
4: await sig1
Let us detail the scheduling of these threads. For each instant, here is exactly how and when each thread executes:

Instant 1: Thread A blocks on signal sig1 line 1. Thread B broadcasts sig1 line 1, then it cooperates line 2. That is, B has completed its execution for the instant; it is explicitly blocked. Thread C does not block on signal sig1 because it has been broadcast (by B) during the instant. It broadcasts sig2 and it blocks, waiting for signal sig3. At this point all threads have executed but there exists a thread (A) that is blocked on a signal which is present. The scheduler re-elects A in the instant. This thread does not block on signal sig2 because it is present (broadcast by C). Then, it explicitly cooperates line 3. At this very point all threads are blocked and no new signal has been broadcast. This marks the end of instant 1. Signals are reset.

Instant 2: Thread A blocks on signal sig1 line 4, which has not been broadcast during the instant. (Remember that at the beginning of an instant, all signals are reset.) Thread B broadcasts signal sig3 then terminates its execution. Thread C awakes for sig3 line 3 and it terminates. At this point all threads are blocked. Instant 2 ends.

Instant 3: Thread A remains blocked, waiting for signal sig1.

Instants are at the heart of fair scheduling. First, they ensure that each user thread sees the same signals. Second, instant boundaries enable determinism for user thread manipulations. For instance, user threads are started, terminated, suspended, and resumed in between two instants. This framework enables an easy implementation for the problem, known to be difficult, of two threads killing each other:
D E
----------------- -----------------
1: terminate E 1: terminate D
Because user threads are actually terminated at the end of instants, in the previous example the two user threads will be terminated by the scheduler at the end of instant 1.
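Anticipating on the Scheme API presented later in this paper, this example can be written directly; a minimal sketch, assuming both threads run in the pre-existing global scheduler:

;; Sketch of the D/E example: both terminations take effect at the
;; end of instant 1, so the outcome is deterministic whichever
;; thread is elected first.
(let ((d #f)
      (e #f))
   ;; create both threads before starting them, so that each one
   ;; can refer to the other
   (set! d (make-thread (lambda () (thread-terminate! e))))
   (set! e (make-thread (lambda () (thread-terminate! d))))
   (thread-start! d)
   (thread-start! e))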
Fair Threads support concurrent operations such as I/Os, character parsing, process creation, etc. These operations are implemented by means of service threads that are not under control of the Fair Scheduler, but only of the operating system. They execute concurrently (in parallel if the actual hardware supports it) without blocking running user threads. Service threads are the only means for an application using Fair Threads to benefit from hardware parallelism.
Service threads and user threads only interact using signals. From the programmer's point of view, running a service thread is equivalent to waiting for a signal: the user thread requiring a concurrent service from a service thread waits for its completion as it would wait for any signal. When a service thread is in charge of a service, it computes, in parallel with the other user and service threads, whatever is needed to satisfy the request. Then, it broadcasts the result of its computation to the user threads.
The purpose of the dichotomy of user and service threads is to support simple semantics and simple implementation while offering a general framework for supporting non-blocking operations and taking advantage of parallelism. In the semantics, this dichotomy confines the non-determinism to the creation of service threads. In the implementation, it isolates, in the source code of service threads, the few places where locks and mutual exclusions are needed.
In this section, we explain how Fair Threads have been integrated in the Scheme programming language [17]. They have been implemented in the Bigloo system [32]. In Section API we show what Scheme programs using Fair Threads look like. In Section An example of producers/consumers we present the classical example of producers/consumers with Fair Threads. In Section A Web Server with Fair Threads, we present a more realistic example: a Web server. Finally, in Section Semantics of Scheme with Fair Threads we present the semantics of Scheme augmented with Fair Threads.
In this section we present a subset of the Fair Threads API. The whole documentation can be found in the Bigloo user manual.
The "Scheme Request For Implementation" (SRFI in short) effort has provided the community with a document about threads: the SRFI-18 by M. Feeley [11]. This documentation describes a thread API inspired by the Posix-1 API [23] and the Java API [12]. Scheme Fair Threads supports the whole SRFI-18 API. Here are the major constructions:
Threads are created with the make-thread function, which accepts a thunk (i.e., "a procedure without argument"). As in Java, a thread is not started as soon as it is created. It is started by calling the thread-start! function. In our API, this function accepts an optional parameter which is the scheduler below which the thread must run. This argument is optional because a global scheduler pre-exists to all threads. As an example, here is the creation of two threads:

(let ((th1 (make-thread (lambda () (print 'foo))))
      (th2 (make-thread (lambda () (print 'bar)))))
   (thread-start! th1)
   (thread-start! th2))
A thread cooperates by calling the thread-yield! function. One thread can terminate another thread with the thread-terminate! function.
All threads execute inside a scheduler. Schedulers are first-class values. A program may construct one or several schedulers. All schedulers share the same scheduling policy (see Section User threads) but using different schedulers may be useful to control groups of threads. Schedulers may be recursively embedded. Typically, an application creates a scheduler, creates and starts threads in that scheduler, and then starts the scheduler itself.
Here is an example of a simple application that executes three instants:
(let* ((s (make-scheduler))
       (th1 (thread-start!
               (make-thread
                  (lambda ()
                     (let loop ()
                        (print 'thread1)
                        (thread-yield!)
                        (loop))))
               s))
       (th2 (thread-start!
               (make-thread (lambda () (print 'thread2)))
               s)))
   (scheduler-start! s 3))
Basically a user thread waits for a signal by means of the thread-await! function. It may also wait for one of several signals with thread-await*!. These functions block the thread until a waited signal is broadcast. They return the value associated with the signal. A thread broadcasts a signal with broadcast!, which immediately sends the signal to all the threads running in the scheduler.
(let ((th1 (make-thread (lambda () (broadcast! 'today (date)))))
      (th2 (make-thread (lambda () (display (thread-await! 'today))))))
   (thread-start! th1)
   (thread-start! th2))
When a thread blocks on a signal, it gets unblocked as soon as the first occurrence of the signal is broadcast. However, some applications require a thread to react to several broadcasts of the same signal. This is possible with callbacks introduced by the thread-get-values form. Like thread-yield!, it explicitly yields the processor but it also returns, at the next instant, all the values associated with the broadcast signal. That is, if the signal has not been broadcast thread-get-values returns an empty list, otherwise it returns the list of all values. Here is an example of a thread registering a callback on the signal click:
(make-thread (lambda () (for-each print (thread-get-values 'click))))
At the next instant, the print function will be invoked as many times as the signal click has been broadcast during the previous instant.
Service threads are created by user threads via library functions. Each of these functions creates a signal that can be awaited by any user thread. Then it runs a service thread that executes the required service and, on completion, broadcasts the signal. Here are the service threads currently supported:
(make-service-signal (lambda (s) ....)): runs the given function in a service thread, which may broadcast the signal s at any time. This service thread constructor enables users to run, in parallel when the hardware supports it, preemptive concurrent threads.

(make-output-signal p s): broadcasts a signal once the string s has been written to the output port p.

(make-input-signal p n): broadcasts a signal once n characters have been read on the port p. The associated value is the string of read characters.

(make-send-chars-signal ip op): broadcasts a signal once the characters of the input port ip have been sent to the output port op. Unsurprisingly, this service thread is extremely useful for implementing file servers (see Section A Web Server with Fair Threads). Actually, it is a mapping of the Linux sendfile [16] facility into the Fair Threads API.

(make-rgc-signal p g): broadcasts a signal once the regular grammar g has matched on the input port p. This relies on the Bigloo regular-grammar facility [32].

(make-accept-signal s): broadcasts a signal once a connection is established on the server socket s. The associated value is a client socket.

(make-process-signal p): broadcasts a signal when the process p completes. The value associated with the signal is the return code of the process.

(make-timer-signal date): broadcasts a signal at date.
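As an illustration of how these constructors combine with thread-await!, here is a minimal sketch that reads ten characters concurrently (the file name is purely illustrative):

;; The blocking read is performed by a service thread; other user
;; threads keep running until the signal carrying the string of
;; read characters is broadcast.
(let ((p (open-input-file "data.txt")))
   (let ((chars (thread-await! (make-input-signal p 10))))
      (print "read: " chars)
      (close-input-port p)))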
Section A Web Server with Fair Threads contains various examples of concurrent operations coded with the previous constructions. It could be envisioned to add new service threads to this list. For instance, good candidates could be service threads supporting graphical user interfaces and foreign function calls.
We consider the traditional example of producers and consumers sharing a global buffer. This example is based on a wait/notify mechanism which can be implemented with thread-await!, broadcast!, and a waiting queue.
(define (wait sig)
   (let ((self (current-thread)))
      (queue-push! sig self)
      (thread-await! self)))

(define (notify sig)
   (if (not (queue-empty? sig))
       (broadcast! (queue-pop! sig))))
It is important to note that any Scheme value can be used to
denote signals. This feature enables some threads to share private signals.
In the implementation above, we have used
the thread descriptors returned by current-thread
as
signal identifiers. This trick constitutes a simple solution to the
problem of associating a unique signal identifier to each thread.
The shared resource is accessed via the functions get
and put
. According to standard multi-threading
programming [5], when a thread is notified in
the function get
, it still has to check the availability of
the resource.
(define (put val)
   (buffer-put! val)
   (notify 'rsc-available))

(define (get)
   (if (buffer-empty?)
       (begin
          (wait 'rsc-available)
          (thread-yield!)
          (get))
       (buffer-fetch)))
(define (buffer-fetch)
   (let ((r (car *buffer*)))
      (set! *buffer* (cdr *buffer*))
      r))

(define (buffer-put! val)
   (if (null? *buffer*)
       (set! *buffer* (list val))
       (set-cdr! (last-pair *buffer*) (list val))))
One may notice that no lock is needed to protect the shared buffer. Producers and consumers are implemented by means of simple loops. Any number of producers and consumers can run concurrently.
(define (make-producer count)
   (make-thread
      (lambda ()
         (let loop ((n count))
            (put n)
            (thread-yield!)
            (loop (+ 1 n))))))

(define (make-consumer)
   (make-thread
      (lambda ()
         (let loop ()
            (print "Get: " (get))
            (thread-yield!)
            (loop)))))
After a notification, only one thread consumes the resource. No competition is needed between threads.
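The whole system can then be assembled by starting producers and consumers in a scheduler; a minimal sketch (the number of producers is arbitrary):

;; Start two producers and one consumer in a dedicated scheduler.
;; Since all three threads loop forever, the scheduler runs until
;; the program is interrupted.
(let ((s (make-scheduler)))
   (thread-start! (make-producer 0) s)
   (thread-start! (make-producer 100) s)
   (thread-start! (make-consumer) s)
   (scheduler-start! s))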
We present in this section the implementation of a full-featured Web server supporting the handling of multiple concurrent requests and scripting à la CGI. We do not consider efficiency issues for this example. In particular, we do not use any pool for managing the threads. Note that, even if this server does not cache files, thanks to the sendfile facility (available through make-send-chars-signal in Fair Threads) it still performs reasonably well. With this server a user thread waits on a socket for a connection. When such a connection is established, it starts a new user thread that handles the request. All I/O operations are performed concurrently by service threads. Thus, they do not block other running threads handling other client requests, and the server may benefit from parallel architectures such as an SMP.
The function http-up starts the Web server. It opens a server connection (line 2). Then, it creates and runs a thread for the server and it starts the scheduler by means of scheduler-start! (line 4). The server runs until the completion of all threads.
1: (define (http-up)
2:    (let ((s (make-socket-server)))
3:       (thread-start! (make-thread (make-http-server s)))
4:       (scheduler-start!)
5:       (socket-shutdown s)))
The Web server waits for client requests. This is achieved with the expression at line 9, which forces the thread running the Web server to block until a connection is established. While the HTTP server is waiting for a connection, the other threads in the scheduler run.
6: (define (make-http-server s)
7:    (define (serv)
8:       ;; wait cooperatively for a connection
9:       (let ((c (thread-await! (make-accept-signal s))))
10:         ;; create and run a new thread for handling
11:         ;; the client request
12:         (thread-start!
13:            (make-thread (lambda () (http-eval c))))
14:         (serv)))
15:    serv)
The first action upon client connection is to parse the request. For this, the user thread in charge of the HTTP request spawns a service thread (line 19):
17: (define (http-eval s)
18:    (let ((l (thread-await!
19:               (make-rgc-signal (socket-input s) G))))
20:       (cond
21:          ((string? l)
22:           (http-get-request s l))
23:          (else
24:           (http-reply s "Unknown request")))
25:       (socket-shutdown s #f)))
It blocks until the HTTP GET
line is read and parsed using regular expressions:
27: (define G
28:    (regular-grammar ()
29:       ((: "GET " (submatch (+ (out " \r\n"))))
30:        (the-submatch 1))
31:       (else
32:        #f)))
This Web server handles regular HTML files and also Scheme files. We will explain how the latter are handled in Section Handling Scheme scripts. First, let us focus on regular HTML files.
33: (define (http-get-request s file)
34:    (let ((p (case (string->symbol (suffix file))
35:               ((scm)
36:                (http-get-scm file))
37:               (else
38:                (http-get-html file)))))
39:       (http-reply s p)))

40: (define (http-get-html f)
41:    (if (file-exists? f)
42:        (open-input-file f)
43:        "Can't find file"))
Replying to a request takes two parameters: (i) a socket onto which the characters composing the answer are written, (ii) data from which the characters are extracted. In this toy implementation the data is either a string or an input port. Replying to a request spawns successively two service threads. The first one emits the header (line 53). The second one transfers the characters (line 56). Because of these service threads, http-reply does not block the other threads running in the scheduler, and thus the server supports concurrent requests.
43: (define (http-reply s d)
44:    (let ((op (socket-output s))
45:          (ip (cond
46:                 ((string? d)
47:                  (open-input-string d))
48:                 ((input-port? d)
49:                  d)
50:                 (else
51:                  (open-input-string "internal error")))))
52:       (thread-await!
53:          (make-output-signal op "HTTP/1.1 200 Ok\r
54: Server: test_httpd 1.0\r
55: \r\n\r\n"))
56:       (thread-await! (make-send-chars-signal ip op))))
We have provided our Web server with the ability to run scripts à la cgi-bin. More precisely, when the file to be served is a Scheme file, instead of providing the characters composing the file, the server evaluates the file with the Bigloo interpreter and returns the result of this evaluation to the client. The implementation relies on the Bigloo run-process facility that runs asynchronous external processes. The output port associated with the process is sent to http-reply.
58: (define (http-get-scm f)
59:    (if (file-exists? f)
60:        (process-output-port
61:           (run-process "bigloo" "-i" "-s"
62:                        f output: pipe:))
63:        "Can't find file"))
In about sixty lines of Scheme code we have implemented a multi-threaded, multi-process Web server. Each request is handled separately by a specific thread. Since parsing the requests and writing outputs are implemented using service threads, the server can scale up to an SMP architecture. One should note that for such a typical application, cooperative scheduling has not introduced any additional complexity in the source code: there is no need to explicitly call the thread-yield! function because all cooperations are naturally implemented by the I/O operations.
In this section we present the semantics of Scheme augmented with Fair Threads.
Sequential and concurrent aspects of programming languages are so intricately related that considering sequentiality and concurrency apart could lead to major flaws in the overall language semantics and design. This is a lesson we have learned from Java, for which it has been demonstrated that the memory model is incompatible with the thread model [26,21]. So, we have decided to express the semantics of Scheme and the semantics of the threads in a single formalism.
We use denotational semantics because this is the traditional formalism for Scheme. In Section Syntax we give the language for which we present the semantics. The language is slightly different from the one presented previously, in order to ease the writing of the semantics. In Section Semantics of Core Scheme we present, as a reminder, the semantics of core Scheme. Then, in Section Semantics of user threads we present the semantics of thread operations such as thread-yield! or thread-await!. In Section The Scheduler we present the semantics of the scheduler.
For the sake of simplicity, we do not consider side effects here. Side effects introduce no difficulties, so they could easily be added to the semantics. However, getting rid of side effects suppresses one parameter of the semantics (the store). Thus, we consider the following subset of Scheme:
<continuation> -> <expression>
<expression> -> <variable>
             | <literal>
             | (begin <expression> <expression>)
             | (if <expression> <expression> <expression>)
             | (lambda (<variable>) <expression>)
             | (<expression> <expression>)
The syntax for creating and handling threads is similar to the one presented in Section API except that we do not present the semantics of thread-get-values. We omit this form because, in terms of control, it behaves like thread-yield!. However, it would make the representation of the signal environment more complex without introducing any specificity. Finally, we present a variant of scheduler reaction. In order to avoid using an abstract data-type for representing schedulers, we do not present the pair thread-start! and scheduler-start!. Instead we present the semantics of the form (schedule exp), which can be expanded into the former ones as follows:
(define-syntax schedule
   (syntax-rules ()
      ((schedule exp)
       (let ((s (make-scheduler)))
          (thread-start! (make-thread (lambda () exp)) s)
          (scheduler-start! s)))))
So, the syntax we are considering is:
<expression> -> ...
             | (make-thread <expression>)
             | (thread-terminate! <expression>)
             | (thread-yield!)
             | (thread-await! <expression>)
             | (broadcast! <expression> <expression>)
             | (schedule <expression>)
A thread can be in four states: (i) It can be blocked because it cooperates. (ii) It can be ready to run, that is, it can be elected by the scheduler. (iii) It can be ended, that is, its body has executed until completion. (iv) It can be waiting for a signal. We denote the state of threads with the following pseudo-syntax:
<thread> -> (block <continuation>)
         | (ready <continuation>)
         | (end <continuation> <expression>)
         | (wait <continuation> <expression>)
The function E evaluates an expression in a lexical environment (ρ), a signal environment (σ), and a continuation (κ). It returns a Scheme value. When a thread returns to the scheduler, it returns a ThdResult, which is a triple made of a thread, a signal environment, and a list of new threads to be started at the next instant. Remember that, given the definition of the signal environment, any Scheme value may be used to denote a signal. The domains are presented in Figure 1.
π ∈ Command
τ ∈ Thread
ν ∈ Identifier
T ∈ list of Threads
ε ∈ Value      = Fun + Integer + Cons + ...
φ ∈ Fun        = Value × Cont → Value
κ ∈ Cont       = Value × Signals → Value
ω ∈ ThdResult  = Thread × Signals × Thread*
Ω ∈ ScdlResult = Thread* × Signals × Thread*
ρ ∈ Env        = Ident → Value
σ ∈ Signals    = Value → Value + ?

E : Form → Env × Signals × Cont → Value
S : Thread → Signals × Thread* → ScdlResult
We use a semantics of Scheme that resembles the one shown in Queinnec's book [27]. It is presented in Figure 2.
E[ν]ρσκ = (κ (ρ ν) σ)

E[(if π π1 π2)]ρσκ = (E[π] ρ σ λεσ1.(boolify ε) → (E[π1] ρ σ1 κ), (E[π2] ρ σ1 κ))

E[(begin π1 π2)]ρσκ = (E[π1] ρ σ λεσ2.(E[π2] ρ σ2 κ))

E[(lambda (ν) π)]ρσκ = (κ inValue(λεσ2κ2.(E[π] ρ[ν/ε] σ2 κ2)) σ)

E[(π1 π2)]ρσκ = (E[π1] ρ σ λφσ1.(E[π2] ρ σ1 λεσ2.(φ|Function ε σ2 κ)))

call/cc = inValue(λφσκ.(φ|Function inValue(λεσ2κ2.(κ ε σ2)) σ κ))
We start presenting, in Figure 3, the user level of threads, that is, the semantics of the forms composing the API.
E[(broadcast! π1 π2)]ρσκ = (E[π1] ρ σ λε1σ1.(E[π2] ρ σ1 λε2σ2.(κ ε2 σ2[ε1/ε2])))

E[(thread-yield!)]ρσκ = 〈(block κ), σ, ∅〉

E[(thread-await! π)]ρσκ = (E[π] ρ σ λεσ1.(σ1 ε)≠? → (κ (σ1 ε) σ1), 〈(wait κ ε), σ1, ∅〉)

E[(thread-terminate! π)]ρσκ = (E[π] ρ σ λε1σ1.〈(end κ ε1), σ1, ∅〉)

E[(make-thread π)]ρσκ = 〈(ready κ), σ, list((block λεσ.(E[π] ρ σ end-kont)))〉

E[(schedule π)]ρσκ = (κ schedule(list((ready λεσ.(E[π] ρ σ end-kont)))) σ)
Here is a brief explanation for each entry:

The broadcast! form evaluates its arguments (the signal identifier to be broadcast and the value associated with the signal), then it invokes its continuation.

The thread-yield! form switches back to the scheduler. That is, it constructs a triple made of the current thread state, which is now blocked for the instant, the signal environment (σ), and the set of new threads to be added. This set is always empty except for the make-thread form, which creates new threads. The thread state of a blocked thread is made of the block mark and its current continuation.

The thread-await! form first evaluates the identifier denoting the awaited signal. Then, it inspects the signal environment (σ). If the signal is present (absence is denoted by the question mark "?"), the continuation is immediately invoked with the value of the signal. If the signal is not present yet, the thread returns to the scheduler with the wait mark denoting the state of the thread, its current continuation, and the signal it is waiting for.

The schedule form creates a scheduler and starts a thread. The continuation end-kont is used for each newly created thread. This continuation simply returns to the scheduler a thread in the end state. That is, it notifies the scheduler that the thread can never be re-elected.

From a semantics point of view, a service thread is an operator that broadcasts a new signal at an unpredictable time. It can be viewed as a user thread computing a value and broadcasting a signal after a pause that lasts a random number of instants:
(define (make-...-signal . args)
   (let ((sig (gensym)))
      (thread-start!
         (make-thread
            (lambda ()
               (let ((v (... args)))
                  (do ((i (random) (- i 1)))
                      ((<= i 0))
                     (thread-yield!))
                  (broadcast! sig v)))))
      sig))
Service threads are the only places where non-determinism is introduced in the semantics. So, a program using only user threads would be fully deterministic and predictable. That is, the system guarantees a precise scheduling order for programs using only user threads. Therefore, a program not using service threads can be debugged as easily as sequential code because it supports replayed executions.
As presented in Section User threads, during one instant a thread can be elected several times by the scheduler. Actually, each instant consists of a fix-point iteration. The scheduler elects all runnable user threads. A runnable user thread is a thread that is ready or a thread that is waiting for a signal which has been broadcast during the instant. The semantics of the scheduler is presented in Figure 4. The operator § stands for list concatenation. The functional operator it-list3 applies its first argument, a function accepting three parameters, to an accumulated list.
schedule = λT. let T1 = scheduler-react(T, empty-event, ∅) in
              finish?(T1) → map(λτ.thread-end-value(τ), T1), schedule(T1)

finish? = λT. ∀τ∈T, thread-end?(τ)

empty-event = λν. ?
The fix-point iteration is ensured by the function micro-schedule. This function applies the function S to each thread. S allocates the processor to threads that are runnable. That is, it invokes the continuation of the thread which is captured in the thread state. S also accumulates the threads to be started at the beginning of the next instant. The fix-point is reached when the function micro-finish? returns the boolean true. Switching from one instant to the next instant is ensured by the function next-ready. It turns blocked threads into ready threads. The fix-point iteration is presented in Figure 5. The function S is presented in Figure 6.
scheduler-react = λTσT1. instant-finish?(T, σ) → next-ready(T) § T1,
                  let Ω = it-list3(S, T, σ, T1), T2 = Ω↓1, σ1 = Ω↓2, T3 = Ω↓3 in
                     scheduler-react(T2, σ1, T3)

instant-finish? = λTσ. ∀τ∈T, thread-ready?(τ) → false,
                       thread-wait?(τ) → (σ thread-wait-signal(τ)) = ?,
                       true

next-ready = λT. map(λτ.thread-block?(τ) → (ready thread-cont(τ)), τ, T)
S[(block κ)]τσT = 〈τ, σ, T〉

S[(end κ ε)]τσT = 〈τ, σ, T〉

S[(ready κ)]τσT = let ω = (κ unspecified σ) in 〈ω↓1, ω↓2, T§ω↓3〉

S[(wait κ ε)]τσT = (σ ε)≠? → let ω = (κ (σ ε) σ), τ = ω↓1, σ1 = ω↓2, T1 = ω↓3 in
                      〈τ, σ1, T§T1〉,
                   〈τ, σ, T〉
The Bigloo system supports two execution platforms [31]: (i) native executions on Posix-1 compliant operating systems, (ii) Java executions on a JVM. On both systems, user and service threads are mapped onto native threads. Posix threads and Java threads are so close that the implementations of Fair Threads on both systems are very similar.
Conceptually, an execution token is passed between user threads and schedulers. In order to determine which threads can receive the token, the scheduler examines all user threads. It eliminates the ones that are explicitly blocked and the ones that are waiting for a non-present signal. When a thread yields explicitly or implicitly, it gives the token back to the scheduler. Service threads are implemented by means of native threads: each time a service thread is spawned, the fair scheduler starts a new native thread that runs in parallel until completion and then broadcasts a signal.
The current implementation in the Bigloo system improves on this naïve implementation in several directions. First, it avoids busy waiting. Second, it minimizes the creation of native threads. Third, it reduces the number of context switches.
It is extremely important to implement signal broadcast efficiently because this operation is ubiquitous in the Fair Threads programming style: the model does not propose point-to-point communication, and broadcasting is the only communication means between user and service threads. In the actual implementation, a thread does not consume processor resources when waiting for a signal. The scheduler avoids active waiting by placing threads into wait queues associated with signals.
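These wait queues can be pictured as a table mapping each signal to the threads blocked on it. The following sketch is only illustrative, not the actual Bigloo internals: wait-queues and resume-thread! are hypothetical names.

;; Illustrative sketch: each signal maps to the list of threads
;; blocked on it. Broadcasting wakes all of them at once, so no
;; thread ever polls for the presence of a signal.
(define wait-queues (make-hashtable))

(define (block-on-signal! sig thread)
   (hashtable-update! wait-queues sig
      (lambda (q) (cons thread q))
      (list thread)))

(define (broadcast-to-waiters! sig val)
   ;; resume-thread! is a hypothetical low-level primitive
   (for-each (lambda (t) (resume-thread! t val))
             (or (hashtable-get wait-queues sig) '()))
   (hashtable-remove! wait-queues sig))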
Service threads are intensively used and, in general, they have a very short lifetime. So, the overhead imposed by spawning a service thread must be kept as low as possible. A naïve implementation that would start a new native thread each time a service thread is created would be inefficient. Using a pool of threads would improve the global performance because it would get rid of the thread creation time. However, it would probably not perform sufficiently well because it would not reduce the context switches between user and service threads. The current implementation deploys an ad-hoc optimization that eliminates the thread creations and most of the thread context switches. It relies on the observation that the general pattern for spawning a service thread is:
(thread-await! (make-...-signal ...))
That is, the current user thread blocks until the service thread completes and broadcasts its signal. One may take advantage of this situation: it is possible not to allocate a native thread for implementing the service thread but instead to reuse the user thread. This optimization takes place inside the thread-await! function. When its argument is created by one of the make-...-signal library functions, it detaches the user thread from the scheduler to make it execute the required service. In other words, the user thread temporarily becomes a service thread. On completion, this thread broadcasts a signal and gets re-attached to the scheduler.
Some service threads may have been created but not awaited (for instance, when writing some data to a port, it is frequent not to check or not to wait for the result). So, at the end of the instant, when all user threads are blocked, the scheduler starts native threads for these pending service threads.
Fair Threads context switches are expensive. A token is shared by all user threads. Allocating the CPU to a new user thread involves a lock or mutex acquisition and a notification. The fewer the context switches, the more efficient the system.
One way to minimize context switches is to avoid, as much as possible, to switch back and forth to the scheduler. In particular, when a user thread U1 cooperates, it is useless to switch back to the scheduler which, in turn, switches forth to the next runnable thread U2. U1 can directly give the token to U2. The scheduling algorithm is thus actually split over all user threads. Each of these executes a small part of the scheduling task. In this framework, the thread running the scheduler gets the CPU only at the end of the instant. This is imposed by the optimization presented in Section Minimizing thread creations. It might happen that all user threads are currently used to execute service threads. In such a situation there must exist another thread, waiting for the first broadcast signal. This is the role of the scheduler.
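In pseudo-Scheme, cooperation then amounts to a direct hand-off rather than a round-trip through the scheduler. The sketch below is hypothetical: next-runnable-thread, wake!, sleep!, and wake-scheduler! stand for the low-level primitives of the actual implementation.

;; Hypothetical sketch of direct token hand-off: the cooperating
;; thread elects the next runnable thread itself and wakes it up,
;; then goes to sleep. The scheduler thread is only resumed when no
;; user thread is runnable, that is, at the end of the instant.
(define (cooperate! current)
   (let ((next (next-runnable-thread current)))
      (if next
          (begin
             (wake! next)
             (sleep! current))
          (wake-scheduler!))))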
Threads have been used for programming concurrency since the mid eighties and met large popularity in the mid nineties with the Java era. Following the path started with Ada and Modula [37,7], Java has been one of the first popular general-purpose programming languages to include concurrency primitives, by means of threads, in the core language [12]. Other proposals exist, mainly based on message passing, such as Erlang [2], but we do not consider them here as they define a totally different style of programming. While many people still think of threads as the "classical" way to program concurrency in the context of shared memory, criticizing threads has nowadays become quite frequent, if not popular. Criticisms focus on two different aspects: performance and programming.
In the rest of this section, we compare the Fair Threads with other programming models. The area of programming concurrency is so vast that it cannot be fully embraced by this section. Here, we focus on a comparison with other multi-threading support in functional languages, with purely cooperative scheduling systems, and with synchronous and reactive programming.
It is well-known that threads can be expressed in functional languages
equipped with continuations [30,36,33]. In the context of Scheme, continuations
are captured by the call/cc
primitive. However, the
definition of threads using continuations is rather unsatisfactory as
call/cc
is known to be difficult to implement
efficiently [22]: the entire execution context
has to be captured each time a continuation is created. Moreover,
call/cc
does not mix well with exceptions [4,28].
To our knowledge, concurrency models in the Scheme programming language have mostly been studied through the looking glass of continuations and engines, a means for providing preemptive schedulers [9,18]. While some modern Scheme implementations support concurrency with variants of Posix-1 threads [11], none has explored alternative constructions. Most works have focused on implementation issues [19,34].
Some functional languages such as OCaml support a hybrid multi-threading model made of a Posix-like semantics and a mostly-cooperative implementation. That is, these systems present a preemptive scheduler but preemptions can only take place at specific moments of the execution (for instance, after a fixed number of allocations or using an interval timer). These systems have the advantage of relaxing the need to introduce cooperation points in the programs but, still, they only support one thread running at a time. So, we think that this approach has some drawbacks.
CML [29] is another ML-based language which introduces threads as first-class objects. CML uses preemptive scheduling. In the CML implementation, context switches are provoked by an interval timer provided by the operating system. CML puts the focus on synchronous communication mechanisms based on message passing. Thus, it basically differs from Fair Threads by the use of preemptive scheduling and of message passing instead of broadcast signals. However, Fair Threads and CML both have a formal semantics and mix concurrent and functional programming in the same language.
Concurrent Haskell [25] supports preemptive multi-threading. Preemptions can only take place during allocations. It proposes a synchronization facility called MVars that enables several threads to synchronize on a single resource. This construction could be seen as an abstraction close to Posix-1 condition variables. Concurrent Haskell separates the semantics of the sequential core language from the concurrent layer, which is defined by an operational semantics based on the π-calculus.
Pth (GNU Portable Threads) [10] is a Posix-based library which
provides non-preemptive priority-based scheduling for multi-threading inside
event-driven applications. The thread scheduling is done in a cooperative way
which means that the threads are managed by a priority- and
event-based non-preemptive scheduler. The intention is to get better
portability and run-time performance than with preemptive
scheduling. The event facility, basically implemented with select
,
allows threads to wait until various types of events occur, including
I/Os, asynchronous signals, timers, thread and process termination,
and customized callback functions.
State Threads (ST) is a library for writing Internet applications such
as Web servers. It offers a multi-threading API for structuring a network
application as an event-driven state machine. The State Threads
library is a derivative of the Netscape Portable Runtime library
(NSPR). ST uses a cooperative event-driven scheduling within a single
process address space (many-to-one model). ST is fully deterministic
and always guarantees that the context switch can only happen in a
well-known set of functions (at I/O points or at explicit cooperation
points).
Actually, there are two major differences between Pth and ST on one side,
and Fair Threads on the other side: first, as opposed to Fair Threads,
Pth and ST do not require any kernel support (like Posix-threads), but
cannot benefit from multiprocessor machines. Second, Pth and ST do not
put the focus on syntax and semantics, as it is done in Fair Threads which
introduces new primitives for multi-threading programming.
Fair Threads are largely based on synchronous reactive programming. They inherit the semantics of signal broadcast and instants. Fair Threads stands on the side of reactive programming: contrary to synchronous programming à la Esterel [3], it does not enable reaction to the absence of signals. This limitation has an important consequence: by construction, it avoids causality cycles (i.e., a loop checking the presence of a signal and, if it is not present, broadcasting it). Thus, reactive programming does not need a static compilation stage detecting causality cycles. So, it enables dynamic code loading.
Fair Threads is an attempt to adapt techniques developed for reactive programming [13,14] to multi-threading programming. Fair Threads can also be viewed as a simplification of reactive programming.
The Fair Threads model presented in this paper is the latest achievement of a research effort started in 2001 by F. Boussinot [6]. The former versions and the current one differ in several aspects; in particular, the current version introduces forms for collecting the values of a signal across an instant (thread-get-values and thread-get-values*) and for waiting on several signals (thread-await*).

Fair Threads is a step in the direction of combining some advantages of cooperative scheduling (simple semantics, ease of programming) with some advantages of preemptive scheduling (exploiting hardware parallelism). It has some success (for instance, it enables realistic Web server implementations) but it also has weaknesses. We present two of them, and the envisioned solutions, in this section.
The main drawback of Fair Threads is the difficulty, faced by programmers, of placing cooperation points in the source code. Sometimes this task is straightforward (see the example of Section A Web Server with Fair Threads). Sometimes it is complex; CPU-intensive computations fall into this category. If the program contains too few cooperation points, there is a risk that one thread monopolizes the processor. If the program contains too many cooperation points, there is a risk of impeding performance. It is even likely that there is no static optimal solution: the decisions to switch from one thread to another may have to be taken according to dynamic criteria, only known at runtime. Fair Threads does not improve on cooperative systems in this matter.
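For instance, a CPU-intensive loop has to be instrumented by hand, as in the following sketch where the cooperation stride of 10000 iterations is an arbitrary guess:

;; Sum a vector while cooperating every 10000 iterations: a smaller
;; stride impedes performance, a larger one lets the thread
;; monopolize the processor. The right value is only known at runtime.
(define (sum-vector vec)
   (let loop ((i 0) (acc 0))
      (if (>= i (vector-length vec))
          acc
          (begin
             (when (= (modulo i 10000) 0)
                (thread-yield!))
             (loop (+ i 1) (+ acc (vector-ref vec i)))))))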
To minimize the need for explicit cooperation points, we are considering the design of Fair Processes. We think that most of the difficulties of multi-threading programming are basically due to the simultaneous use of concurrency, shared memory, and preemption. Fair Threads separates them: user threads propose concurrency plus shared memory; service threads propose concurrency plus preemption. Unfortunately, service threads cannot be defined in user programs. This is why we are considering designing Fair Processes that could satisfy this demand. Fair Processes would look like Unix processes (à la fork) or Java isolates. Coupled with new constructions for handling chunks of shared memory (such as mmap), they could be used to implement independent CPU-intensive computations. The current semantics of Fair Threads probably has sufficient room for integrating Fair Processes in a way similar to service threads.
Native threads consume resources (memory, descriptors, ...). So, in general, operating systems limit the number of threads that can run simultaneously (e.g., 1024 for a standard Linux kernel configuration). However, some applications need more threads than the authorized limit. As stated in Section Synchronous and Reactive programming, Fair Threads programming is largely based on reactive programming. Reactive programming is particularly suitable when a lot (several thousands) of small synchronized behaviors have to be implemented. That is, reactive programming can be used when the upper bound on the number of threads is reached. In addition, reactive programs can be executed more efficiently than threads because context switches between two reactive programs are cheap. Thus, we are considering merging the two models by extending the language so that one would be able to mix Fair Threads and reactive programs.
In this paper we have presented the Fair Threads model. It combines cooperative user threads and preemptive service threads. This approach confines non-determinism inside the latter. So, it minimizes the problems frequently encountered while programming concurrent applications. Still, it supports parallel executions for improving the global performance of applications.
[1] Adya, A. and Howell, J. and Theimer, M. and Bolosky, J. and Douceur, J. -- Cooperative Task Management without Manual Stack Management or Event-driven Programming is Not the Opposite of Threaded Programming -- Proceedings of the Usenix Annual Technical Conference, Monterey, CA, USA, Jun, 2002, pp. 289--302.
[2] Armstrong, J. and Virding, R. and Wikström, C. and Williams, M. -- Concurrent Programming in ERLANG -- Prentice Hall, 1996.
[3] Berry, G. and Gonthier, G. -- The Esterel Synchronous Programming Language: Design, Semantics, Implementation -- Science of Computer Programming, 19(2), 1992, pp. 87--152.
[4] Biagioni, E. and Cline, K. and Lee, P. and Okasaki, C. and Stone, C. -- Safe-for-Space Threads in Standard ML -- Proc. 2nd ACM SIGPLAN Workshop on Continuations, 1996.
[5] Birrell, A. -- An Introduction to Programming with Threads -- Report 35, DEC Systems Research Center, Palo-Alto, CA, USA, Jan, 1989.
[6] Boussinot, F. -- Java Fair Threads -- Research Report RR-4139, Inria, 2001.
[7] Cardelli, L. et al. -- Modula-3 Report (revised) -- Report 52, DEC Systems Research Center, Palo-Alto, CA, USA, Nov, 1989.
[8] Chankhunthod, A. and Danzig, P. and Neerdaels, C. and Schwartz, M. and Worrell, K. -- A Hierarchical Internet Object Cache -- Proceedings of the Usenix Annual Technical Conference, 1996, pp. 153--164.
[9] Dybvig, K. and Hieb, R. -- Engines from Continuations -- Computer Languages, 2(14), 1989, pp. 109--123.
[10] Engelschall, R. -- Portable Multithreading - The Signal Stack Trick for User-Space Thread Creation -- Proceedings of the Usenix Annual Technical Conference, San Diego, CA, USA, Jun, 2000.
[11] Feeley, M. -- SRFI 18: Multithreading support -- 2000.
[12] Gosling, J. and Joy, B. and Steele, G. -- The Java Language Specification -- Addison-Wesley, 1996.
[13] Hazard, L. and Susini, J-F. and Boussinot, F. -- The Junior Reactive Kernel -- Research Report RR-3732, Inria, Jul, 1999.
[14] Hazard, L. and Susini, J-F. and Boussinot, F. -- Programming with Junior -- Research Report RR-4027, Inria, 2000.
[15] Hollub, A. -- Taming Java Threads -- Apress, 2000.
[16] Joubert, P. and King, R. and Neves, R. and Russinovich, M. and Tracey, J. -- High-Performance Memory-Based Web Servers: Kernel and User-Space Performance -- Usenix, 2001, pp. 175--188.
[17] Kelsey, R. and Clinger, W. and Rees, J. -- The Revised(5) Report on the Algorithmic Language Scheme -- Higher-Order and Symbolic Computation, 11(1), Sep, 1998.
[18] Kelsey, R. and Rees, J. -- A Tractable Scheme Implementation -- Lisp and Symbolic Computation, 7(4), 1994, pp. 315--335.
[19] Kranz, D. and Kelsey, R. and Rees, J. and Hudak, P. and Philbin, J. and Adams, N. -- ORBIT: An Optimizing Compiler for Scheme -- Symposium on Compiler Construction, Palo Alto, California, Jun, 1986, pp. 219--233.
[20] Larus, J. and Parkes, M. -- Using Cohort Scheduling to Enhance Server Performance -- Proceedings of the Usenix Annual Technical Conference, Monterey, CA, USA, Jun, 2002, pp. 103--114.
[21] Lea, D. and Pugh, B. -- Correct and Efficient Synchronization of Java Technology-based Threads -- JavaOne, 2000.
[22] Mateu, L. -- Efficient Implementation of Coroutines -- International Workshop on Memory Management, Lecture Notes in Computer Science (637), Saint-Malo, France, Sep, 1992, pp. 230--247.
[23] Nichols, B. and Buttlar, D. and Proulx Farrell, J. -- Pthreads Programming -- O'Reilly & Associates, USA, 1996.
[24] Ousterhout, J. -- Why Threads Are a Bad Idea (for most purposes) -- Invited talk at the USENIX Winter Technical Conference, Jan, 1996.
[25] Peyton Jones, S. and Gordon, A. and Finne, S. -- Concurrent Haskell -- Symposium on Principles of Programming Languages, Jan, 1996.
[26] Pugh, B. -- Fixing the Java Memory Model -- Java Grande, San Francisco, CA, USA, Jun, 1999.
[27] Queinnec, C. -- Lisp In Small Pieces -- Cambridge University Press, 1996.
[28] Reppy, J. -- Asynchronous Signals in Standard ML -- TR 90-1144, Cornell University, Aug, 1990.
[29] Reppy, J. -- Concurrent Programming in ML -- Cambridge University Press, 1999.
[30] Reynolds, J. -- GEDANKEN - A Simple Typeless Language Based on the Principle of Completeness and the Reference Concept -- Communications of the ACM, 13(5), May, 1970, pp. 308--319.
[31] Serpette, B. and Serrano, M. -- Compiling Scheme to JVM Bytecode: a Performance Study -- 7th Int'l Conf. on Functional Programming, Pittsburgh, Pennsylvania, USA, Oct, 2002.
[32] Serrano, M. -- Bigloo User's Manual -- Technical Report 0169, INRIA-Rocquencourt, France, Dec, 1994.
[33] Shivers, O. -- Continuations and Threads: Expressing Machine Concurrency Directly in Advanced Languages -- ACM SIGPLAN Workshop on Continuations, Paris, France, Jan, 1997.
[34] Shivers, O. -- Atomic Heap Transactions and Fine-Grain Interrupts -- Int'l Conf. on Functional Programming, Paris, France, Sep, 1999.
[35] Pai, V. and Druschel, P. and Zwaenepoel, W. -- Flash: An Efficient and Portable Web Server -- Proceedings of the Usenix Annual Technical Conference, Monterey, CA, USA, Jun, 1999.
[36] Wand, M. -- Continuation-Based Multiprocessing Revisited -- Higher Order and Symbolic Computation, 12(3), 1999, pp. 285--299.
[37] Wirth, N. -- Programming in Modula-2 -- Texts and Monographs in Computer Science, 1993.
(define (ev-var i)
   (lambda (env events pk)
      (pk (env i) events)))

(define (ev-if pi pi1 pi2)
   (lambda (env events pk)
      ((ev pi) env events
         (lambda (v events1)
            (if (boolify v)
                ((ev pi1) env events1 pk)
                ((ev pi2) env events1 pk))))))

(define (ev-begin pi1 pi2)
   (lambda (env events pk)
      ((ev pi1) env events
         (lambda (v events2)
            ((ev pi2) env events2 pk)))))

(define (ev-lambda1 i pi)
   (lambda (env events pk)
      (pk (make-closure
             (lambda (v events2 pk2)
                ((ev pi) (extend i v env) events2 pk2)))
          events)))

(define (ev-funcall1 pi1 pi2)
   (lambda (env events pk)
      ((ev pi1) env events
         (lambda (f events1)
            ((ev pi2) env events1
               (lambda (v events2)
                  ((closure->function f) v events2 pk)))))))

(define-library call/cc
   (make-closure
      (lambda (f events pk)
         ((closure->function f)
          (make-closure
             (lambda (v events2 pk2)
                (pk v events2)))
          events
          pk))))
(define (ev-emit pi1 pi2)
   (lambda (env events pk)
      ((ev pi1) env events
         (lambda (v1 events1)
            ((ev pi2) env events1
               (lambda (v2 events2)
                  (pk v2 (extend v1 v2 events2))))))))

(define (ev-yield)
   (lambda (env events pk)
      (make-result (make-thread 'block pk) events '())))

(define (ev-wait pi)
   (lambda (env events pk)
      ((ev pi) env events
         (lambda (v events1)
            (if (event? (events1 v))
                (pk (events1 v) events1)
                (make-result (make-thread 'wait pk v) events1 '()))))))

(define (ev-terminate pi)
   (lambda (env events pk)
      ((ev pi) env events
         (lambda (v1 events1)
            (make-result (make-thread 'end pk v1) events1 '())))))

(define (ev-thread pi)
   (lambda (env events pk)
      (make-result (make-thread 'ready pk)
                   events
                   (list (make-thread 'block
                            (lambda (v events)
                               ((ev pi) env events end-kont)))))))

(define (ev-schedule pi)
   (lambda (env events pk)
      (pk (schedule
             (list (make-thread 'ready
                      (lambda (v events)
                         ((ev pi) env events end-kont)))))
          events)))
(define (schedule T)
   (let ((T1 (micro-schedule T empty-event '())))
      (if (finish? T1)
          (map (lambda (t) (thread-end-value t)) T1)
          (schedule T1))))

(define (finish? T)
   (every? (lambda (t) (thread-end? t)) T))

(define (empty-event i)
   bottom)
(define (micro-schedule T events T1)
   (if (micro-finish? T events)
       (append (next-ready T) T1)
       (let* ((Res (it-list3 sc T events T1))
              (T2 (result0 Res))
              (events1 (result1 Res))
              (T3 (result2 Res)))
          (micro-schedule T2 events1 T3))))

(define (micro-finish? T events)
   (every? (lambda (t)
              (cond
                 ((thread-ready? t) #f)
                 ((thread-wait? t)
                  (not-event? (events (thread-wait-signal t))))
                 (else #t)))
           T))

(define (next-ready T)
   (map (lambda (t)
           (if (thread-block? t)
               (make-thread 'ready (thread-cont t))
               t))
        T))
(define (sc-block pk)
   (lambda (t events T)
      (make-result t events T)))

(define (sc-end pk v)
   (lambda (t events T)
      (make-result t events T)))

(define (sc-ready pk)
   (lambda (t events T)
      (let ((res (pk #unspecified events)))
         (make-result (result0 res)
                      (result1 res)
                      (append T (result2 res))))))

(define (sc-wait pk v)
   (lambda (t events T)
      (if (event? (events v))
          (let* ((res (pk (events v) events))
                 (t (result0 res))
                 (events1 (result1 res))
                 (T1 (result2 res)))
             (make-result t events1 (append T T1)))
          (make-result t events T))))