This paper presents Fair Threads, a new model for concurrent programming. This multi-threading model combines preemptive and cooperative scheduling: user threads execute according to a cooperative strategy while service threads execute according to a preemptive strategy. User threads may request services from service threads in order to improve performance by exploiting hardware parallelism and in order to execute non-blocking operations. Fair Threads have been experimented with in the context of the functional programming language Scheme. This paper also presents their integration in this language, that is, a semantics for Scheme augmented with Fair Threads and the main characteristics of the implementation.
When shared memory is involved, the two most common ways to implement concurrency are event-driven architectures and multi-threaded architectures. From a programming point of view, we think that neither is fully satisfactory. Event-driven architectures enable relatively simple and efficient implementations but hardly support extensibility and modularity [24]. Threads à la Posix [23] or à la Java [12] support extensibility but they are a nightmare to program and to maintain [5,15].
In this introduction, we start by detailing the weaknesses of the two models from a programming language perspective. Then, we informally present our new model, the Fair Threads.
In the event-driven programming style, there is one execution
stream which is, generally, made of a loop waiting for events. When such
events are raised, the loop invokes registered handlers. The handlers
may or may not spawn new threads or new processes.
Event-driven programming does not permit parallelism by itself
(that is, true CPU concurrency). It does not support preemption, so if
an event handler falls into an infinite loop the whole program gets
stuck. Long-lasting handler executions are a problem too because they
make applications non-responsive. In order to ensure responsiveness,
one has to break up event-handlers. This makes event-loops difficult to
program because most programming languages do not provide convenient means
for saving and restoring states of computations [1].
Scheme, with its call-with-current-continuation function [17], is one exception.
In a famous provocative presentation [24], John Ousterhout, the author of Tcl/Tk, compares event-driven programming and multi-threaded programming. He strongly argues in favor of the former, mostly because preemptive threads are too difficult to program. We agree on this last point but we also think that the lack of modularity of event loops is a major concern.
The term "thread" is a rather loose-fitting description. It generally denotes concurrency with shared memory. Threads are also known as lightweight processes. They execute according to a scheduling strategy which can be either preemptive or cooperative.
A preemptive scheduler may suspend a running thread at any time to allocate the processor to another thread. It may then resume the suspended thread at any moment. Preemptive threads are generally considered to be well adapted for systems made of heavy-computing tasks needing few communications and synchronizations. This can be the case of Web servers, in which a thread (usually picked out from a pool of available threads) is associated with each request. In such contexts, the advantages of preemptive threads are clear.
Several difficulties, however, are related to programming with preemptive threads.
A cooperative scheduler does not suspend threads. It waits for threads to cooperate, that is, to yield the processor before allocating it to other threads. For systems made of tasks needing strong synchronizations or a lot of communications, cooperative threads (sometimes called green threads) may seem better suited than preemptive threads. Indeed, data protection is no longer needed with cooperative threads, and one can avoid the use of locks, leading to efficient implementations. Cooperative threads give the user better control over the way threads are scheduled. Cooperative systems do not rely on specificities of the scheduling strategy and are thus easier to program and to port. They are also easier to debug.
However, cooperative threads have several well-known drawbacks.
From the previous discussion, it seems rather clear that multi-threading is difficult to program. Amongst the important questions with non-trivial answers are: when and where are threads to be preferred to standard sequential code, how to get portable code, and how to reuse sequential code in a multi-threaded context?
Both preemptive and cooperative strategies have their pros and cons, and neither one is valid in all cases. Cooperative threads are sometimes perceived as an "old" solution, mainly because old operating systems (such as Windows 3.1 or MacOS 9) that relied on this technique have now been replaced by preemptive ones. It seems clear, however, that preemptive strategies are not universally valid, and that cooperative frameworks are better suited for certain contexts, for example when portability is a major concern. Cooperative strategies can also be useful in contexts where a precise semantics is mandatory.
The Fair Threads model proposes a thread-based concurrent programming framework which mainly differs from other frameworks by its scheduling strategy.
In Section Fair Threads we present the Fair Threads programming model. Then, in Section Scheme Fair Threads, we focus on the integration of these threads in the Scheme programming language. We present the Scheme API (Section API). We show how to program a Web server with support for concurrent request handling with threads (Section A Web Server with Fair Threads). We present the semantics of Scheme augmented with threads (Section Semantics of Scheme with Fair Threads). In Section Implementation, we present the main characteristics of the current implementation. In Section Related Work we present related work. Finally, in Section Weaknesses of Fair Threads and Future work we show some future directions for our work.
Fair Threads is a new framework that combines user threads and service threads. Intuitively, a user thread is a cooperative thread which can synchronize with other threads (either user threads or service threads). It can communicate with other threads by means of user-defined signals. It may start new user threads and new service threads. By contrast, a service thread is a preemptive thread which can only communicate with user threads, by means of signals.
The scheduler defines instants during which all user threads run up to their next cooperation point. User threads must behave fairly and cooperate with the other threads by returning control to the scheduler. A user thread cooperates by calling the yield function or by waiting for signals. Signals are non-persistent user-defined data which are reset at the beginning of each instant. The scheduler broadcasts signals to all user threads, which thus see the presence and the absence of signals in exactly the same way. Moreover, values associated with signals are also broadcast.
Signals are used when one wants one or more threads to wait for a condition, without polling a variable to determine when the condition is fulfilled. Broadcast is a means to get modularity, as the thread that generates a signal need know nothing about its potential receivers. Fairness in signal processing means that all threads waiting for a signal always receive it during the same instant: the instant in which the signal is generated. Thus, a user thread yielding control at a cooperation point does not risk losing a signal generated later during the same instant.
User thread fairness is ensured by the fair scheduling strategy that relies on instants. An instant consists of an iteration of user thread executions: all user threads are executed until they are blocked. A user thread can block either because it explicitly cooperates (by means of a yield operation) or because it is waiting for a signal that is not present, that is, a signal which has not been broadcast during the instant. Note that during one instant, a user thread may be elected several times by the scheduler because it may be waiting successively for different broadcast signals. Consider the following three user threads:
A B C
---------------- ----------------- -----------------
1: await sig1 1: broadcast sig1 1: await sig1
2: await sig2 2: yield 2: broadcast sig2
3: yield 3: broadcast sig3 3: await sig3
4: await sig1
Let us detail the scheduling of these threads. For each instant, here is exactly how and when each thread executes:

Instant 1: Thread A blocks on signal sig1 line 1. Thread B broadcasts sig1 line 1, then it cooperates line 2. That is, B has completed its execution for the instant; it is explicitly blocked. Thread C does not block on signal sig1 because it has been broadcast (by B) during the instant. It broadcasts sig2 and it blocks, waiting for signal sig3. At this point all threads have executed but there exists a thread (A) that is blocked on a signal which is present. The scheduler re-elects A in the instant. This thread does not block on signal sig2 because it is present (broadcast by C). Then, it explicitly cooperates line 3. At this very point all threads are blocked and no new signal has been broadcast. This marks the end of instant 1. Signals are reset.

Instant 2: Thread A blocks on signal sig1 line 4, which has not been broadcast during the instant. (Remember that at the beginning of an instant, all signals are reset.) Thread B broadcasts signal sig3 then terminates its execution. Thread C awakes for sig3 line 3 and it terminates. At this point all threads are blocked. Instant 2 ends.

Instant 3: Thread A remains blocked, waiting for signal sig1.

Instants are at the heart of fair scheduling. First, they ensure that each user thread sees the same signals. Second, instant boundaries enable determinism for user thread manipulations. For instance, user threads are started, terminated, suspended, and resumed in between two instants. This framework enables an easy implementation for the problem, known to be difficult, of two threads killing each other:
D E
----------------- -----------------
1: terminate E 1: terminate D
Because user threads are actually terminated at the end of instants, in the previous example the two user threads will be terminated by the scheduler at the end of instant 1.
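Anticipating on the Scheme API presented later in this paper, this example can be written directly; a minimal sketch, assuming both threads run in the pre-existing global scheduler:

;; Sketch of the D/E example: both terminations take effect at the
;; end of instant 1, so the outcome is deterministic whichever
;; thread is elected first.
(let ((d #f)
      (e #f))
   ;; create both threads before starting them, so that each one
   ;; can refer to the other
   (set! d (make-thread (lambda () (thread-terminate! e))))
   (set! e (make-thread (lambda () (thread-terminate! d))))
   (thread-start! d)
   (thread-start! e))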
Fair Threads support concurrent operations such as I/Os, character parsing, process creation, etc. These operations are implemented by means of service threads that are not under control of the Fair Scheduler, but only of the operating system. They execute concurrently (in parallel if the actual hardware supports it) without blocking running user threads. Service threads are the only means for an application using Fair Threads to benefit from hardware parallelism.
Service threads and user threads only interact using signals. From the programmer's point of view, running a service thread is equivalent to waiting for a signal: the user thread requiring a concurrent service from a service thread waits for its completion as it would wait for any signal. When a service thread is in charge of a service, it computes, in parallel with the other user and service threads, whatever is needed to satisfy the request. Then, it broadcasts the result of its computation to the user threads.
The purpose of the dichotomy of user and service threads is to support simple semantics and simple implementation while offering a general framework for supporting non-blocking operations and taking advantage of parallelism. In the semantics, this dichotomy confines the non-determinism to the creation of service threads. In the implementation, it isolates, in the source code of service threads, the few places where locks and mutual exclusions are needed.
In this section, we explain how Fair Threads have been integrated in the Scheme programming language [17]. They have been implemented in the Bigloo system [32]. In Section API we show what Scheme programs using Fair Threads look like. In Section An example of producers/consumers we present the classical example of producers/consumers with Fair Threads. In Section A Web Server with Fair Threads, we present a more realistic example: a Web server. Finally, in Section Semantics of Scheme with Fair Threads we present the semantics of Scheme augmented with Fair Threads.
In this section we present a subset of the Fair Threads API. The whole documentation can be found in the Bigloo user manual.
The "Scheme Request For Implementation" (SRFI in short) effort has provided the community with a document about threads: the SRFI-18 by M. Feeley [11]. This documentation describes a thread API inspired by the Posix-1 API [23] and the Java API [12]. Scheme Fair Threads supports the whole SRFI-18 API. Here are the major constructions:
Threads are created with the make-thread function, which accepts a thunk (i.e., "a procedure without argument"). As in Java, a thread is not started as soon as it is created. It is started by calling the thread-start! function. In our API, this function accepts an optional parameter which is the scheduler below which the thread must run. This argument is optional because a global scheduler pre-exists to all threads. As an example, here is the creation of two threads:

(let ((th1 (make-thread (lambda () (print 'foo))))
      (th2 (make-thread (lambda () (print 'bar)))))
   (thread-start! th1)
   (thread-start! th2))
A thread cooperates by calling the thread-yield! function. One thread can terminate another thread with the thread-terminate! function.
All threads execute inside a scheduler. Schedulers are first-class values. A program may construct one or several schedulers. All schedulers share the same scheduling policy (see Section User threads) but using different schedulers may be useful to control groups of threads. Schedulers may be recursively embedded. Typically, an application creates a scheduler, creates and starts threads in that scheduler, and then starts the scheduler itself.
Here is an example of a simple application that executes three instants:
(let* ((s (make-scheduler))
       (th1 (thread-start!
               (make-thread
                  (lambda ()
                     (let loop ()
                        (print 'thread1)
                        (thread-yield!)
                        (loop))))
               s))
       (th2 (thread-start!
               (make-thread (lambda () (print 'thread2)))
               s)))
   (scheduler-start! s 3))
Basically a user thread waits for a signal by means of the thread-await! function. It may also wait for one of several signals with thread-await*!. These functions block the thread until a waited signal is broadcast. They return the value associated with the signal. A thread broadcasts a signal with broadcast!, which immediately sends the signal to all the threads running in the scheduler.
(let ((th1 (make-thread (lambda () (broadcast! 'today (date)))))
      (th2 (make-thread (lambda () (display (thread-await! 'today))))))
   (thread-start! th1)
   (thread-start! th2))
When a thread blocks on a signal, it gets unblocked as soon as the first occurrence of the signal is broadcast. However, some applications require a thread to react to several broadcasts of the same signal. This is possible with callbacks introduced by the thread-get-values form. Like thread-yield!, it explicitly yields the processor but it also returns, at the next instant, all the values associated with the broadcast signal. That is, if the signal has not been broadcast thread-get-values returns an empty list, otherwise it returns the list of all values. Here is an example of a thread registering a callback on the signal click:
(make-thread (lambda () (for-each print (thread-get-values 'click))))
At the next instant, the print function will be invoked as many times as the signal click has been broadcast during the previous instant.
Service threads are created by user threads via library functions. Each of these functions creates a signal that can be awaited by any user thread. Then it runs a service thread that executes the required service and, on completion, broadcasts the signal. Here are the service threads currently supported:
(make-service-signal (lambda (s) ....)): runs the given function in a service thread, which may broadcast the signal s at any time. This service thread constructor enables users to run, in parallel when the hardware supports it, preemptive concurrent threads.

(make-output-signal p s): broadcasts a signal once the string s has been written to the output port p.

(make-input-signal p n): broadcasts a signal once n characters have been read on the port p. The associated value is the string of read characters.

(make-send-chars-signal ip op): broadcasts a signal once the characters of the input port ip have been sent to the output port op. Unsurprisingly, this service thread is extremely useful for implementing file servers (see Section A Web Server with Fair Threads). Actually, it is a mapping of the Linux sendfile [16] facility into the Fair Threads API.

(make-rgc-signal p g): broadcasts a signal once the regular grammar g has matched on the input port p. This relies on the Bigloo regular-grammar facility [32].

(make-accept-signal s): broadcasts a signal once a connection is established on the server socket s. The associated value is a client socket.

(make-process-signal p): broadcasts a signal when the process p completes. The value associated with the signal is the return code of the process.

(make-timer-signal date): broadcasts a signal at date.
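As an illustration of how these constructors combine with thread-await!, here is a minimal sketch that reads ten characters concurrently (the file name is purely illustrative):

;; The blocking read is performed by a service thread; other user
;; threads keep running until the signal carrying the string of
;; read characters is broadcast.
(let ((p (open-input-file "data.txt")))
   (let ((chars (thread-await! (make-input-signal p 10))))
      (print "read: " chars)
      (close-input-port p)))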
Section A Web Server with Fair Threads contains various examples of concurrent operations coded with the previous constructions. It could be envisioned to add new service threads to this list. For instance, good candidates could be service threads supporting graphical user interfaces and foreign function calls.
We consider the traditional example of producers and consumers sharing a global buffer. This example is based on a wait/notify mechanism which can be implemented with thread-await!, broadcast!, and a waiting queue.
(define (wait sig)
   (let ((self (current-thread)))
      (queue-push! sig self)
      (thread-await! self)))

(define (notify sig)
   (if (not (queue-empty? sig))
       (broadcast! (queue-pop! sig))))
It is important to note that any Scheme value can be used to
denote signals. This feature enables some threads to share private signals.
In the implementation above, we have used
the thread descriptors returned by current-thread
as
signal identifiers. This trick constitutes a simple solution to the
problem of associating a unique signal identifier to each thread.
The shared resource is accessed via the functions get
and put
. According to standard multi-threading
programming [5], when a thread is notified in
the function get
, it still has to check the availability of
the resource.
(define (put val)
   (buffer-put! val)
   (notify 'rsc-available))

(define (get)
   (if (buffer-empty?)
       (begin
          (wait 'rsc-available)
          (thread-yield!)
          (get))
       (buffer-fetch)))
(define (buffer-fetch)
   (let ((r (car *buffer*)))
      (set! *buffer* (cdr *buffer*))
      r))

(define (buffer-put! val)
   (if (null? *buffer*)
       (set! *buffer* (list val))
       (set-cdr! (last-pair *buffer*) (list val))))
One may notice that no lock is needed to protect the shared buffer. Producers and consumers are implemented by means of simple loops. Any number of producers and consumers can run concurrently.
(define (make-producer count)
   (make-thread
      (lambda ()
         (let loop ((n count))
            (put n)
            (thread-yield!)
            (loop (+ 1 n))))))

(define (make-consumer)
   (make-thread
      (lambda ()
         (let loop ()
            (print "Get: " (get))
            (thread-yield!)
            (loop)))))
After a notification, only one thread consumes the resource. No competition is needed between threads.
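The whole system can then be assembled by starting producers and consumers in a scheduler; a minimal sketch (the number of producers is arbitrary):

;; Start two producers and one consumer in a dedicated scheduler.
;; Since all three threads loop forever, the scheduler runs until
;; the program is interrupted.
(let ((s (make-scheduler)))
   (thread-start! (make-producer 0) s)
   (thread-start! (make-producer 100) s)
   (thread-start! (make-consumer) s)
   (scheduler-start! s))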
We present in this section the implementation of a full-featured Web server supporting the handling of multiple concurrent requests and scripting à la CGI. We do not consider efficiency issues for this example. In particular, we do not use any pool for managing the threads. Note that, even if this server does not cache files, thanks to the sendfile facility (available through make-send-chars-signal in Fair Threads) it still performs reasonably well. With this server a user thread waits on a socket for a connection. When such a connection is established, it starts a new user thread that handles the request. All I/O operations are performed concurrently by service threads. Thus, they do not block other running threads handling other client requests, and the server may benefit from parallel architectures such as an SMP.
The function http-up starts the Web server. It opens a server connection (line 2). Then, it creates and runs a thread for the server and it starts the scheduler by means of scheduler-start! (line 4). The server runs until the completion of all threads.
1: (define (http-up)
2:    (let ((s (make-socket-server)))
3:       (thread-start! (make-thread (make-http-server s)))
4:       (scheduler-start!)
5:       (socket-shutdown s)))
The Web server waits for client requests. This is achieved with the expression at line 9, which forces the thread running the Web server to block until a connection is established. While the HTTP server is waiting for a connection, the other threads in the scheduler run.
6: (define (make-http-server s)
7:    (define (serv)
8:       ;; wait cooperatively for a connection
9:       (let ((c (thread-await! (make-accept-signal s))))
10:         ;; create and run a new thread for handling
11:         ;; the client request
12:         (thread-start!
13:            (make-thread (lambda () (http-eval c))))
14:         (serv)))
15:    serv)
The first action upon client connection is to parse the request. For this, the user thread in charge of the HTTP request spawns a service thread (line 19):
17: (define (http-eval s)
18:    (let ((l (thread-await!
19:               (make-rgc-signal (socket-input s) G))))
20:       (cond
21:          ((string? l)
22:           (http-get-request s l))
23:          (else
24:           (http-reply s "Unknown request")))
25:       (socket-shutdown s #f)))
It blocks until the HTTP GET
line is read and parsed using regular expressions:
27: (define G
28:    (regular-grammar ()
29:       ((: "GET " (submatch (+ (out " \r\n"))))
30:        (the-submatch 1))
31:       (else
32:        #f)))
This Web server handles regular HTML files and also Scheme files. We will explain how the latter are handled in Section Handling Scheme scripts. First, let us focus on regular HTML files.
33: (define (http-get-request s file)
34:    (let ((p (case (string->symbol (suffix file))
35:               ((scm)
36:                (http-get-scm file))
37:               (else
38:                (http-get-html file)))))
39:       (http-reply s p)))

40: (define (http-get-html f)
41:    (if (file-exists? f)
42:        (open-input-file f)
43:        "Can't find file"))
Replying to a request takes two parameters: (i) a socket onto which the characters composing the answer are written, (ii) data from which the characters are extracted. In this toy implementation the data is either a string or an input port. Replying to a request spawns successively two service threads. The first one emits the header (line 53). The second one transfers the characters (line 56). Because of these service threads, http-reply does not block the other threads running in the scheduler, and thus the server supports concurrent requests.
43: (define (http-reply s d)
44:    (let ((op (socket-output s))
45:          (ip (cond
46:                 ((string? d)
47:                  (open-input-string d))
48:                 ((input-port? d)
49:                  d)
50:                 (else
51:                  (open-input-string "internal error")))))
52:       (thread-await!
53:          (make-output-signal op "HTTP/1.1 200 Ok\r
54: Server: test_httpd 1.0\r
55: \r\n\r\n"))
56:       (thread-await! (make-send-chars-signal ip op))))
We have provided our Web server with the ability to run scripts à la cgi-bin. More precisely, when the file to be served is a Scheme file, instead of providing the characters composing the file, the server evaluates the file with the Bigloo interpreter and returns the result of this evaluation to the client. The implementation relies on the Bigloo run-process facility that runs asynchronous external processes. The output port associated with the process is sent to http-reply.
58: (define (http-get-scm f)
59:    (if (file-exists? f)
60:        (process-output-port
61:           (run-process "bigloo" "-i" "-s"
62:                        f output: pipe:))
63:        "Can't find file"))
In about sixty lines of Scheme code we have implemented a multi-threaded, multi-process Web server. Each request is handled separately by a specific thread. Since parsing the requests and writing outputs are implemented using service threads, the server can scale up to an SMP architecture. One should note that for such a typical application, cooperative scheduling has not introduced any additional complexity in the source code: there is no need to explicitly call the thread-yield! function because all cooperations are naturally implemented by the I/O operations.
In this section we present the semantics of Scheme augmented with Fair Threads.
Sequential and concurrent aspects of programming languages are so intricately related that considering sequentiality and concurrency apart could lead to major flaws in the overall language semantics and design. This is a lesson we have learned from Java, for which it has been demonstrated that the memory model is incompatible with the thread model [26,21]. So, we have decided to express the semantics of Scheme and the semantics of the threads in a single formalism.
We use denotational semantics because this is the traditional formalism for Scheme. In Section Syntax we give the language for which we present the semantics. The language is slightly different from the one presented previously, in order to ease the writing of the semantics. In Section Semantics of Core Scheme we present, as a reminder, the semantics of core Scheme. Then, in Section Semantics of user threads we present the semantics of thread operations such as thread-yield! or thread-await!. In Section The Scheduler we present the semantics of the scheduler.
For the sake of simplicity, we do not consider side effects here. Side effects introduce no difficulties, so they could easily be added to the semantics. However, getting rid of side effects suppresses one parameter of the semantics (the store). Thus, we consider the following subset of Scheme:
<continuation> -> <expression>
<expression> -> <variable>
             | <literal>
             | (begin <expression> <expression>)
             | (if <expression> <expression> <expression>)
             | (lambda (<variable>) <expression>)
             | (<expression> <expression>)
The syntax for creating and handling threads is similar to the one presented in Section API except that we do not present the semantics of thread-get-values. We omit this form because, in terms of control, it behaves like thread-yield!. However, it would make the representation of the signal environment more complex without introducing any specificity. Finally, we present a variant of scheduler reaction. In order to avoid using an abstract data-type for representing schedulers, we do not present the pair thread-start! and scheduler-start!. Instead we present the semantics of the form (schedule exp), which can be expanded into the former ones as follows:
(define-syntax schedule
   (syntax-rules ()
      ((schedule exp)
       (let ((s (make-scheduler)))
          (thread-start! (make-thread (lambda () exp)) s)
          (scheduler-start! s)))))
So, the syntax we are considering is:
<expression> -> ...
             | (make-thread <expression>)
             | (thread-terminate! <expression>)
             | (thread-yield!)
             | (thread-await! <expression>)
             | (broadcast! <expression> <expression>)
             | (schedule <expression>)
A thread can be in four states: (i) It can be blocked because it cooperates. (ii) It can be ready to run, that is, it can be elected by the scheduler. (iii) It can be ended, that is, its body has executed until completion. (iv) It can be waiting for a signal. We denote the state of threads with the following pseudo-syntax:
<thread> -> (block <continuation>)
         | (ready <continuation>)
         | (end <continuation> <expression>)
         | (wait <continuation> <expression>)
The function E evaluates an expression in a lexical environment (ρ), a signal environment (σ), and a continuation (κ). It returns a Scheme value. When a thread returns to the scheduler, it returns a ThdResult, which is a triple made of a thread, a signal environment, and a list of new threads to be started at the next instant. Remember that, given the definition of the signal environment, any Scheme value may be used to denote a signal. The domains are presented in Figure 1.
π ∈ Command
τ ∈ Thread
ν ∈ Identifier
T ∈ list of Threads
ε ∈ Value      = Fun + Integer + Cons + ...
φ ∈ Fun        = Value × Cont → Value
κ ∈ Cont       = Value × Signals → Value
ω ∈ ThdResult  = Thread × Signals × Thread*
Ω ∈ ScdlResult = Thread* × Signals × Thread*
ρ ∈ Env        = Ident → Value
σ ∈ Signals    = Value → Value + ?

E : Form → Env × Signals × Cont → Value
S : Thread → Signals × Thread* → ScdlResult
We use a semantics of Scheme that resembles the one shown in Queinnec's book [27]. It is presented in Figure 2.
E[ν]ρσκ = (κ (ρ ν) σ)

E[(if π π1 π2)]ρσκ = (E[π] ρ σ λεσ1.(boolify ε) → (E[π1] ρ σ1 κ), (E[π2] ρ σ1 κ))

E[(begin π1 π2)]ρσκ = (E[π1] ρ σ λεσ2.(E[π2] ρ σ2 κ))

E[(lambda (ν) π)]ρσκ = (κ inValue(λεσ2κ2.(E[π] ρ[ν/ε] σ2 κ2)) σ)

E[(π1 π2)]ρσκ = (E[π1] ρ σ λφσ1.(E[π2] ρ σ1 λεσ2.(φ|Function ε σ2 κ)))

call/cc = inValue(λφσκ.(φ|Function inValue(λεσ2κ2.(κ ε σ2)) σ κ))
We start presenting, in Figure 3, the user level of threads, that is, the semantics of the forms composing the API.
E[(broadcast! π1 π2)]ρσκ = (E[π1] ρ σ λε1σ1.(E[π2] ρ σ1 λε2σ2.(κ ε2 σ2[ε1/ε2])))

E[(thread-yield!)]ρσκ = 〈(block κ), σ, ∅〉

E[(thread-await! π)]ρσκ = (E[π] ρ σ λεσ1.(σ1 ε)≠? → (κ (σ1 ε) σ1), 〈(wait κ ε), σ1, ∅〉)

E[(thread-terminate! π)]ρσκ = (E[π] ρ σ λε1σ1.〈(end κ ε1), σ1, ∅〉)

E[(make-thread π)]ρσκ = 〈(ready κ), σ, list((block λεσ.(E[π] ρ σ end-kont)))〉

E[(schedule π)]ρσκ = (κ schedule(list((ready λεσ.(E[π] ρ σ end-kont)))) σ)
Here is a brief explanation for each entry:

The broadcast! form evaluates its arguments (the signal identifier to be broadcast and the value associated with the signal), then it invokes its continuation.

The thread-yield! form switches back to the scheduler. That is, it constructs a triple made of the current thread state, which is now blocked for the instant, the signal environment (σ), and the set of new threads to be added. This set is always empty except for the make-thread form, which creates new threads. The thread state of a blocked thread is made of the block mark and its current continuation.

The thread-await! form first evaluates the identifier denoting the awaited signal. Then, it inspects the signal environment (σ). If the signal is present (absence is denoted by the question mark "?"), the continuation is immediately invoked with the value of the signal. If the signal is not present yet, the thread returns to the scheduler with the wait mark denoting the state of the thread, its current continuation, and the signal it is waiting for.

The schedule form creates a scheduler and starts a thread. The continuation end-kont is used for each newly created thread. This continuation simply returns to the scheduler a thread in the end state. That is, it notifies the scheduler that the thread can never be re-elected.

From a semantics point of view, a service thread is an operator that broadcasts a new signal at an unpredictable time. It can be viewed as a user thread computing a value and broadcasting a signal after a pause that lasts a random number of instants:
(define (make-...-signal . args)
   (let ((sig (gensym)))
      (thread-start!
         (make-thread
            (lambda ()
               (let ((v (... args)))
                  (do ((i (random) (- i 1)))
                      ((<= i 0))
                     (thread-yield!))
                  (broadcast! sig v)))))
      sig))
Service threads are the only places where non-determinism is introduced in the semantics. So, a program using only user threads would be fully deterministic and predictable. That is, the system guarantees a precise scheduling order for programs using only user threads. Therefore, a program not using service threads can be debugged as easily as sequential code because it supports replayed executions.
As presented in Section User threads, during one instant a thread can be elected several times by the scheduler. Actually, each instant consists of a fix-point iteration. The scheduler elects all runnable user threads. A runnable user thread is a thread that is ready or a thread that is waiting for a signal which has been broadcast during the instant. The semantics of the scheduler is presented in Figure 4. The operator § stands for list concatenation. The functional operator it-list3 applies its first argument, a function accepting three parameters, to an accumulated list.
schedule = λT. let T1 = scheduler-react(T, empty-event, ∅) in
              finish?(T1) → map(λτ.thread-end-value(τ), T1), schedule(T1)

finish? = λT. ∀τ∈T, thread-end?(τ)

empty-event = λν. ?
The fix-point iteration is ensured by the function micro-schedule. This function applies the function S to each thread. S allocates the processor to threads that are runnable. That is, it invokes the continuation of the thread which is captured in the thread state. S also accumulates the threads to be started at the beginning of the next instant. The fix-point is reached when the function micro-finish? returns the boolean true. Switching from one instant to the next instant is ensured by the function next-ready. It turns blocked threads into ready threads. The fix-point iteration is presented in Figure 5. The function S is presented in Figure 6.
scheduler-react = λTσT1. instant-finish?(T, σ) → next-ready(T) § T1,
                  let Ω = it-list3(S, T, σ, T1), T2 = Ω↓1, σ1 = Ω↓2, T3 = Ω↓3 in
                     scheduler-react(T2, σ1, T3)

instant-finish? = λTσ. ∀τ∈T, thread-ready?(τ) → false,
                       thread-wait?(τ) → (σ thread-wait-signal(τ)) = ?,
                       true

next-ready = λT. map(λτ.thread-block?(τ) → (ready thread-cont(τ)), τ, T)
S[(block κ)]τσT = 〈τ, σ, T〉

S[(end κ ε)]τσT = 〈τ, σ, T〉

S[(ready κ)]τσT = let ω = (κ unspecified σ) in 〈ω↓1, ω↓2, T§ω↓3〉

S[(wait κ ε)]τσT = (σ ε)≠? → let ω = (κ (σ ε) σ), τ = ω↓1, σ1 = ω↓2, T1 = ω↓3 in
                      〈τ, σ1, T§T1〉,
                   〈τ, σ, T〉
The Bigloo system supports two execution platforms [31]: (i) native executions on Posix-1 compliant operating systems, (ii) Java executions on a JVM. On both systems, user and service threads are mapped onto native threads. Posix threads and Java threads are so close that the implementations of Fair Threads on both systems are very similar.
Conceptually, an execution token is passed between user threads and schedulers. In order to determine which threads can receive the token, the scheduler examines all user threads. It eliminates the ones that are explicitly blocked and the ones that are waiting for a non-present signal. When a thread yields explicitly or implicitly, it gives the token back to the scheduler. Service threads are implemented by means of native threads: each time a service thread is spawned, the fair scheduler starts a new native thread that runs in parallel until completion and then broadcasts a signal.
The current implementation in the Bigloo system improves on this naïve implementation in several directions. First, it avoids busy waiting. Second, it minimizes the creation of native threads. Third, it reduces the number of context switches.
It is extremely important to implement signal broadcast efficiently because this operation is ubiquitous in the Fair Threads programming style: the model does not propose point-to-point communication, and broadcasting is the only communication means between user and service threads. In the actual implementation, a thread does not consume processor resources when waiting for a signal. The scheduler avoids active waiting by placing threads into wait queues associated with signals.
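These wait queues can be pictured as a table mapping each signal to the threads blocked on it. The following sketch is only illustrative, not the actual Bigloo internals: wait-queues and resume-thread! are hypothetical names.

;; Illustrative sketch: each signal maps to the list of threads
;; blocked on it. Broadcasting wakes all of them at once, so no
;; thread ever polls for the presence of a signal.
(define wait-queues (make-hashtable))

(define (block-on-signal! sig thread)
   (hashtable-update! wait-queues sig
      (lambda (q) (cons thread q))
      (list thread)))

(define (broadcast-to-waiters! sig val)
   ;; resume-thread! is a hypothetical low-level primitive
   (for-each (lambda (t) (resume-thread! t val))
             (or (hashtable-get wait-queues sig) '()))
   (hashtable-remove! wait-queues sig))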
Service threads are intensively used and, in general, they have a very short lifetime. So, the overhead imposed by spawning a service thread must be kept as low as possible. A naïve implementation that would start a new native thread each time a service thread is created would be inefficient. Using a pool of threads would improve the global performance because it would get rid of the thread creation time. However, it would probably not perform sufficiently well because it would not reduce the context switches between user and service threads. The current implementation deploys an ad-hoc optimization that eliminates the thread creations and most of the thread context switches. It relies on the observation that the general pattern for spawning a service thread is:
(thread-await! (make-...-signal ...))
That is, the current user thread blocks until the service thread completes and broadcasts its signal. One may take advantage of this situation: it is possible not to allocate a native thread for implementing the service thread but instead to reuse the user thread. This optimization takes place inside the thread-await! function. When its argument is created by one of the make-...-signal library functions, it detaches the user thread from the scheduler to make it execute the required service. In other words, the user thread temporarily becomes a service thread. On completion, this thread broadcasts a signal and gets re-attached to the scheduler.
Some service threads may have been created but not awaited (for instance, when writing some data to a port, it is frequent not to check or not to wait for the result). So, at the end of the instant, when all user threads are blocked, the scheduler starts native threads for these pending service threads.
Fair Threads context switches are expensive. A token is shared by all user threads. Allocating the CPU to a new user thread involves a lock or mutex acquisition and a notification. The fewer the context switches, the more efficient the system.
One way to minimize context switches is to avoid, as much as possible, to switch back and forth to the scheduler. In particular, when a user thread U1 cooperates, it is useless to switch back to the scheduler which, in turn, switches forth to the next runnable thread U2. U1 can directly give the token to U2. The scheduling algorithm is thus actually split over all user threads. Each of these executes a small part of the scheduling task. In this framework, the thread running the scheduler gets the CPU only at the end of the instant. This is imposed by the optimization presented in Section Minimizing thread creations. It might happen that all user threads are currently used to execute service threads. In such a situation there must exist another thread, waiting for the first broadcast signal. This is the role of the scheduler.
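In pseudo-Scheme, cooperation then amounts to a direct hand-off rather than a round-trip through the scheduler. The sketch below is hypothetical: next-runnable-thread, wake!, sleep!, and wake-scheduler! stand for the low-level primitives of the actual implementation.

;; Hypothetical sketch of direct token hand-off: the cooperating
;; thread elects the next runnable thread itself and wakes it up,
;; then goes to sleep. The scheduler thread is only resumed when no
;; user thread is runnable, that is, at the end of the instant.
(define (cooperate! current)
   (let ((next (next-runnable-thread current)))
      (if next
          (begin
             (wake! next)
             (sleep! current))
          (wake-scheduler!))))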
Threads have been used for programming concurrency since the mid eighties and met large popularity in the mid nineties with the Java era. Following the path started with Ada and Modula [37,7], Java has been one of the first popular general-purpose programming languages to include concurrency primitives, by means of threads, in the core language [12]. Other proposals exist, mainly based on message passing, such as Erlang [2], but we do not consider them here as they define a totally different style of programming. While many people still think of threads as the "classical" way to program concurrency in the context of shared memory, criticizing threads has nowadays become quite frequent, if not popular. Criticisms focus on two different aspects: performance and programming.
In the rest of this section, we compare the Fair Threads with other programming models. The area of programming concurrency is so vast that it cannot be fully embraced by this section. Here, we focus on a comparison with other multi-threading support in functional languages, with purely cooperative scheduling systems, and with synchronous and reactive programming.
It is well-known that threads can be expressed in functional languages
equipped with continuations [30,36,33]. In the context of Scheme, continuations
are captured by the call/cc
primitive. However, the
definition of threads using continuations is rather unsatisfactory as
call/cc
is known to be difficult to implement
efficiently [22]: the entire execution context
has to be captured each time a continuation is created. Moreover,
call/cc
does not mix well with exceptions [4,28].
To our knowledge, concurrency models in the Scheme programming language have mostly been studied through the looking glass of continuations and engines, a means for providing preemptive schedulers [9,18]. While some modern Scheme implementations support concurrency with variants of Posix-1 threads [11], none has explored alternative constructions. Most works have focused on implementation issues [19,34].
Some functional languages such as OCaml support a hybrid multi-threading model made of a Posix-like semantics and a mostly-cooperative implementation. That is, these systems present a preemptive scheduler but preemptions can only take place at specific moments of the execution (for instance, after a fixed number of allocations or using an interval timer). These systems have the advantage of relaxing the need to introduce cooperation points in the programs but, still, they only support one thread running at a time. So, we think that this approach has some drawbacks.
CML [29] is another ML-based language which introduces threads as first-class objects. CML uses preemptive scheduling. In the CML implementation, context switches are provoked by an interval timer provided by the operating system. CML puts the focus on synchronous communication mechanisms based on message passing. Thus, it basically differs from Fair Threads by the use of preemptive scheduling and of message passing instead of broadcast signals. However, Fair Threads and CML both have a formal semantics and mix concurrent and functional programming in the same language.
Concurrent Haskell [25] supports preemptive multi-threading. Preemptions can only take place during allocations. It proposes a synchronization facility called MVars that enables several threads to synchronize on a single resource. This construction could be seen as an abstraction close to Posix-1 condition variables. Concurrent Haskell separates the semantics of the sequential core language from the concurrent layer, which is defined by an operational semantics based on the π-calculus.
Pth (GNU Portable Threads) [10] is a Posix-based library which
provides non-preemptive priority-based scheduling for multi-threading inside
event-driven applications. The thread scheduling is done in a cooperative way
which means that the threads are managed by a priority- and
event-based non-preemptive scheduler. The intention is to get better
portability and run-time performance than with preemptive
scheduling. The event facility, basically implemented with select
,
allows threads to wait until various types of events occur, including
I/Os, asynchronous signals, timers, thread and process termination,
and customized callback functions.
State Threads (ST) is a library for writing Internet applications such
as Web servers. It offers a multi-threading API for structuring a network
application as an event-driven state machine. The State Threads
library is a derivative of the Netscape Portable Runtime library
(NSPR). ST uses a cooperative event-driven scheduling within a single
process address space (many-to-one model). ST is fully deterministic
and always guarantees that the context switch can only happen in a
well-known set of functions (at I/O points or at explicit cooperation
points).
Actually, there are two major differences between Pth and ST on one side,
and Fair Threads on the other side: first, as opposed to Fair Threads,
Pth and ST do not require any kernel support (like Posix-threads), but
cannot benefit from multiprocessor machines. Second, Pth and ST do not
put the focus on syntax and semantics, as it is done in Fair Threads which
introduces new primitives for multi-threading programming.
Fair Threads are largely based on synchronous reactive programming. They inherit the semantics of signal broadcast and instants. Fair Threads stands on the side of reactive programming: contrary to synchronous programming à la Esterel [3], it does not enable reaction to the absence of signals. This limitation has an important consequence: by construction, it avoids causality cycles (i.e., a loop checking the presence of a signal and, if it is not present, broadcasting it). Thus, reactive programming does not need a static compilation stage detecting causality cycles. So, it enables dynamic code loading.
Fair Threads is an attempt to adapt techniques developed for reactive programming [13,14] to multi-threading programming. Fair Threads can also be viewed as a simplification of reactive programming.
The Fair Threads model presented in this paper is the latest achievement of a research effort started in 2001 by F. Boussinot [6]. The former versions and the current one differ in several aspects; in particular, the current version introduces forms for collecting the values of a signal across an instant (thread-get-values and thread-get-values*) and for waiting on several signals (thread-await*).

Fair Threads is a step in the direction of combining some advantages of cooperative scheduling (simple semantics, ease of programming) with some advantages of preemptive scheduling (exploiting hardware parallelism). It has some success (for instance, it enables realistic Web server implementations) but it also has weaknesses. We present two of them, and the envisioned solutions, in this section.
The main drawback of Fair Threads is the difficulty, faced by programmers, of placing cooperation points in the source code. Sometimes this task is straightforward (see the example of Section A Web Server with Fair Threads). Sometimes it is complex; CPU-intensive computations fall into this category. If the program contains too few cooperation points, there is a risk that one thread monopolizes the processor. If the program contains too many cooperation points, there is a risk of impeding performance. It is even likely that there is no static optimal solution: the decisions to switch from one thread to another may have to be taken according to dynamic criteria, only known at runtime. Fair Threads does not improve on cooperative systems in this matter.
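For instance, a CPU-intensive loop has to be instrumented by hand, as in the following sketch where the cooperation stride of 10000 iterations is an arbitrary guess:

;; Sum a vector while cooperating every 10000 iterations: a smaller
;; stride impedes performance, a larger one lets the thread
;; monopolize the processor. The right value is only known at runtime.
(define (sum-vector vec)
   (let loop ((i 0) (acc 0))
      (if (>= i (vector-length vec))
          acc
          (begin
             (when (= (modulo i 10000) 0)
                (thread-yield!))
             (loop (+ i 1) (+ acc (vector-ref vec i)))))))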
To minimize the need for explicit cooperation points, we are considering the design of Fair Processes. We think that most of the difficulties of multi-threading programming are basically due to the simultaneous use of concurrency, shared memory, and preemption. Fair Threads separates them: user threads propose concurrency plus shared memory; service threads propose concurrency plus preemption. Unfortunately, service threads cannot be defined in user programs. This is why we are considering designing Fair Processes that could satisfy this demand. Fair Processes would look like Unix processes (à la fork) or Java isolates. Coupled with new constructions for handling chunks of shared memory (such as mmap), they could be used to implement independent CPU-intensive computations. The current semantics of Fair Threads probably has sufficient room for integrating Fair Processes in a way similar to service threads.
Native threads consume resources (memory, descriptors, ...). So, in general, operating systems limit the number of threads that can run simultaneously (e.g., 1024 for a standard Linux kernel configuration). However, some applications need more threads than the authorized limit. As stated in Section Synchronous and Reactive programming, Fair Threads programming is largely based on reactive programming. Reactive programming is particularly suitable when a lot (several thousands) of small synchronized behaviors have to be implemented. That is, reactive programming can be used when the upper bound on the number of threads is reached. In addition, reactive programs can be executed more efficiently than threads because context switches between two reactive programs are cheap. Thus, we are considering merging the two models by extending the language so that one would be able to mix Fair Threads and reactive programs.
In this paper we have presented the Fair Threads model. It combines cooperative user threads and preemptive service threads. This approach confines non-determinism inside the latter. So, it minimizes the problems frequently encountered while programming concurrent applications. Still, it supports parallel executions for improving the global performance of applications.
[1] Adya, A. and Howell, J. and Theimer, M. and Bolosky, J. and Douceur, J. -- Cooperative Task Management without Manual Stack Management or Event-driven Programming is Not the Opposite of Threaded Programming -- Proceedings of the Usenix Annual Technical Conference, Monterey, CA, USA, Jun, 2002, pp. 289--302.
[2] Armstrong, J. and Virding, R. and Wikström, C. and Williams, M. -- Concurrent Programming in ERLANG -- Prentice Hall, 1996.
[3] Berry, G. and Gonthier, G. -- The Esterel Synchronous Programming Language: Design, Semantics, Implementation -- Science of Computer Programming, 19(2), 1992, pp. 87--152.
[4] Biagioni, E. and Cline, K. and Lee, P. and Okasaki, C. and Stone, C. -- Safe-for-Space Threads in Standard ML -- Proc. 2nd ACM SIGPLAN Workshop on Continuations, 1996.
[5] Birrell, A. -- An Introduction to Programming with Threads -- Report 35, DEC Systems Research Center, Palo-Alto, CA, USA, Jan, 1989.
[6] Boussinot, F. -- Java Fair Threads -- Research Report RR-4139, Inria, 2001.
[7] Cardelli, L. et al. -- Modula-3 Report (revised) -- Report 52, DEC Systems Research Center, Palo-Alto, CA, USA, Nov, 1989.
[8] Chankhunthod, A. and Danzig, P. and Neerdaels, C. and Schwartz, M. and Worrell, K. -- A Hierarchical Internet Object Cache -- Proceedings of the Usenix Annual Technical Conference, 1996, pp. 153--164.
[9] Dybvig, K. and Hieb, R. -- Engines from Continuations -- Computer Languages, 2(14), 1989, pp. 109--123.
[10] Engelschall, R. -- Portable Multithreading - The Signal Stack Trick for User-Space Thread Creation -- Proceedings of the Usenix Annual Technical Conference, San Diego, CA, USA, Jun, 2000.
[11] Feeley, M. -- SRFI 18: Multithreading support -- 2000.
[12] Gosling, J. and Joy, B. and Steele, G. -- The Java Language Specification -- Addison-Wesley, 1996.
[13] Hazard, L. and Susini, J-F. and Boussinot, F. -- The Junior Reactive Kernel -- Research Report RR-3732, Inria, Jul, 1999.
[14] Hazard, L. and Susini, J-F. and Boussinot, F. -- Programming with Junior -- Research Report RR-4027, Inria, 2000.
[15] Hollub, A. -- Taming Java Threads -- Apress, 2000.
[16] Joubert, P. and King, R. and Neves, R. and Russinovich, M. and Tracey, J. -- High-Performance Memory-Based Web Servers: Kernel and User-Space Performance -- Usenix, 2001, pp. 175--188.
[17] Kelsey, R. and Clinger, W. and Rees, J. -- The Revised(5) Report on the Algorithmic Language Scheme -- Higher-Order and Symbolic Computation, 11(1), Sep, 1998.
[18] Kelsey, R. and Rees, J. -- A Tractable Scheme Implementation -- Lisp and Symbolic Computation, 7(4), 1994, pp. 315--335.
[19] Kranz, D. and Kelsey, R. and Rees, J. and Hudak, P. and Philbin, J. and Adams, N. -- ORBIT: An Optimizing Compiler for Scheme -- Symposium on Compiler Construction, Palo Alto, California, Jun, 1986, pp. 219--233.
[20] Larus, J. and Parkes, M. -- Using Cohort Scheduling to Enhance Server Performance -- Proceedings of the Usenix Annual Technical Conference, Monterey, CA, USA, Jun, 2002, pp. 103--114.
[21] Lea, D. and Pugh, B. -- Correct and Efficient Synchronization of Java Technology-based Threads -- JavaOne, 2000.
[22] Mateu, L. -- Efficient Implementation of Coroutines -- International Workshop on Memory Management, Lecture Notes in Computer Science (637), Saint-Malo, France, Sep, 1992, pp. 230--247.
[23] Nichols, B. and Buttlar, D. and Proulx Farrell, J. -- Pthreads Programming -- O'Reilly & Associates, USA, 1996.
[24] Ousterhout, J. -- Why Threads Are a Bad Idea (for most purposes) -- Invited talk at the USENIX Winter Technical Conference, Jan, 1996.
[25] Peyton Jones, S. and Gordon, A. and Finne, S. -- Concurrent Haskell -- Symposium on Principles of Programming Languages, Jan, 1996.
[26] Pugh, B. -- Fixing the Java Memory Model -- Java Grande, San Francisco, CA, USA, Jun, 1999.
[27] Queinnec, C. -- Lisp In Small Pieces -- Cambridge University Press, 1996.
[28] Reppy, J. -- Asynchronous Signals in Standard ML -- TR 90-1144, Cornell University, Aug, 1990.
[29] Reppy, J. -- Concurrent Programming in ML -- Cambridge University Press, 1999.
[30] Reynolds, J. -- GEDANKEN - A Simple Typeless Language Based on the Principle of Completeness and the Reference Concept -- Communications of the ACM, 13(5), May, 1970, pp. 308--319.
[31] Serpette, B. and Serrano, M. -- Compiling Scheme to JVM Bytecode: a Performance Study -- 7th Int'l Conf. on Functional Programming, Pittsburgh, Pennsylvania, USA, Oct, 2002.
[32] Serrano, M. -- Bigloo User's Manual -- Technical Report 0169, INRIA-Rocquencourt, France, Dec, 1994.
[33] Shivers, O. -- Continuations and Threads: Expressing Machine Concurrency Directly in Advanced Languages -- ACM SIGPLAN Workshop on Continuations, Paris, France, Jan, 1997.
[34] Shivers, O. -- Atomic Heap Transactions and Fine-Grain Interrupts -- Int'l Conf. on Functional Programming, Paris, France, Sep, 1999.
[35] Pai, V. and Druschel, P. and Zwaenepoel, W. -- Flash: An Efficient and Portable Web Server -- Proceedings of the Usenix Annual Technical Conference, Monterey, CA, USA, Jun, 1999.
[36] Wand, M. -- Continuation-Based Multiprocessing Revisited -- Higher Order and Symbolic Computation, 12(3), 1999, pp. 285--299.
[37] Wirth, N. -- Programming in Modula-2 -- Texts and Monographs in Computer Science, 1993.
(define (ev-var i)
   (lambda (env events pk)
      (pk (env i) events)))

(define (ev-if pi pi1 pi2)
   (lambda (env events pk)
      ((ev pi) env events
         (lambda (v events1)
            (if (boolify v)
                ((ev pi1) env events1 pk)
                ((ev pi2) env events1 pk))))))

(define (ev-begin pi1 pi2)
   (lambda (env events pk)
      ((ev pi1) env events
         (lambda (v events2)
            ((ev pi2) env events2 pk)))))

(define (ev-lambda1 i pi)
   (lambda (env events pk)
      (pk (make-closure
             (lambda (v events2 pk2)
                ((ev pi) (extend i v env) events2 pk2)))
          events)))

(define (ev-funcall1 pi1 pi2)
   (lambda (env events pk)
      ((ev pi1) env events
         (lambda (f events1)
            ((ev pi2) env events1
               (lambda (v events2)
                  ((closure->function f) v events2 pk)))))))

(define-library call/cc
   (make-closure
      (lambda (f events pk)
         ((closure->function f)
          (make-closure
             (lambda (v events2 pk2)
                (pk v events2)))
          events
          pk))))
(define (ev-emit pi1 pi2)
   (lambda (env events pk)
      ((ev pi1) env events
         (lambda (v1 events1)
            ((ev pi2) env events1
               (lambda (v2 events2)
                  (pk v2 (extend v1 v2 events2))))))))

(define (ev-yield)
   (lambda (env events pk)
      (make-result (make-thread 'block pk) events '())))

(define (ev-wait pi)
   (lambda (env events pk)
      ((ev pi) env events
         (lambda (v events1)
            (if (event? (events1 v))
                (pk (events1 v) events1)
                (make-result (make-thread 'wait pk v) events1 '()))))))

(define (ev-terminate pi)
   (lambda (env events pk)
      ((ev pi) env events
         (lambda (v1 events1)
            (make-result (make-thread 'end pk v1) events1 '())))))

(define (ev-thread pi)
   (lambda (env events pk)
      (make-result (make-thread 'ready pk)
                   events
                   (list (make-thread 'block
                            (lambda (v events)
                               ((ev pi) env events end-kont)))))))

(define (ev-schedule pi)
   (lambda (env events pk)
      (pk (schedule
             (list (make-thread 'ready
                      (lambda (v events)
                         ((ev pi) env events end-kont)))))
          events)))
(define (schedule T)
   (let ((T1 (micro-schedule T empty-event '())))
      (if (finish? T1)
          (map (lambda (t) (thread-end-value t)) T1)
          (schedule T1))))

(define (finish? T)
   (every? (lambda (t) (thread-end? t)) T))

(define (empty-event i)
   bottom)
(define (micro-schedule T events T1)
   (if (micro-finish? T events)
       (append (next-ready T) T1)
       (let* ((Res (it-list3 sc T events T1))
              (T2 (result0 Res))
              (events1 (result1 Res))
              (T3 (result2 Res)))
          (micro-schedule T2 events1 T3))))

(define (micro-finish? T events)
   (every? (lambda (t)
              (cond
                 ((thread-ready? t) #f)
                 ((thread-wait? t)
                  (not-event? (events (thread-wait-signal t))))
                 (else #t)))
           T))

(define (next-ready T)
   (map (lambda (t)
           (if (thread-block? t)
               (make-thread 'ready (thread-cont t))
               t))
        T))
(define (sc-block pk)
   (lambda (t events T)
      (make-result t events T)))

(define (sc-end pk v)
   (lambda (t events T)
      (make-result t events T)))

(define (sc-ready pk)
   (lambda (t events T)
      (let ((res (pk #unspecified events)))
         (make-result (result0 res)
                      (result1 res)
                      (append T (result2 res))))))

(define (sc-wait pk v)
   (lambda (t events T)
      (if (event? (events v))
          (let* ((res (pk (events v) events))
                 (t (result0 res))
                 (events1 (result1 res))
                 (T1 (result2 res)))
             (make-result t events1 (append T T1)))
          (make-result t events T))))