Regular parsingProgramming languages have poor reading libraries since the lexical information that can be specified is directly tied to the structure of the language. For example, in C it's hard to read a rational number because there is no type rational. Programs have been written to circumvent this problem: Lex [Lesk75], for example, is one of them. We choose to incorporate in Bigloo a set of new functions to assist in such parsing. The syntax for regular grammar (also known as regular analyser) of Bigloo 2.0 (the one described in this document) is not compatible with former Bigloo versions.
A new way of readingThere is only one way in Bigloo to read text, regular reading, which is done by the new form:
read/rp regular-grammar input-portbigloo procedureThe first argument is a regular grammar (also known as regular analyser) and the second a Scheme port. This way of reading is almost the same as the Lex's one. The reader tries to match the longest input, from the stream pointed to by input-port, with one of several regular expressions contained in regular-grammar. If many rules match, the reader takes the first one defined in the grammar. When the regular rule has been found the corresponding Scheme expression is evaluated. remark: The traditional
readScheme function is implemented as:
(define-inline (read port) (read/rp scheme-grammar port))
The syntax of the regular grammarA regular grammar is built by the means of the form
regular-grammar (binding ...) rule ...bigloo syntaxThe binding and rule are defined by the following grammar:
<binding> ⇒ (<variable> <re>) | <option> <option> ⇒ <variable> <rule> ⇒ <define> | (<cre> <s-expression> <s-expression> ...) | (Here is a description of each construction.
else<s-expression> <s-expression> ...) <define> ⇒ (define <s-expression>) <cre> ⇒ <re> | (
context<symbol> <re>) | (
when<s-expr> <re>) | (
bol<re>) | (
eol<re>) | (
bof<re>) | (
eof<re>) <re> ⇒ <variable> | <char> | <string> | (
:<re> ...) | (
or<re> ...) | (
*<re>) | (
+<re>) | (
?<re>) | (
=<integer> <re>) | (
>=<integer> <re>) | (
**<integer> <integer> <re>) | (
...<integer> <re>) | (
uncase<re>) | (
in<cset> ...) | (
out<cset> ...) | (
and<cset> <cset>) | (
but<cset> <cset>) | (
posix<string>) <variable> ⇒ <symbol> <cset> ⇒ <string> | <char> | (<string>) | (<char> <char>)
string-case string rule ...bigloo syntaxThis form dispatches on strings. it opens an input on
stringa read into it according to the regular grammar defined by the
(define (suffix string) (string-case string ((: (* all) ".") (ignore)) ((+ (out #\.)) (the-string)) (else "")))
The semantics actionsThe semantics actions are regular Scheme expressions. These expressions appear in an environment where some ``extra procedures'' are defined. These procedures are:
the-portbigloo rgc procedureReturns the input port currently in used.
the-lengthbigloo rgc procedureGet the length of the biggest matching string.
the-stringbigloo rgc procedureGet a copy of the last matching string. The function
the-stringreturns a fresh copy of the matching each time it is called. In consequence,
(let ((l1 (the-string)) (l2 (the-string))) (eq? l1 l2)) ⇒ #f
the-substring start lenbigloo rgc procedureGet a copy of a substring of the last matching string. If the len is negative, it is subtracted to the whole match length. Here is an example of a rule extracting a part of a match:
(regular-grammar () ((: #\" (* (out #\")) #\") (the-substring 1 (-fx (the-length) 1))))Which can also be written:
(regular-grammar () ((: #\" (* (out #\")) #\") (the-substring 1 -1)))
the-characterbigloo rgc procedure
the-bytebigloo rgc procedureReturns the first character of a match (respectively, the first byte).
the-byte-ref nbigloo rgc procedureReturns the n-th bytes of the matching string.
the-symbolbigloo rgc procedure
the-downcase-symbolbigloo rgc procedure
the-upcase-symbolbigloo rgc procedure
the-subsymbol start lengthbigloo rgc procedureConvert the last matching string into a symbol. The function
the-subsymbolobeys the same rules as
the-keywordbigloo rgc procedure
the-downcase-keywordbigloo rgc procedure
the-upcase-keywordbigloo rgc procedureConvert the last matching string into a keyword.
the-fixnumbigloo rgc procedureThe conversion of the last matching string to fixnum.
the-flonumbigloo rgc procedureThe conversion of the last matching string to flonum.
the-failurebigloo rgc procedureReturns the first char that the grammar can't match or the end of file object.
ignorebigloo rgc procedureIgnore the parsing, keep reading. It's better to use
(ignore)rather than an expression like
(read/rp grammar port)in semantics actions since the
(ignore)call will be done in a tail recursive way. For instance,
(let ((g (regular-grammar () (")" '()) ("(" (let* ((car (ignore)) (cdr (ignore))) (cons car cdr))) ((+ (out "()")) (the-string)))) (p (open-input-string "(foo(bar(gee)))"))) (read/rp g p)) ⇒ ("foo" ("bar" ("gee")))
rgc-context [context]bigloo rgc procedureIf no context is provide, this procedure reset the reader context state. That is the reader is in no context. With one argument,
contextset the reader in the context context. For instance,
(let ((g (regular-grammar () ((context foo "foo") (print 'foo-bis)) ("foo" (rgc-context 'foo) (print 'foo) (ignore)) (else 'done))) (p (open-input-string "foofoo"))) (read/rp g p)) ⇥ foo foo-bisNote that RGC context are preserved across different uses of
the-contextbigloo rgc procedureReturns the value of the current Rgc context.
Options and user definitionsOptions act as parameters that are transmitted to the parser on the call to
read/rp. Local defines are user functions inserted in the produced parser, at the same level as the pre-defined
ignorefunction. Here is an example of grammar using both
(define gram (regular-grammar (x y) (define (foo s) (cons* 'foo x s (ignore))) (define (bar s) (cons* 'bar y s (ignore))) ((+ #\a) (foo (the-string))) ((+ #\b) (bar (the-string))) (else '())))This grammar uses two options x and y. Hence when invokes it takes two additional values such as:
(with-input-from-string "aabb" (lambda () (read/rp gram (current-input-port) 'option-x 'option-y))) ⇒ (foo option-x aa bar option-y bb)
Examples of regular grammarThe reader who wants to find a real example should read the code of Bigloo's reader. But here are small examples
Word countThe first example presents a grammar that simulates the Unix program
(let ((*char* 0) (*word* 0) (*line* 0)) (regular-grammar () ((+ #\Newline) (set! *char* (+ *char* (the-length))) (set! *line* (+ *line* (the-length))) (ignore)) ((+ (in #\space #\tab)) (set! *char* (+ *char* (the-length))) (ignore)) ((+ (out #\newline #\space #\tab)) (set! *char* (+ *char* (the-length))) (set! *word* (+ 1 *word*)) (ignore))))
Roman numbersThe second example presents a grammar that reads Arabic and Roman number.
(let ((par-open 0)) (regular-grammar ((arabic (in ("09"))) (roman (uncase (in "ivxlcdm")))) ((+ (in #" \t\n")) (ignore)) ((+ arabic) (string->integer (the-string))) ((+ roman) (roman->arabic (the-string))) (#\( (let ((open-key par-open)) (set! par-open (+ 1 par-open)) (context 'pair) (let loop-pair ((walk (ignore))) (cond ((= open-key par-open) '()) (else (cons walk (loop-pair (ignore)))))))) (#\) (set! par-open (- par-open 1)) (if (< par-open 0) (begin (set! par-open 0) (ignore)) #f)) ((in "+-*\\") (string->symbol (the-string))) (else (let ((char (the-failure))) (if (eof-object? char) char (error "grammar-roman" "Illegal char" char))))))