21. Bigloo
A practical Scheme compiler
User manual for version 4.2a
September 2015 -- Text
This chapter describes the Bigloo API for processing text.

21.1 BibTeX

bibtex obj -- Bigloo Text function
bibtex-port input-port -- Bigloo Text function
bibtex-file file-name -- Bigloo Text function
bibtex-string string -- Bigloo Text function
These functions parse BibTeX sources. The argument obj can be either an input port or a string denoting a file name. Each function returns a list of BibTeX entries.

The functions bibtex-port, bibtex-file, and bibtex-string are mere wrappers that invoke bibtex.

(bibtex (open-input-string "@book{ as:sicp,
  author 	= {Abelson, H. and Sussman, G.},
  title 	= {Structure and Interpretation of Computer Programs},
  year 		= 1985,
  publisher 	= {MIT Press},
  address 	= {Cambridge, Mass., USA},
}")) => (("as:sicp" BOOK 
                  (author ("Abelson" "H.") ("Sussman" "G."))
                  (title . "Structure and Interpretation of Computer Programs")
                  (year . "1985")
                  (publisher . "MIT Press")
                  (address . "Cambridge, Mass., USA")))
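
The result shown above can be explored with ordinary list operations. The following is a minimal sketch, assuming that every entry has the layout printed in the example: a key string, an entry kind, and then an association list of fields named by lowercase symbols. The helpers bibtex-entry-key, bibtex-entry-kind, and bibtex-entry-field are hypothetical names introduced for illustration; they are not part of the Bigloo Text API.

```scheme
;; One entry, copied from the result shown above.
(define sicp
   '("as:sicp" BOOK
     (author ("Abelson" "H.") ("Sussman" "G."))
     (title . "Structure and Interpretation of Computer Programs")
     (year . "1985")
     (publisher . "MIT Press")
     (address . "Cambridge, Mass., USA")))

;; Hypothetical helpers (assumption: the layout is key, kind, fields).
(define (bibtex-entry-key entry) (car entry))
(define (bibtex-entry-kind entry) (cadr entry))

(define (bibtex-entry-field entry field)
   ;; The fields form an association list starting at the third element.
   (let ((cell (assq field (cddr entry))))
      (and cell (cdr cell))))

(bibtex-entry-key sicp)            ; => "as:sicp"
(bibtex-entry-field sicp 'year)    ; => "1985"
(bibtex-entry-field sicp 'author)  ; => (("Abelson" "H.") ("Sussman" "G."))
(bibtex-entry-field sicp 'isbn)    ; => #f
```

Returning #f for a missing field keeps the helper usable in conditionals without a separate membership test.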

bibtex-parse-authors string -- Bigloo Text function
This function parses the author field of a BibTeX entry.

(bibtex-parse-authors "Abelson, H. and Sussman, G.")
=> (("Abelson" "H.") ("Sussman" "G."))

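The parsed author list is convenient for producing display names. The sketch below assumes that each author is a list whose first element is the last name, followed by zero or more first names or initials, as in the result above. The helper author->string is a hypothetical name, not part of the Bigloo Text API.

```scheme
;; Hypothetical helper: builds "First Last" from a (last first ...) list.
(define (author->string author)
   (let loop ((firsts (cdr author)) (acc ""))
      (if (null? firsts)
          (string-append acc (car author))
          (loop (cdr firsts) (string-append acc (car firsts) " ")))))

(map author->string '(("Abelson" "H.") ("Sussman" "G.")))
; => ("H. Abelson" "G. Sussman")
```
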
21.2 Character strings

hyphenate word hyphens -- Bigloo Text function
The function hyphenate accepts as input a single word and returns as output a list of subwords. The argument hyphens is an opaque data structure obtained by calling the function load-hyphens or make-hyphens.

(hyphenate "software" (load-hyphens 'en)) => ("soft" "ware")
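
Since hyphenate returns a list of subwords, a caller usually needs to join them again with a visible or soft hyphen. The following is a minimal sketch using only standard string operations; join-subwords is a hypothetical helper, not part of the Bigloo Text API, and the input list is copied from the example above rather than computed.

```scheme
;; Hypothetical helper: joins subwords, inserting separator between them.
(define (join-subwords subwords separator)
   (if (null? subwords)
       ""
       (let loop ((rest (cdr subwords)) (acc (car subwords)))
          (if (null? rest)
              acc
              (loop (cdr rest)
                    (string-append acc separator (car rest)))))))

(join-subwords '("soft" "ware") "-")  ; => "soft-ware"
```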

load-hyphens obj -- Bigloo Text function
Loads a hyphens table and returns a data structure suitable for hyphenate. The argument obj can be either the name of a file containing a hyphens table or a symbol denoting a predefined hyphens table. Currently, Bigloo supports two predefined tables: en for English and fr for French. The procedure load-hyphens invokes make-hyphens to build the hyphens table.

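(define (hyphenate-text text lang)
   (let ((table (with-handler
                   (lambda (e)
                      ;; a missing table file is tolerated;
                      ;; any other error is re-raised
                      (unless (&io-file-not-found-error? e)
                         (raise e)))
                   (load-hyphens lang)))
         (words (string-split text " ")))
      (if table
          (append-map (lambda (w) (hyphenate w table)) words)
          ;; no table was loaded: leave the words un-hyphenated
          words)))
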
The procedure hyphenate-text hyphenates the words of the text according to the rules for the language denoted by its code lang if there is a file lang-hyphens.sch. If there is no such file, the text remains un-hyphenated.

make-hyphens [:language] [:exceptions] [:patterns] -- Bigloo Text function
Creates a hyphens table from the arguments exceptions and patterns.

The implementation of the table of hyphens created by make-hyphens follows closely Frank Liang's algorithm as published in his doctoral dissertation Word Hy-phen-a-tion By Com-pu-ter available on the TeX Users Group site here: http://www.tug.org/docs/liang/. This table is a trie (see http://en.wikipedia.org/wiki/Trie for a definition and an explanation).

Most of this implementation is borrowed from Phil Bewig's work, available at http://sites.google.com/site/schemephil/ along with his paper describing the original program.

exceptions must be a non-empty list of explicitly hyphenated words.

Explicitly hyphenated words are like the following: "as-so-ciate", "as-so-ciates", "dec-li-na-tion", where the hyphens indicate the places where hyphenation is allowed. The words in exceptions are used to generate hyphenation patterns, which are added to patterns (see next paragraph).

patterns must be a non-empty list of hyphenation patterns.

Hyphenation patterns are strings of the form ".anti5s", where a period denotes the beginning or the end of a word, an odd number denotes a place where hyphenation is allowed, and an even number a place where hyphenation is forbidden. This notation is part of Frank Liang's algorithm created for Donald Knuth's TeX typographic system.
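
The pattern notation can be illustrated with a small, self-contained sketch. It is not the Bigloo implementation: it scans the word linearly instead of using a trie, and it ignores the minimum prefix and suffix lengths that TeX enforces. The names parse-pattern, pattern-levels, and split-word are hypothetical, and the pattern "i6s" is made up solely to show an even number overriding an odd one.

```scheme
;; Splits a pattern into its letters and its levels. The result is
;; (letters . levels), where levels has one more element than letters:
;; element k is the level just before letter k.
(define (parse-pattern pattern)
   (let loop ((chars (string->list pattern))
              (letters '())
              (levels '())
              (pending 0))
      (cond
         ((null? chars)
          (cons (list->string (reverse letters))
                (reverse (cons pending levels))))
         ((char-numeric? (car chars))
          (loop (cdr chars) letters levels
                (- (char->integer (car chars)) (char->integer #\0))))
         (else
          (loop (cdr chars)
                (cons (car chars) letters)
                (cons pending levels)
                0)))))

;; Overlays every matching pattern on ".word." and keeps the maximum
;; level at each position.
(define (pattern-levels word patterns)
   (let* ((dotted (string-append "." word "."))
          (n (string-length dotted))
          (levels (make-vector (+ n 1) 0)))
      (for-each
         (lambda (pattern)
            (let* ((parsed (parse-pattern pattern))
                   (letters (car parsed))
                   (len (string-length letters)))
               (let loop ((start 0))
                  (when (<= (+ start len) n)
                     (when (string=? (substring dotted start (+ start len))
                                     letters)
                        (let fill ((i 0) (ls (cdr parsed)))
                           (unless (null? ls)
                              (vector-set! levels (+ start i)
                                 (max (vector-ref levels (+ start i))
                                      (car ls)))
                              (fill (+ i 1) (cdr ls)))))
                     (loop (+ start 1))))))
         patterns)
      levels))

;; Breaks the word wherever the resulting level is odd.
(define (split-word word patterns)
   (let ((levels (pattern-levels word patterns))
         (n (string-length word)))
      (let loop ((i 1) (start 0) (parts '()))
         (cond
            ((>= i n)
             (reverse (cons (substring word start n) parts)))
            ((odd? (vector-ref levels (+ i 1)))
             (loop (+ i 1) i (cons (substring word start i) parts)))
            (else
             (loop (+ i 1) start parts))))))

(parse-pattern ".anti5s")                  ; => (".antis" 0 0 0 0 0 5 0)
(split-word "antisocial" '(".anti5s"))      ; => ("anti" "social")
(split-word "antisocial" '(".anti5s" "i6s")) ; => ("antisocial")
```

In the last call the even level 6 is higher than the odd level 5 at the same position, so the break is forbidden.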

21.3 Character encodings

gb2312->ucs2 string -- Bigloo Text function
Converts a GB2312 (aka cp936) encoded 8-bit string into a UCS-2 string.

Last update Thu Sep 3 08:07:38 2015.