Text

This chapter describes the Bigloo API for processing texts.

BibTeX

bibtex obj    bigloo text function

bibtex-port input-port    bigloo text function

bibtex-file file-name    bigloo text function

bibtex-string string    bigloo text function

These functions parse BibTeX sources and return a list of BibTeX entries. The argument obj of bibtex can be either an input port or a string denoting a file name.

The functions bibtex-port, bibtex-file, and bibtex-string are mere wrappers that invoke bibtex.

Example:
(bibtex (open-input-string "@book{ as:sicp,
  author    = {Abelson, H. and Sussman, G.},
  title     = {Structure and Interpretation of Computer Programs},
  year      = 1985,
  publisher = {MIT Press},
  address   = {Cambridge, Mass., USA},
}"))
   ⇒ (("as:sicp" BOOK
        (author ("Abelson" "H.") ("Sussman" "G."))
        (title . "Structure and Interpretation of Computer Programs")
        (year . "1985")
        (publisher . "MIT Press")
        (address . "Cambridge, Mass., USA")))
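The entries returned by bibtex are plain lists, so ordinary list operations are enough to query them. The following is a minimal sketch, assuming the entry shape shown in the example above (key, type, then field/value pairs keyed by lower-case symbols) and assuming the text library is linked with a (library text) module clause; the module name and the helpers bibtex-entry-field and bibtex-titles are hypothetical.

```scheme
;; Hypothetical demo module; assumes the Bigloo text library is
;; installed and linked through the (library text) clause.
(module bibtex-demo
   (library text))

;; Return the value of field KEY in a parsed entry, or #f if absent.
;; An entry is assumed to look like (key TYPE (field . value) ...).
(define (bibtex-entry-field entry key)
   (let ((cell (assq key (cddr entry))))
      (and cell (cdr cell))))

;; Collect, in order, the titles of all entries that have one.
(define (bibtex-titles entries)
   (let loop ((es entries) (acc '()))
      (if (null? es)
          (reverse acc)
          (let ((t (bibtex-entry-field (car es) 'title)))
             (loop (cdr es) (if t (cons t acc) acc))))))

;; Parse the same entry as in the example above.
(define sicp-entries
   (bibtex (open-input-string "@book{ as:sicp,
  author    = {Abelson, H. and Sussman, G.},
  title     = {Structure and Interpretation of Computer Programs},
  year      = 1985,
  publisher = {MIT Press},
  address   = {Cambridge, Mass., USA},
}")))
```

Given the documented output, (bibtex-titles sicp-entries) should evaluate to ("Structure and Interpretation of Computer Programs"), and (bibtex-entry-field (car sicp-entries) 'author) to the parsed author list (("Abelson" "H.") ("Sussman" "G.")).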

bibtex-parse-authors string    bigloo text function

This function parses the author field of a BibTeX entry. It returns one list per author, with the last name first, as shown below.

Example:
(bibtex-parse-authors "Abelson, H. and Sussman, G.")
   ⇒ (("Abelson" "H.") ("Sussman" "G."))
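The parsed author lists can be turned back into display names. A minimal sketch, assuming each author comes back as (last first ...) as in the example above; author->string and format-authors are hypothetical helpers, and the module clause assumes the text library is linked as (library text).

```scheme
;; Hypothetical demo module linking the text library.
(module authors-demo
   (library text))

;; Turn one parsed author, assumed to be ("Last" "First" ...),
;; into a "First Last" string; a lone name is returned unchanged.
(define (author->string author)
   (if (null? (cdr author))
       (car author)
       (string-append (cadr author) " " (car author))))

;; Parse an author field and format every name in it.
(define (format-authors field)
   (map author->string (bibtex-parse-authors field)))
```

With the documented parse, (format-authors "Abelson, H. and Sussman, G.") should return ("H. Abelson" "G. Sussman").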

Character strings

hyphenate word hyphens    bigloo text function

The function hyphenate accepts as input a single word and returns as output a list of subwords. The argument hyphens is an opaque data structure obtained by calling the function load-hyphens or make-hyphens.

Example:
(hyphenate "software" (load-hyphens 'en))  ⇒ ("soft" "ware")
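Because hyphenate returns the subwords themselves, the allowed break points can be made visible by joining them again with a separator. A sketch; show-breaks is a hypothetical helper, and it assumes the English table loads as in the example above.

```scheme
;; Hypothetical demo module linking the text library.
(module hyphen-demo
   (library text))

;; Join the subwords returned by hyphenate with SEP so that the
;; allowed break points become visible, e.g. "soft-ware".
(define (show-breaks word table sep)
   (let ((parts (hyphenate word table)))
      (if (null? parts)
          word
          (let loop ((rest (cdr parts)) (acc (car parts)))
             (if (null? rest)
                 acc
                 (loop (cdr rest) (string-append acc sep (car rest))))))))
```

With the documented result, (show-breaks "software" (load-hyphens 'en) "-") should return "soft-ware". Passing a soft hyphen (for instance the HTML entity "&shy;") instead of "-" yields text that a browser may break at those points.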

load-hyphens obj    bigloo text function

Loads a hyphens table and returns a data structure suitable for hyphenate. The argument obj can be either the name of a file containing a hyphens table or a symbol denoting a predefined hyphens table. Currently, Bigloo supports two tables: en for English and fr for French. The procedure load-hyphens invokes make-hyphens to build the hyphens table.
Example:
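(define (hyphenate-text text lang)
   (let ((table (with-handler
                   ;; return #f when no hyphens file exists for lang,
                   ;; and propagate any other error
                   (lambda (e)
                      (if (&io-file-not-found-error? e)
                          #f
                          (raise e)))
                   (load-hyphens lang)))
         (words (string-split text " ")))
      (if table
          ;; hyphenate each word and concatenate the subword lists
          (append-map (lambda (w) (hyphenate w table)) words)
          words)))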
The procedure hyphenate-text hyphenates the words of text according to the rules of the language whose code is lang, provided there is a file lang-hyphens.sch, and returns the resulting list of subwords. If there is no such file, it returns the words of text un-hyphenated.

make-hyphens [:language] [:exceptions] [:patterns]    bigloo text function

Creates a hyphens table out of the arguments exceptions and patterns.

The table of hyphens created by make-hyphens closely follows Frank Liang's algorithm, as published in his doctoral dissertation Word Hy-phen-a-tion By Com-pu-ter, available on the TeX Users Group site: http://www.tug.org/docs/liang/. The table is a trie (see http://en.wikipedia.org/wiki/Trie for a definition and an explanation).

Most of this implementation is borrowed from Phil Bewig's work, available at http://sites.google.com/site/schemephil/ together with his paper describing the program.

exceptions must be a non-empty list of explicitly hyphenated words.

Explicitly hyphenated words look like the following: "as-so-ciate", "as-so-ciates", "dec-li-na-tion", where the hyphens indicate the places where hyphenation is allowed. The words in exceptions are used to generate hyphenation patterns, which are added to patterns (see next paragraph).
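The hyphen notation is easy to decode: each hyphen records how many letters precede an allowed break. The sketch below only illustrates that reading of the notation; decode-hyphenated is a hypothetical helper and is not the conversion make-hyphens performs internally.

```scheme
;; Decode an explicitly hyphenated word such as "as-so-ciate" into
;; (word breaks): the word without hyphens, and the list of break
;; positions, each being the number of letters before a hyphen.
(define (decode-hyphenated hw)
   (let loop ((i 0) (chars '()) (count 0) (breaks '()))
      (if (= i (string-length hw))
          (list (list->string (reverse chars)) (reverse breaks))
          (let ((c (string-ref hw i)))
             (if (char=? c #\-)
                 ;; a hyphen: remember how many letters came before it
                 (loop (+ i 1) chars count (cons count breaks))
                 ;; a letter: keep it and count it
                 (loop (+ i 1) (cons c chars) (+ count 1) breaks))))))
```

For instance, (decode-hyphenated "as-so-ciate") returns ("associate" (2 4)) and (decode-hyphenated "dec-li-na-tion") returns ("declination" (3 5 7)).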

patterns must be a non-empty list of hyphenation patterns.

Hyphenation patterns are strings of the form ".anti5s", where a period denotes the beginning or the end of a word, an odd digit denotes a place where hyphenation is allowed, and an even digit a place where hyphenation is forbidden. When several patterns match at the same place, the highest digit wins. This notation is part of the algorithm Frank Liang created for Donald Knuth's TeX typesetting system.
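The way Liang's patterns combine can be sketched in a few lines of plain Scheme. This is a simplified illustration, not the Bigloo implementation: it scans the word against each pattern directly instead of using a trie, and it omits the minimum left and right fragment lengths that TeX normally enforces. parse-pattern and liang-hyphenate are hypothetical names, and the word is assumed to be in lower case.

```scheme
;; Parse a pattern such as ".anti5s" into (letters . digits): the
;; letters (".antis") and a vector whose slot j holds the digit found
;; before letter j (slot (string-length letters) is the digit after
;; the last letter). Missing digits are 0.
(define (parse-pattern pat)
   (let* ((n (string-length pat))
          (digits (make-vector (+ n 1) 0)))
      (let loop ((i 0) (letters '()) (k 0))
         (if (= i n)
             (cons (list->string (reverse letters)) digits)
             (let ((c (string-ref pat i)))
                (if (char-numeric? c)
                    (begin
                       (vector-set! digits k
                          (- (char->integer c) (char->integer #\0)))
                       (loop (+ i 1) letters k))
                    (loop (+ i 1) (cons c letters) (+ k 1))))))))

;; Split WORD into subwords using PATTERNS. Every match raises the
;; level of the gaps it covers to its digit (the highest digit wins);
;; gaps whose final level is odd become break points.
(define (liang-hyphenate word patterns)
   (let* ((padded (string-append "." word "."))
          (plen (string-length padded))
          (levels (make-vector (+ plen 1) 0)))
      (for-each
       (lambda (pat)
          (let* ((parsed (parse-pattern pat))
                 (letters (car parsed))
                 (digits (cdr parsed))
                 (llen (string-length letters)))
             ;; try the pattern at every start position of PADDED
             (do ((s 0 (+ s 1)))
                 ((> (+ s llen) plen))
                 (when (string=? (substring padded s (+ s llen)) letters)
                    (do ((j 0 (+ j 1)))
                        ((> j llen))
                        (vector-set! levels (+ s j)
                           (max (vector-ref levels (+ s j))
                                (vector-ref digits j))))))))
       patterns)
      ;; the gap before letter i of WORD is gap (+ i 1) of PADDED
      (let loop ((i 1) (start 0) (parts '()))
         (cond
            ((>= i (string-length word))
             (reverse (cons (substring word start (string-length word)) parts)))
            ((odd? (vector-ref levels (+ i 1)))
             (loop (+ i 1) i (cons (substring word start i) parts)))
            (else
             (loop (+ i 1) start parts))))))
```

For example, (liang-hyphenate "antiseptic" '(".anti5s")) returns ("anti" "septic"), and (liang-hyphenate "software" '("t1w")) returns ("soft" "ware"). With the patterns '("t1w" "t2w"), "software" stays whole: both patterns cover the gap between t and w, the higher digit 2 wins, and an even level forbids the break.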

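Putting the two arguments together, a small table can be built by hand and passed to hyphenate. A sketch, assuming the keyword arguments are passed as :exceptions and :patterns following the signature above; the table contents and the names tiny-table and rejoin are made up for illustration, and the exact break points the table produces are not claimed here.

```scheme
;; Hypothetical demo module linking the text library.
(module make-hyphens-demo
   (library text))

;; A tiny hand-made table: one explicitly hyphenated exception and
;; two patterns (both lists must be non-empty).
(define tiny-table
   (make-hyphens :exceptions '("as-so-ciate")
                 :patterns '(".anti5s" "t1w")))

;; Concatenate subwords; whatever breaks the table allows, the
;; subwords returned by hyphenate are expected to rejoin into the word.
(define (rejoin parts)
   (apply string-append parts))
```

Calling (hyphenate "associate" tiny-table) then returns a list of subwords split at the places the exception allows.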

Character encodings

gb2312->ucs2 string    bigloo text function

Converts an 8-bit string encoded in GB2312 (aka cp936) into a UCS-2 string.
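A short conversion example. It assumes that U+4E2D is encoded in GB2312 as the two bytes #xD6 #xD0, and it inspects the result with Bigloo's ucs2-string-length, ucs2-string-ref, and ucs2->integer; the module and variable names are hypothetical.

```scheme
;; Hypothetical demo module linking the text library.
(module gb2312-demo
   (library text))

;; The GB2312 encoding of U+4E2D (bytes #xD6 #xD0), stored in an
;; ordinary 8-bit Bigloo string.
(define gb-bytes
   (string (integer->char #xd6) (integer->char #xd0)))

;; Convert it; the result is a UCS-2 string.
(define ucs (gb2312->ucs2 gb-bytes))
```

If the conversion follows the GB2312 table, (ucs2-string-length ucs) is 1 and (ucs2->integer (ucs2-string-ref ucs 0)) is #x4E2D (20013).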