Tralics, a LaTeX to XML translator; Part I

4. Translating a bibliography

4.1. Introduction

As said in [6], “citations are cross-references to bibliographical information outside the current document, such as to publications containing further information on a subject and source information about used quotations. [...] There are numerous ways to compile bibliographies and reference lists. They can be prepared manually, if necessary, but usually they are automatically generated from a database containing bibliographic information.”

There are different ways to cite an author, or a text or a specific part of a text. The easiest way (for an automated system) is to use numbers, as above; if you are reading an interactive version of this document, you can click on the number, and you will see the entry in the bibliography, at the end of the document (between the index and the table of contents). This is standard practice; recommendations for a book series say: References are cited in the text simply as numbers in square brackets, e.g. [165], do not use the abbreviations “Ref./Refs/” in the middle of a sentence. Only at the beginning of a sentence should you write “Reference [165]”. In some cases, you can see `[17, p23]´, as the result of `\cite[p23]{foo}´; this means page 23 of the reference numbered 17. A bit more sophisticated are references like `[GMS93]´ instead of `[2]´ for a book by Goossens, Mittelbach and Samarin published in 1993. Computing the key is not obvious, because, if you cite a book by, say, Goethe, Molière and Shakespeare in 1793, the key will be the same, and a post-processor has to add a suffix (typically, this is done by a couple of routines named forward.pass and reverse.pass in a bst file). Sometimes, a more explicit scheme is used, for instance `Knuth, The Art Of ..., Algorithm P´, in the text, and the full reference can be found in the bibliography. A text of R. Ridolfi can be cited as `Vita di Girolamo Savonarola, 5e éd, Florence, 1974, t. II., p. 182-183´. Note that the name of the author appears before the citation, and is not repeated inside it. In some books, citations are given as footnotes, and you can often see `ibid.´, meaning the previously cited text. These kinds of things are generally hard to fully automate. For this reason, only a simple scheme is provided by Tralics: a link to a bibliography section via a key.

The problem is essentially the following: the LaTeX source file contains a given number of citations, introduced by the \cite command or a variant. Each command defines one or more references. For each reference, a key has to be computed and typeset, an item added to the bibliography, and a link created. In LaTeX, the document generally has to be processed three times; the first run will print \citation{companion2} in the auxiliary file. This file is processed by BibTeX, which generates a bbl file containing \bibitem{companion2}. On the second run, the bibliography is typeset, and the key is constructed; if it is 6, then \bibcite{companion2}{6} will be printed in the auxiliary file. On the last run, we know, after reading the auxiliary file, that the \cite command should typeset as 6.

The mechanism in Tralics is a bit different: there is only one run. Each \cite command produces a <cit> element, plus an entry into a biblist. At the end of the document, the bibliography is constructed, with all the necessary entries; details will be given later. This gives the equivalent of a bbl file, which is then translated. The result of the translation is some XML element, that will be inserted somewhere in the main XML tree. Finally, a check is made to see if all references are defined. The mechanism is much simpler than in LaTeX; this is really because, in Tralics, you can add an element or an attribute anywhere in the tree (at the start if you like) at any moment. In TeX, on the contrary, once a paragraph is typeset, you cannot modify it, and once a page is shipped out, you cannot modify the whatsits associated to it (page numbers, in the case of \label and \ref, are computed only when the page is shipped out; they are left in a \write, which is a special kind of whatsit).

There have been some attempts to design an XML format for bibliography databases; none of them is really satisfactory. We give an example of an entry using the DocBook syntax:

  <biblioentry id="abc123" type="book">
    <title>Understanding SGML and XML Tools</title>
    <titleabbrev>SGML &amp; XML Tools</titleabbrev>
    <date YYYY-MM-DD="1998">1998</date>

This should be referenced in the text as:

<citation><biblioref linkend="abc123"/></citation>

This is the same using the TEI syntax:

  <biblFull id="abc123" rend="book">
      <title>Understanding SGML and XML Tools</title>
      <publisher>Kluwer Academic Publishers</publisher>
      <idno type="isbn">0-7923-8169-6</idno>
      <date value="1998">1998</date>

This should be referenced in the text as:

<cit><ref target="abc123"></ref></cit>

These two citations were found on the Web. The careful reader may notice that two elements are used for the citation (in the DocBook case, they are <citation> and <biblioref>, in the TEI case, they are <cit> and <ref>). Tralics uses the TEI syntax for the citation but a completely different one for the entries (the syntax is very near to BibTeX). We shall explain, in the second part of this document, Chapter 6, how to convert the Tralics DTD into the TEI DTD (at least for the bibliography). The transformation is incomplete: in BibTeX, a name has four components, and the example shows only two, surname and forName (or firstname). A non-trivial question concerns mathematics: how can we insert math formulas like H, and what about special words like “the TeXbook”? The main reason why Tralics does not read databases written in XML is the need for an XML parser (we have written a BibTeX parser, this is more challenging).

The interaction between the main document and the bibliography goes through the `cite key´ at the LaTeX level, through the Bid attribute in the XML document, and through the `print key´ in the typeset document. As an example, we shall consider a bbl file, created by Tralics, that contains

    \citation{60}{footcite:thesefabien}{bid9}{foot}{phdthesis}[Sey98] ...

This is a temporary object: the cite key is `thesefabien´, the Bid is `bid9´, and there are two choices for the print key, `60´ or `Sey98´. The XML translation is

    <citation from='foot' key='60' id='bid9' userid='footcite:thesefabien'
        type='phdthesis'>  ...

As you can see, the effective print key is `60´. We shall explain all details in due time. Let's start with the cite key, the only quantity that the author can choose freely. For the references from the web, this key is `abc123´; this is clearly a randomly chosen value, not mnemonic at all. At the start of the chapter, we have shown a reference with key `companion2´; this is the cite key for the second version of the LaTeX companion. The cite key `thesefabien´ is for the Ph.D. thesis of F. Seyfert. There is no constraint on the cite key for LaTeX: the only important thing is that the key can be printed in the auxiliary file and read by BibTeX (some years ago, a colleague corrected \cite{Christele} to \cite{Christèle}, and this gave an awful error; in current LaTeX, there seems to be no problem). On the other hand, BibTeX needs an identifier. This is a character string that does not start with a digit, and contains anything but space, tabulation, double quote, percent sign, sharp sign, backslash, comma, equals sign, braces, parentheses. For XML, there are additional constraints for an ID: it has to be unique for the whole document, and some characters like a plus sign are forbidden. In a first version, we intended to use the `userid´: this is formed of a prefix (of the form `cite:´ or `footcite:´, thus making it unique), followed by the cite key, where forbidden characters like the plus sign were replaced by a minus sign. However, we found an example where a rather long key differed from another one only by a forbidden character, so that the replacement introduced a conflict. For this reason, we added the Bid: this is automatically generated, hence is clearly unique and valid. A special feature of BibTeX is that it does not create lines longer than 78 characters. It adds percent characters in a sensible position; in some cases, the choice is wrong. Here is an example:

c{D.~Bergamini, D.~Champelovier, N.~Descoubes, H.~Garavel, R.~Mateescu,

As a result, you will get an error: Undefined control sequence \RAs. Note that there are few people who use such very long cite keys. A simple idea that works most of the time: use 4 letters for the first author, three letters for the others, two letters for the year, for instance `Bara-Chy-Pom02´.
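The two remarks above (replacing forbidden characters can make two keys collide, and a short mnemonic key avoids most trouble) can be illustrated by a small Python sketch. The function names, the exact set of characters treated as forbidden, and the author names are assumptions made for the example; this is not the code used by Tralics.

```python
import re

def sanitize_for_xml_id(prefix, cite_key):
    """Build a userid-like string: prefix + key, forbidden characters -> '-'.

    Hypothetical helper; the set of allowed characters is an assumption.
    """
    return prefix + re.sub(r"[^A-Za-z0-9_.:-]", "-", cite_key)

def mnemonic_key(surnames, year):
    """4 letters for the first author, 3 for the others, 2 digits for the year."""
    parts = [surnames[0][:4]] + [name[:3] for name in surnames[1:]]
    return "-".join(parts) + f"{year % 100:02d}"

# Two distinct cite keys that differ only by a forbidden character collide.
a = sanitize_for_xml_id("cite:", "long-key+v2")
b = sanitize_for_xml_id("cite:", "long-key-v2")
collision = (a == b)

# Invented author names, chosen to reproduce the shape of the key in the text.
key = mnemonic_key(["Baratin", "Chyba", "Pomme"], 2002)
```

An automatically generated identifier such as the Bid avoids the collision, because it does not depend on the characters of the cite key at all.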

As explained, Tralics cannot use an XML database; instead it will use a bbl file (this is some LaTeX file, that will be translated by Tralics). The bbl can be part of the source document; in general it will be automatically constructed by Tralics (in the current version, BibTeX or any other external program can be used instead). This bbl file should contain, for each unsolved citation, a command that solves it (either \citation which is a Tralics command, or \bibitem which is a standard LaTeX command, see section 4.2).

One question is: can the bbl contain other items, together with these \bibitem commands? If the bibliography is very long, it can be interesting to divide it into subsections, and add a comment at the start of each section; this is easy to do, if the bbl is not produced by BibTeX, or if you edit it, and if you know how to convince LaTeX not to start the bibliography with a \bibitem. In general, we have a unique `\begin{thebibliography}´ at the start, and a `\end{thebibliography}´ at the end. The effect is to produce a chapter (or a section), in general unnumbered, whose name depends on the current language. In the case of the Raweb, BibTeX produces more than one such environment. In fact, three databases are used: `foot´, `refer´ and `year´. Each of the two database files `foot´ and `refer´ produces a set of references (the `foot´ bibliography was originally typeset as footnotes, via the footcite package). The third database produces a sequence of sections, such as theses, books, articles, conferences, reports, etc. Whenever BibTeX sees an entry with a different category than the preceding entry, it prints the \end{thebibliography} followed by a \begin{thebibliography}. Note: the modified environment takes a required argument (as usual, the longest label) and an optional argument (the name of the section title; the title itself being in the class file). As a consequence, the bbl files produced by the Raweb are incompatible with standard LaTeX classes. Since 2001, BibTeX has no longer been used for the Raweb, and the XML result contains just a sequence of references. However, each entry has a category (this depends on the from and type attributes), and entries are sorted by category. The style sheets that convert the XML to HTML or XSL/Format are assumed to create these sections, one for each category (see part two of this document). A nontrivial question is then to guarantee that these two style sheets use the same splitting algorithm, and the same section titles.

The `print key´ is the value that is printed on the paper or displayed on the screen. Each <citation> has a key attribute that can be used as print key. However, an XML processor may as well ignore it, and use numbers 1, 2, 3, etc. It can even sort the entries before assigning them a number (see part two of this document). In some cases, Tralics computes a symbolic key of the form `Sey98´. If the post-processor sorts the entries, and if the keys are not in alphabetic order, this is a bad idea.

The `key´ of an entry is a quantity defined in the database, whose purpose is to help sorting. In most cases, it is empty (in some cases the values are junk); this value is used only in the case where no author is given (this is standard BibTeX practice, it means that this is rather useless). The `sort key´ of an entry is the character string used for sorting (this is lost; Tralics could insert it in the resulting XML; this would allow one to merge two bibliographies). In some cases, the print key is part of the sort key. Imagine for instance a book by Samarin, Mittelbach and Goossens, written in 1993. The standard key would be GMS93. Assume however that the authors are taken in the given order, so that the key would be `SMG93´. Alphabetically, this is after `Sey98´, but if we sort by authors, Samarin comes before Seyfert.
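The difference between sorting on the print key and sorting on the authors can be checked with a few lines of Python. The sort key used here (lower-cased surnames followed by the year) is only an assumption made for the sketch; the actual sort key computed by a bibliography style is more elaborate.

```python
entries = [
    {"print_key": "SMG93",
     "authors": ["Samarin", "Mittelbach", "Goossens"], "year": 1993},
    {"print_key": "Sey98", "authors": ["Seyfert"], "year": 1998},
]

def author_sort_key(entry):
    # Hypothetical sort key: lower-cased surnames in the given order, then year.
    return (" ".join(name.lower() for name in entry["authors"]), entry["year"])

# Alphabetic (case-insensitive) order of the print keys.
by_print_key = [e["print_key"]
                for e in sorted(entries, key=lambda e: e["print_key"].lower())]

# Order obtained when sorting on the authors.
by_authors = [e["print_key"] for e in sorted(entries, key=author_sort_key)]
```

The two lists come out in opposite orders, which is why a post-processor that sorts the entries again should not assume that the symbolic keys will appear in alphabetic order.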

4.2. Citing a document

In this paragraph, we shall explain the commands that can be put in the source document for inserting a citation, and the companion commands that solve the reference. When the \end{document} command is about to be translated, Tralics will have created a big list (maybe empty) called the `biblist´. Each item in the list has four slots: Reference, Rtype, Bid and Definition. Here Reference is the cite key, Rtype is a subtype (when merged, these two quantities give the `userid´; this subtype is not standard LaTeX, you can ignore it. In some cases, two items with the same Reference and different Rtype are considered unequal, in some cases they are considered equal; thus, it is a bad idea to use the same cite key with different subtypes). The Bid is the unique id of the target, of the form `bid17´, and Definition is the internal number of the target of the reference (in Tralics, each XML element has an internal number). You can say: element number 25 is the target of reference `foo´ (syntax described later). This will solve the entry: if the entry with key foo has Bid `bid17´, the action is to mark the entry as solved, and to add id='bid17' to the element number 25. When the end of the document is sensed, the list of unsolved entries is computed, and a request is made for constructing a bbl. A warning or an error is signaled for missing items by this construction. This bbl is then translated. It is forbidden to add unresolved entries to the list. In BibTeX, there is a cross-reference mechanism: if X has a cross reference to Y, then X must come before Y; when Y is read, its fields are used to fill missing fields in X. Unless cited explicitly, Y will not appear in the bibliography.

The variable distinguish_refer_in_rabib was introduced in 2006. Since this is a long name, we shall abbreviate it to DRY. If it is true, we distinguish `year´ and `refer´, otherwise there is no distinction. By default the flag is true; you can set it on the command line, or in a configuration file. For the case of the Raweb, three Rtypes are defined, `foot´, `year´ and `refer´. There is one command, \footcite, to cite elements with Rtype `foot´ and a command, \cite, for anything else. We generalized this mechanism: for all commands described here, there is no difference between `year´ and an empty Rtype. If DRY is false, then `refer´ is the same as `year´. In 2006, the commands \yearcite and \refercite were introduced. If DRY is false, these two commands behave the same.

The translation of `\footcite {Knuth}´ or `\footcite [p.25] {Knuth}´ is the same as `\cite [foot] [] {Knuth}´ or `\cite [foot] [p.25] {Knuth}´. The translation of `\yearcite {Knuth}´ or `\refercite {Knuth}´ is the same as `\cite [year] [] {Knuth}´ or `\cite [refer] [] {Knuth}´. These commands have an optional argument. The \cite command has two optional arguments, a type and an optional value. If only one optional argument is given, it is the value (so that `\cite [p.25] {Knuth}´ has the same meaning as in LaTeX). The translation of `\cite [x] [y] {z}´ is the same as `\cite@one {x} {z} {y}´ (note the order of the arguments). However, if you say `\cite [p.25] {Knuth,Lamport}´, the result is the same as `\cite@one {} {Knuth} {p.25}´, followed by `\cite@one {} {Lamport} {}´; in other words, the second optional argument applies only to the first citation. Between two \cite@one commands (that come from the same \cite), some \citepunct tokens are inserted. This is a command that can be redefined by the user. Its expansion is a comma followed by a space.
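The rules above can be summarized by a short Python sketch that returns, as strings, the sequence of commands a \cite is said to be equivalent to. The function `expand_cite` and its argument names are hypothetical; the sketch only mirrors the description given here and does not reproduce the Tralics implementation.

```python
def expand_cite(keys, first_opt=None, second_opt=None):
    r"""Return the commands equivalent to \cite [first_opt] [second_opt] {keys}.

    With a single optional argument, it is the value (the note); with two,
    the first is the type and the second is the value.
    """
    if second_opt is None:
        rtype, note = "", (first_opt or "")
    else:
        rtype, note = (first_opt or ""), second_opt
    out = []
    for i, key in enumerate(k.strip() for k in keys.split(",")):
        if i > 0:
            out.append(r"\citepunct")        # separator between two citations
        # The note applies to the first citation only.
        out.append(r"\cite@one {%s} {%s} {%s}" % (rtype, key, note if i == 0 else ""))
    return out

two_keys = expand_cite("Knuth,Lamport", "p.25")
footcite = expand_cite("Knuth", "foot", "p.25")   # same as \footcite [p.25] {Knuth}
```

Here `two_keys` contains the two \cite@one commands separated by one \citepunct, and only the first of them carries the note.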

People generally say `Text\footcite{blah}´, like `Text\footnote{blah}´, without any space, because \footcite is assumed to produce a footnote; but this is not always the case; for this reason, the command \footcitepre is evaluated before insertion of the XML element associated to the citation. The default behavior is the following: if the last object on the XML tree is a normal or non-breaking space, nothing happens; otherwise, if the object is not an opening parenthesis, a space will be added. Moreover, the \citepunct is replaced by \footcitesep, a command whose translation is comma space (the idea is that you can redefine it, so that `Text\footcite{foo,bar}´ shows as `Text\textsuperscript{13,15}´, exercise left to the reader). This is a slight difference between \footcite and \cite with `foot´ as optional argument.
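The default behavior of \footcitepre can be modeled on plain strings as follows. This is a sketch under assumptions: the XML tree is replaced by the text produced so far, the function name is hypothetical, and the case where nothing precedes the citation is handled arbitrarily.

```python
NBSP = "\u00a0"  # non-breaking space

def footcitepre(text_so_far):
    r"""Return the text followed by what the default \footcitepre would add."""
    if not text_so_far:
        return text_so_far            # assumption: nothing precedes the citation
    last = text_so_far[-1]
    if last in (" ", NBSP):
        return text_so_far            # already a space: nothing happens
    if last == "(":
        return text_so_far            # no space after an opening parenthesis
    return text_so_far + " "          # otherwise a space is added

after_word = footcitepre("Text")
after_paren = footcitepre("Text (")
```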

The command \nocite can take one optional argument (an Rtype). The effect of \nocite{foo} is the same as \cite, regarding the biblist, but it does not modify the XML tree. If you say \nocite{*}, this inserts a special marker, meaning: the whole database should be inserted. The Rtype is ignored in this case. Note that the correct behavior should be: the Rtype is ignored only if it is one of `year´, `refer´ or `foot´.

In order to implement the natbib package, we make the following assumptions. The primitive command is \cite@one; it takes a single reference (defined by a Reference and a Rtype), inserts when needed a new item in the biblist, and constructs a Bid for the reference. The command calls \leavevmode, for the case where it appears at the start of a paragraph (remember the recommendations given above: a paragraph should start with a word, not a reference). The result of the translation is <ref target='bid17'/>, where `bid17´ should be replaced by the value of the Bid. This element can be non-empty (it contains a note), and is the child of a <cit> element, that has some attributes. The LaTeX companion, example 12-3-5, says that \citet {LGC97} should produce `Goossens et al. (1997)´. The translation by Tralics does not contain the name nor the year, so that there should be an attribute that says how parentheses are to be inserted in the final HTML or PDF document. Another example is \citep [see] [chap. 2] {LGC97}; this produces `(see Goossens et al., 1997, chap. 2)´. This does not really fit in our model: we can put the post-note in the <ref> element, and the pre-note as an attribute. This makes these two quantities asymmetric: the pre-note must contain only characters. Consider now example 12-3-15, \citet [cf.] [p. 55] {vLeunen:92, Knuth-CT-a}. Here the pre-note is added to each citation, the post-note to the last one (the default is to put the single note on the first element). The result is `van Leunen (cf. 92); Knuth (cf. 1986, p.55)´. What Tralics should do in such a case is unclear. The file natbib.plt defines \citeyear and \citeyearpar as follows


The idea is to call \cite, the dispatcher function, and to put locally in \cite@@type the type of the citation (year, or parenthesized year). There is also \cite@prenote for the prenote. To be precise: the translation of \cite@one {bar} {foo} {p25} is <cit rend='bar' type='mtype' prenote='mynote'><ref target='bid17'/> p25</cit>, where `mtype´ is the value of \cite@@type, and `mynote´ is the value of \cite@prenote. Arguments `foo´ and `bar´ define the reference (normally, the Rtype `bar´ is empty).
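As an illustration, the element described above can be built with the Python standard library. The function `cite_one` is a hypothetical stand-in for \cite@one: it receives the Bid directly instead of looking it up in the biblist, and it places the post-note after the <ref>, as in the translation shown above.

```python
import xml.etree.ElementTree as ET

def cite_one(rtype, bid, postnote="", cite_type="", prenote=""):
    """Build a <cit> element with a <ref> child pointing to the given Bid."""
    cit = ET.Element("cit", {"rend": rtype, "type": cite_type, "prenote": prenote})
    ref = ET.SubElement(cit, "ref", {"target": bid})
    if postnote:
        ref.tail = " " + postnote      # text that follows <ref/> inside <cit>
    return cit

# Values taken from the example in the text.
cit = cite_one("bar", "bid17", "p25", cite_type="mtype", prenote="mynote")
xml_text = ET.tostring(cit, encoding="unicode")
```

A style sheet can then use the `type` and `prenote` attributes to decide how names, years and parentheses are to be rendered.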

You can say \XMLsolvecite*[25][bar]{foo}. The star is optional, as well as the `25´ and the `bar´. If only one optional argument is given, it is the first one. This should be the identifier of an XML element (you can use \XMLlastid, the identifier of the last created element, or \XMLcurrentid, the identifier of the current element). The current element is used if the argument is missing or empty. In any case, this gives an element, say Target. The second optional argument is the Rtype. The required argument is the cite key. The result of the command is to solve the entry defined by the Reference and the Rtype. The easy case is when the reference has not yet been cited. In this case, we can use as Bid either the id of the Target, if it exists, or a new id. In this case, an attribute pair id='Bid' is added to the Target. If the entry exists in the biblist, it might be already solved, and you get an error of the form Already solved foo. An attribute pair id='Bid' is added to the Target, unless the Target already has an id, in which case an error will be signaled, for instance Cannot solve (element has an Id) foo in the case


The problem here is the following: the section element has a Uid, this is like a Bid, it can be used as target of a \label. The XML norm forbids using two ids for the same element. Maybe in a future version, this will be allowed (it suffices to implement a double indirection mechanism). However, I doubt if this is a good idea: if you say \label{foo}, then \ref{foo} will produce a <ref> element, this is identical to the <ref> that comes from the \cite. Note that the Raweb DTD says: the target of a <ref> in a <cit> should be a <citation>.

If a star is given in \XMLsolvecite, there is a little hack. If Reference/Rtype is not found in the biblist, Tralics tries to see if there is an unsolved entry with the same Reference, Rtype arbitrary. In such a case, this entry will be solved. If there is no such entry, then a new slot is added to the reference list.
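The behavior of the biblist described in this section can be modeled by the following Python sketch. The class and method names are hypothetical, the Bid numbering is an assumption, and the handling of a Target that already has an id is left out; only the four slots, the `Already solved´ error and the starred lookup follow the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    reference: str                     # the cite key
    rtype: str                         # the subtype ('' , 'foot', 'refer', ...)
    bid: str                           # unique id such as 'bid17'
    definition: Optional[int] = None   # number of the target element, None if unsolved

class BibList:
    def __init__(self):
        self.entries = []

    def _find(self, reference, rtype):
        for e in self.entries:
            if e.reference == reference and e.rtype == rtype:
                return e
        return None

    def _new(self, reference, rtype):
        e = Entry(reference, rtype, f"bid{len(self.entries)}")
        self.entries.append(e)
        return e

    def cite(self, reference, rtype=""):
        """Called for each citation: reuse the entry or add an unsolved one."""
        e = self._find(reference, rtype) or self._new(reference, rtype)
        return e.bid

    def solve(self, reference, target, rtype="", star=False):
        """Declare element number `target` as the target of the reference."""
        e = self._find(reference, rtype)
        if e is None and star:
            # Starred form: accept an unsolved entry with the same key, any Rtype.
            e = next((x for x in self.entries
                      if x.reference == reference and x.definition is None), None)
        if e is None:
            e = self._new(reference, rtype)
        elif e.definition is not None:
            raise ValueError(f"Already solved {reference}")
        e.definition = target
        return e.bid

    def unsolved(self):
        return [e.reference for e in self.entries if e.definition is None]

bl = BibList()
bl.cite("foo", "foot")
bl.cite("bar")
bl.solve("foo", 25, star=True)   # solves the 'foot' entry thanks to the star
remaining = bl.unsolved()
```

At the end of the document, the entries returned by `unsolved` are those for which a bbl has to be constructed.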

Some commands may produce strange results. Consider

\setbox0 =\hbox{\XMLsolvecite{foo}} \copy0 \copy0
\setbox1 =\xbox{Box}{\XMLsolvecite{bar}} \copy1 \copy1

This constructs two empty boxes, with an id bid0 and bid1. Since the first box is unnamed, the tag will not appear in the XML tree; and no tag implies no attribute list, so that the first line is an error. On the other hand, the second box is copied twice; hence the id bid1 appears twice in the XML tree, this is also an error (the XML is well-formed, but not valid against any DTD that says that the Bid should be an ID).

You can say \bibitem[XX]{foo}, the result is the same as \par \leavevmode \XMLsolvecite* {foo}. The optional argument is ignored. Note that the \par command terminates the current paragraph, and \leavevmode starts a new paragraph (in LaTeX, \bibitem calls \item that does more or less the same thing). The important point is that this newly created <p> element is the target of the reference. If you feed Tralics with the bbl of this document, produced by LaTeX, you will see something like

<Bibliography><p id='bid0'>
David Carlisle, Michel Goossens, and Sebastian Rahtz.
De XML à PDF avec <hi rend='tt'>xmltex</hi> et Passive<TeX/>.
In <hi rend='it'>Cahiers Gutenberg</hi>, number 35-36, pages 79&ndash;114,
2000. </p>
<p id='bid1'>
Michel Goossens, Frank Mittelbach, and Alexander Samarin.
<hi rend='it'>The <LaTeX/> companion</hi>.
Addison Wesley, 1993.</p>

On the other hand, translation of the second reference is:

<citation from='year' key='GMS93' id='bid4' userid='cite:companion'
<bauteurs><bpers prenom='M.' nom='Goossens' prenomcomplet='Michel'/>
<bpers prenom='F.' nom='Mittelbach' prenomcomplet='Frank'/>
<bpers prenom='A.' nom='Samarin' prenomcomplet='Alexander'/></bauteurs>
<btitle>The <LaTeX/> companion</btitle>
<bpublisher>Addison Wesley</bpublisher>

4.3. Using Tralics instead of BibTeX

The content of the BibTeX database is a sequence of entries of the form

1 @article{example,
2   Author= "Joseph Garrigue and Didier R{\'e}my",
3   Title=   "Extending {ML} with semi-explicit higher-order polymorphism",
4   Number=  "1/2",
5   Volume=  155,
6   Year=    1999,
7   Pages=   "134-169",
8   Journal= "Journal of Functional Programming",
9   Remark=  {a random example},
10   OptMonth = jan,
11   Url=     ""}

This is a second example.

12 @PhdThesis{thesefabien,
13   author =       {Seyfert, Fabien},
14   title =        {Problèmes extrémaux dans les espaces de Hardy,
15     Application à l'identification de filtres hyperfréquences à
16     cavités couplées},
17   school =       {Ecole de Mines de Paris},
18   year =         1998
19 }

These examples are translated by BibTeX as follows

Joseph Garrigue and Didier R{\'e}my.
\newblock Extending {ML} with semi-explicit higher-order polymorphism.
\newblock {\em Journal of Functional Programming}, 155(1/2):134--169, 1999.
Fabien Seyfert.
\newblock {\em Problèmes extrémaux dans les espaces de Hardy, Application à
  l'identification de filtres hyperfréquences à cavités couplées}.
\newblock PhD thesis, Ecole de Mines de Paris, 1998.

After the @ character, there is a keyword, or an entry type. The recognized entry types are article, book, booklet, conference, coursenotes, inbook, incollection, manual, masterthesis, misc, phdthesis, techreport, unpublished, as well as mastersthesis, a synonym of masterthesis. These types are not part of the BibTeX language, but are described in any good book about LaTeX; they are the only ones recognized by Tralics. The case is irrelevant (in one example, we have `article´ in lower case, in the other, we have `PhdThesis´, mixed case). Since Tralics 2.9.1, you can extend the list of known types, by putting a line like the following in the configuration file (this will define the types `hdr´ and `movie´):

bibtex_extensions = "hdr movie"

There are three keywords. The first is `comment´. If you say @comment{foo}, this makes `foo´ a comment. Since everything outside the scope of a keyword or an entry is discarded, there is no real need for a comment keyword, or a comment character. In particular, the percent sign is not a comment character inside a BibTeX file. If you insert a percent sign in a field, you have to remember that BibTeX will replace newline characters by spaces, and insert newline characters in the bbl file wherever it judges adequate. Hence, the percent character will behave, in the bbl, as a comment character with a random scope.

The second keyword is `string´. It defines a string, for instance @string{Foo="bar"} defines the string `foo´ (the case is irrelevant) with value `bar´. In the example, there is a string after the equals sign, but any expression could be used, including one that uses macros. A macro must be defined before its use; it is always possible to redefine the macro. There are 12 predefined macros; these are jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec. You can see a use of `jan´ on line 10. These macros are defined by every bst file, to be `janvier´, `January´ or `Januar´, depending on the language (since there is no way to tell BibTeX what the current language is, there are two solutions: either write frplain.bst, that is a copy of plain.bst, with all keywords translated into French, or use indirection: the value is \bbljan{}, a LaTeX command defined in a style file depending on the current language). In Tralics, these strings are defined at bootstrap, to be English names, and redefined when \begin{document} is seen. This gives you a chance to select the correct language. Only French, English and German are known languages.

The last keyword is `preamble´. If you say @preamble{"foo"}, the effect is to add the string `foo´ to the preamble. More than one preamble keyword can be given, they will be merged, in order. Standard bibliography styles print the preamble at the start of the bbl file, just before the \begin{thebibliography}. In Tralics, the string is inserted at the start of the file, but the environment is implicit. The string should not produce text, otherwise strange errors are signaled, of the form `Error signaled at line 4: Non-empty buffer foo Some text may be lost.´, because only bibliographic entries are allowed in the bbl (you can cheat by changing the current mode via \@setmode). Instead of "foo", a general value can be used, for instance @preamble{jan} puts January in the preamble. Note: instead of braces, you can use parentheses to delimit the value of an entry, a string or the preamble. Inside the value, you can use braces instead of double quotes. Thus @preamble({foo}) is a valid preamble.

After an entry type comes the cite key, followed by a sequence of pairs, of the form field=value, separated by commas. The following field names are recognized: address, author, booktitle, chapter, crossref, doi, edition, editor, howpublished, institution, isbn, isrn, issn, journal, key, month, note, number, organization, pages, publisher, school, series, title, type, url, volume, year. The case is irrelevant. If a field name is given that is not in the previous list, the field will be ignored. In the example, line 10, we have an unused field, `OptMonth´ (some text editors propose templates where optional fields like `month´ are preceded by `opt´, and sometimes people forget to remove the prefix). In the first example, field names start with an initial capital, and there is no space on the left of the equals sign; in the second, field names are all lower case, there is a space on the left of the equals sign, and opening braces are vertically aligned (this is the template proposed by Emacs); these subtleties are ignored by Tralics.

If the configuration file contains lines like

bibtex_fields = "firstpage lastpage"
bibtex_fields = "+allpages"

then three additional fields are read by Tralics, namely firstpage, lastpage, and allpages. They will be inserted in the XML tree, after other fields, but before the `note´, via a call to \cititem.
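The selection of fields can be sketched in Python as follows. The function names are hypothetical, and the reading of the plus sign (a value that starts with `+´ extends the list, a value without it replaces the list) is an assumption consistent with the two configuration lines above, not a documented rule.

```python
STANDARD_FIELDS = set("""address author booktitle chapter crossref doi edition
editor howpublished institution isbn isrn issn journal key month note number
organization pages publisher school series title type url volume year""".split())

def extra_fields(config_values):
    """Combine successive values of bibtex_fields into a list of field names."""
    extra = []
    for value in config_values:
        if value.startswith("+"):
            extra += value[1:].split()     # '+': add to the current list
        else:
            extra = value.split()          # assumption: otherwise, replace it
    return extra

def keep_known(entry, extra=()):
    """Drop unknown fields; field names are compared without regard to case."""
    known = STANDARD_FIELDS | set(extra)
    return {name.lower(): value for name, value in entry.items()
            if name.lower() in known}

extra = extra_fields(["firstpage lastpage", "+allpages"])
kept = keep_known({"Author": "J. Garrigue", "OptMonth": "jan", "FirstPage": "134"},
                  extra)
```

In this sketch the field `OptMonth´ is silently dropped, while `FirstPage´ is kept because of the configuration.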

The value of a field can be a number (lines 5 and 6 in the example), or a macro name (as on line 10), or a constant in braces (line 9), or a constant in double quotes (other lines). It is possible to concatenate basic fields, for instance apr # "~1", via the use of the sharp operator. The way BibTeX handles braces, quotes and backslashes is a bit special. When BibTeX parses a value, there should be as many opening braces as closing braces; trying to put a backslash before a brace has no effect. If a string is delimited by double quotes, then braces are needed to hide double quotes. Special characters should be entered as {\'e}, never as \'{e}, but Tralics accepts é; in fact, any Unicode character is accepted, provided that you declare the proper encoding. The case of a non-ASCII character is undefined. When looking for a particle in a name, Tralics must decide whether a character is upper case or not, and when sorting, the whole string is converted into lower case letters. In the case of {\'e}, the whole group is converted by BibTeX to the single letter e; Tralics leaves it unchanged; in the same fashion, é is left unchanged (it is represented internally in UTF-8 as two bytes, C3 A9 in hexadecimal).
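The evaluation of a value, including the sharp operator, can be sketched in Python. The function names are hypothetical and the macro table only contains the twelve month names in English; the sketch handles numbers, macro names, constants in braces and constants in double quotes, and splits at sharp signs that are neither inside braces nor inside double quotes.

```python
MONTHS = ("January February March April May June July August "
          "September October November December").split()
MACROS = {name[:3].lower(): name for name in MONTHS}   # jan, feb, ..., dec

def split_on_sharp(expr):
    """Split at '#' signs that are outside braces and double quotes."""
    parts, depth, in_quotes, current = [], 0, False, ""
    for ch in expr:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes      # braces hide double quotes
        if ch == "#" and depth == 0 and not in_quotes:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return parts

def evaluate(expr, macros=MACROS):
    """Concatenate the pieces of a field value."""
    out = []
    for part in split_on_sharp(expr):
        if part.isdigit():
            out.append(part)                               # a number
        elif len(part) >= 2 and part[0] + part[-1] in ('""', "{}"):
            out.append(part[1:-1])                         # a delimited constant
        else:
            out.append(macros[part.lower()])               # a macro; case ignored
    return "".join(out)

date = evaluate('apr # "~1"')
```

An undefined macro raises a `KeyError` in this sketch; this stands for the error that is signaled for useful entries.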

Assume that Tralics has seen @article, then an opening brace or parenthesis, followed by example. All fields up to the closing brace (or parenthesis) are read, but, if the entry is useless, no error is signaled in case of undefined macros, or duplicate fields. If an entry is useful, all fields are remembered; if it has a crossref to an entry X, then X becomes useful. Remember: each entry has a Rtype, this is in general empty; it is added as a prefix to the cite key. For instance, `thesefabien´ gives `footcite:thesefabien´. In the case of a crossreference from Y to X, we use as prefix for X the prefix of Y. An entry can be useful because the user has said \nocite{*}. There is a special hack for the Raweb: we have three types of entries, `foot´, `year´ and `refer´. We already mentioned that the types `year´ and `refer´ could be the same as the empty type. The difference is that \nocite applies only to entries from the file `year´, never to `foot´ (there is an implicit \nocite for `refer´).
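The propagation of usefulness through cross-references can be written as a small closure computation. This Python sketch ignores the Rtype prefix and the special cases of the Raweb; the function name and the representation of the database as a dictionary are assumptions.

```python
def useful_entries(database, cited):
    """Return the keys of the useful entries.

    database: cite key -> dictionary of fields; cited: the explicitly cited keys.
    An entry is useful if it is cited, or if a useful entry has a crossref to it.
    """
    useful, todo = set(), list(cited)
    while todo:
        key = todo.pop()
        if key in useful or key not in database:
            continue                      # already handled, or missing entry
        useful.add(key)
        target = database[key].get("crossref")
        if target:
            todo.append(target)           # the cross-referenced entry becomes useful
    return useful

db = {
    "y": {"crossref": "x", "title": "A chapter"},
    "x": {"title": "The whole book"},
    "z": {"title": "Never cited"},
}
result = useful_entries(db, ["y"])
```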

An entry is useful because it is cited (by \cite or \nocite). Since BibTeX is generally case insensitive, the entry shown above is useful if you say \cite{Example}. However, for LaTeX, \cite{Foo} and \cite{fOO} are two different items, as a consequence, two references are needed. Thus, an entry named `foo´ is ambiguous. For this reason, you should always capitalize entries in a consistent way (say, use always lowercase letters), and use the same method in the LaTeX document.

After some manipulations, the entry is printed on the bbl like this (BibTeX version)

20 \citation {GR99a}{example}{article}
21 \bauteurs{\bpers\RAo J.\RAb \RAb Garrigue\RAb \RAf \bpers\RAo D.\RAb \RAb
22   R{\'e}my\RAb \RAf }
23 \cititem{btitle}{Extending {ML} with semi-explicit higher-order polymorphism}
24 \cititem{bjournal}{Journal of Functional Programming}
25 \cititem{bnumber}{1/2}
26 \cititem{bvolume}{155}
27 \cititem{byear}{1999}
28 \cititem{bpages}{134--169}
29 \url{}
30 \endcitation

or like that (Tralics version)

31 \citation{60}{footcite:thesefabien}{bid9}{foot}{phdthesis}[Sey98]
32 \bauthors{\bpers[Fabien]{F.}{}{Seyfert}{}}
33 \cititem{btitle}{Problèmes extrémaux dans les espaces de Hardy, Application
34 à l'identification de filtres hyperfréquences à cavités couplées}
35 \cititem{btype}{Ph. D. Thesis}
36 \cititem{bschool}{Ecole de Mines de Paris}
37 \cititem{byear}{1998}
38 \endcitation

There are some slight differences between these two entries. If you compare lines 20 and 31, you can see that the number of arguments of the \citation command has changed from three in the original version to six in the current version. The following were added: the type (here `foot´), the unique id (here `bid9´), the numerical print key (here `60´). The first entry was created by BibTeX, which cannot guess the Rtype of the reference nor the Tralics unique id. It could have computed the number 60, but we initially thought that only one of the two keys was useful (in the current version, the \citation command takes five arguments, plus an optional one after these). If you compare lines 21 and 32, you can notice two differences. First, we decided, in 2005, to add an optional argument to the \bpers command (it contains the full first name). This might be used for the Ra2005. The second difference is that it is impossible, in BibTeX, to print braces inside a name. Thus we used \RAo for an opening brace, \RAf for a closing brace and \RAb for a pair of closing and opening braces. Omitting the first line, the fields are printed in the following order:

  1. Unless the type is proceedings, the author.

  2. In the case of a book or inbook, the editor.

  3. The title.

  4. In the case of proceedings or incollection, the editor.

  5. In the case of an article, the journal, number, and volume.

  6. In the case of a book or inbook, the edition, series, number, volume, publisher, address.

  7. In the case of a booklet, the howpublished and address.

  8. In the case of incollection, the booktitle, series, number, volume, publisher, address.

  9. In the case of inproceedings or conference, the booktitle, series, number, volume, organization, publisher, editor, pages, address.

  10. In the case of a manual, the organization, edition, address.

  11. In the case of masterthesis, coursenotes, or phdthesis, the type, school, and address.

  12. In the case of a techreport, the type, number, institution, address. (For the case of masterthesis, phdthesis and techreport, the type has a default value, that depends on the language, and is initialized together with the `jan´ macro).

  13. In the case of misc, the howpublished, editor, booktitle, series, number, volume, publisher, address.

  14. In the case of proceedings, the organization, series, number, volume, publisher, address.

  15. In any case, the month, year.

  16. In the case of inbook or incollection, the chapter.

  17. In the case of inbook, incollection, article or proceedings, the pages.

  18. In any case, the doi, url, additional fields, note.

  19. In the case of an extension, all fields mentioned above are considered, in some order.

This may seem confusing (is there a standard way for formatting entries?). Note that missing fields are not printed. In some cases, BibTeX prints a message like “there´s a number but no series” or “can´t use both volume and number”. No such message is printed by Tralics.

Two keys are computed: the print key (`Sey98´ or `GR99a´ in the example) and the sort key, which is something longer. In fact, handling the author or editor field produces four character strings L1, L2, L3 and L4. The L4 string is the argument of the \bauthors or \beditors (see lines 21, 32). The L1 string is `Sey´ or `GR´, the L2 string contains the full name (it is like L4, without the full first name, and braces) and L3 contains only the last name (not the first name).

We consider the author (the editor in the case of proceedings). This may give a triple L1, L2, L3, unless the field is missing. If it is missing, we consider the `key´ field. If it is not empty, then L1 is formed of the first three characters of the field, L2 is empty, L3 is the field. If it is empty, we consider the editor (author, in case of proceedings). If this is empty, we consider the cite key, and handle it like the `key´ above. Note: in the case of `Lo{\"i}c´ the first three characters are `Lo{\"i}´, the last two characters are `{\"i}c´. In the case of `Lo\"ic´, asking for the first three or last two characters gives the full string. The last two characters of the year are added to L1, so that we may obtain `GR99´. This gives the print label. The LaTeX companion says that you can use year="{\SortNoop{86}}1991". With the rules above, the last two characters of the year are `91´. However, Tralics uses the full year, not `861991´ when it computes the sort key. In the case when Tralics processes the Raweb for, say year 2003, if a reference has type `year´, then its year field should not be missing, and should be `2003´. Otherwise an error is signaled. The sort key is computed as follows: first a prefix, then the cite label, then L2, then the year, then the title. All characters are converted to lower case. Note: when BibTeX converts {\´E} to lower case, the result is `e´. Converting `É´ can produce strange results. Such subtleties do not exist in Tralics (the style sheet that converts the XML to HTML sorts all entries; how can we tell it that the author used a \SortNoop command?).
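The way characters are counted when building the print label can be sketched as follows. This Python fragment is an illustration under stated assumptions: a brace group at top level counts as one character, which reproduces the `Lo{\"i}c` example, while the unbraced `Lo\"ic` case is deliberately not handled. The function names are hypothetical.

```python
def split_chars(text):
    """Split a string into 'characters', a top-level brace group being one character."""
    chars, i = [], 0
    while i < len(text):
        if text[i] == "{":
            depth, j = 1, i + 1
            while j < len(text) and depth:
                depth += {"{": 1, "}": -1}.get(text[j], 0)
                j += 1
            chars.append(text[i:j])
            i = j
        else:
            chars.append(text[i])
            i += 1
    return chars

def first_chars(text, n):
    return "".join(split_chars(text)[:n])

def last_chars(text, n):
    return "".join(split_chars(text)[-n:])

def print_label(l1, year):
    """Print label: the L1 string followed by the last two characters of the year."""
    return l1 + last_chars(year, 2)
```

For instance, `print_label("GR", "1999")` gives `GR99`, and a year written as `{\SortNoop{86}}1991` contributes `91`, since the whole brace group is a single character at the front of the string.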

Note: Tralics defines \sortnoop to gobble its argument. On the other hand, the BibTeX interpreter, when computing the title part of the sort key, in the case of {\noopsort foo} removes the command and the braces; the same is done for \SortNoop and \noopsort. In a case like title="study of {$H^p$}, part {I}" it removes the braces (character after opening brace must be dollar or upper case letter). The reason for this is that otherwise `part II´ comes before `part I´, and this looks silly.

Because of this sort-again, we try to be clever. Said otherwise, for the Raweb, and only the Raweb, we use a prefix, formed of a letter and L3. The prefix is 0 for an entry of Rtype `refer´, 1 for an entry of Rtype `foot´, and for entries of Rtype `year´, it is: 2 for book, booklet, proceedings, 3 for phdthesis, 4 for article, inbook, incollection, 5 for conference, inproceedings, 6 for manual, techreport, coursenotes, 7 for masterthesis, misc, unpublished. These numbers are indices into a table. Currently the order is 02345671. In a future version, this might be changed (however, the result should be compatible with the style sheets described in the second part of this report).
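The category codes above can be written down as a small table. The Python sketch below is one possible reading, not the Tralics implementation: we assume that the string `02345671` lists the categories in sorting order (so that `foot` entries come last), and that a type not listed falls into category 7. Both assumptions, and all names, are ours.

```python
# Category codes for entries of Rtype 'year', as listed in the text.
YEAR_CODES = {
    "book": "2", "booklet": "2", "proceedings": "2",
    "phdthesis": "3",
    "article": "4", "inbook": "4", "incollection": "4",
    "conference": "5", "inproceedings": "5",
    "manual": "6", "techreport": "6", "coursenotes": "6",
    "masterthesis": "7", "misc": "7", "unpublished": "7",
}
ORDER = "02345671"  # assumed meaning: categories in the order they are sorted

def category_code(rtype, entry_type):
    if rtype == "refer":
        return "0"
    if rtype == "foot":
        return "1"
    return YEAR_CODES.get(entry_type, "7")  # assumption: unknown types go last

def sort_prefix(rtype, entry_type, l3):
    """Prefix of the sort key: rank of the category, then the last names (L3)."""
    rank = ORDER.index(category_code(rtype, entry_type))
    return f"{rank}{l3.lower()}"
```

Under this reading, a `refer` entry gets rank 0, a book of type `year` gets rank 1, and a `foot` entry gets rank 7.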

Let´s repeat: for the Raweb case, we have in the sort key a prefix that depends on the type and Rtype, followed by the author names, the print key, the full author names, the year, the title. In this case, the content of the bbl will be as on line 31: the first argument of \citation is not the print key, but the index of the reference in the table after sorting. On the other hand, for the non-Raweb case, the sort key starts with the print key, the bbl looks like line 20. The important point is: assume that we have two entries with the same print key, say `GR99´; we must change them to `GR99a´ and `GR99b´, this is easy to do when they are consecutive. The following piece of code comes from a standard bst file. Parsing a bst file is rather easy (maybe one day, Tralics will do it). The important point is that a postfix language is used: instead of: if a then b else c, you say: a b c if. This piece of code computes a suffix for every entry that has the same key as the previous one.

39 FUNCTION {forward.pass}
40 { last.sort.label sort.label =
41     { last.extra.num #1 + 'last.extra.num :=
42       last.extra.num int.to.chr$ 'extra.label :=
43     }
44     { "a" chr.to.int$ 'last.extra.num :=
45       "" 'extra.label :=
46       sort.label 'last.sort.label :=
47     }
48   if$
49 }

Here is the companion routine, executed in reverse order. Its purpose is to add the `a´ suffix when the next entry has a `b´ suffix. There is a piece of code, not shown here, that computes the longest label. This is sometimes nonsense (consider the `De La Cruz´ case below).

50 FUNCTION {reverse.pass}
51 { next.extra "b" =
52     { "a" 'extra.label := }
53     'skip$
54   if$
55   label extra.label * 'label :=
56   extra.label 'next.extra :=
57 }
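For readers not used to postfix notation, here is a transcription of the two routines into Python. It is a sketch, not part of any bst file: we assume that the entries are already sorted, and that the print label itself is used for comparison (the bst code compares a separate sort label).

```python
def add_suffixes(labels):
    """Given print labels in sorted order, append a, b, c... to duplicated ones."""
    extra = []
    last_label, last_num = None, ord("a")
    # Forward pass: the first entry of a run gets no suffix, the next ones b, c, ...
    for label in labels:
        if label == last_label:
            last_num += 1
            extra.append(chr(last_num))
        else:
            last_num = ord("a")
            extra.append("")
            last_label = label
    # Reverse pass: an entry followed by a 'b' entry gets the suffix 'a'.
    result = [None] * len(labels)
    next_extra = ""
    for k in range(len(labels) - 1, -1, -1):
        if next_extra == "b":
            extra[k] = "a"
        result[k] = labels[k] + extra[k]
        next_extra = extra[k]
    return result
```

With the labels `GR99`, `GR99`, `Sey98`, the result is `GR99a`, `GR99b`, `Sey98`: the forward pass alone cannot know that the first `GR99` needs a suffix, hence the second pass.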

In summary, when Tralics is used instead of BibTeX, the following happens. We have a big entry list, and a list of typed databases. From the entry list, we consider only unsolved ones. For each entry, a prefix is computed, for instance, `footcite:fabien´, by considering the Rtype, the word `cite:´ and the cite key. If the Rtype is anything other than `foot´, an empty value will be used. When an entry with cite key `foo´ is read from a database of type `bar´, the same mechanism is applied. The type of a database is currently one of `year´, `refer´ or `foot´ (the default being `year´). We plan to extend this mechanism: more than these three types can be used; `year´ and `refer´ are sometimes the same as empty, but `refer´ has an implicit \nocite.

All entries from the database files are read, and stored if useful. For each entry X that has a crossreference to Y, missing fields in X are copied from Y. After that Y is discarded (unless cited via \cite or \nocite). An error is signaled in case some references are undefined. After that, the sort label is computed, entries are sorted, the print label is computed, and everything is printed on the bbl file. This is apics_.bbl if the jobname is `apics´. Note the underscore in the name.
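The treatment of crossreferences can be sketched in a few lines of Python. This is an illustration only: entries are represented as plain dictionaries, the function name `resolve_crossrefs` is hypothetical, and an undefined reference simply raises a `KeyError` instead of producing a Tralics error message.

```python
def resolve_crossrefs(entries, cited):
    """Return the cited entries, with missing fields copied from their crossref target.

    'entries' maps cite keys to field dictionaries; 'cited' lists the keys
    used by \\cite or \\nocite. A crossref target that is not itself cited
    is used as a source of fields and then dropped.
    """
    resolved = {}
    for key in cited:
        fields = dict(entries[key])            # KeyError: undefined reference
        target = fields.get("crossref")
        if target is not None:
            for name, value in entries[target].items():
                fields.setdefault(name, value)  # copy only missing fields
        resolved[key] = fields
    return resolved
```

If X has a title and a crossref to Y, and Y has a title, a booktitle and a year, then X keeps its own title and inherits the booktitle and the year; Y appears in the result only if it is cited.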

This is the XML version of the reference above, as used in the Raweb2004.

58 <citation from='foot' key='60' id='bid9' userid='footcite:thesefabien'
59   type='phdthesis'>
60 <bauteurs><bpers prenom='F.' part='' nom='Seyfert' junior=''/></bauteurs>
61 <btitle>Problèmes extrémaux dans les espaces de Hardy,
62    Application à l'identification de filtres hyperfréquences à cavités
63 couplées</btitle>
64 <btype>Ph. D. Thesis</btype>
65 <bschool>Ecole de Mines de Paris</bschool>
66 <byear>1998</byear>
67 </citation>

4.4. The format of a name

We shall discuss in this section how names can be used in a BibTeX file, and how Tralics constructs keys. We have already mentioned a procedure that gives `Sey´ from `Seyfert´. It is not satisfactory, but is used only in rare cases (when the year is strange, or a strange key has been used). The important point is that, when we fetch the first three letters of Lo\"ic, we obtain neither `Lo\´ nor `Lo\"´. The mechanism explained here is more subtle. The LaTeX companion explains that, in order to get `Göd´ for the key, you should use one of the first names shown here, not the others.

 author = {A. G{\"o}del and  B. G{\"{o}}del},
 editor = {C. {G{\"{o}}del} and D. {G\"{o}del}}

The rule is that special BibTeX characters are formed by a left brace followed by a backslash. In the case C, the brace is inside another brace. In fact, if the bibliography contains the following

68 @Article{GoA,
69   author = {A. G{\"o}del      }, title="X"}
70 @Article{GoB,
71   author = {B. G{\"{o}}del    }, title="X" }
72 @Article{GoC,
73   author = {C. {G{\"{o}}del}  }, title="X" }
74 @Article{GoD,
75   author = {D. {G\"{o}del}    }, title="X" }

then the translation by Tralics 2.9 looks like this. If you compare with lines 60 and 61 above, you can see that the full first name appears, and that the empty attributes part and junior are not shown.

76 <biblio>
77 <citation from='year' key='Ga' id='bid2' userid='cite:GoC' type='article'>
78 <bauteurs><bpers prenom='C.' nom='Gödel' prenomcomplet='C.'/></bauteurs>
79 <btitle>X</btitle>
80 </citation>
81 <citation from='year' key='Gb' id='bid3' userid='cite:GoD' type='article'>
82 <bauteurs><bpers prenom='D.' nom='Gödel' prenomcomplet='D.'/></bauteurs>
83 <btitle>X</btitle>
84 </citation>
85 <citation from='year' key='Göd' id='bid0' userid='cite:Goa' type='article'>
86 <bauteurs><bpers prenom='A.' nom='Gödel' prenomcomplet='A.'/></bauteurs>
87 <btitle>X</btitle>
88 </citation>
89 <citation from='year' key='Göd' id='bid1' userid='cite:GoB' type='article'>
90 <bauteurs><bpers prenom='B.' nom='Gödel' prenomcomplet='B.'/></bauteurs>
91 <btitle>X</btitle>
92 </citation></biblio>

The same file processed by BibTeX gives the following keys: {G{\"}}a, {G\"}b, G{\"o}da and G{\"{o}}db. The first two keys are invalid. The reason why suffixes a and b are added is that a special BibTeX function removes braces and funny characters when comparing keys. Such a function is not implemented in Tralics, thus labels G{\"o}d and G{\"{o}}d are considered different, although their translation is the same. In Tralics, the best thing to do is to use `Gödel´ as the name.

Since lots of errors may be found in bibliography files, Tralics tries to be clever. First, it replaces `\c{c}´ by `ç´ and `\c{C}´ by `Ç´. It also replaces `\v {c}´ by `{\v c}´. Expressions of the form \a´e are replaced by \´e. We also replace backslash-space by a single space. Maybe other replacements of this kind will be made in a future version. For instance, we could expand all accent characters, and interpret double-hat constructs, so that `é´, `\´e´, and `^^e9´ are interpreted in the same way (the translation is the same).
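The replacements listed above amount to a few string substitutions. The following Python sketch shows the idea; it is not the Tralics code. We assume that the accent in `\a'e` is typed as an ASCII apostrophe, and that the `\v` rule applies to any single letter in braces (the text only mentions the letter c).

```python
import re

def normalize(text):
    """Apply the clean-up substitutions to the value of a name field."""
    text = text.replace("\\c{c}", "ç").replace("\\c{C}", "Ç")
    # \v {c} becomes {\v c}; assumption: any single letter is accepted.
    text = re.sub(r"\\v\s*\{(\w)\}", r"{\\v \1}", text)
    # \a'e becomes \'e (the \a form is the one used in tabbing environments).
    text = text.replace("\\a'", "\\'")
    # Backslash-space becomes a plain space.
    text = text.replace("\\ ", " ")
    return text
```

For instance, `Fran\c{c}ois` becomes `François` and `Vall{\a'e}e` becomes `Vall{\'e}e`.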

After that, characters or groups of characters are classified; this will make parsing easier. A sequence like `{foo}´ will be considered as a single random character; something like \´e as a single lower case letter, \´E as a single uppercase letter. The expression \´{e} will be replaced by {\´e} with a warning, \"\i will be rejected (unless inside braces) because a single character is needed after backslash-accent. Commands like \foo are also rejected. Note that an ampersand & is an error (some people try to use this instead of `and´). Character categories are: space, comma, dash, and tie (this is a ~). In a case like this,

93 @Article{cruz,
94   author = {Maria {\MakeUppercase{d}e La} Cruz},
95 title="X" }

the print key computed by BibTeX is {\MakeUppercase{d}e La}C, which typesets as `De LaC´. Such a construct is not understood by Tralics, which thinks that the last name is `Cruz´.

If more than one author is given, in the author or editor list, you should use `and´ as separator. Case is irrelevant, a space is required. For instance, the following citation contains 3 authors and others. The print key is `AAJA+´, because the last author has a double last name.

96 @Article{many,
97   author = {Joe~And and And,Joe and Joe-And And others},
98 title="X" }

The BibTeX transformation of this is

99 \bibitem[AAJA{\etalchar{+}}]{many}
100 Joe And, Joe And, Joe-And, et~al.

If the list is too long, you can use `others´ as the last name (case is important). A name has four components: von, First, Last and Junior. On line 32, you can see the value of the full first name, then the abbreviated first name, then the von part (empty) then the last name, then the junior part (empty). In Tralics, the von part is always merged with the last name. Consider somebody named Jean de la Fontaine. French rules say that the particle `de´ should be omitted, unless preceded by the first name or a word like `Monsieur´. In particular, in the dictionary, you will find him between La Follette (an American politician) and Lafontaine (a Canadian politician), not between Delacroix and Delage. More interesting is the case of Marie Joseph Gilbert Motier, marquis de La Fayette. The name of this guy is `Motier´, but he is known as `La Fayette´. Another example is William Thomson (For his work on the transatlantic cable Thomson was created Baron Kelvin of Largs in 1866. The Kelvin is the river which runs through the grounds of Glasgow University and Largs is the town on the Scottish coast where Thomson built his house.) How this guy should be cited is unclear: William Thomson or Lord Kelvin?

The simple case is when two fields are given, with a comma between. The first field is the last name, the other field is the first name. Then comes the case of three fields: last name, junior, and first name. You cannot use more than three fields, that is, you cannot give more than two commas. In the case no comma is given, we look for a `von´ part. This is something that starts with a lower case letter. For instance,

101 @Article{poussin,
102   author = {Charles Louis Xavier Joseph de la Vall{\a'e}e Poussin   },
103 title="X" }

This is what BibTeX puts in the bbl file:

104 \bibitem[dlVP]{poussin}
105 Charles Louis Xavier~Joseph de~la Vall{\a'e}e~Poussin.
106 \newblock X.
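The splitting rules (two or three comma-separated fields, otherwise a `von´ part that starts with a lower case letter) can be sketched as follows. This Python function is a simplification and not the algorithm of BibTeX or Tralics: words are separated by spaces only, so that braces, ties and dashes get no special treatment, and in the comma forms the particle is left in the last name. The name `split_name` is hypothetical.

```python
def split_name(name):
    """Split a name into first, von, last and junior parts (simplified rules)."""
    fields = [f.strip() for f in name.split(",")]
    if len(fields) > 3:
        raise ValueError("a name cannot contain more than two commas")
    if len(fields) == 3:
        last, junior, first = fields
        return {"first": first, "von": "", "last": last, "junior": junior}
    if len(fields) == 2:
        last, first = fields
        return {"first": first, "von": "", "last": last, "junior": ""}
    words = name.split()
    if not words:
        raise ValueError("empty name")
    # The von part runs from the first to the last lower-case word,
    # the final word being always kept for the last name.
    lower = [k for k, w in enumerate(words[:-1]) if w[0].islower()]
    if not lower:
        first, von, last = words[:-1], [], words[-1:]
    else:
        first = words[:lower[0]]
        von = words[lower[0]:lower[-1] + 1]
        last = words[lower[-1] + 1:]
    return {"first": " ".join(first), "von": " ".join(von),
            "last": " ".join(last), "junior": ""}
```

Applied to the entry above, the first name is `Charles Louis Xavier Joseph`, the von part is `de la`, and the last name is `Vall{\a'e}e Poussin`, which is consistent with the label `dlVP`.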

The translation by Tralics is the same, but no ties are inserted (BibTeX inserts one for the first name, the von part, the last name, see TeXbook, page 92); in my opinion, it is better to split a line between two names, rather than split a name (what hyphenation patterns should be used in a case like `Michel Goosens´, the current patterns, here English, or those found in the bibliography, thus French if we cite the French version of the LaTeX companion?). The `De La Cruz´ example shows how you can fool BibTeX. Tokens between names are recognized. For instance, consider:

107 @Article{strange,
108   author = {A-b-C and A.b.C and A~b~C and A.Bb.Cc},
109 title="X" }

This is how BibTeX interprets the names. Authors number two and four have only a last name, no von part, no first name.

110 \bibitem[bCAbCA]{strange}
111 A~b~C, A.b.C, A~b~C, and A.Bb.Cc.
112 \newblock X.

This is the translation by Tralics. You can see that, for the last author, one dot has been replaced by a space: this is done in case no other way is found to split the name, but there is an upper case letter on each side of the dot. You can also see that BibTeX inserts some characters (here ties) instead of dashes. Tralics keeps the dashes, whenever possible.

113 \citation{bCAbCB}{cite:strange}{bid3}{year}{article}
114 \bauthors{\bpers[A]{A.}{}{b-C}{}
115           \bpers[]{}{}{A.b.C}{}
116           \bpers[A]{A.}{}{b~C}{}
117           \bpers[A]{A.}{}{Bb.Cc}{}}
118 \cititem{btitle}{X}
119 \endcitation

Here is another example.

120 @Article{strange2,
121   author = {Jean-Claude XX and J.-Ch. YY and J.-{Ch.} ZZ},
122 title="X" }

This is the translation by BibTeX, in `abbrv´ mode. The format used in plain mode is {ff }{vv }{ll}{, jj}, and in abbrv mode, it is {f. }{vv }{ll}{, jj}. This is explained in any good reference about BibTeX.

123 \bibitem{strange2}
124 J.-C. XX, J.-C. YY, and J.-C. ZZ.
125 \newblock X.
126 \bibitem{poussin}
127 C.~L. X.~J. de~la Vall{\a'e}e~Poussin.
128 \newblock X.

This is the translation by Tralics. The quantity `{Ch.}´ is considered as a single character. No dot is added after it, since it is terminated by a dot.

129 \citation{XYZ}{cite:strange}{bid3}{year}{article}
130 \bauthors{\bpers[Jean-Claude]{J.-C.}{}{XX}{}
131           \bpers[J.-Ch.]{J.-C.}{}{YY}{}
132           \bpers[J.-{Ch.}]{J.-{Ch.}}{}{ZZ}{}}
133 \cititem{btitle}{X}
134 \endcitation

The print key is computed as follows: Each author gives an initial (if the name is complicated, more than one will be used, for instance Poussin gives four letters `dlVP´). If a single author is cited, and if it gives less than three letters, then the first three letters of its name are used (for instance, Seyfert gives `Sey´). If more than four authors are given, only the first three ones give an initial, there is a `+´ sign at the end. If `and others´ is given, there is also a `+´ sign.
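The rules for the print key can be summarized by the following Python sketch. It is an illustration, not the Tralics code: each author is given as a pair (von part, last name), words are separated by spaces, ties or dashes, and names that start with a brace group are not handled. The function names are hypothetical.

```python
import re

def initials(von, last):
    """One letter per word of the von part and of the last name."""
    words = [w for w in re.split(r"[\s~-]+", f"{von} {last}") if w]
    return "".join(w[0] for w in words)

def print_key(authors, and_others=False):
    """Print key without the year; 'authors' is a list of (von, last) pairs."""
    if len(authors) == 1 and not and_others:
        von, last = authors[0]
        key = initials(von, last)
        # A single author giving fewer than three letters: use the name itself.
        return key if len(key) >= 3 else last[:3]
    plus = and_others
    if len(authors) > 4:
        authors, plus = authors[:3], True
    return "".join(initials(v, l) for v, l in authors) + ("+" if plus else "")
```

This gives `Sey` for Seyfert alone, `dlVP` for de la Vallée Poussin, `GR` for Garrigue and Rémy, and `AAJA+` for the entry with three authors and others.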

We show here the sort key, as computed by Tralics, for some of the entries shown above. Remember that these entries have no year field and that the title is X.

135 cru m. {\makeuppercase{d}e la}. cruz          x
136 g c. {g{\"{o}}del}          x
137 g d. {g\"{o}del}          x
138 g{\"o}d a. g{\"o}del          x
139 g{\"{o}}d b. g{\"{o}}del          x
140 aaja+ j. and   j. and    joe-and  etal        x
141 dlvp c. l. x. j. de la vall{\'e}e poussin          x

These are the keys, for the same entries, computed by BibTeX, using the alpha style. You can see that BibTeX uses last name and first name, whereas Tralics uses abbreviated first name then last name. The format is: {vv{ } }{ll{ }}{ ff{ }}{ jj{ }}.

142 delac    dela cruz  maria        x
143 god    godel  a        x
144 god    godel  b        x
145 g    godel  c        x
146 g    godel  d        x
147 aaja    and  joe   and  joe   joe and   et al        x
148 dlvp    de la vallee poussin  charles louis xavier joseph        x

4.5. Commands for the bbl

The Raweb DTD explains that the following items can appear inside a bibliography entry.

In almost every case, if the database file contains a field `foo´ with value `bar´, the bbl file will contain \cititem{bfoo}{bar}, and this is translated into <bfoo>bar</bfoo>. The \cititem command takes two arguments. The second argument is translated as usual. The first argument is the name of the resulting element. There is a hook: in the case where \cititem-foo is defined (this is \cititem followed by a dash followed by the name of the field), this macro is used instead of the default procedure. If the database contains a `url´ field, the result is a call to the \url command, that will produce an <xref> element. The \cititem command should be used only in a bibliography.

If the entry in the database contains an `author´ or `editor´ field, the \bauthors or \beditors commands will be called. These two commands must be used inside a bibliography. They take a single argument, translate it, and put the result in a <bauteurs> or <bediteur> element. Note: the bibliography part of the Raweb DTD was meant to be temporary. For this reason, the names were chosen so as to replace them easily with new names (hence the prefix `b´); for some reason, `auteurs´, `editeur´ and attributes of `bpers´ have French names. Later on, we decided to modify the Tralics names, hence the `bauthors´ and `beditors´. Because `bauteurs´ had a final s, we added an s to both command names; not the best choice.

The \bpers command takes one optional argument, and 4 required arguments. The translation is an empty <bpers> element with the following attributes: prenomcomplet for the optional argument, and prenom, part, nom, junior for the required arguments.

The \citation command constructs a <citation> element. It takes 5 required arguments, and an optional argument. The optional argument is ignored. Other arguments are converted to attributes. The whole text, up to \endcitation is translated in bibliography mode, and added to the <citation> element. Example:

149  \citation{a}{b}{c}{d}{e}
150   \cititem{foo}{bar}
151   \beditors{\bpers[a]{b}{c}{d}{e} \bpers[]{B}{}{C}{} \cititem{etal}{}}
152  \endcitation

The translation is

153 <citation from='d' key='a' id='c' userid='b' type='e'>
154   <foo>bar</foo>
155   <bediteur>
156     <bpers prenom='b' part='c' nom='d' junior='e' prenomcomplet='a'/>
157     <bpers prenom='B' nom='C'/>
158     <etal/>
159   </bediteur>
160 </citation>
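The mapping from arguments to attributes may be easier to read as code. The sketch below rebuilds the element of the example with the standard Python module xml.etree.ElementTree; it only illustrates the correspondence and says nothing about how Tralics constructs its XML tree. The function names are hypothetical, and empty attributes are omitted, as in the translation shown above.

```python
import xml.etree.ElementTree as ET

def bpers(full, first, von, last, junior):
    """<bpers> element; the optional argument becomes prenomcomplet."""
    attrs = {"prenom": first, "part": von, "nom": last,
             "junior": junior, "prenomcomplet": full}
    return ET.Element("bpers", {k: v for k, v in attrs.items() if v})

def cititem(name, text):
    item = ET.Element(name)
    item.text = text
    return item

def citation(key, userid, ident, source, kind, children=()):
    """<citation> element built from the five required arguments, in order."""
    elem = ET.Element("citation", {"from": source, "key": key, "id": ident,
                                   "userid": userid, "type": kind})
    elem.extend(children)
    return elem

editors = ET.Element("bediteur")
editors.extend([bpers("a", "b", "c", "d", "e"),
                bpers("", "B", "", "C", ""),
                ET.Element("etal")])
entry = citation("a", "b", "c", "d", "e", [cititem("foo", "bar"), editors])
print(ET.tostring(entry, encoding="unicode"))
```

The printed element has the same structure as the translation above; details such as quote characters and indentation depend on the serializer.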

4.6. Other commands

The \bibliography command takes one argument, this is a comma separated list of database files. Spaces are ignored. The command can be given more than once. This command (the last occurrence) defines the position where the bibliography should be inserted.

The command \insertbibliohere can be used to force the position of the bibliography. It overwrites the location specified by the previous command.

The environment `thebibliography´ can be used for typesetting the bibliography. There is an optional argument (ignored), a required argument (ignored), an optional argument (ignored). The result is an XML element whose name is defined by \refname, by default `Bibliography´, and whose content is formed of the translation of the environment. You can redefine this \refname command. An error is signaled if strange commands appear in the argument, but not for invalid characters (in particular, space cannot appear in an element name). The command can be empty. In this case, the name will not appear in the XML result.

The command \bibliographystyle takes one argument. Its translation is empty. The argument is remembered. This is the style to use. If the argument is `bibtex:´, this is an indication that BibTeX should be used instead of Tralics for the production of the bbl. The style can be given after the colon, or with the invocation of another command. If the argument is `program:foo´, this means to use foo as program. For instance \bibliographystyle{program:cat -v}. In this example, this will print the auxiliary file; this is not good, because the command should create the bbl file (its argument is jobname.aux, data must be written on jobname.bbl). A second \bibliographystyle command can be used for specifying the style (the default is `plain´). Example. Consider a file that contains these lines

161 \documentclass{article}
162 \begin{document}
163 \AtEndDocument{\bibitem{unused}Hey}
164 \bibliography{torture}
165 \bibliographystyle{bibtex:}
166 \cite{poussin,cruz,many,strange,unused}
167 \end{document}

When Tralics sees the \end{document} command, it evaluates it (with the hooks, etc.) After that, a bbl is created and translated. If there is no unsolved entry, nothing happens. If no style command indicates that BibTeX or an external program should compute the bbl, then Tralics does it, as explained above. In the case of the Raweb, three database files are used: apicsfoot_2004, apicsrefer_2004, and apics2004. These files are typed `foot´, `refer´ and `year´. In the non-Raweb case, files in the list indicated by \bibliography are used. If a file is named `miaou+refer´ or `miaou+foot´ and does not exist, then miaou is tried instead; in this case the type will be `refer´ and `foot´ (otherwise, it is `year´). In the case an external program is used, a minimal auxiliary file is created. In the case of the example, it will contain

168 \citation{poussin}
169 \citation{cruz}
170 \citation{many}
171 \citation{strange}
172 \bibstyle{plain}
173 \bibdata{torture}
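The content of this minimal auxiliary file is easy to reproduce. The Python function below is a sketch of the idea, with hypothetical names; it is not the code used by Tralics. It writes one \citation line per unsolved entry, then the style and the list of databases.

```python
def make_aux(unsolved_keys, style="plain", databases=()):
    """Return the text of a minimal auxiliary file for an external bbl producer."""
    lines = [f"\\citation{{{key}}}" for key in unsolved_keys]
    lines.append(f"\\bibstyle{{{style}}}")
    lines.append(f"\\bibdata{{{','.join(databases)}}}")
    return "\n".join(lines) + "\n"
```

Calling it with the keys poussin, cruz, many and strange and the database torture gives the six lines shown above; the key `unused´ does not appear, because it is solved by the \bibitem command.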

The database torture.bib contains a sequence of entries, plus the following lines. In order to understand the last line, you have to remember that character strings are always balanced against braces. Hence it is not: open brace concatenated with 1 concatenated with 1 and close brace. It is: open brace, double quote, space, sharp, etc, up to double quote, close brace.

174 @String{ stra= {\def}}
175 @String{ strb= "#1" }
176 @String( strc= "\mycmd " )
177 @Preamble (stra # strc # strb )
178 @Preamble( "{" #1 #1 "}")

After that, the external program is called, and the bbl file is read. In the example this gives the following. The first line is the preamble.

179 \def\mycmd #1{" #1 #1 "}
180 \begin{thebibliography}{1}
182 \bibitem{many}
183 Joe And, Joe And, Joe-And, et~al.
184 \newblock X.
186 \bibitem{strange}
187 A~b~Cde.
188 \newblock X.
190 \bibitem{poussin}
191 Charles Louis Xavier~Joseph de~la Vall{\a'e}e~Poussin.
192 \newblock X.
194 \bibitem{cruz}
195 Maria {\MakeUppercase{d}e La}~Cruz.
196 \newblock X.
198 \end{thebibliography}

After that, the bibliography is translated and inserted. The resulting XML file is shown here.

1 <?xml version='1.0' encoding='iso-8859-1'?>
2 <!DOCTYPE std SYSTEM 'classes.dtd'>
3 <!-- Translated from latex by tralics 2.9.1, date: 2006/11/02-->
4 <std>
5 <biblio>
6 <Bibliography><p id='bid2'>
7 Joe And, Joe And, Joe-And, et al.
8 X.</p>
9 <p id='bid3'>
10 A b Cde.
11 X.</p>
12 <p id='bid0'>
13 Charles Louis Xavier Joseph de la Vallée Poussin.
14 X.</p>
15 <p id='bid1'>
16 Maria De La Cruz.
17 X.</p>
18 </Bibliography>
19 </biblio><p><cit><ref target='bid0'/></cit>, <cit><ref target='bid1'/></cit>,
20 <cit><ref target='bid2'/></cit>, <cit><ref target='bid3'/></cit>,
21 <cit><ref target='bid4'/></cit></p>
22 <p id='bid4'>Hey</p>
23 </std>

Finally, we show here everything printed on the screen, including all warnings by BibTeX.

1 This is tralics 2.9.1, a LaTeX to XML translator
2 Copyright INRIA/MIAOU/APICS 2002-2006, Jos\'e Grimm
3 Licensed under the CeCILL Free Software Licensing Agreement
4 Starting translation of file testb.tex.
5 Configuration file identification: standard $ Revision: 2.24 $
6 Read configuration file /Users/grimm/work/cvs/tralics/confdir/.tralics_rc.
7 Document class: article 2006/08/19 v1.0 article document class for Tralics
8 Bib stats: seen 5(1) entries
9 This is BibTeX, Version 0.99c (Web2C 7.5.4)
10 The top-level auxiliary file: testb.aux
11 The style file: plain.bst
12 Database file #1: torture.bib
13 Warning--empty journal in many
14 Warning--empty year in many
15 Warning--empty journal in strange
16 Warning--empty year in strange
17 Warning--empty journal in poussin
18 Warning--empty year in poussin
19 Warning--empty journal in cruz
20 Warning--empty year in cruz
21 (There were 8 warnings)
22 Math stats: formulas 0, kernels 0, trivial 0, \mbox 0, large 0, small 0.
23 Buffer realloc 0, string 1240, size 12510, merge 4
24 Macros created 97, deleted 0; hash size 1565; foonotes 0.
25 Save stack +20 -20.
26 Attribute list search 1476(1402) found 906 in 1097 elements (1076 at boot).
27 Number of ref 0, of used labels 0, of defined labels 0, of ext. ref. 0.
28 Modules with 0, without 0, sections with 0, without 0
29 Output written on testb.xml (593 bytes).
30 No error found.
31 (For more information, see transcript file testb.log)

Some comments. Line 6 shows the name of the configuration file. If this file contains a line that starts with `## tralics ident rc=´ then all characters after the equals sign are printed (see line 5). Since version 2.5 (pl4), in the case where character number 30 is a dollar sign, a space will be added after it. The reason for this is that the RCS software interprets a string like `Revision´ in dollar signs; we do not want it to replace the 2.11 by the revision number of the LaTeX document. We shall explain elsewhere how to read the statistics.

Line 8 shows the number of entries in the biblist. If some entries are solved, they are shown in parentheses. Here, we have 5-1=4 unsolved entries. If line 5 of the source file is commented out, then BibTeX is not used, and lines 9 to 21 will be replaced by the single line `Seen 4 bibliographic entries´.

The standard configuration file contains a line that says that `article´ is an alias for `std´. The `std´ configuration defines two quantities: the name of the DTD, hence the root element, it is <std>, see line 4 of the XML result. It defines xml_biblio to be `bibliography´. This is the name of the element that will hold the bibliography. The default value is `biblio´, but it can be redefined (see line 5). Do not confuse this with the name of the element produced by the environment `thebibliography´, which appears on line 6 of the XML result.
