Tralics, a LaTeX to XML translator; Part I

French title: Tralics, un traducteur de LaTeX vers XML; Partie I

Author: José Grimm

Location: Sophia Antipolis – Méditerranée

Inria Research Theme: THnum

Inria Research Report Number: 309

Revision: 3

Team: Apics

Date: September 2005

Revised Date: April 2008

Keywords: Latex, XML, HTML, MathML, Perl, PostScript, Pdf.

French keywords: Latex, XML, HTML, MathML, Perl, PostScript, Pdf.

Abstract

In this paper we describe Tralics, a LaTeX to XML translator. A previous version of the software (written in Perl) was used to obtain the Pdf version of Inria's “Rapport d'Activité” for year 2001. The current version of the software (written in C++) has been used for both the HTML and Pdf versions since 2002: the XML generated by Tralics conforms to a local DTD, similar to the TEI; it is converted to Pdf via pdfTeX and the xmltex package, and to HTML via an XSLT processor.

We explain here the philosophy of the software, its usage, its limitations, and how to customize it. All commands defined by Tralics are explained in this document; for most of them, we give an example of use. The index lists all commands, environments, and options.

Version 2 of this document holds additions for Tralics 2.9. Version 3 of this document holds additions for Tralics 2.12.

French Abstract

Dans cet article nous décrivons le logiciel Tralics, un traducteur de LaTeX vers XML. Une version antérieure de ce logiciel, écrite en Perl, a été utilisée pour générer la version Pdf du Rapport d'activité de l'Inria en 2001. La version actuelle du logiciel, écrite en C++, a été utilisée pour obtenir à la fois le HTML et le Pdf depuis 2002 : nous avons utilisé une DTD locale, similaire à la TEI, et pdfTeX plus xmltex pour obtenir le Pdf.

Nous expliquons ici la philosophie de Tralics, son usage, ses limitations, et comment paramétrer le logiciel. Toutes les commandes définies par Tralics sont expliquées ; pour la plupart d'entre elles on donne un exemple d'utilisation. Un index regroupe l'ensemble des commandes, environnements et options.

La version 2 de ce document contient des mises à jour pour Tralics 2.9. La version 3 de ce document contient des mises à jour pour Tralics 2.12.


Short Table of Contents


1. Introduction
2. Expansion
3. Mathematics
4. Translating a bibliography
5. Other commands
6. Running Tralics
Bibliography
Table of Contents
Index

1. Introduction

The Tralics software was designed as a tool for the Raweb. In this chapter, we explain some of our motivations. In the next chapter, we study some TeX commands and explain how they are handled in the same fashion by Tralics. The following chapters explain some differences, which arise merely because XML is not dvi. In a final chapter, we explain how to configure Tralics. There is a second part that explains how the XML files can be used and converted into Pdf or HTML; it also describes the Raweb DTD. The last chapter of the second part describes additions to the program made since 2007.

1.1. A short history of the Raweb

A short history of the Raweb may be found on the Inria internal web site. The question concerns Inria's Annual Activity Report, also known as “Rapport d'activité”, or “Annexe technique” to the RA or “annexes scientifiques” to the RA. This is a document written by the research teams at the end of year N (October, November) and published in March of year N+1.

Until the 1993 edition (published in 1994), only a paper version existed. A LaTeX model had been in use since 1987, designed by Jacques André, then Martin Jourdan. See reference [5], by Louarn, in the first Cahiers Gutenberg.

In 1993, contacts were made with the company Grif S.A. for the design of an SGML DTD and a LaTeX-to-SGML converter. As a result, Philippe Louarn was able to put the RA (year 1994) on the web in its HTML version. But this converter was judged too complicated (rules were too strict), and for several years the HTML was produced directly from the LaTeX source, using latex2html.

In 1996, a working group (led by Albert Benveniste) gave new specifications: independent modules, grouped into ten sections, etc. A technical group was created (led by Gérard Paget), whose objective was to find a company that could supply suitable software (maybe using XML as intermediate language). None was found, but the design of modules (in LaTeX syntax) was well defined by Laurent Pierron and José Grimm with the aid of Marie-Pierre Durollet and Jean-Claude Le Moal. For the Ra98, a Perl script did some preprocessing, splitting the LaTeX source into modules (one module per HTML page). The author wishes to thank all these people (including A. Quadrat), who gave him the idea to work on LaTeX and write a translator.

In 1999, the Scientific Annexes to Inria's Annual Report were renamed RAWEB, to emphasize the role played by the Web (it is available as a CD-Rom, but is no longer printed by Inria).

In 2001, the Perl scripts mentioned above evolved into a LaTeX-to-XML converter (some ideas were borrowed from latex2html, which is also a Perl script). The main trouble was conversion from XML to Pdf, and we used tools from the TeX community (by S. Rahtz and D. Carlisle [1]) and pdfLaTeX. On the other hand, the images in the HTML files were converted by latex. Additional software (for creation of a global index, etc.) was written/used by Marie-Pierre Durollet.

This gave a complicated object: a Perl script that converts a LaTeX file into XML (using Omega as a subprocess for the math), followed by another Perl script that extracts the math, calls latex, then dvips, then pstoimg (a Perl script that calls ppmquant, pnmcrop, and so forth, whose job is to produce a png image for the math formula), and finally an XSLT processor for the effective conversion. This became even more complicated in 2004, when a new DTD was introduced (designed by Bruno Marmol and people mentioned above), hence an XML-to-XML translator. See Figure 1.

Figure 1. A diagram that explains how the Raweb operates. Rectangular boxes contain tools, diamond-shaped boxes are style sheets, and circles contain language names. The name XML is in a double circle: it is the central object; the arrow labeled `D4´ that connects it to itself indicates conversion from one DTD to the other, used in 2004. The box containing `em´ represents the Perl script extract-math.pl that handles the math formulas; it uses tools borrowed from latex2html. The diagram is written in `pgf´, a format that Tralics cannot interpret yet.

1.2. Birth of Tralics

The big Perl script was rewritten as a C++ translator, renamed Tralics, and got (for version 1.6) a first IDDN number in December 2002. This software was still able to produce a LaTeX preview of the Raweb: the source is read, the syntax is tested, a LaTeX file is written for each module, latex and bibtex are called, and the resulting dvi is converted to PostScript. But the same Tralics can be used in a different way: the source is read, the syntax is tested, an XML file is created, an XSLT processor is called to generate the XSL/Format, and pdflatex generates the Pdf (you can also generate an HTML version).

Since 2003, a few people have been writing their RA directly in XML. As a consequence, the new Tralics, used in 2004, does only the bare minimum: it converts the LaTeX code into XML code. There is a Perl script that does everything else (calling external programs like latex, xsltproc, etc). A non-obvious point concerns the math and the images. For the math, see above; for the images, a Perl script (the same as above) is used for conversion from PostScript to png. In some cases, the image already exists in png format and it is unwise to re-create it.

In 2007, the name of the Raweb changed; it is now RalyX. Here is a quote from the Web page:

The objective of the RalyX project was to publish and exploit dynamically the annual INRIA activity reports; Ralyx is based on the Xylème system, a native XML database. Xylème stores the XML version of the activity reports and supports queries to the reports involving both their structure and their content.

The first objective was to offer a tool for browsing the activity report with the same interface as the legacy HTML version. However, thanks to Xylème, pages and links are no longer statics, but computed on the fly, which can offer more flexibility in the future (different styles of display, editing along the year, etc).

1.3. Main objectives

The main objectives of Tralics are described in [3]. Recall that we want a program that converts from one language to another, such that

By the very choice of the XML language as target, the first objective is automatically satisfied. The only little problem concerns spaces: in which cases are spaces used as delimiter or as text? The translation of `\par␣x{␣}{␣}x\par´ is `<p>x␣␣␣x</p>´, and this is often interpreted in the same fashion as a single space.

The third item has the following meaning: the translation of `\foo´ and `\bar´ should be different in case these objects have different meanings in LaTeX, but should be the same otherwise. One could argue that some tokens could be translated more than once in a different context (remember the \texorpdfstring command). We could imagine special rules for the bibliography; in fact, in math mode you must use a different function to add attributes to an element. On the other hand, we have an application where `\tt´ and `\bf´ are treated alike: in fact, all font information is removed, by redefining element names.

The important point is preservation of the structure. This means of course that the initial document has some structure and the target has one also. Clearly the structure of an XML document is given by its DTD, but this is completely ignored by Tralics. The first version of the program was very strict; nowadays, in your document, you can omit the `\documentclass´, the `\begin{document}´, or the `\end{document}´.

There are some implicit rules, for instance that `\section´ terminates a `\subsection´, the occurrence of a character in vertical mode triggers the start of a <p> element, etc.

In [3], we wrote that the document should satisfy three kinds of assumptions: some technical assumptions (like: there is no `\0´ in the document), some validity assumptions (the document can be compiled by LaTeX without errors, it respects the rules of the Raweb as given on the web pages), and more general, unwritten, rules. The current translator is more flexible: there is no restriction on the syntax of commands, except that null characters are not allowed; plain TeX documents are translated (for instance the file xii.tex by D. Carlisle that starts with \let~\catcode~`76 and ends with Yer.W,:jbye); and we have an application where only one part of the document is translated: only code before `\maketitle´ is used. In this document we shall explain these things.

1.4. Notes on the distribution

The Tralics software is formed of five different types of files. On one hand we have the documentation; this is formed of a sequence of LaTeX documents (like the one you are reading), converted by Tralics into XML then HTML, together with a sequence of HTML pages that describe all commands, features, and packages. These files have no explicit Copyright notice (so that default rules apply). The distribution contains a lot of tools that can be used for the conversion from XML to XML, HTML, or Pdf, written by various authors, with different copyright notices.

The Tralics executable can be obtained by compiling the sources; some precompiled versions are also available. Some configuration files are also provided (files with extension .tcf, .clt, or .plt). The binary may contain a path to these auxiliary files; if this is not the case, the path must be given to the program at runtime. Sources are distributed according to the CeCILL Free Software Licensing Agreement. This gives anybody the right to modify the sources, redistribute the software, and include it (or part of it) into another Software (see the Copyright Notice for details).

Finally, the distribution contains some test files. These are examples of source code, and their XML translation. Most of these files compile without error, but some files test the error handling mechanism. These files are distributed under the same conditions as the source files, and it is assumed that whoever changes the sources modifies the test files accordingly.

1.5. An example

Let's consider the following piece of code.

    $\left [1=2\right\}$ est une \emph{formule de mathématiques}~!

You would expect this to be understood by LaTeX as `1=2 est une formule de mathématiques !´ but the result is `Hello, world!´, since we have preceded it by this (curious) list of commands

    \catcode`\$=\active\def$#1~{\catcode`\$=3 Hello, world} %$emacs

In TeX, translation depends on a great number of tables that assign values to numeric or symbolic quantities. For instance, each character (a number between 0 and 255) in each font (for instance \OT1/cmr/m/it/10) is associated with its dimensions, ligature information, etc. Our example (with a normal dollar sign) uses three fonts: a math font, an italic font and an upright font. Often, these tables are read only by TeX; in some other cases a user might look at them, and sometimes they are designed to be modified. For instance, the slant value of the current font is in general ignored (except by commands like `\emph´). A very important table holds the meaning of commands: they may be predefined (like `\left´), defined in a format (like `\emph´), or defined by the user (a great number of mathematicians define a command named `\RR´ for the set of real numbers).

An important table is the table of category codes. Each character has a category code, an integer between 0 and 15. For instance, `e´ has category letter, `é´ has category active (so that it is equivalent to `\'e´), and backslash, open brace, close brace and dollar sign are of category 0, 1, 2, and 3 respectively. A command like `\left´ is formed by an introducer (any character of category 0, its value is irrelevant) and a sequence of letters (in the example, the space that follows is not part of the command, since its category code is 10, not 11; this space will be read again, and ignored, unless its category code changes; such a change can be triggered by the evaluation of the command); while a command like `\}´ or `\$´ is formed of an introducer and a single non-letter character (when followed by spaces, these spaces do not disappear). Writing a TeX scanner is easy; the only difficulty is that category codes can change (for instance, in verbatim mode). In the example, the tokens are the following (the category code is written after an underscore; command names carry no category code): $_3 left [_12 1_12 =_12 right }, etc., s_11 }_2 ~_13 !_12. Note that a closing brace with default category code is shown as `}_2´, while a command whose name is formed of a closing brace is shown as }. We use underlining in the HTML version, instead of boxing, as in the TeXbook. If you say \let\foo\bar, and ask Tralics for the meaning of \foo, as in \show\foo, it knows the name `foo´ of the token and its value (this is a command code), gets the name `bar´ from a lookup table, and adds the current value of the escape character in front. If this is the plus sign, you will see +foo=+bar.
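The scanning rule for spaces, and the role of the escape character when a name is printed, can be checked with a few lines of TeX. This is only a sketch in plain TeX syntax, not code from the Tralics distribution; the names \foo and \9 are chosen for the illustration, and the outputs shown in comments are those of standard TeX.

```latex
\begingroup
\def\foo{X}% control word: an introducer and a sequence of letters
\def\9{X}%   control symbol: an introducer and a single non-letter
\message{[\foo Y]}% the space after \foo is not seen: [XY]
\message{[\9 Y]}%   the space after \9 is kept: [X Y]
\endgroup

\begingroup
\let\foo\relax
\escapechar=`\+
\message{\string\foo=\meaning\foo}% prints +foo=+relax
\endgroup
```

We use \message rather than \show here only because \show interrupts the run; the way names are printed is the same.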

Parsing an expression means finding for each command its arguments. In the example, the two active characters é and ~ take no argument. The \active command takes no argument either. The \emph command takes one argument: a token, or a token list delimited by braces (i.e. characters with category codes 1 and 2); these braces are not part of the argument. The two commands \left and \right are special: they want a delimiter, in reality a pointer into a special slot in the current math font; the argument cannot be delimited by braces. The syntax of \catcode is more complicated. In fact, it is an instance of <codename>, and <codename><8-bit number> is something that can follow \the or be used as <internal integer>, while a <code assignment> is <codename> <8-bit number> <equals> <number>. We shall explain this in detail later; the idea is that, depending on the context, the \catcode command returns a value stored in a table or modifies it. In the case of

      \catcode`\$=\active

the <8-bit number> is the internal code of the dollar sign, expressed in the form ``\$´, and the <number> is another character code. Expanding a command means (roughly speaking) reading its arguments, and replacing it with the body of the command (where special markers like `#1´ have been substituted by the value of the arguments). In some cases, internal tables may be consulted, but they are never modified. Evaluating a command implies modification of some internal state variables (for instance, a character can be added to the current character list, or a complete paragraph split into lines, or a register modified). In this example, the dollar character becomes magically active: a dollar character is no longer read as a character of category 3 but as one of category 13.
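The two uses of \catcode (reading the table after \the, and modifying it in a <code assignment>) can be sketched as follows. This is plain TeX syntax with the default category codes assumed; it is an illustration, not an excerpt from the Tralics test files.

```latex
\begingroup
\message{\the\catcode`\$}% reads the table: 3 (math shift)
\catcode`\$=\active      % code assignment: the table is modified
\message{\the\catcode`\$}% reads the table again: 13 (active)
\endgroup
\message{\the\catcode`\$}% 3 again: the assignment was local to the group
```

Note that ``\$´ denotes the same character code whatever the current category code of the dollar sign, which is why this notation is convenient in such assignments.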

The syntax of \def and friends is <def><control sequence><definition text> where <definition text> is <parameter text><left brace><balanced text><right brace>. In the case of

      \def$#1~{\catcode`\$=3 Hello, world}

the <control sequence> is the dollar sign (the object to be defined, a command or, as in this example, an active character), the <parameter text> is everything before the open brace, here `#1~´, and the <balanced text> is everything between the braces. Evaluation of `\def´ consists in storing in a table the <parameter text> and the <balanced text> (TeX also stores a special marker representing the <left brace>). What happens now when TeX sees a dollar sign? Since this character is active, the definition given above applies. The <parameter text> explains how to read arguments. In this case, `#1~´ means that there is one argument, everything up to (but not including) a tilde character. In the case of `$\left....$...}~!´, all characters are read (except the exclamation point). They are replaced by the body (no substitution is needed). After that, TeX sees \catcode and evaluates it as before, so that the dollar sign becomes a math shift character again. The space after the digit 3 disappears, and we are left with `Hello, world!´. Note: the example is given in French, first in order to show how 8-bit characters can be used, and also because, in English, there is no tilde before an exclamation point. In general, when a macro reads a delimited argument and sees an empty line instead of the delimiter, it signals an error of the form Runaway argument? Paragraph ended before $ was complete.

Translation difficulties. There are different kinds of LaTeX to HTML/XML translators. Some, like gellmu, use a syntax of their own; others, like tex4ht, use TeX as preprocessor and thus understand the full syntax; there are translators like tth or hévéa that use a fixed (and efficient) parser, or like latex2html (written in Perl) that use pattern matching and global substitutions instead of sequential evaluation. None of these is perfect. We explain in this paper how Tralics deals with a certain number of problems.
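A smaller instance of the same mechanism may help: a macro whose <parameter text> is `#1~´ reads everything up to the next tilde. The macro name \upto is hypothetical, chosen for this sketch; the syntax is plain TeX and the output in the comment is that of standard TeX.

```latex
\def\upto#1~{[#1]}%      one argument, delimited by a tilde
\message{\upto abc~def}% the argument is abc, the tilde is removed: [abc]def
```

If the tilde were missing and an empty line followed, the Runaway argument error mentioned above would be signaled, because \upto is not declared with \long.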

1.6. Some remarks on the Translation

In LaTeX, commands can be defined in five places:

  1. In the Pascal source. In this case, the command is a primitive, for instance `\def´ or `\left´. No equivalent might exist in the target language (for instance, you cannot define anything in an XML document; the result of `\left´ is an attribute of the <mfenced> element that results from the translation of the group implied by \left. And what about \dump?)

  2. In the format file. In this case, the command is defined in a file (for instance latex.ltx), analyzed by TeX and stored on disk using a fast retrieval method. For instance, plain TeX, LaTeX, ConTeXt have their own format file. Both plain TeX and LaTeX define a command named `\item´, in a different way, for the same purpose (the LaTeX command must be used inside a special environment, the plain TeX one can be used everywhere).

  3. In a class file. A class file defines markup commands for a generic purpose (a book, an article, a presentation). For a book, you have a frontmatter, for an article, you can have a title page, in both you have sections, etc. The class defines also the current font, together with a lot of dimensions.

  4. In a package. The difference between a class and a package was introduced by LaTeX2ε. Each document uses a single class, and lots of packages (this document uses the `report´ class, and the `RR´ package for the look; it uses `hyperref´ for hyper-links, `amsmath´ for the mathematical examples, `fancyvrb´ for the verbatim examples, etc.). Note that the plain TeX format provides a macro \proclaim for theorems, while the amsthm package provides a command \newtheorem for defining theorems.

  5. In the TeX source, or files included via \input. The TeX source may redefine commands defined earlier; a package may redefine commands from the class, but should not redefine commands from other packages (i.e., the order in which packages are loaded should have no importance).

If one is to design a translator, the question is: which commands to translate? and how? Our idea is that all TeX primitives should be understood (the difference between `\dump´ and `\mydump´ is that you get either an Unimplemented or Undefined command error), as well as all standard LaTeX commands; of course all user-defined commands are expanded. Concerning classes and packages, our model (the Raweb) looks like a report. There are too many classes and packages for implementing them all. In earlier versions of the software, commands \documentclass {foo} or \usepackage {bar} did some action if the class foo or the package bar were known, and were ignored otherwise; we mentioned in the first version of this document interest in PhD thesis and slides classes, but no progress has been made in this direction.

In the current version of the software, commands \documentclass, \usepackage and all these are fully implemented. However, instead of reading foo.cls and bar.sty, Tralics reads foo.clt and bar.plt. As an example, consider the makeidx package. The difference between the sty and plt files is in the definition of the \printindex command. The semantics of this command is: insert the index here. In the first version of Tralics there was no index, but nowadays multiple indexes (including glossaries) are allowed; the effect of the command is to mark a position in the XML tree (if it is omitted, the index is put at the end of the document); as a consequence, this cannot be a user-defined command. On the other hand, the LaTeX code asks to read the content of the index file (produced by makeindex) via \@input@; this is a command that, like \@input, is used by LaTeX to read files that may be created after a first run (files of type aux, toc, bbl, ind, etc.), and is not implemented in Tralics (you run Tralics only once).

Another feature is worth mentioning. We have shown above some syntax rules; the source document uses the command \syntax, that takes one argument and puts brackets around it; we followed Knuth's notations and used the character \langle. This produces a math formula with a single character, and my favourite Web browser (in non-MathML mode) produces an ugly result, so that a simple less-than sign has been used for the HTML version. You can define a command \iftralics in the same way as \ifpdf (checking for instance that \tralicsversion is defined), and write conditional code (an example will be given later). But we used the opportunity provided by Tralics that, if your file is myfile.tex, myfile.ult is read before the TeX source (by Tralics, and not LaTeX) and this file may contain command redefinitions (of course, if you want to redefine a not-yet-defined command, some care must be taken).
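To fix ideas, here is a minimal document that uses the makeidx package. It is standard LaTeX, written for this illustration and not taken from the Tralics distribution; according to the description above, Tralics would read makeidx.plt instead of makeidx.sty, and \printindex would only mark the position of the index in the XML tree.

```latex
\documentclass{report}
\usepackage{makeidx}
\makeindex
\begin{document}
A category code\index{category code} is an attribute of a character.

\printindex % the index is inserted here
\end{document}
```

With LaTeX, two runs and a call to makeindex are needed before the index appears; this is the part that relies on \@input@.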
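The conditional approach could look as follows. This is a sketch under stated assumptions: \tralicsversion is the command mentioned above, \undefined is assumed to be an undefined command, and the two definitions of \syntax are plausible reconstructions, not the ones actually used in the source of this document.

```latex
\newif\iftralics
\ifx\tralicsversion\undefined
  \tralicsfalse % the file is compiled by LaTeX
\else
  \tralicstrue  % the file is translated by Tralics
\fi

\iftralics
  \newcommand\syntax[1]{<#1>}% a simple less-than sign
\else
  \newcommand\syntax[1]{$\langle$#1$\rangle$}% Knuth's notation
\fi
```

The alternative described in the text avoids the test altogether: the first definition would be put in myfile.ult, which only Tralics reads.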

Let's assume that \foo is a command defined in one of these LaTeX files. Should it be translated by Tralics? and how? Notice first that a user command is always expandable, and has to be expanded; only the input stream is affected by this operation that consists in reading arguments and replacing them by an instantiated body. On the other hand, evaluation may modify or use internal tables, which can be implemented differently in our translator than in TeX (for instance, in pdfTeX, there is a way to re-use an image). Whenever a command is defined in a style file, we could use the LaTeX source. In some cases, for efficiency reasons, we implemented them in C++ (for instance, the whole fp package has been re-written).

A typical example is the \it command. This is a non official LaTeX command (see [2], or [6, section 7.3.6]): “it is legitimate for you to redefine them in a package or in the preamble according to your personal taste”. In Tralics, there is no difference between `\it´ and `\normalfont\itshape´. What these commands do in LaTeX is rather complex: essentially, some variables are set and `\selectfont´ is called. These commands are robust: if you use `\it´ in a chapter title, the toc file will contain the name of the command, not the result of the expansion. Since Tralics does not write anything in a toc file, this feature is not implemented. Using font changes in a title is not recommended: look at the table of contents for this section. You will see that the word `Translation´ is in a different font, although it is the same in the main document. The command \texorpdfstring has been used, otherwise pdfTeX complains with Token not allowed in a PDFDocEncoded string, because font changes are forbidden in bookmarks.

Consider now a command like \motscle. This is defined by the Raweb as an environment (you can use it in Tralics only as an environment) and it expands to something like `{\bf mots clés}~´. We do not want Tralics to use this expansion. One reason can be that the post-processor might prefer something like `\textbf{Mots clefs}~:´ (i.e., use an alternate spelling, use an initial capital, add punctuation, etc). Another reason is that, since 2003, the Raweb is in English, and the translation should be an English word. For this reason, the translation of \motscle is <keywords>. The raweb class file contains commands that should not be translated: for instance, there is a command \ra@finpart whose purpose is to add a period after the last keyword. This should not be done by the translator, but by the program that will typeset the XML result.

Some TeX primitives are hard to translate. For instance the command \' is assumed to put an acute accent over a character. It is defined (via an indirection through font encoding tables) in terms of the \accent primitive. We could translate it into a Unicode combining character (U+301, to be placed after the character, see [7, paragraph 7.7]). But in general, we have a construction like \'e which is defined in iso-latin1; this is the Unicode character “latin small letter e with acute”. In the same fashion `\k a´ translates to `ą´; this could produce `&aogon;´, but the actual translation by Tralics is `&#x105;´. In a previous version of Tralics, a construction like \'\^ was illegal. In the current version, we use a double indirection table. You can typeset the name Hàn Thế Thành of the author of pdfTeX: the input is Th\'{\^e}, the Tralics output is Th&#x1EBF;. If you know the Unicode character value, you can enter it (in Tralics as Th^^^^1ebf or as Th\char"1EBF).
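The use of \texorpdfstring in a title can be sketched as follows. The command is provided by the hyperref package; its first argument is used for typesetting, the second one for the bookmark. The document below is a minimal illustration written for this purpose, not the actual source of this section.

```latex
\documentclass{report}
\usepackage{hyperref}
\begin{document}
\section{Some remarks on the
  \texorpdfstring{\textit{Translation}}{Translation}}
The font change appears in the title, but not in the bookmark.
\end{document}
```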
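The constructions discussed in this paragraph are collected below, with the translations stated in the text given as comments. This is only a summary in LaTeX syntax: the four-hat notation is specific to Tralics (and to some Unicode-aware engines), and in LaTeX the \k command assumes a font encoding such as T1.

```latex
\'e          % latin small letter e with acute
\k a         % a with ogonek; Tralics writes &#x105;
Th\'{\^e}    % double accent; Tralics writes Th&#x1EBF;
Th^^^^1ebf   % the same character, entered by its hexadecimal code
Th\char"1EBF % the same character, via \char and a hexadecimal number
```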

A construction like `\font\myfont=cmt10 at 13pt´ defines a command \myfont that can be used as {\myfont \char217w}. The effect of this command is to pick up two characters from the font cmt10, scaled by some ratio (it depends on the “at” size of the font), and typeset them. Note that the font could specify a ligature, so that the result could be a single character. In any case, the result might look very different from `Ùw´ (for instance LaTeX provides a font containing only lines or circles). In the current version of Tralics, translation does not depend on the current font. The only interest of changing the current font with \font is that you can access or modify the internal tables of the font (but no metric file is read at all by Tralics).

In order to translate font changes correctly, you should define a command, for instance \myw, that uses character w, of the font `myfont´; the command could be more elaborate (it could look at the slant of the current font and select a slanted version of the character); the command could then be redefined for Tralics, either using a Unicode character, for instance \char"1E09 if you want a C with cedilla and acute, or something else if the character is not defined by Unicode.
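Such a pair of definitions could be sketched as follows. The names \myfont and \myw are those used in the text; the font cmr10 is only a stand-in chosen so that the sketch compiles, and the Tralics definition is a hypothetical example of what one might put in the .ult file.

```latex
% Definition seen by TeX: character w of the font \myfont.
\font\myfont=cmr10 at 13pt
\def\myw{{\myfont w}}

% Hypothetical definition for Tralics (for instance in myfile.ult):
% the translation does not depend on any font.
% \def\myw{\char"1E09 }
```

The document itself then uses only \myw, so that the same source can be typeset by TeX and translated by Tralics.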

1.7. Category codes and characters

A category code is an integer between 0 and 16, as explained in the TeXbook; it is an attribute of a character, as used by the tokenizer. The codes are used according to the following table.

  1. A character of category code 1 serves as group delimiter (opening character), as well as delimiter for macros and token lists. It is a <left brace> as used in the rules explained above. By default, the left brace character { is the only one with category 1.

  2. A character of category code 2 serves as group delimiter (closing character), as well as delimiter for macros and token lists. It is a <right brace> as used in the rules explained above. By default, the right brace character } is the only one with category 2. The end of a `verbatim´ environment is handled by the following piece of code, that shows how to change the category codes of the three characters mentioned above, and use alternate characters for the same purpose.

    \begingroup \catcode `|=0 \catcode `[= 1
    \catcode`]=2 \catcode `\{=12 \catcode `\}=12
    \catcode`\\=12 |gdef|@xverbatim#1\end{verbatim}[#1|end[verbatim]]
    |endgroup
    
  3. A character of category 3 serves as a math shift character. By default the dollar character `$´ is the only character in this category. The same character can be used to start and end a formula. Two consecutive characters of category code 3 are used to start or end display math; the characters need not be the same.

  4. A character of category code 4 is used as delimiter in arrays; it indicates the end of a cell. A typical row in a LaTeX array is a&b&c\\; in plain TeX, it would finish with \cr or \crcr. The & character is the only one of category code 4.

  5. A character of category 5 is an end-of-line character. When such a character is seen, all remaining characters on the current line are ignored. After that, the reader behaves as if it had seen nothing, a space, or a \par token. In some cases, the \par token is invalid. In Tralics, this space character is special, in that it may print as a new line character; in TeX, it is a normal space. By default, only the carriage return character is of category code 5 (this character is inserted at the end of every line, instead of the line feed, carriage return, or both, that marks the end-of-line in the file).

  6. A character of category 6 can be used as delimiter in a command, or a table preamble. By default, there is one character of this category, the `#´. A typical array preamble is `\indent#\hfil&\quad#\hfil\cr´ (this is the first example of \halign in the TeXbook; we shall not describe TeX arrays here).

  7. A character of category code 7 is a superscript character; by default, it is the hat character. Do not confuse it with \^ that produces an accent. Such a character can be used only in math mode. There is also the double hat construction, explained later in section 5.1: two identical characters of category code 7 can be used to read any 8-bit character as in ^^ab; in Tralics, a 16-bit character can be read using 4 such characters.

  8. A character of category code 8 (by default the underscore character) can be used as subscript character; as in the case of superscript characters, you can use it only in math mode.

  9. A character of category code 9 is ignored. Tralics ignores no character by default. On the other hand, it cannot put a null character in a string, so that the null character will not appear in the XML output (note: the same holds for \char0 and ^^00).

  10. A character of category 10 behaves like a space. By default, space, tabulation, and character 160 behave like this (note that character 160 is the no-break space; it should be equivalent to ~).

  11. A character of category code 11 is a letter. By default, only characters in the range `a´ to `z´ and `A´ to `Z´ are of category 11.

  12. A character of category 12 is an `Other character´. All characters not listed here are of category 12, including all digits.

  13. A character of category 13 is an active character. Currently, there is only one active character, the ~. An error is signaled in the case where an active character is used, but undefined. Tralics defines _, # and &, to be the same as \_, \# and \&.

  14. A character of category code 14 is a comment character: all characters remaining on the current line are discarded. By default `%´ is a comment character.

  15. A character of category code 15 is invalid. By default, all characters are valid.

  16. A character of category code 16 is a `short verb´ character. This is a feature that does not exist in TeX. If you use \DefineShortVerb to make it a short verb, you should undefine the character before changing its category code. Unexpected results can follow if non-ASCII characters are of category code 16.

  17. A character of category code 0 is used to create a command like \foo or \\; the associated token is foo or \; the value of the character is irrelevant. By default, there is only one character of category 0, the backslash.

In version one of this document, we started with category code 0 and decremented the `enumi´ counter by one before the first item, so that the first item label was zero. However Tralics does not use this counter in an `enumerate´ environment. Hence, the easiest solution, for having the same labels in the Pdf and HTML version, was to move this item to the end of the list.
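
The effect of category codes can be observed by changing them. The following is a minimal sketch for LaTeX (it is not taken from the Tralics distribution); the choice of the characters `|´ and `!´ is arbitrary and only serves as illustration.

```latex
\documentclass{article}
\begin{document}
\catcode`\|=3            % | becomes a math shift character, like $
$x^2+y^2|                % formula opened by $ and closed by |

\catcode`\!=14           % ! becomes a comment character, like %
Visible text ! the remainder of this line is discarded

\catcode`\!=12           % restore the default: ! is an other character
Done!
\end{document}
```

The first formula illustrates the remark made for category 3: the characters that open and close a formula need not be the same.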

If a TeX file contains the following lines,

\show{ \show} \show$  \show& \show# \show^ \show_ %$
\expandafter\show\space  \show a \show 1 %
\def\foo+{}
\foo{ \foo} \foo$ \foo& \foo# \foo^ \foo_ \foo a \foo1

then Tralics will print:

begin-group character {.
end-group character }.
math shift character $.
alignment tab character &.
macro parameter character #.
superscript character ^.
subscript character _.
blank space  .
the letter a.
the character 1.

In any case, we have a prefix that depends on the category code, then the value of the character. All calls to the command \foo are wrong and signal an error. We show here the first error message, followed by the “got” part of the other error messages. It is important to remember that the command has to be followed by the right token, or the right character with the right category code.

Error signaled at line 20 of file txt6.tex:
Use of \foo doesn't match its definition;
   got {Character { of catcode 1},
   expected {Character + of catcode 12}.
   got {Character } of catcode 2}
   got {Character $ of catcode 3}
   got {Character & of catcode 4}
   got {Character # of catcode 6}
   got {Character ^ of catcode 7}
   got {Character _ of catcode 8}
   got a
   got {Character 1 of catcode 12}

1.8. Considerations about mathematics

Translating mathematical formulas is rather difficult: this is because mathematics are complex in both TeX and MathML. Basically, you start with kernels, add some decoration, and connect these things. Kernels can be letters like x, y, z or A, B, C, but you see very often Greek letters like α, β, Γ, Δ, Hebrew characters like ℵ, old German, like ℘. Decoration can be, like in x y , a second kernel on the left, the right, above, below. People use also bars, dots, rings, arrows, etc. There are different types of connectors: for instance you can say x=y or x∥y, using parallel bars, or x×y, x+y using crossing bars. Amstex was designed by Spivak for easy typesetting of tensors of the form $T_{j_1 j_2 \ldots j_m}^{i_1 i_2 \ldots i_n}$.

The first difficulty consists in representing all these symbols. In TeX, you use commands like \alpha, defined as \mathchar"10B, whose effect is to choose a character from a font (depending on the packages used; a big difficulty is to choose math fonts that go well with the main text font of the document). Some characters exist in bold version, or in italics version. A non trivial question is how to put everything in bold: you could use commands like \boldx \boldequals \boldalpha. It would be simpler to say \bold{$x=\alpha$}. One solution is to say `\boldmath´, or equivalently `\mathversion{bold}´ before starting a math formula; the effect is to use a bold version of everything (in fact, of all characters that use one of the math families, but the number of families is so small that not all symbols use this mechanism). You can say `\mathbf{x}´ or `\bm{x}´ in a math formula. Only \mathbf is implemented in Tralics; it provides a bold upright font. The \bm command, on the other hand, is defined in the bm package; provided that fonts are available, it should use a bold version of everything (at least for all characters for which a bold typeface exists).

Traditionally, uppercase letters were upright, lowercase letters were italics. This is the default for Greek letters, but TeX uses italics for Roman letters; digits are by default upright uppercase, but lower case digits (also known as “oldstyle numbers”) can also be used. There are some exceptions to these rules: an operator like \sin is typeset using upright font. As said above, \mathbf produces upright characters. The so-called “black board” or “double-struck” characters obtained by \mathbb are upright, for instance ℝ. An interesting point: lots of people prefer the shape obtained using \mathbbm. Some people fake it as IR, using $I\!\!R$. You can always use \mathrm for an upright character, as in $\mathrm{E}=mc^2$. Some packages provide italic uppercase Greek letters, and upright lowercase. Then you can apply the Laplace operator to the Delta function like this: ΔΔ. This is shown in the dvi file as deltadelta.png.

In MathML, characters should not be used directly. For instance, you should use <mn>125</mn> for a number, <mi>Foo</mi> for the variable Foo and <mo>sin</mo> for the operator sin. These elements have a mathvariant attribute, which indicates which variant to use. In the case of <mo>, the default is upright; in the case of <mi>, the MathML recommendation says: “The default mathvariant would (typically) be normal (non-slanted) unless the content is a single character, in which case it would be italic.” So we could translate \mathbb{R} as an <mi> element containing a normal R with a mathvariant attribute of value double-struck. An alternate solution consists in using a double-struck character R, and no mathvariant attribute. Here are all the possible variants, and for the character A its Unicode value: normal, bold (1D400), italic (1D434), bold-italic (1D468), double-struck (1D538), bold-fraktur (1D56C), script (1D49C), bold-script (1D4D0), fraktur (1D504), sans-serif (1D5A0), bold-sans-serif (1D5D4), sans-serif-italic (1D608), sans-serif-bold-italic (1D63C), monospace (1D670). In the case of Greek letters, Unicode knows the following variants: bold, italic, bold italic, sans-serif bold, and sans-serif bold italic. And in the case of digits: bold, double-struck, sans-serif, sans-serif bold, monospace. Your browser should show these characters as &#x1D400;, &#x1D434;, &#x1D468;, &#x1D538;, &#x1D56C;, &#x1D49C;, &#x1D4D0;, &#x1D504;, &#x1D5A0;, &#x1D5D4;, &#x1D608;, &#x1D63C;, and &#x1D670;.

Note: there are some holes in the table, starting at 1D400. For instance the Laplace symbol (U+2112) looks like ℒ, but the translation of \mathcal{L}, which TeX shows differently, is not 1D49C+11 (this character does not exist). The dvi file shows calL.png and scrL.png for these symbols. Another special case: there are some variants of the Greek letters ϵ and ε; which one is the default is unclear.

There are two complementary views of MathML: presentation and content. This is how you would convert a=b using content markup: <reln> <eq/> <ci> a </ci> <ci> b </ci> </reln>. And this is how Tralics converts the same formula using presentation markup: <mrow> <mi>a</mi> <mo>=</mo> <mi>b</mi> </mrow>. A more complex example, in content mode:

<apply>
  <int/>
  <bvar><ci>x</ci></bvar>
  <condition>
    <apply><in/><ci>x</ci><ci type="set">C</ci></apply>
  </condition>
  <apply><sin/><ci>x</ci></apply>
</apply>

You can clearly see that we apply the <int> operation to some quantity (in which <sin> is applied to x), subject to some condition (in which <in> is applied to x and some set). If you consider the expression $\int_0^\infty \sin(x)dx$, it is translated by Tralics as

<msubsup><mo>&int;</mo> <mn>0</mn> <mi>&infin;</mi> </msubsup>
<mrow>
  <mo form='prefix'>sin</mo>
  <mo>(</mo>
  <mi>x</mi>
  <mo>)</mo>
  <mi>d</mi>
  <mi>x</mi>
</mrow>

Forget about the <mrow> element: it is added by Tralics using some heuristic rules that do not always work; their effect is to isolate the parentheses from the integral sign: the height of the parentheses should be normal. In the expression, one can see that the integral sign is a character considered as an operator (there is no <int> element here); it has an exponent and an index. In the same fashion, \sin is translated as a <mo> element (with an attribute that says it precedes its argument). Nothing in the formula says that the argument is x. Nothing says that `dx´ is not the product of d and x, but a conventional way of indicating that x is the bound variable (the <bvar> above). In fact, Tralics cannot guess the use of the tokens, it knows only the layout: for instance, is $\Gamma_{ij}^k$ a Christoffel symbol with three indices or a random Greek letter, with two indices, raised to some power? And what about $^3\mathrm{He}$ or this footnote?(note: ) Note that such expressions are not part of content markup: in both cases, these things look like tensors and are to be produced with <mmultiscripts>. In what follows, we shall speak only of the presentation part of MathML.

Some features are difficult to implement. For instance, it is possible to group some equations in a single mathematical formula, and put a label (with a reference like 17) on the whole, as well as a label for each subequation (referenced as 17.a, 17.b, etc). It is also possible to split an equation over more than one line, with a single number for it:

\begin{equation}\begin{split}
\zeta(s) &= 1/1^s+1/2^s+1/3^s+1/4^s\ldots\\
&= 1/1^s+1/3^s+\ldots+1/2^s+1/4^s+\ldots
\end{split}\end{equation}

It is possible to add some lines of text between two equations

\begin{align}
X &= 1/2^s+1/4^s+1/6^s+1/8^s+1/10^s+1/12^s+1/14^s+1/16^s+\ldots \notag\\
\intertext{as in this example (equals signs are aligned)}
&= \frac{1}{2^s}\left(1/1^s+1/2^s+1/3^s+\ldots\right) \tag{1}
\end{align}

What Tralics produces in this case is a single table (with a single equation number), and the intertext is just a new row, left aligned, that spans two columns (see \intertext in section ).

Horizontal spacing in math formulas is managed intelligently by TeX. In the case of a+b or a=b, there is some space on each side of the operator, and this space disappears when the formula becomes an exponent or an index. Here is an example.

$$a+b^{a=b},\quad x_{a=b}^{a+b}.$$

that shows as spacing.png in the dvi file. Each operator has a type, and the value of horizontal spacing depends on the type of the tokens on the left and the right. Traditionally, parentheses are removed around the arguments of sine and cosine, and you say: sine squared of X instead of sine of X, the whole squared. Example:

$$\sin^2 x+(\sin x)^2.$$

In TeX, you can consider every expression (a simple atom, or a list in braces) as an operator, provided that it is preceded by its type. Types Over, Under, Acc, Rad, Vcent are obtained by construction (overline, underline, adding an accent, constructing a radical, vertical centering). Types Ord, Op, Bin, Rel, Open, Close, Punct, Inner are obtained by defining characters via \mathchar, or using the following commands \mathord, \mathop, \mathbin, \mathrel, \mathopen, \mathclose, \mathpunct, \mathinner. These commands are understood by Tralics, but in general ignored. In the case of an inline expression, a line break can only appear after an operator (like plus or equal), provided it is the main operator (in particular, if you put the whole expression in braces, no line break will occur). This is ignored by Tralics. Hence something like $f(x^{a+b})$ cannot be broken.
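
The influence of the type on spacing can be seen with a small LaTeX test file. This is only an illustrative sketch; the letter R is an arbitrary choice.

```latex
\documentclass{article}
\begin{document}
% R is an ordinary symbol (type Ord): no extra space around it.
$a R b$

% \mathrel gives the same letter the spacing of a relation such as =.
$a \mathrel{R} b$

% \mathbin gives it the spacing of a binary operator such as +.
$a \mathbin{R} b$

% In an exponent, the extra space around the relation disappears.
$x^{a \mathrel{R} b}$
\end{document}
```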

Note that exponents and indices use a smaller font size, and exponents in exponents use an even smaller one: compare x y ε ϵ with doubleindex. The image was obtained by converting the math formula into XML, then into a png image (images of this kind will be put into HTML pages, because few HTML browsers understand MathML; we used anti-aliasing, because this is supposed to increase readability). There are three sizes, and four styles (\displaystyle, \textstyle, \scriptstyle, and \scriptscriptstyle). The first two styles in the list have the same size (in MathML, there is an attribute scriptlevel that controls the size of the expression, and another one, displaystyle, that says if the expression is in display style or not). In fact, TeX has two substyles, cramped or not. Consider carefully the placement of exponents in the following example

a 2 +x 2 a 2 +x 2 =x 3 2 x 3 2 .

this is shown as mathformula.png in the dvi file.

One problem (for a translator like Tralics) deals with the placement of arguments versus operators. For instance, if you want to put a dot over a letter, you can say $\dot x$ (compare the math version ẋ with its text-mode counterpart; in text mode, slants are taken into account). The MathML equivalent is an <mover> whose first element is the identifier x. If you want to put a prime after a letter you say $x'$, or x^{\prime}. If you want to put an arrow over x prime, you have the choice between x ' and x ' . With the fonts used in this example, the first solution looks horrible. You can say x_2^3 or x^3_2: the result is the same: a math item, formed of a nucleus, a superscript and a subscript. If you want something like 3 2 x, you have to use two items, the first one has an empty nucleus, the second has no scripts. You can say $R_i{}^j{}_{kl}$, using three items. The MathML translation should consist in a single <mmultiscripts> element. It is possible to enclose a formula in braces, brackets, etc, provided that the font contains the machinery needed for it. You can either use \big and its variants, if you know the height of the formula, or \left and \right, or, as in the example that follows, use an environment like matrix that uses whatever is best.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

The \over command (and friends) are discouraged by the amsmath package: you get a message of the form Package amsmath Warning: Foreign command \over; \frac or \genfrac should be used instead. In fact the command takes two arguments, one before and one after. Example

x=\left( a+b \over c+d\right)^2+1 gives

$$x=\left(\frac{a+b}{c+d}\right)^2+1$$

The trouble is the following: Assume that we have a command \myfrac that typesets its arguments as a/b if the current style is script or scriptscript, and $\frac{a}{b}$ otherwise. In an expression as above, the current style cannot be known before the \over is seen. For this reason, TeX introduced a command \mathchoice that takes four arguments, one for each style; after TeX has completely read the math expression, it takes, in a second pass, the relevant token list. This is complicated. This mechanism is partially implemented in the current version of Tralics.
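
A command such as \myfrac could be written with \mathchoice as follows. This is a minimal LaTeX sketch of the idea, not a definition taken from Tralics or from a package.

```latex
\documentclass{article}
% One argument of \mathchoice per style; TeX selects one of the four
% only after the whole formula has been read.
\newcommand\myfrac[2]{\mathchoice
  {\frac{#1}{#2}}% display style
  {\frac{#1}{#2}}% text style
  {#1/#2}% script style
  {#1/#2}}% scriptscript style
\begin{document}
$\myfrac{a}{b} + x^{\myfrac{a}{b}}$
\end{document}
```

In the base of the formula the built-up fraction is used; in the exponent the slashed form is used.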

If you do not like MathML, you can set the integer \@nomathml to a non-zero value. If it is positive, then most math commands are allowed outside math mode, with trivial translation. If it is negative, then math formulas are parsed as usual, but the resulting tree is output à la TeX, rather than producing a MathML formula. Example

\makeatletter
\@nomathml=1
\sqrt{\alpha+\beta ^4}
\@nomathml=-1
$\sqrt{\alpha+\beta ^4}$

The translation is

\sqrt \alpha +\beta ^4
<texmath type='inline'>\sqrt{\alpha +\beta ^4}</texmath>

1.9. Some subtleties of TeX

Assume that we have a number, say `1.3´, and want to convert this to a dimension, say `1.3pt´. If the number is in a command \foo, we can say \dimen0=\foo pt. On the other hand, how can we get 1.3 given that \dimen0 holds 1.3pt? We may use \the\dimen0, and remove the last two characters. Hence we say \expandafter\rem@pt\the\dimen0. One trouble is that the category codes of the characters `pt´ produced by the \the command are 12, not 11, so that a definition like

\def\rem@pt#1pt{#1}

does not work. You could try to change the category codes of p and t in the definition, but these letters are part of the name of the command. Some black magic has to be used. The code shown here converts also `12.0pt´ to `12´.

\begingroup
  \catcode`P=12
  \catcode`T=12
  \lowercase{
    \def\x{\def\rem@pt##1.##2PT{##1\ifnum##2>\z@.##2\fi}}}
  \expandafter\endgroup\x
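
Since the LaTeX kernel already contains this definition of \rem@pt, it can be tried directly. In the sketch below, \stripunit is a hypothetical helper name introduced only for this illustration.

```latex
\documentclass{article}
\makeatletter
% \stripunit is a hypothetical helper: it stores its argument in \dimen0,
% then lets \rem@pt remove the unit from the result of \the.
\newcommand\stripunit[1]{%
  \dimen0=#1\relax
  \expandafter\rem@pt\the\dimen0\relax}
\makeatother
\begin{document}
\stripunit{1.3pt} and \stripunit{12pt}% typesets: 1.3 and 12
\end{document}
```

The final \relax only terminates the register number 0, so that no following space or digit is absorbed by it.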

Implementing such a construction in a Perl script is not obvious. Consider then the following example:

\def\foo#1{#1x#1}\def\xbar#1{ $#1$}
\expandafter\foo\xbar y
\foo\xbar y

Consider line 2. After \expandafter has read the two tokens \foo and \xbar, the current state is S; a space and the character `y´ are not yet read. Expansion of \xbar reads as argument the character y; the space before it is ignored. The expansion is then \foo ␣₁₀ $₃ y₁₁ $₃. Hence the argument of \foo is the dollar sign, and expanding it gives $₃ x₁₁ $₃ y₁₁ $₃. Note that this gives an odd number of dollar signs.

Consider now the third line. Here the argument of \foo is \xbar. The expansion is \xbar x₁₁ \xbar, and a space and `y´ have to be read. The expansion of the first \xbar is ␣₁₀ $₃ x₁₁ $₃. The second \xbar reads the space, which is ignored, and takes `y´ as its argument. The expansion is ␣₁₀ $₃ y₁₁ $₃. Note that latex2html complains with: Unknown commands: xbarx, because the string is rescanned, and a space should have been added. In the second version of our translator, which was a Perl script like latex2html, we solved this problem as follows. The commands \\ and \! are replaced by \00! and \01!, special characters are replaced by \3#!, \3&!, \3<!, \3>!, and \3~!. A command like \foo is replaced by \1foo!, spaces after \foo are removed. Using such a mechanism, we can handle spaces correctly, as long as the category codes do not change.

The current version of Tralics uses the same representation as TeX for its tokens, namely an integer. A character token like $₃ is represented by the integer c+N*C, where N is the number of characters, c the character value (here 36), and C the category code (here 3). In TeX, the constant N is 256; in Tralics, it is $2^{16}$. Let M=16*N; then non-character tokens are integers x at least M, and x-M is the address in the table of equivalents. An active character is represented by c+N*C, with C=16, and a single-character command is represented by c+N*C with C=17; multiletter control sequences are represented by integers larger than M+2N, and x-M-2N is the hashtable location of the token (it contains the name of the token).
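
As a worked instance of the formula c+N*C, here are the values obtained for the dollar sign (c=36, C=3) and for an active tilde (assuming the usual ASCII value c=126, with C=16), for both values of N.

```latex
\[
\begin{array}{lll}
 & N = 256 & N = 2^{16} = 65536 \\
\mbox{dollar: } 36 + 3N & 36 + 768 = 804 & 36 + 196608 = 196644 \\
M = 16N & 4096 & 1048576 \\
\mbox{active tilde: } 126 + 16N & 126 + 4096 = 4222 & 126 + 1048576 = 1048702
\end{array}
\]
```

The last row is at least M in both columns, which is consistent with the statement that non-character tokens are the integers not smaller than M.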

If you say \uccode`\~=`A two integers are read; in this case, they are character constants; the value is obtained by taking the token modulo N. If you say \uppercase{~}, for each token x in the list, something happens if the value is less than M+N; if the value modulo N is c, and if the upper case equivalent of c is c′, a non-zero value, then x′=x-c+c′ is used instead of x. Note that if C=x/N and C′=x′/N, then C=C′; this means that an active character is replaced by an active character, and a character with category code C is replaced by a character with the same category code.
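
This property is the basis of a classical idiom for defining an active character without typing it as such. The sketch below is an illustration in LaTeX; the character `|´ and the replacement text are arbitrary choices.

```latex
\documentclass{article}
\begin{document}
\begingroup
  \uccode`\~=`\|                  % the upper case equivalent of ~ is |
  \uppercase{\endgroup\def~}{[bar]}% active ~ becomes active |, then \def
\catcode`\|=\active               % from now on | is an active character
a|b % typesets a[bar]b; the definition of ~ itself is unchanged
\end{document}
```

The \endgroup inside \uppercase restores the \uccode value before \def is executed, so the definition of the active `|´ is made outside the group.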

The behavior of a token is determined by its command code (and its subcode): if the token is c+N*C, then C is the command code, and c the subcode. Said otherwise, for a character token, the command code is the category code, and the subcode is the character value. In the other case, the token is non-constant: its meaning can change. The actual meaning is in the table of equivalents; it can be pushed on the save stack. For instance, the default value of \count is (1,90,0). Here the first integer is the definition level (1 is the bottom level), 90 is the command code, namely register, and 0 is the subcode. The default value of \pausing is (1,91,324), where the second number is the command code assignint and the third number an address in the table of integers.

Remember that, in the case of \the\count0, the \count command is evaluated for a value; in all other cases, in particular in a case like \global\count3=17, it is evaluated for side effects. Assume that you say \count3=17; in such a case a number is read, namely 3, and checked for out-of-range. After that, the subtype is looked at (the commands \count, \dimen, \skip, or \muskip have the same command code, but different subtypes; they read something and store it somewhere). Note: if you say \countdef\foo3, then \foo has assignint as command code, and its subcode is 3, thus it behaves exactly like \pausing. Moreover the action is the same as \count, after 3 has been read. In the same fashion, if you say \chardef\bar3, then \bar has chargiven as command code, and its subcode is 3; the action is exactly the same as \char, after 3 has been read. In the case of chargiven, the action consists in putting the character in the dvi file (Tralics puts it in the XML tree). In the case of assignint, an integer is stored or retrieved. For instance, \foo=17 and \count3=17 read a number 17. This number will be stored somewhere in the eqtb table; the location is 3 slots after the start of the \count table.

This slot contains (L,V), for instance (4,23). The first number is the definition level, and the second is the value. In the case L=0, this means that the object is undefined (in the case of a counter, this means zero). Otherwise this is the level. In the case A{B{C}}, A is at level 1, B at level 2, and C at level 3. The quantity L is never greater than the current level. In the case where the assignment is global, (L,V) is instantiated to (1,17). In the case where the current level is L, then V in (L,V) is replaced by 17. Otherwise, the old value is saved on the stack, and (L,V) is replaced by (l,17) (where l is the current level). When the group is closed, the old value is restored.
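
The interplay between levels, local and global assignments can be checked with a few lines of TeX. The sketch below is an illustration only; it uses register 255 rather than 3, an arbitrary choice made because 255 is a scratch register.

```latex
\documentclass{article}
\begin{document}
\countdef\foo=255        % \foo now acts like \count255
\foo=17
\message{[\the\count255]}% writes [17]: same slot as \foo
\begingroup
  \foo=23                % local: the old value 17 is saved on the stack
  \message{[\the\foo]}   % writes [23]
\endgroup
\message{[\the\foo]}     % writes [17]: the old value is restored
\begingroup
  \foo=23 \global\foo=5  % the global assignment sets the value at level 1
\endgroup
\message{[\the\foo]}     % writes [5]: the saved 17 is not restored
\end{document}
```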

Note the following trick. Assume that \A increases some counter and puts the value in \foo, \B does the same, but changes globally \foo. Assume that you say {\A\A\B\B\A\A...}. The first \A sets \foo to 1 and saves (0,undef). The second \A sets \foo to 2. The first \B sets \foo to 3 at level 1, the second \B sets \foo to 4, level unchanged. The next \A saves the old value of \foo and sets \foo to 5, etc. As a consequence: every \A preceded by a \B will put an item on the save stack. When the stack is restored, the value to restore will be (L,V) and the current value (L′,V′). If L′ is 1, nothing happens. Otherwise, (L′,V′) is replaced by (L,V). As a consequence the value after the group is the value of the last \B. This results in a waste of the save stack. For this reason, Knuth says: all assignments to the scratch registers whose numbers are 1, 3, 5, 7 and 9 should be \global; all assignments to the other scratch registers (0, 2, 4, 6, 8, 255) should be non-\global.

A silly question is: what happens if you say {\let \endgraf \par \gdef\par{} \edef \foo {\endgraf} \Foo \def \endgraf {} \Bar }. In the current version of Tralics, but this is also true for TeX, when you say \def\foo, the command code of \foo is changed to be user-defined, and the subcode is an address into a table containing the token list of the body. On the other hand, \let \foo \bar will use the command code and subcode of \bar, and copy this in \foo. In the original version of Tralics, the Perl version, we had two tables: the list of predefined commands, with their internal number, and the list of user defined commands with their body. In the code above, when \Foo is executed, then \foo is a user defined command, whose body contains \endgraf, whose meaning is the original \par. When \Bar is seen, the meaning of this token has changed. The essential reason why Tralics was re-written in C++ is to make this piece of code work.

1.10. Language options

Most TeX formats (plain, LaTeX, amstex) have been written by American people; nowadays, major developments are done in Europe (including the conTeXt format, and the hyperref package). However, lots of people use basic primitives for their French publications, and the situation is not simplified by the fact that there are two packages for writing French documents (one by the late Bernard Gaulle, and one by D. Flipo).

In the original version of TeX you had to say `\'e´ for an e acute, and you had to say `\c c´ or `\c{c}´ for a c cedilla (which form is best is in general unknown). For homogeneity reasons, Lamport recommends \'{e}. Because some accents are redefined by tables, or tabular environments, the solution that always works is \a'{e}. This is something strange, but a translator like Tralics has to cope with it (for instance, some authors of the Raweb use BibTeX files that are generated automatically from a database, and this software systematically produces \a for accents). This makes texts rather hard to read, and not every spell-checker understands this (Ispell for instance allows José or Jos\'e, but not both). By default (i.e. on a Linux machine) LaTeX understands iso-8859-1. This means that all characters used in France are recognized (except œ, Œ, and Ÿ; these characters also cause problems in HTML). Recently, another character was introduced, namely €. Nobody knows how to use it (according to [6, paragraph 7.8.7], published in April 2004, \texteuro is the official LaTeX way; it is translated by Tralics as &#x20AC;). In case of doubt, you should use `euro´.

In order to emphasize words, you can underline them, use a different font, or mark them with quotes. In English, you would use quotes “like these” or `like these´, but never like “this”. In France, guillemets « are used like this ». Note that the spacing is different from English, but the package should take care of everything. The forever question is: how to enter these funny characters on my keyboard made in Mexico. One solution consists of typing two < in a row and hoping for the best (we have either an active character, or a ligature). Note that Tralics translates \verb+<<-->>+ as <hi rend='tt'>&lt;&#x200B;&lt;&#x200B;-&#x200B;-&#x200B;&gt;&#x200B;&gt;&#x200B; </hi>; the only purpose of the funny characters is to inhibit ligatures in the resulting XML. The result might also be: ¡¡ and ¿¿. Depending on the packages, you should perhaps use \guillemotleft, \guillemotright or \og, \fg. Which method is the best is still unclear to me.

2. Expansion

One part of the work of TeX is to replace all user defined tokens by primitives; this is the main objective of the `expansion´ process. In this respect, there is little difference between TeX and Tralics. In this chapter, we review some constructions.

2.1. Defining new commands

A definition is typically of the form

\def\fooi{foo}
\def\fooii#1#2{#2#1}
\def\fooiii+#1.#2=#3#{Seen#1#2#3.}

You may wonder why the commands are not called `\foo1´, `\foo2´ and `\foo3´. The reason is that, if digits have standard category codes, they are not of type letter, so that `\2foo´ is the command \2, followed by the letters `foo´ (the tokens are \2 f₁₁ o₁₁ o₁₁) and `\foo2´ is the command \foo followed by the digit 2 (the tokens are \foo 2₁₂). It is possible to create the token \foo2 via \csname foo2\endcsname, and it is also possible to change the category code of 2. This is in general a bad idea: If you say \setlength{\parindent}{\foo2+2cm}, it is impossible to design the \setlength command so that `\foo2´ is read as a command and `2cm´ as a dimension. On the other hand, if you say \def\foo2#1#2{#2#1}, TeX expects, after the second #, the character 2 with category code 12; if not it complains with: Parameters must be numbered consecutively. In Tralics, the message is a bit different, it says Error while scanning definition of \foo2 expecting #2; got #{Character 2 of catcode 11}. Note how 2₁₁ is printed.
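
The \csname construction mentioned above can be tried as follows; this is a small LaTeX sketch, and the replacement text `second´ is arbitrary.

```latex
\documentclass{article}
\begin{document}
% \csname ... \endcsname builds the token \foo2 although 2 is not a letter.
\expandafter\def\csname foo2\endcsname{second}
% Typing \foo2 directly would mean \foo followed by the digit 2.
\csname foo2\endcsname % typesets: second
\end{document}
```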

Before \def, you can put a prefix: it can be \long, indicating that the command accepts whole paragraphs as arguments; it can be \outer, indicating that the command cannot be the argument of another command; it can be \protected, indicating that the command should not be expanded in an \edef (this is an ϵ-TeX extension); it can be \global. This last prefix can be put before any assignment, it says that the assignment is global (unless \globaldefs is non-zero). More than one prefix can be used, the order is irrelevant. After the \def comes the object to define (this is either an active character, or a command name), then what TeX calls <parameter text>, and this is followed by the body. The body starts with the first opening brace (any character of category code 1) and ends with the first closing brace (any character with category code 2) that makes the body balanced against braces. These braces are not part of the body. The parameter text is an arbitrary sequence of tokens, but cannot contain braces. If it contains a # (in fact, any character of category code 6), it has to be the final character of the sequence, or be followed by the digits 1, 2, 3, up to 9, in order. If there is some text between #3 and #4 (or between #3 and the start of the body), this imposes a constraint on the third argument. If there is some text before #1, this imposes a constraint on the command itself. In the body you can use ##, this will be replaced by a #; you can also use #1, #2, etc., this will be replaced by the value of the first, second, etc., argument. As above, the # is any character of category 6, the digits are of category 12, you cannot access the second argument if only one is available. If you define \foo2 as above, TeX will signal a second error: Illegal parameter number in definition of \foo2.

Once you have defined the commands, you can use them. We give here an example, and the translation by Tralics

\fooi\fooii12\fooiii+ok. {\itshape 3} =xyz{}!
foo21Seenok <hi rend='it'>3</hi> xyz.!

and also by LaTeX `foo21Seenok 3 xyz.!´ Some explanations. The first command takes no argument, thus is easy to understand. The second command takes two arguments, its body is `#2#1´ so that the expansion is the token list formed by the tokens of the second argument followed by the tokens of the first argument. In the case of `\fooii12´, the arguments are `1´ and `2´ (lists of length one). In the case of `\fooii {AB} {CD}´, the arguments are `AB´ and `CD´, lists of length two. This is because TeX ignores initial spaces when reading undelimited arguments; in any case, an argument is well-balanced against braces (same definition as above for the body of a command). The shortest possible sequence of tokens is read (in the case of an undelimited argument, this sequence is never empty). If the result starts with an open brace and ends with a closing brace, these are removed, provided that the remaining token list is well-balanced; for instance, in the case `\fooii{}a´, the first argument is empty. If the command is not \long, then \par tokens are forbidden in the argument. In any case tokens that are defined to be \outer are forbidden in a parameter.

In the case of \fooiii, the situation is a bit more complicated. Fetching the argument is more involved than in the general case. The specification is: plus sign, argument, dot, argument, equals sign, argument, sharp sign. Note first that the `+´ sign is not part of the command name, but is required after it whenever used. The first argument here is the shortest sequence (possibly empty) of tokens, that is a balanced list, and this is followed by the required token list (here, a single dot); in the example it is `ok´. The second argument, delimited by the equals sign, is `␣{\itshape␣3}␣´; a pair of initial and final braces disappears, if possible (here the surrounding spaces prevent it). The `#{´ after `#3´ says that the third argument is delimited by an open brace. This brace is left unread. Such a construction is rare: it occurs only four times in the LaTeX sources; two examples will be given later in section 2.10.
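
The effect of a final `#{´ can be seen on a smaller case. In this LaTeX sketch, \upto is a hypothetical command name chosen for the illustration.

```latex
\documentclass{article}
\begin{document}
% #1 is delimited by the next open brace; the brace itself stays in the
% input, so the group {def} is processed normally after the expansion.
\def\upto#1#{[#1]}
\upto abc{def}% typesets [abc]def
\end{document}
```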

Consider the following example: `\def\opt[#1]{}´. If you say `\opt[foo]´ or `\opt[{foo}]´, the argument is `foo´. If you say `\opt[{[foo]}]´, it is `[foo]´. It is important to know that braces are required if you want a closing bracket in the argument. In the case of `\item[{\it foo}]´, the braces are useless; the scope of the \it command is limited to `foo´ because an additional pair of braces is added somewhere in the body of the \item command. The following example is non-trivial:

\def\@car#1#2\@nil{#1}
\def\@cdr#1#2\@nil{#2}
\if b\expandafter\@car\f@series\@nil\boldmath\fi

Both commands \@car and \@cdr read a normal (undelimited) argument, and a second argument delimited by \@nil, and return one of these. These commands are implemented in Tralics in the C++ kernel for efficiency. The third line shows a use of \@car, where the arguments are the expansion of \f@series; the main assumption is that this token list does not contain the \@nil token, which is a reserved command. The caller of the macro must also ensure that the list is not empty, for otherwise the first argument would be \@nil, and the end of the second argument would never be seen if the \@nil does not appear in the document text. Note that an error is signaled and scanning stops at the first \par token (or empty line) because the command is not \long.

Let´s assume that \f@series expands to a non-empty list, for instance `mc´ (this means that the current font has medium weight and is condensed). Then `\@car mc\@nil´ expands to `m´. The third line of our example uses \@car to get the first character of \f@series, and compares it to `b´ (the result is true if the current font is bold, extra bold, bold condensed, etc). This code is used for typesetting the LaTeX2ϵ logo in bold version as LaTeX2ϵ. The commands \if and \expandafter will be explained later. Note that \if fully expands what follows the letter b. This means that you are in trouble if \f@series expands to an empty list, or if the first token is a command whose expansion may cause problems (perhaps because it has delimited arguments and \@car gobbled the delimiter), or is empty, or is a list that starts with the letter b.
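
The same test can be tried outside the kernel. The following is only a sketch: \myseries is a hypothetical command that stands for \f@series, and we assume that \@car is defined as above.

\makeatletter
\def\myseries{bx}
\if b\expandafter\@car\myseries\@nil bold\else not bold\fi
\makeatother

Here \@car extracts the `b´ of `bx´, so that the test is true and the result is `bold´; with \def\myseries{m}, the result would be `not bold´.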

The following example is from the TeXbook:

\def\cs AB#1#2C$#3\$ {#3ab#1 c##\x #2}
\cs AB{\Look}C${And\$ }{look}\$ 5

If you feed this to Tralics, you will get three errors (one because of the `##´, and two undefined commands). In verbose mode, the transcript file of Tralics will contain the following:

\cs AB#1#2C$#3\$ ->#3ab#1 c##\x #2
#1<-\Look
#2<-
#3<-{And\$ }{look}

One question is: should arguments be in braces or not? As seen elsewhere, some commands have a special syntax, and cannot be followed by braces (for instance, in the case of `\catcode`\$´ the argument is the backtick followed by the dollar). In a case like $a \over b+c$, there are two arguments, one before and one after the command. An expression like $a\over b\over c$ is an error. The error message says to add some braces, but they are used only for grouping. A similar error message is issued if you say $a^b^c$. But compare `$a^{b^c}d$´ and `$a\over {b\over c}d$´: in the first translation, the d follows the superscript; in the second one, the d is part of the denominator, next to the inner fraction. In the case of \sqrt \frac12, braces are inserted by TeX when converting \frac into \over; since Tralics replaces \over by \frac, no such braces are added and an error is signaled because of missing braces.

It is sometimes important to know which braces disappear or remain. As an example, you can say `\def\ap{a'}´ in order to get a'; but if you say `$x_\ap\not=x_{\ap}$´, you get x_a' ≠ x_{a'}: in the first case only the a is a subscript, in the second case the prime is in the subscript as well. In fact, you cannot say that `\ap´ is the argument of the underscore command; this is because TeX expands everything; in one case, it sees that the underscore is followed by the letter a, in the second case by a brace, hence a delimiter for a math list.

In general, you will be faced with the following problem: you say `\def\foo#1{\xbar#1}´ and `\def\xbar#1{{\itshape #1}}´. Note the double braces: the outer braces delimit the argument (of \def, i.e., the body of \xbar), the inner braces delimit the scope of \itshape. When you say `\foo{12}´ only the first letter is in italics, another level of braces is needed. This is what you can see in the transcript file of Tralics:

\foo #1->\xbar #1
#1<-12
\xbar #1->{\itshape #1}
#1<-1
{begin-group character {}
+stack: level + 3 for brace
{\itshape}
{font change \itshape}
...

In this example, braces are missing in \foo, a remedy is to add a pair of braces in the argument, like `\foo{{12}}´. A comment in the TeX source says: Braces are effectively removed when they surround a single Ord without sub/superscripts, or when they surround an accent that is the nucleus of an Ord atom. This is the case in `{{\tilde x}^2}^3´, hence you get a Double superscript error; in this case adding additional braces has no effect; the only solution consists in adding something in the inner list (for instance a kern of width zero).

It is possible to define commands inside commands. For instance, you can say

\def\foov#1{\def\xbar##1{#1##1###1####1}}

When the scanner reads a token list, it handles `#´ signs (in fact, any character of category 6) in a special manner inside a definition. The token list of the previous line is def foov #_{23} {_1 def xbar #_6 1_{12} {_1 1_{25} #_6 1_{12} #_6 1_{25} #_6 #_6 1_{12} }_2 }_2. As you can see, there are three possibilities for a sharp sign: before the brace that defines the body, it is #_{23}, and the digit that follows is omitted; it is 1_{25}, 2_{25}, in the body when followed by 1, 2, etc.; it is #_6 when followed by a sharp sign. Said otherwise, a double sharp sign in a definition is equivalent to a normal one outside. Note the following trick.

\catcode`\^6
\def\foo#1^2{#1^1## #^ ^# ^^}
\show\foo

A quantity like 1_{25} is shown as ^1, because the hat character appears as ^2 (i.e., the token ^_{25}) in the <parameter text> part of the definition. Hence TeX prints \foo=macro: #1^2->^1^1## ^^ ## ^^. On the other hand, Tralics uses a different mechanism for macros: it remembers the number of arguments and the items between them, hence makes no difference with a macro defined as `\def\xbar^1#2{...}´, and prints \foo=macro: #1#2->#1#1## ^^ ## ^^.

Assume now that you say `\foov{17}´. The result of the expansion is the token list shown above, with 1_{25} replaced by 1_{12} 7_{12}. When \xbar is defined the #_6 will read the character that follows, in this case 1_{12}. The situation is as if you had said `\def\xbar#1{17#1#17##1}´. Evaluating \xbar may signal an error, because of the `##´ (no error is signaled in case the argument of \xbar is `\gee´, a command that ignores its first and third argument). If you call \foov with `25´ instead of `17´ as argument, you will get the following error: Illegal parameter number in definition of \xbar.
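
A more readable instance of the same mechanism is the following sketch, where \makegreeter and \greet are hypothetical names: the outer command receives a word, and defines an inner command that takes one argument of its own.

\def\makegreeter#1{\def\greet##1{#1, ##1!}}
\makegreeter{Hello}   % same as \def\greet#1{Hello, #1!}
\greet{world}         % Hello, world!

In the body of \makegreeter, `#1´ refers to the argument of the outer command, while `##1´ becomes the `#1´ of the inner one.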

2.2. Defining commands in LaTeX

You can say

\newcommand*{\fooi}{foo}
\newcommand*{\fooii}[2]{#2#1}
\newcommand\fooiv[3][bar]{Seen#1#2#3}

The first two lines define the same commands as in the start of section 2.1. It is not possible to define \fooiii. However, you can define \fooiv, a command that takes an optional argument. In fact, you call it like this `\fooiv[X]YZ´; the expansion will be `SeenXYZ´. You can put a pair of braces around the arguments, like `\fooiv[{X}]{Y}{Z}´, the result is the same. Braces are needed for the first argument in case you want a closing bracket in it. If the first argument is `bar´, you can omit the `[bar]´: for this reason, the argument is called optional. In LaTeX, \fooiv expands to \@protected@testopt, which is a command to make \fooiv robust (i.e., in some cases, the test for an optional argument is delayed); it then expands to \\fooiv, which is a command that takes three arguments. In Tralics, no auxiliary command is used. If you say `\show\fooiv´, Tralics will print the following on the transcript file.

\fooiv=opt \long macro: bar#2#3->Seen#1#2#3.

Commands defined by \newcommand are \long (they accept paragraphs as arguments) unless a star is used. The `opt´ before `\long´ shows that the command takes an optional argument. We show the value of this argument instead of #1 before the ->. The following is printed by LaTeX:

> \fooiv=macro:
->\@protected@testopt \fooiv \\fooiv {bar}.

Since being \long deals with reading parameters, in LaTeX, it is the auxiliary command \\fooiv which is \long. This shows how to ask LaTeX for the meaning of the auxiliary command and its answer:

\expandafter\show\csname\string\fooiv\endcsname
> \\fooiv=\long macro:
[#1]#2#3->Seen#1#2#3.

The philosophy of LaTeX is that a user should not randomly redefine commands. For this reason, you must use \newcommand (for an undefined command) or \renewcommand (for overwriting an existing command). In the same fashion, \renewenvironment is used to redefine an environment; we shall see later that an environment `foo´ is defined by two commands: \foo and \endfoo. You should never define \endfoo. This explains error messages of the form: LaTeX Error: Command \endfoobar already defined. Or name \end... illegal, see p.192 of the manual. In Tralics, we do not check that the command starts with `end´; the error message is \newcommand: cannot define \foo; token is already defined. You can use \providecommand, the syntax is the same. In this case, the definition is silently ignored if the command already exists. You can use \DeclareRobustCommand, this is defined by Tralics to be the same as \providecommand although the LaTeX behavior is different. You can say `\global\def\foo{}´, this is the same as `\gdef\foo{}´, it defines \foo globally. You cannot use the \global prefix for LaTeX commands. You can use \CheckCommand. This is like \newcommand, but it does not define the command; instead it defines a dummy command, then checks that the dummy command has the same definition as the real one and produces a warning in case of mismatch; this feature can be used before overwriting a command.
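
The following lines sketch the difference between these commands, for a hypothetical command \mycmd that is assumed to be undefined at the start:

\newcommand\mycmd{first}
\providecommand\mycmd{second}  % ignored, since \mycmd already exists
\renewcommand\mycmd{third}     % replaces the definition
\mycmd                         % third

Using \newcommand instead of \renewcommand on the third line would signal the error described above.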

It is now time to explain that braces have two different purposes: as a delimiter for an argument list, and also for grouping: in the same fashion as the formula z(x+y) can be considered as z applied to x+y or the product of z and x+y. In the case of `\textit{12}´, the braces delimit the arguments, in the case of `{\itshape 12}´, the braces are used for grouping. In both cases, all characters up to the closing brace are in italics, but this property depends on the semantics of the operator, not the syntax. There is a big difference between these two uses of braces: the tokenizer produces token lists that are always balanced (there are as many opening delimiters as closing delimiters, where delimiters are characters of category code 1 and 2). On the other hand, if you say `\let\bgroup={´, the \bgroup has the same meaning as an opening brace, hence triggers the start of a new group; but it is not an explicit character (such things are called “implicit characters” in the TeXbook). When you say `\hbox...´ the opening brace can be implicit or explicit (in this case, braces are used both as delimiters and for grouping). Groups can also be defined by math shift characters (if you like empty lines in the source of a math formula, you can say `$\let\par\relax ...$´), or implicitly for a cell in a table, or via \left and \right in a math formula, or via \begingroup and \endgroup (they define a “semi simple group”).

One difference between plain TeX and LaTeX is the existence of named groups: instead of saying `\beginfoo´ and `\endfoo´, you say `\begin{foo}´ and `\end{foo}´. This is interpreted by LaTeX as

  1. When \begin{foo} is seen,

    1. a test is made to see if `\foo´ exists: if it does not exist, an error is signaled and steps (1.3) and (1.4) are skipped (via a call to \expandafter):

    2. the command \begingroup is executed (with space hacking);

    3. the name `foo´ is stored in \@currenvir;

    4. the command \foo is executed.

  2. When \end{foo} is seen,

    1. the command \endfoo is executed;

    2. the name `foo´ is compared with \@currenvir, an error is signaled in case of mismatch;

    3. the command \endgroup is executed (with more space hacking).

This mechanism is not symmetric. It is implemented in Tralics in a similar manner (but there are some differences that can be analyzed by a malicious user). The first remark is the following: on entry, you may get a message that says LaTeX Error: Environment unknown undefined, on exit you would get LaTeX Error: \begin{document} ended by \end{unknown}. The trick is that the \endfoo token (created by \csname) is never undefined (its default action is \relax). One important point is that the command used in step (1.4) is \foo, not \beginfoo. In [6, example 7-3-1], there is an example of `bfseries´ as an environment; there is no command \endbfseries. Note that in step (1.4), the token that comes after \foo is the token after `\begin{foo}´, and this means that \foo can grab its arguments; on the other hand the token after \endfoo in step (2.1) is the start of the sequence that checks the environment name: thus \endfoo cannot read its argument (we shall see in a minute why steps (2.1) and (2.2) cannot be swapped). In the current version of Tralics, the “space hacking” is not implemented; we shall not discuss it here.

There are some tokens whose names start with `end´; you should not use these as environments. Consider \begingroup and \endgroup, the commands explained above; consider \input, \endinput, these are TeX primitives for inputting from a file; consider \beginL, \endL, \beginR, \endR, the ϵ-TeX extensions for left-to-right or right-to-left writing; consider \citation and \endcitation, these are Tralics commands for the bibliography; the command \endsec indicates the end of a section; the \endlinechar is a reference to an integer register that contains the character to be added at the end of each line. Commands \endgraf and \endline are aliases to \par and \cr.

This is how you can define new environments:

\newenvironment{x}[2]{#1BY\begin{y}#2AY} {by\end{y}ay}
\newenvironment{y}{Z}{z}
\begin{x}a b c \end{x}

This typesets as aBYZbAY c byzay. The \begin part reads two arguments. The \end part takes no argument; it could use the first argument of the \begin, provided that this one saves it in a command. In verbose mode, the following is printed by Tralics in the transcript file. We have removed all lines with `Character sequence´ and `Text´.

1 [185] \begin{x}a b c \end{x}
2 {\begin}
3 {\begin x}
4 +stack: level + 3 for environment entered on line 185
5 \x #1#2->#1BY\begin {y}#2AY
6 #1<-a
7 #2<-b
8 {\begin}
9 {\begin y}
10 +stack: level + 4 for environment entered on line 185
11 \y ->Z
12 {\end}
13 {\end x}
14 \endx ->by\end {y}ay
15 {\end}
16 {\end y}
17 \endy ->z
18 {\endgroup (for env)}
19 +stack: ending environment y; resuming x.
20 +stack: level - 4 for environment from line 185
21 {\endgroup (for env)}
22 +stack: ending environment x; resuming document.
23 +stack: level - 3 for environment from line 185

At lines 4, 10, 20 and 23, you can see that the current “level” changes (this is what TeX calls the “semantic level”). The default level is level one, our example was done at level two, the first environment is at level three, the second at level four. When you see `level + 4´, it is because the level has just incremented; if you see `level - 4´ it means that the level will decrease. At lines 18 and 21, you see that Tralics uses a special `\endgroup´ token. Look closely at lines 13 and 19: when Tralics sees `\end{x}´, the current environment is `y´, it is only after evaluation of \endx that the environment is `x´ again; this example shows that steps (2.1) and (2.2) cannot be swapped. In Tralics the name of the environment cannot be modified by the user.
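
As mentioned above, the \end part of an environment can use an argument of the \begin part if this one saves it in a command. A minimal sketch, with the hypothetical names `tagged´ and \savedtag:

\newenvironment{tagged}[1]{\def\savedtag{#1}[#1: }{ :\savedtag]}
\begin{tagged}{X}text\end{tagged}   % [X: text :X]

The definition of \savedtag is local to the group opened by \begin; it is still visible when \endtagged is executed, because the group is closed only afterwards.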

Because of the \begingroup command, everything, until the \endgroup, is local to this group; in particular \@currenvir will be restored. If you say something like

\begin{zfoo}\renewcommand\endzfoo{A}\end{zfoo}

the command associated to \end{zfoo} is locally redefined. In some cases, this is a big mistake: in Tralics, the start command can assume that the corresponding end command is executed or an error is signaled. In fact, the meaning of \endzfoo is stored on a special stack, and restored by \end{zfoo}. There is a big hack in LaTeX (and also in Tralics): since no text should follow the end of the document, there is no need to store on the stack every definition given between the start and end of the document; thus \document executes a \endgroup; logically, \enddocument should insert a \begingroup token; in LaTeX, this is not needed because step (2.3) is never executed. In Tralics we re-insert a \begin, because we have to typeset the bibliography (as a consequence, the start-line in the trace is the line that contains \end). Moreover, the action cannot be completely trivial, because we have to re-insert all tokens saved by \AtEndDocument. We show here the transcript file, assuming that only one token has been saved, namely \empty. You can see the stack increase and decrease; you can see the \endinput that closes the current file; you can also see a second \enddocument command whose action is to pop the XML stack; it is marked `pop (module)´ for historical reasons.

[31] \end{document}
{\end}
{\end document}
+stack: level + 2 for environment entered on line 31
{\enddocument}
{\endallinput}
\empty ->
{\enddocument}
{Pop (module) 2: document_v div0_v div1_v}
{\endgroup (for env)}
+stack: ending environment document; resuming document.
+stack: level - 2 for environment from line 31
++ Input stack empty at end of file

The last line of the transcript file shown above says that the current file was not inputted by another one. What happens if a file foo.tex contains \input tralics-rr, followed by some junk? Well, the purpose of the pseudo command \endallinput is to forget about everything. The transcript file would contain

++ End of file tralics-rr.tex
++ cur_file_pos restored to 0
++ Input stack -- 1 tralics-rr.tex
++ Input stack empty at end of file

Clearly, you cannot use a document environment in a document; if you try, LaTeX complains with LaTeX Error: Can be used only in preamble (the preamble is everything before \begin{document}). The error message of Tralics is a bit more explicit: Two environments named document. If you put \begin{it} before \begin{document}, LaTeX does not complain. The trouble is at the end: you will get an error of the form LaTeX Error: \begin{it} on input line 9 ended by \end{document}, followed by a TeX warning: (\end occurred inside a group at level 1). In Tralics, an error is signaled at the start: \begin{document} not at level 0. On page 6.5, you see statistics of the form `Save stack +1582 -1582´; this means that the semantic stack pointer has increased 1582 times, it has decreased the same number of times, so that the end of the document has been seen at level zero; no warning is issued in case the two numbers are not the same.

The package checkend contains a magic command whose effect is to unwind the stack, signaling an error if unclosed items are seen. This command should only be used at end of document, in the end-document hook. Using the package produces a result like the following:

Error signaled at line 687 of file testkeyval.tex:
Non-closed \begingroup started at line 683.
Non-closed brace started at line 437.
Non-closed environment `it' started at line 213.

2.3. Some small examples

Remember that \foo and \; are two commands that differ only in the following behavior: when the tokenizer sees a backslash followed by a semicolon (whose category code is not letter), it constructs a command whose name is formed by that character (and sets the internal state to a mode in which spaces are not ignored). On the other hand, if the backslash is followed by a letter, all letters are read (and the state is set such that following spaces will be ignored). By space, we mean here every character that has the category code of a space. A space after \verb is never ignored, but it is unwise to use this space as delimiter. In the case of \foo, the tokenizer allocates a slot on the hash table (unless \foo already exists). The possibility to change category codes dynamically is interesting (however, the implementation of \verb in Tralics uses no category code changes, and is more efficient). The two commands \makeatletter and \makeatother change the category codes of the at sign character @, to letter and other. For instance

\makeatletter
  \def\foo@val{}
  \def\foo#1{\def\foo@val{#1}\check@foo}
  \def\usefoo{\foo@val}
\makeatother

In this example, we have two user commands: \foo that defines a variable, and \usefoo that uses it. The variable \foo@val has a reserved name; there is a command \check@foo that makes sure that the argument is correct. The default category code of @ is 12; in most of the examples, we shall assume that it is 11, because these examples come from the LaTeX kernel or style files where the default category code is 11.

As explained above, `\catcode`\$=3´ changes the category code of the dollar sign. What follows \catcode has to be a character code (a number between 0 and 255) followed by an optional equals sign, followed by a valid category code (an integer between 0 and 15). Assume that you say \def\A{25}, followed by `\catcode\A7.´. In the case where standard category codes are in effect this is tokenised as catcode A 7_{12} ._{12}. But when a number is read, all tokens are expanded, until the end of the number is found (in the case where the number is formed by digits, one space character after the number will be read, if possible). In this case, TeX reads the digits 2, 5 and 7. It stops reading at the dot. This is an error (signaled by Tralics as Bad character code replaced by 0: 257). Then TeX reads an optional equals sign (there is none) and an integer (there is none). Hence you get a second error (Missing number, treated as zero). The result is that you have changed the category code of the null character to zero (like backslash). Since version 2.9, Tralics accepts 16-bit characters, so that the number 257 is valid, and you changed the category code of the character `latin small letter a with macron´ to zero.

If you want to put 7 in the category code of the character defined by the command \A, you should say `\catcode\A=7~´. It is possible to make \A a reference to the character number 25, by using \chardef. Thus you can say `\chardef\A25~´ and `\catcode\A7~´. Note that, in the context of routines like scanint, a character number is a valid number; so that \A can be used as the number 25, wherever a number is required. In the sources of LaTeX you can see `\chardef\active=13´. You will also see `\mathchardef\@cclvi=256´; there is no difference between \chardef and \mathchardef, except that a character is in the range 0-255, while a math char can take larger values (less than 2^15). You can use \countdef\B26 (this will make \B a reference to count register number 26), \dimendef\C27 (this will make \C a reference to dimension register number 27), \skipdef\D28 (this will make \D a reference to skip register number 28), \muskipdef\E29 (this will make \E a reference to muskip register number 29), and \toksdef\F30 (this will make \F a reference to token register number 30). There is no `\boxdef´. The reason is that, if you want to copy the value of counter 1 into counter 0, you say \count0=\count1. If you say \count@=\B this will put the value of the counter 26 into \count@ (this is the counter 255). However, you say \setbox0=\copy1 if you want to copy the content of box 1 into box 0: the syntax is not the same. Note that \setbox0=\box1 copies and clears the box number one. When you use a command like \chardef, a line will be added to the transcript file, even in non-verbose mode, see section 6.13.
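
These constructions can be combined as in the following sketch. The names \A and \B are those used above; register 3 is chosen here only because registers zero to nine are scratch registers.

\chardef\A=25
\countdef\B=3
\B=\A             % \A is used as the number 25
\advance\B by 1
\the\B            % 26

The last line typesets the current value of \count3.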

Commands can be defined via `\let´. You say \let\A=\B, where \A is a token that can be defined (active characters or commands; TeX does not care if the token is defined or not). It is followed by <equals><one optional space>. This means that TeX reads all space tokens; if the first unread token is an equals sign, it is read as well as the next token, provided that it is a space. If the equals sign is followed by two space tokens, only one is read. Instead of \B, you can put any token. After that, the current meaning of \A will be the current meaning of \B. For instance, if you say \let\foo\bar\show\foo you will get \foo=macro:->\mathaccent "7016\relax. In plain TeX, you would see a space instead of \relax (both a space and a \relax indicate the end of the number). In Tralics, you would see \foo=\bar, this is because \bar is a primitive, instead of a user defined command. If you say \let\A=+, then \A will behave like a + character (of category 12). In fact, this is called an implicit character, and sometimes an explicit character is required. For instance in the case \parindent=-3.4pt, the minus sign, the digits, the dot, and the two letters pt must be explicit characters. However, after

\let\bgroup={  \let\egroup=} \let\sp=^ \let\sb=_

there is no difference between $x\sp\bgroup a\sb b\egroup$ and $x^{a_b}$. The assignments shown here are made by Tralics when bootstrapping, and the commands so defined should be considered primitives. A token list has to be well balanced against explicit braces. For instance

\def\foo{{\catcode`}=0\egroup}

satisfies the requirements. The body of the command consists in {_1 catcode `_{12} }_2 =_{12} 0_{12} egroup. If you evaluate \foo, the \catcode command will read the four tokens that follow; it will modify the category code of the closing brace. All this happens inside a group opened by {_1 and closed by egroup, so that this is harmless. One use of \let is the following:

\def\fooA{a very long command}
\def\fooB{another very long command}
\def\xbar#1{\ifx 0#1\let\foo\fooA \else \let\foo\fooB\fi}

Here we use the fact that \let just moves a pointer. This is faster than copying a list. In particular, consider

\def\xbar#1{\ifx 0#1\fooA \else \fooB\fi}
\def\xbar#1{\ifx 0#1\let\foo\fooA \else \let\foo\fooB\fi\foo}

The first line executes conditionally one of \fooA and \fooB. However, this command cannot read an argument (because \fooA is followed by \else and \fooB by \fi). In the second case, we define \foo conditionally, and it can read its arguments without problem.

You can use the following construct

\def\addtofoo#1{\let\oldfoo\foo\def\foo{#1\oldfoo}}
% example of use
\def\foo{A}\foo
\addtofoo{B}\foo

This typesets as ABA. Beware: the \addtofoo command can be used only once (the old value of \oldfoo has to be saved...). We shall see later how to replace in the definition above the \oldfoo by its value, using either token lists or \edef, using a method where \oldfoo is a temporary. This is another example:

\def\double#1#2{\let#1#2\def#2{#1#1}}
% example
\def\B{\C}\def\C{to}\double\tmp\B

Here `\B´ typesets as `toto´. In fact \B is defined as `\tmp\tmp´, where \tmp is the old definition of \B, namely a command that expands to `\C´. If you say \def\C{ti}\B, you will get `titi´. If in \double the \let is replaced by a \def as \def#1{#2}, the expansion of \tmp would have been \B, and \B would have been the same as \B\B. You see the problem? This could provoke a stack overflow, a parameter stack overflow, or even a program crash.

Let´s mention the existence of \futurelet\A\B\C. It is the same as \let\A\C\B\C. The usefulness of such a construct will be explained later.
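
As a foretaste, here is a sketch of a typical use: a command looks at the token that follows it without reading it. The names \peek, \nexttoken and \report are hypothetical.

\def\peek{\futurelet\nexttoken\report}
\def\report{\ifx\nexttoken[(bracket)\else(no bracket)\fi}
\peek[x]   % (bracket)[x]
\peek x    % (no bracket)x

In both cases the token that was examined is still in the input when \report is executed.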

You can say \expandafter\A\B. In such a case, TeX reads the first token, saves it somewhere, calls expand if possible, re-inserts the saved token. Nothing special happens if the second token (here \B) cannot be expanded, because it is a non-active character, or a command like \par or \relax. But assume that \A is a command that uses one argument (for instance \textit) and \B expands to `foo´. If you use \expandafter, only the first letter will be in italics. Assume that \foo expands to a dollar sign. Then $\foo is an empty math formula because \foo is not expanded, but \expandafter$\foo.$$ is a display math formula with a dot. The main reason why tokens are not expanded after a dollar sign (when TeX looks for another dollar sign) is that a test $\ifmmode true\fi$ should evaluate to true. You can use \expandafter if you want the test to be executed outside math mode. Note: if a table contains a template of the form `$#$´, if the cell starts with \ifmmode, then the test is expanded (i.e. evaluated) before math mode is entered, because TeX is looking for an \omit token. As a consequence you should always put `\relax´ before a test (this is not needed if a command is made “Robust”).
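
The case of \textit can be sketched as follows, where \word is a hypothetical command:

\def\word{foo}
\textit\word               % the argument is \word: `foo´ is in italics
\expandafter\textit\word   % the argument is `f´: only this letter is in italics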

Look carefully at the following lines:

1 \def\toto{\titi!}\def\titi{\tata}\def\tata{\tutu}
2 \expandafter\expandafter\expandafter\def\toto{5}
3 \let\E\expandafter \E\E\E\def\toto{6}
4 \def\E{\expandafter} \E\E\E\def\toto{7}
5 \expandafter\def\toto{8}

On the first line we define three commands \toto, \titi and \tata. As we shall see, lines 2, 3 and 4 do not change the meaning of \toto, so that, on line 5, the expansion of `\toto´ is `\titi!´. In this case, the effect of the \expandafter is to replace `\toto´ by `\titi!´. Hence, line 5 defines a macro \titi, that has to be followed by an exclamation point, takes no argument, and expands to 8. Consider now line 2. The first \expandafter sets aside the \expandafter token; it expands the next token, which is \expandafter, and the expansion of this is: read the token that follows (here `\def´), and expand the token that follows. This is `\toto´, that expands to `\titi!´. If we pop back the two tokens, line 2 is equivalent to `\expandafter\def\titi!{5}´. This looks like line 5, so that it is the same as `\def\tata!{5}´. There is no difference between lines 2 and 3: the \E command behaves exactly like \expandafter. Consider now line 4. What TeX does is expand the first token. It is \E, it expands to `\expandafter´. Since the token can be expanded, it will. Thus TeX reads and remembers the token that follows. It expands the next token (the third `\E´). Its expansion is `\expandafter´. Hence, line 4 is equivalent to `\E\expandafter\def\toto{7}´. Now, the \E in this list has as effect to try to expand the second token; it is \def, which cannot be expanded. Hence this `\E´ is useless. Line 4 is equivalent to `\expandafter\def\toto{7}´. And this defines \titi. We give here the trace of Tralics (it is a bit more complete than the trace of TeX):

\E ->\expandafter
{\expandafter \E \E}
\E ->\expandafter
\E ->\expandafter
{\expandafter \expandafter \def}
{\expandafter \def \toto}
\toto ->\titi !
{\def}
{\def \titi !->7}

A question is: how many commands with two characters can be defined in Tralics? The answer is 255 squared (all characters but the null character are allowed). Of course, if you say `\def\++{}´, this defines the `\+´ command, not the `\++´. You could imagine changing category codes (but, in a construction like `\def\{}{}´, it is impossible to give a different role to the first and second opening brace). The solution is given by \csname, you can use it like this `\csname1+1=2\endcsname´. Note that this typesets nothing: when \csname manufactures a new control sequence name, it defines it as being \relax (the control sequence will exist, until the end of the job). You can hide the \csname command, like this

\def\nameuse#1{\csname #1\endcsname}
\nameuse{1+1=2}

If you want to define such a beast, you must use \expandafter.

\def\namedef#1{\expandafter\def\csname #1\endcsname}
\namedef{1+1=2}{true}

The two commands \@namedef and \@nameuse are defined by LaTeX and Tralics like \namedef and \nameuse.
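
For instance, one could say the following; this is only a sketch, and the name `color of sky´ is arbitrary (it shows that spaces are allowed in such a name):

\makeatletter
\@namedef{color of sky}{blue}
\@nameuse{color of sky}   % blue
\makeatother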

You can also say \namedef{++}#1{#1+#1} followed by \nameuse{++}{3}. This should give 3+3. If you want a macro named \{}, you can say \nameuse{\string\{\string\}}, provided that \escapechar=-1. If you do not like this setting of \escapechar, you can define a command, say \Lbra, that expands to {12 (an inactive opening brace character) using whatever method seems best. For instance

{\escapechar=-1 \xdef\Lbra{\string\{}\xdef\Rbra{\string\}}}
\namedef{\Lbra\Rbra}{Hey}

We explained above what happens when three \expandafter come in a row. Thus, it should not surprise you that the following command defines \foo.

\expandafter\expandafter\expandafter\def\nameuse{foo}{12}

A more realistic example of \csname is

\def\allocate#1{....}
\def\newcount#1{\allocate{ctr}\countdef#1\allocationnumber}
\def\newcounter#1{\expandafter\newcount\csname c@#1\endcsname}

There are ten such commands in LaTeX; \newcount, \newtoks, \newbox, \newdimen, \newskip, \newmuskip, \newread, \newwrite, \newlanguage are implemented in Tralics. The equivalent of \allocate takes as argument a type (for counters, dimensions, skip registers, muskip registers, box registers, token registers, input registers, output registers, math families, language codes, insertions, etc.), allocates a unique number depending on the type, and puts it in \allocationnumber. Count registers between 10 and 19 are used for this purpose, and the user should not modify them. Command \new@mathgroup is not implemented because math groups are unused. Note that \newsavebox and \newlength are the same as \newbox and \newskip since Tralics does not check redefinition of the command; the command \newinsert is not implemented (this requires a box register, a count register, a dimen register and a skip register; each unprocessed float in LaTeX uses an insert, and this may trigger a `too many unprocessed floats´ error). The command \newhelp is not implemented in Tralics; it allocates no counter.

For instance, if you say \newcount\Foo, the allocated number could be 110; if you say \newskip\Bar, the number could be 46. In the first case, the result is as if you had said \countdef\Foo110. In the case of \newcounter{foo}, the result is as if you had said \countdef\c@foo111. Note that there are only 256 count registers available in TeX. You can use registers zero to nine as scratch registers (do not forget that \count0 contains the current page number); LaTeX uses registers 10 to 21 for its allocation mechanism. In the current version, the first free counter is 79. Some other counters are allocated by the class, and the package (in the transcript file, one line is printed for every call to \allocate, for instance: \c@chapter=\count80; in Tralics, the line looks like {\countdef \c@foo=\count43}).

A very important point is that all tokens between \csname and \endcsname are fully expanded. It is an error if a non-character token remains. Thus it is important to know which commands are expanded, and those that cannot be expanded. The exact rules are in the TeXbook, chapter 20. As a rule of thumb, commands that do no typesetting and modify no internal table can be expanded. More precisely: user defined commands, conversions like \string, \number, conditionals like \fi, marks, and some special commands like \csname, \expandafter, \the can be expanded. A construction like \csname\char`A\endcsname is invalid.
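
As a small illustration of this full expansion, the lines below build a command name from an expandable conversion. This is only a sketch: the names \idx and \itemiii are chosen here for the example and are not defined by Tralics.

\def\idx{3}
\expandafter\def\csname item\romannumeral\idx\endcsname{third}
\csname item\romannumeral\idx\endcsname   % same as \itemiii, typesets `third´

Both \idx and \romannumeral are expanded before the name is formed, so that only the characters `itemiii´ remain between \csname and \endcsname.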

If you say \noexpand\foo, the result is \foo, unexpanded. Example:

1 \def\FOO{12}
2 %\csname\noexpand\FOO\endcsname  %bad
3 \edef\xbar{\noexpand\FOO}
4 \noexpand\FOO
5 \expandafter\textit\FOO
6 \expandafter\textit\noexpand\FOO
7 \count0=1\FOO
8 \count0=1\noexpand\FOO

Line two is an error: the no-expanded \FOO is not a character. On line 3, the body of \xbar is `\FOO´, it will be expanded later. The translation of line 4 is empty (the command \FOO is temporarily seen as \relax, and \relax does nothing). Because of the \expandafter, the argument of \textit on line 5 is 1, on line 6 it is 12. On line 7, 112 is put in \count0, because \FOO is expanded. On line 8, 1 is put in the register, and 12 is typeset. On lines 8 and 6, \FOO is expanded twice, the first expansion being inhibited by the \noexpand.

Some quantities are never expanded, for instance \lowercase (this is black magic), \def (more generally all assignments), \relax (it does nothing, but stops scanning integers, dimensions, glue, etc), \hbox, \par, \left, etc. There are cases when an expandable token is not expanded: ten cases in TeX, and four additional cases in ε-TeX, these are described in section 6.12. Be careful with constructs like \csnameé\endcsname: LaTeX may signal an error involving \unhbox.

A command can be defined via \edef instead of \def (\xdef is the same as \edef, with an implicit \global prefix). All tokens, unless defined with \protected, in the body of the definition are expanded. Example:

\def\A{\B\C} \def\C{1}
\def\Bgroup{{\iffalse}\fi}\def\Egroup{\iffalse{\fi}}
{\let\B\relax \global\edef\D\bgroup{\A\noexpand\C\egroup}}
{\let\B\relax \global\edef\E\Bgroup{\A\noexpand\C\Egroup}

In this example, we consider two groups, that define (locally) a command \B and (globally) two commands \D and \E. The difference between these two commands is that \bgroup is an implicit character: when evaluated, it behaves like an opening brace, but it cannot be expanded. On the other hand, \Bgroup expands to an opening brace. The \edef expands tokens following an explicit opening brace. It stops reading after having found an explicit closing brace (resulting from the expansion of \Egroup, not \egroup). The expansion of `\A´ is `\B\C´, this is expanded again. Since \B is \relax, it cannot be expanded, and is left unchanged. The expansion of `\C´ is `1´, so that the full expansion of `\A´ is `\B1´. The expansion of `\noexpand\C´ is `\C´. Thus, the example is equivalent to

\global\def\D\bgroup{\B1\C\egroup}
\global\def\E\Bgroup{\B1\C}

You can put three \noexpand in a row followed by some token X. After the first expansion, the result is \noexpand followed by X, after the second expansion, the result is X. In the example that follows, the value of \B is \xbar.

\def\xbar{xbar}
\edef\A{\noexpand\noexpand\noexpand\xbar}
\edef\B{\A}
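
One way to check this is to add the two lines below after the example. This is a sketch: the exact wording printed by \show differs between TeX and Tralics, so only the expected bodies are given in the comments.

\show\A  % body of \A: \noexpand \xbar
\show\B  % body of \B: \xbar

The body of \B is the command \xbar itself, not its expansion `xbar´.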

Consider a realistic example like this

\def\add#1#2{\edef#1{#1\do{#2}}}
\def\cons#1#2{\begingroup\let\@elt\relax\xdef#1{#1\@elt #2}\endgroup}

We can say something like

\def\A{}\def\B{}  %init
\let\do\relax% just in case
\add\A x, \add\A y, \add\A z,
\cons\B{ab}, \cons\B{cd}, \cons\B{ef}.
\show\A\show\B

This gives two ways to add some tokens to a list. Because both commands use \edef, full expansion is in use; you have to be very careful if the tokens contain macros that can be expanded. For the case of \add, we assume that \do does nothing; for the case of \cons, the command resets \@elt to \relax. The body of \A will be \do{x}\do{y}\do{z} and the body of \B will be \@elt ab\@elt cd\@elt ef. Note the absence of braces: if you really need them, you should add them to the argument of the \cons command. The built-in command \@cons behaves like \cons.
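
Such lists are typically used by giving a meaning to the separator just before the list is expanded. The lines below are a sketch that continues the example; the definitions of \do and \@elt are chosen here for illustration, and @ is assumed to be a letter, as in the definition of \cons.

\def\do#1{(#1)}
\A                      % same as (x)(y)(z)
\def\@elt#1{[#1]}
\B                      % \@elt takes a single token: [a]b[c]d[e]f

The last line shows the effect of the missing braces: with \cons\B{{ab}} instead of \cons\B{ab}, the whole item would be the argument of \@elt.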

The major problem with \edef is that it is not aware of assignments. Assume that \def\@A\B{}, and \def\C{B \let\@A\D}, \def\E{\C} have been somehow evaluated. Consider now an \edef containing \E. This implies expansion of \C, hence of `\let\@A\D´. The \let command cannot be expanded. Hence \@A is expanded, and you get the following error: Use of \@A doesn´t match its definition from inside \C. You have never heard of this command \@A, and never used \C! For this reason some commands are made robust: for instance \hspace expands to `\protect\hspace ´ (the second command here has a space at the end), and \protect is defined to be \relax, or \noexpand, and sometimes \string. This mechanism works only if you use \protected@edef instead of \edef. (Note: \protect behaves like \string inside \protected@write, which is a variant of \write).

2.4. Variables in TeX

By variable, we mean everything that the user can modify or watch changing. For instance, the current hash table usage is not a variable (it varies, of course, but the value is available only at the end of the run, in the transcript file). The current vertical list is updated whenever a paragraph is split into lines; you cannot access this list, however the \output routine gets the part of it that should be typeset on the current page in the box register 255. There are general purpose variables, and specialised ones: for instance \spacefactor makes sense only in horizontal mode, and the height of the material on current page (\pagetotal) can be used only between paragraphs (in fact, it is updated by TeX whenever a line is added to the page; you can consult, and even modify, this value at any time). There are variables that you cannot modify (the version number, for instance) or only once (the magnification), or in the preamble (i.e., LaTeX reads some variables at begin-document, changes done later to these variables are ignored).

Variables can be classified into two categories depending on their use: in some cases you need to put a prefix before \foo if you want to use it, in other cases the prefix is required for modification. For instance, if \foo is a user-defined command, you say \let\foo, or \def\foo, if you want to change the value, and simply \foo if you want to use it. In the same fashion \font\tenrm defines a font, and \tenrm is a use. On the other hand, if you say \pageno=3, this will set the current page number to 3 (this is plain TeX syntax, the LaTeX syntax will be explained later). If you say something like \hskip-\fontdimen2\font, the \hskip command is a prefix that says that the variable that follows will be used. In this case, this is some dimension from a font. Note that \fontdimen is a prefix so that \font does not define a new font, but refers to the current font. The meaning of the above piece of code is: insert horizontal space, whose amount is the opposite of the second parameter of the current font (i.e., normal interword space).

According to the TeXbook, a <font> can be a command like \tenrm defined by \font \tenrm =somefont, or the null font \nullfont, or the current font \font, or a family member (\textfont, \scriptfont, or \scriptscriptfont, followed by a 4-bit integer). In the case of \hyphenchar or \skewchar, a <font> follows the command. This gives a reference to an integer, the hyphenchar or skewchar of the font (if this integer is not a valid character, the font has no hyphenchar or skewchar). In the case of \fontdimen, there is an integer P, a font, and this defines a reference to a dimension. The integer P must be positive and not greater than the number M of parameters in the font (initialised by TeX to the number of parameters in the font metric file, 7 for a normal font, 13 for math extension, 22 for math symbols, see TeXbook, appendix F). You can get an error: Font somefont has only 7 fontdimen parameters. In Tralics, the value is zero if P is out-of-range. In TeX, the last loaded font table can be dynamically increased: if you assign a value at position P>M, this will increase M. In Tralics, this is possible for all fonts, if $P<10^5$.

The value of a variable can be

You can say `\afterassignment\foo\count0=3´; in this case, the command \foo is pushed on a special stack, and popped after assignment is complete. There is only room for one token on this special stack. For instance, if you write the following:

\def\fooA{\relax}\def\fooB{\relax}\def\fooC{\relax}\def\fooD{\relax}
\afterassignment \fooA\afterassignment\fooB
\fooC\count0=1\fooD

the transcript file of Tralics will contain (in verbose mode)

[9] \afterassignment \fooA\afterassignment\fooB
{\afterassignment}
{\afterassignment: \fooA}
{\afterassignment}
{\afterassignment: \fooB}

At this point, the after assignment stack contains \fooB. The order of evaluation is now the following: \fooD is expanded; this gives \relax, which terminates scanning of the number; it will be read again, after evaluation of \fooB:

[10] \fooC\count0=1\fooD
\fooC ->\relax
{\relax}
{\count}
+scanint for \count->0
\fooD ->\relax
+scanint for \count->1
{after assignment: \fooB}
\fooB ->\relax
{\relax}
{\relax}
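
A typical use of \afterassignment is to let a command read a value with the usual syntax and then act on it. The following lines are a sketch; \double and \dodouble are names invented for this example, and \count2 is used as a scratch register.

\def\double{\afterassignment\dodouble\count2=}
\def\dodouble{\multiply\count2 by 2 \the\count2 }
\double 21 % puts 21 in \count2, then typesets 42

The space after 21 terminates the number; only then is the assignment complete and \dodouble read.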

You can use the \showbox command for displaying the content of a box. This is a little example. It uses \everyhbox and \afterassignment. Note the order in which these tokens are inserted.

\everyhbox{3}
\def\foo{12}
\afterassignment\foo\setbox0=\hbox{4}
\showbox0

This is what TeX prints in the log file:

> \box0=
\hbox(6.4151+0.0)x19.99512
.\T1/cmr/m/n/10 1
.\T1/cmr/m/n/10 2
.\T1/cmr/m/n/10 3
.\T1/cmr/m/n/10 4

The first line of the trace starts with \hbox or \vbox, followed by the dimensions (height, depth, width; the unit is `pt´ by default), optionally followed by `shifted 27.1´ if the box is shifted, and by `glue set 0.19´ if the glue has to be stretched or shrunk. After that, you will see the content of the box, one line per item (no more than \showboxbreadth lines are printed per box), each item is preceded by a context (a sequence of N dots at depth N, tokens at depth greater than \showboxdepth are not shown). In the box, you can see things like `\penalty -51´ or `\kern 28.45274´ or `\glue 3.0 plus 1.0´ or `\glue(\baselineskip) 2.28015´ (this last glue is inserted automatically by TeX, it knows where it comes from, so that the name can be printed), \special{...}, \write4{\indexentry...}. The interesting point in the last object is that we have a list of tokens that will be evaluated later (when the page is shipped out). Tralics puts neither \kern, nor \penalty, nor \glue in a box. The \special command is not implemented; finally \write is never delayed. In our example, the box contains four items, which are characters (TeX shows a command that contains the name of the font; in our example, the font is something like `ecrm1000´).

In Tralics, you would see the same characters, but no font and no size. On the other hand, you can say something like

\everyxbox{Test}
\setbox0=\xbox{foo}{1\xbox{bar}{2} %
   \AddAttToLast{x}{1}\AddAttToCurrent{y}{2}3}
\showbox0

and you will see

<foo y='2'>Test1<bar x='1'>Test2</bar> 3</foo>

Note the two commands that were used to add attributes to the current XML element, and to the last constructed one. We have added another command, \XMLaddatt, that takes as optional argument the id of the element to which the attribute value pair should be added. This is an integer; if omitted, the current element is used. You can use \XMLlastid or \XMLcurrentid (these are references to variables, you must use \the if you want the value). If you want to overwrite an existing attribute pair, you must use a star. The previous example can be written like this:

\everyxbox{Test}
\setbox0=\xbox{foo}{1\xbox{bar}{2} %
   \XMLaddatt[\the\XMLlastid]{x}{1}\XMLaddatt[\the\XMLcurrentid]{y}{22}%
   \XMLaddatt[\the\XMLlastid]{x}{11}\XMLaddatt*{y}{2}3}
\showbox0

If \foo is any command then \show\foo will show its value. Here are some examples

\def\Bar#1#{#1} \show\Bar
\let\foo\par \show\foo
\renewcommand\foo[2][toto]{#1#2} \show\foo
\let\foo=1 \show\foo
\let\foo=_ \show\foo
\let\foo=\undef \show\foo
\show\bgroup

This is what Tralics prints (it differs slightly from the LaTeX output)

\Bar=macro: #1#->#1.
\foo=\par.
\foo=opt \long macro: toto#2->#1#2
\foo=the character 1.
\foo=subscript character _.
\foo=undefined.
\bgroup=begin-group character {.

In the case of a variable, you can say \the\foo, the result is a token list that represents the value of \foo (if \foo is a token list, \the\foo is the value of \foo, otherwise, it is a list of characters). The command \showthe will show the value, i.e. print on the terminal the token list returned by \the. Example

\def\Show#1{\the#1\showthe#1}
\widowpenalty=3 \Show\widowpenalty
\parindent1.5pt \Show\parindent
\leftskip = 1pt plus 2fil minus 4fill \Show\leftskip
\thinmuskip = 3mu plus -2fil minus 4fill \Show\thinmuskip
\count0=17 \Show{\count0}
\dimen0=17pt \Show{\dimen0}
\skip0=17pt plus 1 pt minus 2pt \Show{\skip0}
\muskip0=17mu plus 1 mu minus 2mu \Show{\muskip0}
\Show{\catcode`\A}
\Show{\lccode`\B}
\Show\inputlineno
\font\xa=cmr10 at 11truept
\fontdimen6\xa = 11pt \hyphenchar\xa=`\-
\Show{\fontdimen6\xa}
\Show{\hyphenchar\xa}
\chardef\foo25
\Show\foo
\Show\xa
\toks0={\foo = \foo} \def\foo{foo}
\Show{\toks0}

This is what Tralics prints on the screen.

\show: 3
\show: 1.5pt
\show: 1.0pt plus 2.0fil minus 4.0fill
\show: 3.0mu plus -2.0fil minus 4.0fill
\show: 17
\show: 17.0pt
\show: 17.0pt plus 1.0pt minus 2.0pt
\show: 17.0mu plus 1.0mu minus 2.0mu
\show: 11
\show: 98
\show: 79
\show: 11.0pt
\show: 45
\show: 25
\show: cmr10
\show: \foo= \foo

The typeset result is: 31.5pt0.0pt0.0mu1717.0pt17.0pt plus 1.0pt minus 2.0pt17.0mu plus 1.0mu minus 2.0mu11987911.0pt 45 25cmr10 foo= foo.

In the case of \the\foo, \showthe\foo, \advance\foo, \multiply\foo, \divide\foo, the token that follows the first command is fully expanded.

2.5. All the variables

All variables (exceptions will be given later) are in the table of equivalents: this table contains the current meaning of quantities that are saved/restored by the grouping mechanism of TeX. In TeX this table is divided into six parts; in Tralics, the layout is slightly different, for instance, because TeX makes heavy use of glue (each space character produces a glue item), while Tralics ignores glue completely. This big table contains the following objects

  1. the current equivalent of single character control sequences (for ~ as well as \~);

  2. the hash table (in Tralics, there are two such tables, if the command \foo produces <bar gee=´true´>, the three strings `bar´, `gee´ and `true´ are in a special table).

  3. all glue parameters.

  4. all quantities that fit on 16 bits.

  5. all integers.

  6. all dimensions.

The glue parameters are the following (unused by Tralics, initialised to 0, unless stated otherwise):

The token parameters are the following (initially empty; unused by Tralics unless stated otherwise):

The integer parameters are the following. These parameters are zero, unless stated otherwise.

The following quantities are read only variables. They are integers, unless stated otherwise.

The counters defined in Tralics are the following. The counters are not used unless specified otherwise, but you can say \renewcommand\thepage{...}, this is not an error.

The dimension parameters are the following:

The registers are the following

Since version 2.9 of Tralics, all characters have 16 bits, so that the number of characters 256 should be replaced by $2^{16}$, i.e. 65536, in the sizes above. Moreover, the number of other registers (from \count to \box above) has been increased to 4096.

Some quantities are meaningful when TeX makes lines into pages. The dimension \pagegoal contains the current page height (minus the size of all potential insertions). The current page height has a natural value in \pagetotal and a shrink part in \pageshrink, the stretch part is in \pagestretch, its `fil´ part is in \pagefilstretch, its `fill´ part in \pagefillstretch and its `filll´ part in \pagefilllstretch. The depth of the box is a constant dimension, in \pagedepth. Whenever the output routine is called, TeX increases the value of the integer counter \deadcycles; an error is signaled if the value is too big, it is reset to zero by \shipout. In \prevdepth, you can find the depth of the most recent box on the current vertical list, in the integer \prevgraf the number of lines in the most recent paragraph that has been completed or partially completed. Of course, all these values are zero in Tralics.

In plain TeX, you can use \nointerlineskip and \offinterlineskip. These commands change the value of \prevdepth. They are ignored by Tralics.

2.6. Using the variables

There are three routines defined in Tralics, named scanint, scandimen and scanglue, that read an integer, a dimension and glue. Assume that \count0 is 1, \parindent is 3pt, and you say \skip\count0=2pt plus \parindent\relax. The transcript file of Tralics will contain

[346] \skip\count0=2pt plus \parindent \relax
{\skip}
+scanint for \count->0
+scanint for \skip->1
+scanint for \skip->2
+scandimen for \skip->2.0pt
+scandimen for \skip->3.0pt
{scanglue 2.0pt plus 3.0pt\relax }
{changing \skip1=0.0pt into \skip1=2.0pt plus 3.0pt}
{\relax}

The exact rules will be given later. The following happens here: After \skip there is an integer, an optional equals sign, then glue. After \count there has to be an integer. Thus, scanint reads an integer for \count, and another one for \skip. A glue item is formed of a dimension (the natural width), optionally followed by `plus´ and a dimension (the stretch part), optionally followed by `minus´ and a dimension (the shrink part). In this case, there is no shrink part, because of \relax. The second dimension comes from the variable \parindent; the first dimension is explicit: the integer part of the dimension is read by scanint.
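
For comparison, here is a variant in which all three parts of the glue are given. This is a sketch under the same assumptions (\count0 is 1 and \parindent is 3pt); the keyword `minus´ is found before the \relax, so that a shrink part is read as well.

\skip\count0=2pt plus \parindent minus 1pt\relax
\showthe\skip1   % 2.0pt plus 3.0pt minus 1.0pt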

An integer can be explicit or implicit: an implicit integer comes from a command (it can be a variable like \date, or a constant like \active). In all other cases, the number can be followed by one optional space. In general, the number will be given as a non-empty sequence of digits, like 01239; you can specify digits in base 16 as "FF, this is 255; in this case, letters between A and F (uppercase, category 11 or 12) are allowed. You can specify digits in base 8 as '177, this is 127. You can also specify a number as a character: `A is 65. You can say `\A, this is also 65; note that a backslash is needed in cases like `\%. Only one character is allowed in the command name; digits and quotes must have category 12.

An integer or a dimension can be preceded by a sign. This is a sequence of arbitrary length, formed of spaces and of + or - signs of category code 12. If the number of minus signs is odd, this changes the sign of the result. Hence if you say \count0=+-+'77 and \count1=-\count0, this will put 63 in \count1.
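
The notations described above can be summarised as follows. This is a sketch that assumes the default category codes (in particular, the double quote, the apostrophe and the backquote are of category 12); the register numbers are arbitrary.

\count0="FF      % hexadecimal, 255
\count0='177     % octal, 127
\count0=`A       % character code, 65
\count0=`\%      % character code of the percent sign, 37
\count0=+-+'77   % one minus sign, hence minus 63
\count1=-\count0 % 63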

A dimension can be implicit or explicit. You can say \count0=\dimen0: in this case the value of the dimension in sp units is put in the count register. You can say \dimen0=\skip0: the shrink and stretch part of the glue is ignored. You can also say \count0=\skip0 (guess what happens). It is not possible to convert (directly) an integer to a dimension or glue. An explicit dimension is formed of a factor and a unit of measure. The factor can be an integer (hence -'77pt is a valid dimension), or a decimal number (given in base ten, like 1.5, or 1,5). Units can be pt, pc, in, bp, cm, mm, dd, cc, sp. The case is irrelevant: Pt, pt, PT and pT are valid, the category code may be anything (it cannot be active, because everything is fully expanded). Units shown above can be preceded by true (note that Tralics ignores magnification, thus the `true´ prefix). Units can also be em or ex. These values depend on the current font. Tralics always assumes that the font is a ten point one. A unit of measure can also be an integer, a dimension, or glue. For instance \dimen0=1\count0 will multiply the value of \count0 by one. This is the dual to \count0=\dimen0. You can say \parindent=1.2\parindent if you want to increase it by 20%.
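
The conversions between integers, dimensions and glue can be sketched as follows; the values and register numbers are chosen here for illustration only.

\dimen0=1.5pt
\count0=\dimen0        % 98304, since 1pt is 65536sp
\skip0=4pt plus 1pt
\dimen0=\skip0         % 4.0pt, the stretch part is ignored
\count0=7
\dimen0=1\count0       % 7sp: the integer is used as a unit, in sp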

A glue is formed of three parts: a dimension, a stretch part, and a shrink part. The stretch part can be a dimension (it can use special units like `fil´, `fill´ and `filll´, these are called infinite, of first, second and third order). You can say \skip0=0pt plus 1fil. For some strange reasons, after fil you can put a second L, and a third one. As is the case with other units like ex or em, the case is irrelevant. Spaces are ignored after the L. Moreover, TeX continues scanning for an L after having found `filll´; if found, it signals the following error: Illegal unit of measure (replaced by filll). In the case of \skip0=2\skip0, the equals sign is followed by a dimension: there is a factor 2, and a unit (the fixed part of \skip0). As a consequence, this multiplies by two the fixed part of the glue, and sets the shrink and stretch part to zero (unless the code above is followed by `plus´ or `minus´).
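
The effect of a factor in front of a glue register can be sketched as follows; the values are chosen for illustration, and the comments give the value of \skip0 after each assignment.

\skip0=1pt plus 2pt minus 3pt
\skip0=2\skip0             % 2.0pt, stretch and shrink parts are now zero
\skip0=1pt plus 2pt minus 3pt
\skip0=2\skip0 plus 1fil   % 2.0pt plus 1.0fil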

Note: if you say \chardef\foo=123\foo, then \foo is made equal to 123: the first thing that \chardef does is to make \foo the same as \relax, so that scanning stops after digit 3. On the other hand in the case of \count0=3\ifnum... the conditional is evaluated while reading the number, thus before the assignment is complete. In particular, if the test compares \count0 with something else, the value might be different from three. Assume that \count0 and \count13 contain the value 7. What happens if you say: \count0=2\ifnum\count0=\count13\fi4 ? It will put 2 in \count0 and typeset 4. In fact, after the digit 3 is sensed, the \fi token terminates the \ifnum. It does so by inserting a \relax token, and a second \fi token. The effect of \relax is to finish reading the number. Thus \ifnum can compare the two values. If these two values are different, the expansion of the conditional is empty, and 24 is put in \count0. But the test is true, and TeX reads again the inserted \relax: it has as effect to stop scanning of the number 2. After that the inserted \fi is read. The transcript file of Tralics might look like the following. Since version 2.9, the transcript file contains also assignments. So you can see the order: when the \fi is seen, the last \count, hence the RHS of the equality, is not yet evaluated and a \fi token is inserted, preceded by a \relax token; these are evaluated later; the \relax token is seen by \count, and left unchanged. After that, we have the number 13, hence the value of \count13, hence the truth value of the test. Now, the body of the conditional is read; it consists solely of the \relax. This one is seen by the first \count, that has the value needed by the assignment. After the assignment is complete, the \relax is considered again: it is read, and the inserted \fi is evaluated.

[2677] \count0=2\ifnum\count0=\count13\fi4
{\count}
+scanint for \count->0
+\ifnum989
+scanint for \count->0
+scanint for \ifnum->7
+\fi989
+scanint for \count->13
+scanint for \ifnum->7
+iftest989 true
+scanint for \count->2
{changing \count0=7 into \count0=2}
{\relax}
+\fi989
Character sequence: 4 .

A token list is like a command without arguments. You can say \foo={ABC} if you want to put something in it, and \the\foo if you want to use the list. The equals sign is optional. You can insert a \relax between the equals sign and the opening brace. In the example that follows, you can see that, after the optional equals sign, you can put as many spaces or \relax tokens as you like; tokens are expanded, as long as no brace is found. The last line of the example shows that the token that follows \the is expanded (if \the itself is expanded). Thus, the last line adds some tokens at the end of the list. Note the space in \A: without it, TeX would see something like \the\toks0\the\toks0, and the second \the is expanded by the scanint routine, so that this inserts in \toks0 the content of \toks01 followed by a sharp sign.

\def\myrelax{ \relax}
\def\A{\toks0 }
\A=\relax\myrelax{1#}
\A=\expandafter{\the\A \the\A}\showthe\toks0

The \showthe command prints `1##\the \A´, but only a single # is in the list.

We have seen in section 2.3 how to use \cons to add some tokens to a command via \edef. The code that follows adds tokens to a list. The command is called \addto@hook in Tralics and is long, but the body is the same.

\def\addtohook#1#2{#1\expandafter{\the#1#2}}
\newtoks\foo
\addtohook{\foo}{\do{A}}\addtohook{\foo}{\do{B}}\addtohook{\foo}{\do{C}}
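
After these three calls, the list contains \do{A}\do{B}\do{C}. The lines below are a sketch of how such a hook is typically used; the definition of \do is chosen here for illustration.

\def\do#1{[#1]}
\the\foo         % same as \do{A}\do{B}\do{C}, typesets [A][B][C]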

The command \newtoks defines its argument as a reference to a token register, for instance \toks23. Whenever you use \addtohook with \A as first argument, it is like the assignment \A=\expandafter{\the\A...} shown in the previous example. Another example:

  \T\expandafter{\L}
% \xdef\L{...\the\T}
  \xdef\L{\catcode\the\count@=\the\catcode\the\count@\relax\the\T}

Let´s assume that \L is a parameterless command, and \T a reference to a token register. The first line puts the value of \L in \T. The second line explains what we do in the third one. Remember that \xdef expands everything in the body. All tokens are fully expanded (except that the result of \the is not expanded). As a result, this will put some tokens in front of \L. Let´s explain which tokens. We assume that \count@ is a reference to some counter, that the counter contains 65, this is the ASCII code of the upper case letter A, and we assume that the category code is 11. The first token is \catcode, it cannot be expanded, it will be left unchanged. The second token is \the. It can be expanded, the result is the value of the counter, the two characters 65. The equals sign cannot be expanded. Then comes \the; this expands what follows. The \catcode command reads a number. Because of \the, it reads two digits 6 and 5, and looks at the \relax. Note: this \the is useless, this example revealed a bug in Tralics. This is the log of Tralics. The last line indicates the value of \L:

[18]   \xdef\L{\catcode\the\countx=\the\catcode\the\countx\relax\the\T}
{\xdef}
{\the}
{\the \countx}
\the->65.
{\the}
{\the \catcode}
{\the}
{\the \countx}
\the->65.
+scanint for \catcode->65
\the->11.
{\the}
{\the \T}
\the->\catcode 48=12\relax .
{\def \L ->\catcode 65=11\relax \catcode 48=12\relax }

There are some advantages in putting items in a box. For instance, if it takes a long time to translate a piece of text that will be used several times, it can save some time. A second possibility is to create a box in a given context and use it in another one (this can be used for instance to put verbatim material in a section title; not in the toc, because the toc is obtained by reading characters from a file, but the box can be used for page headings). Finally, one can put some text in a box, measure the size of the box, and do some action according to the size of the box; it is not possible to measure a box in Tralics because no typesetting is done. Note that there is a limited number of boxes (there is a limit on the number of token registers, but you can always put your token list in a macro; in the same fashion, it is always possible to store integers and dimensions into token lists, i.e., in commands). Note that, if you want to implement arithmetic on big numbers, if you represent a number $x=\sum_k x_k B^k$ as a sequence of commands, try to access $x_k$ via \csname x\the\k\endcsname, and parse this as an integer, then you get something inefficient. It is much more efficient to say \fontdimen\k\x (there is a TeX file by Denis Roegel that computes thousands of digits of π using font tables as auxiliary memory).

2.7. Counters

The most useful registers are counters. Rather than saying `\count16=0´, at the risk of destroying variables used by other packages, you should use named counters, together with an allocation scheme. We have seen that `\newcount\foo´ does that. In LaTeX, we can do more. If you say `\newcounter{foo}[bar]´ then a counter foo is defined that depends on bar. Let´s assume, for simplicity, that the allocation mechanism allocates count register 17. Then \c@foo is a reference to `\count17´. It is assumed that no package defines a command that starts with c@, or p@ or cl@, so that \c@foo, \cl@foo, and \p@foo are reserved for the counter foo. In LaTeX, there is a command \value that takes one argument and expands to \csname c@#1\endcsname. The same command exists in Tralics, but it signals an error in the case where \c@foo is not a reference to a count register. You can say `\value{foo}=10´, this will put 10 into the counter; you can say `\the\value{foo}´, this will typeset the value of the counter. You should not use this low-level TeX syntax. In fact, if you say `\value{foo}=10\the\value{foo}´ this will put 103 into the counter (assuming that it contained 3). Compare this with \parindent=10\parindent where there is an implicit multiplication.

Assignment should be done via `\setcounter{foo}{10}´. This is the same as `\global\value{foo}=10\relax´ (plus a check that `foo´ is a counter). The \relax has as effect to stop scanning the number. The \global makes the assignment global. In the same fashion, `\addtocounter{foo}{4}´ is the same as `\global\advance\value{foo}4\relax´. You can say something like `\parindent=\value{foo}\xbar´, this puts in \parindent the value of \xbar (let´s assume it is a dimension) multiplied by the value of the foo counter. If you want to typeset the value of the counter, you say `\number\value{foo}´. You can also use \romannumeral or \Romannumeral (this last command is not defined by TeX) instead of \number (it has to be followed by a number, for instance \value...). The following commands take as argument the name of a counter, and typeset the value: \arabic (it gives 7), \roman (it gives vii), \Roman (it gives VII), \alph (it gives g), \Alph (it gives G), \fnsymbol (it gives **). The following commands: \@arabic, \@roman, \@Roman, \@alph, \@Alph, \@fnsymbol are used internally by LaTeX. They are defined in Tralics for compatibility reasons. Hence `\number\value{foo}´ is the same as `\@arabic\c@foo´ and the same as `\arabic{foo}´; using \arabic is the best choice.
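
The commands of this paragraph can be combined as in the following sketch; the counter name foo and the values are chosen for illustration.

\newcounter{foo}
\setcounter{foo}{7}
\arabic{foo}, \roman{foo}, \Alph{foo}   % 7, vii, G
\addtocounter{foo}{4}
\arabic{foo}, \roman{foo}, \Alph{foo}   % 11, xi, K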

Three operations are defined: \advance, which increments a counter (or a dimension, or a glue), \multiply, which multiplies it by an integer, and \divide, which divides it by an integer. In the case of integer division, TeX divides the absolute values, and adds the required sign to the quotient (the remainder is not computed). The following piece of code puts in \count0 the number of hours and in \count2 the number of minutes (quotient and remainder of the division of \time by 60).

\count0\time
\divide\count0 60
\count2=-\count0
\multiply\count2 60
\advance\count2 \time
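
To follow the arithmetic step by step, the same computation can be run with a fixed value instead of \time. This is only a sketch: the value 997 and the use of \count4 as a stand-in for \time are assumptions made for the example.

```latex
\begingroup
\count4=997                    % stand-in for \time (16*60+37)
\count0=\count4
\divide\count0 by 60           % \count0=16 (quotient, truncated)
\count2=-\count0
\multiply\count2 by 60         % \count2=-960
\advance\count2 by \count4     % \count2=37 (remainder)
\message{\the\count0 h \the\count2 min}% writes 16h 37min
\count0=-7 \divide\count0 by 2 % \count0=-3: absolute values are divided
\endgroup
```

The last line shows the sign rule: the quotient of -7 by 2 is -3, not -4.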

You can say \newlength\foo. This allocates a new skip register. You can use \setlength and \addtolength, in the same way as \setcounter and \addtocounter. However, assignments are local. Using plain TeX syntax, you can say:

\dimen0=2mm\dimen1=0.2cm
\advance\dimen0 by-\dimen1
\count0=\dimen0

Note that \dimen@, \dimen@i, and \dimen@ii are aliases for \dimen0, \dimen1 and \dimen2; these quantities are defined but not used by the LaTeX kernel (but they are used by packages). All registers with number less than ten can be used freely, others should use the allocation mechanism. Example:

\newcounter{foo}\newlength\lenA\newlength\lenB
\setlength{\lenA}{2mm}
\setlength\lenB{0.2cm}
\addtolength\lenA{-\lenB}
\setcounter{foo}{\lenA}

After this operation, the counter foo contains 5. This means that the difference between 2mm and 0.2cm is 5sp (about 27 nanometers). Note: Tralics uses exactly the same algorithms as TeX, hence produces the same results.

Appendix A.3.1 of [6] describes the calc package. It allows you to write commands like this:

\newcounter{hours}\newcounter{minutes}% counters used by \printtime
\newcommand{\printtime}{%
   \setcounter{hours}{\time/60}%
   \setcounter{minutes}{\time-(\value{hours}*60)}%
   \thehours h \theminutes min}
 \def\today{\ifcase\day\or
      1st\or 2nd\or 3rd\or 4th\or 5th\or
      6th\or 7th\or 8th\or 9th\or 10th\or
      11th\or 12th\or 13th\or 14th\or 15th\or
      16th\or 17th\or 18th\or 19th\or 20th\or
      21st\or 22nd\or 23rd\or 24th\or 25th\or
      26th\or 27th\or 28th\or 29th\or 30th\or
      31st\fi~\ifcase\month\or
      January\or February\or March\or April\or May\or June\or
      July\or August\or September\or October\or November\or
      December\fi\space \number\year}
The time is \printtime, \today.

In this case, the result of Tralics could be: `The time is 16h 37min, 7th December 2004.´

You can do operations on integers like this:

\newcounter{Ac}
\setcounter{Ac}{(1+2)*(3+4)-20}          %% \theAc=1
\addtocounter{Ac}{(1*2)+(3*-4)+(34/7)}   %% \theAc=-5

and on dimensions:

\newlength{\Bc}
\setlength{\Bc}{(1cm+2cm)*(3+4)-200mm}                    %%\the\Bc=28.4526pt
% exact results should be 1.0pt
\setlength\Bc{\the\Bc*\ratio{25.4pt}{722.7pt}}            %%\the\Bc=0.99985pt
\Bc=1in \setlength\Bc{\the\Bc * 100 / 7227}               %%\the\Bc=0.99998pt
\Bc=1in \setlength\Bc{\the\Bc * \real{ 0.01383700013837}} %%\the\Bc=1.00018pt
\Bc=1cm \setlength\Bc{\the\Bc / \real{28.452755}}         %%\the\Bc=0.99985pt
\Bc=1cm \setlength\Bc{\the\Bc * \ratio{254pt}{7227pt}}    %%\the\Bc=0.99985pt
\Bc=1in \setlength\Bc{\the\Bc / \ratio{7227pt}{100pt}}    %%\the\Bc=1.00018pt
\Bc=1IN \setlength\Bc{\the\Bc / \ratio{7227PT}{100pT}}    %%\the\Bc=1.00018pt

In LaTeX, there is a command called \stepcounter. Its effect is to increment a counter, and reset all counters that depend on it (see example below). There is also \refstepcounter whose purpose is to define the current label. This is not implemented in Tralics (see later for how \label works). The idea is that, for a counter `foo´, the printed value of the label is defined by `\p@foo\thefoo´. Here \thefoo is normally `\arabic{foo}´, but the quantity can be redefined. For instance, the book class has \renewcommand \thesection {\thechapter .\@arabic \c@section} (the article class has no chapter, and does not redefine \thesection). Both book and article classes say: \renewcommand\thesubsection{\thesection.\@arabic\c@subsection}.

Here we define some counters, and make them depend on other counters.

\newcounter{toto}          \setcounter{toto}{10}
\newcounter{titi}[toto]    \setcounter{titi}{20}
\newcounter{tata}[titi]    \setcounter{tata}{30}
\newcounter{tutu}[toto]    \setcounter{tutu}{40}

Here we call \stepcounter. The typeset result should be 11101=11101.

\stepcounter{toto} %  kills titi, tutu
\stepcounter{tata} %%% \thetata=31,
\stepcounter{titi} %% \thetata=0 % titi=1
\stepcounter{tutu}
\thetoto\thetiti\thetata\thetutu=11101

The magic is accomplished by the following command:

\def\@addtoreset#1#2{\expandafter\@cons\csname cl@#2\endcsname {{#1}}}

The first argument is the counter to be reset (for instance `tutu´), and the second argument is the counter on which it depends (for instance `toto´). The \@cons command is defined as on page 2.3. It modifies the command \cl@toto by adding \@elt{tutu}. If you say \stepcounter{toto}, then LaTeX executes `\let \@elt \@stpelt \csname cl@#1\endcsname´. Here is a part of the transcript file of Tralics that shows what happens (you won´t see the \csname, because the characters needed for \c@toto and \cl@toto are read and expanded only once by Tralics).

[720] \stepcounter{toto}
\stepcounter->\global \advance \c@toto 1\relax {\let \@elt \@stpelt \cl@toto }
{\global}
{\global\advance}
+scanint for \c@toto->1
{globally changing \count45=10 into \count45=11}
{\relax}
{begin-group character {}
+stack: level + 3 for brace entered on line 720
{\let}
{\let \@elt \@stpelt}
{changing \@elt=undefined}
{into \@elt=\@stpelt}
\cl@toto ->\@elt {titi}\@elt {tutu}
\@elt->\global \c@titi 0\relax
{\global}
{\global\c@titi}
+scanint for \c@titi->0
{globally changing \count46=20 into \count46=0}
{\relax}
\@elt->\global \c@tutu 0\relax
{\global}
{\global\c@tutu}
+scanint for \c@tutu->0
{globally changing \count48=40 into \count48=0}
{\relax}
{end-group character }}
+stack: killing \@elt
+stack: level - 3 for brace from line 720
[721] \stepcounter{tata}
\stepcounter->\global \advance \c@tata 1\relax {\let \@elt \@stpelt \cl@tata }
{\global}
{\global\advance}
+scanint for \c@tata->1
{globally changing \count47=30 into \count47=31}
{\relax}
{begin-group character {}
+stack: level + 3 for brace entered on line 721
{\let}
{\let \@elt \@stpelt}
{changing \@elt=undefined}
{into \@elt=\@stpelt}
\cl@tata ->
{end-group character }}
+stack: killing \@elt
+stack: level - 3 for brace from line 721
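
The same mechanism can be used to attach a dependency to a counter that already exists. The following is a minimal sketch for LaTeX; the counter name tyty is hypothetical, and we assume that \@addtoreset behaves in Tralics as the definition above suggests.

```latex
\documentclass{article}
\newcounter{toto}
\newcounter{tyty}% hypothetical counter, declared without dependency
\makeatletter
\@addtoreset{tyty}{toto}% tyty is now reset whenever toto is stepped
\makeatother
\begin{document}
\setcounter{tyty}{5}
\stepcounter{toto}% toto=1, tyty is reset to 0
\thetoto, \thetyty % gives: 1, 0
\end{document}
```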

2.8. Fonts

One of the questions we can ask is: what does \it do? As explained above, this is an unofficial command, and thus could be implemented to do anything. Let´s assume that it is defined in LaTeX2.09 compatibility mode. It is then possible to explain what happens, but it is harder to explain what Tralics should do. Software like latex2html (which we studied carefully when implementing the first version of Tralics in Perl) uses a lot of energy in order to translate font changes properly. It is however very difficult to tell it that \french is a similar command (in fact, what we wanted was more than just finding the scope of the \french: we also wanted French syntax rules to apply, we wanted dashes instead of bullets in lists, etc.). In this paragraph, we shall explain all the gory details concerning fonts (however, look at [6] for what is in a .fd file).

One big table in TeX is the table of fonts: there are N fonts with N characters in them (currently N=256, and this is a small limit; in Ω, this value is 2^16; the dvi format specifies N=2^32). A book like [6] uses lots of fonts indirectly, via inclusion of PostScript files. Note that metric files designed for Ω cannot be read by TeX. The hyphenation algorithm considers as a word only sequences of characters from the same font (hence 256 characters per font is a hard limit). A metric file contains all that is needed for TeX to typeset a character; it does not contain glyphs. Essentially, it contains three tables, indicating for each character its height, its depth and its width. There are two other tables, the lig/kern table, and the kern table, that indicate, for instance in the case of VA, that some negative space should be used to bring the characters closer together, and in the case of fi, that a single glyph should be used instead of two. There is another table (useful only for math mode) that explains how to construct, for instance, braces of various sizes. Finally, there are some parameters. One parameter is the design size (the design size of a ten point font is 10pt), other parameters are the slant, the width of a space (this is glue), the two values of ex and em, and extra space. Math fonts have extra parameters, see [4, appendix G]. A font has two integer parameters: hyphen char, and skew char. These values are not in the metric file: when the font is loaded TeX uses the values of \defaulthyphenchar and \defaultskewchar. Note: Tralics does not read TFM files, it sets all parameters to zero.

You load a font by saying \font\preloaded=cmr7 scaled \magstep4 or \font\foo=cmr10 at 12pt. Such a construction will read a file cmr7.tfm or cmr10.tfm and apply a scale factor (a factor 2 in the first case, and 1.2 in the second case). A font like ecrm exists in size 5, 6, 7, 8, 9, 10, 10.95 (magstephalf), 12, 14.4, 17.28, 20.74, 24.88 (magstep 1, 2, 3, 4, and 5 respectively), 29.86 and 35.83. There are some slight differences between cmr10 at 12pt and cmr12 (see the TeXbook for details). You can simply say \font\tenrm=cmr10. After that you use it like this {\tenrm test}. This gives: test. You can use \fontdimen1\tenrm like any dimension. For instance, using \the to typeset the value, we get 0.0pt for the slant, 0.0pt plus 0.0pt minus 0.0pt for the interword space, 0.0pt for the ex-height, 0.0pt for the quad, 0.0pt for the extra space. Parameters for the current font are: 0.0pt for the slant, 0.0pt plus 0.0pt minus 0.0pt for the interword space, 0.0pt for the ex-height, 0.0pt for the quad, 0.0pt for the extra space. If you say

   \fontencoding{T1}\fontfamily{cmr}\fontseries{m}%
   \fontshape{n}\fontsize{10pt}{12pt}
   \selectfont

you specify all font parameters, and you switch (from the font named `cmr10 at 10.0pt´) to the default ten point font with T1 encoding, namely `cmr10 at 10.0pt´. The default font in this document uses `lmr´ as family. The parameters are now: 0.0pt for the slant, 0.0pt plus 0.0pt minus 0.0pt for the interword space, 0.0pt for the ex-height, 0.0pt for the quad, 0.0pt for the extra space. As you can see, they are not exactly the same. However, the glyphs are similar. The current font name can be printed via \fontname\font. If you read the XML version, all dimensions are zero, and font names are empty. For cmr10, the slant is 0, the interword space is 3+1/3pt plus 1+2/3pt minus 1+1/9pt, the ex height is 4.30554pt, the quad is 10.00002pt, the extra space is 1+1/9pt.
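
The font parameters discussed here can be read back one by one with \fontdimen: parameter 1 is the slant, 2, 3 and 4 are the natural width, stretch and shrink of the interword space, 5 is the ex-height, 6 the quad, and 7 the extra space. The following fragment is a sketch; \demofont is a name chosen for this example. With TeX it should print the cmr10 values quoted above, and with Tralics every value is zero, since metric files are not read.

```latex
\font\demofont=cmr10 % \demofont is a hypothetical name
{\demofont
 slant: \the\fontdimen1\demofont;
 interword space: \the\fontdimen2\demofont\ plus
                  \the\fontdimen3\demofont\ minus \the\fontdimen4\demofont;
 ex-height: \the\fontdimen5\demofont;
 quad: \the\fontdimen6\demofont;
 extra space: \the\fontdimen7\demofont.}
```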

The commands shown above are provided by LaTeX. The effect of \selectfont is to take all values (stored by the other commands) and create a font name (say \tenrm for simplicity, see example below for a real name), check the font, and make it the current font. Printing a character like e-acute can depend on the encoding (in some cases the character is in the font, in other cases a combination of two characters is needed). As a consequence, checking the font means informing some commands of an encoding change. In the example above, the quantity 10pt is the size of the font, but the value 12pt is the baseline skip; changing it means changing some other parameters (for instance the value of \strut). An important task of \selectfont is to associate to the font name \tenrm a real name (say cmr10) and call the \font command. The real name is computed according to rules defined in a font definition file, for instance t1cmr.fd, that depend only on the encoding and family; there are rules that say how to deal with the case where the desired series, shape or size are unavailable. All these commands are implemented in Tralics. The size and encoding are currently ignored. We shall describe below some commands that change the series and shape of the current font (for instance \bfseries, \itshape) that are easily related to parameters of \selectfont. Interpreting the argument of \fontfamily is a bit more complicated: for instance pcr is interpreted as cmtt (the name cmtt will be explained below, while pcr refers to a Courier font). There is another bunch of font commands, defined in Tralics, that provoke an Unimplemented NFSS command error; for instance \DeclareTextAccent is a command that takes three arguments A, E and N, and says that accent A in encoding E is at position N in the font.

An important characteristic of a font is how glyphs are represented. For TeX, this is irrelevant, since the dvi file contains only the metrics. However, the reader will see some black and white pixels (of ink on a sheet of paper, or dots on a screen, or points on a wall projected by a beamer). All fonts designed by Knuth are produced by the metafont program, which produces both the metrics and the glyphs as bitmaps (in the form of gf files, usually packed as pk files). If the resolution of these bitmaps is different from that required by the printing device, some interpolation or extrapolation is required (this is sometimes called `antialiasing´, it may involve colored pixels instead of black and white). In general, people print a dvi file by converting it first into PostScript format; in a PostScript or pdf file, a font can be specified via different formats, Type1, Type3, TrueType etc. The simplest format is Type3, namely bitmaps. Some software like Acrobat Reader prefers Type1 (a format in which characters are defined by small programs). There is no direct way to produce a Type1 file from a metafont file, so that not all TeX fonts exist in Type1. For instance, the computer modern fonts (in version OT1) have been translated but not the T1 version (said otherwise, cmr10 exists in Type1 format, but not ecrm1000). On the other hand, most commercially available fonts are not produced by metafont, hence cannot be used directly by TeX. In this document, we experiment with the Latin Modern font family; it is very similar to Computer Modern.

In modern distributions, the engine behind LaTeX is pdfTeX, so that producing pdf instead of dvi is just as easy; in this case, the engine needs the glyphs. Since it is no longer restricted to the information found in the metric files, funny effects can be achieved. An extension of TeX, called XeTeX, produces spectacular results; as in the case of Ω, the result can be a variant of the dvi format, called xdv or odvi.

In the case of a format like plain TeX, fonts are used according to the following scheme. First you define fonts like \tenrm, in three sizes (thus, you define \sevenrm, \fiverm), and different variants (say \teni, \tensy, \tenex, \tensl, \tentt, etc). Then you say \textfont0=\tenrm, \scriptfont0=..., \scriptscriptfont0=...: this defines family zero. You do the same for family 1, 2, 3, etc. We shall see later how certain math symbols use a specified family; in other cases the family specified by the \fam variable is used (there are only 16 families available). The size of a symbol is defined by the current style (displaystyle, textstyle, scriptstyle, or scriptscriptstyle). Then you say \def\it{\fam4\tenit}. Thus \it has two effects: one is to switch to \tenit, the second one is to set \fam to 4. Now, you can define a command \twelvepoint that modifies all the font values, using larger values. Guess what happens for a definition like \def\it{\tenit\fam4}.
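
The scheme can be sketched as follows for plain TeX. The names \demofam, \demoten and \demoit are hypothetical, and only the text size of the family is defined, so this is an illustration of the mechanism rather than a complete font setup.

```latex
% Plain TeX fragment; \demofam, \demoten and \demoit are hypothetical names.
\newfam\demofam                 % allocate a family number
\font\demoten=cmti10            % text italic, ten point
\textfont\demofam=\demoten      % only the text size is defined here
\def\demoit{\fam\demofam\demoten}
{\demoit italic text} and $\demoit italic + letters$
```

Since \fam is reset whenever math mode is entered, the switch has to be repeated inside the formula. Note also that \demofam is a complete number by itself, so nothing that follows it can be absorbed into the family number.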

In the case of a format like LaTeX, the situation is different. There are some high level commands like \large, that are defined like \@setsize \large {14pt} \xiipt\@xiipt (note: infinite recursion may be possible), and the \xiipt command is like the \twelvepoint command mentioned above. This is rather complicated. The situation became worse when people tried to replace computer modern fonts by other fonts. We shall describe here only the user interface of the NFSS (new font selection scheme).

There is a clear distinction between \textit and \mathit: they are to be used in text mode or math mode only; the command \it chooses one of them. Guess how \mathit is defined. In fact, it switches to some family (the number is not hard-coded as the 4 above), to which a font is associated. This may be OT1/cmr/m/it/10; an important point is that the size may vary (depending on the current math style and the current font size), but the encoding is fixed: if the current encoding is T1, a different font is used in math mode and in text mode.

We already mentioned that an important characteristic of the font is the encoding: we met OT1 (original encoding by Knuth) and T1 (“Cork” encoding, similar to latin 1). There is an obsolete OT2 encoding for cyrillic, and new ones: T2A, T2B, T2C. The companion mentions over twenty standard font encodings. In the example of \showbox above, TeX told us that the current font was \T1/cmr/m/n/10. The first two letters indicate the encoding. There are different families of fonts. Assume that you use Computer Modern fonts (you do this by selecting a package; after that, your whole document will be in computer modern, unless you use fonts selected via \font or \selectfont). There are six sub-families: Roman, Sans, Typewriter, Fibonacci, Funny roman, and Dunhill. The names of these families are: cmr, cmss, cmtt, cmfib, cmfr, cmdh. The default family in this document is cmr. You can choose another family via the commands \rmfamily, \ttfamily and \sffamily (no command is provided for the other families). The commands \textrm, \textsf and \texttt take an argument and typeset it using the family. The commands \rm, \sf, \tt do the same, but they reset the series to medium, and the shape to normal. The series of a font can be: bold, bold extended, semibold, etc. In LaTeX you have \mdseries and \bfseries (you have also \textmd and \textbf, which are commands that take an argument; you have also \bf that selects roman family, bold series, normal shape). The shape can be: normal, italic, slanted, upright italic, small caps, etc. In LaTeX we have \upshape, \itshape, \slshape, and \scshape (and, as usual, \textup, \textit, \textsl and \textsc; there is also \it, \sl, \sc). There are two commands \em (a declaration) and \emph (that takes an argument) that use upright shape if the current font has a slant, and italics shape otherwise. These rules explain the cmr/m/n part in the font. In fact, the `cmr´ part comes from the command \rmdefault, but these commands are not implemented in Tralics. The command \textnormal takes an argument and is the equivalent of \normalfont.
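
The difference between declarations and commands with an argument can be seen in the following sketch, which assumes nothing beyond the standard article class:

```latex
\documentclass{article}
\begin{document}
{\sffamily\bfseries sans serif, bold series}       % declarations, scoped by the group
\textsf{\textbf{sans serif, bold series}}         % same font, argument form
{\itshape italic, \emph{upright inside}, italic}  % \emph toggles with the shape
{\bfseries\itshape bold italic, {\normalfont back to the default}}
\end{document}
```

Family, series and shape are independent: each declaration changes one of them and leaves the other two untouched.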

There are two parameters that define the size of the font. First, document class options indicate the size used by \normalsize. In our example it is 10pt. There are ten commands that change the font size. In increasing order they are \tiny, \scriptsize, \footnotesize, \small, \normalsize, \large, \Large, \LARGE, \huge, and \Huge. There is a command \selectfont; its purpose is to combine everything, the result will be \T1/cmr/m/n/10. There is another process that converts this to the font name ecrm1000, using font definition files.

In math formulas, you see things like $\acute{\alpha}$ and $\ddot{e}$, but never `á´ and `ë´. If you want an acute accent you use \acute, if you want a double dot accent you say \ddot. In fact, the textfont used for math is very often a 7-bit font, without accented letters. If you want x followed by `ème´ as a superscript, you should say x$^{\grave{e}me}$, or perhaps x$^{\hbox{ème}}$ (in this second case, the letters are too big). Note that Tralics may translate this as xe; if you do not like it, either set the notrivialmath counter to zero, or add an empty group in the formula before the hat. A solution is x\textsuperscript{ième}, which gives xième. In French, you say 1er, 1re, 1ers, 1res, 2e, 3es, etc., via 1\ier, 1\iere, 1\iers, 1\ieres, 2\ieme, 3\iemes. In English, you say 1st, 2nd, 3rd, 4th.

2.9. Spaces

In TeX spaces are ignored after a command like \foo, and a sequence of spaces is treated as a single one. The exact rule is the following. There is a variable whose values can be N (start of line), or M (middle of line) or S (when spaces are skipped). Whenever a line is read, TeX removes every space character at the end of the line. It inserts the value of \endlinechar (provided this is a valid character, an integer between 0 and 255). The state is then N. Spaces are ignored if the state is S or N; if the state is M, a space produces a space token, and the state is changed to S; in this sentence a “space” is any character whose category code is 10. If TeX sees an end-of-line character (category 5), it ignores all other characters on the current line. If the state is N (line was empty), the tokeniser returns a \par token, if the state is M it returns a space token, otherwise the character is ignored. Note: in Tralics, the space token produced by an end-of-line is a line-feed character, this is to keep line breaks in the XML translation. If TeX sees a backslash (or any character of category code 0), it reads a command; the state will be S if the character that follows is a letter or a space, it will be M otherwise. If TeX sees anything else, the state will be M.

For instance, if you say `x␣{␣}␣␣␣{␣}y´ the tokeniser sees 4 spaces (the run of three spaces gives a single token). If you say \def\A{␣} and \def\B{␣\A␣\A␣}, then the body of \A contains a space as well as the body of \B. Full expansion of \B contains three spaces and x␣\B\y contains four spaces. The command \space is defined like \A above.
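
The count for \B can be checked with \edef, which performs the full expansion. This is a sketch; \C is a scratch name introduced for the example.

```latex
\def\A{ }% body: one space token
\def\B{ \A \A }% body: space, \A, \A (spaces after \A are skipped)
\edef\C{\B}% full expansion of \B
\message{[\meaning\C]}% should show macro:-> followed by three spaces
```

Using \meaning on \B itself would be misleading, because TeX prints a space after each control word when it shows a macro body.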

Spaces discarded by the tokeniser do not appear in the translation. However, spaces produced by the tokeniser can be ignored in some cases. A typical example: a command can take a space as argument, and ignore the argument. For instance \\ is a command that ignores spaces that follow it using explicit scanning (i.e. \futurelet). We already mentioned that spaces between arguments are generally ignored. Spaces can be ignored because you say \ignorespaces: the effect of this command is to expand what follows, until a non-expandable token is seen. If it is a space, it is ignored, and the process continues. A space can be ignored because of a syntax rule (for instance, before an equals sign in an assignment). In LaTeX you can see things like \end{x} \end{y} \end{z}, each `\end{xxx}´ being on a line by itself: this produces a space, and the LaTeX environment mechanism is clever enough to remove these spurious spaces. It is also possible to remove a space from typeset material via \unskip.

Spaces are ignored in math mode. The reason is that spaces are used to separate words, and there are no words in math formulas. There are operators, and these operators know how much white space to use. In the case of x+y=z, on each side of the plus sign there is some glue, the value comes from \medmuskip, it is 2.22 plus 1.11 minus 2.22; on each side of the equals sign there is \thickmuskip, namely 2.77 plus 2.77 (the unit is pt). After the z, there is a kern of value 0.4398. Note: the plus sign is followed by a penalty of 700, the equals sign by a penalty of 500. Plain TeX defines

\thinmuskip=3mu
\medmuskip=4mu plus 2mu minus 4mu
\thickmuskip=5mu plus 5mu

In Tralics, constant values are used (expressed in terms of em units; one em is 18mu, in the example above one em is 10pt). You can say \:, \> and \;. This produces a space (thin, medium, thick) using the values given above. You can also use \!, this is the negative of thin space. The translation of $A\:B\>C\;D\!E$ is:

<mrow><mi>A</mi><mspace width='0.166667em'/>
      <mi>B</mi><mspace width='0.222222em'/>
      <mi>C</mi><mspace width='0.277778em'/>
      <mi>D</mi><mspace width='-0.166667em'/><mi>E</mi></mrow>
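
The widths in this translation follow from the plain TeX values by dividing by 18, since one em is 18mu. As a worked check (the rounding to six decimals is what the output above shows):

```latex
\begin{align*}
\text{thin:}   \quad & 3\,\mathrm{mu} = \tfrac{3}{18}\,\mathrm{em} \approx 0.166667\,\mathrm{em},\\
\text{medium:} \quad & 4\,\mathrm{mu} = \tfrac{4}{18}\,\mathrm{em} \approx 0.222222\,\mathrm{em},\\
\text{thick:}  \quad & 5\,\mathrm{mu} = \tfrac{5}{18}\,\mathrm{em} \approx 0.277778\,\mathrm{em}.
\end{align*}
```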

The \space command expands to a single space token. It may disappear in all cases where the syntax says that a space is optional (because in general these rules imply expansion); in a case like \let\foo\space, tokens are not expanded, and \foo is made equivalent to the current value of \space. The \␣ command cannot be expanded. It starts a paragraph (if used in vertical mode). It inserts some white space whose value is the same as if the current space factor were 1000. You can use it after an abbreviation like Mr. in order to indicate that the dot is not an end of sentence marker. You can also use it after a command like \TeX if you want to leave some space. In math mode, Tralics interprets it as a space of width 6pt. The ~ character is usually active, its expansion is \nobreakspace. This is defined in Tralics to translate to &nbsp;. You can say \quad or \qquad. This inserts some space (the width is one or two em). If you say \hskip 1cm, this will append some glue (in Tralics, it will generate a sequence of &nbsp; whose width is more or less 1cm). Note: in the current version, entity names are no longer generated, hence &nbsp; is replaced by the Unicode character U+A0, and we assume that the width of this character is one fourth of a quad. In math mode, both the tilde character and \nobreakspace will give 3.33pt; inside a URL, the result is a tilde character. If you say \kern1cm this will append a kern (like glue, but the size is fixed). This is ignored by Tralics. A normal space produces glue (the value of the glue depends on some font parameters; it can also depend on the current space factor). Glue may disappear at a line break. Kerns will not. In LaTeX, you use \hspace instead of \hskip. You can use \hspace*, in this case, spaces at start of line are not ignored. Note the syntax \hspace{2cm} vs \hskip2cm\relax.

A\space\space B\ \ C\quad\qquad etc
a\hskip2cm b\hspace{3cm}etc.
x\vskip2cmy\vspace{2cm}etc.

Translation is (we have replaced nobreak space by tilde)

<p>A  B  C~~~~~~~~~etc
a~~~~~~~~~~~~~~b~~~~~~~~~~~~~~~~~~~~~etc.
x</p>
<p spacebefore='56.9055pt'>y</p>
<p spacebefore='56.9055pt'>etc.
</p>

When TeX wants to split a paragraph into lines of equal width, it will have to stretch and shrink the glue that appears on the line; it will remove interword glue at break points. An item of glue has the form x+y-z, where x, y and z are dimensions (y and z can be expressed in terms of fil, fill and filll), all three values can be positive or negative. We can express this as: we have a vector of size 9: x_0 is the regular part of the glue, x_1, x_2, x_3 and x_4 are the stretch components (in units of pt, fil, fill, and filll, only one of these components can be given), x_5, x_6, x_7 and x_8 are the shrink components (in units of pt, fil, fill, and filll, only one of these components can be given). When two pieces of glue are added, all components are added. The convention is that x_2 is much larger than x_1, so that the sum of x_1 and x_2 is x_2 (said otherwise if we add 1pt plus 2pt and 3pt plus 4fil, the result is 4pt plus 4fil). Such simplifications are not done when TeX computes the sum of all glue items in a paragraph (as a result, addition is associative). The command \hfil is equivalent to \hskip0pt plus 1fil, the command \hfill is equivalent to \hskip 0pt plus 1fill, the command \hfilneg is equivalent to \hskip 0pt plus -1fil, the command \hss is equivalent to \hskip 0pt plus 1fil minus 1fil. It is an error to use infinite shrinkage, like \hss, in a paragraph; TeX complains with: Infinite glue shrinkage found in a paragraph. However you can say 123\hbox to1cm{\hss xxxxxxx\hss}456, the result is 123xxxxxxx456; said otherwise, the text is centered, and neither an overfull nor an underfull box is signaled.
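
The addition rule can be observed directly with a skip register. The following fragment is a sketch that uses the scratch registers \skip0 and \box0; the comment on the first \message gives the value announced in the text.

```latex
\skip0=1pt plus 2pt
\advance\skip0 by 3pt plus 4fil
\message{\the\skip0}% 4.0pt plus 4.0fil: the finite stretch is dropped
\setbox0=\hbox to 1cm{\hss xxxxxxx\hss}% \hss on both sides centers the text
123\box0 456
```

The box is allowed because the \hss glue is inside an \hbox of fixed width, not directly in the paragraph.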

The commands \vfil, \vfill, \vfilneg, \vss, behave in the same fashion, in vertical mode, adding vertical space. Tralics translates \hfil, \hfill, \hfilneg, and \hss as \leavevmode followed by an element, for instance <hfil>, that has the same name as the command. It translates \vfil, \vfill, \vfilneg, and \vss in the same fashion, by using \par instead of \leavevmode. The three commands \bigskip, \medskip and \smallskip are used to insert vertical space between paragraphs, of size 12pt, 6pt and 3pt respectively (in LaTeX, this is some glue that the user can modify; however, Tralics ignores the shrink and stretch parts of the glue inserted by \hskip, \vskip, \hspace and \vspace). These four commands read an argument (in LaTeX, \hspace and \vspace accept an optional star, that translates to an empty vertical or horizontal rule; Tralics ignores the star). In the case of a horizontal space, \leavevmode is executed, then ~ are produced (one every 4 pt, a negative dimension produces nothing). In the case of a vertical space, the current paragraph is terminated; if after that the mode is vertical, a new paragraph is started, it has an attribute spacebefore with as value the dimension. In LaTeX, the behavior is different (see appendix A.1.5 of [6]). In math mode, you can also use \mskip and \mkern; these commands use mu as unit, where 18mu is one em. Since Tralics does not know the value of an em, it uses 10pt, so that the dimension is first divided by 18, then multiplied by 10. Example:

a\vfil\vfill\vfilneg\vss
b\hfil\hfill\hfilneg\hss
c\bigskip d\smallskip e\medskip f
$\mskip3mu\mkern2mu \mskip 18mu$

Translation is

<p>a</p>
<vfil/><vfill/><vfilneg/><vss/>
<p>b<hfil/><hfill/><hfilneg/><hss/>c</p>
<p spacebefore='12.0pt'>d</p>
<p spacebefore='3.0pt'>e</p>
<p spacebefore='6.0pt'>f
<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mspace width='1.66656pt'/><mspace width='1.111pt'/>
<mspace width='10.0pt'/></mrow></math>
</formula>
</p>

In TeX, there is no command that starts a paragraph. The \leavevmode command is implemented as \unhbox\voidb@x, where \unhbox starts a new paragraph if needed, and produces nothing, provided that its argument is the void box; the paragraph may contain the current indentation and the value of \everypar. This is a primitive in Tralics, the value of \everypar is unused. Both commands \indent and \noindent make sure the current mode is horizontal, the first one inserts the current indentation (an empty box with the width of \parindent). In TeX, you can use \indent anywhere in a paragraph. In Tralics, the translation of

a\noindent b \indent c
{\centering a\noindent b \indent c\par d}
{\raggedright a\noindent b \par\indent c\par d}

is

<p>a</p>
<p noindent='true'>b</p>
<p rend='center' noindent='false'>c
a</p>
<p rend='center'>b</p>
<p rend='center'>c</p>
<p rend='center'>d
a</p>
<p noindent='true' rend='flushed-left'>b</p>
<p noindent='false' rend='flushed-left'>c</p>
<p rend='flushed-left'>d</p>

The rules are the following: if \indent or \noindent appear in an empty paragraph, that is not centered, and that has no noindent attribute, one is set. Otherwise a new paragraph is started. It will have a noindent attribute, unless the paragraph is centered. The value of \parindent is never considered.

The translation of \par is a bit complicated. Nothing happens inside a \hbox, in \term, or if the current mode is not horizontal. The current XML element should be a <p>. A final space is removed from it. It will be popped from the stack. This restores the mode to the value of the previous mode. It restores the current XML element to the parent of the <p>. A newline character is added to it. There is an exception: in cases like \noindent\par, or \bigskip\par, or \\\par, the \par command was ignored until version 2.5 (pl7). The behavior is now: if the paragraph is empty, but there are attributes, then the <p> is removed, and attributes are added to the next <p> element.

The translation of \\ depends on the context. The command can be followed by an optional star, and an optional dimension. Inside a cell, this indicates the end of the cell as well as the end of the row. You can say \newline, this is like \\ without optional argument and array test. In vertical mode, LaTeX complains with There´s no line here to end, but Tralics ignores the command. Inside a title, the command is ignored. Otherwise, the behavior is like \noindent; if an optional argument is given, it behaves like \vskip. For instance, the translation of

a \\b \\[2cm] c \newline[3cm]d \noindent e \vskip 4cm f

is

<p>a</p>
<p noindent='true'>b</p>
<p noindent='true' spacebefore='56.9055pt'> c</p>
<p noindent='true'>[3cm]d</p>
<p noindent='true'>e</p>
<p spacebefore='113.81102pt'>f</p>

Many people do not know that \\ takes an optional argument, and try to use different tricks in order to avoid errors triggered by \\\\. We have seen for instance

\def\ligne{\protect{\mbox{}\\\mbox{}\indent}}

Remember that \protect is like \noexpand, it is not a LaTeX command that takes an argument! More strange cases can be found in [3].
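
As a minimal sketch of the alternative (the value 12pt is only an assumption chosen for illustration), the optional argument of \\ gives the extra vertical space directly, so that neither \\\\ nor a helper like \ligne is needed:

    first line\\[12pt]
    second line

According to the rules given above, Tralics should start a new paragraph for the second line, with a noindent attribute and a spacebefore attribute.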

The commands \nolinebreak, \nopagebreak, \pagebreak, and \linebreak are defined by LaTeX to take an optional argument, an integer between 0 and 4. They insert some penalty, but depend on the mode, like \hspace and \vspace. They are ignored in Tralics. The commands \break, \nobreak, and \allowbreak are defined by LaTeX; they insert some penalty (zero, plus or minus infinity). They are ignored by Tralics. The commands \filbreak, \goodbreak, \eject, \smallbreak, \medbreak, \bigbreak are defined by LaTeX to terminate a paragraph and insert some penalty. In Tralics, they behave like \par. Note. The last chapter of the second part of this document explains that, when converting XML to Pdf, special rules must be used when hyphenating URLs: ambiguities can be avoided when text is split at slashes. For this reason, Tralics inserts a <allowbreak> element in these cases, and when the command \allowbreak is used as well.

2.10. Conditional expansion

In the previous paragraphs we have shown how to define a macro `\foo´ that expands to `\bar´ and a macro `\bar´ that expands to `gee´.(note: ) Can a translator replace all \foo by \bar and all \bar by gee? The answer is obviously no. First, if you say `\something\bar´, the argument will be (after expansion) `gee´, while in the case of `\something gee´ it will be `g´. There is a second problem, which occurs in latex2html: if you replace `\bar´ by its value, you get `\somethinggee´, and this is wrong if you reparse it(note: ). Third, some commands can be randomly redefined (for instance, at first use) like this:

   \def\NFSS{NFSS (New Font Selection Scheme)\global\def\NFSS{NFSS}}
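
To illustrate the definition above (a sketch; the surrounding sentence is made up for the example), the two uses of \NFSS below do not produce the same text:

    \def\NFSS{NFSS (New Font Selection Scheme)\global\def\NFSS{NFSS}}
    The \NFSS{} is used here. Later on, \NFSS{} is mentioned again.
    % first use:  NFSS (New Font Selection Scheme)
    % second use: NFSS

A translator that replaces every \NFSS by the same text, without executing the \global\def, would get the second occurrence wrong.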

The last reason is conditional expansion. Our original translator (written in Perl) has some troubles in these cases.

In this section, we shall consider cases where expansion depends on the context. We have already seen the commands \noexpand for delayed execution and \expandafter that changes the order of expansion; in section 6.12 we will describe \protected, which inhibits expansion in a \edef. We shall analyze three commands: \Color, \Map and \Loop.

2.10.1. Constructing commands dynamically

Using colors in TeX is not completely trivial, one reason is that there are different color models, more or less adapted to the task (printing on paper, on transparencies, or using a video projector). The color package proposes

\def\textcolor#1#{\@textcolor{#1}}
\def\@textcolor#1#2#3{\protect\leavevmode{\color#1{#2}#3}}

Note that the brace character that indicates the start of the body of \textcolor is preceded by a sharp sign. This means that the argument of the command is everything before the brace. In a case of \textcolor {green} {text}, it is empty. The \color command takes two arguments (the color model, empty in the example, and the color); it changes the current color, which is magically restored at the end of the group. One of the reasons why colors are not implemented in Tralics is that the scope of the command is unclear. Assume that we have two commands `\enrouge´ and `\envert´ that take an argument and typeset it in red and green; they could be defined as

\def\enrouge#1{\textcolor{red}{#1}}
\def\envert#1{\textcolor{green}{#1}}

We explain here how to solve the following problem. We want to define a command \Color that takes two arguments, a color and text; if the color is `rouge´ or `vert´ it should call \enrouge or \envert. Otherwise, some default action is specified (an error could be signaled, in the following, we assume that the color should be ignored). One solution to this problem uses tests, as explained in the next section. This means that we have to change the macro if a new color (for instance `\enbleu´ for blue) is added to the list. The following works

    \def\Color#1{\csname en#1\endcsname}

The only drawback with this method is that it might produce unexpected results in the case where the command defined by \csname already exists (try `\Color{d}{document}´).
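
The following sketch shows the two situations, assuming that \enrouge is defined as above and that \enbleu is not. When \csname builds the name of an undefined command, TeX makes it equivalent to \relax, so that the default action (ignoring the color) comes for free:

    \def\Color#1{\csname en#1\endcsname}
    \Color{rouge}{some text} % same as \enrouge{some text}
    \Color{bleu}{some text}  % \enbleu is undefined, hence \relax: the
                             % braces only form a group, the color is ignored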

There are many commands that use \csname. The problem mentioned above can be avoided if the command contains a non-letter character. For instance, when the counter foo is defined, the command \p@foo is created, and this command is used whenever the counter is printed. No package should define commands starting with p@. In some cases the construction can be

\csname\string\color @#1\endcsname

This constructs a command with a backslash in its name; such a command can be created only via \csname, which offers a good protection.
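
A minimal sketch of this construction; the name red and the body 1 0 0 are assumptions made for the example. Since \string\color produces the six characters `\color´ (with the usual escape character), the command defined here is named `\color@red´, backslash included:

    \expandafter\def\csname\string\color @red\endcsname{1 0 0}
    \expandafter\show\csname\string\color @red\endcsname

Ordinary input such as \color@red does not reach this command, even when @ is a letter, because the name read in that case contains no backslash.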

2.10.2. Iterating over lists

In this paragraph we explain how to apply a command to all items in a list. The list could be defined as follows

    \def\mylista{\do{A1}\do{B2}\do{C3}}
    \def\mylistb{{A1}{B2}{C3}}
    \def\mylistc{A1,B2,C3}

The last line is an example of CSV (comma separated values). The LaTeX command \@for can be used to apply a command to every item of such a list, and \@tfor should be used in the second case. Here is an example.

\makeatletter
\let\BreakTfor\@break@tfor\let\Lfor\@for
\makeatother
\def\List{}\def\thelist{12,3,4,5,6} % list is expanded here
\Lfor\Elt:=\thelist\do{\edef\List{\List\Elt}\if\Elt4\BreakTfor\fi}

We give here the transcript file produced by Tralics. The same algorithm is used as in LaTeX. Arguments of \@for are respectively an element name, the colon-equal separator, the list to work on (it will be expanded), the \do-separator, and the code to be applied. On lines 4 and 5 you see the expansion: there is a call to \@forloop, taking as arguments the expanded list where two dummy items have been added, the end marker \@@, the element name and the code. The command is optimised in the case where the list is empty, or has a single element; in the general case, you will see assignment of \Elt (lines 7-8) and expansion on lines 9, 10, 11. Note that \@iforloop is used; you can see on lines 36, 37, 38 the expansion of \@iforloop, which is a simple recursive function. Other assignments of \Elt can be seen on lines 22 and 34. On lines 47 and 48 you can see the expansion of \@break@tfor. What you do not see is that this command gobbles all tokens inserted by \@for and friends (namely, everything up to the \@@ token, the element name and the code). Caveat: the expansion of the LaTeX command with the same name is a double \fi.

1 [6] \Lfor\Elt:=\thelist\do{\edef\List{\List\Elt}\if\Elt4\BreakTfor\fi}
2 {\@for}
3 \thelist ->12,3,4,5,6
4 \Lfor<- \@forloop 12,3,4,5,6,\@nil ,\@nil \@@ \Elt {\edef \List {\List \Elt }
5 \if \Elt 4\BreakTfor \fi }
6 {\@forloop}
7 {changing \Elt=undefined}
8 {into \Elt=macro:->12}
9 \@forloop<- \edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi \def \Elt {3}
10 \edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi \@iforloop 4,5,6,\@nil
11 ,\@nil \@@ \Elt {\edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi }
12 {\edef}
13 \List ->
14 \Elt ->12
15 {changing \List=macro:->}
16 {into \List=macro:->12}
17 +\if1
18 \Elt ->12
19 +iftest1 false
20 +\fi1
21 {\def}
22 {changing \Elt=macro:->12}
23 {into \Elt=macro:->3}
24 {\edef}
25 \List ->12
26 \Elt ->3
27 {changing \List=macro:->12}
28 {into \List=macro:->123}
29 +\if2
30 \Elt ->3
31 +iftest2 false
32 +\fi2
33 {\@iforloop}
34 {changing \Elt=macro:->3}
35 {into \Elt=macro:->4}
36 \@iforloop<- \edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi \relax
37 \@iforloop 5,6,\@nil ,\@nil \@@ \Elt {\edef \List {\List \Elt }\if
38 \Elt 4\BreakTfor \fi }
39 {\edef}
40 \List ->123
41 \Elt ->4
42 {changing \List=macro:->123}
43 {into \List=macro:->1234}
44 +\if3
45 \Elt ->4
46 +iftest3 true
47 {\@break@tfor}
48 \BreakTfor<- \fi
49 +\fi3
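
For the second kind of list, a similar sketch with \@tfor is given here. It relies on the LaTeX kernel behavior that \@tfor, unlike \@for, does not expand its list, so the items are written explicitly instead of passing \mylistb; the brackets are only there to make the items visible.

    \makeatletter
    \def\List{}
    \@tfor\Elt:={A1}{B2}{C3}\do{\edef\List{\List[\Elt]}}
    \makeatother
    % \List is now [A1][B2][C3]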

2.10.3. Mapping a command

We consider here the following task. We have a list, like \mylista above, and we want to apply a command, say \foo, to every element of the list. The solution we propose here is faster than the previous one; remember that, in the case of \@for, the unread part of the list, five additional tokens, the element name and the body are all read and pushed back into the stream for every element. Our solution is as simple as any of the following three definitions

   \def\Map{\let\do}
   \def\Map#1#2{\let\do#1#2}
   \def\Map#1#2{\def\do{#1}#2}

Then we say \Map\textit\mylista. This produces A1B2C3. This is however a bit unsatisfactory: in some cases the list delimiter is different from \do, an example is given above: at the start of a chapter, we want to reset all counters that depend on the chapter counter, in this case \@elt is used as delimiter. We could imagine a map-with-argument macro, that would take as argument the \do. But this is nothing else than \let! Our definition is so simple that people just say `\let\do\@makeother\dospecials´, see for instance 2.12. The difference between the first two versions of \Map is that the second command takes arguments, hence removes an additional level of braces. If you say \Map\foo{\do{A}}, the command \foo is executed in a group in the first case (and it is a mistake to put braces around it).

In the third case, the first argument #1 can consist of more than one token. For instance, if you say \def\foo#1#2{␣#1#2␣} then \Map{\foo A}\mylista gives ` AA1 AB2 AC3 ´. Note that there are too many spaces in this example: the last space in \foo is spurious.
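
As a sketch of the remark about other delimiters (the list \mylistd is hypothetical, and @ is assumed to be a letter), a list built with \@elt needs no mapping macro at all; a group keeps the change of \@elt local:

    \def\mylistd{\@elt{A1}\@elt{B2}\@elt{C3}}
    {\let\@elt\textit \mylistd}

This typesets A1B2C3 in italics, as \Map\textit\mylista does for a list delimited by \do.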

2.10.4. Creating a list via pattern matching

Consider

% \newcommand{\fooiv}[3][bar]{Seen #1 #2 #3}
\def\fooivaux[#1]#2#3{Seen #1 #2 #3}
\def\fooiv{\@ifnextchar[{\fooivaux}{\fooivaux[bar]}}

The commented line is interpreted by LaTeX in the same fashion as the two other lines (except that the internal name is a bit more complicated than `\fooivaux´). We shall explain later how `\@ifnextchar´ works(note: ). We are interested here in how LaTeX converts the `[3]´ into `[#1]#2#3´. Since the number of arguments is between zero and nine, a short sequence of conditionals could be used. Instead, the following code is used by LaTeX:

1 \long \def \@yargdef #1#2#3{%
2   \ifx#2\tw@
3     \def\reserved@b##11{[####1]}%
4   \else
5     \let\reserved@b\@gobble
6   \fi
7   \expandafter
8     \@yargd@f \expandafter{\number #3}#1%
9 }
10 \long \def \@yargd@f#1#2{%
11   \def \reserved@a ##1#1##2##{%
12     \expandafter\def\expandafter#2\reserved@b ##1#1%
13     }%
14   \l@ngrel@x \reserved@a 0##1##2##3##4##5##6##7##8##9###1%
15 }

In the case of `\newcommand\fooiii[3]{foo}´ the \@yargdef command is called with three arguments, the first is \fooiii, the command to be defined, then comes `\@ne´ (some randomly chosen token), then `3´ (the number of arguments) and finally `{foo}´, the body of the command to be defined. This argument is not read, but the code relies on the fact that it starts with an opening brace. The objective is to produce `#1#2#3´. In the case of `\newcommand\fooi{foo}´, arguments are the same with 0 as third argument, the objective is to produce the empty string. In the case of \fooiv, the second argument is `\tw@´, this is something different from `\@ne´, the objective is similar, but a bit different: we want `[#1]#2#3´.

In order to make things easier to understand, we shall proceed to the following simplifications: let´s forget about the percent signs (their purpose is to suppress unwanted space). Let´s forget about `\long´ (is it really needed?) and `\l@ngrel@x´ (this is something that adds conditionally a `\long´ token before the definition). Let´s simplify the names: we write `\Ra´ and `\Rb´ instead of `\reserved@a´ and `\reserved@b´. We also write `\ydef´ and `\yaux´ instead of `\@yargdef´ and `\@yargd@f´. Finally, we replace the arguments by X, Y, Z, and `##´ by a simple `#´. Hence we get

\def \ydef XYZ{
  \ifx Y\tw@
    \def\Rb#11{[##1]}
  \else
    \let\Rb\@gobble
  \fi
  \expandafter \yaux \expandafter{\number Z}X
}
\def\yaux XY{
  \def\Ra #1X#2#{\expandafter\def\expandafter Y\Rb #1X}
  \Ra 0#1#2#3#4#5#6#7#8#9#X
}

Let´s start the analysis with the lines 7 and 8. Because of the two `\expandafter´ tokens, the first token to be expanded is `\number´. This means that Z is replaced by its numeric value. Said otherwise, the number of arguments can be `03´, or ``\^^C´, or even `\value{page}´ if the page number is not too big. In Tralics, only explicit numbers are allowed (You will get a message like Only one token allowed; I will assume that the command takes no argument.) In general, lines 7 and 8 are equivalent to \yaux{Z}X.

Let´s now explain lines 2 to 6. We are in a simple case of a conditional (the commands \@ne and \tw@ are normally equivalent to 1 and 2, they compare unequal), so that line 3 is executed in case of an optional argument, and line 5 otherwise. In the last case \Rb is a command that takes an argument and ignores it; otherwise \Rb is a command that takes an argument, delimited by the character `1´, ignores it, and the expansion is `[#1]´ (four tokens). Remember that we want `[#1]#2#3´, that is a good starting point.

Consider now lines 11 and 12. In order to simplify explanations, we replace X by Z and Y by X (i.e. use the argument names of the outer function). We shall denote by U and V the arguments of \Ra. Thus \Ra is

    \def\Ra UZV{\expandafter\def\expandafter X\Rb UZ}

The question now is what are the values of U and V? In order to answer this question we shall write line 14 in a different way. Let s(n) be the sequence #1#2....#n#, and S(n) the sequence #n...#9. The content of line 14

    \Ra 0#1#2#3#4#5#6#7#8#9#Z

can be interpreted as \Ra, 0, s(n-1), n, S(n+1), #Z, whenever n is a digit between 1 and 9. Said otherwise, whenever Z is a digit between 1 and 9, the first argument U of \Ra is 0s(n-1) (the second argument is ignored, it is everything up to the first brace, the one that delimits the body). Obviously, in the case where Z is the digit 0, U is empty. We leave it as an exercise to the reader to see what happens in the case where Z is a sharp sign(note: ). In all other cases, U is the sequence 0s(9). The important point is that, whatever Z, TeX will not read beyond the opening brace of the body.

Assume now that we want to construct a normal command (case \Rb is gobble). It always gobbles a zero (if Z is zero, U is empty, and Z is gobbled). Thus \Rb UZ expands to: nothing if Z is 0, s(n-1)n if Z is a digit between 1 and 9, and #1#2#3#4#5#6#7#8#9#Z otherwise. This yields an error You already have nine parameters which is adequate in case Z is a number larger than nine. Consider now the case of an optional argument. Here \Rb is a bit different: it reads the `0#1´ part and replaces it by `[#1]´. You will get a Runaway argument? error (or some other strange behavior) in case Z is `0´ because pattern matching fails (of course, you should never try to make optional the first argument of a function that takes none).
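
The argument above can be followed on a concrete case. The following hand trace, written in the simplified notation (so it is a derivation, not code to be run), shows what happens for `\newcommand\fooiii[3]{foo}´, and then what changes with an optional first argument:

    % \ydef \fooiii \@ne 3 {foo}          here \Rb is \@gobble
    %   -> \yaux{3}\fooiii {foo}
    %   -> \Ra 0#1#2#3#4#5#6#7#8#9#3{foo}
    %        U = 0#1#2#                   (everything before the first 3)
    %        V = #4#5#6#7#8#9#3           (everything before the brace, ignored)
    %   -> \expandafter\def\expandafter\fooiii\Rb 0#1#2#3{foo}
    %   -> \def\fooiii#1#2#3{foo}         \Rb has gobbled the 0
    %
    % With \tw@ instead of \@ne, \Rb reads 0# up to the first 1:
    %   -> \expandafter\def\expandafter\fooivaux\Rb 0#1#2#3{...}
    %   -> \def\fooivaux[#1]#2#3{...}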

2.10.5. A variant of the previous problem

In the previous paragraph we have shown how to convert an integer, say 3, into a sequence #1#2#3. One trouble with sharp signs is that you have to double them, and if you define a command in the body of the other one, they must be doubled again. Thus we state our problem as: given an integer N between 1 and 9, construct \sharp1\sharp2...\sharpN. After that, we can evaluate the `\sharp´ command(note: ), replacing it by `#´. One solution (original LaTeX code) uses a loop from N down to Y (with Y=1 in the case of normal argument, Y=2 otherwise(note: )). Some variants will be discussed later on. The current LaTeX code uses pattern matching, as explained above; this leads to the following solution

\def\ydef#1#2#3{%
  \def\tmp##1#2##20{##1#2}%
  \def\Sharp{\noexpand\Sharp}%
  \edef\etmp{\tmp\Sharp1\Sharp2\Sharp3\Sharp4\Sharp5\Sharp6\Sharp7%
    \Sharp8\Sharp90}%
  \def\Sharp{########}% needs 8 #
  \edef\etmp{\etmp}%
  \expandafter\def\expandafter#1\etmp{#3}}%
 
\ydef{\acmd}{3}{\string \acmd\space called with #1, #2, and #3.}
Test \acmd A{BC}D.

Test \acmd called with A, BC, and D..

2.10.6. Loops

A silly question is: can we do loops without conditionals? The answer will be given later. We assume here that our loop will be of the form: while N is not too big, do something and increment N. This mechanism needs modifying a table (the location of N) hence is not pure expansion. In our example, we will write `\sharp\the\count0´, and hope that this will evaluate to `#3´ later on, assuming that \count0 contains `3´ now. How that can be implemented is left as an exercise to the reader. See also section 2.11.1. We shall explain later all the silly details concerning conditionals in TeX, all we need to know is that you can test a<b and a>b, but neither a≤b nor a≥b. Here is our code:

\def\code{\advance\count0 by 1 \sharp\the\count0}
\def\Loop{\ifnum\count0<\count1 \code\Loop\fi}

Assume that \count0 holds 0 and \count1 holds 3. In this case the test is true, `\code´ is evaluated, then `\Loop´. The effect of evaluating `\code´ is to increment the counter and produce `\sharp1´. The loop terminates after `\sharp3´ has been produced. Notice that recursion is not terminal (but it would be in most computer languages): when the test is found false, there are four `\fi´ tokens not yet evaluated. This example is atypical, in that the counter is modified before its use; exchanging the `\sharp...´ and the `\advance...´ part implies changing initial and final value (1,4 instead of 0,3).
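
A minimal way to try this out, assuming that \sharp simply typesets a sharp sign (a stand-in, since the real evaluation of \sharp is left open above), and working inside a group so that both registers are restored afterwards:

    \def\sharp{\#}% hypothetical stand-in
    \def\code{\advance\count0 by 1 \sharp\the\count0}
    \def\Loop{\ifnum\count0<\count1 \code\Loop\fi}
    {\count0=0 \count1=3 \Loop}

This should typeset #1#2#3.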

Our \Loop command is not generic, in that the names of the counters are built-in. Thus Knuth proposes the following:

\def\loop#1\repeat{\def\body{#1}\iterate}
\def\iterate{\body \let\next\iterate \else\let\next\relax\fi \next}
\let\repeat\fi % this makes \loop...\if...\repeat skippable
 
\loop \ifnum \count0<\count1 \sharp \the\count0 \advance\count0 by 1\repeat

Note that the last line contains an `\ifxx´ where the associated `\fi´ is the `\repeat´ at the end of the line. Thus, in the case where the `\loop´ command is not expanded this line is well-balanced regarding conditionals. In the case where `\loop´ is expanded, the value of the `\repeat´ token is irrelevant, it just serves as delimiter, and the `\fi´ has to be found in \iterate. In order for \iterate to work, the `\body´ should expand to an incomplete conditional, without \else part. It conditionally sets \next, and evaluates it after \fi; this trick makes the recursion terminal.

An alternate version is given by LaTeX, as follows(note: )

\def\loop#1\repeat{\def\iterate{#1\relax\expandafter\iterate\fi}%
  \iterate \let\iterate\relax}

Adding `\let\iterate\relax´ at the end of the definition has no real importance; but it causes no harm either. Note the \expandafter trick: if the test in the loop is false, neither \expandafter nor \iterate are expanded, if the test is true, \fi is evaluated before \iterate. Thus recursion is terminal. One difference with the TeX method is that the body of the loop is put in \iterate rather than in an auxiliary command. The interesting point is the `\relax´. Guess what happens in this case:

\def\bloop#1\repeat{\def\iterate{#1\expandafter\iterate\fi}\iterate}
\count0=0 \count1=4
\bloop \ifnum \count0<\count1 \the\count0 \advance\count0 by 1\repeat

If you use LaTeX in verbose mode, you can see that the test is true, true and false, where you expect it to be true four times. The printed result is `0´ (hence the question: what did the second iteration do?). Using Tralics, you will get more information.

1 +\ifnum6
2 +scanint for \count->0
3 +scanint for \ifnum->0
4 +scanint for \count->1
5 +scanint for \ifnum->4
6 +iftest6 true
7 {\the}
8 {\the \count}
9 +scanint for \count->0
10 \the->0.
11 Character sequence: 0.
12 {\advance}
13 +scanint for \count->0
14 {\expandafter \iterate \fi}
15 +\fi6
16 \iterate ->\ifnum \count 0<\count 1 \the \count 0 \advance \count 0 by 1\expandafter \...
17 +\ifnum7
18 +scanint for \count->0
19 +scanint for \ifnum->0
20 +scanint for \count->1
21 +scanint for \ifnum->4
22 +iftest7 true
23 {\the}
24 {\the \count}
25 +scanint for \count->0
26 \the->0.
27 +scanint for \count->10
28 {\advance}
29 +scanint for \count->0
30 {\expandafter \iterate \fi}
31 +\fi7
32 \iterate ->\ifnum \count 0<\count 1 \the \count 0 \advance \count 0 by 1\expandafter \...
33 +\ifnum8
34 +scanint for \count->0
35 +scanint for \ifnum->10
36 +scanint for \count->1
37 +scanint for \ifnum->4
38 +iftest8 false
39 +\fi8

Lines 16 and 32 are a bit too long; there are two tokens, \iterate\fi, that are replaced by \...

As you can see, all tests have a serial number. On lines 2–5, you can see why the first test is true: it is because the numbers 0 and 4 are compared. On lines 18–21, you see why the second test is true, and on lines 34–37, you see why the last test is false; in fact, \count0 contains ten. On line 27, you see something strange. Explanations: Assume that you say `\advance \Foo4´, where \Foo is a reference to some counter. In this case, the trace of Tralics will contain +scanint for \Foo->4, and everybody understands this. If you replace \Foo by \count0, the trace will contain \count; it will also contain a line for the zero in \count0. Hence, the number that appears in line 27 is the value read by the \advance in line 12. What happened is the following: after `by´ we have seen the digit `1´. In the case of \loop, the next token would be `\relax´, and this stops scanning of the number. But here, we have `\expandafter´, which is expandable and expanded; as a consequence, this finishes the first conditional. After that comes the test; it is true, because we did not increment our counter yet. Then comes `\the´ which is expandable. This reads `\count0´, as well as the space after it. The expansion of `\the...´ is the digit zero; so far, we have read 10, and continue reading. The next token is `\advance´ and this is not expandable. Hence \advance has read everything up to the next \advance. Is it needed to explain what happens next? Let´s just notice that, at line 39, Tralics (and also TeX) are still reading tokens for the second \advance. Since version 2.9, Tralics prints an additional line, between lines 27 and 28, of the form \count0 changed from 0 to 10.
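
The fix can be sketched without changing \bloop: assuming the same definition as above, it is enough for the loop body to terminate the number itself, here with a space after the digit (the \loop of LaTeX achieves the same with its \relax):

    \count0=0 \count1=4
    \bloop \ifnum \count0<\count1 \the\count0 \advance\count0 by 1 \repeat

With this body the test is true four times, and the printed result is `0123´.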

2.11. Conditionals in TeX

We shall discuss in this section the following commands

2.11.1. Syntax of the conditionals

A conditional has the form \if test true-code \else false-code \fi. The \else part is optional; conditionals can be nested, and this nesting is independent from anything else. The command \unless (provided by ϵ-TeX) can be used before the if-like command (except \ifcase), its effect is to reverse the truth value of the test. Conditionals are expanded: this means that conditionals are evaluated inside a \edef, you can use \noexpand to delay evaluation, and \expandafter to change the order of expansion.
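
A small sketch of these points; the scratch register \count255 and the name \res are choices made for the example, and \unless needs ϵ-TeX:

    \count255=3
    \edef\res{\ifnum\count255>2 big\else small\fi}
    % \res is now: big (the conditional is evaluated inside the \edef)
    \edef\res{\unless\ifnum\count255>2 big\else small\fi}
    % \res is now: small (\unless reverses the test)
    \edef\res{\noexpand\ifnum\count255>2 big\noexpand\else small\noexpand\fi}
    % \res now holds the whole conditional, evaluated each time \res is used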

An important point is the following: if you define a command \ifthenelse with three arguments, that evaluates the first argument as a boolean, and expands conditionally to the second or third argument, then these two arguments must be balanced, and category codes are fixed. In the case of \if, there are no such limitations: if the test is found false, then all tokens are read at high speed until finding a `\else´, and normal processing occurs, or until finding a `\fi´, that indicates the end of the conditional; if the test is true, and if there is an \else part, all tokens between `\else´ and `\fi´ are read at high speed. Consider for instance this piece of code

    \ifnum \A=\B do-nothing \else {\let\fi\relax\C}\fi

Assume that the test is false; this means that the else part is evaluated. Locally `\fi´ is redefined to do nothing, and `\C´ is evaluated. Let´s assume that `\C´ does nothing special (it could typeset `Hello, world!´). In this case the `\fi´ after the brace terminates the conditional. Assume now that the test is true. Skipping over the \else part at high speed just means comparing the actual value of each token with `\if´ or `\fi´: in the first case, the if-counter is incremented, in the second case it is decremented, in all other cases the counter is left unchanged; reading stops when the counter is zero. Here, the conditional is terminated by the first `\fi´. This means that you have to be very careful: the end of the conditional can change, depending on whether the test is true or false. When we say: `compare the actual value of the token´, this means that the name is irrelevant, only the meaning is used, for instance `\repeat´ has the same value as `\fi´, and \loop...\if...\repeat is well balanced.

All constructions indicated above have a then-part and an else-part, except \ifcase: this command reads a number (see section 2.6 for details) and you can specify action for the case zero, the case one, the case two, using \or as separator, and an optional \else for other cases. Any other use of the \or command will signal an Extra \or error. For instance, we can solve the problem of constructing \sharp1...\sharp N as follows (assuming `\N´ holds the value of N)

\ifcase \N \error{You cannot use zero here}
 \or \sharp1
 \or \sharp1\sharp2
 \or \sharp1\sharp2\sharp3
 \or \sharp1\sharp2\sharp3\sharp4
 \or \sharp1\sharp2\sharp3\sharp4\sharp5
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6\sharp7
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6\sharp7\sharp8
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6\sharp7\sharp8\sharp9
 \else \error{Argument must be non-negative, at most nine}
\fi

The simple conditional `\if AB ... \else ... \fi´ compares two characters A and B, it shares some features with \ifcat. It expands tokens, using the following rules

The command \if compares the two numeric codes, and \ifcat compares the category codes. If you say something like

\catcode `\A=3
\def\fooi{A}
\catcode `\A=11
\def\fooii{A}
\if\fooi\fooii H\fi \ifcat\fooi\fooii\else e\fi
\if\bgroup{l\fi \ifcat\egroup}l\fi \if\relax\par o\fi
\if01\else,\fi \ifcat01 w\fi \if\par1\else o\fi
\if\noexpand\fooii\relax r\fi \if\fooii Ald\fi \if!!!\fi

this should typeset as `Hello, world!´.

You must be very careful using a construction like `\if\A\B...´, because of the following

Plain TeX provides an \outer macro(note: ) \newif that takes an argument \iffoo (whose name starts with the two letters if) and makes it a new conditional; the ifthen package provides the more LaTeXish syntax \newboolean{foo}. This means that \iffoo true-code \else false-code \fi becomes valid, and evaluates false-code. You can say `\footrue´ and the condition becomes true (true-code is evaluated) or `\foofalse´ and it becomes false (false-code is evaluated). The \global prefix is allowed before the command. The ifthen package provides \setboolean{foo}{true} where the second argument is case insensitive. These commands could be implemented as

\def\footrue{\def\iffoo{\if00}}
\def\foofalse{\def\iffoo{\if01}}

The trouble with this definition is that, when `\iffoo´ is read at high speed, it is not recognized as a conditional (it is a user defined command), see discussion about `\ifhph´ in [4, Chapter 20]. For this reason, the commands \iftrue and \iffalse were added to TeX, they evaluate respectively to true and false, and the following lines work (because \let is used instead of \def):

\def\footrue{\let\iffoo\iftrue}
\def\foofalse{\let\iffoo\iffalse}
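
A short usage sketch; the name draft is an assumption made for the example:

    \newif\ifdraft % defines \ifdraft, \drafttrue and \draftfalse
    \ifdraft Draft\else Final\fi % typesets Final: a new conditional is false
    \drafttrue
    \ifdraft Draft\else Final\fi % typesets Draft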

You can use `\ifnum´ or `\ifdim´: in both cases a numeric quantity, an operator, and another numerical quantity are read. Three operators are recognized: less than, greater than and equal to. In the case of `\ifnum´, both quantities have to be numbers, otherwise dimensions. Note that glue is converted to a dimension (and possibly a number), by ignoring the shrink and stretch part. If you want to compare two items of glue, you must split them into components and check them in order. The example that follows shows also that math glue must first be converted into ordinary glue. All the commands shown here are fully expandable; without the \relax, this piece of code gives three errors (and TeX is still trying to see if the `fill´ is not a `filll´).

\muskip0=36mu plus 18mu minus 1fill\relax
\ifnum\glueshrinkorder\mutoglue\muskip0=2\else\bad\fi
\ifdim\glueshrink\mutoglue\muskip0=1pt\else\bad\fi
\ifdim\gluestretch\mutoglue\muskip0=18pt\else\bad\fi

Here is an example that uses no extension.

\count0=0 \count1=1 \dimen0=1pc \dimen1=12pt
\skip0=1cm minus3fill  \skip1=1mmplus 2fill
\tracingall
\ifnum \count0<\count1
  \ifdim \dimen0=\dimen1
    \ifdim \skip0>\skip1 ok \fi\fi\fi

This is the trace of Tralics. Note that for LaTeX, all lengths allocated by \newlength are “rubber” lengths, i.e. associated with a \skip register. Such quantities are automatically converted into rigid lengths (however, if you replace in the example `\skip1´ by `1mmplus 2fill´, then only a rigid dimension is read, the `plus 2fill´ is not part of the condition).

+\ifnum26
+scanint for \count->0
+scanint for \ifnum->0
+scanint for \count->1
+scanint for \ifnum->1
+iftest26 true
+\ifdim27
+scanint for \dimen->0
+scandimen for \ifdim->12.0pt
+scanint for \dimen->1
+scandimen for \ifdim->12.0pt
+iftest27 true
+\ifdim28
+scanint for \skip->0
+scandimen for \ifdim->28.45274pt
+scanint for \skip->1
+scandimen for \ifdim->2.84526pt
+iftest28 true
+\fi28
+\fi27
+\fi26

This is one solution to our problem of producing N sharp signs in a row:

\ifnum \N>0 \sharp1\fi\ifnum \N>1 \sharp2\fi\ifnum \N>2 \sharp3\fi
\ifnum \N>3 \sharp4\fi\ifnum \N>4 \sharp5\fi\ifnum \N>5 \sharp6\fi
\ifnum \N>6 \sharp7\fi\ifnum \N>7 \sharp8\fi\ifnum \N>8 \sharp9\fi

The following construction is a priori more efficient (on average there are fewer tests) but it takes more memory.

\ifnum \N>0 \sharp1\ifnum \N>1 \sharp2\ifnum \N>2 \sharp3%
\ifnum \N>3 \sharp4\ifnum \N>4 \sharp5\ifnum \N>5 \sharp6%
\ifnum \N>6 \sharp7\ifnum \N>7 \sharp8\ifnum \N>8 \sharp9%
\fi\fi\fi\fi\fi\fi\fi\fi\fi

You can test whether a character can be read from an input channel, via the \ifeof command. Here is an example from the Tralics torture file. The file tortureaux.tex has six lines, the first one contains abc, the second one is empty, the third one contains \a \b {\c, the fourth one contains {} \d} \e, the next one contains 123, the last one is empty. The \testeq command compares two commands: things should be equal here. (See TeXbook, exercise 20-18, if you do not understand the setting of \endlinechar). Commands starting with `bad´ are not evaluated in this example. Details can be found in section 5.12.

{
\openin 5=tortureaux
\endlinechar=-1
\ifeof5 \badifeofatentry\fi
\read 5 to \foo\testeq\foo{abc}
\read 5 to \foo\testeq\foo{}
\read 5 to \foo\testeq\foo{\a\b{\c{} \d} \e}
\global\read 5 to \foo
\closein5\relax
\ifeof5\else\badifeofatexit\fi
}\testeq\foo{123}
\ifeof3\else \badifeofnonexists\fi

You can say \ifvoid25, \ifhbox25 or \ifvbox25. In TeX these commands would test the content of box register 25: if empty, the \ifvoid is true, the other tests are false; if not empty, the box contains a horizontal list or a vertical list, and \ifhbox and \ifvbox are respectively true, the two other tests being false. In Tralics, a box contains a character string or an XML element, but there is no associated orientation; hence \ifhbox and \ifvbox always evaluate to false. Instead of 25, any number can be given (provided it is a valid register number). In the example that follows, only the first equals sign is part of an assignment, and box number one is tested.

       \count0=1
\ifvoid\count0=2\fi
\ifvbox\count0=3\fi
\ifhbox\count0=4\fi

You can say \ifmmode, \ifvmode, \ifhmode and \ifinner. These commands check the current mode. The first three evaluate to true if the mode is math mode, vertical mode, or horizontal mode. The last is true if the mode is inner (internal vertical mode, restricted horizontal mode, or (nondisplay) math mode). The following example shows these modes.

\def\wm{\edef\res{\ifinner i\else I\fi
   \ifhmode h\else H\fi
   \ifvmode v\else V\fi
   \ifmmode m\else M\fi}\res}
\par \wm$$\wm \hbox{\wm $\wm$} \eqno \wm$$

The result is: `IHvM IHVm ihVM iHVm iHVm´. If you remove the `\edef´, the trouble will be that typesetting the `I´ enters horizontal mode. This example fails if `$$...$$´ is replaced by `\[...\]´, because \eqno switches to inner math mode, and `\]´ checks for outer math. The same test provokes an error in Tralics, because of the implementation of \eqno, which expands all tokens, including the token that follows \edef. Tralics knows whether it is in or out of math mode; in math mode it knows whether it is in display math or not. In these cases, it produces the same result as TeX. Otherwise \ifinner is false, and \ifvmode or \ifhmode produce results in accordance with the current mode, which has little to do with TeX modes.

An extension of ϵ-TeX is \ifdefined. This reads a token, and yields true unless it is a macro (or active character) that is undefined. The command \ifcsname reads all characters up to \endcsname and constructs a character string in the same way as \csname. The value is true if a command with that name exists (possibly undefined); it is false otherwise (the important point is that the command is not created). In the example that follows, assuming \foo and \FOO undefined, you will see aBc (or abc, in case someone defined \undefined). You will also see DEF, because the LaTeX command \@ifundefined creates the token if it does not exist, and sets it to \relax.

\makeatletter
\ifcsname foo\endcsname A\else a\fi
\ifx\foo\undefined  B\else b\fi
\ifdefined\foo  C\else c\fi
\@ifundefined{FOO}{D}{d}
\ifcsname FOO\endcsname E\else e\fi
\ifdefined\FOO F\else f\fi
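
A typical use of \ifcsname is to define a command only if its name is not yet known, without the side effect of \csname. The following is a sketch; \defineifnew and \mycmd are names of our own choosing, and \mycmd is assumed to be unknown before the first call.

\def\defineifnew#1#2{%
  \ifcsname#1\endcsname
    \message{#1 is already known}%
  \else
    \expandafter\def\csname#1\endcsname{#2}%
  \fi}
\defineifnew{mycmd}{first}% defines \mycmd
\defineifnew{mycmd}{second}% leaves \mycmd unchanged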

The command \iffontchar is another extension; it reads a font identifier (for instance \font denotes the current font) and an integer (a character position); it yields true if the font specifies a character at that position.

The last conditional to explain is \ifx. This reads two tokens and compares them. Two tokens are equal if they are character tokens (implicit or explicit) with the same character value and category code, or two TeX primitives with the same meaning, or two user-defined commands with the same value (same arguments, same body, same \long and \outer flags).

2.11.2. Examples of conditional commands

Using \ifx we can code our \Color command properly, like this:

\def\Color#1#2{%
  \def\crouge{rouge}\def\cvert{vert}\def\cc{#1}%
 \ifx\cc\crouge\enrouge{#2}\else\ifx\cc\cvert\envert{#2}\else#2\fi\fi}

It is possible to avoid these assignments in the \Color macro, provided that they are hidden elsewhere. For instance

\def\ifstringeq#1#2#3#4{%
  \def\tempa{#1}\def\tempb{#2}%
  \ifx\tempa\tempb#3\else#4\fi}
 
\def\Color#1#2{%
  \ifstringeq{#1}{rouge}{\enrouge{#2}}
  {\ifstringeq{#1}{vert}{\envert{#2}}{#2}}}

Note that the ifthen package provides the \equal command as helper for such a situation: you could say \ifthenelse{\equal{A}{B}}{X}{Y} instead of \ifstringeq {A}{B}{X}{Y}. Caveat: the \equal command fully expands its two arguments, our version expands nothing.
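
The difference in expansion can be seen on a small example. In this sketch, \name is a hypothetical macro, \ifstringeq is the command defined above, and \ifstringeqx is a variant of ours that uses \edef in order to expand both arguments, in the spirit of \equal.

\def\name{rouge}
\ifstringeq{\name}{rouge}{same}{different}%   different: \name is not expanded
\def\ifstringeqx#1#2#3#4{%
  \edef\tempa{#1}\edef\tempb{#2}%
  \ifx\tempa\tempb#3\else#4\fi}
\ifstringeqx{\name}{rouge}{same}{different}%  same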

In any computer language, you would define a command that compares two strings and returns true or false; this is not possible in TeX because commands return no value. All you can do is modify some variable (a command, a register, a token list, etc). This assignment can be done by the caller or the callee. Here is a solution where the token \next is set by the caller:

\def\Color#1{%
 \ifstringeq{#1}{rouge}{\let\next\enrouge}
 {\ifstringeq{#1}{vert}{\let\next\envert}{\let\next\relax}}%
 \next}

Note that, if \envert accepts an optional argument, for instance if \envert[clair]{text} typesets the text using light green, you can say \Color{vert}[clair]{text}. We consider now a case where the assignment is done by the callee (via \equaltrue or \equalfalse; there is a variant that uses \setboolean).

\newif\ifequal
\def\streq#1#2{%
 \def\tempa{#1}\def\tempb{#2}%
 %%variant: \setboolean{equal}{\ifx\tempa\tempb true\else false\fi}
 \ifx\tempa\tempb\equaltrue\else\equalfalse\fi}
 
\def\Color#1{%
 \streq{#1}{rouge}%
 \ifequal\let\next\enrouge\else
    \streq{#1}{vert}%
    \ifequal\let\next\envert\else \let\next\relax\fi\fi
 \next}

A subtlety of TeX is that tokens are read only when needed. Said otherwise, if you say `\if AB C\else D\fi´, TeX will evaluate the test; it will remember that a new conditional has started. If the test is false, it will skip at high speed until the \else, and resume normal evaluation; but if the test is true, it will resume normal evaluation right now. It is only when TeX sees an \else token (and this can be another one) that it will read all tokens at high speed until the \fi. And, when TeX sees the \fi, it will pop the conditional stack. Consider the following example:

\def\ifstringeq#1#2#3#4{%
   \def\tempa{#1}\def\tempb{#2}%
   \ifx\tempa\tempb\aux{#3}\else\aux{#4}\fi}
\def\aux#1#2\fi{\fi#1}
\def\color#1{%
  \ifstringeq{#1}{rouge}{\enrouge}{\ifstringeq{#1}{vert}{\envert}{\relax}}}

Assume that the test is true. Then \aux reads all tokens, up to `\fi´, provides a \fi to finish the conditional now, then expands to its first argument (which is argument 3 of \ifstringeq). In the case where the test is false, the same thing happens. This is nicer than the solution that consists in conditionally defining \next and evaluating it after the \fi: it avoids an assignment.
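
In LaTeX, the same effect is often obtained with \expandafter and the kernel commands \@firstoftwo and \@secondoftwo, which select one of the two arguments that follow. The sketch below is an alternative to \aux, not the code discussed above; it assumes that the commands are read with \makeatletter in effect.

\makeatletter
\def\ifstringeq#1#2{%
  \def\tempa{#1}\def\tempb{#2}%
  \ifx\tempa\tempb
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
\def\color#1{%
  \ifstringeq{#1}{rouge}{\enrouge}{\ifstringeq{#1}{vert}{\envert}{\relax}}}

Here \expandafter removes the \else part or the \fi before the selector reads its two arguments, so that the conditional is finished when the chosen argument is evaluated.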

2.11.3. Testing the next token

Let´s consider now a variant of the color problem. We want to write a command with three arguments A, B and C; it is assumed to read a token, compare it with A, and expand to B or C. We need an auxiliary command that reads the token. Thus the solution:

\def\ifnextchar#1#2#3{%
 \let\tempa=#1\def\tempb{#2}\def\tempc{#3}%
 \ifaux
}
\def\ifaux#1{%
  \let\lettoken=#1%
  \ifx\lettoken\tempa\let\tempd\tempb\else\let\tempd\tempc\fi
  \tempd
}

Note that we have put an equals sign after `\let\tempa´ and `\let\lettoken´ for the case where the token to match is an equals sign. If you want to catch spaces, a bit more complicated machinery must be used. There is a problem with this command, because, if the argument of \ifaux is not a single token, say `foobar´, then only `f´ will be put in \lettoken and `oobar´ will be typeset. On the other hand, if the argument is empty, then `\ifx´ will be put in \lettoken; after that \lettoken will be expanded. Since this is \ifx, the following tokens will be compared (said otherwise `\tempa´ and `\let´), this is not exactly what is required. In order to solve this problem, we first modify slightly our code:

\def\ifnextchar#1#2#3{%
 \let\tempa=#1\def\tempb{#2}\def\tempc{#3}%
 \ifaux
}
\def\ifaux#1{\let\lettoken=#1\ifnch}
\def\ifnch{%
  \ifx\lettoken\tempa\let\tempd\tempb\else\let\tempd\tempc\fi
  \tempd
}

The \ifnch command given above looks like the LaTeX version of the beast. In fact, spaces are ignored in LaTeX, so that there is an additional test. Moreover, some variables have a different name, nevertheless, here is the code:

\def\@ifnch{%
  \ifx\@let@token\@sptoken
    \let\reserved@c\@xifnch
  \else
    \ifx\@let@token\reserved@d
      \let\reserved@c\reserved@a
    \else
      \let\reserved@c\reserved@b
    \fi
  \fi
  \reserved@c}

The problem is the \ifaux command. The question is: can we rewrite it in such a way as to read a single token before calling \ifnch? Recall that we want to distinguish between `{x}´ and `x´. A very interesting question is the following: if we read the opening brace, how can we put it back in the input stream? We cannot do so by just expanding a macro (because the body is always well balanced). You could try something like {\ifnum0=`}\fi (that leaves an unmatched brace after expansion), or something like `{\iffalse}\fi´. Our solution is much simpler. There is a TeX primitive that gets the token without reading it. To be precise, \futurelet reads a token A, that has to be a command name or an active character, then a second token B, then a third token C. The value of C is put in A, using the equivalent of \let, then C and B are pushed back in the input stream (in this order, the token to be read first is B). The code of \ifnextchar is hence the following:

\def\ifnextchar#1#2#3{%
 \let\tempa=#1\def\tempb{#2}\def\tempc{#3}%
 \futurelet\lettoken\ifnch}

What `\futurelet\lettoken\ifnch´ does is read a token. This could be a space character, an equals sign, an open brace, a closing brace, whatever. It puts it back in the input stream. It puts it also in \lettoken. After that, it evaluates \ifnch (which is a command that should take no argument, of course; it should consult \lettoken and, depending on the value, call a command that, maybe, reads the token). There are some variants. For instance amsmath has a version that omits the comparison with \@sptoken. The xkeyval package provides a version where the category codes of the character to test and the actual token may be different.
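
As an illustration, \ifnextchar can be used to give a command an optional argument in brackets. In this sketch, \mycmd and \mycmdopt are hypothetical names, and we use the \ifnextchar and \ifnch commands defined above (which do not skip spaces).

\def\mycmd{\ifnextchar[{\mycmdopt}{\mycmdopt[default]}}
\def\mycmdopt[#1]#2{(#1: #2)}
\mycmd{text}% same as \mycmdopt[default]{text}
\mycmd[special]{text}% same as \mycmdopt[special]{text}

In the first call the token seen by \futurelet is the opening brace, which is put back unchanged before \mycmdopt[default] is evaluated.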

2.11.4. Reading a space

We consider in this paragraph the following problem: is it possible to define a command \sptoken that behaves like a space character inside \ifx? One problem with the current version of Tralics is that, as has been mentioned earlier, a newline character in the source file produces a new line character in the XML file; it thus has a different representation from a normal space. Thus, there are two different space tokens N and S (they have the same category code, but a different value, 13 or 32). If a macro requires an argument delimited by a space, both these characters can be used. When comparing token lists, these tokens are considered equal. However, when using \ifx, these two tokens compare unequal. Our purpose is to create \sptoken that compares equal to S; it is trivial to create the N token, and compare them.

We give here three solutions. The first one uses \futurelet. If the arguments are A, B and C, where A is the command to define, and C the space, then B has to be a command (if it is a character, it will be typeset); this cannot be \foo, since spaces after \foo disappear, it has to be something like `\;´. This command must read the space, otherwise it appears in the output. We provide two solutions: a command that is delimited by a space, and a command that takes an argument (remember that spaces disappear before undelimited arguments):

\def\; {}\futurelet\SPtoken\; % comment required
\def\;#1{}\futurelet\SPtoken\; 0

In both cases, the command \; cannot be used for typesetting (in the LaTeX kernel, it is used for computing the \SPtoken, and correctly redefined after that). We give here an example, where the redefinition is temporary, inside the box. We can discard the content of the box.

\setbox0\hbox{\def\;{}\global\futurelet\SPtoken\; }

We give now a solution using \let. Remember the syntax: after \let and \sptoken (the token to be assigned), comes <equals> and <one optional space> and <token>, where the last token is our space token. Since <equals> reads an arbitrary number of spaces and an optional equals sign, an equals sign is required. The <one optional space> then absorbs one space, so that a second space is needed to play the role of <token>: we must produce a double space. This is not completely trivial. We give here two solutions (the comment is necessary):

\def\makesptoken#1{\let\sptoken= #1}\makesptoken{ }
\def\:{\let\Sptoken= } \:  % this makes \Sptoken a space token
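
One can check that the token so defined compares equal to a space seen by \futurelet. In the following sketch, \peek and \checksp are names of our own; the empty argument after \peek is there because a space that follows a command name would be ignored. The comments state what TeX does with an ordinary space; the newline token of Tralics discussed above is a different matter.

\def\makesptoken#1{\let\sptoken= #1}\makesptoken{ }
\def\checksp{\ifx\lettoken\sptoken [space]\else [other]\fi}
\def\peek#1{\futurelet\lettoken\checksp}
\peek{} x% the next token is a space: [space]
\peek{}x% the next token is the letter x: [other]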

And now, how can we define \@xifnch? This command is assumed to read a space, discard it, and check again for the next character. Thus the question is to design a macro that reads a space. This cannot be done via \def\@xifnch#1..., since spaces are ignored before undelimited arguments; we cannot use the technique of the command `\;´ above, because we cannot read what follows the space; the solution consists in a command that takes no argument, and that starts with a compulsory token, like \def\foo\bar{etc}. The nontrivial point is that we want \bar to be replaced by a space token, but spaces disappear after \foo. We give here two solutions.

\expandafter\def\expandafter\foo\space{etc}
\def\:{\Foo}\expandafter\def\: {etc}
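
Putting the pieces together, here is a sketch of a space-skipping variant of the \ifnch command of the previous section, modelled on the LaTeX code shown there. The names \xifnch, \sptoken and \lettoken are those of this document, not the LaTeX ones; \sptoken is assumed to be a space token, defined as above. The definition is made in a group, with \gdef, so that the temporary meaning of \: is forgotten afterwards.

\begingroup
  \def\:{\xifnch}
  \expandafter\gdef\: {\futurelet\lettoken\ifnch}
\endgroup
\def\ifnch{%
  \ifx\lettoken\sptoken
    \let\tempd\xifnch  % read the space and look again
  \else
    \ifx\lettoken\tempa \let\tempd\tempb \else \let\tempd\tempc \fi
  \fi
  \tempd}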

2.11.5. Variants of the Map problem

Let´s consider the following variant of the \Map command. If we have \do{A}\do{B}\do{C}, we want to separate arguments with a comma, and put a period after the last argument; we might as well do something with the argument, say, typeset it in italics. This is not always possible. In one of the style sheets used by the Raweb, a Perl postprocessor is used for replacing some commas by a period. We assume here that we know where the list ends. For instance, we assume that we can put a `\endl´ token at the end of the list. Then we can write something like

\def\foo#1#2\endl{\textit{#1}\ifx#2\endl\endl.\else, \foo#2\endl\fi}

Then `\foo{A}{B and C}{D}\endl´ produces `A, B and C, D.´ as expected. Let´s analyze the code and try to see why it is wrong. We assume that you never say \foo\endl, because the list is assumed non-empty. We also assume that the list does not contain the \endl token (in LaTeX, you should use the special marker `\@nil´ only as list delimiter). In our case, the first argument is `A´, the second is `{B and C}{D}´. In the case where the second argument is empty, the test is true, because \endl is compared against itself. In our case, the test is false because the brace is compared with the letter B. If we put the second argument in a pair of braces, we get an error: Too many }´s, because the test is true, and a part of `#2\endl\endl´ has been evaluated. This means that our test is wrong. The only safe way to check whether #2 is empty is to put it in a command, and check whether this is the same as \empty. We shall give a second version of the code where the test is replaced by \ifx\endl#2\endl. In the case where #2 is empty, the test evaluates to true, and if #2 evaluates to some token list that does not start with \endl, the test will be false; this is better.

Note that, when \foo is called again, it compares `D´ with `\endl´. Does this surprise you? In fact, if you say `\foo{A}{XY}{UV}\endl´, you get `A, XY, U, V.´. The trouble is the following: when TeX reads the arguments of a command, a pair of braces disappears, when possible. Thus arguments are `A´ (without braces) and `{XY}{UV}´ (it is not possible to remove the braces). When \foo is called again, arguments are `XY´ and `UV´, without braces. This explains why the test compares U and V (by the way, if `UV´ is replaced by `UUVV´, the test will be true, yielding an Undefined control sequence error). When \foo is called again, arguments are now `U´ and `V´, an unwanted result. There is a simple way to avoid disappearance of braces: it suffices to put a token before each item, for instance like this

\def\foo\do#1#2\endl{\textit{#1}\ifx\endl#2\endl.\else, \foo#2\endl\fi}
\foo\do{A}\do{B}\do{C}\endl

The good way of testing that the argument is empty is to use \@iftempty, which has a different syntax:

\def\foo\do#1#2\endl{\textit{#1}\@iftempty{#2}{.}{, \foo#2\endl}}
\foo\do{A}\do{B}\do{C}\endl
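
For comparison, here is a sketch of the test against \empty mentioned above: the remainder of the list is stored in a command (\rest, a name of our own) and compared with \empty, which has an empty body in plain TeX and LaTeX.

\def\foo\do#1#2\endl{\textit{#1}%
  \def\rest{#2}%
  \ifx\rest\empty .\else , \expandafter\foo\rest\endl\fi}
\foo\do{A}\do{B}\do{C}\endl

The \expandafter replaces \rest by its content before \foo looks for the \do token that must follow it. The price to pay is one assignment per item.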

A more elegant solution: notice that #2 starts with \do, unless it is empty. There is no need to read the argument for seeing this, we can use the \ifnextchar command. With the solution proposed here, the token that marks the end of the list is evaluated: we use \relax, because this is harmless.

\def\foo{\def\do##1{\textit{##1}\@ifnextchar{\do}{, }{.}}}
\foo\do{A}\do{B}\do{C}\relax

Note that we can replace \relax by something more useful, for instance a period:

\def\foo{\def\do##1{\textit{##1}\@ifnextchar{\do}{, }{}}}
\foo\do{A}\do{B}\do{C}.

An alternate solution could use `\ifprevchar´ instead of `\ifnextchar´. There is no such command in LaTeX, but the idea is the following: instead of putting a comma after each argument but the last, we can put a comma before each argument but the first. All we need to do is to know if this argument is the first. In one application, we have coded this as: apply \do-first on the first argument, and map \do-other on the rest of the list. If side effects are allowed, we can use a piece of code like this (note how the final period is typeset):

\newif\iffirst
\def\do#1{\iffirst\firstfalse\else , \fi\textit{#1}}
\firsttrue
\do{A}\do{B}\do{C}.

In fact, there is no need to use an auxiliary command, it suffices to modify \do itself:

\def\foo{\def\do##1{\textit{##1}\def\do####1{, \textit{####1}}}}
\foo\do{A}\do{B}\do{C}.

If you think that there are too many sharp signs, you can try

\newcommand\normaldo[1]{, \textit{#1}}
\newcommand\firstdo[1]{\textit{#1}\let\do\normaldo}
\newcommand\foo{\let\do\firstdo}
\foo\do{A}\do{B}\do{C}.

There are other possibilities involving conditional commands. We shall see later how to define a comment environment that ignores its content. It is as if you said

\newenvironment{comment}{\iffalse}{\fi}

One can make the following strange construct {\ifnum0=`}\fi. In this case, we compare two numbers, zero and the internal code of the brace (which is in general non-zero). The result of the test is false, but who cares? The body of the conditional as well as the else part is empty. Hence, the result is like \bgroup; there are some differences because TeX has two brace counters: the balance counter and the master counter; there is only one counter in Tralics. For details, see the TeXbook and its appendix D, where it is said “If you understand [...] you´ll agree that the present appendix deserves its name.” (the name of the appendix is `Dirty Tricks´).

A piece of code like this causes trouble to Tralics:

\def\foo#1{%
  \sbox\tempboxa{#1}%
  \ifdim \wd\tempboxa >\hsize
  #1\par
  \else \hbox to \hsize{\hfil\box\tempboxa\hfil}%
\fi}

It is a simplification of the \@makecaption command of the article class. The idea is to center the caption of an image if it fits on a line (centering is achieved via \hfil). The argument is typeset in a temporary box, and the width of the box is compared against \hsize. Captions in the Raweb are always centered, but this is not aesthetic.

2.11.6. More examples

Consider again the following example

\def\ifnch{%
  \ifx\lettoken\tempa\let\tempd\tempb\else\let\tempd\tempc\fi
  \tempd
}

It would be much simpler to write:

\def\ifnch{%
  \ifx\lettoken\tempa\tempb\else\tempc\fi
}

The problem here is that the commands \tempb and \tempc may take an argument, that would be \else or \fi. The remedy is

\def\ifnch{%
  \ifx\lettoken\tempa\expandafter\tempb\else\expandafter\tempc\fi
}
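
The following sketch shows the effect on commands that take an argument; the definitions of \tempb and \tempc are arbitrary, chosen for illustration only.

\def\tempb#1{[#1]}\def\tempc#1{(#1)}
\def\ifnch{%
  \ifx\lettoken\tempa\expandafter\tempb\else\expandafter\tempc\fi
}
\let\tempa=a%
\let\lettoken=a\ifnch{x}% typesets [x]
\let\lettoken=b\ifnch{x}% typesets (x)

In the first call, \expandafter expands the \else, which discards everything up to the \fi, before \tempb reads its argument; in the second call, it expands the \fi before \tempc reads its argument.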

In general, you need an \expandafter before each token between \else and \fi. The command \@afterfi can be used to simplify such definitions. Its effect is easy: it reads all tokens up to the \fi token, evaluates \fi, then the other tokens. Such a command is provided by the following packages: typehtml, grabhedr, gmutils, gmverb, morehelp, splitbib, babel, and maybe others. Example:

\def\test#1{
   \ifnum\count0=#1
   somecode
   \else\@afterfi\fct v\fi}

If the test is true, then somecode is evaluated, then everything between \else and \fi is discarded. But if the test is false, the else part is interpreted as if it were \fi\fct v. The command \@afterelsefi is to be used in the true part (all tokens between \else and \fi are discarded). In the example that follows, \fct is called with two arguments, the first one is u or v, the second is 2.

\def\test#1{%
   \ifnum\count0=#1 %
   \@afterelsefi \fct u
   \else\@afterfi\fct v\fi}
\def\fct#1#2{} \test32
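
For reference, here is a minimal sketch of definitions that match this description. The packages listed above may define these commands differently, so this should be taken as an illustration rather than as their actual code.

\makeatletter
\long\def\@afterfi#1\fi{\fi#1}
\long\def\@afterelsefi#1\else#2\fi{\fi#1}

Since the arguments are delimited by the first \else or \fi that follows, these definitions do not work if the tokens to be moved contain a conditional themselves.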

The piece of code that follows computes the factorial of a number, using only expandable commands (it requires \numexpr, an extension provided by ϵ-TeX).

\def\JGfactorial#1{%
  \ifnum\numexpr#1>1
    \number \numexpr#1*\JGfactorial{(#1-1)}\relax
  \else 1\fi}
 
\def\factorial#1{%
    \ifnum\numexpr#1>1
      \number \numexpr#1*\factorial{(#1-1)}\expandafter\relax
    \else
      \expandafter1\fi}
 
\def\Factorial#1{%
  \number\ifnum\numexpr#1>1
    \numexpr#1*\Factorial{(#1-1)}\expandafter\relax
  \else
   1\expandafter\space
  \fi
}
 
\def\UDfactorial#1{%
  \number\ifnum\numexpr#1>1
    \numexpr#1*\UDfactorial{(#1-1)}\expandafter\relax
  \else
    \numexpr\ifnum\numexpr#1<0 0\else1\fi\expandafter\relax
  \fi
}%

Ulrich Diez wrote versions 3 and 4. Version 3 uses a space character instead of \space, using one of the techniques shown above; he then produced version 4, which gives a different value for the factorial of a negative number, and in which the space after the digit 1 is not needed anymore. In fact, if the argument is zero or one (the case where the first \ifnum is false), versions 1 and 2 return the character 1, while versions 3 and 4 return the digits of the number 1, computed by \number. In case 3, an optional space is read after the integer constant; in case 4, the \relax token is an end marker for \numexpr, and no optional space is needed after it (I guess that the purpose of this \numexpr is to avoid any problems if \space is redefined; the first \numexpr is needed for the product, and the two other calls are needed if the command calls itself). The difference between versions 2 and 3 is the placement of \number. I put it just before \numexpr, because \numexpr can be used only in a context where a number is seen. Ulrich puts it before the \ifnum. Does this make any difference? If you want to compute the factorial of a number, no. What about the following code:

\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\factorialresult
\expandafter\expandafter\expandafter{\JGfactorial{12}}

The effect is the following. The command \JGfactorial is expanded twice, and the result is put in a command; evaluating this command yields the desired result. The same can be applied to \UDfactorial. In any case, the first expansion gives the body of the macro. The second expansion expands the \ifnum and \number respectively. In one case you get lines two and three of \JGfactorial. This is something like

\def\factorialresult{...\else...\fi}

If you do not use this command, TeX will signal an unterminated \if. If you call it twice, you will get an extra \else error. On the other hand, if you consider \UDfactorial, the one-level expansion of \number implies expansion of the \ifnum, then the \numexpr of the body; expansion of the command means considering all tokens up to the final \relax, and since this \relax is preceded by \expandafter, everything up to the final \fi is taken into account. Thus, the one-level expansion of the body is a number, the desired result.
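
Applied to \UDfactorial, the construction reads as follows. The comment states what we expect from the analysis above. Note also that \numexpr works with TeX integers, which are smaller than 2^31, so that an argument greater than 12 produces an arithmetic overflow.

\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\factorialresult
\expandafter\expandafter\expandafter{\UDfactorial{5}}
% \factorialresult should now expand to 120
\message{\meaning\factorialresult}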

2.11.7. Producing N asterisks in a row

In appendix D of the TeXbook, there are some examples of how to produce N asterisks in a row. The question is: can we produce this using pure expansion? This is a solution given by D. Kastrup:

\def\nlines#1{\expandafter\nlineii\romannumeral\number\number #1 000\relax}
\def\nlineii#1{\if#1m\expandafter\theline\expandafter\nlineii\fi}
\def\theline{A}
\nlines{5}

This produces `AAAAA´. The idea is the following: `\romannumeral3000´ expands to `mmm´. It is then rather easy to convert this sequence of m into a sequence of A. The argument of the command can be `\count0´; the `\number´ has as effect to convert the value of this counter into a number, it gobbles a space. The argument of the command can be `\count1␣´; the second `\number´ will gobble the second space (I don´t know if there is some other reason for these two \number commands). Here is the same idea, without tests:

\def\recur#1{\csname rn#1\recur}
\def\rn#1{}
\def\rnm#1{\endcsname{#1}#1}
\def\replicate#1{\csname rn\expandafter\recur
  \romannumeral\number\number#1 000\endcsname\endcsname}
 
\dimen0=4sp \replicate{\dimen0}{P}

You may wonder how this works. Here is the transcript file of Tralics.

[216] \replicate{\dimen0}{P}
\replicate #1->\csname rn\expandafter \recur \romannumeral
   \number \number #1 000\endcsname \endcsname
#1<-\dimen 0
{\csname}
{\expandafter \recur \romannumeral}
+scanint for \dimen->0
+scanint for \number->4
+scanint for \number->4000
+scanint for \romannumeral->4000
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-\endcsname
{\csname}
{\csname->\rn}
\rn #1->
#1<-\recur
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rn}
\rn #1->
#1<-P
Character sequence: PPPP .
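
Since \nlines and \replicate work by expansion only, they can be used inside an \edef. In this sketch, \stars and \Ps are names of our own, and \theline is redefined in order to produce asterisks; the comments state what we expect from the expansion rules described above.

\def\theline{*}
\edef\stars{\nlines{3}}%     \stars expands to ***
\edef\Ps{\replicate{4}{P}}%  \Ps expands to PPPP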

This is now something else, it is part of a command defined in the RR style file:

\bgroup
  \edef\foo{\ifnum 0<0#1x\else y\fi}\def\xbar{x}%
  \ifx\foo\xbar
    \global\compteurtheme=#1
  \else \global\compteurtheme=0 \@latex@error{Pas un thème #1}\@eha\fi
\egroup

Assume that #1 contains a positive number, for instance 25. In this case, the test will be true, \foo will be defined as `x´, and will be equal to \xbar. In this case, our command puts 25 in \compteurtheme. Some other tests (not shown here) are done; for instance, the value should be a number between 1 and 4, or a number with two digits, each one being between 1 and 4. Assume that the argument is not a number, say it is `gee´; then \ifnum will compare 0 and 0, the test will be false, \foo will be defined as `y´ hence is not equal to \xbar. Assume that the argument is `3a´; this is not a theme, but a theme and a subtheme. In this case, the test is true, but \foo expands to `ax´ (the number stops at the digit 3, and the `a´ is part of the true branch), and this is not equal to \xbar. Nowadays, themes are `com´, `cog´, etc, and this piece of code has become useless. It is replaced by something different, see end of section 6.9.
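
The digit test can be isolated in a command of its own. The following sketch uses a hypothetical name, \ifpositivenumber, and the LaTeX kernel commands \@firstoftwo and \@secondoftwo; it assumes that the argument consists of ordinary characters only.

\makeatletter
\def\ifpositivenumber#1{%
  \edef\foo{\ifnum 0<0#1x\else y\fi}\def\xbar{x}%
  \ifx\foo\xbar \expandafter\@firstoftwo \else \expandafter\@secondoftwo \fi}
\ifpositivenumber{25}{yes}{no}%   yes
\ifpositivenumber{gee}{yes}{no}%  no: the test is false
\ifpositivenumber{3a}{yes}{no}%   no: the test is true, but \foo is ax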

2.12. A nontrivial command \verb

The code that follows is a simplified version of a LaTeX command

1 \def\verb{%
2   \bgroup
3     \let\do\@makeother \dospecials
4     \verbatim@font\@noligs
5     \@vobeyspaces \frenchspacing\@sverb}
6  
7 \def\verb@egroup{\global\let\VBG\@empty\egroup}
8 \let\VBG\@empty
9  
10 \def\@sverb#1{%
11   \catcode`#1\active
12   \lccode`\~`#1%
13   \gdef\VBG{\verb@egroup\error{...}}%
14   \aftergroup\VBG
15   \lowercase{\let~\verb@egroup}}

Note first that this code contains two empty lines, that are read by TeX as a \par token (it is ignored, provided that the definition is read in vertical mode). Lines 5, 7, and 15 are terminated by a brace and the end of line character produces a space token, that is ignored for the same reasons. Lines 1, 10, 12, and 13 are terminated by a % character, since otherwise, it would produce a space character (ignored in case the command is executed in vertical mode, and that is not always the case). In the case of lines 2, 3, 4, etc., the end of line is converted into a space character that disappears because it follows a command name.

This code defines a command \verb that starts a group via \bgroup. At line 3, \dospecials is executed, after redefining \do. This changes the category code of all special characters (including all characters made active by packages like babel). Line 4 changes the current font to a typewriter one, and it executes a piece of code that inhibits ligatures (for instance the one that converts a double dash in an en-dash). Note that this document contains a great number of verbatim examples, either inline or as environments. In some cases, we use a smaller font; it is hence important to allow the user to parameterize commands like these. Line 5 contains three commands: The first makes an end-of-line character active (usually, it will behave like \par), the second enters so-called french spacing mode (a mode where the width of a space is constant), and the last command \@sverb will be explained later. The `s´ in the name of this command comes from the `starred´ version of \verb: If you say `\verb*+ +´, you will get `␣´. We have omitted the test with the star character.

On lines 7 and 8, we define a command \VBG that does nothing (i.e. expands to the empty list) and a command that evaluates to \egroup preceded by a global assignment of \VBG to nothing. On line 13, \VBG is defined as calling \verb@egroup plus some error, whose text is not shown here. Thus \VBG is a command that 1) resets \VBG to a harmless command, 2) closes the current group, 3) signals an error.

Let´s consider lines 11 and 12. We assume that the argument of \@sverb is some character c (If you say \def\foo{\verb\foo=\foo} then \foo, you will get an error Improper alphabetic constant, and after that, you´re really in trouble. In the usual case, the character that follows \verb is read with category code 11 or 12, because of the code on line 3.) Line 11 makes the character c active (of category 13); the category code will recover its old value at the end of the group, and line 12 changes the lc-code of the tilde character (the lc-code will recover its value at the end of the group). The lc-code of a character will be used for hyphenation, as well as conversion from upper case to lower case. We assume here, for the sake of simplicity, that hyphenation is inhibited by the use of a verbatim font. Note that Tralics does not care about subtleties like hyphenation. For this reason, when you say \verb+foo+, it will execute \verbprefix {\verbatimfont foo}. You can redefine both commands (the prefix is empty, the font defaults to \tt). Notice that Tralics grabs the argument, contrarily to LaTeX.

Line 14 contains the special command \aftergroup. This reads a token, saves it on a stack, and re-inserts it at the end of the current group.
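
A small example may help; \after is a name of our own. In TeX, the content of the group is typeset first, and the saved token is evaluated just after the closing brace, so that the result is `inside, outside´.

\def\after{, outside}
{\aftergroup\after inside}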

Let´s come back to the LaTeX implementation of \verb. So far, we have read a character, changed its category code, changed the lc-code of the tilde character, changed the font and other tables, redefined \VBG, aftergrouped it (code on line 14: the token is popped at the end of the current group, that was opened on line 2, and normally closed on line 7). Line 15 is a kludge: what \lowercase does is replace in its argument every character by its lower case equivalent (using the lc-code table). The result is evaluated again. Here the argument is formed of three tokens: \let, the tilde and \verb@egroup. Since ~ is a character that has a lower-case equivalent, it will be replaced by that, namely the character c. Note: category codes are left unchanged by this procedure. It is hence important that ~ be an active character (because \let modifies the value of ~) and that c be active (otherwise, there is no meaning in changing the value of c).
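
The same kludge is commonly used outside \verb in order to define an active character without typing it as such. The following sketch is not part of the code above; it gives a meaning to the active exclamation mark, and the replacement text is arbitrary.

\begingroup
  \lccode`\~=`\!
  \lowercase{\endgroup\def~}{[bang]}
\catcode`\!=\active

Here \lowercase converts the active tilde into an active exclamation mark, and the \endgroup restores the lc-code of the tilde before the definition is made. The last line makes the character active, so that it uses this definition.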

Consider the case of \verb+\toto+. Here the character c is the plus sign. After line 15 has been executed, the situation is the following: all characters are of category other, ligatures are disabled, french spacing is active, current font is typewriter, a group is opened, and a token is waiting for the group to terminate. In such a situation, you cannot go outside LaTeX properly. In fact, the carriage return has been made active in order to help error recovery (this is not shown here), and the `+´ sign has been made active: this will help us. TeX sees now the following tokens \₁₂ t₁₁ o₁₁ t₁₁ o₁₁ +₁₃. The first five tokens are added to the current horizontal list as characters in the current font, while the last one is expanded. The expansion is that of \verb@egroup, see line 7. This defines globally \VBG, then closes the group, restoring everything. It does not restore \VBG (because the last assignment was global). After the group, the after-grouped token \VBG is evaluated but it does nothing.

So far, so good: the translation of `\verb+\foo+´ is the same as `\texttt{\char`\\foo}´. Note that the author could have entered the previous expression as `\verb-\verb+\foo+-´, or using the fancyvrb package as `|\verb+\toto+|´, but he used \quoted{\BS verb+\BS foo+}, because, in the HTML file produced by Tralics, different colors are used for verbatim material; this is explained in the second part of this document.

Consider now the following example:

\def\duplicate#1{#1#1} `\duplicate{\verb+x+}++'

You would expect `xx++´ but you get x+x+ in LaTeX, an error in Tralics. Explanations: the expansion of \duplicate is \verb +₁₂ x₁₁ +₁₂ \verb +₁₂ x₁₁ +₁₂ +? +?. The last two plus signs have not been read, and their category code is still unassigned. The first \verb command reads the +₁₂ via \@sverb. It changes the category code of the plus sign. The second \verb does the same. It reads the +? as a +₁₃, this finishes evaluation of the second \verb. The first \verb command does the same with the last plus sign. In the case where you replace ++ by --, the \verb command will see an end of line character before a plus character and complain with LaTeX Error: \verb ended by end of line.

Consider now the following example:

\def\braceme#1{{#1}} `\braceme{\verb+x+}++'

You get the following error: LaTeX Error: \verb illegal in command argument. Let´s try to see how this is done. The expansion of \braceme produces the following tokens: {₁ \verb +₁₂ x₁₁ +₁₂ }₂. After \@sverb has finished, the first non-inactive character is }₂, this closes the current group. Hence, as above, this restores category codes, fonts, lc-codes, etc. It does not restore \VBG because assignment is global (\gdef at line 13 is like \global\def). The trick is now that a \VBG token is popped from the aftergroup stack. This one calls \verb@egroup and signals an error. What \verb@egroup does is to close a group (the one opened by \braceme), and reset \VBG to something harmless. Note that TeX is in a clean mode when the error is signaled. Tralics has no such error handling mechanism (however, no category codes are changed when scanning for the end of the command, so nothing harmful can be done). What this example shows is that error recovery is not completely trivial; nevertheless nice things can be done.
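Since the difficulty comes from the category code changes, one way to obtain verbatim-looking text inside a command argument is the equivalent form mentioned earlier, which needs no such change. This is a minimal sketch, reusing \braceme from the example; it is a workaround under that assumption, not something the \verb implementation does for you.

```latex
\def\braceme#1{{#1}}
% \char`\\ selects character 92 (the backslash) of the typewriter font;
% no category code is changed, so this is safe inside an argument.
\braceme{\texttt{\char`\\foo}}
```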

Note the following special cases:

\verb test
\verb+test+
\verb^^abtest^^ab

In the first case, the delimiter is a space character; the first line is terminated by a space and you would expect it to be interpreted in the same way as the second line. The trouble is that TeX removes all space characters at the end of the line (regardless of category codes). The last line also has a problem: the delimiter is character 171 (double hat mechanism), and once \verb has changed category codes, the double hat sequence is not seen any more as such, and an error is signaled.

There is a variant to \verb, it is the `verbatim´ environment. The classical exercise is: write a command that reads everything up to \end{verbatim} (backslash and braces are of category 12 in this token list). There are different packages that solve this problem; fancyvrb, for instance, is one of them. A solution is also given in the first chapter. It does not allow an optional space after `\end´.

We give here the LaTeX implementation of the \end command.

\def\end#1{%
  \csname end#1\endcsname\@checkend{#1}%
  \expandafter\endgroup\if@endpe\@doendpe\fi
  \if@ignore\@ignorefalse\ignorespaces\fi}

As you can see, if you say \end{foo}, then \endfoo is executed first. After that the current environment in \@currenvir is compared with the argument, in case of error the variable \on@line contains the start line of the environment. After that, the group is terminated, and we have two tests. The first uses \expandafter, this means that the command \@doendpe is executed outside the environment in the case where the variable \if@endpe is true inside the environment. This command is very complicated (it redefines \par and modifies \everypar), and not implemented in Tralics; the effect is to suppress the indentation of the following paragraph. On the other hand, the two commands \@ignoretrue and \@ignorefalse redefine \if@ignore globally, so that no \expandafter is needed for this one.
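The role of \expandafter in front of \endgroup can be seen on a smaller example. This is a sketch only: \ifdemo is a hypothetical switch introduced for the illustration, playing the role of \if@endpe.

```latex
\newif\ifdemo               % hypothetical switch, false by default
\begingroup
  \demotrue                 % local assignment, lost at \endgroup
  % \expandafter evaluates the test while the group is still open,
  % then \endgroup is executed, then the selected branch.
  \expandafter\endgroup\ifdemo [true inside]\else [false inside]\fi
% Without \expandafter, \endgroup would restore \ifdemo to false first.
```

A switch that is set globally, as \if@ignore is, does not need this trick, since its value survives the end of the group.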

This is an example of \aftergroup.

\def\lrbox#1{%
  \edef\reserved@a{%
    \endgroup
    \setbox#1\hbox{\begingroup\aftergroup}%
    \def\noexpand\@currenvir{\@currenvir}%
    \def\noexpand\@currenvline{\on@line}%
  }%
  \reserved@a
  \@endpefalse \color@setgroup \ignorespaces}
\def\endlrbox{\unskip\color@endgroup}

The effect of the \edef command is to replace the previous definition by the following (where `17´ is to be replaced by the current line number). One important point here is that implementing colors in LaTeX is non trivial, and for this reason, there are two hooks (the commands with the name `color´, that do nothing if the package is not loaded). Colors are not implemented in Tralics.

\def\lrbox#1{%
  \endgroup
  \setbox#1\hbox{\begingroup\aftergroup}%
  \def\@currenvir{lrbox}%
  \def\@currenvline{ on input line 17}%
  \@endpefalse \color@setgroup \ignorespaces}

The order of evaluation is the following. Assume that the current environment is X. The \begin command opens a group via \begingroup and changes the environment name to `lrbox´. The command starts with \endgroup, closing this group. After that, we put something in the box whose number is the argument of the environment; the content is a hbox, whose start is defined by the brace (and this brace is a group); we start a group with \begingroup, and call \aftergroup. This pushes a brace on the stack; this brace indicates the end of the hbox, but it will be evaluated later. After that, we change again the name of the current environment (it was restored to X by the \endgroup, but we made a copy of it in the \edef). When the end of the environment is reached, the following happens. First, the end-code is executed (this removes space at the end of the box), and \endgroup is executed. As a side-effect this restores the current environment name to X. It also pops the after group stack, namely the closing brace that terminates the \hbox. One important point here is that the \setbox assignment is done outside the environment (it could be done inside, with a \global prefix). Such a piece of code is illegal. The lrbox environment is not implemented in Tralics version 2.10.
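The basic behavior of \aftergroup, on which the lrbox code relies, can be checked with a much smaller sketch. The command \note is a hypothetical name chosen for the illustration.

```latex
\def\note{[group ended]}
% \aftergroup saves the token \note; it is inserted just after the
% closing brace of the current group.
{\aftergroup\note inside } after
% typesets: inside [group ended] after
```

In lrbox the saved token is a closing brace rather than a command, which is why the end of the \hbox is only seen when \endgroup is executed.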

2.13. Expandable tokens

Assume that \err is an undefined command. The following code

\ifnum1=0\err1 \err1 \fi

will signal two errors: when TeX reads the second number, it expands the undefined command (hence a first error), and continues scanning, until finding the space; the test is true, hence the second error.
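The same expansion during number scanning can be observed without provoking an error. In this sketch, \two is a hypothetical macro introduced for the illustration.

```latex
\def\two{2}
% While scanning the first number, TeX expands \two, so the test
% compares 12 with 12; the space after the second 12 ends the number.
\ifnum1\two=12 equal\else different\fi   % typesets: equal
```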

We give here the list of all tokens that can be expanded.

3. Mathematics

3.1. Introduction

Mathematics play a great role in TeX and Tralics. For instance, TeX has three modes: vertical mode, in which no typesetting is done, horizontal mode (where everything happens) and math mode, a mode in which special objects are handled; a two phase process converts these special objects into normal ones. Fonts to be used in math mode have special properties (see appendices F and G of the TeXbook). Not all subtleties of TeX math can be implemented in Tralics; on the other hand, the XML translation conforms to MathML. This defines some entities, for instance in isoamsc.ent, there is a definition of &rceil; as &#x02309;. As a consequence, Tralics will translate \rceil to <mo>&rceil;</mo> or <mo>&#x02309;</mo>, depending on an option. Translation of a footnote is in general a <footnote> element, and the user can change the name of this element; this is not done for maths: the name <mo> is a constant.

The syntax of mathematics is often strange. Instead of

\math{E=\fraction{1}{2} m\superscript{v}{2}}

you say

$E={1\over 2} mv^2$

Three category codes are defined for use in math mode, they correspond to the dollar sign (math shift), underscore character (subscript) and hat character (superscript). If you want a dollar or underscore character, you can say \$, or \_, but \^ produces an accent over what follows, not a hat character (In LaTeX, you can say \textasciicircum, provided that you can guess the name).
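As a short illustration of the commands just mentioned (a sketch in LaTeX syntax; the sentence itself is invented for the example):

```latex
% Outside math mode: literal dollar, underscore and hat characters.
Price: \$5, a file\_name, and a hat: \textasciicircum.
% Inside math mode: _ and ^ act as subscript and superscript markers.
In math mode: $x_1^2$.
```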

In the example above, we have two pseudo commands \fraction and \superscript (followed by two arguments) whereas the plain TeX version uses infix operators (placed between the arguments). The first operator is greedy. This means that, without the braces in the example above, everything before \over would be the numerator, and everything after it would be the denominator. On the other hand, you sometimes see 2¹6 instead of 2¹⁶, when people forget braces around the superscript. The essential difference however is that arguments are typeset in different styles: the nucleus (what precedes the hat operator) is typeset in text style, while numerator, denominator, superscripts and subscripts are in script style; moreover, if two objects are placed one above the other, cramped style is used for the object that is below the other one (i.e., the denominator or a subscript). The style influences spacing; because of commands like \over, the current style is known only after the whole expression is parsed. This explains why you may see: Package amsmath Warning: Foreign command \over; \frac or \genfrac should be used instead.
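The greediness of \over and the effect of missing braces around a superscript can be compared side by side. This is a small sketch in LaTeX syntax; the formulas are chosen only for the illustration.

```latex
$a+{1\over 2}$        % the braces limit \over: a plus one half
$a+1\over 2$          % greedy: the numerator is a+1
$a+\frac{1}{2}$       % LaTeX prefix form, same result as the first line
$2^16$ and $2^{16}$   % the first has a superscript 1 followed by a 6
```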

TeX has also a notion of “inner” mode. Inside an inner object, you cannot put an outer one. Such a distinction exists also in HTML, where <div> is outer and <span> is inner. We explained in the previous chapter that \ifinner can be used to check whether current mode is inner or outer, and we mentioned that, outside math mode, this is not well defined in Tralics. This may produce surprising results. Consider for instance \hbox{$$}. Inner mode is the rule inside a box, and a double dollar sign signals the start of an outer (display math) formula. You would expect this expression to provoke an error. In fact, TeX assumes that you know what you are doing, enters inner math mode when it sees the first dollar sign, and quits when it sees the second one; this gives an empty math formula (in fact, it will contain all tokens from the \everymath hook), surrounded by some space: the value of \mathsurround (this can be set to zero using \m@th). Note that a math formula defines a group: assignments made inside the formula are forgotten after full evaluation (in particular after this space is added).
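The behavior of TeX on \hbox{$$} can be observed by measuring the box. This is a sketch for TeX or LaTeX (not for Tralics), assuming that \everymath adds no visible material; the value 10pt is arbitrary.

```latex
\mathsurround=10pt
\setbox0=\hbox{$$}   % an empty inline formula, not a display
\the\wd0             % the box width comes only from \mathsurround, added on both sides
```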

The essential difference between inner (normal, inline) math and outer (display) math is that a display formula uses a line of its own (very often the formula is centered on the line). One could say that a display formula terminates the current paragraph. In fact, it is just interrupted, the paragraph continues after the formula (this is only interesting in constructions like \parshape, whose scope is the current paragraph; here a formula counts for three lines; not implemented in Tralics). The construction \hbox{$$ x$$} produces a display math formula in Tralics, instead of two empty math formulas. Before version 2.11.7, an error was signaled (because Tralics started a new paragraph at the end of the equation, and this is illegal in a box).

A display math formula can have an equation number (via commands \eqno, \leqno, \tag, \notag; these commands were not implemented in early versions, and are described in the last chapter of the second part of this report). The MathML documentation says “One of the important uses of <mlabeledtr> is for numbered equations. In a <mlabeledtr>, the label represents the equation number and the elements in the row are the equation being numbered. The side and minlabelspacing attributes of <mtable> determine the placement of the equation number.” Thus, the recommended way, for MathML, is to use a table, like this (replace ellipsis by an expression)

<mtable>
  <mlabeledtr id='e-is-m-c-square'>
    <mtd>
      <mtext> (2.1) </mtext>
    </mtd>
    <mtd>
     ...
    </mtd>
  </mlabeledtr>
</mtable>

This mechanism is not yet implemented. We do not know how to insert numbers automatically, so that the proposed solution is: you can use \label, \ref for any display math formula. This will add an id attribute to the <formula> object, which is a wrapper for the <math>.
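The proposed solution can be sketched as follows; the label name is taken from the MathML example above, and the surrounding sentence is invented for the illustration. In LaTeX itself, a \label inside \[...\] does not refer to an equation number, so the behavior described here is the Tralics one.

```latex
% The \label adds an id attribute to the <formula> element.
\[ E = mc^2 \label{e-is-m-c-square} \]
As formula~\ref{e-is-m-c-square} shows, energy and mass are related.
```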

When you say {\alpha^2}, TeX will enter math mode with an error of the form Missing $ inserted. On the other hand, Tralics will signal two errors, the first is Math only command \alpha. Missing dollar not inserted, the second is Missing dollar not inserted, token ignored: {Character ^ of catcode 7}. If you want a command that works in math mode and outside math mode, you can say:

\def\foo{\ifmmode \alpha^2 \else $\alpha^2$\fi}

This can be generalised, using the following command

\DeclareRobustCommand{\ensuremath}{%
  \ifmmode
    \expandafter\@firstofone
  \else
    \expandafter\@ensuredmath
  \fi}
\long\def\@ensuredmath#1{$\relax#1$}

The purpose of the \relax on the last line is for the case of an empty argument: we do not want \ensuremath{} to expand to $$. Note that the argument is handled only once (i.e., \ensuremath does not read it, but calls a helper), because of subtle bugs, see LaTeX bugs database amslatex/2104. We shall say later `Mode independent commands are interpreted as usual´, this implies that the \relax token will do nothing. We shall see later that, in non-mathml mode, \relax appears in the result unless it is the first in the list. Other commands, not listed in this chapter, may signal an error. For instance, \par is forbidden. Note that \mathchar provokes an Unimplemented command error. If you want a random Unicode character, you should use commands like \mathmi, \mathmo, \mathmn. You can also define a command via \chardef or \mathchardef (the result is the same), and use it, the result is always a <mi> element. The following example shows that \amp produces an ampersand sign in some cases; it must be used with care.

\chardef\AAA"1000
\chardef\CCC`x
\mathchardef\BBB"2000
$\mathbf{x\AAA\BBB\CCC} \mathmi{foo}\mathmo{\&\#666;}\mathmo{\amp\#777;}$

Translation

<formula type='inline'>
  <math xmlns='http://www.w3.org/1998/Math/MathML'>
    <mrow>
      <mi mathvariant='bold'>x</mi>
      <mi>&#x1000;</mi>
      <mi>&#x2000;</mi>
      <mi>x</mi>
      <mi>foo</mi>
      <mo>&amp;#666;</mo>
      <mo>&#777;</mo>
    </mrow>
  </math>
</formula>

Because a math expression translates as <math> inside a <formula>, and the <math> element has a long namespace attribute, examples will never fit on a single line. In order to make the result easier to read, we have inserted some newline characters, and reindented all these examples. Two consecutive newline characters are scanned by TeX as space plus \par. This space is ignored by TeX (see TeXbook, the text between exercises 14.12 and 14.13). Hence the general rule in Tralics: when a <p> element is ended, a trailing space or newline is removed from the content of the element, a newline character is added to the parent of the <p>. As a result, you will very often see <p> at the start of a line and </p> at the end of a line in an XML file generated by Tralics.

Consider the following simple example:

$\alpha$ and $$\beta \label{foo}$$

The translation is the following

<p>
 <formula type='inline'>
  <math xmlns='http://www.w3.org/1998/Math/MathML'>
   <mi>&alpha;</mi>
  </math>
 </formula> and</p>
<formula id='uid1' type='display'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mi>&beta;</mi>
 </math>
</formula>

You can also say

\(\alpha\) and \[\beta \label{foo}\]

The result is exactly the same. In LaTeX, the commands \(, \), \[ and \] test the current mode. No such test is done by Tralics. The LaTeX implementation of \[ is a bit strange. If the formula is in vertical mode, it will be preceded by a box of width .6\linewidth containing nothing (except two \hss commands to fill it) preceded by the current paragraph indentation. The command \] executes \ignorespaces. As you can see, there is some difference between a single dollar and a double dollar. In the first case, we are in normal math mode, otherwise in display math mode. One difference is the initial style: it is \textstyle (for normal mode) and \displaystyle otherwise (this will be explained later). A second difference is that the token list inserted when scanning the formula depends on the mode: it is \everymath for normal mode and \everydisplay otherwise. The third difference is specific to Tralics. A display math formula is never `trivial´ (see section 3.5), it can have a label (not more than one): in this case, the <formula> element has an id attribute. In any case, the <formula> element has a type attribute that indicates whether the formula is inline or display. A non-display formula starts a paragraph; a display math formula cannot appear in a paragraph (the equivalent of \par is executed), if the first non-space token (after expansion) that follows the math formula is not \par, a \noindent token will be inserted (see line 34 of the transcript in section 3.3). Note that, in TeX, a math formula does not end a paragraph, in the sense that a \parshape is valid across math formulas; however what precedes the formula is split into lines, according to parameters in force at the start of the formula. Tralics does not split paragraphs into lines, and does not implement \parshape.

3.2. The basic objects

The following environments are recognized outside math mode, and produce a math formula: eqnarray*, align*, aligned, split, multline, equation*, math and displaymath. When Tralics sees a dollar character, it looks at the next character (without expansion). If this is a dollar sign, it will be read, and display math mode is entered, otherwise, normal math mode is entered. All environments shown above start display math mode (except math, which enters normal math mode). The environments math and displaymath are equivalent to \(...\) and \[...\] respectively. The environments eqnarray and split are implemented as arrays. There is no difference between

\begin{eqnarray} a&b\\ c&d \end{eqnarray}
\begin{split} a&b\\ c&d \end{split}

and

\[\begin{array}{rcl} a&b\\ c&d \end{array}\]
\[\begin{array}{rl} a&b\\ c&d \end{array}\]

Environments equation and align are translated as normal math. A star after the environment name is ignored. In the case of normal math mode, the content of the token list \everymath is inserted before the formula, for displaymath it is \everydisplay. For instance, if you say

\everymath={(N)\ }
\everydisplay={(D)\ }
$\alpha$ and $$\beta$$

the translation will be

<p>
 <formula type='inline'>
  <math xmlns='http://www.w3.org/1998/Math/MathML'>
   <mrow>
    <mo>(</mo><mi>N</mi><mo>)</mo><mspace width='6pt'/>
    <mi>&alpha;</mi></mrow></math></formula> and</p>
<formula type='display'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
   <mo>(</mo><mi>D</mi><mo>)</mo><mspace width='6pt'/>
   <mi>&beta;</mi>
  </mrow>
 </math>
</formula>

In TeX, you can put anything inside a math formula, provided it is hidden in a box; this is not possible in Tralics, because we want the XML result to be conforming to MathML. We shall list here all commands valid in math mode, and explain later on how they are translated.

Commands \limits, \nolimits and \displaylimits can be used just after an operator and before subscripts or superscripts, as in \int \limits _x. They are currently ignored by Tralics.
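For reference, this is how these commands are typically used in LaTeX (a sketch; the formulas are invented for the illustration, and the comments describe the LaTeX rendering, since Tralics ignores the commands):

```latex
$\int\limits_0^1 f(x)\,dx$      % limits above and below the integral sign
$\sum\nolimits_{i=1}^n a_i$     % limits as subscript and superscript
```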

The following environments are recognized: array, matrix, pmatrix, bmatrix, Bmatrix, vmatrix, Vmatrix. All these environments produce arrays. For the first, an argument is required, explaining how cells are aligned. For all other environments, cells are centered. Environments of the form Xmatrix have fences, an implicit \left and \right. In order: parentheses, braces, brackets, simple bars, double bars. There is also an environment cases, with two columns, left aligned, that has an open brace as left delimiter, an empty right delimiter. Example

$\begin{array}{lcr}a&b&c\end{array}
\begin{bmatrix}d&e\\f&g\end{bmatrix}$

The translation is the following.

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
    <mtable>
     <mtr>
      <mtd columnalign='left'><mi>a</mi></mtd>
      <mtd><mi>b</mi></mtd>
      <mtd columnalign='right'><mi>c</mi></mtd>
     </mtr>
    </mtable>
   <mfenced open='{' close='}'>
    <mtable>
     <mtr>
      <mtd><mi>d</mi></mtd>
      <mtd><mi>e</mi></mtd>
     </mtr>
     <mtr>
      <mtd><mi>f</mi></mtd>
      <mtd><mi>g</mi></mtd>
     </mtr>
    </mtable>
   </mfenced>
  </mrow>
 </math>
</formula>
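The cases environment mentioned above can be used in the same way. The sketch below shows a source formula and, on the second line, the form it should be roughly equivalent to, assuming the expansion of \cases and \endcases shown in the transcript of section 3.3; the function f is invented for the illustration, and in LaTeX the first line needs the amsmath package.

```latex
$f(x)=\begin{cases} x & x\ge 0\\ -x & x<0 \end{cases}$
% Roughly equivalent form: open brace on the left, empty right delimiter.
$f(x)=\left\{\begin{array}{ll} x & x\ge 0\\ -x & x<0 \end{array}\right.$
```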

The following delimiters are recognized: <, >, ., (, ), [, ], |, \{, \}, \langle, \rangle, \lbrace, \rbrace, \lceil, \rceil, \lgroup, \rgroup, \lfloor, \rfloor, \lmoustache, \rmoustache, \vert, \Vert, \uparrow, \downarrow, \updownarrow, \Uparrow, \Downarrow, \Updownarrow. A delimiter is anything that can follow \left or \right. For MathML, this has to be a character. As the following example shows, we use in most cases a character entity.

$\left\lceil \left\uparrow x\right\}\right.$
$\lceil \uparrow x\}$

The translation is

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
   <mfenced open='&lceil;' close='.'>
     <mfenced open='&uparrow;' close='&rbrace;'>
       <mi>x</mi></mfenced></mfenced></math></formula>
<formula type='inline'>
  <math xmlns='http://www.w3.org/1998/Math/MathML'>
    <mrow><mo>&lceil;</mo><mo>&uparrow;</mo><mi>x</mi><mo>}</mo>
    </mrow></math></formula>

This is the list of commands allowed in math mode, as well as in text mode: \dots, \ldots, \quad, \qquad, \␣, \$, \%, \&, \!, \,, \{, \}, \i, \sharp, \natural, \flat, \_. The following commands produce space: \;, \:, \>. Note that \! produces a negative space in math mode, nothing outside math mode. Example of use:

\def\alist{\i\j\$\,\_\&\{\}\%\ \^^J\^^I\^^M\!}
\def\blist{\quad,\qquad,\dots,\sharp,\natural,\flat}
\alist\blist
$\alist\blist$

This is the translation, with nobreak space replaced by tilde:

&#x131;j$ _&amp;{}%    ~~~,~~~~~~,...,&#x266F;,&#x266E;,&#x266D;
<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow><mo>&inodot;</mo><mi>j</mi><mi>$</mi><mspace width='0.166667em'/>
  <mo>~</mo><mo>&amp;</mo><mo>{</mo><mo>}</mo><mo>%</mo>
  <mspace width='6pt'/><mspace width='6pt'/>
  <mspace width='6pt'/><mspace width='6pt'/>
  <mspace width='-0.166667em'/><mspace width='1.em'/><mo>,</mo>
  <mspace width='2.em'/>
  <mo>,</mo><mo>&ctdot;</mo><mo>,</mo><mo>&sharp;</mo><mo>,</mo>
  <mo>&natur;</mo><mo>,</mo><mo>&flat;</mo></mrow>
 </math>
</formula>

We give here the list of all symbols that have a translation of the form <mi>&alpha;</mi>. They are of type Ord (ordinary symbol). We start with the lower case Greek letters: \alpha, \beta, \gamma, \delta, \epsilon, \varepsilon, \zeta, \eta, \theta, \iota, \kappa, \lambda, \mu, \nu, \xi, \pi, \rho, \sigma, \tau, \upsilon, \phi, \chi, \psi, \omega, \varpi, \varrho, \varsigma, \varphi, \vartheta, \varkappa, then upper case Greek letters: \Gamma, \Delta, \Theta, \Lambda, \Xi, \Sigma, \Upsilon, \Phi, \Pi, \Psi, \Omega, then other symbols: \hbar, \ell, \wp, \Re, \Im, \partial, \infty, \emptyset, \nabla, \surd, \top, \bottom, \bot, \angle, \triangle. Example

$\alpha\Gamma \surd$

This translates as

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow><mi>&alpha;</mi><mi>&Gamma;</mi><mi>&radic;</mi>
  </mrow></math></formula>

Next comes the list of all symbols whose translation is like log. They are of type Ord (ordinary symbol), though they should be Op (large operator). The list is divided in two parts: these have movable limits: \det, \gcd, \inf, \injlim, \liminf, \limsup, \max, \min, \sup, \projlim, and these have not: \dim, \exp, \hom, \ker, \lg, \lim, \ln, \log, \Pr, \arccos, \arcsin, \arctan, \arg, \cos, \cosh, \cot, \coth, \csc, \deg, \sec, \sin, \@mod, \sinh, \tan, \tanh. Example

$\displaystyle\lim_a \liminf_a \sin_a \hom_a$

In LaTeX, a is typeset below lim and lim inf (display style, movable limits) and as an ordinary subscript of sin and hom; the Tralics version is

<formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mstyle scriptlevel='0' displaystyle='true'>
<mrow>
 <msub><mo movablelimits='true' form='prefix'>lim</mo> <mi>a</mi> </msub>
 <msub><mo movablelimits='true' form='prefix'>lim inf</mo><mi>a</mi></msub>
 <msub><mo form='prefix'>sin</mo> <mi>a</mi> </msub>
 <msub><mo form='prefix'>hom</mo> <mi>a</mi> </msub>
</mrow></mstyle></math></formula>

From now on, all symbols translate into the form <mo>...</mo>. We start with symbols of type Ord. In reality, most of them should be of type Op (large operator). \mho, \clubsuit, \diamondsuit, \heartsuit, \spadesuit, \aleph, \backslash, \Box, \imath, \jmath, \square, \cong, \lnot, \neg, \forall, \exists, \coprod, \bigvee, \bigwedge, \biguplus, \bigcap, \bigcup, \int, \sum, \prod, \bigotimes, \bigoplus, \bigodot, \oint, \bigsqcup, \smallint. Examples

$\bigcap \int\oint$

The translation is

<mrow><mo>&bigcap;</mo><mo>&int;</mo><mo>&oint;</mo></mrow>

These are of type Bin (binary operator). \triangleleft, \triangleright, \bigtriangleup, \bigtriangledown, \wedge, \land, \vee, \lor, \cap, \cup, \multimap, \dagger, \ddagger, \sqcap, \sqcup, \amalg, \diamond, \Diamond, \bullet, \wr, \div, \odot, \oslash, \otimes, \ominus, \oplus, \uplus, \mp, \pm, \circ, \bigcirc, \setminus, \cdot, \ast, \times, \star, \in. Example

$\cap \cup \wr$

The translation is

<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mo>&cap;</mo><mo>&cup;</mo><mo>&wr;</mo></mrow></math></formula>

These are of type Rel (relation). \propto, \sqsubseteq, \sqsupseteq, \sqsubset, \sqsupset, \parallel, \mid, \dashv, \vdash, \Vdash, \models, \nearrow, \searrow, \nwarrow, \swarrow, \Leftrightarrow, \Leftarrow, \Rightarrow, \ne, \neq, \le, \leq, \ge, \geq, \succ, \approx, \succeq, \preceq, \prec, \doteq, \supset, \subset, \supseteq, \subseteq, \bindnasrepma, \ni, \gg, \ll, \gtrless, \geqslant, \leqslant, \not, \notin, \leftrightarrow, \leftarrow, \owns, \gets, \rightarrow, \to, \mapsto, \sim, \simeq, \perp, \equiv, \asymp, \smile, \iff, \leftharpoonup, \leftharpoondown, \rightharpoonup, \rightharpoondown, \hookrightarrow, \hookleftarrow, \Longrightarrow, \longrightarrow, \longleftarrow, \Join, \longmapsto, \frown, \bowtie, \Longleftarrow, \longleftrightarrow, \Longleftrightarrow. Example.

$\approx\leftrightarrow\Longleftrightarrow$

Translation:

<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mo>&approx;</mo><mo>&leftrightarrow;</mo>
<mo>&Longleftrightarrow;</mo></mrow></math></formula>

These are of type Inner: \cdots, \hdots, \vdots, \ddots. These are of type Between (they are of type Ord in TeX, but are used as opening or closing delimiters): \Vert, \|, \vert, \uparrow, \downarrow, \Uparrow, \Downarrow, \Updownarrow, \updownarrow. These are of type Open and Close: \rangle, \langle, \rmoustache, \lmoustache, \rgroup, \lgroup, \rbrace, \lbrace, \lceil, \rceil, \lfloor, \rfloor.

The following characters are classified as `small´: <>,.:;*?!x, these are classified as `small-l´ and `small-r´: ()[], the vertical bar is small-l, these are bin: +/ and the equals sign is of type Rel. Note: what you see here as x is in reality the character 215. It cannot be printed in verbatim mode by LaTeX.

$<>,.:;*?!x ()[]|+-/=$

Translation:

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
   <mrow><mo>&lt;</mo><mo>&gt;</mo><mo>,</mo><mo>.</mo><mo>:</mo>
      <mo>;</mo><mo>*</mo><mo>?</mo><mo>!</mo><mi>&times;</mi><mo>(</mo>
      <mo>)</mo><mo>[</mo><mo>]</mo><mo>|</mo><mo>+</mo><mo>-</mo>
      <mo>/</mo><mo>=</mo>
   </mrow></math></formula>

The following commands are used for accents: \acute, \grave, \mathring, \ddddot, \dddot, \ddot, \tilde, \widetilde, \bar, \breve, \check, \hat, \widehat, \vec, \overrightarrow, \overleftarrow, \underrightarrow, \underleftarrow, \dot.

The following commands are special. They will be explained later: \overline, \underline, \stackrel, \underset, \overset, \mathchoice, \frac, \overbrace, \underbrace, \genfrac, \dfrac, \tfrac, \sqrt, \root.

3.3. Parsing a math formula

This is a non-trivial operation; for this reason, in verbose mode, the math expression will be printed on the transcript file. For instance, given

\tracingall
$\begin{cases} x &y\\a&b \end{cases} \mkern18mu x^{ {2 }}!$

whose translation in no-mathml mode is

<texmath type='inline'>
 {\left\rbrace \begin{array}{ll} x &amp;y\\a&amp;b \end{array}\right.}
 \hspace{10.0pt}x^{ {2 }}!
</texmath>

the transcript file will contain

1 {math shift character $}
2 +stack: level + 2 for math entered on line 2
3 +stack: level + 3 for math entered on line 2
4 \cases ->\left \{\begin {array}{ll}
5 +stack: level + 4 for math entered on line 2
6 +stack: level + 5 for cell entered on line 2
7 +stack: level + 6 for math entered on line 2
8 +stack: level - 6 for math from line 2
9 +stack: level - 5 for cell from line 2
10 +stack: level + 5 for cell entered on line 2
11 +stack: level - 5 for cell from line 2
12 +stack: level + 5 for cell entered on line 2
13 +stack: level - 5 for cell from line 2
14 +stack: level + 5 for cell entered on line 2
15 \endcases ->\end {array}\right .
16 +stack: level - 5 for cell from line 2
17 +stack: level - 4 for math from line 2
18 +stack: level - 3 for math from line 2
19 +scanint for \mkern->18
20 +scandimen for \mkern->18.0mu
21 +stack: level + 3 for math entered on line 2
22 +stack: level - 3 for math from line 2
23 +stack: level + 3 for math entered on line 2
24 +stack: level + 4 for math entered on line 2
25 +stack: level - 4 for math from line 2
26 +stack: level - 3 for math from line 2
27 +stack: level - 2 for math from line 2
28 Math: $\begin {cases}{\left\{\begin {array}{ll} x &y\\a&b\end{cases}
29 \end {array}\right.} \mkern\hspace{10.0pt}x^{ {2 }}!$
30 +scanint for \hspace->10
31 +scandimen for \hspace->10.0pt
32 {scanglue 10.0pt\relax }
33 Realloc xml math table to 20
34 {Push p 1}

We shall explain for each line in the transcript file where it comes from. Math mode scanning is entered when the translator sees a math shift character (line 1). The scanner reads some tokens and puts them in a list. The list is printed at the end (lines 28-29). The start of the formula is a bit special, in that the token that follows the first dollar sign is considered unexpanded when we check for a double dollar sign. A new group is entered, before scanning the whole formula (line 2).

The loop is as follows:

We give here an example with some fonts.

$\mathtt{Ab}\mathcal{Cd}\mathbf{Ef}\mathrm{Gh}\mathit{Ij}
\mathbb{Kl}\mathsf{Mn}$

The translation is as follows. You can notice that some variants affect only uppercase letters.

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
   <mi mathvariant='monospace'>A</mi>
   <mi mathvariant='monospace'>b</mi>
   <mi>&Cscr;</mi>
   <mi>d</mi>
   <mi mathvariant='bold'>E</mi>
   <mi mathvariant='bold'>f</mi>
   <mi> G </mi>
   <mi> h </mi>
   <mi>I</mi>
   <mi>j</mi>
   <mi>&Kopf;</mi>
   <mi>l</mi>
   <mi mathvariant='sans-serif'>M</mi>
   <mi mathvariant='sans-serif'>n</mi>
  </mrow>
 </math>
</formula>

3.4. Translation of arrays

Whenever we see an array (this can be a global environment like eqnarray or a local one, like array), we translate all cells one after the other. The character & is the cell separator. The command \\ is the row separator. In the case where an array ends with a \\, this gives an empty row: it will be removed. Each cell has an alignment, left, right, or center. An attribute is added only if this is not center. The array environment has an argument that explains the type of the columns (columns not indicated are centered). The default alignment is `rl´ for split and align, `rcl´ for eqnarray, centered for matrix. You can use \multicolumn. This command takes three arguments: the span which should be some integer, then the alignment (one of r, l or c) and the content of the cell. The program may signal errors in case of wrong syntax. Here is an example:

$\begin{array}{rcl}
a&b&c&d\\
A&\multicolumn{1}{r}{B}&C&D\\
\end{array}$

This is the translation of the array.

<mtable>
 <mtr>
  <mtd columnalign='right'><mi>a</mi></mtd>
  <mtd><mi>b</mi></mtd>
  <mtd columnalign='left'><mi>c</mi></mtd>
  <mtd><mi>d</mi></mtd>
 </mtr>
 <mtr>
  <mtd columnalign='right'><mi>A</mi></mtd>
  <mtd columnalign='right' columnspan='1'><mi>B</mi></mtd>
  <mtd columnalign='left'><mi>C</mi></mtd>
  <mtd><mi>D</mi></mtd>
 </mtr>
</mtable>
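The alignment rule can be illustrated by a small Python sketch. This is not the Tralics code: the helper name `row_alignments` is hypothetical, and \multicolumn overrides are not modelled. It only shows how a column specification such as `rcl` maps to the columnalign attribute of each cell, with columns beyond the specification being centered.

```python
# Hypothetical helper, not part of Tralics: maps a column specification such
# as "rcl" to the columnalign value of each cell in a row.
ALIGN = {"r": "right", "l": "left", "c": "center"}

def row_alignments(spec, n_cells):
    """Return one entry per cell: 'left', 'right', or None for centered cells."""
    result = []
    for i in range(n_cells):
        # Columns not covered by the specification are centered.
        letter = spec[i] if i < len(spec) else "c"
        name = ALIGN[letter]
        # An attribute is produced only when the alignment is not center.
        result.append(None if name == "center" else name)
    return result

print(row_alignments("rcl", 4))
```

For the first row of the example, this gives right, no attribute, left, no attribute, in agreement with the translation shown above.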

3.5. Trivial math

If you say `$x$ and $123$´, the translation will be

<p><formula type='inline'><simplemath>x</simplemath></formula> and 123</p>

Initially, we found this a good idea, because it can easily be converted in HTML into <i>x</i>. Moreover, `$2^{i\grave eme}$´ gives

<temporary>2<hi rend='sup'>e</hi></temporary>

Here the <temporary> element will not show in the XML tree, but is printed on the terminal if Tralics is called with the `interactivemath´ switch. If you invoke Tralics with the `-notrivialmath´ switch, these hacks are not tried, and the formula translates into:

<formula type='inline'>
  <math xmlns='http://www.w3.org/1998/Math/MathML'>
   <msup>
    <mn>2</mn>
    <mrow>
     <mi>i</mi>
     <mover accent='true'><mi>e</mi> <mo>&grave;</mo></mover>
     <mi>m</mi>
     <mi>e</mi>
    </mrow>
   </msup>
  </math>
</formula>

There are three hacks: the first is when the formula contains only a letter, the second is when the formula contains only digits, and the last one is when people use a math formula instead of \textsuperscript. This last hack is applied only if the math formula starts with digits (no digit at all is OK; braces are ignored) followed by an exponent marker, followed by a special exponent; this has to be a single token or a token list. In the case of a single token, the hack is applied only if this is e or o. Typically, it applies in cases like 2e and No. In the case of more than one token, it applies when the exponent is `th´, `st´, `rd´ or `nd´, for cases like 1st, 2nd, 3rd, and 4th. There are four rules for French: `e´, `eme´, `ieme´, `ème´ and `ième´ convert to `e´; `ier´ and `er´ convert to `er´; `iemes´, `ièmes´ and `es´ convert to `es´; `ère´ and `re´ convert to `re´. The accented letter can be given as è, or \`e or \`{e} or \grave{e} or \grave e. The hack is applied in a case like:

$2 ^{\text{\small\rm \grave ere}} $

Instead of \text, \hbox can be used. Instead of \small or \rm any font change or font size command can be used. Up to two commands can be given. The original Perl version had 30 exceptions, including $\Sigma{}^{{\rm it}}$ and \ddot{\rm o}. Compare Σ it with Σit and o ¨ with ö.
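The ordinal rules listed above can be summarized as a lookup table. The following Python sketch is only an illustration under stated assumptions: the names `ORDINAL_SUFFIXES` and `trivial_superscript` are hypothetical, the exponent is assumed to be already reduced to plain characters (font commands removed, accents converted to è), and the interaction with the other hacks is not modelled.

```python
# Illustrative table built from the rules listed in the text; the names are
# assumptions, this is not the Tralics implementation.
ORDINAL_SUFFIXES = {
    # single-token exponents
    "e": "e", "o": "o",
    # English ordinals
    "th": "th", "st": "st", "rd": "rd", "nd": "nd",
    # French ordinals (four rules)
    "eme": "e", "ieme": "e", "ème": "e", "ième": "e",
    "ier": "er", "er": "er",
    "iemes": "es", "ièmes": "es", "es": "es",
    "ère": "re", "re": "re",
}

def trivial_superscript(base, exponent):
    """Return (base, suffix) if the hack applies, otherwise None."""
    # The base must consist of digits only; an empty base is accepted.
    if base and not base.isdigit():
        return None
    suffix = ORDINAL_SUFFIXES.get(exponent)
    if suffix is None:
        return None
    return (base, suffix)

print(trivial_superscript("2", "ère"))
print(trivial_superscript("x", "eme"))
```

The first call corresponds to the example above and yields the suffix `re´; the second is rejected because the base is a letter.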

Since version 2.8, there is an integer register named \notrivialmath that controls these hacks; it initially contains 1, it is set to zero if Tralics is called with the -notrivialmath switch, to seven if Tralics is called with the -trivialmath switch (and to 349 if Tralics is called with -trivialmath=349). If the value is A+2B+4C modulo 8, where A, B, and C are zero (false) or one (true), then the behavior is the following (by default A is true, other flags are false).

$1^e$, $3^{eme}$ X$^{eme}$ $4^{i\grave{e}me}$
$1^{st}$ $2^{nd}$ $3^{rd}$  $4^{th}$
$x$ $1$ $\alpha$ $\pm$ $\longleftrightarrow$ $-$
$_{foo}$ $^{2+3}$  $_{\bf Foo}$
$+$ $x^{eme}$ $\log$ $_{F\bf oo}$

Translation (with MathML namespace removed), all hacks enabled:

<p>1<hi rend='sup'>e</hi>, 3<hi rend='sup'>e</hi>
    X<hi rend='sup'>eme</hi> 4<hi rend='sup'>e</hi>
1<hi rend='sup'>st</hi> 2<hi rend='sup'>nd</hi> 3<hi rend='sup'>rd</hi>
    4<hi rend='sup'>th</hi>
<formula type='inline'><simplemath>x</simplemath></formula>
   1 &alpha; &pm; &longleftrightarrow; &#x2013;
<hi rend='sub'>foo</hi> <hi rend='sup'>2+3</hi>
   <hi rend='sub'><hi rend='bold'>Foo</hi></hi>
<formula type='inline'><math><mo>+</mo></math></formula>
<formula type='inline'><math><msup><mi>x</mi>
    <mrow><mi>e</mi><mi>m</mi><mi>e</mi></mrow> </msup></math></formula>
<formula type='inline'><math><mo form='prefix'>log</mo></math></formula>
<formula type='inline'><math><msub><mrow></mrow>
     <mrow><mi>F</mi><mi mathvariant='bold'>o</mi>
  <mi mathvariant='bold'>o</mi></mrow> </msub></math>
</formula></p>
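The decoding of the \notrivialmath register into the three flags A, B and C is plain bit arithmetic. The sketch below (the function name `trivialmath_flags` is hypothetical) only shows this decoding; it makes no claim about what each flag enables.

```python
def trivialmath_flags(value):
    """Decode a \\notrivialmath value into the flags A, B, C (value = A+2B+4C mod 8)."""
    v = value % 8
    return {"A": bool(v & 1), "B": bool(v & 2), "C": bool(v & 4)}

for value in (1, 0, 7, 349):
    print(value, trivialmath_flags(value))
```

With the default value 1 only A is set, the value 7 sets all three flags, and 349 is 5 modulo 8, so that A and C are set.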

3.6. Conversion to XML

If the value of the counter \@nomathml is negative, the translation is a <texmath> element containing all tokens of the math list. For instance,

\csname@nomathml\endcsname=-1
$\begin{pmatrix}
\binom 12&\int_0^\infty f(x)dx\\[2cm]
\mathfrak{W}_2&\text{xyz}=\sqrt{xxyyzz}
\end{pmatrix}$

translates as

<p><texmath type='inline'>\begin{pmatrix}
\genfrac(){0.0pt}{}{1}{2}&amp;\int _0^\infty f(x)dx\\[2cm]
\@mathfrak W\@mathit _2&amp;\text{xyz}=\sqrt{xxyyzz}
\end{pmatrix}</texmath></p>

In all other cases we use a highly recursive procedure that converts a math list into a formula. The procedure takes as argument the current style. This is one of D, T, S, or SS (display, text, script, or script script style). It is D for a display math formula, T for a normal formula.

Consider first the case where the formula has an \over, or a variant, not hidden inside braces. This example has 6 subexpressions, each of which has such an operator.

${a\over b}{a\above2mm b}{a\atop b}
{a\overwithdelims[] b}{a\abovewithdelims[]2mm b}{a\atopwithdelims[] b}$

The translation is

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
   <mfrac><mi>a</mi> <mi>b</mi></mfrac>
   <mfrac linethickness='2mm'><mi>a</mi> <mi>b</mi></mfrac>
   <mfrac linethickness='0.0pt'><mi>a</mi> <mi>b</mi></mfrac>
   <mfenced open='[' close=']'>
       <mfrac><mi>a</mi> <mi>b</mi></mfrac></mfenced>
   <mfenced open='[' close=']'>
       <mfrac linethickness='2mm'><mi>a</mi><mi>b</mi></mfrac></mfenced>
   <mfenced open='[' close=']'>
       <mfrac linethickness='0.0pt'><mi>a</mi> <mi>b</mi></mfrac></mfenced>
  </mrow>
 </math>
</formula>

It is an error if the formula has more than one such operator. Otherwise, we have two parts: what precedes the operator and what follows the operator. As the example shows, some operators need delimiters. Other operators read a dimension. This dimension must be given explicitly as a sequence of digits and a unit of measure (we could do better; if you want \parindent instead of 2mm, you should use \genfrac instead). After splitting the formula into two parts, the same idea as for \genfrac is used. If the current style is C, the next style in the list is used for both parts of the formula (if the style is D or T, the next style is S, otherwise it is SS). Note that \choose is like \over; you should use \binom instead.
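The splitting step and the style rule can be sketched in Python. This is a simplified illustration, not the Tralics procedure: the function names are hypothetical, the formula is assumed to be given as a list of top-level tokens (brace groups already collapsed into single items), and the delimiter and dimension arguments of the operators are not parsed.

```python
# Operators that split a formula into a numerator and a denominator.
FRACTION_OPERATORS = {
    "\\over", "\\above", "\\atop",
    "\\overwithdelims", "\\abovewithdelims", "\\atopwithdelims",
}

def split_at_fraction(tokens):
    """Return (before, operator, after), or None if there is no such operator."""
    positions = [i for i, t in enumerate(tokens) if t in FRACTION_OPERATORS]
    if not positions:
        return None
    if len(positions) > 1:
        # The text states that more than one such operator is an error.
        raise ValueError("more than one \\over-like operator")
    i = positions[0]
    return tokens[:i], tokens[i], tokens[i + 1:]

def next_style(style):
    """Style used for both parts: D or T gives S, S or SS gives SS."""
    if style not in ("D", "T", "S", "SS"):
        raise ValueError("unknown style: " + style)
    return "S" if style in ("D", "T") else "SS"

print(split_at_fraction(["a", "\\over", "b"]), next_style("T"))
```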

We assume from now on that the formula contains no more operators like \over. This means that the current style can be used for the current object. Items are handled as follows:

3.7. Final math mode hacks

Before we forget it: when the formula is completely translated, we have a list of XML elements. If the list is empty, the result is <mrow/>. For instance, in the case of x^{}, the exponent is empty. If the list has a single XML element, this will be the result. Otherwise, everything is put in a <mrow>. If the current formula or subformula contains a style change, it is put in a <mstyle> element. This is not always the right solution, because the same style is used for everything, both what precedes and what follows the style command. If you look at the \genfrac example above, you can see that styles are added by the \genfrac interpreter (the single TeX switch is associated with two MathML attributes).
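The three cases for the final list can be written down directly. In this minimal sketch, XML elements are represented as strings and the function name `finish_formula` is hypothetical; the <mstyle> wrapping for style changes is not modelled.

```python
def finish_formula(elements):
    """Combine the translated elements of a formula or subformula."""
    if not elements:
        return "<mrow/>"            # empty list, as for the exponent of x^{}
    if len(elements) == 1:
        return elements[0]          # a single element is the result itself
    return "<mrow>" + "".join(elements) + "</mrow>"

print(finish_formula([]))
print(finish_formula(["<mi>x</mi>"]))
print(finish_formula(["<mi>x</mi>", "<mo>+</mo>", "<mn>1</mn>"]))
```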

If we have a formula, of the form $_x^{2}_{abc}$, the translation rules explained so far tell us that we have: an underscore character, an XML element for x, a hat character, an XML element for {2}, an underscore, and an XML element for {abc}. We may have \nonscript tokens; they will be removed, as well as a space that follows. We have to evaluate the commands that control subscripts and superscripts. A hat character gives <msup>, an underscore character gives <msub>, and both give <msubsup>. It is possible for a formula to start with an underscore or a hat: in this case, the kernel is empty. It is not possible for a formula to end with hat or underscore. A kernel can have at most one subscript and at most one superscript; hence the formula above is wrong: the letter x is the first subscript to the empty kernel. A valid formula is for instance $_yx^2$. It translates as

<mrow>
  <msub><mrow></mrow> <mi>y</mi> </msub>
  <msup><mi>x</mi> <mn>2</mn> </msup>
</mrow>
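The script rules (empty kernel at the start, at most one subscript and one superscript per kernel, no trailing marker) can be sketched as follows. This is an illustration under simplifying assumptions, not the Tralics code: the input is a flat list in which `_` and `^` are markers and every other item is an already translated XML string, and the function name `attach_scripts` is hypothetical.

```python
MARKERS = ("_", "^")

def attach_scripts(items):
    """Group kernels with their scripts into msub, msup or msubsup elements."""
    out = []
    i = 0
    while i < len(items):
        if items[i] in MARKERS:
            kernel = "<mrow></mrow>"      # formula starts with a script: empty kernel
        else:
            kernel = items[i]
            i += 1
        sub = sup = None
        while i < len(items) and items[i] in MARKERS:
            marker = items[i]
            if i + 1 >= len(items) or items[i + 1] in MARKERS:
                raise ValueError("missing argument after " + marker)
            if marker == "_":
                if sub is not None:
                    raise ValueError("double subscript")
                sub = items[i + 1]
            else:
                if sup is not None:
                    raise ValueError("double superscript")
                sup = items[i + 1]
            i += 2
        if sub is not None and sup is not None:
            out.append("<msubsup>%s%s%s</msubsup>" % (kernel, sub, sup))
        elif sub is not None:
            out.append("<msub>%s%s</msub>" % (kernel, sub))
        elif sup is not None:
            out.append("<msup>%s%s</msup>" % (kernel, sup))
        else:
            out.append(kernel)
    return out

print(attach_scripts(["_", "<mi>y</mi>", "<mi>x</mi>", "^", "<mn>2</mn>"]))
```

The call corresponds to $_yx^2$ and produces the two elements shown in the translation above; the input for $_x^{2}_{abc}$ raises a double subscript error.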

We have mentioned above that some operators can be flagged as left or right, and that adding \bigr may convert a left operator into a right operator. Some magic converts, in some cases, the \big operators into fences. For instance

$\bigl [ A\big ( x^2 \big) B \bigr[  $

translates as

<mfenced open='[' close='['>
  <mi>A</mi>
  <mfenced open='(' close=')'><msup><mi>x</mi> <mn>2</mn> </msup></mfenced>
  <mi>B</mi>
</mfenced>

There is another trick, that works in some cases. Consider:

$\int_0^\infty f(x) dx = \big[  U \big ]$

the translation is

<mrow>
 <msubsup><mo>&int;</mo> <mn>0</mn> <mi>&infin;</mi> </msubsup>
 <mrow>
   <mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mi>d</mi><mi>x</mi>
 </mrow>
 <mo>=</mo>
 <mfenced open='[' close=']'><mi>U</mi></mfenced>
</mrow>

The interesting point here is the placement of the inner <mrow>. The idea is that the parentheses should remain small (not larger than the <mrow>). In particular, they should not be influenced by the integral that precedes and the fence that follows. In some cases, it works.

3.8. Extensions

In Tralics, you can use the following three commands \mathmo, \mathmi, and \mathmn. They take an argument and produce a <mo>, <mi>, or <mn>. There is a file tralics-iso.sty that contains

\def\makecmd#1{\expandafter\newcommand\csname math#1\endcsname}
\def\makemo#1#2{\makecmd{#2}{\mathmo{\amp\##1;}}}
\def\makemi#1#2{\makecmd{#2}{\mathmi{\amp\##1;}}}
\def\makemn#1#2{\makecmd{#2}{\mathmn{\amp\##1;}}}

Then you can say \makemo{x02190}{slarr}, and this will define a command \mathslarr, whose translation (in math mode only) is <mo>&#x02190;</mo>. The file provides nearly 2000 such definitions, taken from the MathML entity files, with the MathML names. These commands can be used instead of TeX commands like \mathchar: remember that a math-char is a 15bit integer, where 8 bits are used for the position in a font table, 3 bits for the type, and 4 bits for the family. Only three types are defined for Tralics, but the content of the element is arbitrary (most math symbols are between U+2100 and U+27FF, there are also letters between U+1D400 and U+1D7FF). There is a command \mathattribute that adds an attribute pair to the last created math element. You can say for instance

\providecommand\operatorname[1]{%
  \mathmo{#1}%
  \mathattribute{form}{prefix}%
  \mathattribute{movablelimits}{true}%
}

After that,

$\min _xf(x) >\operatorname{min} _xf(x)$

translates as

<formula type='inline'>
 <math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
   <msub><mo movablelimits='true' form='prefix'>min</mo> <mi>x</mi> </msub>
   <mrow>
    <mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo> <mo>&gt;</mo>
   </mrow>
   <msub><mo movablelimits='true' form='prefix'>min</mo> <mi>x</mi> </msub>
   <mrow>
    <mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo>
   </mrow>
  </mrow>
 </math>
</formula>

The command \DeclareMathOperator takes two arguments (say `foo´ and `bar´), with an optional star before the first argument. It defines \foo to be the command \operatorname applied to `bar´ (with a star when required). The command \operatorname is as shown above (the movablelimits attribute is only added if the command is followed by a star).

You can use the command \mathchardef. This is like \chardef, it reads a command and a number. The number should fit on 15 bits. Otherwise, you will see an error of the form: Bad mathchar replaced by 0: 1234567. The \mathchardef command reads a command, say \foo, and an integer N; there is no difference between \foo and \mathcharN, except that \the\foo returns the integer N, and is faster to parse. Some constants, like \@cclvi=256, are defined in this way by the TeX kernel and should not be used as math characters. Some commands, like \eta="111 (hexadecimal), are meant to be used as a math character. In Tralics, until version 2.8, an error is signaled. In version 2.9, the translation, in math mode, is a <mi> element containing this character; you might say \mathchardef\eta"3B7. Outside math mode, this gives an error that takes the form Undefined command \eta; command code = 264, instead of Math only command \theta. Missing dollar not inserted; inside math mode, the behavior is the same as the standard one.
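To make the 15-bit layout concrete (3 bits for the type, 4 bits for the family, 8 bits for the position), here is a small Python sketch that splits a math character code into its fields. The function name `split_mathchar` is hypothetical, and the range check only mimics the condition under which the error quoted above is raised.

```python
def split_mathchar(n):
    """Split a 15-bit math character code into (type, family, position)."""
    if not 0 <= n < 0x8000:
        raise ValueError("Bad mathchar: %d" % n)
    math_type = (n >> 12) & 0x7     # 3 bits
    family = (n >> 8) & 0xF         # 4 bits
    position = n & 0xFF             # 8 bits: slot in the font table
    return math_type, family, position

# "0111 is the value plain TeX assigns to \eta: type 0, family 1, position 17.
print(split_mathchar(0x0111))
```

A value such as 1234567 does not fit on 15 bits and is rejected by this sketch.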

TeX has a special register called \fam. If you say something like

\fam3 ${\fam9 \the\fam}\ \the\fam$

then the second \the expands to minus one. The first gives 9, but LaTeX complains with: \textfont 9 is undefined (character 9). In Tralics, you would see

<mrow><mn>9</mn><mspace width='6pt'/><mn>3</mn></mrow>

As the example shows, the family is unused, and not correctly restored. Each character has a \mathcode. The following

\mathcode`\a="0941 $a\the \mathcode`\a$

is interpreted by Tralics as $a2369$. However TeX complains, with \textfont 9 is undefined (character A), because you ask the lower case letter a to be printed like the upper case letter A with textfont 9. A mathcode is a 15bit integer, with an exception: a character whose mathcode is 32768 behaves like an active character, the action associated to it must be defined somehow, for instance like this:

{\catcode`\'=\active \global\let'\active@math@prime}

There is a command \delimiter, it reads a number, but you cannot use it. There is a command \radical, it reads a number, then signals an error. The \mathaccent command is similar.

There are commands \raise and \lower, as well as \vcenter. The last one is not implemented in Tralics. The translation of

a\raise2cm\xbox{foo}{bar}\lower 2pt\xbox{xfoo}{xbar}

is

<p>a<foo>bar</foo><xfoo>xbar</xfoo></p>

As you can see, the specifications disappear. Maybe in a future version, we will add an attribute to the box. You cannot use these commands in math mode in Tralics. In TeX, you can get an error of the form: You can´t use `\raise´ in vertical mode, while \vcenter is a math only command. Currently \indent and \noindent are ignored in math mode (in TeX $\indent_b$ produces a kernel and an index; the kernel is an empty box of width \parindent, of type Ord).

4. Translating a bibliography

4.1. Introduction

As said in [6], “citations are cross-references to bibliographical information outside the current document, such as to publications containing further information on a subject and source information about used quotations. [...] There are numerous ways to compile bibliographies and reference lists. They can be prepared manually, if necessary, but usually they are automatically generated from a database containing bibliographic information.”

There are different ways to cite an author, or a text or a specific part of a text. The easiest way (for an automated system) is to use numbers, as above; if you are reading an interactive version of this document, you can click on the number, and you will see the entry in the bibliography, at the end of the document (between the index and the table of contents). This is standard practice; recommendations for a book series say: References are cited in the text simply as numbers in square brackets, e.g. [165], do not use the abbreviations “Ref./Refs/” in the middle of a sentence. Only at the beginning of a sentence should you write “Reference [165]”. In some cases, you can see `[17, p23]´, as the result of `\cite[p23]{foo}´; this means page 23 of the reference numbered 17. A bit more sophisticated are references like `[GMS93]´ instead of `[2]´ for a book by Goossens, Mittelbach and Samarin published in 1993. Computing the key is not obvious, because, if you cite a book by, say, Goethe, Molière and Shakespeare in 1793, the key will be the same, and a post-processor has to add a suffix (typically, this is done by a couple of routines named forward.pass and reverse.pass in a bst file). Sometimes, a more explicit scheme is used, for instance `Knuth, The Art Of ..., Algorithm P´, in the text, and the full reference can be found in the bibliography. A text of R. Ridolfi can be cited as `Vita di Girolamo Savonarola, 5e éd, Florence, 1974, t. II., p. 182-183´. Note that the name of the author appears before the citation, and is not repeated inside it. In some books, citations are given as footnotes, and you can often see `ibid.´, meaning the previously cited text. These kinds of things are generally hard to fully automate. For this reason, only a simple scheme is provided by Tralics: a link to a bibliography section via a key.
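The clash between two keys such as `GMS93´ can be illustrated by a short Python sketch. It is a simplification of what alphabetic bibliography styles do, not a transcription of any bst file: the function names are hypothetical, a single author contributes three letters, several authors one initial each, and clashing labels receive the suffixes a, b, etc. in order of appearance.

```python
from collections import Counter

def base_label(surnames, year):
    """Initials of the surnames (three letters for a single author) plus a two-digit year."""
    if len(surnames) == 1:
        letters = surnames[0][:3]
    else:
        letters = "".join(name[0] for name in surnames)
    return "%s%02d" % (letters, year % 100)

def assign_labels(entries):
    """entries is a list of (surnames, year); clashing labels get a, b, ... suffixes."""
    bases = [base_label(surnames, year) for surnames, year in entries]
    totals = Counter(bases)   # how often each base label occurs in total
    seen = Counter()          # how often it has been used so far
    labels = []
    for base in bases:
        if totals[base] > 1:
            labels.append(base + chr(ord("a") + seen[base]))
            seen[base] += 1
        else:
            labels.append(base)
    return labels

entries = [
    (["Goossens", "Mittelbach", "Samarin"], 1993),
    (["Goethe", "Molière", "Shakespeare"], 1793),
    (["Seyfert"], 1998),
]
print(assign_labels(entries))
```

Two passes over the entries are needed because the first clashing entry must also receive a suffix, which is only known once the second one has been seen.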

The problem is essentially the following: The LaTeX source file contains a given number of citations, introduced by the \cite command or a variant. Each command defines one or more references. For each reference, a key has to be computed and typeset, an item added to the bibliography, and a link created. In LaTeX, the document has in general to be processed three times; the first run will print \citation{companion2} in the auxiliary file. This file is processed by BibTeX, that generates a bbl file containing \bibitem{companion2}. On the second run, the bibliography is typeset, and the key is constructed; if it is 6, then \bibcite{companion2}{6} will be printed in the auxiliary file. On the last run, we know, after reading the auxiliary file, that the \cite command should typeset as 6.

The mechanism in Tralics is a bit different: there is only one run. Each \cite command produces a <cit> element, plus an entry into a biblist. At the end of the document, the bibliography is constructed, with all the necessary entries; details will be given later. This gives the equivalent of a bbl file, which is then translated. The result of the translation is some XML element, that will be inserted somewhere in the main XML tree. Finally, a check is made to see if all references are defined. The mechanism is much simpler than in LaTeX; this is really because, in Tralics, you can add an element or an attribute anywhere in the tree (at the start if you like) at any moment. In TeX, on the contrary, once a paragraph is typeset, you cannot modify it, and once a page is shipped out, you cannot modify the whatsits associated to it (page numbers, in the case of \label, \ref are computed only when the page is shipped out; they are left in a \write, which is a special kind of whatsit).

There have been some attempts to design an XML format for bibliography databases; none of them is really satisfactory. We give an example of an entry using the DocBook syntax:

  <biblioentry id="abc123" type="book">
    <author>
      <surname>Flynn</surname>
      <firstname>Peter</firstname>
    </author>
    <title>Understanding SGML and XML Tools</title>
    <titleabbrev>SGML &amp; XML Tools</titleabbrev>
    <publisher>
      <publishername>Kluwer</publishername>
      <address>Boston</address>
    </publisher>
    <isbn>0-7923-8169-6</isbn>
    <date YYYY-MM-DD="1998">1998</date>
  </biblioentry>

This should be referenced in the text as:

<citation><biblioref linkend="abc123"/></citation>

This is the same using the TEI syntax:

  <biblFull id="abc123" rend="book">
    <titleStmt>
      <title>Understanding SGML and XML Tools</title>
      <author>
        <persName>
          <foreName>Peter</foreName>
          <surname>Flynn</surname>
        </persName>
      </author>
      <respStmt>
        <name>http://imbolc.ucc.ie/~pflynn/books</name>
      </respStmt>
    </titleStmt>
    <extent>432</extent>
    <publicationStmt>
      <publisher>Kluwer Academic Publishers</publisher>
      <pubPlace>Boston</pubPlace>
      <idno type="isbn">0-7923-8169-6</idno>
      <date value="1998">1998</date>
    </publicationStmt>
  </biblFull>

This should be referenced in the text as:

<cit><ref target="abc123"></ref></cit>

These two citations were found on the Web. The careful reader may notice that two elements are used for the citation (in the DocBook case, they are <citation> and <biblioref>, in the TEI case, they are <cit> and <ref>). Tralics uses the TEI syntax for the citation but a completely different one for the entries (the syntax is very near to BibTeX). We shall explain, in the second part of this document, Chapter 6, how to convert the Tralics DTD into the TEI DTD (at least for the bibliography). The transformation is incomplete: in BibTeX, a name has four components, and the example shows only two, surname and foreName (or firstname). A non-trivial question concerns mathematics: how can we insert math formulas like H and what about special words like “the TeXbook”? The main reason why Tralics does not read databases written in XML is the need for an XML parser (we have written a BibTeX parser, this is more challenging).

The interaction between the main document and the bibliography is via the `cite key´ on the LaTeX level, in the XML document, this is via the Bid attribute, and for the typeset document, this is the `print key´. As an example, we shall consider a bbl file, created by Tralics, that contains

    \citation{60}{footcite:thesefabien}{bid9}{foot}{phdthesis}[Sey98] ...

This is a temporary piece of stuff, the cite key is `thesefabien´, the Bid is `bid9´, there are two choices for the print key, `60´ or `Sey98´. The XML translation is

    <citation from='foot' key='60' id='bid9' userid='footcite:thesefabien'
        type='phdthesis'>  ...

As you can see, the effective print key is `60´. We shall explain all details in due time. Let´s start with the cite key, the only quantity that the author can choose freely. For the references from the web, this key is `abc123´; this is clearly a randomly chosen value, not mnemonic at all. At the start of the chapter, we have shown a reference with key `companion2´; this is the cite key for the second version of the LaTeX companion. The cite key `thesefabien´ is for the Ph.D. thesis of F. Seyfert. There is no constraint on the cite key for LaTeX: the only important thing is that the key can be printed in the auxiliary file and read by BibTeX (some years ago, a colleague corrected \cite{Christele} to \cite{Christèle}, this gave an awful error; in current LaTeX, there seems to be no problem). On the other hand, BibTeX needs an identifier. This is a character string that does not start with a digit, and contains anything but space, tabulation, double quote, percent sign, sharp sign, backslash, comma, equals sign, braces, parentheses. For XML, there are additional constraints for an ID: it has to be unique for the whole document, and some characters like a plus sign are forbidden. In a first version, we considered using the `userid´: this is formed of a prefix (of the form `cite:´ or `footcite:´, thus making it unique), followed by the cite key, where forbidden characters like the plus sign were replaced by a minus sign. However, we found an example where a rather long key differed from another one only by a forbidden character, so that the replacement introduced a conflict. For this reason, we added the Bid: this is automatically generated, hence is clearly unique and valid. A special feature of BibTeX is that it does not create lines longer than 78 characters. It adds percent characters in a sensible position; in some cases, the choice is wrong. Here is an example:

\bibitem[13]{Bergamini-Champelovier-Descoubes-Garavel-Mateescu-Serwe-04-a}\RAs%
c{D.~Bergamini, D.~Champelovier, N.~Descoubes, H.~Garavel, R.~Mateescu,
  W.~Serwe},

As a result, you will get an error: Undefined control sequence \RAs. Note that there are few people who use such very long cite keys. A simple idea that works most of the time: use 4 letters for the first author, three letters for the others, two letters for the year, for instance `Bara-Chy-Pom02´.
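The naming scheme suggested above can be written as a one-line rule. In the Python sketch below, the function name `short_cite_key` is hypothetical and the surnames are placeholders, chosen only so that the result reproduces the key quoted in the text.

```python
def short_cite_key(surnames, year):
    """Four letters for the first author, three for the others, two digits for the year."""
    parts = []
    for i, name in enumerate(surnames):
        width = 4 if i == 0 else 3
        parts.append(name[:width])
    return "-".join(parts) + "%02d" % (year % 100)

# Placeholder surnames, not taken from any real reference.
print(short_cite_key(["Baratin", "Chyba", "Pomme"], 2002))
```

Such a key stays well below the 78-character limit, so that BibTeX has no reason to break the line in the middle of it.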

As explained, Tralics cannot use an XML database; instead it will use a bbl file (this is some LaTeX file, that will be translated by Tralics). The bbl can be part of the source document; in general it will be automatically constructed by Tralics (in the current version, BibTeX or any other external program can be used instead). This bbl file should contain, for each unsolved citation, a command that solves it (either \citation which is a Tralics command, or \bibitem which is a standard LaTeX command, see section 4.2).

One question is: can the bbl contain other items, together with these \bibitem commands? If the bibliography is very long, it can be interesting to divide it into subsections, and add a comment at the start of each section; this is easy to do, if the bbl is not produced by BibTeX, or if you edit it, and if you know how to convince LaTeX not to start the bibliography with a \bibitem. In general, we have a unique `\begin{thebibliography}´ at the start, a `\end{thebibliography}´ at the end. The effect is to produce a chapter (or a section), in general unnumbered, whose name depends on the current language. In the case of the Raweb, BibTeX produces more than one such environment. In fact, three databases are used: `foot´, `refer´ and `year´. Each of the two database files `foot´ and `refer´ produces a set of references (the `foot´ bibliography was originally typeset as footnotes, via the footcite package). The third database produces a sequence of sections, such as theses, books, articles, conferences, reports, etc. Whenever BibTeX sees an entry with a different category than the preceding entry, it prints the \end{thebibliography} followed by a \begin{thebibliography}. Note: the modified environment takes a required argument (as usual, the longest label) and an optional argument (the name of the section title; the title itself being in the class file). As a consequence, the bbl files produced by the Raweb are incompatible with standard LaTeX classes. Since year 2001, BibTeX is not used anymore for the Raweb and the XML result contains just a sequence of references. However, each entry has a category (this depends on the from and type attributes), and entries are sorted by category. The style sheets that convert the XML to HTML or XSL/Format are assumed to create these sections, one for each category (see part two of this document). A nontrivial question is then to guarantee that these two style sheets use the same splitting algorithm, and the same section titles.

The `print key´ is the value that is printed on the paper or displayed on the screen. Each <citation> has a key attribute that can be used as print key. However, an XML processor may as well ignore it, and use numbers 1, 2, 3, etc. It can even sort the entries before assigning them a number (see part two of this document). In some cases, Tralics computes a symbolic key of the form `Sey98´. If the post-processor sorts the entries, and if the keys are not in alphabetic order, this is a bad idea.

The `key´ of an entry is a quantity defined in the database, whose purpose is to help sorting. In most cases, it is empty (in some cases the values are junk); this value is used only in the case where no author is given (this is standard BibTeX practice, it means that this is rather useless). The `sort key´ of an entry is the character string used for sorting (this is lost; Tralics could insert it in the resulting XML; this would allow one to merge two bibliographies). In some cases, the print key is part of the sort key. Imagine for instance a book by Samarin, Mittelbach and Goossens, written in 1993. The standard key would be GMS93. Assume however that the authors are taken in the given order, so that the key would be `SMG93´. Alphabetically, this is after `Sey98´, but if we sort by authors, Samarin comes before Seyfert.

4.2. Citing a document

In this paragraph, we shall explain the commands that can be put in the source document for inserting a citation, and the companion commands that solve the reference. When the \end{document} command is about to be translated, Tralics will have created a big list (maybe empty) called the `biblist´. Each item in the list has four slots: Reference, Rtype, Bid and Definition. Here Reference is the cite key, Rtype is a subtype (when merged, these two quantities give the `userid´; this subtype is not standard LaTeX, you can ignore it. In some cases, two items with the same Reference and different Rtype are considered unequal, in some cases they are considered equal; thus, it is a bad idea to use the same cite key with different subtypes). The Bid is the unique id of the target, of the form `bid17´, and Definition is the internal number of the target of the reference (in Tralics, each XML element has an internal number). You can say: element number 25 is the target of reference `foo´ (syntax described later). This will solve the entry: if the entry with key foo has Bid 17, the action is to mark the entry as solved, and to add id='bid17' to the element number 25. When the end of the document is sensed, the list of unsolved entries is computed, and a request is made for constructing a bbl. A warning or an error is signaled for missing items by this construction. This bbl is then translated. It is forbidden to add unresolved entries to the list. In BibTeX, there is a cross-reference mechanism: if X has a cross reference to Y, then X must come before Y; when Y is read, its fields are used to fill missing fields in X. Unless cited explicitly, Y will not appear in the bibliography.
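The four slots of a biblist item and the notion of solving an entry can be modelled as follows. This Python sketch is only an illustration: the class and method names are hypothetical, items with the same Reference but different Rtype are assumed to be distinct, and the Bid numbering (starting at zero, in order of first citation) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BibItem:
    reference: str                      # the cite key
    rtype: str                          # the subtype, possibly empty
    bid: str                            # unique id of the target, such as 'bid17'
    definition: Optional[int] = None    # internal number of the target element

class BibList:
    def __init__(self):
        self.items = []

    def _find(self, reference, rtype):
        for item in self.items:
            if (item.reference, item.rtype) == (reference, rtype):
                return item
        return None

    def cite(self, reference, rtype=""):
        """Return the Bid of a reference, creating a new item when needed."""
        item = self._find(reference, rtype)
        if item is None:
            item = BibItem(reference, rtype, "bid%d" % len(self.items))
            self.items.append(item)
        return item.bid

    def solve(self, reference, element_number, rtype=""):
        """Mark an entry as solved; return the id to add to the target element."""
        item = self._find(reference, rtype)
        if item is None or item.definition is not None:
            return None
        item.definition = element_number
        return item.bid

    def unsolved(self):
        return [item.reference for item in self.items if item.definition is None]

biblist = BibList()
biblist.cite("foo")
biblist.cite("bar")
print(biblist.solve("foo", 25), biblist.unsolved())
```

At the end of the document, the references returned by `unsolved` are those for which a bbl has to be constructed.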

The variable distinguish_refer_in_rabib was introduced in 2006. Since this is a long name, we shall abbreviate it to DRY. If it is true, we distinguish `year´ and `refer´, otherwise there is no distinction. By default the flag is true; you can set it on the command line, or in a configuration file. For the case of the Raweb, three Rtypes are defined, `foot´, `year´ and `refer´. There is one command, \footcite, to cite elements with Rtype `foot´ and a command, \cite, for anything else. We generalized this mechanism: for all commands described here, there is no difference between `year´ and an empty Rtype. If DRY is false, `refer´ is the same as `year´. In 2006, the commands \yearcite and \refercite were introduced. If DRY is false, these two commands behave the same.

The translation of `\footcite {Knuth}´ or `\footcite [p.25] {Knuth}´ is the same as `\cite [foot] [] {Knuth}´ or `\cite [foot] [p.25] {Knuth}´. The translation of `\yearcite {Knuth}´ or `\refercite {Knuth}´ is the same as `\cite [year] [] {Knuth}´ or `\cite [refer] [] {Knuth}´. These commands have an optional argument. The \cite command has two optional arguments, a type and an optional value. If only one optional argument is given, it is the value (so that `\cite [p.25] {Knuth}´ has the same meaning as in LaTeX). The translation of `\cite [x] [y] {z}´ is the same as `\cite@one {x} {z} {y}´ (note the order of the arguments). However, if you say `\cite [p.25] {Knuth,Lamport}´, the result is the same as `\cite@one {} {Knuth} {p.25}´, followed by `\cite@one {} {Lamport} {}´; in other words, the second optional argument applies only to the first citation. Between two \cite@one commands (that come from the same \cite), some \citepunct tokens are inserted. This is a command that can be redefined by the user. Its expansion is a comma followed by a space.
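The way a \cite command is split into \cite@one calls can be mimicked in a few lines of Python. The function `expand_cite` is a hypothetical model that produces strings, not a description of the Tralics internals; it only follows the rules stated above (a single optional argument is the value, and the value applies only to the first key).

```python
def expand_cite(keys, first_opt=None, second_opt=None):
    """Model of \\cite[first_opt][second_opt]{keys} as a list of command strings."""
    if second_opt is None:
        # Only one optional argument (or none): it is the value, not the type.
        rtype, value = "", first_opt or ""
    else:
        rtype, value = first_opt, second_opt
    out = []
    for i, key in enumerate(k.strip() for k in keys.split(",")):
        if i > 0:
            out.append("\\citepunct")
        # Argument order of \cite@one: type, key, value.
        out.append("\\cite@one{%s}{%s}{%s}" % (rtype, key, value if i == 0 else ""))
    return out

print(expand_cite("Knuth,Lamport", "p.25"))
print(expand_cite("Knuth", "foot", ""))
```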

People generally say `Text\footcite{blah}´, like `Text\footnote{blah}´, without any space, because \footcite is assumed to produce a footnote; but this is not always the case; for this reason, the command \footcitepre is evaluated before insertion of the XML element associated to the citation. The default behavior is the following: if the last object on the XML tree is a normal or non-breaking space, nothing happens; otherwise, if the object is not an opening parenthesis, a space will be added. Moreover, the \citepunct is replaced by \footcitesep, a command whose translation is comma space (the idea is that you can redefine it, so that `Text\footcite{foo,bar}´ shows as `Text\textsuperscript{13,15}´, exercise left to the reader). This is a slight difference between \footcite and \cite with `foot´ as optional argument.
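The default behavior of \footcitepre amounts to a test on the last object of the XML tree. In the Python sketch below, the tree is replaced by the text produced so far, the function name `footcite_pre` is hypothetical, and the treatment of an empty text (nothing is added) is an assumption.

```python
NBSP = "\u00a0"   # non-breaking space

def footcite_pre(text):
    """Return the text, with a space added when the default rule asks for one."""
    last = text[-1:]            # empty string if nothing precedes
    if last in (" ", NBSP):
        return text             # already a normal or non-breaking space
    if last == "(" or last == "":
        return text             # no space after an opening parenthesis
    return text + " "

print(repr(footcite_pre("Text")))
print(repr(footcite_pre("see (")))
```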

The command \nocite can take one optional argument (a Rtype). The effect of \nocite{foo} is the same as \cite, regarding the biblist, but it does not modify the XML tree. If you say \nocite{*}, this inserts a special marker, meaning: the whole database should be inserted. The Rtype is ignored in this case. Note that the correct behavior should be: Rtype is ignored only if one of `year´, `refer´ or `foot´.

In order to implement the natbib package, we make the following assumptions. The primitive command is \cite@one, it takes a single reference (defined by a Reference and a Rtype), inserts when needed a new item in the biblist, and constructs a Bid for the reference. The command calls \leavevmode, for the case where it appears at the start of a paragraph (Remember the recommendations given above: a paragraph should start with a word, not a reference). The result of the translation is <ref target='bid17'/>, where `bid17´ should be replaced by the value of the Bid. This element can be non-empty (it contains a note), and is the child of a <cit> element, that has some attributes. The LaTeX companion, example 12-3-5, says that \citet {LGC97} should produce `Goossens et al. (1997)´. The translation by Tralics does not contain the name nor the year, so that there should be an attribute that says how parentheses are to be inserted in the final HTML or Pdf document. Another example is \citep [see] [chap. 2] {LGC97}, this produces `(see Goossens et al., 1997, chap. 2)´. This does not really fit in our model: we can put the post-note in the <ref> element, and the pre-note as an attribute. This makes these two quantities asymmetric: the pre-note must contain only characters. Consider now example 12-3-15, \citet [cf.] [p. 55] {vLeunen:92, Knuth-CT-a}. Here the pre-note is added to each citation, the post-note to the last one (the default is to put the single note on the first element). The result is `van Leunen (cf. 92); Knuth (cf. 1986, p.55)´. What Tralics should do in such a case is unclear. The file natbib.plt defines \citeyear and \citeyearpar as follows

\def\cite@type#1#2{{\def\cite@@type{#1}\cite{#2}}}
\def\citeyear{\cite@type{year}}
\def\citeyearpar{\cite@type{yearpar}}

The idea is to call \cite, the dispatcher function, and to put locally in \cite@@type the type of the citation (year, or parenthesized year). There is also \cite@prenote for the prenote. To be precise: the translation of \cite@one {bar} {foo} {p25} is <cit rend='bar' type='mtype' prenote='mynote'><ref target='bid17'/> p25</cit>, where `mtype´ is the value of \cite@@type, `mynote´ is the value of \cite@prenote. Arguments `foo´ and `bar´ define the reference (normally, the Rtype `foo´ is empty).

You can say \XMLsolvecite*[25][bar]{foo}. The star is optional, as well as the `25´ and the `bar´. If only one optional argument is given it is the first one. This should be the identifier of an XML element (you can use \XMLlastid, the identifier of the last created element, or \XMLcurrentid, the identifier of the current element). The current element is used if the argument is missing or empty. In any case, this gives an element, say Target. The second optional argument is the Rtype. The required argument is the cite key. The result of the command is to solve the entry defined by the Reference and the Rtype. The easy case is when the reference has not yet been cited. In this case, we can use as Bid either the id of the Target, if it exists, or a new id. In this case, an attribute pair id='Bid' is added to the Target. If the entry exists in the biblist, it might be already solved, and you get an error of the form Already solved foo. An attribute pair id='Bid' is added to the Target, unless the Target already has an id, in which case an error will be signaled, for instance Cannot solve (element has an Id) foo in the case

\cite{foo}\section{something}\XMLsolvecite{foo}

The problem here is the following: the section element has a Uid, this is like a Bid, it can be used as target of a \label. The XML norm forbids using two ids for the same element. Maybe in a future version, this will be allowed (it suffices to implement a double indirection mechanism). However, I doubt if this is a good idea: if you say \label{foo}, then \ref{foo} will produce a <ref> element, this is identical to the <ref> that comes from the \cite. Note that the Raweb DTD says: the target of a <ref> in a <cit> should be a <citation>.

If a star is given in \XMLsolvecite, there is a little hack. If Reference/Rtype is not found in the biblist, Tralics tries to see if there is an unsolved entry with the same Reference, Rtype arbitrary. In such a case, this entry will be solved. If there is no such entry, then a new slot is added to the reference list.
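The lookup performed by the starred form can be sketched as below. This is an illustration under stated assumptions, not the Tralics data structure: the biblist is modeled as a list of dictionaries with the hypothetical keys `reference´, `rtype´ and `solved´, and the function name find_slot is invented for the example.

```python
def find_slot(biblist, reference, rtype, star=False):
    # First try an exact match on Reference and Rtype.
    for entry in biblist:
        if entry["reference"] == reference and entry["rtype"] == rtype:
            return entry
    # Starred form: accept an unsolved entry with the same Reference,
    # whatever its Rtype.
    if star:
        for entry in biblist:
            if entry["reference"] == reference and not entry["solved"]:
                return entry
    # Otherwise a new slot is added to the reference list.
    entry = {"reference": reference, "rtype": rtype, "solved": False}
    biblist.append(entry)
    return entry
```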

Some commands may produce strange results. Consider

\setbox0 =\hbox{\XMLsolvecite{foo}} \copy0 \copy0
\setbox1 =\xbox{Box}{\XMLsolvecite{bar}} \copy1 \copy1

This constructs two empty boxes, with an id bid0 and bid1. Since the first box is unnamed, the tag will not appear in the XML tree; and no tag implies no attribute list, so that the first line is an error. On the other hand, the second box is copied twice; hence the id bid1 appears twice in the XML tree, this is also an error (the XML is well-formed, but not valid against any DTD that says that the Bid should be an ID).

You can say \bibitem[XX]{foo}, the result is the same as \par \leavevmode \XMLsolvecite* {foo}. The optional argument is ignored. Note that the \par command terminates the current paragraph, and \leavevmode starts a new paragraph (in LaTeX, \bibitem calls \item that does more or less the same thing). The important point is that this newly created <p> element is the target of the reference. If you feed Tralics with the bbl of this document, produced by LaTeX, you will see something like

<Bibliography><p id='bid0'>
David Carlisle, Michel Goossens, and Sebastian Rahtz.
De XML à PDF avec <hi rend='tt'>xmltex</hi> et Passive<TeX/>.
In <hi rend='it'>Cahiers Gutenberg</hi>, number 35-36, pages 79&ndash;114,
2000. </p>
<p id='bid1'>
Michel Goossens, Frank Mittelbach, and Alexander Samarin.
<hi rend='it'>The <LaTeX/> companion</hi>.
Addison Wesley, 1993.</p>

On the other hand, translation of the second reference is:

<citation from='year' key='GMS93' id='bid4' userid='cite:companion'
type='book'>
<bauteurs><bpers prenom='M.' nom='Goossens' prenomcomplet='Michel'/>
<bpers prenom='F.' nom='Mittelbach' prenomcomplet='Frank'/>
<bpers prenom='A.' nom='Samarin' prenomcomplet='Alexander'/></bauteurs>
<btitle>The <LaTeX/> companion</btitle>
<bpublisher>Addison Wesley</bpublisher>
<byear>1993</byear>
</citation>

4.3. Using Tralics instead of BibTeX

The content of the BibTeX database is a sequence of entries of the form

1 @article{example,
2   Author= "Joseph Garrigue and Didier R{\'e}my",
3   Title=   "Extending {ML} with semi-explicit higher-order polymorphism",
4   Number=  "1/2",
5   Volume=  155,
6   Year=    1999,
7   Pages=   "134-169",
8   Journal= "Journal of Functional Programming",
9   Remark=  {a random example},
10   OptMonth = jan,
11   Url=     "ftp://ftp.inria.fr/INRIA/Projects/Cristal/iandc.ps.gz"}

This is a second example.

12 @PhdThesis{thesefabien,
13   author =       {Seyfert, Fabien},
14   title =        {Problèmes extrémaux dans les espaces de Hardy,
15     Application à l'identification de filtres hyperfréquences à
16     cavités couplées},
17   school =       {Ecole de Mines de Paris},
18   year =         1998
19 }

These examples are translated by BibTeX as follows

\bibitem{example}
Joseph Garrigue and Didier R{\'e}my.
\newblock Extending {ML} with semi-explicit higher-order polymorphism.
\newblock {\em Journal of Functional Programming}, 155(1/2):134--169, 1999.
 
\bibitem{thesefabien}
Fabien Seyfert.
\newblock {\em Problèmes extrémaux dans les espaces de Hardy, Application à
  l'identification de filtres hyperfréquences à cavités couplées}.
\newblock PhD thesis, Ecole de Mines de Paris, 1998.

After the @ character, there is a keyword, or an entry type. The recognized entry types are article, book, booklet, conference, coursenotes, inbook, incollection, manual, masterthesis, misc, phdthesis, techreport, unpublished, as well as mastersthesis, a synonym of masterthesis. These types are not part of the BibTeX language, but are described in any good book about LaTeX; they are the only ones recognized by Tralics. The case is irrelevant (in one example, we have `article´ in lower case, in the other, we have `PhdThesis´, mixed case). Since Tralics 2.9.1, you can extend the list of known types, by putting a line like the following in the configuration file (this will define the types `hdr´ and `movie´):

bibtex_extensions = "hdr movie"

There are three keywords. The first is `comment´. If you say @comment{foo}, this makes `foo´ a comment. Since everything outside the scope of a keyword or an entry is discarded, there is no real need for a comment keyword, or a comment character. In particular, the percent sign is not a comment character inside a BibTeX file. If you insert a percent sign in a field, you have to remember that BibTeX will replace newline characters by spaces, and insert newline characters in the bbl file wherever it judges adequate. Hence, the percent character will behave, in the bbl, as a comment character with a random scope.

The second keyword is `string´. It defines a string, for instance @string{Foo="bar"} defines the string `foo´ (the case is irrelevant) with value `bar´. In the example, there is a string after the equals sign, but any expression could be used, including one that uses macros. A macro must be defined before its use; it is always possible to redefine the macro. There are 12 predefined macros: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec. You can see a use of `jan´ on line 10. These macros are defined by every bst file, to be `janvier´, `January´ or `Januar´, depending on the language (since there is no way to tell BibTeX what the current language is, there are two solutions: either write frplain.bst, that is a copy of plain.bst, with all keywords translated into French, or use indirection: the value is \bbljan{}, a LaTeX command defined in a style file depending on the current language). In Tralics, these strings are defined at bootstrap, to be English names, and redefined when \begin{document} is seen. This gives you a chance to select the correct language. Only French, English and German are known languages.

The last keyword is `preamble´. If you say @preamble{"foo"}, the effect is to add the string `foo´ to the preamble. More than one preamble keyword can be given, they will be merged, in order. Standard bibliography styles print the preamble at the start of the bbl file, just before the \begin{thebibliography}. In Tralics, the string is inserted at the start of the file, but the environment is implicit. The string should not produce text, otherwise strange errors are signaled, of the form Error signaled at line 4: Non-empty buffer foo Some text may be lost., because only bibliographic entries are allowed in the bbl (you can cheat by changing the current mode via \@setmode). Instead of "foo", a general value can be used, for instance @preamble{jan} puts January in the preamble. Note: Instead of braces, you can use parentheses to delimit the value of an entry, a string or the preamble. Inside the value, you can use braces instead of double quotes. Thus @preamble({foo}) is a valid preamble.

After an entry type comes the cite key, followed by a sequence of pairs, of the form field=value, separated by commas. The following field names are recognized: address, author, booktitle, chapter, crossref, doi, edition, editor, howpublished, institution, isbn, isrn, issn, journal, key, month, note, number, organization, pages, publisher, school, series, title, type, url, volume, year. The case is irrelevant. If a field name is given that is not in the previous list, it will be ignored. In the example, line 10, we have an unused field, `OptMonth´ (some text editors propose templates where optional fields like `month´ are preceded by `opt´, and sometimes people forget to remove the prefix). In the first example, field names start with an initial capital, and there is no space on the left of the equals sign; in the second, field names are all lower case, there is a space on the left of the equals sign, and opening braces are vertically aligned (this is the template proposed by Emacs); these subtleties are ignored by Tralics.

If the configuration file contains lines like

bibtex_fields = "firstpage lastpage"
bibtex_fields = "+allpages"

then three additional fields are read by Tralics, namely firstpage, lastpage, and allpages. They will be inserted in the XML tree, after other fields, but before the `note´, via a call to \cititem.

The value of a field can be a number (lines 5 and 6 in the example), or a macro name (as on line 10), or a constant in braces (line 9), or a constant in double quotes (other lines). It is possible to concatenate basic fields, for instance apr # "~1", via the use of the sharp operator. The way BibTeX handles braces, quotes and backslashes is a bit special. When BibTeX parses a value, there should be as many opening braces as closing braces; trying to put a backslash before a brace has no effect. If a string is delimited by double quotes, then braces are needed to hide double quotes. Special characters should be entered as {\'e}, never as \'{e}, but Tralics accepts é; in fact, any Unicode character is accepted, provided that you declare the proper encoding. The case of a non-ASCII character is undefined. When looking for a particle in a name, Tralics must decide whether a character is upper case or not, and when sorting, the whole string is converted into lower case letters. In the case of {\'e}, the whole group is converted by BibTeX to the single letter e; Tralics leaves it unchanged; in the same fashion, é is left unchanged (it is represented internally in UTF-8 as the two bytes 0xC3 0xA9).
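The four kinds of basic values and the sharp operator can be illustrated by a small evaluator. This is a sketch, not the Tralics parser: the names split_top_level and eval_value are hypothetical, macros are passed as a dictionary with lower case keys, and no error is reported for unbalanced braces.

```python
def split_top_level(value):
    # Split on '#' when outside braces and outside double quotes.
    # A double quote inside braces is hidden and does not end a string.
    parts, current = [], []
    depth, in_quotes = 0, False
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
        elif ch == "#" and depth == 0 and not in_quotes:
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


def eval_value(value, macros):
    pieces = []
    for part in split_top_level(value):
        if len(part) >= 2 and part[0] == '"' and part[-1] == '"':
            pieces.append(part[1:-1])          # constant in double quotes
        elif len(part) >= 2 and part[0] == "{" and part[-1] == "}":
            pieces.append(part[1:-1])          # constant in braces
        elif part.isdigit():
            pieces.append(part)                # number
        else:
            pieces.append(macros[part.lower()])  # macro, case irrelevant
    return "".join(pieces)
```

With macros = {"apr": "April"}, the value apr # "~1" evaluates to April~1.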

Assume that Tralics has seen @article, then an opening brace or parenthesis, followed by example. All fields up to the closing brace (or parenthesis) are read, but, if the entry is useless, no error is signaled in case of undefined macros, or duplicate fields. If an entry is useful, all fields are remembered; if it has a crossref to an entry X, then X becomes useful. Remember: each entry has a Rtype, this is in general empty; it is added as a prefix to the cite key. For instance, `thesefabien´ gives `footcite:thesefabien´. In the case of a crossreference from Y to X, we use as prefix for X the prefix of Y. An entry can be useful because the user has said \nocite{*}. There is a special hack for the Raweb: we have three types of entries, `foot´, `year´ and `refer´. We already mentioned that the types `year´ and `refer´ could be the same as the empty type. The difference is that \nocite applies only to entries from the file `year´, never to `foot´ (there is an implicit \nocite for `refer´).

An entry is useful because it is cited (by \cite or \nocite). Since BibTeX is generally case insensitive, the entry shown above is useful if you say \cite{Example}. However, for LaTeX, \cite{Foo} and \cite{fOO} are two different items; as a consequence, two references are needed. Thus, an entry named `foo´ is ambiguous. For this reason, you should always capitalize entries in a consistent way (say, always use lowercase letters), and use the same method in the LaTeX document.

After some manipulations, the entry is printed on the bbl like this (BibTeX version)

20 \citation {GR99a}{example}{article}
21 \bauteurs{\bpers\RAo J.\RAb \RAb Garrigue\RAb \RAf \bpers\RAo D.\RAb \RAb
22   R{\'e}my\RAb \RAf }
23 \cititem{btitle}{Extending {ML} with semi-explicit higher-order polymorphism}
24 \cititem{bjournal}{Journal of Functional Programming}
25 \cititem{bnumber}{1/2}
26 \cititem{bvolume}{155}
27 \cititem{byear}{1999}
28 \cititem{bpages}{134--169}
29 \url{ftp://ftp.inria.fr/INRIA/Projects/cristal/iandc.ps.gz}
30 \endcitation

or like that (Tralics version)

31 \citation{60}{footcite:thesefabien}{bid9}{foot}{phdthesis}[Sey98]
32 \bauthors{\bpers[Fabien]{F.}{}{Seyfert}{}}
33 \cititem{btitle}{Problèmes extrémaux dans les espaces de Hardy, Application
34 à l'identification de filtres hyperfréquences à cavités couplées}
35 \cititem{btype}{Ph. D. Thesis}
36 \cititem{bschool}{Ecole de Mines de Paris}
37 \cititem{byear}{1998}
38 \endcitation

There are some slight differences between these two entries. If you compare lines 20 and 31, you can see that the number of arguments of the \citation command has changed from three in the original version to six in the current version. The following were added: the type (here `foot´), the unique id (here `bid9´), the numerical print key (here `60´). The first entry was created by BibTeX, that cannot guess the Rtype of the reference nor the Tralics unique id. It could have computed the number 60, but we initially thought that only one of the two keys was useful (in the current version, the \citation command takes five arguments, plus an optional one after these). If you compare lines 21 and 32, you can notice two differences. First, we have decided, in 2005, to add an optional argument to the \bpers command (it contains the full first name). This might be used for the Ra2005. The second difference is that it is impossible, in BibTeX, to print braces inside a name. Thus we used \RAo for an opening brace, \RAf for a closing brace and \RAb for a pair of closing and opening braces. Omitting the first line, the fields are printed in the following order:

  1. Unless the type is proceedings, the author.

  2. In the case of a book or inbook, the editor.

  3. The title.

  4. In the case of proceedings or incollection, the editor.

  5. In the case of an article, the journal, number, and volume.

  6. In the case of a book or inbook, the edition, series, number, volume, publisher, address.

  7. In the case of a booklet, the howpublished and address.

  8. In the case of incollection, the booktitle, series, number, volume, publisher, address.

  9. In the case of inproceedings or conference, the booktitle, series, number, volume, organization, publisher, editor, pages, address.

  10. In the case of a manual, the organization, edition, address.

  11. In the case of masterthesis, coursenotes, or phdthesis, the type, school, and address.

  12. In the case of a techreport, the type, number, institution, address. (For the case of masterthesis, phdthesis and techreport, the type has a default value, that depends on the language, and is initialized together with the `jan´ macro).

  13. In the case of misc, the howpublished, editor, booktitle, series, number, volume, publisher, address.

  14. In the case of proceedings, the organization, series, number, volume, publisher, address.

  15. In any case, the month, year.

  16. In the case of inbook or incollection, the chapter.

  17. In the case of inbook, incollection, article or proceedings, the pages.

  18. In any case, the doi, url, additional fields, note.

  19. In the case of an extension, all fields mentioned above are considered, in some order.
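The ordering rules 1 to 18 can be written as a small function, which makes it easier to check what is printed for a given entry type. This is a sketch transcribed from the list above, not code from Tralics; the name field_order is hypothetical, additional fields are passed explicitly, and rule 19 (extensions) is not modeled since its order is not specified.

```python
def field_order(entry_type, extra_fields=()):
    t = entry_type.lower()
    if t == "mastersthesis":          # synonym of masterthesis
        t = "masterthesis"
    out = []
    if t != "proceedings":
        out.append("author")
    if t in ("book", "inbook"):
        out.append("editor")
    out.append("title")
    if t in ("proceedings", "incollection"):
        out.append("editor")
    by_type = {
        "article": ["journal", "number", "volume"],
        "book": ["edition", "series", "number", "volume",
                 "publisher", "address"],
        "booklet": ["howpublished", "address"],
        "incollection": ["booktitle", "series", "number", "volume",
                         "publisher", "address"],
        "inproceedings": ["booktitle", "series", "number", "volume",
                          "organization", "publisher", "editor",
                          "pages", "address"],
        "manual": ["organization", "edition", "address"],
        "masterthesis": ["type", "school", "address"],
        "techreport": ["type", "number", "institution", "address"],
        "misc": ["howpublished", "editor", "booktitle", "series",
                 "number", "volume", "publisher", "address"],
        "proceedings": ["organization", "series", "number", "volume",
                        "publisher", "address"],
    }
    # Types that share the rule of another type.
    by_type["inbook"] = by_type["book"]
    by_type["conference"] = by_type["inproceedings"]
    by_type["coursenotes"] = by_type["masterthesis"]
    by_type["phdthesis"] = by_type["masterthesis"]
    out.extend(by_type.get(t, []))
    out.extend(["month", "year"])
    if t in ("inbook", "incollection"):
        out.append("chapter")
    if t in ("inbook", "incollection", "article", "proceedings"):
        out.append("pages")
    out.extend(["doi", "url", *extra_fields, "note"])
    return out
```

For a phdthesis this yields author, title, type, school, address, month, year, doi, url, note, which agrees with the order of lines 32 to 37 once missing fields are dropped.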

This may seem confusing (is there a standard way for formatting entries?). Note that missing fields are not printed. In some cases, BibTeX prints a message like “there´s a number but no series” or “can´t use both volume and number”. No such message is printed by Tralics.

Two keys are computed, the `Sey98´ or `GR99a´ in the example, and the sort key, which is something longer. In fact, handling the author or editor field produces four character strings L1, L2, L3 and L4. The L4 string is the argument of the \bauthors or \beditors (see lines 21, 32). The L1 string is `Sey´ or `GR´, the L2 string contains the full name (it is like L4, without the full first name, and braces) and L3 contains only the last name (not the first name).

We consider the author (the editor in the case of proceedings). This may give a triple L1, L2, L3, unless the field is missing. If it is missing, we consider the `key´ field. If it is not empty, then L1 is formed of the first three characters of the field, L2 is empty, L3 is the field. If it is empty, we consider the editor (author, in case of proceedings). If this is empty, we consider the cite key, handle it like the `key´ above. Note: in the case of `Lo{\"i}c´ the first three characters are `Lo{\"i}´, the last two characters are `{\"i}c´. In the case of `Lo\"ic´, asking for the first three or last two characters gives the full string. The last two characters of the year are added to L1, so that we may obtain `GR99´. This gives the print label. The LaTeX companion says that you can use year="{\SortNoop{86}}1991". With the rules above, the last two characters of the year are `91´. However, Tralics uses the full year, not `861991´ when it computes the sort key. In the case when Tralics processes the Raweb for, say year 2003, if a reference has type `year´, then its year field should not be missing, and should be `2003´. Otherwise an error is signaled. The sort key is computed as follows: first a prefix, then the cite label, then L2, then the year, then the title. All characters are converted to lower case. Note: when BibTeX converts {\'E} to lower case, the result is `e´. Converting `É´ can produce strange results. Such subtleties do not exist in Tralics (the style sheet that converts the XML to HTML sorts all entries; how can we tell it that the author used a \SortNoop command?).
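The way characters are counted when a brace group is present can be sketched as follows. This is an illustration of the braced case only: a top-level brace group counts as one character. The unbraced case `Lo\"ic´, for which the text says the full string is returned, is not modeled, and the function names are hypothetical.

```python
def split_units(text):
    # Cut the string into units; a balanced {...} group is one unit.
    units, i = [], 0
    while i < len(text):
        if text[i] == "{":
            depth, j = 0, i
            while j < len(text):
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            units.append(text[i:j + 1])
            i = j + 1
        else:
            units.append(text[i])
            i += 1
    return units


def first_units(text, n):
    return "".join(split_units(text)[:n])


def last_units(text, n):
    return "".join(split_units(text)[-n:])


def print_label(l1, year):
    # The last two characters of the year are added to L1.
    return l1 + last_units(year, 2)
```

For instance, print_label("Sey", "1998") gives Sey98, and the last two units of {\SortNoop{86}}1991 are 91.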

Note: Tralics defines \sortnoop to gobble its argument. On the other hand, the BibTeX interpreter, when computing the title part of the sort key, in the case of {\noopsort foo} removes the command and the braces; the same is done for \SortNoop and \noopsort. In a case like title="study of {$H^p$}, part {I}" it removes the braces (character after opening brace must be dollar or upper case letter). The reason for this is that otherwise `part II´ comes before `part I´, and this looks silly.

Because of this sort-again, we try to be clever. In other words, for the Raweb, and only the Raweb, we use a prefix, formed of a letter and L3. The letter is determined by a number: 0 for an entry of Rtype `refer´, 1 for an entry of Rtype `foot´, and for entries of Rtype `year´, it is: 2 for book, booklet, proceedings, 3 for phdthesis, 4 for article, inbook, incollection, 5 for conference, inproceedings, 6 for manual, techreport, coursenotes, 7 for masterthesis, misc, unpublished. These numbers are indices into a table. Currently the order is 02345671. In a future version, this might be changed (however, the result should be compatible with the style sheets described in the second part of this report).
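The classification can be written as a lookup table. This is a sketch: the names raweb_index and raweb_rank are hypothetical, and reading the order string 02345671 as "the position of the index in this string gives the sorting rank" is an assumption, not something the text states explicitly.

```python
YEAR_CLASSES = {
    "book": 2, "booklet": 2, "proceedings": 2,
    "phdthesis": 3,
    "article": 4, "inbook": 4, "incollection": 4,
    "conference": 5, "inproceedings": 5,
    "manual": 6, "techreport": 6, "coursenotes": 6,
    "masterthesis": 7, "misc": 7, "unpublished": 7,
}

ORDER = "02345671"  # assumed meaning: rank = position of the index


def raweb_index(rtype, entry_type):
    if rtype == "refer":
        return 0
    if rtype == "foot":
        return 1
    # Rtype `year´: the index depends on the entry type.
    return YEAR_CLASSES[entry_type.lower()]


def raweb_rank(rtype, entry_type):
    return ORDER.index(str(raweb_index(rtype, entry_type)))
```

Under this reading, entries of Rtype `refer´ come first and entries of Rtype `foot´ come last.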

Let´s repeat: for the Raweb case, we have in the sort key a prefix that depends on the type and Rtype, followed by the author names, the print key, the full author names, the year, the title. In this case, the content of the bbl will be as on line 31: the first argument of \citation is not the print key, but the index of the reference in the table after sorting. On the other hand, for the non-Raweb case, the sort key starts with the print key, the bbl looks like line 20. The important point is: assume that we have two entries with the same print key, say `GR99´; we must change them to `GR99a´ and `GR99b´, this is easy to do when they are consecutive. The following piece of code comes from a standard bst file. Parsing a bst file is rather easy (maybe one day, Tralics will do it). The important point is that a postfix language is used: instead of: if a then b else c, you say: a b c if. This piece of code computes a suffix for every entry that has the same key as the previous one.

39 FUNCTION {forward.pass}
40 { last.sort.label sort.label =
41     { last.extra.num #1 + 'last.extra.num :=
42       last.extra.num int.to.chr$ 'extra.label :=
43     }
44     { "a" chr.to.int$ 'last.extra.num :=
45       "" 'extra.label :=
46       sort.label 'last.sort.label :=
47     }
48   if$
49 }

Here is the companion routine, executed in reverse order. Its purpose is to add the `a´ suffix when the next entry has a `b´ suffix. There is a piece of code, not shown here, that computes the longest label. This is sometimes nonsense (consider the `De La Cruz´ case below).

50 FUNCTION {reverse.pass}
51 { next.extra "b" =
52     { "a" 'extra.label := }
53     'skip$
54   if$
55   label extra.label * 'label :=
56   extra.label 'next.extra :=
57 }
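For readers not used to the postfix notation, the two passes can be transcribed into Python. This is a sketch that mirrors lines 39 to 57 under the assumption that the labels are already sorted and that next.extra starts as an empty string; the function name add_suffixes is hypothetical.

```python
def add_suffixes(sorted_labels):
    # Forward pass: the first entry of a run of equal labels gets no
    # suffix, the following ones get 'b', 'c', ...
    extras = []
    last_label, last_num = None, 0
    for label in sorted_labels:
        if label == last_label:
            last_num += 1
            extras.append(chr(last_num))
        else:
            last_num = ord("a")
            extras.append("")
            last_label = label
    # Reverse pass: an entry followed by a 'b' entry gets the 'a'.
    result = [None] * len(sorted_labels)
    next_extra = ""
    for i in range(len(sorted_labels) - 1, -1, -1):
        if next_extra == "b":
            extras[i] = "a"
        result[i] = sorted_labels[i] + extras[i]
        next_extra = extras[i]
    return result
```

Two passes are needed because, when the first `GR99´ is seen, it is not yet known whether a second one follows.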

In summary, when Tralics is used instead of BibTeX, the following happens. We have a big entry list, and a list of typed databases. From the entry list, we consider only unsolved ones. For each entry, a prefix is computed, for instance, `footcite:fabien´, by considering the Rtype, the word `cite:´ and the cite key. If the Rtype is anything else than `foot´, an empty value will be used. When an entry with cite key `foo´ is read from a database of type `bar´, the same mechanism is applied. The type of a database is currently one of `year´, `refer´ or `foot´ (the default being `year´). We plan to extend this mechanism: more than these three types can be used; `year´ and `refer´ are sometimes the same as empty, but `refer´ has an implicit \nocite.
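The construction of the prefixed key is simple enough to be shown directly. This is a sketch with a hypothetical function name, following the rule stated above: only the Rtype `foot´ is kept, any other Rtype gives an empty prefix.

```python
def prefixed_key(rtype, cite_key):
    # Rtype, then the word "cite:", then the cite key.
    prefix = "foot" if rtype == "foot" else ""
    return prefix + "cite:" + cite_key
```

This gives footcite:thesefabien for the thesis of the example, and cite:GoC for an entry of Rtype `year´.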

All entries from the database files are read, and stored if useful. For each entry X that has a crossreference to Y, missing fields in X are copied from Y. After that Y is discarded (unless cited via \cite or \nocite). An error is signaled in case some references are undefined. After that, the sort label is computed, entries are sorted, the print label is computed, and everything is printed on the bbl file. This is apics_.bbl if the jobname is `apics´. Note the underscore in the name.
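The handling of crossreferences can be sketched as follows. This is an illustration, not the Tralics implementation: entries are modeled as a dictionary mapping a cite key to a dictionary of fields, the function name resolve_crossrefs is hypothetical, and undefined targets are simply collected in a list where Tralics would signal an error.

```python
def resolve_crossrefs(entries):
    undefined = []
    for key, fields in entries.items():
        target = fields.get("crossref")
        if target is None:
            continue
        if target not in entries:
            undefined.append((key, target))
            continue
        # Missing fields in X are copied from Y; existing ones are kept.
        for name, value in entries[target].items():
            fields.setdefault(name, value)
    return undefined
```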

This is the XML version of the reference above, as used in the Raweb2004.

58 <citation from='foot' key='60' id='bid9' userid='footcite:thesefabien'
59   type='phdthesis'>
60 <bauteurs><bpers prenom='F.' part='' nom='Seyfert' junior=''/></bauteurs>
61 <btitle>Problèmes extrémaux dans les espaces de Hardy,
62    Application à l'identification de filtres hyperfréquences à cavités
63 couplées</btitle>
64 <btype>Ph. D. Thesis</btype>
65 <bschool>Ecole de Mines de Paris</bschool>
66 <byear>1998</byear>
67 </citation>

4.4. The format of a name

We shall discuss in this section how names can be used in a BibTeX file, and how Tralics constructs keys. We have already mentioned a procedure that gives `Sey´ from `Seyfert´. It is not satisfactory, but is used only in rare cases (when the year is strange, or a strange key has been used). The important point is that, when we fetch the first three letters of Lo\"ic, we obtain neither `Lo\´ nor `Lo\"´. The mechanism explained here is more subtle. The LaTeX companion explains that, in order to get `Göd´ for the key, you should use one of the first names shown here, not the others.

 author = {A. G{\"o}del and  B. G{\"{o}}del},
 editor = {C. {G{\"{o}}del} and D. {G\"{o}del}}

The rule is that special BibTeX characters are formed by a left brace followed by a backslash. In the case C, the brace is inside another brace. In fact, if the bibliography contains the following

68 @Article{GoA,
69   author = {A. G{\"o}del      }, title="X"}
70 @Article{GoB,
71   author = {B. G{\"{o}}del    }, title="X" }
72 @Article{GoC,
73   author = {C. {G{\"{o}}del}  }, title="X" }
74 @Article{GoD,
75   author = {D. {G\"{o}del}    }, title="X" }

then the translation by Tralics2.9 looks like this. If you compare with lines 60 and 61 above, you can see that the full first name appears, empty attribute pairs part and junior are not shown.

76 <biblio>
77 <citation from='year' key='Ga' id='bid2' userid='cite:GoC' type='article'>
78 <bauteurs><bpers prenom='C.' nom='Gödel' prenomcomplet='C.'/></bauteurs>
79 <btitle>X</btitle>
80 </citation>
81 <citation from='year' key='Gb' id='bid3' userid='cite:GoD' type='article'>
82 <bauteurs><bpers prenom='D.' nom='Gödel' prenomcomplet='D.'/></bauteurs>
83 <btitle>X</btitle>
84 </citation>
85 <citation from='year' key='Göd' id='bid0' userid='cite:Goa' type='article'>
86 <bauteurs><bpers prenom='A.' nom='Gödel' prenomcomplet='A.'/></bauteurs>
87 <btitle>X</btitle>
88 </citation>
89 <citation from='year' key='Göd' id='bid1' userid='cite:GoB' type='article'>
90 <bauteurs><bpers prenom='B.' nom='Gödel' prenomcomplet='B.'/></bauteurs>
91 <btitle>X</btitle>
92 </citation></biblio>

The same file processed by BibTeX gives the following keys: {G{\"}}a, {G\"}b, G{\"o}da and G{\"{o}}db. The first two keys are invalid. The reason why suffixes a and b are added is that a special BibTeX function removes braces and funny characters when comparing keys. Such a function is not implemented in Tralics, thus labels G{\"o} and G{\"{o}}d are considered different, although their translation is the same. In Tralics, the best thing to do is use `Gödel´ as name.

Since lots of errors may be found in bibliography files, Tralics tries to be clever. First, it replaces `\c{c}´ by `ç´ and `\c{C}´ by `Ç´. It also replaces `\v {c}´ by `{\v c}´. Expressions of the form \a'e are replaced by \'e. We also replace backslash-space by a single space. Maybe other replacements of this kind will be made in a future version. For instance, we could expand all accent characters, and interpret double-hat constructs, so that `é´, `\'e´, and `^^e9´ are interpreted in the same way (the translation is the same).
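These replacements can be sketched with ordinary string substitutions. This is only an illustration of the four rules quoted above: the function name normalize_name is hypothetical, the \v rule is applied to the letter c only (as in the text), and the space between \v and the brace is treated as optional, which is an assumption.

```python
import re


def normalize_name(text):
    # \c{c} and \c{C} become the cedilla characters.
    text = text.replace("\\c{c}", "ç").replace("\\c{C}", "Ç")
    # \v {c} becomes {\v c}.
    text = re.sub(r"\\v\s*\{c\}", r"{\\v c}", text)
    # \a'e becomes \'e (the tabbing form of the acute accent).
    text = text.replace("\\a'", "\\'")
    # Backslash-space becomes a single space.
    text = text.replace("\\ ", " ")
    return text
```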

After that, characters or groups of characters are classified; this will make parsing easier. A sequence like `{foo}´ will be considered as a single random character; something like \'e as a single lower case letter, \'E as a single uppercase letter. The expression \'{e} will be replaced by {\'e} with a warning, \"\i will be rejected (unless inside braces) because a single character is needed after backslash-accent. Commands like \foo are also rejected. Note that an ampersand & is an error (some people try to use this instead of `and´). Character categories are: space, comma, dash, and tie (this is a ~). In a case like this,

93 @Article{cruz,
94   author = {Maria {\MakeUppercase{d}e La} Cruz},
95 title="X" }

the print key computed by BibTeX is {\MakeUppercase{d}e La}C, this typesets as `De LaC´. Such a construct is not understood by Tralics, that thinks that the last name is `Cruz´.

If more than one author is given, in the author or editor list, you should use `and´ as separator. Case is irrelevant, a space is required. For instance, the following citation contains 3 authors and others. The print key is `AAJA+´, because the last author has a double last name.

96 @Article{many,
97   author = {Joe~And and And,Joe and Joe-And And others},
98 title="X" }

The BibTeX transformation of this is

99 \bibitem[AAJA{\etalchar{+}}]{many}
100 Joe And, Joe And, Joe-And, et~al.
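The splitting of an author list can be sketched with a regular expression. This is a simplified illustration with a hypothetical function name: the separator is the word `and´ in any case with white space on both sides, a final name equal to `others´ (lower case) is reported as a flag, and an `and´ hidden inside braces is not protected, contrary to what a real parser would do.

```python
import re


def split_authors(field):
    names = re.split(r"\s+and\s+", field.strip(), flags=re.IGNORECASE)
    has_others = bool(names) and names[-1] == "others"
    if has_others:
        names = names[:-1]
    return names, has_others
```

On the example of line 97, this gives the three names Joe~And, And,Joe and Joe-And, plus the `others´ flag. Neither the tie nor the comma counts as white space, so the names that contain `And´ are left alone.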

If the list is too long, you can use `others´ as the last name (case is important). A name has four components: von, First, Last and Junior. On line 32, you can see the value of the full first name, then the abbreviated first name, then the von part (empty) then the last name, then the junior part (empty). In Tralics, the von part is always merged with the last name. Consider somebody named Jean de la Fontaine. French rules say that the particle `de´ should be omitted, unless preceded by the first name or a word like `Monsieur´. In particular, in the dictionary, you will find him between La Follette (an American politician) and Lafontaine (a Canadian politician), not between Delacroix and Delage. More interesting is the case of Marie Joseph Gilbert Motier, marquis de La Fayette. The name of this guy is `Motier´, but he is known as `La Fayette´. Another example is William Thomson (For his work on the transatlantic cable Thomson was created Baron Kelvin of Largs in 1866. The Kelvin is the river which runs through the grounds of Glasgow University and Largs is the town on the Scottish coast where Thomson built his house.) How this guy should be cited is unclear: William Thomson or Lord Kelvin?

The simple case is when two fields are given, with a comma between them. The first field is the last name, the other field is the first name. Then comes the case of three fields: last name, junior, and first name. You cannot use more than three fields, that is, you cannot give more than two commas. If no comma is given, we look for a `von´ part. This is something that starts with a lower case letter. For instance,

101 @Article{poussin,
102   author = {Charles Louis Xavier Joseph de la Vall{\a'e}e Poussin   },
103 title="X" }

This is what BibTeX puts in the bbl file:

104 \bibitem[dlVP]{poussin}
105 Charles Louis Xavier~Joseph de~la Vall{\a'e}e~Poussin.
106 \newblock X.
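The field-splitting rules described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: tokens are separated by plain spaces, and braces, ties, dashes and accent commands are not interpreted. The function name `split_name` is hypothetical and is not part of Tralics or BibTeX. When no comma is given, the sketch assumes that the von part runs from the first to the last token starting with a lower case letter, and that the final token always belongs to the last name.

```python
def split_name(name):
    """Split one author name into (first, von, last, junior).

    Simplified sketch: tokens are separated by spaces; braces,
    ties and accent commands are not interpreted.
    """
    fields = [f.strip() for f in name.split(",")]
    if len(fields) > 3:
        raise ValueError("at most two commas are allowed in a name")
    if len(fields) == 3:          # last name, junior, first name
        return fields[2], "", fields[0], fields[1]
    if len(fields) == 2:          # last name, first name
        return fields[1], "", fields[0], ""
    tokens = fields[0].split()
    if not tokens:
        return "", "", "", ""
    # Lower case tokens, ignoring the final token (kept for the last name).
    lower = [i for i, t in enumerate(tokens[:-1]) if t[0].islower()]
    if lower:
        first = tokens[:lower[0]]
        von = tokens[lower[0]:lower[-1] + 1]
        last = tokens[lower[-1] + 1:]
    else:
        first, von, last = tokens[:-1], [], tokens[-1:]
    return " ".join(first), " ".join(von), " ".join(last), ""
```

For the entry above, the sketch yields the first name `Charles Louis Xavier Joseph`, the von part `de la` and the last name `Vallée Poussin`; Tralics would then merge the von part with the last name.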

The translation of this entry by Tralics is the same as the BibTeX output, but no ties are inserted (BibTeX inserts one for the first name, the von part, the last name, see TeXbook, page 92); in my opinion, it is better to split a line between two names than to split a name (which hyphenation patterns should be used in a case like `Michel Goossens´: the current patterns, here English, or those found in the bibliography, thus French if we cite the French version of the LaTeX Companion?). The `De La Cruz´ example shows how you can fool BibTeX. Tokens between names are recognized. For instance, consider:

107 @Article{strange,
108   author = {A-b-C and A.b.C and A~b~C and A.Bb.Cc},
109 title="X" }

This is how BibTeX interprets the names. Authors number two and four have only a last name, no von part, no first name.

110 \bibitem[bCAbCA]{strange}
111 A~b~C, A.b.C, A~b~C, and A.Bb.Cc.
112 \newblock X.

This is the translation by Tralics. You can see that, for the last author, one dot has been replaced by a space: this is done in case no other way is found to split the name, but there is an upper case letter on each side of the dot. You can also see that BibTeX inserts some characters (here ties) instead of dashes. Tralics keeps the dashes, whenever possible.

113 \citation{bCAbCB}{cite:strange}{bid3}{year}{article}
114 \bauthors{\bpers[A]{A.}{}{b-C}{}
115           \bpers[]{}{}{A.b.C}{}
116           \bpers[A]{A.}{}{b~C}{}
117           \bpers[A]{A.}{}{Bb.Cc}{}}
118 \cititem{btitle}{X}
119 \endcitation

Here is another example.

120 @Article{strange2,
121   author = {Jean-Claude XX and J.-Ch. YY and J.-{Ch.} ZZ},
122 title="X" }

This is the translation by BibTeX, in `abbrv´ mode. The format used in plain mode is {ff }{vv }{ll}{, jj}, and in abbrv mode, it is {f. }{vv }{ll}{, jj}. This is explained in any good reference about BibTeX.

123 \bibitem{strange2}
124 J.-C. XX, J.-C. YY, and J.-C. ZZ.
125 \newblock X.
126 \bibitem{poussin}
127 C.~L. X.~J. de~la Vall{\a'e}e~Poussin.
128 \newblock X.

This is the translation by Tralics. The quantity `{Ch.}´ is considered as a single character. No dot is added after it, since it is terminated by a dot.

129 \citation{XYZ}{cite:strange}{bid3}{year}{article}
130 \bauthors{\bpers[Jean-Claude]{J.-C.}{}{XX}{}
131           \bpers[J.-Ch.]{J.-C.}{}{YY}{}
132           \bpers[J.-{Ch.}]{J.-{Ch.}}{}{ZZ}{}}
133 \cititem{btitle}{X}
134 \endcitation

The print key is computed as follows: each author gives an initial (if the name is complicated, more than one will be used, for instance Poussin gives four letters `dlVP´). If a single author is cited and gives fewer than three letters, then the first three letters of the name are used (for instance, Seyfert gives `Sey´). If more than four authors are given, only the first three give an initial, and there is a `+´ sign at the end. If `and others´ is given, there is also a `+´ sign.
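The rules for the print key can be sketched in Python as below. This is a minimal illustration, not the Tralics implementation: the function name `print_key` is hypothetical, the input is assumed to be a list of already parsed last names (von part included), and a name is assumed to give one letter per piece, pieces being separated by spaces, ties or dashes.

```python
import re

def print_key(last_names, and_others=False):
    """Compute an alpha-style print key from a list of last names."""
    def initials(name):
        # One letter per piece; pieces are separated by spaces, ties or dashes.
        return "".join(p[0] for p in re.split(r"[\s~-]+", name) if p)

    if len(last_names) == 1 and not and_others:
        key = initials(last_names[0])
        if len(key) < 3:
            # Single author, short key: first three letters of the name.
            key = re.sub(r"[\s~-]+", "", last_names[0])[:3]
        return key
    truncated = len(last_names) > 4
    used = last_names[:3] if truncated else last_names
    key = "".join(initials(n) for n in used)
    return key + ("+" if truncated or and_others else "")
```

With the examples of this section, `print_key(["And", "And", "Joe-And"], and_others=True)` gives `AAJA+` and `print_key(["de la Vallée Poussin"])` gives `dlVP`.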

We show here the sort key, as computed by Tralics, for some of the entries shown above. Remember that these entries have no year field and that the title is X.

135 cru m. {\makeuppercase{d}e la}. cruz          x
136 g c. {g{\"{o}}del}          x
137 g d. {g\"{o}del}          x
138 g{\"o}d a. g{\"o}del          x
139 g{\"{o}}d b. g{\"{o}}del          x
140 aaja+ j. and   j. and    joe-and  etal        x
141 dlvp c. l. x. j. de la vall{\'e}e poussin          x

These are the keys, for the same entries, computed by BibTeX, using the alpha style. You can see that BibTeX uses last name and first name, whereas Tralics uses abbreviated first name then last name. The format is: {vv{ } }{ll{ }}{ ff{ }}{ jj{ }}.

142 delac    dela cruz  maria        x
143 god    godel  a        x
144 god    godel  b        x
145 g    godel  c        x
146 g    godel  d        x
147 aaja    and  joe   and  joe   joe and   et al        x
148 dlvp    de la vallee poussin  charles louis xavier joseph        x

4.5. Commands for the bbl

The Raweb DTD explains that the following items can appear inside a bibliography entry.

In almost every case, if the database file contains a field `foo´ with value `bar´, the bbl file will contain \cititem{bfoo}{bar}, and this is translated into <bfoo>bar</bfoo>. The \cititem command takes two arguments. The second argument is translated as usual. The first argument is the name of the resulting element. There is a hook: in the case where \cititem-foo is defined (this is \cititem followed by a dash followed by the name of the field), this macro is used instead of the default procedure. If the database contains a `url´ field, the result is a call to the \url command, that will produce a <xref> element. The \cititem command should be used only in a bibliography.

If the entry in the database contains an `author´ or `editor´ field, the \bauthors or \beditors command will be called. These two commands must be used inside a bibliography. They take a single argument, translate it, and put the result in a <bauteurs> or <bediteur> element. Note: the bibliography part of the Raweb DTD was meant to be temporary. For this reason, the names were chosen so as to replace them easily with new names (hence the prefix `b´); for some reason, `auteurs´, `editeur´ and attributes of `bpers´ have French names. Later on, we decided to modify the Tralics names, hence the `bauthors´ and `beditors´. Because `bauteurs´ had a final s, we added an s to both command names; not the best choice.

The \bpers command takes one optional argument, and 4 required arguments. The translation is an empty <bpers> element with following attributes: prenomcomplet for the optional argument, and prenom, part, nom, junior for the required arguments.

The \citation command constructs a <citation> element. It takes 5 required arguments, and an optional argument. The optional argument is ignored. Other arguments are converted to attributes. The whole text, up to \endcitation is translated in bibliography mode, and added to the <citation> element. Example:

149  \citation{a}{b}{c}{d}{e}
150   \cititem{foo}{bar}
151   \beditors{\bpers[a]{b}{c}{d}{e} \bpers[]{B}{}{C}{} \cititem{etal}{}}
152  \endcitation

The translation is

153 <citation from='d' key='a' id='c' userid='b' type='e'>
154   <foo>bar</foo>
155   <bediteur>
156     <bpers prenom='b' part='c' nom='d' junior='e' prenomcomplet='a'/>
157     <bpers prenom='B' nom='C'/>
158     <etal/>
159   </bediteur>
160 </citation>
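The mapping from command arguments to XML attributes can be summarized by the following Python sketch. It is an illustration only: the function names are hypothetical, the attribute order simply copies the example above, and the rule that an empty argument produces no attribute is inferred from the second <bpers> element of the example. The body of the citation is assumed to be already translated.

```python
from xml.sax.saxutils import escape

def attrs(pairs):
    """Format attributes, dropping those whose value is empty."""
    return "".join(
        " %s='%s'" % (name, escape(value, {"'": "&apos;"}))
        for name, value in pairs if value)

def bpers(full_first, first, von, last, junior):
    """Empty <bpers> element for \\bpers[full_first]{first}{von}{last}{junior}."""
    return "<bpers%s/>" % attrs([
        ("prenom", first), ("part", von), ("nom", last),
        ("junior", junior), ("prenomcomplet", full_first)])

def citation(key, userid, ident, origin, kind, body):
    """<citation> element; the five required arguments become attributes."""
    head = attrs([("from", origin), ("key", key), ("id", ident),
                  ("userid", userid), ("type", kind)])
    return "<citation%s>%s</citation>" % (head, body)
```

Calling `bpers("", "B", "", "C", "")` returns `<bpers prenom='B' nom='C'/>`, as on line 157.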

4.6. Other commands

The \bibliography command takes one argument, a comma-separated list of database files. Spaces are ignored. The command can be given more than once. This command (the last occurrence) defines the position where the bibliography should be inserted.

The command \insertbibliohere can be used to force the position of the bibliography. It overwrites the location specified by the previous command.

The environment `thebibliography´ can be used for typesetting the bibliography. There is an optional argument (ignored), a required argument (ignored), an optional argument (ignored). The result is an XML element whose name is defined by \refname, by default `Bibliography´, and whose content is formed of the translation of the environment. You can redefine this \refname command. An error is signaled if strange commands appear in the argument, but not for invalid characters (in particular, space cannot appear in an element name). The command can be empty. In this case, the name will not appear in the XML result.

The command \bibliographystyle takes one argument. Its translation is empty. The argument is remembered. This is the style to use. If the argument is `bibtex:´, this is an indication that BibTeX should be used instead of Tralics for the production of the bbl. The style can be given after the colon, or with the invocation of another command. If the argument is `program:foo´, this means to use foo as program. For instance \bibliographystyle{program:cat -v}. In this example, this will print the auxiliary file; this is not good, because the command should create the bbl file (its argument is jobname.aux, data must be written on jobname.bbl). A second \bibliographystyle command can be used for specifying the style (the default is `plain´). Example. Consider a file that contains these lines

161 \documentclass{article}
162 \begin{document}
163 \AtEndDocument{\bibitem{unused}Hey}
164 \bibliography{torture}
165 \bibliographystyle{bibtex:}
166 \cite{poussin,cruz,many,strange,unused}
167 \end{document}

When Tralics sees the \end{document} command, it evaluates it (with the hooks, etc.) After that, a bbl is created and translated. If there is no unsolved entry, nothing happens. If no style command indicates that BibTeX or an external program should compute the bbl, then Tralics does it, as explained above. In the case of the Raweb, three database files are used: apicsfoot_2004, apicsrefer_2004, and apics2004. These files are typed `foot´, `refer´ and `year´. In the non-Raweb case, files in the list indicated by \bibliography are used. If a file is named `miaou+refer´ or `miaou+foot´ and does not exist, then miaou is tried instead; in this case the type will be `refer´ and `foot´ (otherwise, it is `year´). In the case an external program is used, a minimal auxiliary file is created. In the case of the example, it will contain

168 \citation{poussin}
169 \citation{cruz}
170 \citation{many}
171 \citation{strange}
172 \bibstyle{plain}
173 \bibdata{torture}
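As an illustration, the content of such a minimal auxiliary file can be produced by the Python sketch below. The function name `minimal_aux` and its parameters are hypothetical; the layout (one \citation line per unsolved entry, then \bibstyle and \bibdata) is copied from the example above, and joining several database names with commas is an assumption based on the usual BibTeX convention.

```python
def minimal_aux(unsolved_keys, databases, style="plain"):
    """Text of a minimal auxiliary file for an external bbl generator."""
    lines = ["\\citation{%s}" % key for key in unsolved_keys]
    lines.append("\\bibstyle{%s}" % style)
    lines.append("\\bibdata{%s}" % ",".join(databases))
    return "\n".join(lines) + "\n"
```

In the example, the entry `unused` is solved by the \bibitem command, so that only four keys are passed to the function.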

The database torture.bib contains a sequence of entries, plus the following lines. In order to understand the last line, you have to remember that character strings are always balanced against braces. Hence it is not: open brace concatenated with 1 concatenated with 1 and close brace. It is: open brace, double quote, space, sharp, etc, up to double quote, close brace.

174 @String{ stra= {\def}}
175 @String{ strb= "#1" }
176 @String( strc= "\mycmd " )
177 @Preamble (stra # strc # strb )
178 @Preamble( "{" #1 #1 "}")

After that, the external program is called, and the bbl file is read. In the example this gives the following. The first line is the preamble.

179 \def\mycmd #1{" #1 #1 "}
180 \begin{thebibliography}{1}
181  
182 \bibitem{many}
183 Joe And, Joe And, Joe-And, et~al.
184 \newblock X.
185  
186 \bibitem{strange}
187 A~b~Cde.
188 \newblock X.
189  
190 \bibitem{poussin}
191 Charles Louis Xavier~Joseph de~la Vall{\a'e}e~Poussin.
192 \newblock X.
193  
194 \bibitem{cruz}
195 Maria {\MakeUppercase{d}e La}~Cruz.
196 \newblock X.
197  
198 \end{thebibliography}

After that, the bibliography is translated and inserted. The resulting XML file is shown here.

1 <?xml version='1.0' encoding='iso-8859-1'?>
2 <!DOCTYPE std SYSTEM 'classes.dtd'>
3 <!-- Translated from latex by tralics 2.9.1, date: 2006/11/02-->
4 <std>
5 <biblio>
6 <Bibliography><p id='bid2'>
7 Joe And, Joe And, Joe-And, et al.
8 X.</p>
9 <p id='bid3'>
10 A b Cde.
11 X.</p>
12 <p id='bid0'>
13 Charles Louis Xavier Joseph de la Vallée Poussin.
14 X.</p>
15 <p id='bid1'>
16 Maria De La Cruz.
17 X.</p>
18 </Bibliography>
19 </biblio><p><cit><ref target='bid0'/></cit>, <cit><ref target='bid1'/></cit>,
20 <cit><ref target='bid2'/></cit>, <cit><ref target='bid3'/></cit>,
21 <cit><ref target='bid4'/></cit></p>
22 <p id='bid4'>Hey</p>
23 </std>
24  

Finally, we show here everything printed on the screen, including all warnings by BibTeX.

1 This is tralics 2.9.1, a LaTeX to XML translator
2 Copyright INRIA/MIAOU/APICS 2002-2006, Jos\'e Grimm
3 Licensed under the CeCILL Free Software Licensing Agreement
4 Starting translation of file testb.tex.
5 Configuration file identification: standard $ Revision: 2.24 $
6 Read configuration file /Users/grimm/work/cvs/tralics/confdir/.tralics_rc.
7 Document class: article 2006/08/19 v1.0 article document class for Tralics
8 Bib stats: seen 5(1) entries
9 This is BibTeX, Version 0.99c (Web2C 7.5.4)
10 The top-level auxiliary file: testb.aux
11 The style file: plain.bst
12 Database file #1: torture.bib
13 Warning--empty journal in many
14 Warning--empty year in many
15 Warning--empty journal in strange
16 Warning--empty year in strange
17 Warning--empty journal in poussin
18 Warning--empty year in poussin
19 Warning--empty journal in cruz
20 Warning--empty year in cruz
21 (There were 8 warnings)
22 Math stats: formulas 0, kernels 0, trivial 0, \mbox 0, large 0, small 0.
23 Buffer realloc 0, string 1240, size 12510, merge 4
24 Macros created 97, deleted 0; hash size 1565; foonotes 0.
25 Save stack +20 -20.
26 Attribute list search 1476(1402) found 906 in 1097 elements (1076 at boot).
27 Number of ref 0, of used labels 0, of defined labels 0, of ext. ref. 0.
28 Modules with 0, without 0, sections with 0, without 0
29 Output written on testb.xml (593 bytes).
30 No error found.
31 (For more information, see transcript file testb.log)

Some comments. Line 6 shows the name of the configuration file. If this file contains a line that starts with `## tralics ident rc=´ then all characters after the equals sign are printed (see line 5). Since version 2.5 (pl4), in the case where character number 30 is a dollar sign, a space will be added after it. The reason for this is that the RCS software interprets a string like `Revision´ in dollar signs; we do not want it to replace the 2.11 by the revision number of the LaTeX document. We shall explain elsewhere how to read the statistics.

Line 8 shows the number of entries in the biblist. If some entries are solved, they are shown in parentheses. Here, we have 5-1=4 unsolved entries. If line 5 of the source file is commented out, then BibTeX is not used, and lines 9 to 21 will be replaced by the single line `Seen 4 bibliographic entries´.

The standard configuration file contains a line that says that `article´ is an alias for `std´. The `std´ configuration defines two quantities: the name of the DTD, hence the root element, it is <std>, see line 4 of the XML result. It defines xml_biblio to be `bibliography´. This is the name of the element that will hold the bibliography. The default value is `biblio´, but it can be redefined (see line 5). Do not confuse this with the name of the element produced by the environment `thebibliography´, that appears line 6 in the XML result.

5. Other commands

5.1. Character encoding

We have to distinguish between input encoding, internal encoding and output encoding. The internal encoding of TeX is ASCII (i.e. 65 is the internal code of the upper case letter A), at least for all characters with code between 32 and 126. The input encoding is the mechanism that converts the code of the letter A supplied by the computer into the code 65. Almost all input encodings are nowadays ASCII-based, they produce the same value for the letter A; the results may be different for a character like é. The output encoding indicates for a letter, say A, which position in the font to use. We shall not discuss the output encoding here. Let´s just notice that the character `{´ exists in the font cmtt10, but not in other text fonts of the Computer Modern family. If you read a version of this document that uses the original encoding (OT1), braces shown in error messages are taken from a math font, hence are upright. Some years ago, an 8bit encoding (called T1) was designed, which contains braces. You can compare Figure 1 in appendix F of [4] (describing the font cmr10) with Table 7.32 of [6], describing ecrm1000.

The first version of TeX was using 7bit input and output characters (but fonts and dvi files were coded on 8bits). There is an extension Ω to TeX that accepts 16bit characters as input, using different encoding schemes. Characters that are not part of the ASCII specifications (less than 32 or greater than 126) are not guaranteed to be treated the same in all implementations. For this reason, it is wise to load the inputenc package, with the current encoding as argument. The effect will be that some characters, like é will become active, and expand to \´e. As a result: only ASCII letters are allowed in control sequence names. On the other hand, if you say \begin{motclés}, then LaTeX complains with LaTeX Error: Environment motcl\´es undefined. Don´t try to define the motcl\´es environment: the expansion of the accent depends on the context: it is é for \begin and \´e for the macro that prints the error message. Non-ASCII characters may be printed by TeX as ^^ab (in some older versions of TeX, I had to pretend, via locale settings, that my computer did not understand English in order for it to output the guillemet as «).

A silly question concerns end-of-line markers. Some systems like Unix use LF (line feed) as line separators, some others like Macintosh use CR (carriage return) and Windows uses CR-LF. This is replaced by TeX by a single character: the carriage return with ASCII code 13. Tralics interprets CR-LF, CR and LF alike: as an end-of-line marker. This marker will be replaced by the character whose code is in \endlinechar, provided that this value is in the range 0–255. The default value is 13, a character of category 5. The tokeniser converts this into a \par token, a space token or ignores it depending on the state. This space token has value 32 (but Tralics uses 10, so as to keep the same line breaks in the XML result as in the TeX source). Note that, whenever a line is read, spaces at the end of the line are removed. If you want a space after a control sequence, you say something like `\TeX\␣´, and if this construct appears at the end of a line, the space is ignored; if the endline character has category code 5, it will be converted to a space, and everything works fine; if this character is for instance 65, you may get a strange error, like this

! Undefined control sequence.^^J
l.170 ...reaks in the \XML\ result as in the \TeX\^^J
                                                  ^^J
? ^^J

We have shown here the end of line as ^^J. There are four lines: the error message, two context lines, and the line with the prompt. The two context lines show that the space at the end of the line is removed. TeX does not print the undefined control sequence: it assumes that it is either the last token on the first context line, or a token marked as `<recently read>´ or something like that; in our case, the undefined control sequence is the one obtained by replacing ^^J by the value of the endline character.
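The treatment of end-of-line markers can be sketched in Python as follows. This is an illustration under assumptions: the function name `tex_lines` is hypothetical, the whole input is given as one string, and a final end-of-line marker is assumed not to create an extra empty line. The sketch treats CR-LF, CR and LF alike, removes spaces at the end of each line, and then appends the character whose code is the value of \endlinechar when this value is in the range 0–255.

```python
import re

def tex_lines(data, endlinechar=13):
    """Split raw input into lines the way the text describes."""
    lines = re.split(r"\r\n|\r|\n", data)
    if lines and lines[-1] == "":
        lines.pop()               # no empty line after the final marker
    end = chr(endlinechar) if 0 <= endlinechar <= 255 else ""
    # Trailing spaces are removed before the end-of-line character is added.
    return [line.rstrip(" ") + end for line in lines]
```

With an endline character of 65, a line ending with a backslash followed by a space becomes a line ending with `\A`, which is the undefined control sequence of the error message.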

There is a way to enter special characters in TeX, for instance ^^J is a line feed. The algorithm is the following: whenever TeX sees two consecutive identical characters of category code 7, followed by a character whose number is x, it replaces these three characters by the character whose code is y, where y=x-64 if x≥64, and y=x+64 if x<64. Hence ^^? yields y=127 (this is the delete character). All characters with codes between 1 and 26 can be obtained using the form ^^A, ^^B, etc. The null character is ^^@, characters with code between 27 and 31 are ^^[, ^^\, ^^], ^^^ and ^^_. Character 32 can be represented as ^^`. All other characters are ASCII characters. This is an example of use:

27=\char`\^^[, 28=\char`\^^\,  29=\char`\^^], 30=\char`\^^^, 31= \char`\^^_

Because some characters in the list are of category code 15 (invalid), we have used the construction \char`\A (with A replaced by some other character). There is no difference between \char`\A and \char`A, unless the category code of the character is one of 0, 5, 9, 14, or 15. The result is the character at position 65 or whatever in the current font; the example above selects positions 27 to 31. The translation is

27=&#x1B;, 28=&#x1C;, 29=&#x1D;, 30=&#x1E;, 31= &#x1F;

Note that these characters are invalid in XML1.0, so that this example is not good; if you compile this document with LaTeX, you will see [not compiled with latex]. In general you will see a ff ligature or a oe one; this depends on the output encoding.

When TeX switched to 8 bits, the rule changed a little bit: the previous rule applies only if 0≤x≤127, it gives 0≤y≤127. Another test was added: if you say ^^ab, these four characters are replaced by the single character whose code is ab (in base 16, i.e. 171 in base ten in this case). In such a case two characters are needed: a letter or a digit; only lower case letters between a and f are allowed. Thus every character in the range 0-255 has such a representation. Note that, by default, the character ^^ab has category code 12, hence is valid. What appears in the dvi file depends on the output encoding, in the case of a 7bit encoding, the character is unknown, a warning is printed in the transcript file, that´s all, otherwise, it should be an opening guillemet, but it could as well be ń. The purpose of a package like inputenc is to change the category code of all special characters, so that it behaves like a command and produces, in the dvi, something that is, as much as possible, independent of the output encoding.
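The two replacement rules can be sketched in Python as below. This is a simplified illustration, not the scanner of TeX or Tralics: the function name `decode_hat_hat` is hypothetical, a single fixed character (the hat by default) is assumed to be the only one of category code 7, and the end-of-line character is not handled. The form with two lower case hexadecimal digits is tried first; otherwise 64 is added or subtracted, for codes up to 127 only. After a replacement, the result is scanned again, so that a hat obtained in this way can start a new hat-hat construct.

```python
HEX = "0123456789abcdef"

def decode_hat_hat(line, sup="^"):
    """Replace ^^ab and ^^X constructs in a line (simplified sketch)."""
    chars = list(line)
    i = 0
    while i < len(chars):
        if i + 2 < len(chars) and chars[i] == sup and chars[i + 1] == sup:
            a = chars[i + 2]
            if i + 3 < len(chars) and a in HEX and chars[i + 3] in HEX:
                # Two lower case hex digits: character with that code.
                chars[i:i + 4] = [chr(int(a + chars[i + 3], 16))]
                continue          # rescan the resulting character
            x = ord(a)
            if x < 128:
                chars[i:i + 3] = [chr(x - 64 if x >= 64 else x + 64)]
                continue          # rescan the resulting character
        i += 1
    return "".join(chars)
```

For instance, `decode_hat_hat("^^?")` is the delete character (code 127), and `decode_hat_hat("^^5e^ab")` first produces a hat, then the character with code 171.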

According to this rule, the character 32 can be entered as ^^20. There is one situation where the space character can be used in this way: at the end of the line, when \endlinechar is non trivial. Note that, in the case where the resulting character has category 7, it can participate in a hat-hat construct. Here is an example.

{1^^{^^ab2^^5e^ab3^^5e^5e^ab4\def\Abc{ok}\def\bAc{OK}\^^41bc\b^^41c}
{\catcode `\é=7 ééab $xé2$ %next line should produce M
éé
%$1^è=^^^AééT$ %% hat hat control-A
$1^è=^^^A$ %% hat hat control-A
}\def\msg{a message.^^J}

Some explanations are needed. ^^{ is a semi colon, ^^ab is an opening French guillemet, ^^5e is a hat (recursion...), ^^41 is the uppercase letter A. The first line of the example explains that such funny characters can appear in a control sequence name. The second line shows that the hat-hat mechanism can be used with other characters than a hat. It also shows that, if the mechanism cannot be applied, a character with category 7 behaves like a superscript character, whatever its numeric value. The line that follows shows that the end-of-line character is ASCII 13, aka control-M (usually written as ^M). After that, there are two lines containing a control-A character, shown here as ^A. It is preceded by hat-hat, so that the effect should be a single A. The line that is commented out contains a control-T written as ééT (for some strange reasons, this character is invalid in XML1.0, but valid as an entity in XML1.1, [9], [8]). The last line is just a real example of ^^J. This character is printed by Tralics as LF, or CR-LF on Windows. This is the translation of Tralics:

<p>1;&#xAB;2&#xAB;3&#xAB;4okOK
&#xAB; <formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'
 ><msup><mi>x</mi> <mn>2</mn> </msup></math></formula
 > M<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'
 ><mrow><msup><mn>1</mn> <mi>&#xE8;</mi> </msup><mo>=</mo><mo></mo
 ><mi>A</mi></mrow></math></formula>
</p>

We inserted some newline characters at unusual places (just before greater than signs), other spaces were produced by Tralics; in order to make sure that 8bit characters are printed correctly, we asked Tralics for a seven bit output.

As said above, Ω accepts 16bit characters, using the notation ^^^^abcd. This syntax was implemented in Tralics 2.7, via the \char command (remember that in Tralics, the \char and \chardef commands accept 27bit integers); as a consequence, these characters could not be used in a command name; this restriction does not apply anymore (the default category code of characters with code greater than 127 is other, namely 12). Example

\def\foo#1#2#3{#1=#2=#3=}
\foo^^^^0153^^^^0152^^^^0178
 ^^^^017b^^8?

It is translated by Tralics as &#x153;=&#x152;=&#x178;= &#x17B;x?. The argument to \foo could also have been: \oe\OE{\"Y}. The transcript file contains lines of the form:

[8] \foo^^^^0153^^^^0152^^^^0178
\foo #1#2#3->#1=#2=#3=
#1<-^^^^0153
#2<-^^^^0152
#3<-^^^^0178

It is possible to ask for UTF-8 output in the transcript file. This gives characters that are hard to see using latin1, because characters in the range 128–128+32 are in general unprintable. What is shown here as hat-Ó is a single character.

[2] \foo^^^^0153^^^^0152^^^^0178
\foo #1#2#3->#1=#2=#3=
#1<-Å^Ó
#2<-Å^Ò
#3<-Ÿ
{Push p 1}
Character sequence: Å^Ó=Å^Ò=Ÿ= .

The original version of the Tralics documentation said: Si on a un texte qui contient essentiellement des caractères 7bits, et très peu d´autres caractères, l´utilisation de caractères 16bits consomme énormément de place. This means that using a 16bit encoding consumes a lot of space if you write a French document (and even more, for an English one). The sentence has 159 ASCII characters and 6 others; these can be input using iso-8859-1 (aka latin-1) as input encoding. In TeX, it uses 165 bytes, in Ω, it uses 330 bytes. Using a construction like \´e we need 177 bytes (and 7 bits per byte). Using UTF-8 requires only 171 bytes (8 bits per byte). This explains why UTF-8 is popular. We shall explain (in the second part of this document) how UTF-8 is encoded and how TeX may read it. In the case of Tralics, the situation is: you can (via an argument to the Tralics program) specify that the sources are encoded using UTF-8 or latin1 (this being the default). However, if the tex file contains, on the first line, “utf8-encoded”, UTF-8 encoding will be used; if it contains “iso-8859-1”, then latin1 encoding will be used.
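The byte counts quoted above can be checked with a few lines of Python. In this sketch the sentence is written with ASCII apostrophes, the 16bit scheme is assumed to be two bytes per character without byte order mark, and an accent command such as \´e is counted as three bytes; the names `SENTENCE` and `byte_counts` are only illustrative.

```python
SENTENCE = ("Si on a un texte qui contient essentiellement des caractères "
            "7bits, et très peu d'autres caractères, l'utilisation de "
            "caractères 16bits consomme énormément de place.")

def byte_counts(text):
    """Storage needed for text under the schemes compared above."""
    others = sum(1 for c in text if ord(c) > 127)
    return {
        "ascii chars": len(text) - others,
        "other chars": others,
        "latin1": len(text.encode("latin-1")),
        # Two bytes per character, no byte order mark.
        "16bit": len(text.encode("utf-16-le")),
        # Each non-ASCII letter replaced by a three-byte accent command.
        "accent commands": (len(text) - others) + 3 * others,
        "utf8": len(text.encode("utf-8")),
    }
```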

5.2. New encoding scheme

Since version 2.9, internal encoding of Tralics is 16bit utf8. This has two consequences that will be explained here. The first is that some tables are now much larger. The numeric argument to \catcode, \mathcode, \lccode, \uccode, \sfcode, \delcode, which is a character number can now be anything between 0 and 65535. We also changed the number of registers: there are 1024 instead of 256.

The result of a ^^^^abcd construct fits on 16bits, hence is a character, hence can appear in a command name (in the case of a multicharacter control sequence, it must have category code `letter´; initially all characters with code greater than 128 have category `other´). In order to save space, a short-verb character must fit on 8bits; otherwise, its category code will not be properly restored when you undeclare it (category other will be used).

All characters are valid in math mode. The translation of an ASCII character may depend on the font, otherwise, it is always <mi>. For instance, in the case of $\mathbf\´e$, expansion of the accent command produces a 8bit character, unaffected by the font change, and the translation is a <mi> containing the e-acute letter. Full 21 bit characters are allowed in Math mode. An expression $x$ is considered trivial math and translates into a <simplemath> element only if the character fits on seven bits and has category letter.

The default input and output encoding is latin1, which is no longer the internal encoding. As a consequence, there are two conversion procedures. We explained above that the input encoding can be given on the first line of the file. Otherwise a default encoding will be used. This can be specified in the configuration file. As a consequence, the main input file is read without conversion, then the configuration file is considered, and then the main input file is converted; all other files are immediately converted.

On the other hand, a character like é is represented as é in the internal tree. This character will appear, in the output file, in the form &#xE9; if you call Tralics with option -oe8a or -oe1a, as a single latin1 byte if you call Tralics with option -oe1, or as a two-byte UTF-8 sequence if you call Tralics with option -oe8. If the option contains a, the XML file contains only 7bit ASCII characters; the only difference between the two options is the encoding declaration. These options specify also the encoding used for the transcript file. You can specify it independently with the options -te8a, -te1a, -te8, or -te1. If the character is too big to fit in the encoding, then the hat-hat notation is used (see example above). Because each XML file contains its encoding, an XML processor will handle the file produced by Tralics independently of the output encoding. Moreover, whatever the encoding, input or output, you know that ^^^^03b7 is Greek letter eta.

5.3. Changing the input encoding

We mentioned in the previous section that whenever Tralics reads a file, it converts its content, according to the current encoding (that can be given at the start of the file, using ASCII characters), with an exception for the main input file. The situation is a bit more complex: configuration files, tcf files, bibliography data files, and TeX files opened by \openin use a fixed encoding; other source files use a variable encoding.

The default encoding is stored in \input@encoding@default. The default value is one, but can be changed via an option to the program (utf8 or latin1 select encoding 0 or 1 respectively).

The current encoding is stored in \input@encoding. This is an attribute of the current input file, it can be changed at any time. The new encoding is used when Tralics needs to read a new line in order to fetch the next token. Nothing special is done in the case of \read.

Whenever a file is opened, its initial encoding is computed. If the file has a fixed encoding, then all lines are immediately converted, otherwise lines are converted when needed. If the first line of the file contains the string utf8-encoded, then encoding 0 is assumed, if the line contains iso-8859-1, then encoding 1 is assumed, and if the line contains tralics-encoding:NN where NN is a sequence of one or two digits forming a number less than 34, then encoding NN is assumed. There are other heuristics. For instance, if %&TEX encoding = UTF-8 appears near the start of the file, then encoding 0 is assumed. In all other cases, the default encoding is assumed.
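The first-line tests can be sketched in Python as follows. This is an illustration only: the function name `initial_encoding` is hypothetical, the order in which the three markers are tested is an assumption, a number with three or more digits is assumed not to match, and the other heuristics (such as the %&TEX line) are not covered.

```python
import re

def initial_encoding(first_line, default=1):
    """Guess the encoding number of a file from its first line."""
    if "utf8-encoded" in first_line:
        return 0
    if "iso-8859-1" in first_line:
        return 1
    # One or two digits forming a number less than 34.
    match = re.search(r"tralics-encoding:(\d{1,2})(?!\d)", first_line)
    if match and int(match.group(1)) < 34:
        return int(match.group(1))
    return default
```

For instance, a first line containing `tralics-encoding:12` selects encoding 12, while `tralics-encoding:40` is ignored and the default encoding is used.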

In the current version of Tralics, there are 34 possible encodings. Encoding number 0 is UTF8; this is an encoding where an ASCII character is represented by a single byte (with the same value as the character), and other characters use a variable number (between 2 and 4) of bytes. In encodings like UTF16, a character is represented by more than one byte. There is currently no support for such encodings. Stated otherwise, we assume that character C is represented by a byte B, and the encoding specifies the value C at position B. Encoding 1 is latin1 (also known as iso-8859-1), it has B=C. For the 32 remaining encodings, it is possible to specify, for each byte B, the associated character C (default is B). Trying to set the current or default encoding to a value outside the range 0-33 is ignored; trying to modify an encoding outside the range 2-33 raises an Illegal encoding error, and an invalid byte value gives an Illegal encoding position error. In case of an illegal character value (negative, zero, 65536 or more), the byte value is used instead. The magic command is \input@encoding@val; it reads an encoding, a byte and a value. In the example that follows we change the encoding number 2 so that \FOO is read as \foo:

1 \input@encoding@val 2 `O =`o
2 \input@encoding@val 2 `F =`f
3 \let\foo\bar
4 \showthe\input@encoding@val 2 `O
5 \input@encoding=2
6 \show\FOO
7 \showthe\input@encoding@val 2 `O
8 \showthe\input@encoding
9 \input@encoding@default=0
10 \showthe\input@encoding@default
11 \input@encoding=1

This example shows three commands in read or write mode: when the command is prefixed by \showthe, it reads a value from memory and prints it on the terminal; otherwise a number is scanned and written in memory. The equals sign before the number is optional. No less than 13 integers are scanned; some are given as an explicit integer, some as a character code. We assume that, for encoding 2, all characters initially map to themselves. Since \FOO is read as \foo, the \show command should print \bar. On lines 4 and 7 you see the value stored in encoding 2 for the character O (read first as upper case, then as lower case, since encoding 2 is active on line 7); this is twice 111. Other values shown are 2 and 0.
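As a rough model of what an 8-bit encoding table does, here is a small Python sketch. The helper names are hypothetical, and the error handling only follows the rules stated above (an invalid byte position is an error, an illegal character value falls back to the byte value).

```python
def make_table():
    """A fresh 8-bit table: every byte maps to the character with the same code."""
    return list(range(256))

def set_entry(table, byte, char_code):
    """Store char_code at position byte, as \\input@encoding@val would."""
    if not 0 <= byte <= 255:
        raise ValueError("Illegal encoding position")
    # Illegal character value (negative, zero, 65536 or more): use the byte value.
    if not 0 < char_code < 65536:
        char_code = byte
    table[byte] = char_code

def decode(table, data):
    """Convert a line of bytes into characters using the table."""
    return "".join(chr(table[b]) for b in data)

table = make_table()
set_entry(table, ord("O"), ord("o"))
set_entry(table, ord("F"), ord("f"))
decoded = decode(table, b"\\show\\FOO")   # the bytes of \show\FOO
```

With this table the bytes of \show\FOO are read as \show\foo, and the entry for `O` holds 111, the value displayed by \showthe in the example.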

We describe from now on the content of the inputenc package. You load it by saying \usepackage [foo,bar] {inputenc}. The effect of this command is the following. First, a symbolic name is defined for each of the 23 known encodings, for instance utf8 for UTF-8 (encoding 0), latin1 for latin1 (encoding 1), etc. The command \inputencodingname holds the current input encoding name, and \encoding@value converts this to an integer. The command \inputencoding can be used to change the encoding. It is defined as:

\def\inputencoding#1{%
  \the\inpenc@prehook  %% pre-hook
  \edef\inputencodingname{#1}%
  \input@encoding=\encoding@value{\inputencodingname}%
  \the\inpenc@posthook} %% post-hook

There are two hooks (token lists) that do nothing by default, added here for compatibility with the LaTeX package. You can use them to output messages, such as: switching from encoding A to encoding B (the initial value of the encoding name is \relax; this can be used by the pre-hook).

The options, foo and bar in the example, should be valid names. The last name becomes the current and default encoding. As mentioned above, the current encoding applies to an input file, and there is no reason to change the encoding of the package file. Hence, the following is executed:

  \input@encoding@default\encoding@value{bar}%
  \AtBeginDocument{\inputencoding{bar}}

If the options are, for instance, ansinew and applemac, the tables associated with these encodings are defined; some other tables might also be defined, but you should not rely on this (of course, latin1 and utf8 can be used anywhere, because they are built in). The package contains

\edef\io@enc{\encoding@value{latin9}}
\DeclareInputText{164}{"20AC}
\DeclareInputText{166}{"160}
\DeclareInputText{168}{"161}
\DeclareInputText{180}{"17D}
\DeclareInputText{184}{"17E}
\DeclareInputText{188}{"152}
\DeclareInputText{189}{"153}
\DeclareInputText{190}{"178}

The code above defines the latin9 (iso-8859-15) encoding. It is very similar to latin1, but defines the Euro sign at position 164. Defining 256 characters per encoding using this method is inefficient. For this reason, the package also contains code like

\input@encoding@val \encoding@value{latin2} -96 160
160 "104 "306 "141 164 "13D "15A 167

As explained above, the command at the start of the line reads 3 integers: an encoding value (here, the encoding of latin2), a byte position and a character value. The byte position must be a number between 0 and 255. Here we use an extension: if a negative number -N has been read, followed by A such that the sum of A and N is at most 256, then N values will be read, and stored at position A and following (here N is 96, and we have shown only the first eight values).
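The two argument forms (a single position, or a negative count followed by a start position) can be sketched as follows. The function name is hypothetical, and raising an exception when the range does not fit is an assumption; the text above only says that the extension applies when A+N is at most 256.

```python
def apply_encoding_args(table, numbers):
    """Consume the integers that follow the encoding number.

    numbers is either [position, value, ...] or [-N, A, v1, ..., vN, ...].
    Returns how many integers were used.
    """
    first = numbers[0]
    if first >= 0:
        if first > 255:
            raise ValueError("Illegal encoding position")
        table[first] = numbers[1]
        return 2
    count, start = -first, numbers[1]
    if start < 0 or start + count > 256:
        raise ValueError("Illegal encoding position")
    values = numbers[2:2 + count]
    if len(values) < count:
        raise ValueError("not enough values")
    table[start:start + count] = values
    return 2 + count

table = list(range(256))
# Only the eight values shown in the text, so the count is 8 here instead of 96.
used = apply_encoding_args(
    table, [-8, 160, 160, 0x104, 0x306, 0x141, 164, 0x13D, 0x15A, 167])
```

After the call, position 161 holds "104 and position 167 is unchanged, exactly as if eight separate assignments had been made.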

5.4. Characters and Accents

There are some commands that put an accent over a letter. You can say a\accent 98 cde, this works in TeX, but not in Tralics: you will get an error, Unimplemented command \accent. The number 98 is read, and converted to an integer. The Unicode character will be used; thus the translated result is `abcde´.

You can say \a´e. This is a command introduced by LaTeX so as to allow accents inside a tabbing. Some care must be taken. If you say \a{par}{b} in LaTeX, you get an error of the form: Paragraph ended before \@changed@cmd was complete. The Tralics error message is: wanted a single token as argument to \a. If you say \a\foo12, there is a single token, and the error is: Bad syntax of \a, argument is \foo. In fact, the token after \a must be a valid accent character. After that, \a´ is handled exactly like \´. You can say `\= U´; the space after the command is ignored. You cannot say `\={ U}´: the space is not removed, and this is an error. In fact, the argument list of the accent command should contain exactly one token (exception: double accents will be explained later). This token should be a character, with code between 0 and 128. Hence \´Ê is wrong, you must say \´{\^E} if you want Ế. The message is Error in accent, command = \´; Cannot put this accent on non 7-bit character É. If the token \i is given, it will be replaced by i, so that \"\i and \"i produce the same result. You can say \=\AE, \=\ae, \´\AE, \´\ae, \´\AA, \´\aa, \´\O, \´\o. The result looks like ǢǣǼǽǺǻǾǿ.

You can put an accent on a letter only in the case where this gives a Unicode character. In the case of \c{a} and \c{\=a}, the error message is the same: Error in accent, command = \c; Cannot put this accent on letter a. Table 1 indicates on which letters you can put an accent. See the html page http://www-sop.inria.fr/apics/tralics/doc-chars.html for a list of some glyphs.
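The rule "an accent is accepted only if the result is a Unicode character" can be approximated with Unicode normalization. The sketch below is not the Tralics algorithm: the mapping from accent commands to combining characters is our assumption, it does not reproduce every entry of Table 1 (for instance Ħ), and, unlike Tralics, it is sensitive to the order of double accents.

```python
import unicodedata

# Accent command name (without backslash) -> combining character (assumed mapping).
COMBINING = {
    "^": "\u0302", "'": "\u0301", "`": "\u0300", '"': "\u0308",
    "~": "\u0303", "=": "\u0304", ".": "\u0307", "u": "\u0306",
    "v": "\u030C", "H": "\u030B", "r": "\u030A", "c": "\u0327",
    "k": "\u0328", "d": "\u0323", "b": "\u0331",
}

def put_accent(accent, letter):
    """Return the precomposed character, or raise if Unicode has none."""
    composed = unicodedata.normalize("NFC", letter + COMBINING[accent])
    if len(composed) != 1:
        raise ValueError(f"Cannot put this accent on letter {letter}")
    return composed
```

Here `put_accent("c", "c")` gives ç, `put_accent("c", "a")` fails because there is no precomposed a with cedilla, and a double accent such as \'{\^A} is obtained by applying the function twice.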

Table 1. All possible accents. You can put an accent on any letter, except Q. You can put accents on non-letters, for instance \ae, see text. Some characters accept two accents. In general, an accent can be put on both the lower case and the upper case letter. There is one exception: you cannot put a dot over a lower case I, because there is already a dot. For h, j, t, w, and y, there are accents that apply only to lowercase letters.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
\^ Ââ Ĉĉ Êê Ĝĝ Ĥĥ Îî Ĵĵ Ôô Ŝŝ Ûû Ŵŵ Ŷŷ Ẑẑ
\´ Áá Ćć Éé Ǵǵ Íí Ḱḱ Ĺĺ Ḿḿ Ńń Óó Ṕṕ Ŕŕ Śś Úú Ẃẃ Ýý Źź
\` Àà Èè Ìì Ǹǹ Òò Ùù Ẁẁ Ỳỳ
\" Ää Ëë Ḧḧ Ïï Öö Üü Ẅẅ Ẍẍ Ÿÿ
\c Çç Ḑḑ Ȩȩ Ģģ Ḩḩ Ķķ Ļļ Ņņ Ŗŗ Şş Ţţ
\u Ăă Ĕĕ Ğğ Ĭĭ Ŏŏ Ŭŭ
\v Ǎǎ Čč Ďď Ěě Ǧǧ Ȟȟ Ǐǐ ǰ Ǩǩ Ľľ Ňň Ǒǒ Řř Šš Ťť Ǔǔ Žž
\~ Ãã Ẽẽ Ĩĩ Ññ Õõ Ũũ Ṽṽ Ỹỹ
\H Őő Űű
\k Ąą Ęę Įį Ǫǫ Ųų
\. Ȧȧ Ḃḃ Ċċ Ḋḋ Ėė Ḟḟ Ḣḣ İ Ŀŀ Ṁṁ Ṅṅ Ȯȯ Ṗṗ Ṙṙ Ṡṡ Ṫṫ Ẇẇ Ẋẋ Ẏẏ Żż
\= Āā Ēē Ḡḡ Ħħ Īī Ōō Ŧŧ Ūū Ȳȳ
\r Åå Ůů
\b Ḇḇ Ḏḏ Ḵḵ Ḻḻ Ṉṉ Ṟṟ Ṯṯ Ẕẕ
\d Ạạ Ḅḅ Ḍḍ Ẹẹ Ḥḥ Ịị Ḳḳ Ḷḷ Ṃṃ Ṇṇ Ọọ Ṛṛ Ṣṣ Ṭṭ Ụụ Ṿṿ Ẉẉ Ỵỵ Ẓẓ
\f Ȃȃ Ȇȇ Ȋȋ Ȏȏ Ȓȓ Ȗȗ
\C Ȁȁ Ȅȅ Ȉȉ Ȍȍ Ȑȑ Ȕȕ
\T Ḛḛ Ḭḭ Ṵṵ
\V Ḓḓ Ḙḙ Ḽḽ Ṋṋ Ṱṱ Ṷṷ
\D Ḁḁ
\h Ảả Ẻẻ Ỉỉ Ỏỏ Ủủ Ỷỷ

Some accents are not standard. Examples:

If in the table you see `I´ instead of `x´, this means that the accent applies only to capital I. If you see h, j, t, w or y, this applies only to the lower case letter. Otherwise the accent applies to both the upper case and the lower case letter.

It is possible to put double accents (for Vietnamese, for instance). The following ones are recognized, for upper and lower case letters; the order of the accents is irrelevant. Inside braces, there is an accent command, optional spaces, and a character (maybe enclosed in braces).

\"{\=U} \"{\=A} \"{\=O} \"{\'U} \"{\'I} \"{\`U} \.{\=A} \.{\=O}
\={\~ O} \k{\=O} \'{\~U} \'{\O} \'{\=O} \'{\=E} \'{\.S} \c{\' C}
\'{\^A} \'{\^O} \'{\^E} \`{\=O} \`{\=E} \d{\=L} \d{\=R}
\`{\^ A} \`{\^ E} \H{\'U} \H{\'O} \H{\`U} \H{\`O} \H{\h U} \H{\h O}
\H{\~U} \H{\~O} \H{\d U} \H{\d O} \d{\^A} \d{\^O} \d{\^E} \~{\^A}
\~{\^O} \~{\^E} \h{\^A} \h{\^O} \h{\^E} \u{\'A} \u{\`A} \u{\h A}
\u{\~A} \u{\d A} \~{\" O} \^{\'O} \^{\`O} \u{\c E} \.{\v S} \.{\d S}

This is the translation.

&#x1E7A; &#x1DE; &#x22A; &#x1D7; &#x1E2E; &#x1DB; &#x1E0; &#x230;
&#x22C; &#x1EC; &#x1E78; &#x1FE; &#x1E52; &#x1E16; &#x1E64; &#x1E08;
&#x1EA4; &#x1ED0; &#x1EBE; &#x1E50; &#x1E14; &#x1E38; &#x1E5C;
&#x1EA6; &#x1EC0; &#x1EE8; &#x1EDA; &#x1EEA; &#x1EDC; &#x1EEC; &#x1EDE;
&#x1EEE; &#x1EE0; &#x1EF0; &#x1EE2; &#x1EAC; &#x1ED8; &#x1EC6; &#x1EAA;
&#x1ED6; &#x1EC4; &#x1EA8; &#x1ED4; &#x1EC2; &#x1EAE; &#x1EB0; &#x1EB2;
&#x1EB4; &#x1EB6; &#x1E4E; &#x1ED0; &#x1ED2; &#x1E1C; &#x1E66; &#x1E68;
Figure 2. Some characters

You can see a representation on figure 2. We show here the HTML output for these characters. Ṻ Ǟ Ȫ Ǘ Ḯ Ǜ Ǡ Ȱ Ȭ Ǭ Ṹ Ǿ Ṓ Ḗ Ṥ Ḉ Ấ Ố Ế Ṑ Ḕ Ḹ Ṝ Ầ Ề Ứ Ớ Ừ Ờ Ử Ở Ữ Ỡ Ự Ợ Ậ Ộ Ệ Ẫ Ỗ Ễ Ẩ Ổ Ể Ắ Ằ Ẳ Ẵ Ặ Ṏ Ố Ồ Ḝ Ṧ Ṩ ṻ ǟ ȫ ǘ ḯ ǜ ǡ ȱ ȭ ǭ ṹ ǿ ṓ ḗ ṥ ḉ ấ ố ế ṑ ḕ ḹ ṝ ầ ề ứ ớ ừ ờ ử ở ữ ỡ ự ợ ậ ộ ệ ẫ ỗ ễ ẩ ổ ể ắ ằ ẳ ẵ ặ ṏ ố ồ ḝ ṧ ṩ. The first character in the list is: latin capital letter u with macron and diaeresis, the second one is latin capital letter a with diaeresis and macron. The order of accents is not the same. For simplicity, in Tralics, this is irrelevant. You can notice that the LaTeX output is strange. First, we have defined \h to be a no-op. For the figure, we used the following code:

\newcommand\hook@above[1]{%
 \leavevmode\setbox0\hbox{#1}\dimen@\ht0 \advance\dimen@.5ex
 \rlap{\raise\dimen@\hbox{\kern2pt\char11}}#1}

This code works, provided that the font has, at position 11, something that looks like a hook (for the T1 encoding, this is a cedilla). In LaTeX you cannot put a \" accent on \=U. No error is signaled; it is just that TeX puts the accent before the accentee in case the accentee is not a character, instead of putting it above. You can say \"{\´U} because \´U is a character in the T1 encoding. The \mathaccent command does not have these limitations. The first character of the figure was composed via

\UnicodeCharacter{x1E7A}{\ensuremath{\ddot{\mbox{\=U}}}}

In German, the umlaut character has a special meaning. The following example shows what can be input. See the babel documentation for details.

\language=2
"a"o"u"e"i"""A"O"U"I"E
"s"z"S"Z"c"C"f"F"l"L"m"M"n"N"p"P"r"R"t"T
"""-"~"|"="`"'"<">

and the Tralics translation.

äöüëïÄÖÜÏË
ßßSSSZckCKffFFllLLmmMMnnNNppPPrrRRttTT
--&#x0201E;&#x0201D;«»

The previous hack does not apply if the double quote character has category code 11 (letter), or appears in a URL or in a file name to be read (for instance, via \includegraphics).

The translation of the dash character is the following. If this character appears in a URL or while reading a file name, it is left unchanged. If its category code is 11 (letter), usually inside a verbatim environment, its translation is a dash followed by a \textnospace, unless you invoke Tralics with the -nozerowidthspace switch, in which case the translation is a single hyphen. Otherwise, a test is made for a ligature: three hyphens in a row produce &#x2014; (mdash), and two hyphens produce &#x2013; (ndash).
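The dash rules can be summarised by the following Python sketch. The function and its parameters are hypothetical; we assume that \textnospace corresponds to the zero-width space U+200B, as suggested by the verbatim output shown later in this section, and the URL and file name cases are left out.

```python
ZERO_WIDTH_SPACE = "\u200b"   # assumed translation of \textnospace

def translate_dashes(text, letter=False, zero_width_space=True):
    """Translate hyphens in running text, or in verbatim-like (letter) mode."""
    if letter:
        # Category code 11: no ligature, optional zero-width space after each dash.
        suffix = ZERO_WIDTH_SPACE if zero_width_space else ""
        return text.replace("-", "-" + suffix)
    # Normal mode: test the longer ligature first.
    return text.replace("---", "\u2014").replace("--", "\u2013")
```

Testing the three-hyphen ligature before the two-hyphen one matters: in the other order, `---` would become an en-dash followed by a hyphen.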

The characters: :;!?«» are handled normally if inside an URL, when translating a filename, when their category code is 11 (letter, typically, inside a verbatim), or when the current language is not French. The translation of « is an opening guillemet with some space after it. If the character that follows is (after expansion), a normal space, or a ~, or a \,, it will be discarded. The translation of » is a space plus the character. If the previous character is a space, it will be removed. (TeX has a primitive \unhskip that can remove a space; the Tralics equivalent works in usual cases). The other four punctuation characters are handled like a closing guillemet. In any case, the space added by these characters is a non-breaking one.
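As an illustration of these spacing rules, here is a minimal Python sketch for plain text in French mode. It is a simplification under stated assumptions: only a normal space or a ~ is discarded after an opening guillemet (the \, case and expansion are ignored), and the added space is always U+00A0.

```python
NBSP = "\u00a0"

def french_punctuation(text):
    """Insert no-break spaces around French punctuation (simplified model)."""
    out = []
    skip_space = False          # True right after an opening guillemet
    for char in text:
        if skip_space and char in " ~":
            skip_space = False
            continue            # the space that follows the guillemet is discarded
        skip_space = False
        if char == "\u00ab":                    # opening guillemet
            out.append(char + NBSP)
            skip_space = True
        elif char in "\u00bb:;!?":              # closing guillemet and high punctuation
            if out and out[-1] == " ":
                out.pop()                       # remove the previous normal space
            out.append(NBSP + char)
        else:
            out.append(char)
    return "".join(out)
```

An empty pair of guillemets yields two no-break spaces between them, one added by each character, which agrees with the translation of the example given in this section.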

The characters `´<> behave in a special manner, in the same cases as the guillemets. In fact, if they are doubled, French guillemets will be used instead. Thus <<foo>> and ``foo'' and «foo» behave the same, if the current language is French. Otherwise, a \textnospace will be added after the character, in the same way as for a dash, namely outside a URL or file name, when the category code is 11 (letter) and the -nozerowidthspace switch has not been given. Example:

\language = 0
test ligatures: <<>>``''-- et --- !?:;
\language=1
test ligatures: <<>>``''-- et --- !?:;
test ligatures:\verb=<<>>``''-- et --- !?:;=

This is the translation

test ligatures: &lt;&lt;&gt;&gt;``''&#x2013; et &#x2014; !?:;
test ligatures : «  »«  »&#x2013; et &#x2014; ! ? : ;
test ligatures :<hi rend='tt'>&lt;&#x200B;&lt;&#x200B;
     &gt;&#x200B;&gt;&#x200B;`&#x200B;`&#x200B;'&#x200B;'&#x200B;-&#x200B;
     -&#x200B; et -&#x200B;-&#x200B;-&#x200B; !?:;</hi>

Conversion into HTML gives test ligatures: <<>>“”– et — !?:; test ligatures : «  »“”– et — ! ? : ; test ligatures :<<>>``''-- et --- !?:;.

The translation of the apostrophe depends on a flag. If Tralics is called with the switch -nostraightquotes, the translation is the same as \textasciiacute, the character U+B4; otherwise it is the quote character U+27. The character is handled normally if inside a URL, when translating a filename, or when its category code is 11 (letter, typically, inside a verbatim). This is the translation of the same example as above, with options -nostraightquotes and -nozerowidthspace. We added option -oe1a; this shows the no-break space as &#xA0;.

test ligatures: &lt;&lt;&gt;&gt;``&#xB4;&#xB4;&#x2013; et &#x2014; !?:;
test ligatures&#xA0;: &#xAB;&#xA0;&#xA0;&#xBB;&#xAB;&#xA0;&#xA0;
  &#xBB;&#x2013; et &#x2014;&#xA0;!&#xA0;?&#xA0;:&#xA0;;
test ligatures&#xA0;:<hi rend='tt'>&lt;&lt;&gt;&gt;``''--&#xA0;et
  &#xA0;---&#xA0;!?:;</hi>

The soul package provides some commands. Example: \ul gives test for ul, \so gives test for so, \st gives test for st, \caps gives test for caps, \hl gives test for hl.

5.5. Verbatim material

We have seen a little example of verbatim code above. It shows that some &#x200B; characters are inserted, this is so that, if the XML file is read, a double dash will not be interpreted as an en-dash. What the \verb command produces is a sequence of characters, whose category codes are 12, except for some, that are of category 11, namely `´-<>~&:;?!«». You can compare this with the LaTeX code, shown in section 2.12: the \@noligs command makes some characters of category code 13, the associated action is: output the character, with a zero kern in front. There is an exception: the space character is replaced by the \nobreakspace token, but this can be changed.

You can say \verb*+x y+ or \verb+ x y+. All characters between the two plus signs are collected. Any character can be used instead of the plus sign (Try \verb*abca and \verb =a= !). In the case where a star is given, spaces are replaced by \textvisiblespace, otherwise by \nobreakspace. You can say \DefineShortVerb\+, after that +foo+ is the same as \verb+foo+. Note that the command must be followed by something like `\+´ or `\*´, i.e., a macro whose name is formed of a single character. You can say \UndefineShortVerb\+, this will undo the previous command. The syntax is the same. If the character fits on 8 bits, the old category code is restored; otherwise, it is set to 12 (other). Note: assume that the input encoding is latin1, but you declare ^^^^abcd as a short verb. When Tralics sees the four hats, it replaces these 8 bytes by a single character, say C, and enters verbatim mode until finding character C. Since this character does not exist in the current environment, it cannot be found directly; since we are in verbatim mode, it cannot be found using the four-hat construction. For this reason an error is signalled when the end of line is reached (an implicit C character is inserted, so that next line will be translated normally).
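The way \verb collects its argument can be sketched as follows. This is only a model of the description above: the function receives the text that follows the \verb token, the two replacement characters are stand-ins for \textvisiblespace and \nobreakspace, and raising an exception for a missing delimiter is our choice.

```python
VISIBLE_SPACE = "\u2423"    # stand-in for \textvisiblespace
NO_BREAK_SPACE = "\u00a0"   # stand-in for \nobreakspace

def parse_verb(source):
    """Return (verbatim text, remaining input) for the text that follows \\verb."""
    starred = source.startswith("*")
    if starred:
        source = source[1:]
    if not source:
        raise ValueError("missing delimiter")
    delimiter, body = source[0], source[1:]
    end = body.find(delimiter)
    if end < 0:
        raise ValueError("verbatim text not terminated on this line")
    space = VISIBLE_SPACE if starred else NO_BREAK_SPACE
    return body[:end].replace(" ", space), body[end + 1:]
```

With this model, the input `*abca` (that is, \verb*abca) uses `a` as delimiter and yields the two characters `bc`.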

In the case where `+´ is a short verb character, you can say \SaveVerb{foo}+\bar+. This has as effect to remember in a private command all tokens that +\bar+ gathers. When you say \UseVerb{foo}, these tokens are re-inserted in the input stream. Example:

\DefineShortVerb\+
\SaveVerb{foo}+\bar +
\UndefineShortVerb\+
\UseVerb{foo}

The transcript file will contain, for the \UseVerb command the following line.

\savedverb@foo ->\verbprefix {\verbatimfont \bar\nobreakspace }

Here, the \ before `b´ is not a command delimiter, for otherwise there would have been a space after \bar. Note: another explanation is that the `b´ is not of category code 11, so that the command is \b; exercise: find all interpretations of this line.

There are various packages that provide a verbatim-like environment. In Tralics, you can define your own via

\DefineVerbatimEnvironment{MyVerbatim}{Verbatim}{xx=yy}

This defines MyVerbatim to be an environment that behaves like Verbatim, which is an extension of the basic verbatim environment that takes some optional parameters (here, the default value of xx is yy). The end of a verbatim environment is defined as a line that contains optional spaces, the \end token, optional spaces, and the name of the environment enclosed in braces. Additional characters on the current line are assumed to be after the verbatim environment.
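The end-of-environment test is easy to express as a regular expression. The sketch below uses a hypothetical helper name; it returns the characters that follow the closing brace, or None if the line does not end the environment.

```python
import re

def split_verbatim_end(line, env_name):
    """If line closes the environment env_name, return the rest of the line."""
    # optional spaces, \end, optional spaces, {name}
    pattern = r"[ \t]*\\end[ \t]*\{" + re.escape(env_name) + r"\}"
    match = re.match(pattern, line)
    if match is None:
        return None
    return line[match.end():]
```

Note that the test is anchored at the start of the line: a line such as `x \end{verbatim}` is ordinary verbatim material.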

In the case of a verbatim environment, all characters on the line are gathered (final spaces disappear, as usual), with category codes as explained above. If this gives an empty list, a no-break space character is added. As in the case of \verb, the \verbatimfont command is prepended. This is defined to be \tt. Moreover, \verbatimprefix is also added in front of the token list. In the case of the \verb command, there is \verbprefix instead. These two commands are defined as \@empty. You can redefine them. Each line is followed by \par and \noindent. If the environment is followed by an empty line, or a \par command, this command is removed, as well as the last \noindent. Example that shows use of the prefix commands:

\DefineShortVerb{\|}
\def\verbatimfont#1{{#1}}
\def\verbprefix#1{A#1A}
\def\verbatimprefix#1{B#1B}
Test: \verb+foo+ and |bar|
\UndefineShortVerb{\|}
\begin{verbatim}
line1
line2
\end{verbatim}

The translation is:

<p>Test: AfooA and AbarA</p>
<p noindent='true'>Bline1B</p>
<p noindent='true'>Bline2B</p>
<p noindent='true'></p>

The Verbatim environment is an extension of the verbatim environment. There is an optional argument, an association list. If you say `numbers=true´, then lines will be numbered (instead of `true´, you can say `left´ or `right´, or anything, the value is ignored). If you say `counter=17´, then lines will be numbered, using counter 17, if you say `counter=foo´, and `foo´ is a counter name, then lines will be numbered, using counter foo. If you say `firstnumber=N´, where N is a number, then lines will be numbered starting from N; if you say `firstnumber=last´, then lines will be numbered incrementing the previous value. The default counter is FancyVerbLine. Other features defined by the fancyvrb package have not yet been implemented.

If a line number M is given, the following piece of code is inserted before the verbatim line: {\verbatimnumberfont{M}}\space. The command \verbatimnumberfont is \let equal to \small at the start of the run. The number is incremented for each line.

Characters after \begin{Verbatim}, but on the same line, are ignored. The same is true if an optional argument is given: all characters that follow the closing bracket of the optional argument are ignored. The opening bracket is only looked for on the current line (unless the end of line character is commented out).

\begin{Verbatim}                   [numbers=true]
TEST
\end{Verbatim}
and without
\begin{Verbatim}
[ok]TEST
\end{Verbatim}
\begin{Verbatim} %
[ok] this is handled as comment
TEST
\end{Verbatim}
 
\def\verbatimfont#1{{\it #1}}
\def\verbatimnumberfont{\large}
\tracingall
\count3=4
\begin{Verbatim}[counter=3]
5,one line
\end{Verbatim}
\begin{Verbatim}[counter=03]
6,one line
\end{Verbatim}
\newcounter{vbcounter}
\setcounter{vbcounter}8
\begin{Verbatim}[counter=vbcounter]
9,one line
\end{Verbatim}
\begin{Verbatim}[counter=vbcounter]
10,one line
\end{Verbatim}

This is the translation.

<p noindent='true'><hi rend='small1'>1</hi> <hi rend='tt'>TEST</hi></p>
<p noindent='true'>and without</p>
<p noindent='true'><hi rend='tt'>[ok]TEST</hi></p>
<p noindent='true'></p>
<p noindent='true'><hi rend='tt'>TEST</hi></p>
<p noindent='true'><hi rend='large1'>5</hi> <hi rend='it'>5</hi>,one line</p>
<p noindent='true'><hi rend='large1'>6</hi> <hi rend='it'>6</hi>,one line</p>
<p noindent='true'></p>
<p noindent='true'><hi rend='large1'>9</hi> <hi rend='it'>9</hi>,one line</p>
<p noindent='true'><hi rend='large1'>10</hi> <hi rend='it'>1</hi>0,one line</p>
<p noindent='true'></p>

Two additional keywords have been added. In order to be compatible, you should add the following code to the TeX document.

\csname define@key\endcsname{FV}{style}{}
\csname define@key\endcsname{FV}{pre}{}

If you say style=foo, then the token \FV@style@foo is added in front of the token list generated by the verbatim environment. If you say pre=bar, then \FV@pre@bar is added before the token list (and before the style token mentioned above), and \FV@post@bar is inserted near the end (to be precise: before the last \par or \par\noindent). For a case like this

\begin{Verbatim}[pre=pre,style=latex,numbers=true]
first line
second line
\end{Verbatim}
third line

the tokens gathered by the verbatim environment, shown in the transcript file in verbose mode, and re-indented in order to make the structure easy to recognise, are

{Verbatim tokens:
 \FV@pre@pre \FV@style@latex
  \par \noindent {\verbatimnumberfont {1}}
      \verbatimprefix {\verbatimfont first\nobreakspace line}
  \par \noindent {\verbatimnumberfont {2}}
      \verbatimprefix {\verbatimfont second\nobreakspace line}
 \FV@post@pre
 \par \noindent }

Assume that the following definitions are given

\def\FV@pre@pre{\begin{xmlelement*}{pre}}
\def\FV@post@pre{\end{xmlelement*}}
\def\FV@style@latex{\XMLaddatt{class}{latex-code}}
%\def\verbatimnumberfont#1{\xbox{vbnumber}{#1}}

Then the translation is

<pre class='latex-code'>
<p noindent='true'>
     <hi rend='small'>1</hi>
     <hi rend='tt'>first&nbsp;line</hi></p>^^J
<p noindent='true'>
     <hi rend='small'>2</hi>
     <hi rend='tt'>second&nbsp;line</hi></p>^^J
</pre><p noindent='true'>third line^^J
</p>

Note: We have re-indented a little bit the code, and marked newline characters by ^^J. As you can see, each verbatim line gives exactly one line in the XML output, and this line is formed of a <p> element. If you apply a style sheet with the following definition

<xsl:template match="p">
  <xsl:choose>
    <xsl:when test="parent::pre">
      <xsl:apply-templates/>
    </xsl:when>
    <xsl:otherwise>
      <p>
        <xsl:if test="@noindent = 'true'">
          <xsl:attribute name="class">nofirst noindent</xsl:attribute>
        </xsl:if>
        <xsl:apply-templates/>
      </p>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

then <p> elements are discarded in a <pre>, and some action is done in case of noindented paragraphs. If moreover the translation of <pre> is defined by the following code

<xsl:template match="pre">
  <pre>
    <xsl:attribute name="class">
      <xsl:value-of select="@class"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </pre>
  <xsl:text>&#x0A;</xsl:text>
</xsl:template>

we get finally

<pre class="latex-code"><small>1</small> <tt>first line</tt>
<small>2</small> <tt>second line</tt>
</pre>
<p class="nofirst noindent">third line</p>

This is not valid HTML, since <small> is forbidden in a <pre>. We can modify the style sheet so that if <hi> is in a <pre>, then a special action is taken in the case rend=´small´; we can also remove the useless <tt>. A better solution: we uncomment the definition of \verbatimnumberfont. This will have as effect that verbatim line numbers will be in a <vbnumber> element, and we can apply the following transformation.

<xsl:template match="vbnumber">
  <span class='prenumber'>
    <xsl:apply-templates/>
  </span>
</xsl:template>

Thus, the HTML code will be

<pre class="latex-code"><span class="prenumber">1</span> first line
<span class="prenumber">2</span> second line
</pre>
<p class="nofirst noindent">third line</p>

This document was converted into HTML using the techniques shown here. The style sheet changes the background color of the <pre> element, according to its class, and the background of the <span> to the background of the page.

Note how the `style´ option of the verbatim environment gives a `class´ attribute in HTML document. If you say

\DefineVerbatimEnvironment{verbatim}{Verbatim}
{listparameters={\topsep0pt },pre=pre}

then verbatim behaves like Verbatim; in other words, an optional argument is scanned. Moreover, the list on the second line will be put in \verbatim@hook; whenever a verbatim environment of type `Something´ is read, the value of the command \Something@hook is considered (this should be undefined or a command that takes no argument), and the tokens are added to the optional argument, before other arguments.

You can say \numberedverbatim or \unnumberedverbatim. After that, verbatim environments will be automatically numbered or not. This does not apply to Verbatim environments.

There is a command \fvset that takes an association list as argument. If it contains `showspaces=true´ or `showspaces=false´, this changes how spaces are interpreted in a verbatim environment or command (except for \verb*, in which case the space is always visible).

5.6. Case change

There are different commands for changing the case of letters. For instance, the translation of

\uppercase{Einstéin: $E=mc^2$}
\lowercase{Einstéin: $E=mc^2$}

is

<p>EINSTÉIN: <formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mi>E</mi><mo>=</mo><mi>M</mi><msup><mi>C</mi> <mn>2</mn> </msup>
</mrow></math></formula>
einstéin: <formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mi>e</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi> <mn>2</mn> </msup>
</mrow></math></formula>
</p>

There are two tables that control these conversions: the lc-table and the uc-table. If the lc value of a character is non-zero, it´s the lowercase equivalent of the character; otherwise, the character is left unchanged by \lowercase. The same is true for the uc-table. You can use \lccode and \uccode for changing these tables. They are initialized like this: for all integers x with value between `a´ and `z´, and between `à´ and `ÿ´, the uc value is x-32 and the lc value is x; the same holds for x-32. There are four exceptions: the pair 215, 247 (the multiplication and division signs), and the pair 223, 255 (ß and ÿ). On the other hand, we used the pair 255, 376 (for ÿ and Ÿ).
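The initialization of the two tables can be modelled in Python as follows. Dictionaries play the role of the tables (a missing key corresponds to the value zero), and the function names are hypothetical.

```python
def build_case_tables():
    """Return (lc, uc) dictionaries following the initialization rule above."""
    lc, uc = {}, {}
    lower_codes = list(range(ord("a"), ord("z") + 1)) + list(range(0xE0, 0x100))
    for x in lower_codes:
        if x in (247, 255):     # exceptions: pairs (215, 247) and (223, 255)
            continue
        lc[x] = lc[x - 32] = x
        uc[x] = uc[x - 32] = x - 32
    # Special pair 255, 376.
    lc[255] = lc[376] = 255
    uc[255] = uc[376] = 376
    return lc, uc

def change_case(text, table):
    """Apply a table; characters without an entry are left unchanged."""
    return "".join(chr(table.get(ord(c), ord(c))) for c in text)

lc, uc = build_case_tables()
upper = change_case("Einst\u00e9in", uc)
```

In this model ß has no entry in the uc table, so it is left unchanged by an uppercase conversion, whereas ÿ is mapped to the character 376.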

You can use the two commands \MakeUppercase and \MakeLowercase. These commands have a regular syntax (in the example that follows, the \expandafter would be useless for \lowercase). They convert letters, as for \uppercase and \lowercase, plus some commands that define some characters. This example shows the list of all the recognised commands.

\def\foo{foo}
\def\List{{abcABCéÉ\foo
     \oe\OE\o\O\ae\AE\dh\DH\dj\DJ\l\L\ng\NG\ss\SS\th\TH}}
\expandafter\MakeUppercase\List
\expandafter\MakeLowercase\List

The translation is

ABCABCÉÉfoo&#x152;&#x152;ØØÆÆÐÐ&#x110;&#x110;&#x141;&#x141;&#x14A;&#x14A;SSSSÞÞ
abcabcééfoo&#x153;&#x153;øøææðð&#x111;&#x111;&#x142;&#x142;&#x14B;&#x14B;ßßþþ

This gives ABCABCÉÉFOOŒŒØØÆÆÐÐĐĐŁŁŊŊSSSSÞÞ and abcabcééfooœœøøææððđđłłŋŋßßþþ.

Since Tralics version 2.9, all commands listed above expand to characters, that have a non-trivial uc/lc pair. Hence, you can say:

\def\foo{foo}
\edef\List{{abcABCéÉ\"y\"Y\foo
     \ij\IJ\oe\OE\o\O\ae\AE\dh\DH\dj\DJ\l\L\ng\NG\ss\SS\th\TH}}
\expandafter\uppercase\List
\expandafter\lowercase\List

This gives ABCABCÉÉŸŸFOOIJIJŒŒØØÆÆÐÐĐĐŁŁŊŊßSSÞÞ, and abcabcééÿÿfooijijœœøøææððđđłłŋŋßSSþþ.

5.7. Simple commands

We consider here some commands that take no arguments. Unless told otherwise, they are not allowed in math mode. A new paragraph is started (via \leavevmode) in vertical mode.

The following commands all start with `text´. They are forbidden in math mode.

The following commands are accepted in text and math mode.

Following commands expand to a Unicode Character.

5.8. The fp package

5.8.1. Introduction

This is an implementation in C++ of the package by Michael Mehlich. It implements fixed point arithmetic in TeX. Each number is formed by a sign, then 18 digits before the point and 18 digits after the point. Since $10^9 < 2^{30}$, four 32-bit integers are sufficient. In the code, we shall sometimes write a number as

$$x = \sum_{i=-6}^{5} b_i B^i = 10^{-18} \Bigl( \sum_{i=0}^{11} c_i B^i \Bigr)$$

where $B=1000$, and $b_i$ and $c_i$ are integers between 0 and 999. This requires 12 integers, instead of 4, but is useful for internal operations. You can say

\FPadd\foo{10}{3.5}
\FPmul\xbar\foo\foo

This will put 13.5 in \foo and 182.25 in \xbar. In verbose mode, you will see that the transcript file contains lines of the form:

{\FPadd}
{\FP@add}
{FPread for \FP@add=+10.}
{FPread for \FP@add=+3.5}

In reality, the first input line is converted into

\FP@add\foo10..\relax3.5..\relax

Most commands follow this scheme. There are some exceptions. You can use \FPprint. This takes one argument and prints (typesets) it. The algorithm is a bit strange: if the argument list is empty, the result is 0. If the argument is 123, or more generally a list of tokens where the first has category code 12 (other), then nothing happens: the argument is translated normally. If the argument is `foo´, the result is `13.5´. More generally, if the first item is a character not of category code 12, the command behaves like \csname. Don´t try constructions like \FPprint{$x^2$}. You can say \FPset{gee}{foo}. The second argument is handled as for \FPprint. The first argument should be a command name, or a sequence of characters that becomes a command name via \csname. The effect of the command is the same as \def\gee{13.5}.

The general mechanism for a command like \FPadd or \FPsincos is to call intermediary commands like \FP@add or \FP@sincos. These read some command names (these must be definable; no check is made, as for \let), then parse numerical arguments, compute results and store the results in the commands. The result is always normalized: trailing zeroes are removed as well as leading zeroes (but at least one digit is returned before the point). If the number is negative, a sign is added. A special case is when the result is boolean. In this case the syntax has the form

\FPiflt{0.21}{0.20} Wrong\else Correct\fi

As a side effect, \ifFPtest is made equivalent to \iftrue or \iffalse. The following line is valid in Tralics, it gives an error in LaTeX.

\iffalse  \FPiflt{0.21}{0.20} \bad\else \badagain\fi \fi

Numbers are read as follows: We assume that \FP@add sees a string that contains two dots and a \relax, see above. This means that you lose if the argument of the user command contains a \relax. Otherwise, we have a list A, a dot, a list B, a dot, a list C, then \relax. As you can see from the \FPmul\xbar\foo\foo example, these quantities are obtained by expanding the argument (here \foo) in a \edef. For some reason quantities \A and \B are expanded again. In a case like

\FPadd\foo{\noexpand\noexpand\noexpand\V}{12}

this gives \V after expansion, this is wrong. In a case like

\def\V{10.2}
\FPadd\foo{\noexpand\V.4}{12}

expansion of A is 10.2, this is equally wrong: After expansion, there should remain only digits in A and B; there can be an optional sign at the start of A: any combination of + and - characters is OK. Note that C, as well as all digits after a space in A or B are ignored. Thus, the following two lines are valid for Tralics, invalid in the TeX case.

\def\V{10 .2}
\FPadd\foo{\noexpand\V.4}{12}
\FPadd\foo{\V.4}{12}
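The number format described in this section (a sign, 18 digits on each side of the point, normalized results) can be illustrated by a small Python model. This is not the C++ code of Tralics: numbers are held as Python integers scaled by 10^18, overflow is not checked, and truncating the product is an assumption.

```python
SCALE = 10 ** 18   # 18 digits after the point

def fp_parse(text):
    """Read an optional run of + and - signs, digits, a point, digits."""
    sign = 1
    while text and text[0] in "+-":
        if text[0] == "-":
            sign = -sign
        text = text[1:]
    whole, _, frac = text.partition(".")
    frac = (frac + "0" * 18)[:18]          # pad or cut to 18 digits
    return sign * (int(whole or "0") * SCALE + int(frac))

def fp_format(value):
    """Normalized output: no useless zeroes, at least one digit before the point."""
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return f"{sign}{whole}." + f"{frac:018d}".rstrip("0")

def fp_add(a, b):
    return a + b

def fp_mul(a, b):
    product = abs(a) * abs(b) // SCALE     # truncation (assumed)
    return product if (a < 0) == (b < 0) else -product

foo = fp_add(fp_parse("10"), fp_parse("3.5"))
xbar = fp_mul(foo, foo)
```

Formatting `foo` and `xbar` gives 13.5 and 182.25, and an integer result such as 1925 is printed with a trailing point, as in the examples of this section.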

5.8.2. The list of all commands

In the list that follows, \C, \Ca, \Cb are command names, and \V, \Va, \Vb are values.

5.8.3. Alternate syntax

The command \FPupn implements a postfix language that allows you to write shorter code. Here is an example

\FPupn\foo{7 20 2 sub 100 2000 - add +}
      \testeq\foo{1925.}
\FPupn\foo{20 2 div 100 2000 / add 3 mul 2 *}
      \testeq\foo{180.}
\def\mthree{-3}%there is no unary minus in this language
\FPupn\foo{ 3 abs mthree abs 3 sgn 10 * mthree sgn 100 * + + +}
      \testeq\foo{-84.}
\FPupn\foo{2 3 min 400 500 max +}
      \testeq\foo{502.}
\FPupn\foo{12.43745678 2 round 12.35745678 2 trunc -}
      \testeq\foo{-0.09}
\FPupn\foo{e 1.2  exp + 2.3 ln + 3 4 pow + 5 6 root +}
      \testeq\foo{72.302276955235951659}
\FPupn\foo{pi 0.7 - sin cos sincos - tan cot tancot +}
      \testeq\foo{-2.894412996263821897}
\FPupn\foo{0.3 arcsin 0.1 * arccos 0.1 * arcsincos -
           arctan arccot arctancot -}
      \testeq\foo{0.751779218345560029}
\FPupn\foo{3.4 seed random}
       \testeq\foo{0.000023479107778276}
\FPupn\foo{1.1 2.3 3.4 pop swap copy add sub}
      \testeq\foo{0.1}

The \testeq command can be used to test the code. It is an error if the two arguments are not the same. Some comments: consider the last expression. We put 1.1, 2.3 and 3.4 on the stack. After that we pop an item, then we swap. The stack now holds 1.1 (top of the stack), followed by 2.3. Then we duplicate the top of the stack, and we add. The top of the stack is now 2.2. After subtraction, we get 0.1. If you say `2 3 -´, the result is 1, because - and sub use their arguments in a different order. The same is true for / and div. Note the order of 10 2 pow: this gives 1024. If strange words are seen, like `mthree´, they are replaced by \mthree. Note that `e´ and `pi´ are predefined.
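
The operand order can be checked with a few one-line stack programs. This is a minimal sketch that relies only on the statements above; the values in the comments are derived from those statements (assuming that sub and div mirror - and /, as the first two examples suggest) and have not been checked against a particular version of Tralics.

\FPupn\foo{2 3 -}    % top minus second: 3 - 2, so 1 according to the text
\FPupn\foo{2 3 sub}  % second minus top: 2 - 3
\FPupn\foo{6 3 /}    % top divided by second: 3 / 6
\FPupn\foo{6 3 div}  % second divided by top: 6 / 3
\FPupn\foo{10 2 pow} % top to the power second: 2^10, so 1024 according to the text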

If you don´t like postfix language, you can use \FPeval. Here are some examples.

\FPeval\foo{1000-100-10-1}
   \testeq\foo{889.}
\FPeval\foo{1000+100+10+1}
   \testeq\foo{1111.}
\FPeval\foo{1000-100+10+1}
   \testeq\foo{911.}
\FPeval\foo{1000+100-10+1}
   \testeq\foo{1091.}
\FPeval\xfoo{(20 - 2) + (2000-100) + 7}
   \testeq\xfoo{1925.}
\FPeval\xfoo{(20/2 + 2000/100)*3*2}
   \testeq\xfoo{180.}
\FPeval\foo{210/2/3/5}
   \testeq\foo{7.}
\FPeval\foo{210*2/3/5}
   \testeq\foo{28.}
\FPeval\foo{210/2*3/5}
   \testeq\foo{63.}
\FPeval\foo{210/2/3*5}
   \testeq\foo{175.}
\FPeval\foo{210*2*3/5}
   \testeq\foo{252.}
\FPeval\foo{210*2/3*5}
   \testeq\foo{700.}
\FPeval\foo{210/2*3*5}
   \testeq\foo{1575.}
\FPeval\xfoo{abs(3) + abs(-3) + (sgn(3)* 10) + (sgn(-3) * 100)}
   \testeq\xfoo{-84.}
\FPeval\xfoo{min(2:3) + max(400,500)}
   \testeq\xfoo{502.}
\FPeval\xfoo{round(12.43745678,2) -  trunc(12.35745678, 2)}
   \testeq\xfoo{0.09}
\FPeval\xfoo{e + exp(1.2)  + ln(2.3) + pow(3, 4) + root(5, 6)}
   \testeq\xfoo{72.302276955235951659}
\FPeval\xfooa{sin(cos(sin(0.7 - pi))) - cos(cos(sin(0.7 - pi)))}
\FPeval\xfoo{tan (cot(tan(xfooa))) + cot(cot(tan(xfooa)))}
   \testeq\xfoo{-2.894412996263821897}
\FPeval\xfooa{arcsin (arccos (arcsin(0.3)*0.1)*0.1) -
               arccos (arccos (arcsin(0.3)*0.1)*0.1)}
\FPeval\xfoo{arctan(arccot(arctan(xfooa))) - arccot(arccot(arctan(xfooa)))}
   \testeq\xfoo{0.751779218345560029}
\FPeval\foo{2+3*4+5*3^2}
   \testeq\foo{58.999999999999998665}
\FPeval\foo{3^2*5+4*3+2+1}
   \testeq\foo{59.999999999999998665}
\FPeval\foo{(+3+4)*(-5-6)}
   \testeq\foo{-77.}

If you wonder what happens, you can look at the transcript file. You can see something like:

{\FPeval}
{FPpostfix 1  2  3  mul add  400 500  max  sin 4  pow add}
{\FP@upn}
{FPupcmd ??}
{FPupcmd ??}
{FPupcmd ??}
{FPupcmd mul}
{FPread for \FP@upn=+3.}
{FPread for \FP@upn=+2.}
{FPupcmd add}
{FPread for \FP@upn=+6.}
{FPread for \FP@upn=+1.}
{FPupcmd ??}
{FPupcmd ??}
{FPupcmd max}
{FPread for \FP@upn=+500.}
{FPread for \FP@upn=+400.}
{FPupcmd sin}
{FPread for \FP@upn=+500.}
{FPupcmd ??}
{FPupcmd pow}
{FPread for \FP@upn=+4.}
{FPread for \FP@upn=-0.467771805322476126}
{FPupcmd add}
{FPread for \FP@upn=+0.522845423476396576}
{FPread for \FP@upn=+7.}
{FPread for \FP@upn=+7.522845423476396576}

The second line is the expression converted from infix to postfix. Each `??´ represents a string that does not start with a letter. This is generally a number.
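
As a hedged illustration, an infix expression of the following shape would plausibly yield the postfix line shown in the transcript. The input actually used by the author is not given, so this reconstruction is an assumption. Reading the postfix string from left to right, the operands of each operator appear before it, and the six numbers account for the six `??´ entries.

\FPeval\foo{1 + 2*3 + pow(sin(max(400,500)), 4)}
% assumed postfix form: 1 2 3 mul add 400 500 max sin 4 pow add
% the same computation written directly in the postfix language:
\FPupn\foo{1 2 3 mul add 400 500 max sin 4 pow add}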

5.9. Action before translation

Normally, translation applies only to what is between \begin{document} and \end{document}. This is a very special environment; in fact, it leaves the semantics stack pointer unchanged. There are two hooks. You can say

\AtBeginDocument{\foo}
\AtEndDocument{\xbar}

These commands remember the tokens in a special list, that is inserted in the input stream when \begin{document} or \end{document} is seen. After that, the meaning of the command changes: it becomes `evaluate now´, more precisely \@firstofone. The last action in the begin-document hook is to change the definition again, so that an error may be signaled, for instance Can be used only in preamble: \AtBeginDocument. On the other hand, the \end{document} command inserts a special marker that closes every open file, thus stopping translation at the end of the hook (the bibliography is translated after that). The command \@onlypreamble takes as argument a command name and adds it to the list of commands that become invalid after the preamble.
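
A minimal sketch of how the two hooks and \@onlypreamble fit together. The command \mysetup is a hypothetical name introduced for this illustration, and the messages are arbitrary; the comments restate the behaviour described above.

\documentclass{article}
\makeatletter
\newcommand\mysetup{\typeout{Setup done}}
\@onlypreamble\mysetup                  % \mysetup becomes invalid after the preamble
\makeatother
\AtBeginDocument{\typeout{Begin hook}}  % remembered, inserted at \begin{document}
\AtEndDocument{\typeout{End hook}}      % remembered, inserted at \end{document}
\mysetup                                % allowed here, in the preamble
\begin{document}
Text
% \AtBeginDocument{\typeout{Too late}}  % here an error would be signaled
\end{document}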

Before the beginning of the document, you can use commands of the form

\documentclass[doc-opt]{doc-class}
\usepackage[pack-opt]{pack-name}

There are some differences with LaTeX, see next section. If `doc-opt´ contains `useallsizes´ this is the same as if a line in the configuration file has said to use all font sizes. If it contains `french´ or `english´, this defines the default language.

Before version 2.9, the name or options of the class could indicate the top-level section; for instance, book assumed `leadingpart´ and report assumed `leadingchapter´; these keywords are no longer recognised. You have to say \toplevelsection{\part} in the class file if you want it to be `part´. The default is `part´ for a book, `chapter´ for a report, `section´ otherwise. If the top-level section is part, chapter, or section, the translation of \subsection is, respectively, a <div3>, <div2> or <div1> element. Moreover, an attribute pair chapters=`true´ or part=`true´ is added to the main element, so that a post-processor can decide that <div1> is subsection, section or chapter.
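
A minimal sketch of a class file that selects the top-level section explicitly. The class name myreport and its date are hypothetical; only \toplevelsection and the mapping in the comments come from the text above.

% file myreport.clt (hypothetical)
% top-level section:           part     chapter   section
% translation of \subsection:  <div3>   <div2>    <div1>
\ProvidesClass{myreport}[2008/04/01 v1.0 hypothetical class for Tralics]
\toplevelsection{\chapter}  % \subsection should then be translated as <div2>
\ProcessOptions\relax
\endinput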

If the packages `calc´, `fp´ or `fancyhdr´ are loaded, then the meaning of some commands changes, as explained elsewhere. If the `babel´ package is loaded, the following languages are recognized: english, american, british, canadian, UKenglish, USenglish (these have number 0), french, francais, frenchb, acadian, canadien (these have number 1), austrian, german, germanb, naustrian, ngerman (these have number 2). The first language in the list is the default language. If a package is named `french´ or `frenchle´ or `german´, the default language is also set. The default language can be used in the attribute list of the main document element. Setting the default language also sets the current language (value of \language).
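
As an illustration of the language rules, here is a hedged sketch of a preamble. The comments follow the description above, in which the first language of the option list is the default; they are not the result of an actual run.

\documentclass{article}
\usepackage[german,english]{babel}
% german (number 2) is first in the list, so it should become the default
% language, and \language should be set accordingly
\begin{document}
Text
\end{document}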

Note that the current version of the babel package accepts 63 options, which are language names, and if you specify an option not in the list, for instance `foo´, then foo.ldf is loaded if possible, so that the number of options could be greater. There are three other options that are recognised by certain languages and whose purpose is to make some characters active. We have shown in section 5.4 how Tralics handles double quotes in German. For instance, in Spanish, if option `activeacute´ is given then ´a is a shorthand for \´a, and this applies to 12 other characters. In Catalan, you can also use option `activegrave´; this makes `a a shorthand for \`a, and it applies only to A, E and O. Finally, option `KeepShorthandsActive´ controls whether shorthands are activated by default. The `french´ package no longer exists; there are two versions, `frenchpro´ (commercial) and `frenchle´ (free); there are two versions of the package for German, `german´ and `ngerman´. All features of these packages can be found in babel (possibly with differences in the syntax). These packages have no options.

The `calc´ and `fancyhdr´ packages have no options. The `fp´ package has two options, `debug´ and `nomessages´ that are ignored by Tralics.

The standard configuration file contains lines like these:

  on package loaded calc CALC = "true"
  on package loaded foo/bar FOO1 = "true"
  on package loaded *foo/bar FOO2 = "true"
  on package loaded foo/*bar FOO3 = "true"
  on package loaded *foo/*bar FOO4 = "true"

You can also say

  on_package_loaded calc CALC = "true"
  on_package_option calc CALC = "true"
  on_class_option article CALC = "true"
  on class option */* CALC = "*+"

Before version 2.8, these lines of code provoked some actions. They are now ignored. You should use classes and packages instead.

5.10. Classes and packages

We explain in this section how Tralics implements packages and classes.

Assume that we have a file named myclass.clt, whose content is given here and will be explained later:

1 \ProvidesClass{mypackage}[2006/08/19 v1.0 myclass document class for Tralics]
2 \NeedsTeXFormat{LaTeX2e}[1995/12/01]
3 \DeclareOption{a}{\typeout{option A}}
4 \DeclareOption{b}{\typeout{option B}}
5 \DeclareOption{d}{\typeout{option D}}
6 \AtEndOfClass{\typeout{End of class}}
7 \typeout{Before execute options}
8 \ExecuteOptions{a}
9 \ProcessOptions\relax
10 \endinput

and a file named mypack.plt, containing this single line:

11 \ProvidesPackage{mypack}[2006/10/10 My package]

a file named mypack1.plt with

\ProvidesPackage{mypack1}[2006/10/10 My package]
\typeout{Loading file mypack1}
\DeclareOption{x}{}
\DeclareOption{y}{}
\DeclareOption{z}{}
\ProcessOptions \relax
\endinput

and finally a file named mypack2.plt

12 \ProvidesPackage{mypack2}[2006/10/10 My package]
13 \DeclareOption{e}{\typeout{Option E}}
14  
15 \@ifpackageloaded{mypack}
16   {\typeout{Seen package mypack}}
17   {\typeout{Package mypack missing}}
18  
19 \@ifpackagelater{mypack}{2006/11/11}
20   {\typeout{Seen good package mypack}}
21   {\typeout{Package mypack obsolete}}
22  
23 \@ifpackagewith{mypack1}{x}
24   {\typeout{Seen mypack with x}}
25   {}
26 \@ifpackagewith{mypack1}{x,y}
27   {\typeout{Seen mypack with x and y}}
28   {}
29 \@ifpackagewith{mypack1}{x,y,z}
30   {\typeout{Seen mypack with x, y and z}}
31   {}
32  
33 \ProcessOptions\relax
34 \endinput

Assume that we have a source document containing the following lines

35 \AtBeginDocument{\typeout{Begin Document}}
36 \documentclass[a,b,c,e]{myclass}[2007/03/05]
37 \typeout{In preamble}
38 \usepackage{mypack}
39 \usepackage[y,x,w]{mypack1}
40 \usepackage{mypack2}[2000/00/00]
41 \usepackage[y,x,w]{mypack1}
42 \usepackage[aa,bb]{mypack1}
43  
44 \begin{document}
45 Text
46 \end{document}

When Tralics translates the document above, you will see

47 This is tralics  2.11.7, a LaTeX to XML translator
48 Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
49 Licensed under the CeCILL Free Software Licensing Agreement
50 Starting translation of file toto.tex.
51 Warning: class myclass claims to be mypackage.
52 Document class: mypackage 2006/08/19 v1.0 myclass document class for Tralics
53 Before execute options
54 option A
55 option A
56 option B
57 Warning: You have requested, on line 3, version
58 `2007/03/05' of class myclass,
59 but only version
60 `2006/08/19 v1.0 myclass document class for Tralics' is available
61 End of class
62 In preamble
63 Loading file mypack1
64 Unknown option `w' for package `mypack1'
65 Seen package mypack
66 Package mypack obsolete
67 Seen mypack with x
68 Seen mypack with x and y
69 Option E
70 Option clash in \usepackage mypack1
71 Old options: y,x,w.
72 New options: aa,bb.
73 Tralics Warning: Unused global option:
74     c.
75 Begin Document

The \documentclass command has three arguments: an optional one that defines the class options, a required one that defines the class name, and an optional one that indicates a date. Tralics reads 8 digits, with some separators, but LaTeX is a bit more demanding: four digits for the year, two digits for the month, two for the day, with slashes as separators, see above.

Evaluating the command is complicated. In fact, Tralics reads the file with extension `.clt´ (instead of `.cls´), in either the current directory or the directory containing other configuration files, compares dates, and evaluates options. The behavior of \usepackage is similar. There are two differences. The first is that the mandatory argument of \usepackage can contain a comma-separated list of files; the second is that class options that are not used by the class can be used by packages. These options are called global options.

The \usepackage command must be used after \documentclass and before \begin{document}; there is a synonym, \RequirePackage, that can be used before the \documentclass (this subtlety is not implemented in Tralics, both commands are always defined the same). There is also \LoadClass; this behaves like \documentclass, with some exceptions: for instance, the options of this command are not global options; there has to be a single \documentclass (LaTeX has additional requirements).

The two commands \LoadClassWithOptions and \RequirePackageWithOptions behave the same as the commands without the `WithOptions´ but they take only two arguments: you give only a file name and maybe a date, you do not give options, because current options are used. Finally, \InputClass is a command defined by Tralics, that behaves like \input, but: the file (with extension .clt) is looked at in the same place as class files, and it can contain option declarations that apply to the current class (outside a class or package, you cannot declare options).
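
A hedged sketch of a class built on top of another one. The file name myderived.clt and the option draft are hypothetical; myclass is the class shown earlier in this section. Only one of the two loading commands would be used in practice.

% file myderived.clt (hypothetical)
\ProvidesClass{myderived}[2008/04/01 v1.0 hypothetical derived class]
\DeclareOption{draft}{\typeout{option draft}}
\ProcessOptions\relax
% Either load the base class with an explicit option list
% (these options are not global options) ...
\LoadClass[a,b]{myclass}
% ... or reuse the options given to the current class:
% \LoadClassWithOptions{myclass}[2006/08/19]
\endinput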

The class file should contain an identification line. This is like line 1 above, starting with the command \ProvidesClass. You can also use \ProvidesPackage, the behavior is the same. You can also use \ProvidesFile; in this case the identification line is printed on the transcript file, nothing more happens. In the case of the two other commands, the line is printed on the transcript file (in the case of a class, on the terminal as well, see transcript, line 52), and the date is parsed and remembered. The argument of the command should be the same as the file name, or else a warning is printed (line 51).

You can use the commands \AtEndOfPackage or \AtEndOfClass. These commands take an argument, whose content is added to the list of commands to be executed at the end of the class or package. In fact, when the end of file is seen, Tralics will insert and evaluate these tokens (example line 61); moreover a warning will be issued if there are options and the package does not process them (either via \ProcessOptions or via \PassOptionsToPackage). Finally, a warning will be issued if the class or package is obsolete, i.e., earlier than the date argument of the \usepackage or \documentclass command (lines 57 to 60).

The commands \@ifclassloaded or \@ifpackageloaded take three arguments, P, A and B; they evaluate the token list A in case the class or package P is loaded, the token list B otherwise (example line 65). The commands \@ifclasslater or \@ifpackagelater take four arguments, P, D, A and B; they evaluate the token list A in case the class or package P is loaded with a date more recent than D, the token list B otherwise (example line 66). The commands \@ifclasswith or \@ifpackagewith take four arguments, P, L, A and B; they evaluate the token list A in case the class or package P is loaded with options L, the token list B otherwise (example lines 67, 68). The order of elements in L is irrelevant, and the test is also true if the package has been loaded with additional options.

The two commands \PassOptionsToClass and \PassOptionsToPackage take two arguments: an option list and a class name (or a package name); they add the options to the list of options of the class or package (this is useless if the class or package is not loaded later).
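
A minimal sketch of passing options to a package that is loaded later, using the mypack1 file shown earlier (it declares options x, y and z). The command is spelled \PassOptionsToPackage, as in LaTeX; the class name myclass2 is hypothetical.

% file myclass2.clt (hypothetical)
\ProvidesClass{myclass2}[2008/04/01 v1.0 hypothetical class]
\PassOptionsToPackage{x,y}{mypack1}  % added to the option list of mypack1
\ProcessOptions\relax
\RequirePackage{mypack1}             % loaded later, so the options are used
\endinput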

The command \DeclareOption takes two arguments A and B, where A is an option name and B a token list, the action associated to the option. If the option is processed, this list is evaluated. The command \ExecuteOptions takes a list of options as argument, and processes them in the given order (it processes only options defined in the current file). In the transcript given above, line 54 is the result of such an action. The command \ProcessOptions executes all options relevant to the current file. In the example, we have four class options, a, b, c, and e. The class defines a, b, and d; you see the result on lines 55 and 56. Nothing special happens for an option like d that is defined in the class but never referenced. Undefined package options generally provoke a warning (line 64). Unused class options (like c and e) become global options. In fact, option e is defined in `mypack2´, see transcript line 69. Thus, we have a single unused global option (see lines 73 and 74). Note the order of evaluation: if a star follows \ProcessOptions, the order is defined by the user (main document), otherwise by the system (class or package). The LaTeX documentation claims that reading this star may provoke expansion of commands that follow, hence advises using \relax in the case where no star is given.
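
The order of evaluation can be illustrated with a hedged sketch. The class name ordertest is hypothetical, and the expected order given in the comments is derived from the rule stated above rather than from an actual run.

% file ordertest.clt (hypothetical)
\ProvidesClass{ordertest}[2008/04/01 v1.0 hypothetical class]
\DeclareOption{a}{\typeout{option A}}
\DeclareOption{b}{\typeout{option B}}
\ProcessOptions\relax   % order of the declarations: A, then B
% \ProcessOptions*      % order of the main document: B, then A
\endinput

% main document
\documentclass[b,a]{ordertest}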

If you say \DeclareOption*{\foo}, the \foo command is applied to every option not specified elsewhere. The name of the option will be in \CurrentOption. You can use \OptionNotUsed if you want to say that an option is not used. The command \NeedsTeXFormat takes a normal argument and an optional argument, but Tralics does nothing with them. The command \default@ds is an alias for \OptionNotUsed. You should not use it.
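
A minimal sketch of a default option handler. The class name mywrapper and the message are hypothetical; the handler only reports the option name held in \CurrentOption and then declares the option unused.

% file mywrapper.clt (hypothetical)
\ProvidesClass{mywrapper}[2008/04/01 v1.0 hypothetical wrapper class]
\DeclareOption{a}{\typeout{option A}}
% applied to every option that is not declared above
\DeclareOption*{\typeout{Unknown option \CurrentOption}\OptionNotUsed}
\ProcessOptions\relax
\endinput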

We give now an extract of the transcript file. Notice that, usually, the optional date argument of the \usepackage command is omitted, and Tralics reads the start of the next line in order to see if it is there. For instance, the first token on input line 6 is read in order to see if it is an open bracket (transcript line 118, 119). The command is evaluated later (transcript line 130). Note that Tralics restores a variable, cur_file_pos, this is the index of the current file in the list of packages or classes; it is zero outside a class or package. You can also see that the category code of the `@´ character is set to 11 (letter) at the start of the file, and restored later.

Read carefully lines 99 to 102. In non-verbose mode, only line 101 is printed. This line says that options a, b, a, and b are executed. In fact, if the package defines options A and B, uses options a and b, and we have global options x, y, there is a first pass; we mark all options from the list A, B that are in the list a, b, x, y (see transcript, lines 99 and 100, or 145, 146); when an option is marked, its code is put in the to-do list, and its code is removed. As a side effect, options are executed only once (on the other hand \ExecuteOptions leaves the option unchanged, so that the code may be executed more than once; once options are processed, executing them is a no-op). If you say \ProcessOptions*, the loop is on the global option list x, y, and options are marked if they are in A, B. There is then a second pass on the list a, b. If the element is in the list A, B, it will be executed (in the case where there is no star in the command, the option has already been used; otherwise it will be added to the to-do list; notice how this defines the order of evaluation of the options). If the element is not in the list, a fallback behaviour is used; if \DeclareOption* {\foo} has been issued, then \def \CurrentOption {a} \foo will be added to the to-do list. Otherwise a warning is printed in the case of a package (see line 147), the option is added to the list of global options otherwise. The to-do list is executed at the end of this second loop.

76 [3] \documentclass[a,b,c,e]{myclass}[2007/03/05]
77 {\documentclass}
78 ++ file myclass.clt exists.
79 ++ Input stack ++ 1 myclass.clt
80 ++ Made @ a letter
81 ++ Opened file myclass.clt; it has 15 lines
82 [1]
83 [2] \ProvidesClass{mypackage}[2006/08/19 v1.0 myclass document class for Tralics]
84 Warning: class myclass claims to be mypackage.
85 Document class: mypackage 2006/08/19 v1.0 myclass document class for Tralics
86 [3] \NeedsTeXFormat{LaTeX2e}[1995/12/01]
87 [4]
88 [5] \DeclareOption{a}{\typeout{option A}}
89 [6] \DeclareOption{b}{\typeout{option B}}
90 [7] \DeclareOption{d}{\typeout{option D}}
91 [8] \AtEndOfClass{\typeout{End of class}}
92 [9] \typeout{Before execute options}
93 Before execute options
94 [10] \ExecuteOptions{a}
95 {Options to execute->a}
96 {Options code to execute->\typeout{option A}}
97 option A
98 [11] \ProcessOptions\relax
99 Marking option a
100 Marking option b
101 {Options to execute->a,b,a,b}
102 {Options code to execute->\typeout{option A}\typeout{option B}}
103 option A
104 option B
105 [12] \endinput
106 ++ End of file myclass.clt
107 ++ Catcode of @ restored to 12
108 Warning: You have requested, on line 3, version
109 `2007/03/05' of class myclass,
110 but only version
111 `2006/08/19 v1.0 myclass document class for Tralics' is available
112 ++ cur_file_pos restored to 0
113 ++ Input stack -- 1 myclass.clt
114 {\typeout}
115 End of class
116 [4] \typeout{In preamble}
117 In preamble
118 [5] \usepackage{mypack}
119 [6] \usepackage[y,x,w]{mypack1}
120 ++ file mypack.plt exists.
121 ++ Input stack ++ 1 mypack.plt
122 ++ Made @ a letter
123 ++ Opened file mypack.plt; it has 1 lines
124 [1] \ProvidesPackage{mypack}[2006/10/10 My package]
125 Package: mypack 2006/10/10 My package
126 ++ End of file mypack.plt
127 ++ Catcode of @ restored to 12
128 ++ cur_file_pos restored to 0
129 ++ Input stack -- 1 mypack.plt
130 {\usepackage}
131 [7] \usepackage{mypack2}[2000/00/00]
132 ++ file mypack1.plt exists.
133 ++ Input stack ++ 1 mypack1.plt
134 ....
135 {\usepackage}
136 ++ file mypack2.plt exists.
137 ++ Input stack ++ 1 mypack2.plt
138 ++ Made @ a letter
139 ++ Opened file mypack2.plt; it has 27 lines
140 ...
141 [9] \usepackage[aa,bb]{mypack1}
142 {\usepackage}
143 [10]
144 Option clash in \usepackage mypack1
145 Old options: y,x,w.
146 New options: aa,bb.
147 {\par}
148 [11] \begin{document}
149 {\begin}
150 {\begin document}
151 +stack: level + 2 for environment
152 {\document}
153 +stack: ending environment document; resuming document.
154 +stack: level - 2 for environment
155 +stack: level set to 1
156 ++ Input stack ++ 1 (AtBeginDocument hook)
157 [1] \let\do\noexpand\ignorespaces
158 ++ End of virtual file.
159 ++ cur_file_pos restored to 0
160 ++ Input stack -- 1 (AtBeginDocument hook)
161 LaTeX Warning: Unused global option(s):
162     c.
163 atbegindocumenthook= \typeout{Begin Document}\let\AtBeginDocument
164     \@notprerr\let\do\noexpand\ignorespaces

No error is signaled if the class or package does not exist. If you compile the example below with Tralics, no error is signaled; if you compile it with LaTeX, the following errors are signaled:

1 \documentclass{article}
2  
3 \usepackage[foo]{calc}
4 \usepackage[a,b]{xcalc}
5 \usepackage[a]{xcalc}
6 \makeatletter
7 \@ifpackagewith{calc}{foo}{}{\bad}
8 \@ifpackagelater{calc}{2005/12/12}{}{\xbad}
9 \@ifpackageloaded{xcalc}{\ybad}{}
10 \@ifpackagelater{xcalc}{2000/01/02}{\zbad}{}
11 \@ifpackagewith{xcalc}{a}{}{\tbad}
12 \begin{document}
13 \end{document}

Note that \usepackage takes an optional argument, not given here, so that the first four errors are signaled after looking ahead for one token; they correspond to the commands on lines 3 and 4. In the case of Tralics, if a package is not found, the \usepackage declaration is ignored; this explains why \ybad is not called. No error is signaled for unknown options; however, the options are remembered, so that the second \usepackage will not try to load the file again, but checks the options. Builtin packages (calc, fp, fancyhdr, babel, french, frenchle, german) behave in a special way. If no `plt´ file is found, they are remembered in the table with 2006/01/01 as date; since the calc package is v4.1b, dated 1998/07/07, this explains why Tralics and LaTeX disagree when asked whether or not the package is older than December 2005. On the other hand, the default date of a file is 0000/00/00, so that a nonexistent package is never later than a non-null date; the same is true for a package that does not provide a date, or before the identification of the package is evaluated; thus \zbad is not called.

The xkeyval package is an extension to keyval. It provides three extensions, that can be used in a package or a class:

\DeclareOptionX{opF}{\def\opF{#1}}
\DeclareOptionX{Cw}{}
\ExecuteOptionsX{keya,keyb=1}
\ProcessOptionsX \relax

The main document can start with

\documentclass[Cu,Cv,Cw,foo=E,opF={\foobar,gee},unused,Unused=U]{article}
\usepackage[opA,opB=C,epW=5,opC=\foo,opE]{testkeyval}

Let´s make the following assumptions: the testkeyval package defines all keys used in the \ExecuteOptionsX command, and most of the keys used in the \usepackage declaration. You will see the following

xkeyval: Unknown option `epW=5'
xkeyval: Unknown option `opE'

We have shown above that the package defines opF and Cw; these are global class options. The package sees option opF with a value formed of an opening brace, a command, a comma, three characters, and a closing brace. You will see

Tralics Warning: Unused global options
   unused,Unused=U.

This mechanism is neither fully compatible with pure LaTeX, nor with the xkeyval extension, but should work in all practical cases.
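
For concreteness, here is a hypothetical reconstruction of a testkeyval.plt file that is consistent with the messages above. The actual file is not shown in this document: the key names keya and keyb come from the \ExecuteOptionsX line, opF and Cw from the declarations shown earlier, and everything else is an assumption.

% file testkeyval.plt (hypothetical reconstruction)
\ProvidesPackage{testkeyval}[2008/04/01 hypothetical test package]
\RequirePackage{xkeyval}            % assumed to be needed, as in LaTeX
\DeclareOptionX{keya}{}
\DeclareOptionX{keyb}{\def\keyb{#1}}
\DeclareOptionX{opA}{}
\DeclareOptionX{opB}{\def\opB{#1}}
\DeclareOptionX{opC}{\def\opC{#1}}
\DeclareOptionX{opF}{\def\opF{#1}}  % also matches the class option opF={\foobar,gee}
\DeclareOptionX{Cw}{}               % also matches the class option Cw
% epW and opE are not declared, hence the two `Unknown option´ messages
\ExecuteOptionsX{keya,keyb=1}
\ProcessOptionsX\relax
\endinput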

5.11. Expandable commands

We give here the list of all expandable commands, and some examples. Missing in this list are: some commands defined in the next section, all commands defined via \def in the C++ code, see section 6.13.

5.12. Other expandable commands

Not all commands defined here can be expanded, see the list at the end of Chapter 2. Essentially, this describes the ifthen package, the calc package, the newtheorem mechanism, and some input-output commands.

5.13. Other non-expandable commands

5.14. Special commands

We introduced the command \@reevaluate. It takes three arguments A, B and C, and applies A and B on C. The important point here is that C is read as text (not tokenized), so that category code changes are allowed. There is a starred version, in which A, B, C are environments. In fact, arguments A and B are environment names, and C is the content of the current environment (see example for how to use it).

With the following definitions

\makeatletter
\notrivialmath=1
\def\foo#1{x#1x}
\def\xbar{\catcode`\$=12\catcode`\^=12 \ybar}
\def\ybar#1{y#1y}
\newenvironment{wbar}{\catcode`\$=12\catcode`\^=12w}{w}
\newcommand\Fct{\@reevaluate\foo\xbar}
\newenvironment{Env}{\@reevaluate*{center}{wbar}}{}

the translation of

{\Fct{$1^{er}$}}
\begin{Env}$3^{eme}$ \end{Env}

is

<p>x1<hi rend='sup'>er</hi>xy$1^er$y</p>
<p rend='center'>3<hi rend='sup'>e</hi></p>
<p>w$3^eme$ w</p>

This is a part of the transcript file showing the expansion of the command.

[11] {\Fct{$1^{er}$}}
{begin-group character}
+stack: level + 2 for brace
\Fct ->\@reevaluate \foo \xbar
{\@reevaluate}
{Reeval: \foo{$1^{er}$}%
\xbar{$1^{er}$}%
}

This shows the expansion in the case of a starred command. Note that the current environment is terminated; then everything up to \end{whatever} is read.

[12] \begin{Env}$3^{eme}$ \end{Env}
{\begin}
{\begin Env}
+stack: level + 2 for environment
\Env ->\@reevaluate *{center}{wbar}
{\@reevaluate}
+stack: ending environment Env; resuming document.
+stack: level - 2 for environment
{Reeval: \begin{center}$3^{eme}$ \end{center}%
\begin{wbar}$3^{eme}$ \end{wbar}%
}

5.15. Trees

We explain here some commands from the tree-dvips package by Emma Pease. A tree is defined by some nodes and connectors. Each node has a name, whose scope is limited to the current page (Tralics does no validity test for the names). A connector can be attached to the top, bottom, left or right of a node (the abbreviation is one character of `tblr´), or to a corner (two letters, one of `tb´ followed by one of `lr´).

For instance

\node{a}{Value of node A}
\nodepoint{b} \nodepoint{c}[3pt]\nodepoint{d}[4pt][5pt]
\nodeconnect{a}{b}
\nodeconnect[tl]{a}[r]{c}
\anodeconnect{a}{b}
\anodeconnect[tl]{a}[r]{c}
\barnodeconnect[3pt]{a}{d}
\nodecurve{a}{b}{2pt} ?
\nodecurve[l]{a}[r]{b}{2pt}[3pt]
\nodetriangle{a}{b}
\nodebox{a}
\nodeoval{a}
\nodecircle[3pt]{a}

Translation

<node name='a'>Value of node A</node>
<node name='b'/>
<node xpos='3pt' name='c'/><node ypos='5pt' xpos='4pt' name='d'/>
<nodeconnect nameA='a' nameB='b' posA='b' posB='t'/>
<nodeconnect nameA='a' nameB='c' posA='tl' posB='r'/>
<anodeconnect nameA='a' nameB='b' posA='b' posB='t'/>
<anodeconnect nameA='a' nameB='c' posA='tl' posB='r'/>
<barnodeconnect nameA='a' nameB='d' depth='3pt'/>
<nodecurve nameA='a' nameB='b' posA='b' posB='t' depthB='2pt' depthA='2pt'/>?
<nodecurve nameA='a' nameB='b' posA='l' posB='r' depthB='3pt' depthA='2pt'/>
<nodetriangle nameB='b' nameA='a'/>
<nodebox nameA='a'/>
<nodeoval nameA='a'/>
<nodecircle nameA='a' depth='3pt'/>

5.16. Linguistic macros

The gb4e package allows you to input the following (extract of the thesis of C. Romero)

\begin{exe}
\ex \label{agen1}
\gll ... \th et hit er {\bf \textit{ahte}}.\\
     ... that OBJ-it already PRET-possessed.\\
\glt \textit{... that (he) already owned it.} (CMLAMBX1,31.377)
 
\ex \label{agen2}
\gll ... the love that men to hym {\bf \textit{owen}}.\\
     ... the love that SUBJ-men to OBJ-him PRES-owe.\\
\glt \textit{... the love that men owe him.} (CMCTPARS,313.C2.1087)
\end{exe}

The exe environment is used for numbered examples; it is implemented as a list environment, and the \ex command behaves like \item (each item is numbered, the item number is saved in a global counter). The TeX source of the package (as used by Tralics) can be found in the distribution. The non-trivial part in the example above is the \gll command. It takes three lines of text (there is also \glll that takes four lines): the first line is a sequence of words (here in old English), the second line another sequence (translated literally, with possible annotations), and the last line is the translation of the whole, with a bibliographic reference. Words in the first two lines are vertically aligned. The algorithm (by Marcel R. van der Goot) is the following; the list is split into words (a space acts as a word separator), each word is typeset via:

\hbox{#2\strut#3 }% adds space

where #3 is the word, and #2 is \eachwordone for the first line, \eachwordtwo for the second line, and \eachwordthree for the third line (case of \glll). These commands default to \rm. The words are put in a list (a \vbox, argument #1) like this

\setbox#1=\vbox{\hbox{XXX}\unvbox#1}

After that, the two or three lists are merged (the code uses \unvbox and \lastbox in order to get the next element of the list). The command \vtop is used to put two words one above the other, and these boxes are merged together using the following code

\setbox\gline=\hbox{\unhbox\gline \hskip\glossglue \vtop{XXX}}

The glue between the boxes is 0pt plus 2pt minus 1pt (remember that each hbox is terminated by some glue). The Tralics implementation is the following. There are two commands \cgloss@gll and \cgloss@glll written in C++, and the package renames them to \gll and \glll. It is not clear what the translation should be (a list of boxes containing boxes?). In the current implementation, we use a table. This means that the resulting XML is easy to interpret; the only drawback is that we lose line breaks (from the \glossglue). This is the translation of the example.

<list type='description'>
<item id='uid1692' label='650'>
<table rend='inline'><row><cell halign='left'>...</cell>
<cell halign='left'>þet</cell>
<cell halign='left'>hit</cell>
<cell halign='left'>er</cell>
<cell halign='left'><hi rend='bold'/><hi rend='it'>
    <hi rend='bold'>ahte</hi></hi><hi rend='bold'/>.</cell>
</row><row><cell halign='left'>...</cell>
<cell halign='left'>that</cell>
<cell halign='left'>OBJ-it</cell>
<cell halign='left'>already</cell>
<cell halign='left'>PRET-possessed.</cell>
</row></table>
<p noindent='true'><hi rend='it'>... that (he) already owned it.</hi>
     (CMLAMBX1,31.377)</p>
</item>
<item id='uid1693' label='651'>
<table rend='inline'><row><cell halign='left'>...</cell>
<cell halign='left'>the</cell>
<cell halign='left'>love</cell>
<cell halign='left'>that</cell>
<cell halign='left'>men</cell>
<cell halign='left'>to</cell>
<cell halign='left'>hym</cell>
<cell halign='left'><hi rend='bold'/><hi rend='it'>
        <hi rend='bold'>owen</hi></hi><hi rend='bold'/>.</cell>
</row><row><cell halign='left'>...</cell>
<cell halign='left'>the</cell>
<cell halign='left'>love</cell>
<cell halign='left'>that</cell>
<cell halign='left'>SUBJ-men</cell>
<cell halign='left'>to</cell>
<cell halign='left'>OBJ-him</cell>
<cell halign='left'>PRES-owe.</cell>
</row></table>
<p noindent='true'><hi rend='it'>... the love that men owe him.</hi>
     (CMCTPARS,313.C2.1087)</p>
</item></list>
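The table layout above can be imitated with a few lines of Python. This is only a sketch of the idea (one row per gloss line, one cell per word), not the C++ code behind \cgloss@gll; the function name gloss_table is hypothetical, and font changes such as the <hi> elements are not handled.

```python
from xml.sax.saxutils import escape

def gloss_table(*lines):
    """Build one <row> per gloss line and one <cell> per word (plain words only)."""
    rows = []
    for line in lines:
        cells = "".join(
            f"<cell halign='left'>{escape(word)}</cell>" for word in line.split()
        )
        rows.append(f"<row>{cells}</row>")
    return "<table rend='inline'>" + "".join(rows) + "</table>"

print(gloss_table("... þet hit er", "... that OBJ-it already"))
```

Since each row is built independently, words in the same column line up only if all lines have the same number of words.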

5.17. Special parsing rules

In the TeXbook, chapter 24, you will find the definition of <general text>. This rule explains that TeX expects a brace-delimited list of tokens, where the starting brace can be either a character, or a token like \bgroup; it can be preceded by optional spaces and \relax tokens. We give here a list of all cases where this rule can be applied.

6. Running Tralics

6.1. Introduction

There are a number of ways to alter the translation of your TeX document. One solution consists in using a ult file: this is a TeX file that Tralics loads automatically before the main source. The file has the same name as the main source, with a different extension, and is in the same directory.

All other configuration files are searched in a list of directories (default being confdir). There are four such types: files with extensions clt and plt are TeX files that contain code associated with classes and packages (the u in ult stands for user, the other letters are for LaTeX and Tralics).

The file .tralics_rc is known as the default configuration file; its use is considered obsolete. Configuration files of this kind consist of a sequence of subfiles, and a rule for choosing a Type, that is, either a subfile or an external file, for instance ra2007.tcf. The suffix tcf stands for Tralics configuration file; their structure and use are explained here. The default value for the Type is the current document class. In the description of command line arguments below, some options are marked `Raweb only´; this means that they are meaningful only when the Type (after removal of trailing digits) is ra.

The tcf file defines the DOCTYPE: this is the second line of the XML output; if the doctype is foo+bar.dtd, this means that the dtd file is bar.dtd and the root element is <foo>. The DOCTYPE can also be given as a command line argument or in the TeX source using a special syntax.

The tcf file may contain a sequence of assignments. Some of them control the attributes of the root element, but in general they alter the name of XML elements and attributes. These names can also be given as command line argument, or in the TeX source.

The tcf file may contain some TeX code. In fact, the file ra.tcf contains code that ought to be in ra.clt, and exists only for historical reasons.

Finally, a tcf file can contain a TitlePage block: this is a description of how commands like \maketitle can be translated using meta-data (title, author, keywords, etc) defined earlier.

6.2. The command line arguments

If you call Tralics without arguments, you will see something like

This is tralics 2.11.7, a LaTeX to XML translator, running on macarthur
Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
Licensed under the CeCILL Free Software Licensing Agreement
Say tralics --help to get some help

In any case, the first three lines are printed. The version number may vary; we shall describe here the behavior of version 2.12 (released in April 2008). Command line arguments are read and interpreted from left to right. If an argument does not start with a hyphen, it is the name of the source file (only one input file is accepted); otherwise it is called an option. Some option names are shown with a hyphen, it is optional (in fact, dashes and underscores are ignored in option names), so that `-help´ and `--help´ are synonyms. Some options take no argument, for instance -version (whose effect is to print the version number and quit); others, for instance -input-file, take an argument. The argument is the character string that follows, preceded by an optional equals sign. Example

tralics -foo = bar gee     #1
tralics -foo= bar gee
tralics -foo =bar gee
tralics -foo=bar gee       #4
tralics -foo = "bar gee"   #5
tralics -foo = bar\ gee
tralics -foo  bar\ gee
tralics -foo = " bar gee"  #8
tralics -foo = \ bar\ gee
tralics -foo  \ bar\ gee
tralics "-foo = bar gee"   #11
tralics -foo\ =\ bar\ gee\

We assume here that a command line interpreter (usually called a shell) reads the line you type, converts it into character strings, finds the executable program associated with the first string, and calls it with all these strings as arguments. There are five arguments on the first line (the first argument is the name of the program; it is currently ignored). We assume here that spaces can be inserted into an argument by either enclosing the string in quotes, or by escaping the space with a backslash, and that characters after a sharp sign are ignored. Assume that -foo is a Tralics option that takes a value; then the previous lines are interpreted as follows.

The first three examples are similar but for spaces around the equals sign. Cases 1 and 4 are equivalent, the argument of -foo is bar, and there is a second option gee. In case 2, the argument of -foo is empty, and there are two options bar, gee. In case 3, the optional equals sign is omitted, hence the argument is =bar, and there is a second option. Thus you should either put no space or two spaces surrounding the equals sign.

Remaining examples show what happens if you put spaces in the argument. In cases 5, 6 and 7, the argument is bar-space-gee. In cases 8, 9, 10 it is space-bar-space-gee. Lines 11 and 12 are the same, except for the trailing space. Since Tralics removes spaces before and after the equals sign, the argument is bar-space-gee (plus space in the last case).
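The twelve cases can be reproduced with a short Python sketch. It is written from the description above, not from the Tralics sources; the function name split_options is hypothetical, and the sketch assumes that every option takes a value, as -foo does here. The input is the list of strings that the shell passes after the program name.

```python
def split_options(argv):
    """Pair each option with its value; return (options, other_strings)."""
    options, others = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        i += 1
        if not arg.startswith("-"):
            others.append(arg)          # not an option
            continue
        if "=" in arg:
            # Name and value are in the same string: spaces around '=' are dropped.
            name, value = arg.split("=", 1)
            name, value = name.rstrip(), value.lstrip()
        else:
            name = arg
            # A string reduced to "=" is the optional equals sign: skip it.
            if i < len(argv) and argv[i] == "=":
                i += 1
            value = argv[i] if i < len(argv) else ""
            i += 1
        # Dashes and underscores are ignored in option names.
        options[name.replace("-", "").replace("_", "")] = value
    return options, others

print(split_options(["-foo", "=", "bar", "gee"]))   # case 1
print(split_options(["-foo", "=bar", "gee"]))       # case 3
print(split_options(["-foo = bar gee "]))           # case 12
```

The sketch never strips a value that arrives as a separate string, which is why cases 8 to 10 keep their leading space.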

Here is the list of all options, in alphabetic order.

Example. Assume that we have a file, named xii.tex, containing

\let~\catcode~`76~`A13~`F1~`j00~`P2jdefA71F~`7113jdefPALLF
PA''FwPA;;FPAZZFLaLPA//71F71iPAHHFLPAzzFenPASSFthP;A$$FevP
A@@FfPARR717273F737271P;ADDFRgniPAWW71FPATTFvePA**FstRsamP
AGGFRruoPAqq71.72.F717271PAYY7172F727171PA??Fi*LmPA&&71jfi
Fjfi71PAVVFjbigskipRPWGAUU71727374 75,76Fjpar71727375Djifx
:76jelse&U76jfiPLAKK7172F71l7271PAXX71FVLnOSeL71SLRyadR@oL
RrhC?yLRurtKFeLPFovPgaTLtReRomL;PABB71 72,73:Fjif.73.jelse
B73:jfiXF71PU71 72,73:PWs;AMM71F71diPAJJFRdriPAQQFRsreLPAI
I71Fo71dPA!!FRgiePBt'el@ lTLqdrYmu.Q.,Ke;vz vzLqpip.Q.,tz;
;Lql.IrsZ.eap,qn.i. i.eLlMaesLdRcna,;!;h htLqm.MRasZ.ilk,%
s$;z zLqs'.ansZ.Ymi,/sx ;LYegseZRyal,@i;@ TLRlogdLrDsW,@;G
LcYlaDLbJsW,SWXJW ree @rzchLhzsW,;WERcesInW qt.'oL.Rtrul;e
doTsW,Wk;Rri@stW aHAHHFndZPpqar.tridgeLinZpe.LtYer.W,:jbye

If you call Tralics, with the option `find-words´, you can see that the XML file contains once drumming and drummers, twice piping and pipers, 3 times leaping and lords, 4 times dancing and ladies, 5 times milking and maids, 6 times swimming and swans, 7 times laying and geese, 8 times rings and gold, 9 times calling and birds, 10 times hens and french, 11 times doves, turtle and `and´, 12 times tree, pear, in, partridge, me, to, gave, love, true, my, christmas, of, day, the, on. There are 45 words with a single letter. The words: twelve, eleven, ten, nine, eight, seven, six, five, four, three, two, appear x times, where 13-x is the value of the word. The words first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, eleventh, twelfth appear once. Amazing isn´t it? The file was written by D. Carlisle, it is available on the CTAN. This is not really a LaTeX file, so that some features cannot be applied (for instance, there is no at-begin-document hook). The \bye command was implemented in Tralics for this example to compile without error.

6.3. Configuration files

A configuration file is a way to alter the translation, using a special syntax. It contains some rules that define a Type and some tcf blocks, where a tcf block is identical to the content of a tcf file, and a Type is the name of a tcf file (tcf stands for Tralics configuration file). The Type can be given as a command line argument, or in the main source, provided that the following magic line appears near the beginning of the document (the tcf file name is between quotes):

% Tralics configuration file 'test0.tcf'

A tcf file may contain some blocks: for instance, a TitlePage block, described later, or a Command block, that contains LaTeX commands inserted at the start of the document; it also contains assignments of various types. In particular, it contains the Document Type used in the XML output. As already mentioned, the Document Type information can be given as a command line argument; it may also be given in the main source file, if a magic line like the following appears near the start of the document (the DTD is classes.dtd, with <book> as root element):

% Tralics DOCTYPE = book classes.dtd
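As an illustration, the two magic lines can be recognized with regular expressions. The sketch below is an assumption about the matching rules, not the actual Tralics scanner: the names scan_magic_lines and max_lines are hypothetical, and "near the start" is arbitrarily taken to mean the first ten lines.

```python
import re

CONFIG_RE = re.compile(r"^%\s*Tralics configuration file\s+'([^']+)'")
DOCTYPE_RE = re.compile(r"^%\s*Tralics DOCTYPE\s*=\s*(\S+)\s+(\S+)")

def scan_magic_lines(source, max_lines=10):
    """Return the Type and DOCTYPE announced by magic comment lines, if any."""
    found = {}
    for line in source.splitlines()[:max_lines]:
        m = CONFIG_RE.match(line)
        if m:
            found.setdefault("type", m.group(1))
        m = DOCTYPE_RE.match(line)
        if m:
            # (root element, DTD file name)
            found.setdefault("doctype", (m.group(1), m.group(2)))
    return found

sample = "% Tralics configuration file 'test0.tcf'\n% Tralics DOCTYPE = book classes.dtd\n"
print(scan_magic_lines(sample))
```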

We explain here the default configuration file (that has little use anymore), the old default configuration file (in use before 2006), the tcf file for the Raweb, a tcf and plt file for Research Reports (we will show how the same document can be compiled in two different ways).

6.3.1. The standard configuration file

We give here the content of the standard configuration file. As you can see, there are lots of comments. There is one assignment, this is a rule that says that the Type to use is the document class of the input file. There is a block that says that if this Type is report, book, article, minimal, and if no tcf file is found for this Type, then std.tcf should be used instead; this block says also that torture1 and torture2 are aliases for torture (used only for debugging). Finally, there is a block defining std.tcf: it says that the Doctype to use is classes.dtd, and the root element is <std>.

Lines 2 and 3 were modified: we added the letter `x´ after the dollar sign, for otherwise RCS would replace the identification of the original file by the identification of the LaTeX file.

# This is a configuration file for tralics.
# $xId: tralics_rc,v 2.24 2006/07/24 08:23:17 grimm Exp $
## tralics ident rc=standard $xRevision: 2.24 $
 
# Copyright Inria. Jos\'e Grimm. Apics. 2004/2005 2006
# This file is part of Tralics.
 
% Some comments: comments start with % or #
 
% this means: take the documentclass value as type name
Type = \documentclass
 
## First type defined is the default. Since version 2.8,  there is only
## one type.
 
BeginType std#      standard latex classes
  DocType = std classes.dtd
End
 
BeginAlias
  torture torture1 torture2
  std report book article minimal
End

Comments: If no configuration file is found, default rules apply. In particular, the default Type is the document class, and if no tcf file is found, the DocType to use is unknown from unknown.dtd, unless it is a standard LaTeX class, case where std from classes.dtd is used. This means that the standard configuration file has become useless.

6.3.2. The old configuration file

We shall describe here the old configuration file, used before the notion of tcf files was invented.

Lines starting with a sharp sign or percent sign are comment lines (ignored). Some lines start with `Begin´, and others with `End´. To each Begin, there should be an associated End. Blocks can be nested. Characters after `End´ are ignored, so that you can say `BeginFoo´ followed by `EndBar´, although it is not recommended. All other lines should be comment lines, empty, or indented.
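The block structure described above can be sketched as a small stack-based reader. This is a minimal illustration under stated assumptions, not the Tralics parser: the function name parse_blocks and the dictionary layout are hypothetical, `Begin´ and `End´ are assumed to be recognized only at the start of a line, and the optional block argument is assumed to be the second word with any glued comment removed.

```python
import re

def parse_blocks(lines):
    """Return the nested Begin/End structure of an old-style configuration file."""
    root = {"kind": None, "arg": "", "body": [], "blocks": []}
    stack = [root]
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip()[0] in "#%":
            continue                      # empty line or comment line
        if line.startswith("Begin"):
            words = line.split()
            kind = words[0][len("Begin"):]
            # The block argument may be glued to a comment, as in `std#'.
            arg = re.split(r"[#%]", words[1])[0] if len(words) > 1 else ""
            block = {"kind": kind, "arg": arg, "body": [], "blocks": []}
            stack[-1]["blocks"].append(block)
            stack.append(block)
        elif line.startswith("End"):
            if len(stack) == 1:
                raise ValueError("End without matching Begin")
            stack.pop()                   # characters after `End' are ignored
        else:
            stack[-1]["body"].append(line.strip())
    if len(stack) != 1:
        raise ValueError("missing End for Begin" + stack[-1]["kind"])
    return root
```

Because the text after `End´ is ignored, the sketch closes the innermost open block whatever name follows, which is exactly why `BeginFoo´ followed by `EndBar´ is accepted.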

Note the `x´ after the dollar sign; it does not appear in the source file (see the comment in the previous subsection). The third line is a bit special: when Tralics loads the file, it prints the revision number on the terminal.

1 # This is a configuration file for tralics.
2 # $xId: tralics_rc,v 2.15 2005/08/02 09:22:56 grimm Exp $
3 ## tralics ident rc=standard $xRevision: 2.15 $

This is the Copyright notice. In the current version, the semantics of the RA is in the ra.tcf file (described later).

4 # Copyright Inria. José Grimm. Apics. 2004/2005
5 # This file is part of Tralics.
6 # (See the file COPYING in the Tralics main directory)
7 # If you modify this file, by changing the semantics of type RA,
8 # please remove the `standard' on the `tralics ident' line above,
9 # or replace it by `non-standard'.
10  
11 % Some comments: comments start with % or #

A configuration file is split into main sections, one for each type. We start with the RA, or raweb.

12 ## configuration for the RA (Inria's Activity Report)
13 ## First type defined is the default
14  
15 BeginType RA     % Case RA
16   DocType = raweb raweb3.dtd
17   DocAttrib = year \specialyear
18   DocAttrib = creator \tralics

This comment explains how to parametrize some element or attribute names that were built-in in a previous version of Tralics. We shall see later how Language can be used (default value is `language´), the same for lang_en and lang_fr that have `english´ and `french´ as default value. Translation of a \caption produces an element <caption>, whose name will be changed to <head> by the post-processor of figures (it will be left unchanged if the caption is not in a figure or a table). The variable xml_caption_name can be used to change the first name, and xml_scaption_name can be used to change the second name. The title of a `topics´ (defined by \declaretopics) is in a <t_title> element, the name can be changed by xml_topic_title. A reference to a topic uses the num attribute; this attribute name can be changed by att_topic_num. The identification of an Inria Team is in <accueil>, this can be changed via xml_accueil_name. It is formed of a long name in <projetdeveloppe> and a short name in <projet>, the name of these elements can be changed via xml_project_name or the `expanded´ version. The section with the composition of the team is <composition>, its name can be changed by xml_composition_ra_name .

19   #(new)
20 #  Language = "xml:lang"
21 #  lang_en =  "en"
22 #  xml_scaption_name= "caption"
23 # xml_topic_title=""
24 #  xml_project_name = "title"
25 #  xml_expanded_project_name = "longtitle"
26 #  xml_accueil_name = "identification"
27 #  xml_composition_ra_name = "team"
28 #  att_topic_num = "id"

Processing of the Raweb needs converting the XML output of Tralics into XSL/Format, HTML, etc., via some external commands like `xsltproc´, `latex´, etc. Originally, Tralics was in charge of these commands, and the configuration file explains how to call these tools. These lines are not needed anymore.

29   makefo="xsltproc --catalogs -o %A.fo %B %C";
30   makehtml = "xsltproc --catalogs  %B %C";
31   call_lint = "xmllint --catalogs --valid  --noout %C"
32   makepdf = "pdflatex -interaction=nonstopmode %w"
33   %makedvi = "latex -interaction=nonstopmode %w"
34   % makedvi et dvips pour marie-pierre
35   %dvitops = "dvips %w.dvi -o %w.ps"
36   %makedvi = "latex -interaction=nonstopmode %w"
37   generatedvi = "latex -interaction=nonstopmode %t"
38   % old latex: "latex \\nonstopmode\\input{%t}"
39   generateps = "dvips %t.dvi -o %t.ps"

This defines the list of valid Raweb sections, themes and URs (research units). If you change these lines please: a) remove the `standard´ on line 3, or b) make sure that it matches the official list, or c) make sure that this remains a private copy. A star after a section name says that topics are not allowed.

40 #these are new in version 2.0
41   theme_vals = "com cog num sym bio"
42   section_vals = "composition*/presentation*/fondements/domaine/logiciels/"
43   section_vals = "+resultats/contrats*/international*/diffusion*/"
44   ur_vals = "Rocquencourt//Sophia/Sophia Antipolis/Rennes//Lorraine//";
45   ur_vals = "+RhoneAlpes/Rhône-Alpes/Futurs//"

Due to some inertia, people continue using the obsolete environment. We make sure an error is signaled.

46 BeginCommands
47  \newenvironment{body}{\obsoleteEnvBody The body environment is %
48      obsolete since 2003}
49    {End of obsolete environment body}
50  \newenvironment{abstract}{\obsoleteEnvAbstract The abstract %
51     environment is obsolete since 2003}
52   {End of obsolete environment abstract}
53 EndCommands
54  
55 End

This is an example of titlepage environment; it will be discussed later. In fact, we shall give below the content of the RR.tcf file, it is identical.

56 ## configuration for the RR (Research Report of Inria)
57 ## not yet complete
58  
59 BeginType RR#      Case RR
60 ...
89 EndType

A short definition for standard classes.

90 BeginType std#      standard latex classes
91   DocType = std classes.dtd
92   xml_biblio = "bibliography"
93 End

Some aliases.

94 # (types Article and slides are not defined, hence this is useless)
95  
96 BeginAlias
97   Article report
98   slides inriaslides foiltex
99 End

This command has to be outside any block.

100 % this means: take the documentclass value as type name
101 Type = \documentclass

More aliases. Note that toto matches RR (first in list) and report matches std (because `unknown´ is undefined).

105 BeginAlias
106   RR toto# ra2001
107   RA ra toto ra2001x%etc
108   torture torture1 torture2
109   unknown report
110   std report book article minimal
111 End
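The matching rule can be illustrated by the following sketch. It assumes, as the note above suggests, that alias lines are tried in order and that a line is skipped when its first word is not a defined type; the function name resolve_type is hypothetical, and giving precedence to a name that is itself a defined type is an assumption.

```python
def resolve_type(name, defined_types, alias_lines):
    """Return the defined type that `name` denotes, or None if nothing matches."""
    if name in defined_types:
        return name
    for target, *aliases in alias_lines:
        # A line whose target is undefined (like `unknown') is useless.
        if target in defined_types and name in aliases:
            return target
    return None

defined = {"RA", "RR", "std", "MP", "torture", "rabib"}
aliases = [
    ["Article", "report"],
    ["slides", "inriaslides", "foiltex"],
    ["RR", "toto"],
    ["RA", "ra", "toto", "ra2001x"],
    ["torture", "torture1", "torture2"],
    ["unknown", "report"],
    ["std", "report", "book", "article", "minimal"],
]
print(resolve_type("toto", defined, aliases))    # RR
print(resolve_type("report", defined, aliases))  # std
```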

For fun.

112 ## an empty type
113 BeginType MP
114 EndType

This is used for testing Tralics.

115 BeginType torture
116   DocAttrib = creator \tralics
117   DocType  = ramain raweb.dtd
118   on package loaded calc CALC = "true"
119   on package loaded foo/bar FOO1 = "true"
120   on package loaded *foo/bar FOO2 = "true"
121   on package loaded foo/*bar FOO3 = "true"
122   on package loaded *foo/*bar FOO4 = "true"
123   url_font = "\large "
124   no_footnote_hack = "false"
125   on class loaded calc CALC="true"
126  
127   use_font_elt = "true"
128   xml_font_small = "font-small"

A bunch of declarations omitted here. The list of all options is given later, in test.tcf.

154   xml_underline_name = "font-underline"
155  
156 BeginCommands
157   % These commands are inserted verbatim in the file
158   \def\recurse{\recurse\recurse}
159 EndCommands
160 EndType

This may be used for typesetting a bibliography, exactly like the Raweb.

161 BeginType rabib     % Case RA
162   DocType = raweb raweb3.dtd
163   DocAttrib = year \specialyear
164   DocAttrib = creator \tralics
165  
166 BeginCommands
167   % These commands are inserted verbatim in the file
168   \newcommand\usebib[2]{\bibliography{#1#2,#1_foot#2+foot,#1_refer#2+refer}}
169 EndCommands
170 EndType

6.3.3. The ra.tcf file

This is the tcf file used for the Raweb2006. Read carefully the copyright notice.

1 # This is a configuration file for tralics, for the Raweb
2 # $xId: ra.tcf,v 2.3 2006/07/25 16:29:39 grimm Exp $
3 ## tralics ident rc=standard-ra $xRevision: 2.3 $
4  
5  
6 # This file is part of Tralics.
7 # Copyright Inria. Jos\'e Grimm. Apics. 2004/2005, 2006
8 # (See the file COPYING in the Tralics main directory)
9 # If you modify this file, by changing the semantics of the RA,
10 # please remove the `standard-ra' on the `tralics ident' line above,
11 # or replace it by `non-standard'.

A tcf file is like the configuration file, but it applies to a single type of document; for this reason, there is no need to explain how the type is computed (no `Type´ declaration), nor to what type a block applies (there is no `BeginType´ block). These three lines are the same as before. Note that the 2007 DTD is raweb7.dtd.

12   DocType = raweb raweb3.dtd
13   DocAttrib = year \specialyear
14   DocAttrib = creator \tralics

These lines are as before, without the commented-out lines.

15   makefo="xsltproc --catalogs -o %A.fo %B %C";
16   makehtml = "xsltproc --catalogs  %B %C";
17   call_lint = "xmllint --catalogs --valid  --noout %C"
18   makepdf = "pdflatex -interaction=nonstopmode %w"
19   generatedvi = "latex -interaction=nonstopmode %t"
20   generateps = "dvips %t.dvi -o %t.ps"

These values are the same as those shown above.

21   theme_vals = "com cog num sym bio"
22   ur_vals = "Rocquencourt//Sophia/Sophia Antipolis/Rennes//Lorraine//";
23   ur_vals = "+RhoneAlpes/Rh\^one-Alpes/Futurs//"

In 2006, section_vals has the same value as shown above; in 2007 it is replaced by the following lines.

24   fullsection_vals = "/composition/Team/presentation/Overall Objectives/\
25      fondements/Scientific Foundations/domaine/Application Domains/\
26      logiciels/Software/resultats/New Results/\
27      contrats/Contracts and Grants with Industry/\
28      international/Other Grants and Activities/diffusion/Dissemination"

New in 2006 are the two lists affiliation_vals and profession_vals. The syntax is the same as for other lists. The value given here is an example; the real names are in French.

29   affiliation_vals ="Inria//Cnrs//University//ForeignUniversity//"
30   affiliation_vals ="+Public//Other//"
31   profession_vals = "Scientist//Assistant//Technical//PHD//"
32   profession_vals = "+PostDoc//StudentIntern//Other//"

We have the same obsolete environments as before. Moreover, we declare that \keywords is the same as \motscle; this is needed because we removed the \keywords command (for the Raweb, this is an environment, using it as a command will fail in a very strange manner).

33 BeginCommands
34  \let\keywords\motscle
35  \newenvironment{body}{\obsoleteEnvBody The body environment is %
36      obsolete since 2003}
37    {End of obsolete environment body}
38  \newenvironment{abstract}{\obsoleteEnvAbstract The abstract %
39      environment is obsolete since 2003}
40   {End of obsolete environment abstract}
41 EndCommands

This is the command block for the ra2007. The last line does not appear in the file, but is automatically added in Raweb mode; the command uses the values saved by \theme, \UR and its aliases, \project and its alias, \isproject. Some are defined as doing nothing (like \maketitle, \loadbiblio, \declaretopic, \TeamHasHdr). The \module command is redefined: if the last argument is empty, a default value is used instead.

42 BeginCommands
43  \makeatletter
44  \def\declaretopic#1#2{} %% obsolete in 2007
45  \def\TeamHasHdr#1{} %% temporary
46  \def\theme#1{\def\ra@theme{#1}}
47  \def\UR#1{\def\ra@UR{#1}}
48  \def\isproject#1{\def\ra@isproject{#1}}
49  \let\ResearchCenterList\UR
50  \let\ResearchCentreList\UR
51  \def\projet#1#2#3{\def\ra@proj@a{#1}\def\ra@proj@b{#2}\def\ra@proj@c{#3}}
52  \let\project\projet
53  \def\moduleref#1#2#3{\ref{mod:#3}}
54  \let\oldmodule\module %% Compatibility
55  \renewcommand\module[4][]{\oldmodule{#2}{#3}{\@ifbempty{#4}{(Sans Titre)}{#4}}}
56 \let\htmladdnormallinkfoot\@@href@foot
57 \let\htmladdnormallink\@@href
58  \makeatother
59  \let\maketitle\relax
60  \let\loadbiblio\relax
61  \let\keywords\motscle
62 %%% \AtBeginDocument{\rawebstartdocument} %%% pseudo line
63 EndCommands

New in 2008 is the following list. The argument of the catperso environment must be one of XX, YY, ZZ, interpreted as xx, YY and zz. If the declaration is omitted, there is no restriction on the argument. Whether or not there will be such a restriction in the file ra2008.tcf is still undecided.

64 catperso_vals = "XX/xx/YY//ZZ/zz"
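The list syntax used by ur_vals, affiliation_vals, profession_vals and catperso_vals can be decoded as in the sketch below. It is not the Tralics code: the name parse_vals is hypothetical, and the reading of a leading plus sign as "append to the previous value" is an assumption drawn from the listings. The sketch does not cover section_vals, whose entries are single names with an optional star.

```python
def parse_vals(assignments):
    """Merge successive assignments into an ordered name -> value mapping."""
    table = {}
    for text in assignments:
        if text.startswith("+"):
            text = text[1:]               # continuation: extend the current list
        else:
            table = {}                    # plain assignment: start a new list
        fields = text.split("/")
        if len(fields) % 2 == 1 and fields[-1] == "":
            fields.pop()                  # left over from the final slash
        for name, value in zip(fields[0::2], fields[1::2]):
            table[name] = value or name   # an empty value stands for the name itself
    return table

print(parse_vals(["XX/xx/YY//ZZ/zz"]))
```

With the ur_vals lines shown earlier, Sophia is mapped to Sophia Antipolis and Rennes to itself.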

6.3.4. The RR.tcf file

We indicate here the content of the RR.tcf file, it defines commands for the title page of a Research Report.

1 ## tralics ident rc=RR.tcf $Revision: 1.29 $
2 ## configuration for the RR (Research Report of Inria)
3  
4  
5   DocType = rr raweb.dtd
6 BeginTitlePage
7   \makeRR <RRstart> "" "type = 'RR'"
8   alias \makeRT "" "type = 'RT'"
9  
10   <UR> -
11   \URSophia ?+<UR>
12   \URRocquencourt ?+<UR>
13   alias \URRocq
14   \Paris ?<UR> <Rocquencourt>
15   \URRhoneAlpes ?+<UR>
16   \URRennes ?+<UR>
17   \URLorraine ?+<UR>
18   \URFuturs ?+<UR>
19  
20   \RRtitle q<title> "pas de titre"
21   \RRetitle q<etitle>  "no title"
22   \RRprojet <projet> "pas de projet"
23   \motcle <motcle> "pas de motcle"
24   \keyword <keyword>  "no keywords"
25   \RRresume p<resume> "pas de resume"
26   \RRabstract p<abstract> "no abstract"
27   \RRauthor + <author> <auth> "Pas d'auteurs"
28   \RRdate <date> "\monthyearvalfr"
29   \RRNo <RRnumber> "????"
30  
31   \RRtheme <>  +"pas de theme" % CES
32   <Theme> -                    % E
33   \THNum ?+<Theme>             % CE
34   \THCog ?+<Theme>             % CE
35   \THCom ?+<Theme>             % CE
36   \THBio ?+<Theme>             % CE
37   \THSym ?+<Theme>             % CE
38  
39 %%  \myself \RRauthor "grimm"  % CCS
40 %%  \cmdp <cmdp> +"nodefault"  % CES
41 %%  \cmda <cmdA> A"\cmdAval"   % CES
42 %%  \cmdb <cmdB> B"\cmdBval"   % CES
43 %%  \cmdc <cmdC> C"\cmdCval"   % CES
44  
45 End
46  
47 BeginCommands
48   \let\RRstyisuseful\relax
49 End

6.3.5. The RR.plt file

We indicate here the content of the RR.plt file; it also defines commands for the title page of a Research Report. This is a TeX file, loaded whenever the package `RR´ is used. Note that, if the RR.tcf file is loaded, line 48 above defines a command that is checked on line 4 below, so that the file is ignored. We shall explain later how these two files can be used.

1 % -*- latex -*-
2 \ProvidesPackage{RR}[2006/10/03 v1.1  Inria RR for Tralics]
3  
4 \ifx\RRstyisuseful\relax\endinput\fi
5  
6 \newcommand\RRtitle[1]{{\let\\\ \xbox{ftitle}{#1}}}
7 \newcommand\RRetitle[1]{{\let\\\ \xbox{title}{#1}}}
8 \newcommand\RRauthor[1]{\xbox{author}{#1}}
9 \newcommand\RRprojet[1]{\xbox{inria-team}{#1}}
10 \newcommand\RRdate[1]{\xbox{date}{#1}}
11 \newcommand\RRNo[1]{\xbox{rrnumber}{#1}}
12 \newcommand\RRtheme[1]{\xbox{theme}{#1}}
13 \newcommand\keyword[1]{\xbox{keyword}{#1}}
14 \newcommand\motcle[1]{\xbox{motcle}{#1}}
15 \newcommand\THNum{THnum}
16 \newcommand\THCom{THcom}
17 \newcommand\THCog{THcog}
18 \newcommand\THSym{THsym}
19 \newcommand\THBio{THbio}
20 \newcommand\URSophia{\xbox{location}{Sophia Antipolis}}
21 \newcommand\URLorraine{\xbox{location}{Lorraine}}
22 \newcommand\URRennes{\xbox{location}{Rennes}}
23 \newcommand\URRhoneAlpes{\xbox{location}{Rhône-Alpes}}
24 \newcommand\URRocq{\xbox{location}{Rocquencourt}}
25 \newcommand\URFuturs{\xbox{location}{Futurs}}
26 \newcommand\RRresume[1]{\begin{xmlelement*}{resume}#1\end{xmlelement*}}
27 \newcommand\RRabstract[1]{\begin{xmlelement*}{abstract}#1\end{xmlelement*}}
28  
29 \let\makeRT\relax
30 \let\makeRR\relax

6.3.6. Sample files

The Tralics distribution comes with a bunch of test files. There are two directories: the Test directory contains source files, and the Modele directory contains the translation. In particular, the file tpa2.tex explains how to use a tcf file to change the names of most XML elements.

6.4. The action before translation

As explained at the start of the Chapter, Tralics first reads all options. Some of them are marked `Raweb only´; this means that they are not used, unless the Type is ra (i.e. you are translating the Raweb, see next section); this section describes how the Type is computed.

Unless Tralics is called with option interactive-math, an input file name is required. The program is aborted if more than one input name is given. It must be the name of a TeX file: an extension tex is added if needed, so that foo and foo.tex are the same. As an exception foo.xml is also equivalent to foo.tex. We consider two examples, the xii.tex shown above, and the following LaTeX file hello1.tex:

\documentclass{article}
\begin{document}
Hello, world!
\end{document}

6.4.1. Files and Paths

The standard way to use Tralics is to type `tralics filename´ in a terminal, example:

grimm@macarthur$ tralics hello1
This is tralics 2.12, a LaTeX to XML translator, running on macarthur
...
Output written on hello1.xml (179 bytes).
No error found.
(For more information, see transcript file hello1.log)
grimm@macarthur$ ls hello1*
hello1.log      hello1.tex      hello1.xml

The ls command shows the source, the result of the translation and the transcript file. If the file hello1.ult were present, it would have been read by Tralics, and if the source were a bit more complicated the files hello1.img and hello1_.bbl might have been created. All these files are in the same directory, and this paragraph explains what you can do if input or output files are elsewhere.

Consider now a graphical interface to Tralics, where you drag and drop the TeX source; in such a case there is no shell anymore, hence no current directory; what Tralics gets is an absolute path name (that may be of the form /users/somebody/somewhere). In early versions, such an absolute path was a fatal error. Currently, only Unix-like pathnames are implemented.

Consider now a system, like the Raweb, where the XML file produced by Tralics is converted to another XML file (with a different DTD), and further processed. Thus a great number of files are created, and managing all these becomes difficult. As the example below shows, you can ask Tralics to put the files it creates in another directory, you can choose the name of the XML output (so that Tralics can create foo-t.xml from foo.tex, and this file can be processed again into foo.xml), and you can also choose the name of the transcript file.

grimm@macarthur$ tralics hello1 -o h2 -logfile=h3 -output_dir=../Test
This is tralics 2.12, a LaTeX to XML translator, running on macarthur
...
Output written on ../Test/h2.xml (179 bytes).
No error found.
(For more information, see transcript file ../Test/h3.log)

The input path is a colon-separated list of directories. For instance `../foo/A:/bar/B/::gee:´ contains five elements, two of them being empty. The empty slot represents the current directory; it will be added at the end if omitted. The current directory may also be given as a single dot. A final slash is silently removed. In this example, the path means: search in subdirectory A of the sibling directory foo, then in the subdirectory B of the directory bar that is at the root, then in the current directory, then in the subdirectory gee of the current directory, then in the current directory again; this rule does not apply if a file starts with a dot or a slash.
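The rules for the input path can be summarized by a short sketch. The function name parse_input_path is hypothetical, and representing the current directory by an empty string is a choice made for this example only.

```python
def parse_input_path(spec):
    """Split a colon-separated input path; '' stands for the current directory."""
    dirs = []
    for entry in spec.split(":"):
        if entry == ".":
            entry = ""                    # a single dot is the current directory
        elif len(entry) > 1 and entry.endswith("/"):
            entry = entry[:-1]            # a final slash is silently removed
        dirs.append(entry)
    if "" not in dirs:
        dirs.append("")                   # current directory added at the end
    return dirs

print(parse_input_path("../foo/A:/bar/B/::gee:"))
```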

A special case is when the main input file name starts with a dot or a slash, for instance /usr/grimm/home/hello or ./Test/hello.tex. In this case, the name is split into pieces. One piece is the entry name, say hello, another one is the directory name (everything before the final slash), and the last part is the extension (here .tex). If no output directory is given on the command line, the directory of the input file is used.

In the same fashion, you can also specify the name of the transcript file; by default, this is the entry name. If for instance you use /foo/bar, then the input file will be /foo/bar.tex and the transcript file will be /foo/bar.log; you may change the name of the transcript file, so as to get /foo/myfile.log; you may change the directory of the transcript file, so as to get /mydir/bar.log; you may change both.
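The decomposition of the main file name, and the resulting name of the transcript file, can be sketched as follows. The helper names split_source_name and transcript_path, and their parameters log_name and log_dir, are hypothetical; only Unix-like path names are handled, hence the use of posixpath.

```python
import posixpath

def split_source_name(name):
    """Return (directory, entry name, extension); the extension defaults to .tex."""
    directory, base = posixpath.split(name)
    entry, ext = posixpath.splitext(base)
    return directory, entry, ext or ".tex"

def transcript_path(source, log_name=None, log_dir=None):
    """Transcript file: the entry name in the source directory, unless overridden."""
    directory, entry, _ = split_source_name(source)
    return posixpath.join(log_dir if log_dir is not None else directory,
                          (log_name or entry) + ".log")

print(split_source_name("./Test/hello.tex"))
print(transcript_path("/foo/bar", log_name="myfile"))
```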

Consider again the case where the input is /foo/bar. If no input path is given, then Tralics behaves as if the file was bar, and the input path was `foo:´. As a consequence, if bar inputs another file, say bar1, it is first searched in the same directory as bar, and then in the current directory. Moreover, if no output directory is specified, files written by bar are put in this directory, and thus can be read again. If the user gives an input path, it will be left unchanged, and the input path is not considered for the main path. Example: Directories foo and foo1 contain files bar and bar1; bar inputs bar1, and the input path contains foo1. If the main file is /foo/bar, it will input /foo1/bar1. If the input path contains both foo and foo1, and the main file is bar, you will get either /foo/bar and /foo/bar1 or /foo1/bar and /foo1/bar1, depending on the order.
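The effect of the order of directories can be checked with a small search routine. This is a sketch under assumptions: find_file is a hypothetical name, the path is a list in which an empty string stands for the current directory, and the file system is replaced by a set of names so that the example is self-contained.

```python
import posixpath

def find_file(name, input_path, exists):
    """Return the first existing candidate for `name`, or None."""
    if name.startswith((".", "/")):
        return name if exists(name) else None    # the path is not searched
    for directory in input_path:
        candidate = posixpath.join(directory, name) if directory else name
        if exists(candidate):
            return candidate
    return None

files = {"/foo/bar.tex", "/foo/bar1.tex", "/foo1/bar.tex", "/foo1/bar1.tex"}
print(find_file("bar.tex", ["/foo", "/foo1", ""], files.__contains__))
print(find_file("bar.tex", ["/foo1", "/foo", ""], files.__contains__))
```

The two calls return /foo/bar.tex and /foo1/bar.tex respectively, which is the dependence on the order mentioned above.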

6.4.2. Finding the configuration

There are some options that tell Tralics not to produce an XML file; we shall not explain them. Thus, after parsing all arguments, Tralics reads the complete source (main input file). It opens the transcript file, and prints on the terminal a line like Starting translation of file hello1.tex. The transcript file will contain a bit more information, namely

Transcript file of tralics 2.12 for file hello1.tex
Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
Tralics is licensed under the CeCILL Free Software Licensing Agreement
Start compilation: 2008/04/19 18:27:18
OS: Apple, machine macarthur
Starting translation of file hello1.tex.
Using iso-8859-1 encoding (idem transcript).
Left quote is ` right quote is '
Input path (../FO:../Test:)
++ Input encoding is 1 for ../Test/hello1.tex

After that, Tralics reads the configuration file. The -noconfig option inhibits reading a configuration file; in this case the transcript file contains

No configuration file.
No type in configuration file
Seen \documentclass article
Potential type is article
Using some default type
dtd is std from classes.dtd (standard mode)
OK with the configuration file, dealing with the TeX file...

The first line says that no configuration file is considered, so that an empty one is used instead. The TeX source is scanned for a document class. If this is a standard one (book, article, report, minimal), the DTD is std from classes.dtd, otherwise unknown from unknown.dtd. Consider now the same file, without the -noconfig option. We get

++ file .tralics_rc does not exist.
++ file ../confdir/.tralics_rc exists.
Configuration file identification: standard $ Revision: 2.24 $
Read configuration file ../confdir/.tralics_rc.
Configuration file has type \documentclass
Seen \documentclass article
Potential type is article
Defined type: std
++ file article.tcf does not exist.
++ file ../confdir/article.tcf does not exist.
Alias torture does not match article
Potential type article aliased to std
Using type std
dtd is std from classes.dtd (standard mode)

Some lines start with a double plus sign: whenever Tralics checks whether a file exists, it prints such a line in the transcript file. The first two lines that do not start with a double plus sign are also printed on the terminal (this is an easy way to check that the right configuration file has been seen). The standard configuration file says that the type is the document class (here article). This is a true type only if it is defined, and the configuration file does not define it. It could be defined in article.tcf, but you can see that there is no such file. As a consequence, the behavior is the same as if no configuration file had been given.

This is what happens if the option config=rabib is given

Trying config file from user specs: rabib.tcf
++ file ../confdir/rabib.tcf exists.
Configuration file identification: rabib.tcf $ Revision: 2.2 $
Read configuration file ../confdir/rabib.tcf.
Using tcf type rabib
dtd is raweb from raweb3.dtd (standard mode)

You can notice that the tcf file is searched for in the confdir directory. If the name starts with a slash or a dot, no extension is added, and the file is not searched for in the configuration path. Assume that the source file contains a line of the form

% Tralics configuration file 'test0.tcf'

and you neither specify a configuration file, nor inhibit loading one. Then you will get

Trying config file from source file `test0'
++ file test0.tcf does not exist.
++ file ../confdir/test0.tcf exists.
Read configuration file ../confdir/test0.tcf.
Using tcf type test
catperso_vals: AA -> BB
catperso_vals: CC -> CC
catperso_vals: XX -> xx
dtd is unknown from unknown.dtd (standard mode)

As you can see, the tcf extension is added, and the file is searched for in the current directory first, then in the configuration path.
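The search order just described can be sketched in Python as follows. This is a hedged illustration, not the actual implementation: the function name find_config_file, the config_path list and the exists callback are hypothetical names introduced for the example.

```python
import os
import posixpath

def find_config_file(name, config_path, exists=os.path.isfile):
    """Return the first existing candidate for a tcf file, or None (sketch)."""
    if name.startswith(("/", ".")):
        # Explicit location: no extension is added, and the
        # configuration path is not searched.
        candidates = [name]
    else:
        filename = name if name.endswith(".tcf") else name + ".tcf"
        # Current directory first, then each directory of the configuration path.
        candidates = [filename]
        candidates += [posixpath.join(d, filename) for d in config_path]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None
```

With a configuration path reduced to ../confdir, and test0.tcf present only there, find_config_file("test0", ["../confdir"]) returns ../confdir/test0.tcf, in the same order as the `++ file´ lines of the transcript.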

You can call Tralics with option type=rabib. This just says that the name of the tcf file should be rabib, instead of the document class; it is thus useless if the name of the tcf file to use has been given as shown above. It can be useful in the case of a plain TeX file, that has no document class. In the example that follows, we say that the type is ra12.

++ file .tralics_rc does not exist.
++ file ../confdir/.tralics_rc exists.
Configuration file identification: standard $ Revision: 2.24 $
Read configuration file ../confdir/.tralics_rc.
Configuration file has type \documentclass
Seen \documentclass article
Potential type is ra12
++ file ra12.tcf does not exist.
++ file ../confdir/ra12.tcf does not exist.
++ file ra.tcf does not exist.
++ file ../confdir/ra.tcf exists.
Configuration file identification: standard-ra $ Revision: 2.3 $
Read tcf file ../confdir/ra.tcf
Using type ra
...
dtd is raweb from raweb3.dtd (mode RAWEB2007)

Note that no file ra12.tcf was found, so ra.tcf was searched for and used. As a consequence, the effective type is ra, and Raweb mode is entered; this is an error, since the current file is not a ra file. In fact, you will be faced with Fatal error: Input file name must be team name followed by 2007. Note that you can compile a file named foo2006 in Raweb mode, as long as this matches the year option (if used) and the document class is ra2006.
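The fallback from ra12 to ra seen in this transcript can be pictured with the Python sketch below. The rule used here (try the type as given, then the type without its trailing digits) is an assumption read off this single transcript, not a documented rule; the names candidate_types and resolve_type are hypothetical, and aliases defined in the configuration file are not modeled.

```python
def candidate_types(potential_type):
    """Yield the type itself, then the type without trailing digits (assumed rule)."""
    yield potential_type
    stripped = potential_type.rstrip("0123456789")
    if stripped and stripped != potential_type:
        yield stripped

def resolve_type(potential_type, available_tcf):
    """Return the first candidate type that has a tcf file, or None."""
    for name in candidate_types(potential_type):
        if name + ".tcf" in available_tcf:
            return name
    return None
```

Under this assumption, resolve_type("ra12", {"ra.tcf"}) gives ra, which is the effective type reported by the transcript.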

6.4.3. Old behaviour

The algorithm is the following.

  1. If you say tralics -noconfig, then no configuration file is read at all.

  2. If you say tralics -configfile=foo, then Tralics will print Trying config file from user specs, and try to use this file.

  3. If you say tralics -configfile=foo.tcf, then Tralics will print the same as above; it will also search the file in the `confdir´ directory.

  4. If the source file contains `% tralics configuration file 'foobar'´, then Tralics will print Trying config file from source file, and try to use this file. In case of failure, and if the name `foobar´ contains no dot, the suffix .tcf is added, and the next rule is applied.

  5. If the source file contains `% tralics configuration file 'foobar.tcf'´, then Tralics will print the same as above; it will also search the file in the `confdir´ directory.

  6. The default configuration file is named .tralics_rc (or tralics_rc on Windows). The current directory is looked at first, then the tralicsdir, finally the home directory.

  7. If you say tralics -dir TOTO, then TOTO/src/.tralics_rc is the second try.

  8. The home directory, or its src subdirectory, is searched next. (Depending on the operating system, this can fail, because there is no standard way of defining the home directory of the user).

  9. If you set the shell variable TRALICSDIR to somedir, or RAWEBDIR to somedir, then somedir/src/.tralics_rc is the last try. If neither variable is set, then some default location will be used.

In the current version, rules 4, 7, 8 and 9 have been removed.

6.4.4. Preparing the translation

Let´s consider again file hello1, compiled with option type=rabib. The transcript file contains the following lines.

OK with the configuration file, dealing with the TeX file...
There are 4 lines
Starting translation
\notrivialmath=1
{\countdef \count@=\count255}

After that, there is a bunch of lines of the form `countdef x=y´ and, in verbose mode, the bootstrap code, as explained later. The last line shown below means that all bootstrap lines have been correctly read.

{changing \countref395332=0 into \countref395332=1}
[1] %% Begin bootstrap commands for latex
[2] \@flushglue = 0pt plus 1fil
...
[47] %% End bootstrap commands for latex
++ Input stack empty at end of file

Our configuration file contains a block of TeX code. The transcript file shows it:

[19]   % These commands are inserted verbatim in the file
[20] \newcommand\usebib[2]{\bibliography{#1#2,#1_foot#2+foot,#1_refer#2+refer}}

Our configuration file also contains

DocAttrib = variable "va'&quot;lue"
DocAttrib =Foo \World
DocAttrib =A \specialyear
DocAttrib =B  \tralics
DocAttrib =C  \today

The effect is to add an attribute to the main element. The normal syntax is: DocAttrib = foo "bar". The attribute name must contain only ASCII letters; the value can consist of any characters. An apostrophe is replaced by &apos;; double quotes must be given as &quot;, and the same holds for some other special characters. Using a command name instead of a string means that the value of the command should be used. The value \tralics is replaced by a string of the form `Tralics version 2.9´, and \specialyear is replaced by the year as used by the Raweb (the current year, in general). The command \World is undefined, and an error is signaled.
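As an illustration of the string form, here is a minimal Python sketch that reads one such line and applies the apostrophe replacement. It is not taken from Tralics: the function name parse_doc_attrib and the regular expression are assumptions, and the command form (a value such as \specialyear) is deliberately left out.

```python
import re

# String form only: DocAttrib = name "value", name made of ASCII letters.
_STRING_FORM = re.compile(r'^DocAttrib\s*=\s*([A-Za-z]+)\s+"(.*)"\s*$')

def parse_doc_attrib(line):
    """Return (name, escaped value) for the string form, None otherwise (sketch)."""
    match = _STRING_FORM.match(line)
    if match is None:
        return None  # e.g. the command form, not handled in this sketch
    name, value = match.groups()
    # An apostrophe is replaced by &apos;; &quot; is already an entity.
    return name, value.replace("'", "&apos;")
```

Applied to the first line of the example, this gives the attribute name variable and the value va&apos;&quot;lue.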

Before translating the document, Tralics first looks for the ult file. In the transcript below, the star says that the @ character should be of category letter while loading the file, and the plus sign says that the file should be searched for in the same directory as the main file, and not elsewhere. We finish by showing how the class file is found.

[1] \InputIfFileExists*+{hello1.ult}{}{}
++ file hello1.ult does not exist.
[1] \documentclass{article}
[2] \begin{document}
++ file article.clt does not exist.
++ file ../confdir/article.clt exists.

6.5. Translating the Raweb

Raweb mode is entered if a configuration file is found that says that the type to use is `RA´ or `ra´. The document class should be ra97, ra98, or, for later years, ra1999. The example has ra2003. This must match the name of the input file, which is miaou2003. The document can be translated in one of three versions: first, you may try latex, this gives miaou2003.dvi; then we have a mode in which miaou2003.tex is converted into miaou.tex, and latex can produce miaou.dvi. Finally, Tralics may produce miaou.xml, and this can be compiled into wmiaou.dvi.
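The correspondence between the document class and the input file name can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names class_year and check_raweb_names are hypothetical, and the sketch only covers file names that end with a four-digit year, as in miaou2003.

```python
import re

def class_year(document_class):
    """Year denoted by ra97, ra98 or raYYYY; None for any other class (sketch)."""
    match = re.fullmatch(r"ra(97|98|\d{4})", document_class)
    if match is None:
        return None
    digits = match.group(1)
    return 1900 + int(digits) if len(digits) == 2 else int(digits)

def check_raweb_names(file_name, document_class):
    """True if file_name is a team name followed by the year of the class."""
    match = re.fullmatch(r"(.*\D)(\d{4})", file_name)
    if match is None:
        return False
    return class_year(document_class) == int(match.group(2))
```

Here check_raweb_names("miaou2003", "ra2003") holds, while a file named hello1 is rejected whatever the class.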

Historically, we had a Perl script for the conversion; this was extended to a translator, then rewritten in C++. You could edit the script and change it (for instance, if a non-standard name for the LaTeX executable was needed). Since Tralics is nowadays a binary file, you cannot edit it. For this reason the configuration file contains some lines (see old configuration file, lines 29 to 39) that can be modified. These are copied into user_param.pl and, after Tralics has produced an XML file, it calls an external program (defined by the `externalprog´ switch, the default being rahandler.pl). If the current year (2003 in the example below) is 2007 or more, simplified ra mode is entered, no postprocessor is called, and no file user_param.pl is created. Here is an example:

$::makefo='xsltproc --catalogs -o %A.fo %B %C';
$::makehtml='xsltproc --catalogs  %B %C';
$::checkxml='xmllint --catalogs --valid  --noout %C';
$::makepdf='pdflatex -interaction=nonstopmode %w';
$::makedvi='';
$::dvitops='';
$::generate_dvi='latex -interaction=nonstopmode %t';
$::generate_ps='dvips %t.dvi -o %t.ps';
$::raweb_dir='/user/grimm/home/cvs/raweb';
$::raweb_dir_src='/user/grimm/home/cvs/raweb/src/';
$::ra_year='2003';
$::no_year='miaou';
$::tex_file='miaou';
$::todo_fo=0;
$::todo_html=0;
$::todo_tex=0;
$::todo_lint=0;
$::todo_ps=0;
$::todo_xml=1;
1;

Here is an example of a source file, valid in 2003.

\documentclass{ra2003}
\theme{Num}
\isproject{YES} % \isproject{OUI} works also
\projet{MIAOU}{Miaou}{Mathématiques et Informatique de
   l'Automatique et de l'Optimisation pour l'Utilisateur}
\def\foo{bar}
\UR{\URSophia\URFuturs}
\declaretopic{abc}{Topic abc}
\declaretopic{def}{Topic def}
\begin{document}
\maketitle
...
\begin{module}{composition}{en-tete}{}
\begin{catperso}{Head of project team}
\pers{Laurent}{Baratchart}[DR INRIA]
\end{catperso}
\end{module}
\begin{module}{diffusion}{dif-conf}{Conferences and workshops}
\begin{glossaire}\glo{A}{B\par C}\glo{A1}{B1\par C1}\end{glossaire}
\begin{participants}
\pers{Laurent}{Baratchart},
\pers{José}{Grimm}
\end{participants}
\begin{motscle}
meromorphic approximation, frequency-domain identification,
extremal problems
\end{motscle}
\end{module}
\loadbiblio
\end{document}

This is what Tralics prints, for the full miaou2003 document, in verbose mode

This is tralics 2.5 (pl7), a LaTeX to XML translator
Copyright INRIA/MIAOU/APICS 2002-2005, Jos\'e Grimm
Licensed under the CeCILL Free Software Licensing Agreement
Starting xml processing for miaou2003.
Configuration file identification: standard $ Revision: 2.14 $
Read configuration file /user/grimm/home/cvs/tralics/.tralics_rc.

The lines that follow show the assignments from the configuration file. Note that the year in the mode reflects the compilation date, not the year in the source file.

makefo=xsltproc --catalogs -o %A.fo %B %C
makehtml=xsltproc --catalogs  %B %C
makepdf=pdflatex -interaction=nonstopmode %w
generatedvi=latex -interaction=nonstopmode %t
generateps=dvips %t.dvi -o %t.ps
theme_vals=com cog num sym bio
dtd is raweb from raweb3.dtd (mode RAWEB2005)

The following lines are specific to the Raweb. You can see a summary of all the tests done by the program that converts miaou2003.tex to miaou.tex. The statistics (number of environments, keywords, etc.) are computed by a preprocessor that was removed in 2007.

Ok with the config file, dealing with the TeX file...
Activity report for MIAOU (Miaou)
Mathématiques et Informatique de l'Automatique et de l'Optimisation pour l'Utilisateur
There are 138 environments
Checked 15 keyword env with 60 keywords (52 unique)
Checked 8 catperso and 31 participant(es) envs with 146 \pers
There were 2 topics
Sections (and # of modules): 1(1) 2(1) 3(2) 4(6) 5(5) 6(13) 7(4) 8(5) 9(3).

Whenever a section or a chapter is translated, a line is printed on the terminal. There is a complaint at the end, about a lonely module without title. A title is invented, namely `(Sans Titre)´. A non-trivial task for the post-processor is to remove it (it should not appear on the HTML pages). In 2007, this has become an error.

Translating section composition
Translating section presentation
Translating section fondements
Translating section domaine
Translating section logiciels
Translating section resultats
Translating section contrats
Translating section international
Translating section diffusion
Bib stats: seen 57 entries
Seen 64 bibliographic entries
(SansTitre) Only one module seen in the section
Problem with sans titre 1
There was 1 NoTitle not handled

Tralics now prints statistics.

Used 1756 commands
Math stats : formulas 503, non trivial kernels 299, cells 10227,
   special 1 trivial 149, \mbox 5 large 0 small 118.
List stats: short 0 inc 10 alloc 43456
Buffer realloc 41 string 15750 size 610086; merge 7
Macros created 80 deleted 0
Save stack +1582 -1582
Attribute list search 7539(1494) found 3154 in 5616 elements (1401 after boot)
Number of ref 92, of used labels 36, of defined labels 73, of ext. ref. 19
Modules with 24, without 16, sections with 9, without 15
There were 6 images.
Output written on miaou.xml (250758 bytes).
No error found.
(For more information, see transcript file miaou2003.log)

Here you can see the call to the post-processor.

rahandler.pl v2.12, (C) 2004 INRIA, José Grimm, projet APICS
Postprocessor did nothing

Since 2006, the syntax of the \pers command in a `catperso´ environment has changed. Example:

\begin{catperso}{Category test}
\pers{Jean}{Dupond}{Scientist}{Inria}
\pers{Jean}{Dupont}{Assistant}{Cnrs}[][yes]
\pers{Jean}{Durand}{Technical}{University}[][]
\pers{Jean}{Durant}{PHD}{ForeignUniversity}[with a T]
\pers{Jean}{Dumas}{PostDoc}{Public}[with a S]
\pers{Jean}{Dumat}{StudentIntern}{Other}[bla bla ][no]
\pers{Jean}{Dumont}{ Other }{Other}[bla bla ][no]
\end{catperso}

Here are the commands specific to the Raweb:

More information is available on the Web page.

The following commands can be used in any document, but they are specific to the Raweb.

6.6. Tracing commands

In some cases, TeX or Tralics produces wrong results, incomprehensible error messages, and so on. In these cases, you must use specialized commands to see what happens. Since the internal structure of TeX is not the same as that of Tralics, the results in the transcript file may be different.

We have explained the commands \show, which prints the meaning of a command (useful for a user-defined command), and \showthe, which shows the value of a variable, a counter, a token list, etc. We have also mentioned that \showbox prints the content of a TeX box or XML element. There is a command \showlists; its effect is to indicate the global context; this is not implemented in Tralics. The typical example is from the TeXbook. Consider the test file:

\tracingcommands=1
\hbox{
$
\vbox{
\noindent$$
x\showlists
$$}$}\bye

This is the result of the \showlists command.

### display math mode entered at line 5
\mathord
.\fam1 x
### internal vertical mode entered at line 4
prevdepth ignored
### math mode entered at line 3
### restricted horizontal mode entered at line 2
\glue 3.33333 plus 1.66666 minus 1.11111
spacefactor 1000
### vertical mode entered at line 0
prevdepth ignored

This example does not compile in Tralics: you cannot put a \vbox in a math formula. You cannot put a display math formula in a formula.

TeX provides 9 commands of the form \tracingXXXX, described earlier. Each of them is an integer variable (in general, a positive value means verbose). The command \tracingall turns everything on; in Tralics, it sets \tracingmacros, \tracingoutput, \tracingcommands and \tracingrestores to 1. Only these variables are useful in Tralics (the command \tracingmath, new in version 2.11, controls the math printing). For instance, \tracingonline controls whether or not anything is printed on the terminal; in Tralics, debugging information is only printed in the transcript file. Variables like \tracingparagraphs and \tracingpages show line-break and page-break calculations, performed by TeX but not by Tralics. The command \tracingoutput shows boxes when they are shipped out (in Tralics, the whole XML tree is printed at the end; if the value is positive, input lines are printed when they are read by the scanner), and \tracinglostchars indicates all characters not found in the fonts (Tralics never looks at font properties).

The command \tracingstats indicates that TeX should gather all statistical information available; in Tralics, statistics are always computed; if you call Tralics with the `silent´ switch, statistics are not printed on the terminal. Note that the `verbose´ switch calls \tracingall.

There are three remaining commands: \tracingmacros is used whenever a user command is expanded, \tracingrestores whenever things are popped from the save stack, and finally, \tracingcommands for all other commands. Let´s start with the example given on page 2.1. This is what you see if \tracingoutput is positive:

[4] \def\foo#1{\xbar#1}
[5] \def\xbar#1{{\itshape #1}}
[6] \foo{12}

It shows the input. This is what you see if \tracingmacros is positive:

\foo #1->\xbar #1
#1<-12
\xbar #1->{\itshape #1}
#1<-1

This is what you see if \tracingrestores is positive:

+stack: level + 2 for brace
{Push p 1}
{font restore }
+stack: l