Tralics, a LaTeX to XML translator; Part I

2. Expansion

One part of the work of TeX is to replace all user defined tokens by primitives; this is the main objective of the `expansion´ process. In this respect, there is little difference between TeX and Tralics. In this chapter, we review some constructions.

2.1. Defining new commands

A definition is typically of the form


You may wonder why the commands are not called `\foo1´, `\foo2´ and `\foo3´. The reason is that digits, with their standard category codes, are not of type letter: `\2foo´ is the command \2 followed by the letters `foo´ (the tokens are \2 f₁₁ o₁₁ o₁₁), and `\foo2´ is the command \foo followed by the digit 2 (the tokens are \foo 2₁₂). It is possible to create the token \foo2 via \csname foo2\endcsname, and it is also possible to change the category code of 2. This is in general a bad idea: if you say \setlength{\parindent}{\foo2+2cm}, it is impossible to design the \setlength command so that `\foo2´ is read as a command and `2cm´ as a dimension. On the other hand, if you say \def\foo2#1#2{#2#1}, TeX expects, after the second #, the character 2 with category code 12; if not, it complains with: Parameters must be numbered consecutively. In Tralics, the message is a bit different; it says: Error while scanning definition of \foo2 expecting #2; got #{Character 2 of catcode 11}. Note how 2₁₁ is printed.

Before \def, you can put a prefix: it can be \long, indicating that the command accepts whole paragraphs as arguments; it can be \outer, indicating that the command cannot be the argument of another command; it can be \protected, indicating that the command should not be expanded in an \edef (this is an ϵ-TeX extension); it can be \global. This last prefix can be put before any assignment; it says that the assignment is global (unless \globaldefs is negative). More than one prefix can be used; the order is irrelevant.

After the \def comes the object to define (this is either an active character, or a command name), then what TeX calls <parameter text>, and this is followed by the body. The body starts with the first opening brace (any character of category code 1) and ends with the first closing brace (any character with category code 2) that makes the body balanced against braces. These braces are not part of the body. The parameter text is an arbitrary sequence of tokens, but cannot contain braces. If it contains a # (in fact, any character of category code 6), it has to be the final character of the sequence, or be followed by the digits 1, 2, 3, up to 9, in order. If there is some text between #3 and #4 (or between #3 and the start of the body), this imposes a constraint on the third argument. If there is some text before #1, this imposes a constraint on the command itself.

In the body you can use ##; this will be replaced by a #. You can also use #1, #2, etc.; this will be replaced by the value of the first, second, etc., argument. As above, the # is any character of category 6 and the digits are of category 12; you cannot access the second argument if only one is available. If you define \foo2 as above, TeX will signal a second error: Illegal parameter number in definition of \foo2.
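
As an illustration of these rules, here are three definitions that are consistent with the example discussed next. This is only a sketch: the bodies are assumptions inferred from the translation shown in that example, in particular the final dot in \fooiii.

```latex
\def\fooi{foo}                      % no argument
\def\fooii#1#2{#2#1}                % two undelimited arguments, swapped
\def\fooiii+#1.#2=#3#{Seen#1#2#3.}  % `+´ required; #1 ends at the dot,
                                    % #2 at the equals sign, #3 before a brace
```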

Once you have defined the commands, you can use them. We give here an example, and its translation by Tralics:

\fooi\fooii12\fooiii+ok. {\itshape 3} =xyz{}!
foo21Seenok <hi rend='it'>3</hi> xyz.!

and also by LaTeX: `foo21Seenok 3 xyz.!´. Some explanations. The first command takes no argument, thus is easy to understand. The second command takes two arguments, its body is `#2#1´ so that the expansion is the token list formed by the tokens of the second argument followed by the tokens of the first argument. In the case of `\fooii12´, the arguments are `1´ and `2´ (lists of length one). In the case of `\fooii {AB} {CD}´, the arguments are `AB´ and `CD´, lists of length two. This is because TeX ignores initial spaces when reading undelimited arguments; in any case, an argument is well-balanced against braces (same definition as above for the body of a command). The shortest possible sequence of tokens is read (in the case of an undelimited argument, this sequence is never empty). If the result starts with an open brace and ends with a closing brace, these are removed, provided that the remaining token list is well-balanced; for instance, in the case `\fooii{}a´, the first argument is empty. If the command is not \long, then \par tokens are forbidden in the argument. In any case tokens that are defined to be \outer are forbidden in an argument.

In the case of \fooiii, fetching the arguments is more involved than in the general case. The specification is: plus sign, argument, dot, argument, equals sign, argument, sharp sign. Note first that the `+´ sign is not part of the command name, but is required after it whenever the command is used. The first argument is the shortest sequence (possibly empty) of tokens that is a balanced list and is followed by the required token list (here, a single dot); in the example, it is `ok´. The second argument is delimited in the same way by the equals sign. Here it is `␣{\itshape␣3}␣´; a pair of initial and final braces disappears, if possible, which is not the case here because of the spaces around the braces. The `#{´ after `#3´ says that the third argument is delimited by an open brace. This brace is left unread. Such a construction is rare: it occurs only four times in the LaTeX sources; two examples will be given later in section 2.10.

Consider the following example: `\def\opt[#1]{}´. If you say `\opt[foo]´ or `\opt[{foo}]´, the argument is `foo´. If you say `\opt[{[foo]}]´, it is `[foo]´. It is important to know that braces are required if you want a closing bracket in the argument. In the case of `\item[{\it foo}]´, the braces are useless; the scope of the \it command is limited to `foo´ because an additional pair of braces is added somewhere in the body of the \item command. The following example is non-trivial:

\def\@car#1#2\@nil{#1}
\def\@cdr#1#2\@nil{#2}
\if b\expandafter\@car\f@series\@nil\boldmath\fi

Both commands \@car and \@cdr read a normal (undelimited) argument, and a second argument delimited by \@nil, and return one of these. These commands are implemented in Tralics in the C++ kernel for efficiency. The third line shows a use of \@car, where the arguments are the expansion of \f@series; the main assumption is that this token list does not contain the \@nil token, which is a reserved command. The caller of the macro must also ensure that the list is not empty, for otherwise the first argument would be \@nil, and the end of the second argument would never be seen if the \@nil does not appear in the document text. Note that an error is signaled and scanning stops at the first \par token (or empty line) because the command is not \long.

Let's assume that \f@series expands to a non-empty list, for instance `mc´ (this means that the current font has medium weight and is condensed). Then `\@car mc\@nil´ expands to `m´. The third line of our example uses \@car to get the first character of \f@series, and compares it to `b´ (the result is true if the current font is bold, extra bold, bold condensed, etc). This code is used for typesetting the LaTeX2ϵ logo in bold version as LaTeX2ϵ. The commands \if and \expandafter will be explained later. Note that \if fully expands what follows the letter b. This means that you are in trouble if \f@series expands to an empty list, or if the first token is a command whose expansion may cause problems (perhaps because it has delimited arguments and \@car gobbled the delimiter), or is empty, or is a list that starts with the letter b.

The following example is from the TeXbook:

\def\cs AB#1#2C$#3\$ {#3ab#1 c##\x #2}
\cs AB{\Look}C${And\$ }{look}\$ 5

If you feed this to Tralics, you will get three errors (one because of the `##´, and two undefined commands). In verbose mode, the transcript file of Tralics will contain the following:

\cs AB#1#2C$#3\$ ->#3ab#1 c##\x #2
#3<-{And\$ }{look}

One question is: should arguments be in braces or not? As seen elsewhere, some commands have a special syntax, and cannot be followed by braces (for instance, in the case of `\catcode`\$´ the argument is the backtick followed by the dollar). In a case like $a \over b+c$, there are two arguments, one before and one after the command. An expression like $a\over b\over c$ is an error. The error message says to add some braces, but they are used only for grouping. A similar error message is issued if you say $a^b^c$. But compare `$a^{b^c}d$´ and `$a\over {b\over c}d$´: in the first case, the `d´ is outside the exponent; in the second case, it is part of the denominator, as in $\frac{a}{\frac{b}{c}d}$. In the case of \sqrt \frac12, braces are inserted by LaTeX when converting \frac into \over; since Tralics replaces \over by \frac, no such braces are added and an error is signaled because of missing braces.

It is sometimes important to know which braces disappear or remain. As an example, you can say `\def\ap{a'}´ in order to get a′; but if you say `$x_\ap\not=x_{\ap}$´, you get $x_a'\neq x_{a'}$: the prime is attached to the `x´ in the first case, and to the subscript `a´ in the second. In fact, you cannot say that `\ap´ is the argument of the underscore command; this is because TeX expands everything; in one case, it sees that the underscore is followed by the letter a, in the second case by a brace, hence a delimiter for a math list.

In general, you will be faced with the following problem: you say `\def\foo#1{\xbar#1}´ and `\def\xbar#1{{\itshape #1}}´. Note the double braces: the outer braces delimit the argument (of \def, i.e., the body of \xbar), the inner braces delimit the scope of \itshape. When you say `\foo{12}´ only the first character (the digit 1) is in italics; another level of braces is needed. This is what you can see in the transcript file of Tralics:

\foo #1->\xbar #1
\xbar #1->{\itshape #1}
{begin-group character {}
+stack: level + 3 for brace
{font change \itshape}

In this example, braces are missing in \foo, a remedy is to add a pair of braces in the argument, like `\foo{{12}}´. A comment in the TeX source says: Braces are effectively removed when they surround a single Ord without sub/superscripts, or when they surround an accent that is the nucleus of an Ord atom. This is the case in `{{\tilde x}^2}^3´, hence you get a Double superscript error; in this case adding additional braces has no effect; the only solution consists in adding something in the inner list (for instance a kern of width zero).

It is possible to define commands inside commands. For instance, you can say

\def\foov#1{\def\xbar##1{#1##1###1####1}}

When the scanner reads a token list, it handles `#´ signs (in fact, any character of category 6) in a special manner inside a definition. The token list of the previous line is \def \foov #₂₃ {₁ \def \xbar #₆ 1₁₂ {₁ 1₂₅ #₆ 1₁₂ #₆ 1₂₅ #₆ #₆ 1₁₂ }₂ }₂. As you can see, there are three possibilities for a sharp sign: before the brace that defines the body, it is #₂₃, and the digit that follows is omitted; it is 1₂₅, 2₂₅, etc., in the body when followed by 1, 2, etc.; it is #₆ when followed by a sharp sign. Said otherwise, a double sharp sign in a definition is equivalent to a normal one outside. Note the following trick.

\def\foo#1^2{#1^1## #^ ^# ^^}

A quantity like 1₂₅ is shown as ^1, because the hat character appears as ^2 (i.e., the token ^₂₅) in the <parameter text> part of the definition. Hence TeX prints \foo=macro: #1^2->^1^1## ^^ ## ^^. On the other hand, Tralics uses a different mechanism for macros: it remembers the number of arguments and the items between them, hence does not make the difference with a macro defined as `\def\xbar^1#2{...}´; it prints \foo=macro: #1#2->#1#1## ^^ ## ^^.

Assume now that you say `\foov{17}´. The result of the expansion is the token list shown above, with 1₂₅ replaced by 1₁₂ 7₁₂. When \xbar is defined the #₆ will read the character that follows, in this case 1₁₂. The situation is as if you had said `\def\xbar#1{17#1#17##1}´. Evaluating \xbar may signal an error, because of the `##´ (no error is signaled in case the argument of \xbar is `\gee´, a command that ignores its first and third argument). If you call \foov with `25´ instead of `17´ as argument, you will get the following error: Illegal parameter number in definition of \xbar.

2.2. Defining commands in LaTeX

You can say


The first two lines define the same commands as in the start of section 2.1. It is not possible to define \fooiii. However, you can define \fooiv, a command that takes an optional argument. In fact, you call it like this `\fooiv[X]YZ´; the expansion will be `SeenXYZ´. You can put a pair of braces around the arguments, like `\fooiv[{X}]{Y}{Z}´, the result is the same. Braces are needed for the first argument in case you want a closing bracket in it. If the first argument is `bar´, you can omit the `[bar]´: for this reason, the argument is called optional. In LaTeX, \fooiv expands to \@protected@testopt, which is a command to make \fooiv robust (i.e., in some cases, the test for an optional argument is delayed); it then expands to \\fooiv, which is a command that takes three arguments. In Tralics, no auxiliary command is used. If you say `\show\fooiv´, Tralics will print the following on the transcript file.

\fooiv=opt \long macro: bar#2#3->Seen#1#2#3.

Commands defined by \newcommand are \long (they accept paragraphs as arguments) unless a star is used. The `opt´ before \long shows that the command takes an optional argument. We show the default value of this argument instead of #1 before the ->. The following is printed by LaTeX:

> \fooiv=macro:
->\@protected@testopt \fooiv \\fooiv {bar}.
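
The definitions discussed in this section can be sketched as follows. The bodies of \fooi and \fooii are assumptions carried over from section 2.1; the signature of \fooiv (three arguments, the first one optional with default value `bar´) is read off the \show output above.

```latex
\newcommand\fooi{foo}
\newcommand\fooii[2]{#2#1}
\newcommand\fooiv[3][bar]{Seen#1#2#3}

\fooiv[X]YZ   % expands to SeenXYZ
\fooiv YZ     % optional argument omitted: expands to SeenbarYZ
```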

Since being \long deals with reading parameters, in LaTeX, it is the auxiliary command \\fooiv which is \long. This shows how to ask LaTeX for the meaning of the auxiliary command and its answer:

> \\fooiv=\long macro:
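
One way to ask for this meaning is to build the name of the auxiliary command with \csname. This is a sketch, and other equivalent forms exist: \string\fooiv produces the characters of `\fooiv´, backslash included, so that the \csname construct yields the token \\fooiv.

```latex
\expandafter\show\csname\string\fooiv\endcsname
```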

The philosophy of LaTeX is that a user should not randomly redefine commands. For this reason, you must use \newcommand (for an undefined command) or \renewcommand (for overwriting an existing command). In the same fashion, \renewenvironment is used to redefine an environment; we shall see later that an environment `foo´ is defined by two commands: \foo and \endfoo. You should never define \endfoo. This explains error messages of the form: LaTeX Error: Command \endfoobar already defined. Or name \end... illegal, see p.192 of the manual. In Tralics, we do not check that the command starts with `end´; the error message is \newcommand: cannot define \foo; token is already defined.

You can use \providecommand; the syntax is the same. In this case, the definition is silently ignored if the command already exists. You can use \DeclareRobustCommand; this is defined by Tralics to be the same as \providecommand although the LaTeX behavior is different. You can say `\global\def\foo{}´; this is the same as `\gdef\foo{}´, it defines \foo globally. You cannot use the \global prefix for LaTeX commands. You can use \CheckCommand. This is like \newcommand, but it does not define the command; instead it defines a dummy command, then checks that the dummy command has the same definition as the real one and produces a warning in case of mismatch; this feature can be used before overwriting a command.
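
The following lines illustrate these declarations in LaTeX. The names \myfoo, \mybar and \mybaz are hypothetical and are assumed to be undefined at the start.

```latex
\newcommand\myfoo{first}       % error if \myfoo is already defined
\renewcommand\myfoo{second}    % error if \myfoo is not yet defined
\providecommand\myfoo{third}   % silently ignored: \myfoo exists
\providecommand\mybar{fourth}  % \mybar is defined, since it did not exist
\gdef\mybaz{fifth}             % same as \global\def\mybaz{fifth}
```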

It is now time to explain that braces have two different purposes: as a delimiter for an argument list, and also for grouping: in the same fashion as the formula z(x+y) can be considered as z applied to x+y or the product of z and x+y. In the case of `\textit{12}´, the braces delimit the arguments; in the case of `{\itshape 12}´, the braces are used for grouping. In both cases, all characters up to the closing brace are in italics, but this property depends on the semantics of the operator, not the syntax. There is a big difference between these two uses of braces: the tokenizer produces token lists that are always balanced (there are as many opening delimiters as closing delimiters, where delimiters are characters of category code 1 and 2). On the other hand, if you say `\let\bgroup={´, the \bgroup has the same meaning as an opening brace, hence triggers the start of a new group; but it is not an explicit character (such things are called “implicit characters” in the TeXbook). When you say `\hbox...´ the opening brace can be implicit or explicit (in this case, braces are used both as delimiters and for grouping). Groups can also be defined by math shift characters (if you like empty lines in the source of a math formula, you can say `$\let\par\relax ...$´), or implicitly for a cell in a table, or via \left and \right in a math formula, or via \begingroup and \endgroup (they define a “semi simple group”).

One difference between plain TeX and LaTeX is the existence of named groups: instead of saying `\beginfoo´ and `\endfoo´, you say `\begin{foo}´ and `\end{foo}´. This is interpreted by LaTeX as

  1. When \begin{foo} is seen,

    1. a test is made to see if `\foo´ exists: if it does not exist, an error is signaled and steps (1.3) and (1.4) are skipped (via a call to \expandafter);

    2. the command \begingroup is executed (with space hacking);

    3. the name `foo´ is stored in \@currenvir;

    4. the command \foo is executed.

  2. When \end{foo} is seen,

    1. the command \endfoo is executed;

    2. the name `foo´ is compared with \@currenvir, an error is signaled in case of mismatch;

    3. the command \endgroup is executed (with more space hacking).
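
The steps above can be sketched in a few lines of TeX. This is a simplified illustration, not the code of the LaTeX kernel: the names \mybegin, \myend, \my@next, \my@tmp and \my@currenvir are hypothetical, and the space hacking is omitted.

```latex
\makeatletter
\def\my@currenvir{document}
\def\mybegin#1{%
  % step 1.1: is \foo defined? (\csname makes it \relax otherwise)
  \expandafter\ifx\csname #1\endcsname\relax
    \def\my@next{\errmessage{Environment #1 undefined}}%
  \else
    % steps 1.3 and 1.4, delayed until after \begingroup
    \def\my@next{\def\my@currenvir{#1}\csname #1\endcsname}%
  \fi
  \begingroup \my@next}% step 1.2, then 1.3 and 1.4 (or the error)
\def\myend#1{%
  \csname end#1\endcsname          % step 2.1
  \def\my@tmp{#1}%
  \ifx\my@tmp\my@currenvir \else   % step 2.2
    \errmessage{\string\begin{\my@currenvir} ended by \string\end{#1}}%
  \fi
  \endgroup}% step 2.3 restores the outer \my@currenvir
\makeatother

\def\demo{[}\def\enddemo{]}
\mybegin{demo}text\myend{demo}  % typesets [text]
```

Because the name is stored after \begingroup, it is local to the group, which is why \endgroup restores the name of the enclosing environment.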

This mechanism is not symmetric. It is implemented in Tralics in a similar manner (but there are some differences that can be analyzed by a malicious user). The first remark is the following: on entry, you may get a message that says LaTeX Error: Environment unknown undefined, on exit you would get LaTeX Error: \begin{document} ended by \end{unknown}. The trick is that the \endfoo token (created by \csname) is never undefined (its default action is \relax). One important point is that the command used in step (1.4) is \foo, not \beginfoo. In [6, example 7-3-1], there is an example of `bfseries´ as an environment; there is no command \endbfseries. Note that in step (1.4), the token that comes after \foo is the token after `\begin{foo}´, and this means that \foo can grab its arguments; on the other hand the token after \endfoo in step (2.1) is the start of the sequence that checks the environment name: thus \endfoo cannot read its argument (we shall see in a minute why steps (2.1) and (2.2) cannot be swapped). In the current version of Tralics, the “space hacking” is not implemented; we shall not discuss it here.

There are some tokens whose names start with `end´; you should not use these as environments. Consider \begingroup and \endgroup, the commands explained above; consider \input, \endinput, these are TeX primitives for inputting from a file; consider \beginL, \endL, \beginR, \endR, the ϵ-TeX extensions for left-to-right or right-to-left writing; consider \citation and \endcitation, these are Tralics commands for the bibliography; the command \endsec indicates the end of a section; the \endlinechar is a reference to an integer register that contains the character to be added at the end of each line. Commands \endgraf and \endline are aliases to \par and \cr.

This is how you can define new environments:

\newenvironment{x}[2]{#1BY\begin{y}#2AY} {by\end{y}ay}
\begin{x}a b c \end{x}

This typesets as aBYZbAY c byzay. The \begin part reads two arguments. The \end part takes no argument; it could use the first argument of the \begin, provided that this one saves it in a command. In verbose mode, the following is printed by Tralics in the transcript file. We have removed all lines with `Character sequence´ and `Text´.

1 [185] \begin{x}a b c \end{x}
2 {\begin}
3 {\begin x}
4 +stack: level + 3 for environment entered on line 185
5 \x #1#2->#1BY\begin {y}#2AY
6 #1<-a
7 #2<-b
8 {\begin}
9 {\begin y}
10 +stack: level + 4 for environment entered on line 185
11 \y ->Z
12 {\end}
13 {\end x}
14 \endx ->by\end {y}ay
15 {\end}
16 {\end y}
17 \endy ->z
18 {\endgroup (for env)}
19 +stack: ending environment y; resuming x.
20 +stack: level - 4 for environment from line 185
21 {\endgroup (for env)}
22 +stack: ending environment x; resuming document.
23 +stack: level - 3 for environment from line 185

At lines 4, 10, 20 and 23, you can see that the current “level” changes (this is what TeX calls the “semantic level”). The default level is level one, our example was done at level two, the first environment is at level three, the second at level four. When you see `level + 4´, it is because the level has just been incremented; if you see `level - 4´ it means that the level will decrease. At lines 18 and 21, you see that Tralics uses a special `\endgroup´ token. Look closely at lines 13 and 19: when Tralics sees `\end{x}´, the current environment is `y´, it is only after evaluation of \endx that the environment is `x´ again; this example shows that steps (2.1) and (2.2) cannot be swapped. In Tralics the name of the environment cannot be modified by the user.

Because of the \begingroup command, everything, until the \endgroup, is local to this group; in particular \@currenvir will be restored. If you say something like


the command associated to \end{zfoo} is locally redefined. In some cases, this is a big mistake: in Tralics, the start command can assume that the corresponding end command is executed or an error is signaled. In fact, the meaning of \endzfoo is stored on a special stack, and restored by \end{zfoo}. There is a big hack in LaTeX (and also in Tralics): since no text should follow the end of the document, there is no need to store on the stack every definition given between the start and end of the document; thus \document executes a \endgroup; logically, \enddocument should insert a \begingroup token; in LaTeX, this is not needed because step (2.3) is never executed. In Tralics we re-insert a \begin, because we have to typeset the bibliography (as a consequence, the start-line in the trace is the line that contains \end). Moreover, the action cannot be completely trivial, because we have to re-insert all tokens saved by \AtEndDocument. We show here the transcript file, assuming that only one token has been saved, namely \empty. You can see the stack increase and decrease; you can see the \endinput that closes the current file; you can also see a second \enddocument command whose action is to pop the XML stack; it is marked `pop (module)´ for historical reasons.

[31] \end{document}
{\end document}
+stack: level + 2 for environment entered on line 31
\empty ->
{Pop (module) 2: document_v div0_v div1_v}
{\endgroup (for env)}
+stack: ending environment document; resuming document.
+stack: level - 2 for environment from line 31
++ Input stack empty at end of file

The last line of the transcript file shown above says that the current file was not inputted by another one. What happens if a file foo.tex contains \input tralics-rr, followed by some junk? Well, the purpose of the pseudo command \endallinput is to forget about everything. The transcript file would contain

++ End of file tralics-rr.tex
++ cur_file_pos restored to 0
++ Input stack -- 1 tralics-rr.tex
++ Input stack empty at end of file

Clearly, you cannot use a document environment in a document; if you try, LaTeX complains with LaTeX Error: Can be used only in preamble (the preamble is everything before \begin{document}). The error message of Tralics is a bit more explicit: Two environments named document. If you put \begin{it} before \begin{document}, LaTeX does not complain. The trouble is at the end: you will get an error of the form LaTeX Error: \begin{it} on input line 9 ended by \end{document}, followed by a TeX warning: (\end occurred inside a group at level 1). In Tralics, an error is signaled at the start: \begin{document} not at level 0. On page 6.5, you see statistics of the form `Save stack +1582 -1582´; this means that the semantic stack pointer has increased 1582 times and decreased the same number of times, so that the end of the document has been seen at level zero. No warning is issued in the case the two numbers are not the same.

The package checkend contains a magic command whose effect is to unwind the stack, signaling an error if unclosed items are seen. This command should only be used at end of document, in the end-document hook. Using the package produces a result like the following:

Error signaled at line 687 of file testkeyval.tex:
Non-closed \begingroup started at line 683.
Non-closed brace started at line 437.
Non-closed environment `it' started at line 213.

2.3. Some small examples

Remember that \foo and \; are two commands that differ only in the following behavior: when the tokenizer sees a backslash followed by a semicolon (whose category code is not letter), it constructs a command whose name is formed by that character (and sets the internal state to a mode in which spaces are not ignored). On the other hand, if the backslash is followed by a letter, all letters are read (and the state is set such that following spaces will be ignored). By space, we mean here every character that has the category code of a space. A space after \verb is never ignored, but it is unwise to use this space as delimiter. In the case of \foo, the tokenizer allocates a slot on the hash table (unless \foo already exists). The possibility to change category codes dynamically is interesting (however, the implementation of \verb in Tralics uses no category code changes, and is more efficient). The two commands \makeatletter and \makeatother change the category code of the at sign character @ to letter and other, respectively. For instance


In this example, we have two user commands: \foo that defines a variable, and \usefoo that uses it. The variable \foo@val has a reserved name; there is a command \check@foo that makes sure that the argument is correct. The default category code of @ is 12; in most of the examples, we shall assume that it is 11, because these examples come from the LaTeX kernel or style files where the default category code is 11.
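
A sketch of what such code might look like is given below. All definitions are hypothetical: in particular, the test performed by \check@foo (it rejects an empty argument) and the helper \foo@tmp are assumptions made for the illustration.

```latex
\makeatletter   % @ is now a letter, so \foo@val is a single command name
\def\check@foo#1{%
  \def\foo@tmp{#1}%
  \ifx\foo@tmp\@empty
    \errmessage{Empty argument for \string\foo}%
  \fi}
\def\foo#1{\check@foo{#1}\def\foo@val{#1}}  % defines the variable
\def\usefoo{\foo@val}                       % uses it
\makeatother    % @ is again of category 12

\foo{abc}\usefoo   % typesets abc
```

After \makeatother, a user who types `\foo@val´ gets the command \foo followed by the characters `@val´, so that the variable cannot be modified by accident.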

As explained above, `\catcode`\$=3´ changes the category code of the dollar sign. What follows has to be a character code (a number between 0 and 255) followed by an optional equals sign, followed by a valid category code (an integer between 0 and 15). Assume that you say \def\A{25}, followed by `\catcode\A7.´. In the case where standard category codes are in effect this is tokenised as \catcode \A 7₁₂ .₁₂. But when a number is read, all tokens are expanded, until the end of the number is found (in the case where the number is formed by digits, one space character after the number will be read, if possible). In this case, TeX reads the digits 2, 5 and 7. It stops reading at the dot. This is an error (signaled by Tralics as Bad character code replaced by 0: 257). Then TeX reads an optional equals sign (there is none) and an integer (there is none). Hence you get a second error (Missing number, treated as zero). The result is that you have changed the category code of the null character to zero (like backslash). Since version 2.9, Tralics accepts 16-bit characters, so that the number 257 is valid, and you changed the category code of the letter `latin small letter a with macron´ to zero.

If you want to put 7 in the category code of the character defined by the command \A, you should say `\catcode\A=7~´. It is possible to make \A a reference to the character number 25, by using \chardef. Thus you can say `\chardef\A25~´ and `\catcode\A7~´. Note that, in the context of routines like scanint, a character number is a valid number; so that \A can be used as the number 25, wherever a number is required. In the sources of LaTeX you can see `\chardef\active=13´. You will also see `\mathchardef\@cclvi=256´; there is no difference between \chardef and \mathchardef, except that a character is in the range 0-255, while a math char can take larger values (less than 2^15). You can use \countdef\B26 (this will make \B a reference to count register number 26), \dimendef\C27 (this will make \C a reference to dimension register number 27), \skipdef\D28 (this will make \D a reference to skip register number 28), \muskipdef\E29 (this will make \E a reference to muskip register number 29), and \toksdef\F30 (this will make \F a reference to token register number 30). There is no `\boxdef´. The reason is that, if you want to copy the value of counter 1 into counter 0, you say \count0=\count1. If you say \count@=\B this will put the value of the counter 26 into \count@ (this is the counter 255). However, you say \setbox0=\copy1 if you want to copy the content of box 1 into box 0: the syntax is not the same. Note that \setbox0=\box1 copies and clears the box number one. When you use a command like \chardef, a line will be added to the transcript file, even in non-verbose mode, see section 6.13.

Commands can be defined via `\let´. You say \let\A=\B, where \A is a token that can be defined (active characters or commands; TeX does not care if the token is defined or not). It is followed by <equals><one optional space>. This means that TeX reads all space tokens; if the first unread token is an equals sign, it is read as well as the next token, provided that it is a space. If the equals sign is followed by two space tokens, only one is read. Instead of \B, you can put any token. After that, the current meaning of \A will be the current meaning of \B. For instance, if you say \let\foo\bar\show\foo you will get \foo=macro:->\mathaccent "7016\relax. In plain TeX, you would see a space instead of \relax (both a space and a \relax indicate the end of the number). In Tralics, you would see \foo=\bar, this is because \bar is a primitive, instead of a user defined command. If you say \let\A=+, then \A will behave like a + character (of category 12). In fact, this is called an implicit character, and sometimes an explicit character is required. For instance in the case \parindent=-3.4pt, the minus sign, the digits, the dot, and the two letters pt must be explicit characters. However, after

\let\bgroup={  \let\egroup=} \let\sp=^ \let\sb=_

there is no difference between $x\sp\bgroup a\sb b\egroup$ and $x^{a_b}$. The assignments shown here are made by Tralics when bootstrapping, and the commands so defined should be considered primitives. A token list has to be well balanced against explicit braces. For instance

\def\foo{{\catcode`}=0\egroup}

satisfies the requirements. The body of the command consists of {₁ \catcode `₁₂ }₂ =₁₂ 0₁₂ \egroup. If you evaluate \foo, the \catcode command will read the four tokens that follow; it will modify the category code of the closing brace. All this happens inside a group opened by {₁ and closed by \egroup, so that this is harmless. One use of \let is the following:

\def\fooA{a very long command}
\def\fooB{another very long command}
\def\xbar#1{\ifx 0#1\let\foo\fooA \else \let\foo\fooB\fi}

Here we use the fact that \let just moves a pointer. This is faster than copying a list. In particular, consider

\def\xbar#1{\ifx 0#1\fooA \else \fooB\fi}
\def\xbar#1{\ifx 0#1\let\foo\fooA \else \let\foo\fooB\fi\foo}

The first line executes conditionally one of \fooA and \fooB. However, this command cannot read an argument (because \fooA is followed by \else and \fooB by \fi). In the second case, we define \foo conditionally, and it can read its arguments without problem.
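
As an illustration of the second form, assume that \fooA and \fooB take one argument each (these two definitions are hypothetical and replace the argument-free ones given above):

```latex
\def\fooA#1{[A:#1]}
\def\fooB#1{[B:#1]}
\def\xbar#1{\ifx 0#1\let\foo\fooA \else \let\foo\fooB\fi\foo}

\xbar0{x}  % \foo is \fooA and reads {x}: the result is [A:x]
\xbar1{y}  % \foo is \fooB and reads {y}: the result is [B:y]
```

The conditional is completely finished when \foo is expanded, so that the token that follows \foo is the one that follows the call of \xbar.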

You can use the following construct

% example of use

This typesets as ABA. Beware: the \addtofoo command can be used only once (the old value of \oldfoo has to be saved...). We shall see later how to replace in the definition above the \oldfoo by its value, using either token lists or \edef, using a method where \oldfoo is a temporary. This is another example:

% example

Here `\B´ typesets as `toto´. In fact \B is defined as `\tmp\tmp´, where \tmp is the old definition of \B, namely a command that expands to `\C´. If you say \def\C{ti}\B, you will get `titi´. If in \double the \let is replaced by a \def as \def#1{#2}, the expansion of \tmp would have been \B, and \B would have been the same as \B\B. You see the problem? This could provoke a stack overflow, a parameter stack overflow, or even a program crash.
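
One definition of \double that behaves as described is sketched here. It is an assumption and may differ in detail from the original example.

```latex
\def\double#1{\let\tmp#1\def#1{\tmp\tmp}}
\def\C{to}
\def\B{\C}
\double\B    % \tmp is the old \B (it expands to \C); \B is now \tmp\tmp
\B           % typesets toto
\def\C{ti}
\B           % typesets titi
```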

Let's mention the existence of \futurelet\A\B\C. It is the same as \let\A\C\B\C. The usefulness of such a construct will be explained later.
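As a first taste, here is a small sketch of the classical use of \futurelet: looking at the next token without removing it from the input. The names \peek, \test and \next are chosen for the illustration only.

```latex
% \peek stores the token that follows it in \next, then calls \test;
% the token itself is still in the input when \test is expanded.
\def\peek{\futurelet\next\test}
\def\test{\ifx\next[(bracket follows)\else(something else follows)\fi}
\peek[x]  % \next is the character [, the first branch is taken
\peek y   % \next is the letter y, the second branch is taken
```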

You can say \expandafter\A\B. In such a case, TeX reads the first token, saves it somewhere, expands the second token if possible, and re-inserts the saved token. Nothing special happens if the second token (here \B) cannot be expanded, because it is a non-active character, or a command like \par or \relax. But assume that \A is a command that uses one argument (for instance \textit) and \B expands to `foo´. If you use \expandafter, only the first letter will be in italics. Assume that \foo expands to a dollar sign. Then $\foo is an empty math formula because \foo is not expanded, but \expandafter$\foo.$$ is a display math formula with a dot. The main reason why tokens are not expanded after a dollar sign (when TeX looks for another dollar sign) is that a test $\ifmmode true\fi$ should evaluate to true. You can use \expandafter if you want the test to be executed outside math mode. Note: if a table contains a template of the form `$#$´, if the cell starts with \ifmmode, then the test is expanded (i.e. evaluated) before math mode is entered, because TeX is looking for an \omit token. As a consequence you should always put `\relax´ before a test (this is not needed if a command is made “Robust”).
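The \textit case described above can be written down as follows (a minimal sketch, with \B defined as in the text):

```latex
\def\B{foo}
\textit\B              % the argument of \textit is the token \B: foo is in italics
\expandafter\textit\B  % \B is expanded first: the argument is the letter f alone
```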

Look carefully at the following lines:

1 \def\toto{\titi!}\def\titi{\tata}\def\tata{\tutu}
2 \expandafter\expandafter\expandafter\def\toto{5}
3 \let\E\expandafter \E\E\E\def\toto{6}
4 \def\E{\expandafter} \E\E\E\def\toto{7}
5 \expandafter\def\toto{8}

On the first line we define three commands \toto, \titi and \tata. As we shall see, lines 2, 3 and 4 do not change the meaning of \toto, so that, on line 5, the expansion of `\toto´ is `\titi!´. In this case, the effect of the \expandafter is to replace `\toto´ by `\titi!´. Hence, line 5 defines a macro \titi, that has to be followed by an exclamation point, takes no argument, and expands to 8. Consider now line 2. The first \expandafter puts apart the \expandafter token; it expands the next token, which is \expandafter, and the expansion of this is: read the token that follows (here `\def´), and expand the token that follows. This is `\toto´, that expands to `\titi!´. If we pop back the two tokens, line 2 is equivalent to `\expandafter\def\titi!{5}´. This looks like line 5, so that it is the same as `\def\tata!{5}´. There is no difference between lines 2 and 3: the \E command behaves exactly like \expandafter. Consider now line 4. What TeX does is expand the first token. It is \E, it expands to `\expandafter´. Since the token can be expanded, it will. Thus TeX reads and remembers the token that follows. It expands the next token (the third `\E´). Its expansion is `\expandafter´. Hence, line 4 is equivalent to `\E\expandafter\def\toto{7}´. Now, the \E in this list has as effect to try to expand the second token; it is \def, which cannot be expanded. Hence this `\E´ is useless. Line 4 is equivalent to `\expandafter\def\toto{7}´. And this defines \titi. We give here the trace of Tralics (it is a bit more complete than the trace of TeX):

\E ->\expandafter
{\expandafter \E \E}
\E ->\expandafter
\E ->\expandafter
{\expandafter \expandafter \def}
{\expandafter \def \toto}
\toto ->\titi !
{\def \titi !->7}

A question is: how many commands with two characters can be defined in Tralics? The answer is 255 squared (all characters but the null character are allowed). Of course, if you say `\def\++{}´, this defines the `\+´ command not the `\++´. You could imagine changing category codes (but, in a construction like `\def\{}{}´, it is impossible to give a different role to the first and second opening brace). The solution is given by \csname, you can use it like this `\csname1+1=2\endcsname´. Note that this typesets nothing: when \csname manufactures a new control sequence name, it defines it as being \relax (the control sequence will exist, until the end of the job). You can hide the \csname command, like this

\def\nameuse#1{\csname #1\endcsname}

If you want to define such a beast, you must use \expandafter.

\def\namedef#1{\expandafter\def\csname #1\endcsname}

The two commands \@namedef and \@nameuse are defined by LaTeX and Tralics like \namedef and \nameuse.

You can also say \namedef{++}#1{#1+#1} followed by \nameuse{++}{3}. This should give 3+3. If you want a macro named \{}, you can say \nameuse{\string\{\string\}}, provided that \escapechar=-1. If you do not like this setting of \escapechar, you can define a command, say \Lbra, that expands to {12 (an inactive opening brace character) using whatever method seems best. For instance

{\escapechar=-1 \xdef\Lbra{\string\{}\xdef\Rbra{\string\}}}
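The pieces above can be put together as follows. This is only a sketch: the body given to the new command is arbitrary, and \nameuse and \namedef are repeated so that the fragment is self-contained.

```latex
\def\nameuse#1{\csname #1\endcsname}
\def\namedef#1{\expandafter\def\csname #1\endcsname}
% \Lbra and \Rbra expand to braces of category code 12.
{\escapechar=-1 \xdef\Lbra{\string\{}\xdef\Rbra{\string\}}}
% Define, then use, the command whose name consists of the two characters { and }.
\namedef{\Lbra\Rbra}{a pair of braces}
\nameuse{\Lbra\Rbra}
```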

We explained above what happens when three \expandafter come in a row. Thus, it should not surprise you that the following command defines \foo.


A more realistic example of \csname is

\def\newcounter#1{\expandafter\newcount\csname c@#1\endcsname}

There are ten such commands in LaTeX; \newcount, \newtoks, \newbox, \newdimen, \newskip, \newmuskip, \newread, \newwrite and \newlanguage are implemented in Tralics. The equivalent of \allocate takes as argument a type (for counters, dimensions, skip registers, muskip registers, box registers, token registers, input registers, output registers, math families, language codes, insertions, etc.), allocates a unique number depending on the type, and puts it in \allocationnumber. Count registers between 10 and 19 are used for this purpose, and the user should not modify them. Command \new@mathgroup is not implemented because math groups are unused. Note that \newsavebox and \newlength are the same as \newbox and \newskip since Tralics does not check redefinition of the command; the command \newinsert is not implemented (this requires a box register, a count register, a dimen register and a skip register; each unprocessed float in LaTeX uses an insert, this may trigger a too many unprocessed floats error). The command \newhelp is not implemented in Tralics; it allocates no counter.

For instance, if you say \newcount\Foo, the allocated number could be 110, if you say \newskip\Bar, the number could be 46. In the first case, the result is as if you had said \countdef\Foo110. In the case of \newcounter{foo}, the result is as if you had said \countdef\c@foo111. Note that there are only 256 count registers available in TeX. You can use registers zero to nine as scratch registers (do not forget that \count0 contains the current page number), LaTeX uses registers 10 to 21 for its allocation mechanism. In the current version, the first free counter is 79. Some other counters are allocated by the class, and the package (in the transcript file, one line is printed for every call to \allocate, for instance: \c@chapter=\count80; in Tralics, the line looks like {\countdef \c@foo=\count43}).

A very important point is that all tokens between \csname and \endcsname are fully expanded. It is an error if a non-character token remains. Thus it is important to know which commands are expanded, and those that cannot be expanded. The exact rules are in the TeXbook, chapter 20. As a rule of thumb, commands that do no typesetting and modify no internal table can be expanded. More precisely: user defined commands, conversions like \string, \number, conditionals like \fi, marks, and some special commands like \csname, \expandafter, \the can be expanded. A construction like \csname\char`A\endcsname is invalid.
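As a small illustration of full expansion between \csname and \endcsname, consider the following sketch; the names \prefix and \myname are made up for the example.

```latex
\def\prefix{my}
% \prefix is expanded while the name is collected: the command defined is \myname.
\expandafter\def\csname\prefix name\endcsname{value}
\myname  % expands to: value
```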

If you say \noexpand\foo, the result is \foo, unexpanded. Example:

1 \def\FOO{12}
2 %\csname\noexpand\FOO\endcsname  %bad
3 \edef\xbar{\noexpand\FOO}
4 \noexpand\FOO
5 \expandafter\textit\FOO
6 \expandafter\textit\noexpand\FOO
7 \count0=1\FOO
8 \count0=1\noexpand\FOO

Line two is an error: the unexpanded \FOO is not a character. On line 3, the body of \xbar is `\FOO´, it will be expanded later. The translation of line 4 is empty (the command \FOO is temporarily seen as \relax, and \relax does nothing). Because of the \expandafter, the argument of \textit on line 5 is 1, on line 6 it is 12. On line 7, 112 is put in \count0, because \FOO is expanded. On line 8, 1 is put in the register, and 12 is typeset. On lines 8 and 6, \FOO is expanded twice, the first expansion being inhibited by the \noexpand.

Some quantities are never expanded, for instance \lowercase (this is black magic), \def (more generally all assignments), \relax (it does nothing, but stops scanning integers, dimensions, glue, etc), \hbox, \par, \left, etc. There are cases when an expandable token is not expanded: ten cases in TeX, and four additional cases in ϵ-TeX, these are described in section 6.12. Be careful with constructs like \csnameé\endcsname: LaTeX may signal an error involving \unhbox.

A command can be defined via \edef instead of \def (\xdef is the same as \edef, with an implicit \global prefix). All tokens, unless defined with \protected, in the body of the definition are expanded. Example:

\def\A{\B\C} \def\C{1}
{\let\B\relax \global\edef\D\bgroup{\A\noexpand\C\egroup}}
{\let\B\relax \global\edef\E\Bgroup{\A\noexpand\C\Egroup}

In this example, we consider two groups, that define (locally) a command \B and (globally) two commands \D and \E. The difference between these two commands is that \bgroup is an implicit character: when evaluated, it behaves like an opening brace, but it cannot be expanded. On the other hand, \Bgroup expands to an open brace. The \edef expands tokens following an explicit opening brace. It stops reading after having found an explicit closing brace (resulting from the expansion of \Egroup, not \egroup). The expansion of `\A´ is `\B\C´, this is expanded again. Since \B is \relax, it cannot be expanded, and is left unchanged. The expansion of `\C´ is `1´, so that the full expansion of `\A´ is `\B1´. The expansion of `\noexpand\C´ is `\C´. Thus, the example is equivalent to


You can put three \noexpand in a row followed by some token X. After the first expansion, the result is \noexpand followed by X, after the second expansion, the result is X. In the example that follows, the value of \B is \xbar.


Consider a realistic example like this

\def\cons#1#2{\begingroup\let\@elt\relax\xdef#1{#1\@elt #2}\endgroup}

We can say something like

\def\A{}\def\B{}  %init
\let\do\relax% just in case
\add\A x, \add\A y, \add\A z,
\cons\B{ab}, \cons\B{cd}, \cons\B{ef}.

This gives two ways to add some tokens to a list. Because both commands use \edef, full expansion is in use; you have to be very careful if the tokens contain macros that can be expanded. For the case of \add, we assume that \do does nothing; for the case of \cons, the command resets \@elt to \relax. The body of \A will be \do{x}\do{y}\do{z} and the body of \B will be \@elt ab\@elt cd\@elt ef. Note the absence of braces: if you really need them, you should add them to the argument of the \cons command. The built-in command \@cons is defined in LaTeX exactly like \cons above.
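The following sketch puts the two commands side by side. The definition of \add is an assumption: it is merely one definition that produces the body \do{x}\do{y}\do{z} described above. The code uses @ as a letter, hence the \makeatletter.

```latex
\makeatletter
% Assumed definition of \add: appends \do{#2} to the macro #1.
\def\add#1#2{\edef#1{#1\do{#2}}}
\def\cons#1#2{\begingroup\let\@elt\relax\xdef#1{#1\@elt #2}\endgroup}
\makeatother
\def\A{}\def\B{}      % init
\let\do\relax         % \do must not expand inside the \edef
\add\A x\add\A y\add\A z
\cons\B{ab}\cons\B{cd}\cons\B{ef}
\show\A               % the body should be \do{x}\do{y}\do{z}
\show\B               % the body should be \@elt ab\@elt cd\@elt ef
```

Note that \add makes a local definition, while \cons uses \xdef and is therefore global.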

The major problem with \edef is that it is not aware of assignments. Assume that \def\@A\B{}, and \def\C{B \let\@A\D}, \def\E{\C} have been somehow evaluated. Consider now an \edef containing \E. This implies expansion of \C, hence of `\let\@A\D´. The \let command cannot be expanded. Hence \@A is expanded, and you get the following error: Use of \@A doesn't match its definition from inside \C. You have never heard of this command \@A, and never used \C! For this reason some commands are made robust: for instance \hspace expands to `\protect\hspace ´ (the second command here has a space at the end), and \protect is defined to be \relax, or \noexpand, and sometimes \string. This mechanism works only if you use \protected@edef instead of \edef. (Note: \protect behaves like \string inside \protected@write, which is a variant of \write).
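The idea behind \protect can be imitated in a few lines. This is a simplified sketch, not the actual LaTeX implementation of \protected@edef; the names \fragile and \safe are hypothetical.

```latex
% A command that must not be expanded inside an \edef,
% because its body contains an assignment.
\def\fragile{\let\x\y}
\begingroup
  \let\protect\noexpand          % inside the group, \protect inhibits expansion
  \xdef\safe{\protect\fragile}   % the body of \safe is the single token \fragile
\endgroup
\show\safe
```

Without the \protect, the \edef would try to expand \x and \y after the unexpandable \let.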

2.4. Variables in TeX

By variable, we mean everything that the user can modify or watch changing. For instance, the current hash table usage is not a variable (it varies, of course, but the value is available only at the end of the run, in the transcript file). The current vertical list is updated whenever a paragraph is split into lines; you cannot access this list, however the \output routine gets the part of it that should be typeset on the current page in the box register 255. There are general purpose variables, and specialised ones: for instance \spacefactor makes sense only in horizontal mode, and the height of the material on current page (\pagetotal) can be used only between paragraphs (in fact, it is updated by TeX whenever a line is added to the page; you can consult, and even modify, this value at any time). There are variables that you cannot modify (the version number, for instance) or only once (the magnification), or in the preamble (i.e., LaTeX reads some variables at begin-document, changes done later to these variables are ignored).

Variables can be classified into two categories depending on their use: in some cases you need to put a prefix before \foo if you want to use it, in other cases the prefix is required for modification. For instance, if \foo is a user-defined command, you say \let\foo, or \def\foo, if you want to change the value, and simply \foo if you want to use it. In the same fashion \font\tenrm defines a font, and \tenrm is a use. On the other hand, if you say \pageno=3, this will set the current page number to 3 (this is plain TeX syntax, the LaTeX syntax will be explained later). If you say something like \hskip-\fontdimen2\font, the \hskip command is a prefix that says that the variable that follows will be used. In this case, this is some dimension from a font. Note that \fontdimen is a prefix so that \font does not define a new font, but refers to the current font. The meaning of the above piece of code is: insert horizontal space, whose amount is the opposite of the second parameter of the current font (i.e., normal interword space).

According to the TeXbook, a <font> can be a command like \tenrm defined by \font \tenrm =somefont, or the null font \nullfont, or the current font \font, or a family member (\textfont, \scriptfont, or \scriptscriptfont, followed by a 4-bit integer). In the case of \hyphenchar or \skewchar, a <font> follows the command. This gives a reference to an integer, the hyphenchar or skewchar of the font (if this integer is not a valid character, the font has no hyphenchar or skewchar). In the case of \fontdimen, there is an integer P, a font, and this defines a reference to a dimension. The integer P must be positive and not greater than the number of parameters in the font (initialised by TeX to the number of parameters in the font metric file, 7 for a normal font, 13 for math extension, 22 for math symbols, see TeXbook, appendix F). You can get an error: Font somefont has only 7 fontdimen parameters. In Tralics, the value is zero if P is out-of-range. In TeX, the last loaded font table can be dynamically increased: if you assign a value at position P>M, this will increase M. In Tralics, this is possible for all fonts, if P<10^5.
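A few uses of these font references, as a sketch (the values assigned are arbitrary):

```latex
\the\fontdimen2\font     % the normal interword space of the current font
\the\hyphenchar\font     % the hyphenation character of the current font, as an integer
\hyphenchar\font=-1      % -1 is not a valid character: the font has no hyphenchar
\fontdimen2\font=0.4em   % changes the interword space of the current font
```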

The value of a variable can be

You can say `\afterassignment\foo\count0=3´; in this case, the command \foo is pushed on a special stack, and popped after assignment is complete. There is only room for one token on this special stack. For instance, if you write the following:

\afterassignment \fooA\afterassignment\fooB

the transcript file of Tralics will contain (in verbose mode)

[9] \afterassignment \fooA\afterassignment\fooB
{\afterassignment: \fooA}
{\afterassignment: \fooB}

At this point, the after assignment stack contains \fooB. The order of evaluation is now the following: \fooD is expanded; this gives \relax, which terminates scanning of the number; it will be read again, after evaluation of \fooB:

[10] \fooC\count0=1\fooD
\fooC ->\relax
+scanint for \count->0
\fooD ->\relax
+scanint for \count->1
{after assignment: \fooB}
\fooB ->\relax
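Here is a sketch that makes the behaviour visible on the terminal; the commands \fooA, \fooB and \report are defined only for the illustration, and \count255 is used as a scratch register.

```latex
\def\fooA{\message{A}}
\def\fooB{\message{B}}
% Only one token is remembered: \fooB replaces \fooA, and only B is printed.
\afterassignment\fooA \afterassignment\fooB \count255=1
% The saved token is inserted after the assignment is complete,
% so that \report sees the new value.
\def\report{\message{\string\count255 is now \the\count255}}
\afterassignment\report \count255=3
```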

You can use the \showbox command for displaying the content of a box. This is a little example. It uses \everyhbox and \afterassignment. Note the order in which these tokens are inserted.


This is what TeX prints in the log file:

> \box0=
.\T1/cmr/m/n/10 1
.\T1/cmr/m/n/10 2
.\T1/cmr/m/n/10 3
.\T1/cmr/m/n/10 4

The first line of the trace starts with \hbox or \vbox, followed by the dimensions (height, depth, width; the unit is `pt´ by default), optionally followed by `shifted 27.1´ if the box is shifted, and by `glue set 0.19´ if the glue has to be stretched or shrunk. After that, you will see the content of the box, one line per item (no more than \showboxbreadth lines are printed per box), each item is preceded by a context (a sequence of N dots at depth N, tokens at depth greater than \showboxdepth are not shown). In the box, you can see things like `\penalty -51´ or `\kern 28.45274´ or `\glue 3.0 plus 1.0´ or `\glue(\baselineskip) 2.28015´ (this last glue is inserted automatically by TeX, it knows where it comes from, so that the name can be printed), \special{...}, \write4{\indexentry...}. The interesting point in the last object is that we have a list of tokens that will be evaluated later (when the page is shipped out). Tralics puts neither \kern, \penalty, nor \glue in a box. The \special command is not implemented; finally \write is never delayed. In our example, the box contains four items, which are characters (TeX shows a command that contains the name of the font; in our example, the font is something like `ecrm1000´).

In Tralics, you would see the same characters, but no font and no size. On the other hand, you can say something like

\setbox0=\xbox{foo}{1\xbox{bar}{2} %

and you will see

<foo y='2'>Test1<bar x='1'>Test2</bar> 3</foo>

Note the two commands that were used to add attributes to the current XML elements, and the last constructed one. We have added another command, \XMLaddatt that takes as optional argument the id of the element to which the attribute value pair should be added. This is an integer; if omitted, the current element is used. You can use \XMLlastid or \XMLcurrentid (these are references to variables, you must use \the if you want the value). If you want to overwrite an existing attribute pair, you must use a star. The previous example can be written like this:

\setbox0=\xbox{foo}{1\xbox{bar}{2} %

If \foo is any command then \show\foo will show its value. Here are some examples

\def\Bar#1#{#1} \show\Bar
\let\foo\par \show\foo
\renewcommand\foo[2][toto]{#1#2} \show\foo
\let\foo=1 \show\foo
\let\foo=_ \show\foo
\let\foo=\undef \show\foo

This is what Tralics prints (it differs slightly from the LaTeX output)

\Bar=macro: #1#->#1.
\foo=opt \long macro: toto#2->#1#2
\foo=the character 1.
\foo=subscript character _.
\bgroup=begin-group character {.

In the case of a variable, you can say \the\foo, the result is a token list that represents the value of \foo (if \foo is a token list, \the\foo is the value of \foo, otherwise, it is a list of characters). The command \showthe will show the value, i.e. print on the terminal the token list returned by \the. Example

\widowpenalty=3 \Show\widowpenalty
\parindent1.5pt \Show\parindent
\leftskip = 1pt plus 2fil minus 4fill \Show\leftskip
\thinmuskip = 3mu plus -2fil minus 4fill \Show\thinmuskip
\count0=17 \Show{\count0}
\dimen0=17pt \Show{\dimen0}
\skip0=17pt plus 1 pt minus 2pt \Show{\skip0}
\muskip0=17mu plus 1 mu minus 2mu \Show{\muskip0}
\font\xa=cmr10 at 11truept
\fontdimen6\xa = 11pt \hyphenchar\xa=`\-
\toks0={\foo = \foo} \def\foo{foo}

This is what Tralics prints on the screen.

\show: 3
\show: 1.5pt
\show: 1.0pt plus 2.0fil minus 4.0fill
\show: 3.0mu plus -2.0fil minus 4.0fill
\show: 17
\show: 17.0pt
\show: 17.0pt plus 1.0pt minus 2.0pt
\show: 17.0mu plus 1.0mu minus 2.0mu
\show: 11
\show: 98
\show: 79
\show: 11.0pt
\show: 45
\show: 25
\show: cmr10
\show: \foo= \foo

The typeset result is: 31.5pt0.0pt0.0mu1717.0pt17.0pt plus 1.0pt minus 2.0pt17.0mu plus 1.0mu minus 2.0mu11987911.0pt 45 25cmr10 foo= foo.

In the case of \the\foo, \showthe\foo, \advance\foo, \multiply\foo, \divide\foo, the token that follows the first command is fully expanded.
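For instance, in the following sketch the register is hidden behind a macro (\reg is a hypothetical name, \count255 a scratch register); \advance and \showthe expand \reg in order to find the register.

```latex
\def\reg{\count255 }   % the space terminates the register number
\reg=5
\advance\reg by 2
\showthe\reg           % shows 7
```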

2.5. All the variables

All variables (exceptions will be given later) are in the table of equivalents: this table contains the current meaning of quantities that are saved/restored by the grouping mechanism of TeX. In TeX this table is divided into six parts; in Tralics, the layout is slightly different, for instance, because TeX makes heavy use of glue (each space character produces a glue item), while Tralics ignores it completely. This big table contains the following objects:

  1. the current equivalent of single character control sequences (for ~ as well as \~);

  2. the hash table (in Tralics, there are two such tables, if the command \foo produces <bar gee='true'>, the three strings `bar´, `gee´ and `true´ are in a special table);

  3. all glue parameters.

  4. all quantities that fit on 16 bits.

  5. all integers.

  6. all dimensions.

The glue parameters are the following (unused by Tralics, initialised to 0, unless stated otherwise):

The token parameters are the following (initially empty; unused by Tralics unless stated otherwise):

The integer parameters are the following. These parameters are zero, unless stated otherwise.

The following quantities are read only variables. They are integers, unless stated otherwise.

The counters defined in Tralics are the following. The counters are not used unless specified otherwise, but you can say \renewcommand\thepage{...}, this is not an error.

The dimension parameters are the following:

The registers are the following

Since version 2.9 of Tralics, all characters have 16 bits, so that the number of characters 256 should be replaced by 2^16, i.e. 65536, in the sizes above. Moreover, the number of other registers (from \count to \box above) has been increased to 4096.

Some quantities are meaningful when TeX makes lines into pages. The dimension \pagegoal contains the current page height (minus the size of all potential insertions). The current page height has a natural value in \pagetotal and a shrink part in \pageshrink, the stretch part is in \pagestretch, its `fil´ part is in \pagefilstretch, its `fill´ part in \pagefillstretch and its `filll´ part in \pagefilllstretch. The depth of the box is a constant dimension, in \pagedepth. Whenever the output routine is called, TeX increases the value of the integer counter \deadcycles; an error is signaled if the value is too big, it is reset to zero by \shipout. In \prevdepth, you can find the depth of the most recent box on the current vertical list, in the integer \prevgraf the number of lines in the most recent paragraph that has been completed or partially completed. Of course, all these values are zero in Tralics.

In plain TeX, you can use \nointerlineskip and \offinterlineskip. These commands change the value of \prevdepth. They are ignored by Tralics.

2.6. Using the variables

There are three routines defined in Tralics, named scanint, scandimen and scanglue that read an integer, a dimension and glue. Assume that \count0 is 1, \parindent is 3pt, and you say \skip\count0=2pt plus \parindent\relax. The transcript file of Tralics will contain

[346] \skip\count0=2pt plus \parindent \relax
+scanint for \count->0
+scanint for \skip->1
+scanint for \skip->2
+scandimen for \skip->2.0pt
+scandimen for \skip->3.0pt
{scanglue 2.0pt plus 3.0pt\relax }
{changing \skip1=0.0pt into \skip1=2.0pt plus 3.0pt}

The exact rules will be given later. The following happens here: After \skip there is an integer, an optional equals sign, then glue. After \count there has to be an integer. Thus, scanint reads an integer for \count, and another one for \skip. A glue item is formed of a dimension (the natural width), optionally followed by `plus´ and a dimension (the stretch part), optionally followed by `minus´ and a dimension (the shrink part). In this case, there is no shrink part, because of \relax. The second dimension comes from the variable \parindent; the first dimension is explicit: the integer part of the dimension is read by scanint.

An integer can be explicit or implicit: an implicit integer comes from a command (it can be a variable like \date, or a constant like \active). In all other cases, the number can be followed by one optional space. In general, the number will be given as a non-empty sequence of digits, like 01239; you can specify digits in base 16 as "FF, this is 255, in this case, letters between A and F (uppercase, category 11 or 12) are allowed. You can specify digits in base 8 as '177, this is 127. You can also specify a number as a character: `A is 65. You can say `\A, this is also 65; note that a backslash is needed in cases like `\%. Only one letter is allowed in the command name, digits and quotes must have category 12.

An integer or a dimension can be preceded by a sign. This is a sequence of arbitrary length, formed of spaces or +12 or -12 signs. If the number of minus signs is odd, this changes the sign of the result. Hence if you say \count0=+-+'77 and \count1=-\count0, this will put 63 in \count1.
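The different notations can be summarised as follows. The sketch assumes that the characters ", ' and ` have category code 12 (some packages make the double quote active), and uses \count1 to \count6 as scratch registers.

```latex
\count1="FF        % hexadecimal: 255
\count2='177       % octal: 127
\count3=`A         % character code: 65
\count4=`\%        % character code of the percent sign: 37
\count5=+-+'77     % one minus sign: -63
\count6=-\count5   % the sign is changed again: 63
```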

A dimension can be implicit or explicit. You can say \count0=\dimen0: in this case the value of the dimension in sp units is put in the count register. You can say \dimen0=\skip0: the shrink and stretch part of the glue is ignored. You can also say \count0=\skip0 (guess what happens). It is not possible to convert (directly) an integer to a dimension or glue. An explicit dimension is formed of a factor and a unit of measure. The factor can be an integer (hence -'77pt is a valid dimension), or a decimal number (given in base ten, like 1.5, or 1,5). Units can be pt, pc, in, bp, cm, mm, dd, cc, sp. The case is irrelevant: Pt, pt, PT and pT are valid, the category code may be anything (it cannot be active, because everything is fully expanded). Units shown above can be preceded by true (note that Tralics ignores magnification, thus the `true´ prefix). Units can also be em or ex. These values depend on the current font. Tralics always assumes that the font is a ten point one. A unit of measure can also be an integer, a dimension, or glue. For instance \dimen0=1\count0 will multiply the value of \count0 by one. This is the dual to \count0=\dimen0. You can say \parindent=1.2\parindent if you want to increase it by 20%.
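These conversions can be checked with \showthe. In this sketch, the values shown in the comments are those of TeX (one point is 65536sp); the registers are scratch registers.

```latex
\dimen0=1pt
\count255=\dimen0    % the dimension is converted to a number of sp
\showthe\count255    % 65536
\skip0=2pt plus 1fil minus 3pt
\dimen2=\skip0       % stretch and shrink are ignored
\showthe\dimen2      % 2.0pt
\dimen4=3\count255   % factor 3, the unit is the value of \count255, in sp
\showthe\dimen4      % 3.0pt
```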

A glue is formed of three parts: a dimension, a stretch part, and a shrink part. The stretch part can be a dimension (it can use special units like `fil´, `fill´ and `filll´, these are called infinite, of first, second and third order). You can say \skip0=0pt plus 1fil. For some strange reason, after fil you can put a second L, and a third one. As is the case with other units like ex or em, the case is irrelevant. Spaces are ignored after the L. Moreover, TeX continues scanning for an L after having found `filll´; if found, it signals the following error: Illegal unit of measure (replaced by filll). In the case of \skip0=2\skip0, the equals sign is followed by a dimension: there is a factor 2, and a unit (the fixed part of \skip0). As a consequence, this multiplies by two the fixed part of the glue, and sets the shrink and stretch part to zero (unless the code above is followed by `plus´ or `minus´).
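The last point is illustrated by the following sketch (the values in the comments are those shown by TeX):

```latex
\skip0=1pt plus 2pt minus 3pt
\skip0=2\skip0 \showthe\skip0            % 2.0pt: stretch and shrink are lost
\skip0=1pt plus 2pt minus 3pt
\skip0=2\skip0 plus 1fil \showthe\skip0  % 2.0pt plus 1.0fil
```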

Note: if you say \chardef\foo=123\foo, then \foo is made equal to 123: the first thing that \chardef does is to make \foo the same as \relax, so that scanning stops after digit 3. On the other hand in the case of \count0=3\ifnum... the conditional is evaluated while reading the number, thus before the assignment is complete. In particular, if the test compares \count0 with something else, the value might be different from three. Assume that \count0 and \count13 contain the value 7. What happens if you say: \count0=2\ifnum\count0=\count13\fi4 ? It will put 2 in \count0 and typeset 4. In fact, after the digit 3 is sensed, the \fi token terminates the \ifnum. It does so by inserting a \relax token, and a second \fi token. The effect of \relax is to finish reading the number. Thus \ifnum can compare the two values. If these two values are different, the expansion of the conditional is empty, and 24 is put in \count0. But the test is true, and TeX reads again the inserted \relax: it has as effect to stop scanning of the number 2. After that the inserted \fi is read. The transcript file of Tralics might look like the following. Since version 2.9, the transcript file contains also assignments. So you can see the order: when the \fi is seen, the last \count, hence the RHS of the equality, is not yet evaluated and a \fi token is inserted, preceded by a \relax token; these are evaluated later; the \relax token is seen by \count, and left unchanged. After that, we have the number 13, hence the value of \count13, hence the truth value of the test. Now, the body of the conditional is read; it consists solely of the \relax. This one is seen by the first \count, that has the value needed by the assignment. After the assignment is complete, the \relax is considered again: it is read, and the inserted \fi is evaluated.

[2677] \count0=2\ifnum\count0=\count13\fi4
+scanint for \count->0
+scanint for \count->0
+scanint for \ifnum->7
+scanint for \count->13
+scanint for \ifnum->7
+iftest989 true
+scanint for \count->2
{changing \count0=7 into \count0=2}
Character sequence: 4 .

A token list is like a command without arguments. You can say \foo={ABC} if you want to put something in it, and \the\foo if you want to use the list. The equals sign is optional. You can insert a \relax between the equals sign and the opening brace. In the example that follows, you can see that, after the optional equals sign, you can put as many spaces or \relax tokens as you like; tokens are expanded, as long as no brace is found. The last line of the example shows that the token that follows \the is expanded (if \the itself is expanded). Thus, the last line adds some tokens at the end of the list. Note the space in \A: without it, TeX would see something like \the\toks0\the\toks0, and the second \the is expanded by the scanint routine, so that this inserts in \toks0 the content of \toks01 followed by a sharp sign.

\def\myrelax{ \relax}
\def\A{\toks0 }
\A=\expandafter{\the\A \the\A}\showthe\toks0

The \showthe command prints `1##\the \A´, but only a single # is in the list.

We have seen on page 2.3 how to use \cons to add some tokens to a command via \edef. The code that follows adds tokens to a list. The command is called \addto@hook in Tralics and is long, but the body is the same.
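A command of this kind can be sketched as follows. The body is the one of the LaTeX command \addto@hook, as far as we can tell; the name \addtohook is the one used in the text, and the register \A is allocated only for the demonstration.

```latex
\newtoks\A
% Appends the tokens #2 at the end of the token register #1.
\long\def\addtohook#1#2{#1=\expandafter{\the#1#2}}
\A={abc}
\addtohook\A{def}
\showthe\A   % shows abcdef
```

The \expandafter expands \the\A before the opening brace is seen, so that the old content of the register is copied in front of the new tokens.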


The command \newtoks defines its argument as a reference to a token register, for instance \toks23. Whenever you use \addtohook with \A as first argument, it is like the assignment \A=\expandafter{\the\A...} shown in the previous example. Another example:

% \xdef\L{...\the\T}

Let's assume that \L is a parameterless command, and \T a reference to a token register. The first line puts the value of \L in \T. The second line explains what we do in the third one. Remember that \xdef expands everything in the body. All tokens are fully expanded (except that the result of \the is not expanded). As a result, this will put some tokens in front of \L. Let's explain which tokens. We assume that \countx is a reference to some counter, that the counter contains 65, this is the ASCII code of the upper case letter A, and we assume that the category code is 11. The first token is \catcode, it cannot be expanded, it will be left unchanged. The second token is \the. It can be expanded, the result is the value of the counter, the two characters 65. The equals sign cannot be expanded. Then comes \the; this expands what follows. The \catcode command reads a number. Because of \the, it reads two digits 6 and 5, and looks at the \relax. Note: this \the is useless, this example revealed a bug in Tralics. This is the log of Tralics. The last line indicates the value of \L:

[18]   \xdef\L{\catcode\the\countx=\the\catcode\the\countx\relax\the\T}
{\the \countx}
{\the \catcode}
{\the \countx}
+scanint for \catcode->65
{\the \T}
\the->\catcode 48=12\relax .
{\def \L ->\catcode 65=11\relax \catcode 48=12\relax }
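The rule that the result of \the is not expanded further inside \xdef can be checked with a small experiment. This is only a sketch: the command \fragile is hypothetical, and is defined so that it would raise an error if it were ever expanded.

```latex
\documentclass{article}
\begin{document}
\newtoks\T
\def\fragile{\errmessage{this macro was expanded}}  % hypothetical helper
\T={\fragile}
\def\L{x}
\xdef\L{\L\the\T}  % \L expands to x; the tokens delivered by \the\T are copied unchanged
\show\L            % the log shows: macro:->x\fragile
\end{document}
```

No error message is produced, which shows that the content of the token register went into \L without being expanded.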

There are some advantages in putting items in a box. For instance, if it takes a long time to translate a piece of text that will be used several times, it can save some time. A second possibility is to create a box in a given context and use it in another one (this can be used for instance to put verbatim material in a section title; not in the toc, because the toc is obtained by reading characters from a file, but the box can be used for page headings). Finally, one can put some text in a box, measure the size of the box, and do some action according to the size of the box; it is not possible to measure a box in Tralics because no typesetting is done. Note that there is a limited number of boxes (there is also a limit on the number of token registers, but you can always put your token list in a macro; in the same fashion, it is always possible to store integers and dimensions into token lists, i.e., in commands). Note that, if you want to implement arithmetic on big numbers, and you represent a number $x=\sum_k x_k B^k$ as a sequence of commands, accessing $x_k$ via \csname x\the\k\endcsname and parsing the result as an integer is inefficient. It is much more efficient to say \fontdimen\k\x (there is a TeX file by Denis Roegel that computes thousands of digits of π using font tables as auxiliary memory).

2.7. Counters

The most useful registers are counters. Rather than saying `\count16=0´, at the risk of destroying variables used by other packages, you should use named counters, together with an allocation scheme. We have seen that `\newcount\foo´ does that. In LaTeX, we can do more. If you say `\newcounter{foo}[bar]´ then a counter foo is defined that depends on bar. Let's assume, for simplicity, that the allocation mechanism allocates count register 17. Then \c@foo is a reference to `\count17´. It is assumed that no package defines a command that starts with c@, or p@ or cl@, so that \c@foo, \cl@foo, and \p@foo are reserved for the counter foo. In LaTeX, there is a command \value that takes one argument and expands to \csname c@#1\endcsname. The same command exists in Tralics, but it signals an error in the case where \c@foo is not a reference to a count register. You can say `\value{foo}=10´, this will put 10 into the counter; you can say `\the\value{foo}´, this will typeset the value of the counter. You should not use this low-level TeX syntax. In fact, if you say `\value{foo}=10\the\value{foo}´ this will put 103 into the counter (assuming that it contained 3). Compare this with \parindent=10\parindent where there is an implicit multiplication.

Assignment should be done via `\setcounter{foo}{10}´. This is the same as `\global\value{foo}=10\relax´ (plus a check that `foo´ is a counter). The effect of the \relax is to stop the scanning of the number. The \global makes the assignment global. In the same fashion, `\addtocounter{foo}{4}´ is the same as `\global\advance\value{foo}4\relax´. You can say something like `\parindent=\value{foo}\xbar´, this puts in \parindent the value of \xbar (let's assume it is a dimension) multiplied by the value of the foo counter. If you want to typeset the value of the counter, you say `\number\value{foo}´. You can also use \romannumeral or \Romannumeral (this last command is not defined by TeX) instead of \number (it has to be followed by a number, for instance \value...). The following commands take as argument the name of a counter, and typeset the value: \arabic (it gives 7), \roman (it gives vii), \Roman (it gives VII), \alph (it gives g), \Alph (it gives G), \fnsymbol (it gives **). The following commands: \@arabic, \@roman, \@Roman, \@alph, \@Alph, \@fnsymbol are used internally by LaTeX. They are defined in Tralics for compatibility reasons. Hence `\number\value{foo}´ is the same as `\@arabic\c@foo´ and the same as `\arabic{foo}´; using \arabic is the best choice.
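The commands of the two previous paragraphs can be exercised as follows. This is a minimal LaTeX sketch, not Tralics-specific code; the counter name foo is the one used in the text.

```latex
\documentclass{article}
\begin{document}
\newcounter{foo}
\setcounter{foo}{3}
\addtocounter{foo}{4}                 % foo is now 7
\arabic{foo}, \roman{foo}, \Roman{foo}, \alph{foo}, \Alph{foo}, \fnsymbol{foo}
% typesets: 7, vii, VII, g, G, **

\setcounter{foo}{3}
\value{foo}=10\the\value{foo}         % low-level pitfall: the number that is read is 103
\arabic{foo}                          % typesets 103
\end{document}
```

The second half shows why the low-level syntax is discouraged: the number scanner keeps expanding tokens until it finds something that is not a digit.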

Three operations are defined: \advance that increments a counter (or a dimension, or a glue), \multiply that multiplies it by an integer, and \divide that divides it by an integer. In the case of integer division, TeX divides the absolute values, and adds the required sign to the quotient (the remainder is not computed). The following piece of code puts in \count0 the number of hours and in \count2 the number of minutes (quotient and remainder of the division of \time by 60).

\count0=\time
\divide\count0 60
\count2=-\count0
\multiply\count2 60
\advance\count2 \time

You can say \newlength\foo. This allocates a new skip register. You can use \setlength and \addtolength, in the same way as \setcounter and \addtocounter. However, assignments are local. Using plain TeX syntax, you can say:

\advance\dimen0 by-\dimen1

Note that \dimen@, \dimen@i, and \dimen@ii are aliases for \dimen0, \dimen1 and \dimen2; these quantities are defined but not used by the LaTeX kernel (but they are used by packages). All registers with number less than ten can be used freely, others should use the allocation mechanism. Example:


After this operation, the counter foo contains 5. This means that the difference between 2mm and 0.2cm is 5sp (about 27 thousandths of a micrometer). Note: Tralics uses exactly the same algorithms as TeX, hence produces the same results.
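The rounding difference can be reproduced with scratch registers only. This is a sketch that uses \dimen0 and \dimen1 directly, rather than a LaTeX counter; \number converts the dimension into an integer number of scaled points.

```latex
\documentclass{article}
\begin{document}
\dimen0=2mm                    % 372935sp after conversion
\dimen1=0.2cm                  % 372930sp: the fraction 0.2 is rounded before scaling
\advance\dimen0 by -\dimen1
\typeout{difference: \number\dimen0 sp}   % difference: 5sp
\end{document}
```

The two values differ because 0.2 is first rounded to a multiple of 1/65536, and this small error is then multiplied by the conversion factor of the unit.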

Appendix A.3.1 of [6] describes the calc package. It allows you to write commands like these:

\newcounter{hours}\newcounter{minutes}
\newcommand\printtime{%
   \setcounter{hours}{\time/60}%
   \setcounter{minutes}{\time-\value{hours}*60}%
   \thehours h \theminutes min}
\renewcommand\today{\ifcase\day\or
      1st\or 2nd\or 3rd\or 4th\or 5th\or
      6th\or 7th\or 8th\or 9th\or 10th\or
      11th\or 12th\or 13th\or 14th\or 15th\or
      16th\or 17th\or 18th\or 19th\or 20th\or
      21st\or 22nd\or 23rd\or 24th\or 25th\or
      26th\or 27th\or 28th\or 29th\or 30th\or
      31st\fi~\ifcase\month\or
      January\or February\or March\or April\or May\or June\or
      July\or August\or September\or October\or November\or
      December\fi\space \number\year}
The time is \printtime, \today.

In this case, the result of Tralics could be: `The time is 16h 37min, 7th December 2004.´

You can do operations on integers like this:

\setcounter{Ac}{(1+2)*(3+4)-20}          %% \theAc=1
\addtocounter{Ac}{(1*2)+(3*-4)+(34/7)}   %% \theAc=-5

and on dimensions:

\setlength{\Bc}{(1cm+2cm)*(3+4)-200mm}                    %%\the\Bc=28.4526pt
% exact results should be 1.0pt
\setlength\Bc{\the\Bc*\ratio{25.4pt}{722.7pt}}            %%\the\Bc=0.99985pt
\Bc=1in \setlength\Bc{\the\Bc * 100 / 7227}               %%\the\Bc=0.99998pt
\Bc=1in \setlength\Bc{\the\Bc * \real{ 0.01383700013837}} %%\the\Bc=1.00018pt
\Bc=1cm \setlength\Bc{\the\Bc / \real{28.452755}}         %%\the\Bc=0.99985pt
\Bc=1cm \setlength\Bc{\the\Bc * \ratio{254pt}{7227pt}}    %%\the\Bc=0.99985pt
\Bc=1in \setlength\Bc{\the\Bc / \ratio{7227pt}{100pt}}    %%\the\Bc=1.00018pt
\Bc=1IN \setlength\Bc{\the\Bc / \ratio{7227PT}{100pT}}    %%\the\Bc=1.00018pt

In LaTeX, there is a command called \stepcounter. Its effect is to increment a counter, and reset all counters that depend on it (see example below). There is also \refstepcounter whose purpose is to define the current label. This is not implemented in Tralics (see later for how \label works). The idea is that, for a counter `foo´, the printed value of the label is defined by `\p@foo\thefoo´. Here \thefoo is normally `\arabic{foo}´, but the quantity can be redefined. For instance, the book class has \renewcommand \thesection {\thechapter .\@arabic \c@section} (the article class has no chapter, and does not redefine \thesection). Both book and article classes say: \renewcommand\thesubsection{\thesection.\@arabic\c@subsection}.
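The role of `\p@foo\thefoo´ can be seen in LaTeX with a counter whose prefix is not empty. The following is a sketch of the LaTeX behavior only; the prefix `F-´ and the label name demo are arbitrary choices made for the example.

```latex
\documentclass{article}
\begin{document}
\newcounter{foo}
\renewcommand\thefoo{\Roman{foo}}
\makeatletter
\renewcommand\p@foo{F-}      % prefix used only when the counter is referenced
\makeatother
\refstepcounter{foo}\label{demo}
Counter: \thefoo; reference: \ref{demo}.
\end{document}
```

After two LaTeX runs, the counter is printed as I while the reference is printed as F-I, because only the label uses the prefix.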

Here we define some counters, and make them depend on other counters.

\newcounter{toto}          \setcounter{toto}{10}
\newcounter{titi}[toto]    \setcounter{titi}{20}
\newcounter{tata}[titi]    \setcounter{tata}{30}
\newcounter{tutu}[toto]    \setcounter{tutu}{40}

Here we call \stepcounter. The typeset result should be 11101=11101.

\stepcounter{toto} %  kills titi, tutu
\stepcounter{tata} %%% \thetata=31,
\stepcounter{titi} %% \thetata=0 % titi=1

The magic is accomplished by the following command:

\def\@addtoreset#1#2{\expandafter\@cons\csname cl@#2\endcsname {{#1}}}

The first argument is the counter to define (for instance `tutu´), and the second argument is the counter on which it depends (for instance `toto´). The \@cons command is defined as in section 2.3. It modifies the command \cl@toto by adding \@elt{tutu}. If you say \stepcounter{toto}, then LaTeX executes `\let \@elt \@stpelt \csname cl@#1\endcsname´. Here is a part of the transcript file of Tralics that shows what happens (you won't see the \csname, because characters needed for \c@toto and \cl@toto are read and expanded only once by Tralics.)

[720] \stepcounter{toto}
\stepcounter->\global \advance \c@toto 1\relax {\let \@elt \@stpelt \cl@toto }
+scanint for \c@toto->1
{globally changing \count45=10 into \count45=11}
{begin-group character {}
+stack: level + 3 for brace entered on line 720
{\let \@elt \@stpelt}
{changing \@elt=undefined}
{into \@elt=\@stpelt}
\cl@toto ->\@elt {titi}\@elt {tutu}
\@elt->\global \c@titi 0\relax
+scanint for \c@titi->0
{globally changing \count46=20 into \count46=0}
\@elt->\global \c@tutu 0\relax
+scanint for \c@tutu->0
{globally changing \count48=40 into \count48=0}
{end-group character }}
+stack: killing \@elt
+stack: level - 3 for brace from line 720
[721] \stepcounter{tata}
\stepcounter->\global \advance \c@tata 1\relax {\let \@elt \@stpelt \cl@tata }
+scanint for \c@tata->1
{globally changing \count47=30 into \count47=31}
{begin-group character {}
+stack: level + 3 for brace entered on line 721
{\let \@elt \@stpelt}
{changing \@elt=undefined}
{into \@elt=\@stpelt}
\cl@tata ->
{end-group character }}
+stack: killing \@elt
+stack: level - 3 for brace from line 721
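The reset list can also be inspected from within LaTeX. This sketch reuses the counter names of the example above; it only assumes that the list is stored in \cl@toto, as described in the text.

```latex
\documentclass{article}
\begin{document}
\newcounter{toto} \newcounter{titi}[toto] \newcounter{tutu}[toto]
\makeatletter
\typeout{\meaning\cl@toto}   % the reset list, a sequence of \@elt{...} items
\makeatother
\setcounter{titi}{20} \setcounter{tutu}{40}
\stepcounter{toto}           % resets every counter of the list
\thetiti, \thetutu           % both counters are now 0
\end{document}
```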

2.8. Fonts

One question we can ask is: what does \it do? As explained above, this is an unofficial command, thus could be implemented to do anything. Let's assume that it is defined in LaTeX2.09 compatibility mode. It is then possible to explain what happens, but it is harder to explain what Tralics should do. Software like latex2html (that we studied carefully when implementing the first version of Tralics in Perl) uses a lot of energy in order to translate font changes properly. It is however very difficult to tell it that \french is a similar command (in fact, what we wanted is more than just finding the scope of the \french, we also wanted French syntax rules to apply, we wanted dashes instead of bullets in lists, etc.). In this paragraph, we shall explain all the gory details concerning fonts (however, look at [6] for what is in a .fd file).

One big table in TeX is the table of fonts: there are N fonts with N characters in them (currently N=256, and this is a small limit; in Ω, this value is 2^16; the dvi format specifies N=2^32). A book like [6] uses lots of fonts indirectly, via inclusion of PostScript files. Note that metric files designed for Ω cannot be read by TeX. The hyphenation algorithm considers as a word only sequences of characters from the same font (hence 256 characters per font is a hard limit). A metric file contains all that is needed for TeX to typeset a character; it does not contain glyphs. Essentially, it contains three tables, indicating for each character its height, its depth and its width. There are two other tables, the lig/kern table, and the kern table, that indicate, for instance in the case VA that some negative space should be used to make the characters narrower, and in the case of fi to use a single glyph instead of two. There is another table (useful only for math mode) that explains how to construct, for instance, braces of various sizes. Finally, there are some parameters. One parameter is the design size (the design size of a ten point font is 10pt), other parameters are the slant, the width of a space (this is glue), the two values of ex and em, and extra space. Math fonts have extra parameters, see [4, appendix G]. A font has two integer parameters: hyphen char, and skew char. These values are not in the metric file: when the font is loaded TeX uses the values of \defaulthyphenchar and \defaultskewchar. Note: Tralics does not read TFM files, it sets all parameters to zero.

You load a font by saying \font\preloaded=cmr7 scaled \magstep4 or \font\foo=cmr10 at 12pt. Such a construction will read a file cmr7.tfm or cmr10.tfm and apply a scale factor (a factor of about 2 in the first case, and 1.2 in the second case). A font like ecrm exists in size 5, 6, 7, 8, 9, 10, 10.95 (magstephalf), 12, 14.4, 17.28, 20.74, 24.88 (magstep 1, 2, 3, 4, and 5 respectively), 29.86 and 35.83. There are some slight differences between cmr10 at 12pt and cmr12 (see the TeXbook for details). You can simply say \font\tenrm=cmr10. After that you use it like this {\tenrm test}. This gives: test. You can use \fontdimen1\tenrm like any dimension. For instance, using \the to typeset the value, we get 0.0pt for the slant, 0.0pt plus 0.0pt minus 0.0pt for the interword space, 0.0pt for the ex-height, 0.0pt for the quad, 0.0pt for the extra space. Parameters for the current font are: 0.0pt for the slant, 0.0pt plus 0.0pt minus 0.0pt for the interword space, 0.0pt for the ex-height, 0.0pt for the quad, 0.0pt for the extra space. If you say

\fontencoding{T1}\fontfamily{cmr}\fontseries{m}\fontshape{n}\fontsize{10pt}{12pt}\selectfont

you specify all font parameters, and you switch (from the font named `cmr10 at 10.0pt´) to the default ten point font with T1 encoding, namely `cmr10 at 10.0pt´. The default font in this document uses `lmr´ as family. The parameters are now: 0.0pt for the slant, 0.0pt plus 0.0pt minus 0.0pt for the interword space, 0.0pt for the ex-height, 0.0pt for the quad, 0.0pt for the extra space. As you can see, they are not exactly the same. However, the glyphs are similar. The current font name can be printed via \fontname\font. If you read the XML version, all dimensions are zero, and font names are empty. For cmr10, the slant is 0, the interword space is 3+1/3pt plus 1+2/3pt minus 1+1/9pt, the ex height is 4.30554pt, the quad is 10.00002pt, the extra space is 1+1/9pt.
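The font parameters can be queried directly with \fontdimen. The sketch below writes the seven parameters of cmr10 to the terminal when run through LaTeX (Tralics would report zeros, since it does not read TFM files); the font identifier \myrm is a hypothetical name.

```latex
\documentclass{article}
\begin{document}
\font\myrm=cmr10                      % hypothetical font identifier
\typeout{slant: \the\fontdimen1\myrm}
\typeout{interword space: \the\fontdimen2\myrm\space
         plus \the\fontdimen3\myrm\space minus \the\fontdimen4\myrm}
\typeout{ex-height: \the\fontdimen5\myrm, quad: \the\fontdimen6\myrm,
         extra space: \the\fontdimen7\myrm}
\end{document}
```

Parameter 1 is the slant, parameters 2 to 4 form the interword glue, and parameters 5 to 7 are the ex-height, the quad and the extra space, in the order used in the text.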

The commands shown above are provided by LaTeX. The effect of \selectfont is to take all values (stored by the other commands) and create a font name (say \tenrm for simplicity, see example below for a real name), check the font, and make it the current font. Printing a character like e-acute can depend on the encoding (in some cases the character is in the font, in other cases a combination of two characters is needed). As a consequence, checking the font means to inform some commands of an encoding change. In the example above, the quantity 10pt is the size of the font, but the value 12pt is the baseline skip; changing it means changing some other parameters (for instance the value of \strut). An important task of \selectfont is to associate to the font name \tenrm a real name (say cmr10) and call the \font command. The real name is computed according to rules defined in a font definition file, for instance t1cmr.fd, that depend only on the encoding and family; there are rules that say how to deal with the case where the desired series, shape or size are unavailable. All these commands are implemented in Tralics. The size and encoding are currently ignored. We shall describe below some commands that change the series and shape of the current font (for instance \bfseries, \itshape) that are easily related to parameters of \selectfont. Interpreting the argument of \fontfamily is a bit more complicated: for instance pcr is interpreted as cmtt (the name cmtt will be explained below, while pcr refers to a Courier font). There is another bunch of font commands, implemented in Tralics, that provoke an Unimplemented NFSS command error; for instance \DeclareTextAccent is a command that takes three arguments A, E and N, and says that accent A in encoding E is at position N in the font.

An important characteristic of a font is how glyphs are represented: For TeX, this is irrelevant, since the dvi file contains only the metrics. However, the reader will see some black and white pixels (of ink on a sheet of paper, or dots on a screen, or points on a wall projected by a beamer). All fonts designed by Knuth are produced by the metafont program that produces both the metrics and the glyphs as bitmaps (in the form of gf file, usually packed as pk files). If the resolution of these bitmaps is different from that required by the printing device, some interpolation or extrapolation is required (this is sometimes called `antialiasing´, it may involve colored pixels instead of black and white). In general, people print a dvi file by first converting it into PostScript format; in a PostScript or pdf file, a font can be specified via different formats, Type1, Type3, TrueType etc. The simplest format is Type3, namely bitmaps. Some software like Acrobat Reader prefers Type1 (a format in which characters are defined by small programs). There is no direct way to produce a Type1 file from a metafont file, so that not all TeX fonts exist in Type1. For instance, the computer modern fonts (in version OT1) have been translated but not the T1 version (said otherwise, cmr10 exists in Type1 format, but not ecrm1000). On the other hand, most commercially available fonts are not produced by metafont, hence cannot be used directly by TeX. In this document, we experiment with the Latin Modern font family; it is very similar to Computer Modern.

In modern distributions, the engine behind LaTeX is pdfTeX, so that producing pdf instead of dvi is just as easy; in this case, the engine needs the glyphs. Since it is no longer restricted to the information found in the metric files, funny effects can be achieved. An extension of TeX, called XeTeX, produces spectacular results; as in the case of Ω, the result can be a variant of the dvi format, called xdv or odvi.

In the case of a format like plain TeX, fonts are used according to the following scheme. First you define fonts like \tenrm, in three sizes (thus, you define \sevenrm, \fiverm), and different variants (say \teni, \tensy, \tenex, \tensl, \tentt, etc). Then you say \textfont0=\tenrm, \scriptfont0=..., \scriptscriptfont0=...: this defines family zero. You do the same for family 1, 2, 3, etc. We shall see later how certain math symbols use a specified family, in other cases the family specified by the \fam variable is used (there are only 16 families available). The size of a symbol is defined by the current style (displaystyle, textstyle, scriptstyle, or scriptscriptstyle). Then you say \def\it{\fam4\tenit}. Thus \it has two effects: one is to switch to \tenit, the second one is to set \fam to 4. Now, you can define a command \twelvepoint that modifies all the fonts values, using larger values. Guess what happens for a definition like \def\it{\tenit\fam4}.
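The scheme can be sketched as follows. All names (\mytenit, \myitfam, \myit) are hypothetical, chosen so as not to clash with existing definitions; the family number is allocated by \newfam rather than hard-coded as 4, and only the text size is set, so that the sketch is not a complete font setup.

```latex
\documentclass{article}
\font\mytenit=cmti10             % hypothetical names throughout
\newfam\myitfam                  % allocate a family number
\textfont\myitfam=\mytenit       % only the text size is defined in this sketch
\def\myit{\fam\myitfam \mytenit} % one command, two effects
\begin{document}
{\myit italic text} and $\myit italic$
\end{document}
```

In text, only the font switch matters; in math mode, only the assignment to \fam matters, because letters are taken from the family selected by \fam.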

In the case of a format like LaTeX, the situation is different. There are some high level commands like \large, that are defined like \@setsize \large {14pt} \xiipt\@xiipt (note: infinite recursion may be possible), and the \xiipt command is like the \twelvepoint command mentioned above. This is rather complicated. The situation became worse when people tried to replace computer modern fonts by other fonts. We shall describe here only the user interface of the NFSS (new font selection scheme).

There is a clear distinction between \textit and \mathit: they are to be used in text mode or math mode only; the command \it chooses one of them. Guess how \mathit is defined. In fact, it switches to some family (the number is not hard-coded as the 4 above), to which a font is associated. This may be OT1/cmr/m/it/10; an important point is that the size may vary (depending on the current math style and the current font size), but the encoding is fixed: if the current encoding is T1, a different font is used in math mode and in text mode.

We already mentioned that an important characteristic of the font is the encoding: We met OT1 (Original encoding by Knuth) and T1 (“Cork” encoding, similar to latin 1). There is an obsolete OT2 encoding for cyrillic, and new ones: T2A, T2B, T2C. The companion mentions over twenty standard font encodings. In the example of \showbox above, TeX told us that the current font was \T1/cmr/m/n/10. The first two letters indicate the encoding. There are different families of fonts. Assume that you use Computer Modern fonts (you do this by selecting a package; after that, your whole document will be in computer modern, unless you use fonts selected via \font or \selectfont). There are six sub-families: Roman, Sans, Typewriter, Fibonacci, Funny roman, and Dunhill. The names of these families are: cmr, cmss, cmtt, cmfib, cmfr, cmdh. The default family in this document is cmr. You can choose another family via the commands \rmfamily, \ttfamily and \sffamily (no command is provided for the other families). The commands \textrm, \textsf and \texttt take an argument and typeset it using the family. The commands \rm, \sf, \tt do the same, but they reset the series to medium, and the shape to normal. The series of a font can be: bold, bold extended, semibold, etc. In LaTeX you have \mdseries and \bfseries (you have also \textmd and \textbf, which are commands that take an argument; you have also \bf that selects roman family, bold series, normal shape). The shape can be: normal, italic, slanted, upright italic, small caps, etc. In LaTeX we have \upshape, \itshape, \slshape, and \scshape (and, as usual, \textup, \textit, \textsl and \textsc; there is also \it, \sl, \sc). There are two commands \em (a declaration) and \emph (that takes an argument) that use upright shape if the current font has a slant, and italics shape otherwise. These rules explain the cmr/m/n part in the font. In fact, the `cmr´ part comes from the command \rmdefault, but these commands are not implemented in Tralics. The command \textnormal takes an argument and is the equivalent of \normalfont.

There are two parameters that define the size of the font. First, document class options indicate the size used by \normalsize. In our example it is 10pt. There are ten commands that change the font size. In increasing order they are \tiny, \scriptsize, \footnotesize, \small, \normalsize, \large, \Large, \LARGE, \huge, and \Huge. There is a command \selectfont; its purpose is to combine everything, the result will be \T1/cmr/m/n/10. There is another process that converts this to the font name ecrm1000, using font definition files.
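The five components of the internal font name can be displayed after a call to \selectfont. This is a sketch for LaTeX only; it relies on the internal macros \f@encoding, \f@family, \f@series, \f@shape and \f@size of the LaTeX kernel, which hold the currently requested values.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
{\sffamily\bfseries sans bold} and \textit{roman italic}

\fontencoding{T1}\fontfamily{cmr}\fontseries{m}\fontshape{n}%
\fontsize{10pt}{12pt}\selectfont
\makeatletter
\typeout{\f@encoding/\f@family/\f@series/\f@shape/\f@size}
\makeatother
\end{document}
```

With these settings the terminal line should have the form encoding/family/series/shape/size, the same form as the name \T1/cmr/m/n/10 mentioned in the text.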

In math formulas, you see things like $\acute{\alpha}$ and $\ddot{e}$, but never `á´ and `ë´. If you want an acute accent you use \acute, if you want a double dot accent you say \ddot. In fact, the textfont used for math is very often a 7bit font, without accented letters. If you want $x^{ème}$ you should say x$^{\grave{e}me}$, or perhaps x$^{\hbox{ème}}$ (this gives x followed by a superscript `ème´ whose letters are too big). Note that Tralics may translate this as xe; if you do not like it, either set the notrivialmath counter to zero, or add an empty group in the formula before the hat. A solution is x\textsuperscript{ième}, which gives xième. In French, you say 1er, 1re, 1ers, 1res, 2e, 3es, etc., via 1\ier, 1\iere, 1\iers, 1\ieres, 2\ieme, 3\iemes. In English, you say 1st, 2nd, 3rd, 4th.
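The constructions of this paragraph can be compared side by side in a small LaTeX file (a sketch; the accented letter is entered as \`e so that the file does not depend on the input encoding):

```latex
\documentclass{article}
\begin{document}
$\acute{\alpha}$ and $\ddot{e}$          % accents built in math mode

x$^{\grave{e}me}$                        % superscript typeset as a math formula

x\textsuperscript{\`eme}                 % superscript typeset as text
\end{document}
```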

2.9. Spaces

In TeX spaces are ignored after a command like \foo, and a sequence of spaces is treated as a single one. The exact rule is the following. There is a variable whose values can be N (start of line), or M (middle of line) or S (when spaces are skipped). Whenever a line is read, TeX removes every space character at the end of the line. It inserts the value of \endlinechar (provided this is a valid character, an integer between 0 and 255). The state is then N. Spaces are ignored if the state is S or N; if the state is M, a space produces a space token, and the state is changed to S; in this sentence a “space” is any character whose category code is 10. If TeX sees an end-of-line character (category 5), it ignores all other characters on the current line. If the state is N (line was empty), the tokeniser returns a \par token, if the state is M it returns a space token, otherwise the character is ignored. Note: in Tralics, the space token produced by an end-of-line is a line-feed character, this is to keep line breaks in the XML translation. If TeX sees a backslash (or any character of category code 0), it reads a command; the state will be S if the character is a letter or a space, it will be M otherwise. If TeX sees anything else, the state will be M.

For instance, if you say `x␣{␣}␣␣␣{␣}y´ the tokeniser sees 5 spaces. If you say \def\A{␣} and \def\B{␣\A␣\A␣}, then the body of \A contains a space as well as the body of \B. Full expansion of \B contains three spaces and x␣\B\y contains four spaces. The command \space is defined like \A above.
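The count of spaces can be checked by writing the expansion to the terminal. The sketch below uses the definitions of \A and \B given above; \typeout expands its argument fully, so that every space token becomes visible.

```latex
\documentclass{article}
\begin{document}
\def\A{ }
\def\B{ \A \A }      % the body holds one space token: spaces after \A are skipped
\typeout{[x \B y]}   % one space after x, three from \B, none after \B
\end{document}
```

The terminal shows four spaces between x and y; the space typed after \B is dropped by the tokeniser, because the state is S after a command name.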

Spaces discarded by the tokeniser do not appear in the translation. However, spaces produced by the tokeniser can be ignored in some cases. A typical example: a command can take a space as argument, and ignore the argument. For instance \\ is a command that ignores spaces that follow it using explicit scanning (i.e. \futurelet). We already mentioned that spaces between arguments are generally ignored. Spaces can be ignored because you say \ignorespaces: the effect of this command is to expand what follows, until a non-expandable token is seen. If it is a space, it is ignored, and the process continues. A space can be ignored because of a syntax rule (for instance, before an equals sign in an assignment). In LaTeX you can see things like \end{x} \end{y} \end{z}, each `end(xxx)´ being on a line by itself: this produces a space, and the LaTeX environment mechanism is clever enough to remove these spurious spaces. It is also possible to remove a space from typeset material via \unskip.

Spaces are ignored in math mode. The reason is that spaces are used to separate words, and there are no words in math formulas. There are operators, and these operators know how much white space to use. In the case of x+y=z, on each side of the plus sign there is some glue, the value comes from \medmuskip, it is 2.22 plus 1.11 minus 2.22; on each side of the equals sign there is \thickmuskip, namely 2.77 plus 2.77 (the unit is pt). After the z, there is a kern of value 0.4398. Note: the plus sign is followed by a penalty of 700, the equals sign by a penalty of 500. Plain TeX defines

\medmuskip=4mu plus 2mu minus 4mu
\thickmuskip=5mu plus 5mu

In Tralics, constant values are used (expressed in terms of em units; one em is 18mu, in the example above one em is 10pt). You can say \:, \> and \;. This produces a space (thin, medium, thick) using the values given above. You can also use \!, this is the negative of thin space. The translation of $A\:B\>C\;D\!E$ is:

<mrow><mi>A</mi><mspace width='0.166667em'/>
      <mi>B</mi><mspace width='0.222222em'/>
      <mi>C</mi><mspace width='0.277778em'/>
      <mi>D</mi><mspace width='-0.166667em'/><mi>E</mi></mrow>
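The widths that appear in this translation follow from the rule that one em is 18mu. The derivation below is a worked check; it assumes, in addition to the two values quoted above, that the thin space is 3mu, which is the value of \thinmuskip in plain TeX.

```latex
\begin{align*}
3\,\mathrm{mu} &= \tfrac{3}{18}\,\mathrm{em} \approx 0.166667\,\mathrm{em} && \text{(thin)}\\
4\,\mathrm{mu} &= \tfrac{4}{18}\,\mathrm{em} \approx 0.222222\,\mathrm{em} && \text{(medium)}\\
5\,\mathrm{mu} &= \tfrac{5}{18}\,\mathrm{em} \approx 0.277778\,\mathrm{em} && \text{(thick)}
\end{align*}
```

With one em equal to 10pt, the medium space is about 2.22pt and the thick space about 2.77pt, which are the values given for x+y=z.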

The \space command expands to a single space token. It may disappear in all cases where the syntax says that a space is optional (because in general these rules imply expansion); in a case like \let\foo\space, tokens are not expanded, and \foo is made equivalent to the current value of \space. The \␣ command cannot be expanded. It starts a paragraph (if used in vertical mode). It inserts some white space whose value is the same as if the current space factor were 1000. You can use it after an abbreviation like Mr. in order to indicate that the dot is not an end of sentence marker. You can also use it after a command like \TeX if you want to leave some space. In math mode, Tralics interprets it as a space of width 6pt. The ~ character is usually active, its expansion is \nobreakspace. This is defined in Tralics to translate to &nbsp;. You can say \quad or \qquad. This inserts some space (the width is one or two em). If you say \hskip 1cm, this will append some glue (in Tralics, it will generate a sequence of &nbsp; whose width is more or less 1cm). Note: in the current version, entity names are no longer generated, hence &nbsp; is replaced by the Unicode character U+A0, and we assume that the width of this character is one fourth of a quad. In math mode, both the tilde character and \nobreakspace will give 3.33pt; inside a URL, the result is a tilde character. If you say \kern1cm this will append a kern (like glue, but the size is fixed). This is ignored by Tralics. A normal space produces glue (the value of the glue depends on some font parameters; it can also depend on the current space factor). A glue may disappear at a line break. Kerns will not. In LaTeX, you use \hspace instead of \hskip. You can use \hspace*, in this case, spaces at start of line are not ignored. Note the syntax \hspace{2cm} vs \hskip2cm\relax.

A\space\space B\ \ C\quad\qquad etc
a\hskip2cm b\hspace{3cm}etc.

Translation is (we have replaced nobreak space by tilde)

<p>A  B  C~~~~~~~~~etc
<p spacebefore='56.9055pt'>y</p>
<p spacebefore='56.9055pt'>etc.

When TeX wants to split a paragraph into lines of equal width, it will have to stretch and shrink the glue that appears on the line; it will remove interword glue at break points. An item of glue has the form x+y-z, where x, y and z are dimensions (y and z can be expressed in terms of fil, fill and filll), all three values can be positive or negative. We can express this as: we have a vector of size 9: $x_0$ is the regular part of the glue, $x_1$, $x_2$, $x_3$ and $x_4$ are the stretch components (in units of pt, fil, fill, and filll, only one of these components can be given), $x_5$, $x_6$, $x_7$ and $x_8$ are the shrink components (in units of pt, fil, fill, and filll, only one of these components can be given). When two pieces of glue are added, all components are added. The convention is that $x_2$ is much larger than $x_1$, so that the sum of $x_1$ and $x_2$ is $x_2$ (said otherwise if we add 1pt plus 2pt and 3pt plus 4fil, the result is 4pt plus 4fil). Such simplifications are not done when TeX computes the sum of all glue items in a paragraph (as a result, addition is associative). The command \hfil is equivalent to \hskip 0pt plus 1fil, the command \hfill is equivalent to \hskip 0pt plus 1fill, the command \hfilneg is equivalent to \hskip 0pt plus -1fil, the command \hss is equivalent to \hskip 0pt plus 1fil minus 1fil. It is an error to use infinite shrinkage, like \hss, in a paragraph; TeX complains with: Infinite glue shrinkage found in a paragraph. However you can say 123\hbox to1cm{\hss xxxxxxx\hss}456, the result is 123xxxxxxx456; said otherwise, the text is centered, and neither an overfull nor an underfull box is signaled.
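The addition rule and the \hss example can be tried as follows. This is a sketch that uses the scratch register \skip0; the \relax tokens only serve to stop the scanning of the glue specification.

```latex
\documentclass{article}
\begin{document}
\skip0=1pt plus 2pt\relax
\advance\skip0 by 3pt plus 4fil\relax
\typeout{\the\skip0}                   % 4.0pt plus 4.0fil

123\hbox to 1cm{\hss xxxxxxx\hss}456   % centered text, no overfull or underfull box
\end{document}
```

The finite stretch of 2pt disappears from the sum because the other operand has a stretch component of higher order.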

The commands \vfil, \vfill, \vfilneg, \vss, behave in the same fashion, in vertical mode, adding vertical space. Tralics translates \hfil, \hfill, \hfilneg, and \hss as \leavevmode followed by an element <hfil>, that has the same name as the command. It translates \vfil, \vfill, \vfilneg, and \vss in the same fashion, by using \par instead of \leavevmode. The three commands \bigskip, \medskip and \smallskip are used to insert vertical space between paragraphs, of size 12pt, 6pt and 3pt respectively (in LaTeX, this is some glue that the user can modify; however, Tralics ignores the shrink and stretch parts of the glue inserted by \hskip, \vskip, \hspace and \vspace.) These four commands read an argument (in LaTeX, \hspace and \vspace accept an optional star, that translates to an empty vertical or horizontal rule, Tralics ignores the star). In the case of a horizontal space, \leavevmode is executed, then ~ are produced (one every 4 pt, a negative dimension produces nothing). In the case of a vertical space, the current paragraph is terminated; if after that the mode is vertical, a new paragraph is started, it has an attribute spacebefore with as value the dimension. In LaTeX, the behavior is different (see appendix A.1.5 of [6]). In math mode, you can also use \mskip and \mkern; these commands use mu as unit, where 18mu is one em. Since Tralics does not know the value of an em, it uses 10pt, so that the dimension is first divided by 18, then multiplied by 10. Example:

c\bigskip d\smallskip e\medskip f
$\mskip3mu\mkern2mu \mskip 18mu$

Translation is

<p spacebefore='12.0pt'>d</p>
<p spacebefore='3.0pt'>e</p>
<p spacebefore='6.0pt'>f
<formula type='inline'><math xmlns=''>
<mrow><mspace width='1.66656pt'/><mspace width='1.111pt'/>
<mspace width='10.0pt'/></mrow></math>

In TeX, there is no command that starts a paragraph. The \leavevmode command is implemented as \unhbox\voidb@x, where \unhbox starts a new paragraph if needed, and produces nothing, provided that its argument is the void box; the paragraph may contain the current indentation and the value of \everypar. In Tralics, \leavevmode is a primitive, and the value of \everypar is unused. Both commands \indent and \noindent make sure the current mode is horizontal; the first one inserts the current indentation (an empty box with the width of \parindent). In TeX, you can use \indent anywhere in a paragraph. In Tralics, the translation of

a\noindent b \indent c
{\centering a\noindent b \indent c\par d}
{\raggedright a\noindent b \par\indent c\par d}

is

<p noindent='true'>b</p>
<p rend='center' noindent='false'>c
<p rend='center'>b</p>
<p rend='center'>c</p>
<p rend='center'>d
<p noindent='true' rend='flushed-left'>b</p>
<p noindent='false' rend='flushed-left'>c</p>
<p rend='flushed-left'>d</p>

The rules are the following: if \indent or \noindent appears in an empty paragraph that is not centered and has no noindent attribute, the attribute is set. Otherwise a new paragraph is started. It will have a noindent attribute, unless the paragraph is centered. The value of \parindent is never considered.

The translation of \par is a bit complicated. Nothing happens inside a \hbox, in \term, or if the current mode is not horizontal. The current XML element should be a <p>. A final space is removed from it. It will be popped from the stack. This restores the mode to the value of the previous mode. It restores the current XML element to the parent of the <p>. A newline character is added to it. There is an exception: in cases like \noindent\par, or \bigskip\par, or \\\par, the \par command was ignored until version 2.5 (pl7). The behavior is now: if the paragraph is empty, but there are attributes, then the <p> is removed, and attributes are added to the next <p> element.

The translation of \\ depends on the context. The command can be followed by an optional star, and an optional dimension. Inside a cell, this indicates the end of the cell as well as the end of the row. You can say \newline; this is like \\ without optional argument and array test. In vertical mode, LaTeX complains with There´s no line here to end, but Tralics ignores the command. Inside a title, the command is ignored. Otherwise, the behavior is like \noindent; if an optional argument is given, it behaves like \vskip. For instance, the translation of

a \\b \\[2cm] c \newline[3cm]d \noindent e \vskip 4cm f

is

<p noindent='true'>b</p>
<p noindent='true' spacebefore='56.9055pt'> c</p>
<p noindent='true'>[3cm]d</p>
<p noindent='true'>e</p>
<p spacebefore='113.81102pt'>f</p>

Many people do not know that \\ takes an optional argument, and try to use different tricks in order to avoid errors triggered by \\\\. We have seen for instance


Remember that \protect is like \noexpand, it is not a LaTeX command that takes an argument! More strange cases can be found in [3].

The commands \nolinebreak, \nopagebreak, \pagebreak, and \linebreak are defined by LaTeX to take an optional argument, an integer between 0 and 4. They insert some penalty, but depend on the mode, like \hspace and \vspace. They are ignored in Tralics. The commands \break, \nobreak, and \allowbreak are defined by LaTeX; they insert some penalty (zero, plus or minus infinity). They are ignored by Tralics. The commands \filbreak, \goodbreak, \eject, \smallbreak, \medbreak, \bigbreak are defined by LaTeX to terminate a paragraph and insert some penalty. In Tralics, they behave like \par. Note. The last chapter of the second part of this document explains that, when converting XML to Pdf, special rules must be used when hyphenating URLs: ambiguities can be avoided when text is split at slashes. For this reason, Tralics inserts an <allowbreak> element in these cases, and also when the command \allowbreak is used.

2.10. Conditional expansion

In the previous paragraphs we have shown how to define a macro `\foo´ that expands to `\bar´ and a macro `\bar´ that expands to `gee´. Can a translator replace all \foo by \bar and all \bar by gee? The answer is obviously no. First, if you say `\something\bar´, the argument will be (after expansion) `gee´, while in the case of `\something gee´ it will be `g´. There is a second problem, that occurs in latex2html: if you replace `\bar´ by its value, you get `\somethinggee´, and this is wrong if you reparse it. Moreover, some commands can be randomly redefined (for instance, at first use) like this:

   \def\NFSS{NFSS (New Font Selection Scheme)\global\def\NFSS{NFSS}}

The last reason is conditional expansion. Our original translator (written in Perl) has some troubles in these cases.

In this section, we shall consider cases where expansion depends on the context. We have already seen the commands \noexpand for delayed execution and \expandafter that changes the order of expansion, in section 6.12 we will describe \protected which inhibits expansion in a \edef. We shall analyze three commands: \Color, \Map and \Loop.

2.10.1. Constructing commands dynamically

Using colors in TeX is not completely trivial, one reason is that there are different color models, more or less adapted to the task (printing on paper, on transparencies, or using a video projector). The color package proposes
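As an illustration of the mechanism discussed next, here is a small LaTeX sketch. The names \mytextcolor and \my@textcolor are hypothetical copies, modelled on the definitions of \textcolor as we understand them from color.sty; check your installed version before relying on the details.

```tex
\documentclass{article}
\usepackage{color}
\makeatletter
% The parameter text of \mytextcolor ends with #, so its argument is
% everything up to (but not including) the next opening brace.
\def\mytextcolor#1#{\my@textcolor{#1}}
% #1: optional model such as [rgb] (possibly empty), #2: color, #3: text
\def\my@textcolor#1#2#3{\protect\leavevmode{\color#1{#2}#3}}
\makeatother
\begin{document}
\mytextcolor{green}{text}       % #1 of \mytextcolor is empty
\mytextcolor[rgb]{1,0,0}{text}  % #1 of \mytextcolor is [rgb]
\end{document}
```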


Note that the brace character that indicates the start of the body of \textcolor is preceded by a sharp sign. This means that the argument of the command is everything before the brace. In the case of \textcolor {green} {text}, it is empty. The \color command takes two arguments (the color model, empty in the example, and the color); it changes the current color, which is magically restored at the end of the group. One of the reasons why colors are not implemented in Tralics is that the scope of the command is unclear. Assume that we have two commands `\enrouge´ and `\envert´ that take an argument and typeset it in red and green; they could be defined as
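A minimal sketch of such definitions, assuming the color package is loaded and using its predefined color names red and green (this is our guess at a reasonable form, not necessarily the original one):

```tex
% To be used in a document that says \usepackage{color}
\def\enrouge#1{\textcolor{red}{#1}}
\def\envert#1{\textcolor{green}{#1}}
```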


We explain here how to solve the following problem. We want to define a command \Color that takes two arguments, a color and text; if the color is `rouge´ or `vert´ it should call \enrouge or \envert. Otherwise, some default action is specified (an error could be signaled; in the following, we assume that the color should be ignored). One solution to this problem uses tests, as explained in the next section. This means that we have to change the macro if a new color (for instance `\enbleu´ for blue) is added to the list. The following works

    \def\Color#1{\csname en#1\endcsname}

The only drawback with this method is that it might produce unexpected results in the case where the command defined by \csname already exists (try `\Color{d}{document}´).
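The dispatch can be tried in a complete document. This is a sketch under the assumption that \enrouge and \envert are defined in terms of \textcolor; the color `bleu´ is deliberately left without a handler.

```tex
\documentclass{article}
\usepackage{color}
\def\enrouge#1{\textcolor{red}{#1}}
\def\envert#1{\textcolor{green}{#1}}
% Build the command name from the color name
\def\Color#1{\csname en#1\endcsname}
\begin{document}
\Color{rouge}{red text}, \Color{vert}{green text},
% \enbleu is undefined: \csname makes it equivalent to \relax,
% so the color is ignored and the text is typeset in a group
\Color{bleu}{plain text}
\end{document}
```

Adding a new color only requires defining the corresponding \en... command; \Color itself is unchanged.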

There are many commands that use \csname. The problem mentioned above can be avoided if the command contains a non-letter character. For instance, when the counter foo is defined, the command \p@foo is created, and this command is used whenever the counter is printed. No package should define commands starting with p@. In some cases the construction can be

\csname\string\color @#1\endcsname

This constructs a command with a backslash in its name; such a command can be created only via \csname, which offers good protection.
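Applied to the \Color problem, the construction could look as follows. This is a sketch; storing the handlers under names of the form backslash, `color@´, color name is our choice for the illustration.

```tex
\documentclass{article}
\usepackage{color}
% \string\color gives the six characters \color, so the name built by
% \csname starts with a backslash and cannot be typed directly.
\expandafter\def\csname\string\color @rouge\endcsname#1{\textcolor{red}{#1}}
\expandafter\def\csname\string\color @vert\endcsname#1{\textcolor{green}{#1}}
\def\Color#1{\csname\string\color @#1\endcsname}
\begin{document}
\Color{rouge}{red text}, \Color{vert}{green text},
% No clash with \end any more: the constructed name is unknown,
% \csname yields \relax, and the word is simply typeset
\Color{d}{document}
\end{document}
```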

2.10.2. Iterating over lists

In this paragraph we explain how to apply a command to all items in a list. The list could be defined as follows


The last line is an example of CSV (comma separated values). The LaTeX command \@for can be used to apply a command to every item, and \@tfor should be used in the second case. Here is an example.

\def\List{}\def\thelist{12,3,4,5,6} % list is expanded here
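For readers who want to run the example, here is a complete LaTeX document that should lead to a comparable trace. The two \let lines are a reconstruction: they are inferred from the transcript, where \Lfor is shown as \@for and \BreakTfor as \@break@tfor.

```tex
\documentclass{article}
\begin{document}
\makeatletter
% Aliases inferred from the transcript (an assumption)
\let\Lfor\@for \let\BreakTfor\@break@tfor
\def\List{}\def\thelist{12,3,4,5,6}
% Append each item to \List; leave the loop when the item is 4
\Lfor\Elt:=\thelist\do{\edef\List{\List\Elt}\if\Elt4\BreakTfor\fi}
\makeatother
\List % the items up to and including 4, namely 1234
\end{document}
```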

We give here the transcript file produced by Tralics. The same algorithm is used as in LaTeX. The arguments of \@for are respectively an element name, the colon-equal separator, the list to work on (it will be expanded), the \do separator, and the code to be applied. On lines 4 and 5 you see the expansion: there is a call to \@forloop, taking as arguments the expanded list where two dummy items have been added, the end marker \@@, the element name and the code. The command is optimised in the case where the list is empty, or has a single element; in the general case, you will see the assignment of \Elt (lines 7-8) and the expansion on lines 9, 10, 11. Note that \@iforloop is used; you can see on lines 36, 37, 38 the expansion of \@iforloop, which is a simple recursive function. Other assignments of \Elt can be seen on lines 22 and 34. On lines 47 and 48 you can see the expansion of \@break@tfor. What you do not see is that this command gobbles all tokens inserted by \@for and friends (namely, everything up to the \@@ token, the element name and the code). Caveat: the expansion of the LaTeX command with the same name is a double \fi.

1 [6] \Lfor\Elt:=\thelist\do{\edef\List{\List\Elt}\if\Elt4\BreakTfor\fi}
2 {\@for}
3 \thelist ->12,3,4,5,6
4 \Lfor<- \@forloop 12,3,4,5,6,\@nil ,\@nil \@@ \Elt {\edef \List {\List \Elt }
5 \if \Elt 4\BreakTfor \fi }
6 {\@forloop}
7 {changing \Elt=undefined}
8 {into \Elt=macro:->12}
9 \@forloop<- \edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi \def \Elt {3}
10 \edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi \@iforloop 4,5,6,\@nil
11 ,\@nil \@@ \Elt {\edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi }
12 {\edef}
13 \List ->
14 \Elt ->12
15 {changing \List=macro:->}
16 {into \List=macro:->12}
17 +\if1
18 \Elt ->12
19 +iftest1 false
20 +\fi1
21 {\def}
22 {changing \Elt=macro:->12}
23 {into \Elt=macro:->3}
24 {\edef}
25 \List ->12
26 \Elt ->3
27 {changing \List=macro:->12}
28 {into \List=macro:->123}
29 +\if2
30 \Elt ->3
31 +iftest2 false
32 +\fi2
33 {\@iforloop}
34 {changing \Elt=macro:->3}
35 {into \Elt=macro:->4}
36 \@iforloop<- \edef \List {\List \Elt }\if \Elt 4\BreakTfor \fi \relax
37 \@iforloop 5,6,\@nil ,\@nil \@@ \Elt {\edef \List {\List \Elt }\if
38 \Elt 4\BreakTfor \fi }
39 {\edef}
40 \List ->123
41 \Elt ->4
42 {changing \List=macro:->123}
43 {into \List=macro:->1234}
44 +\if3
45 \Elt ->4
46 +iftest3 true
47 {\@break@tfor}
48 \BreakTfor<- \fi
49 +\fi3

2.10.3. Mapping a command

We consider here the following task. We have a list, like \mylista above, and we want to apply a command, say \foo, to every element of the list. The solution we propose here is faster than the previous one; remember that, in the case of \@for, for every element, the unread part of the list, together with five additional tokens, the element name and the body, is read and pushed back in the stream. Our solution is as simple as


Then we say \Map\textit\mylista. This produces A1B2C3. This is however a bit unsatisfactory: in some cases the list delimiter is different from \do. An example is given above: at the start of a chapter, we want to reset all counters that depend on the chapter counter; in this case \@elt is used as delimiter. We could imagine a map-with-argument macro, that would take the \do as argument. But this is nothing else than \let! Our definition is so simple that people just say `\let\do\@makeother\dospecials´, see for instance 2.12. The difference between the first two versions of \Map is that the second command takes arguments, hence removes an additional level of braces. If you say \Map\foo{\do{A}}, the command \foo is executed in a group in the first case (and it is a mistake to put braces around it).

In the third case, the first argument #1 can consist in more than one token. For instance, if you say \def\foo#1#2{␣#1#2␣} then \Map{\foo A}\mylista gives ` AA1 AB2 AC3 ´. Note that there are too many spaces in this example: the last space in \foo is spurious.
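Since the definitions are not reproduced above, here is one possible reading of the three versions, under the names \MapA, \MapB and \MapC. These definitions, and the format of \mylista, are assumptions chosen to be consistent with the behavior described in the text.

```tex
\documentclass{article}
\begin{document}
% Assumed list format: each item is the argument of \do
\def\mylista{\do{A1}\do{B2}\do{C3}}
% First version: no argument is read, a braced list remains a group
\def\MapA{\let\do}
% Second version: reading #2 removes one level of braces
\def\MapB#1#2{\let\do#1#2}
% Third version: #1 may consist of more than one token
\def\MapC#1#2{\def\do{#1}#2}
\def\foo#1#2{ #1#2 }
% Groups keep the changes to \do local
{\MapB\textit\mylista}

{\MapC{\foo A}\mylista}
\end{document}
```

The first paragraph should show A1B2C3 in italics; the second one shows AA1, AB2 and AC3, each surrounded by the spaces coming from \foo.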

2.10.4. Creating a list via pattern matching


% \newcommand{\fooiv}[3][bar]{Seen #1 #2 #3}
\def\fooivaux[#1]#2#3{Seen #1 #2 #3}
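For reference, a complete version of this example might read as follows. The \fooiv line is a reconstruction based on the reference to \@ifnextchar in the discussion, not text taken from the source.

```tex
\documentclass{article}
\makeatletter
% If the next character is [, read the optional argument;
% otherwise supply the default value bar
\def\fooiv{\@ifnextchar[{\fooivaux}{\fooivaux[bar]}}
\def\fooivaux[#1]#2#3{Seen #1 #2 #3}
\makeatother
\begin{document}
\fooiv xy    % Seen bar x y

\fooiv[a]xy  % Seen a x y
\end{document}
```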

The commented line is interpreted by LaTeX in the same fashion as the two other lines (except that the internal name is a bit more complicated than `\fooivaux´). We shall explain later how `\@ifnextchar´ works. We are interested here in how LaTeX converts the `[3]´ into `[#1]#2#3´. Since the number of arguments is between zero and nine, a short sequence of conditionals could be used. Instead, the following code is used by LaTeX:

1 \long \def \@yargdef #1#2#3{%
2   \ifx#2\tw@
3     \def\reserved@b##11{[####1]}%
4   \else
5     \let\reserved@b\@gobble
6   \fi
7   \expandafter
8     \@yargd@f \expandafter{\number #3}#1%
9 }
10 \long \def \@yargd@f#1#2{%
11   \def \reserved@a ##1#1##2##{%
12     \expandafter\def\expandafter#2\reserved@b ##1#1%
13     }%
14   \l@ngrel@x \reserved@a 0##1##2##3##4##5##6##7##8##9###1%
15 }

In the case of `\newcommand\fooiii[3]{foo}´ the \@yargdef command is called with three arguments, the first is \fooiii, the command to be defined, then comes `\@ne´ (some randomly chosen token), then `3´ (the number of arguments) and finally `{foo}´, the body of the command to be defined. This argument is not read, but the code relies on the fact that it starts with an opening brace. The objective is to produce `#1#2#3´. In the case of `\newcommand\fooi{foo}´, arguments are the same with 0 as third argument, the objective is to produce the empty string. In the case of \fooiv, the second argument is `\tw@´, this is something different from `\@ne´, the objective is similar, but a bit different: we want `[#1]#2#3´.

In order to make things easier to understand, we shall proceed to the following simplifications: let´s forget about the percent signs (their purpose is to suppress unwanted space). Let´s forget about `\long´ (is it really needed?) and `\l@ngrel@x´ (this is something that adds conditionally a `\long´ token before the definition). Let´s simplify the names: we write `\Ra´ and `\Rb´ instead of `\reserved@a´ and `\reserved@b´. We also write `\ydef´ and `\yaux´ instead of `\@yargdef´ and `\@yargd@f´. Finally, we replace the arguments by X, Y, Z, and `##´ by a simple `#´. Hence we get

\def \ydef XYZ{
  \ifx Y\tw@ \def\Rb #11{[##1]} \else \let\Rb\@gobble \fi
  \expandafter \yaux \expandafter{\number Z}X
}
\def\yaux XY{
  \def\Ra #1X#2#{\expandafter\def\expandafter Y\Rb #1X}
  \Ra 0#1#2#3#4#5#6#7#8#9#X
}

Let´s start the analysis with the lines 7 and 8. Because of the two `\expandafter´ tokens, the first token to be expanded is `\number´. This means that Z is replaced by its numeric value. Said otherwise, the number of arguments can be `03´, or ``\^^C´, or even `\value{page}´ if the page number is not too big. In Tralics, only explicit numbers are allowed (you will get a message like Only one token allowed; I will assume that the command takes no argument). In general, lines 7 and 8 are equivalent to \yaux{Z}X.

Let´s now explain lines 2 to 6. We are in a simple case of a conditional (the commands \@ne and \tw@ are normally equivalent to 1 and 2, they compare unequal), so that line 3 is executed in case of an optional argument, and line 5 otherwise. In the last case \Rb is a command that takes an argument and ignores it; otherwise \Rb is a command that takes an argument, delimited by the character `1´, ignores it, and the expansion is `[#1]´ (four tokens). Remember that we want `[#1]#2#3´, that is a good starting point.

Consider now lines 11 and 12. In order to simplify explanations, we replace X by Z and Y by X (i.e. use the argument names of the outer function). We shall denote by U and V the arguments of \Ra. Thus \Ra is

    \def\Ra UZV{\expandafter\def\expandafter X\Rb UZ}

The question now is what are the values of U and V? In order to answer this question we shall write line 14 in a different way. Let s(n) be the sequence #1#2....#n#, and S(n) the sequence #n...#9. The content of line 14

    \Ra 0#1#2#3#4#5#6#7#8#9#Z

can be interpreted as \Ra, 0, s(n-1), n, S(n+1), #Z, whenever n is a digit between 1 and 9. Said otherwise, whenever Z is a digit between 1 and 9, the first argument U of \Ra is 0s(n-1) (the second argument is ignored, it is everything up to the first brace, the one that delimits the body). Obviously, in the case where Z is the digit 0, U is empty. We leave it as an exercise to the reader to see what happens in the case where Z is a sharp sign. In all other cases, U is the sequence 0s(9). The important point is that, whatever Z, TeX will not read beyond the opening brace of the body.

Assume now that we want to construct a normal command (case \Rb is gobble). It always gobbles a zero (if Z is zero, U is empty, and Z is gobbled). Thus \Rb UZ expands to: nothing if Z is 0, s(n-1)n if Z is a digit between 1 and 9, and #1#2#3#4#5#6#7#8#9#Z otherwise. This yields an error You already have nine parameters, which is adequate in case Z is a number larger than nine. Consider now the case of an optional argument. Here \Rb is a bit different: it reads the `0#1´ part and replaces it by `[#1]´. You will get a Runaway argument? error (or some other strange behavior) in case Z is `0´ because pattern matching fails (of course, you should never try to make optional the first argument of a function that takes none).
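The analysis can be checked by experiment. The following plain TeX sketch reproduces the mechanism with short names; \gobbleone is a hypothetical helper, and the second argument of \ydef is the digit 1 (normal command) or 2 (optional first argument), tested with \ifnum instead of LaTeX's comparison with \tw@.

```tex
% Plain TeX sketch of the pattern-matching trick
\def\gobbleone#1{}
\def\ydef#1#2#3{%
  \ifnum#2=2 \def\Rb##11{[####1]}\else \let\Rb\gobbleone \fi
  \expandafter\yaux\expandafter{\number#3}#1}
\def\yaux#1#2{%
  % U is delimited by the digit #1, V by the opening brace of the body
  \def\Ra##1#1##2##{\expandafter\def\expandafter#2\Rb##1#1}%
  \Ra0##1##2##3##4##5##6##7##8##9###1}
% For \fooiii: U is 0#1#2#, \Rb removes the 0, and TeX then sees
%   \def\fooiii#1#2#3{Seen #1 #2 #3}
\ydef\fooiii{1}{3}{Seen #1 #2 #3}
% For \fooiv: \Rb replaces 0#1 by [#1], and TeX then sees
%   \def\fooiv[#1]#2#3{Seen #1 #2 #3}
\ydef\fooiv{2}{3}{Seen #1 #2 #3}
\fooiii abc

\fooiv[a]bc
\bye
```

Both test lines should typeset Seen a b c.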

2.10.5. A variant of the previous problem

In the previous paragraph we have shown how to convert an integer, say 3, into a sequence #1#2#3. One trouble with sharp signs is that you have to double them, and if you define a command in the body of another one, they must be doubled again. Thus we state our problem as: given an integer N between 1 and 9, construct \sharp1\sharp2...\sharpN. After that, we can evaluate the `\sharp´ command, replacing it by `#´. One solution (original LaTeX code) uses a loop from N down to Y (with Y=1 in the case of normal argument, Y=2 otherwise). Some variants will be discussed later on. The current LaTeX code uses pattern matching, as explained above; this leads to the following solution

  \def\Sharp{########}% needs 8 #
\ydef{\acmd}{3}{\string \acmd\space called with #1, #2, and #3.}
Test \acmd A{BC}D.

Test \acmd called with A, BC, and D..

2.10.6. Loops

A silly question is: can we do loops without conditionals? The answer will be given later. We assume here that our loop will be of the form: while N is not too big, do something and increment N. This mechanism needs modifying a table (the location of N), hence is not pure expansion. In our example, we will write `\sharp\the\count0´, and hope that this will evaluate to `#3´ later on, assuming that \count0 contains `3´ now. How that can be implemented is left as an exercise to the reader. See also section 2.11.1. We shall explain later all the silly details concerning conditionals in TeX; all we need to know is that you can test a<b and a>b, but neither a≤b nor a≥b. Here is our code:

\def\code{\advance\count0 by 1 \sharp\the\count0}
\def\Loop{\ifnum\count0<\count1 \code\Loop\fi}

Assume that \count0 holds 0 and \count1 holds 3. In this case the test is true, `\code´ is evaluated, then `\Loop´. The effect of evaluating `\code´ is to increment the counter and produce `\sharp1´. The loop terminates after `\sharp3´ has been produced. Notice that recursion is not terminal (but it would be in most computer languages): when the test is found false, there are four `\fi´ tokens not yet evaluated. This example is atypical, in that the counter is modified before its use; exchanging the `\sharp...´ and the `\advance...´ part implies changing initial and final value (1,4 instead of 0,3).
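To watch the loop at work in plain TeX, one can give \sharp a visible meaning. In this sketch \sharp simply prints the character #, which is a simplification of the intended use.

```tex
% Plain TeX sketch; \sharp prints a # here (an assumption for the demo)
\def\sharp{\string##}
\def\code{\advance\count0 by 1 \sharp\the\count0}
\def\Loop{\ifnum\count0<\count1 \code\Loop\fi}
% The group restores \count0, which is the page number in plain TeX
{\count0=0 \count1=3 \Loop}% typesets #1#2#3
\bye
```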

Our \Loop command is not generic, in that the names of the counters are built-in. Thus Knuth proposes the following:

\def\loop#1\repeat{\def\body{#1}\iterate}
\def\iterate{\body \let\next\iterate \else\let\next\relax\fi \next}
\let\repeat\fi % this makes \loop...\if...\repeat skippable
\loop \ifnum \count0<\count1 \sharp \the\count0 \advance\count0 by 1\repeat

Note that the last line contains an `\ifxx´ where the associated `\fi´ is the `\repeat´ at the end of the line. Thus, in the case where the `\loop´ command is not expanded this line is well-balanced regarding conditionals. In the case where `\loop´ is expanded, the value of the `\repeat´ token is irrelevant, it just serves as delimiter, and the `\fi´ has to be found in \iterate. In order for \iterate to work, the `\body´ should expand to an incomplete conditional, without \else part. It conditionally sets \next, and evaluates it after \fi; this trick makes the recursion terminal.

An alternate version is given by LaTeX, as follows

\long\def\loop#1\repeat{\def\iterate{#1\relax\expandafter\iterate\fi}%
  \iterate \let\iterate\relax}

Adding `\let\iterate\relax´ at the end of the definition has no real importance; but it causes no harm either. Note the \expandafter trick: if the test in the loop is false, neither \expandafter nor \iterate is expanded; if the test is true, \fi is evaluated before \iterate. Thus recursion is terminal. One difference with the TeX method is that the body of the loop is put in \iterate rather than in an auxiliary command. The interesting point is the `\relax´. Guess what happens in this case:

\count0=0 \count1=4
\bloop \ifnum \count0<\count1 \the\count0 \advance\count0 by 1\repeat

If you use LaTeX in verbose mode, you can see that the test is true, true and false, where you expect it to be true four times. The printed result is `0´ (hence the question: what did the second iteration do?). Using Tralics, you will get more information.

1 +\ifnum6
2 +scanint for \count->0
3 +scanint for \ifnum->0
4 +scanint for \count->1
5 +scanint for \ifnum->4
6 +iftest6 true
7 {\the}
8 {\the \count}
9 +scanint for \count->0
10 \the->0.
11 Character sequence: 0.
12 {\advance}
13 +scanint for \count->0
14 {\expandafter \iterate \fi}
15 +\fi6
16 \iterate ->\ifnum \count 0<\count 1 \the \count 0 \advance \count 0 by 1\expandafter \...
17 +\ifnum7
18 +scanint for \count->0
19 +scanint for \ifnum->0
20 +scanint for \count->1
21 +scanint for \ifnum->4
22 +iftest7 true
23 {\the}
24 {\the \count}
25 +scanint for \count->0
26 \the->0.
27 +scanint for \count->10
28 {\advance}
29 +scanint for \count->0
30 {\expandafter \iterate \fi}
31 +\fi7
32 \iterate ->\ifnum \count 0<\count 1 \the \count 0 \advance \count 0 by 1\expandafter \...
33 +\ifnum8
34 +scanint for \count->0
35 +scanint for \ifnum->10
36 +scanint for \count->1
37 +scanint for \ifnum->4
38 +iftest8 false
39 +\fi8

Lines 16 and 32 are a bit too long; there are two tokens \iterate\fi that are replaced by \...

As you can see, all tests have a serial number. On lines 2–5, you can see why the first test is true: it is because the numbers 0 and 4 are compared. On lines 18–21, you see why the second test is true, and on lines 34–37, you see why the last test is false; in fact, \count0 contains ten. On line 27, you see something strange. Explanations: Assume that you say `\advance \Foo4´, where \Foo is a reference to some counter. In this case, the trace of Tralics will contain +scanint for \Foo->4, and everybody understands this. If you replace \Foo by \count0, the trace will contain \count; it will also contain a line for the zero in \count0. Hence, the number that appears in line 27 is the value read by the \advance in line 12. What happened is the following: after `by´ we have seen the digit `1´. In the case of \loop, the next token would be `\relax´, and this stops scanning of the number. But here, we have `\expandafter´, which is expandable and expanded; as a consequence, this finishes the first conditional. After that comes the test; it is true, because we did not increment our counter yet. Then comes `\the´ which is expandable. This reads `\count0´, as well as the space after it. The expansion of `\the...´ is the digit zero; so far, we have read 10, and continue reading. The next token is `\advance´ and this is not expandable. Hence \advance has read everything up to the next \advance. Is it needed to explain what happens next? Let´s just notice that, at line 39, Tralics (and also TeX) are still reading tokens for the second \advance. Since version 2.9, Tralics prints an additional line, between line 27 and 28, of the form \count0 changed from 0 to 10.
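The experiment can be repeated in plain TeX. The definition of \bloop given here is an assumption: it is the LaTeX \loop without the \relax, which is what the trace above suggests.

```tex
% Plain TeX sketch; \bloop is assumed to be \loop without the \relax
\def\bloop#1\repeat{\def\iterate{#1\expandafter\iterate\fi}%
  \iterate \let\iterate\relax}
% The group restores \count0, which is the page number in plain TeX
{\count0=0 \count1=4
 \bloop \ifnum \count0<\count1 \the\count0 \advance\count0 by 1\repeat
 % The first \advance reads 10, the second one reads 1
 \message{count0=\the\count0}}% writes count0=11 on the terminal
\bye
```

Only a single 0 is typeset, in accordance with the explanation above.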

2.11. Conditionals in TeX

We shall discuss in this section the following commands

2.11.1. Syntax of the conditionals

A conditional has the form \if test true-code \else false-code \fi. The \else part is optional; conditionals can be nested, and this nesting is independent from anything else. The command \unless (provided by ϵ-TeX) can be used before the if-like command (except \ifcase), its effect is to reverse the truth value of the test. Conditionals are expanded: this means that conditionals are evaluated inside a \edef, you can use \noexpand to delay evaluation, and \expandafter to change the order of expansion.
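The following plain TeX sketch illustrates the last remark; \now and \later are hypothetical names. In \now the test is evaluated when \edef is executed, in \later it is delayed by \noexpand and evaluated when the macro is used.

```tex
% Plain TeX sketch: conditionals inside \edef
\count255=1
\edef\now{\ifnum\count255=1 one\else other\fi}
\edef\later{\noexpand\ifnum\count255=1 one\noexpand\else other\noexpand\fi}
\count255=2
% \now was fixed when \count255 was 1; \later is tested now
\message{[\now][\later]}% writes [one][other] on the terminal
\bye
```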

An important point is the following: if you define a command \ifthenelse with three arguments, that evaluates the first argument as a boolean, and expands conditionally to the second or third argument, then these two arguments must be balanced, and category codes are fixed. In the case of \if, there are no such limitations: if the test is found false, then all tokens are read at high speed until finding a `\else´, and normal processing occurs, or until finding a `\fi´, that indicates the end of the conditional; if the test is true, and if there is an \else part, all tokens between `\else´ and `\fi´ are read at high speed. Consider for instance this piece of code

    \ifnum \A=\B do-nothing \else {\let\fi\relax\C}\fi

Assume that the test is false; this means that the else part is evaluated. Locally `\fi´ is redefined to do nothing, and `\C´ is evaluated. Let´s assume that `\C´ does nothing special (it could typeset `Hello, world!´). In this case the `\fi´ after the brace terminates the conditional. Assume now that the test is true. Skipping over the \else part at high speed just means comparing the actual value of a token with `\if´ or `\fi´: in the first case, the if-counter is incremented, in the second case it is decremented, in all other cases the counter is left unchanged; reading stops when the counter is zero. Here, the conditional is terminated by the first `\fi´. This means that you have to be very careful: the end of the conditional can change, depending on whether the test is true or false. When we say `compare the actual value of the token´, this means that the name is irrelevant, only the meaning is used; for instance `\repeat´ has the same value as `\fi´, and \loop...\if...\repeat is well balanced.

All constructions indicated above have a then-part and an else-part, except \ifcase: this command reads a number (see section 2.6 for details) and you can specify action for the case zero, the case one, the case two, using \or as separator, and an optional \else for other cases. Any other use of the \or command will signal an Extra \or error. For instance, we can solve the problem of constructing \sharp1...\sharp N as follows (assuming `\N´ holds the value of N)

\ifcase \N \error{You cannot use zero here}
 \or \sharp1
 \or \sharp1\sharp2
 \or \sharp1\sharp2\sharp3
 \or \sharp1\sharp2\sharp3\sharp4
 \or \sharp1\sharp2\sharp3\sharp4\sharp5
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6\sharp7
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6\sharp7\sharp8
 \or \sharp1\sharp2\sharp3\sharp4\sharp5\sharp6\sharp7\sharp8\sharp9
 \else \error{Argument must be non-negative, at most nine}
 \fi

The simple conditional `\if AB ... \else ... \fi´ compares two characters A and B, it shares some features with \ifcat. It expands tokens, using the following rules

The command \if compares the two numeric codes, and \ifcat compares the category codes. If you say something like

\catcode `\A=3 \def\fooi{A}
\catcode `\A=11 \def\fooii{A}
\if\fooi\fooii H\fi \ifcat\fooi\fooii\else e\fi
\if\bgroup{l\fi \ifcat\egroup}l\fi \if\relax\par o\fi
\if01\else,\fi \ifcat01 w\fi \if\par1\else o\fi
\if\noexpand\fooii\relax r\fi \if\fooii Ald\fi \if!!!\fi

this should typeset as `Hello, world!´.

You must be very careful using a construction like `\if\A\B...´, because of the following

Plain TeX provides an \outer macro \newif that takes an argument \iffoo (whose name starts with the two letters if) and makes it a new conditional; the ifthen package provides the more LaTeXish syntax \newboolean{foo}. This means that \iffoo true-code \else false-code \fi becomes valid, and evaluates false-code. You can say `\footrue´ and the condition becomes true (true-code is evaluated) or `\foofalse´ and it becomes false (false-code is evaluated). The \global prefix is allowed before the command. The ifthen package provides \setboolean{foo}{true} where the second argument is case insensitive. These commands could be implemented as


The trouble with this definition is that, when `\iffoo´ is read at high speed, it is not recognized as a conditional (it is a user defined command), see discussion about `\ifhph´ in [4, Chapter 20]. For this reason, the commands \iftrue and \iffalse were added to TeX, they evaluate respectively to true and false, and the following lines work (because \let is used instead of \def):
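A minimal plain TeX sketch of such definitions, for a conditional named \iffoo (this mirrors what \newif sets up, written out by hand):

```tex
% Plain TeX sketch of a \newif-like conditional, set with \let
\def\footrue{\let\iffoo\iftrue}
\def\foofalse{\let\iffoo\iffalse}
\foofalse
% \iffoo has the meaning of \iffalse, so it is counted as a conditional
% when the enclosing false branch is skipped at high speed
\iffalse \iffoo a\else b\fi \fi
\footrue
\message{\iffoo true\else false\fi}% writes true on the terminal
\bye
```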


You can use `\ifnum´ or `\ifdim´: in both cases a numeric quantity, an operator, and another numerical quantity are read. Three operators are recognized: less than, greater than and equal to. In the case of `\ifnum´, both quantities have to be numbers, otherwise dimensions. Note that glue is converted to a dimension (and possibly a number), by ignoring the shrink and stretch part. If you want to compare two items of glue, you must split them into components and check them in order. The example that follows shows also that math glue must first be converted into ordinary glue. All the commands shown here are fully expandable; without the \relax, this piece of code gives three errors (and TeX is still trying to see if the `fill´ is not a `filll´).

\muskip0=36mu plus 18mu minus 1fill\relax
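As a complement, here is a sketch of a component-wise comparison of ordinary glue. It relies on the ϵ-TeX primitives \gluestretch, \gluestretchorder, \glueshrink and \glueshrinkorder; the names \compareglue and \ifsame are hypothetical, and an ϵ-TeX based engine is assumed.

```tex
% Plain TeX sketch, to be run with an e-TeX based engine
\newif\ifsame
% Sets \ifsame to true if the two glue items agree in all components
\def\compareglue#1#2{\samefalse
  \ifdim#1=#2
    \ifdim\gluestretch#1=\gluestretch#2
      \ifnum\gluestretchorder#1=\gluestretchorder#2
        \ifdim\glueshrink#1=\glueshrink#2
          \ifnum\glueshrinkorder#1=\glueshrinkorder#2 \sametrue
  \fi\fi\fi\fi\fi}
\skip0=1cm plus 2fil minus 3fill
\skip2=1cm plus 2fill minus 3fill
\compareglue{\skip0}{\skip0}\message{\ifsame same\else different\fi}
% Same natural size and same stretch value, but fil is not fill
\compareglue{\skip0}{\skip2}\message{\ifsame same\else different\fi}
\bye
```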

Here is an example that uses no extension.

\count0=0 \count1=1 \dimen0=1pc \dimen1=12pt
\skip0=1cm minus3fill  \skip1=1mmplus 2fill
\ifnum \count0<\count1
  \ifdim \dimen0=\dimen1
    \ifdim \skip0>\skip1 ok \fi\fi\fi

This is the trace of Tralics. Note that for LaTeX, all lengths allocated by \newlength are “rubber” length, i.e. associated to a \skip register. Such quantities are automatically converted into rigid length (however, if you replace in the example `\skip1´ by `1mmplus 2fill´, then only a rigid dimension is read, the `plus 2fill´ is not part of the condition).

+scanint for \count->0
+scanint for \ifnum->0
+scanint for \count->1
+scanint for \ifnum->1
+iftest26 true
+scanint for \dimen->0
+scandimen for \ifdim->12.0pt
+scanint for \dimen->1
+scandimen for \ifdim->12.0pt
+iftest27 true
+scanint for \skip->0
+scandimen for \ifdim->28.45274pt
+scanint for \skip->1
+scandimen for \ifdim->2.84526pt
+iftest28 true

This is one solution to our problem of producing N sharp signs in a row:

\ifnum \N>0 \sharp1\fi\ifnum \N>1 \sharp2\fi\ifnum \N>2 \sharp3\fi
\ifnum \N>3 \sharp4\fi\ifnum \N>4 \sharp5\fi\ifnum \N>5 \sharp6\fi
\ifnum \N>6 \sharp7\fi\ifnum \N>7 \sharp8\fi\ifnum \N>8 \sharp9\fi

The following construction is a priori more efficient (on average there are fewer tests) but it takes more memory.

\ifnum \N>0 \sharp1\ifnum \N>1 \sharp2\ifnum \N>2 \sharp3%
\ifnum \N>3 \sharp4\ifnum \N>4 \sharp5\ifnum \N>5 \sharp6%
\ifnum \N>6 \sharp7\ifnum \N>7 \sharp8\ifnum \N>8 \sharp9%
\fi\fi\fi\fi\fi\fi\fi\fi\fi

You can test whether a character can be read from an input channel, via the \ifeof command. Here is an example from the Tralics torture file. The file tortureaux.tex has six lines: the first one contains abc, the second one is empty, the third one contains \a \b {\c, the fourth one contains {} \d} \e, the next one contains 123, the last one is empty. The \testeq command compares two commands: things should be equal here. (See TeXbook, exercise 20-18, if you do not understand the setting of \endlinechar.) Commands starting with `bad´ are not evaluated in this example. Details can be found in section 5.12.

\openin 5=tortureaux
\ifeof5 \badifeofatentry\fi
\read 5 to \foo\testeq\foo{abc}
\read 5 to \foo\testeq\foo{}
\read 5 to \foo\testeq\foo{\a\b{\c{} \d} \e}
\global\read 5 to \foo
\ifeof3\else \badifeofnonexists\fi

You can say \ifvoid25, \ifhbox25 or \ifvbox25. In TeX these commands would test the content of box register 25: if empty, the \ifvoid is true, the other tests are false; if not empty, the box contains a horizontal list or a vertical list, and \ifhbox and \ifvbox are respectively true, the two other tests being false. In Tralics, a box contains a character string or an XML element, but there is no associated orientation; hence \ifhbox and \ifvbox always evaluate to false. Instead of 25, any number can be given (provided it is a valid register number). In the example that follows, only the first equals sign is part of an assignment, and box number one is tested.


You can say \ifmmode, \ifvmode, \ifhmode and \ifinner. These commands check the current mode. The first three evaluate to true if the mode is math mode, vertical mode, or horizontal mode. The last is true if the mode is inner (internal vertical mode, restricted horizontal mode, or (nondisplay) math mode). The following example shows these modes.

\def\wm{\edef\res{\ifinner i\else I\fi
   \ifhmode h\else H\fi
   \ifvmode v\else V\fi
   \ifmmode m\else M\fi}\res}
\par \wm$$\wm \hbox{\wm $\wm$} \eqno \wm$$

The result is: `IHvM IHVm ihVM iHVm iHVm´. If you remove the `\edef´, the trouble will be that typesetting the `I´ enters horizontal mode. This example fails if `$$...$$´ is replaced by `\[...\]´, because \eqno switches to inner math mode, and `\]´ checks for outer math. The same test provokes an error in Tralics, because of the implementation of \eqno, that expands all tokens, including the token that follows \edef. Tralics knows whether it is in or out of math mode; in math mode it knows whether it is in display math or not. In these cases, it produces the same result as TeX. Otherwise \ifinner is false, and \ifvmode or \ifhmode produce results in accordance with the current mode, which has little to do with TeX modes.

An extension of ϵ-TeX is \ifdefined. This reads a token, and yields true unless it is a macro (or active character) that is undefined. The command \ifcsname reads all characters up to \endcsname and constructs a character string in the same way as \csname. The value is true if a command with that name exists (possibly undefined); it is false otherwise (the important point is that the command is not created). In the example that follows, assuming \foo and \FOO undefined, you will see aBc (or abc, in case someone defined \undefined). You will also see DEF, because the LaTeX command \@ifundefined creates the token if it does not exist, and sets it to \relax.

\ifcsname foo\endcsname A\else a\fi
\ifx\foo\undefined  B\else b\fi
\ifdefined\foo  C\else c\fi
\makeatletter\@ifundefined{FOO}{D}{d}\makeatother
\ifcsname FOO\endcsname E\else e\fi
\ifdefined\FOO F\else f\fi

The command \iffontchar is another extension; it reads a font identifier (for instance \font denotes the current font) and an integer (a character position); it yields true if the font specifies a character at that position.

The last conditional to explain is \ifx. This reads two tokens and compares them. Two tokens are equal if they are character tokens (implicit or explicit) with the same character value and category code, or two TeX primitives with the same meaning, or two user-defined commands with the same value (same arguments, same body, same \long and \outer flags).
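The rules above can be checked with a few one-line tests. This is a minimal sketch for plain TeX or LaTeX; the macro names \one to \six are arbitrary.

```latex
\def\one{x} \def\two{x} \long\def\three{x} \def\four#1{x}
\ifx\one\two   same\else different\fi  % same: identical parameter text and body
\ifx\one\three same\else different\fi  % different: \three carries the \long flag
\ifx\one\four  same\else different\fi  % different: \four takes an argument
\let\five=\relax
\ifx\five\relax same\else different\fi % same: both have the meaning of \relax
\let\six=a
\ifx\six a same\else different\fi      % same: implicit and explicit letter a
```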

2.11.2. Examples of conditional commands

Using \ifx we can code our \Color command properly, like that


It is possible to avoid these assignments in the \Color macro, provided that they are hidden elsewhere. For instance


Note that the ifthen package provides the \equal command as helper for such a situation: you could say \ifthenelse{\equal{A}{B}}{X}{Y} instead of \ifstringeq {A}{B}{X}{Y}. Caveat: the \equal command fully expands its two arguments, our version expands nothing.

In any computer language, you would define a command that compares two strings and returns true or false; this is not possible in TeX because commands return no value. All you can do is modify some variable (a command, a register, a token list, etc). This assignment can be done by the caller or the callee. Here is a solution where the token \next is set by the caller:


Note that, if \envert accepts an optional argument, for instance if \envert[clair]{text} typesets the text using light green, you can say \Color{vert}[clair]{text}. We consider now a case where the assignment is done by the callee (via \equaltrue or \equalfalse; there is a variant that uses \setboolean).

 %%variant: \setboolean{equal}{\ifx\tempa\tempb true\else false\fi}
    \ifequal\let\next\envert\else \let\next\relax\fi\fi

A subtlety of TeX is that tokens are read only when needed. Said otherwise, if you say `\if AB C\else D\fi´, TeX will evaluate the test; it will remember that a new conditional has started. If the test is false, it will skip at high speed until the \else, and resume normal evaluation; but if the test is true, it will resume normal evaluation right now. It is only when TeX sees an \else token (and this can be another one) that it will read all tokens at high speed until the \fi. And, when TeX sees the \fi, it will pop the conditional stack. Consider the following example:


Assume that the test is true. Then \aux reads all tokens, up to `\fi´, provides a \fi to finish the conditional now, then expands to its first argument (which is argument 3 of \ifstringeq). In the case where the test is false, the same thing happens. This is nicer than the solution that consists in defining conditionally \next and evaluating it after the \fi: it avoids an assignment.
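One way to realize the mechanism just described is sketched below. It is an illustration reconstructed from the description, not necessarily the original listing; the names \ifstringeq, \aux, \tempa and \tempb are those used in the text.

```latex
% Compare two strings without expanding them; #3 is used if equal, #4 otherwise.
\def\ifstringeq#1#2#3#4{\def\tempa{#1}\def\tempb{#2}%
  \ifx\tempa\tempb \aux{#3}\else \aux{#4}\fi}
% \aux grabs everything up to \fi, closes the conditional, keeps its first argument.
\def\aux#1#2\fi{\fi#1}
\ifstringeq{vert}{vert}{equal}{different}
```

In the true branch #2 is `\else\aux{#4}´, in the false branch it is empty; in both cases the conditional is already closed when #1 is evaluated.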

2.11.3. Testing the next token

Let´s consider now a variant of the color problem. We want to write a command with three arguments A, B and C, it is assumed to read a token, compare it with A, and expand to B or C. We need an auxiliary command that reads the token. Thus the solution


Note that we have put an equals sign after `\let\tempa´ and `\let\lettoken´ for the case where the token to match is an equals sign. If you want to catch spaces, a bit more complicated machinery must be used. There is a problem with this command, because, if the argument of \ifaux is not a single token, say `foobar´, then only `f´ will be put in \lettoken and `oobar´ will be typeset. On the other hand, if the argument is empty, then `\ifx´ will be put in \lettoken; after that \lettoken will be expanded. Since this is \ifx, the following tokens will be compared (said otherwise `\tempa´ and `\let´), this is not exactly what is required. In order to solve this problem, we first modify slightly our code:


The \ifnch command given above looks like the LaTeX version of the beast. In fact, spaces are ignored in LaTeX, so that there is an additional test. Moreover, some variables have a different name, nevertheless, here is the code:


The problem is the \ifaux command. The question is: can we rewrite it in such a way as to read a single token, before calling \ifnch? Recall that we want to distinguish between `{x}´ and `x´. A very interesting question is the following: if we read the opening brace, how can we put it back in the input stream? We cannot do so by just expanding a macro (because the body is always well balanced). You could try something like {\ifnum0=`}\fi (that leaves an unmatched brace after expansion), or something like `{\iffalse}\fi´. Our solution is much simpler. There is a TeX primitive that gets the token without reading it. To be precise, \futurelet reads a token A, that has to be a command name or an active character, then a second token B, then a third token C. The value of C is put in A, using the equivalent of \let, then C and B are pushed back in the input stream (in this order, the token to be read first is B). The code of \ifnextchar is hence the following:


What `\futurelet\lettoken\ifnch´ does is read a token. This could be a space character, an equal sign, an open brace, a closing brace, whatever. It puts it back in the input stream. It puts it also in \lettoken. After that, it evaluates \ifnch (which is a command that should take no argument, of course; it should consult \lettoken and depending on the value, call a command that, maybe, reads the token). There are some variants. For instance amsmath has a version that omits the comparison with <@sptoken>. The xkeyval package provides a version where the category codes of the character to test and the actual token may be different.
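The behaviour of \futurelet can be observed with a much smaller macro than \ifnextchar. The following is a minimal sketch; \peekbrace and \peekbraceaux are hypothetical names, and \bgroup is assumed to have its usual meaning (an implicit opening brace).

```latex
\def\peekbrace{\futurelet\lettoken\peekbraceaux}
% \peekbraceaux takes no argument: it only consults \lettoken.
\def\peekbraceaux{\ifx\lettoken\bgroup [brace]\else [no brace]\fi}
\peekbrace{x}   % the next token is an opening brace
\peekbrace x    % the next token is the letter x
```

In both calls the token that was inspected is still in the input stream, so the x is typeset after the bracketed text.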

2.11.4. Reading a space

We consider in this paragraph the following problem: is it possible to define a command \sptoken that behaves like a space character inside \ifx? One problem with the current version of Tralics is that, as has been mentioned earlier, a newline character in the source file produces a new line character in the XML file; it thus has a different representation from a normal space. Thus, there are two different space tokens N and S (they have the same category code, but a different value, 13 or 32). If a macro requires an argument delimited by a space, both these characters can be used. When comparing token lists, these tokens are considered equal. However, when using \ifx, these two tokens compare unequal. Our purpose is to create \sptoken that compares equal to S; it is trivial to create the N token, and compare them.

We give here three solutions. The first one uses \futurelet. If the arguments are A, B and C, where A is the command to define, and C the space, then B has to be a command (if it is a character, it will be typeset); this cannot be \foo, since spaces after \foo disappear, it has to be something like `\;´. This command must read the space, otherwise it appears in the output. We provide two solutions: a command that is delimited by a space, and a command that takes an argument (remember that spaces disappear before undelimited arguments):

\def\; {}\futurelet\SPtoken\; % comment required
\def\;#1{}\futurelet\SPtoken\; 0

In both cases, the command \; cannot be used for typesetting (in the LaTeX kernel, it is used for computing the \SPtoken, and correctly redefined after that). We give here an example, where the redefinition is temporary, inside the box. We can discard the content of the box.

\setbox0\hbox{\def\;{}\global\futurelet\SPtoken\; }

We give now a solution using \let. Remember the syntax, after \let and \sptoken (the token to be assigned), comes <equals> and <one optional space> and <token>, where the last token is our space token. Since <equals> reads an arbitrary number of spaces and an optional equals sign, an equals sign is required. Our optional space cannot be optional. So we must produce a double space. This is not completely trivial. We give here two solutions (the comment is necessary)

\def\makesptoken#1{\let\sptoken= #1}\makesptoken{ }
\def\:{\let\Sptoken= } \:  % this makes \Sptoken a space token
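To check that \sptoken compares equal to an ordinary space under \ifx, one can peek at a space with \futurelet. This is a sketch for TeX; \peekspace and \peekspaceaux are hypothetical names, and \space is assumed to have its usual definition \def\space{ }.

```latex
\def\makesptoken#1{\let\sptoken= #1}\makesptoken{ }
\def\peekspace{\futurelet\lettoken\peekspaceaux}
\def\peekspaceaux{\ifx\lettoken\sptoken [space]\else [other]\fi}
% \expandafter turns \space into a real space token before \peekspace runs
\expandafter\peekspace\space x   % the test is true
\peekspace x                     % the test is false: the space after the name is skipped
```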

And now, how can we define \@xifnch? This command is assumed to read a space, discard it, and check again for the next character. Thus the question is to design a macro that reads a space. This cannot be done via \def\@xifnch#1..., since spaces are ignored before undelimited arguments; we cannot use the technique of the command `\;´ above, because we cannot read what follows the space; the solution consists in a command that takes no argument, and that starts with a compulsory token, like \def\foo\bar{etc}. The non trivial point is that we want \bar to be replaced by a space token, but spaces disappear after \foo. We give here two solutions.

\def\:{\Foo}\expandafter\def\: {etc}

2.11.5. Variants of the Map problem

Let´s consider the following variant of the \Map command. If we have \do{A}\do{B}\do{C}, we want to separate arguments with a comma, and put a period after the last argument; we might as well do something with the argument, say, typeset it in italics. This is not always possible. In one of the style sheets used by the Raweb, a Perl postprocessor is used for replacing some commas by a period. We assume here that we know where the list ends. For instance, we assume that we can put a `\endl´ token at the end of the list. Then we can write something like

\def\foo#1#2\endl{\textit{#1}\ifx#2\endl\endl.\else, \foo#2\endl\fi}

Then `\foo{A}{B and C}{D}\endl´ produces `A, B and C, D.´ as expected. Let´s analyze the code and try to see why it is wrong. We assume that you never say \foo\endl, because the list is assumed non-empty. We also assume that the list does not contain the \endl token (in LaTeX, you should use the special marker `\@nil´ only as list delimiter). In our case, the first argument is `A´, the second is `{B and C}{D}´. In the case where the second argument is empty, the test is true, because \endl is compared against itself. In our case, the test is false because the brace is compared with the letter B. If we put the second argument in a pair of braces, we get an error: Too many }´s, because the test is true, and a part of `#2\endl\endl´ has been evaluated. This means that our test is wrong. The only safe way to check whether #2 is empty is to put it in a command, and check whether this is the same as \empty. We shall give a second version of the code where the test is replaced by \ifx\endl#2\endl. In the case where #2 is empty, the test evaluates to true, and if #2 evaluates to some token list that does not start with \endl, the test will be false; this is better.

Note that, when \foo is called again, it compares `D´ with `\endl´. Does this surprise you? In fact, if you say `\foo{A}{XY}{UV}\endl´, you get `A, XY, U, V.´. The trouble is the following: when TeX reads the arguments of a command, a pair of braces disappears, when possible. Thus arguments are `A´ (without braces) and `{XY}{UV}´ (it is not possible to remove the braces). When \foo is called again, arguments are `XY´ and `UV´, without braces. This explains why the test compares U and V (by the way, if `UV´ is replaced by `UUVV´, the test will be true, yielding an Undefined control sequence error). When \foo is called again, arguments are now `U´ and `V´, an unwanted result. There is a simple way to avoid disappearance of braces: it suffices to put a token before each item, for instance like this

\def\foo\do#1#2\endl{\textit{#1}\ifx\endl#2\endl.\else, \foo#2\endl\fi}
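As a usage sketch (assuming LaTeX, for \textit), the definition above is called with a \do in front of every item; braces around multi-letter items then survive the recursion.

```latex
\def\foo\do#1#2\endl{\textit{#1}\ifx\endl#2\endl.\else, \foo#2\endl\fi}
\foo\do{A}\do{XY}\do{UV}\endl   % typesets: A, XY, UV.
```

On the last call #2 is empty, so the test compares \endl with itself and the period is typeset.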

The good way of testing that the argument is empty is to use \@iftempty, which has a different syntax:

\def\foo\do#1#2\endl{\textit{#1}\@iftempty{#2}{.}{, \foo#2\endl}}

A more elegant solution: notice that #2 starts with \do, unless it is empty. There is no need to read the argument for seeing this, we can use the \ifnextchar command. With the solution proposed here, the token that marks the end of the list is evaluated: we use \relax, because this is harmless.

\def\foo{\def\do##1{\textit{##1}\@ifnextchar{\do}{, }{.}}}

Note that we can replace \relax by something more useful, for instance a period:

\def\foo{\def\do##1{\textit{##1}\@ifnextchar{\do}{, }{}}}

An alternate solution could use `\ifprevchar´ instead of `\ifnextchar´. There is no such command in LaTeX, but the idea is the following: instead of putting a comma after each argument but the last, we can put a comma before each argument but the first. All we need to do is to know if this argument is the first. In one application, we have coded this as: apply \do-first on the first argument, and map \do-other on the rest of the list. If side effects are allowed, we can use a piece of code like this (note how the final period is typeset):

\def\do#1{\iffirst\firstfalse\else , \fi\textit{#1}}
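The line above needs a flag and a driver. A possible completion is sketched here; the body of \foo is an assumption made for illustration, and \newif is the standard way to create \iffirst, \firsttrue and \firstfalse.

```latex
\newif\iffirst
\def\do#1{\iffirst\firstfalse\else , \fi\textit{#1}}
% Hypothetical driver: arm the flag, run the list, add the final period.
\def\foo#1{\firsttrue #1.}
\foo{\do{A}\do{B}\do{C}}   % typesets: A, B, C.
```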

In fact, there is no need to use an auxiliary command, it suffices to modify \do itself:

\def\foo{\def\do##1{\textit{##1}\def\do####1{, \textit{####1}}}}

If you think that there are too many sharp signs, you can try

\newcommand\normaldo[1]{, \textit{#1}}

There are other possibilities implying conditional commands. We shall see later how to define a comment environment that ignores the content of it. It is as if you said


One can make the following strange construct {\ifnum0=`}\fi. In this case, we compare two numbers, zero and the internal code of the brace (which is in general non-zero). The result of the test is false, but who cares? The body of the conditional as well as the else part is empty. Hence, the result is like \bgroup. There are some differences, because TeX has two brace counters (the balance counter and the master counter), while there is only one counter in Tralics. For details, see the TeXbook and its appendix D, where it is said “If you understand [...] you´ll agree that the present appendix deserves its name.” (the name of the appendix is `Dirty Tricks´).
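The construct and its mirror image can be wrapped in two macros, each of which has a balanced body but expands to a single unmatched brace. This is a sketch for TeX; the names \openbrace and \closebrace are hypothetical.

```latex
\def\openbrace{{\ifnum0=`}\fi}    % balanced body, leaves one opening brace
\def\closebrace{\ifnum0=`{\fi}}   % balanced body, leaves one closing brace
\openbrace \bf bold text\closebrace\ normal text
```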

A piece of code like this causes trouble to Tralics

  \ifdim \wd\tempboxa >\hsize
  \else \hbox to \hsize{\hfil\box\tempboxa\hfil}%

It is a simplification of the \@makecaption command of the article class. The idea is to center the caption of an image if it fits on a line (centering is achieved via \hfil). The argument is typeset in a temporary box, and the width of the box is compared against \hsize. Captions in the Raweb are always centered, but this is not aesthetic.

2.11.6. More examples

Consider again the following example


It would be much simpler to write:


The problem here is that the commands \tempb and \tempc may take an argument, that would be \else or \fi. The remedy is


In general, you need an \expandafter before each token between \else and \fi. The command \@afterfi can be used to simplify such definitions. Its effect is easy: it reads all tokens up to the \fi token, evaluates \fi, then the other tokens. Such a command is provided by the following packages: typehtml, grabhedr, gmutils, gmverb, morehelp, splitbib, babel, and maybe others. Example:

   \else\@afterfi\fct v\fi}

If the test is true, then somecode is evaluated, then everything between \else and \fi is discarded. But if the test is false, the else part is interpreted as if it were \fi\fct v. The command \@afterelsefi is to be used in the true part (all tokens between \else and \fi are discarded). In the example that follows, \fct is called with two arguments, the first one is u or v, the second is 2.

   \ifnum\count0=#1 %
   \@afterelsefi \fct u
   \else\@afterfi\fct v\fi}
\def\fct#1#2{} \test32
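Both helpers can be written with delimited arguments. The definitions below are a sketch of the usual form; packages may differ in details (for instance in the use of \long).

```latex
\makeatletter
% Grab everything up to \fi, close the conditional first, then put the tokens back.
\long\def\@afterfi#1\fi{\fi#1}
% Same idea in the true branch: the part between \else and \fi is thrown away.
\long\def\@afterelsefi#1\else#2\fi{\fi#1}
\makeatother
```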

The piece of code that follows computes the factorial of a number, using only expandable commands (it requires \numexpr, an extension provided by ϵ-TeX).

    \number \numexpr#1*\JGfactorial{(#1-1)}\relax
  \else 1\fi}
      \number \numexpr#1*\factorial{(#1-1)}\expandafter\relax
    \numexpr\ifnum\numexpr#1<0 0\else1\fi\expandafter\relax

Ulrich Diez wrote versions 3 and 4. Version 3 uses a space character instead of \space, using one of the techniques shown above; he then produced version 4, which gives a different value for the factorial of a negative number, and in which the space after the digit 1 is no longer needed. In fact, if the argument is zero or one (the case where the first \ifnum is false), versions 1 and 2 return the character 1, while versions 3 and 4 return the digits of the number 1, computed by \number. In case 3, an optional space is read after the integer constant; in case 4, the \relax token is an end marker for \numexpr, and no optional space is needed after it (I guess that the purpose of this \numexpr is to avoid any problems if \space is redefined); the first \numexpr is needed for the product, and the two other calls are needed if the command calls itself. The difference between versions 2 and 3 is the placement of \number. I put it just before \numexpr, because \numexpr can be used only in a context where a number is seen. Ulrich puts it before the \ifnum. Does this make any difference? If you want to compute the factorial of a number, no. What about the following code:


The effect is the following. The command \JGfactorial is expanded twice, and the result is put in a command; evaluating this command yields the desired result. The same can be applied to \UDfactorial. In any case, the first expansion gives the body of the macro. The second expansion expands the \ifnum and \number respectively. In one case you get lines two and three of \JGfactorial. This is something like


If you do not use this command, TeX will signal an unterminated \if. If you call it twice, you will get an extra \else error. On the other hand, if you consider \UDfactorial, the one-level expansion of \number implies expansion of the \ifnum, then the \numexpr of the body; expansion of the command means considering all tokens up to the final \relax, and since this \relax is preceded by \expandafter, everything up to the final \fi is taken into account. Thus, the one-level expansion of the body is a number, the desired result.
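For readers who want a complete, self-contained macro to experiment with, here is a minimal expandable factorial in the style of the first versions. It is a sketch under the assumption that ϵ-TeX is available; \sketchfactorial is a hypothetical name, not one of the four versions discussed above.

```latex
\def\sketchfactorial#1{\ifnum\numexpr#1\relax>1
    \number\numexpr#1*\sketchfactorial{(#1-1)}\relax
  \else 1\fi}
\edef\result{\sketchfactorial{5}}   % \result is 120
\message{\result}
```

As in the versions discussed above, \number is placed inside the true branch, so a two-step expansion of this macro still contains the \else part of the conditional.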

2.11.7. Producing N asterisks in a row

In appendix D of the TeXbook, there are some examples of how to produce N asterisks in a row. The question is: can we produce this using pure expansion? This is a solution given by D. Kastrup:

\def\nlines#1{\expandafter\nlineii\romannumeral\number\number #1 000\relax}

This produces `AAAAA´. The idea is the following: `\romannumeral3000´ expands to `mmm´. It is then rather easy to convert this sequence of m into a sequence of A. The argument of the command can be `\count0´; the effect of `\number´ is to convert the value of this counter into a number, and it gobbles a space. The argument of the command can also be `\count1␣´; the second `\number´ will gobble the second space (I don´t know if there is some other reason for these two \number commands). Here is the same idea, without tests:
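The conversion of the m's into A's can be done by a small recursive macro that stops at the \relax end marker. The body of \nlineii below is an assumption made for illustration; only its name and the definition of \nlines appear above.

```latex
\def\nlines#1{\expandafter\nlineii\romannumeral\number\number #1 000\relax}
% Assumed helper: one A per m, stop when the \relax marker is reached.
\def\nlineii#1{\ifx#1\relax \else A\expandafter\nlineii\fi}
\nlines{5}   % typesets AAAAA
```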

\def\recur#1{\csname rn#1\recur}
\def\replicate#1{\csname rn\expandafter\recur
  \romannumeral\number\number#1 000\endcsname\endcsname}
\dimen0=4sp \replicate{\dimen0}{P}

You may wonder how this works. Here is the transcript file of Tralics.

[216] \replicate{\dimen0}{P}
\replicate #1->\csname rn\expandafter \recur \romannumeral
   \number \number #1 000\endcsname \endcsname
#1<-\dimen 0
{\csname}
{\expandafter \recur \romannumeral}
+scanint for \dimen->0
+scanint for \number->4
+scanint for \number->4000
+scanint for \romannumeral->4000
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-m
{\csname}
\recur #1->\csname rn#1\recur
#1<-\endcsname
{\csname}
{\csname->\rn}
\rn #1->
#1<-\recur
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rnm}
\rnm #1->\endcsname {#1}#1
#1<-P
{\csname->\rn}
\rn #1->
#1<-P
Character sequence: PPPP .
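The transcript uses two helper macros, \rn and \rnm. Their parameter text and body can be read off the trace (`\rn #1->´ and `\rnm #1->\endcsname {#1}#1´); written as definitions, they would look as follows. The exact form is an inference from the trace.

```latex
\def\rn#1{}                  % gobbles one argument and produces nothing
\def\rnm#1{\endcsname{#1}#1} % closes one pending \csname, keeps the argument for the next level
```

Each m in the roman numeral opens one more `\csname rnm´; each \rnm then closes the enclosing one and emits one copy of the argument, and the outermost \rn removes the argument itself.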

This is now something else, it is part of a command defined in the RR style file:

  \edef\foo{\ifnum 0<0#1x\else y\fi}\def\xbar{x}%
  \else \global\compteurtheme=0 \@latex@error{Pas un thème #1}\@eha\fi

Assume that #1 contains a positive number, for instance 25. In this case, the test will be true, \foo will be defined as `x´, and will be equal to \xbar. In this case, our command puts 25 in \compteurtheme. Some other tests (not shown here) are done; for instance, the value should be a number between 1 and 4, or a number with two digits, each one being between 1 and 4. Assume that the argument is not a number, say it is `gee´; then \ifnum will compare 0 and 0, the test will be false, \foo will be defined as `y´ hence is not equal to \xbar. Assume that the argument is `3a´; this is not a theme, but a theme and a subtheme. In this case, the test is true (the digits `03´ are absorbed by \ifnum), but \foo expands to `ax´, and this is not equal to \xbar. Nowadays, themes are `com´, `cog´, etc, and this piece of code has become useless. It is replaced by something different, see end of section 6.9.
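The test can be isolated in a small macro in order to try the three cases. This is a sketch; \checknum is a hypothetical name, while \foo and \xbar are used as in the fragment above.

```latex
\def\xbar{x}
\def\checknum#1{\edef\foo{\ifnum 0<0#1x\else y\fi}%
  \ifx\foo\xbar [#1: number]\else [#1: rejected]\fi}
\checknum{25}   % \foo is x, hence equal to \xbar
\checknum{gee}  % \foo is y
\checknum{3a}   % the test is true, but \foo is not equal to \xbar
```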

2.12. A nontrivial command \verb

The code that follows is a simplified version of a LaTeX command

1 \def\verb{%
2   \bgroup
3     \let\do\@makeother \dospecials
4     \verbatim@font\@noligs
5     \@vobeyspaces \frenchspacing\@sverb}
7 \def\verb@egroup{\global\let\VBG\@empty\egroup}
8 \let\VBG\@empty
10 \def\@sverb#1{%
11   \catcode`#1\active
12   \lccode`\~`#1%
13   \gdef\VBG{\verb@egroup\error{...}}%
14   \aftergroup\VBG
15   \lowercase{\let~\verb@egroup}}

Note first that this code contains two empty lines, that are read by TeX as a \par token (it is ignored, provided that the definition is read in vertical mode). Lines 5, 7, and 15 are terminated by a brace and the end of line character produces a space token, that is ignored for the same reasons. Lines 1, 10, 12, and 13 are terminated by a % character, since otherwise, it would produce a space character (ignored in case the command is executed in vertical mode, and that is not always the case). In the case of lines 2, 3, 4, etc., the end of line is converted into a space character that disappears because it follows a command name.

This code defines a command \verb that starts a group via \bgroup. At line 3, \dospecials is executed, after redefining \do. This changes the category code of all special characters (including all characters made active by packages like babel). Line 4 changes the current font to a typewriter one, and it executes a piece of code that inhibits ligatures (for instance the one that converts a double dash in an en-dash). Note that this document contains a great number of verbatim examples, either inline or as environments. In some cases, we use a smaller font; it is hence important to allow the user to parameterize commands like these. Line 5 contains three commands: The first makes an end-of-line character active (usually, it will behave like \par), the second enters so-called french spacing mode (a mode where the width of a space is constant), and the last command \@sverb will be explained later. The `s´ in the name of this command comes from the `starred´ version of \verb: If you say `\verb*+ +´, you will get `␣´. We have omitted the test with the star character.

On lines 7 and 8, we define a command \VBG that does nothing (i.e. expands to the empty list) and a command that evaluates to \egroup preceded by a global assignment of \VBG to nothing. On line 13, \VBG is defined as calling \verb@egroup plus some error, whose text is not shown here. Thus \VBG is a command that 1) resets \VBG to a harmless command, 2) closes the current group, 3) signals an error.

Let´s consider lines 11 and 12. We assume that the argument of \@sverb is some character c (If you say \def\foo{\verb\foo=\foo then \foo, you will get an error Improper alphabetic constant, and after that, you´re really in trouble. In the usual case, the character that follows \verb is read with category code 11 or 12, because of the code line 3.) Line 11 makes the character c active (of category 13); the category code will recover its old value at the end of the group, and line 12 changes the lc-code of the tilde character (the lc-code will recover its value at the end of the group). The lc-code of a character will be used for hyphenation, as well as conversion from upper case to lower case. We assume here, for the sake of simplicity, that hyphenation is inhibited by the use of a verbatim font. Note that Tralics does not care about subtleties like hyphenation. For this reason, when you say \verb+foo+, it will execute \verbprefix {\verbatimfont foo}. You can redefine both commands (the prefix is empty, the font defaults to \tt). Notice that Tralics grabs the argument, contrarily to LaTeX.

Line 14 contains the special command \aftergroup. This reads a token, saves it on a stack, and re-inserts it at the end of the current group.
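A two-line experiment shows the effect of \aftergroup in TeX; the macro names \one and \two are arbitrary. When several tokens are saved in the same group, they are re-inserted in the order in which they were saved.

```latex
\def\one{[1]}\def\two{[2]}
{\aftergroup\one \aftergroup\two inside}% typesets: inside[1][2]
```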

Let´s come back to the LaTeX implementation of \verb. So far, we have read a character, changed its category code, changed the lc-code of the tilde character, changed the font and other tables, redefined \VBG, aftergrouped it (code on line 14: the token is popped at the end of the current group, that was opened on line 2, and normally closed on line 7). Line 15 is a kludge: what \lowercase does is replace in its argument every character by its lower case equivalent (using the lc-code table). The result is evaluated again. Here the argument is formed of three tokens: \let, the tilde and \verb@egroup. Since ~ is a character that has a lower-case equivalent, it will be replaced by that, namely the character c. Note: category codes are left unchanged by this procedure. It is hence important that ~ be an active character (because \let modifies the value of ~) and that c be active (otherwise, there is no meaning in changing the value of c).
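The \lowercase kludge is a general idiom for defining an active character without typing it as an active character. The following sketch, independent of \verb, defines the active plus sign; it assumes that ~ is active, as in plain TeX and LaTeX.

```latex
\begingroup
  \lccode`\~=`\+                     % the lower-case equivalent of ~ is now +
  \lowercase{\endgroup\def~}{[plus]}% defines the active +, since ~ keeps catcode 13
\catcode`\+=\active                  % from now on a + in the input is active
a+b                                  % typesets a[plus]b
```

The \endgroup inside the argument of \lowercase restores the lc-code before the definition is made, so the definition itself is not lost at the end of the group.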

Consider the case of \verb+\toto+. Here the character c is the plus sign. After line 15 has been executed, the situation is the following: all characters are of category other, ligatures are disabled, french spacing is active, current font is typewriter, a group is opened, and a token is waiting for the group to terminate. In such a situation, you cannot go outside LaTeX properly. In fact, the carriage return has been made active in order to help error recovery (this is not shown here), and the `+´ sign has been made active: this will help us. TeX sees now the following tokens \₁₂ t₁₁ o₁₁ t₁₁ o₁₁ +₁₃. The first five tokens are added to the current horizontal list as characters in the current font, while the last one is expanded. The expansion is that of \verb@egroup, see line 7. This defines globally \VBG, then closes the group, restoring everything. It does not restore \VBG (because the last assignment was global). After the group, the after-grouped token \VBG is evaluated but it does nothing.

So far, so good: the translation of `\verb+\foo+´ is the same as `\texttt{\char`\\foo}´. Note that the author could have entered the previous expression as `\verb-\verb+\foo+-´, or using the fancyvrb package as `|\verb+\toto+|´, but he used \quoted{\BS verb+\BS foo+}, because, in the HTML file produced by Tralics, different colors are used for verbatim material; this is explained in the second part of this document.

Consider now the following example:

\def\duplicate#1{#1#1} `\duplicate{\verb+x+}++'

You would expect `xx++´ but you get x+x+ in LaTeX, and an error in Tralics. Explanations: the expansion of \duplicate is \verb +₁₂ x₁₁ +₁₂ \verb +₁₂ x₁₁ +₁₂ +? +?. The last two plus signs have not been read, and their category code is still unassigned. The first \verb command reads the +₁₂ via \@sverb. It changes the category code of the plus sign. The second \verb does the same. It reads the first +? as a +₁₃, and this finishes evaluation of the second \verb. The first \verb command is terminated in the same way by the remaining +?. In the case where you replace ++ by --, the \verb command will see an end of line character before a plus character and complain with LaTeX Error: \verb ended by end of line.

Consider now the following example:

\def\braceme#1{{#1}} `\braceme{\verb+x+}++'

You get the following error LaTeX Error: \verb illegal in command argument. Let´s try to see how this is done. The expansion of \braceme produces the following tokens: {₁ \verb +₁₂ x₁₁ +₁₂ }₂. After \@sverb has finished, the first character that is not simply typeset is }₂, this closes the current group. Hence, as above, this restores category code, fonts, lc-codes, etc. It does not restore \VBG because assignment is global (\gdef at line 13 is like \global\def). The trick is now that a \VBG token is popped from the aftergroup stack. This one calls \verb@egroup and signals an error. What \verb@egroup does is to close a group (the one opened by \braceme), and reset \VBG to something harmless. Note that TeX is in a clean mode when the error is signaled. Tralics has no such error handling mechanism (however, no category codes are changed when scanning for the end of the command, so nothing harmful can be done). What this example shows is that error recovery is not completely trivial; nevertheless nice things can be done.

Note the following special cases:

\verb test

In the first case, the delimiter is a space character; the first line is terminated by a space and you would expect it to be interpreted in the same way as the second line. The trouble is that TeX removes all space characters at the end of the line (regardless of category codes). The last line also has a problem: the delimiter is character 171 (double hat mechanism), and once \verb has changed category codes, the double hat sequence is no longer seen as such, and an error is signaled.

There is a variant of \verb: the `verbatim´ environment. The classical exercise is: write a command that reads everything up to \end{verbatim} (backslash and braces are of category 12 in this token list). Different packages solve this problem; fancyvrb is one of them. A solution is also given in the first chapter. It does not allow an optional space after `\end´.
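One common way to solve the exercise is to define a delimited macro while backslash and braces temporarily have category code 12. The sketch below uses this technique; the name \readverbatim is hypothetical, and the code is not claimed to be the solution of the first chapter. Like that solution, it accepts no space after `\end´, because the delimiter must match character by character.

```latex
\begingroup
  % Temporary syntax: | is the escape character, [ and ] are the braces.
  \catcode`\|=0  \catcode`\[=1  \catcode`\]=2
  % Backslash and braces become ordinary characters of category 12.
  \catcode`\{=12 \catcode`\}=12 \catcode`\\=12
  % #1 is everything up to the characters \end{verbatim}; the body
  % re-inserts the real command \end with a real group around verbatim.
  |gdef|readverbatim#1\end{verbatim}[#1|end[verbatim]]
|endgroup
```

The macro is meant to be called once the verbatim category codes are in force, so that the text of the environment is read with the same category codes as the delimiter.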

We give here the LaTeX implementation of the \end command.

  \csname end#1\endcsname\@checkend{#1}%

As you can see, if you say \end{foo}, then \endfoo is executed first. After that, the current environment name in \@currenvir is compared with the argument; in case of error, the variable \on@line contains the start line of the environment. After that, the group is terminated, and we have two tests. The first uses \expandafter; this means that the command \@doendpe is executed outside the environment in the case where the variable \if@endpe is true inside the environment. This command is very complicated (it redefines \par and modifies \everypar), and is not implemented in Tralics; its effect is to suppress the indentation of the following paragraph. On the other hand, the two commands \@ignoretrue and \@ignorefalse redefine \if@ignore globally, so that no \expandafter is needed for this one.
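A definition consistent with this description could read as follows. Only the first line of the body is taken from the listing above; the other two lines are reconstructed from the explanation and may differ in detail from the actual kernel source. The code assumes that @ has the category code of a letter.

```latex
\def\end#1{%
  % Execute \endfoo, then check that foo is the current environment.
  \csname end#1\endcsname\@checkend{#1}%
  % \if@endpe is evaluated inside the group; because of \expandafter,
  % \@doendpe (if any) is executed after \endgroup.
  \expandafter\endgroup\if@endpe\@doendpe\fi
  % \if@ignore is set globally, so it can be tested after the group.
  \if@ignore\@ignorefalse\ignorespaces\fi}
```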

This is an example of \aftergroup.

  \@endpefalse \color@setgroup \ignorespaces}

The effect of the \edef command is to replace the previous definition by the following (where `17´ is to be replaced by the current line number). One important point here is that implementing colors in LaTeX is nontrivial, and for this reason, there are two hooks (the commands whose names start with `color´, which do nothing if the package is not loaded). Colors are not implemented in Tralics.

  \def\@currenvline{ on input line 17}%
  \@endpefalse \color@setgroup \ignorespaces}

The order of evaluation is the following. Assume that the current environment is X. The \begin command opens a group via \begingroup and changes the environment name to `lrbox´. The command starts with \endgroup, closing this group. After that, we put something in the box whose number is the argument of the environment; the content is a hbox, whose start is defined by the brace (and this brace is a group); we start a group with \begingroup, and call \aftergroup. This pushes a brace on the stack; this brace indicates the end of the hbox, but it will be evaluated later. After that, we change again the name of the current environment (it was restored to X by the \endgroup, but we made a copy of it in the \edef). When the end of the environment is reached, the following happens. First, the end-code is executed (this removes space at the end of the box), and \endgroup is executed. As a side-effect this restores the current environment name to X. It also pops the aftergroup stack, namely the closing brace that terminates the \hbox. One important point here is that the \setbox assignment is done outside the environment (it could be done inside, with a \global prefix). Such a piece of code is illegal. The lrbox environment is not implemented in Tralics version 2.10.
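For reference, a typical use of the environment in LaTeX is sketched below. The box name \mybox is a hypothetical choice. Since the \setbox assignment is done outside the environment, the box is still available after \end{lrbox}, and it may contain material such as \verb that cannot appear in the argument of a command.

```latex
\documentclass{article}
% \mybox is a hypothetical name for the box register.
\newsavebox{\mybox}
\begin{document}
% The content is typeset in an hbox and stored in \mybox.
\begin{lrbox}{\mybox}
  Some \verb+verbatim+ text
\end{lrbox}
% The box can be used as many times as needed.
\usebox{\mybox} and again \usebox{\mybox}
\end{document}
```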

2.13. Expandable tokens

Assume that \err is an undefined command. The following code

\ifnum1=0\err1 \err1 \fi

will signal two errors: when TeX reads the second number, it expands the undefined command (hence a first error), and continues scanning until it finds the space; the number read is then 01, so the test is true, hence the second error.
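For comparison, here is a variant in which the second number is terminated explicitly. This is a sketch under the same assumption, namely that \err is undefined; the only change is the added \relax.

```latex
% \err is assumed to be undefined, as in the text.
% \relax ends the second number right after the digit 0, so \err is
% not expanded while the number is scanned. The test 1=0 is false,
% and the tokens up to \fi are skipped without being expanded.
\ifnum1=0\relax\err1 \err1 \fi
```

In this variant no error is signaled, since tokens skipped in a false branch are never expanded.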

We give here the list of all tokens that can be expanded.
