Tralics, a LaTeX to XML translator; Part I

5. Other commands

5.1. Character encoding

We have to distinguish between input encoding, internal encoding and output encoding. The internal encoding of TeX is ASCII (i.e. 65 is the internal code of the upper case letter A), at least for all characters with code between 32 and 126. The input encoding is the mechanism that converts the code of the letter A supplied by computer into the code 65. Almost all input encodings are nowadays ASCII-based, they produce the same value for the letter A; the results may be different for a character like é. The output encoding indicates for a letter, say A, which position in the font to use. We shall not discuss the output encoding here. Let´s just notice that the character `{´ exists in the font cmtt10, but not in other text fonts of the computer modern family. If you read a version of this document that uses the original encoding (OT1), braces shown in error messages are taken from a math font, hence are upright. Some years ago, a 8bit encoding (called T1) was designed, which contains braces. You can compare Figure 1 in appendix F of the [4] (describing the font cmr10) with Table 7.32 of [6], describing ecrm1000.

The first version of TeX was using 7bit input and output characters (but fonts and dvi files were coded on 8bits). There is an extension Ω to TeX that accepts 16bit characters as input, using different encoding schemes. Characters that are not part of the ASCII specifications (less than 32 or greater than 126) are not guaranteed to be treated the same in all implementations. For this reason, it it wise to load the inputenc package, with the current encoding as argument. The effect will be that some characters, like é will become active, and expand to \´e. As a result: only ASCII letters are allowed in control sequence names. On the other hand, if you say \begin{motclés}, then LaTeX complains with LaTeX Error: Environment motcl\´es undefined. Don´t try to define the motcl\´es environment: the expansion of the accent depends on the context: it is é for \begin and \´e for the macro that prints the error message. Non-ASCII characters may be printed by TeX as ^^ab (in some older version of TeX, I had to pretend, via locale settings, that my computer did not understand English in order for it to output the guillemet as «).

A silly question concerns end-of-line markers. Some systems like Unix use LF (line feed) as line separators, some others like Macintosh use CR (carriage return) and Windows uses CR-LF. This is replaced by TeX by a single character: the carriage return with ASCII code 13. Tralics interprets CR-LF, CR and LF alike: as an end-of-line marker. This marker will be replaced by the character whose code is in \endlinechar, provided that this value is in the range 0–255(note: ). The default value is 13, a character of category 5. The tokeniser converts this into a \par token, a space token or ignores it depending on the state. This space token has value 32 (but Tralics uses 10, so as to keep the same line breaks in the XML result as in the TeX source). Note that, whenever a line is read, spaces at the end of the line are removed. If you want a space after a control sequence, you say something like `\TeX\␣´, and if this construct appears at the end of a line, the space is ignored; if the endline character has category code 5, it will be converted to a space, and everything works fine; if this character is for instance 65, you may get a strange error, like this

! Undefined control sequence.^^J
l.170 ...reaks in the \XML\ result as in the \TeX\^^J
? ^^J

We have shown here the end of line as ^^J. There are four lines: the error messages, two context lines, and the line with the prompt. The two context lines show that the space at the end of the line is removed. TeX does not print the undefined control sequence: it assumes that it is either the last token on the first context line, or a token marked as `<recently read>´ or something like that; in our case, the undefined control sequence is the one obtained by replacing ^^J by the value of the endline character.

There is a way to enter special characters in TeX, for instance ^^J is a line feed. The algorithm is the following: whenever TeX sees two consecutive identical characters of category code 7, followed by a character whose number is x, it replaces these three characters by the character whose code is y, where y=x-64 if x64, and y=x+64 if x<64. Hence ^^? yields y=127 (this is the delete character). All characters with codes between 1 and 26 can be obtained using the form ^^A, ^^B, etc. The null character is ^^@, characters with code between 27 and 31 are ^^[, ^^\, ^^], ^^^ and ^^_. Character 32 can be represented as ^^`. All other characters are ASCII characters. This is an example of use:

27=\char`\^^[, 28=\char`\^^\,  29=\char`\^^], 30=\char`\^^^, 31= \char`\^^_

Because some characters in the list are of category code 15 (invalid), we have used the construction \char`\A (with A replaced by some other character). There is no difference between \char`\A and \char`A, unless the category code of the character is one of 0, 5, 9, 14, or 15. The result is the character at position 65 or whatever in the current font; the example above selects positions 27 to 31. The translation is

27=&#x1B;, 28=&#x1C;, 29=&#x1D;, 30=&#x1E;, 31= &#x1F;

Note that these characters are invalid in XML1.0, so that this example is not good; if you compile this document with LaTeX, you will see [not compiled with latex]. In general you will see a ff ligature or a oe one; this depends on the output encoding.

When TeX switched to 8 bits, the rule changed a little bit: the previous rule applies only if 0x127, it gives 0y127. Another test was added: if you say ^^ab, these four characters are replaced by the single character whose code is ab (in base 16, i.e. 171 in base ten in this case). In such a case two characters are needed: a letter or a digit; only lower case letters between a and f are allowed. Thus every character in the range 0-255 has such a representation. Note that, by default, the character ^^ab has category code 12, hence is valid. What appears in the dvi file depends on the output encoding, in the case of a 7bit encoding, the character is unknown, a warning is printed in the transcript file, that´s all, otherwise, it should be an opening guillemet, but it could as well be ń. The purpose of a package like inputenc is to change the category code of all special characters, so that it behaves like a command and produces, in the dvi, something that is, as much as possible, independent of the output encoding.

According to this rule, the character 32 has can be entered as ^^20. There is one situation where the space character can be used in this way: at the end of the line, when \endlinechar is non trivial. Note that, in the case where the resulting character has category 7, it can participate in a hat-hat construct. Here is an example.

{\catcode `\é=7 ééab $xé2$ %next line should produce M
%$1^è=^^^AééT$ %% hat hat control-A
$1^è=^^^A$ %% hat hat control-A
}\def\msg{a message.^^J}

Some explanations are needed. ^^{ is a semi colon, ^^ab is an opening French guillemet, ^^5e is a hat (recursion...), ^^41 is the uppercase letter A. The first line of the example explains that such funny characters can appear in a control sequence name. The second line shows that the hat-hat mechanism can be used with other characters than a hat. It also shows that, if the mechanism cannot be applied, a character with category 7 behaves like a superscript character, whatever its numeric value. The line that follows shows that the end-of-line character is ASCII 13, aka control-M (usually written as ^M). After that, there are two lines containing a control-A character, shown here as ^A. It is preceded by hat-hat, so that the effect should be a single A. The line that is commented out contains a control-T written as ééT (for some strange reasons, this character is invalid in XML1.0, but valid as an entity in XML1.1, [9], [8]). The last line is just a real example of ^^J. This character is printed by Tralics as LF, or CR-LF on Windows. This is the translation of Tralics:

&#xAB; <formula type='inline'>
<math xmlns=''
 ><msup><mi>x</mi> <mn>2</mn> </msup></math></formula
 > M<formula type='inline'><math xmlns=''
 ><mrow><msup><mn>1</mn> <mi>&#xE8;</mi> </msup><mo>=</mo><mo></mo

We inserted some newline characters at unusual places (just before greater than signs), other spaces were produced by Tralics; in order to make sure that 8bit characters are printed correctly, we asked Tralics for a seven bit output.

As said above, Ω accepts 16bit characters, using the notation ^^^^abcd. This syntax was implemented in Tralics2.7, via the \char command (remember that in Tralics, the \char and \chardef commands accept 27bit integers); as a consequence, these characters could not be used in a command name; thios restriction does not appluyy anymore (the default category code of characters with code greater then 127 is other, namely 12). Example


It is translated by Tralics as &#x153;=&#x152;=&#x178;= &#x17B;x?. The argument to \foo could also have been: \oe\OE{\“Y}. The transcript file contains lines of the form:

[8] \foo^^^^0153^^^^0152^^^^0178
\foo #1#2#3->#1=#2=#3=

It is possible to ask for UTF-8 output in the transcript file. This gives characters that are hard to see using latin1, because characters in the range 128–128+32 are in general unprintable. What is shown here as hat-Ó is a single character.

[2] \foo^^^^0153^^^^0152^^^^0178
\foo #1#2#3->#1=#2=#3=
{Push p 1}
Character sequence: Å^Ó=Å^Ò=Ÿ= .

The original version of the Tralics documentation said: Si on a un texte qui contient essentiellement des caractères 7bits, et très peu d´autres caractères, l´utilisation de caractères 16bits consomme énormément de place. This means that using a 16bit encoding consumes a lot of space if you write a French document (and even more, for an English one). The sentence has 159 ASCII characters and 6 others; these can be input using iso-8859-1 (aka latin-1) as input encoding(note: ). In TeX, it uses 165 bytes, in Ω, it uses 330 bytes. Using a construction like \´e we need 177 bytes (and 7 bits per byte). Using UTF-8 requires only 171 bytes (8 bits per byte). This explains why UTF-8 is popular. We shall explain (in the second part of this document) how UTF-8 is encoded and how TeX may read it. In the case of Tralics, the situation is: you can (via an argument to the Tralics program) specify that the sources are encoded using UTF-8 or latin1 (this being the default). However, if the tex file contains, on the first line “utf8-encoded” UTF-8 encoding will be used, if it contains “iso-8859-1” then latin1 encoding will be used.

5.2. New encoding scheme

Since version 2.9, internal encoding of Tralics is 16bit utf8. This has two consequences that will be explained here. The first is that some tables are now much larger. The numeric argument to \catcode, \mathcode, \lccode, \uccode, \sfcode, \delcode, which is a character number can now be anything between 0 and 65535. We also changed the numbers of registers: there are 1024 instead of 256.

The result of a ^^^^abcd construct fits on 16bits, hence is a character, hence can appear in a command name (in the case of a multicharacter control sequence, it must have category code `letter´; initially all character with code greater than 128 have category `other´). In order to save space, a short-verb character must fit on 8bits; otherwise, its category code will not be properly restored when you undeclare it (category other will be used).

All characters are valid in math mode. The translation of an ASCII character may depend on the font, otherwise, it is always <mi>. For instance, in the case of $\mathbf\´e$, expansion of the accent command produces a 8bit character, unaffected by the font change, and the translation is a <mi> containing the e-acute letter. Full 21 bit characters are allowed in Math mode. An expression $x$ is considered trivial math and translates into a <simplemath> element only if the character fits on seven bits and has category letter.

The default input and output encoding is latin1, which is no more the internal encoding. As a consequence, there are two conversion procedures. We explained above that the input encoding can be given on the first line of the file. Otherwise a default encoding will be used. This can be explained in the configuration file. As a consequence, the main input file is read without conversion, then the configuration file is considered, and then the main input file is converted; all other files are immediately converted.

On the other hand, a character like é is represented as é in the internal tree. This character will appear, in the output file, in the form &#e9; if you call Tralics with option -oe8a or -oe1a, as é if you call Tralics with option -oe1 or é if you call Tralics with option -oe8. If the option contains a, the XML file contains only 7bit ASCII characters; the only difference between the two options is the encoding declaration. These options specify also the encoding used for the transcript file. You can specify it independently with the options -te8a, -te1a, -te8, or -te1. If the character is too big to fit in the encoding, then the hat-hat notation is used (see example above). Because each XML file contains its encoding, a XML processor will handle the file produced by Tralics independently of the output encoding. Moreover, whatever the encoding, input or output, you know that ^^^^03b7 is Greek letter eta.

5.3. Changing the input encoding

We mentioned in the previous section that whenever Tralics reads a file, it converts its content, according to the current encoding (that can be given at the start of the file, using ASCII characters), with an exception for the main input file. The situation is a bit more complex: configuration files, tcf files, bibliography data files, and TeX files opened by \openin use a fixed encoding; other source files use a variable encoding.

The default encoding is stored in \input@encoding@default. The default value is one, but can be changed via an option to the program (utf8 or latin1 select encoding 0 or 1 respectively).

The current encoding is stored in \input@encoding. This is an attribute of the current input file, it can be changed at any time. The new encoding is used when Tralics needs to read a new line in order to fetch the next token. Nothing special is done in the case of \read.

Whenever a file is opened, its initial encoding is computed. If the file has a fixed encoding, then all lines are immediately converted, otherwise lines are converted when needed. If the first line of the file contains the string utf8-encoded, then encoding 0 is assumed, if the line contains iso-8859-1, then encoding 1 is assumed, and if the line contains tralics-encoding:NN where NN is a sequence of one or two digits forming a number less than 34, then encoding NN is assumed. There are other heuristics. For instance, if %&TEX encoding = UTF-8 appears near the start of the file, then encoding 0 is assumed. In all other cases, the default encoding is assumed.

In the current version of Tralics, there are 34 possible encodings. Encoding number 0 is UTF8; this is an encoding where an ASCII character is represented by a single byte (with the same value as the character), and other characters use a variable number (between 1 and 4) of bytes. In encodings like UTF16, a character is represented by more than one byte. There is currently no support for such encodings yet. Stated otherwise, we assume that character C is represented by a byte B, and the encoding specifies the value C at position B. Encoding 1 is latin1 (also known as iso-8859-1), it has B=C. For the 32 remaining encodings, it is possible to specify, for each byte B, the associated character C (default is B). Trying to set the current or default encoding to a value outside the range 0-33 is ignored; trying to modify an encoding outside the range 2-33 raises an Illegal encoding error, and invalid byte value gives Illegal encoding position error. In case of an illegal character value (negative, zero, 65536 or more), the byte value is used instead. The magic command is \input@encoding@val; it reads an encoding, a byte and a value. In the example that follows we change the encoding number 2 so that \FOO is read as \foo:

1 \input@encoding@val 2 `O =`o
2 \input@encoding@val 2 `F =`f
3 \let\foo\bar
4 \showthe\input@encoding@val 2 `O
5 \input@encoding=2
6 \show\FOO
7 \showthe\input@encoding@val 2 `O
8 \showthe\input@encoding
9 \input@encoding@default=0
10 \showthe\input@encoding@default
11 \input@encoding=1

This example shows three commands in read or write mode: when the command is prefixed by \showthe it read a value from memory and prints it on the terminal, otherwise a number is scanned and written in memory. The equals signs before the number is optional. No less than 13 integers are scanned, some are given as an explicit integer, some as a character code. We assume that, for encoding 2, all characters map to themselves. Since \FOO is read as \foo, the \show command should print \bar, on lines 4 and 7 you see the value stored of encoding 2 for the character O (first upper case, then lower case), this is twice 111. Other values shown are 2 and 0.

We describe from now on the content of the inputenc package. You load it by saying \usepackage [foo,bar] {inputenc}. The effect of this command is the following. First, a symbol name is defined for each of the 23 known encoding, for instance utf8 for UTF-8 (encoding 0), latin1 for latin1 (encoding 1), etc. The command \inputencodingname holds the current input coding name, and \encoding@value converts this to an integer. The command \inputencoding can be used to change the encoding. It is defined as:

12 \def\inputencoding#1{%
13   \the\inpenc@prehook  %% pre-hook
14   \edef\inputencodingname{#1}%
15   \input@encoding=\encoding@value{\inputencodingname}%
16   \the\inpenc@posthook} %% post-hook

There are two hooks (token lists) that do nothing, added here for compatibility with the LaTeX package. You can use them to output as messages, such as: switching from encoding A to encoding B (the initial value of the encoding name is \relax, this can be used by the pre-hook).

The options, foo and bar in the example, should be valid names. The last name becomes the current and default encoding. As mentioned above, the current encoding applies to an input file, and there is no reason to change the encoding of the package file. Hence, the following is executed:

17   \input@encoding@default\encoding@value{bar}%
18   \AtBeginDocument{\inputencoding{bar}}

If the options are, for instance ansinew and applemac, the tables associated to these encodings are defined; some other tables might also be defined, but you should not rely on this (of course, latin1 and utf8, can be used anywhere, because they are builtin). The package contains

19 \edef\io@enc{\encoding@value{latin9}}
20 \DeclareInputText{164}{"20AC}
21 \DeclareInputText{166}{"160}
22 \DeclareInputText{168}{"161}
23 \DeclareInputText{180}{"17D}
24 \DeclareInputText{184}{"17E}
25 \DeclareInputText{188}{"152}
26 \DeclareInputText{189}{"153}
27 \DeclareInputText{190}{"178}

The code above defines the latin9 (iso-8859-15) encoding. It is very like latin1, but defines the Euro sign at position 164. Defining 256 characters per encoding using this method is inefficient. For this reason you can see

28 \input@encoding@val \encoding@value{latin2} -96 160
29 160 "104 "306 "141 164 "13D "15A 167

As explained above, the command on the start of the line reads 3 integers: an encoding value (here, the encoding of latin2), a byte position and a character value. The byte position must a number between 0 and 255. Here we use an extension: If a negative number minus N has been read, followed by A such that the sum of A and N is at most 256, then N values will be read, and stored at position A and following (here N is 96, and we have shown only the first eight values).

5.4. Characters and Accents

There are some commands that put an accent over a letter. You can say a\accent 98 cde, this works in TeX, but not in Tralics: you will get an error, Unimplemented command \accent. The number 98 is read, and converted to an integer. The Unicode character will be used; thus the translated result is `abcde´.

You can say \a´e. (note: )This is a command introduced by LaTeX so as to allow accents inside a tabbing. Some care must be taken. If you say \a{par}{b} in LaTeX, you get an error of the form: Paragraph ended before \@changed@cmd was complete. The Tralics error message is: wanted a single token as argument to \a. If you say \a\foo12, there is a single token, and the error is: Bad syntax of \a, argument is \foo. In fact, the token after \a must be a valid accent character. After that \a´ is handled exactly like . You can say `\= U´, the space after the command is ignored. You cannot say `\={ U}´, the space is not removed, this is an error. In fact, the argument list of the accent command should contain exactly one token (exception: double accents will be explained later). This token should be a character, with code between 0 and 128. Hence \´Ê is wrong, you must say \´{\^E}} if you want Ế. The message is Error in accent, command = \´; Cannot put this accent on non 7-bit character É. If the token \i is given, it will be replaced by i, so that \”\i and \“i produce the same result. You can say \=\AE, \=\ae, \AE, \ae, \AA, \aa, \O, \o. The result looks like ǢǣǼǽǺǻǾǿ.

You can put an accent on a letter only in the case where this gives a Unicode character. In the case of \c{a} and \c{\=a}, the error message is the same: Error in accent, command = \c; Cannot put this accent on letter a. Table 1 indicates on which letters you can put an accent. See the html page for a list of some glyphs.

Table 1. All possible accents. You can put an accent on any letter, except Q. You can put accents on non-letters, for instance \ae, see text. Some characters accept two accents. In general, you can put an accent on a lower case letter, an upper case letter. There is one exception: you cannot put a dot over a lower case I, because there is already a dot. For h, j, t w, and y, there are accents that apply only to lowercase letters.
\^ Ââ Ĉĉ Êê Ĝĝ Ĥĥ Îî Ĵĵ Ôô Ŝŝ Ûû Ŵŵ Ŷŷ Ẑẑ
Áá Ćć Éé Ǵǵ Íí Ḱḱ Ĺĺ Ḿḿ Ńń Óó Ṕṕ Ŕŕ Śś Úú Ẃẃ Ýý Źź
\` Àà Èè Ìì Ǹǹ Òò Ùù Ẁẁ Ỳỳ
\” Ää Ëë Ḧḧ Ïï Öö Üü Ẅẅ Ẍẍ Ÿÿ
\c Çç Ḑḑ Ȩȩ Ģģ Ḩḩ Ķķ Ļļ Ņc n Ŗŗ Şş Ţţ
\u Ăă Ĕĕ Ğğ Ĭıi Ŏŏ Ŭŭ
\v Ǎǎ Čč Ďď Ěě Ǧǧ Ȟȟ Ǐǐ ǰ Ǩǩ Ľľ Ňň Ǒǒ Řř Šš Ťť Ǔǔ Žž
\~ Ãã Ẽẽ Ĩĩ Ññ Õõ Ũũ Ṽṽ Ỹỹ
\H Őő Űű
\k Ąą Ęę Įį Ǫǫ Ųų
\. Ȧȧ Ḃḃ Ċċ Ḋḋ Ėė Ḟḟ Ḣḣ İ Ŀŀ Ṁṁ Ṅṅ Ȯȯ Ṗṗ Ṙṙ Ṡṡ Ṫṫ Ẇẇ Ẋẋ Ẏẏ Żż
\= Āā Ēē Ḡḡ Ħħ Īī Ōō Ŧŧ Ūū Ȳȳ
\r Åå Ůů
\b Ḇḇ Ḏḏ Ḵḵ Ḻḻ Ṉṉ Ṟṟ Ṯṯ Ẕẕ
\d Ạạ Ḅḅ Ḍḍ Ẹẹ Ḥḥ Ịị Ḳḳ Ḷḷ Ṃṃ Ṇṇ Ọọ Ṛṛ Ṣṣ Ṭṭ Ụụ Ṿṿ Ẉẉ Ỵỵ Ẓẓ
\f Ȃȃ Ȇȇ Ȋȋ Ȏȏ Ȓȓ Ȗȗ
\C Ȁȁ Ȅȅ Ȉȉ Ȍȍ Ȑȑ Ȕȕ
\T Ḛḛ Ḭḭ Ṵṵ
\V Ḓḓ Ḙḙ Ḽḽ Ṋṋ Ṱṱ Ṷṷ
\D Ḁḁ
\h Ảả Ẻẻ Ỉỉ Ỏỏ Ủủ Ỷỷ

Some accents are not standard. Examples:

If in the table you see `I´ instead of `x´, this means that the accent applies only on capital I. If you see h, j, t, w or y, this applies only to the lower case letter. Otherwise the accent applies to both upper case letter and lower case letter.

There is a possibility to put double accents (for Vietnamese, for instance). The following ones are recognized, for upper and lower case letters, the order of the accents is irrelevant. Inside braces, there is an accent command, optional spaces, and a character (maybe enclosed in braces).

\"{\=U} \"{\=A} \"{\=O} \"{\'U} \"{\'I} \"{\`U} \.{\=A} \.{\=O}
\={\~ O} \k{\=O} \'{\~U} \'{\O} \'{\=O} \'{\=E} \'{\.S} \c{\' C}
\'{\^A} \'{\^O} \'{\^E} \`{\=O} \`{\=E} \d{\=L} \d{\=R}
\`{\^ A} \`{\^ E} \H{\'U} \H{\'O} \H{\`U} \H{\`O} \H{\h U} \H{\h O}
\H{\~U} \H{\~O} \H{\d U} \H{\d O} \d{\^A} \d{\^O} \d{\^E} \~{\^A}
\~{\^O} \~{\^E} \h{\^A} \h{\^O} \h{\^E} \u{\'A} \u{\`A} \u{\h A}
\u{\~A} \u{\d A} \~{\" O} \^{\'O} \^{\`O} \u{\c E} \.{\v S} \.{\d S}

This is the translation.

&#x1E7A; &#x1DE; &#x22A; &#x1D7; &#x1E2E; &#x1DB; &#x1E0; &#x230;
&#x22C; &#x1EC; &#x1E78; &#x1FE; &#x1E52; &#x1E16; &#x1E64; &#x1E08;
&#x1EA4; &#x1ED0; &#x1EBE; &#x1E50; &#x1E14; &#x1E38; &#x1E5C;
&#x1EA6; &#x1EC0; &#x1EE8; &#x1EDA; &#x1EEA; &#x1EDC; &#x1EEC; &#x1EDE;
&#x1EEE; &#x1EE0; &#x1EF0; &#x1EE2; &#x1EAC; &#x1ED8; &#x1EC6; &#x1EAA;
&#x1ED6; &#x1EC4; &#x1EA8; &#x1ED4; &#x1EC2; &#x1EAE; &#x1EB0; &#x1EB2;
&#x1EB4; &#x1EB6; &#x1E4E; &#x1ED0; &#x1ED2; &#x1E1C; &#x1E66; &#x1E68;
Figure 2. Some characters

You can see a representation on figure 2. We show here the HTML output for these characters. Ṻ Ǟ Ȫ Ǘ Ḯ Ǜ Ǡ Ȱ Ȭ Ǭ Ṹ Ǿ Ṓ Ḗ Ṥ Ḉ Ấ Ố Ế Ṑ Ḕ Ḹ Ṝ Ầ Ề Ứ Ớ Ừ Ờ Ử Ở Ữ Ỡ Ự Ợ Ậ Ộ Ệ Ẫ Ỗ Ễ Ẩ Ổ Ể Ắ Ằ Ẳ Ẵ Ặ Ṏ Ố Ồ Ḝ Ṧ Ṩ ṻ ǟ ȫ ǘ ḯ ǜ ǡ ȱ ȭ ǭ ṹ ǿ ṓ ḗ ṥ ḉ ấ ố ế ṑ ḕ ḹ ṝ ầ ề ứ ớ ừ ờ ử ở ữ ỡ ự ợ ậ ộ ệ ẫ ỗ ễ ẩ ổ ể ắ ằ ẳ ẵ ặ ṏ ố ồ ḝ ṧ ṩ. The first character in the list is: latin capital letter u with macron and diaeresis, the second one is latin capital letter a with diaeresis and macron. The order of accents is not the same. For simplicity, in Tralics, this is irrelevant. You can notice that the LaTeX output is strange. First, we have defined \h to be a no-op. For the figure, we used the following code:

 \leavevmode\setbox0\hbox{#1}\dimen@\ht0 \advance\dimen@.5ex

This code works, provided that the font has, at position 11, something that looks like a hook (for the T1 encoding, this is a cedilla). In LaTeX you cannot put a \" accent on \=U. No error is signaled, it is just that TeX puts the accent before the accentee in case the accentee is not a character, instead of putting it above(note: ). You can say \"{\´U} because \´U is a character in the T1 encoding. The \mathaccent command has not these limitations(note: ). The first character of the figure was composed via


In German, the umlaut character has a special meaning. The following example shows what can be input. See the babel documentation for details.


and the Tralics translation.


The previous hack does not apply if the double quote character has category code 11 (letter), is in an URL, or in a file name to be read (for instance, via \includegraphics).

The translation of the dash character is the following. If this character appears in an URL or while reading a file name, it is left unchanged. If its category code is 11 (letter), usually inside a verbatim environment, its translation is a dash followed by a \textnospace, unless you invoke Tralics with the -nozerowidthspace switch, case where the translation is a single hyphen. Otherwise, a test is made for a ligature: three hyphens in a row produce &#x2014; (mdash), and two hyphens produce &#x2013; (ndash).

The characters: :;!?«» are handled normally if inside an URL, when translating a filename, when their category code is 11 (letter, typically, inside a verbatim), or when the current language is not French. The translation of « is an opening guillemet with some space after it. If the character that follows is (after expansion), a normal space, or a ~, or a \,, it will be discarded. The translation of » is a space plus the character. If the previous character is a space, it will be removed. (TeX has a primitive \unhskip that can remove a space; the Tralics equivalent works in usual cases). The other four punctuation characters are handled like a closing guillemet. In any case, the space added by these characters is a non-breaking one.

The characters `´<> behave in a special manner, in the same case as the guillemets. In fact, if they are doubled, French guillemets will be used instead. Thus <<foo>> and ``foo'' and «foo» behave the same, if the current language is French. Otherwise, a \textnospace will be added after the character, in the same way as for a dash, namely outside an URL, file name, but if the category code is 11 (letter), and the magic switch has not been given. Example:

\language = 0
test ligatures: <<>>``''-- et --- !?:;
test ligatures: <<>>``''-- et --- !?:;
test ligatures:\verb=<<>>``''-- et --- !?:;=

This is the translation

test ligatures: &lt;&lt;&gt;&gt;``''&#x2013; et &#x2014; !?:;
test ligatures : «  »«  »&#x2013; et &#x2014; ! ? : ;
test ligatures :<hi rend='tt'>&lt;&#x200B;&lt;&#x200B;
     -&#x200B; et -&#x200B;-&#x200B;-&#x200B; !?:;</hi>

Conversion into HTML gives test ligatures: <<>>“”– et — !?:; test ligatures : «  »“”– et — ! ? : ; test ligatures :<<>>``''-- et --- !?:;.

The translation of the apostrophe depends on a flag. If Tralics is called with the switch -nostraightquotes, the translation is the same as \textasciiacute, the character U+B4, otherwise it is the quote character U+27. The character is handled normally if inside an URL, when translating a filename, when their category code is 11 (letter, typically, inside a verbatim). This is the translation of the same example as above, whith options -nostraightquotes and -nozerowidthspace. We added option -oe1a, this shows nobreak space as &#xA0;.

test ligatures: &lt;&lt;&gt;&gt;``&#xB4;&#xB4;&#x2013; et &#x2014; !?:;
test ligatures&#xA0;: &#xAB;&#xA0;&#xA0;&#xBB;&#xAB;&#xA0;&#xA0;
  &#xBB;&#x2013; et &#x2014;&#xA0;!&#xA0;?&#xA0;:&#xA0;;
test ligatures&#xA0;:<hi rend='tt'>&lt;&lt;&gt;&gt;``''--&#xA0;et

The soul package provides some commands. Example; \ul gives test for ul, \so gives test for so, \st gives test for st, \caps gives test for caps, \hl gives test for hl.

5.5. Verbatim material

We have seen a little example of verbatim code above. It shows that some &#x200B; characters are inserted, this is so that, if the XML file is read, a double dash will not be interpreted as an en-dash. What the \verb command produces is a sequence of characters, whose category codes are 12, except for some, that are of category 11, namely `´-<>~&:;?!«». You can compare this with the LaTeX code, shown in section 2.12: the \@noligs command makes some characters of category code 13, the associated action is: output the character, with a zero kern in front. There is an exception: the space character is replaced by the \nobreakspace token, but this can be changed.

You can say \verb*+x y+ or \verb+ x y+. All characters between the two plus signs are collected. Any character can be used instead of the plus sign (Try \verb*abca and \verb =a= !). In the case where a star is given, spaces are replaced by \textvisiblespace, otherwise by \nobreakspace. You can say \DefineShortVerb\+, after that +foo+ is the same as \verb+foo+. Note that the command must be followed by something like `\+´ or `\*´, i.e., a macro whose name is formed of a single character. You can say \UndefineShortVerb\+, this will undo the previous command. The syntax is the same. If the character fits on 8 bits, the old category code is restored; otherwise, it is set to 12 (other). Note: assume that the input encoding is latin1, but you declare ^^^^abcd as a short verb. When Tralics sees the four hats, it replaces these 8 bytes by a single character, say C, and enters verbatim mode until finding character C. Since this character does not exist in the current environment, it cannot be found directly; since we are in verbatim mode, it cannot be found using the four-hat construction. For this reason an error is signalled when the end of line is reached (an implicit C character is inserted, so that next line will be translated normally).

In the case where `+´ is a short verb character, you can say \SaveVerb{foo}+\bar+. This has as effect to remember in a private command all tokens that +\bar+ gathers. When you say \UseVerb{foo}, these tokens are re-inserted in the input stream. Example:

\SaveVerb{foo}+\bar +

The transcript file will contain, for the \UseVerb command the following line.

\savedverb@foo ->\verbprefix {\verbatimfont \bar\nobreakspace }

Here, the \ before `b´ is not a command delimiter, for otherwise there would have been a space after \bar. Note: another explanation is that the `b´ is not of category code 11, so that the command is \b; exercise: find all interpretations of this line.

There are various packages that provide a verbatim-like environment. In Tralics, you can define your own via


This defines MyVerbatim to be an environment that behaves like Verbatim, that is an extension of the basic verbatim environnment that takes some optional parameters (here, the default value of xx is yy). The end of a verbatim environment is defined as a line that contains optional spaces, the \end token, optional spaces, the name of the environment enclosed in braces. Additional characters on the current line are assumed to be after the verbatim environment.

In the case of a verbatim environment, all characters on the line are gathered (final spaces disappear, as usual), with category codes as explained above. If this gives an empty list, a no-break space character is added(note: ). As is the case of \verb, the \verbatimfont command is prepended. This is defined to be \tt. Moreover, \verbatimprefix is also added in front of the token list. In the case of the \verb command, there is \verbprefix instead. These two commands are defined as \@empty. You can redefine them. Each line is followed by \par and \noindent. If the environment is followed by an empty line, or a \par command, this command is removed, as well as the last \indent. Example that shows use of the prefix commands:

Test: \verb+foo+ and |bar|

The translation is:

<p>Test: AfooA and AbarA</p>
<p noindent='true'>Bline1B</p>
<p noindent='true'>Bline2B</p>
<p noindent='true'></p>

The Verbatim environment is an extension of the verbatim environment. There is an optional argument, an association list. If you say `numbers=true´, then lines will be numbered (instead of `true´, you can say `left´ or `right´, or anything, the value is ignored). If you say `counter=17´, then lines will be numbered, using counter 17, if you say `counter=foo´, and `foo´ is a counter name, then lines will be numbered, using counter foo. If you say `firstnumber=N´, where N is a number, then lines will be numbered starting from N; if you say `firstnumber=last´, then lines will be numbered incrementing the previous value. The default counter is FancyVerbLine. Other features defined by the fancyvrb package have not yet been implemented.

If a line number M is given, the following piece of code is inserted before the verbatim line: {\verbatimnumberfont{M}}\space. The funny command is \let equal to \small at the start of the run. The number is incremented for each line.

Characters after \begin{Verbatim}, but on the same line, are ignored. The same is true if an optional argument is given: all characters that follow the closing bracket of the optional argument are ignored. The opening bracket is only looked for on the current line (unless the end of line character is commented out).

\begin{Verbatim}                   [numbers=true]
and without
\begin{Verbatim} %
[ok] this is handled as comment
\def\verbatimfont#1{{\it #1}}
5,one line
6,one line
9,one line
10,one line

This is the translation.

<p noindent='true'><hi rend='small1'>1</hi> <hi rend='tt'>TEST</hi></p>
<p noindent='true'>and without</p>
<p noindent='true'><hi rend='tt'>[ok]TEST</hi></p>
<p noindent='true'></p>
<p noindent='true'><hi rend='tt'>TEST</hi></p>
<p noindent='true'><hi rend='large1'>5</hi> <hi rend='it'>5</hi>,one line</p>
<p noindent='true'><hi rend='large1'>6</hi> <hi rend='it'>6</hi>,one line</p>
<p noindent='true'></p>
<p noindent='true'><hi rend='large1'>9</hi> <hi rend='it'>9</hi>,one line</p>
<p noindent='true'><hi rend='large1'>10</hi> <hi rend='it'>1</hi>0,one line</p>
<p noindent='true'></p>

Two additional keywords have been added. In order to be compatible, you should add the following code to the TeX document.

\csname define@key\endcsname{FV}{style}{}
\csname define@key\endcsname{FV}{pre}{}

If you say style=foo, then the token \FV@style@foo is added in front of the token list generated by the verbatim environment. If you say pre=bar, then \FV@pre@bar is added before the token list (and before the style token mentioned above), and \FV@post@bar is inserted near the end (to be precise: before the last \par or \par\noindent. For a case like this

first line
second line
third line

the tokens gathered by the verbatim environment, shown in the transcript file in verbose mode, and re-indented in order to make the structure easy to recognise, are

{Verbatim tokens:
 \FV@pre@pre \FV@style@latex
  \par \noindent {\verbatimnumberfont {1}}
      \verbatimprefix {\verbatimfont first\nobreakspace line}
  \par \noindent {\verbatimnumberfont {2}}
      \verbatimprefix {\verbatimfont second\nobreakspace line}
 \par \noindent }

Assume that the following definitions are given


Then the translation is

<pre class='latex-code'>
<p noindent='true'>
     <hi rend='small'>1</hi>
     <hi rend='tt'>first&nbsp;line</hi></p>^^J
<p noindent='true'>
     <hi rend='small'>2</hi>
     <hi rend='tt'>second&nbsp;line</hi></p>^^J
</pre><p noindent='true'>third line^^J

Note: We have re-indented a little bit the code, and marked newline characters by ^^J. As you can see, each verbatim line gives exactly one line in the XML output, and this line is formed of a <p> element. If you apply a style sheet with the following definition

<xsl:template match="p">
    <xsl:when test="parent::pre">
        <xsl:if test="@noindent = 'true'">
          <xsl:attribute name="class">nofirst noindent</xsl:attribute>

then <p> elements are discarded in a <pre>, and some action is done in case of noindented paragraphs. If moreover the translation of <pre> is defined by the following code

<xsl:template match="pre">
    <xsl:attribute name="class">
      <xsl:value-of select="@class"/>

we get finally

<pre class="latex-code"><small>1</small> <tt>first line</tt>
<small>2</small> <tt>second line</tt>
<p class="nofirst noindent">third line</p>

This is not valid HTML, since <small> is forbidden in a <pre>. We can modify the style sheet so that if <hi> is in a <pre>, then a special action is taken in the case rend=´small´; we can also remove the useless <tt>. A better solution: we uncomment the definition of \verbatimnumberfont. This will have as effect that verbatim line numbers will be in a <vbnumber> element, and we can apply the following transformation.

<xsl:template match="vbnumber">
  <span class='prenumber'>

Thus, the HTML code will be

<pre class="latex-code"><span class="prenumber">1</span> first line
<span class="prenumber">2</span> second line
<p class="nofirst noindent">third line</p>

This document was converted into HTML using the techniques shown here. The style sheet changes the background color of the <pre> element, according to its class, and the background of the <span> to the background of the page.

Note how the `style´ option of the verbatim environment gives a `class´ attribute in HTML document. If you say

{listparameters={\topsep0pt },pre=pre}

then verbatim behaves like Verbatim, said otherwise, an optional argument is scanned. Moreover, the list on the second line will be put in \verbatim@hook; whenever a verbatim environment of type `Something´ is read, the value of the command \Something@hook is considered (this should be undefined or a command that takes no argument), and the tokens are added to the optional argument, before other arguments.

You can say \numberedverbatim or \unnumberedverbatim. After that, verbatim environments will be automatically numbered or not. This does not apply to Verbatim environments.

There is a command \fvset that takes an associated list as argument. If it contains `showspaces=true´ or `showspaces=false´, this changes how spaces are interpreted in a verbatim environment or command (except for \verb*, case where the space is always visible).

5.6. Case change

There are different commands for changing the case of letters. For instance, the translation of

\uppercase{Einstéin: $E=mc^2$}
\lowercase{Einstéin: $E=mc^2$}


<p>EINSTÉIN: <formula type='inline'>
<math xmlns=''>
<mrow><mi>E</mi><mo>=</mo><mi>M</mi><msup><mi>C</mi> <mn>2</mn> </msup>
einstéin: <formula type='inline'>
<math xmlns=''>
<mrow><mi>e</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi> <mn>2</mn> </msup>

There are two tables that control these conversions: the lc-table and the uc-table. If the lc value of a character is non-zero, it´s the lowercase equivalent of the character; otherwise, the character is left unchanged by \lowercase. The same is true for the uc-table. You can use \lccode and \uccode for changing these tables. They are initialized like this: for all integers x with value between `a´ and `z´, and between `à´ and `ÿ´, the uc value is x-32, the lc value is x, the same holds for x-32. There are four exceptions: the pair 215, 247, this is multiplication and division sign, and the pair 223, 255 this is ß and ÿ. On the other hand, we used the pair 255, 376 (for ÿ and Ÿ).

You can use the two commands \MakeUppercase and \MakeLowercase. These commands have a regular syntax (in the example that follows, the \expandafter would be useless for \lowercase). They convert letters, as for \uppercase and \lowercase, plus some commands that define some characters. This example shows the list of all the recognised commands.


The translation is


This gives ABCABCÉÉFOOŒŒØØÆÆÐÐĐĐŁŁŊŊSSSSÞÞ and abcabcééfooœœøøææððđđłłŋŋßßþþ.

Since Tralics version 2.9, all commands listed above expand to characters, that have a non-trivial uc/lc pair. Hence, you can say:


This gives ABCABCÉÉŸŸFOOIJIJŒŒØØÆÆÐÐĐĐŁŁŊŊßSSÞÞ, and abcabcééÿÿfooijijœœøøææððđđłłŋŋßSSþþ.

5.7. Simple commands

We consider here some commands that take no arguments. Unless told otherwise, they are not allowed in math mode. A new paragraph is started (via \leavevmode) in vertical mode.

The following commands all start with `text´. They are forbidden in math mode.

The following commands are accepted in text and math mode.

Following commands expand to a Unicode Character.

5.8. The fp package

5.8.1. Introduction

This is an implementation in C++ of the package by Michael Mehlich. It implements fixed point arithmetics in TeX. Each number is formed by a sign, then 18 digits before the point and 18 digits after the point. Since 10 9 2 30 , four 32bits integers are sufficient. In the code, we shall sometimes write a number as

x= i=-6 5 b i B i =10 -18 ( i=0 11 c i B i )

where B=1000, b i and c i are integers between 0 and 999. This requires 12 integers, instead of 4, but is useful for internal operations. You can say


This will put 13.5 in \foo and 282.5 in \xbar. In verbose mode, you will see that the transcript file contains lines of the form:

{FPread for \FP@add=+10.}
{FPread for \FP@add=+3.5}

In reality, the first input line is converted into


Most commands follow this scheme. There are some exceptions. You can use \FPprint. This takes one argument and prints (typesets) it. The algorithm is a bit strange: if the argument list is empty, the result is 0. If the argument is 123, or more generally a list of tokens, where the first has category code 12 (other), then nothing happens, the arguments is translated normally. If the argument is `foo´, the result is `13.5´. More generally, if the first item is a character not of category code 12, the command behaves like \csname. Don´t try constructions like \FPprint{$x^2$}. You can say \FPset{gee}{foo}. The second argument is handled as for \FPprint. The first argument should be a command name, or a sequence of characters that becomes a command name via \csname. The effect of the command is the same as \def\gee{13.5}.

The general mechanism for a command like \FPadd or \FPsincos is to call intermediary commands like \FP@add or \FP@sincos. These read some command names (these must be definable, no check is make, as for \let), then parse numerical arguments, compute results and store the results in the commands. The result is always normalized: trailing zeroes are removed as well as leading zeroes (but at least one digit is returned before the point). If the number is negative a sign is added. A special case is when the result is boolean. In this case the syntax has the form

\FPiflt{0.21}{0.20} Wrong\else Correct\fi

As a side effect, \ifFPtest is made equivalent to \iftrue or \iffalse. The following line is valid in Tralics, it gives an error in LaTeX.

\iffalse  \FPiflt{0.21}{0.20} \bad\else \badagain\fi \fi

Numbers are read as follows: We assume that \FP@add sees a string that contains two dots and a \relax, see above. This means that you lose if the argument of the user command contains a \relax. Otherwise, we have a list A, a dot, a list B, a dot, a list C, then \relax. As you can see from the \FPmul\xbar\foo\foo example, these quantities are obtained by expanding the argument (here \foo) in a \edef. For some reason quantities \A and \B are expanded again. In a case like


this gives \V after expansion, this is wrong. In a case like


expansion of A is 10.2, this is equally wrong: After expansion, there should remain only digits in A and B; there can be an optional sign at the start of A: any combination of + and - characters is OK. Note that C, as well as all digits after a space in A or B are ignored. Thus, the following two lines are valid for Tralics, invalid in the TeX case.

\def\V{10 .2}

5.8.2. The list of all commands

In the list that follows, \C, \Ca, \Cb are command names, and \V, \Va, \Vb are values.

5.8.3. Alternate syntax

The command \FPupn implements a postfix language that allows you to write shorter code. Here is an example

\FPupn\foo{7 20 2 sub 100 2000 - add +}
\FPupn\foo{20 2 div 100 2000 / add 3 mul 2 *}
\def\mthree{-3}%there is no unary minus in this language
\FPupn\foo{ 3 abs mthree abs 3 sgn 10 * mthree sgn 100 * + + +}
\FPupn\foo{2 3 min 400 500 max +}
\FPupn\foo{12.43745678 2 round 12.35745678 2 trunc -}
\FPupn\foo{e 1.2  exp + 2.3 ln + 3 4 pow + 5 6 root +}
\FPupn\foo{pi 0.7 - sin cos sincos - tan cot tancot +}
\FPupn\foo{0.3 arcsin 0.1 * arccos 0.1 * arcsincos -
           arctan arccot arctancot -}
\FPupn\foo{3.4 seed random}
\FPupn\foo{1.1 2.3 3.4 pop swap copy add sub}

The \testeq command can be used to test the code. It is an error if the two arguments are not the same. Some comments. Consider the last expression. We put 1.1, 2.3 and 3.4 on the stack. After that we pop an item. After that we swap. The stack holds 1.1 (top stack), followed by 2.3. Then we duplicate the top stack. Then we add. The topstack is now 2.2. After subtraction, we get 0.1. If you say `2 3 -´, the result is 1, because - and sub use arguments in a different order. The same is true for / and div. Note the order of 10 2 pow, this gives 1024. If strange words are seen, like `mthree´, they are replaced by \mthree. Note that `e´ and `pi´ are predefined.

If you don´t like postfix language, you can use \FPeval. Here are some examples.

\FPeval\xfoo{(20 - 2) + (2000-100) + 7}
\FPeval\xfoo{(20/2 + 2000/100)*3*2}
\FPeval\xfoo{abs(3) + abs(-3) + (sgn(3)* 10) + (sgn(-3) * 100)}
\FPeval\xfoo{min(2:3) + max(400,500)}
\FPeval\xfoo{round(12.43745678,2) -  trunc(12.35745678, 2)}
\FPeval\xfoo{e + exp(1.2)  + ln(2.3) + pow(3, 4) + root(5, 6)}
\FPeval\xfooa{sin(cos(sin(0.7 - pi))) - cos(cos(sin(0.7 - pi)))}
\FPeval\xfoo{tan (cot(tan(xfooa))) + cot(cot(tan(xfooa)))}
\FPeval\xfooa{arcsin (arccos (arcsin(0.3)*0.1)*0.1) -
               arccos (arccos (arcsin(0.3)*0.1)*0.1)}
\FPeval\xfoo{arctan(arccot(arctan(xfooa))) - arccot(arccot(arctan(xfooa)))}

If you wonder what happens, you can look the transcript file. You can see something like:

{FPpostfix 1  2  3  mul add  400 500  max  sin 4  pow add}
{FPupcmd ??}
{FPupcmd ??}
{FPupcmd ??}
{FPupcmd mul}
{FPread for \FP@upn=+3.}
{FPread for \FP@upn=+2.}
{FPupcmd add}
{FPread for \FP@upn=+6.}
{FPread for \FP@upn=+1.}
{FPupcmd ??}
{FPupcmd ??}
{FPupcmd max}
{FPread for \FP@upn=+500.}
{FPread for \FP@upn=+400.}
{FPupcmd sin}
{FPread for \FP@upn=+500.}
{FPupcmd ??}
{FPupcmd pow}
{FPread for \FP@upn=+4.}
{FPread for \FP@upn=-0.467771805322476126}
{FPupcmd add}
{FPread for \FP@upn=+0.522845423476396576}
{FPread for \FP@upn=+7.}
{FPread for \FP@upn=+7.522845423476396576}

The second line is the expression converted from infix to postfix. Each `??´ represents a string that does not start with a letter. This is generally a number.

5.9. Action before translation

Normally, translation applies only to what is between \begin{document} and \end{document}. This is a very special environment, in fact, it leaves the semantics stack pointer unchanged. There are two hooks. You can say


These commands remember the tokens in a special list, that is inserted in the input stream when \begin{document} or \end{document} is seen. After that, the meaning of the command changes: it becomes `evaluate now´, more precisely \@firstofone. The last action in the begin-document hook is to change the definition again, so that an error may be signaled, for instance Can be used only in preamble: \AtBeginDocument. On the other hand, the \end{document} command inserts a special marker that closes every open file, thus stopping translation at the end of the hook (the bibliography is translated after that). The command \@onlypreamble takes as argument a command name and adds it to the list of commands that become invalid after the preamble.

Before the begin of the document, you can use commands of the form


There are some differences with LaTeX, see next section. If `doc-opt´ contains `useallsizes´ this is the same as if a line in the configuration file has said to use all font sizes. If it contains `french´ or `english´, this defines the default language.

Before version 2.9, the name or options of the class could indicate the top-level section; for instance, book assumed `leadingpart´ and report assumed `leadingchapter´; these keywords are no more recognised. You have to say \toplevelsection{\part} in the class file if you want it to be `part´. The default is `part´ for a book, `chapter´ for a report, `section´ otherwise. If the top-level section is part, chapter, or section, the translation of \subsection is, respectively, a <div3>, <div2> or <div1> element. Moreover, an attribute pair chapters=`true´ or part=`true´ is added to the main element, so that a post-processor can decide that <div1> is subsection, section or chapter.

If the packages `calc´, `fp´ or `fancyhdr´ are loaded, then the meaning of some command changes, as explained elsewhere. If the `babel´ package is loaded, the following languages are recognized: english, american, british, canadian, UKenglish, USenglish (these have number 0), french, francais, frenchb, acadian, canadien (these have number 1) austrian, german, germanb, naustrian, ngerman (these have number 2). The first language in the list is the default language. If a package is named `french´ or `frenchle´ or `german´, the default language is also set. The default language can be used in the attribute list of the main document element. Setting the default language also set the current language (value of \language).

Note that the current version of the babel package accepts 63 options, which are language names, and if you specify an option not in the list, for instance `foo´, then foo.ldf is loaded if possible, so that the number of options could be greater. There are three other options that are recognised by certain languages and whose purpose is to make some characters active. We have shown in section 5.4 how Tralics handles double quotes in German. For instance, in spanish, if option `activeacute´ is given then ´a is a shorthand for \´a, and this applies to 12 other characters. In catalan, you can also use option `activegrave´, this makes `a a shorthand for \`a, it applies only to A, E and O. Finally option `KeepShorthandsActive´ controls whether shorthands are activated by default. The `french´ package no longer exists, there are two versions `frenchpro´ (commercial) and `frenchle´ (free); there are two versions of the for German, `german´ and `ngerman´. All features of these packages can be found in babel (with possibly differences in the syntax). These packages have no options.

The `calc´ and `fancyhdr´ packages have no options. The `fp´ package has two options, `debug´ and `nomessages´ that are ignored by Tralics.

The standard configuration file contains lines like these:

  on package loaded calc CALC = "true"
  on package loaded foo/bar FOO1 = "true"
  on package loaded *foo/bar FOO2 = "true"
  on package loaded foo/*bar FOO3 = "true"
  on package loaded *foo/*bar FOO4 = "true"

You can also say

  on_package_loaded calc CALC = "true"
  on_package_option calc CALC = "true"
  on_class_option article CALC = "true"
  on class option */* CALC = "*+"

Before version 2.8, these lines of codes provoked some actions. They are now ignored. You should use classes and packages instead.

5.10. Classes and packages

We explain in this section how Tralics implements package and classes.

Assume that we have a file named myclass.clt, whose content is given here and will be explained later:

1 \ProvidesClass{mypackage}[2006/08/19 v1.0 myclass document class for Tralics]
2 \NeedsTeXFormat{LaTeX2e}[1995/12/01]
3 \DeclareOption{a}{\typeout{option A}}
4 \DeclareOption{b}{\typeout{option B}}
5 \DeclareOption{d}{\typeout{option D}}
6 \AtEndOfClass{\typeout{End of class}}
7 \typeout{Before execute options}
8 \ExecuteOptions{a}
9 \ProcessOptions\relax
10 \endinput

and a file named mypack.plt, containing this single line:

11 \ProvidesPackage{mypack}[2006/10/10 My package]

a file named mypack1.plt with

\ProvidesPackage{mypack1}[2006/10/10 My package]
\typeout{Loading file mypack1}
\ProcessOptions \relax

and finally a file named mypack2.plt

12 \ProvidesPackage{mypack2}[2006/10/10 My package]
13 \DeclareOption{e}{\typeout{Option E}}
15 \@ifpackageloaded{mypack}
16   {\typeout{Seen package mypack}}
17   {\typeout{Package mypack missing}}
19 \@ifpackagelater{mypack}{2006/11/11}
20   {\typeout{Seen good package mypack}}
21   {\typeout{Package mypack obsolete}}
23 \@ifpackagewith{mypack1}{x}
24   {\typeout{Seen mypack with x}}
25   {}
26 \@ifpackagewith{mypack1}{x,y}
27   {\typeout{Seen mypack with x and y}}
28   {}
29 \@ifpackagewith{mypack1}{x,y,z}
30   {\typeout{Seen mypack with x, y and z}}
31   {}
33 \ProcessOptions\relax
34 \endinput

Assume that we have a source document containing the following lines

35 \AtBeginDocument{\typeout{Begin Document}}
36 \documentclass[a,b,c,e]{myclass}[2007/03/05]
37 \typeout{In preamble}
38 \usepackage{mypack}
39 \usepackage[y,x,w]{mypack1}
40 \usepackage{mypack2}[2000/00/00]
41 \usepackage[y,x,w]{mypack1}
42 \usepackage[aa,bb]{mypack1}
44 \begin{document}
45 Text
46 \end{document}

When Tralics translates the document above, you will see

47 This is tralics  2.11.7, a LaTeX to XML translator
48 Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
49 Licensed under the CeCILL Free Software Licensing Agreement
50 Starting translation of file toto.tex.
51 Warning: class myclass claims to be mypackage.
52 Document class: mypackage 2006/08/19 v1.0 myclass document class for Tralics
53 Before execute options
54 option A
55 option A
56 option B
57 Warning: You have requested, on line 3, version
58 `2007/03/05' of class myclass,
59 but only version
60 `2006/08/19 v1.0 myclass document class for Tralics' is available
61 End of class
62 In preamble
63 Loading file mypack1
64 Unknown option `w' for package `mypack1'
65 Seen package mypack
66 Package mypack obsolete
67 Seen mypack with x
68 Seen mypack with x and y
69 Option E
70 Option clash in \usepackage mypack1
71 Old options: y,x,w.
72 New options: aa,bb.
73 Tralics Warning: Unused global option:
74     c.
75 Begin Document

The \documentclass command has three arguments, an optional one, that defines the class options, a required one that defines the class name, and an optional one that indicates a date. Tralics reads 8 digits, with some separators, but LaTeX is a bit more exigent, four digits for the year, two digits for the month, two for the year, with slashses as separator, see above.

Evaluating the command is complicated. In fact, Tralics reads the file with extension `.clt´ (instead of `.cls´), in either the current directory or the directory containing other configuration files, compares dates, and evaluates options. The behavior of \usepackage is similar. There are two differences. The first is that the mandatory argument of \usepackage can contain a comma-separated list of files; the second is that class options that are not used by the class can be used by packages. These options are called global options.

The \usepackage command must be used after \documentclass, before \begin{document}, there is a synonym \RequirePackage that can be used before the documentclass (this subltety is not implemented in Tralics, both commands are always defined the same). There is also \LoadClass; this behaves like \documentclass, with some exceptions: for instance, the options of this command are not global options; there has to be a single \documentclass (LaTeX has additional requirements).

The two commands \LoadClassWithOptions and \RequirePackageWithOptions behave the same as the commands without the `WithOptions´ but they take only two arguments: you give only a file name and maybe a date, you do not give options, because current options are used. Finally, \InputClass is a command defined by Tralics, that behaves like \input, but: the file (with extension .clt) is looked at in the same place as class files, and it can contain option declarations that apply to the current class (outside a class or package, you cannot declare options).

The class file should contain an identification line. This is like line 1 above, starting with the command \ProvidesClass. You can also use \ProvidesPackage, the behavior is the same. You can also use \ProvidesFile; in this case the identification line is printed on the transcript file, nothing more happens. In the case of the two other commands, the line is printed on the transcript file (in the case of a class, on the terminal as well, see transcript, line 52), and the date is parsed and remembered. The argument of the command should be the same as the file name, or else a warning is printed (line 51).

You can use the commands \AtEndOfPackage or \AtEndOfClass. These commands take an argument, whose content is added to the list of commands to be executed at the end of the class or package. In fact, when the end of file is seen, Tralics will insert and evaluate these tokens (example line 61); moreover a warning will be issued if there are options and the package does not process them (either via \ProcessOptions or via \PassOptionToPackage). Finally, a warning will be issued if the class or package is obsolete, i.e., earlier than the date argument of the usepackage or documentclass command (lines 57 to 60).

The commands \@ifclassloaded or \@ifpackageloaded take three arguments, P, A and B; they evaluate the token list A in case the class or package P is loaded, the token list B otherwise (example line 65). The commands \@ifclasslater or \@ifpackagelater take four arguments, P, D, A and B; they evaluate the token list A in case the class or package P is loaded with a date more recent than D, the token list B otherwise (example line 66). The commands \@ifclasswith or \@ifpackagewith take four arguments, P, L, A and B; they evaluate the token list A in case the class or package P is loaded with options L, the token list B otherwise (example line 67, 68). The order of elements in L is irrelevant, the test is true if the package has been loaded with additional options.

The two commands \PassOptionToClass and \PassOptionToPackage take two arguments: an option list and a class name (or a package name), they add the options to the list of options of the package (this is uselesss if the class or package is not loaded later).

The command \DeclareOption takes two arguments A and B, where A is an option name and B a token list, the action associated to the option. If the option is processed, this list is evaluated. The command \ExecuteOptions takes a list of options as argument, and processes them in the given order (it processes only options defined in the current file). In the transcript given above, line 54 is the result of such an action. The command \ProcessOptions executes all options relevant to the current file. In the example, we have four class options, a, b, c, and e. The class defines a, b, and d, you see the result on lines 55 and 56. Nothing special happens for an option like d that is defined in the class but never referenced. Undefined package options generally provoke a warning (line 64). Unused class options (like c and e) become global options. In fact, option e is defined in `mypack2´, see transcript line 69. Thus, we have a single unused global option (see lines 73 and 74). Note the order of evaluation: if a star follows \ProcessOptions, the order is defined by the user (main document) otherwise by the system (class or package). The LaTeX documentation claims that reading this star may provoke expansion of commands that follows(note: ), hence advises to use \relax in the case where no star is given.

If you say \DeclareOption*{\foo}, the \foo command is applied to every options not secified elsewhere. The name of the option will be in \CurrentOption. You can use \OptionNotUsed if you want to say that an option is not used. The command \NeedsTeXFormat takes a normal argument and an optional argument, but Tralics does nothing with them. The command \default@ds is an alias for \OptionNotUsed. You should not use it.

We give now an extract of the transcript file. Notice that, usually, the optional date argument of the \usepackage command is omitted, and Tralics reads the start of the next line in order to see if it is there. For instance, the first token on input line 6 is read in order to see if it is an open bracket (transcript line 118, 119). The command is evaluated later (transcript line 130). Note that Tralics restores a variable, cur_file_pos, this is the index of the current file in the list of packages or classes; it is zero outside a class or package. You can also see that the category code of the `@´ character is set to 11 (letter) at the start of the file, and restored later.

Read carefully lines 99 to 102. In non-verbose mode, only line 101 is printed. This line says that options a, b, a, and b are executed. In fact, if the package defines options A and B, uses options a and b, and we have global options x, y, there is a first pass; we mark all options from the list A, B that are in the list a, b, x, y (see transcript, lines 99 and 100, or 145, 146); when an option is marked, its code is put in the todo list, and its code is removed. As a side effect, options are executed only once (on the other hand \ExecuteOptions leaves the option unchanged, so that the code may be executed more than once; once options are processed, executing them is a no-op). If you say \ProcessOptions*, the loop is on the global option list x, y, options are marked if they are in A, B. There is then a second pass on the list a, b. If the element is in the list A, B, it will be executed (in the case where is no star in the command, the option has already been used; otherwise it will be added to the to-do list; notice how this defines the order of evaluation of the options). If the element is not in the list, a fall back behaviour is used; if \DeclareOption* {\foo} has been issued, then then \def \CurrentOption {a} \foo will be added to the to-do list. Otherwise a warning is printed in the case of a package (see line 147), the option is added to the list of global options otherwise. The todo list is executed at the end of this second loop.

76 [3] \documentclass[a,b,c,e]{myclass}[2007/03/05]
77 {\documentclass}
78 ++ file myclass.clt exists.
79 ++ Input stack ++ 1 myclass.clt
80 ++ Made @ a letter
81 ++ Opened file myclass.clt; it has 15 lines
82 [1]
83 [2] \ProvidesClass{mypackage}[2006/08/19 v1.0 myclass document class for Tralics]
84 Warning: class myclass claims to be mypackage.
85 Document class: mypackage 2006/08/19 v1.0 myclass document class for Tralics
86 [3] \NeedsTeXFormat{LaTeX2e}[1995/12/01]
87 [4]
88 [5] \DeclareOption{a}{\typeout{option A}}
89 [6] \DeclareOption{b}{\typeout{option B}}
90 [7] \DeclareOption{d}{\typeout{option D}}
91 [8] \AtEndOfClass{\typeout{End of class}}
92 [9] \typeout{Before execute options}
93 Before execute options
94 [10] \ExecuteOptions{a}
95 {Options to execute->a}
96 {Options code to execute->\typeout{option A}}
97 option A
98 [11] \ProcessOptions\relax
99 Marking option a
100 Marking option b
101 {Options to execute->a,b,a,b}
102 {Options code to execute->\typeout{option A}\typeout{option B}}
103 option A
104 option B
105 [12] \endinput
106 ++ End of file myclass.clt
107 ++ Catcode of @ restored to 12
108 Warning: You have requested, on line 3, version
109 `2007/03/05' of class myclass,
110 but only version
111 `2006/08/19 v1.0 myclass document class for Tralics' is available
112 ++ cur_file_pos restored to 0
113 ++ Input stack -- 1 myclass.clt
114 {\typeout}
115 End of class
116 [4] \typeout{In preamble}
117 In preamble
118 [5] \usepackage{mypack}
119 [6] \usepackage[y,x,w]{mypack1}
120 ++ file mypack.plt exists.
121 ++ Input stack ++ 1 mypack.plt
122 ++ Made @ a letter
123 ++ Opened file mypack.plt; it has 1 lines
124 [1] \ProvidesPackage{mypack}[2006/10/10 My package]
125 Package: mypack 2006/10/10 My package
126 ++ End of file mypack.plt
127 ++ Catcode of @ restored to 12
128 ++ cur_file_pos restored to 0
129 ++ Input stack -- 1 mypack.plt
130 {\usepackage}
131 [7] \usepackage{mypack2}[2000/00/00]
132 ++ file mypack1.plt exists.
133 ++ Input stack ++ 1 mypack1.plt
134 ....
135 {\usepackage}
136 ++ file mypack2.plt exists.
137 ++ Input stack ++ 1 mypack2.plt
138 ++ Made @ a letter
139 ++ Opened file mypack2.plt; it has 27 lines
140 ...
141 [9] \usepackage[aa,bb]{mypack1}
142 {\usepackage}
143 [10]
144 Option clash in \usepackage mypack1
145 Old options: y,x,w.
146 New options: aa,bb.
147 {\par}
148 [11] \begin{document}
149 {\begin}
150 {\begin document}
151 +stack: level + 2 for environment
152 {\document}
153 +stack: ending environment document; resuming document.
154 +stack: level - 2 for environment
155 +stack: level set to 1
156 ++ Input stack ++ 1 (AtBeginDocument hook)
157 [1] \let\do\noexpand\ignorespaces
158 ++ End of virtual file.
159 ++ cur_file_pos restored to 0
160 ++ Input stack -- 1 (AtBeginDocument hook)
161 LaTeX Warning: Unused global option(s):
162     c.
163 atbegindocumenthook= \typeout{Begin Document}\let\AtBeginDocument
164     \@notprerr\let\do\noexpand\ignorespaces

No error is signaled if the class or package does not exist. If you compile the example below with Tralics, no error is signaled, if you compile with LaTeX, the following errors are signaled:

1 \documentclass{article}
3 \usepackage[foo]{calc}
4 \usepackage[a,b]{xcalc}
5 \usepackage[a]{xcalc}
6 \makeatletter
7 \@ifpackagewith{calc}{foo}{}{\bad}
8 \@ifpackagelater{calc}{2005/12/12}{}{\xbad}
9 \@ifpackageloaded{xcalc}{\ybad}{}
10 \@ifpackagelater{xcalc}{2000/01/02}{\zbad}{}
11 \@ifpackagewith{xcalc}{a}{}{\tbad}
12 \begin{document}
13 \end{document}

Note that \usepackage takes an optional argument, not given here, so that the first four errors are signaled after looking ahead for one token, they correspond to commands on line 3 and 4. In the case of Tralics, if a package is not found, the \usepackage declaration is ignored, this explains why \ybad is not called; no error is signaled for unknown options; however, the options are remembered, so that the second \usepackage will not try to load the file again, but checks the options. Builtin packages (calc, fp, fancyhdr, babel, french, frenchle, german) behave in a special way. If no `plt´ file is found, they are remembered in the table with 2006/01/01 as date; since the calc package is v4.1b, dated 1998/07/07, this explains why Tralics and LaTeX disagree when asked whether or not the package is older than december 2005. On the other hand, the default date of a file is 0000/00/00, so that an inexistant package is never later than a non-null date; the same is true for a package that does not provide a date, or before the identification of the package is evaluated; thus \zbad is not called.

The xkeyval package is an extension to keyval. It provides three extensions, that can be used in a package or a class:

\ProcessOptionsX \relax

The main document can start with


Let´s make the following assumptions: the testkeyval package defines all keys used in the execute optionsX command, and most of the keys used in the usepackage declaration. You will see the following

xkeyval: Unknown option `epW=5'
xkeyval: Unknown option `opE'

We have shown above that the package defines opF and Cw; these are global class options. The package sees option oF with value, formed a a brace, a command, a comma, three characters, and a closing brace. You will see

Tralics Warning: Unused global options

This mechanism is neither fully compatible with pure LaTeX, nor with the xkeyval extension, but should work in all practical cases.

5.11. Expandable commands

We give here the list of all expandable commands, and some examples. Missing in this list are: some commands defined in the next section, all commands defined via \def in the C++ code, see section 6.13.

5.12. Other expandable commands

Not all commands defined here can be expanded, see the list at the end of Chapter 2. Essentially, this describes the ifthen package, the calc package, the newtheorem mechanism, and some input-output commands.

5.13. Other non-expandable commands

5.14. Special commands

We introduced the command \@reevaluate. It takes three arguments A, B and C, and applies A and B on C. The important point here is that C is read as text (not tokenized), so that category code changes are allowed. There is a starred version, in which A, B, C are environments. In fact, arguments A and B are environment names, and C is the content of the current environment (see example for how to use it).

With the following definitions

\def\xbar{\catcode`\$=12\catcode`\^=12 \ybar}

the translation of

\begin{Env}$3^{eme}$ \end{Env}


<p>x1<hi rend='sup'>er</hi>xy$1^er$y</p>
<p rend='center'>3<hi rend='sup'>e</hi></p>
<p>w$3^eme$ w</p>

This is a part of the transcript file showing the expansion of the command.

[11] {\Fct{$1^{er}$}}
{begin-group character}
+stack: level + 2 for brace
\Fct ->\@reevaluate \foo \xbar
{Reeval: \foo{$1^{er}$}%

This shows the expansion in the case of a starred command. Note that the current environment is terminated; then everything up to \end{whatever} is read.

[12] \begin{Env}$3^{eme}$ \end{Env}
{\begin Env}
+stack: level + 2 for environment
\Env ->\@reevaluate *{center}{wbar}
+stack: ending environment Env; resuming document.
+stack: level - 2 for environment
{Reeval: \begin{center}$3^{eme}$ \end{center}%
\begin{wbar}$3^{eme}$ \end{wbar}%

5.15. Trees

We explain here some commands from the tree-dvips package by Emma Pease. A tree is defined by some nodes and connectors. Each node has a name, whose scope is limited to the current page (Tralics does no validity test for the names). A connector can be attached to the top, bottom, left or right of a node (abreviation is one character of `tblr´), or a corner (two letter, one of `tb´ followed by one of `lr´).

For instance

\node{a}{Value of node A}
\nodepoint{b} \nodepoint{c}[3pt]\nodepoint{d}[4pt][5pt]
\nodecurve{a}{b}{2pt} ?


<node name='a'>Value of node A</node>
<node name='b'/>
<node xpos='3pt' name='c'/><node ypos='5pt' xpos='4pt' name='d'/>
<nodeconnect nameA='a' nameB='b' posA='b' posB='t'/>
<nodeconnect nameA='a' nameB='c' posA='tl' posB='r'/>
<anodeconnect nameA='a' nameB='b' posA='b' posB='t'/>
<anodeconnect nameA='a' nameB='c' posA='tl' posB='r'/>
<barnodeconnect nameA='a' nameB='d' depth='3pt'/>
<nodecurve nameA='a' nameB='b' posA='b' posB='t' depthB='2pt' depthA='2pt'/>?
<nodecurve nameA='a' nameB='b' posA='l' posB='r' depthB='3pt' depthA='2pt'/>
<nodetriangle nameB='b' nameA='a'/>
<nodebox nameA='a'/>
<nodeoval nameA='a'/>
<nodecircle nameA='a' depth='3pt'/>

5.16. Linguistic macros

The gb4e package allows you to input the following (extract of the thesis of C. Romero)

\ex \label{agen1}
\gll ... \th et hit er {\bf \textit{ahte}}.\\
     ... that OBJ-it already PRET-possessed.\\
\glt \textit{... that (he) already owned it.} (CMLAMBX1,31.377)
\ex \label{agen2}
\gll ... the love that men to hym {\bf \textit{owen}}.\\
     ... the love that SUBJ-men to OBJ-him PRES-owe.\\
\glt \textit{... the love that men owe him.} (CMCTPARS,313.C2.1087)

The exe environment is used for numbered examples; it is implemented as a list environment, the \ex command behaves like \item (each item is numbered, the item number is saved is a global counter). The TeX source of the package (as used by Tralics) can be found in the distribution. The non-trivial part in the example above is the \gll command. It takes three lines of text (there is also \glll that takes four lines), the first line is a sequence of words (here in old English), the second line another sequence (translated literally, with possible annotations), and the last line is the translation of the whole, with a bibliographic reference. Words in the first two lines are vertically aligned. The algorithm (by Marcel R. van der Goot) is the following; the list is split into words (a space acts as a word separator), each word is typeset via:

\hbox{#2\strut#3 }% adds space

where #3 is the word, and #2 is \eachwordone for the first line, \eachwordtwo for the second line, and \eachwordthree for the third line (case of \glll). These commands default to \rm. The words are put in a list (a \vbox, argument #1) like this


After that, the two or three lists are merged (the code uses \unvbox and \lastbox in order to get the next element of the list). The command \vtop is used to put two words one above the other, and these boxes are merged together using the following code

\setbox\gline=\hbox{\unhbox\gline \hskip\glossglue \vtop{XXX}}

The glue betweeen the boxes is 0pt plus 2pt minus 1pt (remember that each hbox is terminated by some glue). The Tralics implementation is the following. There are two commands \cgloss@gll and \cgloss@glll written in C++, and the package renames them to \gll and \glll. It is not clear what the translation should be (a list of boxes containing boxes?) In the current implementation, we use a table. This means that the resulting XML is easy to interpret; the only drawback is that we loose linebreaks (from the \glossglue). This is the translation of the example.

<list type='description'>
<item id='uid1692' label='650'>
<table rend='inline'><row><cell halign='left'>...</cell>
<cell halign='left'>þet</cell>
<cell halign='left'>hit</cell>
<cell halign='left'>er</cell>
<cell halign='left'><hi rend='bold'/><hi rend='it'>
    <hi rend='bold'>ahte</hi></hi><hi rend='bold'/>.</cell>
</row><row><cell halign='left'>...</cell>
<cell halign='left'>that</cell>
<cell halign='left'>OBJ-it</cell>
<cell halign='left'>already</cell>
<cell halign='left'>PRET-possessed.</cell>
<p noindent='true'><hi rend='it'>... that (he) already owned it.</hi>
<item id='uid1693' label='651'>
<table rend='inline'><row><cell halign='left'>...</cell>
<cell halign='left'>the</cell>
<cell halign='left'>love</cell>
<cell halign='left'>that</cell>
<cell halign='left'>men</cell>
<cell halign='left'>to</cell>
<cell halign='left'>hym</cell>
<cell halign='left'><hi rend='bold'/><hi rend='it'>
        <hi rend='bold'>owen</hi></hi><hi rend='bold'/>.</cell>
</row><row><cell halign='left'>...</cell>
<cell halign='left'>the</cell>
<cell halign='left'>love</cell>
<cell halign='left'>that</cell>
<cell halign='left'>SUBJ-men</cell>
<cell halign='left'>to</cell>
<cell halign='left'>OBJ-him</cell>
<cell halign='left'>PRES-owe.</cell>
<p noindent='true'><hi rend='it'>... the love that men owe him.</hi>

5.17. Special parsing rules

In the TeXbook, chapter 24, you will find the definition of <general text>. This rule explains that TeX expects a brace-delimited list of tokens, where the starting brace can be either a character, or a token like \bgroup; it can be preceded by optional spaces and \relax tokens. We give here a list of all cases where this rule can be applied.

Back to main page