Tralics, a LaTeX to XML translator; Part I

# 5. Other commands

## 5.1. Character encoding

We have to distinguish between input encoding, internal encoding and output encoding. The internal encoding of TeX is ASCII (i.e. 65 is the internal code of the upper case letter A), at least for all characters with code between 32 and 126. The input encoding is the mechanism that converts the code of the letter A supplied by the computer into the code 65. Almost all input encodings are nowadays ASCII-based: they produce the same value for the letter A, but the results may differ for a character like é. The output encoding indicates for a letter, say A, which position in the font to use. We shall not discuss the output encoding here. Let's just note that the character { exists in the font cmtt10, but not in other text fonts of the Computer Modern family. If you read a version of this document that uses the original encoding (OT1), braces shown in error messages are taken from a math font, hence are upright. Some years ago, an 8bit encoding (called T1) was designed, which contains braces. You can compare Figure 1 in appendix F of [4] (describing the font cmr10) with Table 7.32 of [6], describing ecrm1000.

The first version of TeX was using 7bit input and output characters (but fonts and dvi files were coded on 8bits). There is an extension Ω to TeX that accepts 16bit characters as input, using different encoding schemes. Characters that are not part of the ASCII specifications (less than 32 or greater than 126) are not guaranteed to be treated the same in all implementations. For this reason, it is wise to load the inputenc package, with the current encoding as argument. The effect will be that some characters, like é, will become active, and expand to \'e. As a result, only ASCII letters are allowed in control sequence names. On the other hand, if you say \begin{motclés}, then LaTeX complains with LaTeX Error: Environment motcl\'es undefined. Don't try to define the motcl\'es environment: the expansion of the accent depends on the context: it is é for \begin and \'e for the macro that prints the error message. Non-ASCII characters may be printed by TeX as ^^ab (in some older version of TeX, I had to pretend, via locale settings, that my computer did not understand English in order for it to output the guillemet as «).

A silly question concerns end-of-line markers. Some systems like Unix use LF (line feed) as line separator, some others like Macintosh use CR (carriage return), and Windows uses CR-LF. TeX replaces the marker by a single character: the carriage return with ASCII code 13. Tralics interprets CR-LF, CR and LF alike: as an end-of-line marker. This marker will be replaced by the character whose code is in \endlinechar, provided that this value is in the range 0–255. The default value is 13, a character of category 5. The tokeniser converts this into a \par token, a space token or ignores it depending on the state. This space token has value 32 (but Tralics uses 10, so as to keep the same line breaks in the XML result as in the TeX source). Note that, whenever a line is read, spaces at the end of the line are removed. If you want a space after a control sequence, you say something like \TeX\␣, and if this construct appears at the end of a line, the space is ignored; if the endline character has category code 5, it will be converted to a space, and everything works fine; if this character is for instance 65, you may get a strange error, like this:

! Undefined control sequence.^^J
l.170 ...reaks in the \XML\ result as in the \TeX\^^J
^^J
? ^^J


We have shown here the end of line as ^^J. There are four lines: the error message, two context lines, and the line with the prompt. The two context lines show that the space at the end of the line is removed. TeX does not print the undefined control sequence: it assumes that it is either the last token on the first context line, or a token marked as <recently read> or something like that; in our case, the undefined control sequence is the one obtained by replacing ^^J by the value of the endline character.
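The line handling described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the Tralics source: the function name `read_lines` and its parameters are hypothetical, and category codes are not modelled.

```python
import re

def read_lines(data, endlinechar=13):
    """Split raw input into lines the way the text describes.

    CR-LF, CR and LF are all treated as one end-of-line marker,
    trailing spaces are removed, and the character whose code is
    endlinechar is appended when that code lies in 0..255.
    """
    lines = re.split(r"\r\n|\r|\n", data)
    if lines and lines[-1] == "":
        lines.pop()  # nothing follows the final end-of-line marker
    suffix = chr(endlinechar) if 0 <= endlinechar <= 255 else ""
    return [line.rstrip(" ") + suffix for line in lines]
```

With `endlinechar=65`, a line ending in `\TeX\ ` becomes `\TeX\A`, which is the undefined control sequence of the error message shown above.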

There is a way to enter special characters in TeX, for instance ^^J is a line feed. The algorithm is the following: whenever TeX sees two consecutive identical characters of category code 7, followed by a character whose number is x, it replaces these three characters by the character whose code is y, where $y=x-64$ if $x\ge 64$, and $y=x+64$ if $x<64$. Hence ^^? yields $y=127$ (this is the delete character). All characters with codes between 1 and 26 can be obtained using the form ^^A, ^^B, etc. The null character is ^^@, characters with code between 27 and 31 are ^^[, ^^\, ^^], ^^^ and ^^_. Character 32 can be represented as ^^`. All other characters are ASCII characters. This is an example of use:

27=\char`\^^[, 28=\char`\^^\,  29=\char`\^^], 30=\char`\^^^, 31= \char`\^^_


Because some characters in the list are of category code 15 (invalid), we have used the construction \char`\A (with A replaced by some other character). There is no difference between \char`\A and \char`A, unless the category code of the character is one of 0, 5, 9, 14, or 15. The result is the character at position 65 or whatever in the current font; the example above selects positions 27 to 31. The translation is

27=&#x1B;, 28=&#x1C;, 29=&#x1D;, 30=&#x1E;, 31= &#x1F;


Note that these characters are invalid in XML1.0, so that this example is not good; if you compile this document with LaTeX, you will see [not compiled with latex]. In general you will see an ff ligature or an oe one; this depends on the output encoding.

When TeX switched to 8 bits, the rule changed a little bit: the previous rule applies only if $0\le x\le 127$, it gives $0\le y\le 127$. Another test was added: if you say ^^ab, these four characters are replaced by the single character whose code is ab (in base 16, i.e. 171 in base ten in this case). In such a case two characters are needed, each a digit or a letter; only lower case letters between a and f are allowed. Thus every character in the range 0-255 has such a representation. Note that, by default, the character ^^ab has category code 12, hence is valid. What appears in the dvi file depends on the output encoding: in the case of a 7bit encoding, the character is unknown, a warning is printed in the transcript file, and that's all; otherwise, it should be an opening guillemet, but it could as well be ń. The purpose of a package like inputenc is to change the category code of all special characters, so that each behaves like a command and produces, in the dvi, something that is, as much as possible, independent of the output encoding.
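The two rules (the offset of 64 and the two-digit hexadecimal form) can be illustrated by a minimal Python sketch. It assumes a single fixed character of category 7, the hat, and rescans the result of each replacement so that a resulting hat can take part in a further construct; the function name `expand_hats` is hypothetical, and the four-hat notation is not handled.

```python
HEX_DIGITS = "0123456789abcdef"  # only digits and lower case a-f qualify

def expand_hats(line, hat="^"):
    """Expand ^^X and ^^xy notations in a line of text."""
    chars = list(line)
    i = 0
    while i < len(chars):
        if chars[i] == hat and i + 2 < len(chars) and chars[i + 1] == hat:
            # 8-bit rule: two lower case hexadecimal digits give the code.
            if (i + 3 < len(chars) and chars[i + 2] in HEX_DIGITS
                    and chars[i + 3] in HEX_DIGITS):
                chars[i:i + 4] = [chr(int(chars[i + 2] + chars[i + 3], 16))]
                continue  # rescan: the result may itself be a hat
            # 7-bit rule: y = x - 64 if x >= 64, else y = x + 64.
            x = ord(chars[i + 2])
            if x < 128:
                chars[i:i + 3] = [chr(x - 64 if x >= 64 else x + 64)]
                continue
        i += 1
    return "".join(chars)
```

For instance `expand_hats("^^?")` is the delete character (code 127) and `expand_hats("^^5e^ab")` is the single character with code 171.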

According to this rule, the character 32 can be entered as ^^20. There is one situation where the space character can be used in this way: at the end of the line, when \endlinechar is non trivial. Note that, in the case where the resulting character has category 7, it can participate in a hat-hat construct. Here is an example.

{1^^{^^ab2^^5e^ab3^^5e^5e^ab4\def\Abc{ok}\def\bAc{OK}\^^41bc\b^^41c}
{\catcode `\é=7 ééab $xé2$ %next line should produce M
éé
%$1^è=^^^AééT$ %% hat hat control-A
$1^è=^^^A$ %% hat hat control-A
}\def\msg{a message.^^J}


Some explanations are needed. ^^{ is a semicolon, ^^ab is an opening French guillemet, ^^5e is a hat (recursion...), ^^41 is the uppercase letter A. The first line of the example shows that such funny characters can appear in a control sequence name. The second line shows that the hat-hat mechanism can be used with characters other than a hat. It also shows that, if the mechanism cannot be applied, a character with category 7 behaves like a superscript character, whatever its numeric value. The line that follows shows that the end-of-line character is ASCII 13, aka control-M (usually written as ^M). After that, there are two lines containing a control-A character, shown here as ^A. It is preceded by hat-hat, so that the effect should be a single A. The line that is commented out contains a control-T written as ééT (for some strange reason, this character is invalid in XML1.0, but valid as an entity in XML1.1, [9], [8]). The last line is just a real example of ^^J. This character is printed by Tralics as LF, or CR-LF on Windows. This is the translation of Tralics:

<p>1;&#xAB;2&#xAB;3&#xAB;4okOK
&#xAB; <formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'
><msup><mi>x</mi> <mn>2</mn> </msup></math></formula
> M<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'
><mrow><msup><mn>1</mn> <mi>&#xE8;</mi> </msup><mo>=</mo><mo></mo
><mi>A</mi></mrow></math></formula>
</p>


We inserted some newline characters at unusual places (just before greater than signs), other spaces were produced by Tralics; in order to make sure that 8bit characters are printed correctly, we asked Tralics for a seven bit output.

As said above, Ω accepts 16bit characters, using the notation ^^^^abcd. This syntax was implemented in Tralics 2.7, via the \char command (remember that in Tralics, the \char and \chardef commands accept 27bit integers); as a consequence, these characters could not be used in a command name; this restriction does not apply anymore (the default category code of characters with code greater than 127 is other, namely 12). Example:

\def\foo#1#2#3{#1=#2=#3=}
\foo^^^^0153^^^^0152^^^^0178
^^^^017b^^8?


It is translated by Tralics as &#x153;=&#x152;=&#x178;= &#x17B;x?. The argument to \foo could also have been: \oe\OE{\"Y}. The transcript file contains lines of the form:

[8] \foo^^^^0153^^^^0152^^^^0178
\foo #1#2#3->#1=#2=#3=
#1<-^^^^0153
#2<-^^^^0152
#3<-^^^^0178


It is possible to ask for UTF-8 output in the transcript file. This gives characters that are hard to see using latin1, because characters in the range 128–128+32 are in general unprintable. What is shown here as hat-Ó is a single character.

[2] \foo^^^^0153^^^^0152^^^^0178
\foo #1#2#3->#1=#2=#3=
#1<-Å^Ó
#2<-Å^Ò
#3<-Å¸
{Push p 1}
Character sequence: Å^Ó=Å^Ò=Å¸= .


The original version of the Tralics documentation said: Si on a un texte qui contient essentiellement des caractères 7bits, et très peu d'autres caractères, l'utilisation de caractères 16bits consomme énormément de place. This means that using a 16bit encoding consumes a lot of space if you write a French document (and even more, for an English one). The sentence has 159 ASCII characters and 6 others; these can be input using iso-8859-1 (aka latin-1) as input encoding. In TeX, it uses 165 bytes, in Ω, it uses 330 bytes. Using a construction like \'e we need 177 bytes (and 7 bits per byte). Using UTF-8 requires only 171 bytes (8 bits per byte). This explains why UTF-8 is popular. We shall explain (in the second part of this document) how UTF-8 is encoded and how TeX may read it. In the case of Tralics, the situation is: you can (via an argument to the Tralics program) specify that the sources are encoded using UTF-8 or latin1 (this being the default). However, if the tex file contains, on the first line, “utf8-encoded”, UTF-8 encoding will be used; if it contains “iso-8859-1”, then latin1 encoding will be used.
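The byte counts quoted above follow from simple arithmetic on the two character counts given in the text. The following Python sketch redoes the computation; the function name `storage_costs` and the dictionary keys are only illustrative.

```python
def storage_costs(ascii_chars, other_chars):
    """Bytes needed for a text under the four schemes discussed above."""
    total = ascii_chars + other_chars
    return {
        "latin1": total,                                   # 1 byte per character
        "16bit": 2 * total,                                # 2 bytes per character
        "accent_commands": ascii_chars + 3 * other_chars,  # \'e takes 3 bytes
        "utf8": ascii_chars + 2 * other_chars,             # 2 bytes per accented letter
    }

# Counts stated in the text: 159 ASCII characters and 6 others.
costs = storage_costs(159, 6)

# The per-character sizes can be checked on a single accented letter.
sizes_of_e_acute = {
    "latin1": len("é".encode("latin-1")),
    "16bit": len("é".encode("utf-16-le")),
    "utf8": len("é".encode("utf-8")),
}
```

The two-byte UTF-8 cost holds for the accented Latin letters of the example; characters above U+07FF would need three or four bytes.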

## 5.2. New encoding scheme

Since version 2.9, the internal encoding of Tralics is 16bit utf8. This has two consequences that will be explained here. The first is that some tables are now much larger. The numeric argument to \catcode, \mathcode, \lccode, \uccode, \sfcode, \delcode, which is a character number, can now be anything between 0 and 65535. We also changed the number of registers: there are 1024 instead of 256.

The result of a ^^^^abcd construct fits on 16bits, hence is a character, hence can appear in a command name (in the case of a multicharacter control sequence, it must have category code letter; initially all characters with code greater than 128 have category other). In order to save space, a short-verb character must fit on 8bits; otherwise, its category code will not be properly restored when you undeclare it (category other will be used).

All characters are valid in math mode. The translation of an ASCII character may depend on the font, otherwise, it is always <mi>. For instance, in the case of $\mathbf\'e$, expansion of the accent command produces an 8bit character, unaffected by the font change, and the translation is a <mi> containing the e-acute letter. Full 21 bit characters are allowed in math mode. An expression $x$ is considered trivial math and translates into a <simplemath> element only if the character fits on seven bits and has category letter.

The default input and output encoding is latin1, which is no longer the internal encoding. As a consequence, there are two conversion procedures. We explained above that the input encoding can be given on the first line of the file. Otherwise a default encoding will be used. This default can be specified in the configuration file. As a consequence, the main input file is read without conversion, then the configuration file is considered, and then the main input file is converted; all other files are immediately converted.

On the other hand, a character like é is represented as Ã© in the internal tree. This character will appear, in the output file, in the form &#xE9; if you call Tralics with option -oe8a or -oe1a, as é if you call Tralics with option -oe1, or Ã© if you call Tralics with option -oe8. If the option contains a, the XML file contains only 7bit ASCII characters; the only difference between the two options is the encoding declaration. These options also specify the encoding used for the transcript file. You can specify it independently with the options -te8a, -te1a, -te8, or -te1. If the character is too big to fit in the encoding, then the hat-hat notation is used (see example above). Because each XML file contains its encoding, an XML processor will handle the file produced by Tralics independently of the output encoding. Moreover, whatever the encoding, input or output, you know that ^^^^03b7 is the Greek letter eta.

## 5.3. Changing the input encoding

We mentioned in the previous section that whenever Tralics reads a file, it converts its content, according to the current encoding (that can be given at the start of the file, using ASCII characters), with an exception for the main input file. The situation is a bit more complex: configuration files, tcf files, bibliography data files, and TeX files opened by \openin use a fixed encoding; other source files use a variable encoding.

The default encoding is stored in \input@encoding@default. The default value is one, but can be changed via an option to the program (utf8 or latin1 select encoding 0 or 1 respectively).

The current encoding is stored in \input@encoding. This is an attribute of the current input file, it can be changed at any time. The new encoding is used when Tralics needs to read a new line in order to fetch the next token. Nothing special is done in the case of \read.

Whenever a file is opened, its initial encoding is computed. If the file has a fixed encoding, then all lines are immediately converted, otherwise lines are converted when needed. If the first line of the file contains the string utf8-encoded, then encoding 0 is assumed, if the line contains iso-8859-1, then encoding 1 is assumed, and if the line contains tralics-encoding:NN where NN is a sequence of one or two digits forming a number less than 34, then encoding NN is assumed. There are other heuristics. For instance, if %&TEX encoding = UTF-8 appears near the start of the file, then encoding 0 is assumed. In all other cases, the default encoding is assumed.
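The first-line tests can be sketched as follows. This Python fragment is only an approximation of the description above: the order of the tests is an assumption, the other heuristics (such as the %&TEX line) are left out, and the function name `initial_encoding` is hypothetical.

```python
import re

def initial_encoding(first_line, default=1):
    """Guess the encoding number from the first line of a file."""
    if "utf8-encoded" in first_line:
        return 0
    if "iso-8859-1" in first_line:
        return 1
    # One or two digits forming a number less than 34.
    match = re.search(r"tralics-encoding:(\d{1,2})(?!\d)", first_line)
    if match and int(match.group(1)) < 34:
        return int(match.group(1))
    return default  # in all other cases, the default encoding
```

Note that a plain substring test for iso-8859-1 also accepts a line that mentions iso-8859-15; whether Tralics behaves this way is not stated here.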

In the current version of Tralics, there are 34 possible encodings. Encoding number 0 is UTF8; this is an encoding where an ASCII character is represented by a single byte (with the same value as the character), and other characters use a variable number (between 1 and 4) of bytes. In encodings like UTF16, a character is represented by more than one byte. There is currently no support for such encodings. Stated otherwise, we assume that character C is represented by a byte B, and the encoding specifies the value C at position B. Encoding 1 is latin1 (also known as iso-8859-1), it has B=C. For the 32 remaining encodings, it is possible to specify, for each byte B, the associated character C (default is B). Trying to set the current or default encoding to a value outside the range 0-33 is ignored; trying to modify an encoding outside the range 2-33 raises an Illegal encoding error, and an invalid byte value gives an Illegal encoding position error. In case of an illegal character value (negative, zero, 65536 or more), the byte value is used instead. The magic command is \input@encoding@val; it reads an encoding, a byte and a value. In the example that follows we change the encoding number 2 so that \FOO is read as \foo:

1 \input@encoding@val 2 `O =`o
2 \input@encoding@val 2 `F =`f
3 \let\foo\bar
4 \showthe\input@encoding@val 2 `O
5 \input@encoding=2
6 \show\FOO
7 \showthe\input@encoding@val 2 `O
8 \showthe\input@encoding
9 \input@encoding@default=0
10 \showthe\input@encoding@default
11 \input@encoding=1


This example shows three commands in read or write mode: when the command is prefixed by \showthe, it reads a value from memory and prints it on the terminal; otherwise a number is scanned and written to memory. The equals sign before the number is optional. No less than 13 integers are scanned, some are given as an explicit integer, some as a character code. We assume that, for encoding 2, all characters map to themselves. Since \FOO is read as \foo, the \show command should print \bar. On lines 4 and 7 you see the value stored in encoding 2 for the character O (first upper case, then lower case); this is twice 111. Other values shown are 2 and 0.
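The table model behind these commands can be sketched in Python. This is a simplified illustration, not the Tralics implementation: the class and method names are hypothetical, errors are reported with ValueError, and only the rules stated above (ranges 0-33 and 2-33, byte positions 0-255, fallback to the byte value) are modelled.

```python
class InputEncodings:
    """Byte-to-character tables for the user-definable encodings 2..33."""

    def __init__(self):
        # By default every byte B maps to the character with the same code.
        self.tables = {n: list(range(256)) for n in range(2, 34)}
        self.current = 1  # latin1

    def set_val(self, encoding, byte, char):
        if not 2 <= encoding <= 33:
            raise ValueError("Illegal encoding")
        if not 0 <= byte <= 255:
            raise ValueError("Illegal encoding position")
        if not 0 < char < 65536:
            char = byte  # illegal character value: use the byte value
        self.tables[encoding][byte] = char

    def get_val(self, encoding, byte):
        return self.tables[encoding][byte]

    def set_current(self, encoding):
        if 0 <= encoding <= 33:  # other values are ignored
            self.current = encoding

    def decode(self, data):
        """Convert a line of bytes according to the current encoding."""
        if self.current == 0:
            return data.decode("utf-8")
        if self.current == 1:
            return data.decode("latin-1")
        table = self.tables[self.current]
        return "".join(chr(table[b]) for b in data)


encodings = InputEncodings()
encodings.set_val(2, ord("O"), ord("o"))
encodings.set_val(2, ord("F"), ord("f"))
encodings.set_current(2)
```

After these calls, `encodings.decode(b"\\FOO")` gives `\foo`, and the value stored for O is 111, as in the example.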

We describe from now on the content of the inputenc package. You load it by saying \usepackage [foo,bar] {inputenc}. The effect of this command is the following. First, a symbolic name is defined for each of the 23 known encodings, for instance utf8 for UTF-8 (encoding 0), latin1 for latin1 (encoding 1), etc. The command \inputencodingname holds the current input encoding name, and \encoding@value converts this to an integer. The command \inputencoding can be used to change the encoding. It is defined as:

12 \def\inputencoding#1{%
13   \the\inpenc@prehook  %% pre-hook
14   \edef\inputencodingname{#1}%
15   \input@encoding=\encoding@value{\inputencodingname}%
16   \the\inpenc@posthook} %% post-hook


There are two hooks (token lists) that do nothing, added here for compatibility with the LaTeX package. You can use them to output messages, such as: switching from encoding A to encoding B (the initial value of the encoding name is \relax, this can be used by the pre-hook).

The options, foo and bar in the example, should be valid names. The last name becomes the current and default encoding. As mentioned above, the current encoding applies to an input file, and there is no reason to change the encoding of the package file. Hence, the following is executed:

17   \input@encoding@default\encoding@value{bar}%
18   \AtBeginDocument{\inputencoding{bar}}


If the options are, for instance, ansinew and applemac, the tables associated with these encodings are defined; some other tables might also be defined, but you should not rely on this (of course, latin1 and utf8 can be used anywhere, because they are builtin). The package contains

19 \edef\io@enc{\encoding@value{latin9}}
20 \DeclareInputText{164}{"20AC}
21 \DeclareInputText{166}{"160}
22 \DeclareInputText{168}{"161}
23 \DeclareInputText{180}{"17D}
24 \DeclareInputText{184}{"17E}
25 \DeclareInputText{188}{"152}
26 \DeclareInputText{189}{"153}
27 \DeclareInputText{190}{"178}


The code above defines the latin9 (iso-8859-15) encoding. It is very similar to latin1, but defines the Euro sign at position 164. Defining 256 characters per encoding using this method is inefficient. For this reason you can see

28 \input@encoding@val \encoding@value{latin2} -96 160
29 160 "104 "306 "141 164 "13D "15A 167


As explained above, the command at the start of the line reads 3 integers: an encoding value (here, the encoding of latin2), a byte position and a character value. The byte position must be a number between 0 and 255. Here we use an extension: if a negative number minus N has been read, followed by A such that the sum of A and N is at most 256, then N values will be read, and stored at position A and following (here N is 96, and we have shown only the first eight values).
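The extension with a negative count can be pictured by the following Python sketch, which stores values in a 256-entry list. The function name `store_values` is hypothetical, and raising ValueError when A+N exceeds 256 is an assumption (the text does not say what happens in that case).

```python
def store_values(table, position, values):
    """Store one value, or N consecutive values if position is -N.

    table is a list of 256 character codes; values is the sequence of
    integers that follow the position in the input.
    """
    numbers = iter(values)
    if position < 0:
        count = -position
        start = next(numbers)  # the integer A that follows -N
        if start + count > 256:
            raise ValueError("Illegal encoding position")
        for offset in range(count):
            table[start + offset] = next(numbers)
    else:
        table[position] = next(numbers)


# The first eight latin2 values shown above, stored from position 160.
latin2 = list(range(256))
store_values(latin2, -8, [160, 160, 0x104, 0x306, 0x141, 164, 0x13D, 0x15A, 167])
```

Only eight values are stored here because only the first eight of the 96 are shown in the listing.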

## 5.4. Characters and Accents

There are some commands that put an accent over a letter. You can say a\accent 98 cde, this works in TeX, but not in Tralics: you will get an error, Unimplemented command \accent. The number 98 is read, and converted to an integer. The Unicode character will be used; thus the translated result is abcde.

You can say \a'e. This is a command introduced by LaTeX so as to allow accents inside a tabbing. Some care must be taken. If you say \a{par}{b} in LaTeX, you get an error of the form: Paragraph ended before \@changed@cmd was complete. The Tralics error message is: wanted a single token as argument to \a. If you say \a\foo12, there is a single token, and the error is: Bad syntax of \a, argument is \foo. In fact, the token after \a must be a valid accent character. After that, \a' is handled exactly like \'. You can say \= U, the space after the command is ignored. You cannot say \={ U}, the space is not removed, this is an error. In fact, the argument list of the accent command should contain exactly one token (exception: double accents will be explained later). This token should be a character, with code between 0 and 128. Hence \'Ê is wrong, you must say \'{\^E} if you want Ế. The message is Error in accent, command = \'; Cannot put this accent on non 7-bit character É. If the token \i is given, it will be replaced by i, so that \"\i and \"i produce the same result. You can say \=\AE, \=\ae, \'\AE, \'\ae, \'\AA, \'\aa, \'\O, \'\o. The result looks like ǢǣǼǽǺǻǾǿ.

You can put an accent on a letter only in the case where this gives a Unicode character. In the case of \c{a} and \c{\=a}, the error message is the same: Error in accent, command = \c; Cannot put this accent on letter a. Table 1 indicates on which letters you can put an accent. See the html page http://www-sop.inria.fr/apics/tralics/doc-chars.html for a list of some glyphs.

| Accent | Letters that accept it (in alphabetical order) |
|---|---|
| \^ | Ââ Ĉĉ Êê Ĝĝ Ĥĥ Îî Ĵĵ Ôô Ŝŝ Ûû Ŵŵ Ŷŷ Ẑẑ |
| \' | Áá Ćć Éé Ǵǵ Íí Ḱḱ Ĺĺ Ḿḿ Ńń Óó Ṕṕ Ŕŕ Śś Úú Ẃẃ Ýý Źź |
| \` | Àà Èè Ìì Ǹǹ Òò Ùù Ẁẁ Ỳỳ |
| \" | Ää Ëë Ḧḧ Ïï Öö ẗ Üü Ẅẅ Ẍẍ Ÿÿ |
| \c | Çç Ḑḑ Ȩȩ Ģģ Ḩḩ Ķķ Ļļ Ņņ Ŗŗ Şş Ţţ |
| \u | Ăă Ĕĕ Ğğ Ĭĭ Ŏŏ Ŭŭ |
| \v | Ǎǎ Čč Ďď Ěě Ǧǧ Ȟȟ Ǐǐ ǰ Ǩǩ Ľľ Ňň Ǒǒ Řř Šš Ťť Ǔǔ Žž |
| \~ | Ãã Ẽẽ Ĩĩ Ññ Õõ Ũũ Ṽṽ Ỹỹ |
| \H | Őő Űű |
| \k | Ąą Ęę Įį Ǫǫ Ųų |
| \. | Ȧȧ Ḃḃ Ċċ Ḋḋ Ėė Ḟḟ Ḣḣ İ Ŀŀ Ṁṁ Ṅṅ Ȯȯ Ṗṗ Ṙṙ Ṡṡ Ṫṫ Ẇẇ Ẋẋ Ẏẏ Żż |
| \= | Āā Ēē Ḡḡ Ħħ Īī Ōō Ŧŧ Ūū Ȳȳ |
| \r | Åå Ůů ẘ ẙ |
| \b | Ḇḇ Ḏḏ ẖ Ḵḵ Ḻḻ Ṉṉ Ṟṟ Ṯṯ Ẕẕ |
| \d | Ạạ Ḅḅ Ḍḍ Ẹẹ Ḥḥ Ịị Ḳḳ Ḷḷ Ṃṃ Ṇṇ Ọọ Ṛṛ Ṣṣ Ṭṭ Ụụ Ṿṿ Ẉẉ Ỵỵ Ẓẓ |
| \f | Ȃȃ Ȇȇ Ȋȋ Ȏȏ Ȓȓ Ȗȗ |
| \C | Ȁȁ Ȅȅ Ȉȉ Ȍȍ Ȑȑ Ȕȕ |
| \T | Ḛḛ Ḭḭ Ṵṵ |
| \V | Ḓḓ Ḙḙ Ḽḽ Ṋṋ Ṱṱ Ṷṷ |
| \D | Ḁḁ |
| \h | Ảả Ẻẻ Ỉỉ Ỏỏ Ủủ Ỷỷ |

Some accents are not standard. Examples:

• \^a gives â,

• \' a gives á,

• \` a gives à,

• \" a gives ä,

• \c c gives ç,

• \u a gives ă,

• \v a gives ǎ,

• \~ a gives ã,

• \H o gives ő, it is redefined in the case of double accents,

• \k a gives ą,

• \. a gives ȧ,

• \= a gives ā,

• \r a gives å,

• \b b gives ḇ,

• \d a gives ạ,

• \f a gives an inverted breve accent over a,

• \C a gives a double grave accent on a,

• \T e gives a tilde under e,

• \V d gives a circumflex below d,

• \D a gives a ring below a,

• \h a gives a hook over a.

If in the table you see İ instead of a pair like Xx, this means that the accent applies only to capital I. If you see an accented h, j, t, w or y alone, the accent applies only to the lower case letter. Otherwise the accent applies to both the upper case and the lower case letter.

There is a possibility to put double accents (for Vietnamese, for instance). The following ones are recognized, for upper and lower case letters; the order of the accents is irrelevant. Inside braces, there is an accent command, optional spaces, and a character (maybe enclosed in braces).

\"{\=U} \"{\=A} \"{\=O} \"{\'U} \"{\'I} \"{\`U} \.{\=A} \.{\=O}
\={\~ O} \k{\=O} \'{\~U} \'{\O} \'{\=O} \'{\=E} \'{\.S} \c{\' C}
\'{\^A} \'{\^O} \'{\^E} \`{\=O} \`{\=E} \d{\=L} \d{\=R}
\`{\^ A} \`{\^ E} \H{\'U} \H{\'O} \H{\`U} \H{\`O} \H{\h U} \H{\h O}
\H{\~U} \H{\~O} \H{\d U} \H{\d O} \d{\^A} \d{\^O} \d{\^E} \~{\^A}
\~{\^O} \~{\^E} \h{\^A} \h{\^O} \h{\^E} \u{\'A} \u{\`A} \u{\h A}
\u{\~A} \u{\d A} \~{\" O} \^{\'O} \^{\`O} \u{\c E} \.{\v S} \.{\d S}


This is the translation.

&#x1E7A; &#x1DE; &#x22A; &#x1D7; &#x1E2E; &#x1DB; &#x1E0; &#x230;
&#x22C; &#x1EC; &#x1E78; &#x1FE; &#x1E52; &#x1E16; &#x1E64; &#x1E08;
&#x1EA4; &#x1ED0; &#x1EBE; &#x1E50; &#x1E14; &#x1E38; &#x1E5C;
&#x1EA6; &#x1EC0; &#x1EE8; &#x1EDA; &#x1EEA; &#x1EDC; &#x1EEC; &#x1EDE;
&#x1EEE; &#x1EE0; &#x1EF0; &#x1EE2; &#x1EAC; &#x1ED8; &#x1EC6; &#x1EAA;
&#x1ED6; &#x1EC4; &#x1EA8; &#x1ED4; &#x1EC2; &#x1EAE; &#x1EB0; &#x1EB2;
&#x1EB4; &#x1EB6; &#x1E4E; &#x1ED0; &#x1ED2; &#x1E1C; &#x1E66; &#x1E68;
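The rule that an accent, or a pair of accents, is accepted only when a single Unicode character exists can be approximated with Unicode normalization. The Python sketch below is not the table used by Tralics: the mapping from accent commands to combining marks is our assumption, some entries of Table 1 (such as Ħ for \=H) and the special meaning of \H in double accents are not reproduced, and the function name `accented` is hypothetical.

```python
import itertools
import unicodedata

# Assumed correspondence between accent commands and combining marks.
COMBINING = {
    "^": "\u0302", "'": "\u0301", "`": "\u0300", '"': "\u0308",
    "c": "\u0327", "u": "\u0306", "v": "\u030C", "~": "\u0303",
    "H": "\u030B", "k": "\u0328", ".": "\u0307", "=": "\u0304",
    "r": "\u030A", "d": "\u0323", "h": "\u0309",
}

def accented(letter, *accents):
    """Return the precomposed character, or None if there is none.

    All orders of the accents are tried, since the order is said to
    be irrelevant.
    """
    marks = [COMBINING[name] for name in accents]
    for order in itertools.permutations(marks):
        composed = unicodedata.normalize("NFC", letter + "".join(order))
        if len(composed) == 1:
            return composed
    return None
```

For instance `accented("U", "=", '"')` gives the character U+1E7A and `accented("A", "=", '"')` gives U+1DE, the first two entries of the translation above, while `accented("a", "c")` gives None.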


You can see a representation on figure 2. We show here the HTML output for these characters. Ṻ Ǟ Ȫ Ǘ Ḯ Ǜ Ǡ Ȱ Ȭ Ǭ Ṹ Ǿ Ṓ Ḗ Ṥ Ḉ Ấ Ố Ế Ṑ Ḕ Ḹ Ṝ Ầ Ề Ứ Ớ Ừ Ờ Ử Ở Ữ Ỡ Ự Ợ Ậ Ộ Ệ Ẫ Ỗ Ễ Ẩ Ổ Ể Ắ Ằ Ẳ Ẵ Ặ Ṏ Ố Ồ Ḝ Ṧ Ṩ ṻ ǟ ȫ ǘ ḯ ǜ ǡ ȱ ȭ ǭ ṹ ǿ ṓ ḗ ṥ ḉ ấ ố ế ṑ ḕ ḹ ṝ ầ ề ứ ớ ừ ờ ử ở ữ ỡ ự ợ ậ ộ ệ ẫ ỗ ễ ẩ ổ ể ắ ằ ẳ ẵ ặ ṏ ố ồ ḝ ṧ ṩ. The first character in the list is: latin capital letter u with macron and diaeresis, the second one is latin capital letter a with diaeresis and macron. The order of accents is not the same. For simplicity, in Tralics, this is irrelevant. You can notice that the LaTeX output is strange. First, we have defined \h to be a no-op. For the figure, we used the following code:

\newcommand\hook@above[1]{%
\rlap{\raise\dimen@\hbox{\kern2pt\char11}}#1}


This code works, provided that the font has, at position 11, something that looks like a hook (for the T1 encoding, this is a cedilla). In LaTeX you cannot put a \" accent on \=U. No error is signaled, it is just that TeX puts the accent before the accentee in case the accentee is not a character, instead of putting it above. You can say \"{\'U} because \'U is a character in the T1 encoding. The \mathaccent command does not have these limitations. The first character of the figure was composed via

\UnicodeCharacter{x1E7A}{\ensuremath{\ddot{\mbox{\=U}}}}


In German, the umlaut character has a special meaning. The following example shows what can be input. See the babel documentation for details.

\language=2
"a"o"u"e"i"""A"O"U"I"E
"s"z"S"Z"c"C"f"F"l"L"m"M"n"N"p"P"r"R"t"T
"""-"~"|"="`"'"<">


and the Tralics translation.

äöüëïÄÖÜÏË
ßßSSSZckCKffFFllLLmmMMnnNNppPPrrRRttTT
--&#x0201E;&#x0201D;«»


The previous hack does not apply if the double quote character has category code 11 (letter), is in a URL, or in a file name to be read (for instance, via \includegraphics).

The translation of the dash character is the following. If this character appears in a URL or while reading a file name, it is left unchanged. If its category code is 11 (letter), usually inside a verbatim environment, its translation is a dash followed by a \textnospace, unless you invoke Tralics with the -nozerowidthspace switch, in which case the translation is a single hyphen. Otherwise, a test is made for a ligature: three hyphens in a row produce &#x2014; (mdash), and two hyphens produce &#x2013; (ndash).
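These rules can be summarized by a short Python sketch. It is only a model of the description above: the function name `translate_dashes` and its parameters are hypothetical, \textnospace is rendered as the zero-width space U+200B, and the URL and file name cases (where the text is left unchanged) are assumed to be handled by the caller.

```python
ZERO_WIDTH_SPACE = "\u200b"

def translate_dashes(text, letter_catcode=False, zero_width_space=True):
    """Translate hyphens in text.

    letter_catcode: the dash has category code 11 (verbatim material).
    zero_width_space: False models the -nozerowidthspace switch.
    """
    if letter_catcode:
        suffix = ZERO_WIDTH_SPACE if zero_width_space else ""
        return text.replace("-", "-" + suffix)
    # Ligatures: the longest sequence is tested first.
    return text.replace("---", "\u2014").replace("--", "\u2013")
```

Testing three hyphens before two matters: the reverse order would turn `---` into an en-dash followed by a hyphen.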

The characters :;!?«» are handled normally if inside a URL, when translating a filename, when their category code is 11 (letter, typically, inside a verbatim), or when the current language is not French. The translation of « is an opening guillemet with some space after it. If the character that follows is (after expansion) a normal space, or a ~, or a \,, it will be discarded. The translation of » is a space plus the character. If the previous character is a space, it will be removed. (TeX has a primitive \unhskip that can remove a space; the Tralics equivalent works in usual cases). The other four punctuation characters are handled like a closing guillemet. In any case, the space added by these characters is a non-breaking one.

The characters `, ', < and > behave in a special manner, in the same cases as the guillemets. In fact, if they are doubled, French guillemets will be used instead. Thus <<foo>> and ``foo'' and «foo» behave the same, if the current language is French. Otherwise, a \textnospace will be added after the character, in the same way as for a dash, namely outside a URL or file name, if the category code is 11 (letter) and the magic switch has not been given. Example:

\language = 0
test ligatures: <<>>``''-- et --- !?:;
\language=1
test ligatures: <<>>``''-- et --- !?:;
test ligatures:\verb=<<>>``''-- et --- !?:;=


This is the translation

test ligatures: &lt;&lt;&gt;&gt;``''&#x2013; et &#x2014; !?:;
test ligatures : «  »«  »&#x2013; et &#x2014; ! ? : ;
test ligatures :<hi rend='tt'>&lt;&#x200B;&lt;&#x200B;
&gt;&#x200B;&gt;&#x200B;`&#x200B;`&#x200B;'&#x200B;'&#x200B;-&#x200B;
-&#x200B; et -&#x200B;-&#x200B;-&#x200B; !?:;</hi>


Conversion into HTML gives test ligatures: <<>>“”– et — !?:; test ligatures : «  »“”– et — ! ? : ; test ligatures :<<>>''-- et --- !?:;.

The translation of the apostrophe depends on a flag. If Tralics is called with the switch -nostraightquotes, the translation is the same as \textasciiacute, the character U+B4, otherwise it is the quote character U+27. The character is handled normally if inside a URL, when translating a filename, or when its category code is 11 (letter, typically, inside a verbatim). This is the translation of the same example as above, with options -nostraightquotes and -nozerowidthspace. We added option -oe1a, which shows the no-break space as &#xA0;.

test ligatures: &lt;&lt;&gt;&gt;``&#xB4;&#xB4;&#x2013; et &#x2014; !?:;
test ligatures&#xA0;: &#xAB;&#xA0;&#xA0;&#xBB;&#xAB;&#xA0;&#xA0;
&#xBB;&#x2013; et &#x2014;&#xA0;!&#xA0;?&#xA0;:&#xA0;;
test ligatures&#xA0;:<hi rend='tt'>&lt;&lt;&gt;&gt;``''--&#xA0;et
&#xA0;---&#xA0;!?:;</hi>


The soul package provides some commands. Example: \ul gives test for ul, \so gives test for so, \st gives test for st, \caps gives test for caps, \hl gives test for hl.

## 5.5. Verbatim material

We have seen a little example of verbatim code above. It shows that some &#x200B; characters are inserted; this is so that, if the XML file is read, a double dash will not be interpreted as an en-dash. What the \verb command produces is a sequence of characters, whose category codes are 12, except for some, that are of category 11, namely `'-<>~&:;?!«». You can compare this with the LaTeX code, shown in section 2.12: the \@noligs command makes some characters of category code 13, the associated action is: output the character, with a zero kern in front. There is an exception: the space character is replaced by the \nobreakspace token, but this can be changed.

You can say \verb*+x y+ or \verb+ x y+. All characters between the two plus signs are collected. Any character can be used instead of the plus sign (Try \verb*abca and \verb =a= !). In the case where a star is given, spaces are replaced by \textvisiblespace, otherwise by \nobreakspace. You can say \DefineShortVerb\+, after that +foo+ is the same as \verb+foo+. Note that the command must be followed by something like \+ or \*, i.e., a macro whose name is formed of a single character. You can say \UndefineShortVerb\+, this will undo the previous command. The syntax is the same. If the character fits on 8 bits, the old category code is restored; otherwise, it is set to 12 (other). Note: assume that the input encoding is latin1, but you declare ^^^^abcd as a short verb. When Tralics sees the four hats, it replaces these 8 bytes by a single character, say C, and enters verbatim mode until finding character C. Since this character does not exist in the current environment, it cannot be found directly; since we are in verbatim mode, it cannot be found using the four-hat construction. For this reason an error is signalled when the end of line is reached (an implicit C character is inserted, so that next line will be translated normally).

In the case where '+' is a short verb character, you can say \SaveVerb{foo}+\bar+. The effect is to remember, in a private command, all tokens that +\bar+ gathers. When you say \UseVerb{foo}, these tokens are re-inserted in the input stream. Example:

\DefineShortVerb\+
\SaveVerb{foo}+\bar +
\UndefineShortVerb\+
\UseVerb{foo}


The transcript file will contain, for the \UseVerb command, the following line.

\savedverb@foo ->\verbprefix {\verbatimfont \bar\nobreakspace }


Here, the \ before 'b' is not a command delimiter, for otherwise there would have been a space after \bar. Note: another explanation is that the 'b' is not of category code 11, so that the command is \b; exercise: find all interpretations of this line.

There are various packages that provide a verbatim-like environment. In Tralics, you can define your own via

\DefineVerbatimEnvironment{MyVerbatim}{Verbatim}{xx=yy}


This defines MyVerbatim to be an environment that behaves like Verbatim, which is an extension of the basic verbatim environment that takes some optional parameters (here, the default value of xx is yy). The end of a verbatim environment is defined as a line that contains optional spaces, the \end token, optional spaces, and the name of the environment enclosed in braces. Additional characters on the current line are assumed to be after the verbatim environment.
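As an illustration of the end-of-environment rule, here is a minimal sketch. The environment name NumVerbatim is hypothetical, and the comments only restate what the rule above implies; they are not checked Tralics output.

```latex
% NumVerbatim is a made-up name; numbers=true becomes its default option.
\DefineVerbatimEnvironment{NumVerbatim}{Verbatim}{numbers=true}
\begin{NumVerbatim}
first verbatim line
second verbatim line
   \end {NumVerbatim} this text should be read after the environment
% The line above has optional spaces before \end and before the braced name,
% so by the rule it closes the environment; the rest of that line is normal text.
```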

In the case of a verbatim environment, all characters on the line are gathered (final spaces disappear, as usual), with category codes as explained above. If this gives an empty list, a no-break space character is added. As in the case of \verb, the \verbatimfont command is prepended; it is defined to be \tt. Moreover, \verbatimprefix is also added in front of the token list; in the case of the \verb command, \verbprefix is used instead. These two commands are defined as \@empty, and you can redefine them. Each line is followed by \par and \noindent. If the environment is followed by an empty line, or a \par command, this command is removed, as well as the last \indent. Example that shows use of the prefix commands:

\DefineShortVerb{\|}
\def\verbatimfont#1{{#1}}
\def\verbprefix#1{A#1A}
\def\verbatimprefix#1{B#1B}
Test: \verb+foo+ and |bar|
\UndefineShortVerb{\|}
\begin{verbatim}
line1
line2
\end{verbatim}


The translation is:

<p>Test: AfooA and AbarA</p>
<p noindent='true'>Bline1B</p>
<p noindent='true'>Bline2B</p>
<p noindent='true'></p>


The Verbatim environment is an extension of the verbatim environment. There is an optional argument, an association list. If you say 'numbers=true', then lines will be numbered (instead of 'true', you can say 'left' or 'right', or anything: the value is ignored). If you say 'counter=17', then lines will be numbered using counter 17; if you say 'counter=foo', and 'foo' is a counter name, then lines will be numbered using counter foo. If you say 'firstnumber=N', where N is a number, then lines will be numbered starting from N; if you say 'firstnumber=last', then lines will be numbered incrementing the previous value. The default counter is FancyVerbLine. Other features defined by the fancyvrb package have not yet been implemented.

If a line number M is given, the following piece of code is inserted before the verbatim line: {\verbatimnumberfont{M}}\space. The \verbatimnumberfont command is \let equal to \small at the start of the run. The number is incremented for each line.
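The firstnumber option and named counters can be exercised as sketched below. The starting value 10 and the counter name listingline are arbitrary choices for illustration; the comments describe the behaviour stated above, not verified output.

```latex
% Number lines starting from 10 (arbitrary value).
\begin{Verbatim}[firstnumber=10]
alpha
beta
\end{Verbatim}
% Continue from the value reached by the previous environment.
\begin{Verbatim}[firstnumber=last]
gamma
\end{Verbatim}
% Use a named counter instead of the default FancyVerbLine;
% listingline is a hypothetical counter name.
\newcounter{listingline}
\begin{Verbatim}[counter=listingline]
delta
\end{Verbatim}
```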

Characters after \begin{Verbatim}, but on the same line, are ignored. The same is true if an optional argument is given: all characters that follow the closing bracket of the optional argument are ignored. The opening bracket is only looked for on the current line (unless the end of line character is commented out).

\begin{Verbatim}                   [numbers=true]
TEST
\end{Verbatim}
and without
\begin{Verbatim}
[ok]TEST
\end{Verbatim}
\begin{Verbatim} %
[ok] this is handled as comment
TEST
\end{Verbatim}

\def\verbatimfont#1{{\it #1}}
\def\verbatimnumberfont{\large}
\tracingall
\count3=4
\begin{Verbatim}[counter=3]
5,one line
\end{Verbatim}
\begin{Verbatim}[counter=03]
6,one line
\end{Verbatim}
\newcounter{vbcounter}
\setcounter{vbcounter}8
\begin{Verbatim}[counter=vbcounter]
9,one line
\end{Verbatim}
\begin{Verbatim}[counter=vbcounter]
10,one line
\end{Verbatim}


This is the translation.

<p noindent='true'><hi rend='small1'>1</hi> <hi rend='tt'>TEST</hi></p>
<p noindent='true'>and without</p>
<p noindent='true'><hi rend='tt'>[ok]TEST</hi></p>
<p noindent='true'></p>
<p noindent='true'><hi rend='tt'>TEST</hi></p>
<p noindent='true'><hi rend='large1'>5</hi> <hi rend='it'>5</hi>,one line</p>
<p noindent='true'><hi rend='large1'>6</hi> <hi rend='it'>6</hi>,one line</p>
<p noindent='true'></p>
<p noindent='true'><hi rend='large1'>9</hi> <hi rend='it'>9</hi>,one line</p>
<p noindent='true'><hi rend='large1'>10</hi> <hi rend='it'>1</hi>0,one line</p>
<p noindent='true'></p>


Two additional keywords have been added. In order to be compatible, you should add the following code to the TeX document.

\csname define@key\endcsname{FV}{style}{}
\csname define@key\endcsname{FV}{pre}{}


If you say style=foo, then the token \FV@style@foo is added in front of the token list generated by the verbatim environment. If you say pre=bar, then \FV@pre@bar is added before the token list (and before the style token mentioned above), and \FV@post@bar is inserted near the end (to be precise: before the last \par or \par\noindent). For a case like this

\begin{Verbatim}[pre=pre,style=latex,numbers=true]
first line
second line
\end{Verbatim}
third line


the tokens gathered by the verbatim environment, shown in the transcript file in verbose mode, and re-indented in order to make the structure easy to recognise, are

{Verbatim tokens:
\FV@pre@pre \FV@style@latex
\par \noindent {\verbatimnumberfont {1}}
\verbatimprefix {\verbatimfont first\nobreakspace line}
\par \noindent {\verbatimnumberfont {2}}
\verbatimprefix {\verbatimfont second\nobreakspace line}
\FV@post@pre
\par \noindent }


Assume that the following definitions are given

\def\FV@pre@pre{\begin{xmlelement*}{pre}}
\def\FV@post@pre{\end{xmlelement*}}
%\def\verbatimnumberfont#1{\xbox{vbnumber}{#1}}


Then the translation is

<pre class='latex-code'>
<p noindent='true'>
<hi rend='small'>1</hi>
<hi rend='tt'>first&nbsp;line</hi></p>^^J
<p noindent='true'>
<hi rend='small'>2</hi>
<hi rend='tt'>second&nbsp;line</hi></p>^^J
</pre><p noindent='true'>third line^^J
</p>


Note: We have re-indented the code a little, and marked newline characters by ^^J. As you can see, each verbatim line gives exactly one line in the XML output, and this line is formed of a <p> element. If you apply a style sheet with the following definition

<xsl:template match="p">
<xsl:choose>
<xsl:when test="parent::pre">
<xsl:apply-templates/>
</xsl:when>
<xsl:otherwise>
<p>
<xsl:if test="@noindent = 'true'">
<xsl:attribute name="class">nofirst noindent</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</p>
</xsl:otherwise>
</xsl:choose>
</xsl:template>


then <p> elements are discarded in a <pre>, and a class attribute is added to paragraphs marked as not indented. If moreover the translation of <pre> is defined by the following code

<xsl:template match="pre">
<pre>
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
<xsl:apply-templates/>
</pre>
<xsl:text>&#x0A;</xsl:text>
</xsl:template>


we finally get

<pre class="latex-code"><small>1</small> <tt>first line</tt>
<small>2</small> <tt>second line</tt>
</pre>
<p class="nofirst noindent">third line</p>


This is not valid HTML, since <small> is forbidden in a <pre>. We can modify the style sheet so that if <hi> is in a <pre>, then a special action is taken in the case rend='small'; we can also remove the useless <tt>. A better solution: we uncomment the definition of \verbatimnumberfont. The effect is that verbatim line numbers will be in a <vbnumber> element, and we can apply the following transformation.

<xsl:template match="vbnumber">
<span class='prenumber'>
<xsl:apply-templates/>
</span>
</xsl:template>


Thus, the HTML code will be

<pre class="latex-code"><span class="prenumber">1</span> first line
<span class="prenumber">2</span> second line
</pre>
<p class="nofirst noindent">third line</p>


This document was converted into HTML using the techniques shown here. The style sheet changes the background color of the <pre> element, according to its class, and the background of the <span> to the background of the page.

Note how the 'style' option of the verbatim environment gives a 'class' attribute in the HTML document. If you say

\DefineVerbatimEnvironment{verbatim}{Verbatim}
{listparameters={\topsep0pt },pre=pre}


then verbatim behaves like Verbatim; in other words, an optional argument is scanned. Moreover, the list on the second line will be put in \verbatim@hook; whenever a verbatim environment of type 'Something' is read, the value of the command \Something@hook is considered (this should be undefined or a command that takes no argument), and the tokens are added to the optional argument, before other arguments.

You can say \numberedverbatim or \unnumberedverbatim. After that, verbatim environments will be automatically numbered or not. This does not apply to Verbatim environments.

There is a command \fvset that takes an association list as argument. If it contains 'showspaces=true' or 'showspaces=false', this changes how spaces are interpreted in a verbatim environment or command (except for \verb*, where the space is always visible).
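The two global switches described in the last paragraphs might be used as in the following sketch. The sample text is arbitrary, and the comments paraphrase the description above rather than report actual output.

```latex
% From here on, plain verbatim environments are numbered.
\numberedverbatim
\begin{verbatim}
a numbered line
\end{verbatim}
% Switch numbering off again (Verbatim environments are not affected).
\unnumberedverbatim
% Change how spaces are interpreted in verbatim material.
\fvset{showspaces=true}
\verb+x y+
% Restore the other interpretation of spaces.
\fvset{showspaces=false}
\verb+x y+
```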

## 5.6. Case change

There are different commands for changing the case of letters. For instance, the translation of

\uppercase{Einstéin: $E=mc^2$}
\lowercase{Einstéin: $E=mc^2$}


is

<p>EINSTÉIN: <formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mi>E</mi><mo>=</mo><mi>M</mi><msup><mi>C</mi> <mn>2</mn> </msup>
</mrow></math></formula>
einstéin: <formula type='inline'>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><mi>e</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi> <mn>2</mn> </msup>
</mrow></math></formula>
</p>


There are two tables that control these conversions: the lc-table and the uc-table. If the lc value of a character is non-zero, it is the lowercase equivalent of the character; otherwise, the character is left unchanged by \lowercase. The same is true for the uc-table. You can use \lccode and \uccode for changing these tables. They are initialized like this: for all integers x with value between 'a' and 'z', and between 'à' and 'ÿ', the uc value is $x-32$ and the lc value is x; the same two values are used for $x-32$. There are four exceptions: the pair 215, 247 (the multiplication and division signs), and the pair 223, 255 (ß and ÿ). On the other hand, we used the pair 255, 376 (for ÿ and Ÿ).
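The tables can be inspected and modified with the usual TeX syntax. The following is a minimal sketch; the characters x and Q are arbitrary, the braces keep the assignments local as in TeX, and the comments state what the rules above imply rather than verified Tralics output.

```latex
% Inspect the default tables: by the initialization rule, the uc value
% of a (97) is 97-32=65 and the lc value of A (65) is 97.
\the\uccode`\a, \the\lccode`\A.
% Make Q the uppercase equivalent of x, then convert.
{\uccode`\x=`\Q \uppercase{xyz}}
% A zero lc value means the character is left unchanged by \lowercase.
{\lccode`\A=0 \lowercase{ABC}}
```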

You can use the two commands \MakeUppercase and \MakeLowercase. These commands have a regular syntax (in the example that follows, the \expandafter would be useless for \lowercase). They convert letters, as for \uppercase and \lowercase, plus some commands that define some characters. This example shows the list of all the recognised commands.

\def\foo{foo}
\def\List{{abcABCéÉ\foo
\oe\OE\o\O\ae\AE\dh\DH\dj\DJ\l\L\ng\NG\ss\SS\th\TH}}
\expandafter\MakeUppercase\List
\expandafter\MakeLowercase\List


The translation is

ABCABCÉÉfoo&#x152;&#x152;ØØÆÆÐÐ&#x110;&#x110;&#x141;&#x141;&#x14A;&#x14A;SSSSÞÞ
abcabcééfoo&#x153;&#x153;øøææðð&#x111;&#x111;&#x142;&#x142;&#x14B;&#x14B;ßßþþ


This gives ABCABCÉÉFOOŒŒØØÆÆÐÐĐĐŁŁŊŊSSSSÞÞ and abcabcééfooœœøøææððđđłłŋŋßßþþ.

Since Tralics version 2.9, all commands listed above expand to characters that have a non-trivial uc/lc pair. Hence, you can say:

\def\foo{foo}
\edef\List{{abcABCéÉ\"y\"Y\foo
\ij\IJ\oe\OE\o\O\ae\AE\dh\DH\dj\DJ\l\L\ng\NG\ss\SS\th\TH}}
\expandafter\uppercase\List
\expandafter\lowercase\List


This gives ABCABCÉÉŸŸFOOĲĲŒŒØØÆÆÐÐĐĐŁŁŊŊßSSÞÞ, and abcabcééÿÿfooĳĳœœøøææððđđłłŋŋßSSþþ.

## 5.7. Simple commands

We consider here some commands that take no arguments. Unless told otherwise, they are not allowed in math mode. A new paragraph is started (via \leavevmode) in vertical mode.

• \# translates to #, character U+23, ok in math mode.

• \- is \discretionary{-}{}{} in LaTeX, empty in Tralics.

• \AA and \aa translate to Å and å, characters U+C5 and U+E5, ok in math mode, accepts some accents on it.

• \AE and \ae translate to Æ and æ, characters U+C6 and U+E6, ok in math mode, accepts some accents on it.

• \dag translates to †, character U+2020. This is the same character as produced by the math only command \dagger, or the alternate name \textdagger.

• \ddag translates to ‡, character U+2021. This is the same character as produced by the math only command \ddagger.

• \DH and \dh translate to Ð and ð, characters U+D0 and U+F0, ok in math mode.

• \DJ and \dj translate to Đ and đ, characters U+110 and U+111.

• \endguillemets expands to », character U+BB. It is the same as \guillemotright. You should use this as the environment guillemets.

• \fg translates to »; this is U+A0 (no-break space) followed by U+BB.

• \guillemets expands to «. It is the same as \guillemotleft. You should use this as the environment guillemets.

• \ieme is the same as \textsuperscript{e}\xspace. Something like 3\ieme should typeset as 3e.

• \iemes is the same as \textsuperscript{es}\xspace. Something like 3\iemes should typeset as 3es.

• \ier is the same as \textsuperscript{er}\xspace. Something like 1\ier should typeset as 1er.

• \iers is the same as \textsuperscript{ers}\xspace. Something like 1\iers should typeset as 1ers.

• \iere is the same as \textsuperscript{re}\xspace. Something like 1\iere should typeset as 1re.

• \ieres is the same as \textsuperscript{res}\xspace. Something like 1\ieres should typeset as 1res.

• \LaTeX translates to <LaTeX/>.

• \No and \Numero are the same as N\textsuperscript{o}\xspace. This should render as No.

• \no and \numero are the same as n\textsuperscript{o}\xspace. This should render as no.

• \O and \o translate to Ø and ø, characters U+D8 and U+F8, ok in math mode, accepts some accents.

• \og translates to «; this is U+AB followed by U+A0 (no-break space).

• \P translates to ¶, this is character U+B6. This is like \textparagraph, but allowed in math mode.

• \S translates to §, this is character U+A7. This is like \textsection, but allowed in math mode.

• \slash translates to /. No penalty is added.

• \SS and \ss translate to SS and ß.

• \TeX translates to <TeX/>.

• \TH and \th translate to Þ and þ, characters U+DE and U+FE, ok in math mode.
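As a quick way to try several of the commands in this list together, one could translate an input like the following. The sample words are arbitrary; the comments only recall the definitions given above.

```latex
% Ordinals: each command adds a superscript and ends with \xspace.
Le 1\ier janvier, la 1\iere fois, le 3\ieme jour.
% Numero abbreviations, followed by French quotation marks
% (\og and \fg insert a no-break space inside the guillemets).
\No 12 et \no 13: \og citation\fg.
% Special letters; \ss gives the sharp s and \SS gives SS.
\AE, \ae, \O, \o, \DH, \dh, \TH, \th, stra\ss e, STRA\SS E.
```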

The following commands all start with 'text'. They are forbidden in math mode.

• \textasciiacutex translates as the Unicode character U+2032; this is known as 'prime'. It is not the same as U+27, apostrophe, or U+B4, acute accent, or U+2B9, modifier letter prime.

• \textasciicircum translates to ⌃, character U+2303.

• \textasciigrave translates to ‵, character U+2035.

• \textbackslash translates to \, character U+5C.

\def\ybar#1{y#1y}
\newenvironment{wbar}{\catcode`\$=12\catcode`\^=12w}{w}
\newcommand\Fct{\@reevaluate\foo\xbar}
\newenvironment{Env}{\@reevaluate*{center}{wbar}}{}


the translation of

{\Fct{$1^{er}$}}
\begin{Env}$3^{eme}$\end{Env}


is

<p>x1<hi rend='sup'>er</hi>xy$1^er$y</p>
<p rend='center'>3<hi rend='sup'>e</hi></p>
<p>w$3^eme$w</p>


This is a part of the transcript file showing the expansion of the command.

[11] {\Fct{$1^{er}$}}
{begin-group character}
+stack: level + 2 for brace
\Fct ->\@reevaluate \foo \xbar
{\@reevaluate}
{Reeval: \foo{$1^{er}$}%
\xbar{$1^{er}$}%
}


This shows the expansion in the case of a starred command. Note that the current environment is terminated; then everything up to \end{whatever} is read.

[12] \begin{Env}$3^{eme}$\end{Env}
{\begin}
{\begin Env}
+stack: level + 2 for environment
\Env ->\@reevaluate *{center}{wbar}
{\@reevaluate}
+stack: ending environment Env; resuming document.
+stack: level - 2 for environment
{Reeval: \begin{center}$3^{eme}$\end{center}%
\begin{wbar}$3^{eme}$\end{wbar}%
}


## 5.15. Trees

We explain here some commands from the tree-dvips package by Emma Pease. A tree is defined by some nodes and connectors. Each node has a name, whose scope is limited to the current page (Tralics does no validity test for the names). A connector can be attached to the top, bottom, left or right of a node (the abbreviation is one character of 'tblr'), or to a corner (two letters, one of 'tb' followed by one of 'lr').

• \node{N}{V} creates a <node> element, whose content is the translation of V, with a name attribute N.

• \nodepoint{N}[h][v] creates an empty <node>, with a name attribute N. It has optional attributes xpos and ypos, with value h and v.

• \nodeconnect[f]{F}[t]{T} creates a <nodeconnect> element, with attribute nameA equal to F, attribute nameB equal to T, attributes posA and posB equal to f and t. These must be positions; if they are omitted, or syntactically invalid, then 'b' and 't' are used (bottom of first node is connected to top of second node).

• \anodeconnect. Same as above, but the element is named <anodeconnect> (it has an arrow from the first node to the second).

• \barnodeconnect[d]{F}{T} creates a <barnodeconnect> element, with attribute nameA equal to F, attribute nameB equal to T, attribute depth equal to d; this should be a dimension (not tested by Tralics).

• \abarnodeconnect. Same as above, but the element is named <abarnodeconnect> (it has an arrow from the first node to the second).

• \nodecurve[f]{F}[t]{T}{d1}[d2] is like \nodeconnect, but produces a <nodecurve> element, with two additional attributes depthA and depthB, containing the value of d1 and d2 (default value of d2 is d1).

• \anodecurve. Same as above, but the element is named <anodecurve>.

• \nodetriangle{F}{T} creates a <nodetriangle> element, with the two names.

• \nodebox{T} creates a <nodebox> element, with a single name; it adds a decoration to the node.

• \nodeoval{T} creates a <nodeoval> element, with a single name; it adds a decoration to the node.

• \nodecircle[d]{T} creates a <nodecircle> element, with a single name, and attribute depth with value d; it adds a decoration to the node.

For instance

\node{a}{Value of node A}
\nodepoint{b} \nodepoint{c}[3pt]\nodepoint{d}[4pt][5pt]
\nodeconnect{a}{b}
\nodeconnect[tl]{a}[r]{c}
\anodeconnect{a}{b}
\anodeconnect[tl]{a}[r]{c}
\barnodeconnect[3pt]{a}{d}
\nodecurve{a}{b}{2pt} ?
\nodecurve[l]{a}[r]{b}{2pt}[3pt]
\nodetriangle{a}{b}
\nodebox{a}
\nodeoval{a}
\nodecircle[3pt]{a}


Translation

<node name='a'>Value of node A</node>
<node name='b'/> <node xpos='3pt' name='c'/><node ypos='5pt' xpos='4pt' name='d'/>
<nodeconnect nameA='a' nameB='b' posA='b' posB='t'/>
<nodeconnect nameA='a' nameB='c' posA='tl' posB='r'/>
<anodeconnect nameA='a' nameB='b' posA='b' posB='t'/>
<anodeconnect nameA='a' nameB='c' posA='tl' posB='r'/>
<barnodeconnect nameA='a' nameB='d' depth='3pt'/>
<nodecurve nameA='a' nameB='b' posA='b' posB='t' depthB='2pt' depthA='2pt'/>?
<nodecurve nameA='a' nameB='b' posA='l' posB='r' depthB='3pt' depthA='2pt'/>
<nodetriangle nameB='b' nameA='a'/>
<nodebox nameA='a'/>
<nodeoval nameA='a'/>
<nodecircle nameA='a' depth='3pt'/>


## 5.16. Linguistic macros

The gb4e package allows you to input the following (extract of the thesis of C. Romero)

\begin{exe}
\ex \label{agen1}
\gll ... \th et hit er {\bf \textit{ahte}}.\\
... that OBJ-it already PRET-possessed.\\
\glt \textit{... that (he) already owned it.} (CMLAMBX1,31.377)
\ex \label{agen2}
\gll ... the love that men to hym {\bf \textit{owen}}.\\
... the love that SUBJ-men to OBJ-him PRES-owe.\\
\glt \textit{... the love that men owe him.} (CMCTPARS,313.C2.1087)
\end{exe}


The exe environment is used for numbered examples; it is implemented as a list environment, and the \ex command behaves like \item (each item is numbered, the item number is saved in a global counter). The TeX source of the package (as used by Tralics) can be found in the distribution. The non-trivial part in the example above is the \gll command. It takes three lines of text (there is also \glll that takes four lines): the first line is a sequence of words (here in old English), the second line another sequence (translated literally, with possible annotations), and the last line is the translation of the whole, with a bibliographic reference. Words in the first two lines are vertically aligned. The algorithm (by Marcel R. van der Goot) is the following: the list is split into words (a space acts as a word separator), and each word is typeset via:

\hbox{#2\strut#3 }% adds space


where #3 is the word, and #2 is \eachwordone for the first line, \eachwordtwo for the second line, and \eachwordthree for the third line (case of \glll). These commands default to \rm. The words are put in a list (a \vbox, argument #1) like this

\setbox#1=\vbox{\hbox{XXX}\unvbox#1}


After that, the two or three lists are merged (the code uses \unvbox and \lastbox in order to get the next element of the list). The command \vtop is used to put two words one above the other, and these boxes are merged together using the following code

\setbox\gline=\hbox{\unhbox\gline \hskip\glossglue \vtop{XXX}}


The glue between the boxes is 0pt plus 2pt minus 1pt (remember that each hbox is terminated by some glue).

The Tralics implementation is the following. There are two commands \cgloss@gll and \cgloss@glll written in C++, and the package renames them to \gll and \glll. It is not clear what the translation should be (a list of boxes containing boxes?). In the current implementation, we use a table. This means that the resulting XML is easy to interpret; the only drawback is that we lose line breaks (from the \glossglue). This is the translation of the example.

<list type='description'>
<item id='uid1692' label='650'>
<table rend='inline'><row><cell halign='left'>...</cell>
<cell halign='left'>þet</cell>
<cell halign='left'>hit</cell>
<cell halign='left'>er</cell>
<cell halign='left'><hi rend='bold'/><hi rend='it'> <hi rend='bold'>ahte</hi></hi><hi rend='bold'/>.</cell>
</row><row><cell halign='left'>...</cell>
<cell halign='left'>that</cell>
<cell halign='left'>OBJ-it</cell>
<cell halign='left'>already</cell>
<cell halign='left'>PRET-possessed.</cell>
</row></table>
<p noindent='true'><hi rend='it'>... that (he) already owned it.</hi> (CMLAMBX1,31.377)</p>
</item>
<item id='uid1693' label='651'>
<table rend='inline'><row><cell halign='left'>...</cell>
<cell halign='left'>the</cell>
<cell halign='left'>love</cell>
<cell halign='left'>that</cell>
<cell halign='left'>men</cell>
<cell halign='left'>to</cell>
<cell halign='left'>hym</cell>
<cell halign='left'><hi rend='bold'/><hi rend='it'> <hi rend='bold'>owen</hi></hi><hi rend='bold'/>.</cell>
</row><row><cell halign='left'>...</cell>
<cell halign='left'>the</cell>
<cell halign='left'>love</cell>
<cell halign='left'>that</cell>
<cell halign='left'>SUBJ-men</cell>
<cell halign='left'>to</cell>
<cell halign='left'>OBJ-him</cell>
<cell halign='left'>PRES-owe.</cell>
</row></table>
<p noindent='true'><hi rend='it'>... the love that men owe him.</hi> (CMCTPARS,313.C2.1087)</p>
</item></list>


## 5.17. Special parsing rules

In the TeXbook, chapter 24, you will find the definition of <general text>. This rule explains that TeX expects a brace-delimited list of tokens, where the starting brace can be either a character, or a token like \bgroup; it can be preceded by optional spaces and \relax tokens. We give here a list of all cases where this rule can be applied.

• (old behaviour) The procedure that scans a math subformula skips over optional \relax commands, and if the token found is not a character (it can be a generalized character like \mathchar), it uses this rule. As a result, in the case of $a_\par b$, you get a missing opening brace error, and in the case of $a\par$, this rule is not applied, you get a missing dollar error (this dollar marks the end of the formula). In a case like $a_\relax b$, Tralics removes the \relax token before attaching the subscript to the kernel, so that the TeX hack is useless (no missing brace/dollar error is signaled, but Tralics may signal a Bad math expression involving a \par cmd).

• In no-mathml mode, if a token, say \foo, has the same meaning as \relax, it will appear under its name in the result. An expression like $a_\foo b$ is valid, so \relax tokens are removed at a late stage.

• In order to simplify error recovery, \par tokens are forbidden in math mode; moreover a closing delimiter is added (in the example above, it is a dollar sign that terminates the formula, and a second error will be signaled later).

• The procedure that scans the four arguments of \mathchoice uses this rule. You can say \sqrt\relax{x} or $a^\relax{b}$, and the \relax is ignored. Replacing \relax by \par gives a missing brace error. Note that \relax is allowed even in the case where the argument is not delimited by braces, such as in $a_\relax b$.

• The rule applies for commands that produce an accent, like \bar; it does not apply for commands like \frac, so that \frac\A\B constructs a fraction with \A as numerator, and \B as denominator.

• \discretionary. Quoting Knuth “The routine that scans the four mlists of a \mathchoice is very much like the routine that builds discretionary nodes.” Tralics ignores the first two arguments and translates the last one inside a group. See example below.

• \hyphenation, \patterns, \special. Argument is ignored by Tralics. The rule applies.

• \insert, \vadjust. These commands are not implemented by Tralics. This means that an error is signaled. However, arguments are scanned as in plain TeX; this means that a register number must be given for \insert. No error is signaled if this number is 255.

• In the case of \hbox or \vbox, there can be some keywords (read and ignored by Tralics). The content of the box is defined by this rule.

• Scanning of \noalign uses this rule. This is not implemented.

• This rule is used when TeX uses the tokens from \output. Not implemented in Tralics.

• This rule is used when scanning the token list of \uppercase, \lowercase, \message, \write.
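For the last item of the list, the rule can be tried on input like the following. This is a sketch based on the TeX definition of general text (optional spaces and \relax tokens, then an explicit or implicit opening brace, then an explicit closing brace); the sample strings are arbitrary and no Tralics output is claimed.

```latex
% Spaces and \relax may precede the opening brace.
\uppercase \relax {abc}
% The opening brace may be implicit (\bgroup); the closing brace is explicit.
\lowercase\bgroup ABC}
% The same filler is accepted before the argument of \message.
\message\relax{checking the general text rule}
```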

Examples.

{DISC:\discretionary \relax{1}\relax{2}\relax \bgroup \bf 3}\relax{4}}
\hyphenation\relax\bgroup12}3\patterns\relax\bgroup 45}6

<p>DISC:<hi rend='bold'>3</hi>4