Tralics, a LaTeX to XML translator; Part I

3. Mathematics

3.1. Introduction

Mathematics plays an important role in TeX and Tralics. For instance, TeX has three modes: vertical mode, in which no typesetting is done, horizontal mode (where everything happens) and math mode, a mode in which special objects are handled; a two-phase process converts these special objects into normal ones. Fonts used in math mode have special properties (see appendices F and G of the TeXbook). Not all subtleties of TeX math can be implemented in Tralics; on the other hand, the XML translation conforms to MathML. MathML defines some entities; for instance, the file isoamsc.ent defines &rceil; as &#x02309;. As a consequence, Tralics translates \rceil either to <mo>&rceil;</mo> or to <mo>&#x02309;</mo>, depending on an option. A footnote is in general translated to a <footnote> element, and the user can change the name of this element; this is not possible for math: the name <mo> is fixed.

The syntax of mathematics is often strange. Instead of

\math{E=\fraction{1}{2} m\superscript{v}{2}}

you say

$E={1\over 2} mv^2$

Three category codes are defined for use in math mode; they correspond to the dollar sign (math shift), the underscore character (subscript) and the hat character (superscript). If you want a dollar or underscore character, you can say \$ or \_, but \^ produces an accent over what follows, not a hat character (in LaTeX, you can say \textasciicircum, provided that you can guess the name).

In the example above, we have two pseudo commands \fraction and \superscript (followed by two arguments), whereas the plain TeX version uses infix operators (placed between the arguments). The \over operator is greedy. This means that, without the braces in the example above, everything before \over would be the numerator, and everything after it would be the denominator. On the other hand, you sometimes see $2^16$ (where only the digit 1 is raised) instead of $2^{16}$, when people forget the braces around the superscript. The essential difference, however, is that arguments are typeset in different styles: the nucleus (what precedes the hat operator) is typeset in text style, while numerator, denominator, superscripts and subscripts are in script style; moreover, if two objects are placed one above the other, cramped style is used for the object that is below the other one (i.e., the denominator or a subscript). The style influences spacing; because of commands like \over, the current style is known only after the whole expression is parsed. This explains why you may see: Package amsmath Warning: Foreign command \over; \frac or \genfrac should be used instead.
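The style rules just mentioned can be put in a small table. The following Python sketch models the rules of Appendix G of the TeXbook; it is not Tralics code, and the function names are hypothetical. A style is a pair (name, cramped), with names D, T, S and SS. The nucleus simply keeps the current style.

```python
# Sketch of TeX's style rules (TeXbook, Appendix G), not Tralics code.
# A style is a pair (name, cramped); name is "D", "T", "S" or "SS".

# One step smaller, as used for superscripts and subscripts.
SCRIPT = {"D": "S", "T": "S", "S": "SS", "SS": "SS"}
# One step smaller, as used for numerators and denominators.
FRACTION = {"D": "T", "T": "S", "S": "SS", "SS": "SS"}


def sup_style(style):
    """Superscript: smaller, and cramped only if the parent is cramped."""
    name, cramped = style
    return (SCRIPT[name], cramped)


def sub_style(style):
    """Subscript: same size as the superscript, but always cramped."""
    name, _ = style
    return (SCRIPT[name], True)


def num_style(style):
    """Numerator: smaller, keeping the parent's crampedness."""
    name, cramped = style
    return (FRACTION[name], cramped)


def denom_style(style):
    """Denominator: same size as the numerator, but always cramped."""
    name, _ = style
    return (FRACTION[name], True)


# In a text-style formula, the parts of a fraction and the scripts are
# all in script style; the lower ones are cramped.
text = ("T", False)
print(num_style(text), denom_style(text), sup_style(text), sub_style(text))
```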

TeX also has a notion of “inner” mode. Inside an inner object, you cannot put an outer one. Such a distinction exists also in HTML, where <div> is outer and <span> is inner. We explained in the previous chapter that \ifinner can be used to check whether the current mode is inner or outer, and we mentioned that, outside math mode, this is not well defined in Tralics. This may produce surprising results. Consider for instance \hbox{$$}. Inner mode is the rule inside a box, and a double dollar sign signals the start of an outer (display math) formula. You would expect this expression to provoke an error. In fact, TeX assumes that you know what you are doing: it enters inner math mode when it sees the first dollar sign, and quits when it sees the second one. This gives an empty math formula (in fact, it contains all tokens from the \everymath hook), surrounded by some space: the value of \mathsurround (this can be set to zero using \m@th). Note that a math formula defines a group: assignments made inside the formula are forgotten after full evaluation (in particular after this space is added).

The essential difference between inner (normal, inline) math and outer (display) math is that a display formula uses a line of its own (very often the formula is centered on the line). One could say that a display formula terminates the current paragraph. In fact, it is just interrupted, the paragraph continues after the formula (this is only interesting in constructions like \parshape, whose scope is the current paragraph; here a formula counts for three lines; not implemented in Tralics). The construction \hbox{$$ x$$} produces a display math formula in Tralics, instead of two empty math formulas. Before version 2.11.7, an error was signaled (because Tralics started a new paragraph at the end of the equation, and this is illegal in a box).

A display math formula can have an equation number (via commands \eqno, \leqno, \tag, \notag; these commands were not implemented in early versions, and are described in the last chapter of the second part of this report). The MathML documentation says “One of the important uses of <mlabeledtr> is for numbered equations. In a <mlabeledtr>, the label represents the equation number and the elements in the row are the equation being numbered. The side and minlabelspacing attributes of <mtable> determine the placement of the equation number.” Thus, the recommended way, for MathML, is to use a table, like this (replace ellipsis by an expression)

  <mlabeledtr id='e-is-m-c-square'>
      <mtext> (2.1) </mtext>

This mechanism is not yet implemented. We do not know how to insert numbers automatically, so the proposed solution is the following: you can use \label and \ref for any display math formula. This adds an id attribute to the <formula> element, which is a wrapper for the <math> element.

When you say {\alpha^2} outside math mode, TeX enters math mode and signals an error of the form Missing $ inserted. Tralics, on the other hand, signals two errors: the first is Math only command \alpha. Missing dollar not inserted, the second is Missing dollar not inserted, token ignored: {Character ^ of catcode 7}. If you want a command that works both inside and outside math mode, you can say:

\def\foo{\ifmmode \alpha^2 \else $\alpha^2$\fi}

This can be generalised, using the following command


The \relax on the last line is there for the case of an empty argument: we do not want \ensuremath{} to expand to $$. Note that the argument is handled only once (i.e., \ensuremath does not read it, but calls a helper), because of subtle bugs; see the LaTeX bugs database, amslatex/2104. We shall say later that `mode independent commands are interpreted as usual´; this implies that the \relax token does nothing. We shall also see that, in non-MathML mode, \relax appears in the result unless it is the first token in the list. Other commands, not listed in this chapter, may signal an error. For instance, \par is forbidden. Note that \mathchar provokes an Unimplemented command error. If you want an arbitrary Unicode character, you should use commands like \mathmi, \mathmo and \mathmn. You can also define a command via \chardef or \mathchardef (the result is the same) and use it; the result is always an <mi> element. The following example shows that \amp produces an ampersand sign in some cases. It must be used with care.

$\mathbf{x\AAA\BBB\CCC} \mathmi{foo}\mathmo{\&\#666;}\mathmo{\amp\#777;}$


<formula type='inline'>
  <math xmlns=''>
      <mi mathvariant='bold'>x</mi>

A math expression translates to <math> inside a <formula>, and the <math> element has a long namespace attribute, so examples never fit on a single line. To make the result easier to read, we have inserted some newline characters and reindented all these examples. TeX scans two consecutive newline characters as a space plus \par. This space is ignored by TeX (see the TeXbook, the text between exercises 14.12 and 14.13). Hence the general rule in Tralics: when a <p> element is ended, a trailing space or newline is removed from the content of the element, and a newline character is added to the parent of the <p>. As a result, you will very often see <p> at the start of a line and </p> at the end of a line in an XML file generated by Tralics.

Consider the following simple example:

$\alpha$ and $$\beta \label{foo}$$

The translation is the following

 <formula type='inline'>
  <math xmlns=''>
 </formula> and</p>
<formula id='uid1' type='display'>
 <math xmlns=''>

You can also say

\(\alpha\) and \[\beta \label{foo}\]

The result is exactly the same. In LaTeX, the commands \(, \), \[ and \] test the current mode; no such test is done by Tralics. The LaTeX implementation of \[ is a bit strange. If it is used in vertical mode, the formula is preceded by a box of width .6\linewidth containing nothing (except two \hss commands to fill it), itself preceded by the current paragraph indentation. The command \] executes \ignorespaces.

As you can see, there is a difference between a single dollar and a double dollar. In the first case, we are in normal math mode; otherwise, we are in display math mode. One difference is the initial style: it is \textstyle for normal mode and \displaystyle otherwise (this will be explained later). A second difference is that the token list inserted when scanning the formula (\everymath or \everydisplay) depends on the mode. The third difference is specific to Tralics. A display math formula is never `trivial´ (see section 3.5), and it can have a label (not more than one). In this case, the <formula> element has an id attribute. In any case, the <formula> element has a type attribute that says whether the formula is inline or display. A non-display formula starts a paragraph. A display math formula cannot appear in a paragraph (the equivalent of \par is executed). If the first non-space token (after expansion) that follows the math formula is not \par, a \noindent token is inserted (see line 34 of the transcript in section 3.3). Note that, in TeX, a math formula does not end a paragraph, in the sense that a \parshape is valid across math formulas; however, what precedes the formula is split into lines according to the parameters in force at the start of the formula. Tralics does not split paragraphs into lines, and does not implement \parshape.

3.2. The basic objects

The following environments are recognized outside math mode, and produce a math formula: eqnarray*, align*, aligned, split, multline, equation*, math and displaymath. When Tralics sees a dollar character, it looks at the next character (without expansion). If this is a dollar sign, it is read and display math mode is entered; otherwise, normal math mode is entered. All environments shown above start display math mode, except math, which enters normal math mode. The environments math and displaymath are equivalent to \(...\) and \[...\] respectively. The environments eqnarray and split are implemented as arrays. There is no difference between the eqnarray (respectively split) formula below and the first (respectively second) array formula that follows:

\begin{eqnarray} a&b\\ c&d \end{eqnarray}
\begin{split} a&b\\ c&d \end{split}


\[\begin{array}{rcl} a&b\\ c&d \end{array}\]
\[\begin{array}{rl} a&b\\ c&d \end{array}\]
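The choice between inline and display math can be modelled in a few lines. This Python sketch treats tokens as plain strings and uses hypothetical function names. It only illustrates two points from the text: the token after the first dollar sign is examined without expansion, and all of the environments listed above except math start display math.

```python
# Hypothetical sketch: how the kind of a formula could be chosen.
# Tokens are modelled as plain strings; a real TeX scanner uses tokens.

DISPLAY_ENVS = {"eqnarray*", "align*", "aligned", "split", "multline",
                "equation*", "displaymath"}


def kind_after_dollar(tokens, i):
    """tokens[i] is a '$'. Peek at the next token WITHOUT expanding it.

    Returns (kind, index of the first token of the formula body).
    """
    if i + 1 < len(tokens) and tokens[i + 1] == "$":
        return "display", i + 2  # the second dollar sign is consumed
    return "inline", i + 1


def kind_of_environment(name):
    """Math environments listed in the text: all display, except math."""
    if name == "math":
        return "inline"
    if name in DISPLAY_ENVS:
        return "display"
    raise ValueError(name + " does not start a formula in this sketch")


print(kind_after_dollar(["$", "$", "x", "$", "$"], 0))  # ('display', 2)
```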

Environments equation and align are translated as normal math. A star after the environment name is ignored. In normal math mode, the content of the token list \everymath is inserted before the formula; for display math, it is \everydisplay. For instance, if you say

\everymath={(N)\ }
\everydisplay={(D)\ }
$\alpha$ and $$\beta$$

the translation will be

 <formula type='inline'>
  <math xmlns=''>
    <mo>(</mo><mi>N</mi><mo>)</mo><mspace width='6pt'/>
    <mi>&alpha;</mi></mrow></math></formula> and</p>
<formula type='display'>
 <math xmlns=''>
   <mo>(</mo><mi>D</mi><mo>)</mo><mspace width='6pt'/>

In TeX, you can put anything inside a math formula, provided it is hidden in a box; this is not possible in Tralics, because we want the XML result to be conforming to MathML. We shall list here all commands valid in math mode, and explain later on how they are translated.

Commands \limits, \nolimits and \displaylimits can be used just after an operator and before subscripts or superscripts, as in \int \limits _x. They are currently ignored by Tralics.

The following environments are recognized: array, matrix, pmatrix, bmatrix, Bmatrix, vmatrix, Vmatrix. All these environments produce arrays. For the first one, an argument is required, explaining how cells are aligned. For all other environments, cells are centered. Environments of the form Xmatrix have fences, an implicit \left and \right; in order: parentheses, brackets, braces, single bars, double bars. There is also an environment cases, with two left-aligned columns, that has an open brace as left delimiter and an empty right delimiter. Example


The translation is the following.

<formula type='inline'>
 <math xmlns=''>
      <mtd columnalign='left'><mi>a</mi></mtd>
      <mtd columnalign='right'><mi>c</mi></mtd>
   <mfenced open='{' close='}'>
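The implicit fences of the matrix-like environments can be pictured as a rewrite into \left, an array and \right. The transcript in section 3.3 shows that cases is expanded in this way. The Python sketch below is hypothetical: the name as_array and the rewrite of the Xmatrix environments are illustrations, not Tralics internals, and the result is TeX source, not MathML.

```python
# Hypothetical table of implicit \left/\right delimiters, as TeX tokens;
# "." is TeX's empty (null) delimiter.
FENCES = {
    "matrix":  None,            # no fences
    "pmatrix": ("(", ")"),      # parentheses
    "bmatrix": ("[", "]"),      # brackets
    "Bmatrix": (r"\{", r"\}"),  # braces
    "vmatrix": ("|", "|"),      # single bars
    "Vmatrix": (r"\|", r"\|"),  # double bars
    "cases":   (r"\{", "."),    # open brace, empty right delimiter
}


def as_array(env, rows, ncols):
    """Rewrite a matrix-like environment as a (fenced) array, as TeX source."""
    align = "l" if env == "cases" else "c"  # cases: left aligned columns
    body = r"\\".join("&".join(row) for row in rows)
    array = r"\begin{array}{%s}%s\end{array}" % (align * ncols, body)
    fence = FENCES[env]
    if fence is None:
        return array
    left, right = fence
    return r"\left%s%s\right%s" % (left, array, right)


print(as_array("pmatrix", [["a", "b"], ["c", "d"]], 2))
```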

The following delimiters are recognized: <, >, ., (, ), [, ], |, \{, \}, \langle, \rangle, \lbrace, \rbrace, \lceil, \rceil, \lgroup, \rgroup, \lfloor, \rfloor, \lmoustache, \rmoustache, \vert, \Vert, \uparrow, \downarrow, \updownarrow, \Uparrow, \Downarrow, \Updownarrow. A delimiter is anything that can follow \left or \right. For MathML, this has to be a character. As the following example shows, in most cases we use a character entity.

$\left\lceil \left\uparrow x\right\}\right.$
$\lceil \uparrow x\}$

The translation is

<formula type='inline'>
 <math xmlns=''>
   <mfenced open='&lceil;' close='.'>
     <mfenced open='&uparrow;' close='&rbrace;'>
<formula type='inline'>
  <math xmlns=''>

This is the list of commands allowed both in math mode and in text mode: \dots, \ldots, \quad, \qquad, \␣, \$, \%, \&, \!, \, (thin space), \{, \}, \i, \sharp, \natural, \flat, \_. The following commands produce space: \;, \:, \>. Note that \! produces a negative space in math mode, and nothing outside math mode. Example of use:

\def\alist{\i\j\$\,\_\&\{\}\%\ \^^J\^^I\^^M\!}

This is the translation, with nobreak space replaced by tilde:

&#x131;j$ _&amp;{}%    ~~~,~~~~~~,...,&#x266F;,&#x266E;,&#x266D;
<formula type='inline'>
 <math xmlns=''>
  <mrow><mo>&inodot;</mo><mi>j</mi><mi>$</mi><mspace width='0.166667em'/>
  <mspace width='6pt'/><mspace width='6pt'/>
  <mspace width='6pt'/><mspace width='6pt'/>
  <mspace width='-0.166667em'/><mspace width='1.em'/><mo>,</mo>
  <mspace width='2.em'/>

We give here the list of all symbols that have a translation of the form <mi>&alpha;</mi>. They are of type Ord (ordinary symbol). We start with the lower case Greek letters: \alpha, \beta, \gamma, \delta, \epsilon, \varepsilon, \zeta, \eta, \theta, \iota, \kappa, \lambda, \mu, \nu, \xi, \pi, \rho, \sigma, \tau, \upsilon, \phi, \chi, \psi, \omega, \varpi, \varrho, \varsigma, \varphi, \vartheta, \varkappa, then upper case Greek letters: \Gamma, \Delta, \Theta, \Lambda, \Xi, \Sigma, \Upsilon, \Phi, \Pi, \Psi, \Omega, then other symbols: \hbar, \ell, \wp, \Re, \Im, \partial, \infty, \emptyset, \nabla, \surd, \top, \bottom, \bot, \angle, \triangle. Example

$\alpha\Gamma \surd$

This translates as

<formula type='inline'>
 <math xmlns=''>

Next comes the list of all symbols whose translation is like log. They are of type Ord (ordinary symbol), though they should be Op (large operator). The list is divided in two parts. These have movable limits: \det, \gcd, \inf, \injlim, \lim, \liminf, \limsup, \max, \min, \sup, \projlim. These do not: \dim, \exp, \hom, \ker, \lg, \ln, \log, \Pr, \arccos, \arcsin, \arctan, \arg, \cos, \cosh, \cot, \coth, \csc, \deg, \sec, \sin, \@mod, \sinh, \tan, \tanh. Example

$\displaystyle\lim_a \liminf_a \sin_a \hom_a$

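In LaTeX, the subscript a is placed below lim and lim inf (these operators have movable limits, and the formula is in display style), and to the right of sin and hom, as an ordinary subscript. The Tralics version is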

<formula type='inline'>
<math xmlns=''>
<mstyle scriptlevel='0' displaystyle='true'>
 <msub><mo movablelimits='true' form='prefix'>lim</mo> <mi>a</mi> </msub>
 <msub><mo movablelimits='true' form='prefix'>lim inf</mo><mi>a</mi></msub>
 <msub><mo form='prefix'>sin</mo> <mi>a</mi> </msub>
 <msub><mo form='prefix'>hom</mo> <mi>a</mi> </msub>

From now on, all symbols translate into the form <mo>...</mo>. We start with symbols of type Ord; in reality, most of them should be of type Op (large operator): \mho, \clubsuit, \diamondsuit, \heartsuit, \spadesuit, \aleph, \backslash, \Box, \imath, \jmath, \square, \cong, \lnot, \neg, \forall, \exists, \coprod, \bigvee, \bigwedge, \biguplus, \bigcap, \bigcup, \int, \sum, \prod, \bigotimes, \bigoplus, \bigodot, \oint, \bigsqcup, \smallint. Examples

$\bigcap \int\oint$

The translation is


These are of type Bin (binary operator). \triangleleft, \triangleright, \bigtriangleup, \bigtriangledown, \wedge, \land, \vee, \lor, \cap, \cup, \multimap, \dagger, \ddagger, \sqcap, \sqcup, \amalg, \diamond, \Diamond, \bullet, \wr, \div, \odot, \oslash, \otimes, \ominus, \oplus, \uplus, \mp, \pm, \circ, \bigcirc, \setminus, \cdot, \ast, \times, \star, \in. Example

$\cap \cup \wr$

The translation is

<formula type='inline'><math xmlns=''>

These are of type Rel (relation): \propto, \sqsubseteq, \sqsupseteq, \sqsubset, \sqsupset, \parallel, \mid, \dashv, \vdash, \Vdash, \models, \nearrow, \searrow, \nwarrow, \swarrow, \Leftrightarrow, \Leftarrow, \Rightarrow, \ne, \neq, \le, \leq, \ge, \geq, \succ, \approx, \succeq, \preceq, \prec, \doteq, \supset, \subset, \supseteq, \subseteq, \bindnasrepma, \ni, \gg, \ll, \gtrless, \geqslant, \leqslant, \not, \notin, \leftrightarrow, \leftarrow, \owns, \gets, \rightarrow, \to, \mapsto, \sim, \simeq, \perp, \equiv, \asymp, \smile, \iff, \leftharpoonup, \leftharpoondown, \rightharpoonup, \rightharpoondown, \hookrightarrow, \hookleftarrow, \Longrightarrow, \longrightarrow, \longleftarrow, \Join, \longmapsto, \frown, \bowtie, \Longleftarrow, \Longleftrightarrow. Example.



<formula type='inline'><math xmlns=''>

These are of type Inner: \cdots, \hdots, \vdots, \ddots. These are of type Between (they are of type Ord in TeX, but are used as opening or closing delimiters): \Vert, \|, \vert, \uparrow, \downarrow, \Uparrow, \Downarrow, \Updownarrow, \updownarrow. These are of type Open and Close: \rangle, \langle, \rmoustache, \lmoustache, \rgroup, \lgroup, \rbrace, \lbrace, \lceil, \rceil, \lfloor, \rfloor.

The following characters are classified as `small´: <>,.:;*?!x. These are classified as `small-l´ and `small-r´: ()[]. The vertical bar is small-l, these are bin: +/, and the equals sign is of type Rel. Note: what you see here as x is in reality character 215 (the multiplication sign ×), which cannot be printed in verbatim mode by LaTeX.

$<>,.:;*?!x ()[]|+-/=$


<formula type='inline'>
 <math xmlns=''>
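The classification of single characters given above amounts to a lookup table. In this Python sketch the names are hypothetical. Characters that the text does not list, such as the minus sign used in the example, are deliberately left unclassified rather than guessed.

```python
# Hypothetical lookup table for the character classes listed in the text.
# U+00D7 is the multiplication sign (character 215).
CHAR_CLASS = {}
for ch in "<>,.:;*?!\u00d7":
    CHAR_CLASS[ch] = "small"
for ch in "([|":  # the vertical bar is small-l
    CHAR_CLASS[ch] = "small-l"
for ch in ")]":
    CHAR_CLASS[ch] = "small-r"
for ch in "+/":
    CHAR_CLASS[ch] = "bin"
CHAR_CLASS["="] = "rel"


def char_class(ch):
    """Return the class of ch, or None if the text does not list it."""
    return CHAR_CLASS.get(ch)


print(char_class("("), char_class("="), char_class("-"))
```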

The following commands are used for accents: \acute, \grave, \mathring, \ddddot, \dddot, \ddot, \tilde, \widetilde, \bar, \breve, \check, \hat, \widehat, \vec, \overrightarrow, \overleftarrow, \underrightarrow, \underleftarrow, \dot.

The following commands are special. They will be explained later: \overline, \underline, \stackrel, \underset, \overset, \mathchoice, \frac, \overbrace, \underbrace, \genfrac, \dfrac, \tfrac, \sqrt, \root.

3.3. Parsing a math formula

Parsing a math formula is a non-trivial operation; for this reason, in verbose mode, the math expression is printed on the transcript file. For instance, given

$\begin{cases} x &y\\a&b \end{cases} \mkern18mu x^{ {2 }}!$

whose translation in no-mathml mode is

<texmath type='inline'>
 {\left\rbrace \begin{array}{ll} x &amp;y\\a&amp;b \end{array}\right.}
 \hspace{10.0pt}x^{ {2 }}!

the transcript file will contain

1 {math shift character $}
2 +stack: level + 2 for math entered on line 2
3 +stack: level + 3 for math entered on line 2
4 \cases ->\left \{\begin {array}{ll}
5 +stack: level + 4 for math entered on line 2
6 +stack: level + 5 for cell entered on line 2
7 +stack: level + 6 for math entered on line 2
8 +stack: level - 6 for math from line 2
9 +stack: level - 5 for cell from line 2
10 +stack: level + 5 for cell entered on line 2
11 +stack: level - 5 for cell from line 2
12 +stack: level + 5 for cell entered on line 2
13 +stack: level - 5 for cell from line 2
14 +stack: level + 5 for cell entered on line 2
15 \endcases ->\end {array}\right .
16 +stack: level - 5 for cell from line 2
17 +stack: level - 4 for math from line 2
18 +stack: level - 3 for math from line 2
19 +scanint for \mkern->18
20 +scandimen for \mkern->18.0mu
21 +stack: level + 3 for math entered on line 2
22 +stack: level - 3 for math from line 2
23 +stack: level + 3 for math entered on line 2
24 +stack: level + 4 for math entered on line 2
25 +stack: level - 4 for math from line 2
26 +stack: level - 3 for math from line 2
27 +stack: level - 2 for math from line 2
28 Math: $\begin {cases}{\left\{\begin {array}{ll} x &y\\a&b\end{cases}
29 \end {array}\right.} \mkern\hspace{10.0pt}x^{ {2 }}!$
30 +scanint for \hspace->10
31 +scandimen for \hspace->10.0pt
32 {scanglue 10.0pt\relax }
33 Realloc xml math table to 20
34 {Push p 1}

We now explain where each line of the transcript file comes from. Math mode scanning is entered when the translator sees a math shift character (line 1). The scanner reads some tokens and puts them in a list, which is printed at the end (lines 28-29). The start of the formula is a bit special: the token that follows the first dollar sign is not expanded when we check for a double dollar sign. A new group is entered before scanning the whole formula (line 2).

The loop is as follows:

We give here an example with some fonts.


The translation is as follows. You can notice that some variants affect only uppercase letters.

<formula type='inline'>
 <math xmlns=''>
   <mi mathvariant='monospace'>A</mi>
   <mi mathvariant='monospace'>b</mi>
   <mi mathvariant='bold'>E</mi>
   <mi mathvariant='bold'>f</mi>
   <mi> G </mi>
   <mi> h </mi>
   <mi mathvariant='sans-serif'>M</mi>
   <mi mathvariant='sans-serif'>n</mi>

3.4. Translation of arrays

Whenever we see an array (this can be a global environment like eqnarray or a local one, like array), we translate all cells one after the other. The character & is the cell separator. The command \\ is the row separator. If an array ends with a \\, this gives an empty row, which is removed. Each cell has an alignment: left, right, or center. An attribute is added only if the alignment is not center. The array environment has an argument that gives the type of the columns (columns not indicated are centered). The default alignment is `rl´ for split and align, `rcl´ for eqnarray, and centered for matrix. You can use \multicolumn. This command takes three arguments: the span, which should be an integer, then the alignment (one of r, l or c), and the content of the cell. The program may signal errors in case of wrong syntax. Here is an example:


This is the translation of the array.

  <mtd columnalign='right'><mi>a</mi></mtd>
  <mtd columnalign='left'><mi>c</mi></mtd>
  <mtd columnalign='right'><mi>A</mi></mtd>
  <mtd columnalign='right' columnspan='1'><mi>B</mi></mtd>
  <mtd columnalign='left'><mi>C</mi></mtd>
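The alignment rules of this section can be sketched as follows. The Python code is a hypothetical model, not the Tralics implementation. It maps a column specification such as rcl to the columnalign attribute of each cell, omitting the attribute for centered cells, and it drops the empty row produced by a trailing \\. \multicolumn is not modelled.

```python
# Hypothetical model of cell alignment in a translated array.
DEFAULT_SPECS = {"split": "rl", "align": "rl", "eqnarray": "rcl",
                 "matrix": ""}  # "" means: every column centered
ALIGN = {"l": "left", "r": "right", "c": "center"}


def cell_attributes(spec, col):
    """Attributes of the <mtd> in column col (0-based) for a spec like 'rcl'.

    Columns beyond the spec are centered; centered cells get no attribute.
    """
    align = ALIGN[spec[col]] if col < len(spec) else "center"
    return {} if align == "center" else {"columnalign": align}


def split_rows(body):
    r"""Split an array body at \\ and &, dropping an empty last row."""
    rows = [[cell.strip() for cell in row.split("&")]
            for row in body.split(r"\\")]
    if rows and rows[-1] == [""]:
        rows.pop()  # a trailing \\ would otherwise give an empty row
    return rows


print(split_rows(r"a&b\\c&d\\"), cell_attributes("rcl", 0))
```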

3.5. Trivial math

If you say `$x$ and $123$´, the translation will be

<p><formula type='inline'><simplemath>x</simplemath></formula> and 123</p>

Initially, we found this a good idea, because it can easily be converted into <i>x</i> in HTML. Moreover, `$2^{i\grave eme}$´ gives

<temporary>2<hi rend='sup'>e</hi></temporary>

Here the <temporary> element will not show in the XML tree, but is printed on the terminal if Tralics is called with the `interactivemath´ switch. If you invoke Tralics with the `-notrivialmath´ switch, these hacks are not tried, and the formula translates into:

<formula type='inline'>
  <math xmlns=''>
     <mover accent='true'><mi>e</mi> <mo>&grave;</mo></mover>

There are three hacks. The first applies when the formula contains only a letter, the second when it contains only digits, and the last one when people use a math formula instead of \textsuperscript. This last hack is applied only if the math formula starts with digits (no digit at all is OK; braces are ignored), followed by an exponent marker, followed by a special exponent; this has to be a single token or a token list. In the case of a single token, the hack is applied only if this is e or o. Typically, it applies in cases like 2$^e$ and N$^o$. In the case of more than one token, it applies when the exponent is `th´, `st´, `rd´ or `nd´, for cases like 1$^{st}$, 2$^{nd}$, 3$^{rd}$ and 4$^{th}$. There are four rules for French: `e´, `eme´, `ieme´, `ème´ and `ième´ convert to `e´; `ier´ and `er´ convert to `er´; `iemes´, `ièmes´ and `es´ convert to `es´; `ère´ and `re´ convert to `re´. The accented letter can be given as è, \`e, \`{e}, \grave{e} or \grave e. The hack is applied in a case like:

$2 ^{\text{\small\rm \grave ere}} $

Instead of \text, \hbox can be used. Instead of \small or \rm, any font change or font size command can be used; up to two such commands can be given. The original Perl version had 30 exceptions, including $\Sigma{}^{{\rm it}}$ and \ddot{\rm o}. Compare Σ with a math superscript `it´ to Σ followed by a text superscript `it´, and an o with a math accent ¨ to the text character ö.
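The exponent rules of the superscript hack can be summarized as a table. This Python sketch is hypothetical. It assumes the exponent has already been reduced to plain text, with \`e, \grave{e} and so on replaced by è. It does not model the interaction with the other hacks, nor the \text and font-command variants.

```python
# Hypothetical sketch of the exponent table of the superscript hack.
# Assumption: the exponent is already plain text (\`e etc. -> "è").

ENGLISH = {"th", "st", "rd", "nd"}
FRENCH = {
    "e": "e", "eme": "e", "ieme": "e", "ème": "e", "ième": "e",
    "ier": "er", "er": "er",
    "iemes": "es", "ièmes": "es", "es": "es",
    "ère": "re", "re": "re",
}


def hack_exponent(digits, exponent):
    """Return the text replacement for $digits^{exponent}$, or None.

    digits may be empty; None means the formula is translated normally.
    """
    if digits and not digits.isdigit():
        return None
    if exponent == "o":          # single-token case, as in N$^o$
        result = "o"
    elif exponent in ENGLISH:    # 1st, 2nd, 3rd, 4th
        result = exponent
    elif exponent in FRENCH:     # the four French rules
        result = FRENCH[exponent]
    else:
        return None
    return f"{digits}<hi rend='sup'>{result}</hi>"


print(hack_exponent("2", "ième"))  # 2<hi rend='sup'>e</hi>
```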

Since version 2.8, an integer register named \notrivialmath controls these hacks. It initially contains 1. It is set to zero if Tralics is called with the -notrivialmath switch, to seven if Tralics is called with the -trivialmath switch, and to 349 if Tralics is called with -trivialmath=349. If the value is A+2B+4C modulo 8, where A, B, and C are zero (false) or one (true), then the behavior is the following (by default A is true and the other flags are false).

$1^e$, $3^{eme}$ X$^{eme}$ $4^{i\grave{e}me}$
$1^{st}$ $2^{nd}$ $3^{rd}$  $4^{th}$
$x$ $1$ $\alpha$ $\pm$ $\longleftrightarrow$ $-$
$_{foo}$ $^{2+3}$  $_{\bf Foo}$
$+$ $x^{eme}$ $\log$ $_{F\bf oo}$

Translation (with MathML namespace removed), all hacks enabled:

<p>1<hi rend='sup'>e</hi>, 3<hi rend='sup'>e</hi>
    X<hi rend='sup'>eme</hi> 4<hi rend='sup'>e</hi>
1<hi rend='sup'>st</hi> 2<hi rend='sup'>nd</hi> 3<hi rend='sup'>rd</hi>
    4<hi rend='sup'>th</hi>
<formula type='inline'><simplemath>x</simplemath></formula>
   1 &alpha; &pm; &longleftrightarrow; &#x2013;
<hi rend='sub'>foo</hi> <hi rend='sup'>2+3</hi>
   <hi rend='sub'><hi rend='bold'>Foo</hi></hi>
<formula type='inline'><math><mo>+</mo></math></formula>
<formula type='inline'><math><msup><mi>x</mi>
    <mrow><mi>e</mi><mi>m</mi><mi>e</mi></mrow> </msup></math></formula>
<formula type='inline'><math><mo form='prefix'>log</mo></math></formula>
<formula type='inline'><math><msub><mrow></mrow>
     <mrow><mi>F</mi><mi mathvariant='bold'>o</mi>
  <mi mathvariant='bold'>o</mi></mrow> </msub></math>
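Decoding the register \notrivialmath into the three flags is simple bit arithmetic. The Python sketch below (hypothetical function name) follows the formula A+2B+4C modulo 8 given above. It does not describe what each flag controls.

```python
def trivialmath_flags(value):
    """Split a \\notrivialmath value into the flags (A, B, C).

    The value is interpreted as A + 2B + 4C modulo 8.
    """
    v = value % 8  # Python's % is non-negative for a positive modulus
    return bool(v & 1), bool(v & 2), bool(v & 4)


for value in (0, 1, 7, 349):  # -notrivialmath, default, -trivialmath, =349
    print(value, trivialmath_flags(value))
```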

3.6. Conversion to XML

If the value of the counter \@nomathml is negative, the translation is a <texmath> element containing all tokens of the math list. For instance,

\binom 12&\int_0^\infty f(x)dx\\[2cm]

translates as

<p><texmath type='inline'>\begin{pmatrix}
\genfrac(){0.0pt}{}{1}{2}&amp;\int _0^\infty f(x)dx\\[2cm]
\@mathfrak W\@mathit _2&amp;\text{xyz}=\sqrt{xxyyzz}

In all other cases, we use a highly recursive procedure that converts a math list into a formula. The procedure takes the current style as argument. This is one of D, T, S, or SS (display, text, script, or scriptscript style). It is D for a display math formula and T for a normal formula.

Consider first the case where the formula has an \over, or a variant, not hidden inside braces. This example has 6 subexpressions, each of which has such an operator.

${a\over b}{a\above2mm b}{a\atop b}
{a\overwithdelims[] b}{a\abovewithdelims[]2mm b}{a\atopwithdelims[] b}$

The translation is

<formula type='inline'>
 <math xmlns=''>
   <mfrac><mi>a</mi> <mi>b</mi></mfrac>
   <mfrac linethickness='2mm'><mi>a</mi> <mi>b</mi></mfrac>
   <mfrac linethickness='0.0pt'><mi>a</mi> <mi>b</mi></mfrac>
   <mfenced open='[' close=']'>
       <mfrac><mi>a</mi> <mi>b</mi></mfrac></mfenced>
   <mfenced open='[' close=']'>
       <mfrac linethickness='2mm'><mi>a</mi><mi>b</mi></mfrac></mfenced>
   <mfenced open='[' close=']'>
       <mfrac linethickness='0.0pt'><mi>a</mi> <mi>b</mi></mfrac></mfenced>

It is an error if the formula has more than one such operator. Otherwise, we have two parts: what precedes the operator and what follows it. As the example shows, some operators need delimiters. Other operators read a dimension. This dimension must be given explicitly, as a sequence of digits and a unit of measure (we could do better; if you want \parindent instead of 2mm, you should use \genfrac instead). After splitting the formula into two parts, the same idea as for \genfrac is used. If the current style is C, the style that follows C in the list is used for both parts of the formula (if C is D or T, the next style is S, otherwise it is SS). Note that \choose is an infix operator like \over (plain TeX defines it as \atopwithdelims()); you should use \binom instead.
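The splitting step can be sketched as follows. In this hypothetical Python model, brace groups are nested lists, so only operators at the top level are found. The arguments read by some operators (delimiters, a dimension) are not handled. The style rule is the one stated in the paragraph above.

```python
# Hypothetical model of splitting a math list at an \over-like operator.
INFIX = {r"\over", r"\above", r"\atop", r"\overwithdelims",
         r"\abovewithdelims", r"\atopwithdelims", r"\choose"}
NEXT_STYLE = {"D": "S", "T": "S", "S": "SS", "SS": "SS"}


def split_fraction(tokens, style):
    """Split a math list at its top-level infix fraction operator.

    Brace groups are nested lists, so operators inside them are not
    seen. Returns None if there is no such operator, otherwise
    (numerator, operator, denominator, style used for both parts).
    """
    positions = [i for i, t in enumerate(tokens)
                 if isinstance(t, str) and t in INFIX]
    if not positions:
        return None
    if len(positions) > 1:
        raise ValueError("more than one \\over-like operator")
    i = positions[0]
    return tokens[:i], tokens[i], tokens[i + 1:], NEXT_STYLE[style]


print(split_fraction(["a", r"\over", "b"], "T"))
```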

We assume from now on that the formula contains no more operators like \over. This means that the current style can be used for the current object. Items are handled as follows:

3.7. Final math mode hacks

Before we forget it: when the formula is completely translated, we have a list of XML elements. If the list is empty, the result is <mrow/>; this happens for instance with x^{}, where the exponent is empty. If the list has a single XML element, this element is the result. Otherwise, everything is put in an <mrow>. If the current formula or subformula contains a style change, it is put in an <mstyle> element. This is not always the right solution, because the same style is then used for everything, both what precedes and what follows the style command. If you look at the \genfrac example above, you can see that styles are added by the \genfrac interpreter (the single TeX switch is associated with two MathML attributes).
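The final wrapping rule can be written as a three-case function. This Python sketch works on strings and ignores the <mstyle> case; the name wrap is hypothetical.

```python
def wrap(elements):
    """Combine translated pieces: <mrow/>, the single element, or an <mrow>."""
    if not elements:
        return "<mrow/>"          # e.g. the empty exponent of x^{}
    if len(elements) == 1:
        return elements[0]        # a single element is used as is
    return "<mrow>" + "".join(elements) + "</mrow>"


print(wrap(["<mi>x</mi>", "<mn>2</mn>"]))
```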

Consider a formula of the form $_x^{2}_{abc}$. The translation rules explained so far tell us that we have: an underscore character, an XML element for x, a hat character, an XML element for {2}, an underscore, and an XML element for {abc}. We may also have \nonscript tokens; they are removed, together with a space that follows them. We then have to evaluate the commands that control subscripts and superscripts. A hat character gives <msup>, an underscore character gives <msub>, and both give <msubsup>. A formula may start with an underscore or a hat: in this case, the kernel is empty. A formula may not end with a hat or an underscore. A kernel can have at most one subscript and at most one superscript. Hence the formula above is wrong: the letter x is the first subscript of the empty kernel, and {abc} would be a second one. A valid formula is for instance $_yx^2$. It translates as

  <msub><mrow></mrow> <mi>y</mi> </msub>
  <msup><mi>x</mi> <mn>2</mn> </msup>
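The attachment of subscripts and superscripts can be modelled as a single pass over the list. In this hypothetical Python sketch, items are XML strings or the markers ^ and _. The handling of \nonscript and the conversion of \big into fences are left out.

```python
def attach_scripts(items):
    """Turn XML strings and '^'/'_' markers into msub/msup/msubsup.

    A formula may start with a marker (empty kernel) but may not end
    with one; a kernel has at most one subscript and one superscript.
    """
    out = []
    i = 0
    while i < len(items):
        if items[i] in ("^", "_"):
            kernel = "<mrow></mrow>"  # only possible at the very start
        else:
            kernel = items[i]
            i += 1
        sub = sup = None
        while i < len(items) and items[i] in ("^", "_"):
            if i + 1 == len(items):
                raise ValueError("formula ends with ^ or _")
            marker, arg = items[i], items[i + 1]
            if marker == "_":
                if sub is not None:
                    raise ValueError("double subscript")
                sub = arg
            else:
                if sup is not None:
                    raise ValueError("double superscript")
                sup = arg
            i += 2
        if sub is not None and sup is not None:
            out.append(f"<msubsup>{kernel}{sub}{sup}</msubsup>")
        elif sub is not None:
            out.append(f"<msub>{kernel}{sub}</msub>")
        elif sup is not None:
            out.append(f"<msup>{kernel}{sup}</msup>")
        else:
            out.append(kernel)
    return out


# $_yx^2$
print(attach_scripts(["_", "<mi>y</mi>", "<mi>x</mi>", "^", "<mn>2</mn>"]))
```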

We have mentioned above that some operators can be flagged as left or right, and that adding \bigr may convert a left operator into a right operator. In some cases, a special rule converts \big operators into fences. For instance

$\bigl [ A\big ( x^2 \big) B \bigr[  $

translates as

<mfenced open='[' close='['>
  <mfenced open='(' close=')'><msup><mi>x</mi> <mn>2</mn> </msup></mfenced>

There is another trick that works in some cases. Consider:

$\int_0^\infty f(x) dx = \big[  U \big ]$

the translation is

 <msubsup><mo>&int;</mo> <mn>0</mn> <mi>&infin;</mi> </msubsup>
 <mfenced open='[' close=']'><mi>U</mi></mfenced>

The interesting point here is the placement of the inner <mrow>. The idea is that the parentheses should remain small (not larger than the <mrow>). In particular, they should not be influenced by the integral that precedes them or by the fence that follows. This works in some cases.

3.8. Extensions

In Tralics, you can use the following three commands \mathmo, \mathmi, and \mathmn. They take an argument and produce a <mo>, <mi>, or <mn>. There is a file tralics-iso.sty that contains

\def\makecmd#1{\expandafter\newcommand\csname math#1\endcsname}

Then you can say \makemo{x02190}{slarr}, and this defines a command \mathslarr, whose translation (in math mode only) is <mo>&#x02190;</mo>. The file provides nearly 2000 such definitions, taken from the MathML entity files, with the MathML names. These commands can be used instead of TeX commands like \mathchar. Remember that a math character is a 15-bit integer, where 8 bits are used for the position in a font table, 3 bits for the type, and 4 bits for the family. Only three types are defined for Tralics, but the content of the element is arbitrary (most math symbols are between U+2100 and U+27FF; there are also letters between U+1D400 and U+1D7FF). There is a command \mathattribute that adds an attribute pair to the last created math element. You can say for instance


After that,

$\min _xf(x) >\operatorname{min} _xf(x)$

translates as

<formula type='inline'>
 <math xmlns=''>
   <msub><mo movablelimits='true' form='prefix'>min</mo> <mi>x</mi> </msub>
    <mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo> <mo>&gt;</mo>
   <msub><mo movablelimits='true' form='prefix'>min</mo> <mi>x</mi> </msub>
    <mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo>

The command \DeclareMathOperator takes two arguments (say `foo´ and `bar´), with an optional star before the first argument. It defines \foo to be the command \operatorname applied to `bar´ (with a star when required). The command \operatorname is as shown above (the movablelimits attribute is only added if the command is followed by a star).

You can use the command \mathchardef. This is like \chardef: it reads a command and a number. The number should fit in 15 bits; otherwise, you will see an error of the form: Bad mathchar replaced by 0: 1234567. The \mathchardef command reads a command, say \foo, and an integer N; there is no difference between \foo and \mathchar N, except that \the\foo returns the integer N, and \foo is faster to parse. Some constants, like \@cclvi=256, are defined in this way by the TeX kernel and should not be used as math characters. Some commands, like \eta=11116, are meant to be used as math characters. In Tralics, until version 2.8, an error was signaled. In version 2.9, the translation in math mode is an <mi> element containing this character; you might say \mathchardef\eta"3B7. Outside math mode, this gives an error of the form Undefined command \eta; command code = 264, instead of Math only command \theta. Missing dollar not inserted. Inside math mode, the behavior is the same as the standard one.

TeX has a special register called \fam. If you say something like

\fam3 ${\fam9 \the\fam}\ \the\fam$

then the second \the expands to minus one. The first gives 9, but LaTeX complains with: \textfont 9 is undefined (character 9). In Tralics, you would see

<mrow><mn>9</mn><mspace width='6pt'/><mn>3</mn></mrow>

As the example shows, the family is unused, and not correctly restored. Each character has a \mathcode. The following

\mathcode`\a="0941 $a\the \mathcode`\a$

is interpreted by Tralics as $a2369$. However, TeX complains with \textfont 9 is undefined (character A), because you ask for the lower case letter a to be printed like the upper case letter A in text font 9. A mathcode is a 15-bit integer, with one exception: a character whose mathcode is 32768 behaves like an active character. The action associated with it must be defined somehow, for instance like this:

{\catcode`\'=\active \global\let'\active@math@prime}
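The layout of a math code described in this section has 3 bits of class, 4 bits of family and 8 bits of position, plus the special value 32768. It can be decoded with shifts and masks. This Python sketch is an illustration with a hypothetical name. It raises an error where Tralics would replace a bad \mathchardef value by 0.

```python
def decode_mathchar(n):
    """Split a math code "cfpp into (class, family, position).

    Class: 3 bits, family: 4 bits, position: 8 bits (15 bits in all).
    The value "8000 (32768) is special: the character acts as active.
    """
    if n == 0x8000:
        return "active"
    if not 0 <= n <= 0x7FFF:
        raise ValueError(f"Bad mathchar: {n}")
    return n >> 12, (n >> 8) & 0xF, n & 0xFF


cls, fam, pos = decode_mathchar(0x0941)  # \mathcode`\a="0941
print(cls, fam, chr(pos))  # 0 9 A
```

For the example above, 0x0941 is 2369 in decimal, the value printed by \the\mathcode`\a. It decodes to class 0, family 9 and position 0x41, the code of A.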

There is a command \delimiter; it reads a number, but you cannot use it. There is a command \radical; it reads a number, then signals an error. The \mathaccent command is similar.

There are commands \raise and \lower, as well as \vcenter. The last one is not implemented in Tralics. The translation of

a\raise2cm\xbox{foo}{bar}\lower 2pt\xbox{xfoo}{xbar}



As you can see, the specification disappears. Maybe in a future version we will add an attribute to the box. You cannot use these commands in math mode in Tralics. In TeX, you can get an error of the form: You can´t use `\raise´ in vertical mode, while \vcenter is a math-only command. Currently, \indent and \noindent are ignored in math mode (in TeX, $\indent_b$ produces a kernel and an index; the kernel is an empty box of width \parindent, of type Ord).
