<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" 
 "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="José Grimm" />
<title>Converting LaTeX to MathML: the Tralics algorithms</title>
<link rel="stylesheet" href="tralics.css" />
<meta name="keywords" content="Latex, XML, HTML, MathML, Tralics" />
</head><body><h1>Converting LaTeX to MathML: the Tralics algorithms</h1><p>French title: Convertir du LaTeX en MathML : les algorithmes de Tralics</p><p> Author: José Grimm<a id="uid1" href="#note1" title="Email: Jose.Grimm@sophia.inria.fr"><small>(note: </small>&#10163;<small>)</small></a></p><p> Location: Sophia Antipolis</p><p>Inria Research Theme: THnum</p><p>Inria Research Report Number: 6373</p><p> Team: Apics</p><p> Date: November 2007</p><p>Keywords: Latex, XML, HTML, MathML, Tralics.</p><p>French keywords: Latex, XML, HTML, MathML, Tralics.</p><h2>Abstract</h2><p>This paper describes how <i>Tralics</i> converts a sequence of characters
into a sequence of tokens, into a math list, and finally into a MathML
formula. Tokenisation rules are the same as in TeX, the meaning of these
tokens is the same as in LaTeX, and can be given in packages. Math formulas
are handled in the same spirit as TeX, but construction of the MathML
result is not obvious, due to particularities of both TeX and MathML.</p>
<h2>French Abstract</h2><p>Ce document explique comment <i>Tralics</i> convertit une formule de
mathématique en objet MathML. Les mêmes règles que TeX sont appliquées pour
convertir une suite de caractères en suite de lexèmes, ces lexèmes ont la
même signification que dans LaTeX, et peuvent être définies dans des
paquetages. En mode mathématiques, ces lexèmes sont regroupés en listes,
et traitées dans le même esprit que TeX, les règles précises, décrites dans
ce document, différant considérablement à cause de certaines particularités
de TeX et de MathML.</p>
<hr /><h1>Short Table of Contents</h1><p>
<br /><b>1. <a href="#uid2">Introduction</a></b>
<br /><b>2. <a href="#uid5">General Principles</a></b>
<br /><b>3. <a href="#uid7">Debug</a></b>
<br /><b>4. <a href="#uid9">The Math Scanner</a></b>
<br /><b>5. <a href="#uid26">Notes</a></b>
<br /><b>6. <a href="#uid27">Special Hacks</a></b>
<br /><b>7. <a href="#uid28">The Code Generator</a></b>
<br /><b>8. <a href="#uid52">Adding Fences</a></b>
<br /><a href="#bibliography"><b>Bibliography</b></a></p>

<h1 id="uid2">1. Introduction</h1>
<p><i>Tralics</i> is a LaTeX to XML translator, described in <a href="#bid0" title="Grimm2006">[4]</a> and
<a href="#bid1" title="Grimm2006">[5]</a>. It is used by Inria for production of its Annual Activity
Report in HTML and Pdf. All math formulas found in the HTML pages are images
obtained by converting pieces of the XML document via TeX and tools designed
by D. Carlisle and S. Rahtz, see <a href="#bid2" title="Carlisle, Goossens, Rahtz2000">[2]</a>. The math formulas conform to
the MathML DTD <a href="#bid3" title="Carlisle, Ion, Miner, Poppelier (editors)2001">[3]</a> and can be inserted directly in an HTML
document; some browsers interpret them natively, others
require a plug-in. The header of the HTML file may be</p>
<pre class="xml-code">&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
 "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd"&gt;
</pre>
<p class="nofirst noindent">Take for instance the following equation:</p>
<div class="mathdisplay"><table width="100%" id="uid3"><tr valign="middle"><td class="leqno"></td><td><math xmlns="http://www.w3.org/1998/Math/MathML" mode="display" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><msubsup><mo>&#8747;</mo> <mn>0</mn> <mi>&#8734;</mi> </msubsup><mi>f</mi><mo>(</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo>)</mo><mi>d</mi><mi>x</mi><mo>=</mo><mrow><mo>|</mo><mi>z</mi><mo>|</mo></mrow><mspace width="3.33333pt"></mspace><mo>.</mo></mrow></math></td><td class="eqno">(1)</td></tr></table></div>
<p class="nofirst noindent">translated by <i>Tralics</i> as</p>
<pre class="xml-code">&lt;math mode='display' xmlns='http://www.w3.org/1998/Math/MathML'&gt;
  &lt;mrow&gt;
   &lt;msubsup&gt;&lt;mo&gt;&amp;#x0222B;&lt;/mo&gt;&lt;mn&gt;0&lt;/mn&gt; &lt;mi&gt;&amp;#x0221E;&lt;/mi&gt;&lt;/msubsup&gt;
   &lt;mi&gt;f&lt;/mi&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mo&gt;)&lt;/mo&gt;
   &lt;mi&gt;d&lt;/mi&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;
   &lt;mrow&gt;&lt;mo&gt;|&lt;/mo&gt;&lt;mi&gt;z&lt;/mi&gt;&lt;mo&gt;|&lt;/mo&gt;&lt;/mrow&gt;
   &lt;mspace width='3.33333pt'/&gt;&lt;mo&gt;.&lt;/mo&gt;
  &lt;/mrow&gt;
&lt;/math&gt;
</pre>
<p class="nofirst noindent">It happens that Amaya, one of the first browsers that renders MathML, shows
very large parentheses, instead of normal ones.
We shall explain in this
document how <i>Tralics</i> converts input characters into MathML code, and the
algorithm it uses in order to get a correct size for delimiters. A description
of all math commands can be found in <a href="#bid4" title="Grimm2007">[6]</a>; its HTML version<a id="uid4" href="#note2" title="Available on http://www-sop.inria.fr/apics/tralics"><small>(note: </small>&#10163;<small>)</small></a>
contains examples of all constructs.
By the way,
the previous expression was entered as follows, and you can see how we forced
<i>Tralics</i> to consider the parentheses as ordinary math objects.</p>
<pre class="latex-code"><span class="prenumber">1</span>     \[\int _0^\infty f\mathord(x+y\mathord) dx = |z|~.\]
</pre>

<h1 id="uid5">2. General Principles</h1>
<p>In general, a compiler is formed of a lexer, a parser, a semantics analyser, and a
code generator. The purpose of the parser is to build a tree, using the tokens
given by the lexer, representing the input; it is converted by the
semantics analyser into an output tree, which is converted into a sequence of
bytes by the code generator. The behaviour of TeX is special, in that tokens
are evaluated as soon as possible, and no parse tree is created: the code
generator receives lists of characters, boxes, glue and the like, typesets
them (i.e., splits paragraphs into lines, fixes glue, adds page breaks, etc.)
and converts the result into a sequence of bytes that are written in the DVI
or Pdf file. In the case of a math expression, an intermediate tree is
constructed and processed, and then converted into an ordinary list of
characters, boxes, and the like.
The main difference between <i>Tralics</i> and TeX is, of course, the code
generator, but the handling of math expressions is, in principle, the same.
There may be some subtle differences, as demonstrated by the
following example, which will be explained later:</p>
<pre class="latex-code"><span class="prenumber">2</span>     $\tracingall\sqrt\frac{1}{2}$
</pre>
<p>The lexer is the program that converts lines of text into tokens. There are
two kinds of tokens in TeX, commands and characters. For instance <samp>\[</samp> and
<samp>\int</samp> are two commands. In the case of a multi-letter control sequence, the
space that follows is discarded. The meaning of the command is defined (in an
internal table) by two integers: the command code and a subtype.
For instance the command code of <samp>\mathord</samp> means: change the type of the
item that follows in the current math list, and the subtype is the value of
the new type (here Ord). In <i>Tralics</i> <samp>\alpha</samp> and <samp>\beta</samp> have the
same command
code meaning ordinary math symbol, and the subtype is the index of the
associated XML object in a table<a id="uid6" href="#note3" title="One could imagine a command code meaning: select a Greek letter in the current math font, this is a ..."><small>(note: </small>&#10163;<small>)</small></a>.
In the case of a character, a pair of
integers is created by the lexer; the command code is replaced by the category
code of the character, and the subtype by the ASCII value of the character (in
fact, <i>Tralics</i> is not restricted to 7-bit characters;
instead of using the <samp>\texteuro</samp> command, you can use
<samp>^^^^20ac</samp> or an input encoding such as latin9 that provides
such a character). Letters have category code 11, most other characters have
category code 12. In the example above, the dollar, underscore, and hat
characters
have a special meaning, they are short-cuts for math mode. The space character
has a special category code; it is usually ignored in math mode. Finally the
tilde is an active character (behaves like a command).</p>
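<p>As a small illustration of the category codes mentioned above, the following lines ask TeX to display the category code of a few characters. This is only a sketch: the values in the comments are those of standard LaTeX, and we assume that <i>Tralics</i> starts with the same defaults and accepts <samp>\showthe</samp> in the same way.</p>
<pre class="latex-code">\showthe\catcode`\a   % 11: letter
\showthe\catcode`\+   % 12: other character
\showthe\catcode`\$   % 3: math shift
\showthe\catcode`\^   % 7: superscript
\showthe\catcode`\_   % 8: subscript
\showthe\catcode`\~   % 13: active character
</pre>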
<p>A primitive command is one defined in the Pascal (for TeX) or C++ source (for
<i>Tralics</i>), as opposed to user-defined commands that are defined in a format
file, a package, or in the document under translation. For instance, the tilde
active character is a user defined command that takes no argument, and expands
to <samp>\nobreakspace</samp> (a special rule of <i>Tralics</i> converts this into an
ordinary math space in math mode). The <samp>\frac</samp> command takes two arguments
(generally delimited by braces), its expansion is <samp>{1<samp>\over</samp>2}</samp>.
The <samp>\sqrt</samp> command expands to some primitive that puts a
square root around its argument. Because TeX uses a two-pass mechanism, the
braces that delimit the argument of the square root are provided by the
expansion of <samp>\frac</samp>. The <samp>\over</samp> command is a bit strange: its
arguments are not enclosed in braces, but the braces define a scope. The
<samp>\over</samp> command should not be used and <i>Tralics</i> handles <samp>\frac</samp> as a
primitive. This means that the second example does not work.</p>
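<p>Since <samp>\frac</samp> is a primitive in <i>Tralics</i>, it cannot supply the braces that <samp>\sqrt</samp> expects. A form that is expected to work in both TeX and <i>Tralics</i> gives each command its arguments explicitly; this is a suggested rewriting of the second example, not something taken from the <i>Tralics</i> sources.</p>
<pre class="latex-code">% the argument of \sqrt is now a braced group,
% so it no longer depends on the expansion of \frac
$\sqrt{\frac{1}{2}}$
</pre>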
<p>The `expand´ procedure is one part of the translator; its purpose is to deal
with user defined commands. It also handles conditionals. Some primitive
commands are mode-independent, for instance the commands that modify internal
quantities (category codes, equation numbers, etc). The remaining commands
(including characters) add material to the DVI file (this can be done
indirectly: you can put things in boxes, duplicate or ignore them). In the
case of <i>Tralics</i>, an XML tree is produced instead of a DVI file, and this part
of the program differs considerably from TeX. For instance, TeX splits
paragraphs into lines, but <i>Tralics</i> does not, so that commands that control the
line-breaking or page-breaking algorithm have a completely different
implementation. Of the three modes of TeX, vertical, horizontal and math, the
most particular one is math mode, and the purpose of this paper is to describe
its implementation in <i>Tralics</i>.</p>

<h1 id="uid7">3. Debug</h1>
<p>The error message produced by <i>Tralics</i> for the example on line 2 is the following.</p>
<pre class="latex-code"><span class="prenumber">3</span> Error signaled at line 1 of file tty:
<span class="prenumber">4</span> Missing { inserted before unexpected }
</pre>
<p class="nofirst noindent">This is because the expression was read as <samp>\sqrt</samp><samp>{<samp>\frac</samp>}</samp>
and <samp>\frac</samp> got a closing brace, instead of an opening one. Now, you are in
trouble, because the inserted open brace does not match a closing
brace<a id="uid8" href="#note4" title="Before version 2.10.9, the situation was worse, because Tralics removed the offending closing brace,..."><small>(note: </small>&#10163;<small>)</small></a>.
In fact you will see this</p>
<pre class="latex-code"><span class="prenumber">5</span> Error signaled at line 1 of file tty:
<span class="prenumber">6</span> Extra $ ignored while scanning argument of \sqrt.
</pre>
<p class="nofirst noindent">You can see that you are in even greater trouble, because <i>Tralics</i> will never
find the closing dollar sign if it continues like that. Notice however that
<i>Tralics</i> explains here why the dollar sign is invalid: it expects the closing
delimiter for the argument of the <samp>\sqrt</samp> command.
Assume that we have two lines of text, followed by two empty lines.</p>
<pre class="latex-code"><span class="prenumber">7</span> Error signaled at line 4 of file tty:
<span class="prenumber">8</span> Unexpected \par while scanning argument of \sqrt.
</pre>
<p class="nofirst noindent">You cannot put end-of-paragraph commands in a math formula. The effect of such
a mistake is radical: it stops parsing the expression. In our case, it
provides a closing brace. The second empty line is also read as <samp>\par</samp>, this
signals an error and provides the closing dollar sign.</p>
<p>Consider now the example on line 1, without the <samp>\mathord</samp>. When you
compile it in verbose mode, the transcript file will contain the following.</p>
<pre class="log-code"><span class="prenumber">9</span> [1] \[\int _0^\infty f(x+y) dx = |z|~. \]
<span class="prenumber">10</span> \[-&gt;$$
<span class="prenumber">11</span> {math shift character}
<span class="prenumber">12</span> +stack: level + 2 for math entered on line 1
<span class="prenumber">13</span> ~ -&gt;\nobreakspace
<span class="prenumber">14</span> \]-&gt;$$
<span class="prenumber">15</span> +stack: level - 2 for math from line 1
</pre>
<p class="nofirst noindent">The first line is printed when <samp>\tracingoutput</samp> is positive and a line is
read from a file; lines containing a command followed by an arrow and some
text are printed when <samp>\tracingmacros</samp> is positive and a command is
expanded; lines that start with a plus sign are produced when
<samp>\tracingrestores</samp> is positive and the main stack (called semantic nest by
Knuth) is modified (for instance, when a group is left, the restoration of
variables is shown); lines that are enclosed in braces are printed when
<samp>\tracingcommands</samp> is positive, and a command is evaluated: here it is the
first dollar sign.
Math commands like <samp>\int</samp> do not appear in the transcript file: outside
math mode an error is signaled, in math mode these tokens are added to the
math list, and the list is dumped as a whole, for instance</p>
<pre class="log-code"><span class="prenumber">16</span> Math: $$\int_0^\infty f(x+y) dx = |z|~. $$
</pre>
<p class="nofirst noindent">You can say <samp>\tracingall</samp>; in this case the transcript
file contains everything that is needed in order to understand why <i>Tralics</i> produces strange results, including errors. When <samp>\tracingmath</samp> is true,
the following lines will be printed by the fencing algorithm described in the
last section of this paper.</p>
<pre class="log-code"><span class="prenumber">17</span> MF: After find paren0 0b 2l 4B 6r 9R 10m 12m 15b
<span class="prenumber">18</span> MF: sublist start=0 2l 6r 9R 10m 12m 15R
<span class="prenumber">19</span> MF: Find paren2 k=9 2l 6r
<span class="prenumber">20</span> MF: Find paren1 (1, 8) 2l 6r
<span class="prenumber">21</span> MF: OK 2 6
<span class="prenumber">22</span> MF: Find paren2 k=15 10m 12m
<span class="prenumber">23</span> MF: Find paren1 (10, 14) 10m 12m
<span class="prenumber">24</span> MF: BB 10 14
</pre>

<h1 id="uid9">4. The Math Scanner</h1>
<p>The first pass of the math translator produces a math list; this is a list of
extended tokens; the additional field is generally the value of the current
math font (one of the 15 fonts defined by MathML). The algorithm may replace a
TeX token by an XML value, in this case the additional field indicates the
role of the object (relation, binary, opening delimiter, etc; this is called
the <i>kind</i> in <a href="#bid5" title="Knuth1984">[7]</a>). In the case of
a command like <samp>\frac</samp> that takes
arguments, each argument is defined by a separate math list, the additional
field is a pointer to the argument list. Each math list has a type (if the
list comes from an environment, the type encodes the name of the environment).</p>
<p>The scanner is defined by the following set of rules:</p>
<ul>
<li id="uid10"><p class="nofirst noindent">The `expand´ procedure is called; in particular user defined
commands are evaluated, as well as conditionals.</p>
</li>
<li id="uid11"><p class="nofirst noindent">Mode independent commands are evaluated as usual. Evaluation has no
effect on the current math list.</p>
</li>
<li id="uid12"><p class="nofirst noindent">Non-math commands should not appear; an error will be signaled later.</p>
</li>
<li id="uid13"><p class="nofirst noindent">In general, the current token is added to the trace; this trace may be
printed to the transcript file at the end.</p>
</li>
<li id="uid14"><p class="nofirst noindent">Special case of fonts. If the current token is <samp>\rm</samp>, <samp>\textrm</samp>,
<samp>\rmfamily</samp>, <samp>\mathrm</samp>, a math font switch token N is constructed (in
this case, it selects font number 1). If the current
command takes no argument (like <samp>\rm</samp>), it will be replaced by
N. Otherwise the argument is read, call it L; if the current font is O, then
NLO is read again (see note 1 below).</p>
</li>
<li id="uid15"><p class="nofirst noindent">The effect of a math font is to change an internal variable that holds
the current font value.</p>
</li>
<li id="uid16"><p class="nofirst noindent">Special case of a dollar when a tag is present. There is a hidden
end-of-math hook in <i>Tralics</i>, used by the <samp>\tag</samp> command. If a dollar
sign is seen and the hook is not empty, then the hook is re-inserted (as well as
the dollar sign) and cleared.</p>
</li>
<li id="uid17"><p class="nofirst noindent">Tokens <samp>\begingroup</samp> or <samp>\endgroup</samp> can be used to increase or
decrease nesting (these tokens act as group delimiters). The same effect can
be achieved with braces, but these define a math sub-list. In the case of
<samp>\left</samp> and <samp>\right</samp>, a delimiter is moreover scanned.
See note 2.</p>
</li>
<li id="uid18"><p class="nofirst noindent">If the current token is <samp>\begin</samp> or <samp>\end</samp>, this is the start or
end of an environment; it could be a user defined environment (handled like
a user defined command), or a math environment (producing an array). It is
parsed in the obvious way. Some math expressions are formed of a unique
environment (so that the end-of-math hook must sometimes be inserted).</p>
</li>
<li id="uid19"><p class="nofirst noindent">Special case of <samp>&amp;</samp>, <samp>\\</samp> or
<samp>\multicolumn</samp>. These commands can appear only in a table, either in text
mode or math mode. Special parsing rules are needed in math mode. Strange
errors may be signaled.</p>
</li>
<li id="uid20"><p class="nofirst noindent">If the current token is a dollar sign, it is the start or the end of a
new math formula. Action depends on the type of the formula to be created.
If it is a simple math formula, the dollar indicates the end of the formula;
if it is a display math formula, the next token (after expansion) is
considered, it should be a second dollar sign; this indicates the end of
the formula. If <i>Tralics</i> scans the content of a <samp>\hbox</samp> or friends, this
indicates the start of a subformula. See Note 3.</p>
</li>
<li id="uid21"><p class="nofirst noindent">There are commands that take arguments, some are optional, some are
mandatory; for instance, <samp>\sqrt</samp> takes an optional argument, <samp>\genfrac</samp>
has a strange syntax. The parser returns an array of sublists, one per
argument. See note 6.</p>
</li>
<li id="uid22"><p class="nofirst noindent">Commands like <samp>\text</samp>, <samp>\mbox</samp> and <samp>\hbox</samp> read an argument; in
the case of <samp>\hbox</samp>, the <samp>\everyhbox</samp> token list is inserted;
otherwise they behave the same. Other commands that construct boxes are
forbidden (for instance <samp>\vbox</samp>, or extensions like <samp>\xbox</samp>).</p>
</li>
<li id="uid23"><p class="nofirst noindent">Glue is normalised. This is a bit tricky; for instance, <samp>\mskip18mu</samp> is
replaced by <samp>\hspace</samp><samp>{10pt}</samp>; in general a glue value is read,
transformed into a dimension, and handled like an ordinary command with an
argument.</p>
</li>
<li id="uid24"><p class="nofirst noindent">Ordinary commands are added to the list, as well as special characters.
The difference between <samp>\texteuro</samp> and <samp>\alpha</samp> is that the euro sign
is character U+20AC, treated as an ordinary identifier in math mode, while the
alpha letter is known (a bit later in the conversion process) to be an ordinary
identifier, and <samp>\cap</samp> is a binary operator. The translation of a math
symbol could be an entity name <tt class="txt">&amp;alpha;</tt> or its value U+3B1.
The current font is ignored, although Unicode has variants for Greek
letters.</p>
</li>
<li id="uid25"><p class="nofirst noindent">Characters are added to the lists, together with the current font.</p>
</li></ul>
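<p>The glue rule can be checked by hand on a small case. The first line below restates the replacement given in the list above; the second line is an extrapolation, assuming that the conversion is linear and uses the TeX convention that 18mu make one em, with an em of 10pt.</p>
<pre class="latex-code">$a \mskip18mu b$   % scanned as: a \hspace{10pt} b (stated above)
$a \mskip9mu b$    % assumption: 9/18 of 10pt, hence a \hspace{5pt} b
</pre>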

<h1 id="uid26">5. Notes</h1>
<p><b>Note 1.</b> A lot of work on math translation was done when
T. Bouche showed interest in putting
abstracts of journals like the Annales de l'Institut Fourier on the Web using
MathML (see <a href="#bid6" title="Bouche2006">[1]</a>). We provide the no-mathml mode, in which
the translation directly matches the value of the math list. For instance,
translation of formula 1 is</p>
<pre class="xml-code"><span class="prenumber">25</span> &lt;texmath&gt;\int _0^\infty f\mathord (x+y\mathord ) dx = |z|~.&lt;/texmath&gt;
</pre>
<p class="nofirst noindent">Normally, translation of</p>
<pre class="latex-code"><span class="prenumber">26</span>  $12 ab \bf Cd \it Ef \cal Gh$
</pre>
<p class="nofirst noindent">contains</p>
<pre class="xml-code"><span class="prenumber">27</span>   &lt;mn&gt;12&lt;/mn&gt;
<span class="prenumber">28</span>   &lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;b&lt;/mi&gt;
<span class="prenumber">29</span>   &lt;mi&gt;&amp;#x1D402;&amp;#x1D41D;&lt;/mi&gt;
<span class="prenumber">30</span>   &lt;mi&gt;&amp;#x1D438;&amp;#x1D453;&lt;/mi&gt;
<span class="prenumber">31</span>   &lt;mi&gt;&amp;Gscr;&amp;hscr;&lt;/mi&gt;
</pre>
<p class="nofirst noindent">but it can be changed (via an integer variable) to</p>
<pre class="xml-code"><span class="prenumber">32</span>   &lt;mn&gt;12&lt;/mn&gt;
<span class="prenumber">33</span>   &lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;b&lt;/mi&gt;
<span class="prenumber">34</span>   &lt;mi mathvariant='bold'&gt;Cd&lt;/mi&gt;
<span class="prenumber">35</span>   &lt;mi mathvariant='italic'&gt;Ef&lt;/mi&gt;
<span class="prenumber">36</span>   &lt;mi mathvariant='script'&gt;Gh&lt;/mi&gt;
</pre>
<p class="nofirst noindent">You can also ask <i>Tralics</i> to replace the quantity <tt class="txt">&amp;Gscr;</tt> by
<tt class="txt">&amp;#x1D4A2;</tt>. The essential difference is that a browser that
has no font containing character U+1D402 will use a question mark or a black
square in the first case, and a normal C, rather than a bold one, in the
second case. When <i>Tralics</i> sees the bold C, it constructs an element formed
of all characters that follow, provided that they have category letter, and
are one of the 26 letters (upper case or lower case). Translation would be the
same with a space after C, but other commands (for instance <samp>\relax</samp> or a
font change) inhibit the mechanism. This mechanism is not applied in the case
of the default font (lines 28 and 33).</p>
<p>Instead of <samp>\bf</samp>, you should use <samp>\textbf</samp> or <samp>\mathbf</samp>. These are
commands that take arguments delimited by braces. These braces may be used by
LaTeX as group delimiters. Hence <i>Tralics</i> inserts <samp>\begingroup</samp> and
<samp>\endgroup</samp> around the argument (it is not clear why <i>Tralics</i> does not use
braces).</p>
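<p>Following this advice, the formula of line 26 could be written with commands that take arguments. This is a sketch only; it is not guaranteed that the translation is character for character the same as the one shown above.</p>
<pre class="latex-code">% each font command applies to its braced argument only
$12 ab \mathbf{Cd} \mathit{Ef} \mathcal{Gh}$
</pre>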
<p><b>Note 1bis.</b>
For the following input</p>
<pre class="latex-code"><span class="prenumber">37</span> $A \mathbf{B \mathit{C} D} E$
</pre>
<p class="nofirst noindent">we get the following trace</p>
<pre class="latex-code"><span class="prenumber">38</span> $A \mml@font@bold\begingroup B \mml@font@italic
<span class="prenumber">39</span>         \begingroup C\endgroup\mml@font@bold D\endgroup\mml@font@normal E$
</pre>
<p class="nofirst noindent">and the math list (as well as the no-mathml output) is the same, without the
grouping delimiter:</p>
<pre class="xml-code"><span class="prenumber">40</span> &lt;texmath type='inline'&gt;A \mml@font@bold B \mml@font@italic
<span class="prenumber">41</span> C\mml@font@bold  D\mml@font@normal  E&lt;/texmath&gt;
</pre>
<p>This looks a bit strange, but
you can change the definition of either <samp>\mathbf</samp> or <samp>\mml@font@bold</samp>
(let us denote this token by <samp>\MFB</samp> for simplicity).
In note 4 below, we explain the simplest method: redefine <samp>\mathbf</samp> as a
command that takes no argument and prints as <samp>\mathbf</samp>. In this case, the
braces are considered as group delimiters, and interpretation of the formula
can change if, by default, they are not considered as group delimiters.
We can redefine <samp>\MFB</samp> so that it prints <samp>\bf</samp>. We must take care to
insert a space, since otherwise you will see <samp>\bfB</samp>. The next difficulty is
that a font switch token has to be inserted between C and D; in order for this
to be bold, we need first that <samp>\MFB</samp> changes the value, and that the
change induced by switching to italics is restored. This implies that we must
take care when redefining <samp>\MFB</samp>, and <i>Tralics</i> must take care of grouping.
Here is a solution:</p>
<pre class="latex-code"><span class="prenumber">42</span> \makeatletter
<span class="prenumber">43</span> \def\mml@font@bold{\@curmathfont=2\string\bf\space}
<span class="prenumber">44</span> \def\mml@font@italic{\@curmathfont=3\string\it\space}
<span class="prenumber">45</span> \def\mml@font@normal{\@curmathfont=0\string\normalfont\space}
<span class="prenumber">46</span> $A \mathbf{B \mathit{C} D} E$
</pre>
<p class="nofirst noindent">Translation is</p>
<pre class="xml-code"><span class="prenumber">47</span> &lt;texmath type='inline'&gt;A \bf B \it C\bf  D\normalfont  E&lt;/texmath&gt;
</pre>
<p><b>Note 2.</b>
This example demonstrates grouping:</p>
<pre class="latex-code"><span class="prenumber">48</span> \def\foo{\A\def\A{0}\A}\def\A{1}
<span class="prenumber">49</span> ${\foo}={x\foo}=\left[\foo\right]=\begingroup\foo\endgroup\foo=\foo$
</pre>
<p class="nofirst noindent">You should see

<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mn>10</mn><mo>=</mo><mrow><mi>x</mi><mn>10</mn></mrow><mo>=</mo><mfenced open="[" close="]"><mn>10</mn></mfenced><mo>=</mo><mn>1010</mn><mo>=</mo><mn>00</mn></mrow></math></span>.
The MathML formula contains chunks separated by equals signs; the first chunk is
the integer 10; the second chunk is a <tt class="txt">&lt;mrow&gt;</tt> that contains x and 10 (the
value of <samp>\A</samp> is restored, and a math list is converted into an <tt class="txt">&lt;mrow&gt;</tt>);
then comes a <tt class="txt">&lt;mfenced&gt;</tt> element
containing 10, it is followed by 1010, and then 00. The <samp>\endgroup</samp> token
restores the value of <samp>\A</samp>, but does not create a sublist, and there is
nothing
that inhibits the conversion of the four digits into a number. After
<samp>\left</samp> or <samp>\right</samp>, a delimiter is needed. This means that the next
token is fully expanded, and <samp>\relax</samp> tokens are discarded, and one of the
known delimiters must be found.</p>
<p><b>Note 3.</b>
In plain TeX, a math formula always starts and ends with a dollar
sign. In LaTeX, you can use the construction <samp>\(</samp>...<samp>\)</samp> for inline
math, and <samp>\[</samp>...<samp>\]</samp> for display math; in these cases, we have commands
that expand to dollar signs. LaTeX provides also <samp>math</samp> and
<samp>displaymath</samp> environments, but these are less used. There are more
complex environments like <samp>equation</samp>, <samp>align</samp>, etc.
Outside math mode, if a dollar sign is seen, math mode is entered. The next token
is read, without expansion. If this token is a dollar (in fact, has category
3), display math will be entered. Otherwise, the token is re-read, unless it is
<samp>\relax</samp>. In any case, when math mode is entered, the current mode will be
normal math, or display math, and one of the token lists <samp>\everymath</samp> or
<samp>\everydisplay</samp> will be inserted.</p>
<p>In the case of a display math formula, the equivalent of <samp>\par</samp> is inserted
before and after the math formula (this has no consequence on the parsing).
The result of the translation is a <tt class="txt">&lt;math&gt;</tt> element in a <tt class="txt">&lt;formula&gt;</tt>
element; both elements hold an attribute that tells if the mode is inline or
display. In display mode, a label is allowed, producing an attribute of the
<tt class="txt">&lt;formula&gt;</tt>.</p>
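<p>The constructions listed in this note can be summarised as follows; the formula <samp>x+y</samp> is only a placeholder.</p>
<pre class="latex-code">$x+y$                                      % inline, plain TeX
\(x+y\)                                    % inline, expands to dollar signs
\begin{math}x+y\end{math}                  % inline, environment
$$x+y$$                                    % display, plain TeX
\[x+y\]                                    % display, expands to $$ ... $$
\begin{displaymath}x+y\end{displaymath}    % display, environment
</pre>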
<p><b>Note 4.</b>
The command <samp>\ensuremath</samp> takes an argument and evaluates it in math mode.
It is coded as follows: in math mode, the expansion is the argument; outside
math mode, the expansion is dollar, argument, dollar. However, if the argument is
empty, this is considered as a double dollar, hence starts display math. For
this reason, a <samp>\relax</samp> token is added in front. As explained above, this
token is discarded by <i>Tralics</i>. In a case like this</p>
<pre class="latex-code"><span class="prenumber">50</span> \everymath{\let\bf\relax}
<span class="prenumber">51</span> $\bf x$
</pre>
<p class="nofirst noindent">the <samp>\bf</samp> token is not discarded. Since its meaning is <samp>\relax</samp>, it will
be ignored. Consider the following variant</p>
<pre class="latex-code"><span class="prenumber">52</span> \everymath{\let\textit\relax}
<span class="prenumber">53</span> $ x= \textit{y} +z$
</pre>
<p class="nofirst noindent">If you compile in no-mathml mode, these tokens, whose meaning is <samp>\relax</samp>,
will show in the result, and you get:</p>
<pre class="latex-code"><span class="prenumber">54</span> &lt;texmath type='inline'&gt;\bf x&lt;/texmath&gt;
<span class="prenumber">55</span> &lt;texmath type='inline'&gt; x= \textit {y} +z&lt;/texmath&gt;
</pre>
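<p>The behaviour of <samp>\ensuremath</samp> described at the start of this note can be sketched as a macro. The name <samp>\myensuremath</samp> is hypothetical and the definition is deliberately simplified; it is not the actual LaTeX or <i>Tralics</i> code.</p>
<pre class="latex-code">% in math mode: just the argument;
% outside math mode: dollar, \relax, argument, dollar
\def\myensuremath#1{\ifmmode #1\else $\relax #1$\fi}
\myensuremath{x^2}      % outside math: behaves like $\relax x^2$
$\myensuremath{x^2}$    % inside math: behaves like $x^2$
\myensuremath{}         % gives $\relax$, not the start of display math
</pre>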
<p><b>Note 5.</b>
If you add braces around a sequence of tokens this produces a math
list; after translation elements are enclosed in a <tt class="txt">&lt;mrow&gt;</tt> element. This
produces a scope that controls the size of delimiters inside it. This element
is not added if the list has a single element, except in some special cases:
In a case like
<samp>$<samp>\displaystyle</samp><samp>{<samp>\sum</samp>}</samp>'$</samp>,
the <tt class="txt">&lt;mrow&gt;</tt> is required if you
want the prime to be on the right of the sum, rather than on the top. In the
case of <samp>${x_2}_3$</samp>, an error occurs if there is no <tt class="txt">&lt;mrow&gt;</tt> and
we try to convert the XML into Pdf. Note that TeX has a strange exception; if
the math list is a single Acc atom, the atom itself is appended to the
list. This means that, in a case like <samp>${{<samp>\bar</samp> x}_2}_3$</samp>,
the outer braces are ignored (adding scripts does not change the type) and a
double subscript error is signaled; this behaviour is not implemented in
<i>Tralics</i>.</p>
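<p>The double subscript case can be tried directly. The comment on the first line describes standard TeX behaviour; the comment on the second line restates what this note says about braced lists.</p>
<pre class="latex-code">$x_2_3$      % TeX signals: Double subscript
${x_2}_3$    % accepted: the braced list x_2 is the kernel of the subscript 3
</pre>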
<p><b>Note 6.</b>
Here is the list of all functions that have a special syntax.</p>
<pre class="latex-code"><span class="prenumber">56</span> $\genfrac (){0pt}3{foo}{bar}$
<span class="prenumber">57</span> $\sqrt{x}, \sqrt[x]{y}$
<span class="prenumber">58</span> $\xleftarrow{u} \xleftarrow[v]{u}\xrightarrow{u} \xrightarrow[v]{u}$
<span class="prenumber">59</span> $\smash{x} \smash[b]{x} \cfrac{x}{y} \cfrac[r]{x}{y}$
<span class="prenumber">60</span> $\operatorname{sin}\operatorname*{cos}$
<span class="prenumber">61</span> $\hat\relax y \sqrt\relax{x}$
<span class="prenumber">62</span> $\mathchoice \relax\relax {1}\relax{2}{3}{4}$
<span class="prenumber">63</span> % $x\relax_y\relax^z$
<span class="prenumber">64</span> $\mathmi{x} \mathmi[a][b]{x} \mathbox[a][b]{foo}{x=y}$
</pre>
<p class="nofirst noindent">The last line shows extensions provided by <i>Tralics</i>. If an odd number of
optional arguments is given, the last one is ignored. In all these cases, the
braces act as a group; an optional <samp>\relax</samp> is ignored before the arguments of
<samp>\mathchoice</samp>, <samp>\root</samp>, and commands that produce an accent. The line
that is commented out is parsed normally; the <samp>\relax</samp> tokens are removed
later on, when <i>Tralics</i> attaches the script to the kernel. These rules
concerning <samp>\relax</samp> are strange, but implemented as in TeX (the start of
Chapter 26 of <a href="#bid5" title="Knuth1984">[7]</a> shows that a `filler´, defined in the middle of
Chapter 24, is allowed before an open brace).</p>

<h1 id="uid27">6. Special Hacks</h1>
<p>There is a special counter, associated with the nomathml option of the program.
When this option is in effect, the translation of the math formula is the internal list converted
into a string in a straightforward manner. For instance</p>
<pre class="latex-code"><span class="prenumber">65</span> \newcommand\abs[1]{|#1|}
<span class="prenumber">66</span> \[\int _0^\infty f(\frac xy) dx = \abs{z}~. \]
</pre>
<p class="nofirst noindent">translates as</p>
<pre class="xml-code"><span class="prenumber">67</span> &lt;texmath type='display'&gt;\int _0^\infty f(\frac{x}{y}) dx = |z|~. &lt;/texmath&gt;
</pre>
<p class="nofirst noindent">In what follows, we shall assume that the counter is zero. There is a second
counter, associated with the trivialmath option; it is described in detail in
the main <i>Tralics</i> documentation, and its purpose is to convert math formulas
into non-math, in the following cases:</p>
<pre class="latex-code"><span class="prenumber">68</span> a single letter $x$, a single number $123$
<span class="prenumber">69</span> a greek letter $\alpha$, a superscript or a subscript with text
<span class="prenumber">70</span> only as $_{\bf foo}$, or numbers with special
<span class="prenumber">71</span> exponents, for instance $2^{nd}$, $3^{i\grave{e}me}$.
</pre>
<p class="nofirst noindent">This translates as</p>
<pre class="xml-code"><span class="prenumber">72</span> &lt;p&gt;a single letter
<span class="prenumber">73</span> &lt;formula type='inline'&gt;&lt;simplemath&gt;x&lt;/simplemath&gt;&lt;/formula&gt;,
<span class="prenumber">74</span> a single number 123
<span class="prenumber">75</span> a greek letter &amp;alpha;, a superscript or a subscript with text
<span class="prenumber">76</span> only as &lt;hi rend='sub'&gt;&lt;hi rend='bold'&gt;foo&lt;/hi&gt;&lt;/hi&gt;, or numbers with special
<span class="prenumber">77</span> exponents, for instance 2&lt;hi rend='sup'&gt;nd&lt;/hi&gt;, 3&lt;hi rend='sup'&gt;e&lt;/hi&gt;.
</pre>
<p>In what follows, we consider only normal formulas. We have some difficulties
with labels, tags and references. Consider for instance</p>
<pre class="latex-code"><span class="prenumber">78</span> \[ \tag{foo} x=y \label{a} \]  and $b=\eqref{a}$
</pre>
<p class="nofirst noindent">Translation of the first formula is</p>
<pre class="xml-code"><span class="prenumber">79</span> &lt;formula id='uid1' type='display'&gt;
<span class="prenumber">80</span>   &lt;math mode='display' xmlns='http://www.w3.org/1998/Math/MathML'&gt;
<span class="prenumber">81</span>      &lt;mrow&gt;
<span class="prenumber">82</span>        &lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mi&gt;y&lt;/mi&gt;
<span class="prenumber">83</span>        &lt;mspace width='2.em'/&gt;
<span class="prenumber">84</span>        &lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt; foo &lt;/mi&gt;&lt;mo&gt;)&lt;/mo&gt;
<span class="prenumber">85</span>     &lt;/mrow&gt;
<span class="prenumber">86</span>   &lt;/math&gt;
<span class="prenumber">87</span> &lt;/formula&gt;
</pre>
<p class="nofirst noindent">The first important point is that the <samp>\label</samp> command adds an ID to the
<tt class="txt">&lt;formula&gt;</tt> element, not to the math element or one of its sub-elements. As
a consequence, it is impossible to use more than one label. Consider now the
translation of the tag; MathML provides a strange method for equation
numbering, that is badly interpreted by Firefox, and maybe other browsers. For
this reason, it is not used by <i>Tralics</i>. There is a mechanism (described in
full in the <i>Tralics</i> documentation) that allows you to change the behaviour
of the <samp>\tag</samp> command. The default action is to put it at the end of the
formula, with a bit of space and parentheses; multiple tags are merged, an
optional star removes parentheses. It is possible to put the tag as an
attribute to the formula, rather than putting it in the math. Note that a
non-italic font will be used for the tag.</p>
<p>Translation of the second formula is</p>
<pre class="xml-code"><span class="prenumber">88</span> &lt;formula type='inline'&gt;
<span class="prenumber">89</span>    &lt;math xmlns='http://www.w3.org/1998/Math/MathML'&gt;
<span class="prenumber">90</span>      &lt;mrow&gt;
<span class="prenumber">91</span>         &lt;mi&gt;b&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mref target='uid1'/&gt;&lt;mo&gt;)&lt;/mo&gt;
<span class="prenumber">92</span>      &lt;/mrow&gt;
<span class="prenumber">93</span>    &lt;/math&gt;
<span class="prenumber">94</span> &lt;/formula&gt;
</pre>
<p class="nofirst noindent">This was an error in previous versions of <i>Tralics</i>. The same error was produced
when putting the reference into a <samp>\mbox</samp>. The main reason is that MathML
defines no <tt class="txt">&lt;mref&gt;</tt> element. In LaTeX, the reference is expected to print
`foo´: the <samp>\label</samp> command remembers the value of the <samp>\tag</samp>.
It can be placed after the <samp>\tag</samp> command; this works since amsmath
analyses the formula twice. The same method could be used by <i>Tralics</i>, since
the label is interpreted (converted into an ID attribute) after translation of
the formula. In our opinion, the correct method is to put the tag as an
attribute to the formula, and this value should be used by the renderer for
both formulas.</p>
<p>Consider now the following</p>
<pre class="latex-code"><span class="prenumber">95</span> \def\Big#1{{\hbox{$\left#1\vbox to11.5\p@{}\right.\n@space$}}}
<span class="prenumber">96</span> $\Big($
</pre>
<p class="nofirst noindent">This definition is taken from plain.tex; we shall discuss its use
later. Let´s ignore the last token, whose purpose is to make sure no
additional horizontal space is added by this complicated construction.
TeX creates a math formula; inside this
formula, there is a <samp>\hbox</samp> that can contain any construction that can be
inserted in a line of a paragraph (for instance an image); here it is a math
formula; the math formula contains a <samp>\vbox</samp> (vertical stacking of items,
normally space or horizontal boxes); here the box is empty, its width is zero,
and its height is explicit. It is not possible in MathML to insert random
elements in a math formula. Essentially, <i>Tralics</i> handles line 97
exactly as line 98:</p>
<pre class="latex-code"><span class="prenumber">97</span> $A\mbox{B$C$D F}E$
<span class="prenumber">98</span> $A\text{B}C\text{D}\space\text{F}E$
</pre>
<p class="nofirst noindent">We explained before that <samp>\hbox</samp>, <samp>\mbox</samp> and <samp>\text</samp> were parsed
alike, with the exception that <samp>\hbox</samp> inserts the <samp>\everyhbox</samp> token
list. Here, in line 98, we assume that the argument of <samp>\text</samp> contains
only characters and maybe font changes; it will be translated into a <tt class="txt">&lt;mtext&gt;</tt>
element. Since MathML provides no possibility of vertical stacking, the
<samp>\vbox</samp> command produces an error. Note that <samp>\hbox</samp> and <samp>\vbox</samp> have
special parsing rules in TeX; these are not implemented in math mode, so do
not try <samp>\hbox to 2cm</samp>.</p>

<h1 id="uid28">7. The Code Generator</h1>
<p>Let´s start with the procedure that converts a table, because it is easy to
explain. Consider the following example:</p>
<pre class="latex-code"><span class="prenumber">99</span> \begin{align}
<span class="prenumber">100</span> \formulaattribute{tag}{8-2-3}
<span class="prenumber">101</span> \thismathattribute{background}{white}
<span class="prenumber">102</span> \rowattribute{mathvariant}{bold} x^2 + y^2+100 &amp;=  z^2 \\
<span class="prenumber">103</span> \multicolumn{1}{l}{\text{and}}\\
<span class="prenumber">104</span> \cellattribute{columnalign}{left}  x^3 + y^3+1 &amp;&lt;  z^3
<span class="prenumber">105</span> \end{align}
</pre>
<p class="nofirst noindent">In some cases, the environment starts with an optional argument that specifies
vertical alignment; it is followed by a mandatory argument that specifies
horizontal cell alignment (non-characters are forbidden, characters other than
r, l, c are ignored); for some environments like align, this argument must not
be given (alignment here is rl repeated 5 times). The second argument of
<samp>\multicolumn</samp> must be one of r, l or c; the effect is to add an attribute
to the current cell. The first argument of <samp>\multicolumn</samp> must be an integer
<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mi xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML">k</mi></math></span>, if the current cell has number <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mi xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML">n</mi></math></span>, the next one has number <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>+</mo><mi>k</mi></mrow></math></span>, and
this number is used for determining its alignment; the number <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mi xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML">k</mi></math></span> is added as
an attribute to the cell. The last argument of <samp>\multicolumn</samp> is the
content of the cell.</p>
<p>Translation of the environment is in general a <tt class="txt">&lt;mtable&gt;</tt>, but
fences may be added (for instance in the case of a matrix). Sometimes a
displaystyle attribute is added to the table (for instance in the case of the
`<samp>align</samp>´ environment). In this case the current style is changed to
displaystyle. In the case of the `<samp>multline</samp>´ environment, the first and
last cells have special alignment. If the array is terminated by a
double-backslash, this produces a row with an empty cell. Hence: if the last
cell of the array is empty it will be discarded, and if the last row is empty,
it will be discarded too.</p>
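<p>The rule about a trailing double-backslash can be illustrated by the following pair of formulas. This is a sketch under the description above: the formulas are ours, and we have not reproduced the program output.</p>
<pre class="latex-code">$\begin{array}{cc} a &amp; b \\ c &amp; d \end{array}$
$\begin{array}{cc} a &amp; b \\ c &amp; d \\ \end{array}$
</pre>
<p class="nofirst noindent">In the second formula, the final double-backslash starts a third row that contains one empty cell. The cell is discarded, the row is then empty and is discarded too, so that both formulas should yield a table with two rows.</p>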
<p>The example above contains some commands with name terminated by
`attribute´. The purpose is to add an attribute to (in order) the current
formula, the current math element, the current row, and the current cell. In
the example, the first cell in the last row has two attributes: its alignment
is right (as defined by the table header) and left (as defined by the
command). These attributes are set in the correct order, so that final
alignment is left. The content of a cell is converted, as any other math list,
according to the algorithm described below.</p>
<p>The code generator takes a sequence of tokens; it converts each token into a
MathML object. In a second step, some special action is done if <samp>\Big</samp>
commands are seen. In a third step, scripts are attached to kernels; in a
fourth step, some <tt class="txt">&lt;mrow&gt;</tt> elements are added in order to get a correct size
for delimiters. Finally, the resulting list is converted into an XML element.
Let´s start with this final step: normally all elements of the list are put in
a <tt class="txt">&lt;mrow&gt;</tt> element (see note 5). If an explicit style command is seen, a
<tt class="txt">&lt;mstyle&gt;</tt> element will be added.</p>
<p>The first step is defined by the following rules:</p>
<ul>
<li id="uid29"><p class="nofirst noindent">Space is ignored.</p>
</li>
<li id="uid30"><p class="nofirst noindent">Commands like <samp>\hspace</samp> read a value and produce math space.</p>
</li>
<li id="uid31"><p class="nofirst noindent">Commands like <samp>\mbox</samp>, <samp>\ref</samp>, are processed as expected (see
above). They may produce <tt class="txt">&lt;mtext&gt;</tt> or math space elements.</p>
</li>
<li id="uid32"><p class="nofirst noindent">Commands like <samp>\textstyle</samp> change the current style.</p>
</li>
<li id="uid33"><p class="nofirst noindent">Command <samp>\nonscript</samp> is discarded in non-script style.</p>
</li>
<li id="uid34"><p class="nofirst noindent">Commands <samp>\mathbin</samp> and friends are interpreted (see below).</p>
</li>
<li id="uid35"><p class="nofirst noindent">Normally, a sequence of digits is converted into a <tt class="txt">&lt;mn&gt;</tt> element,
and a sequence of characters into a <tt class="txt">&lt;mi&gt;</tt> element (see Note 1 above).</p>
</li>
<li id="uid36"><p class="nofirst noindent">Hat and underscore, <samp>\right</samp>, etc, are left unchanged.
Note that the token or token list that follows hat or underscore is processed
in a smaller style.</p>
</li>
<li id="uid37"><p class="nofirst noindent">Constants like <samp>\alpha</samp> are replaced by values found in tables.</p>
</li>
<li id="uid38"><p class="nofirst noindent">Lists are interpreted (recursion). In the case of <samp>\left</samp>,
<samp>\right</samp>, delimiters are added. Arrays are handled here.</p>
</li>
<li id="uid39"><p class="nofirst noindent">All remaining tokens should be commands that take arguments. These
arguments are processed, and something is done to them, but there are
subtleties.</p>
</li>
<li id="uid40"><p class="nofirst noindent">In the case of <samp>\mathchoice</samp>, only one of the 4 arguments is used.</p>
</li>
<li id="uid41"><p class="nofirst noindent">In the case of <samp>\operatorname*</samp>, the argument should be a character
string, and the result is a <tt class="txt">&lt;mo&gt;</tt>, classified as operator with nolimits
or displaylimits. Same idea for <samp>\qopname</samp>.</p>
</li>
<li id="uid42"><p class="nofirst noindent">Commands like <samp>\cellattribute</samp> read their arguments, and install an
attribute pair if possible, see example above.</p>
</li>
<li id="uid43"><p class="nofirst noindent">Commands like <samp>\mathmi</samp>, <samp>\boxed</samp>, <samp>\smash</samp>, <samp>\phantom</samp>
are handled.</p>
</li>
<li id="uid44"><p class="nofirst noindent">Commands that generate fractions are considered here. There are many
variants, which may change the style of the numerator, its horizontal alignment,
the width of the rule, and delimiters may be added. Commands like
<samp>\xleftarrow</samp> behave like a fraction, with rule replaced by an arrow.
Commands that add accents are handled here. The result can be any XML object,
in some cases it is flagged as a math operator, in some cases as a big object.</p>
</li></ul>
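<p>The rules of the first step can be read against a single formula. The example below is ours; each comment names the rule that, according to the list above, applies to the token on that line. It is an annotated sketch, not a trace produced by the program.</p>
<pre class="latex-code">$\textstyle               % style command: changes the current style
 12                       % sequence of digits: one &lt;mn&gt; element
 x^{2}                    % x gives &lt;mi&gt;; the hat is left unchanged, and
                          %   its argument is processed in a smaller style
 \hspace{1cm}             % reads a value, produces math space
 \mbox{if }               % produces a &lt;mtext&gt; element
 \alpha                   % constant: value found in a table
 \mathchoice{A}{B}{C}{D}  % only one of the four arguments is used
$
</pre>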
<p>We consider now phase three. The current math list contains some XML
elements, and some unprocessed TeX tokens, that are evaluated now. We
consider three objects K, E and I (kernel, exponent, index), the idea is to
add exponent and index to the kernel; there is a state variable, that can be
looking for K or found K. A type T, an integer S, are also considered.</p>
<ul>
<li id="uid45"><p class="nofirst noindent">If underscore or hat is seen while looking for K, an empty group
<tt class="txt">&lt;mrow/&gt;</tt> is used for K, and K is found.</p>
</li>
<li id="uid46"><p class="nofirst noindent">If <samp>\nonscript</samp> is seen, the token that follows is discarded,
provided it is an XML element of type space.</p>
</li>
<li id="uid47"><p class="nofirst noindent">If a command of type <samp>\mathop</samp> is found while looking for K,
this defines the type T.</p>
</li>
<li id="uid48"><p class="nofirst noindent">If another command is found when looking for a kernel, this is an error.
Otherwise, we have our kernel. If T is not defined by the previous rule, then
T is the type of K.</p>
</li>
<li id="uid49"><p class="nofirst noindent">Consider now the case where we have T. If the current token is
<samp>\limits</samp> or friends, this changes the behaviour of limits placement.
If the command is <samp>\displaylimits</samp>, it will be replaced by <samp>\limits</samp> or
<samp>\nolimits</samp> depending on whether or not current mode is display. This
defines integer S.</p>
</li>
<li id="uid50"><p class="nofirst noindent">If the command is underscore or hat, then the object that follows
becomes I or E. It is an error if the command is the last token in the list,
if the object that follows is hat or underscore, if it is not an XML object, or
if the corresponding script is already given.</p>
</li>
<li id="uid51"><p class="nofirst noindent">If the previous rule does not apply (maybe because we are at the end of
the token list), scripts are attached to the kernel, and parsing continues
in mode looking for K.</p>
</li></ul>
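<p>The following formulas exercise the rules of phase three one by one. They are ours, and the last three are erroneous on purpose; the comments restate the rule that applies, under our reading of the list above.</p>
<pre class="latex-code">$_a$                     % no kernel seen: an empty &lt;mrow/&gt; is used for K
$x^2_i$                  % K is x, E is 2, I is i
$\sum\limits_i x_i$      % \limits defines S before the index is read
$\sum\displaylimits_i$   % non-display mode: handled as \nolimits
$x_a_b$                  % error: the index is already given
$x^_a$                   % error: hat followed by underscore
$x^$                     % error: hat is the last token of the list
</pre>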
<p>Attaching an index is trivial: the result is a <tt class="txt">&lt;msub&gt;</tt> element with two
children K and I. We have four cases to consider, because both I and E, or
none, or only one of them can be present. Now, MathML provides <tt class="txt">&lt;munder&gt;</tt>; this
is the construct to be used if the index should be placed under the kernel
(for instance a sum in display mode). The <tt class="txt">&lt;msub&gt;</tt> operator is the correct
one if S says nolimits. It is also correct if T says nolimits (or
displaylimits in non-display mode). Otherwise, the other operator must be used.
If we use <tt class="txt">&lt;munder&gt;</tt> for a sum
in non-display mode, this is incorrect, because MathML assumes by default that
the operator has displaylimits as property (i.e., the movablelimits attribute
is true). In some cases, <i>Tralics</i> sets it correctly to false, and sometimes
it fails. If anything is attached to the kernel K, its type becomes big,
otherwise it remains T.</p>
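<p>The choice between the two constructs can be summarised on a few formulas of our own. The comments follow the rules stated in the previous paragraph; they are a sketch of the expected result, not program output.</p>
<pre class="latex-code">$x_i$                   % ordinary kernel: &lt;msub&gt;
\[ \sum_{i} a_i \]      % sum in display mode: &lt;munder&gt;
$\sum_{i} a_i$          % displaylimits in non-display mode: &lt;msub&gt;
$\sum\nolimits_{i} a_i$ % S says nolimits: &lt;msub&gt;
\[ \int_{0} f \]        % T says nolimits: &lt;msub&gt;, even in display mode
$\sum\limits_{i} a_i$   % &lt;munder&gt;; movablelimits should be set to false
</pre>
<p class="nofirst noindent">When an exponent is present instead of, or in addition to, the index, the standard MathML counterparts are <tt class="txt">&lt;msup&gt;</tt> and <tt class="txt">&lt;msubsup&gt;</tt> on the one hand, <tt class="txt">&lt;mover&gt;</tt> and <tt class="txt">&lt;munderover&gt;</tt> on the other hand. That the same selection rule applies to them is our reading of the four cases mentioned above.</p>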
<p>Phase four of the algorithm is assumed to insert some fences (<tt class="txt">&lt;mrow&gt;</tt>
elements) in order
to control the size of delimiters like parentheses; it uses the fact that some
elements are big, either a fraction, an array, an element with an
accent, or a kernel with a script; it also uses the fact that some
objects are of type variable-size. In a case like <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><msup xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mn>2</mn><mi>x</mi><mo>)</mo></mrow> <mrow><mo>&#8211;</mo><mn>1</mn></mrow> </msup></math></span>, we have a
variable size opening parenthesis, two small objects and a big one, the
closing parenthesis with its superscript. This must be considered a closing
delimiter (if we want to pair it) and a big object (our algorithm does nothing
if nothing is big). Adding fences means to convert the formula into one of
the following variants</p>
<pre class="latex-code"><span class="prenumber">106</span> $ { ( 2x ) ^{-1} }$
<span class="prenumber">107</span> $ \left( 2x \right) ^{-1}  $
<span class="prenumber">108</span> $ { ( 2x ) } ^{-1} $
</pre>
<p class="nofirst noindent">The first variant is not the right one; the other two variants are equivalent, and
the last one is chosen. Note that the exponent must be attached to the object
produced by the group (either <tt class="txt">&lt;mfenced&gt;</tt> or <tt class="txt">&lt;mrow&gt;</tt>). For this reason,
if the kernel K is a variable-size delimiter and has scripts, we add to the
resulting list the kernel K and a special marker before K-with-scripts. At the
end, the list is considered again; whenever we have a special marker preceded by L and
followed by K-with-scripts, we replace the kernel of K-with-scripts by L, and
discard L and the special marker. In the case of the expression above, the
algorithm constructs a group, and L is this group.</p>
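<p>The evolution of the list for the expression above can be pictured as follows. The notation is ours (MARKER stands for the special marker, items are separated by spaces); it is a schematic view of the description, not the internal representation used by the program.</p>
<pre class="latex-code">$(2x)^{-1}$
% after phase three:   (   2   x   )   MARKER   )^{-1}
% after adding fences: L={(2x)}        MARKER   )^{-1}
% after merging:       L^{-1}
</pre>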

<h1 id="uid52">8. Adding Fences</h1>
<p>Let´s consider the following math expression</p>
<div class="mathdisplay"><math xmlns="http://www.w3.org/1998/Math/MathML" mode="display" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><msubsup><mo>&#8747;</mo> <mn>0</mn> <mi>&#8734;</mi> </msubsup><mi>f</mi><mrow><mo>(</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo>)</mo></mrow><mi>d</mi><mi>x</mi><mo>=</mo><mrow><mo>|</mo><mi>z</mi><mo>|</mo></mrow><mspace width="3.33333pt"></mspace><mo>.</mo></mrow></math></div>
<p class="nofirst noindent">This is the same expression as shown in the introduction, without the
<samp>\mathord</samp>. It is typeset by TeX in exactly the same way, but interpreted
differently by <i>Tralics</i>. The reason is that the expression contains four
stretchy operators, two parentheses and two vertical bars. There is no problem
with operators that stretch horizontally (for instance text over an arrow or
arrow over text), but something must be done with operators that stretch
vertically; these are known to <i>Tralics</i> because they are of type Open, Close
or Between (the command <samp>\mathbetween</samp> has been added to <i>Tralics</i>; it says
that the element that follows should behave like a vertical bar).</p>
<p>In TeX, there is only one way to get a variable-size object: it must be
a delimiter used in a left-right scope. The same idea is used in MathML.
In TeX, a delimiter is a character that has a <samp>\delcode</samp>, a property that
explains how to get larger versions of the object; in MathML, it is an operator
that has the stretchy property and can stretch. When you say <samp>\left[</samp>
followed by some material and <samp>\right]</samp>, this defines a scope with two
delimiters, and the height and depth of the delimiters are the height and depth
of everything else in the scope. The <samp>\middle</samp> command takes a delimiter as
argument, and can be used in a left-right group.</p>
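<p>As an illustration (the formula is ours), a left-right scope with a middle delimiter looks as follows; in TeX, the bracket pair and the vertical bar all take their size from the fraction.</p>
<pre class="latex-code">\[ \left[ \frac{a}{b} \middle| c \right] \]
</pre>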
<p>In MathML there
is no difference between a <tt class="txt">&lt;mfenced&gt;</tt> element containing A, bar, B, with
parentheses as fences, and a <tt class="txt">&lt;mrow&gt;</tt> element that contains an open
parenthesis, A, bar, B and the second parenthesis<a id="uid53" href="#note5" title="The difference can be in the presence of a `fence´ attribute, that has, according to the MathML reco..."><small>(note: </small>&#10163;<small>)</small></a>.
All three delimiters do
stretch. Moreover, parentheses do stretch even if they are not the first or
last item in the list. For some browsers, the size is not the same when the
delimiter is the first or second item in the list. Currently, <i>Tralics</i> ignores the fact that some binary operators (slash for instance) can be
stretchy.</p>
<p>Our algorithm handles left-right pairs in step one; this never causes problems.
The translation of <samp>\Big[</samp> could be a left bracket that stretches
at least and at most 120 percent of the size of the character: this
possibility does not exist in TeX, and may be implemented in a future
version of <i>Tralics</i>.
The plain TeX definition of <samp>\Big</samp> is</p>
<pre class="latex-code"><span class="prenumber">109</span> \def\Big#1{{\hbox{$\left#1\vbox to11.5\p@{}\right.\n@space$}}}
</pre>
<p class="nofirst noindent">This code assumes that a ten point font is used. The amsmath package assumes
on the other hand that <samp>\big@size</samp> contains the size of a normal delimiter,
and scales it like this</p>
<pre class="latex-code"><span class="prenumber">110</span> \renewcommand{\Big}{\bBigg@{1.5}}
<span class="prenumber">111</span> \def\bBigg@#1#2{%
<span class="prenumber">112</span>   {\@mathmeasure\z@{\nulldelimiterspace\z@}%
<span class="prenumber">113</span>      {\left#2\vcenter to#1\big@size{}\right.}%
<span class="prenumber">114</span>    \box\z@}}
</pre>
<p class="nofirst noindent">Note that <samp>\vcenter</samp> is used instead of <samp>\vbox</samp>, but the definition is
otherwise the same: we have a group, a box inside the group, and a math
formula inside the box. A simpler definition could be</p>
<pre class="latex-code"><span class="prenumber">115</span> \def\Big#1{\left#1\vbox to11.5\p@{}\right.}
</pre>
<p class="nofirst noindent">It is not possible to put a v-box in a MathML formula, but a
phantom could be used instead.</p>
<p>The current algorithm (step two) is the following. When <i>Tralics</i> sees <samp>\big</samp>
and friends, followed by a token T, it ignores the prefix, unless
<samp>\left T</samp> is valid. The result will be of type left, right or middle,
in case of <samp>\bigl</samp>, <samp>\bigr</samp> or <samp>\bigm</samp>. It will be of type left or
right if T is an opening or closing delimiter. It will be of type middle
otherwise. There is no difference between <samp>\big</samp>, <samp>\Big</samp>, <samp>\bigg</samp> and
<samp>\Bigg</samp>. After that, big-left and big-right are converted to <samp>\left</samp>
and <samp>\right</samp> if properly nested (this implies that, in some cases, the
prefix is ignored). For instance</p>
<pre class="latex-code"><span class="prenumber">116</span> $a\big(b\big)c$
</pre>
<p class="nofirst noindent">is translated as</p>
<pre class="xml-code"><span class="prenumber">117</span> &lt;mrow&gt;
<span class="prenumber">118</span>   &lt;mi&gt;a&lt;/mi&gt;
<span class="prenumber">119</span>   &lt;mfenced open='(' close=')'&gt;&lt;mi&gt;b&lt;/mi&gt;&lt;/mfenced&gt;
<span class="prenumber">120</span>   &lt;mi&gt;c&lt;/mi&gt;
<span class="prenumber">121</span> &lt;/mrow&gt;
</pre>
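<p>The typing rules of step two can be summarised on a few variants. The formulas are ours, and the comments give the type we expect from the rules stated above; they are not program output.</p>
<pre class="latex-code">$a\bigl(b\bigr)c$   % types left and right, given by the command names
$a\big(b\big)c$     % types left and right, given by the delimiters
$a\big|b$           % neither opening nor closing: type middle
$a\Bigg(b\Bigg)c$   % handled like \big: the four sizes are not distinguished
</pre>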
<p>Consider now step four of the algorithm. Each token in the list has a type,
and according to these types, some operators are converted into fences.
The type of <samp>\Bigr[</samp> is large right delimiter (this type is useful only in
step two), the type of <samp>\sum</samp> is math operator with movable limits (this
type explains how scripts should be attached to it). In step four,
only six types are considered. First, we have types l, r,
and m, that correspond to left, right and middle delimiters (these are the
stretchy operators), then we have binary and relation operators
(denoted by B and R). All remaining objects are classified as big or small;
small objects are ignored. Big objects are fractions, symbols with scripts,
etc. The type of an object is often correctly defined by default, but you
can change it by adding a prefix. Example:</p>
<pre class="latex-code"><span class="prenumber">122</span> $\mathopen a \mathclose b \mathord c \mathbin d \mathrel e\mathbetween  f $
</pre>
<p class="nofirst noindent">A non trivial question is how to represent the type of a stretchy operator.
For instance, in TeX, a vertical bar is of type ordinary, and preceded by
<samp>\bigm</samp> it becomes a relation. For this reason a new type has been
introduced, as well as the command <samp>\mathbetween</samp>.</p>
<p>Our rules will be explained on two examples, formulas (<a href="#uid3">1</a>) above and
(<a href="#uid54">2</a>) here</p>
<div class="mathdisplay"><table width="100%" id="uid54"><tr valign="middle"><td class="leqno"></td><td><math xmlns="http://www.w3.org/1998/Math/MathML" mode="display" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><msub><mrow><mo>&#8741;</mo><msup><mrow><mo>|</mo><mi>f</mi><mo>|</mo></mrow> <mn>2</mn> </msup><mo>&#8211;</mo><msup><mrow><mo>|</mo><mfrac><msub><mi>p</mi> <mi>n</mi> </msub> <msub><mi>q</mi> <mi>n</mi> </msub></mfrac><mo>|</mo></mrow> <mn>2</mn> </msup><mo>&#8741;</mo></mrow> <mrow><msup><mi>L</mi> <mi>&#8734;</mi> </msup><mrow><mo>(</mo><mi>T</mi><mo>)</mo></mrow></mrow> </msub><mo>&lt;</mo><mi>&#949;</mi><mo>,</mo></mrow></math></td><td class="eqno">(2)</td></tr></table></div>
<p class="nofirst noindent">Line 16, as well as lines 123-124, show the sequence of tokens inserted in the
math lists, while lines 17, 125, 127, and 129 show types constructed by Rule 1.</p>
<pre class="log-code"><span class="prenumber">123</span> $$\mathopen\|\mathopen|f\mathclose|^2 - \mathopen|\frac{p_n}{q_n}
<span class="prenumber">124</span>  \mathclose|^2 \mathclose\|_{L^\infty(T)} &lt;\varepsilon,$$
<span class="prenumber">125</span> MF: After find paren0 0b 1l 3r 4b
<span class="prenumber">126</span> MF: matched 1, 3
<span class="prenumber">127</span> MF: After find paren0 0l 1l 3r 5b 6B 7l 8b 9r 11b 12r 14b 18b
<span class="prenumber">128</span> MF: rec 1, 3; 7, 9;
<span class="prenumber">129</span> MF: After find paren0 0l 3b 4B 5b 7b 8r 10b 14b
<span class="prenumber">130</span> MF: matched 0, 8
</pre>
<p><b>Rule 1.</b> There is nothing to do if the formula has no big object, or
no delimiters. In a case like <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>(</mo><mi>b</mi><mo>)</mo><mi>c</mi></mrow></math></span> we do nothing if the only big object is b.
In a case like <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>(</mo><mi>b</mi><mo>)</mo><mi>c</mi><mo>(</mo><mi>d</mi><mo>)</mo><mi>e</mi></mrow></math></span> something is done if any element is big. More
precisely, if we define the brace level of a token as the number of opening
delimiters minus the number of closing ones before it (this can be a negative
number), action is taken if there is a token at brace level zero or negative,
or if two opening delimiters have been seen. In case something is done, the
type of useful tokens is printed (the last item on the line is the length of
the list).</p>
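<p>The brace level defined in Rule 1 can be computed by hand on the second formula above, and on a badly nested one of our own that shows how the level becomes negative. The layout is ours; the values follow directly from the definition (opening delimiters minus closing delimiters seen before the token).</p>
<pre class="latex-code">% token         a   (   b   )   c   (   d   )   e
% brace level   0   0   1   1   0   0   1   1   0

% token         a   )   b   (   c
% brace level   0   0  -1  -1   0
</pre>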
<p>Our algorithm is recursive. Line 125 corresponds to the index attached to the
double bar. This double bar is item number 12 in the main list (line 127). Since it is preceded
by the command <samp>\mathclose</samp>, it is a closing delimiter, hence you see 12r. Item
13 is a special marker, and item 14 is the K-with-scripts (double bar with
subscript) that must be merged with item 12.</p>
<p><b>Rule 2.</b> Fences are recursively added if possible and useful.
In a case like <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mo>(</mo><mo>[</mo><mi>a</mi><mo>]</mo><mo>=</mo><mo>|</mo><mi>c</mi><mo>|</mo><mo>)</mo><mo>+</mo><mo>[</mo><mi>b</mi><mo>|</mo><mi>c</mi><mo>]</mo><mo>=</mo><mo>(</mo><mi>e</mi><mo>|</mo><mi>f</mi><mo>|</mo><mi>g</mi><mo>)</mo></mrow></math></span> fences are added around brackets,
because they match, and contain at most one middle delimiter. The first pair
of parentheses does not match, because they are not at outer level, the second
pair because there are two vertical bars. Fences are added for all matching
pairs, if at least one is not at outer level<a id="uid55" href="#note6" title="This additional subclause may be relaxed, because rendering of formulas () change if we swap LHS and..."><small>(note: </small>&#10163;<small>)</small></a>.
Example: see line 128. The algorithm restarts with rule 1. The remaining type
list is shown on line 129. Remaining delimiters are properly matched, but
there is no nesting, and rule 3 will be used.</p>
<p><b>Rule 3.</b> In a case like <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><msup><mi>x</mi> <mn>2</mn> </msup><mo>)</mo></mrow><mo>+</mo><mrow><mo>(</mo><msup><mi>y</mi> <mn>2</mn> </msup><mo>|</mo><mi>z</mi><mo>)</mo></mrow></mrow></math></span>, where opening and closing
delimiters alternate and each group contains at most one middle delimiter,
the obvious fences are used, and that's all.
The trace will contain a line of the form</p>
<pre class="latex-code"><span class="prenumber">131</span> MF: matched 0, 2  4, 8
</pre>
<p class="nofirst noindent">This means that the first fence starts with token 0 and ends with token 2,
and the second fence starts with token 4 and ends with token 8. See also line 130.</p>
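<p>The matching performed by rule 3 can be sketched as follows. This is a minimal Python illustration, not the code used by <i>Tralics</i>: the function names <samp>match_obvious_fences</samp> and <samp>format_trace</samp>, the one-letter kinds (<samp>l</samp>, <samp>r</samp> and <samp>m</samp> for opening, closing and middle delimiters, anything else for other tokens), and the choice of rejecting a middle delimiter found outside a group are assumptions made for the example.</p>

```python
def match_obvious_fences(kinds):
    """Return the list of (open_index, close_index) pairs, or None.

    kinds is a list of one-letter token kinds (hypothetical encoding):
    "l" opening delimiter, "r" closing delimiter, "m" middle delimiter,
    any other string for a non-delimiter token.
    """
    pairs = []
    open_at = None      # index of the delimiter that opened the current group
    middles = 0         # middle delimiters seen in the current group
    for index, kind in enumerate(kinds):
        if kind == "l":
            if open_at is not None:
                return None          # nested opening: not the alternating case
            open_at = index
            middles = 0
        elif kind == "m":
            if open_at is None:
                return None          # assumption: a stray middle is rejected
            middles += 1
            if middles == 2:
                return None          # more than one middle in the group
        elif kind == "r":
            if open_at is None:
                return None          # closing delimiter without an opening one
            pairs.append((open_at, index))
            open_at = None
    if open_at is not None:
        return None                  # last group was never closed
    return pairs


def format_trace(pairs):
    """Format the pairs in the style of the trace line shown above."""
    return "MF: matched " + "  ".join("{}, {}".format(a, b) for a, b in pairs)


# Token kinds for ( x^2 ) + ( y^2 | z ), numbered from 0 to 8.
example = ["l", "o", "r", "b", "l", "o", "m", "o", "r"]
print(format_trace(match_obvious_fences(example)))
```

<p class="nofirst noindent">For the formula of rule 3, the sketch yields the pairs (0, 2) and (4, 8), as in the trace line above. A list that contains nested delimiters, or a group with two middle delimiters, makes the function return <samp>None</samp>, meaning that another rule must be tried.</p>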
<p><b>Rule 4.</b> In a case like <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mo>&#8747;</mo><mo>|</mo><mi>x</mi><mo>|</mo></mrow></math></span>, we split the formula in two parts:
before and after the big operator. What precedes the big operator is handled by rule 5;
after that, rule 2 is applied, except that if what follows the big operator contains
only delimiters (of whatever type), obvious fences are used.
The trace will show</p>
<pre class="latex-code"><span class="prenumber">132</span> MF: LBR 1 3
</pre>
<p><b>Rule 5.</b> The algorithm considers sublists (see line 18). In
example 1, everything after the integral sign is considered. This list is
further divided according to binary or relation operators. For instance
<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mo>|</mo><mi>a</mi><mo>+</mo><mi>b</mi><mo>|</mo><mo>=</mo><mo>|</mo><mi>c</mi><mo>+</mo><mi>d</mi><mo>|</mo></mrow></math></span> can be split at the equals sign, but not <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>+</mo><mo>|</mo><mi>b</mi><mo>=</mo><mi>c</mi><mo>|</mo><mo>+</mo><mi>d</mi></mrow></math></span>, and
<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML" overflow="scroll"><mrow xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1998/Math/MathML"><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>+</mo><mi>g</mi><mo>(</mo><mi>y</mi><mo>)</mo></mrow></math></span> can be split at the plus sign. We check two cases: splitting at
binary and relation operators, or at relation operators only. The example on
line 18 shows that the plus at 4B is discarded. Lines 19 and 22 show the two
sublists (lines 20 and 23 are the same, where numbers in parentheses indicate
the positions of the first and last elements of the sublist). In some cases
pairing succeeds, and you will see a line like 21, where fences are inserted
at the location of the delimiter; sometimes pairing fails, and fences are
added at the start or end of the subformula (see line 24).</p>
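<p>The three examples of rule 5 can be reproduced by the following Python sketch. It is an illustration under stated assumptions, not the <i>Tralics</i> implementation: the names <samp>classify</samp>, <samp>split_sublists</samp>, <samp>pairing_ok</samp> and <samp>choose_split</samp> are hypothetical, formulas are reduced to strings of one-character tokens, a split is assumed to be acceptable when the delimiters of every sublist pair up, and the finer split (binary and relation operators) is assumed to be tried before the coarser one (relation operators only).</p>

```python
def classify(formula):
    """Turn a string into (kind, text) tokens; kinds are hypothetical."""
    operators = {"+": "bin", "-": "bin", "=": "rel"}
    return [("delim" if c in "()[]|" else operators.get(c, "ord"), c)
            for c in formula]


def split_sublists(tokens, split_kinds):
    """Cut the token list at every token whose kind is in split_kinds."""
    parts, current = [], []
    for kind, text in tokens:
        if kind in split_kinds:
            parts.append(current)
            current = []
        else:
            current.append((kind, text))
    parts.append(current)
    return parts


def pairing_ok(part):
    """Assumed criterion: every delimiter of the sublist finds its partner."""
    closers = {")": "(", "]": "["}
    stack = []
    for kind, text in part:
        if kind != "delim":
            continue
        if text in "([":
            stack.append(text)
        elif text in closers:
            if not stack or stack.pop() != closers[text]:
                return False
        elif stack and stack[-1] == "|":
            stack.pop()              # a bar closes the bar opened before it
        else:
            stack.append("|")        # otherwise the bar opens a group
    return not stack


def choose_split(formula):
    """Return the sublists of the first acceptable split, or None."""
    tokens = classify(formula)
    for split_kinds in (("bin", "rel"), ("rel",)):
        parts = split_sublists(tokens, split_kinds)
        if len(parts) > 1 and all(pairing_ok(p) for p in parts):
            return ["".join(text for _, text in p) for p in parts]
    return None


for formula in ("|a+b|=|c+d|", "a+|b=c|+d", "f(x)+g(y)"):
    print(formula, choose_split(formula))
```

<p class="nofirst noindent">With these assumptions, the first formula is split at the equals sign, the second one cannot be split, and the third one is split at the plus sign. The sketch only decides where a formula may be cut; it does not insert fences at the start or end of a subformula when pairing fails.</p>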
<p>Conclusion: the algorithm described here is implemented in <i>Tralics</i> version
2.10.9; it has been tested on all examples known to the author. Cases where the
algorithm fails do exist; readers who find one can submit a bug report to the
author, who will use it to improve the quality of the software.
<h1 id="bibliography">Bibliography</h1>
<p class="noindent nofirst" id="bid6">[1] <span class="smallcap">Thierry Bouche.</span> <i>A pdflatex-based automated journal production system. </i>in « TUGboat », number 1, volume 27, 2006.</p>
<p class="noindent nofirst" id="bid2">[2] <span class="smallcap">David Carlisle, Michel Goossens, Sebastian Rahtz.</span> <i>De XML à PDF avec <tt>xmltex</tt> et PassiveTeX. </i>in « Cahiers Gutenberg », number 35-36, pages 79-114, 2000.</p>
<p class="noindent nofirst" id="bid3">[3] <span class="smallcap">David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier (editors).</span> <i>Mathematical Markup Language (MathML) Version 2.0. </i>2001, <a href="http://www.w3.org/TR/MathML2/">http://<!--PASS THROUGH allowbreak-->www.<!--PASS THROUGH allowbreak-->w3.<!--PASS THROUGH allowbreak-->org/<!--PASS THROUGH allowbreak-->TR/<!--PASS THROUGH allowbreak-->MathML2/<!--PASS THROUGH allowbreak--></a></p>
<p class="noindent nofirst" id="bid0">[4] <span class="smallcap">José Grimm.</span> <i>Tralics, a LaTeX to XML translator, Part I. </i>Rapport Technique, number 309, Inria, 2006, <a href="http://hal.inria.fr/inria-00000198">http://<!--PASS THROUGH allowbreak-->hal.<!--PASS THROUGH allowbreak-->inria.<!--PASS THROUGH allowbreak-->fr/<!--PASS THROUGH allowbreak-->inria-00000198</a></p>
<p class="noindent nofirst" id="bid1">[5] <span class="smallcap">José Grimm.</span> <i>Tralics, a LaTeX to XML translator, Part II. </i>Rapport Technique, number 310, Inria, 2006, <a href="http://hal.inria.fr/inria-00069870">http://<!--PASS THROUGH allowbreak-->hal.<!--PASS THROUGH allowbreak-->inria.<!--PASS THROUGH allowbreak-->fr/<!--PASS THROUGH allowbreak-->inria-00069870</a></p>
<p class="noindent nofirst" id="bid4">[6] <span class="smallcap">José Grimm.</span> <i>Producing MathML with Tralics. </i>Rapport de Recherche, number 6181, Inria, 2007, <a href="http://hal.inria.fr/inria-00144566">http://<!--PASS THROUGH allowbreak-->hal.<!--PASS THROUGH allowbreak-->inria.<!--PASS THROUGH allowbreak-->fr/<!--PASS THROUGH allowbreak-->inria-00144566</a></p>
<p class="noindent nofirst" id="bid5">[7] <span class="smallcap">Donald E. Knuth.</span> <i>The TeXbook. </i>Addison Wesley, 1984.</p>
</p>
<h1>Notes</h1><hr /><p class="nofirst noindent" id="note1"><a title="back to text" href="#uid1">Note 1. </a>Email: Jose.Grimm@sophia.inria.fr</p><hr /><p class="nofirst noindent" id="note2"><a title="back to text" href="#uid4">Note 2. </a>Available on <a href="http://www-sop.inria.fr/apics/tralics">http://www-sop.inria.fr/apics/tralics</a></p><hr /><p class="nofirst noindent" id="note3"><a title="back to text" href="#uid6">Note 3. </a>One could imagine a command code
meaning: select a Greek letter in the current math font, this is a possible extension.</p><hr /><p class="nofirst noindent" id="note4"><a title="back to text" href="#uid8">Note 4. </a>Before version 2.10.9, the situation was worse, because
<i>Tralics</i> removed the offending closing brace, and handled <samp>\par</samp>
like ordinary non-math commands</p><hr /><p class="nofirst noindent" id="note5"><a title="back to text" href="#uid53">Note 5. </a>The difference can
be in the presence of a 'fence' attribute, which has, according to the MathML
recommendation, no effect in the suggested visual rendering rules.</p><hr /><p class="nofirst noindent" id="note6"><a title="back to text" href="#uid55">Note 6. </a>This additional
subclause may be relaxed, because the rendering of formula (<a href="#uid3">1</a>) changes if
we swap LHS and RHS. The remedy is unclear.</p></body></html>

