Tralics, a LaTeX to XML translator; Part II

1. Introduction

This document is the second part of a report that explains Tralics and how it can be used for the Raweb. You will find a sequence of commented source files, most of them publicly available on the Web, followed by a large index (over twenty pages).

The index is sorted alphabetically. For entries such as <a>, $c or \z@, the first character is not used for sorting. The object \@car is found under the letter `c´: some private TeX commands have an at-sign in their name, sometimes in the middle and sometimes at the start.
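The sorting rule above can be illustrated by a small sketch. This is not the code used to build the index: the function name index_sort_key is hypothetical, and two choices are assumptions, namely that every leading non-alphanumeric character is skipped (which covers both \z@ and \@car) and that comparison ignores case.

```python
def index_sort_key(entry):
    """Hypothetical sort key: skip leading markers such as '<', '$', '\\' and '@'."""
    i = 0
    # Assumption: all leading non-alphanumeric characters are ignored.
    while i < len(entry) and not entry[i].isalnum():
        i += 1
    # Assumption: the comparison is case-insensitive.
    return entry[i:].lower()


entries = ["\\year", "<a>", "$c", "\\z@", "\\@car", "year"]
sorted_entries = sorted(entries, key=index_sort_key)
for entry in sorted_entries:
    print(index_sort_key(entry)[:1].upper(), entry)
```

With this key, \@car is filed under `c´ and \z@ under `z´, as described above; entries with equal keys keep their input order because Python's sort is stable.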

If you look at the letter Y, you can see five occurrences of the word `year´. The first item uses a normal font; we use such a font for an attribute value (for instance, in the Raweb DTD, line 292, you can see that the <citation> element has an attribute named from, whose value can be `year´). The second item uses a sans-serif font; this is the name of an attribute (for instance, the main Raweb element has a year attribute). Other items in the index use a typewriter font and additional characters. For instance <year> is the name of an HTML or XML element, $year is the name of an xsl variable, and \year is the name of a TeX command (this one is not used here).

If you look at the letter I, you will find id, id, ID and ID. The case of ID is a bit special, since it is the name of an attribute set (and only one such data structure is used); ID is also an attribute type. We use id for the name of an attribute, and id for an xsl template (whose purpose is to set the attribute). The index also contains entity names like %list; (such entities appear only in the DTD).

We do not index everything! Consider a source line of the form

1371 \XMLNSAX{fo}{break-before}{\FObreakbefore}{auto}

The number in the left margin has nothing to do with the line number of the source document; it will be referred to in some cases. This line is to be interpreted by TeX, for which it has 28 tokens. For a human reader, however, it contains five tokens that could be indexed. The first token is the name of a TeX command that appears more than a hundred times (with and without the trailing X); it will not be indexed, and neither will other common TeX commands like \def, \gdef or \XMLelement, nor <xsl:template> in the part concerning style sheets, nor <!ELEMENT> and <!ATTLIST> in the last chapter. The `fo´ in this example is a prefix that almost always appears after \XMLNSA or \XMLNSAX, thus it does not appear in the index. On the other hand, the attribute name, the associated TeX command and the default value will be found in the index. Consider a second example

1503 <xsl:variable name="Directory" select="concat($LeProjet,$year)"/>

Here, the line contains seven tokens. There are one element and two attributes. These will not be found in the index (the attribute name `name´ will only be indexed if it appears in the HTML or XML document, not in the style sheet as an xsl keyword). Xsl functions like `concat´ will not be indexed either. On the other hand, you will find the three variables $Directory, $LeProjet and $year (in the line above, one variable is set, the other two variables are used).
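The two token counts given for the first example line (the one numbered 1371) can be checked with a rough sketch. It assumes default TeX category codes, except that the at-sign is treated as a letter, and it ignores spaces; real TeX tokenization depends on the category codes in force. The names TEX_TOKEN, HUMAN_TOKEN, tex_tokens and human_tokens are hypothetical.

```python
import re

# A TeX token is a control word (backslash + letters), a control symbol
# (backslash + one other character), or any single non-space character.
TEX_TOKEN = re.compile(r"\\[A-Za-z@]+|\\.|\S")

# For a human reader, a token is a command name or a word (hyphens allowed).
HUMAN_TOKEN = re.compile(r"\\[A-Za-z@]+|[A-Za-z][A-Za-z-]*")


def tex_tokens(line):
    """Split a line into tokens roughly as TeX would see them."""
    return TEX_TOKEN.findall(line)


def human_tokens(line):
    """Split a line into the units a reader might look up in an index."""
    return HUMAN_TOKEN.findall(line)


line = r"\XMLNSAX{fo}{break-before}{\FObreakbefore}{auto}"
print(len(tex_tokens(line)), len(human_tokens(line)))
print(human_tokens(line))
```

Under these assumptions the sketch finds 28 TeX tokens and 5 reader-level tokens in that line, in agreement with the counts stated in the text.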

The first chapter describes xmltex.tex. This is a TeX file whose purpose is to read and evaluate an XML file. The interpretation depends on some user commands, to be put in a .xmt file (the “user” here being the guy who designs the DTD of the XML file, as opposed to the author of the document, or the author of xmltex). The file contains a lot of commands of the form \expandafter, \csname, \edef, and the like, that are not described in standard LaTeX books. If you understand this file, you can be called a TeX Master (according to the TeXbook, a Master is somebody who understands tables, a Grandmaster is somebody who can design output routines; the whole XML stuff described in this report is somewhere between these two levels). It is however a challenge for software like Tralics to be able to read the xmltex.tex file.

Using xmltex is easy. For instance, Chapter 3 explains how maths can be interpreted (this is an extension of the work of Carlisle, the author of xmltex). We have added commands that interpret the picture environment, and some extensions; the only difficulty here is that the commands have an irregular syntax, so that the standard mode of evaluation cannot be used. For instance, if you say

<oval x='1.2' y='3.4' specs='lt'> Text</oval>

we must call the associated TeX command like this

\oval(1.2,3.4)[lt]{Text}

rather than

\oval{1.2}{3.4}{lt}{Text}

Perhaps the easiest way would be to write an intermediate command. The code is only given as an example of what can be done; it is not completely tested, and not used for the Raweb at all, because we do not know how to convert it into HTML.
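The attribute-to-argument mapping in the <oval> example can be sketched outside TeX as follows. This only illustrates the reordering that an intermediate command has to perform; it is not how xmltex evaluates the element. The function name oval_to_tex is hypothetical, and omitting the bracketed part when specs is absent is an assumption.

```python
import xml.etree.ElementTree as ET


def oval_to_tex(xml_source):
    """Turn an <oval> element into a TeX call with the irregular syntax."""
    elem = ET.fromstring(xml_source)
    if elem.tag != "oval":
        raise ValueError("expected an <oval> element, got <%s>" % elem.tag)
    x = elem.get("x")
    y = elem.get("y")
    specs = elem.get("specs")
    # Leading and trailing blanks in the element content are dropped.
    text = (elem.text or "").strip()
    # Assumption: the optional [specs] part is omitted when the attribute is absent.
    optional = "[%s]" % specs if specs is not None else ""
    return "\\oval(%s,%s)%s{%s}" % (x, y, optional, text)


print(oval_to_tex("<oval x='1.2' y='3.4' specs='lt'> Text</oval>"))
```

For the element shown earlier, this prints \oval(1.2,3.4)[lt]{Text}, the first of the two forms given above.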

Chapter 8 is an addition to version 2 of this document. It explains how to convert a document (like a PhD thesis, or a technical report) into an HTML document, after conversion into XML. We show how to solve a non-trivial problem: there are objects, similar to <oval> above, that cannot be rendered in HTML and have to be replaced by images. These are obtained by creating an auxiliary XML file, evaluating it with LaTeX, and converting the dvi into a sequence of images.

A similar idea is used in the Kraken software by Nader Salman: in this case, the XML file contains <math> and <cite> elements, containing LaTeX code; a script extracts this code, calls Tralics, and reinserts the math formulas; the <cite> elements are replaced by pointers to the bibliography, generated as a by-product by Tralics. This is a rather original way to produce an activity report, see http://www-sop.inria.fr/odyssee/.
