Tralics, a LaTeX to XML translator; Part II

French title: Tralics, un traducteur de LaTeX vers XML; Partie II

Author: José Grimm

Location: Sophia Antipolis – Méditerranée

Inria Research Theme: THnum

Inria Research Report Number: 310

Revision: 4

Team: Apics

Date: September 2005

Revised Date: February 2008

Keywords: LaTeX, XML, HTML, MathML, XSLT, PostScript, Pdf, stylesheet, formatting.

French keywords: LaTeX, XML, HTML, MathML, XSLT, PostScript, Pdf, feuilles de style, formatage.

Abstract

In this document we describe Tralics, a LaTeX to XML translator, and its application to the Raweb. There are two parts: the first part describes the translator, the second part the tools required for the Raweb.

This document has several chapters. We first describe how TeX can read an XML file and convert it to Pdf; in effect, we describe the xmltex, fotex and mathml packages, written by D. Carlisle and S. Rahtz, with some minor bug corrections and additions. We then show how style sheets can be used to convert the XML source into XSL/FO or HTML, or even XML. Finally, we explain the Raweb DTD.

The second version of this report contains an additional chapter that explains how to convert a Research Report or a PhD thesis into HTML using Tralics and an XML processor. The third version of this report adds some further comments.

French Abstract

Dans ce rapport nous décrivons le logiciel Tralics, un traducteur de LaTeX vers XML, et son application au Raweb. La première partie de ce document décrit le traducteur lui-même, et la deuxième partie explique tous les outils nécessaires pour exploiter les fichiers XML.

Ce document contient plusieurs chapitres: on expliquera d'abord comment TeX peut interpréter du XML et produire du Pdf; il s'agit des packages xmltex, fotex, mathml2, écrits par D. Carlisle et S. Rahtz, avec quelques corrections et ajouts. On expliquera comment des feuilles de style permettent de convertir le XML en XSL/FO ou HTML, ou même en XML. Finalement, on expliquera la DTD Raweb.

La version 2 de ce rapport contient un chapitre supplémentaire qui explique comment convertir un document de type rapport de recherche ou thèse en HTML grâce à Tralics et un processeur XML. La version 3 décrit des changements ultérieurs.


1. Introduction

2. Interpreting XML in TeX

3. Interpreting MathML and related stuff in TeX

4. Interpreting XSL/Format in TeX

5. Converting XML to XML

6. Converting XML to HTML

7. Converting XML to XSL/Format

8. Application to other examples

9. The DTDs

10. Corrigendum

Bibliography

[1] Consortium WWW. La spécification XML. In « Cahier Gutenberg », number 33-34, 1999. This is a French translation of the specification, http://www.w3.org/TR/1998/REC-xml-19980210.

[2] David Carlisle. XMLTEX: A non validating (and not 100% conforming) namespace aware XML parser implemented in TeX. In « TUGboat », volume 21, number 3, 2000, pages 193-199.

[3] David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier (editors). Mathematical Markup Language (MathML) Version 2.0. http://www.w3.org/TR/MathML2/, 2001.

[4] Michael Kay. XSLT, Programmer's Reference. 2nd edition, Wrox Press Ltd, 2001.

[5] Michael Kay. XSLT 2.0, Programmer's Reference. 3rd edition, Wrox Press Ltd, 2004.

[6] Donald E. Knuth. The TeXbook. Addison Wesley, 1984.

[7] Frank Mittelbach, Michel Goossens, Johannes Braams, David Carlisle, Chris Rowley. The LaTeX companion, second edition. Addison Wesley, 2004.

[8] The Unicode Consortium. The Unicode Standard, version 4.0. Addison Wesley, 2003.

[9] W3C. Extensible Markup Language (XML) 1.1. http://www.w3.org/TR/xml11/, 2004.

[10] W3C. Extensible Markup Language (XML) 1.0 (Third Edition). http://www.w3.org/TR/REC-xml/, 1998, Third edition published in 2004.

[11] Web consortium. Namespaces in XML. http://www.w3.org/TR/1999/REC-xml-names-19990114, 1999.

Table of Contents


1. Introduction
2. Interpreting XML in TeX
     2.1. Constructing characters
     2.2. Using UTF-8 characters
     2.3. Warnings
     2.4. Reading the text
     2.5. Namespaces
     2.6. Redefining \protect
     2.7. The catalogue
     2.8. Reading elements
     2.9. End of element
     2.10. Using attributes
     2.11. Processing instructions
     2.12. Declarations
     2.13. Entities
     2.14. Interpreting the Doctype element
     2.15. Grabbing content
     2.16. Defining actions
     2.17. Other commands
     2.18. Example
3. Interpreting MathML and related stuff in TeX
     3.1. Local patches to xmltex.tex
     3.2. Support for MathML
          3.2.1. Fences
          3.2.2. Accents
          3.2.3. More math
          3.2.4. Tables
     3.3. Other commands
          3.3.1. Pictures
          3.3.2. Titlepage
          3.3.3. Images
          3.3.4. Trees
          3.3.5. Tables
          3.3.6. Other commands
     3.4. The fotex.cfg file
4. Interpreting XSL/Format in TeX
     4.1. Generalities
     4.2. The root
     4.3. Mathematics
     4.4. Multiple columns
     4.5. Page masters
     4.6. Page sequences
     4.7. Flows
     4.8. Borders
     4.9. Spacing for blocks
     4.10. Quadding
     4.11. Arrays
     4.12. Boxed blocks
     4.13. Lists
     4.14. Blocks
     4.15. Percentages
     4.16. Fonts
     4.17. Links
     4.18. Footnotes
     4.19. Inline material
     4.20. Floats and images
     4.21. Markers
     4.22. Page numbers and anchors
     4.23. Other elements
     4.24. Bootstrap code
5. Converting XML to XML
     5.1. Converting the XML to the new DTD
     5.2. Adding Ids
6. Converting XML to HTML
     6.1. Common code for HTML conversion
          6.1.1. The main translation rule
          6.1.2. Creating pages
          6.1.3. Titles, keywords, persons
          6.1.4. Other elements
          6.1.5. References
     6.2. Dealing with topics
     6.3. Converting the bibliography
     6.4. Converting the bibliography into HTML
     6.5. Helper style sheets
     6.6. The raweb CSS style sheet
7. Converting XML to XSL/Format
     7.1. The rrrafo3.xsl file
     7.2. The rawebfo file
     7.3. Page definitions
     7.4. The text
     7.5. The table of contents
     7.6. The bibliography
     7.7. People
     7.8. References
     7.9. Generic elements
     7.10. Lists
     7.11. Images
     7.12. Tables
     7.13. Mathematics
     7.14. Other elements
     7.15. Computing column specifications
     7.16. Converting cells
     7.17. Customisation
8. Application to other examples
     8.1. Modifying the source document
          8.1.1. Case of the report
          8.1.2. Case of the thesis
     8.2. The Perl script for extracting trees
     8.3. The style sheet for extracting trees
     8.4. The title page
     8.5. The style sheets
          8.5.1. Splitting the text
          8.5.2. Footnotes
          8.5.3. The index
          8.5.4. Divisions
9. The DTDs
     9.1. The Raweb DTD
          9.1.1. General purpose elements
          9.1.2. Elements specific to the Raweb
          9.1.3. The bibliography
          9.1.4. Research Reports
     9.2. The raweb2 DTD
     9.3. The classes DTD
10. Corrigendum
     10.1. Breaking Urls in the Pdf, 2007/01/28, 2007/07/30
          10.1.1. Examples of hyperlinks
          10.1.2. Typesetting hyperlinks
          10.1.3. Using the \url command
          10.1.4. Avoiding use of \url
          10.1.5. A better solution?
     10.2. Bad TOC layout, 2007/02/04
     10.3. Math fonts, 2007/02/14, 2007/03/20
     10.4. Text font in math, 2007/04/09
     10.5. Math extensions, 2007/02/15
     10.6. Missing minus signs, 2007/02/24
     10.7. Math attributes and other commands, 2007/03/24
          10.7.1. Attributes for arrays
          10.7.2. Explicit equation numbers
          10.7.3. Infinite horizontal glue
     10.8. Operators, limits, fences, 2007/03/20
     10.9. More math fonts, 2007/05/04
     10.10. New raweb DTD, 2007/07/29
     10.11. New module specification, 2007/08/02
     10.12. Input encoding, 2007/11/12
     10.13. Glossary and other indexes, 2007/12/28
     10.14. Additional Commands, 2007/12/31
     10.15. LaTeX font support, 2007/12/31
     10.16. Key-val, 2008/01/27
     10.17. New file IO, 2008/02/10
Bibliography
Index