Getting started with Tralics

Contents

This file is a primer for Tralics, a LaTeX to XML translator. We explain how to use it. First, we consider a trivial example that contains only text, then the same one with LaTeX markup. After that, we give a small example with a configuration file, and another one with a different configuration file. Near the end, we also explain the meaning of all the numbers printed by Tralics at the end of the job. Finally, we give a complete nontrivial document.

Introduction

This document was written in 2006 for Tralics version 2.9; since then, the syntax has changed a bit. The following demonstrates some features:

grimm@macarthur$ tralics -inputfile=../hello5.xml -o Hello
This is tralics 2.11.5, a LaTeX to XML translator, running on macarthur
Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
Licensed under the CeCILL Free Software Licensing Agreement
Fatal error: Cannot open input file ../hello5.tex

If you say tralics hello, the effect is to convert hello.tex into hello.xml. The program takes some arguments; in the example above, there are three arguments separated by spaces. An argument that starts with a dash is an option; these are described in options of the tralics command. Some options (like -verbose) take no argument; some take two arguments, and some take a single argument. If option foo takes one argument bar, you can use one of the following possibilities: provide foo and bar as arguments to the command (in the example above we have -o and Hello); or provide foo=bar as a single argument (in the example above, the option is inputfile, its argument is ../hello5.xml). There are no spaces around the equals sign here. If you add spaces, the following can happen: if Tralics sees foo = bar as a single character string, it handles it correctly; if it sees two strings (perhaps because there is a single space, on the left or on the right of the equals sign), you are in trouble and an error may be signaled; finally, if it sees three strings (the second one being the equals sign), it interprets the arguments correctly.

All options start with a dash, and all arguments that are not options are input files; unless the -interactivemath option is used, exactly one input file must be given. If you want to convert a file whose name starts with a dash, you must use the -inputfile option, which has been available only since version 2.11.5. Another feature is that you can specify the name of the output file. Note also that if the input file is foo.xml, Tralics will not attempt to convert foo.xml.tex into foo.xml.xml; it assumes that you want foo.xml from foo.tex.
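
The argument rules described above can be summarized by a small Python sketch. It is only an illustration of the rules as stated in the text, not the code used by Tralics: the table ONE_ARG_OPTIONS and the functions parse_args and tex_name are hypothetical names, and the sketch always raises an error in the two-string case, where the text only says that an error may be signaled.

```python
# Hypothetical table of options that take exactly one argument; the real
# list is given in the Tralics option documentation.
ONE_ARG_OPTIONS = {"inputfile", "o", "config"}


def parse_args(argv):
    """Split argv into ({option: value}, [input files])."""
    options, inputs = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if not arg.startswith("-"):
            inputs.append(arg)  # no leading dash: an input file
            i += 1
            continue
        body = arg[1:]
        if "=" in body:
            # "-foo=bar", or "-foo = bar" passed as one quoted string.
            name, value = (part.strip() for part in body.split("=", 1))
            if not name or not value:
                raise ValueError(f"cannot interpret {arg!r}")
            options[name] = value
            i += 1
        elif body in ONE_ARG_OPTIONS:
            rest = argv[i + 1:i + 3]
            if len(rest) == 2 and rest[0] == "=":
                value, used = rest[1], 3  # three strings: -foo, =, bar
            elif rest and not rest[0].startswith("="):
                value, used = rest[0], 2  # two strings: -foo, bar
            else:
                raise ValueError(f"cannot interpret {arg!r}")
            options[body] = value
            i += used
        else:
            options[body] = True  # option without argument, e.g. -verbose
            i += 1
    return options, inputs


def tex_name(name):
    """Map hello, hello.tex or hello.xml to hello.tex."""
    for ext in (".tex", ".xml"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return name + ".tex"


print(parse_args(["-inputfile=../hello5.xml", "-o", "Hello"]))
print(tex_name("../hello5.xml"))
```

With the arguments of the example above, the sketch finds the options inputfile and o and no other input file, and tex_name maps ../hello5.xml to ../hello5.tex, the name that appears in the error message.
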

The first document

The simplest document you can give to Tralics has the following form.

Hello, world!

Assume that the file is called hello.tex and you call Tralics on it. It is necessary that the file has the .tex extension. You can say tralics hello -noconfig or tralics -noconfig hello.tex to compile it; the result will be the same. If you just say tralics hello, the default configuration file is loaded, but the resulting XML file is the same. This is what you will see on the terminal.

grimm@medee$ tralics hello
1  This is tralics 2.9, a LaTeX to XML translator
2  Copyright INRIA/MIAOU/APICS 2002-2006, Jos\'e Grimm
3  Licensed under the CeCILL Free Software Licensing Agreement
4  Starting translation of file hello.tex.
5  Configuration file identification: standard $Revision: 2.24$
7  Bib stats: seen 0 entries
8  Seen 0 bibliographic entries
9  Math stats: formulas 0, kernels 0, trivial 0, \mbox 0, large 0, small 0.
10 Buffer realloc 0, string 955, size 7136, merge 0
11 Macros created 78, deleted 0; hash size 1393; foonotes 0.
12 Save stack +6 -6.
13 Attribute list search 1344(1336) found 820 in 1072 elements (1070 at boot).
14 Number of ref 0, of used labels 0, of defined labels 0, of ext. ref. 0.
15 Modules with 0, without 0, sections with 0, without 0
16 There was no image.
17 Output written on hello.xml (189 bytes).
18 No error found.


Let's try to understand all these lines. Lines 1 to 3 identify the software and its release number (here 2.9). You may have a different version, that produces a different output. In particular, the third line of the XML result is a comment line that holds the version number, and since version 2.9, the compilation date. Thus, the size of XML output can depend on the version.

Line 4 indicates what Tralics will do (before version 2.13, there was an option telling Tralics to check the file for the Raweb, or call LaTeX). The default is to translate the source file into XML. Lines 5 and 6 say that some configuration file has been read. If the option -noconfig is given, you see No configuration file instead. For details about configuration files, see Configuration files of Tralics.

Everything starting at line 7 is printed after translation is complete. Lines 7 to 16 will be explained in detail later; in version 2.13 you will not see line 15 (module statistics), 16 (the number of images, if zero), and 8 (number of entries in hello_.bbl, this file is not generated if useless). Line 11 says that 1393 multiletter control sequences were in use, and 78 macros were created; line 13 says that 1070 elements are constructed at bootstrap time (for instance, \empty and \alpha both occupy a slot in the hash table, one is a macro, and the other is associated to an XML element).

The last three lines indicate that Tralics was happy (no errors found) and generated an XML file. If you call Tralics with option -silent, you will see only the first four lines, and the last three ones. Note: on Windows, two characters are used to mark an end of line, instead of one on Unix. Thus, the size of the XML document might be 195 instead of 189. We can look at it:

<?xml version='1.0' encoding='iso-8859-1'?>
<!DOCTYPE unknown SYSTEM 'unknown.dtd'>
<!-- Translated from latex by tralics 2.9, date: 2006/09/20-->
<unknown>
<p>Hello, world!
</p></unknown>
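
The sizes quoted above can be checked against the six lines of the XML file. The following Python sketch is only an illustration; it assumes that every line, including the last one, ends with an end-of-line marker, and the helper name xml_size is ours.

```python
# The six lines of hello.xml, as shown above.
lines = [
    "<?xml version='1.0' encoding='iso-8859-1'?>",
    "<!DOCTYPE unknown SYSTEM 'unknown.dtd'>",
    "<!-- Translated from latex by tralics 2.9, date: 2006/09/20-->",
    "<unknown>",
    "<p>Hello, world!",
    "</p></unknown>",
]


def xml_size(lines, newline):
    """Size in bytes when each line is terminated by `newline`."""
    text = "".join(line + newline for line in lines)
    return len(text.encode("iso-8859-1"))


unix_size = xml_size(lines, "\n")       # one character per end of line
windows_size = xml_size(lines, "\r\n")  # two characters per end of line
print(unix_size, windows_size)
```

The difference between the two sizes is the number of lines, which is why the file is six bytes larger on Windows.
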


Tralics has generated another file, named hello.log. In this case, the content is more or less the same as what is printed on the terminal. As a general rule, everything printed on the terminal is also printed on the transcript file. In what follows we have marked with -- the lines that differ (the first line printed on the terminal is 'This is tralics...'; this is done before Tralics knows the name of the transcript file). We have marked with ++ some lines that are not printed on the terminal. Three lines are marked **; they indicate some omitted stuff, namely the statistics (same as above), File info and Bootstrap info; these are 25 lines starting with \countdef, 43 with \dimendef, 16 with \chardef or \mathchardef, and 23 with \skipdef. These lines explain, for instance, that the chapter counter is counter number 28.

-- Transcript file of tralics 2.13.0 for file hello.tex
++ Start compilation: 2008/07/22 09:55:21
++ OS: Linux, machine medee
Starting translation of file hello.tex.
++ Using iso-8859-1 encoding (idem transcript).
++ Left quote is ` right quote is '
++ ++ Input encoding is 1 for hello.tex
Configuration file identification: standard $Revision: 2.24$
+++Configuration file has type \documentclass
++ No \documentclass in source file
++ Using some default type
++ dtd is unknown from unknown.dtd (standard mode)
++ OK with the configuration file, dealing with the TeX file...
++ There is a single line
++ Starting translation
** File info
** Bootstrap info
** Statistics


If the option -noconfig is given to Tralics, instead of the line marked +++ you would see No type in configuration file (and the two lines preceding it would read No configuration file). The next three lines just say that Tralics is about to translate one line of TeX code.

Our second document

The second example looks a bit more like a LaTeX document. Here it is

\documentclass{article}
\begin{document}
Hello, world!
\end{document}


The file is called hello1.tex. If you run tralics hello1 the resulting XML will be as follows. This assumes that no file article.tcf is found, and that the default configuration file maps article to std (this is the case for the file with revision number 2.24).

<?xml version='1.0' encoding='iso-8859-1'?>
<!DOCTYPE std SYSTEM 'classes.dtd'>
<!-- Translated from latex by tralics 2.13.0, date: 2008/07/22-->
<std><p>Hello, world!
</p>
</std>


If you compile the file with tralics hello1 -noconfig, the resulting XML file will be independent of any configuration or tcf file. Translation is the same as above.

The essential difference between hello.xml and hello1.xml is that the DTD is now article, or maybe std (this can be configured). We show here the start of the transcript file, indicating with !! the lines that differ from the previous run. Of course, the compilation date is not the same. The important point is Seen \documentclass article. Note that Tralics still says Using some default type, because no configuration file is given.

-- Transcript file of tralics 2.13.0 for file hello1.tex
!! Start compilation: 2008/07/22 10:21:29
Starting translation of file hello1.tex.
Using iso-8859-1 encoding (idem transcript).
No configuration file.
++ No type in configuration file
!! Seen \documentclass article
!! Potential type is article
++ Using some default type
!! dtd is std from classes.dtd (standard mode)
++ Ok with the config file, dealing with the TeX file...
!! There are 4 lines


We focus here on some other parts of the transcript file. First, you can see a lot of lines starting with ++. They are printed by the I/O manager. Whenever Tralics tries to open a file, it prints a line in the transcript file. For instance, you can see that the configuration file is searched in three different locations. You can also see that the @ character is made a letter while reading some files; and you can see that cur_file_pos is restored (this variable is used by the class/package mechanism). You can see the action at the end of some files (the virtual file contains the documenthook token list).

There are some other interesting lines. The internal character encoding is UTF-8. We shall explain later how to change the output encoding, as well as the encoding used on the terminal and the transcript file; each input file can have its own encoding. In this example, the output encoding is latin1.

When the \documentclass command is seen, the class file is loaded. As you can see, Tralics uses article.clt instead of article.cls. Since the standard classes (book, report and article) share some options, the common code is in std.clt. Instead of foo.sty, a file named foo.plt is used for packages. Finally, jobname.ult is loaded if it exists. What you can see in the transcript file is, in order, the second argument of \ProvidesClass, which is also printed on the terminal, the default options (argument of \ExecuteOptions) and the actual options (optional argument of \documentclass, empty in this example).

Transcript file of tralics 2.13.0 for file hello1.tex
Start compilation: 2008/07/22 10:21:29
OS: Linux, machine medee
Starting translation of file hello1.tex.
Using iso-8859-1 encoding (idem transcript).
Left quote is ` right quote is '
++ Input encoding is 1 for hello1.tex
++ file .tralics_rc does not exist.
++ file ../confdir/.tralics_rc exists.
Configuration file identification: standard $Revision: 2.24$
...
There are 4 lines
Starting translation
...
{\chardef \voidb@x=\char12}
++ file hello1.ult does not exist.
++ file article.clt does not exist.
++ file ../confdir/article.clt exists.
++ Opened file ../confdir/article.clt; it has 25 lines
Document class: article 2006/08/19 v1.0 article document class for Tralics
++ file std.clt does not exist.
++ file ../confdir/std.clt exists.
++ Opened file ../confdir/std.clt; it has 52 lines
File: std 2006/08/19 v1.0 Standard LaTeX document class, for Tralics
++ End of file ../confdir/std.clt
++ cur_file_pos restored to 1
{Options to execute->letterpaper,10pt,oneside,onecolumn,final}
{Options to execute->}
++ End of file ../confdir/article.clt
++ cur_file_pos restored to 0
++ End of virtual file.
++ cur_file_pos restored to 0
...


Our third document

Assume that we have a file named hello.tcf that contains the following lines.

## This is an example of a configuration file for tralics
## Copyright 2006 Inria/apics, Jose' Grimm
## $Id: hello.tcf,v 1.1 2006/07/17 09:09:06 grimm Exp$
## tralics ident rc=hello.tcf $Revision: 1.1$

DocType = Article classes.dtd
DocAttrib =Foo \World
DocAttrib =A \specialyear
DocAttrib =B  \tralics
DocAttrib =C  \today
BeginCommands
\def\World{world}
\def\today{\the\year/\the\month/\the\day}
End


The file defines, in order, the doctype to use in the XML file, four attributes of the document element, and two commands, \World and \today. The \tralics pseudo-command produces 'Tralics \tralicsversion', and the \specialyear pseudo-command returns 2006 for every date between May 2006 and April 2007 (see configuration file). Important note: classes.dtd or article.dtd is just a character string that Tralics puts in the XML file. If no special command is used, the XML file is well-formed, but no attempt is made to validate it against the DTD.

Consider the following source file, named hello2.tex:

\newcommand\hello{\uppercase {h}ello}
\documentclass{article}
\begin{document}
\hello, \World!
\end{document}


Compile with tralics hello2.tex -config=hello.tcf -oe8. We assume that the tcf file is found, in the current directory or elsewhere (according to these rules). It should be no surprise that the result is the following:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Article SYSTEM 'classes.dtd'>
<!-- Translated from latex by tralics 2.13.0, date: 2008/07/22-->
<Article  C='2008/7/21' Foo='world' B='Tralics version 2.13.0' A='2008'>
<p>Hello, world!
</p>
</Article>


Note the following three points: the order in which attributes appear in the XML document is not guaranteed; when Tralics prints the date, it uses two digits for the month and day, so the \today command shown above, which does not pad them, is not perfect; the -oe8 option tells Tralics to use UTF-8 output. In this case, only ASCII characters are used, so that nothing special happens.
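
The difference between the two date formats, and the rule given for \specialyear, can be illustrated by the following Python sketch. The function names are ours, and the generalization of the \specialyear rule to other years (the year changes in May) is an assumption drawn from the single example given above.

```python
from datetime import date


def today_unpadded(d):
    # Mimics \the\year/\the\month/\the\day: no leading zeros.
    return f"{d.year}/{d.month}/{d.day}"


def today_padded(d):
    # Two digits for the month and the day, as in the XML comment line.
    return f"{d.year}/{d.month:02d}/{d.day:02d}"


def special_year(d):
    # Assumed rule: the special year starts in May and ends in April.
    return d.year if d.month >= 5 else d.year - 1


d = date(2008, 7, 21)
print(today_unpadded(d), today_padded(d), special_year(d))
```
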

If we use the following invocation
tralics hello2.tex -config=hello.tcf -verbose
the transcript file contains many more lines. We give here some of them.

1 Transcript file of tralics 2.13.0 for file hello2.tex
2 Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
4 Start compilation: 2008/07/22 10:36:37
6 Starting translation of file hello2.tex.
10 Trying config file from user specs: hello.tcf
12 Configuration file identification: hello.tcf $Revision: 1.1$
14 Using tcf type hello
15 dtd is Article from classes.dtd (standard mode)
16 Ok with the config file, dealing with the TeX file...
17 There are 5 lines
18 Starting translation
219 [1] %% Begin bootstrap commands for latex
220 [2] \@flushglue = 0pt plus 1fil
221 {\skip19}
222 +scanint for \@flushglue->0
223 +scandimen for \@flushglue->0.0pt
224 +scanint for \@flushglue->1
480 [54] %% End bootstrap commands for latex
482 [1] \newcommand\hello{\uppercase {h}ello}
487 [13]   \def\World{world}
491 [14]   \def\today{\the\year/\the\month/\the\day}
528 [1] \InputIfFileExists*+{hello2.ult}{}{}
535 [2] \documentclass{article}
542 ++ Opened file ../confdir/article.clt; it has 25 lines
744 {\begin document}
751 [1] \let\do\noexpand\ignorespaces
765 [4] \hello, \World!
767 \hello->\uppercase {h}ello
768 {\uppercase}
771 Character sequence: Hello, .
772 \World->world
773 Character sequence: world! .
774 [5] \end{document}
775 {\end}
776 {Text:Hello, world!
777 }
778 {\end document}
780 {\enddocument}
781 {\endallinput}
782 {\enddocument}
784 {\endgroup (for env)}
791 Bib stats: seen 0 entries
794 Macros created 144, deleted 1; hash size 2242; foonotes 0.
800 (For more information, see transcript file hello2.log)

Lines 10 and 13 show that Tralics has understood that we wanted the file hello.tcf to be used as configuration file, and it has found it. Line 12 is the copy of a special marker found in the configuration file. Note the space after the dollar sign: a system like rcs will not modify the number in this html file!

Lines 219 to 480 correspond to some definitions (\@flushglue, etc) that are systematically done. On line 794, you can see Macros created 144, deleted 1. In fact 141 macros are created at bootstrap phase, and 3 are defined later: on line 482 you can see the definition of \hello, on line 487 that of \World, and on line 506 the redefinition of \today (this command holds the date as printed on line 4, namely '2008/07/22 10:36:37'). The hash size 2242 is the number of multicharacter control sequences entered in the hash table.

For details about the transcript file, see the example of the \loop macro. We have left lines 767 and 768, which show macro expansion, and lines 771 and 773, which show all the characters (non-commands) translated by Tralics. The last character on line 773 is a newline character, shown as a space here but as a newline between lines 776 and 777. These two lines indicate a piece of text added to the XML tree.

Lines 780 and 782 indicate execution of the \enddocument command (this trick is needed because of the end-document-hook mechanism), and line 781 indicates that all lines after the \end{document} line have been discarded. Note. Assume that we have a file foo containing \input{hello2} \bar and maybe some other stuff. When the command at line 780 is seen, LaTeX will execute the end-document hook and stop, but Tralics must translate the bibliography. As a consequence, it executes the pseudo command \endallinput, that calls \endinput on all open files (foo and hello2). As a consequence, all unread characters from these files are ignored. Moreover, characters from the current line of each file (in the example \bar) are removed. There is an exception, that could be removed in the future: characters from the current line from the current file are not removed. Thus the newline character after the \end{document} is not discarded. This explains the additional character in the XML result.

Our fourth document

We assume now that we have a configuration file hello3.tcf containing this:

## This is an example of a configuration file for tralics
## Copyright 2006 Inria/apics, Jose' Grimm
## $Id: hello3.tcf,v 2.3 2006/07/24 12:09:34 grimm Exp$
## tralics ident rc=helloconf3!

BeginCommands
\def\World{world}
End

BeginTitlePage
\maketitle <Title> "" ""
End

DocType = Article classes.dtd
att_language = "language"

BeginCommands
\newcommand\hello{\uppercase {h}ello}
End

BeginTitlePage
\abstract <abstract> "No abstract given"
\author <author> "No author given"
End


A standard configuration file would consist of a single block 'BeginCommands ... End' and a single block 'BeginTitlePage ... End', instead of two blocks of each kind. Nevertheless, you have the right to split your commands as shown here. This configuration file defines five commands. There is no restriction on \hello and \World, but the \maketitle command can be used only once. Moreover, the \abstract and \author commands have to be used before the \maketitle command; they have a default value.

Consider a source file hello3.tex that contains the following lines

% -*- latex -*- utf8-encoded
% tralics configuration file 'hello3'

\documentclass{book}
\begin{document}
\maketitle
\hello, \World!
\end{document}


The second line of the document tells Tralics to use 'hello3.tcf' as configuration file rather than the default. Since the document class is book, the main element of the XML output has part='true' in its attribute list. It has also language='english', because the default language is language number 0, namely english, and the configuration file provides the attribute name to use (value of att_language).
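
The way a comment line can select a configuration file may be pictured with the following Python sketch. It is a guess at the mechanism, written only to illustrate the idea: the regular expression, the limit of ten lines and the function name config_from_source are assumptions, not the rules actually implemented by Tralics.

```python
import re

# Assumed form of the marker: a comment line that contains the words
# "tralics configuration file" followed by a quoted name.
MARKER = re.compile(r"^%.*tralics configuration file '([^']+)'", re.IGNORECASE)


def config_from_source(lines, max_lines=10):
    """Return the tcf file named near the top of the source, or None."""
    for line in lines[:max_lines]:
        match = MARKER.search(line)
        if match:
            return match.group(1) + ".tcf"
    return None


source = [
    "% -*- latex -*- utf8-encoded",
    "% tralics configuration file 'hello3'",
    "",
    r"\documentclass{book}",
]
print(config_from_source(source))
```
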

This file contains non-ASCII characters, so that it will be converted to the input encoding. By default, latin1 is assumed, so that no error will be signaled. Here, the first line of the document says its encoding is UTF-8. By default the output encoding is also latin1, so that we say tralics hello3 -utf8output, and this gives the following output.

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Article SYSTEM 'classes.dtd'>
<!-- Translated from latex by tralics 2.13.0, date: 2008/07/22-->
<Article language='english' part='true'>
<Title><abstract>No abstract given</abstract>
</Title>
<p>Hello, world!
</p>
</Article>


If you say tralics hello3 -oe8a -te1a, this specifies the output encoding as well as the transcript encoding. If the transcript encoding is not specified, the output encoding will be used instead. Here 8 means UTF-8 and 1 means latin1. The letter a means ASCII; thus non-ASCII characters will be printed in the form &#256; in the XML file, and ^^ab in the transcript file. The input file could be ill-formed; for instance, the following two lines have the same conversion.

\show ©A\show éabB\show ó©©©C
\show A\show B\show C


The character that follows the first \show cannot be the first byte of a valid UTF-8 sequence, hence is discarded. The character that follows the second \show is the first of a three-byte sequence, so that three bytes are read, and discarded because the second byte cannot be a continuation byte (an error will also be signaled for the third byte). Finally, the character that follows the third command is valid, but it does not fit on 16 bits, hence is rejected. You would see the following on the tty.

UTF-8 parsing error (line 55, file txtb.tex, first byte)
UTF-8 parsing error (line 55, file txtb.tex, continuation byte)
UTF-8 parsing error (line 55, file txtb.tex, continuation byte)
UTF-8 parsing overflow (char 957033, line 55, file txtb.tex)
the letter A.
the letter B.
the letter C.
Input conversion errors: 1 line, 4 chars.
Input conversion: 5 lines converted.
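
The three kinds of errors described above can be reproduced with a simplified UTF-8 decoder. The Python sketch below is an illustration written for this primer, not the decoder used by Tralics; it assumes that the odd characters of the example stand for the bytes A9, E9 and F3 (their latin1 codes), and it does not check for overlong encodings.

```python
def decode_line(data, max_code=0xFFFF):
    """Decode UTF-8 bytes; return (text, list of error descriptions)."""
    out, errors = [], []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b < 0x80:                # plain ASCII
            out.append(chr(b))
            continue
        if 0xC0 <= b < 0xE0:        # first byte of a 2-byte sequence
            need, code = 1, b & 0x1F
        elif 0xE0 <= b < 0xF0:      # first byte of a 3-byte sequence
            need, code = 2, b & 0x0F
        elif 0xF0 <= b < 0xF8:      # first byte of a 4-byte sequence
            need, code = 3, b & 0x07
        else:                       # cannot start a sequence: discard it
            errors.append("first byte")
            continue
        ok = True
        for _ in range(need):
            c = data[i] if i < len(data) else 0
            i += 1                  # the byte is consumed even if invalid
            if c & 0xC0 == 0x80:
                code = (code << 6) | (c & 0x3F)
            else:
                errors.append("continuation byte")
                ok = False
        if not ok:
            continue
        if code > max_code:         # does not fit on 16 bits
            errors.append(f"overflow {code}")
        else:
            out.append(chr(code))
    return "".join(out), errors


sample = b"\\show \xa9A\\show \xe9abB\\show \xf3\xa9\xa9\xa9C"
text, errors = decode_line(sample)
print(text)
print(errors)
```

On the sample line, the sketch reports one bad first byte, two bad continuation bytes and one overflow for character 957033, which are the four errors listed above, and the remaining text is the same as the second line of the example.
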


Statistics

The Tralics distribution contains a lot of test files, some of them should compile without error, others

1 This is tralics 2.13.0, a LaTeX to XML translator, running on medee
2 Copyright INRIA/MIAOU/APICS 2002-2008, Jos\'e Grimm
3 Licensed under the CeCILL Free Software Licensing Agreement
4 Starting translation of file torture.tex.
5 Configuration file identification: standard $Revision: 2.24$
7 Configuration file identification: torture.tcf $Revision: 1.5$
8 Read tcf file for type: ../confdir/torture.tcf
9 Document class: article 2006/08/19 v1.0 article document class for Tralics
10 File taux2.tex' already exists on the system.
11 Not generating it from this source
12 Translating section command div0: W.
13 \show: 0
14 Translating section command div0: A.
15 Translating section command div0: B.
16 Translating section command div0: C.
17 Translating section command div0: A.
18 Translating section command div0: B.
19 Translating section command div0: C.
20 Warning: junk in table
21 detected on line 3613 of file torture.tex.
22 END OF FILE
23 Bib stats: seen 5(2) entries
24 Seen 3 bibliographic entries
25 Math stats: formulas 751, kernels 167, trivial 5, \mbox 19, large 1, small 59.
?? List stats: short 0, inc 33, alloc 90238.
27 Buffer realloc 28, string 12981, size 247367, merge 232
28 Macros created 1986, deleted 1463; hash size 3062; foonotes 5.
29 Save stack +5189 -5189.
30 Attribute list search 6965(1484) found 2667 in 6419 elements (1117 at boot).
Attribute list search 5400(1336) found 2077 in 6077 elements (1070 at boot).
31 Number of ref 31, of used labels 23, of defined labels 31, of ext. ref. 25.
?? Modules with 0, without 0, sections with 2, without 29
33 Input conversion: 49 lines converted.
34 There were 9 images.
35 Following images not defined: x, y, Logo-INRIA-couleur, ../../a_b:c, x_, figure1a, figure1b, figure1c.
36 Output written on torture.xml (164989 bytes).
37 No error found.


Lines 4, 5 and 6 show that the standard configuration file has been read, and lines 7 and 8 that torture.tcf has been included because of the aliasing mechanism of the standard configuration file.

Lines 10 and 11 show the output of a filecontents environment.

Lines 12 and 14 to 19 show progress: each time a toplevel section is translated, its name is printed.

Line 13 is a test of the \show command.

Lines 20 and 21 are printed when something non-tabular appears in a table.

Line 22 is the result of a \typeout command placed at the end of the document; it checks that every line up to the last one has been read.

The different statistics have to be interpreted as follows

• Line 23 says that there are 5 citations in the document, two of them defined with the \bibitem command; the three unresolved ones are looked up in the bibliography data files. Line 24 says that 3 bibliographic entries have been found in the database.
• Line 25 indicates some math statistics. In fact, 751 math formulas are constructed. There were 167 cases in which a non-trivial kernel was used (case where an index or an exponent has to be constructed). There were 5 trivial formulas, such as $^{i\grave {e}me}$, $23$, $x$, that were translated in a special manner (say tralics -notrivialmath if you do not like this behavior). There were ten cases where something like \mbox appeared in a math formula. In fact, Tralics has created three <mtext> elements. There were 1 large and 59 small objects: these are explained elsewhere. It has to do with how the command \big and similar ones are translated.
• Line ?? says that you used a total of 90238 list cells and that the global free list was incremented 33 times and decremented 0 times. The figures for the testfp file are 953556 cells, 755 increments, and 33 decrements. Whenever a cell is needed, one is taken from the free list; if the free list is empty, 100 cells are allocated; when a cell becomes useless, it is added to the free list; when the size of the free list exceeds 1000, the whole free list is recycled. In version 2.13, memory allocation changed; list cells are no longer counted.
• Line 27 gives buffer statistics. Initially buffers can contain 512 characters. Only once did the size have to be increased. All these buffers were used to create 12981 strings, with a total of 247367 bytes.
If you say something like {foo}{bar}, when Tralics sees the first closing brace, it creates an XML element (generally a <p> element) with the string foo, and when it sees the second closing brace it creates a second element with the string bar (this is because strange things can happen when a closing brace is seen). More often than not, these two elements can be merged. This happened 232 times.
• Line 28 says that Tralics has created 1986 macros and deleted 1463. For the fptest, these numbers are much larger, namely 7703 creations and 5401 deletions; the numbers are big because lots of tests are done, and each test uses a local macro. Moreover, 3062 slots are used in the hash table by multiletter control sequences (it is 2225 at start of document). This number has to be compared with the 925 commands of PlainTeX and the 3216 of standard LaTeX. Finally, five footnotes and no index are defined in the document.
• Line 29 says that Tralics has seen 5189 open braces (or \begingroup commands, or environments) that made it increase the save stack pointer. The good news is that the same number of closing braces has been seen.
• Line 30 says that Tralics has created 6419 elements. 1117 elements were created by the bootstrap phase (essentially for the math formulas). There is a special hash table for elements and attributes. 1484 strings were added to this table by the bootstrap phase, and a total of 2667 at the end of the run. There were however 6965 accesses to this table.
• Line 31 says that Tralics has defined 31 labels (a label is associated to each section, footnote, item in a list, table and figure environment, and some math formulas). Of these 31 labels, 23 were used via a \ref command (some labels were used more than once, since the number of \ref commands was 31). Moreover, there were 25 references to an external document, via \href.
• Line ?? explains that there are some modules and some sections with (and without) some information useful only for the RAweb. This information is not printed anymore.
• Line 33 says that 49 lines containing non-ASCII characters were converted to UTF-8.
• Lines 34 and 35 say that 9 images were included via \includegraphics commands, and that only one of them exists. In fact, the file torture.img contains the following:
# images info, 1=ps, 2=eps, 4=epsi, 8=epsf, 16=pdf, 32=png, 64=gif
see_image("x",0,33);
see_image("y",0,2);
see_image("Logo-INRIA-couleur",0,8);
see_image("../../tralics/Test/a_b",1,1);
see_image("../../a_b:c",0,1);
see_image("x_",0,1);
see_image("figure1a",0,2);
see_image("figure1b",0,2);
see_image("figure1c",0,1);

The second number (e.g., 33) is the number of times the file is included, and the first is the sum of the types; for instance, 17=1+16 means PostScript and PDF. Here only a_b is found, in PostScript format.
• Line 36 says that Tralics has printed a given number of bytes in the resulting XML file. This is good news.
• Line 37 says that no error was detected. This is even better news.
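
The content of torture.img, described above under lines 34 and 35, can be decoded with a few lines of Python. This is only a sketch for reading the table: the names IMAGE_TYPES, decode_types and read_image_table are ours, and the type codes are those of the comment line of the file.

```python
import re

# Type codes taken from the first line of torture.img.
IMAGE_TYPES = {1: "ps", 2: "eps", 4: "epsi", 8: "epsf",
               16: "pdf", 32: "png", 64: "gif"}
ENTRY = re.compile(r'see_image\("([^"]*)",(\d+),(\d+)\);')


def decode_types(mask):
    """Split a sum of type codes into the list of formats found."""
    return [name for bit, name in IMAGE_TYPES.items() if mask & bit]


def read_image_table(text):
    """Return (name, formats found, number of inclusions) per entry."""
    return [(name, decode_types(int(mask)), int(count))
            for name, mask, count in ENTRY.findall(text)]


table = read_image_table('''
see_image("x",0,33);
see_image("y",0,2);
see_image("Logo-INRIA-couleur",0,8);
see_image("../../tralics/Test/a_b",1,1);
see_image("../../a_b:c",0,1);
see_image("x_",0,1);
see_image("figure1a",0,2);
see_image("figure1b",0,2);
see_image("figure1c",0,1);
''')
missing = [name for name, formats, _ in table if not formats]
print(decode_types(17))
print(missing)
```

The names collected in missing are the eight images reported as not defined on line 35.
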

A complete example

We give here an example of a full document. It uses the RR package, which defines the commands \RRetitle, \RRauthor and \RRabstract (these make a non-trivial title page), the fancyvrb package (there is a 'pre=pre' somewhere in the text, the associated action is in the package) and natbib, for the citations.

This document does not compile with version 2.10.7 and later, because of the line marked COMPATIBILITY PROBLEM. In the current version this line terminates the verbatim environment, so that the lines that follow raise errors concerning unclosed environments.

The document was translated using the following invocation: tralics SRC -noentnames -nostraightquotes -nozerowidthspace -trivialmath. We then converted the XML into HTML; the result is given here.

\documentclass{article}  % iso-8859-1
\usepackage{fancyvrb}
\usepackage{natbib}
\usepackage{RR}

\RRetitle{A sample file for Tralics}
\RRauthor{José Grimm}
\RRabstract{This document shows some commands of \textit{Tralics}.
We use it also to show that characters are converted into the right encoding
in a lot of situations, including commands, titles, indices etc.
The \textit{XML} result is translated via \textsl{XSLT} into \textbf{HTML}
and available  on the web
\url{http://www-sop.inria.fr/apics/tralics/txtc.html}.
Source document can be found at
\url{http://www-sop.inria.fr/apics/tralics/doc-step.html}.}

\keyword{Latex, XML, HTML, UTF8, Hàn}
\begin{document}

% This is à còmment

\tableofcontents
\section{Who is Hàn}
If you call tralics with options -te1a ou -te8a, the terminal should show
\verb=^^e0=\index{verb}\footnote{Index here} for the section title;
if you say -te1, there is a single byte, if you say
-te8, there are two bytes. If you say -e1a or -e8a, the XML file should
contain \verb=&#E0;=, in the case -oe1 ou -oe8, the XML file contains the
characters shown on the terminal.

\def\gobble#1{} %% Used later

The following lines try to demonstrate that Tralics handles 16bit characters.
An error will be signaled because the argument is out of range; but the
character with hex value 312 should be valid; the command defined here
by csname has two characters in its name, it must be followed by an
exclamation point (a space is allowed between the command an the exclamation
point).
\expandafter\def\csname féé\endcsname!{123}
\expandafter\def\csname f^^^^0123\endcsname!{312}
\catcode`\é 11 \catcode"123=11 \catcode65536=11
\féé !! \f^^e9^^^^00e9 !! \f^^^^0123 !!

This is standard verbatim: \verb+a _bç+, \verb*=a _bç=, \verbèa _bçè,
\verb-\verb+ { } -, \verb +x+ . Think about this last example.
We index here a word\index{vérb}. Location is just before period.
This is a verbatim environment
\begin{verbatim}
{\let\rm\bf \bf totoé}
<!--this is a comment -->
&Dollar; not &Equals; &Euro;
\end{verbatim}
% See comment below
\begin{rawxml}
{\let\rm\bf \bf totoé}
<!--this is a comment -->
&Dollar; not &Equals; &Euro;
\end{rawxml}

Note. A verbatim environment neutralises the meaning of some commands.
The last line of the verbatim environment should start with an ampersand
character; since this is a special character in XML, it is represented as
\verb=&amp;= or \verb=&#x26;=. Lines can be numbered; spaces can be replaced
by non-breaking ones; lines can use special fonts; paragraphs can be
non-indented, etc.\index{verb@verb}% same as \index{verb}

On the other hand, a rawxml environment is left unchanged. Remember however
that end-of-line characters and spaces are removed from the end of the line; a
new line character is added at the end of the line. If you remove the comment
between the two environments, replacing it by an empty line, then the second
environment will be in vertical mode. Otherwise, the end of the verbatim
environment inserts a \verb=\noindent=, and the environment that follows is in
horizontal mode. As a consequence, there will be a P element on the first
line of the raw xml; moreover, since the final space in a paragraph is
removed, you will find the end-P element at the end of the line.%
\index{vérb@verb}% this is a new index entry

The translation of the environment contains e-acute (its representation
depends on the output encoding), three ampersand characters, a less than sign,
a greater than sign. The second line is a valid XML comment, the third line is
well-formed XML (it contains three entities, so that the XML is valid only if
the DTD defines these entities); it is very easy to produce invalid and
ill-formed XML.\index{vérb@vérb}% this one was already seen

The xmllatex command is to be used with care. It can produce
\xmllatex{Hàn Th&\#x1ebf; Thành}{unused}\footnote{Hàn is the author of
pdftex}. The second argument is meant to be translated by \LaTeX, it is
ignored by Tralics. Instead of \verb=\xmllatex{foo}{bar}=, define a command,
use it in the text, and overwrite it in a ult file (user configuration file).%
\index{vérb@vérb|bf}% Note that encap is ignored

In the current version, you can say \'{\^e} because the double-accent
mechanism is implemented, or ^^^^1ebf; this is a character, as valid as the
other ones. This is possible and dangerous too \xmllatex{<TeX/>}{tex}.%
\index{vérb!vèrb} %subitem in index

A verbatim test. We put some stuff in English and French before, in order
to show how it is translated differently. The end of the environment can
contain spaces (see example above), but nothing else.
\language=0
test ligatures: <<>>''-- et --- !
\language=1
test ligatures: <<>>''-- et --- !
\numberedverbatim
\begin{verbatim}
test : !@#$%^&*()_$
test : {\foo\} et zxcvbnm,./
\end{verbatim}
\begin{verbatim}
test ZXVBNM<>? ~
test \verb+\verb-xx-+
test ligatures: <<>>''-- et --- !
\end{verbatim} Not this one COMPATIBILITY PROBLEM
\end{notverbatim}
\end{verbatim}

\gobble{
\end{verbatim}
}

\unnumberedverbatim
Verbatim without line numbers.
\begin{verbatim}
test : !@#$%^&*()_$
test : {\foo\} et zxcvbnm,./
test ZXVBNM<>? ~
test \verb+\verb-xx-+
test ligatures: <<>>''-- et --- !
test BL : \\738! et \\838!.
\end{verbatim}
The BL test is funny; why should it fail? A long, long time ago, before
it was called Tralics, our translator was written in Perl, and such a line
was illegal; the math was converted by Omega, see \cite{place99}. We also cite
\citeyear{mKay}, and \citefullauthor{mathml2}.

We also demonstrate the counting possibilities
\begin{Verbatim}                   [numbers=true]
test line 1a
test line 1b
\end{Verbatim}
and without numbers
\begin{Verbatim}
[numbers=true]test line 2a
[numbers=true]test line 2b
\end{Verbatim}
\begin{Verbatim} %
[numbers=true] this text is ignored
The environment has an optional argument; spaces but no newlines are allowed
between brace and bracket; what follows the argument on the line is ignored
\end{Verbatim}

We put here the first character of the line in italics
\def\verbatimfont#1{{\it #1}}
\def\verbatimnumberfont{\large}
\count3=4
\begin{Verbatim}[counter=3]
5 we use here counter number 3
6 for counting lines
\end{Verbatim}
Define our Verbatim hook now.
\expandafter\def\csname Verbatim@hook\endcsname{pre=pre,style=latex}
\begin{Verbatim}[counter=03]
7 we use here counter number 03 (the same)
8 but the HTML output differs a lot.
\end{Verbatim}
\newcounter{vbcounter}
\setcounter{vbcounter}{\count3}
\begin{Verbatim}[counter=vbcounter]
9 we use here counter named vbcounter
10 initialised to the value of the previous counter
\end{Verbatim}
\begin{Verbatim}[counter=vbcounter]
11 yet another verbatim line (ok with é^^e9?)
\end{Verbatim}

\let\verbatimfont\tt
\def\verbatimnumberfont#1{\xbox{vbnumber}{#1}}

\DefineVerbatimEnvironment{xverbatim}{Verbatim}{pre=pre,style=latex}
\begin{xverbatim}[numbers=left]
note that, if no counter is specified, it is FancyVerbLine
\end{xverbatim}
\begin{xverbatim}[numbers=left,firstnumber=last,style=log]
and that the first line is numbered one by default.
Of course, options given on the line have precedence over options
inherited from the definition.
\end{xverbatim}

\newenvironment{centré}{\centering}{}
\begin{centré}
In French, centré means centered.
\end{centré}

\DefineShortVerb{\|}
\SaveVerb{DU}|$_|\def\DU{\UseVerb{DU}} %$
\section{Short  Verb, as in \DU}
\let\verbatimfont\sffamily
Test of |\DefineShortVerb| and |\UndefineShortVerb|. Normally
the bar is used, but 16-bit characters are possible. For example, with i-trema:
\DefineShortVerb{\ï}
|toto| ïxï |+x-| ï|t|ï,
\UndefineShortVerb{\ï}
and without:
|toto| ïxï |+x-| ï|t|ï
Spaces: like this |+ +| or that \fvset{showspaces=true}|+ +|
Verbatimfoo: \verb|+ foo +*foo*foo*|.

\def\verbatimfont#1{{#1}}
\def\verbprefix#1{A#1A}
\def\verbatimprefix#1{B#1B}

\SaveVerb{Ç}|}|\def\FE{\UseVerb{Ç}}
\DefineShortVerb{\+}
\SaveVerb{VE}+|+\def\VE{\UseVerb{VE}}
\UndefineShortVerb{\+}
\UndefineShortVerb{\|}

Test of useverb \UseVerb{Ç}, \FE,\VE, \DU.
\begin{verbatim}
We have changed the font, and added a prefix
Spaces are special
\end{verbatim}

Switch to English, for colons in URLs \language=0

\bibliography{tralics}

\end{document}
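
The \index commands scattered through the document above use four forms: a plain entry, a key@text pair, a subitem introduced by an exclamation point, and an encap introduced by a vertical bar. The following reduced file isolates them. It is only a sketch: the file is hypothetical, it is not part of the Tralics test suite, and it assumes a standard LaTeX installation with the makeidx package rather than Tralics itself. ASCII keys are used so that the result does not depend on the input encoding.

```latex
% Sketch only: a reduced index test, assuming standard LaTeX with makeidx.
% This file is hypothetical; it is not the author's test document.
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
First\index{verb}                % plain entry: sort key and text are both "verb"
second\index{verb@verb}          % explicit key@text, identical to the entry above
third\index{verb@\textit{verb}}  % same sort key, different text: a new entry
fourth\index{verb!subverb}       % "!" makes "subverb" a subitem of "verb"
fifth\index{verb|textbf}.        % "|textbf" is the encap applied to the page number
\printindex
\end{document}
```

With standard tools, the index appears after running latex, then makeindex, then latex again. According to the comments in the document above, Tralics ignores the encap part of an entry, so the last form should behave there like the plain one.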

© INRIA 2005, 2006. Last modified $Date: 2008/08/08 16:20:53$