1	Bobrow, D. et al. Common Lisp Object System Specification. Special issue, Sigplan Notices 23, Sep 1988. http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/cltl2.html

2	Flatt, M., Barzilay, E. and Findler, R. B. Scribble: closing the book on ad hoc documentation tools. ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, Edinburgh, Scotland, 2009, pp. 109--120. http://www.cs.utah.edu/plt/publications/icfp09-fbf.pdf

3	Gallesio, E. and Serrano, M. Skribe: a Functional Authoring Language. Journal of Functional Programming, 2005. http://www.inria.fr/mimosa/Manuel.Serrano/publi/jfp05/article.html

4	Greene, A. BASIX -- An Interpreter Written in TeX. TUGboat 11(3), 1990, pp. 381--392. http://www.tug.org/TUGboat/Articles/tb11-3/tb29greene.pdf

5	Hosoya, H. and Pierce, B. XDuce: a Typed XML Processing Language. In Proc. of Workshop on the Web and Data Bases (WebDB), 2000, pp. 226--244.

6	ISO/IEC. Information technology, Processing Languages, Document Style Semantics and Specification Languages (DSSSL). 10179:1996(E), ISO, 1996. http://www.jclark.com/dsssl/

7	Kelsey, R., Clinger, W. and Rees, J. The Revised(5) Report on the Algorithmic Language Scheme. Higher-Order and Symbolic Computation 11(1), Sep 1998. http://www.inria.fr/mimosa/fp/Bigloo/doc/r5rs.html

8	Knuth, D. The TeXbook. Addison-Wesley, Reading, Massachusetts, USA, 1986.

9	Lamport, L. LaTeX - a Document Preparation System. Addison-Wesley, Reading, Massachusetts, USA, 1986.

10	Loitsch, F. and Serrano, M. Hop Client-Side Compilation. In Trends in Functional Programming (ed. Morazán, M. T.), Seton Hall University, Intellect Bristol, UK/Chicago, USA, 2008, pp. 141--158.

11	Maranget, L. Hevea, un traducteur de LaTeX vers HTML en Caml. Actes des 10e Journées francophones des langages applicatifs, 1999.

12	Serrano, M. The HOP Development Kit. Proceedings of the Seventh ACM SIGPLAN Workshop on Scheme and Functional Programming, Portland, Oregon, USA, Sep 2006. http://www.inria.fr/mimosa/Manuel.Serrano/publi/sfp06/article.html

13	Serrano, M. HSS: a Compiler for Cascading Style Sheets. 10th ACM SIGPLAN Int'l Conference on Principles and Practice of Declarative Programming (PPDP), Hagenberg, Austria, Jul 2010.

14	Walsh, N. and Muellner, L. DocBook: The Definitive Guide. O'Reilly, Oct 1999.

15	World Wide Web Consortium. XQuery 1.0: An XML Query Language. REC-xquery-20070123, W3C Recommendation, Jan 2007. http://www.w3.org/TR/xquery/

16	World Wide Web Consortium. Cascading Style Sheets level 2 Revision 1 (CSS 2.1) Specification. CR-CSS2-20090423, W3C Recommendation, Apr 2009. http://www.w3.org/TR/2009/CR-CSS2-20090423/

HopTeX - Compiling HTML to LaTeX with CSS

Manuel Serrano Inria Sophia Antipolis Manuel.Serrano@inria.fr http://www.inria.fr/indes/Manuel.Serrano

@Misc{ serrano:hoptex11,
  author = {Serrano, M.},
  title = {HopTeX - Compiling HTML to LaTeX with CSS},
  category = {web programming},
  year = 2011,
  month = jan,
  url = {http://hop.inria.fr/hop/weblets/homepage?weblet=hoptex&file=hoptex.pdf}
}

This article1 presents HopTeX, a new application for authoring Html and LaTeX documents. The content of the document is expressed either in Html or in a blend of Html and a dedicated wiki syntax, for the sake of conciseness and readability. The rendering of the document is expressed by a set of Css rules. The main originality of HopTeX is to consider LaTeX as a new media type for Html and to express the compilation from Html to LaTeX by means of dedicated style sheet rules.

HopTeX can then be used to generate high quality documents in both printed and electronic versions.

HopTeX is implemented in Hop, a multi-tier programming language for the Web 2.0. This implementation extensively relies on two facilities generally only available on the client-side that Hop also supports on the server-side of the application: DOM manipulations and Css server-side resolutions.

http://hop.inria.fr/hop/weblets/homepage?weblet=hoptex

GNU General Public License

1 Introduction
2 Background, the Hop programming language
3 HopTeX
  3.1 The surface syntax
  3.2 The deep syntax
4 Generating TeX
5 A full-fledged programming language
  5.1 Accommodating bibliography
  5.2 Placing floats
6 Conclusion

1 Introduction

Many scientific publications, in particular in academia, are authored with TeX or LaTeX 8, 9. This is a batch system where documents are actually disguised programs that, when executed, produce various output document formats including DVI or PDF.

Although the TeX programming language is Turing-complete, it is almost exclusively used as a purely declarative authoring language. Being more than thirty years old, it lacks most modern features of programming languages: its syntax is difficult to parse, it supports no object-oriented features, and it offers a limited set of functions for interacting with the operating system. In consequence, programming in TeX requires a level of expertise that deters many, although a small community of aficionados is able to use it beyond expectations (see for instance 4). On the other hand, TeX is still widely used because its rendering engine, coupled with the MetaFont tool, delivers high quality documents that hardly any contemporary typesetting system matches.

The most striking shortcoming of TeX/LaTeX is its inability to produce Html. Since publishing on the web is nowadays mandatory, translators from LaTeX to Html such as Latex2html or Hevea 11 have emerged. These tools have limitations because they offer few facilities for controlling the graphical rendering of the generated documents. This limitation comes from their inability to use Css with the generated Html documents because these lack Html classes or Html identifiers.

Other tools such as Skribe 3 and Scribble 2 follow the symmetrical path, which consists in considering LaTeX as a target and no longer as a source. They attempt to improve on LaTeX by providing a sane programming language used to generate the texts. They offer an ad hoc syntax that combines algorithmic constructs and text-oriented markups. A program can generate LaTeX as well as Html. These two systems are agnostic with respect to the generated format. As a consequence of this design choice, they adopt abstractions reflecting the least common denominator of their target formats. That design choice also makes them difficult to use when fine-grained tuning of the generated document is needed. This characteristic is shared by systems such as Texinfo or DocBook 14 that represent texts using a neutral syntax that can be compiled to Html, LaTeX, and even other formats.

Accommodating Html as a regular data type in a programming language is not new. DSSSL 6, the pioneer, and LAML are two examples based on the Scheme programming language 7. Other languages such as XDuce 5 or XQuery 15 extend this to XML. These languages are well suited for manipulating XML documents but they offer no particular support for authoring documents.

HopTeX is a new system for authoring articles, reports, documentation, and books that follows yet another approach. It accepts as input either Html or a compact wiki syntax that can be complemented by expressions of the Hop programming language. It produces either web pages or LaTeX files. HopTeX aims at combining the best of the two worlds: it generates Html to use the modern interactive features of web browsers and it generates LaTeX to produce high quality paper output. This approach enables HopTeX to generate online documents that embed arbitrary Html fragments such as videos, canvas, pictures, or interactive Ajax elements. It also enables HopTeX to generate paper documents that rely on pre-existing LaTeX styles. HopTeX generates regular LaTeX files, so it is up to the user to include the proper statements in the document source. For instance, to accommodate the ACM style required by the conference, the head of the present paper contains the following:

<tex:verbatim>
\documentclass[nocopyrightspace]{sigplanconf}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{color}

\setlength{\pdfpagewidth}{8.5in}
\setlength{\pdfpageheight}{11in}
...
\maketitle
</tex:verbatim>

HopTeX is implemented in Hop 12, a multi-tier programming language for the web. Hop offers features that dramatically simplify the implementation of HopTeX. In particular, it constructs a server-side DOM for the HTML documents and it supports a server-side CSS resolver. These two features are extensively used to compile to LaTeX.

This presentation of HopTeX is organized as follows. First, to let users unfamiliar with the Hop programming language understand this paper without consulting previous articles, the language is briefly presented in Section 2. Section 3 presents the main functionalities of HopTeX. Section 4 shows how LaTeX is generated out of the initial Html document. Section 5 shows the benefit HopTeX users can expect from resorting to a full-fledged web programming language.

2 Background, the Hop programming language

Hop is a multi-tier programming language for the web which shares many characteristics with JavaScript. It belongs to the functional languages family. It relies on a garbage collector for automatically reclaiming unused allocated memory. It supports type annotations that let the compiler partially check types at compile-time. Types that cannot be inferred are checked dynamically at runtime. It is fully polymorphic (i.e., the universal identity function can be implemented). Hop also differs from JavaScript in several respects, the most striking one being its parenthetical syntax, which is closer to Html than to C-like languages. Hop is a full-fledged programming language so it offers an extensive set of libraries. It advocates CLOS-like object-oriented programming 1. Its main characteristic is that it fosters a programming model where a web application is conceived as a whole. For that, it relies on a single formalism that simultaneously embraces the server-side and the client-side of the application. Both sides communicate by means of function calls and signal notifications. Server-side parts are compiled to a mix of bytecode or native code and client-side parts are compiled to JavaScript 10. In the source code, a syntactic mark instructs the compiler about the location where the expression is to be evaluated.

When a URL is intercepted by a Hop server for the first time, the server automatically loads the associated program and the libraries it depends on. Programs first authenticate the user on whose behalf they are to be executed and check that user's permissions. In order to load or install the program on the client side, the server elaborates an abstract syntax tree (AST) and compiles it on the fly to generate an Html/JavaScript document that is sent to the client. Here is an example of a simple Hop program that is started by browsing the URL http://localhost/hop/hello.

(define-service (hello)
   (<HTML>
      (<DIV> :onclick ~(alert "world!")
         "Hello")))

Contrary to Html, Hop's markups (i.e., <HTML> and <DIV>) are node constructors. That is, the service hello elaborates an AST whose compilation into Html is delayed until the result of the request is transmitted to the client. This two-phase evaluation process differs markedly from embedded scripting languages such as PHP. The AST representing the GUI exists on the client as well as on the server. This brings flexibility because it gives the server opportunities to deploy optimized strategies for building and manipulating the ASTs, as it lets DOM computations take place on the server-side of the application. This characteristic is extensively used for implementing HopTeX.
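The two-phase process can be pictured outside Hop. The following Python sketch is only a model, not Hop code: the `Node` class and the `to_html` function are hypothetical names introduced for the illustration. It shows a tree being built and modified first, and serialized to Html text only at the very end.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Hypothetical stand-in for a server-side DOM node."""
    tag: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def to_html(node):
    """Serialize a tree to Html text; plain strings are the leaves."""
    if isinstance(node, str):
        return escape(node)
    attrs = "".join(f' {k}="{escape(v)}"' for k, v in node.attributes.items())
    body = "".join(to_html(child) for child in node.children)
    return f"<{node.tag}{attrs}>{body}</{node.tag}>"


# Phase 1: build the tree; no Html text exists yet.
page = Node("html", children=[Node("div", {"class": "greeting"}, ["Hello"])])

# The tree can still be inspected or modified before anything is sent.
page.children[0].children.append(" world!")

# Phase 2: serialize only when the response is produced.
print(to_html(page))
```

Because serialization is delayed, any server-side computation on the tree (such as the compilation to LaTeX described later) sees structured nodes rather than text.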

3 HopTeX

Since this article is not a HopTeX user manual, only its prominent features are presented. HopTeX documents are expressed in Html. However, because the concrete syntax of Html is verbose, it is cumbersome for users to manipulate. HopTeX therefore proposes an alternative wiki syntax that can be used in conjunction with Html. It is expected that this syntax will be preferred by users, so it is presented first in this section. Then it is shown how the wiki syntax and the full-fledged Html syntax can be blended inside documents.

3.1 The surface syntax

HopTeX syntax is stratified: the surface syntax is used to typeset input texts, while the deep syntax, which coincides with the syntax of Hop expressions, is used to embed complex Html trees in the document. The surface syntax is inspired by the most popular wiki syntaxes and in particular by MediaWiki2 and CreoleWiki3. It allows authors to express a subset of Html in a concise and visual way. For instance, the tags for strong and emphasized text are ** and //, which some consider more intuitive and more compact than the corresponding Html tags. The following HopTeX input text:

HOP wiki supports **strong**, //emphasize//, __underline__, 
and ++mono space++. These can be **__combined__** 
**//anyhow//**.

is rendered as:

HOP wiki supports strong, emphasize, underline, and mono space. These can be combined anyhow.

The surface syntax supports sections (==), paragraphs (~~), verbatim texts (lines beginning with two white spaces), tables (lines beginning with either ^ or |), lists (lines beginning with two white spaces followed by either a * or - character), and other classical block constructs, whose entries are separated from one another by two blank lines. For instance, the following table:

| This    |  is  ^    a table ^

produces the following result:

This	is	a table

The delimiter ^ introduces table head cells while the delimiter | introduces regular table cells. This explains why the words a table are rendered with a bold font in the example above.
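To make the role of the two delimiters concrete, here is a small Python sketch of how such a row could be split into cells. It is not the HopTeX parser: the function name `parse_table_row` is hypothetical, and the sketch assumes that the trailing delimiter only closes the row, so empty cells are dropped.

```python
import re


def parse_table_row(line):
    """Split a wiki table row into (kind, text) cells.

    A cell introduced by '^' is a head cell ('th'); a cell introduced
    by '|' is a regular cell ('td').
    """
    cells = []
    # Each match is a delimiter followed by the text up to the next delimiter.
    for delimiter, text in re.findall(r"([|^])([^|^]*)", line.strip()):
        text = text.strip()
        if text:  # the closing delimiter yields an empty cell: skip it
            cells.append(("th" if delimiter == "^" else "td", text))
    return cells


print(parse_table_row("| This    |  is  ^    a table ^"))
# prints: [('td', 'This'), ('td', 'is'), ('th', 'a table')]
```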

HopTeX supports mathematical expressions which are introduced by the $$ delimiter. Inside this delimiter HopTeX borrows the syntax of TeX whose syntax for mathematics is deemed expressive and compact. Mathematical expressions are compiled to MathML on the fly. For instance:

  * $$\prod_n^m \lim_{n \rightarrow \infty} x = 0$$
  * $$\overbrace{\overline{x}^{2} + 1}$$
  * $$(n+1)^2\quad \sqrt{1-x^2}\quad\overline{w+\bar z}
 \quad p^{e_1}_1$$

produces:

$\prod_{n}^{m} \lim_{n \to \infty} x = 0$
$\overbrace{\overline{x}^{2} + 1}$
$(n + 1)^{2} \quad \sqrt{1 - x^{2}} \quad \overline{w + \bar{z}} \quad p_{1}^{e_{1}}$

Links and anchors are syntactically similar to those of MediaWiki but extended to support citations, references, and footnotes that are introduced by using a dedicated protocol (bib: for citations, section: for sections, ...). For instance:

Links refer to URLs such as ++[[http://www.inria.fr]]++. 
They may also refer to sections or bibliographic entries 
such as: HopTex is described in Section [[section://HopTeX]].

produces:

Links refer to URLs such as http://www.inria.fr. They may also refer to sections or bibliographic entries such as: HopTex is described in Section 3.

3.2 The deep syntax

The surface syntax trades completeness for compactness. That is, not all Html trees can be represented using the surface syntax. For such trees, the deep syntax is used. The escape sequence of the deep syntax is ,(. When the HopTeX parser reads such a prefix, it reads the rest of the expression using the regular Hop parser, evaluates the expression, and inserts the result in the tree. For instance:

The //deep// escape sequence is ,(<TT> ",("). It can 
be used to insert HTML trees such as ,(<KBD> "C-x s"). 
The ++<WIKI>++ markup is used to 
,(<SPAN> :style "color: darkblue" 
   (<WIKI> [enter the //surface// syntax from the 
//deep// syntax])).

produces:

The deep escape sequence is ,(. It can be used to insert HTML trees such as C-x s. The <WIKI> markup is used to enter the surface syntax from the deep syntax.

4 Generating TeX

Wiki syntaxes such as the HopTeX surface syntax are designed to express a subset of Html concisely. As such, they are easy to translate into Html. They are far less obviously translated into TeX. This translation is described in this section.

Observation 1: TeX/LaTeX (henceforth LaTeX) and Html are not isomorphic. Html is more flexible and more compositional. For instance an Html TABLE might contain PRE elements while LaTeX refuses verbatim environments inside a tabular. Consequently not all Html documents, and thus not all HopTeX documents, can be automatically compiled into LaTeX.

Facing this problem, two obvious solutions emerge: either reduce the expressiveness of HopTeX to the least common denominator of Html and LaTeX, or treat Html parts that have no LaTeX equivalent specially. We have considered the intersection of the two languages too small so we have adopted the latter solution. In consequence, from time to time, HopTeX users have to specify explicitly how to compile some part of the text into LaTeX. However, we have worked hard to minimize the number of occurrences of such situations and we have worked even harder to provide convenient means for expressing these ad hoc compilation schemas.

Observation 2: Cascading Style Sheets (henceforth Css) 16 effectively separate the structure of a document from its rendering. If compiling Html into LaTeX is roughly equivalent to rendering Html into LaTeX, then Css could probably be used for that compilation.

Consider our previous example using bold-face fonts and italic and consider what happens if we ask a web browser to render them using the following Css rules:

strong:before { content: "{\\textbf{"; }
em:before { content: "{\\emph{"; }
strong:after, em:after { content: "}}"; }

The browser will display the following document:

HOP wiki supports {\textbf{strong}}, {\emph{emphasize}}...

which is almost4 a LaTeX compilation.

The HopTeX compilation relies on Css in a principled manner where the compilation rules are expressed as Css rules. In addition to simplicity, using Css also brings flexibility because it lets users provide their own compilation rules in their own Css files, which can override the default compilation strategy.

4.1 CSS driven compilation

The browser cannot be used to implement the compilation as a simple Html rendering for two reasons. First, the browser cannot save the rendered text. Second, some compilation rules are more complex than merely adding a prefix and a suffix. For instance, in Html, pre elements are regular blocks that differ from paragraphs only by not collapsing white spaces, by breaking lines at newline characters, and by using a dedicated font. LaTeX has nothing similar. The verbatim environment has the same behavior for justification and line breaks but considers markups as plain text. Extensions such as alltt come close to pre but all have incompatibilities. In consequence, Html pre elements have to be treated specially when compiled to LaTeX.

HopTeX relies on server-side Css processing. It resorts to the Hss 13 compiler which is included in the Hop development environment 12. Amongst other features, Hss contains a parser that builds abstract syntax trees and a resolver that matches rules against HTML elements.

When a HopTeX input text is to be compiled into LaTeX, the surface syntax is first parsed to produce a full-fledged server-side DOM representation of the Html document. The elements of this tree are matched against Css rules which govern the compilation into LaTeX. The extra tex keyword can be used in Css @media rules to specify rules that are only applicable to the LaTeX compilation.
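The effect of the tex media keyword can be modeled in a few lines. The Python sketch below is an assumption-laden illustration, not the Hss resolver: rules are represented as plain dictionaries, selectors are compared as literal strings, and the function name `applicable_declarations` is hypothetical.

```python
def applicable_declarations(rules, selector, target):
    """Merge the declarations of the rules that apply to `selector`.

    A rule without a "media" entry applies to every target; a rule with
    one applies only when it equals `target`. Later rules override
    earlier ones.
    """
    merged = {}
    for rule in rules:
        if rule["selector"] != selector:
            continue
        if rule.get("media") not in (None, target):
            continue
        merged.update(rule["declarations"])
    return merged


rules = [
    {"selector": "#p1", "declarations": {"color": "darkblue"}},
    {"selector": "#p1", "media": "tex", "declarations": {"font-size": "70%"}},
]

print(applicable_declarations(rules, "#p1", "tex"))
# prints: {'color': 'darkblue', 'font-size': '70%'}
print(applicable_declarations(rules, "#p1", "screen"))
# prints: {'color': 'darkblue'}
```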

The rest of this section presents the details of the compilation. The algorithm is expressed by 4 Hop functions. We deem the Hop language sufficiently high level to be used as an abstract notation for describing these algorithms. Readers unfamiliar with functional programming will probably find some details of the implementation obscure. We hope they will still be able to grasp the general intuition of the algorithms.

The service hoptex/tex implements the entry point of the compiler. It accepts two parameters, the URL of the source file to be compiled and the name of the target file. The service first builds a server side DOM for the document (using the library function wiki-file->dom). Then it loads the Css style sheets imported in the DOM tree and invokes the xml->tex function.

(define-service (hoptex/tex url dest)
   (let* ((doc (wiki-file->dom url))
          (hd (dom-get-elements-by-tag-name doc "head"))
          (css (map tex-load-hss (links-of-head hd))))
      (call-with-output-file dest
         (lambda (op)
            (xml->tex doc css op)))))

The function xml->tex is in charge of compiling one node of the DOM tree into one LaTeX element. The parameter node is the node to be compiled, the parameter css is the opaque data structure representing the Css rules, and the last parameter p is the output port to which the result of the compilation is written. Numbers are written in the target file without modification; strings are escaped, that is, all special LaTeX characters are protected against interpretation (the function tex-string is in charge of this task); lists are recursively processed; and XML nodes are treated specially by the function xml-element->tex, which is given in Figure 1.

(define (xml->tex node::obj css::obj p::output-port)
   (cond
      ((string? node)
       (display (tex-string node) p))
      ((number? node)
       (display node p))
      ((list? node)
       (for-each (lambda (o) (xml->tex o css p)) node))
      ((xml-element? node)
       (xml-element->tex node css p))))
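The article does not show tex-string. As a rough idea of what protecting special characters involves, here is a Python sketch; the replacement table is our own assumption about a reasonable escaping, not the table HopTeX actually uses.

```python
# Characters that LaTeX interprets specially in text mode, with one
# possible protected form for each (assumed, not taken from HopTeX).
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_string(text):
    """Escape `text` so that LaTeX typesets it literally.

    The string is processed character by character, so the backslashes
    introduced by one replacement are never escaped a second time.
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


print(tex_string("50% of A&B"))
# prints: 50\% of A\&B
```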

Compiling an XML element is decomposed into 7 steps.

1. Compute node style. It is computed by the library function css-get-computed-style. If no style is found then the compilation simply compiles recursively the children of the node.
2. Check if the element is visible. The style may make an element invisible if it contains declarations such as display: none. Invisible elements are ignored by the compiler.
3. Compile the prelude. The prelude is computed using the tag of the node and the elements of the style.
4. Compile the before attribute. The before attribute is a string of characters that has to be inserted before the current element. It is handled by the function xml-style->tex. For instance, the default before attribute of the Html em nodes is the string {\em{. The before attribute can be customized by users while the prelude is hardwired in HopTeX.
5. Compile the body of the node. This involves two cases. If the Css style contains a dedicated compiler for the node, use that compiler. Otherwise, recursively compile the children nodes.
6. Compile the after attribute. The after attribute is symmetrical to the before attribute. It closes the LaTeX environment opened in the before attribute.
7. Compile the postlude. The postlude is symmetrical to the prelude. It mostly consists in closing the environment opened in the prelude. For instance, if the prelude has emitted {\small{, the postlude emits }}.

(define (xml-element->tex node::xml-element css p)
   ;; step 1: compute the style
   (let ((style (css-get-computed-style css node)))
      (if (css-style? style)
          ;; step 2: check visibility
          (when (css-visible? style)
             (xml-element-visible->tex node css p style))
          ;; step 1b: plain recursive compilation
          (xml->tex (xml-element-body node) css p))))

(define (xml-element-visible->tex n css p style)
   (with-access::css-style style (after before)
      (let ((texc (style->tex (xml-element-tag n) style))
            (css-proc (css-style-get-attribute style 'proc)))
         ;; step 3: tex prelude
         (for-each (lambda (t) (display (car t) p)) texc)
         ;; step 4: style :before
         (when (css-style? before)
            (xml-style->tex before css p))
         ;; step 5: body compilation
         (if (procedure? css-proc)
             ;; step 5b: a dedicated compiler is used
             (css-proc n css p)
             ;; step 5c: a simple recursive descent is used
             (xml->tex (xml-element-body n) css p))
         ;; step 6: style :after
         (when (css-style? after)
            (xml-style->tex after css p))
         ;; step 7: tex postlude
         (for-each (lambda (t) (display (cdr t) p)) texc))))

Figure 1: Compiling XML elements.

The function xml-style->tex, not given here, is a trimmed down version of xml-element->tex that is in charge of processing the content strings of the before or after attributes.
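For readers who prefer a mainstream notation, the seven steps can be re-expressed as the following Python model. It is a sketch under simplifying assumptions: styles are looked up by tag name only (a stand-in for css-get-computed-style), string escaping is omitted, and the names `Style`, `Element`, `compile_node`, `compile_element`, and `compile_to_tex` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Style:
    before: str = ""
    after: str = ""
    visible: bool = True
    prelude: str = ""    # hardwired part, e.g. a font size switch
    postlude: str = ""
    proc: Optional[Callable] = None  # dedicated compiler, if any


@dataclass
class Element:
    tag: str
    body: list = field(default_factory=list)


def compile_node(node, styles, out):
    """Dispatch on the kind of node (escaping of strings is omitted)."""
    if isinstance(node, (str, int, float)):
        out.append(str(node))
    elif isinstance(node, list):
        for child in node:
            compile_node(child, styles, out)
    elif isinstance(node, Element):
        compile_element(node, styles, out)


def compile_element(node, styles, out):
    style = styles.get(node.tag)              # step 1: compute the style
    if style is None:                         # no style: plain recursion
        compile_node(node.body, styles, out)
        return
    if not style.visible:                     # step 2: check visibility
        return
    out.append(style.prelude)                 # step 3: prelude
    out.append(style.before)                  # step 4: before
    if style.proc is not None:                # step 5: body
        style.proc(node, styles, out)
    else:
        compile_node(node.body, styles, out)
    out.append(style.after)                   # step 6: after
    out.append(style.postlude)                # step 7: postlude


def compile_to_tex(node, styles):
    out = []
    compile_node(node, styles, out)
    return "".join(out)


styles = {
    "strong": Style(before="{\\textbf{", after="}}"),
    "em": Style(before="{\\emph{", after="}}"),
    "script": Style(visible=False),
}
tree = Element("div", [
    "A ",
    Element("strong", ["strong ", Element("em", ["and emphasized"])]),
    " text",
    Element("script", ["ignored"]),
])
print(compile_to_tex(tree, styles))
# prints: A {\textbf{strong {\emph{and emphasized}}}} text
```

The invisible script element is skipped entirely, and the unstyled div is simply traversed, as in steps 1 and 2.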

4.2 Examples

In this section we present a few examples of compilation and we show how users can change the generated LaTeX rendering by providing additional Css rules.

4.2.1 Example 1, a simple compilation

Assuming the Css rules given in Section 4, let us study the compilation of the following text:

A **strong //and emphasized//** text

First, the server parses the text and translates it into Html. Along this process, it builds a DOM representation of the following tree:

<DIV> A <STRONG>strong <EM>and emphasized</EM></STRONG>
 text</DIV>

The compiler has to compile the DIV element, which has three children: the string A, the STRONG element (which contains the EM element), and the string text. Since the DIV element has no style attached to it, its compilation consists in a simple traversal of the tree. The first string is written as is. Then comes the compilation of the STRONG and EM elements. These have styles that specify before and after strings, which are inserted in the generated LaTeX output. The result of the compilation is:

A {\textbf{strong {\emph{and emphasized}}}} text

4.2.2 Example 2, adding user rules

A user wanting to emphasize even more the text that is under both a STRONG and an EM element could use a custom Css rule such as the following (remember that the > CSS operator selects the direct children of a node):

strong > em { text-decoration: underline; }

This changes the compilation of the EM nodes whose parents are STRONG nodes. It adds the declaration text-decoration: underline to the style computed by css-get-computed-style, which enriches the default compilation of EM elements. The generated LaTeX code becomes:

A {\textbf{strong {\emph{\underline{and emphasized}}}}} 
text
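How a user rule enriches the computed style can be sketched in Python as follows. This is a deliberately reduced model of Css resolution, not the Hss implementation: it supports only tag names joined by the child combinator, ignores specificity, and the names `matches` and `computed_style` are hypothetical.

```python
def matches(selector, path):
    """Match a selector made of tag names joined by '>'.

    `path` lists the tag names from the root of the tree down to the
    element being styled. With child combinators only, the selector has
    to coincide with the end of that path.
    """
    parts = [part.strip() for part in selector.split(">")]
    if len(parts) > len(path):
        return False
    return path[-len(parts):] == parts


def computed_style(path, rules):
    """Merge the declarations of all matching rules, later rules winning."""
    style = {}
    for selector, declarations in rules:
        if matches(selector, path):
            style.update(declarations)
    return style


rules = [
    ("em", {"before": "{\\emph{", "after": "}}"}),            # default rule
    ("strong > em", {"text-decoration": "underline"}),        # user rule
]

# An EM directly under a STRONG gets the extra declaration ...
print(computed_style(["div", "strong", "em"], rules))
# ... while an EM elsewhere keeps the default style only.
print(computed_style(["div", "em"], rules))
```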

4.2.3 Example 3, designating elements

As with Html, Css rules for HopTeX can be used to change the compilation of individual nodes. A simple way to achieve this is to assign identifiers to nodes and use these identifiers in the rules. Wiki tags used by HopTeX accept identifier and class declarations. They are given by suffixing the tag with :id@class. For instance, one may write:

~~:p1@note This is a note.

which defines a paragraph named p1 that belongs to the class note. Identifiers and classes can be used in rules such as:

p.note:before { content: "Note:"; font-style: italic; }
@media tex {
  #p1 { font-size: 70%; }
}

The p.note:before rule applies to all rendering engines, and thus in particular to the LaTeX code generator, which adds the italicized content before the paragraph. The #p1 rule only applies to the LaTeX compilation because it is protected by a @media tex rule. It instructs the code generator to use a tiny font for the paragraph #p1, which will be compiled as:

4.3 Three particular cases

As mentioned in Section 4, resorting to the before and after attributes of Css styles suffices to compile most Html elements. However, for a few of them, inserting a prefix and a suffix is not enough. The current HopTeX version makes a special case for exactly 4 elements, namely IMG, PRE, TABLE, and A. We present the compilation of the first three in this section. The compilation of A is delayed to Section 5.1.

When Css prefixes and suffixes are not enough, an ad hoc compilation function can be defined. These functions are declared in the rules as the value of the HopTeX-specific proc property. They are Hop functions that HopTeX calls with three parameters: the node to be compiled, the current Css rule set, and the output port where the result should be written. Let us illustrate these compilation functions with three examples.

4.3.1 Compiling images

Images are inserted in the text with either the regular IMG markup or with the wiki syntax {{...}} as in:

{{screenshot.png|a screenshot}}

Images are compiled in LaTeX into an \includegraphics command in which image resizing is expressed as a ratio of the line width. The HopTeX function xml->tex-img is in charge of this translation. It computes the LaTeX size of the image. If no width is specified for an image, the generated LaTeX image spans the whole line. If a width is given as a percentage, the percentage string is converted into a floating point value in the range $[0, 1]$, which is concatenated to the string \linewidth.

(define (xml->tex-img node::xml-img css p)
   (fprintf p "\\includegraphics[width=~a]{~a}"
      (let ((w (node-computed-style node :width css)))
         (if (string? w)
             (let ((m (pregexp-match "^([0-9]+)%$" w)))
                (if (not m)
                    ;; a string such as "10em"
                    w
                    ;; a percentage, converted to a ratio so that
                    ;; "5%" yields 0.05 and "80%" yields 0.8
                    (format "~a\\linewidth"
                       (/ (string->number (cadr m)) 100.))))
             "\\linewidth"))
      (dom-get-attribute node "src")))

The default HopTeX rule for compiling images is:

img { width: 80%; proc: $xml->tex-img; }

The dollar sign before the xml->tex-img is a syntactic annotation that tells the Css parser that the following expression is not a literal but a value of the Hop language. The compilation of the image given above using the previous Css is:

\includegraphics[width=0.8\linewidth]{screenshot.png}
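The width conversion is easy to get subtly wrong: a percentage such as 5% must become 0.05, which a numeric division guarantees whereas prefixing the digits with "0." would not. The Python sketch below illustrates the conversion; the function names `latex_width` and `includegraphics` are hypothetical and the code is a model, not the HopTeX implementation.

```python
import re


def latex_width(css_width=None):
    """Translate a Css width into a LaTeX length."""
    if css_width is None:
        return "\\linewidth"             # no width: span the whole line
    match = re.fullmatch(r"([0-9]+)%", css_width)
    if match is None:
        return css_width                 # a length such as "10em"
    ratio = int(match.group(1)) / 100    # "80%" -> 0.8, "5%" -> 0.05
    return f"{ratio:g}\\linewidth"


def includegraphics(src, css_width=None):
    return f"\\includegraphics[width={latex_width(css_width)}]{{{src}}}"


print(includegraphics("screenshot.png", "80%"))
# prints: \includegraphics[width=0.8\linewidth]{screenshot.png}
```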

4.3.2 Compiling pre-formatted blocks

As noted in Section 4, Html PRE elements have no direct LaTeX counterpart. To compile them, HopTeX generates a full line wide tabular nested in a texttt environment, and it replaces all white spaces with the explicit command \ that forces LaTeX to introduce plain blank characters. The implementation of this function is as follows:

(define (xml->tex-pre node::xml-pre css p)
   (with-access::xml-pre node (body)
      (display "\\noindent\\texttt{" p)
      (display "\\begin{tabular*}{\\linewidth}" p)
      (display "{l@{\\extracolsep{\\fill}}}\n" p)
      (let loop ((b body))
         (cond
            ((string? b)
             (let ((s (tex-string b)))
                (display (string-substitute s " \n" "\\ " "\\\\\n") p)))
            ((pair? b)
             (for-each loop b))
            (else
             (xml->tex b css p))))
      (display "\\end{tabular*}}\n" p)))

The Css rule that accommodates this compilation scheme is:

pre { font-size: small; proc: $xml->tex-pre; }
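The substitution at the heart of this scheme can be tried on a small input. The Python sketch below is a simplified model: it handles a body made of a single string, assumes that special characters have already been escaped, and ends every line (including the last one) with a row terminator. The function name `compile_pre` is hypothetical.

```python
def compile_pre(text):
    """Render pre-formatted text as a one-column, line-wide tabular."""
    rows = [
        # Each space becomes an explicit "\ " and each line a table row.
        line.replace(" ", "\\ ") + "\\\\"
        for line in text.split("\n")
    ]
    return (
        "\\noindent\\texttt{"
        "\\begin{tabular*}{\\linewidth}{l@{\\extracolsep{\\fill}}}\n"
        + "\n".join(rows)
        + "\n\\end{tabular*}}\n"
    )


print(compile_pre("a b\n  c"))
```

On this input the two rows read `a\ b\\` and `\ \ c\\`, so the indentation of the second line survives in the LaTeX output.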

4.3.3 Compiling tables

Html tables and LaTeX tabulars have nearly orthogonal designs. Html table settings are expressed on a per-cell basis while LaTeX tables are configured on a per-column/per-row basis. In consequence, compiling Html tables into LaTeX tabulars is inherently ad hoc. The default HopTeX compilation left-aligns cells and includes no rule at all. The function xml->tex-table first counts the number of columns in order to generate the LaTeX column declaration. Then, each row of the table is compiled with the function xml->tex-tr, which separates the elements with the & sign and inserts an end-of-line delimiter after each row.

(define (xml->tex-table el::xml-table css p)
   (define (count-columns obj)
      (define (tr-count-columns obj)
         (length (xml-element-body obj)))
      (apply max (map tr-count-columns (xml-element-body obj))))
   (fprintf p "\\begin{tabular}{~a}\n"
      (make-string (count-columns el) #\l))
   (xml->tex (xml-element-body el) css p)
   (display "\\end{tabular}\n" p))

(define (xml->tex-tr el::xml-tr css p)
   (with-access::xml-element el (body)
      (if (null? body)
          (display "\\\\\n" p)
          (let loop ((body body))
             (xml->tex (car body) css p)
             (if (null? (cdr body))
                 (display " \\\\\n" p)
                 (begin
                    (display " & " p)
                    (loop (cdr body))))))))

The default Css rules for table are as follows:

table { proc: $xml->tex-table; }
tr { proc: $xml->tex-tr; }
th:before { content: "{\\textbf{\\textsf{"; }
th:after { content: "}}}"; }

In addition to connecting the two functions above to the TABLE and TR elements, these rules also configure TH elements to mimic their default Html appearance. Provided with these declarations, the table example given in Section 3.1 is compiled as:

\begin{tabular}{lll}
This & is & {\textbf{\textsf{a table}}} \\
\end{tabular}
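The same table compilation can be modeled compactly in Python. In this sketch, which is not the HopTeX code, a table is a list of rows and each row a list of (kind, text) pairs; this representation and the function name `compile_table` are assumptions made for the illustration.

```python
def compile_table(rows):
    """Compile rows of (kind, text) cells, kind being 'th' or 'td'."""
    # One left-aligned column per cell of the widest row.
    columns = max((len(row) for row in rows), default=0)
    lines = ["\\begin{tabular}{" + "l" * columns + "}"]
    for row in rows:
        cells = [
            # Head cells mimic the default Html appearance of TH.
            "{\\textbf{\\textsf{" + text + "}}}" if kind == "th" else text
            for kind, text in row
        ]
        lines.append(" & ".join(cells) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines) + "\n"


print(compile_table([[("td", "This"), ("td", "is"), ("th", "a table")]]))
```

Run on the single-row table of the running example, it yields the three-line tabular shown above.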

5 A full-fledged programming language

In this section we illustrate the benefits of using a full-fledged programming language in HopTeX by presenting two extensions. We show how to manage bibliographic references and how to delegate the placement of floating elements to Css rules.

5.1 Accommodating bibliography

Bibliography citations are treated by HopTeX as a special kind of external hyperlinks. Consistently, the wiki syntax is augmented with the new bib:// protocol that accommodates citations which then look like:

[[bib://knuth:tex86 lamport:latex86]]

Because the BibTeX format is widely used, HopTeX supports it directly: a full BibTeX parser has been implemented in HopTeX. When a document is processed, the BibTeX bibliography database is parsed and stored in a hash table. Then the DOM is traversed and all citations are adjusted. As an example, here is the code in charge of this traversal:

(define (citation? e)
   (when (xml-element? e)
      (with-access::xml-element e (attributes)
         (let ((href (xml-get-attribute :href attributes)))
            (and href
                 (string? (xml-attribute-value href))
                 (string-prefix? "bib://" (xml-attribute-value href)))))))

(define (dom-get-citations expr)
   (filter citation? (dom-get-elements-by-tag-name expr "a")))

It uses regular DOM functions, which in Hop are also available on the server side of applications, to retrieve all the link elements (Html A elements) whose targets are prefixed by the bib:// string.
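The same traversal can be sketched outside Hop. The following Python fragment is an illustration only, not HopTeX code: it assumes that the wiki citation ends up as an A element whose href attribute holds the bib:// prefix followed by space-separated BibTeX keys, and the helper citation_keys is hypothetical.

```python
import xml.etree.ElementTree as ET

BIB_PREFIX = "bib://"

def is_citation(el):
    # An A element is a citation when its href starts with the bib:// prefix.
    href = el.get("href")
    return el.tag == "a" and isinstance(href, str) and href.startswith(BIB_PREFIX)

def get_citations(root):
    # Collect every A element of the tree and keep only the citations.
    return [el for el in root.iter("a") if is_citation(el)]

def citation_keys(el):
    # Hypothetical helper: split the part after bib:// into BibTeX keys.
    return el.get("href")[len(BIB_PREFIX):].split()

# Sample document: one citation, one ordinary link, one anchor without href.
doc = ET.fromstring(
    '<div><a href="bib://knuth:tex86 lamport:latex86"/>'
    '<a href="http://www.wikicreole.org">wiki</a><a name="x"/></div>')
cites = get_citations(doc)
print([citation_keys(c) for c in cites])
```

Each extracted key could then be looked up in the table built from the BibTeX database, which is where the citations would be adjusted.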

5.2 Placing floats

Placing floating elements with LaTeX is a nightmare that we have all lived through. Directives such as htbp are supposed to guide the layout algorithm, but they constantly fail. Stricter directives such as !H have been added, but in practice they show similar results. The only effective way to trick the internal TeX algorithms is to move the floating elements back and forth in the source text. In addition to being painful and error prone, this idiosyncratic behavior has an important drawback when a single source is used to generate both LaTeX and Html documents. Since the web browser does not move floating elements, the figures moved for LaTeX appear randomly placed in the Html version.

Because HopTeX generates LaTeX documents from Html specifications, we have an opportunity to improve on the solution described above. Instead of moving the floating elements in the source text, HopTeX moves them only in the generated LaTeX target, according to configurations expressed in Css rules. For instance, one may write:

@media tex {
  #float1 {
    width: 100%;
    column-count: 2;
    float: -350;
  }
}

which means that the float element named float1 has to be moved 350 elements upward in the DOM tree.

Prior to generating LaTeX code the DOM tree is thus traversed to inspect all floating elements that have a float style attribute attached. Such elements are moved backward when the value is negative and forward when positive. The source code for moving a node in the tree is traditional DOM programming. It is given in Figure 2.

(define (move-float-backward! node offset)
   (let loop ((o offset)
              (prev node))
      (if (= o 0)
          (dom-insert-before! (dom-parent-node prev) node prev)
          (loop (- o 1) (dom-previous-node prev doc)))))

(define (dom-previous-node node doc)
   (let ((sibling (dom-previous-sibling node)))
      (if (not sibling)
          (dom-parent-node node)
          (dom-last-node sibling))))

(define (dom-last-node node)
   (let ((l (dom-child-nodes node)))
      (if (pair? l)
          (let ((n (car (last-pair l))))
             (if (xml-text-element? n)
                 n
                 (dom-last-node n)))
          node)))

Figure 2: Moving elements backward in a DOM tree.
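For readers less familiar with Hop, the backward walk of Figure 2 can be approximated with Python's standard xml.dom.minidom module. This is a sketch under assumptions, not the HopTeX code: the sample document is invented, and the two guards (stopping at the root element and skipping a zero-length move) are additions that Figure 2 does not contain.

```python
from xml.dom import minidom

def last_node(node):
    # Deepest last descendant of node, or node itself if it has no children.
    while node.childNodes:
        node = node.childNodes[-1]
    return node

def previous_node(node):
    # Node that precedes `node` when walking the document backward:
    # the deepest last descendant of the previous sibling, else the parent.
    sibling = node.previousSibling
    if sibling is None:
        return node.parentNode
    return last_node(sibling)

def move_float_backward(node, offset):
    # Walk `offset` nodes backward from `node`, then reinsert `node` there.
    prev = node
    for _ in range(offset):
        candidate = previous_node(prev)
        # Added guard (not in Figure 2): never walk past the root element.
        if (candidate is None or candidate.parentNode is None
                or candidate.parentNode.nodeType == candidate.DOCUMENT_NODE):
            break
        prev = candidate
    # Added guard (not in Figure 2): a zero-length move leaves the tree as is.
    if prev is not node:
        prev.parentNode.insertBefore(node, prev)

# Hypothetical document: two paragraphs followed by the float to move.
doc = minidom.parseString(
    '<body><p id="a">one</p><p id="b">two</p><div id="float1">fig</div></body>')
body = doc.documentElement
float1 = next(e for e in doc.getElementsByTagName("div")
              if e.getAttribute("id") == "float1")
move_float_backward(float1, 2)
print([child.getAttribute("id") for child in body.childNodes])
```

In this example the two steps first reach the text node of the second paragraph and then the paragraph itself, so the float is reinserted just before that paragraph. Text nodes count as steps, which is why offsets as large as the -350 of the Css example are plausible for a full article.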

6 Conclusion

HopTeX is an operational system. It has already been used to write a couple of articles in addition to the present one. The whole implementation amounts to fewer than 4 KLOC of Hop code and 1 KLOC of Css rules. Such compactness is possible only because it makes extensive use of the features offered by the Hop programming language: high-level abstractions supported by functional values, object-oriented support, full polymorphism, server-side DOM manipulation, server-side Css resolution, and built-in parsing facilities. HopTeX is free software released under the GPL. It is available from the Hop web page.

7 References

1	Bobrow, D. et al. Common lisp object system specification. Sigplan Notices 23, special issue, Sep 1988. http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/cltl2.html

2	Flatt, M. and Barzilay, E. and Findler, R. B. Scribble: closing the book on ad hoc documentation tools. ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, Edinburgh, Scotland, 2009, pp. 109--120. http://www.cs.utah.edu/plt/publications/icfp09-fbf.pdf

3	Gallesio, E. and Serrano, M. Skribe: a Functional Authoring Language. Journal of Functional Programming, 2005. http://www.inria.fr/mimosa/Manuel.Serrano/publi/jfp05/article.html

4	Greene, A. BASIX -- An Interpreter Written in TeX. TUGBoat 11(3), 1990, pp. 381--392. http://www.tug.org/TUGboat/Articles/tb11-3/tb29greene.pdf

5	Hosoya, H. and Pierce, B. XDuce: a Typed XML Processing Language. In Proc. of Workshop on the Web and Data Bases (WebDB), 2000, pp. 226--244.

6	Iso/Iec, Information technology, Processing Languages, Document Style Semantics and Specification Languages (DSSSL). 10179:1996(E), ISO, 1996. http://www.jclark.com/dsssl/

7	Kelsey, R. and Clinger, W. and Rees, J. The Revised(5) Report on the Algorithmic Language Scheme. Higher-Order and Symbolic Computation 11(1), Sep 1998. http://www.inria.fr/mimosa/fp/Bigloo/doc/r5rs.html

8	Knuth, D. The TeXbook. Addison-Wesley, Reading, Massachusetts, USA, 1986.

9	Lamport, L. LaTeX - a Document Preparation System. Addison-Wesley, Reading, Massachusetts, USA, 1986.

10	Loitsch, F. and Serrano, M. Hop Client-Side Compilation. Trends in Functional Programming (ed. Morazán, M. T.), Seton Hall University, Intellect Bristol, UK/Chicago, USA, 2008, pp. 141--158.

11	Maranget, L. Hevea, un traducteur de LaTeX vers HTML en Caml. Actes des 10e Journées francophones des langages applicatifs, 1999.

12	Serrano, M. The HOP Development Kit. Proceedings of the Seventh ACM sigplan Workshop on Scheme and Functional Programming, Portland, Oregon, USA, Sep 2006. http://www.inria.fr/mimosa/Manuel.Serrano/publi/sfp06/article.html

13	Serrano, M. HSS: a Compiler for Cascading Style Sheets. 10th ACM Sigplan Int'l Conference on Principles and Practice of Declarative Programming (PPDP), Hagenberg, Austria, Jul 2010.

14	Walsh, N. and Muellner, L. DocBook: The Definitive Guide. O'Reilly, Oct 1999.

15	World Wide Web Consortium, XQuery 1.0: An XML Query Language. REC-xquery-20070123, W3C Recommendation, Jan 2007. http://www.w3.org/TR/xquery/

16	World Wide Web Consortium, Cascading Style Sheets level 2 Revision 1 CSS2.1 Specification. CR-CSS2-20090423, W3C Recommendation, Apr 2009. http://www.w3.org/TR/2009/CR-CSS2-20090423/

Almost compilation only because, apart from using cut-and-paste, there is no means to save the result of this compilation.

http://www.wikicreole.org

http://www.mediawiki.org

Work partially supported by the French ANR agency, grant ANR-09-EMER-009-01.