Abstract syntax

We define a language's grammar as an abstract syntax specification. A hierarchy of grammar rules delimits the set of legal abstract syntax trees that may be created. An abstract syntax definition consists of nodes, called operators, and node types, called phyla. The operators are the terminals and non-terminals of a language. An operator is defined by the type and number of its descendants. An operator may belong to more than one phylum, meaning that it may appear in different contexts. In our example of a language of mathematical expressions, we may define a phylum EXP which contains all the operators of our language:

EXP ::= plus minus prod uminus assign variable integer ;

We include the assignment operator and integers in our language of expressions. Note that plus is a binary operator whose descendants also belong to the EXP phylum. An operator's signature is defined by the number and phyla of its descendants. An abstract syntax thus recursively defines operators, the phyla of their descendants, the operators contained by these phyla, and so on.
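The relationship between phyla, operators, and signatures can be sketched outside Metal. The Python below is an illustration only, not Metal syntax and not how Centaur stores trees: the names PHYLA, SIGNATURES, Node, and is_well_formed are hypothetical, and every signature other than that of plus (which the text states is binary over EXP) is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical table: phylum name -> operators it contains (from the EXP rule).
PHYLA = {
    "EXP": {"plus", "minus", "prod", "uminus", "assign", "variable", "integer"},
}

# Hypothetical signatures: operator -> phyla of its descendants, in order.
# Only plus is described in the text; the other entries are assumptions.
SIGNATURES = {
    "plus": ("EXP", "EXP"),
    "minus": ("EXP", "EXP"),
    "prod": ("EXP", "EXP"),
    "uminus": ("EXP",),
    "assign": ("EXP", "EXP"),
    "variable": (),
    "integer": (),
}


@dataclass
class Node:
    operator: str
    children: list = field(default_factory=list)
    value: object = None  # payload for leaves such as variable or integer


def is_well_formed(node, phylum="EXP"):
    """Check that node belongs to phylum and matches its operator's signature."""
    if node.operator not in PHYLA[phylum]:
        return False
    signature = SIGNATURES[node.operator]
    if len(node.children) != len(signature):
        return False
    # Recurse: each descendant must belong to the phylum the signature names.
    return all(is_well_formed(child, expected)
               for child, expected in zip(node.children, signature))


good_tree = Node("plus", [Node("variable", value="x"), Node("integer", value=1)])
bad_tree = Node("plus", [Node("integer", value=1)])  # plus needs two descendants
```

Here is_well_formed(good_tree) holds and is_well_formed(bad_tree) does not, which mirrors how the recursive definition delimits the legal trees.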

Operators come in three flavors:

In Metal, phylum names must begin with an uppercase letter followed by any number of letters, underscores, or digits. Operator names must begin with a lowercase letter followed by any number of letters, underscores, or digits.
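As a rough illustration, the two naming rules can be written as regular expressions. This Python sketch assumes that "letters" means ASCII letters, which the text does not state; the function names are hypothetical and are not part of Metal.

```python
import re

# Assumption: names are restricted to ASCII letters, digits, and underscores.
PHYLUM_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")
OPERATOR_NAME = re.compile(r"[a-z][A-Za-z0-9_]*")


def is_phylum_name(name):
    """True if name starts with an uppercase letter and is otherwise legal."""
    return PHYLUM_NAME.fullmatch(name) is not None


def is_operator_name(name):
    """True if name starts with a lowercase letter and is otherwise legal."""
    return OPERATOR_NAME.fullmatch(name) is not None
```

Under these assumptions EXP is accepted as a phylum name and plus as an operator name, while swapping the case of the first letter makes each one invalid in its role.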

In an organized abstract syntax specification, each phylum gathers related operators together, forming a kind of language subset. The phylum EXP, for example, contains all of the operators necessary to represent a single mathematical expression. Due to the recursive relationship between phyla and operators, even a moderately sized language quickly becomes an intricate hierarchy of language subsets. When tackling a relatively large language, it is indispensable to organize language elements modularly, i.e., through phyla.

Given an abstract syntax definition, the Metal compiler generates a persistent representation called the tables. Whenever we try to construct an abstract syntax tree in a given formalism, either by hand or by parsing, the tables are read into memory (unless already present) and serve as the basis of tree construction.
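The "read into memory unless already present" behavior is an ordinary load-once cache. The Python sketch below only illustrates that idea: the JSON file format, the file naming, and the functions save_tables and get_tables are all assumptions for the example, and they say nothing about the real format of the tables the Metal compiler produces.

```python
import json
import os
import tempfile

# Tables already in memory, keyed by formalism name (hypothetical structure).
_loaded_tables = {}


def tables_path(directory, formalism):
    """Location of the persistent tables for a formalism (assumed naming)."""
    return os.path.join(directory, formalism + ".tables.json")


def save_tables(directory, formalism, tables):
    """Stand-in for the compiler step that writes the persistent tables."""
    with open(tables_path(directory, formalism), "w", encoding="utf-8") as f:
        json.dump(tables, f)


def get_tables(directory, formalism):
    """Return the tables, reading them from disk only on first use."""
    if formalism not in _loaded_tables:
        with open(tables_path(directory, formalism), encoding="utf-8") as f:
            _loaded_tables[formalism] = json.load(f)
    return _loaded_tables[formalism]


workdir = tempfile.mkdtemp()
save_tables(workdir, "exp", {"EXP": ["plus", "minus", "prod", "uminus",
                                     "assign", "variable", "integer"]})
first = get_tables(workdir, "exp")   # read from disk
second = get_tables(workdir, "exp")  # served from memory
```

Both calls return the same in-memory object, so tree construction by hand and by parsing would consult a single copy of the tables.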

We implement comments in this version of Exp. Comments pose an interesting problem in a language specification. We would like to insert comments virtually anywhere in a program. In order to preserve this position information, we may store comments in one of two ways. First, each operator signature could contain a comment descendant. But this would be very cumbersome and would distract from the real signature of the operator. Instead, we store comments as annotations on the source tree. A comment is attached to an abstract syntax tree at a node constructed from nearby text. Whereas we specify an operator's signature statically, we may also dynamically extend abstract syntax for requirements such as comments. The Metal compiler automatically includes comment operators and phyla in the formalism tables, so to include them in Exp, we need only specify their concrete syntax (see section .)
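The difference between the two storage choices can be sketched in a few lines of Python. This is an illustration under assumptions, not Centaur's annotation mechanism: AnnotatedNode, attach_comment, and comments_in are hypothetical names, and the "comments" annotation key is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedNode:
    operator: str
    children: list = field(default_factory=list)
    # Annotations live beside the descendants, not among them, so the
    # operator's signature (its list of children) is left untouched.
    annotations: dict = field(default_factory=dict)


def attach_comment(node, text):
    """Attach a comment to a node as an annotation (assumed key: 'comments')."""
    node.annotations.setdefault("comments", []).append(text)


def comments_in(node):
    """Collect all comments in the tree, visiting each node before its children."""
    found = list(node.annotations.get("comments", []))
    for child in node.children:
        found.extend(comments_in(child))
    return found


x = AnnotatedNode("variable")
one = AnnotatedNode("integer")
total = AnnotatedNode("plus", [x, one])
attach_comment(total, "add one to x")
attach_comment(one, "the increment")
```

After both comments are attached, plus still has exactly two descendants, whereas a comment descendant in the signature would have changed the arity of every operator.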


                  


