DDG : A C++ High Level Data Dependence Graph Library

2.0

Author:: : Sid Touati

with contribution from Frederic Brault (INRIA) and Sebastien Briais (UVSQ).

Introduction

This is a generic C++ library that handles data dependence graphs (DDG) for optimising compilation. It is built on top of the LEDA graph library (see http://www.algorithmic-solutions.com/enleda.htm). Our graph library is specially though for research purposes where people are willing to make quick, robust and modular implementations of code optimisation techniques for basic blocks and simple innermost loops (modeled by regular mono-dimensional data dependences). We manage directed acyclic graphs (DAGs) for basic blocks and cyclic graphs for innermost loops. The user is able to take advantage of many standard algorithms for graphs. Also, numerous algorithms on data dependence graphs are implemented. It is also possible to configure the library for different instruction set architectures, and multiple register types.

LEDA is a famous C++ graphs and general data structures library. We have used it since many years, and we can safely say that it is better than other existing C++ graph and data structures libraries that we experimented (BOOST, STL, etc.). LEDA is a high level library greatly helping to implement complex algorithms in a quick, robust and modular way. According to our deep experience, a C++ code using LEDA looks like a high level algorithm, allowing to easily debug it without suffering from programming details. Furthermore, LEDA offers the largest set of implementation of well known algorithms in graph theory and data structures: these are very helpful in optimising compilation, especially for instruction level parallelism problems such as instruction scheduling and register allocation. LEDA is well suited for academic research.

Remarks:

Our C++ library does not perform or implement data dependence analysis. Currently, it's up to the user to make such analysis. Building a precise data dependence analysis module is another on progress software of us (see http://www.prism.univ-cotedazur.fr/~bem/fadalib/home.html).
This version of DDG has been tested with g++ version 4.2. Since g++ implementations continuously change the accepted C++ codes, we do not know of further version of g++ would compile LEDA and DDG. Maybe a minor porting would be required.

Requested Libraries for Using DDG

Getting a LEDA licence is required in order to be able to build new binaries using our DDG library. For our colleagues in universities, algorithmic-solutions offers free academic binary LEDA licences allowing you to use our DDG library. It is also recommended to learn the basic ways to use LEDA.
For our XML parsing, we use libxerces-c. It is provided in almost all linux distribution.

What is new in this version 2.0

Now, DDG may handle complex ISA, with multiple register types. Each instruction may write multiple results of distinct types.
Add new classes : ARCHITECTURE, REGISTER_TYPES
Enhancements of precedent classes : adding check methods, adding multiple register types
Register saturation computation can be done of any register type.
Bug fix in Register Saturation Computation
New enhanced XML formats for input/output DDG and architecture description.
Deprecated DDG file formats: gl.
Now, DDG is distributed under the LGPL software licence instead of GPL.

What was new in version 1.5

It compiles with g++ 4.2
Added : minimal chain decomposition (Dilworth decomposition)
Added : loop unroll example
Added : delta_r and delta_w functions with id as parameter
Added functions in DAG : functions lt ge parralel comparable
Improvement of function and member declarations : we put a "const" C++ qualifier to each method that does not modify the C++ object.

What was new in version 1.1

Enhanced ISA architectural description : for instance, reading and writing delay from/to registers are now implemented inside DDG ( $\delta_r$ and $\delta_w$ latencies). Also, the hardware latencies of operations are implemented too, which may distinct from the latencies of the edges.
The DDGs now may have edges with nonpositive integer latencies and distances ( $\delta$ and $\lambda$ functions become general integers). This is an important aspect if we would like to model optimal scheduling problems for VLIW/EPIC architectures. Consequently, some DDG algorithms have been slightly modified to include this aspect (such as critical cycle computation, longest and shortest paths, etc.)
The DDG library allows now to manage multiple types of edges : flow, antidep, output dep, input dep, serial
Implementing loop retiming/shifting methods
Enhanced gl DDG format
Consistent DDG copying : copy methods preserve the internal consistency of DDGs objects.
Consistent hidding nodes and edges
Import and export methods from gml and leda graph formats
Enhanced export method to vcg DDG format for interactive DDG visualization : colored edges classes, etc.
Associate a generic textual attribute to each node. This would allow to associate, for instance, the textual code representation of each instruction.
Several other minor functions
Bug fixes

DAG Model

A DAG $G=(V,E, \delta)$ in our library is a directed acyclic graph (from top to bottom) that represents the data dependences between a set of statements and any other serial constraints. The DAG is defined by its set of nodes (statements) $ V $

, its set of edges (data dependences and serial constraints) $E=\{(u,v)|\ u,v \in V\}$ , and $\delta$ such that $\delta(e)$ is the latency of the edge $ e $

(in terms of processor clock cycles for instance). While the latencies are non-negative in general (this assumption is important for many standard graph algorithms), we allow in the DDG library to have non-positive latencies in order to model some VLIW and EPIC characteristics. We consider a target RISC-style architecture. We differentiate between statements and precedence constraints, based on whether they refer to values to be stored in registers or not.

$V_{R,t} \subseteq V$ is the set of statements (operations) which define values of type to be stored in registers of type . We simply call them values.
$E_{R,t}\subseteq E$ is the set of data flow dependence edges through registers of type .

Notation and Definitions on DAGs

In this manual, we use the following notations for a given DAG $G=(V,E,\delta)$ (as those usually used in lattices and orders algebra):

$\Gamma^+_G(u)=\{v \in V| (u,v) \in E\}$ successors of in the graph . They are also called the children of .
$\Gamma^-_G(u)=\{v \in V| (v,u) \in E\}$ predecessors of in the graph ;.They are also called the parents of .
$\forall e=(u,v) \in E \quad source(e)=u \wedge target(e)=v$ . and are called endpoints;
$\forall u,v \in V, u \le v \Longleftrightarrow \exists \textrm{ a path } (u,\ldots,v) \textrm{ in }G$ . This relationship is reflexive: $\forall u, u \le u$ .
$\forall u,v \in V, u < v \Longleftrightarrow u \neq v wedge \exists \textrm{ a path } (u,\ldots,v) \textrm{ in }G$ ;
$\forall u,v \in V:\ u || v \Longleftrightarrow \neg(u\le v) \wedge \neg (v\le u)$ . and are said to be parallel ;

$\forall u,v \in V:\ u \sim v \Longleftrightarrow u\le v \vee v\le u$ . and are said to be comparable ;

$\forall u \in V\ \uparrow u=\{v \in V| v=u \vee v \le u \}$ 's ascendants including . In other terms, a node is an ascendant of a node iff there exists a path from to .
$\forall u \in V\ \downarrow u=\{v \in V| v=u \vee u \le v \}$ 's descendants including . In other terms, a node is a descendant of a node iff there exists a path from to .
$A\subseteq V$ is an antichain iff all nodes belonging to are parallel. Formally, $A\subseteq V$ is an antichain in iff $\forall u,v\in A,\ u||v$ ;

is a maximal antichain iff its size in terms of number of nodes is maximal. Formally, is a maximal antichain $iff \forall A \textrm{ antichain in } G,\quad |A|\le |AM|$ ;

The size of a maximal antichain is called the width of the DAG and is noted .

$C\subseteq V$ is a chain iff all nodes belonging to are compatable. Simply, all nodes of a chain belongs to the same path in the DAG. Formally, $C\subseteq V$ is a chain in iff $\forall u,v\in C,\ u \sim v$ ;

$CD= \{C_1, \cdots, C_p\}$ is a chain partition of if any $C_i \in CD$ is a chain and: $\forall u \in V, \exists !i \in [0,p-1]: u \in C_i$ .

A chain decomposition is minimal if its indice is minimal. Such minimal indice is noted .

In 1950, Dilworth proved that , and each maximal antichain is equivalent to a minimal chain decomposition (and vice-versa).

Loop Model

We consider a simple innermost loop (without branches, with possible recurrences). It is represented by a data dependence graph (DDG) $G=(V,E,\delta,\lambda)$ , such that:

is the set of the statements in the loop.
is the set of precedence constraints (flow dependences, or other serial constraints). Any edge has the form , where $\delta(e)$ is the latency of the edge (in terms of processor clock cycles for instance) and $\lambda(e)$ is the distance of the edge in terms of number of iterations. Both $\delta$ and $\lambda$ may be non-positive. However, valid loops DDGs have always cycles with summed non-negative distances, that is: $\forall C \textrm{ a cycle in } G, \sum_{e \in C} \lambda(e)\ge 0$ .

We consider a target RISC-style architecture and we distinguish between statements and precedence constraints, depending upon whether they refer to values to be stored in registers or not.

$V_{R,t} \subseteq V$ is the set of statements that produce values to be stored in registers of type . We simply call them values.
$E_{R,t}\subseteq E$ is the set of flow dependence edges through registers of type . The set of consumers (readers through registers) of a value $u^t \in V_{R,t}$ is therefore the set $Cons(u^t)=\{v\in V\ |\ (u,v)\in E_{R,t}\}$

Processor Model

In order to consider static issue VLIW processors in which the hardware pipeline steps are visible to compilers (we consider dynamically scheduled superscalar processors too), we assume that reading from and writing into a register may be delayed from the beginning of the schedule/issue time, and these delays are visible to the compiler (architecturally visible). We define two delay (offset) functions $\delta_r$ and $\delta_w$ in which

$\delta_{w,t}: V_{R,t} \to \mathbb{N}$

$u \mapsto \delta_{w,t}(u)|\ 0\le \delta_{w,t}(u)<lat(u)$

The writing cycle of $ u^t $ into a register is delayed by $\delta_{w,t}(u)$ clock cycles after the issue (schedule) date of $ u $ . $lat(u)$ is the hardware latency of the instruction $ u $ (it may be distinct from the latencies of the data dependence edges). Also, we define:

$\delta_r: V \to \mathbb{N}$

$u \mapsto \delta_r(u)|\ 0\le \delta_r(u)\le<lat(u)$

The reading cycle of $ u $ from a register is delayed by $\delta_r(u)$ clock cycles after the issue (schedule) date of $ u $ .

According to the semantics of superscalar processors (sequential semantics) and EPIC/IA64, $\delta_r$ and $\delta_{w,t}$ are equal to zero. Also, there are many VLIW processors with zero reading/writing delays. But some VLIW processors such that Trimedia have non-zero reading/writing delays.

Some File Formats Defined and Used by the DDG Library

User Defined Instruction Set Architectures (XML format)

ISA objects can be constructed via C++ methods or via import from an XML file. Here is an example of such an XML file:


<arch version='1'>
  <regtype type='BR' number='2'>
    <register name='br0'>
    <register name='br1'>
  </regtype>
  <regtype type='GR' number='61'>
  </regtype>

  <instruction opcode='nop' latency='0' delta_r='0'>
  </instruction>
  <instruction opcode='inst_2' latency='1' delta_r='2'>
     <write regtype='GR' delta_w='1'/>
     <write regtype='GR' delta_w='0'/>
  </instruction>
  <semantic type='UAL'/>
</arch>

For now, only version 1 is supported, so the version attribute must be set to 1. For each register type, the type name and the number of registers must be provided. The type name has to be unique. Optionally, register names can be specified. For each instruction, a unique opcode must be provided, as well as the latency and read delay of the instruction. If the instruction writes into one or several registers, the register types and write delay must be provided for each one. The semantic can also be specified (once in the file). If the type attribute is 'UAL', then UAL semantic is assumed. Else, or if the semantic is not specified, NUAL is assumed. The current DDG release contains examples of XML files describing some architectures.

Our enhanced XML format for Importing and Exporting Data Dependence Graphs

DDG objects can be constructed via C++ methods or imported/exported via XML files. Here is a simple example of such a file :

<ddg version='1'>
  <operation type='inst_1' id='1'/>
  <operation type='inst_2' id='2'/>
  <operation type='nop' id='3' description='Idle'/>
  <edge src='1' dst='2' latency='3' distance='0' typedep='flowdep_reg' regtype='GR'/>
  <edge src='1' dst='3' latency='0' distance='1' typedep='serial'/>
  <edge src='2' dst='3' latency='0' distance='1' typedep='antidep_reg' regtype='BR'/>
</ddg>

Again, version must be set to 1, since only version 1 is supported at the moment. For each operation, a unique integer id must be supplied. The instruction type must match the architecture definition. An optional description can also be provided. For each edge, the id of the source and of the destination operations must be provided, as well as the latency and the distance. The type of the dependence can be any of : flowdep_reg serial antidep_reg outputdep_reg inputdep_reg flowdep_mem antidep_mem outputdep_mem; inputdep_mem spilldep_mem other_mem killerdep reusedep For some of those dependencies, the register type has to be specified. Again, it must match one of the types defined in the architecture file.

The current DDG release contains examples of XML files describing DDG provided from STmicroelectronics.

Our gl Format for Importing and Exporting Data Dependence Graphs (Deprecated Format)

The gl format is our own simple design for DDGs. This format is used in the case of simple ISA, with a unique register type. For ISA with multiple register types, the enhanced XML format should be used (see Our enhanced XML format for Importing and Exporting Data Dependence Graphs). Other DDGs format are also supported (see LEDA format, gml format). Let assume the name "loop" for the DDG, Hence the file loop.gl contains nodes, edges with their latencies and distances (in terms of number of iterations) in the form

 DDG.GRAPH 1.1
 #number_of_nodes
 opcode_id_node_1
 opcode_id_node_2
 ...      .
 opcode_id_node_n
 #number_of_edges
 source_id dest_id latency distance edge_type
 ...   .

Note that the latency of an edge may be distinct from the latency of the source instruction. edge_type is a string describing the type of the data dependence ("flowdep_reg", "serial", "antidep_reg", "outputdep_reg", etc.). "" do not belong to the strings.

Default Opcodes (Deprecated)

The default opcodes are as follows. 0 add_op 1 sub_op 2 not_op 3 mul_op 4 div_op 5 ld_op 6 st_op 7 copy_op 8 nop_op Be aware that the current gl format does not contain any information on the ISA (except the opcode ids of the nodes). To completely define an ISA for a DDG, see User Defined Instruction Set Architectures (XML format).

Nodes identifiers (Deprecated)

Each node in a DDG is uniquely defined by an integer identifier (id). For instance, the id of each node can be simply equal to the position of the node in the loop.gl file (starting from 1). Suppose that the edge section of the loop.gl file contains a line describing an edge in the form :

 2  4  1  0

This means that the edge is from node number 2 to node number 4 with a latency equal to 1 and a distance equal to zero. Do not confuse between the node id and its opcode.

Some Algorithms Implemented by LEDA

Here is a simple list of algorithms implemented inside LEDA that we can use inside our DDG library

Basic Graph Algorithms
1. Sorts algorithms (DFS, BFS, etc.)
2. Computing strongly connected components, connected components, bipartite components, transitive closure, transitive reduction
Shortest Path Algorithms
Maximum Flow
Min Cost Flow Algorithms
Minimum Cut
Maximum Cardinality Matchings in Bipartite Graphs
Bipartite Weighted Matchings and Assignments
Maximum Cardinality Matchings in General Graphs
General Weighted Matchings
Stable Matching
Minimum Spanning Trees
Euler Tours
Algorithms for Planar Graphs
Graph Drawing Algorithms
Graph Morphism Algorithms
Graph Morphism Algorithm Functionality

Some Algorithms Implemented by the DDG Library

Our DDG library inherits from all the LEDA powerfull algorithmic implementations, such as high level graph operators, access and update DDG structures and attributes, etc. It also includes some specific algorithms used in optimising compilation.

DAG algorithms
1. Shortest and longest paths between any pair of nodes
2. Dilworth decomposition, maximal antichain, minimal chain decomposition.
3. Register Saturation of a DAG (maximal register pressure independently of instruction schedules). The current implementation considers multiple register types.
4. Import and export methods from/to XML formats, see Some File Formats Defined and Used by the DDG Library
5. Exporting to graph visualization tool (xvcg).
Loop algorithms
1. Critical cycle : the critical cycle in a loop DDG is defined as the one producting the maximal ratio $\max_{C\textrm{ a circuit}} \frac{\sum_{e \in C} \delta(e)} {\sum_{e \in C} \lambda(e)}$ . It is an old problem in graph theory studied initially by Dantzing in 1966. Such circuit is assumed in general to have $\sum_{e \in C} \delta(e)0>$ and $\sum_{e \in C} \lambda(e)>0$ . However, our DDG library adds the special case of null circuits, i.e.,, those defined with $\sum_{e \in C} \delta(e)=0$ and $\sum_{e \in C} \lambda(e)\ge 0$ .
2. DDG unrolling (loop unrolling) with recurrences
3. DDG merging (loop merging)
4. Loop retiming/shifting
5. Import and export methods from/to XML formats, see Some File Formats Defined and Used by the DDG Library
6. Exporting to graph visualization tool (xvcg).

Interactive Data Dependence Graphs Visualization using XVCG tool

The DDG library implements enhanced export methods to xvcg format that allow helpful and interactive DDG visualization. The xvcg tool is a free software that can be downloaded from http://rw4.cs.uni-sb.de/users/sander/html/gsvcg1.html The DDG library has export methods that produce vcg graphs with the following visual properties.

The nodes writing into registers (values) are in green, while the other nodes are in grey. See screenshot 1 .
The edges are valued by couples representing the edge latency and the iteration distance. That is, each edge is valued by the couple $(\delta(e), \lambda(e))$ .
There are ma,y types of dependence edges : flow edges throught registers, flow edges throught memory, serial edges, antidependance edges, outputdep edges, etc. See screenshot 2 .
The user may visualize the nodes attributes by clicking on Node Information :
1. ISA opcodes : see screenshot 3 and screenshot 4 .
2. different latencies : hardware operation latency, reading and writing delays from/into registers, see screenshot 5 and screenshot 6
3. The distinct register types produced by the instruction.
4. textual attributes attached to ndes (codes, etc.)
The user may visualize andd/or hide any class of these edges using the xvcg tool (click on Expose/Hide Edges) : see screenshot 7.

Some other screenshots can be found here .

Getting Started

Download Sources

The source files of our DDG library (with the included reference manual) can be downloaded from http://www-sop.inria.fr/members/Sid.Touati/sw/DDG/DDG-2.0.tgz Our codes are under LGPL licence. If you are from academia, you can get a free academic licence for LEDA. Otherwise, you should buy a licence to be able to use DDG. In this Release you find:

INSTALL instructions
Full user guide documentation
DDG and ISA examples (in XML format)
C++ Code Examples
All C++ sources (except LEDA, that should be instaleld separately by the user)

Download some Binaries Examples (linux X64)

Some binaries under linux/x86 have been built using g++ 4.2. Our binaries do not require LEDA, but they require other dynamic open source libraries such as libstd++ and libxerces-c. Here is the list of binaries we distribute as examples:

Library : binary archive ar file.
Examples (all sources included in the distribution) :
Command line DDG file format filter: xml2vcg.

you can execute these programs on the DDG and ISA samples we provide inside the DDG release.

Building the DDG Library

See installation instructions in the source file distribution. The current version of DDG has been implemented under linux/x86 with g++ version 4.2.2 and LEDA version 5.

Some Coding Instructions for Users

The user should include "DDG.h" in his C++ program. This header file defines the API of the DDG library and includes some LEDA header files. Thus, the user should set the LEDAROOT environment variable to the path of LEDA installation directory and use the include directory -I$LEDAROOT/incl in the command line argument of g++.

Furthermore, we highly recommend to use the -fno-inline option of your C++ compiler.

Linking Instructions

After successfull compilation (installation) of the DDG library, a file library called libDDG.a is created. This library has to be linked with the user applications using the DDG features. Furthermore, the LEDA library should be present (make sure that the LEDAROOT environment variable is correctly set), especially libG.a and libL.a. The user should then link its application with all these libraries by using the following link option for g++ line command libDDG.a -L$LEDAROOT -lG -lL. Note that the order of these linker options is important. Starting from LEDA version 6.0, the linking options -lG -lL become obsolete, they have been replaced by -lleda.

A Hello World Program

There are two main DDG C++ classes, DAG and LOOP, corresponding to the DAG and loop models defined in DAG Model and Loop Model respectively. So, each DDG is an object of these two C++ classes. Data dependence graphs can be filled either by reading a DDG XML file (see Some File Formats Defined and Used by the DDG Library), or buy using C++ methods (new_node, new_edges, etc.). If the user wishes, he can implement his own import method from other graph formats. Here is a simple hello world program to show that it is very simple to handle data dependence graphs using our library. Detailed information could be found in the definitions of DAG and LOOP classes, and in the other code examples.

#include "DDG.h" // the header file of the DDG library. To be included by the user.

#include <iostream>
#include <cstdlib>


using namespace std;
using namespace DDG; // the namespace of the DDG library.



int main(int argc, char *argv[])
{
    int         i;
    int c; // for getopt
    // some strings for file names
    ARCHITECTURE isa_arch;
    
    char        *ddg_filename=NULL, *arch_filename=NULL;
    //declare a DDG of a loop (cyclic data dependence graph)
    DDG::LOOP loop;

    LEDA::node n,u;
    LEDA::edge e;
    
    
    while ((c = getopt (argc, argv, "i:a:")) != -1){
        switch(c){
            case 'a': arch_filename=optarg;
                break;
            case 'i': ddg_filename =optarg;
                break;
    case '?': 
                cout<<"usage:"<< argv[0] <<" -i ddg_filename.xml [-a isa_desc.xml] "<<endl;
                return EXIT_FAILURE;
        }
        
    }
    
    if (arch_filename!=NULL)  { 
        isa_arch.read_from_xml(arch_filename);
        if(isa_arch.check()){
            loop=LOOP(isa_example);  //build an empty DDG loop with a user defined architcural (ISA) description
        } else return EXIT_FAILURE;
        
    } 
     //---- The user can read a DDG from an xml format defined by the DDG library
    if (ddg_filename != NULL){
        i=loop.read_from_xml(ddg_filename); // reads the DDG loop from an xml file
        if(i==-1) return EXIT_FAILURE;
        }
    else{
        cout<<"usage:"<< argv[0] <<" -i ddg_filename.xml [-a isa_desc.xml] "<<endl;
        return EXIT_FAILURE;
    }

    

     
    
    cout<< "DDG example : " <<ddg_filename<<endl;
    // Null circuits are accepted inside the DDG and detected
    cout<<"Critical Cycle  of "<< ddg_filename<<" = "<<loop.CRITICAL_CYCLE(el)<<endl;
    cout<<"Edges belonging the the critical cycle"<<endl;
    forall(e, el){
        ie=loop[e];
        cout<< loop.source_id(e)<< " -> "<< loop.target_id(e);
        cout<<" : "<< ie<<endl;
    }

    
    return EXIT_SUCCESS;
}