DDG : A C++ High Level Data Dependence Graph Library
2.0
- Author:
- : Sid Touati
with contribution from Frederic Brault (INRIA) and Sebastien Briais (UVSQ).
This is a generic C++ library that handles data dependence graphs (DDG) for optimising compilation. It is built on top of the LEDA graph library (see http://www.algorithmic-solutions.com/enleda.htm). Our graph library is specially though for research purposes where people are willing to make quick, robust and modular implementations of code optimisation techniques for basic blocks and simple innermost loops (modeled by regular mono-dimensional data dependences). We manage directed acyclic graphs (DAGs) for basic blocks and cyclic graphs for innermost loops. The user is able to take advantage of many standard algorithms for graphs. Also, numerous algorithms on data dependence graphs are implemented. It is also possible to configure the library for different instruction set architectures, and multiple register types.
LEDA is a famous C++ graphs and general data structures library. We have used it since many years, and we can safely say that it is better than other existing C++ graph and data structures libraries that we experimented (BOOST, STL, etc.). LEDA is a high level library greatly helping to implement complex algorithms in a quick, robust and modular way. According to our deep experience, a C++ code using LEDA looks like a high level algorithm, allowing to easily debug it without suffering from programming details. Furthermore, LEDA offers the largest set of implementation of well known algorithms in graph theory and data structures: these are very helpful in optimising compilation, especially for instruction level parallelism problems such as instruction scheduling and register allocation. LEDA is well suited for academic research.
- Getting a LEDA licence is required in order to be able to build new binaries using our DDG library. For our colleagues in universities, algorithmic-solutions offers free academic binary LEDA licences allowing you to use our DDG library. It is also recommended to learn the basic ways to use LEDA.
- For our XML parsing, we use libxerces-c. It is provided in almost all linux distribution.
- Now, DDG may handle complex ISA, with multiple register types. Each instruction may write multiple results of distinct types.
- Add new classes : ARCHITECTURE, REGISTER_TYPES
- Enhancements of precedent classes : adding check methods, adding multiple register types
- Register saturation computation can be done of any register type.
- Bug fix in Register Saturation Computation
- New enhanced XML formats for input/output DDG and architecture description.
- Deprecated DDG file formats: gl.
- Now, DDG is distributed under the LGPL software licence instead of GPL.
- It compiles with g++ 4.2
- Added : minimal chain decomposition (Dilworth decomposition)
- Added : loop unroll example
- Added : delta_r and delta_w functions with id as parameter
- Added functions in DAG : functions lt ge parralel comparable
- Improvement of function and member declarations : we put a "const" C++ qualifier to each method that does not modify the C++ object.
- Enhanced ISA architectural description : for instance, reading and writing delay from/to registers are now implemented inside DDG (
and
latencies). Also, the hardware latencies of operations are implemented too, which may distinct from the latencies of the edges. - The DDGs now may have edges with nonpositive integer latencies and distances (
and
functions become general integers). This is an important aspect if we would like to model optimal scheduling problems for VLIW/EPIC architectures. Consequently, some DDG algorithms have been slightly modified to include this aspect (such as critical cycle computation, longest and shortest paths, etc.) - The DDG library allows now to manage multiple types of edges : flow, antidep, output dep, input dep, serial
- Implementing loop retiming/shifting methods
- Enhanced gl DDG format
- Consistent DDG copying : copy methods preserve the internal consistency of DDGs objects.
- Consistent hidding nodes and edges
- Import and export methods from gml and leda graph formats
- Enhanced export method to vcg DDG format for interactive DDG visualization : colored edges classes, etc.
- Associate a generic textual attribute to each node. This would allow to associate, for instance, the textual code representation of each instruction.
- Several other minor functions
- Bug fixes
A DAG
in our library is a directed acyclic graph (from top to bottom) that represents the data dependences between a set of statements and any other serial constraints. The DAG is defined by its set of nodes (statements)
, its set of edges (data dependences and serial constraints)
, and
such that
is the latency of the edge
(in terms of processor clock cycles for instance). While the latencies are non-negative in general (this assumption is important for many standard graph algorithms), we allow in the DDG library to have non-positive latencies in order to model some VLIW and EPIC characteristics. We consider a target RISC-style architecture. We differentiate between statements and precedence constraints, based on whether they refer to values to be stored in registers or not.
is the set of statements (operations) which define values of type
to be stored in registers of type
. We simply call them values.
is the set of data flow dependence edges through registers of type
.
In this manual, we use the following notations for a given DAG
(as those usually used in lattices and orders algebra):
successors of
in the graph
. They are also called the children
of
.
predecessors of
in the graph
;.They are also called the parents
of
.
.
and
are called endpoints;
. This relationship is reflexive:
.
;
.
and
are said to be parallel ;
.
and
are said to be comparable ;
's ascendants including
. In other terms, a node
is an ascendant of a node
iff there exists a path from
to
.
's descendants including
. In other terms, a node
is a descendant of a node
iff there exists a path from
to
.
is an antichain iff all nodes belonging to
are parallel. Formally,
is an antichain in
iff
;
is a maximal antichain iff its size in terms of number of nodes is maximal. Formally,
is a maximal antichain
;
- The size of a maximal antichain is called the width of the DAG and is noted
.
is a chain iff all nodes belonging to
are compatable. Simply, all nodes of a chain belongs to the same path in the DAG. Formally,
is a chain in
iff
;
is a chain partition of
if any
is a chain and:
.
- A chain decomposition
is minimal if its indice
is minimal. Such minimal indice is noted
.
- In 1950, Dilworth proved that
, and each maximal antichain is equivalent to a minimal chain decomposition (and vice-versa).
We consider a simple innermost loop (without branches, with possible recurrences). It is represented by a data dependence graph (DDG)
, such that:
is the set of the statements in the loop.
is the set of precedence constraints (flow dependences, or other serial constraints). Any edge
has the form
, where
is the latency of the edge
(in terms of processor clock cycles for instance) and
is the distance of the edge
in terms of number of iterations. Both
and
may be non-positive. However, valid loops DDGs have always cycles with summed non-negative distances, that is:
.
We consider a target RISC-style architecture and we distinguish between statements and precedence constraints, depending upon whether they refer to values to be stored in registers or not.
is the set of statements that produce values to be stored in registers of type
. We simply call them values.
is the set of flow dependence edges through registers of type
. The set of consumers (readers through registers) of a value
is therefore the set
In order to consider static issue VLIW processors in which the hardware pipeline steps are visible to compilers (we consider dynamically scheduled superscalar processors too), we assume that reading from and writing into a register may be delayed from the beginning of the schedule/issue time, and these delays are visible to the compiler (architecturally visible). We define two delay (offset) functions
and
in which


The writing cycle of
into a register is delayed by
clock cycles after the issue (schedule) date of
.
is the hardware latency of the instruction
(it may be distinct from the latencies of the data dependence edges). Also, we define:


The reading cycle of
from a register is delayed by
clock cycles after the issue (schedule) date of
.
According to the semantics of superscalar processors (sequential semantics) and EPIC/IA64,
and
are equal to zero. Also, there are many VLIW processors with zero reading/writing delays. But some VLIW processors such that Trimedia have non-zero reading/writing delays.
ISA objects can be constructed via C++ methods or via import from an XML file. Here is an example of such an XML file:
<arch version='1'>
<regtype type='BR' number='2'>
<register name='br0'>
<register name='br1'>
</regtype>
<regtype type='GR' number='61'>
</regtype>
<instruction opcode='nop' latency='0' delta_r='0'>
</instruction>
<instruction opcode='inst_2' latency='1' delta_r='2'>
<write regtype='GR' delta_w='1'/>
<write regtype='GR' delta_w='0'/>
</instruction>
<semantic type='UAL'/>
</arch>
For now, only version 1 is supported, so the version attribute must be set to 1. For each register type, the type name and the number of registers must be provided. The type name has to be unique. Optionally, register names can be specified. For each instruction, a unique opcode must be provided, as well as the latency and read delay of the instruction. If the instruction writes into one or several registers, the register types and write delay must be provided for each one. The semantic can also be specified (once in the file). If the type attribute is 'UAL', then UAL semantic is assumed. Else, or if the semantic is not specified, NUAL is assumed. The current DDG release contains examples of XML files describing some architectures.
DDG objects can be constructed via C++ methods or imported/exported via XML files. Here is a simple example of such a file :
<ddg version='1'>
<operation type='inst_1' id='1'/>
<operation type='inst_2' id='2'/>
<operation type='nop' id='3' description='Idle'/>
<edge src='1' dst='2' latency='3' distance='0' typedep='flowdep_reg' regtype='GR'/>
<edge src='1' dst='3' latency='0' distance='1' typedep='serial'/>
<edge src='2' dst='3' latency='0' distance='1' typedep='antidep_reg' regtype='BR'/>
</ddg>
Again, version must be set to 1, since only version 1 is supported at the moment. For each operation, a unique integer id must be supplied. The instruction type must match the architecture definition. An optional description can also be provided. For each edge, the id of the source and of the destination operations must be provided, as well as the latency and the distance. The type of the dependence can be any of : flowdep_reg serial antidep_reg outputdep_reg inputdep_reg flowdep_mem antidep_mem outputdep_mem; inputdep_mem spilldep_mem other_mem killerdep reusedep For some of those dependencies, the register type has to be specified. Again, it must match one of the types defined in the architecture file.
The current DDG release contains examples of XML files describing DDG provided from STmicroelectronics.
The gl format is our own simple design for DDGs. This format is used in the case of simple ISA, with a unique register type. For ISA with multiple register types, the enhanced XML format should be used (see Our enhanced XML format for Importing and Exporting Data Dependence Graphs). Other DDGs format are also supported (see LEDA format, gml format). Let assume the name "loop" for the DDG, Hence the file loop.gl contains nodes, edges with their latencies and distances (in terms of number of iterations) in the form
DDG.GRAPH 1.1
#number_of_nodes
opcode_id_node_1
opcode_id_node_2
... .
opcode_id_node_n
#number_of_edges
source_id dest_id latency distance edge_type
... .
Note that the latency of an edge may be distinct from the latency of the source instruction. edge_type is a string describing the type of the data dependence ("flowdep_reg", "serial", "antidep_reg", "outputdep_reg", etc.). "" do not belong to the strings.
The default opcodes are as follows. 0 add_op 1 sub_op 2 not_op 3 mul_op 4 div_op 5 ld_op 6 st_op 7 copy_op 8 nop_op Be aware that the current gl format does not contain any information on the ISA (except the opcode ids of the nodes). To completely define an ISA for a DDG, see User Defined Instruction Set Architectures (XML format).
Each node in a DDG is uniquely defined by an integer identifier (id). For instance, the id of each node can be simply equal to the position of the node in the loop.gl file (starting from 1). Suppose that the edge section of the loop.gl file contains a line describing an edge in the form : This means that the edge is from node number 2 to node number 4 with a latency equal to 1 and a distance equal to zero. Do not confuse between the node id and its opcode.
Here is a simple list of algorithms implemented inside LEDA that we can use inside our DDG library
- Basic Graph Algorithms
- Sorts algorithms (DFS, BFS, etc.)
- Computing strongly connected components, connected components, bipartite components, transitive closure, transitive reduction
- Shortest Path Algorithms
- Maximum Flow
- Min Cost Flow Algorithms
- Minimum Cut
- Maximum Cardinality Matchings in Bipartite Graphs
- Bipartite Weighted Matchings and Assignments
- Maximum Cardinality Matchings in General Graphs
- General Weighted Matchings
- Stable Matching
- Minimum Spanning Trees
- Euler Tours
- Algorithms for Planar Graphs
- Graph Drawing Algorithms
- Graph Morphism Algorithms
- Graph Morphism Algorithm Functionality
Our DDG library inherits from all the LEDA powerfull algorithmic implementations, such as high level graph operators, access and update DDG structures and attributes, etc. It also includes some specific algorithms used in optimising compilation.
- DAG algorithms
- Shortest and longest paths between any pair of nodes
- Dilworth decomposition, maximal antichain, minimal chain decomposition.
- Register Saturation of a DAG (maximal register pressure independently of instruction schedules). The current implementation considers multiple register types.
- Import and export methods from/to XML formats, see Some File Formats Defined and Used by the DDG Library
- Exporting to graph visualization tool (xvcg).
- Loop algorithms
- Critical cycle : the critical cycle in a loop DDG is defined as the one producting the maximal ratio
. It is an old problem in graph theory studied initially by Dantzing in 1966. Such circuit is assumed in general to have
and
. However, our DDG library adds the special case of null circuits, i.e.,, those defined with
and
. - DDG unrolling (loop unrolling) with recurrences
- DDG merging (loop merging)
- Loop retiming/shifting
- Import and export methods from/to XML formats, see Some File Formats Defined and Used by the DDG Library
- Exporting to graph visualization tool (xvcg).
The DDG library implements enhanced export methods to xvcg format that allow helpful and interactive DDG visualization. The xvcg tool is a free software that can be downloaded from http://rw4.cs.uni-sb.de/users/sander/html/gsvcg1.html The DDG library has export methods that produce vcg graphs with the following visual properties.
- The nodes writing into registers (values) are in green, while the other nodes are in grey. See screenshot 1 .
- The edges are valued by couples representing the edge latency and the iteration distance. That is, each edge
is valued by the couple
. - There are ma,y types of dependence edges : flow edges throught registers, flow edges throught memory, serial edges, antidependance edges, outputdep edges, etc. See screenshot 2 .
- The user may visualize the nodes attributes by clicking on Node Information :
- ISA opcodes : see screenshot 3 and screenshot 4 .
- different latencies : hardware operation latency, reading and writing delays from/into registers, see screenshot 5 and screenshot 6
- The distinct register types produced by the instruction.
- textual attributes attached to ndes (codes, etc.)
- The user may visualize andd/or hide any class of these edges using the xvcg tool (click on Expose/Hide Edges) : see screenshot 7.
Some other screenshots can be found here .
The source files of our DDG library (with the included reference manual) can be downloaded from http://www-sop.inria.fr/members/Sid.Touati/sw/DDG/DDG-2.0.tgz Our codes are under LGPL licence. If you are from academia, you can get a free academic licence for LEDA. Otherwise, you should buy a licence to be able to use DDG. In this Release you find:
- INSTALL instructions
- Full user guide documentation
- DDG and ISA examples (in XML format)
- C++ Code Examples
- All C++ sources (except LEDA, that should be instaleld separately by the user)
Some binaries under linux/x86 have been built using g++ 4.2. Our binaries do not require LEDA, but they require other dynamic open source libraries such as libstd++ and libxerces-c. Here is the list of binaries we distribute as examples:
you can execute these programs on the DDG and ISA samples we provide inside the DDG release.
See installation instructions in the source file distribution. The current version of DDG has been implemented under linux/x86 with g++ version 4.2.2 and LEDA version 5.
The user should include "DDG.h" in his C++ program. This header file defines the API of the DDG library and includes some LEDA header files. Thus, the user should set the LEDAROOT
environment variable to the path of LEDA installation directory and use the include directory -I$LEDAROOT/incl
in the command line argument of g++
.
Furthermore, we highly recommend to use the -fno-inline
option of your C++ compiler.
After successfull compilation (installation) of the DDG library, a file library called libDDG.a
is created. This library has to be linked with the user applications using the DDG features. Furthermore, the LEDA library should be present (make sure that the LEDAROOT
environment variable is correctly set), especially libG.a
and libL.a
. The user should then link its application with all these libraries by using the following link option for g++
line command libDDG.a
-L$LEDAROOT
-lG
-lL
. Note that the order of these linker options is important. Starting from LEDA version 6.0, the linking options -lG -lL become obsolete, they have been replaced by -lleda.
There are two main DDG C++ classes, DAG and LOOP, corresponding to the DAG and loop models defined in DAG Model and Loop Model respectively. So, each DDG is an object of these two C++ classes. Data dependence graphs can be filled either by reading a DDG XML file (see Some File Formats Defined and Used by the DDG Library), or buy using C++ methods (new_node
, new_edges
, etc.). If the user wishes, he can implement his own import method from other graph formats. Here is a simple hello world program to show that it is very simple to handle data dependence graphs using our library. Detailed information could be found in the definitions of DAG and LOOP classes, and in the other code examples.
#include "DDG.h"
#include <iostream>
#include <cstdlib>
using namespace std;
using namespace DDG;
int main(int argc, char *argv[])
{
int i;
int c;
ARCHITECTURE isa_arch;
char *ddg_filename=NULL, *arch_filename=NULL;
DDG::LOOP loop;
LEDA::node n,u;
LEDA::edge e;
while ((c = getopt (argc, argv, "i:a:")) != -1){
switch(c){
case 'a': arch_filename=optarg;
break;
case 'i': ddg_filename =optarg;
break;
case '?':
cout<<"usage:"<< argv[0] <<" -i ddg_filename.xml [-a isa_desc.xml] "<<endl;
return EXIT_FAILURE;
}
}
if (arch_filename!=NULL) {
isa_arch.read_from_xml(arch_filename);
if(isa_arch.check()){
loop=LOOP(isa_example);
} else return EXIT_FAILURE;
}
if (ddg_filename != NULL){
i=loop.read_from_xml(ddg_filename);
if(i==-1) return EXIT_FAILURE;
}
else{
cout<<"usage:"<< argv[0] <<" -i ddg_filename.xml [-a isa_desc.xml] "<<endl;
return EXIT_FAILURE;
}
cout<< "DDG example : " <<ddg_filename<<endl;
cout<<"Critical Cycle of "<< ddg_filename<<" = "<<loop.CRITICAL_CYCLE(el)<<endl;
cout<<"Edges belonging the the critical cycle"<<endl;
forall(e, el){
ie=loop[e];
cout<< loop.source_id(e)<< " -> "<< loop.target_id(e);
cout<<" : "<< ie<<endl;
}
return EXIT_SUCCESS;
}