The design of the MAVROS compiler

Christian Huitema
Date: June 7, 1990
MAVROS is a tool that generates encoding and decoding routines for ASN-1 defined syntaxes. This document lists the various components of the software, and gives information on the algorithms which have been implemented within these components. It contains information on the compiler itself, on the library routines, on the test procedures and on the porting procedures.

The components of MAVROS

The MAVROS compiler is composed of three components:

The compiler itself, which comprises:
- A main module, responsible for reading the program arguments and the execution flags,
- A module responsible for decoding the input, which interfaces with yacc and lex,
- A module containing the pretty printing routines,
- A module containing ancillary routines,
- One module for each type of generation, i.e. ASN1 basic encoding rules, textual input or output, light weight syntaxes and copying routines,
- A basic module, concentrating all the coding functions which are common to several types of generation.
The ASN-1 run time library,
The TEST library.

The porting procedures are documented in a separate section.

This document corresponds to:

Release 9.4 of the MAVROS library,
Release 8.3 of the ASN1 library,
Release 1.9 of the TESTASN1 library.

Decoding the MAVROS input

The MAVROS input is parsed by using lex and yacc generated programs, which call routines of the module mavrospro.c in order to build a representation in memory of the input. The input is then checked for consistency, and can be displayed by a pretty printer. It will be used by the code generation routines, which are made easier to write by the availability of a number of ancillary routines.

Parsing the input

The lex syntax is defined in the file mavros.lex, and the yacc grammar in the file mavros.gr. The routines of mavros.gr call some routines in mavrospro.c in order to build in memory a tree representing the input.

Structure of the memory tree

Each node of the mavros input is represented in memory by a structure of type comp, defined in mavrospro.h:

struct	   comp{
	  char	    * c_id;	   /* The text identifier */
	  int	    c_mode,	   /* The tag MODE */
		    c_number,	   /* The tag NUMBER */
		    c_implicit,	   /* IMPLICIT or not */
		    c_type,	   /* The TYPE code */
		    c_u_tag,	   /* The UNIVERSAL tag	number,	if implicit.. */
		    c_optional,	   /* OPTIONALITY */
		    c_first_son,   /* the first	child */
		    c_brother,	   /* Next brother or "-1" */
		    c_father,	   /* father */
		    c_line_num,	   /* Line number of the declaration */
		    c_call_mode;   /* call mode, for arguments */
	  char	    * c_name,	   /* Name of the type */
		    * c_index;	   /* Name of index or variable	*/
};

This information is for the most part straightforwardly derived from the ASN.1 declarations, like:

triangle [0] EXPLICIT INTEGER OPTIONAL

They are completed by chaining parameters for the organisation of the tree, and by the line number of the declaration in the input file, which is used for editing error messages. The type of the variable is represented by a type code, whose possible values are defined in mavrospro.h, and is completed by the name of the type. For example, a reference to an imported ASN.1 type will be represented by:

imported_variable ImportedType (some parameters)

		 c_type:    C_EXTERN
		 c_name:    "ImportedType"

The useful types of the Universal Class are redefinitions of the OCTET STRING or CHARS types; their tag number in the universal class is stored in the structure, e.g.:

string[0] NumericString (l, v)

would be represented by:

		c_type:	    T_OCTETS
		c_u_tag:    18
		c_name:	    "NumericString"

The children of a given type are held in the same structure. They comprise:

Initialisations, finalisations, memory allocation and default rules,
Calling parameters in the case of basic or referenced components,
Contained entries in the case of structured components (SET, CHOICE, SEQUENCE).

At the upper level of the tree, one finds the declarations of types. The children of these declarations include a list of calling parameters, followed by the declaration of the type itself.
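The chaining fields can be illustrated by a minimal sketch. It assumes that the nodes are kept in an array, that the int chaining fields are indexes into that array, and that -1 also marks the absence of a child (the source only states it for c_brother). The structure comp_link and the routine count_children are hypothetical reductions, not part of mavrospro.h.

```c
/* Hypothetical reduction of struct comp: only the chaining fields are kept. */
struct comp_link {
    const char *c_id;   /* the text identifier */
    int c_first_son;    /* index of the first child, or -1 (assumed) */
    int c_brother;      /* index of the next brother, or -1 */
    int c_father;       /* index of the father, or -1 (assumed) */
};

/* Count the direct children of node i by following the brother chain. */
int count_children(const struct comp_link *tree, int i)
{
    int n = 0;
    for (int s = tree[i].c_first_son; s != -1; s = tree[s].c_brother)
        n++;
    return n;
}
```

The same loop shape (first son, then brothers until -1) is what any tree exploration over such a structure would use.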

Checking the consistency

After building up the tree in memory, one will perform a set of verifications, by calling the procedure verify_tree in the file mavrospro.c.

First, we walk down each type, in order to make sure that all components are correctly encoded.

Then, for each type, we compute a call mode: it can be either a SET or a SEQUENCE generation, or a normal structured type, or a simple type. Simple types will be computed by means of macros. A type is simple if it has only one component, if that component is NOT a simple component defined later on, and if it does not include allocations.

If the type includes some allocation, the decoding procedure will be generated, but the coding procedures will be done by macros.

Pretty printing routines

An option of MAVROS enables the printing of the tree, in a pretty format that can be used as input. The printing routines are concentrated in the file mavrosprt.c.

Principles of pretty printing

The pretty printing is handled by the routine tree_print. It generates an output in the file package.pretty, where package is the package name.

The printing will include a declaration section, with the list of imported and exported procedures, the declaration of the defined procedures, and a list of all defined types.

Printing of type definitions

The printing of each type is handled by the routine type_print, which will first print the type name and its calling arguments, and will then print the definition itself by using the routine printcomp.

Printing of a component

The routine printcomp takes two arguments: i is the index of an element in the memory tree, and j is the depth of that element. The depth indicator is used in order to properly indent the result.

The main part of the routine is a switch based on the component type. The type of the component will be printed first, followed by some type specific indications, e.g. for CHOICE or SET OF; if the component is structured, its members will be printed by calling printcomp recursively.
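The recursion on the index i and the depth j can be sketched as follows. This is only an illustration under assumptions: the node layout struct pnode, the two-space indentation and the routine name print_sketch are hypothetical, and the output goes to a buffer rather than to the file package.pretty.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node: a printable text plus the chaining indexes. */
struct pnode {
    const char *text;
    int first_son;   /* index of the first child, or -1 */
    int brother;     /* index of the next brother, or -1 */
};

/* Append node i, indented according to depth j, then its members. */
void print_sketch(const struct pnode *tree, int i, int j,
                  char *out, size_t cap)
{
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%*s%s\n", 2 * j, "", tree[i].text);
    /* Structured component: print each member one level deeper. */
    for (int s = tree[i].first_son; s != -1; s = tree[s].brother)
        print_sketch(tree, s, j + 1, out, cap);
}
```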

Ancillary routines

A number of generic routines, commonly used for generating various types of encodings, are grouped in the module mavrosbase.c. They enable:

Tree exploration
Line numbering
The coding of Arguments to procedures
Generic declarations of routines
Coding the calls to the basic types
Generating the call to externally defined types
Coding of the ANY DEFINED BY procedures
Generation of the Make file

Tree exploration

Before encoding a routine, one must:

list all the indexes of the embedded sequences of or set of components,
compute the maximum depth of the tree, in order to correctly dimension the size of the component stack,
compute the maximum number of embedded set components, in order to dimension the size of the set component masks.

This is performed by a routine called get_depth.
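As an illustration of the second step only, the maximum depth can be obtained by a simple recursion over the chaining indexes. The sketch below assumes an array representation with -1 as the end marker; struct dnode and max_depth are hypothetical names, and the real get_depth also collects the indexes and the set component count.

```c
/* Hypothetical node: chaining indexes only. */
struct dnode {
    int first_son;   /* index of the first child, or -1 */
    int brother;     /* index of the next brother, or -1 */
};

/* Depth of the subtree rooted at i; a leaf has depth 1. */
int max_depth(const struct dnode *tree, int i)
{
    int deepest = 0;
    for (int s = tree[i].first_son; s != -1; s = tree[s].brother) {
        int d = max_depth(tree, s);
        if (d > deepest)
            deepest = d;
    }
    return deepest + 1;
}
```

The value returned for the root of a type could then be used to dimension the component stack.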

Line numbering

We produce in the generated source a line compatible with the output of the C preprocessor. The result is that error messages generated at routine compile time will be relative to the mavros input.

In debug mode, one will often want to debug the generated code instead. This is achieved by setting the debug level to a value larger than 5.

This is performed by a routine called numberline.
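A minimal sketch of such a routine is given below. It assumes the standard #line directive form and writes into a buffer; the name emit_line, its arguments and the file name used in the example are hypothetical, and only the threshold of 5 comes from the text above.

```c
#include <stdio.h>

/* Write a #line directive pointing at the MAVROS input, unless the debug
   level is larger than 5, in which case nothing is written so that error
   messages refer to the generated code. Returns the number of characters. */
int emit_line(char *out, size_t cap, int debug_level,
              int line, const char *file)
{
    if (debug_level > 5) {
        if (cap > 0)
            out[0] = '\0';
        return 0;
    }
    return snprintf(out, cap, "#line %d \"%s\"\n", line, file);
}
```

For example, a call with line 42 and a file named input.mvr would produce the text #line 42 "input.mvr" followed by a newline.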

Arguments to procedures

The routine firstarg gives the index of the first son of type argument of a given component.

The routine argnumber yields the number of arguments for a given component.

The routine arglst produces a comma separated list of arguments; the routine arglst_cpy is used for the copy arguments.

The routine argdcl produces a declaration of the arguments; the routine argdcl_cpy is used for the copy arguments.

The routine argprint prints out an argument to a procedure. The m flag indicates whether the argument should be passed by value (m=0) or by address (m=1). The correct printing requires a knowledge of the calling mode, indicated in the global variable current_call_mode, and of the current declarations, which are derived from the calling list of the current element, identified by the global variable current_list_head.

The routine cpy_argprint has a similar role for copy arguments.

Generic declarations of routines

The table exec_table holds a set of information which is used to parametrize the calls to the different types of generated routines. Each line has the following structure:

	    struct {
			  char	 * suffix;
			  char	 * value;
			  char	 * common_args;
			  int	 arg_mode;
			  int	 return_asn1;
			  char	 * arg_decl;
	    }exec_flag;

In this table:

suffix is the suffix identifying a type of coding procedure, for example _free for the memory deallocation routine,
value is the value which is returned by the procedure, for example asn1z= for the coding routine,
common_args is the list of arguments that all routines of that type must have, e.g. asn1z,asn1zm for the decoding routines,
arg_mode indicates whether the arguments shall be passed by address (1) or by value (0), when the choice is left in the procedure declaration.
return_asn1 indicates the type of value returned by the procedure: int(0), asn1(1) or char * (2).
arg_decl contains the declaration of the common arguments, e.g. register asn1 asn1z, asn1zm; for the _dec procedures.

The names of the indexes of the table are defined in mavrospro.h. The table is used by a number of routines, in order to code the call to extern or basic procedures (cd_extern_or_basic_call), and in order to build declarations (decl_proc_head).

Coding the calls to the basic types

The call to each type is very much in line with that of the external codings, e.g.:

value = routine_name (coding arguments,parameters);

The coding routines for ASN-1 include a flag. By convention, this flag is encoded first, by a single instruction, or is similarly decoded -- if it is known, and if the particular type allows for that treatment. The termination for these specific routines (or macros) will be "_vcod" or "_vdec", as opposed to the straight coding and decoding. Coding macros will then take one coding argument (z), and decoding macros two (z, zmax).

The scalar decoding routines are treated by two steps:

value = decoding(area, &nextarea); area = nextarea;

This is embedded in a macro which assumes the existence of a variable for "nextarea".
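The two-step pattern can be sketched as a macro of the following shape. The macro name SCALAR_DEC and the one-byte decoder decode_byte are hypothetical; the only element taken from the text is the reliance on a variable named nextarea being in scope.

```c
/* Hypothetical scalar decoder: reads one byte and reports where it stopped. */
static int decode_byte(const unsigned char *area, const unsigned char **next)
{
    *next = area + 1;
    return area[0];
}

/* The two steps folded into one macro. The caller must have declared a
   variable named nextarea of the same pointer type as area. */
#define SCALAR_DEC(value, decoding, area)            \
    do {                                             \
        (value) = decoding((area), &nextarea);       \
        (area) = nextarea;                           \
    } while (0)
```

A caller would declare nextarea once at the top of the generated routine, and then decode successive scalars with SCALAR_DEC(v, decode_byte, area).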

The number of parameters is a parameter of the routine type. In principle, all parameters take the mode defined by the calling type, with the exception of the FLAGS where the second argument is a constant.

The calls are generated by the routine cdbasic, which uses the parameters stored in the table basic_table:

      struct   basic_table_entry {
	       char *		     bt_name;
	       int		     bt_vcod;
	       int		     bt_vdec;
	       int		     bt_alloc;
	       int		     bt_nb_parms;
	       int		     * bt_parm_mode;};

These parameters have the following meaning:

bt_name: the name of the routines associated to the type, e.g. asn1_boolean_cod is obtained by combining bt_name and the suffix associated to the coding mode in the exec_table.
bt_vcod, bt_vdec indicates whether the coding and the decoding should be performed as for scalar elements,
bt_alloc indicates whether memory allocation may have to be performed when decoding or copying that type,
bt_nb_parms is the number of arguments associated to the type,
bt_parm_mode is an array of dimension bt_nb_parms holding the calling mode of each argument: either undefined(0), by address(1) or by value(2).

The index of the line associated to a basic type is derived from the type code by the routine index_basic_op.

Generating the call to externally defined types

A single procedure, cdextern, with a callmode flag, generates the call to a subroutine that will perform the required function, e.g.:

        DECODE: decode the element,
        ENCODE: code the element,
        FREE:   free the allocated areas,
        COMPCOD: code the components,

etc. (the flags are defined in "mavrospro.h", as external routine generation flags).

The procedure will generate a call to a routine like: asn1z = element_dec(asn1z,asn1zm,element) composed of the following segments:

 (value)(name)(suffix)((common_args),(argument_list))

Here (name) is the name of the element to be encoded or decoded, and (value), (suffix) and (common_args) depend on the "callmode".

In order to generate properly the (argument_list), one will search the declaration of the element in the "component list", and one will determine for each argument its mode: by address or by value -- the default mode depending on the "callmode".

Note that when generating the argument list, we have to take care of those modes which don't require any "common args", so as not to generate any leading comma.

The routines can also be called in an "include" mode, in order to generate the call list of "include" declarations. In that case, neither the (name) and (suffix) components nor the parentheses are printed; also, one should not generate any comments, even in debug mode.

The arguments corresponding to the various "callmode" are collected in the "exec_table".
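The assembly of the call from its segments, including the care taken with the leading comma, can be sketched as follows. The routine format_call is a hypothetical illustration which writes into a buffer; it is not the code of cdextern.

```c
#include <stdio.h>

/* Build "(value)(name)(suffix)((common_args),(argument_list))".
   The separating comma is only emitted when both lists are non empty. */
int format_call(char *out, size_t cap, const char *value, const char *name,
                const char *suffix, const char *common_args,
                const char *arg_list)
{
    const char *sep =
        (common_args[0] != '\0' && arg_list[0] != '\0') ? "," : "";
    return snprintf(out, cap, "%s%s%s(%s%s%s)", value, name, suffix,
                    common_args, sep, arg_list);
}
```

With the segments "asn1z = ", "element", "_dec", "asn1z,asn1zm" and "element", this yields the call quoted above; with an empty common_args, no leading comma appears.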

Coding of the ANY DEFINED BY procedures

These are defined by a declaration of the form:

type ::= ANY variable DEFINED BY procedure(arguments)

The routine cd_anydef will generate, for each type of procedure, a call of the form:

asn1z = (*procedure_dec(arguments))(asn1z,asn1zm,variable)

where the said variable is always passed by address.
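The shape of the generated call, i.e. a call through the pointer returned by the user procedure, can be sketched as follows. Everything here is hypothetical except the calling shape: the type asn1 is assumed to be a byte pointer, asn1zm is taken as one past the last readable byte, and the two toy decoders stand for whatever the DEFINED BY procedure would select.

```c
typedef const unsigned char *asn1;              /* assumed pointer type */
typedef asn1 (*any_dec_fn)(asn1 z, asn1 zm, void *variable);

/* Two toy decoders; both return the new position, or 0 on error. */
static asn1 dec_one_byte(asn1 z, asn1 zm, void *variable)
{
    if (z == 0 || z >= zm)
        return 0;
    *(int *)variable = z[0];
    return z + 1;
}

static asn1 dec_two_bytes(asn1 z, asn1 zm, void *variable)
{
    if (z == 0 || zm - z < 2)
        return 0;
    *(int *)variable = (z[0] << 8) | z[1];
    return z + 2;
}

/* Hypothetical DEFINED BY procedure: selects the decoder from its argument. */
any_dec_fn procedure_dec(int kind)
{
    return kind == 1 ? dec_one_byte : dec_two_bytes;
}
```

The generated line then reads asn1z = (*procedure_dec(kind))(asn1z, asn1zm, &variable), the variable being passed by address.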

Generation of the Make file

The Make file is generated by the routine genermake.

General routines

The general routines are not associated to a particular transfer syntax. They include:

Memory liberation routines,
Error handling routines,
Copying routines
Comparison routines
Hash codings

In the current version, the generation of the Memory liberation and Error handling routine is directly associated to that of the ASN.1 basic encoding and decoding routines. The corresponding code is included in the file mavrosgen.c. The copying routine generation code is present in the file mavroscpy.c, the comparison code in the file mavrosxxx.c. The hash coding routines are generated by a rather generic coding process, in mavrosxxx.c.

Memory liberation routines

The philosophy of the generation of the memory liberation routines will be further detailed in the following sections:

General structure
Handling of CHOICEs
Handling of SET and SEQUENCEs
Handling of SET OF, SEQUENCE OF
Handling of basic components
Handling of the ANY DEFINED BY construct

General structure

The decoding routines allocate memory, in order to copy into this memory the elements that they decode. The memory liberation routine will deallocate all the memory blocks that had previously been allocated.

The procedure frcomp will be called for all components. The test fr_something_to_free tells whether a given element may or may not include memory allocations. Note that for all dubious cases, i.e. elements which are imported and elements of type field, the test returns true.

Handling of CHOICEs

If at least one branch of the choice includes memory allocation, a switch statement will be generated. The branch of the choice will be followed.

Handling of SET and SEQUENCEs

For each component of the construct in which memory may be allocated, the routine frcomp will be called. The generated code will indeed include a test of presence of the optional components.

Handling of SET OF, SEQUENCE OF

If memory is allocated within the loop, a loop will be generated, that frees each member of the array. Then, if necessary, the memory allocated to hold the array will be freed.
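The kind of code that would be generated for such a construct can be sketched as follows. The structure name_list and the routine name_list_free are hypothetical; the sketch assumes that each member and the array itself were obtained from malloc.

```c
#include <stdlib.h>

/* Hypothetical decoded form of a SEQUENCE OF strings. */
struct name_list {
    int nb;         /* number of elements */
    char **names;   /* array of nb separately allocated strings */
};

void name_list_free(struct name_list *x)
{
    /* Memory is allocated within the loop: free each member first. */
    for (int i = 0; i < x->nb; i++)
        free(x->names[i]);
    /* Then free the memory allocated to hold the array. */
    free(x->names);
    x->names = NULL;
    x->nb = 0;
}
```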

Handling of basic components

The components of type field, i.e. octets and bit strings, object identifiers and elements of type any, can be either allocated or pointed in the decoding buffer. As the copying routines will always allocate memory, the memory deallocator is always called.

Handling of the ANY DEFINED BY construct

The ANY DEFINED BY construct poses an interesting freeing problem. In order to determine the structure of the variable component, hence the type of memory liberation that shall be applied, one must free these components before any attempt to free the other elements. It is thus necessary to develop a specific routine for all these variables, called free_anydef. This routine will only generate code for those branches of the component which do contain an element of type ANY DEFINED BY, as defined by the test procedure anydef_found.

Error handling routines

The philosophy of the error handling routine generation will be detailed in the following subsections:

General structure of the error handling
Error handling of CHOICEs
Error handling of SET and SEQUENCEs
Error handling of SET OF, SEQUENCE OF
Error handling of basic components

General structure

The error handling routines are complementary to the memory allocation routines. They are called by the decoding routines when an error is detected, in order to replace the missing data by some realistic value; they will in particular guarantee that memory liberation routines can execute properly.

The procedure errcomp will be called for each component. If needed, it will perform some memory allocation for each of these components. It will also perform the initialisation of variables requested by the INITIAL clause. Then, it will perform some component specific action, and end up by performing the instructions specified by the FINAL clause.

Error handling of CHOICEs

If a component is of type CHOICE, the error handling will consist of:

The initialisation of the choice index to the first branch of the CHOICE,
The execution of the error handling for the component specified in that first branch.

Error handling of SET and SEQUENCEs

The error handling for elements of type SET or SEQUENCE will consist of:

The error handling associated to all mandatory components,
The assignment of the default values for the defaulted components,
Nothing for the optional components.

Error handling of SET OF, SEQUENCE OF

The error handling for the recurring components will consist of the initialisation to zero of the variable holding the number of components.

Error handling of basic components

The error handling for basic components will be the calling of the error handling routine associated to the component.

Copying routines

The philosophy of the copying routine generation will be detailed in the following subsections:

General structure of the copying generation
Copying of CHOICEs
Copying of SET and SEQUENCEs
Copying of SET OF, SEQUENCE OF
Copying of basic components

General structure of the copying generation

For each component of an ASN.1 structure, the routine cpy_comp will be called. This routine will first generate the necessary memory allocations and initialisations of the target component, and will then switch to type specific actions, before generating the final initialisations of the target.

Copying of CHOICEs

A switch is generated, in order to use the copying instructions necessary for the selected branch. The choice index is copied.

Copying of SET and SEQUENCEs

The copying of sequences and sets consists of copying all the components which are actually present, plus the selection fields for the missing components.

There is no specific component handling routine.

Copying of SET OF, SEQUENCE OF

A loop is generated, in order to copy each element from the source to the target. The number of elements is also copied.

Copying of basic components

Basic elements copying is generated by calling the generic routine, cdbasic.

References to ASN.1 types defined in this package or imported are resolved by a call to the generic routine cdextern.

References to the ANY DEFINED BY construct are resolved by a call to the generic routine cd_anydef.

Generation of comparison routines

The generation of comparison routines follows a pattern very similar to that of copying routines, with the exception of memory allocation -- as there is no need to perform any decoding.

Principles of the comparison routines

The comparison routines have a structure of the form:

XXXXXX_cmp (A,B)

and return one of the following values:

0 if A is equal to B,
+1 if A is larger than B,
-1 if A is lower than B, or if A is different from B in the absence of an ordering relation.

Note that in order to test the relation ``A < B'', one should test for a value of ``+1'' of ``XXXXXX_cmp (B,A)'', as a value of ``-1'' of ``XXXXXX_cmp (A,B)'' would not be significant.
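The convention can be illustrated by the following sketch. The routines int_cmp and flag_cmp and the macro IS_LESS are hypothetical examples, one for a type with an ordering relation and one for a type without; they are not routines of the ASN.1 library.

```c
/* Ordered type: returns 0, +1 or -1 according to the ordering. */
int int_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/* Type without ordering relation: 0 if equal, -1 if different. */
int flag_cmp(unsigned a, unsigned b)
{
    return a == b ? 0 : -1;
}

/* "A < B" is tested as "B is larger than A", since only +1 is significant. */
#define IS_LESS(cmp, a, b) ((cmp)((b), (a)) == 1)
```

With this test, IS_LESS(flag_cmp, 1, 2) is false even though flag_cmp(1, 2) returns -1, which is the intended behaviour for a type that has no ordering.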

A set of routines in the ``ASN.1'' library provides for the comparisons between the basic data types.

Comparison of sequences

The comparison of sequences is equal to the comparison of all their components. The first components of a sequence are considered ``most significant'' for the ordering relation. When a sequence includes optional components, two values will be considered equal if the optional element is absent from both, or if it is present in both with equal values.

The comparison of sets is equal to that of sequences.
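A comparison routine for a sequence with one mandatory and one optional component could look like the sketch below. The type point, its presence flag and the helper cmp_int are hypothetical. The text does not say what is returned when the optional element is present on one side only; returning -1 (different, without ordering) is an assumption of this sketch.

```c
/* Hypothetical decoded form of SEQUENCE { x INTEGER, y INTEGER OPTIONAL }. */
struct point {
    int x;
    int has_y;   /* presence flag of the optional component */
    int y;
};

static int cmp_int(int a, int b)
{
    return (a > b) - (a < b);
}

int point_cmp(const struct point *a, const struct point *b)
{
    int asn1_cmp;

    /* The first component is the most significant. */
    if ((asn1_cmp = cmp_int(a->x, b->x)) != 0)
        return asn1_cmp;
    /* Present on one side only: reported as different (assumption). */
    if (a->has_y != b->has_y)
        return -1;
    /* Present on both sides: compare the values. */
    if (a->has_y && (asn1_cmp = cmp_int(a->y, b->y)) != 0)
        return asn1_cmp;
    return 0;
}
```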

Comparison of arrays

Two arrays are equal if they have the same number of components, and if all those components are equal. This applies to both the SET OF and SEQUENCE OF types.

This is in fact incorrect as far as the SET OF type is concerned, as two such arrays should be equal regardless of the order of their components. A more precise comparison procedure will be implemented in a further release.

Comparison of choices

Two values of a choice type are equal if the same alternative is selected, and if the alternative values are equal.

There is a little difficulty in testing the equality of the alternatives when the choice can be ``defaulted'', as the default branch is taken when the selector has any value except one of the defined branches...

Impact on the generic routines

The basic line for testing a component has the structure:

if ((asn1_cmp = XXXXXX_cmp(A,B))!=0) return(asn1_cmp);

This is different from most coding and decoding routines, which have a structure of the form:

 XXXXXX_xxx(, A);

The standard routines for the ``basic'', ``extern'' and ``any defined'' data types include a special test for the comparison routine, in order to generate correctly the return of the test value.

Generation of hash coding routines

The generation of hash coding routines is performed by calling a generic generator of coding routines with the identification of the hash coding type ``C_HASH''.

Generation of the ASN.1 coding and decoding routines

The routines for handling the ASN.1 basic encoding rules are generated by routines from the file mavrosgen.c.

For each type in the input file, the routine onetype will be called, in order to generate:

Coding routines
Length computation routines
Decoding routines

Each of these will be described in a separate section.

Coding routines

The philosophy of the coding will be further detailed in the following sections:

General structure
Handling of tags and length fields
Handling of CHOICEs
Handling of SET and SEQUENCEs
Handling of SET OF, SEQUENCE OF
Handling of basic components
Handling of the ANY DEFINED BY construct

General structure

The coding of a component is governed by the routine cdcomp in the file mavrosgen.c. This routine

Handling of tags and length fields

The sequences for coding tags are generated by the routine openasncod(m,n,s) which takes three parameters: the class of the tag, m, the number in the class, n, and the structuration indicator, s. In principle class, number and structure are read by MAVROS from the ASN.1 specification, and the routine will generate a single instruction, of the form:

*asn1z++ = tag_value;

Of course, if the number is 31 or larger, the tag will be coded over several bytes.

However, the outermost tag of a component can be redefined by an implicit clause. If this is the case, one must generate a test for coding either the default tag or the implicitly redefined value.
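The single byte and multiple byte forms of the tag can be sketched as follows, according to the basic encoding rules: numbers 0 to 30 fit in the first byte, larger numbers are coded in base 128 after an escape value. The routine put_tag is hypothetical, and it assumes that m is the class as a value 0 to 3 and s is 0 or 1; the representation actually used by openasncod may differ.

```c
/* Write the tag for class m (0..3), number n and structuration s (0 or 1)
   at z, and return the position following the tag. */
unsigned char *put_tag(unsigned char *z, unsigned m, unsigned n, unsigned s)
{
    unsigned char first = (unsigned char)((m << 6) | (s << 5));

    if (n < 31) {
        /* Single byte form: class, structuration and number together. */
        *z++ = (unsigned char)(first | n);
        return z;
    }
    /* Several bytes: escape value 31, then the number in 7 bit groups,
       most significant first, bit 8 set on all groups but the last. */
    *z++ = (unsigned char)(first | 0x1F);
    int groups = 1;
    for (unsigned v = n >> 7; v != 0; v >>= 7)
        groups++;
    while (groups-- > 0) {
        unsigned char b = (unsigned char)((n >> (7 * groups)) & 0x7F);
        *z++ = groups ? (unsigned char)(b | 0x80) : b;
    }
    return z;
}
```

When all three parameters are known at generation time, the whole computation collapses to the constant tag_value of the single instruction shown above.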

After coding the tag, one should code the length field. The length is not known at that stage, so we have two alternatives:

For structured components, one will reserve a single byte for the length field, and initialize the beginning of component stack,
For non structured components, one will not touch the length field: it will be filled by the component coding routine.

The beginning of component stack holds the address of the length byte. When the coding of the component is completed, one programs a call to the end of component coding routine, which takes two parameters: the location of the last byte, and the location of the length. This is generated by either closeasncod or closeasnset. Note that the routine generates a call through a pointer, asn1_end or asn1_end_set, so that the coding of the length field can be tailored to the needs of the application, e.g. allowing or not the undefined value for the length field.
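One possible end of component routine is sketched below, for the case where only definite lengths are wanted. It is an assumption, not the library code: the name close_component is hypothetical, and the choice of shifting the content when the length does not fit in the reserved byte is only one of the possible strategies.

```c
#include <string.h>

/* len_pos: location of the reserved length byte.
   end: location following the last coded byte of the component.
   Returns the new end of the coding. The buffer must have room for the
   extra length bytes. */
unsigned char *close_component(unsigned char *len_pos, unsigned char *end)
{
    size_t len = (size_t)(end - (len_pos + 1));

    if (len < 128) {
        /* Short form: the reserved byte is enough. */
        *len_pos = (unsigned char)len;
        return end;
    }
    /* Long form: count the length bytes, shift the content to make room. */
    size_t nb = 0;
    for (size_t v = len; v != 0; v >>= 8)
        nb++;
    memmove(len_pos + 1 + nb, len_pos + 1, len);
    *len_pos = (unsigned char)(0x80 | nb);
    for (size_t i = 0; i < nb; i++)
        len_pos[1 + i] = (unsigned char)(len >> (8 * (nb - 1 - i)));
    return end + nb;
}
```

Since the generated code calls through a pointer, a routine of this kind could be swapped for one that writes the undefined length and an end of content mark instead.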

Handling of CHOICEs

The coding of the CHOICE constructs consists in the generation of a switch, with one case for each possible selection within the CHOICE, in the routine choicecod.

Handling of SET and SEQUENCEs

The coding of the external tag is performed by the seqcod routine, which calls the seqbodycod routine for coding the elements of the set or sequence. For named elements, we will generate two procedures: XXX_cod for coding the element itself, and XXX_ccod for coding its components.

The coding instructions for the components are generated in sequence; if a default clause is present, a test is generated in order to only encode the non default values.

Handling of SET OF, SEQUENCE OF

The coding of these constructs is performed by the routine arraycod. It consists in the coding of the tag, followed by a for construct with the index and limits specified in the annotated ASN.1 file, inside which MAVROS generates the coding of the repeated element.

Handling of basic components

For each basic type, a coding routine generates either on-line instructions (booleancod, nullcod) or the call to the ad-hoc coding routine (realcod, externcod, integercod, anycod, copycod, bitcod, charscod, flagcod, octetscod).

References to ASN.1 types defined in this package or imported are resolved by a call to the generic routine cdextern.

Handling of the ANY DEFINED BY construct

References to this construct are resolved by a call to the generic routine cd_anydef.

Length computation routines

The philosophy of the length computation will be further detailed in the following sections:

General structure
Handling of tags and length fields
Handling of CHOICEs
Handling of SET and SEQUENCEs
Handling of SET OF, SEQUENCE OF
Handling of basic components
Handling of the ANY DEFINED BY construct

General structure

For each component, the routine lncomp is called, in order to generate the instruction that will compute the length. The length will be computed in the variable asn1l.

If the component is structured, the routine lncomp will generate a line to compute the length of the tag. Then, it will call the routine lncompbis, which is a switch based on the component type.

Handling of tags and length fields

The length of the tag field depends on the tag number, and is computed by the routine asn1codlen. The length of the length field is set by default to three bytes, i.e. a one byte undefined length field plus a two byte end of component mark.
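The arithmetic can be sketched as follows. The routines tag_length and reserved_length are hypothetical illustrations of the rule, using the tag sizes of the basic encoding rules; they are not the code of asn1codlen.

```c
/* Number of bytes needed to code a tag whose number is n:
   one byte up to 30, otherwise one escape byte plus 7 bits per byte. */
int tag_length(unsigned n)
{
    int len = 1;
    if (n >= 31)
        for (unsigned v = n; v != 0; v >>= 7)
            len++;
    return len;
}

/* Length reserved for a structured component: tag, three bytes for the
   length field and end of component mark, plus the content. */
int reserved_length(unsigned n, int content_len)
{
    return tag_length(n) + 3 + content_len;
}
```

For a component tagged [0] with 10 bytes of content, this reserves 1 + 3 + 10 = 14 bytes.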

Handling of CHOICEs

The routine choicelen generates a switch for all branches of the CHOICE statement, so that the length is computed as that of the selected alternative.

Handling of SET and SEQUENCEs

The routine seqlen generates instructions in order to compute the length of a set or sequence as the sum of the length of the external tag plus that of each component. If a component is optional, the same test is generated as in the coding routines.

Handling of SET OF, SEQUENCE OF

The routine arraylen generates a for construct, so that the length of the set of or sequence of component is computed as the sum of the length of the external tag plus the length of each occurrence of the component.

Handling of basic components

The length of the basic components is computed by ad-hoc routines: the length of strings varies with the actual length of the string, whilst the length reserved for scalar types is an upper bound of the coding length.

References to ASN.1 types defined in this package or imported are resolved by a call to the generic routine cdextern.

Handling of the ANY DEFINED BY construct

References to this construct are resolved by a call to the generic routine cd_anydef.

Decoding routines

The philosophy of the decoding will be further detailed in the following sections:

General structure
Handling of tags and length fields
Handling of CHOICEs
Handling of SETs
Handling of SEQUENCEs
Handling of SET OF, SEQUENCE OF
Handling of basic components
Handling of the ANY DEFINED BY construct

General structure

The generation of the decoding instructions for an ASN.1 component is performed by calling for each component the routine dccomp, which will handle the explicit tag if it exists, and which will then call the routine dccompbis. This routine will at first generate the initialisations and memory allocations necessary to the component, by calling the routines dcalloc and dcinit, then execute a switch, which will trigger a call to the generation of the proper decoding for the component type. Finally, the final initialisations, if required, will be generated by calling dcinit.

Handling of tags and length fields

The execution of the decoding uses two parameters, asn1z and asn1zm. Asn1z contains the current location, and asn1zm contains the address of the last byte of the decoding buffer. These variables are used through all the decoding programs, in order to detect decoding errors. If an error occurs, the variable asn1z will be set to a null value.

The decoding of a tag is generated by the routine asn1open(m,n,s). When the tag is known, this routine simply generates an increment of asn1z; if the tag is unknown, it generates a call to the library routine asn1_skiptype.

The decoding of the length field is only performed in the case of structured elements. This is done by generating a call to the routine asn1_length, which returns the new value of asn1z and which fills in its third parameter the new buffer limit, as derived from the length field.

The buffer limit is kept in a stack, asn1st, which is increased after each opening of a structured component. It can be null, if the length field was in the undefined format, in which case the buffer limit of the enclosing component will be used.

When a component is closed, the routine asn1_close has to be called in order to check the proper closure, e.g. that components of undefined length are terminated by an end of content mark, or that components of defined length have exactly the expected length. Then, the routine asn1_unstack shall be called in order to retrieve the previous value of asn1zm.

Handling of CHOICEs

The decoding of CHOICE components is performed by the routine choicedec.

In order to decode a CHOICE component, one generates an initial if and a succession of else if statements: for each alternative of the choice, one will generate a test comparing the current tag with that of the alternative. If the test is true, the decoding of the component will be executed, and the index of the alternative will be assigned to the choice variable. If none of the alternatives is selected, a decoding error is detected (ASN1_ERR_CHOICE).

The test of the tag is more difficult for embedded choice components, as the tag can take any of the alternative values: this is dealt with by the matchchoice routine. Similarly, the test of components defined elsewhere in the module, or imported, can be difficult if this component happens to be an untagged CHOICE; the test is replaced by a call to XXX_asn1, as generated by matchextern.
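The general shape of the generated if and else if chain can be sketched as follows, for a choice between two context tagged alternatives whose content is a single byte. The type shape, the routine shape_dec, the asn1 pointer type, the numeric value given to ASN1_ERR_CHOICE and the use of asn1zm as one past the last byte are all assumptions of this sketch.

```c
#define ASN1_ERR_CHOICE 1   /* the name is from the text, the value is assumed */

typedef const unsigned char *asn1;   /* assumed pointer type */

/* Hypothetical decoded form of CHOICE { side [0] ..., radius [1] ... }. */
struct shape {
    int choice;   /* index of the selected alternative */
    int side;
    int radius;
};

asn1 shape_dec(asn1 z, asn1 zm, struct shape *x, int *err)
{
    *err = 0;
    /* Toy format: tag byte, length byte equal to 1, one content byte. */
    if (z == 0 || zm - z < 3 || z[1] != 1)
        return 0;
    if (z[0] == 0x80) {            /* tag of the first alternative */
        x->side = z[2];
        x->choice = 1;
    } else if (z[0] == 0x81) {     /* tag of the second alternative */
        x->radius = z[2];
        x->choice = 2;
    } else {                       /* no alternative selected */
        *err = ASN1_ERR_CHOICE;
        return 0;
    }
    return z + 3;
}
```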

Handling of SETs

The decoding of SETs is performed by the routine setdec. As the components of the sets can be encoded in any order, one has to generate a loop, i.e. to repeat while the pointer asn1z is inside the set a sequence of if and else ifs very similar to the decoding of a choice component. In order to test that all mandatory elements of the set are present, one will first generate the initialisation of a table of booleans, asn1set, into which one location per component is reserved after the index asn1setc. After an element has been decoded, the corresponding location is set to 1, and the tag of that element is no longer looked for.

When the decoding of the set is completed, the table of booleans is examined, and the default action, if any, is taken for the missing optional elements. If some mandatory elements are missing, an error is detected.
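
As an illustration of the loop and of the table of booleans, here is a minimal sketch for a hypothetical set of two components. The type sketch_set, the tags and the routine name are assumptions, the integer is restricted to one octet, and only the short form of the length is handled; the code really generated by MAVROS will differ.

```c
#include <stddef.h>

/* Hypothetical SET { a [0] INTEGER, b [1] BOOLEAN DEFAULT FALSE }. */
typedef struct { long a; int b; } sketch_set;

/* Decode the content of the set, located between z and zm.
 * Returns the octet following the content, or NULL on error. */
static const unsigned char *sketch_set_dec(const unsigned char *z,
                                           const unsigned char *zm,
                                           sketch_set *s)
{
    unsigned char seen[2] = {0, 0};      /* one boolean per component */

    while (z < zm) {                     /* components come in any order */
        size_t len;
        if (zm - z < 2 || (z[1] & 0x80) || (size_t)(zm - z - 2) < z[1])
            return NULL;                 /* truncated, or long length form */
        len = z[1];
        if (!seen[0] && z[0] == 0x80 && len == 1) {
            s->a = (z[2] & 0x80) ? (long)z[2] - 256 : (long)z[2];
            seen[0] = 1;                 /* this tag is no longer looked for */
        } else if (!seen[1] && z[0] == 0x81 && len == 1) {
            s->b = z[2] != 0;
            seen[1] = 1;
        }
        /* any other element is unexpected and is ignored */
        z += 2 + len;
    }
    if (!seen[0])
        return NULL;                     /* mandatory component missing */
    if (!seen[1])
        s->b = 0;                        /* default action for the optional one */
    return z;
}
```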

The decoding routines for SETs are also used for components of sets. If a set refers to components of another set, the corresponding XXX_dec routines will be called, where asn1z points to the beginning of the set, and asn1zm to the corresponding buffer limit. This avoids duplication, but obliges us to ignore unexpected elements when decoding a set.

Handling of SEQUENCEs

The handling of sequences is performed by the routine seqdec, which will decode the sequence's tag and then call the routine seqbodydec for decoding the sequence components. After decoding the components, it will generate a loop for ignoring unexpected components at the end of the sequence.

The decoding of the components is done by generating for each of these components a test. If the limit of the buffer has not been reached, and if the tag matches, the decoding instruction will be executed; otherwise, the component will be considered as missing. If it was optional, default actions will be taken; if it was mandatory, a decoding error will be detected.

Handling of SET OF, SEQUENCE OF

The handling of sets of and sequences of is performed by the routine arraydec, which will generate the decoding of the tag, and then a loop, in order to decode and count the components. After the loop, the number of components will be assigned to the tag limit.

Handling of basic components

Special purpose routines handle the decoding of the basic components.

References to ASN.1 types defined in this package or imported are solved by a call to the generic routine cdextern.

Handling of the ANY DEFINED BY construct

References to this construct are solved by a call to the generic routine cd_anydef.

Generation of text conversion routines

It is very often desirable to provide a text lay out of ASN.1 components. In order to do so, MAVROS can be instructed to generate text conversion routines, i.e. :

A routine, XXXXXX_out, which operates on a C representation of the element, and encodes the data as a text string.
A routine, XXXXXX_olen, which can be used to compute the length of the text string that will be produced by MAVROS.
A routine, XXXXXX_in, which can parse a text string and produce a C representation of the element.

This document describes:

the purpose of text coding routines,
the text coding rules and their usage in the coding programs,
the generation of the parsing programs

The purpose of text coding routines

The main purpose of the text coding routines is to provide extended logging and testing facilities, by printing a formatted version of the output in a text file, or by loading text encoded messages in memory and sending them over the network. Indeed, a combination of these two facilities can be used interactively in a simple debugging interface.

Use for logging results

These routines can be used to log text results, as in:

/* Receive an ASN.1 encoded data */
l = receive_from_network(message);
/* Decode the ASN.1 element: a null result signals a decoding error */
if (XXXXXX_dec(message, message + l, &data_type) == 0){
   /* perform some error correction */
   ...
   XXXXXX_free(&data_type);
}else if (logging_is_required){
   /* Size the log buffer, then print the decoded value as text */
   log_buffer = realloc(log_buffer,
      (unsigned) XXXXXX_olen(&data_type));
   end_of_buffer = XXXXXX_out(log_buffer, &data_type);
   fprintf(log_file,"Received a message:\n%s\n",log_buffer);}

Use for test inputs

Conversely, they can be used to generate test messages, as in:

/* Read a text message */
current_line = asn1_load_text(
                   &input_buffer,
                   &input_limit,
                   input_file,
                   error_report_file,
                   current_line);
/* Parse the text message */
if (XXXXXX_in(input_buffer,input_limit,&data_type) == 0){
   /* Do some error handling action */
   ...
   XXXXXX_free(&data_type);
}else{
   /* Encode in ASN1 */
   line_buffer = (asn1) realloc((char *)line_buffer,
                      (unsigned) XXXXXX_len(0,&data_type));
   end_of_coding = XXXXXX_cod(line_buffer, -1, 0, &data_type);
   l = end_of_coding - line_buffer;
   /* Send to network */
   send_to_network(line_buffer,l);
   /* Free the data */
   XXXXXX_free(&data_type);
}

Indeed, the text decoding routines can take as input the exact value which was produced by the coding routines, which is defined by the text coding rules.

The text coding rules

The text coding rules are directly derived from the rules for external representation defined in the ASN.1 standard, with a few modifications. The definition of the text coding rules includes the definition of the syntax of structured elements, i.e. SEQUENCE, SET and CHOICEs, and the definition of the encoding of the basic elements, i.e. booleans, integers, reals and strings.

Representation of the scalar types

The representation of the scalar types basically conforms to the ASN.1 representation rules. When converting scalar types to text, MAVROS uses the following conventions:

A BOOLEAN value will be represented by the key words TRUE or FALSE.
An INTEGER value will be represented by a decimal notation of the INTEGER, e.g. 1789 or -1. If the value is one of the named values for the data type, the corresponding name will be used instead of the decimal value.
An ENUMERATED value will be represented by the key word associated to the value. If the value is outside the definition range, the decimal presentation will be used.
A REAL value will be represented by using either 12 significant digits in floating point format or an exponent format.

When converting back from text to local representation, MAVROS can accept any floating point or exponentiated representation of REAL numbers, and either a decimal or a named value for INTEGER and ENUMERATED numbers.

Representation of bit strings

Short bit strings are represented by binary strings. Long bit strings are represented by hexadecimal strings, completed by a binary string.

When a bit string is represented by the MAVROS extended type FLAG, and when its values are enumerated, then the representation is composed of the printing of the selected flags.

Representation of octet strings and character strings

As much as possible, a textual representation (quoted ASCII strings) is used. If the string contains non printable characters, a hexadecimal representation is used.
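
The choice between the two representations can be sketched as follows. The routine name is hypothetical, and the exact layout of the hexadecimal form is an assumption: the sketch borrows the 'xxxx'H notation of ASN.1 value syntax, which may not be what MAVROS prints.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Write a text form of the l octets found at v into out, which must hold
 * at least 2*l + 4 characters. Assumed layout: "text" or 'HEX'H. */
static void sketch_octets_out(char *out, const unsigned char *v, size_t l)
{
    size_t i;
    int printable = 1;

    for (i = 0; i < l; i++)
        if (!isprint(v[i]) || v[i] == '"')
            printable = 0;              /* fall back to hexadecimal */
    if (printable) {
        out[0] = '"';
        memcpy(out + 1, v, l);
        out[l + 1] = '"';
        out[l + 2] = '\0';
    } else {
        char *p = out;
        *p++ = '\'';
        for (i = 0; i < l; i++)
            p += sprintf(p, "%02X", v[i]);
        *p++ = '\'';
        *p++ = 'H';
        *p = '\0';
    }
}
```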

Representation of object identifiers

The oid library (see oid.man) is used for representing object identifiers.

Representation of SEQUENCE and SET

A sequence or a set is represented as the concatenation of its components, enclosed in brackets and separated by commas. Identified components are represented by the concatenation of the identifier, an optional equal sign, and the value of the component. Unidentified components are represented by their value.

Representation of SEQUENCE OF and SET OF

The sets and sequences of are represented by a concatenation of their elements values, separated by commas, and enclosed in brackets.

Representation of CHOICE

A component of type choice shall be represented as the chosen value. However, this poses a problem when the component is named, as it could lead to ugly formats like:

choice_identifier = member_identifier = some_value

We are thus obliged to introduce some variability in the lay out. The lay out of such a choice element would thus become:

choice_identifier = \

This makes the printing and parsing programs a little more complex.

Representation of ANY

Elements of type ANY are presented like octet strings, including the type and length fields.

Generation of the parsing programs

The ASN1 library

The ASN1 library contains the run time support for the encoding and decoding routines generated by MAVROS. These routines cover the following functions:

Support of the ASN-1 Basic encoding rules,
Support of alternative syntaxes, like text or light weight encodings,
Support of specific data types,
Portability support and in particular memory allocation functions,
Input output and debugging support.

These functions will be detailed in the following sections.

Support of the ASN-1 Basic encoding rules

In order to support the basic encoding rules, the ASN1 library contains:

General encoding routines,
Coding and decoding functions for the scalar types,
Coding and decoding functions for the string types,

These functions will be detailed in the following subsections.

General encoding routines

The general encoding routines deal with the coding of the type tags (type_cod.c), the coding of the length field (lencod.c), the copying of ASN.1 encoded components (copy.c) and the closing of structured components (end.c, end_d.c, end_set.c).

In most cases, the type tag of an element is specified in the ASN.1 module with the syntax of the element, and will be directly generated by MAVROS. In some cases, e.g. when a type is exported to a remote module and implicitly referenced, the tag must be compiled at run time. This is performed by the routine asn1_type_cod, which will take care of using exactly the number of octets required.

In some cases, the length of an element can be predicted prior to its encoding. The routine asn1_lencod will encode the length field, using exactly the number of octets required.
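
Using exactly the number of octets required means choosing between the short and the long forms of the basic encoding rules. The following sketch shows the computation; the name sketch_lencod and its parameter list are assumptions and do not reproduce the interface of asn1_lencod.

```c
#include <stddef.h>

/* Write the defined-format length l at p, on as few octets as possible.
 * Returns the address of the octet following the length field. */
static unsigned char *sketch_lencod(unsigned char *p, size_t l)
{
    if (l < 128) {
        *p++ = (unsigned char)l;             /* short form: one octet */
    } else {
        int n = 0, i;
        size_t t;
        for (t = l; t != 0; t >>= 8)
            n++;                             /* octets needed for the value */
        *p++ = (unsigned char)(0x80 | n);    /* long form: number of octets */
        for (i = n - 1; i >= 0; i--)
            *p++ = (unsigned char)(l >> (8 * i));
    }
    return p;
}
```

For example, a length of 300 is coded on the three octets 82 01 2C.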

The copying routine asn1_copy copies an ASN1 encoded area into the coding buffer, changing its type if necessary.

The coding of structured components by MAVROS is done in three phases. First, the type is coded, and a single octet is reserved for coding the length. Then the elements composing the content are encoded after this octet. Finally, the length itself is encoded by the closing routine, which takes two parameters: the address of the octet reserved for the length field, and the address of the first octet following the encoding of the components. If the length of the element is less than 128 bytes, the length is stored in the reserved octet; otherwise, an undefined length mark is placed in the reserved octet, and an end of content mark is encoded at the end of the content.
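
The three phases can be sketched as below. The routine names sketch_open and sketch_regular_end are hypothetical, a one octet tag is assumed, and the caller is supposed to have reserved enough room in the buffer for the end of content mark.

```c
#include <stddef.h>

/* Phase 1: code a one octet tag and reserve a single octet for the length.
 * Returns the address where the content will be coded (phase 2). */
static unsigned char *sketch_open(unsigned char *z, unsigned char tag,
                                  unsigned char **len_octet)
{
    *z++ = tag;
    *len_octet = z++;
    return z;
}

/* Phase 3: close the component. len_octet is the reserved octet, z the
 * first octet following the content. Returns the new end of the coding. */
static unsigned char *sketch_regular_end(unsigned char *len_octet,
                                         unsigned char *z)
{
    size_t l = (size_t)(z - (len_octet + 1));

    if (l < 128) {
        *len_octet = (unsigned char)l;   /* the length fits in the octet */
    } else {
        *len_octet = 0x80;               /* undefined length mark */
        *z++ = 0x00;                     /* end of content mark */
        *z++ = 0x00;
    }
    return z;
}
```

The content is never moved, which is what makes this strategy cheap.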

This strategy is not compatible with the requirements of unique encoding defined in the section 9 of the X.500 recommendation, which requires that the defined format of length is always used, and that the components of sets are sorted by increasing tag order. Fulfilling these requirements implies that one may have to perform useless copying, or even to execute sort programs: such a decision should only be taken at run time, when the unique encoding is required. MAVROS permits this by generating only indirect references to the closing routines:

extern asn1 (*asn1_end)();
extern asn1 (*asn1_end_set)();

These pointers are both initialized with the address of the routine asn1_regular_end, which performs the simple actions described above. When the unique encoding is required, the value of asn1_end should be set to asn1_defined_end, and the value of asn1_end_set should be set to asn1_unique_set. These two routines are compatible with asn1_regular_end. The routine asn1_defined_end will encode the length field on exactly the number of octets required, pushing the value of the content by as many bytes to the right as required. The routine asn1_unique_set will copy the components into an intermediate storage, sort them using a simple bubble sort algorithm, and then reencode the fully defined length and the ordered content.
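
The pushing of the content to the right can be sketched as follows. The routine sketch_defined_end, the type sketch_end_fn and the pointer sketch_end are hypothetical names; the sketch only assumes the two parameters described above, and that the buffer has room for the additional length octets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type shared by all the closing routines. */
typedef unsigned char *(*sketch_end_fn)(unsigned char *len_octet,
                                        unsigned char *z);

/* Close a component using the defined format of length only. */
static unsigned char *sketch_defined_end(unsigned char *len_octet,
                                         unsigned char *z)
{
    unsigned char *content = len_octet + 1;
    size_t l = (size_t)(z - content);
    size_t extra = 0, t, i;

    if (l < 128) {
        *len_octet = (unsigned char)l;       /* the reserved octet suffices */
        return z;
    }
    for (t = l; t != 0; t >>= 8)
        extra++;                             /* additional length octets */
    memmove(content + extra, content, l);    /* push the content right */
    *len_octet = (unsigned char)(0x80 | extra);
    for (i = 0; i < extra; i++)
        content[i] = (unsigned char)(l >> (8 * (extra - 1 - i)));
    return z + extra;
}

/* The generated code would only call through such a pointer, so that the
 * closing strategy can be selected at run time. */
static sketch_end_fn sketch_end = sketch_defined_end;
```

The cost of the memmove call is the "useless copying" which the regular closing routine avoids.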

General decoding routines

The general decoding routines include functions for manipulating the type tag (skiptype.c), for decoding the length field (lendec.c, length.c), for skipping an element (skip.c), for computing the number of elements in a structure (number.c), and for copying an element to a new position (move.c). They may detect errors, which can be explained (errmes.c).

In most cases, MAVROS can predict the length of an element tag from the ASN.1 syntax of this element, and can point directly to the next field. In some cases however, for example with exported elements which are implicitly tagged, one has to skip a variable length tag in order to point to the element length. This is performed by the asn1_skiptype routine, which will return an error (ASN1_ERR_TAG) if the tag exceeds the limit of the decoding buffer.

For historical reasons, the library contains two routines for decoding the length of an element, asn1_lendec and asn1_length. They differ by their parameter list, as one computes the length itself, whilst the other computes the address of the first byte following the content. Both will detect errors if the length is incorrectly encoded.

The routine asn1_close (length.c) tests that the component is correctly terminated. It will return an error if the value is smaller than expected, or if the end of component mark was not present within the limit of the buffer, or if the end of component mark was improperly encoded.

MAVROS uses a stack of pointers to master the decoding of embedded elements. The pointers designate the first octet after the end of the element, or are null if the length is undefined. At any moment, the decoding routines use a pointer to the current buffer limit, which is either the end of the component itself, or the end of the structure containing the element, or the end of the received message. This pointer has to be recomputed after closing a structured element: this is done by the asn1_unstack procedure.
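
A minimal sketch of such a stack is given below. The structure sketch_stack and the three routines are hypothetical and do not reproduce the layout of asn1st or the interface of asn1_unstack; they only illustrate how a null entry makes the decoder fall back on the limit of the enclosing component.

```c
#include <stddef.h>

#define SKETCH_DEPTH 16

typedef struct {
    const unsigned char *limit[SKETCH_DEPTH]; /* NULL if length undefined */
    int top;                                  /* number of open components */
    const unsigned char *message_end;         /* end of the received message */
} sketch_stack;

/* Current buffer limit: the innermost defined limit, or the message end. */
static const unsigned char *sketch_current_limit(const sketch_stack *st)
{
    int i;
    for (i = st->top - 1; i >= 0; i--)
        if (st->limit[i] != NULL)
            return st->limit[i];
    return st->message_end;
}

/* Called when a structured component is opened. Returns -1 if too deep. */
static int sketch_push(sketch_stack *st, const unsigned char *limit)
{
    if (st->top >= SKETCH_DEPTH)
        return -1;
    st->limit[st->top++] = limit;
    return 0;
}

/* Called after a structured component is closed: recompute the limit. */
static const unsigned char *sketch_unstack(sketch_stack *st)
{
    if (st->top > 0)
        st->top--;
    return sketch_current_limit(st);
}
```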

It may occur that the decoding routines generated by MAVROS detect an error, e.g. a missing or unexpected component. They will signal it by returning a null pointer instead of the pointer to the octet following the element, and they will place in the global variables asn1_wrongbyte and asn1_diagnostic the location and the code of the error. This can be done by calling the routine asn1_errdec (length.c).

There are many cases when the decoding routines have to skip an element and obtain a pointer to the following value. This is performed by the routine asn1_skip (skip.c); this routine will check the validity of the ASN.1 encoding of that element. The companion routine asn1_size will return the size of the element, obtained by skipping the element and subtracting the address of the current pointer from that of the next pointer. The routine asn1_number (number.c) will open a component and count the number of embedded elements by using the asn1_skip routine.
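
Skipping an element amounts to walking over its tag, its length and its content. The sketch below shows one way to do it; sketch_skip is a hypothetical name and does not claim to reproduce the parameter list or the error reporting of asn1_skip.

```c
#include <stddef.h>

/* Return the address of the octet following the element found at z, or
 * NULL if the encoding is invalid or crosses the buffer limit zm. */
static const unsigned char *sketch_skip(const unsigned char *z,
                                        const unsigned char *zm)
{
    size_t len = 0;
    int n;

    if (z >= zm)
        return NULL;
    if ((*z++ & 0x1F) == 0x1F) {             /* tag number on several octets */
        while (z < zm && (*z & 0x80))
            z++;
        if (z >= zm)
            return NULL;
        z++;                                 /* last octet of the tag */
    }
    if (z >= zm)
        return NULL;
    if (*z == 0x80) {                        /* undefined length */
        z++;
        for (;;) {                           /* skip embedded elements */
            if (zm - z < 2)
                return NULL;
            if (z[0] == 0 && z[1] == 0)
                return z + 2;                /* end of content mark */
            if ((z = sketch_skip(z, zm)) == NULL)
                return NULL;
        }
    }
    if (*z & 0x80) {                         /* long form of the length */
        n = *z++ & 0x7F;
        if (n > (int)sizeof(size_t) || zm - z < n)
            return NULL;
        while (n-- > 0)
            len = (len << 8) | *z++;
    } else {
        len = *z++;                          /* short form */
    }
    if ((size_t)(zm - z) < len)
        return NULL;
    return z + len;
}
```

The size of an element is then simply the difference between the returned pointer and z.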

The routine asn1_any_dec (skip.c) uses asn1_skip to produce a volatile decoding of a component of type ANY. The routine asn1_move (move.c) is used when a permanent copy is required.

Support of scalar types

The scalar types include the booleans, the integers, the enumerated integers and the reals. There is no specific routine for the coding and decoding of booleans: the necessary instructions are generated in line by MAVROS. The coding and decoding functions for the integers are held in the files intcod.c and intdec.c. The coding and decoding functions for the reals are held in the file real.c.

The coding and decoding functions for the integers are semi portable. The compilation flag ANYBOUNDARY is used to indicate whether the CPU requires that 32-bit integers begin on word boundaries, like a sparc, or whether it can load and store integers from any address, like a vax or a 68020. The compilation flag BIGENDIAN indicates whether the CPU writes its digits from left to right or from right to left: a 68000 is big-endian, a vax is little-endian. These parameters are set in the header file asn1.h. The coding and decoding routines use these options, if available, to speed up the coding and decoding of ASN.1 integers.
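
For comparison, here is a fully portable sketch of the coding of an integer content, which uses shifts only and thus depends neither on the byte order nor on alignment. This is presumably the kind of baseline that the two flags allow to speed up; the name sketch_intcod and its interface are assumptions.

```c
#include <stddef.h>

/* Code the content octets of an INTEGER at p: two's complement, most
 * significant octet first, on the smallest possible number of octets.
 * p must hold sizeof(long) octets. Returns the number of octets used. */
static size_t sketch_intcod(unsigned char *p, long v)
{
    size_t n = sizeof(long), i;
    unsigned long u = (unsigned long)v;

    /* drop a leading octet while it only repeats the sign of the next one */
    while (n > 1) {
        unsigned top  = (unsigned)(u >> (8 * (n - 1))) & 0xFF;
        unsigned next = (unsigned)(u >> (8 * (n - 2))) & 0xFF;
        if ((top == 0x00 && !(next & 0x80)) ||
            (top == 0xFF && (next & 0x80)))
            n--;
        else
            break;
    }
    for (i = 0; i < n; i++)
        p[i] = (unsigned char)(u >> (8 * (n - 1 - i)));
    return n;
}
```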

The coding and decoding functions for reals depend on the flag IEEE_REAL, which indicates whether the CPU follows the IEEE conventions for the representation of floating point numbers. The routines necessary for the support of real types are grouped in the file real.c.

Support of the String Types

Several routines have been written to support the coding and decoding of octet strings, bit strings and character strings.

The file binmov.c contains the routine asn1_binmov, which moves an ASN.1 encoded bit string into a character string.

The file bitcod.c contains the routine asn1_bitcod, for coding a bit string, and the routine asn1_flagcod, for coding a flag, i.e. a fixed size bit string encoded on a single word.

The file octmov.c contains the routine asn1_octmov, which moves an asn1 encoded octet string into a character string, the routine asn1_strmov which moves an asn1 encoded octet string into a null terminated character string, and the routine asn1_chars_dec for decoding an asn1 encoded character string.

In ASN-1, an octet string can be encoded following either the primitive format, i.e. as a string of bytes, or following the structured format, i.e. as a sequence of strings. The latter case is hard to use for C programs. Hence, the library contains procedures to pack all the substrings in a single one, without extra memory allocation. These routines can be called from asn1_octets_dec; they are stored in the file octpak.c.

Similar routines can be called for bit strings by the procedures asn1_bits_dec, in the file strbin.c.

Macros and subroutines

A large part of the decoding of simple types is generated in line by MAVROS, or is performed via simple macros. In some situations, one needs a reference to a subroutine, e.g. for passing the address of the procedure in a table. The file libasn1.c contains the otherwise missing procedures, for the decoding of these types.

Support of alternative syntaxes

MAVROS can generate, as well as ASN.1 coding and decoding routines, a number of other types of procedures: light weight or textual encodings, copying routines. The ASN1 library contains the support functions for these procedures.

Light weight encodings

The light weight encodings functions are concentrated in the liblw.c file. Most are extremely simple. A revision is underway.

Textual encodings

The textual encoding functions are concentrated in the input.c file. It contains routines for managing the indentation of the result, which use the global variables:

int asn1_output_indentation = 0;
int asn1_output_indent_len = 2;
char * asn1_output_indent_value = "\t";

Various routines are then used for the handling of separators, of brackets, for the decoding of bits and octets strings, of reals and of integers. The routines supporting the coding of the real types are grouped in a separate file, inreal.c.

Copying routines

The file copying.c contains the procedures used for copying the string types. The other copying procedures are in fact macros, defined in the header file asn1.h.

Support of specific data types

The ASN.1 library contains routines for supporting the object identifiers and universal time types, which have a special syntax in ASN.1.

Handling of object identifiers

The abstract syntax of an object identifier is a sequence of integers, each of which identifies a branch in the object identification tree, e.g.:

joint-iso-ccitt ds(5) attributeType(4)

is represented by the sequence of arcs 2-5-4. In fact, that syntax is only used when the element is built, in order to guarantee its uniqueness. ASN.1 specifies that it should be transmitted as an octet string, encoding the value of the successive integers in a compact form, e.g.:

<55 4>

The object identifiers are handled by MAVROS as short octet strings. An object oid of type asn1_oid will contain a pointer to an octet area, oid.v, and the length of that octet string, oid.l. The octet string contains in fact the compact form used by ASN.1, which is easy to use for comparisons and sort operations.
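
The compact form follows the basic encoding rules: the first two arcs are combined in a single value, 40 times the first plus the second, and each value is written in base 128, most significant group first, with the high bit set on every octet but the last. The sketch below applies this rule; the name sketch_int_oid and its parameters are hypothetical and are not the interface of the library routines.

```c
#include <stddef.h>

/* Code the n arcs of an object identifier in compact form at out.
 * Returns the number of octets written, or 0 if n is less than 2. */
static size_t sketch_int_oid(unsigned char *out,
                             const unsigned long *arc, size_t n)
{
    size_t i, l = 0;

    if (n < 2)
        return 0;
    for (i = 1; i < n; i++) {
        /* the first two arcs share one value */
        unsigned long v = (i == 1) ? arc[0] * 40 + arc[1] : arc[i];
        unsigned char tmp[(sizeof(unsigned long) * 8 + 6) / 7];
        size_t k = 0;

        do {                                 /* groups of 7 bits, low first */
            tmp[k++] = (unsigned char)(v & 0x7F);
            v >>= 7;
        } while (v != 0);
        while (k > 0) {                      /* written high group first */
            k--;
            out[l++] = (unsigned char)(tmp[k] | (k ? 0x80 : 0x00));
        }
    }
    return l;
}
```

Applied to the arcs 2, 5 and 4, it produces the two octets 55 and 04 of the example above.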

The file oid.c contains several routines to convert from the external format to the internal format (asn1_oid_get) and vice versa (asn1_oid_put). It also contains a procedure for declaring an object identifier value, asn1_oid_decl, which authorizes the conversion procedures to use its name as an abbreviation.

The file oidint.c contains routines for converting the internal representation from a compact octet string to a table of integers (asn1_oid_int), and vice versa (asn1_int_oid).

Handling of Universal Times

The Universal time format is defined as a character string, made of the year, month, day, hour, minute and optionally seconds. It can be completed by a time differential between the local time and the UTC time, or by the letter Z, to denote the absence of any differential. The generalized time format is an evolution of the UTCtime, completed by an arbitrarily precise decimal fraction of a second.

The routines in the file utctime.c perform the conversion between the UTC time format:

861021163112+0400
910605213044.922+0200

and the timeb structure defined by :

	  struct timeb
	  {
	       time_t   time;
	       unsigned short millitm;
	       short    timezone;
	       short    dstflag;
	  };

This structure is also used by the system call ftime, which is available under most versions of UNIX.

The routine asn1_gen_time will convert a UNIX representation into either a generalized time, if the millitm component is non null, or a UTC time otherwise. The routine asn1_utc_time will always convert to a UTC time, without fractions of seconds. The routine asn1_time will perform the reverse conversion.

As the UTCtime format stores years on two digits, these routines will give unpredictable results for dates after 31 December 1999. They are also inadequate for dates before 1 January 1902.
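
The parsing of the UTC time string can be sketched as follows. The sketch stops before the conversion to the timeb structure: it only splits the string into fields, stored in a hypothetical structure sketch_utc, it does not handle the decimal fraction of the generalized time, and it maps the two digits of the year to 19xx, with the limitation just mentioned.

```c
#include <ctype.h>

typedef struct {
    int year, month, day, hour, minute, second;
    int offset_minutes;          /* time differential, 0 for "Z" */
} sketch_utc;

/* Value of the two digits found at s, or -1 if they are not digits. */
static int sketch_two(const char *s)
{
    if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
        return -1;
    return (s[0] - '0') * 10 + (s[1] - '0');
}

/* Parse YYMMDDhhmm[ss] followed by Z, +hhmm, -hhmm or nothing.
 * Returns 0 if the string is well formed, -1 otherwise. */
static int sketch_utc_parse(const char *s, sketch_utc *t)
{
    int f[6] = {0, 0, 0, 0, 0, 0};
    int i, hh, mm;

    for (i = 0; i < 6 && sketch_two(s) >= 0; i++, s += 2)
        f[i] = sketch_two(s);
    if (i < 5)
        return -1;               /* only the seconds are optional */
    t->year = 1900 + f[0];
    t->month = f[1];
    t->day = f[2];
    t->hour = f[3];
    t->minute = f[4];
    t->second = f[5];
    t->offset_minutes = 0;
    if (s[0] == '\0' || (s[0] == 'Z' && s[1] == '\0'))
        return 0;
    if (s[0] != '+' && s[0] != '-')
        return -1;
    hh = sketch_two(s + 1);
    mm = (hh < 0) ? -1 : sketch_two(s + 3);
    if (mm < 0 || s[5] != '\0')
        return -1;
    t->offset_minutes = (s[0] == '-' ? -1 : 1) * (hh * 60 + mm);
    return 0;
}
```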

Comparison routines for strings

The library contains some comparison routines, for bit strings and octet strings, in the files bits_cmp.c and field_cmp.c.

Portability support

Memory allocation

The file allocm.c contains the memory allocation routines:

asn1_malloc will be called instead of malloc by the decoding routines,
asn1_realloc will be called instead of realloc,
asn1_free will be called instead of free.

These routines maintain a list of allocated pointers, which is accessed through a hashing table maintained by static procedures. Asn1_alloc_free will take no action if the pointer to be freed is not present in the hashed list, i.e. was not allocated by asn1_alloc_malloc or was already freed; it will call free and remove the pointer from the list otherwise.

The routine asn1_alloc_snap_shot can be used to get the number of blocks which are represented in the hashing table at a given time, and optionally to obtain the content of the hashed table. It can be used when debugging complex applications, e.g. to check that a server is not silently crunching all the available memory...
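
The principle of a hashed list of allocated pointers can be sketched as below. All the names, the number of buckets and the hashing function are assumptions; the sketch only shows why freeing an unknown pointer can be made harmless, and how the number of live blocks can be obtained.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BUCKETS 64

typedef struct sketch_node {
    void *p;                         /* a block handed out to the caller */
    struct sketch_node *next;
} sketch_node;

static sketch_node *sketch_table[SKETCH_BUCKETS];
static size_t sketch_count;          /* number of blocks in the table */

static size_t sketch_hash(const void *p)
{
    return (size_t)(((uintptr_t)p >> 4) % SKETCH_BUCKETS);
}

/* Allocate a block and record it in the hashed list. */
static void *sketch_malloc(size_t size)
{
    sketch_node *n = malloc(sizeof *n);
    size_t h;

    if (n == NULL)
        return NULL;
    n->p = malloc(size);
    if (n->p == NULL) {
        free(n);
        return NULL;
    }
    h = sketch_hash(n->p);
    n->next = sketch_table[h];
    sketch_table[h] = n;
    sketch_count++;
    return n->p;
}

/* Free a block only if it is recorded. Returns 1 if freed, 0 if unknown. */
static int sketch_free(void *p)
{
    sketch_node **q;

    for (q = &sketch_table[sketch_hash(p)]; *q != NULL; q = &(*q)->next) {
        if ((*q)->p == p) {
            sketch_node *n = *q;
            *q = n->next;            /* remove the pointer from the list */
            free(n->p);
            free(n);
            sketch_count--;
            return 1;
        }
    }
    return 0;                        /* not allocated here: no action */
}
```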

System independent memory copy

The file length.c contains conditionally compiled routines for byte copy, zero filling of area and byte comparison, which will be used when neither the BERKELEY routines (bcopy, bzero, bcmp) nor the SYSTEM 5 routines (memcpy, memset, memcmp) are available.

Input output and debugging support

Most protocols need to be debugged, and the availability of generic tools is indeed valuable. The ASN.1 library contains the routines necessary to dump an easily understandable textual version of a component in a file, and to load these elements from a file.

Dumping a textual version in a file

The routine asn1_dump (file dump.c) will take an ASN.1 encoded memory area as argument, and dump it on a file as e.g.:

[1]{
 [UNIVERSAL 16]{
  [UNIVERSAL 2]
   ' 0'16,
  [UNIVERSAL 19]
   "very very very very very very very very very very very very very ve" &
   "ry very very very very very very very very very very very very very" &
   " very very very very very very very very very very very very very l" &
   "ong string"}}

For each element, the tag will be represented first, following the ASN.1 conventions. Then, if the element is primitive, the content will be represented either as a printable string if it consists of printable characters, or as a hexadecimal string otherwise; the character ampersand can be used as a continuation mark for long strings. The structured elements will be encoded as the list of their elements, enclosed between curly brackets and separated by commas.

Loading an element in memory

The routine in the file load.c will read an element encoded as the output of asn1_dump from a file into memory. It can be used for simple debugging tools, e.g. in order to replay a session that was previously archived. Indeed, the text file can well be edited, e.g. in order to insert arbitrary protocol variations, or in order to change some protocol elements. This facility has been used for the test routines.

The loader uses a routine to skip linear white space in the file, which is stored in the file load_blank.c.

One can also use the routine asn1_fetch (file fetch.c) for loading in memory an ASN.1 encoded element from an ASN.1 encoded file, or the routine asn1_load_text (file load_text.c) to load in memory a text encoded element following the textual representation conventions generated by MAVROS.

Design of the test procedures

The test procedures comprise five steps:

The generation of coding routines and header files, using MAVROS, for a test syntax,
The compilation of the coding routines using the makefile generated by MAVROS,
The compilation of the test program, which depends on the header file generated by MAVROS, and is linked with the routines produced in step 2,
The execution of the test program, in order to verify that the coding routines can be executed correctly,
The execution of the lint program, in order to check the consistency of the code generated by MAVROS with the ASN1 library and with the test program.

The tests shall be run in order to check the correctness of the porting of the ASN.1 library on a new system. They can be used as non regression tests after an update of either the MAVROS compiler or the ASN1 library.

In the following sections, we will detail:

The design of the test syntax,
Loading the local representation,
Testing the ASN.1 coding procedures,
Testing the alternative syntaxes,
Testing the user support libraries,

The design of the test syntax

The test syntax is defined by the file test.mvr of the TESTASN1 directory. It exercises at least once each of the basic data types and construction facilities of ASN-1, as well as each of the MAVROS extensions.

Entry

The main component, entry, is typed as SET OF CHOICE. Within this main choice, one will find various types of components: a SEQUENCE (seq2), a SET (set4), a BIT STRING (bits1), a FLAG, i.e. a bit string encoded on a single word (flags1), a REAL and an imported type (import1).

Seq2, seq1

The sequence seq2 is a simple redefinition of another sequence, called seq1. One should check that no coding routines are generated for seq2, but only C macros, making a direct reference to seq1. The sequence seq1 contains references to a boolean, an integer, an OPTIONAL octet string and two references to a set, set1. Indeed, the test cases will contain several occurrences of this sequence, in particular in order to test the correct handling of optional components within sequences.

Set4

The set set4 is dedicated to the test of the COMPONENTS OF construct. It contains a direct inclusion of the components of the set set1, and a sequence component which consists solely of the components of the sequence seq3.

Seq3

The sequence seq3 contains an object identifier oid, and a component of type ANY. Both components are optional.

Set1

The set set1 is referred to as a component of the elements seq1 and set4. It contains an INTEGER, for which four named values and a DEFAULT value are defined, a sequence of components of type choice1, and then a direct and a tagged reference to this same component choice1. Different values will occur in the test suite, where the initial integer will sometimes take a default value. The direct reference to choice1 is used to insert a decoding difficulty, i.e. the recognition of an untagged choice within a set.

Choice1

The choice choice1 contains tagged and untagged integer components, an untagged choice component, which is again a difficulty for the decoding procedures, and a copy component, in order to test the delayed coding and decoding procedures.

Imported component

One of the elements of the main choice, at the entry level, is an imported component called import1. The definitions of the coding and decoding routines for that component are in the file import1.c; they code the integer parameter as a numeric string. The element is only present in order to test the import facility of MAVROS.

Loading the local representation

The first action of the test_mvr program is to load in memory a reference version of the data to encode, stored in the file test.input. The syntax of this file was chosen in order to make the loading routines as simple as possible, which perhaps does not result in the best level of readability...

Number and types of entries

The first value in the file is the number of entries. It is followed by the description of each of these entries, which begins by an entry type:

   _______________________
  | Code |   C type       |
  |______|________________|
  |  0   |   seq2         |
  |  1   |   set4         |
  |  2   |   bit string   |
  |  3   |   flags        |
  |  4   |   integer      |
  |  5   |   real         |
  |______|________________|

Scanning a component of type seq2

The C type seq2 comprises the following elements:

A boolean value (a.b),
An integer value (a.i),
an octet string (a.o),
Two elements of type set1 (b and c).

Loading the type set4

The type set4 comprises an element of type seq3 (a) followed by an element of type set1 (b). The element of type seq3 is composed of an object identifier (a.oid) followed by a component of type any (a.any).

Loading the type set1

The type set1 comprises:

An integer value (i1),
The number of components (nc) of the set of,
nc components of type choice1,
two components of type choice1.

Loading the type choice1

The type choice1 comprises an indication of the selected choice, which is followed by a value of the selected type:

   0,1   An integer value (o.i)
   2     A string value (o.s)
   3     A value of type ANY (o.any)

Loading octet strings, object identifiers and ANY

Octet strings, object identifiers and components of type ANY are represented in the input file as strings of hexadecimal characters delimited by simple quotes, e.g.

'03abacadae010a0b0c'

The routine scan_field of test_mvr will transform this input into two components, a number of octets (l) and a pointer to a memory area containing the value (v).

Loading bit strings

Bit strings are encoded as a string of hexadecimal characters delimited by simple quotes, just like octet strings. However, the first two characters, which would encode the first byte of the octet string, encode here the number of unused bits in the last octet.

The routine scan_bits of test_mvr will transform this input into two components, a number of bits (l) and a pointer to a memory area containing the value (v).
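
The two conversions can be sketched as follows. The routines sketch_scan_field and sketch_scan_bits are hypothetical stand-ins for scan_field and scan_bits: their parameter lists are assumptions, and they read from a string in memory rather than from the input file.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of a hexadecimal digit, or -1 (assumes an ASCII character set). */
static int sketch_hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Convert a quoted hexadecimal field into octets stored at v, which can
 * hold max octets. Returns the number of octets, or -1 on error. */
static long sketch_scan_field(const char *s, unsigned char *v, size_t max)
{
    size_t l = 0;

    if (*s++ != '\'')
        return -1;
    while (*s != '\'') {
        int hi = sketch_hexval((unsigned char)s[0]);
        int lo = (hi < 0) ? -1 : sketch_hexval((unsigned char)s[1]);
        if (lo < 0 || l >= max)
            return -1;
        v[l++] = (unsigned char)(hi * 16 + lo);
        s += 2;
    }
    return (long)l;
}

/* Same input format, but the first octet gives the number of unused bits
 * in the last octet. Returns the number of bits, or -1 on error. */
static long sketch_scan_bits(const char *s, unsigned char *v, size_t max)
{
    long l = sketch_scan_field(s, v, max);
    int unused;

    if (l < 1 || v[0] > 7 || (l == 1 && v[0] != 0))
        return -1;
    unused = v[0];
    memmove(v, v + 1, (size_t)(l - 1));   /* drop the leading count octet */
    return 8 * (l - 1) - unused;
}
```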

Loading text strings

Text strings should be enclosed between double quotes, as in:

"This is a string"

The special value null string can be used for entering a missing string, e.g. when the string component is optional.

Loading booleans

Booleans are represented in the file by a string, whose value can be either TRUE or FALSE.

Loading integers

Integers are read from the file as simple numbers, using the "%d" format of scanf.

Loading reals

Reals are read as the combination of a mantissa and an exponent. The mantissa is read as a real number, using the %f format of scanf. The exponent is read as an integer. The resulting value is the product of the mantissa by 2 to the power of the exponent.

This form of encoding has been chosen because it enables us to specify floating point numbers whose representation will not depend on machine specificities; the result of the coding procedures will thus be machine independent.
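
A minimal sketch of this conversion is given below. The function name read_real and the layout of the input (mantissa and exponent separated by white space) are assumptions; the actual reading routine of test_mvr may differ.

```c
#include <math.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a mantissa with the %f format and an integer
 * exponent, and return mantissa * 2^exponent in *value.
 * Returns 0 on success, -1 if the two fields cannot be read.
 */
int read_real(const char *text, double *value)
{
    float mantissa;
    int exponent;

    if (sscanf(text, "%f %d", &mantissa, &exponent) != 2)
        return -1;
    /* ldexp scales by a power of two, which does not alter the mantissa */
    *value = ldexp((double)mantissa, exponent);
    return 0;
}
```

With an input such as "0.75 4", the mantissa is exactly representable in binary and the scaling by a power of two is exact, so the value 12 is obtained on any machine.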

Loading flags

In order to avoid dependencies on the local ordering of bits, and on the local size of words, the flags will be sets of at most two elements, which should be represented as the numbers of two named bits. For example, an input like:

2 3

will encode the elements F2 and F3 defined in the test.mvr file:

flags1(f:int) ::= [APPLICATION 2]
FLAGS {F0(0),F1(1),F2(2),F3(3)} (f,4)

The correct encoding of the flags will thus depend on a correct generation of the constants F0, F1, F2 and F3 by MAVROS in the include file test.h.

Testing the ASN.1 coding procedures

The test program test_mvr includes several sequences of tests for checking the validity of the ASN.1 coding and decoding procedures generated by MAVROS:

The reference element is encoded, then decoded. The result of the decoding is compared to the initial value, which gives a first consistency test.
The encoding can be compared to a reference encoding,
Alternative encodings can be loaded in memory from a test file. One will verify that their decoding yields the same result.
Erroneous encodings can be loaded in memory from another test file. The errors shall be detected, and the process shall demonstrate that it recovers correctly from these errors.

These tests will be detailed in the following subsections.

Consistency test

In order to test the consistency of encodings and decodings, the program will:

Reserve a buffer for the encoding. The length of the buffer is determined by the result of the entry_len routine,
Apply the coding routine entry_cod,
Decode the result using the entry_dec routine,
Compare the decoded value with the original value. The comparison routines are independent of MAVROS; they are provided in the test_mvr program.
Free the results, using the entry_free routines.
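
The five steps above can be sketched on a toy element. The routines toy_len, toy_cod, toy_dec and toy_free below are hypothetical stand-ins for the generated entry_len, entry_cod, entry_dec and entry_free, whose actual signatures are not reproduced here; the toy element is a single octet string of less than 128 octets.

```c
#include <stdlib.h>
#include <string.h>

/* Toy element: an octet string of at most 127 octets. */
struct toy { int l; unsigned char *v; };

/* Length of the encoding: tag + short length + content. */
int toy_len(const struct toy *t) { return 2 + t->l; }

/* Encode at p, return a pointer past the encoding. */
unsigned char *toy_cod(unsigned char *p, const struct toy *t)
{
    *p++ = 0x04;                        /* [UNIVERSAL 4], primitive */
    *p++ = (unsigned char)t->l;         /* short form of the length */
    memcpy(p, t->v, (size_t)t->l);
    return p + t->l;
}

/* Decode from p, return a pointer past the element or NULL on error. */
const unsigned char *toy_dec(const unsigned char *p,
                             const unsigned char *end, struct toy *t)
{
    int l;

    t->l = 0;
    t->v = NULL;
    if (end - p < 2 || p[0] != 0x04 || p[1] > 127)
        return NULL;
    l = p[1];
    if (end - p - 2 < l)
        return NULL;                    /* truncated content */
    t->v = malloc((size_t)l + 1);
    if (t->v == NULL)
        return NULL;
    memcpy(t->v, p + 2, (size_t)l);
    t->l = l;
    return p + 2 + l;
}

void toy_free(struct toy *t) { free(t->v); t->v = NULL; t->l = 0; }

/* Steps 1 to 5: returns 0 if the round trip is consistent, -1 otherwise. */
int toy_round_trip(const struct toy *orig)
{
    struct toy copy;
    unsigned char *buf = malloc((size_t)toy_len(orig));    /* step 1 */
    unsigned char *end;
    int ok;

    if (buf == NULL)
        return -1;
    end = toy_cod(buf, orig);                              /* step 2 */
    ok = toy_dec(buf, end, &copy) == end                   /* step 3 */
        && copy.l == orig->l                               /* step 4 */
        && memcmp(copy.v, orig->v, (size_t)orig->l) == 0;
    toy_free(&copy);                                       /* step 5 */
    free(buf);
    return ok ? 0 : -1;
}
```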

The execution of the coding and decoding procedures is protected by a time-out, in order to detect possible loops. The comparison routines will print any differences in the test output; the test program will exit with a nonzero code if such a failure is detected. The test output will also contain information on the memory allocation, which enables us to check that the freeing routines did indeed free all the memory segments which were allocated during the decoding:

Comparison of input and output:
Memory allocation before decoding (mode = 0) = 0.
Memory allocation after decoding  (mode = 0) = 11.
End of comparison.
Memory allocation after free      (mode = 0) = 0.
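
Figures of this kind can be obtained by counting the memory segments in use. The sketch below assumes that the figure printed is the number of segments currently allocated; the names counted_alloc, counted_free and allocation_count are hypothetical and do not belong to the ASN1 library.

```c
#include <stdlib.h>

/* Number of segments allocated and not yet freed. */
static long live_segments = 0;

/* Allocate a segment and count it. */
void *counted_alloc(size_t size)
{
    void *p = malloc(size ? size : 1);  /* never ask for 0 bytes */

    if (p != NULL)
        live_segments++;
    return p;
}

/* Free a segment obtained from counted_alloc and uncount it. */
void counted_free(void *p)
{
    if (p != NULL) {
        live_segments--;
        free(p);
    }
}

/* Value to print before decoding, after decoding and after free. */
long allocation_count(void) { return live_segments; }
```

If the decoding and freeing routines only use such wrappers, a count that does not return to its initial value after the free reveals a memory leak.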

After performing this test, the coding will be transformed, in order to emulate the extreme encodings that can be generated by different ASN-1 compilers. The basic ASN-1 encoding rules allow the length field of structured components to be encoded either in the definite form or in the indefinite form, the latter leaving a place holder in the length field and terminating the structure by an end of contents mark. The test routines will:

Reencode the whole structure, so that only indefinite length forms are used for structured fields, and then repeat the steps 3 to 5 described above,
Reencode the whole structure, so that only definite length forms are used for structured fields, and then repeat the steps 3 to 5 described above.

These tests give us confidence that the decoding routines generated by MAVROS will correctly decode the output of other ASN.1 compilers.
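
The two length forms described above can be illustrated on a small structured component. The helper wrap_constructed is hypothetical and is not the reencoding routine of test_mvr; it only handles contents of less than 128 octets, for which the length fits in a single octet.

```c
#include <string.h>

/*
 * Hypothetical helper: wrap n content octets (n < 128) in a structured
 * component with the given tag octet, using either length form.
 * Returns the number of octets written in out.
 */
size_t wrap_constructed(unsigned char *out, unsigned char tag,
                        const unsigned char *content, size_t n,
                        int indefinite)
{
    size_t k = 0;

    out[k++] = tag;
    if (indefinite)
        out[k++] = 0x80;                /* place holder in the length field */
    else
        out[k++] = (unsigned char)n;    /* short form, n < 128 */
    memcpy(out + k, content, n);
    k += n;
    if (indefinite) {
        out[k++] = 0x00;                /* end of contents mark */
        out[k++] = 0x00;
    }
    return k;
}
```

For a SEQUENCE (tag 30) containing the integer 5, the first form gives 30 03 02 01 05 and the second 30 80 02 01 05 00 00; both must decode to the same value.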

Testing the result of the coding

The self coherency tests would not discover differences between the encoding generated by MAVROS and a correct encoding. In order to detect any discrepancy, the test_mvr program can be directed to compare the encoding with a reference version, stored in the file test.dump specified by the -diff argument.

The format of that file corresponds to the output of the asn1_dump routine, which is part of the ASN1 library. It contains text encoded ASN1 rather than binary, in order to improve the readability:

[UNIVERSAL 17]{
 [APPLICATION 0]{
  [0]{
   [UNIVERSAL 2]
    ' 5'16},
  [1]{
   [UNIVERSAL 16]{}},
  [UNIVERSAL 2]
   ' 0'16,
  [2]{
   [UNIVERSAL 2]
    ' 0'16},
  [UNIVERSAL 16]{
   [UNIVERSAL 6]
    ' 1 2 3 4 5 6'16,
....

In fact, a reference file with precisely this format can be produced by using the -dump argument of test_mvr.

The output file also contains a hexadecimal dump of the encoding:

Testing the standard encoding:
Dump binary (26578 - 26762):
3180602da0 3 2 1 5a1 230 0 2 1 0a2 3 2 1 0301a 6 6 1 2 3 4 5 6a1
102480 4 131 4 132 4 133 4 134 0 06080a0 3 2 1fea1803080 2 1 013
81d3766572792076657279207665727920766572792076657279207665727920
7665727920766572792076657279207665727920766572792076657279207665
7279207665727920766572792076657279207665727920766572792076657279
2076657279207665727920766572792076657279207665727920766572792076
6572792076657279207665727920766572792076657279207665727920766572
7920766572792076657279207665727920766572792076657279207665727920
766572792076657279206c6f6e6720737472696e67 0 0 0 0 5 0a2 624 4 4
 231323015 6 9 3abacadae 1 a b ca1 8 4 6616263646566 0 0a033 1 1
ff 2 2 6fd 4 2 1 1a013a1 830 681 4 69f6bc7 2 1 1a2 481 2 0813111
a0 3 2 1 1a1 230 0 2 1 0a2 3 2 1 0a043 1 1 0 2 180 4 1 0a012a1 8
30 681 4f9609439 2 1 1a2 381 17f3124a0 4 2 2 6fda1143012 2 1 0 2
 3 080 0 2 280 0 2 180 2 1fe 2 1 0a2 3 2 1 041 3 77f7062 4 3 2 4
3043 431373839 9 9c0cc10 0 0 0 0 0 0 9 980d61bf4cbac71 cb3 9 9c0
8511 0 0 0 0 0 0 0 0

This dump can be checked manually by ASN.1 experts, if they want to find out the details of the tags and length encodings. It is indeed somewhat less readable than the text format.

Test of alternative encodings

The ASN-1 basic encoding rules do not fully specify the encoding of the messages. Different decisions can be taken for SET, STRING and REAL elements; moreover, one should take a lenient approach towards some benign coding mistakes, like the coding of integers on 4 bytes whilst 3 might have been sufficient.
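
As an illustration of this lenient approach, the sketch below decodes the content octets of an INTEGER without rejecting redundant leading octets. The function decode_int_content is hypothetical; it is not the decoding routine of the ASN1 library, and it is limited to values of at most 4 octets.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode n content octets (1 <= n <= 4) of a two's
 * complement INTEGER. Leading octets that merely repeat the sign are
 * accepted rather than reported as errors.
 * Returns 0 on success, -1 if n is out of range.
 */
int decode_int_content(const unsigned char *p, size_t n, long *value)
{
    long x;
    size_t i;

    if (n == 0 || n > 4)
        return -1;
    x = (p[0] & 0x80) ? -1 : 0;         /* sign extension */
    for (i = 0; i < n; i++)
        x = x * 256 + p[i];             /* append the next octet */
    *value = x;
    return 0;
}
```

With this sketch, the contents 05 and 00 00 00 05 both yield the value 5, and the contents fe and ff ff ff fe both yield -2.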

The test.entries file, which is specified by the -load argument to test_mvr, contains two different encodings of the reference value, with subtle differences like:

Components placed in different order within a SET,
Characters and bit strings expressed in the structured format, instead of the primitive format generated by MAVROS,
Slightly deviant encoding of some INTEGER elements,
Alternative codings of REAL elements, obtained by changing the base of the exponent, or by using the decimal representation.

The format of this file is the same text encoded ASN1 as that of the reference file test.dump. The file can indeed be produced by editing the text of the reference file. Each element is loaded in memory by using the asn1_load routine of the ASN1 library, and is then decoded. The value is compared to the reference value, and any error is reported. The test program will exit with a nonzero status if any error occurs.

One should note that the introduction of different encodings has obliged us to revise the comparison procedure for the REAL elements. Instead of a simple test for equality, one will also consider that two floating point numbers match if their relative difference is less than 1.E-7.
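
Such a comparison can be sketched as follows. The function reals_match is hypothetical, and the choice of the larger magnitude as the reference for the relative difference is an assumption; the comparison routine of test_mvr may normalize differently.

```c
/* Absolute value, written out to keep the sketch free of library calls. */
static double dabs(double x) { return x < 0 ? -x : x; }

/*
 * Hypothetical comparison: two reals match if they are equal, or if
 * their difference is less than 1.E-7 times the larger magnitude.
 * Returns 1 for a match, 0 otherwise.
 */
int reals_match(double a, double b)
{
    double scale;

    if (a == b)
        return 1;                       /* covers 0 == 0 */
    scale = dabs(a) > dabs(b) ? dabs(a) : dabs(b);
    return dabs(a - b) < 1.0e-7 * scale;
}
```

The tolerance absorbs the rounding introduced when a REAL is encoded with another base or in decimal form, while still rejecting values that differ in their significant digits.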

Test of erroneous encodings

The decoding routines generated by MAVROS contain sequences to check the correctness of the input: trusting the remote entity is generally not an adequate policy in an open network. These routines will detect syntax errors, like missing components or incorrectly encoded components. These error detection sequences are tested by loading in memory incorrect encodings of the reference data, from the file test.failures specified by the -lbad argument to test_mvr.

Each element of the file will be presented to the decoding routine entry_dec, which should detect the coding errors:

The first element contains only two entries instead of 10, and an integer variable has the value 5 instead of 0.
Within the second and third elements, a mandatory component is missing, once in a SET and then in a SEQUENCE.
Within the fourth element, an unexpected element occurs within a CHOICE.

The test routine will check that an error is detected for each of these entries. The test program will exit with a nonzero status if one of the errors remains undetected; the precise result of the detection is found in the output file.

Another purpose of the test is to check that the program survives a decoding error. After each decoding, the allocated memory is freed by calling the routine entry_free, and one can check that all memory allocations have been cleared.

Testing the alternative syntaxes

MAVROS does not only generate standard ASN.1 coding and decoding procedures. One can also use it to generate faster routines (light weight) or text encodings for simple I/O operations, and to generate copying routines. All these routines can be tested by the test_mvr program.

Testing the light weight coding procedures

The current version of the light weight decoding routines is not very satisfactory. A better version is under design. The test of the light weight routines has thus been placed under conditional compilation, with the flag INRIA, in the test program.

The test of the light weight coding consists of:

Calling the coding routine entry_lwc,
Producing a binary dump of the result,
Calling the decoding routine entry_lwd,
Comparing the result with the reference value,
Freeing the result.

As for the standard encoding, the program will return an error if the decoded value does not match the initial value. By looking at the test output, one can check that all memory allocations have been cleared.

Testing the text coding procedures

The text coding procedures are tested by applying the procedure entry_out to the initial value, and comparing the result with the content of the file test.text_out, specified by the parameter -dtxt to the program test_mvr. The program test_mvr can be directed to produce a reference text by specifying the text file with the -prt argument.

The decoding is tested by loading in memory the text values stored in the file test.text_ref specified with the argument -ltxt, using the procedure asn1_load_text of the ASN1 library, by decoding them with the entry_in procedure, and by comparing the result with the reference value.

It is possible to submit incorrect encodings by specifying a file with the -lbtx (load bad text) argument of test_mvr. The detection of errors and the survivability of the program will be tested as for ASN1 encodings.

Testing the copying routines

The test of the copying routines can be triggered by the -copy argument of test_mvr. The test consists of:

Copying the reference data with the routine entry_cpy,
Comparing the copied data with the reference data,
Freeing the copied data.

If a difference between the reference data and the copied data is detected, the test program will exit with a nonzero code. The test output can be examined to check that all the memory allocated during the copying is freed by entry_free.

Testing the user support libraries

The ASN1 library contains routines for supporting the input and output of OBJECT IDENTIFIERS and of UTC Times.

Testing the Object Identifier interface routines

The object identifier interface routines are tested by the program oidt. This program will read an input file, oidt.input, which contains on each line an object identifier description, preceded by the letter d if that object shall be declared, or by the letter p if it should only be printed.

The test includes object identifiers of various types, and some coding difficulties like the presence of identifier parts (integers) larger than 32 bits. Each proposed value will generate two lines in the output file, as in:

Coding of "rare attributeType(4)" is < 3902082a08aa075 0 4>
declare("rare-attribute","rare attributeType(4)") returns 0
Coding of " rare-attribute 1" is < 3902082a08aa075 0 4 1>
Decoding yelds "rare-attribute 1 "
Coding of " rare-attribute 2" is < 3902082a08aa075 0 4 2>
Decoding yelds "rare-attribute 2 "

By comparing this output with a reference file, oidt.output, one can verify the correctness of the Object Identifier manipulation routines.

Testing the Universal Time coding routines

The test of the UTC Time manipulation routines is performed by loading in memory the file time.test, specified by the argument -utc of test_mvr. This file contains lines of the form:

020101000001+0000/Wed Jan  1 01:00:01 1902
991231225959+0000/Fri Dec 31 23:59:59 1999
351103125620+0600/Sun Nov  3 07:56:20 1935
691228215512+0400/Sun Dec 28 18:55:12 1969

On each line, a UTC time value, or a generalized time value, is separated by a / from the printout of the same date in the UNIX format, for a machine running a local time of GMT+1. The program will decode the UTC time value and compare the result to the UNIX date; it will return an error in case of a difference. Then, it will reencode the date, and check that the result of the encoding is equal to the original value.
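
The decode and reencode check can be sketched as follows for the UTC time form shown above (12 digits followed by a signed offset). The structure utc_fields and the functions utc_parse, utc_format and utc_round_trip are hypothetical; they are not the routines of the ASN1 library, and neither the generalized time form nor the comparison with the UNIX date is covered.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct utc_fields {
    int year, month, day, hour, minute, second; /* year on two digits */
    char sign;                                  /* '+' or '-' */
    int off_hour, off_minute;                   /* offset from GMT */
};

/* Value of two decimal digits at p, or -1 if they are not digits. */
static int two_digits(const char *p)
{
    if (!isdigit((unsigned char)p[0]) || !isdigit((unsigned char)p[1]))
        return -1;
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Parse the 17 first characters of s; returns 0 on success, -1 on error. */
int utc_parse(const char *s, struct utc_fields *t)
{
    int f[6], i;

    for (i = 0; i < 6; i++)
        if ((f[i] = two_digits(s + 2 * i)) < 0)
            return -1;
    if (s[12] != '+' && s[12] != '-')
        return -1;
    t->sign = s[12];
    if ((t->off_hour = two_digits(s + 13)) < 0)
        return -1;
    if ((t->off_minute = two_digits(s + 15)) < 0)
        return -1;
    t->year = f[0]; t->month = f[1]; t->day = f[2];
    t->hour = f[3]; t->minute = f[4]; t->second = f[5];
    return 0;
}

/* Reencode the fields; out must hold at least 18 characters. */
void utc_format(const struct utc_fields *t, char *out, size_t size)
{
    snprintf(out, size, "%02d%02d%02d%02d%02d%02d%c%02d%02d",
             t->year, t->month, t->day, t->hour, t->minute, t->second,
             t->sign, t->off_hour, t->off_minute);
}

/* Returns 0 if the reencoded value equals the original, -1 otherwise. */
int utc_round_trip(const char *line)
{
    struct utc_fields t;
    char out[18];

    if (utc_parse(line, &t) != 0)
        return -1;
    utc_format(&t, out, sizeof out);
    return strncmp(line, out, 17) == 0 ? 0 : -1;
}
```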

Portability and tests

The MAVROS compiler itself is portable, and does not make assumptions on either the CPU or the particular brand of UNIX which is in use on the target machine. Moreover, it generates portable code; the system dependencies are concentrated in the ASN1 run time library, which will have to be ported to every target system. In fact, MAVROS makes only one hypothesis: that the machine is byte oriented, i.e. that one can address individual bytes. Porting to a non byte oriented architecture, e.g. where a single character is stored in a 64-bit word or where the word size is 36 bits, would be difficult.

On byte oriented machines, porting is limited to updating a few flags in the header file asn1.h:

#if defined(mc68020) || defined(mc68010) || defined(mc68000)
#define ANYBOUNDARY
#define BIGENDIAN
#define IEEE_REAL
#else
#if defined(sparc)
#define BIGENDIAN
#define IEEE_REAL
#else
#if defined(gould)
#define BIGENDIAN
#else
#if defined(vax)
#define ANYBOUNDARY
#endif /* vax */
#endif /* gould */
#endif /* sparc */
#endif /* mc68020 */

These flags are used for optimizing the coding and decoding of integer and real elements, as well as for the definition of bit string constants; they have the following meaning:

ANYBOUNDARY is set to indicate that on the target machine long integers can be stored at arbitrary addresses. This flag should not be set if the target machine, like the SPARC, requires that long integers start on word boundaries.
BIGENDIAN is set if the machine stores the bytes of an integer word in big endian order, i.e. most significant byte first. It should not be set on machines like the VAX, which use the opposite little endian convention.
IEEE_REAL is set if the machine, like a 68020, uses the IEEE conventions for handling floating point numbers.
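
When porting to a new CPU, the value of the BIGENDIAN flag can be determined with a small probe such as the one sketched below. The function machine_is_big_endian is hypothetical and is not part of the ASN1 library; it only helps to choose the flag, which remains a compile time constant in asn1.h.

```c
/*
 * Hypothetical probe: returns 1 if the most significant byte of an
 * integer word is stored first (BIGENDIAN should be set), 0 otherwise.
 */
int machine_is_big_endian(void)
{
    unsigned long word = 1;

    /* On a big endian machine the first byte of the word 1 is zero. */
    return *(const unsigned char *)&word == 0;
}
```

On a 68020 or a SPARC the probe returns 1; on a VAX it returns 0.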

Once the correct value of these flags has been determined for a new CPU, it will have to be included in a new release of the header file. The ASN-1 library can then be compiled.

Once the flags have been set and the run time library compiled, one can check the porting by using the test suite.

The design of the MAVROS compiler

The components of MAVROS

Structure of the memory tree

Checking the consistency

Principles of pretty printing

Printing of type definitions

Printing of a component

General structure

Handling of CHOICEs

Handling of SET and SEQUENCEs

Handling of SET OF, SEQUENCE OF

Handling of basic components

Handling of the ANY DEFINED BY construct

General structure

Error handling of CHOICEs

Error handling of SET and SEQUENCEs

Error handling of SET OF, SEQUENCE OF

Error handling of basic components

General structure of the copying generation

Copying of CHOICEs

Copying of SET and SEQUENCEs

Copying of SET OF, SEQUENCE OF

Copying of basic components

Principles of the comparison routines

Comparison of sequences

Comparison of arrays

Comparison of choices

Impact on the generic routines

General structure

Handling of tags and length fields

Handling of CHOICEs

Handling of SET and SEQUENCEs

Handling of SET OF, SEQUENCE OF

Handling of basic components

Handling of the ANY DEFINED BY construct

General structure

Handling of tags and length fields

Handling of CHOICEs

Handling of SET and SEQUENCEs

Handling of SET OF, SEQUENCE OF

Handling of basic components

Handling of the ANY DEFINED BY construct

General structure

Handling of tags and length fields

Handling of CHOICEs

Handling of SETs

Handling of SEQUENCEs

Handling of SET OF, SEQUENCE OF

Handling of basic components

Handling of the ANY DEFINED BY construct

Use for logging results

Use for test inputs

Representation of the scalar types

Representation of bit strings

Representation of octet strings and character strings

Representation of object identifiers

Representation of SEQUENCE and SET

Representation of SEQUENCE OF and SET OF

Representation of CHOICE

Representation of ANY

General architecture

The parsing of CHOICE

The parsing of SEQUENCEs

General encoding routines

General decoding routines

Support of scalar types

Support of the String Types

Macros and subroutines

Light weight encodings

Textual encodings

Copying routines

Handling of object identifiers

Handling of Universal Times

Comparison routines for strings

Memory allocation

Dumping a textual version in a file

Loading an element in memory

Entry

Seq2, seq1

Set4