ESCJ 24:  Astgen Manual

Compaq Confidential.

Last modified: September 22, 1998

The astgen tool reads a file containing annotated, partial implementations of AST classes and writes full implementations of those classes, putting each in its own source file. It also outputs two auxiliary classes. Using the generator yields a description of the AST classes that is more manageable than the full implementation would be, because it is in a single file and because it is smaller by an order of magnitude. The generator also makes it easy to change an AST hierarchy and the code inside AST classes.

The input to astgen looks very much like a set of Java class declarations. These declarations are annotated with Java comments containing pragmas understood by astgen. The input must use Java's lexical structure and must conform to the following grammar:

 PackageDeclaration_opt ImportDeclarations_opt EndHeader ClassDeclaration*
where the non-terminals other than EndHeader are defined in the Java Language Specification. EndHeader is a Java single-line comment that starts with //#, followed by some whitespace and then the keyword EndHeader (case is significant to astgen). If a ClassDeclaration in an astgen input file has a superclass, the declaration of that superclass must appear earlier in the input file.
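To make the EndHeader rule concrete, the following sketch locates the EndHeader comment in a small, made-up input. The class EndHeaderFinder and the sample class names are hypothetical and are not part of astgen; tolerating leading and trailing whitespace on the line is an assumption, since the manual specifies only the //#, the space, and the case-sensitive keyword.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of astgen: finds the EndHeader comment that
// separates the package/import header from the class declarations.
final class EndHeaderFinder {
    // "//#", at least one space or tab, then the case-sensitive keyword.
    // Allowing leading and trailing whitespace is an assumption.
    private static final Pattern END_HEADER =
        Pattern.compile("\\s*//#[ \\t]+EndHeader\\s*");

    // Returns the index of the first EndHeader line, or -1 if there is none.
    static int find(String[] lines) {
        for (int i = 0; i < lines.length; i++) {
            if (END_HEADER.matcher(lines[i]).matches()) {
                return i;
            }
        }
        return -1;
    }

    // A made-up input in the shape the grammar describes. Each superclass
    // is declared before the classes that extend it.
    static String[] sampleInput() {
        return new String[] {
            "package example.ast;",
            "",
            "import java.util.Vector;",
            "",
            "//# EndHeader",
            "",
            "public abstract class Node { }",
            "",
            "public abstract class Expr extends Node { }",
            "",
            "public class Literal extends Expr { }"
        };
    }
}
```

In this sample, everything before the EndHeader line is the header, and the three class declarations after it appear in superclass-first order.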

Given such an input file, astgen does the following:

Output

The tool outputs one .java file for each ClassDeclaration in the input file. As discussed above, these per-class .java files consist mostly of the generic header followed by the text of the class declaration, plus some automatically generated boilerplate methods. "Child fields" of an AST node are declared using pragmas (pragmas are described in the next section). Child fields are public fields that point to the children of an AST node. The child fields of a class declaration play an important role in generating the boilerplate for that declaration.

In addition to the per-class .java files, the tool outputs a file called SubTagConstants.java and another called Visitor.java. These files support the boilerplate code generated for the per-class .java files.

The bullet points below describe the boilerplate methods generated plus the two support files for them.

The discussion above suggests that many of these methods are generated only in non-abstract classes. This is not exactly true. As mentioned earlier, if a ClassDeclaration in the input to the tool has a superclass, the declaration of that superclass must appear earlier in the input file. This implies that every input file declares a set of "root" classes that are superclasses of all the other, non-root classes declared in the file. If a class is both abstract and one of these root classes, then the tool generates abstract versions of the methods listed above in it (that is, versions without implementations). As a result, these methods can be called on all AST nodes, not just concrete ones.
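The effect of the root-class rule can be sketched in plain Java. The class names, the tag value, and the exact signatures of getTag and check below are assumptions made for illustration; they are not astgen's actual output.

```java
// Hypothetical root class: abstract and without a superclass in the input
// file, so it would receive abstract versions of the boilerplate methods.
// The signatures below are assumptions.
abstract class AstNode {
    public abstract int getTag();
    public abstract void check();
}

// Abstract but not a root: it inherits the abstract methods from AstNode
// and needs no declarations of its own.
abstract class AstExpr extends AstNode { }

// Concrete class: carries the actual implementations.
final class IntLiteral extends AstExpr {
    static final int TAG = 1;  // made-up tag value

    public Integer value;  // an ordinary field used by check below

    IntLiteral(Integer value) { this.value = value; }

    public int getTag() { return TAG; }

    public void check() {
        if (value == null) {
            throw new IllegalStateException("IntLiteral.value is null");
        }
    }
}

final class RootDemo {
    // Works for any node, because the root class declares both methods.
    static int checkedTag(AstNode node) {
        node.check();
        return node.getTag();
    }
}
```

Because AstNode declares the methods, code such as checkedTag can be written against the root type without casting to a concrete class.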

Pragmas

Inside a ClassDeclaration, between member declarations, astgen recognizes a number of pragmas which either generate member declarations or control the output of boilerplate members.

A pragma is written as a Java single-line comment on a line by itself. Pragma comments are distinguished by starting with //#. Inside class C, the following pragma defines the child fields of C:

The following pragmas apply to a class as a whole rather than to individual fields. They control aspects of the generation of boilerplate methods, such as make and check, that are not specific to any one field. The syntax of these pragmas is again a single-line comment beginning with //# and containing a single keyword.
  • "//# NoMaker". Inside class C, this declaration suppresses the generation of C's make method, allowing a custom maker to be written instead (or none at all).

  • "//# ManualTag". Inside class C, this declaration suppresses the generation of C's getTag method, allowing a custom one to be written instead. In our Java front-end, we use this feature to return different tags for BinaryExpr depending on the expression's operation.

  • "//# PostMakeCall". Inside class C, this declaration adds the following line to the end of C's automatically generated make method:
     postMake();
    No implementation of postMake is generated. The intent is for the user to write postMake themselves, giving them a hook to customize the initialization of nodes after the child fields are filled in using the arguments to make. In our Java front-end, we use this feature in the maker for CompilationUnit to set the parent pointer of the TypeDecl objects passed as arguments.

  • "//# PostCheckCall". Similar to PostMakeCall, this declaration adds the following line to the end of automatically generated check methods:
     postCheck();
    As with postMake, no implementation is generated for postCheck, allowing the user to provide their own checking code. In our Java front-end, we use this feature to ensure that a Name has at least one identifier in it.

In place of these class-wide pragmas, an alternative design would have been for astgen to change the code it outputs based on whether a ClassDeclaration contains certain methods. For example, instead of the ManualTag pragma, astgen could generate a getTag method only for classes that do not contain a manually defined getTag method. In the future, we may change to this design (such a change would be backward compatible).
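The class-wide pragmas above can be illustrated with a small sketch of hand-written hooks alongside code of the kind astgen might generate. The classes below are simplified stand-ins for the front-end classes mentioned in the text; all field names, tag values, and method signatures are assumptions. In particular, the manual shows the added lines as postMake(); and postCheck();, while this sketch assumes a static make and therefore calls postMake on the newly created node.

```java
// ManualTag: getTag is written by hand, so the tag can depend on the operator.
final class SketchBinaryExpr {
    static final int ADD = 1;  // made-up tag values
    static final int SUB = 2;

    public char op;

    public int getTag() {
        return op == '+' ? ADD : SUB;
    }
}

final class SketchTypeDecl {
    public SketchCompilationUnit parent;  // set by postMake below
}

// PostMakeCall: the generated maker ends with a call to the postMake hook.
final class SketchCompilationUnit {
    public SketchTypeDecl[] elems;  // stands in for a child field

    // Hand-written hook: runs after the child fields are filled in.
    private void postMake() {
        for (SketchTypeDecl decl : elems) {
            decl.parent = this;
        }
    }

    // Roughly what a generated maker might look like (assumed shape).
    public static SketchCompilationUnit make(SketchTypeDecl[] elems) {
        SketchCompilationUnit result = new SketchCompilationUnit();
        result.elems = elems;
        result.postMake();  // the call added by the PostMakeCall pragma
        return result;
    }
}

// PostCheckCall: the generated check ends with a call to the postCheck hook.
final class SketchName {
    public String[] ids;

    // Hand-written hook: a Name must contain at least one identifier.
    private void postCheck() {
        if (ids.length == 0) {
            throw new IllegalStateException("Name has no identifiers");
        }
    }

    // Assumed shape of a generated check method.
    public void check() {
        if (ids == null) {
            throw new IllegalStateException("ids is null");
        }
        postCheck();  // the call added by the PostCheckCall pragma
    }
}
```

In each case the generated method stays generic and the class-specific behavior lives in a small hand-written method.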
