December 2, 1995
A central goal in the design of the Java programming language is robustness: ``early checking for possible problems, later dynamic (run-time) checking, and eliminating situations that are error prone'' ([Java95]). This paper suggests five changes to Java aimed at improving its robustness. Most of these changes are backward compatible with the existing Java specification.
Many of these changes involve moving run-time checks to compile time. While run-time checks detect errors in a particular execution of a program, compile-time checks detect errors in all executions, increasing confidence in the program's correctness. Compile-time checking also eliminates the overhead incurred by run-time checks. Two of the changes move in the other direction: the third change introduces a run-time check where no check is currently done, and the first change moves a compile-time check to run time. In the latter case, the existing compile-time check is too restrictive, forcing programmers into an unsafe work-around that eliminates all checking; there, a run-time check improves the robustness of Java programs.
In general, it is impossible to determine statically if a variable is initialized. Thus, Java's check is defined using a conservative, all-paths test: a use of a variable is illegal if the variable is not initialized on all paths leading up to the use. This test is too strict; it rules out code programmers need to write. For example, consider the following:
    void foo(String args[]) {
        int j;
        if (..some predicate..)
            j = 0;
        // ...more conditional code that might or might not set j...
        int i;
        if (..true only if j has been set by previous stmts..)
            i = j;    // Java complains about this because it
                      // doesn't think j is initialized
        else
            i = 10;
        // ...
    }

This method is illegal in Java because j is not initialized on all paths leading to the point at which j is used. However, conditional initialization like this is sometimes unavoidable, and Java's strict check forces programmers to work around it by initializing variables like j with dummy values. An example of such a work-around can be found in the MethodExpression class of the Java compiler:
    public long checkValue(Environment env, Context ctx, long vset) {
        ClassDeclaration c = null;   // This is a dummy initialization
        ...34 lines of complicated, nested-if statements in which c is initialized...
        ...15 more lines...
        clazz = c.getClassDefinition(env);   // Assumes c not null!!
        ...
    }

The use of c near the end of the method body assumes that it has been initialized by the nested if statements. However, the nested if statements are sufficiently complicated that it is not trivial to determine whether they in fact initialize c. And even when such code is correct, it is fragile: a maintenance programmer making a change to the if statements can easily create a situation in which c is not initialized.
The dummy-value work around is particularly dangerous when variables like j or c are stored into data structures. If the conditional initialization code is buggy, the dummy value might be stored into the data structure, corrupting the structure's integrity. Finding such bugs is time consuming because the corruption often does not manifest itself until program execution has continued a long distance past the buggy code.
A compile-time error is appropriate if a variable is not initialized on any path before it is used. However, only a warning is appropriate if a variable is not initialized on all paths. A compile-time error in the all-paths case forces programmers into the dummy-value work around, which has the effect of removing all checking for uninitialized variables just when it is needed most: when complicated conditionals are involved. Instead of issuing an error, the compiler should generate code to check at run time for uses of variables that are not initialized on all paths.
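To sketch what such generated code might look like for the foo example above, here is a hand-translation that uses an explicit definite-assignment flag. The flag, its placement, and the class name InitCheck are our illustration of the idea, not actual compiler output:

```java
public class InitCheck {
    // A hand-translation (hypothetical) of foo with a
    // compiler-generated definite-assignment check for j.
    static int foo(boolean pred) {
        int j = 0;              // storage only; not a real initialization
        boolean jSet = false;   // compiler-generated flag for j
        if (pred) {
            j = 5;
            jSet = true;        // flag updated at every assignment to j
        }
        int i;
        if (pred) {             // ..true only if j has been set..
            if (!jSet)          // generated run-time check
                throw new RuntimeException("j used before initialization");
            i = j;
        } else {
            i = 10;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(foo(true));    // prints 5
        System.out.println(foo(false));   // prints 10
    }
}
```

In correct code the flag test never fails and an optimizer can often remove it; in buggy code it raises an exception at the point of use rather than letting a dummy value escape.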
In CLU-derived languages such as Argus ([Liskov88]) and Theta ([Liskov95]), programmers must indicate explicitly when null can be assigned to a variable or stored into a data structure. This is done by introducing the type Maybe[T]. The null reference can be assigned to Maybe[T], but not to T. Uses of Maybe[T] require run-time checks for null, but uses of T do not. Experience in CLU-derived languages indicates that T is used most of the time, improving performance and increasing reliability. Research work on moving null checking in C and Modula-3 to compile time supports the CLU experience ([Evans96][Detlefs96]).
In addition to removing run-time checks, Maybe types make programs easier to understand. In the Java implementation, the method inline of the class ArrayExpression returns an Expression:
    Expression inline(...) {
        ...
    }

Can this method return null? The name of the method suggests no: when an expression is inlined, one might not expect to get null back. But a closer inspection of the code shows that it can in fact return null. If Java had Maybe types, questions like this could be answered simply by looking at signatures rather than by reading code.
We recommend the type expression T ?x for Maybe types in Java. This syntax fits Java's C genealogy: the type qualifier ? would be a type constructor, as * is in C. Under this proposal, null could be assigned to T ?, but not to T. If backward compatibility is desired, the syntax T !x could instead be used to indicate that null cannot be assigned, with the meaning of T x remaining as-is (i.e., null can be assigned). The T ? syntax has the advantage of giving the simpler syntax to the common case and of having the safer default. It is not backward compatible, but a mechanical translator could translate old programs.
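Under the proposed syntax, the inline example might be written in one of two ways (hypothetical syntax, not legal Java):

```
Expression ?inline(...)   // result may be null; callers must check for it
Expression inline(...)    // result is never null; no check is needed

String s  = null;   // compile-time error under this proposal
String ?t = null;   // legal: t has a Maybe type
```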
Most Java programs will not rely on Java's wrap-around semantics. Instead, programs will assume that overflow does not occur. This is dangerous. If buggy code causes an overflow, the error will not be detected; instead, corrupt data will be passed further and further away from the source of the error. As mentioned earlier, this kind of error is hard to debug.
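For example, integer addition in Java wraps around silently. A minimal demonstration (the class name WrapDemo and the add helper are ours):

```java
public class WrapDemo {
    // Java's + performs no overflow check; out-of-range results wrap.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;       // 2147483647
        System.out.println(add(big, 1));   // prints -2147483648: wrapped
    }
}
```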
The argument against arithmetic exceptions is that they impose a performance penalty. However, on modern architectures with support for overflow detection, it is not clear that this penalty is significant. Further, the reliability/performance trade-off should be decided by the end user, not the language. If the user of a Java applet wants to run fast and loose, she can turn off overflow checking (and other run-time checking, such as array-bounds checking). A more cautious user might decide to take the penalty and run more safely.
Adding arithmetic exceptions to Java is not backward compatible. However, we feel most programs just assume overflows do not occur and do not rely on wrap-around semantics. This means the change will not be burdensome on the installed base of programs.
Hashtable, Java's hashtable class, illustrates the benefits of parameterized types. Hashtable maps keys to values. In Hashtable, both keys and values have type Object, which effectively means they are untyped, which in turn means that dynamic type-checks are needed. For example, in the Identifier class of the Java compiler, a Hashtable is used to map Strings to Identifiers. Because Hashtable is untyped, the code for lookup must downcast elements of this table to Identifiers:
    public static synchronized Identifier lookup(String s) {
        Identifier id = (Identifier)hash.get(s);   // Note downcast
        ...
    }

Downcasts of this sort require run-time checks, which imply both the possibility of an exception being raised and run-time overhead.
When using Hashtables, the types of both the keys and the values are usually known at compile-time. Parameterized types would allow programmers to express this information so it can be checked at compile-time. This would eliminate the need for downcasts and the associated run-time checks. This analysis applies to Vector and Enumeration, two other commonly-used Java classes that require extensive run-time type checking. For example, the Enumeration class is used over fifty times in the Java source code, almost all of which involve a run-time type check that could easily be moved to compile-time.
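The run-time nature of these checks is easy to demonstrate. In the following sketch, LookupDemo and its lookup method are hypothetical stand-ins for the compiler's code; the point is that the wrong value type compiles fine and fails only at the cast:

```java
import java.util.Hashtable;

public class LookupDemo {
    static Hashtable hash = new Hashtable();   // untyped: Object -> Object

    // The downcast always compiles; the type check is deferred to run time.
    static String lookup(String key) {
        return (String) hash.get(key);         // may raise ClassCastException
    }

    public static void main(String[] args) {
        hash.put("count", Integer.valueOf(1)); // a non-String value sneaks in
        try {
            lookup("count");
        } catch (ClassCastException e) {
            System.out.println("downcast failed at run time");
        }
    }
}
```

With parameterized types, the mismatched put would be rejected at compile time and the cast in lookup would disappear entirely.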
It is beyond the scope of this document to propose a Java extension for parameterized types. [Day95] contains a valuable discussion of the design of parameterization mechanisms and should be consulted. It is available via ftp.
Java's array subtyping rule is unsound due to the infamous ``contravariant'' rule for subtyping ([Cardelli88]). Thus, run-time checking is required. To see where, consider the following example:
    class A {
        A() {}
    }

    class B extends A {
        B() {}

        public static void main(String args[]) {
            B[] x = new B[1];
            A[] y = x;
            y[0] = new A();   // bug!!
        }
    }

On the line marked ``bug,'' an instance of A is stored into an array of B. This is an error because A is not a subtype of B. The Alpha-3 Java implementation raises a run-time exception on this program:
    java.lang.IncompatibleTypeException
            at B.main(Test.java:9)

In general, when storing an element into an array, a run-time check is needed to ensure that a supertype object is not being stored into a subtype array. (The Java spec says nothing about this problem in its discussion of arrays or of IncompatibleTypeException.)
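The implicit check can be spelled out by hand roughly as follows. Here checkedStore is a hypothetical helper of our own, and in later Java releases the exception raised by the built-in check is called ArrayStoreException:

```java
class A {
    A() {}
}

class B extends A {
    B() {}
}

public class StoreCheck {
    // Spells out the check the VM performs implicitly on every
    // store into an object array.
    static void checkedStore(Object[] arr, int i, Object v) {
        Class elem = arr.getClass().getComponentType();
        if (v != null && !elem.isInstance(v))
            throw new ArrayStoreException(v.getClass().getName());
        arr[i] = v;   // now known to be type-safe
    }

    public static void main(String[] args) {
        B[] x = new B[1];
        A[] y = x;                           // legal under the array subtype rule
        try {
            checkedStore(y, 0, new A());     // supertype into subtype array
        } catch (ArrayStoreException e) {
            System.out.println("store rejected at run time");
        }
        checkedStore(y, 0, new B());         // a genuine B is accepted
    }
}
```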
The subtype rule for arrays, and its associated run-time check, could be eliminated if parameterized types were available. With parameterized types, programmers can type programs more precisely, allowing the compiler to check for errors. For example, we can add a parameterized Sequence type. A Sequence of B is a read-only supertype of B[], i.e., elements can be read from a Sequence but not written into it. A Sequence of B can safely be passed where a Sequence of A is expected, covering many of the places in which the original array subtype rule would be used. Although eliminating the array subtype rule would not be backward compatible, we feel the resulting improvement in robustness and performance justifies the change.
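To illustrate, the earlier example might be typed with a Sequence as follows (hypothetical syntax; the Sequence name, the bracket parameter notation, and the fetch operation are our own inventions):

```
Sequence[B] s = new B[1];   // a B[] can be viewed as a Sequence[B]
Sequence[A] t = s;          // safe: a Sequence is read-only
A a = t.fetch(0);           // reading a B where an A is expected is fine
t.store(0, new A());        // compile-time error: Sequence has no store
```

Because a Sequence offers no store operation, the unsound store in the original example is rejected at compile time instead of at run time.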