ESCJ 2: Improving the safety of Java

Raymie Stata

December 2, 1995

A central goal in the design of the Java programming language is robustness: ``early checking for possible problems, later dynamic (run-time) checking, and eliminating situations that are error prone'' ([Java95]). This paper suggests five changes to Java aimed at improving its robustness. Most of these changes are backward compatible with the existing Java specification.

Many of the above changes involve moving run-time checks to compile-time. While run-time checks detect errors in a particular execution of a program, compile-time checks detect errors in all executions of a program, increasing confidence in the correctness of the program. Also, compile-time checking eliminates run-time overhead incurred by run-time checks. The third change introduces a run-time check were no check is currently done. The first change moves a compile-time check to run-time. In this case, the compile-time check is too restrictive, forcing programmers into an unsafe work-around that eliminates all checking. In this case, a run-time check improves robustness of Java programs.

Table of contents:

  1. Move local-variable initialization checking to run-time
  2. Add Maybe types for compile-time null checking
  3. Add arithmetic exceptions
  4. Add parametric polymorphism
  5. Eliminate the array subtyping rule

1. Move initialization checking to run-time

Java has a compile-time check to ensure that local variables are initialized before use (this section pertains only to local variables, not to static or instance variables). Although compile-time checks are generally preferable to run-time checks, we recommend moving this particular check to run-time. The compile-time check is too strict, forcing programmers into an unsafe work around. Moving this check to run-time eliminates the need for this work around, improving the reliability of Java programs. Moving this check to run-time is backward compatible with the current Java spec, and the performance penalty is small because the check can be optimized away in most cases.

In general, it is impossible to determine statically if a variable is initialized. Thus, Java's check is defined using a conservative, all-paths test: a use of a variable is illegal if the variable is not initialized on all paths leading up to the use. This test is too strict; it rules out code programmers need to write. For example, consider the following:

     void foo(String args[]) {
       int j;
       if (..some predicate..) j = 0;
       // ...more conditional code that might or might
       //    not set j...

       int i;
       if (..true only if j has been set by previous stmts..)
         i = j; // Java complains about this because it
                //   doesn't think j is initialized
       else i = 10;
       // ...
     }
This method is illegal in Java because j is not initialized on all paths leading to the point at which j is used. However, conditional initialization as in the above example is inevitable. Java's strict check forces programmers to work around Java's strict checking by initializing variables like j with dummy values. An example of such a work around can be found in the MethodExpression class of the Java compiler:
     public long checkValue(Environment env, Context ctx, long vset) {
       ClassDeclaration c = null; // This is a dummy initialization

       ...34-lines of complicated, nested-if statements in which
          c is initialized...

       ...15 more lines...

       clazz = c.getClassDefinition(env); // Assumes c not null!!
       ...
     }
The use of c near the end of the method body assumes that it is initialized by the nested if statements. However, the nested if statements are sufficiently complicated that it is not trivial to determine if they in fact initialize c. And even when such code is correct, it is fragile: a maintenance programmer making a change to the if statements can easily create a situation in which c is not initialized.

The dummy-value work around is particularly dangerous when variables like j or c are stored into data structures. If the conditional initialization code is buggy, the dummy value might be stored into the data structure, corrupting the structure's integrity. Finding such bugs is time consuming because the corruption often does not manifest itself until program execution has continued a long distance past the buggy code.

A compile-time error is appropriate if a variable is not initialized on any path before it is used. However, only a warning is appropriate if a variable is not initialized on all paths. A compile-time error in the all-paths case forces programmers into the dummy-value work around, which has the effect of removing all checking for uninitialized variables just when it is needed most: when complicated conditionals are involved. Instead of issuing an error, the compiler should generate code to check at run time for uses of variables that are not initialized on all paths.

2. Add Maybe types

In Java, the null reference can be used anywhere an object is expected. This requires run-time checks to ensure that null is not dereferenced. We recommend a change to the type-system in which programmers explicitly indicate when null is allowed. This change moves most null checks from run-time to compile-time, improving both performance and reliability. This change also makes programs easier to understand. This change can be made in an backward compatible manner.

In CLU-derived languages such as Argus ([Liskov88]) and Theta ([Liskov95]), programmers must indicate explicitly when null can be assigned to a variable or stored into a data structure. This is done by introducing the type Maybe[T]. The null reference can be assigned to Maybe[T], but not to T. Uses of Maybe[T] require run-time checks for null, but uses of T do not. Experience in CLU-derived languages indicates that T is used most of the time, improving performance and increasing reliability. Research work on moving null checking in C and Modula-3 to compile-time support the CLU experience ([Evans96][Detlefs96]).

In addition to removing run-time checks, Maybe types make programs easier to understand. In the Java implementation, the method inline of the class ArrayExpression returns an Expression:

     Expression inline(...) { ... }
Can this method return null? The name of the method suggests no: when an expression is inlined, one might not expect to get null back. But a closer inspection of the code indicates that it can in fact return null. If Java had Maybe types, questions such as this could be answered by simply looking at signatures rather than by looking at code.

We recommend the type expression T ?x for Maybe types in Java. This syntax fits Java's C genealogy: the type qualifier ? would be a type-constructor like * is in C. Under this proposal, null could be assigned to T ?, but not to T. If backward compatibility is desired, the syntax T !x could be used to indicate that null can not be assigned, and the meaning of T x could remain as-is (i.e., null can be assigned). The T ? syntax has the advantage of using the simpler syntax for the common case and also of having the safer default. It is not backward compatible, but a mechanical translator could translate old programs.

3. Add arithmetic exceptions

The current Java specification has wrap-around semantics for integer-arithmetic overflows. For example, adding one to the maximum-sized integer results in the minimum-sized integer. We recommend adding a run-time check for overflows and other arithmetic exceptions, {\eg}, adding one to the maximum-sized integer should raise an exception. This change will catch hard-to-find errors in programs. Although this change is not backward compatible, the incompatibility should not affect most programs.

Most Java programs will not rely on Java's wrap-around semantics. Instead, programs will assume that overflow does not occur. This is dangerous. If buggy code causes an overflow, the error will not be detected; instead, corrupt data will be passed further and further away from the source of the error. As mentioned earlier, this kind of error is hard to debug.

The argument against arithmetic exceptions is that they impose a performance penalty. However, on modern architectures with support for overflow detection, it is not clear that this penalty is significant. Further, the reliability/performance trade-off is a trade-off that should be decided by the end-user, not the language. If the user of a Java applet wants to run fast-and-lose, she can turn off overflow checking (and other run-time checking such as array-bounds checking). A more cautious user might decide to take the penalty and run more safely.

Adding arithmetic exceptions to Java is not backward compatible. However, we feel most programs just assume overflows do not occur and do not rely on wrap-around semantics. This means the change will not be burdensome on the installed base of programs.

4. Add parameterized types

Currently, Java has no facility for defining parameterized types. We recommend adding such a facility. Such a facility will promote a more careful typing of programs by programmers, eliminating many run-time type checks and allowing more errors to be caught at compile-time.

Hashtable, Java's hashtable class, illustrates the benefits of parameterized types. Hashtable maps keys to values. In Hashtable, both keys and values have type Object, which effectively means they are untyped, which in turn means that dynamic type-checks are needed. For example, in the Identifier class of the Java compiler, a Hashtable is used to map Strings to Identifiers. Because Hashtable is untyped, the code for lookup must downcast elements of this table to Identifiers:

     public static synchronized Identifier lookup(String s) \{
        Identifier id = (Identifier)hash.get(s); // Note downcast
        ...
     }
With downcasts of this sort require run-time checks, which imply both the possibility of an exception being raised and run-time overhead.

When using Hashtables, the types of both the keys and the values are usually known at compile-time. Parameterized types would allow programmers to express this information so it can be checked at compile-time. This would eliminate the need for downcasts and the associated run-time checks. This analysis applies to Vector and Enumeration, two other commonly-used Java classes that require extensive run-time type checking. For example, the Enumeration class is used over fifty times in the Java source code, almost all of which involve a run-time type check that could easily be moved to compile-time.

It is beyond the scope of this document to propose a Java extension for parameterized types. [Day95] contains a valuable discussion about the design of parameterization mechanisms and should be consulted. It is available via ftp .

5. Eliminate array subtyping

Java has a special subtyping rule for array types. If type A is a supertype of B, then A[] is a supertype of B[]. This rule requires a run-time type check every time an element is stored into an array. We propose eliminating this rule, replacing it instead with new parameterized types that eliminate the run-time checks. While this change is not backward compatible, it makes Java both faster and more robust.

Java's array subtyping rule is unsound due to the the infamous ``contravariant'' rule for subtyping ([Cardelli88]). Thus, run-time checking is required. To see where, consider the following example:

    
     class A { A(){}; };

     class B extends A {
       B() {};

       public static void main(String args[]) {
         B[] x = new B[1];
         A[] y = x;
         y[0] = new A(); // bug!!
       }
     };
On the line marked ``bug,'' an instance of A is stored into an array of B. This is an error because A is not a subtype of B. The Alpha-3 Java implementation raises a run-time exception on this program:
     java.lang.IncompatibleTypeException
            at B.main(Test.java:9)
In general, when storing an element into an array, a run-time check is needed to ensure that a supertype object isn't being stored into a subtype array. (The Java spec says nothing about this problem in its discussion of arrays or of IncompatibleTypeException.)

The subtype rule for arrays-and its associated run-time check-can be eliminated if parameterized types were available. With parameterized types, programmers can type programs more carefully, allowing the compiler to check for errors. For example, we can add a parameterized Sequence type. A Sequence of B is a read-only supertype of B[], i.e., elements can be read from a Sequence but not write into them. A Sequence of B can safely be passed where a Sequence of A is expected, taking care of many of the places in which the original array subtype rule would be used. Although eliminating the array subtype rule would not be backward compatible, we feel the resulting improvement in robustness and performance justify the change.

References

Cardelli88
 L. Cardelli. A semantics of multiple inheritance. Inf. and Computation, 76(1/2):138-64. Academic Press, Feb./Mar. 1988.
Day95
 M. Day, R. Gruber, B. Liskov, and A. Myers. Subtypes vs. where clauses: constraining parametric polymorphism. OOPSLA '95 Conf. Proceedings (Austin, TX). Published as ACM SIGPLAN Notices, 30(10):156-68. ACM, Oct. 1995. ftp://ftp.pmg.lcs.mit.edu/pub/thor/where-clauses.ps.gz.
Detlefs96
 D. L. Detlefs. An overview of the extended static checking system. ACM, Jan. 1996. To appear.
Evans96
 D. Evans. Static detection of dynamic memory errors. Submitted for publication.
Java95
 Sun Microsystems. The Java language: a white paper. Sun Microsystems, Inc., 1995. https://java.sun.com/1.0alpha3/doc/overview/java/.
Liskov88
 B. Liskov. Distributed programming in Argus. Communications of the ACM, 31(3):300-12. ACM, Mar. 1988.
Liskov95
 B. Liskov, D. Curtis, M. Day, S. Ghemawat, R. Gruber, P. Johnson, and A. Myers. Theta reference manual. MIT LCS, PMG memo 88, Feb. 1995. https://www.pmg.lcs.mit.edu/Theta.html.

Legal Statement Privacy Statement