ESCJ 0: ESC Lite.

Rustan Leino.

[From posting to src.sparta dated 24 Oct 96.]

Here's a half-baked research proposal.

There are different static program-analysis tools for finding errors
in programs. For example, there's lint and LCLint. A course way to
categorize the techniques used by these tools is that their flow analysis
does not take IF and DO guards into consideration.

Way at the other side of the spectrum are program verification tools.
These attempt to prove the full functional correctness of programs
at the price of requiring programmers to submit lengthy specifications.
Tools like this are powered by mechanical theorem provers.

Somewhere in between lies ESC, which checks for certain kinds of errors
only. The technique used by ESC is, however, similar to that of
program verifiers: the semantics of the program is taken into account,
and the tool is powered by a mechanical theorem prover. We believe
that, as a more useful static analysis tool (that is, a tool that
is more effective in finding common errors quickly), ESC is in the right
direction from program verifiers. We also believe that ESC can provide
useful checking that lint-like tools cannot.

To be more useful than a program verifier, ESC has given up on
soundness in various areas. That is, a program can contain errors
that ESC does not catch. We have been rather concerned, however,
with soundness in what regards data abstraction and modifies lists.
Unfortunately, those features are rather expensive, both in the time
it takes programmers to write them and in the time it takes the prover
to churn through them. For example, we don't know how to write meaningful
specifications about the parent/child data structures used in VBT's,
and our theorem prover goes off the deep end for many cyclic dependencies.
One way to try to make ESC more useful is to throw away soundness more
often. And what more a lucrative place to do so than the time-consuming,
hard-to-specify data abstraction that we provide.

ESC Lite is a proposal that falls on the lint side of ESC in the
spectrum of tools described above. The hope is to produce something
that is more useful than ESC, in that a programmer can spend less
effort than he would with ESC, both in writing specifications and
in waiting for the tool to find errors, and the tool would still find
common errors in programs. The essential idea is:

Use program verification techniques like ESC does, but throw out
data abstraction, and with it, parts of soundness.

[ Caution: The mail gets more advanced from this point on. ]
To get to ESC Lite from ESC, follow these steps, outlined for a Java-like
language:

0. Throw out abstract variables, REP declarations, and dependencies.

1. Use object invariants and write modifiers a la KRML 74. Object
    invariants must satisfy the Relaxed Visibility Requirement for
    Object Invariants (see KRML 74) (but we may later want to find
    some exceptions to this rule, see the end of this mail).

2. Methods are specified like they are in today's ESC, that is,
    with REQUIRES, MODIFIES, and ENSURES clauses (but without data
    abstraction). A difference is that the precondition for a method
    is allowed to mention only those fields that a caller can access
    (thanks, Raymie!). For example, the precondition of a "public"
    method can mention only "public" fields, and a "private" method can
    mention any fields that the class declares.

    The checking of a method implementation with respect to its specification
    is done like it is today (but I will explain the semantics of method
    calls below).

3. Method overrides are allowed to extend the original method specification,
by providing extra MODIFIES and ENSURES clauses. Suppose classes
"S" and "T" are declared as follows:

        class S {
          method m(self)
            REQUIRES P
            MODIFIES w
            ENSURES Q
            {
              /* code goes here */
            }
          method n(self)                 // ignore "n" for now; it will
            REQUIES P                    // be used in a later example
            MODIFIES w
            ENSURES Q
            {
                self.m();
            }
        };

        class T extends S {
          override m(self)
            MODIFIES v
            ENSURES R
            {
              /* code goes here */
            }
          .
          .
        };

With these declarations in scope, the implementation of the override
"T::m" is checked to meet the specification

        REQUIRES P
        MODIFIES w, v
        ENSURES Q /\ R

    There is a restriction that "v" can mention only fields that
    are declared in class "T" (in particular, "v" must satisfy the
    Relaxed Visibility Requirement, or whatever rule that we impose
    on object invariants). Although I haven't told you yet how method
    calls are handled, you can imagine that allowing MODIFIES clauses
    to be extended is a source of unsoundness (and your imagination
    would then be correct). The restriction on "v" is intended to help
    soundness a little.

    There is no problem with soundness in extending the ENSURES clause.
    Because the overridden method can have a larger MODIFIES clause,
    the ability to extend the ENSURES clause seems to be a handy
    feature (which answers the cynical question ``If the feature is
    sound, why provide it?'').

    Notice that I don't allow the override to strengthen the REQUIRES
    clause. That would be unsound, and my feeling is that such
    unsoundness would cause the checker to miss some important errors
    in programs. (The override can, of course, assume that the object
    invariant given in the subtype holds on entry to its overridden
    methods.)

    Subclasses of "T" that provide further overrides of "m" are allowed
    to extend the specification of "m" further. The specification
    used to check the new override at a type "Z" is the one that results
    from combining the original method specification with all the
    extensions given between the original class and class "Z".

* Method calls are handled differently from the way ESC handles
them, in two ways.

    The first difference between ESC's and ESC Lite's handling of
    method calls is that ESC Lite uses Java's access control modifiers
    to project MODIFIES and ENSURES clauses to the variables that can
    be accessed by callers. For example, consider the following class:

        class U {
          private x: int;
          public y: int;
          public m(self)
            REQUIRES Q
            MODIFIES self.x, self.y
            ENSURES R
          .
          .
        };

    where "R" is allowed to mention any variables (that the class
    can access), and also the primed variables "x'" and "y'". In ESC,
    we would translate a call to "m" into the guarded command

ASSERT Q ;
VAR x', y' IN R --> x := x'; y := y' END

    In ESC Lite, the translation for a ``public'' caller (that is,
    a caller that can access only the "public" declarations of "U")
    would be:

ASSERT Q ;
VAR x', y' IN R --> y := y' END

    The difference is that, since "x" is a variable that cannot be
    accessed by the caller, "x" is not updated, and so will not appear
    as a target of the caller. The idea behind this approach is that
    the public caller doesn't stand a chance to itself have a MODIFIES
    clause that permits those modification of private fields that the
    callee will do. Why? Because these things require data abstraction.

    Similar translations would be done for other access control levels,
    not just for "private" fields and "public" methods as I showed in
    the example.

This approach is not sound, of course.

        Advanced remark:
        If it would ever be desirable to ignore updates of a field
        only at "self", this can be done by adding to the translation:

x := store(x', self, x[self])

(End of Advanced remark.)

    The second difference between ESC's and ESC Lite's handling of
    method calls is that, to use the example from above, the semantics
    of a method call "o.m()" depends on the \static/ type of "o".
    If the static type of "o" is "S", then the specification for
    "S::m" is used; if the static type of "o" is "T", then the
    specification for "T::m" (that is, the specification for "S::m" plus
    the extension provided in class "T") is used, and so on.

    Using the static type of "o" to determine which specification
    to use is not sound: if "o" has static type "T", then
    "NARROW(o, S).m()" (that is, "WIDEN(o, S).m()") appears to the
    checker to modify "v", whereas "o.m()" does not, and yet both of
    these will end up invoking the same implementation of "m" at
    runtime, and that implementation is allowed to modify "v".
    However, (0) allowing MODIFIES clauses to be extended is not
    sound to begin with, (1) I don't think this will hide many
    actual programming errors from the checker, and (2) using the
    static type of "o" instead of its dynamic type (which would then
    translate the MODIFIES clause roughly as ``"v" might be modified
    only if the dynamic type of "o" is a subtype of "T"'') has the
    advantage that it does not complain about the implementation of
    "S::n" (see above): for "S::n", the dynamic type of "self" is
    not known; it could be "T", and if it is, then the checker knows
    that "self.m()" might change "v", which is not allowed by "S::n",
    so the checker would complain.

I think implementations like "S::n" are common.

        Advanced remark:
        It is less common, however, that class "T" would be in scope at
        the time "S::n" is verified. If "T" is not in scope, the
        verification of "S::n" would be unsound even if the dynamic type
        of "self" were used (because no information is available
        that suggests to the checker that "T::m" modifies more variables
        than are declared by the MODIFIES clause of "S::m". If one is
        willing to live with that unsoundness (in lieu of the unsoundness
        of using the static type of "self", as proposed), then there is
        a possibility of getting rid of the gripe about "S::n" in the
        cases where "T" is visible by paying attention to whether or not
        "T" overrides "n". If "T" does override "n", then "S::n" is
        allowed to modify those things allowed by "T::n"'s extension of
        the MODIFIES list in the cases where the dynamic type of "self"
        is a subtype of "T". Any error flagged by the checker at this
        point is likely to be a real error. That's good, but remember
        that "T" is usually not in scope when "S::n" is verified.
        (End of Advanced remark.)

That's the proposal. In addition to discussing the things I said,
there is room to discuss a way to handle \inlined objects/. For example,
consider the following class:

    class V {
      int x, y, dx, dy;
      int xCenter, yCenter;
      INVARIANT x <= xCenter <= x + dx
             /\ y <= yCenter <= y + dy
      .
      .
    };

This declaration is fine, but suppose the first 4 fields were replaced
by an object:

    class V' {
      Rect r;
      int xCenter, yCenter;
      INVARIANT r # null
             /\ r.x <= xCenter <= r.x + r.dx
             /\ r.y <= yCenter <= r.y + r.dy
      .
      .
    };

This object invariant is not allowed because it violates the (Relaxed)
Visibility Requirement for Object Invariants. One way to get around
this would be to declare that "r.x", "r.y", "r.dx", and "r.dy" are
\inlined fields/ (or simply that "r" is an \inlined object/) that are
``part of the state of each "V'" object''. This would (0) allow these
fields to be mentioned in the object invariant, and (1) would make "r"
a pivot field (but we may choose not to have the checker do anything
about that).

Rustan

Legal Statement Privacy Statement