[From posting to src.sparta dated 24 Oct 96.]
Here's a half-baked research proposal.
There are different static program-analysis tools for finding errors
in programs. For example, there's lint and LCLint. A course
way to
categorize the techniques used by these tools is that their flow analysis
does not take IF and DO guards into consideration.
Way at the other side of the spectrum are program verification tools.
These attempt to prove the full functional correctness of programs
at the price of requiring programmers to submit lengthy specifications.
Tools like this are powered by mechanical theorem provers.
Somewhere in between lies ESC, which checks for certain kinds of errors
only. The technique used by ESC is, however, similar to that
of
program verifiers: the semantics of the program is taken into
account,
and the tool is powered by a mechanical theorem prover. We believe
that, as a more useful static analysis tool (that is, a tool that
is more effective in finding common errors quickly), ESC is in the
right
direction from program verifiers. We also believe that ESC can
provide
useful checking that lint-like tools cannot.
To be more useful than a program verifier, ESC has given up on
soundness in various areas. That is, a program can contain errors
that ESC does not catch. We have been rather concerned, however,
with soundness in what regards data abstraction and modifies lists.
Unfortunately, those features are rather expensive, both in the time
it takes programmers to write them and in the time it takes the prover
to churn through them. For example, we don't know how to write
meaningful
specifications about the parent/child data structures used in VBT's,
and our theorem prover goes off the deep end for many cyclic dependencies.
One way to try to make ESC more useful is to throw away soundness more
often. And what more a lucrative place to do so than the time-consuming,
hard-to-specify data abstraction that we provide.
ESC Lite is a proposal that falls on the lint side of ESC in the
spectrum of tools described above. The hope is to produce something
that is more useful than ESC, in that a programmer can spend less
effort than he would with ESC, both in writing specifications and
in waiting for the tool to find errors, and the tool would still find
common errors in programs. The essential idea is:
Use program verification techniques like ESC does,
but throw out
data abstraction, and with it, parts of soundness.
[ Caution: The mail gets more advanced from this point on. ]
To get to ESC Lite from ESC, follow these steps, outlined for a Java-like
language:
0. Throw out abstract variables, REP declarations, and dependencies.
1. Use object invariants and write modifiers a la KRML 74.
Object
invariants must satisfy the Relaxed Visibility Requirement
for
Object Invariants (see KRML 74) (but we may later
want to find
some exceptions to this rule, see the end of this
mail).
2. Methods are specified like they are in today's ESC, that is,
with REQUIRES, MODIFIES, and ENSURES clauses (but
without data
abstraction). A difference is that the precondition
for a method
is allowed to mention only those fields that a caller
can access
(thanks, Raymie!). For example, the precondition
of a "public"
method can mention only "public" fields, and a "private"
method can
mention any fields that the class declares.
The checking of a method implementation with respect
to its specification
is done like it is today (but I will explain the
semantics of method
calls below).
3. Method overrides are allowed to extend the original method
specification,
by providing extra MODIFIES and ENSURES clauses.
Suppose classes
"S" and "T" are declared as follows:
class S {
method m(self)
REQUIRES P
MODIFIES w
ENSURES Q
{
/* code goes here */
}
method n(self)
// ignore "n" for now; it will
REQUIES P
// be used in a later example
MODIFIES w
ENSURES Q
{
self.m();
}
};
class T extends S {
override m(self)
MODIFIES v
ENSURES R
{
/* code goes here */
}
.
.
};
With these declarations in scope, the implementation
of the override
"T::m" is checked to meet the specification
REQUIRES P
MODIFIES w, v
ENSURES Q /\ R
There is a restriction that "v" can mention only
fields that
are declared in class "T" (in particular, "v" must
satisfy the
Relaxed Visibility Requirement, or whatever rule
that we impose
on object invariants). Although I haven't
told you yet how method
calls are handled, you can imagine that allowing
MODIFIES clauses
to be extended is a source of unsoundness (and your
imagination
would then be correct). The restriction on
"v" is intended to help
soundness a little.
There is no problem with soundness in extending the
ENSURES clause.
Because the overridden method can have a larger
MODIFIES clause,
the ability to extend the ENSURES clause seems to
be a handy
feature (which answers the cynical question ``If
the feature is
sound, why provide it?'').
Notice that I don't allow the override to strengthen
the REQUIRES
clause. That would be unsound, and my feeling
is that such
unsoundness would cause the checker to miss some
important errors
in programs. (The override can, of course,
assume that the object
invariant given in the subtype holds on entry to
its overridden
methods.)
Subclasses of "T" that provide further overrides
of "m" are allowed
to extend the specification of "m" further.
The specification
used to check the new override at a type "Z" is
the one that results
from combining the original method specification
with all the
extensions given between the original class and
class "Z".
* Method calls are handled differently from the way ESC handles
them, in two ways.
The first difference between ESC's and ESC Lite's
handling of
method calls is that ESC Lite uses Java's access
control modifiers
to project MODIFIES and ENSURES clauses to the variables
that can
be accessed by callers. For example, consider
the following class:
class U {
private x: int;
public y: int;
public m(self)
REQUIRES Q
MODIFIES self.x, self.y
ENSURES R
.
.
};
where "R" is allowed to mention any variables (that
the class
can access), and also the primed variables "x'"
and "y'". In ESC,
we would translate a call to "m" into the guarded
command
ASSERT Q ;
VAR x', y' IN R --> x :=
x'; y := y' END
In ESC Lite, the translation for a ``public'' caller
(that is,
a caller that can access only the "public" declarations
of "U")
would be:
ASSERT Q ;
VAR x', y' IN R --> y :=
y' END
The difference is that, since "x" is a variable that
cannot be
accessed by the caller, "x" is not updated, and
so will not appear
as a target of the caller. The idea behind
this approach is that
the public caller doesn't stand a chance to itself
have a MODIFIES
clause that permits those modification of private
fields that the
callee will do. Why? Because these things
require data abstraction.
Similar translations would be done for other access
control levels,
not just for "private" fields and "public" methods
as I showed in
the example.
This approach is not sound, of course.
Advanced remark:
If it would ever be desirable
to ignore updates of a field
only at "self", this can
be done by adding to the translation:
x := store(x', self, x[self])
(End of Advanced remark.)
The second difference between ESC's and ESC Lite's
handling of
method calls is that, to use the example from above,
the semantics
of a method call "o.m()" depends on the \static/
type of "o".
If the static type of "o" is "S", then the specification
for
"S::m" is used; if the static type of "o" is "T",
then the
specification for "T::m" (that is, the specification
for "S::m" plus
the extension provided in class "T") is used, and
so on.
Using the static type of "o" to determine which specification
to use is not sound: if "o" has static type
"T", then
"NARROW(o, S).m()" (that is, "WIDEN(o, S).m()")
appears to the
checker to modify "v", whereas "o.m()" does not,
and yet both of
these will end up invoking the same implementation
of "m" at
runtime, and that implementation is allowed to modify
"v".
However, (0) allowing MODIFIES clauses to be extended
is not
sound to begin with, (1) I don't think this will
hide many
actual programming errors from the checker, and
(2) using the
static type of "o" instead of its dynamic type (which
would then
translate the MODIFIES clause roughly as ``"v" might
be modified
only if the dynamic type of "o" is a subtype of
"T"'') has the
advantage that it does not complain about the implementation
of
"S::n" (see above): for "S::n", the dynamic
type of "self" is
not known; it could be "T", and if it is, then the
checker knows
that "self.m()" might change "v", which is not allowed
by "S::n",
so the checker would complain.
I think implementations like "S::n" are common.
Advanced remark:
It is less common, however,
that class "T" would be in scope at
the time "S::n" is verified.
If "T" is not in scope, the
verification of "S::n" would
be unsound even if the dynamic type
of "self" were used (because
no information is available
that suggests to the checker
that "T::m" modifies more variables
than are declared by the
MODIFIES clause of "S::m". If one is
willing to live with that
unsoundness (in lieu of the unsoundness
of using the static type
of "self", as proposed), then there is
a possibility of getting
rid of the gripe about "S::n" in the
cases where "T" is visible
by paying attention to whether or not
"T" overrides "n".
If "T" does override "n", then "S::n" is
allowed to modify those
things allowed by "T::n"'s extension of
the MODIFIES list in the
cases where the dynamic type of "self"
is a subtype of "T".
Any error flagged by the checker at this
point is likely to be a
real error. That's good, but remember
that "T" is usually not
in scope when "S::n" is verified.
(End of Advanced remark.)
That's the proposal. In addition to discussing the things I said,
there is room to discuss a way to handle \inlined objects/. For
example,
consider the following class:
class V {
int x, y, dx, dy;
int xCenter, yCenter;
INVARIANT x <= xCenter <= x +
dx
/\ y <= yCenter <= y + dy
.
.
};
This declaration is fine, but suppose the first 4 fields were replaced
by an object:
class V' {
Rect r;
int xCenter, yCenter;
INVARIANT r # null
/\ r.x <= xCenter <= r.x + r.dx
/\ r.y <= yCenter <= r.y + r.dy
.
.
};
This object invariant is not allowed because it violates the (Relaxed)
Visibility Requirement for Object Invariants. One way to get
around
this would be to declare that "r.x", "r.y", "r.dx", and "r.dy" are
\inlined fields/ (or simply that "r" is an \inlined object/) that are
``part of the state of each "V'" object''. This would (0) allow
these
fields to be mentioned in the object invariant, and (1) would make
"r"
a pivot field (but we may choose not to have the checker do anything
about that).
Rustan