ESCJ 10: Locking in Java.

Raymie Stata.

[From post to src.sparta dated 16 Oct 96.]

The following is a discussion of the "thread re-entrant" property of
Java's locking primitives and its implications for program design and
ESC/Java. It is a joint effort of Rustan Leino and myself and
captures conversations among myself, Rustan, Roy Levin, Paul McJones,
and Greg Nelson.

** Java's synchronized statement

In Java, a thread can acquire the same lock multiple times (it has to
release it the same number of times before another thread can acquire
it). These are known as "thread re-entrant monitors," as against the
"non-re-entrant" locks found in Modula-3 where a thread will block
forever if it attempts to acquire a lock it already holds.

In Java, locks are usually described as having a counter indicating
how many times the current thread has acquired it. Rustan proposed a
translation for Java's synchronized statement into guarded commands
that does not require the introduction of explicit counters:

synchronized (m) { S } ==>
    var LL_0 in
      LL_0 := LL
      LL := LL \union { m }
      try S finally LL := LL_0 end
      end

This translation depends on the fact that Java does not have routines
analogous to Thread.Acquire and .Release, but only has a synchronized
statement that matches an acquire with a release.

** Preconditions on "Thread.Acquire" that avoid deadlocks

In ESC/Modula-3, to keep threads from deadlocking with themselves, we
would specify Thread.Acquire(m) as having the precondition that "m
\not\in LL" (where LL is the set of locks held by the current thread).
In Java, a thread cannot deadlock with itself, so no precondition is
needed to avoid this problem.

ESC/Modula-3 goes further in trying to prevent a thread from
deadlocking with other threads. The standard approach to deadlock
avoidance is to define a partial order on locks and require that
threads acquire locks according to this order. This leads to a
stronger precondition for Thread.Acquire(m): "Sup(LL) < m" (ie, "the
'largest' lock held by the current thread ('Sup(LL)') is smaller than
m"). In ESC/Java, one can liberalize this. The more obvious
liberalization would be: "Sup(LL) <= m," but an even more liberal
precondition works: "Sup(LL) < m \/ m \in LL".

Adding this precondition to the translation of synchronized proposed
above:

synchronized (m) { S } ==>
    var LL_0 in
      assert Sup(LL) < m \/ m \in LL   //// Add some checking...
      LL_0 := LL
      LL := LL \union { m }
      try S finally LL := LL_0 end
      end

** Thread re-entrant locks and preconditions on routines

To follow the deadlock-avoidance methodology outlined above, routines
that acquire locks usually have preconditions that help ensure that the
routines will acquire locks in the correct order.

With Modula-3-like, non-re-entrant locks, such preconditions follow
the pattern "Sup(LL) < MU", where MU is the lowest lock in the locking
order that the routine might want to acquire. With Java's thread
re-entrant locks, there are three patterns that such preconditions may
take:

   1. Sup(LL) < MU \/ MU \in LL
   2. Sup(LL) <= MU
   3. Sup(LL) < MU (Note: this is the Modula-3 pattern)

We consider the pro's and con's of each pattern in turn.

Preconditions patterned after (1) allow routines routines to be
called under the widest set of circumstances. While the other two
patterns rule out callers who hold a lock greater than MU in the
locking order, (1) does not.

The trade-off, however, is that routines with preconditions patterned
after (1) can acquire only MU and no other locks. To see why, imagine
that the routine tried to acquire a lock MU2 in addition to MU. Before
MU is acquired, all we know about the locking set is "Sup(LL) < MU \/
MU \in LL." This condition does not imply "Sup(LL) < MU2 \/ MU2 \in
LL", even for "MU < MU2", so MU2 can not be safely acquired before MU.
After MU is acquired, all we know is that "MU \in LL /\ (MU \in LL_0
\/ Sup(LL) = MU)". Again, this does not imply that "Sup(LL) < MU2 \/
MU2 \in LL", even for "MU < MU2" (because of the disjunct "MU \in
LL_0"), so MU2 can not be safely acquired after MU either.

This implies that when routines are specified with preconditions
patterned after (1), the preconditions must mention every lock the
routine may want to acquire. This is often too much information,
revealing locks that should be hidden from clients. Further, it makes
it impossible to change the locks used by a routine's implementation
without changing its specification (potentially breaking clients of
the routine).

Preconditions patterned after (2) do not have this limitation: such
preconditions allow a routine to lock any lock after MU in the locking
order and thus do not have to mention every lock that might be
acquired by the routine. However, this added flexibility for the
implementation comes at the expense of flexibility for clients by
restricting calls to program states in which nothing greater than MU
has been previously acquired.

Preconditions patterned after (2) have another problem, a problem
shared by preconditions patterned after (1): they make it impossible
to assume monitor invariants inside the bodies of routines. A monitor
invariant is a relation among the variables protected by the monitor.
Threads establish a lock's monitor invariant before they leave the
monitor; this means that a thread entering the monitor may assume the
monitor invariant. However, with thread re-entrant locks, a thread
can only assume a monitor invariant when it enters the monitor for the
"first time" (ie, whenever the thread increments the lock's count from
zero to one). When the thread re-enters the monitor, there is no
guarantee that the monitor's invariant has been re-established, so it
cannot be assumed. Connecting this back to preconditions,
preconditions patterned after (1) and (2) allow "MU \in LL" on entry
to a routine; this means that when the routine acquires MU, it might
not be doing so for the first time, so it cannot assume the monitor
invariant.

Preconditions patterned after (3) do not have the limitations of
either (1) or (2): they allow routines to acquire locks in addition to
MU, and they allow routines to assume the monitor invariant associated
with MU with MU is acquired. However, preconditions patterned after
(3) are the most restrictive in terms of the program states in which
they allows routines to be called: they restrict calls to program
states in which nothing greater than MU nor MU itself have been
locked. When preconditions patterned after (3) are used, the
re-entrant property of monitors is not used; this explains why, as
noted above, pattern (3) is the pattern used in Modula-3.

(If programs re-established monitor invariants at every procedure-call
in addition to every monitor exit, then the monitor invariant could be
safely assumed _every_ time a thread entered. This would solve the
monitor-invariant problem associated with (1) and (2). However, it
seems impractical to require that the code inside a monitor
re-establish the monitor invariant before calling, eg, a library's
square-root routine.)

** Thread re-entrant locks and program design

Roy Levin, Paul McJones, and Greg Nelson all agree that (in Roy's
words) "thread re-entrant monitors are evil." They say this because
of the monitor invariant issue: when writing large systems with thread
re-entrant locks, it is easy to lose track of where one can and cannot
assume the monitor invariant, leading to errors that are difficult to
find.

Greg points out that thread re-entrant locks seem like a convenience
at first. For example:

o With non-re-entrant locks, it is often necessary to write two
    versions of a routine, one that grabs the lock first and another
    that assumes the lock has already been grabbed. With re-entrant
    locks, only one version is needed.

o With non-re-entrant locks, one is often faced with a routine R
    that calls another routine R' that eventually (through a long
    series of calls) wants to acquire a lock already acquired by the
    original routine R. The fix for these situations often requires
    some awkward coding. With thread re-entrant locks, R' is
    allowed to reacquire the lock so no awkward fix is required.

These and other situations lead people to advocate re-entrant locks.
However, although re-entrant locks can be a convenience, they turn out
to cause more trouble than they are worth. Greg explains that when
locks are not re-entrant, errors lead to deadlocks, which are
relatively easy to find. When locks are re-entrant, errors lead to
corrupted state that might not manifest itself visibly until program
execution has wandered far from the actual error, making errors
difficult to find. Greg points to Ivy as a system in which thread
re-entrant monitors led to many difficult-to-find, corrupted-state
bugs.

The difficulties associated with thread re-entrant locks suggests that
Java programs should avoid using the re-entrant property of locks.
This means that preconditions patterned after (3) should be preferred.
It also suggests that perhaps re-entrant uses of locks should be ruled
out in ESC/Java by rewriting synchronized statements as follows:

synchronized (m) { S } ==>
    var LL_0 in
      assert Sup(LL) < m   //// Stronger check
      LL_0 := LL
      LL := LL \union { m }
      try S finally LL := LL_0 end
      end

Or, as a compromise, the above rewrite could be used as a default and
an annotation can be used to indicate where the more liberal rewrite
should be used:

synchronized /* reentrant */ (m) { S } ==>
    var LL_0 in
      assert Sup(LL) < m \/ m \in LL   //// More liberal check
      LL_0 := LL
      LL := LL \union { m }
      try S finally LL := LL_0 end
      end

Legal Statement Privacy Statement