Cofoja contracts, checkpoints and recursion

(Nhat Minh Lê (rz0) @ 2011-03-20 10:51:47)

Since the release of Cofoja, there have been several changes, some important, some less so. One of the major things I’ve been working on recently (and which is, as I write, under review and should make it to the main branch shortly) is a new implementation of contract checkpoints: that is, where contracts are checked at run time in the final code.

In this short blog post, I will go through the various options we have considered, what we did before, what we will do now, and how it is implemented.

A matter of semantics

On the surface, contract semantics are pretty straightforward, even more so in Cofoja as we have tried to keep it simple by sticking to the most fundamental constructs:

While preconditions and postconditions are mostly just that, this description of invariants hides a small but important issue: they act as both pre-and-postconditions to what? Since invariants are characteristic of a class and not a method, we can legitimately ask: should invariants apply to all methods within the given class? And the answer is, as with most matters of fine semantics, that there is no definite answer, it depends on what we want. The general idea is that we want invariants to hold at those points in time where the object is in a "stable" state, one where it is expected to be behave consistently with respect to its specification. So we want to preclude any internal manipulation that goes on inside our object from checking invariants: they most probably wouldn’t hold there. The real question is thus what constitutes an internal operation. I am pretty sure the question has been discussed to death in contract programming circles; since I am not really a contract programming specialist, here I will simply give a not-so-brief overview of the different options that have been considered for Cofoja, with some Java examples to illustrate what problems they tried to address, and what we settled with in the end.

Perhaps the most important distinction that can be made between strategies lies in the decision to treat all calls to a method in the same way or apply invariant checks to some and not others. It should be noted that while it is tempting to associate whole-method semantics to a compile-time choice and per-call discrimination to a run-time behavior, some call-site-oriented approaches do not require information beyond what would be (easily) accessible to a compiler (such an example is given below).

A simple static strategy

Originally, Cofoja, similarly to its predecessor Modern Jass, used a simple rule: invariants would apply to all public or package-private methods. I know, True Java Hackers will probably hate us because of the inconsistency of that rule: package-private methods have a lesser scope than protected methods, so why check contracts on these and not protected ones? It doesn’t make sense!

The reason we had such a rule was because, we’ve noticed that, in practice, classes with methods internal to a package have their own intra-package specifications that we want to enforce with regard to other classes in the package. Conversely, protected methods, even though strictly broader in scope than their package-private counterparts, are most often used to provide utility routines to implementors who may want to write child classes; in such cases, these protected methods are likely to be called from deep within child classes, when the object is possibly in an inconsistent state. Hence the weird rule. But as awkward as it is, it did work pretty well. There were a few shortcomings, though, that we set out to fix.

Most issues with this strategy revolved around the idea that you may have public methods that may also be called from within child classes. Of course, the workaround is to make a protected method and call that from your public one, but it wasn’t totally satisfactory, especially in the case of constructors. It’s often the case that the default constructor is public; this means that child constructors can pull the invariant checking by calling the default parent constructor implicitly!

A static per-call approach

A possibly better strategy could be to consider all calls originating from a class that shares an ancestry link with the target to be internal, and all others external. This characterization has the advantage that it can be determined locally and without running the program. Its main limitation is that it requires call sites rather than methods to be altered in order to insert contract checks.

Instrumenting call sites is perfectly doable but suffers from a number of unwanted side effects. There are a few ways call sites can be modified to add contracts: we can rewrite internal calls to point to a shadow method that does not have contracts enabled, or we make external calls refer to a wrapper method (which can be inlined, in which case the method implementing the contracts are invoked directly from the caller).

In any case, the transformation means that it is impossible to distribute pre-woven libraries that check their contracts even against an application that does not, since all new call sites need to be instrumented. (New external call points appear naturally as the class is used, and new internal call sites are the result of users extending library classes.)

Another issue, which is perhaps more problematic, is interoperability with other tools that are not aware of our little splitting and wrapping tricks. Beside potentially polluting the stack trace with user-unfriendly names and references, the risk would be confusing other bytecode-reading citizens of the Java world, such as profilers and boilerplate-laying frameworks.

For these reasons, this solution was not retained and a totally dynamic strategy that could be implemented on the caller’s side was considered instead.

A dynamic per-call scheme

In the end, we decided on a dynamic strategy that can be described simply as: invariants are checked the first time we enter a contracted object. By "enter", I mean that it is the first call dispatched to that object from another object; if in the middle of one of its methods, execution leaves the instance and then comes back (A calls B, which calls A), it is not counted as "entering" the object.

Obviously, this is most easily accomplished dynamically, when information about instances is fully available. The problem of checking whether we are entering an object is equivalent to determining whether the current instance is already present in the stack trace. A straightforward way to implement that behavior is by means of a per-object flag that we set to true on entry and false on exit of the method that set it to true. Such a technique implies however that the flagged object cannot have its methods called concurrently from different threads. Such a constraint is fine for most classes and object, but for those that need it, let’s take a look at a thread-safe alternative.

The basic idea here is that we want to associate a flag per object and per thread. Making the per-object flag a ThreadLocal object is not a solution since it would introduce an unbearable number of thread-local variables to manage. Instead, it is possible to maintain our own thread-local set of currently entered objects: this map only needs to hold all contracted objects currently on the stack, which hopefully isn’t so many. My soon-to-be-merged patch implements this exclusion mechanism through a per-thread IdentityHashMap that records which objects have been seen in the current stack trace.

Actually, Cofoja will only use the thread-safe version. While the cost of accessing a thread-local variable is high compared to a simple field, it is shared with another flag, which must already reside in per-thread storage: the recursion prevention flag.

A word about recursion

Another issue with contract checking is the matter of contract recursion: when contracts access other objects, should these objects have their contracts enforced? If there is no other mechanism preventing it, we may end up in an infinite loop trying to check contracts in circle.

Hence, a simple solution is to prevent all kinds of contracts within contracts from activating: this requires a per-thread variable.

But you may ask: doesn’t the same-object exclusion rule presented in the previous section disallow recursion already? Indeed, should the same contract be checked twice in the same trace, the second one will be inhibited. However, it is true only of invariants and it is easy to write postconditions that reference each others’ methods, for example.

Nonetheless, I decided to test without the flag and came upon an interesting situation. The problem was revealed by the following excerpt from the Cofoja source code (when running Cofoja with its own contracts enabled):

@Invariant({
  "getEnclosingElement() == null " +
      "|| getEnclosingElement().getEnclosedElements().contains(this)",
  "Iterables.all(getEnclosedElements(), isValidEnclosedElement())"
})
public abstract class ElementModel

What it does is check that the current element is effectively a child of its parent and has children of its own of the proper kind. When considered alone, this contract is quite nice and very local: it examines the direct parent and children of one node. However, if contracts within contracts are honored, the invariant will trigger itself on the parent as well as all the child objects.

A quick glance at the contract may suggest that this extends the check to the full tree, which is not such a bad thing per se. But a closer look will reveal that it actually introduces many redundant checks. The core of the issue lies in the first clause: the one targeting the parent; it is hidden in the method contains. contains will scan through the entire list of children of the parent object (which are siblings of the current object) in search of one that is equal to the current element. Unfortunately, such a traversal will trigger contracts on each visited instance, hence spawning a new check that will scan all of its siblings. This scan is bound to stop once all the siblings have been seen once in the current trace; it will however repeat the same idiotic ritual with the next, and so on and so forth. This will also propagate to the whole tree.

If there is something to be learnt from this experience, for me, it was that a seemingly very legitimate contract could wreck havoc assuming the new rule only. And it was also very difficult to debug: the program just seemed to hang forever, and if stopped, it would just show a normal stack trace with a couple of nodes being checked, nothing unusual. Hence we decided that even if we could extend the same-object exclusion mechanism to pre-and-postconditions, we shouln’t do it, and instead keep the simple flag that excludes all contracts within contracts, because there shouldn’t be such an easy way for our users to write innocent-looking contracts that would blow the system.