Monday, September 17, 2007

Finalization in the JUndo Runtime

Sun has a new article on its website about Java finalization:



http://java.sun.com/developer/technicalArticles/javase/finalization/



The JUndo runtime uses finalization methods to reclaim memory, and hence finalization issues such as those described in the above article by Tony Printezis affect the design of the JUndo runtime. The article seems to assume that finalization is used in cases where a native resource needs to be managed from within the Java Virtual Machine. The JUndo runtime is using the same finalization techniques to address a different set of problems that do not include the managing of native memory.

To understand the JUndo finalization mechanism, one needs to grasp how JUndo addresses the problem of declarative memory management. In a typical intensional language, declarations are written recursively with an expression like the following:


x = 1 fby ( 2 * ( prev x ) )


Hence, x at time=0 is 1, x at time=1 is 2, x at time=2 is 4, etc. Assuming that no high-level optimizations are performed, each calculation of "x" at time 38 in the example above requires making 37 recursive calculations. This is an expensive process, so intensional languages often use cache memory. The result for "x" at time 38 is stored after the first time it is computed, and then the cached value is used for subsequent requests for "x" at time 38. This raises the question "what items need to remain in the cache, and what items can be removed at a later time?" That is to say, is there a time when a particular item in the cache expires?

JUndo addresses the cache expiration problem in the context of object-orientation. When objects are allocated, their milieu states are added to the cache. When an assignment operation operates on the member of a class, new milieu states are added to the cache. When a particular JUndo object ID is no longer reachable all milieu states related to that object ID can be removed from the cache. Milieux also get collected. The milieu is a primitive type in the JUndo language, but it maps to a JUndo runtime ExtMilieuRef in the generated object code that executes on top of the runtime. ExtMilieuRef instances are garbage-collected by the JVM when they are no longer reachable, i.e. when they are no longer being used by any piece of JUndo code. Milieu go out of scope and/or become unused, and hence it makes sense to get rid of the underlying objects at that point.

When an ExtMilieuRef is collected all milieu states related to that ExtMilieuRef (all object creations, object changes, etc.) can also be collected. This requires some amount of processing, and hence ExtMilieuRef uses Java's finalization APIs to setup that processing. Each definition of a milieu A in terms of a milieu B (e.g. through object creation) makes a B that is access-wise identical to A with the exception of a finite set of changes (e.g. B has an object that A doesn't due to the object creation). At the time B is created A already exists in the JUndo runtime cache in the form of an EntMilieuRef. Hence, the JUndo runtime often creates an EntMilieuRef for B that contains a set of changes (e.g. the object creation) in a table, and delegates the rest of the changes to A. Even more often, the JUndo runtime will structure the delegation so that A delegates to B (rather than the other way around). However, such reversal of the direction of delegation is beyond the scope of this blog post. The important thing to understand is that performing several operations in sequence causes a linked chain EntMilieuRef delegations to be constructed in memory. That is to say, E delegates to D which delegates to C which delegates to B which delegates to A, etc.

Consider the chain running from E to A in the previous paragraph. References can become unreachable in any other in a Java program, so suppose that C gets collected first. C can't just disappear because the delegation path from D to B would be broken. Instead it changes the delegation path so that D delegates directly to B. Moreover, some of the change information that was originally in C is migrated into D. For instance, suppose that an object was created in C and D expects to be using this new object. Just because C was collected doesn't mean the object is going away. It's still valid to use that object from D and E. Hence, information about the object creation has to find its way into D.

To free its memory properly, an EntMilieuRef must know about both its delegators and its delegates. As a result, the chains of EntMilieuRef instances have bidirectional references. An EntMilieuRef can't just be left to garbage collection for this reason-- it won't collect until the chains are broken. Even if it could be collected, one wouldn't want to collect it until the chain has been properly reformatted.

The basic procedure for solving this problem is as follows. Each ExtMilieuRef points to an associated EntMilieuRef. First, the ExtMilieuRef garbage collects. When the ExtMilieuRef collects its finalization method runs, and this finalization method executes a request that asks its associated EntMilieuRef to reformat itself for garbage collection. The reformatting has to be done carefully from a thread standpoint. The finalization method is called on one thread, while the milieu delegation chains are being accessed on another one. Synchronization was ruled out as being inefficient. Instead, it is observed that Verdantium typically runs on the Swing Event Dispatch Thread. At least for the time being, the thread issue is solved by having the reformatting invoked on the Swing Event Dispatch Thread. To use the JUndo Runtime with something other than Swing, one probably should change the threading of the invocation code that causes the EntMilieuRefs to be reformatted.

In summary, the JUndo runtime solves a series of sophisticated finalization problems in order to provide efficient memory organization. This contrasts with the idea that finalization is used to manage objects that reference blocks of native memory, and code for the JUndo runtime on Sourceforge represents the current solution. The reduction in the length of the delegation chains through garbage collection also reduces the number of delegations through which the runtime might need to perform a lookup. As a result, there is a significant performance benefit in addition to the efficient finalization of unused memory.

No comments: