Tuesday, August 7, 2007

A short description of JUndo

JUndo-- A Novel Declarative Language


This is the text of a journal paper that I once submitted to SIGPLAN. The paper was rejected on the argument that JUndo was too similar to Haskell. I don't recall Haskell supporting inheritance, polymorphism, or late binding... oh well.


Summary

Typical object-oriented languages such as C++, C#, and Java do not provide direct support for declarative programming. However, large object-oriented programs often integrate declarative concepts in order to implement requirements such as transaction rollback in complex object systems, multi-level undo in large applications, and sophisticated forms of error handling that are intended to return the software to an initial state after an exceptional condition is encountered. A novel language, JUndo (pronounced "jun-doh"), is presented here that allows manipulations of object-oriented data to be expressed in a declarative manner. Further, a set of objects containing an arbitrarily complex set of object-object references can be reverted to a previous state in a straightforward fashion using the language’s constructs. This can be very useful for returning an object system back to a previous consistent state after encountering an exceptional condition.

Key Words: object-oriented programming, functional languages, declarative languages, history management, undo, redo, JUndo

Introduction

This paper describes a language motivated by issues in the implementation of undo capabilities in complex object systems with typical languages (C, C++, Java). One possible undo implementation is to implement the Memento pattern from [9], and then save a Memento before executing each individual command. For many complex object systems this has prohibitive memory and/or performance costs. Another possibility is to use a Command pattern such that there is a "reverse" operation for each operation performed [9]. For instance, the "reverse" of changing the drawing color from blue to red is to change the same color back from red to blue. Edwards suggests that this approach often doesn't work in "real world" applications [8]. This is because the precise side-effects a command may generate can not be known a priori, and hence it is sometimes extremely difficult to compose a command that will reverse them. Edwards solved this problem in the Flatland system by breaking each user-generated command into a sequence of simpler "atomic" commands for which the undo operations could be expressed [8]. Performing undo on a Flatland user command involves invoking the undo operation for each of the simpler commands in the sequence. The author contends that even this approach will not scale for certain types of large systems. That is to say, there are certain types of systems for which it is non-trivial to create simpler "atomic" commands. Consider a command that parses an expression, evaluates the expression, and then side-effects the system by assigning the results of the expression into variables that are used to calculate the results of other expressions (using a topological sort to determine the order of expression evaluation). Generating simpler undoable commands for this requires opening the private internal data of the entire parsing, evaluation, and topological sort pipeline to the command processor so that the undoable commands can access all the individual data members. At best, this seems like a questionable idea from an architectural standpoint. It could also be extremely difficult to implement.

ACIS [6], a commercial solid modeler, provides an example of a undo implementation that does scale. ACIS defines its own base class, and its own pointer class with overloaded assignment operators. Whenever an object is created or deleted or a pointer is reassigned, a record of each change is stored by ACIS. The ACIS undo process essentially consists of reversing each stored object change in the proper order. The ACIS API represents undo information in the form of delta states, where a delta state is, in a very high-level sense, a pointer to a ordered set of undoable object changes. This implementation succeeds because it does not attempt to generate any a priori commands. Instead, the overloaded assignment /creation/destruction operators react to changes as they happen at the object-member level. Arbitrarily complex commands can be handled by such as system with no need to break them into simpler commands as done in Flatland [8]. Such an implementation is very complex, but it does work. The ACIS system is so complex that it borders on being its own programming language. The ACIS APIs were written in C++ using C++ operator overloading. The use of operator overloading to such an extent arguably changes the semantics of C++ objects. For instance, using the delete operator does not actually delete such an object. Instead, the discarded object is attached to a change list.

The JUndo Language

JUndo is motivated by the idea that the semantics of undoable objects, e.g. those created by the operator overload method, are better defined at the language level. That is to say, the language should provide high-level abstractions of the object system state that enable the programmer to focus on creating value-added algorithms rather than laboring on yet another undo implementation. Intensional languages such as ISWIM [1] define time streams that are in many aspects similar to the timelines in [8]. However the intensional time stream, like the delta state, defines streams of data rather than streams of commands. JUndo is not used to write programs. Instead, it generates undoable classes that are embedded in much larger programs such as Verdantium [10]. The rest of this paper will describe the language, to the fullest extent possible in the remaining space, using a set of small coding examples. In Java, a simple class with getters and setters is written as follows:


class IntRef
{
private int val;

public int getVal( )
{ return( val ); }

public void setVal( int inv )
{ val = inv; }
}


The JUndo equivalent of this class is as follows:


class IntRef
{
private int val;

public int getVal( )
{ val };

public milieu setVal( int inv )
{ val := inv };
}


First, JUndo method bodies are expressions rather than collections of statements. Second, note the presence of the keyword "milieu". This keyword, named by ASU professor Edward Ashcroft, defines an environment containing the state of a set of cross-liked objects. Third, the expression val := inv does not side-effect an object member in the usual sense. Instead, it is a functional operator that takes in a milieu and a value (in this case inv) and returns a milieu. The milieu input to operators such as := is curried into each method, much as the "this" reference is curried into each instance method.


JUndo allows multiple parenthesis-types in an expression (parenthesis, brace, or bracket), so the following is also valid:


class ExprnClass
{

public int getVal( int a , int b , int c , int d )
{ a * ( b % [ c - { d * d } ] ) };

}


The nesting of multiple levels of parentheses in other languages such as LISP and Scheme can be confusing. The use of multiple types of parentheses in JUndo is intended to help address this issue, and allow a syntax that is more similar to that or C++, Java, and C#.

there is no "return" statement in JUndo. Unlike methods in Java and member functions in C++, methods in JUndo can not have a "void" return type. In fact, there is no "void" keyword (and no equivalent) in the JUndo language. Methods in JUndo must return some non-void entity. This makes JUndo a totally declarative language in the sense that imperative semantics (i.e. statements that either directly alter or side-effect previously defined values) are never used. Instead, methods are defined that leave previously declared values unchanged. JUndo equivalents to methods or member functions with a "void" return type would most likely return "milieu" as in the sample code above.

In JUndo, a milieu is a primitive type just like an integer or a character. Milieux can be manipulated and stored in objects just like any other datatype. This is an important feature because it allows object systems to manipulate entire milieux of data within an application. This allows a set of instances to be pigeonholed into a single element that can be moved from structure to structure as necessary. There are a number of design advantages in being able to keep a closed system of objects as an independent element in a data structure. It makes that set of instances so that they can not be manipulated by another object unless that object has access to that particular milieu. This can be used to shield a set of objects from unauthorized access. It also allows for the creation of data structures of milieux, which is important for implementing concepts such as multilevel undo. An example of a simple class containing a milieu as an instance member is shown below:


class MilieuRef
{
private milieu val;

public milieu getVal( )
{ val };

public milieu setVal( milieu inv )
{ val := inv };

}


Pairs and Object Creation


Like the language Objective C, JUndo does not have constructors. Objects are constructed using class methods. This is advantageous for classes that are designed to only create objects in a selective fashion. For example, some classes are designed to only allocate a new object if an object of identical value does not already exist. A number of string implementations use this to ensure that two strings are identical iff. their OIDs are. This can greatly reduce the amount of computation time required for string comparison. Using a class method (as indicated by the static keyword) instead of a new operator gives each class direct control over when and how its instances are created. An example of a class method to create an object of class Singleton is:



class Singleton
{
protected static Singleton instance;

public static pair[ Singleton ] new_Singleton( )
{
if instance != null then [ instance , thismilieu ] else
let
{
pair[ Singleton ] sing = [ Singleton , thismilieu ].allocate_Singleton();
milieu q = [ Singleton , sing.milieu ].instance := ( sing.cobj );
}
with [ instance , q ] fi fi
};


/* More Code Here ... */


}


This example also introduces a new JUndo data type called the pair. The pair of a object reference of type A is defined as . Pairs do NOT have OIDs, and they do not exist in milieux except as members of objects or classes. One can see why the pair is useful by thinking about what the new method does. It both creates a new object, and it creates a modified milieu that contains that new object. The pair makes it easy to return both of these entities to the caller at the same time. A pair is built by placing the item and milieu that it is to contain in brackets in a manner like the following: [ obj , thismilieu ]. Elements can be extracted from a pair using the .cobj and .milieu operators. The construction class methods, which can be given arbitrary names, allocate new objects by using the protected class method allocate( provided by the runtime. After allocating the object, the method is then free to initialize it in arbitrary ways (as the above example does).

Because a pair always contains two and only two elements, the compilation of a pair takes on many of the characteristics of the compilation of a primitive type. An JUndo compiler treats pairs very differently from the manner in which a standard compiler would handle code generation for a typical derived type such as an array or a record. To review, expressions on a primitive type in typical languages are mapped to primitive operations (in a language like Three Address Code) that write to temporary variables defined by the compiler. Each primitive operation has one primitive temporary variable as its result. Similarly each primitive expression on an JUndo pair, in the compiled representation, has two temporary variables as its result: one for the object reference and one for the milieu. Moreover each primitive operation on a pair maps to two Three Address Code operations, one for each of the two temporary variables that make up the pair. In this way pairs get mapped directly to lines of Three Address Code in the same way that a primitive type does, except that everything with a pair gets mapped to two lines instead of one. This allows operations on pairs to mapped into more efficient code than is otherwise possible. This is an important optimization because pairs play a fundamental role in the language.

Inheritance, Encapsulation, and Polymorphism with Functional Programming


An object-oriented language can be defined as one that supports inheritance, encapsulation, and polymorphism. JUndo supports all of these concepts. JUndo's implementation of these concepts is almost identical to JDK 1.4 (except that JUndo doesn't support inner classes). Intensional languages often define functions by declaring them in let blocks. In contrast, JUndo doesn't support this. JUndo, following the Java convention, requires "functions" to be implemented as methods in the scope of some class. However, JUndo methods can call each other in the same recursive fashion as intentional functions. Methods are preferred over let-functions for a couple of reasons. First, methods can be translated into efficient object code more easily then let-functions. This is particularly true when writing code for a Java virtual machine. Second, methods can be overriden yielding a greater degree of expressive power. Finally, the use of methods is more consistent with the concepts of data abstraction.

JUndo contains a complete set of functionality for producing a functional program, but does not provide some of the operators from intensional programming such as “fby”, “prev”, “next”, and “at”. The semantics of these operators are such that their implementation tends to be recursive, although there are optimizations for mapping special cases such as tail-recursion to non-recursive code. A language does not necessarily need to be recursive in order to define its operations using a declarative semantics. For example, the Datalog programming language uses Domain Relational Calculus to map a number of Datalog rules into iterative relational queries. In Tuple Relational Calculus, an example of a declaration of a simple iterative (SQL-like) query expression is as follows:


select A.emp from A, B where [boolean expression]


Another trend in Object-Oriented programming is the use of iterators in frameworks such as Java Collections Framework (JFC). For instance, a Java method using iterators might resemble the following:



public Iterator getEmps( Iterator E , Supervisor S )
{
Vector v = new Vector();
while( E.hasNext() )
{
Employee emp = (Employee)(E.next());
if( emp.getSalary() > S.getSalary() )
{
v.add( E );
}
}
return( v.iterator() );
}



Similar code no doubt exists in a number of production programs written in Java. Note that the above program can be written in a form of pseudocode that looks like the following:



public Iterator getEmps( Iterator E , Supervisor S )
{
return( select Emp from Emp in E where ( (Employee) Emp ).getSalary() > S.getSalary() );
}



The above code has some SQL-like syntax, but the semantics is different. The above expression iterates over an Iterator, whereas typical SQL iterates over relations. More importantly, the above expression suggests a mechanism for expressing JFC-like iteration patterns through declarations (as opposed to imperative while loops). This "select" semantics provides an alternative to operators from intensional programming such as “fby”, “prev”, “next”, and “at”. The "select" semantics creates declarations that map readily to iterative code, whereas typical intensional language compilation requires an optimization process to remove recursive elements of the intensional declarations. A typical select expression in JUndo looks like the following:


pair[ IteratorFactory ] empFac = select e
from e : Employees , s : Supervisors
into now
where ( [Employee] e ).getSalary() > ( [Supervisor] s ).getSalary();



In the above expression, which finds all employees who make more than some supervisor, "Employees" and "Supervisors" are of type IteratorFactory, where IteratorFactory is an interface with a single method called iterator() that returns an Iterator. For instance, the java.util.Vector class in the Java language could easily by turned into a Java iterator factory because it already has an appropriate iterator() method. Several other classes in JCF have similar iterator() methods. In the JUndo syntax, Iterator and IteratorFactory can be defined as follows:


public interface Iterator
{
public pair[ boolean ] hasNext();
public pair[ Object ] next();
}

public interface IteratorFactory
{
public pair[ Iterator ] iterator();
}



Referential Transparency in JUndo

Functions in pure functional languages obey the concept of referential transparency defined as "the execution of a function always produces the same result when given the same parameters" (7). A form of referential transparency also holds for JUndo methods. Consider the following call to a JUndo instance method:


[ myObj , myMilieu ].myMethod( a , b , c , d )


In a typical implementation of a JUndo compiler, myObj would be curried into a this parameter in the object-code version of the method call (much as is currently done in most C++/Java/C compilers). The milieu myMilieu would also be curried into a this_milieu parameter in the object-code version of the method. In JUndo defining referential transparency requires the inclusion of the "hidden" parameters that are typically curried into the object code. For instance, the instance method myMethod above always returns the same result (to within the ability to observe differences using the language) as long as the equivalents of a, b, c, d, myObj, and myMilieu remain the same. This includes functions that return milieu describing modifications to objects and functions that return milieu describing the creation of new objects. The definition of referential transparency for a JUndo class method is similar, but only this_milieu is curried (a this parameter isn't curried because it is an error for class methods to use a this reference).


Conclusion

Multi-level undo can be implemented in JUndo by creating a class library that associates a milieu with each important point in the undo/redo timeline. Details have been omitted in order to fit SIGPLAN's length limitation. To conclude, the reasons for motivating the JUndo language have been expressed, and some novel features of the language have been described. There are a number of other elements in the language, such as arrays and let expressions, that are not described in this paper. For additional explanation, please see the following web site: http://verdantium.blogspot.com

References

1. W. W. Wadge, and E. A. Ashcroft, Lucid, the Dataflow Programming Language, Academic Press, New York, NY, 1985.

2. P. J. Landin, ‘The Next 700 Programming Languages’ CACM, 9, 157-166 (1966).

3. T. Faustini, Visual Java, Computer Software, (unpublished).

4. K. Arnold, and J. Gosling, The Java Programming Language, Addison-Wesley, Menlo Park, CA, 1996.

5. B. Stroustrup, and M. A. Ellis, The Annotated C++ Reference Manual, Addison-Wesley, Menlo Park, CA, 1990.

6 J. Corney, and T. Lim, 3D Programming with ACIS, Saxe-Coburg, Stirling, UK, 2001.

7. R. W. Sebesta, Concepts of Programming Languages, Seventh Edition, Addison-Wesley, San Francisco, CA, 2005.

8 Edwards, K., Igarashi, T., LaMarca, A., and Mynatt, E. D. A Temporal Model for Multi-Level Undo and Redo. In Annual ACM Symposium on User Interface Software and Technology, November 2000, pp. 3140. http://citeseer.csail.mit.edu/edwards00temporal.html

9 E. Gamma, R. Helm, R. Johnson, and J. Vlissides, "Design Patterns", Addison-Wesley, Menlo Park, CA, 1995.

10 Green, Thornton, "Verdantium-- Towards a Java-Enabled Compound-Document Model", Poster Session, OOPSLA 2000 Conference, ACM, October 15-19, 2000, Minneapolis, Minnesota.

No comments: