I decided to re-post this paper here for a couple of reasons. First, nobody seems to know what happened to the original submission that was completed in 1999. Second, the paper provides several examples that may be useful for JUndo programmers. I made a couple of modifications to the original text to make the original examples more readable by those looking for JUndo examples. I changed "methodical" to "seq", I changed ".class" to ".cobj", etc. All the changes were syntactic in nature. Where I have something to add to the text I have put it as a comment in italics.
EphrataX—A functional programming language for investigating complex issues in software design
Summary
Few, if any, programming languages can effectively merge declarative concepts with object-oriented (imperative) programming. A first-order functional language, EphrataX, is presented here that allows manipulations of object-oriented data to be expressed in a declarative manner. Further, sets of objects can be reverted to a previous state in a straightforward fashion using the language’s constructs. Among other things, this can be very useful for returning an object system back to a previous consistent state after encountering an exceptional condition. Other areas of common programming practice where EphrataX has advantageous characteristics are given.
Key Words: object-oriented programming, functional languages, declarative languages, EphrataX
Introduction
Typical uses of purely functional programming languages like ISWIM [1] have focused on producing arithmetic functions using recursive concepts. One example of this is the example of computing factorials, which many are already familiar with.
However, focusing on arithmetic functions such as factorials as an application of functional programming has a number of drawbacks. First, the problems (e.g. how to compute various arithmetic functions recursively) tend to be primarily academic ones. Few software designers lose sleep over the particular implementation used for calculating factorials. Second, the problems tend to adequately solved for current systems. There are no particular weaknesses with the iterative methods typically used for calculating these functions that functional programming could exploit. Third, functional programming tends to use a great deal of caching of data compared to similar iterative methods, and this data caching performs no real purpose other than as a technique for improving performance. Finally, despite the caching the computational performance of functional programming is often inferior to its iterative counterpart.
This paper proposes techniques for creating more sophisticated first-order functional languages based on object-oriented concepts. These languages are primarily designed for using functional programming for solving difficult software design and engineering problems. Because many software design problems are very complex, run-time performance is much less of an issue. Functional programming can provide solutions to some problems in a few hundred lines of code that would require thousands of lines (if not more) in a procedural language. Procedural programs that perform complex tasks are often very difficult to design, create, and debug. Because they may have several pieces of code enforcing the same semantic axioms, they must be extensively tested. Some problems are so difficult to procedurally implement that it is extremely difficult for a small design team to implement a solution in a reasonable period of time. Even though the run-time performance of functional programming may not be quite as good as procedural programming for these tasks, the functionally programmed project can potentially be implemented in a shorter period of time, at less cost, with a greater degree of robustness, and with less “smoke and mirrors” in the implementation (more on this later).
Implementing interactive programs in a particular programming language requires a mechanism for the program to take a user command, and based on that command change the state of the program’s variables from a previous state to a new state. To implement interactive programs using a functional notation, a program must define the current state of the program’s variables at a time index “t” (as an explicit parameter) as a function of the previous state of the program’s variables (i.e. the state at index t – 1) and the latest command entered by the user, where there is one command for every value of “t” from the value of “t” for the first command up to the value of “t” for the current command. The command for a index “t” can itself be treated as a function of “t”, as can the state of the program’s variables at a index “t” given that a default initial state is used for the index before the first command is executed. Thus, a function getState() that defines the state of an interactive program can be written in pseudocode as the following:
command getCommand( int t ) = /* An expression for the input command
given “t” for 1 <= t <= last command received */;
state getState( int t ) if t == 0 then /* initial state */
else /* An expression using getCommand( t ) and
getState( t – 1 ) */ fi;
where the words state and command are meta-level symbols that are replaced by the actual datatypes used to represent a program state and a program command (sometimes also called an event) respectively in a particular design. For instance, in an object-oriented language one might declare a class that represents a command in a particular kind of system.
When such a parameter “t” is used in a piece of code to define how a program or routine like getState() reacts to a set of consecutive inputs, “t” is called a time stream. A set of consecutive values returned by a function on a time stream “t” is said to be defined in terms of the stream “t”. This has some similarities to the concept of a time stream in the programming language Lucid [2], and the Lucid-like implementation of interactive programs in Visual Java [3]. Visual Java in particular provides a compelling example of how an intensional language can be used to implement interactive software. However, there are some important differences between the term “time steams” as used in this paper, and “time streams” in Lucid. In Lucid, a time stream is a concept of intensional programming that is based on an implicit context. For the purposes of this paper, a time stream is a design pattern based upon the use of explicit parameters in functions. A time stream may be used in a functional paradigm any time that one wants to implement a state machine in terms of “stateless” functional programming by expressing the state as a recursive function of the index.
As an example of the use of a time stream, consider a functional program that emulates a subset of the interactive behavior of the “MS-DOS Prompt” found on many personal computers. The DOS prompt provides a blinking cursor. As the user types alphanumeric characters, the characters are interactively appended to the prompt line, and the position of the cursor is moved to the end of the line. A special character, which in this example will be called the backspace character, deletes the last character that appears on the prompt, and moves the cursor back one position, allowing the user to substitute another character in place of what was previously there. Assuming the initial state is a prompt line with no characters, the table below shows how the entry of sequential characters (over time) affects the interactive state of the DOS prompt:
Character Entered | DOS Prompt Display |
A | A |
B | AB |
C | ABC |
BACKSPACE | AB |
D | ABD |
E | ABDE |
F | ABDEF |
BACKSPACE | ABDE |
W | ABDEW |
These input characters can be handled in a functional notation by assigning each input character in sequence a number starting at 1. For instance in the example above, the first input character, “A,” would get the number 1, the second input character, “B,” would get the number 2, etc. Thus one can declare a function like the following:
char inputChar( int t ) = /* … */;
where the character returned by the function is the tth character in the input sequence. For instance, in the example given in the table above, inputChar( 1 ) would be “A,” inputChar( 2 ) would be “B,” inputChar( 4 ) would be BACKSPACE, etc. The inputChar function acts as the implementation of the getCommand, in the pseudocode given above, for this example. It defines a series of commands (not the same as DOS commands) over a parameter “t”, where each command is the input character that the user has entered. The term “command” is used in this context because each character typed represents an interactive request by the user to change the current state of the DOS prompt. That is to say, when the user types “A,” the user is commanding the system to add the character “A” to the current prompt display. For implementation in an interactive program, assume that the implementation of this function is dynamically linked in some manner to the external system API that actually gets characters from the keyboard. Also, it is assumed that a routine calling inputChar is not going to use a value of “t” that is “from the future.” That is to say, it will not use a value of “t” corresponding to a character that is after the last character entered in the current runtime state of the prompt (in principle requests for future characters could be handled, but the user interface to do this is unwieldy in practice because it potentially asks the user questions like “what is the twentieth character you’re going to type?”).
One can then define two functions to get the length of the prompt at a particular “t,” and the contents of the prompt at a particular “t” as follows:
int getLength( int t )
= if t == 0 then 0 else
(
if inputChar( t ) == BACKSPACE then
(
if getLength( t – 1 ) > 0
then getLength( t – 1 ) – 1 else 0 fi
)
else getLength( t – 1 ) + 1 fi
) fi
int getStringChar( int t , int ChrIdx )
= if ChrIdx > ( getLength( t ) – 1 ) then ERROR_CHAR
else
(
if ( ( getLength( t ) – 1 ) == ChrIdx ) && ( inputChar( t ) != BACKSPACE )
then inputChar( t )
else getStringChar( t – 1 , ChrIdx ) fi
) fi
These two functions are the analogue of the getState function defined above. An explicit assumption is being made that some external system API is tracking what these functions are returning for the most recent value of “t,” and is writing this information to the display device. It is also assumed that this external API will not try to call either function for a value of “t” that is “from the future.” Because the prompt may contain multiple characters, a mechanism is needed to define what each character in the prompt is. This is provided by the ChrIdx parameter of getStringChar. Passing values of ChrIdx , for a particular “t,” between 0 and getStringLength( t ) – 1 inclusive returns each character in the prompt from left to right, unless getStringLength( t ) is zero. If getStringLength( t ) is zero, then the prompt is empty. The combination of the length of the prompt plus the set of characters in the prompt define the state of the prompt for a particular “t”.
In the context of using time streams to investigate difficult software design and engineering problems, one can also view the caching of data in a different light. A functional implementation of the MS-DOS prompt example above would likely need to use caching for strictly reasons of improving runtime performance. The use of a cache keeps the implementation from using recursion in an excessive fashion each time a character is added to the prompt. However, the cache is useful for more than just a gain in runtime performance. Many complex applications already engage in the caching of data, but they use different terminology to describe it. Many examples of applications software for personal computers implement what is called “multilevel undo”. That is to say, they allow the user to retract an arbitrary number of previously committed actions. To do this, they must keep a store of previous program states, and then be able to restore those states later. If the user’s actions are considered to be defined over a time stream, then the caching of all changes from one program state to another provides a straightforward and efficient representation of the information that that application already needs in order to perform multiple levels of undo. For instance, multilevel undo could be added to the MS-DOS prompt example by providing a mechanism for an UNDO character command to cause the prompt to return a state for a time “t” that is identical to a previous one (before t – 1 ).
Another example of software that already stores cache-like data sets concerns those programs that already keep logs or audit trails of their actions. These logs are often kept so that transactions on the system that had the effect of leaving the system in an inconsistent state can be rolled back, placing the system back in a previous consistent state, or for the purposes of recovery in the event of a crash. In many cases these logs store all of the information that an implementation of a functional language would place in its cache, but for different purposes. This process of rolling such a system back to a previous state is similar in some aspects to multilevel undo, but appears in a different (often non-interactive) context and with different overall requirements.
Functional languages excel at the ability to return a system to a previous state in a straightforward fashion, and the caching of information is much less of a problem because the log being stored can double as a cache of previously computed information. Using a functional language would simply put that log (i.e. cache) in a functional context.
Sophisticated error handling is an area where the caching of data is useful for similar reasons. Many times the semantics of a module are so complex that a program actually starts to modify its data until something is found that violates one of the axioms that the module is trying to enforce. That is to say, it is sometimes extremely difficult to determine what will lead to an error condition a priori. For example in a system that stores employee information, a set of axioms, such as one that asserts that no supervisor can make more that ten times the sum of the salaries of those employees he or she directly supervises, would make it difficult to check a priori a request to adjust the salaries of multiple employees based on some criteria. It usually makes more sense in this example to adjust all the employee salaries first, and then see if anything went wrong. At this point where a problem is found the module attempts to change the data back to its original state, and then return a status value indicating that some error had occurred. In many procedural implementations, this is easier said than done. At the point that the error was detected, the procedural implementation has to “know” what changes were made previously. For simple systems, algorithms can be written that can reverse the original actions. But often the systems are not simple, and thus can not be done without a large input of man-hours on the part of those implementing the system. We will show that functional programming through the use of constructs such as time streams provides a mechanism to ensure that an entire system or sub-system can be returned to its previous state using a single assertion. Any valid time value can be passed to a function defined over a time stream. As a result, a function defined over a time stream can handle exceptional cases by returning the last consistent state found previously in the time stream. That is to say, it returns the result of a function call with a previous time value.
Functional languages could operate very efficiently in those domains where caching (by any other name) is already widespread. And for some of the reasons listed earlier, procedural languages often do not provide an adequate solution to many design issues that are seen in practice (as opposed to purely academic ones), whereas functional languages have features that make them advantageous for implementing these systems. So why are functional languages not generally used here? The main problem is that there has not been a functional language that supports the level of data abstraction required to directly address these problems. Data abstraction allows complex pieces of data to be subdivided into simpler ones that can be individually manipulated. For instance, a list of data can be manipulated independently of other structures. A data abstraction can also be used in several places to manage other varying kinds of data (e.g. a list of people, a list of buildings, etc.). The ability to use the same abstraction in multiple areas reduces the amount of code required to implement a particular project, but requires that the operations that act on the data also be abstract in the sense that they can operate on different kinds of data (e.g. removing an element from a list has to work independently of whether that list is a list of people or a list of buildings).
Object-oriented languages provide an extremely flexible framework for data abstraction, so one might be motivated to attempt to make a functional language object-oriented (or vice versa). Although some object-oriented concepts can be implemented using a functional language, creating a real merger of the two is very difficult. The abstract operations that allow structures to be manipulated in a more straightforward fashion than is possible in a non-OO language, by separating the interface to a structure from its implementation and providing built-in high-level functionality, tend to be very procedural in nature. And this procedural nature has to be removed in some fashion so that operations so can be rolled back at a later time. However, many of the practical requirements for data abstraction create roadblocks that tend to stymie attempts to integrate OO and functional concepts. A piece of abstract data (called an object in object-oriented circles) often has sub-data which also is abstract. An operation on such a piece of data often has effects that carry through to operations being performed on the sub-data. As a result, it is difficult to declare the state of a piece of data without some knowledge of all operations that might directly or indirectly affect that data. Because a typical OO language is imperative, it does not need to declare the values stored in each object in some functional, clausal, or other axiomatic sense. Instead, an imperative OO method overwrites the previous data states, destroying whatever used to be there. This greatly limits the number of places where a data abstraction can be used in a declarative language as opposed to an imperative one. An abstract structure A can only be used in some other structure B if all the places where operations on B might affect A have already been declared in some fashion. This does not provide a particularly abstract form of programming because every structure has to know about every other structure that might use it.
In hindsight, today I would probably do something like a DOS prompt with JUndo objects as opposed to JUndo recursion. See the DrawApp component in Verdantium for an example.
This paper presents a language called EphrataX that solves the problem of merging the functional and object-oriented data models. EphrataX combines elements of syntax and semantics from the Java [4] object-oriented language and the ISWIM [1] functional language, with some additional structures to integrate the semantics of the two. EphrataX is able to achieve this merger by defining how the state of a set of objects, i.e. the particular values of all members of all objects and classes in a set of objects that does not contain a reference to any object outside the set, changes over time in a functional manner. In a language like C++ [5], a set of objects has a particular runtime state that changes in real time, and accessing a particular object always returns the current runtime state of that object. The state changes at runtime because the C++ program overwrites various object members during its execution. While the runtime state of a set of objects in C++ is a function of time, the state of a set of objects in EphtataX is, analogously, defined as a function of an explicit context. Thus, examining the state of an object requires a combination of two data items in EphrataX. The first item is the unique identifier of the object, similar to an object reference in Java. The second item is the context that represents the equivalent of a timepoint (in a language like C++) for which a set of (potentially interdependent) objects have a particular state. In EphrataX, this context for a set of objects is called a milieu, and is a fundamental datatype in the language. The values of the members of the objects defined in terms of a particular milieu do not change. Modifications are not made directly to the state defined by a milieu in the same sense that methods in an OO language modify objects on the heap. Instead, a method that “changes” an object value, defined by a milieu A, returns a milieu B that defines object values identical to those defined by milieu A, except for the change made by the method.
For instance, suppose one wants an EphrataX method that mimics the behavior of a Java [4] method that sets member W of object X to value Y, where the previous value (before the execution of the method) of member W of object X will be called Z. The Java method will use a statement such as X.W = Y;. In doing this, it overwrites the previous value of X.W. An EphrataX method, given a milieu A, emulates the semantics of such an overwrite by returning a milieu B such that all objects’ member values defined by B are identical to those defined by A except X.W. In the returned milieu, B, X.W will have value Y. By contrast, X.W in milieu A will still have its original value of Z.
As the state of set of objects is defined as a function over the set of milieux, a particular milieu can be treated as if it is the state of an object-oriented program. Hence the word state in the getState() pseudocode presented earlier is often (but not always) replaced by milieu in an actual EphrataX program. This means that a milieu is also an important datatype for defining state machines in the language. A function defined over a stream that returns a milieu completely defines the runtime state of an object-oriented program after executing a sequential list of input commands.
Because methods do not actually alter the state defined by a milieu, the milieux act in a manner consistent with any other declarative functional definition. The fact that a milieu represents the state of a set of interdependent objects at a particular time means that for any piece of data in a milieu, its sub-data are also in the same milieu. As a result, an operation on a piece of abstract data can modify its sub-data without violating either the principles of data abstraction or the fundamental concepts of functional programming. The next section will show some examples of the language, and explain why this is true.
The EphrataX language
As a language, EphrataX is currently in the design phase, with attempts underway to produce an implementation of a compiler for a first version of the language. However, enough of the language has been defined to give a working description of how an EphrataX program would function. To get an introduction to how the EphrataX language works, consider the simple EphrataX example class below:
class IntRef
{
private int val;
public int getVal( )
{ val };
public milieu setVal( int inv )
{ member{ [ this , thismilieu ].val := inv } };
}
This is a simple class that allows access to a single private integer val through two public accessor methods, getVal and setVal. Note that, similar to other OO languages but unlike ISWIM [1], datatypes are declared for each parameter and value returned. Unlike most other OO languages, the method body enclosed in curly braces is an expression rather than a compound statement. In fact, there are no statements in EphrataX at all, only functional declarations in the form of methods. This makes EphrataX a totally declarative language in the sense that imperative semantics (i.e. statements that either directly alter or side-effect previously defined values) are never used. Instead, functions are defined that leave previously declared values unchanged. One can construct compound definitions in EphrataX (the closest equivalent to a compound statement in a traditional language) using a ISWIM-like let block.
The functional nature of EphrataX may lead one to ask “how do I modify an object W in a milieu X to a state Y if I’m not allowed to directly alter or side-effect W?” It sounds impossible, bit it’s not. The answer is not to modify the state of the object in X, but to declare a new milieu Z that is identical to X except for the fact that the state of the object W is Y. That is to say, one needs an operation that returns a milieu as a function of another milieu such that the appropriate modification appears in the returned milieu. An instance of such an operator is shown in the setVal method in the above example. This method will be discussed in more detail below.
EphrataX performs strong type checking in all parts of the language. Besides the declared parameters, two additional elements are curried into each instance method. The first is the this reference that points to the instance on which the method is being called. The second is a milieu called thismilieu that contains the milieu on which the method is being called. In a typical OO language (e.g. C++ [5], Java [4]), an object on the heap may contain different values at different times. Since a milieu defines the state of an object at a particular time, the same object may contain different values in different milieux. As a result, an instance method must know both the OID (an OID is a reference that uniquely identifies a particular object) of the object and the milieu in which execution is taking place in order to be able to uniquely specify the element of instance member data to be accessed. The getVal( method returns the instance value of val. Just as the object accessed in the expression is assumed to be the this object, the milieu is assumed to be the thismilieu milieu. As a result, nondestructive accessor methods can be written in a manner very similar to that in other OO languages.
In a typical OO language like C++ [5] or Java [4], an object OID or pointer is curried into each call to an instance method, and then this curried reference shows up in the method either implicitly or through the keyword this. EphrataX is different because an OID alone does not provide sufficient information to access the state of a particular object. Instead, a milieu is needed as well. As a result, both an OID and a milieu are curried into all EphrataX instance methods, and a milieu is curried into all EphrataX class methods. The curried OID in an instance method can be accessed by using the keyword this, and the curried milieu can be accessed by the keyword thismilieu. Both this and thismilieu are also used implicitly in EphrataX in a manner analogous to the implicit use of this in C++ and Java. For instance, the getVal( method in the example above makes implicit use of both this and thismilieu.
The setVal( method in the example above deserves closer attention:
public milieu setVal( int inv )
{ member{ [ this , thismilieu ].val := inv } };
Where the usual setVal( method in C++ [5] or Java [4] returns void, this method returns a milieu, and the milieu returned is given by the expression member{ [ this , thismilieu ].val := inv }. This expression demonstrates how instance members are modified in EphrataX. Note that this expression does NOT use the lvalue/rvalue semantics in C or C++. Instead EphrataX treats this expression as a single production with the following BNF:
In EphrataX, a milieu is a primitive type just like an integer or a character. Milieux can be manipulated and stored in objects just like any other datatype. This is an important feature because it allows object systems to manipulate entire milieux of data within an application. This allows a set of instances to be pigeonholed into a single element that can be moved from structure to structure as necessary. There are a number of design advantages in being able to keep a closed system of objects as an independent element in a data structure. It makes that set of instances so that they can not be manipulated by another object unless that object has access to that particular milieu. This can be used to shield a set of objects from unauthorized access. It also allows for the creation of data structures of milieux, which is important for implementing concepts such as multilevel undo. An example of a simple class containing a milieu as an instance member is shown below:
class MilieuRef
{
private milieu val;
public milieu getVal( )
{ val };
public milieu setVal( milieu inv )
{ member{ [ this , thismilieu ].val := inv } };
}
The MilieuRef class in this example is just like the IntRef class given above, except that every instance of the word int was replaced with milieu. Otherwise, they have identical semantics. Storing a milieu inside an object is useful when attempting to implement sophisticated error handling, and/or multilevel undo. Note that in the setVal( method, the milieu stored in val if different from the milieu curried into the keyword thismilieu. The thismilieu milieu defines the context in which the MilieuRef object has a particular state. Meanwhile, the member val can take on a variety of states depending on the context thismilieu in which it is accessed. As a result, the value of val is defined by the context thismilieu but may not be equal to thismilieu. However if [ W , X ].setVal( Y ) returns a milieu Z, then one can say that the value of W.val is defined to be equal to Y in the milieu Z. Or put another way, Y is equal to [ W , Z ].getVal().
An object can also be an instance member of another object. EphrataX uses the same conventions for instance declarations as Java [4] in that a declaration as a member or a variable is actually only declaring a reference to a “heap allocated” object, not an actual object. The reference may point to an actual object, or it may point to null. Objects can be created through class methods. The construction of objects will be discussed in detail in a subsequent part of the text. A simple example of an object that stores a reference to another object is shown below:
class MilieuRefRef
{
private MilieuRef val;
public MilieuRef getVal( )
{ val };
public milieu setVal( MilieuRef inv )
{ member{ [ this , thismilieu ].val := inv } };
}
class IntRefRef
{
private IntRef val;
public IntRef getVal( )
{ val };
public milieu setVal( IntRef inv )
{ member{ [ this , thismilieu ].val := inv } };
}
Like the MilieuRef example earlier, this is another simple modification of the IntRef example above with different datatypes.
Using Methods
The real power in EphrataX is not so much in the fact that objects can be declared as it is in how the declarations can be put together. Below is an example of a class called ControllerClass that demonstrates this concept. This is not intended to be a complete class definition in itself, but is intended to demonstrate a concept. Assume that ControllerClass has other accessor methods that are not shown here.
class ControllerClass
{
private IntRef val1;
private IntRefRef val2;
/* … */
public milieu modifyVals( int v1 , int v2 )
{
let
{
milieu p1 = method{ [ val1 ,
thismilieu ].setVal( v1 ) };
IntRef v3 = method{ [ val2 , p1 ].getVal( ) };
milieu p2 = method{ [ v3 , p1 ].setVal( v2 ) };
}
with p2 fi
};
}
The class contains an IntRef and an IntRefRef, which are classes defined in earlier examples in this paper. The method modifyVals( sets the single-referenced integer to v1, and the double-referenced integer to v2. The operation is defined functionally using a let block that is similar to the let block in ISWIM [1], except that each entry in the let block has a type declaration. Note that the second and third lines of the let block use the milieu p1 rather than thismilieu. If they used thismilieu, then the modification in the first line of the let block would be lost because thismilieu does not contain the modification. Instead, the second modification needs to act on the milieu that contains the first modification. In this way, one can create a milieu that contains subsequent changes to multiple members of objects.
As mentioned earlier, milieux in EphrataX provide a context for an entire closed system of objects, rather than a context for a single isolated object. If instead there were one context per object, a method would need a context for every object that it was designed to read or modify. Further such a method would need to return one new context for every object that it modified, and dependencies between objects (which can form a possibly cyclic digraph) would be difficult to manage. By contrast, consider the EphrataX semantics in the first line of the let block above:
milieu p1 = method{ [ val1 , thismilieu ].setVal( v1 ) };
Because a milieu describes the state of all objects in a closed system, the syntax of getting the necessary context information from a method is invariant of the complexity of the method’s implementation, the number of objects the method modifies, or the number of members in each object that the method modifies. The method setVal( can modify one object defined by thismilieu or every object defined by thismilieu, and it only needs to return one milieu to describe the change(s). This invariance is an important feature of the language because it means that a method in EphrataX can act as a “black box” that can be re-implemented to modify objects in different ways without the need to change the parameters used to call it. Thus, EphrataX methods can implement abstract operations on arbitrary pieces of data. In the case above p1 is set to the result of calling the setVal( black box the parameter value v1.
So p1 contains a milieu that is (more than likely) different from thismilieu in some way. One may now be tempted to ask “How does one make use of the new context contained in p1 as opposed to the previous one in thismilieu?” Now that the black box has caused a modification, how is that modification interrogated? This is shown in the second line of the let block in the example above:
IntRef v3 = method{ [ val2 , p1 ].getVal( ) };
This line calls getVal( on the object val2 like previous examples. The difference is the milieu used in the call. The milieu used is p1 rather than thismilieu. If the object referenced by val2 had in fact been modified by the setVal( method in the first line of the let block, then v3 would show that change. But that is not the case in this example, so the method simply returns the same reference found in thismilieu. Getting the OID for v3 is important to completing the last line of the let block:
milieu p2 = method{ [ v3 , p1 ].setVal( v2 ) };
This creates a milieu p2 that is a modification of p1. This shows how successive modifications to a set of objects can be implemented in EphrataX. The first modification took thismilieu to p1. The second modification took p1 to p2. The choice to which milieu to employ in a particular modification is very important. For instance if the second modification took thismilieu to p2, then it would lose anything in p1 that was different from thismilieu. Sometimes losing information in this fashion is a good thing. As an example, it can be used to throw away modifications that were caused by erroneous input data. In the example above, however, the desired result is a milieu containing data from both modifications. Therefore, p2 is implemented as a function of p1.
Note that the objects modified by the method were objects other that the one that the method was called on. If one considers the ControllerClass instance to be a piece of data, and the IntRef objects to be its sub-data, the one can see how many aspects of data abstraction can be specified in a declarative fashion. Methods can be used to provide a more object-oriented implementation as follows:
class IntRefRef
{
private IntRef val;
public IntRef getVal( )
{ val };
public milieu setVal( IntRef inv )
{ member{ [ this , thismilieu ].val := inv } };
public milieu setReferencedVal( int inv )
{ method{ [ val , thismilieu ].setVal( inv ) } };
}
class ControllerClass
{
private IntRef val1;
private IntRefRef val2;
/* … */
public milieu modifyVals( int v1 , int v2 )
{
let
{
milieu p1 = method{ [ val1 ,
thismilieu ].setVal( v1 ) };
milieu p2 = method{ [ val2 ,
p1 ].setReferencedVal( v2 ) };
}
with p2 fi
};
}
This example is very similar to the previous ControllerClass example. Here the same modification is made to member val2 through the method setReferencedVal. This shows how modifications can be routed through multiple methods. The modifyVals( method calls setReferencedVal( which calls setVal(. It is the milieu returned by setVal( that is the one that actually shows up in p2. The object that gets modified by setReferencedVal( is two methods removed from the one on which modifyVals( was called. Arbitrarily sophisticated uses of method calls can be set up in a similar way. Any kind of modification of objects through method calls that is possible in a language such as Java [4] can be implemented functionally in EphrataX through the appropriate use of milieux.
Pairs and Object Creation
Like the language Objective C, EphrataX does not have constructors. Objects are constructed using class methods. This is advantageous for classes that are designed to only create objects in a selective fashion. For example, some classes are designed to only allocate a new object if an object of identical value does not already exist. A number of string implementations use this to ensure that two strings are identical iff. their OIDs are. This can greatly reduce the amount of computation time required for string comparison. Using a class method (as indicated by the static keyword) instead of a new operator gives each class direct control over when and how its instances are created. An example of a class method to create an object of class UniqueUnionClass is:
public static pair[ UniqueUnionClass ] new( StreamClass arga ,
StreamClass argb )
{
let
{
pair[ UniqueUnionClass ] q = class_method{
[ UniqueUnionClass , thismilieu ].allocate() };
pair[ NonUniqueUnionClass ] r = class_method{
[ NonUniqueUnionClass , q.milieu ].new() };
milieu q2 = member{
[ q.cobj , r.milieu ].arg1 := r.cobj };
}
with [ q.cobj , q2 ] fi
};
This example also introduces a new EphrataX data type called the pair. The pair of a object reference of type A is defined as . Pairs do NOT have OIDs, and they do not exist in milieux except as members of objects or classes. One can see why the pair is useful by thinking about what the new method does. It both creates a new object, and it creates a modified milieu that contains that new object. The pair makes it easy to return both of these entities to the caller at the same time. A pair is built by placing the item and milieu that it is to contain in brackets in a manner like the following: [ obj , thismilieu ]. Elements can be extracted from a pair using the .cobj and .milieu operators. The construction class methods, which can be given arbitrary names, allocate new objects by using the protected class method allocate( provided by the runtime. After allocating the object, the method is then free to initialize it in arbitrary ways (as the above example does).
Because a pair always contains two and only two elements, the compilation of a pair takes on many of the characteristics of the compilation of a primitive type. An EphrataX compiler treats pairs very differently from the manner in which a standard compiler would handle code generation for a typical derived type such as an array or a record. To review, expressions on a primitive type in typical languages are mapped to primitive operations (in a language like Three Address Code) that write to temporary variables defined by the compiler. Each primitive operation has one primitive temporary variable as its result. Similarly each primitive expression on an EphrataX pair, in the compiled representation, has two temporary variables as its result: one for the object reference and one for the milieu. Moreover each primitive operation on a pair maps to two Three Address Code operations, one for each of the two temporary variables that make up the pair. In this way pairs get mapped directly to lines of Three Address Code in the same way that a primitive type does, except that everything with a pair gets mapped to two lines instead of one. This allows operations on pairs to mapped into more efficient code than is otherwise possible. This is an important optimization because pairs play a fundamental role in the language.
Inheritance, Encapsulation, and Polymorphism with Functional Programming
An object-oriented language can be defined as one that supports inheritance, encapsulation, and polymorphism. EphrataX supports all of these concepts. Inheritance is declared using the extends keyword:
class abc extends xyz
{
private int added_field;
}
If the extends clause is not specified, the class is assumed to extend a common base class called Object. The first version of EphrataX is currently expected to support single inheritance, with multiple inheritance to be studied as a possibility for a future version. Encapsulation is provided by the private and protected keywords. These keywords can be used to prevent data from being accessed in an unauthorized fashion, which makes the software more robust against accidental corruption. Encapsulation can potentially also apply to classes. For instance, a declaration like the following could be created:
protected class xyz
{
/* … */
}
However, the exact semantics of class-level encapsulation have not been worked out at the present time. Polymorphism exists because a subclass can add members and/or override methods. All methods in EphrataX are “virtual” in the sense that they can be overridden by subclass methods unless the method is declared to be final. Further, any class can be subclassed unless the class is declared to be final.
Abstract classes and methods (also known as pure virtual methods) can be declared using the abstract keyword (similar to Java [4]). A method can only be abstract if it is contained in a class declared to be abstract. The allocate() method of a class will not exist unless the class is not abstract. A subclass of an abstract class is defined to be abstract unless it overrides all abstract superclass methods with non-abstract declarations.
In addition to its object-oriented features, EphrataX is also a fully functional language. Any EphrataX method can contain an entire ISWIM-like program. For example, here is an example of a program that performs recursive processing within a EphrataX method.
class FacClass
{
public static int evalFactorial( int inv )
{
let
{
int fac( int x ) = if x == 0 then 1
else x * fac( x – 1 ) fi;
int y = inv;
int outv = fac( y );
}
with outv fi
}
}
JUndo doesn't allow a function to be defined in a let block as shown above. Although doing so can be used to create some interesting mathematics, it can also be used to create bad software engineering. Functions like fac( should be declared as methods so that one can take advantage of inheritance, polymorphism, and encapsulation. This leads to cleaner designs.
EphrataX follows the Java [4] convention that all “code” (e.g. all functional programs) must exist within some method. That is to say, there are no “global” functions. All functions lie within the scope of some class. However as can be seen in the example above, this does not place any limitation on what one can actually accomplish. A EphrataX program can be executed by calling a class method with the curried value of the initial milieu (also available from the initial_milieu keyword), similar to the use of the main( method in Java applications. The precise mechanism for doing this has not yet been worked out. The current version of EphrataX under development contains a complete set of functionality for producing an functional program, but does not provide some of the operators from intensional programming such as “fby”, “prev”, “next”, and “at”. These operators will be addressed in a future version of the language.
Looking back, I eventually addressed the "fby" problem by doing something different. The typical "fby" expressions generate recursive schemas that can't be mapped into while schemas without sophisticated optimization techniques. One possible alternative was something similar to the Domain Relational Calculus in the Datalog language, but there were some severe mismatches inherent in this approach. JUndo now uses the "select-from-where" declarative syntax from Tuple Relational Calculus (TRC) languages such as SQL. That JUndo select doesn't actually use TRC even though the select expression has the same syntax. The advantage of the "select" syntax is that it's a declarative syntax that generates while schemas by default (while schemas are usually more efficient than recursive schemas). It also potentially permits some 4GL-like code optimizations in the future.
Seq Blocks
In addition to let expressions, EphrataX also provides seq expressions as a form of “syntactic sugar” to aid in the creation of code. The use of milieux can sometimes be tedious. In a let block, a milieu has to be declared for every state change, and then that milieu has to be used in the context of the proper method. The seq expression is intended to simplify this process by currying milieux in a sequential manner so as to produce a semantics resembling a sequential block of procedural code. Below is an example of an expression and its seq counterpart.
let
{
milieu a0 =[ abc , thismilieu ].setVal( 20 );
int q2 = [ xyz , a0 ].getVal( 15 );
milieu a1 = [ xyz , a0 ].setVal( 45 );
milieu a2 = [ def , a1 ].setVal( q2 );
milieu qx = [ ghi , a2 ].setVal( [ abc , a2 ].getVal() +
[ xyz , a2 ].getVal() );
}
with qx fi
seq thismilieu into
{
abc.setVal( 20 );
int q2 = xyz.getVal( 15 );
xyz.setVal( 45 );
def.setVal( q2 );
milieu qx = ghi.setVal( abc.getVal() +
xyz.getVal() );
}
with qx fi
As one can see, the seq block greatly simplifies performing an ordered set of tasks. It does this by using a set of heuristics to transform the lower seq expression into something computationally identical to the upper let expression. Essentially, the seq block is parsed as if it is a piece of procedural code, and then the compiler determines how to declare and curry the proper milieux so that each method gets the right conceptual snapshot of the heap. This isn’t as hard as it sounds. In EphrataX, a method can not have a “side effect” in the procedural sense unless it returns a milieu. So for instance the milieu that goes into the method on the third line of the seq block, xyz.setVal( 45 );
, is found by moving upward until one finds a line where a milieu was returned by the expression, but no milieu was declared as taking that expression. This yields the milieu returned by the first line of the block, abc.setVal( 20 );. The procedure becomes slightly more complex when path expressions are employed, but follows the same general principle.
Casting of Object Types
One of the powerful features of the Java [4] language is the ability to use typecast and instanceof operators to cast an object from a common superclass to a subclass. This is particularly useful when a data structure contains a number of instances of a superclass as its data, and one wants a traversal procedure to separate these instances based on their subclass memberships. Java automatically checks all object casts, and returns a cast value of null if the superclass object being casted is not actually a member of the desired subclass. This prevents an object reference from ever pointing to an object of the wrong type, and prevents the Java VM (at least in theory) from ever entering a situation where it might core dump. In EphrataX, a pair of type pair[ A ] can be cast to pair[ B ] where B is a subclass of A. Although a pair can be cast, an object by itself can not. This is because some object languages (NOT the current version of EphrataX) allow for subclass membership to be predicate-defined. That is to say, which subclass an object belongs to is based on a predicate whose truth can change at run-time. For instance, an object of superclass Person may have two subclasses called Child and Adult with a predicate stating that a Person is an Adult if its age attribute is 18 years or greater, otherwise it is a member of the Child class. Updating the age of a Person could then cause that Person object to switch from being an instance of Child to an instance of Adult. In the EphrataX paradigm, this means that that an object can belong to a different class depending on which milieu it is in. Although EphrataX does not implement this kind of functionality, letting the programmer cast an OID without reference to what milieu it is in would effectively prohibit any future extension of EphrataX from ever being able to directly use predicate-defined subclassing. So to leave the door open to possible future enhancements, it is a requirement that only a pair can be typecast. A EphrataX runtime can then check the actual snapshot of the object in the milieu specified as part of the pair before casting. An example of an object cast in EphrataX is as follows:
[ SubClass ]( [ super_class_object , object_milieu ] )
Path Expressions
Path expressions are a typical part of OO programming in languages such as C++ [5] and Java [4]. For instance, in a set of Java classes like the following:
class Z { /* … */ }
class Y {
/* … */
Z getZ( ) { return( MyZ ); }
Z MyZ;
}
class X {
/* … */
Y getY( ) { return( MyY ); }
Y MyY;
}
class W {
/* … */
X getX( ) { return( MyX ); }
X MyX;
}
a Java program can use a variable MyW of type W in an expression such as MyW.getX().getY().getZ(). A analogous expression can be written in EphrataX as long as one is conscious of which milieu is being curried in. One can make the milieux explicit by using the bracketed form [ [ [ MyW , p1 ].getX() , p2 ].getY() , p3 ].getZ() where p1, p2, and p3 are milieux.
Emulating Pass by Reference Semantics
Passing variables by reference is a technique commonly used in a number of procedural languages. In Java [4], the OID of a mutable object A can be passed as a parameter to a method B, and then the code for B can modify the state of A. Such a modification is typically referred to as a side-effect. Even though EphrataX is a functional language, it can take advantage of an object passed to a parameter in a similar way. Consider the following method:
public static milieu incrementObject( IntRef inv )
{
let
{
int v = inv.getVal();
milieu outv = [ inv , thismilieu ].setVal( v + 1 );
}
with outv fi
}
The method above takes in the OID of an IntRef inv (which is assumed to be valid in the context thismilieu). It then creates a milieu outv in which the value contained in inv has been incremented. Finally, it returns outv as the result of the method. To computations using the milieu returned by incrementObject(), it will appear as if the incrementObject() method had produced a side-effect on the passed object inv. In this way, it should be possible to translate any use of pass-by-reference into EphrataX code.
Re-Examining the previous MS-DOS prompt example
Now that many of the constructs of EphrataX have been defined, one might wonder how the MS-DOS prompt example given above would be modified if it were to be coded in EphrataX. The DOS prompt example listed earlier in this article used a “clever” recursive algorithm to define each character in a string for a particular time “t”. By “clever,” the author means that some amount of inductive sophistication is required on the part of the reader in order to fully comprehend why the code returns the correct prompt character for a particular value of ChrIdx. However, in a real-world program it is very hard to maintain a set of source code that repeatedly employs this kind of clever use of recursion. Object-oriented techniques in EphrataX can be used to shield the programmer from the complexities of these inductive relationships. Instead of recursion based on the value of ChrIdx, an EphrataX program would simply make use of a string class that supports methods for concatenating characters to strings, and extracting substrings. An example of this is given below assuming the existence of a String class patterned after the one in Java [4]. Although the code below is somewhat more lengthy than the previous DOS prompt example, it requires much less inductive sophistication, it is more modularly organized, and it can accept future modifications in a much more robust and flexible manner.
class DosPrompt extends Object
{
public static pair[ KeyEvent ] getKeyEventPair( int t )
{ /* … */ }
private static milieu getInitialStateMilieu( )
{
let
{
pair[ String ] strp = class_method{ [ String ,
initial_milueu ].new() };
milieu initm = member{
[ DosPrompt , strp.milieu ].StateString
:= strp.cobj };
}
with initm fi
}
private static milieu handleNormalBackspace( )
{
let
{
String prev_str = class_method{ [ DosPrompt ,
thismilieu ].getStateString( ) };
int len = method{ [ str , thismilieu ]
.length( ) };
Pair[ String ] new_str = method{ [ prev_str ,
thismilieu ].substring( len - 2 ) };
milieu outm = member{
[ DosPrompt , new_str.milieu ]
.StateString :=
new_str.cobj };
}
with outm fi
}
private static milieu handleBackspace( )
{
let
{
String str = class_method{ [ DosPrompt ,
thismilieu ].getStateString( ) };
int len = method{ [ str , thismilieu ]
.length( ) };
milieu outm = if len > 0 then
(
class_method{ [ DosPrompt ,
thismilieu ]
.handleNormalBackspace( ) };
)
else thismilieu fi;
}
with outm fi
}
private static milieu handleRegularChar( char InChar )
{
let
{
String prev_str = class_method{ [ DosPrompt ,
thismilieu ].getStateString( ) };
Pair[ String ] new_str = method{ [ prev_str ,
thismilieu ].concat( InChar ) };
milieu outm = member{ [ DosPrompt ,
new_str.milieu ].StateString :=
new_str.cobj };
}
with outm fi
}
private static milieu getModifiedStateMilieu( int t )
{
let
{
milieu prev_e = class_method{ [ DosPrompt ,
thismilieu ].getSysStateMilieu(
t – 1 ) };
pair[ KeyEvent ] prev_k = class_method{
[ DosPrompt , prev_e ]
.getKeyEventPair( t ) };
milieu prev = prev_k.milieu;
char InChar = method{ prev_k.getKeyChar() };
milieu outm = if InChar == KeyEvent.BACKSPACE then
(
class_method{ [ DosPrompt , prev ]
.handleBackspace() }
)
else
(
class_method{ [ DosPrompt , prev ]
.handleRegularChar( InChar ) }
) fi
}
with outm fi
}
public static milieu getSysStateMilieu( int t )
{
if t == 0 then
(
class_method{ [ DosPrompt , thismilieu ]
.getInitialStateMilieu() }
)
else
(
class_method{ [ DosPrompt , thismilieu ]
.getModifiedStateMilieu() }
) fi
}
public static String getStateString( )
{ StateString }
private static String StateString;
}
The DosPrompt class definition above makes use of two external classes, String and KeyEvent. String is the string class that the program uses to represent the state of the prompt. KeyEvent is a class whose instances represent keypress events from the keyboard, and is an implementation of command from the time stream example earlier in the paper. Similarly, getSysStateMilieu represents an implementation of getState from the time stream example, and getKeyEventPair represents an implementation of getCommand from the time stream example. Instead of a single character or integer, the getSysStateMilieu method returns a milieu that can potentially contain an arbitrarily complex set of objects, and can be accessed in arbitrarily complex ways by whatever interface code is dynamically linked to the class. For instance, the string instance StateString, that defines the state of the prompt for a particular milieu returned by the getSysStateMilieu method, can be retrieved by calling the static method getStateString with the milieu returned by getSysStateMilieu for a particular “t”. Once the string is retrieved, its length can be determined by calling the length instance method on the string in the same milieu. In this way, the single getSysStateMilieu method replaces two interrelated recursive methods, getLength and getStringChar, from the earlier example. A single recursive method returning a milieu can in a sense be equivalent to an arbitrary number of interrelated recursive methods returning primitive types in a similar fashion. Incidently, getSysStateMilieu is coded so as to be called with the initial_milieu milieu, regardless of the value of “t”.
The getKeyEventPair method shows one way in which instances of event classes can be used in EphrataX. This method, which is assumed to be dynamically linked to an external event-handling API, takes the curried milieu thismilieu and creates a new milieu that is identical to thismilieu except that it contains a new event object representing the input character for the time “t”. It then returns the pair [ X , Y ], where X is the new event object and Y is the new milieu that contains X. Arbitrarily complex event types can be handled by this kind of arrangement. The author believes that it is important to integrate event objects into the same milieux as the system of objects that holds the state that the events are affecting. This allows objects associated with the system state to hold references to various event objects, much as they do in C++ and Java programs now.
Potential Applications
One may wonder is EphrataX isn’t a great solution in search of a problem. On the contrary, there are a number of problems for which EphrataX could greatly aid the tasks of design and implementation. A few examples are given here.
Error Handling in Complex Systems
A subsystem may make one or more modifications to its data before an error is discovered. These modifications then have to somehow be removed so that the system can return to a consistent state. In procedural languages, it is often difficult to design an algorithm that can properly affect this reversal in state. Often the resulting implementation has a number of special cases that must be very extensively tested. Some languages attempt to use exception handling to deal with erroneous cases by providing a way for an error to interrupt program execution in a controlled fashion. However the state may have already been changed by the time the error is found, and exception handling provides no built-in mechanism for reversing these changes. This kind of state recovery can be handled in EphrataX by combining the use of milieux with a layered design technique. An example of this is shown below:
class RequestHandler
{
public pair[ boolean ] handleRequest( RequestClass e )
/* Handles a user request. If the request produces an error,
returns true and leaves the system in its original state.
Otherwise adds the request to the data state and returns
false. */
{
let
{
milieu ths = thismilieu;
pair[ boolean ] res = handleRequestImpl( e );
milieu output = if res.cobj then ths
else res.milieu fi;
}
with [ res.cobj , output ] fi
}
protected pair[ boolean ] handleRequestImpl( RequestClass e )
{
/* Actual implementation here, returns true if there
is an error. This part does not have to worry
about returning error conditions to the original
state. */
}
}
The design of this class has two conceptual layers. The outer public interface layer is represented by the method handleRequest, while the inner implementation layer is represented by the method handleRequestImpl. The outer layer is responsible for calling the inner layer, and then returning the data to a consistent state if an error is found. As one can see, this can be done very simply in EphrataX. As long as the inner layer correctly detects the errors, the outer layer will always be able to recover the data to the previous consistent state. This is true regardless of how complex the implementation of handleRequestImpl is, or the extent to which it made modifications to its data before the error was detected. As the design of the system for which error handling is required becomes more and more complex, the use of a language like EphrataX for the implementation becomes increasingly advantageous.
This has been used in Verdantium. For instance, that a Verdantium script execution fails halfway through there is some milieu rollback that is performed by the UndoManager. I want to do more with this in the future.
Undo
Unless a program is simple enough that every action requested of it has a straightforward generic inverse action, undo can be difficult to implement using procedural languages. The requirements of many complex interactive programs require both OO programming to simplify the actions and some mechanism so that those objects reverse state when the user selects the “Undo” option from a pull-down menu. A program as part of the implementation of its action may create several objects, remove all references to some others, add or remove objects from data structures, and/or create or delete other arbitrarily relationships between objects. Functional programming could greatly reduce the amount of time and effort required to design code that would reverse these actions. Similar to the error handling mechanism described previously, EphrataX could be used to store the milieu containing the object state prior to the user’s last operation in some structure. When the user hits “Undo”, the current milieu is replaced by the stored milieu, changing the program state back to its previous condition.
This is now being used by several Verdantium components.
Scripting
Another potential application is the implementation of “scripts” or “macros”, where a macro is a series of actions recorded by the user for use at a future time. When a macro is used, the program essentially handles an action that contains an ordered list of more primitive actions (that were recorded earlier). The ability to perform recursion on objects greatly simplifies the task of scripting. The program simply breaks the complex action into a series of simpler actions that are sequentially fed into the system. For systems that require both undo and macro support, the advantages are even greater. Because a milieu functions correctly irrespective of how complex a set of operations were used to create it, performing an undo operation on a script is no more difficult than performing the same undo on any other action. Sophisticated scripting systems do not require equally sophisticated undo mechanisms just to reverse whatever changes they made to the object state (as they do in many procedural languages).
As it turns out scripting and undo have to work very closely together in systems that support both. The paper on Temporal Undo from Xerox PARC speaks to this topic. It gets relatively complex for a lot of systems when a scripted event, consisting of several events, has to be rolled back. This is one place where JUndo works quite well.
Use of Formal Specifications
One technique that has been researched in Software Engineering is the formal specification of software. This involves defining the algorithm that a program uses in terms of a set of axioms, and then attempting to produce an implementation that closely conforms to those axioms. Because EphrataX is a completely functional language, it may be very useful for implementing axioms specified in a functional notation. EphrataX also provides the ability for the designer to axiomatically specify the operation of an interactive program. This is because the commands that are entered interactively can be specified in terms of a time stream. That is to say, the first command is the command at time 1, the second command is the command at time 2, etc. This can greatly simplify the development of certain kinds of interactive programs.
Re-entrant Programming
In imperative languages, many procedures temporarily leave some data in an inconsistent state until the procedure finishes executing. As a result, if the procedure were called recursively the nested call might return a incorrect or incomplete result. The term “re-entrant” is used to refer to a procedure that always returns the proper result from such a recursive call. When a procedure might return improper data from a recursive call, it is thus not re-entrant. In EphrataX, recursive calls can be made to give correct results even in algorithms that have not been implemented in a re-entrant fashion. The recursive call can be made using the milieu corresponding to the last consistent state of the algorithm. Using such a previous milieu guarantees that the algorithm will not miscalculate data due to an inconsistency of state.
Merging of Data Models
Because EphrataX merges the object-oriented and functional data models in a relatively seamless fashion, concepts from the language are potentially applicable in a variety of cases where two very different data models need to be merged into a single system. For instance, one area of active research in the database field concerns techniques for merging object-oriented and deductive databases. That is to say, there are some databases systems that are based on object-oriented data abstractions, and there are other databases that are based on concepts inspired by the Prolog language, including the ability to deduce facts from other facts. Merging declarative and procedural models into a single system can be problematic. The fact that EphrataX’s data model is both object-oriented and functional may make this task easier. Where a deductive system uses a relation, an functional system uses a function, which in terms of mathematics is just a special kind of relation. Also, the fact that EphrataX uses a completely declarative style of programming should make EphrataX programs more compatible with database query optimization algorithms than other languages (e.g. C++ [5]).
Updates to the state of a particular element of data are also much easier to represent in EphrataX than in certain other declarative languages (e.g. Datalog). EphrataX can either represent updates as declarative changes defined over a time stream, or can declare them in terms of “actions” that map one milieu into another. Interactive systems based on the time stream concept, such as Faustini’s Visual Java [3] project, have already demonstrated how functional programming can be used to represent updates to GUI programs in a declarative fashion.
Databases are not the only systems where complex data management issues arise. Many applications that do not engage in either persistent storage or transactions must nonetheless manipulate and access large sets of data. The more flexible data model in EphrataX makes it possible to design these applications without having to create entire subsystems devoted just to data management.
Implementation
At this time, many of the implementation details for this language are still being worked out. One important design consideration is the determination of the mechanism to use for evaluating recursive functions. For instance, consider the functions below for finding whether a Person object is the supervisor of another Person:
abstract class SupervisorFinder
{
public abstract boolean isImmediateSupervisor( Person A ,
Person B );
public abstract int getNumEmployees( );
public abstract Person getEmployeeByNumber( int num );
protected boolean isSupervisorImpl( Person A , Person B ,
int num )
{
let
{
boolean result_1 = if num < 1
then isImmediateSupervisor( A , B )
else let
{
person X = getEmployeeByNumber(
num );
boolean ba = isSupervisorImpl(
A , X , num - 1 );
boolean bb = isSupervisorImpl(
X , B , num - 1 );
boolean result_2 = ba && bb;
}
with result_2 fi
}
with result_1 fi
}
public boolean isSupervisor( Person A , Person B )
{
isSupervisorImpl( A , B , getNumEmployees() );
}
}
How can this be compiled into an efficient implementation? One approach is to attempt to convert the recursive methods into iterative methods performing the same function. This might be accomplished by having the computation performed in a bottom-up rather than a top-down manner. More research will need to be done in order to work out the details of this.
I am less concerned about bottom-up eval. now than I was when I wrote that. I simply don't have many classes that track such recursive relationships, and hence I don't have any recursion performance issues to worry about at present.
Implementation of Milieux and Caching
One potential criticism that some might have of EphrataX is that the creation of milieux from other complex milieux might require a lot of complex copying of data, leading to redundant data and poor performance. However, this is only true if each milieu contains a copy of all objects. Instead, a milieu can also be defined only in terms of the differences between itself and some other milieu. For instance, creating a new milieu that is an alteration of a previous milieu can be set up in such a way that the previous milieu is defined in terms of the differences between it and the more recent milieu. The previous milieu would both be compact and accessible, while direct access would be available to all objects in the latest milieu. In other cases, recalculation can be used instead of caching. If all the base information on which the program computed is still in memory, then any computation performed by the program can potentially be repeated. This saves a large amount of memory at the expense of computational time.
It may also be desirable to introduce language constructs that give the programmer more direct control over which methods have their results cached. In many programs written in procedural languages, programmers already cache the results of calculations in order to optimize the computational time the programs require. The only difference is they do it manually in terms of storing redundant data in programmer-defined structures. In effect, they cache implicitly instead of explicitly. This creates a number of problems in procedural designs because updating a particular abstract datum in the design requires updating multiple redundant pieces of data in the implementation. In cases where all the data items do not get updated in a manner consistent with each other, bugs eventually result from the inconsistent program states. It might make a lot of sense to try to replace the current implicit procedural caching with explicit functional caching. That is to say, auxiliary structures would not be as necessary to store the values returned by some methods because those values would always be cached by the runtime itself.
Garbage Collection
An EphrataX system may need to engage in two different kinds of garbage collection. First, there is the garbage collection of milieux in the global heap of the EphrataX runtime. Second, there is the possibility that an object in a milieu can be garbage collected because the EphrataX program is no longer using it. As an example of the garbage collection of milieux, consider the following expression:
let
{
milieu p1 = member{ [ x , thismilieu ].val1 := 10 };
milieu p2 = member{ [ x , p1 ].val2 := 15 };
}
with p2 fi
The milieu p1 may be created in the process of evaluating the expression for p2, but once p2 is needed, there may no longer be a need to have a direct reference to the value of p1. The program may have gotten to a point where the above let block is never evaluated again, and thus it will be impossible for any expression to directly access p1. At this point, the milieu p1 can be garbage collected by the system. The milieu p2 could then be defined in terms of thismilieu, rather than the milieu p1. The mechanism for doing this is probably beyond the scope of this paper. Nevertheless, it is possible to reduce a system to only those milieux that are being used at a particular time.
Garbage collection of programmer-defined objects in EphrataX is different from garbage collection in other programming languages. A reference to an OID may exist in a number of different milieux. An OID from a milieu may be combined with an OID from any other milieu to produce a pair that can be used to attempt to reference the value of that object (the runtime has the responsibility of making sure that a particular object is actually in a particular milieu before attempting to access it). This means that if a reference to an OID exists in any milieu, then “copies” of the object in all milieux must be immune from garbage collection. Otherwise, a EphrataX program would not be guaranteed to have access to the “past” or “future” state of an object. At first this may seem to require a prohibitive amount of memory compared to a procedural language, which can garbage collect an object in a particular snapshot of the heap if there are no references to the object in that snapshot. However, if one uses a smarter implementation of the concept of a milieu, this is not the case. A milieu does not need to contain copies of all objects, only the differences between itself and another milieu. For an object that is no longer in use, there is nothing to garbage collect from such a milieu because there are no changes being made to that particular object. In many cases, such a milieu implementation can store sets of related object values in a space-efficient manner without ever garbage collecting. Once an object stops changing, those changes simply never take up any space in a subsequent milieu.
This has a number of potential implications for the performance of languages that use both garbage collection and some mechanism where multiple “copies” of a heap are kept in memory. Because garbage collection may not be employed, the time overhead required for garbage collection may not be necessary. This is at the expense of storing information about the previous changes to all objects. However, if the milieux that contain certain object states become garbage collected then some references to OIDs can disappear because the milieux containing those references no longer exist. So the garbage collection of a milieu can potentially create a condition where the garbage collection of an object can take place because there no longer exists a milieu containing the OID of the object in question. As a result the garbage collection of milieux can result in increased computational overhead to garbage collect objects, but at the benefit of reducing the amount of storage space required by the program.
Future Research and Concluding Remarks
Solutions have been presented for how a functional language can be used to investigate several software design problems in a cleaner way than is possible in procedural languages. Further a language, EphrataX, has been described which has been designed for these types of problems. A number of possible future research directions exist. One question that comes to mind is “what should be the equivalent of CORBA for systems that use an functional object model similar to that in EphrataX?” Current object standards are not expressive enough to fully support the use of objects in the context of functional programming. New procedures would need to be developed in order to guarantee interoperability between functional object systems.
There are also a number of possible improvements to the language itself. EphrataX needs a more complete set of primitive data types, as well as additional complex types such as arrays. The problem of making objects persistent needs to be researched, as do the issues of improving performance, and providing suitable interfaces to the outside world. By “outside world”, the author is referring to a number of external systems. A GUI is one prominent example of such a system, but so is something such as an interface to a serial port, or an external piece of procedural software.
Nevertheless, it is believed that EphrataX provides exciting possibilities for future growth. The author plans to present future developments of the language in a subsequent paper.
References
1. W. W. Wadge, and E. A. Ashcroft, Lucid, the Dataflow Programming Language, Academic Press, New York, NY, 1985.
2. P. J. Landin, ‘The Next 700 Programming Languages’ CACM, 9, 157-166 (1966).
3. T. Faustini, Visual Java, Computer Software, (unpublished).
4. K. Arnold, and J. Gosling, The Java Programming Language, Addison-Wesley, Menlo Park, CA, 1996.
5. B. Stroustrup, and M. A. Ellis, The Annotated C++ Reference Manual, Addison-Wesley, Menlo Park, CA, 1990.
No comments:
Post a Comment