The Undo-Redo paradox

Friday, March 20, 2009

In July 2008 I started development on LLBLGen Pro v3's new designer. The first thing I realized was that I needed a good, solid, generic framework to base the new designer on, especially because v3 would introduce a big new feature: model-first entity model development. In short, model-first means that the user starts the designer and can build an entity model from scratch (so no meta-data is available whatsoever) and create meta-data and mappings from that entity model, or modify an existing or reverse-engineered model by adding new elements. So the user will edit, delete, and do other things which aren't based on any meta-data, but on theory, thought processes and perhaps trial and error. The user will make changes to a live model in memory and will try to undo and redo these changes during the process. Everywhere. Always. So undo/redo has to be present everywhere, in every situation. Removing an element, like an entity definition, should remove all its related and dependent elements, or at least make them update themselves, and undo-ing that removal should restore the original state.

The framework I had in mind would need to be able to undo any edit action, any change. I also needed a new set of data-structures to store the entity model in. In v2.x of LLBLGen Pro, the entity model is stored in an 'enclosed way': if an entity E has a relationship with entity F, it has a relationship object R which is stored inside E (if F also has a relationship with E, there's a relationship object representing that relationship which is stored in F). While this might be a natural way of storing object graphs (the graph edges are the references between the vertices), it leads to a problem: you can't reason over the entire model, as that always requires traversing the object tree by digging through one object (e.g. the EntityDefinition instance which contains the instance of the entity relationship) to get to other elements. A graph object with vertices and edges (so the entities would be the vertices and the relationships would be the edges) would make it easier to reason over the model.

To be able to undo any change to a model, you need to have some kind of mechanism to perform the change in the first place and then simply revert the action the mechanism performed. This is solved with the Command pattern. In short, it describes a way to perform actions (the 'commands') onto a data-structure or other element and as you have described the action through a command, you can extend it to perform another action when the command action has to be 'undone' or better: rolled back. However, my v2.x code base of LLBLGen Pro doesn't use the command pattern to do its actions, as it never needed to: to get things done you call methods on objects which call other methods, set properties etc., the basic OO style of maintaining an object model in memory. Implementing everything through commands now seemed like a lot of work: imagine every property get/set action has to be done through commands so the change is undo-able, every method call made might change internal members and these changes have to be undoable as well.

Algorithmia

I decided to solve this properly and from the ground up so I started working on a separate project: Algorithmia. Algorithmia started as a .NET 3.5 class library I wrote in my spare time to learn .NET 3.5's new lambda stuff and which contained some well-known algorithms and data-structures which weren't in the .NET 3.5 BCL (or not implemented in a useful manner). So I implemented in-place sort algorithms (so these sort a data-structure in-place, not like Linq's OrderBy() methods which return a new enumerable) as extension methods, a couple of priority queues and heaps like a full Fibonacci Heap. Algorithmia seemed (and still is) perfect to add my general purpose algorithms and data-structures to, and the undo-redo algorithms and related classes are no exception.

After some long, deep thinking I realized I needed two fundamental things to meet LLBLGen Pro v3's requirements: a general purpose undo/redo mechanism and a set of data-structures, like a graph structure, which are undo/redo aware. The two separate areas should have one thing in common: undo/redo should be transparent to the user (the developer). By transparent I mean:

someObject.Name = "Some String";

where the property set action in the statement above should be undoable (and redo-able). The traditional command pattern approach would have forced one to write such a simple statement with a command, so the action (setting the property) would be undoable by setting it again to the original value. I wanted it solved differently, so I didn't need to write command calls everywhere and could also leverage databinding, events and other things built into the .NET framework.

Commands, Command queues and their overlord manager.

To understand what undo/redo really means and how complex it can get, let's look at an example. Say I have a graph with two entity definitions: Customer and Order, and a one to many relationship R between them. Furthermore I map a foreign key field onto R in Order (so it points to Customer's identifying fields, which happens to be CustomerID). All nice and dandy. I feel a bit bold today and I select Customer and hit the DEL key. Obviously, the Customer entity definition is deleted from the model. But that's not enough. To remove it from the model, I have to remove it from the graph and because I do that, I have a dangling relationship (R) which has to be removed from the graph as well. If R is removed, the foreign key field in Order also has to be removed as it's based on R. Pressing DEL sounds rather complex all of a sudden.

The traditional command pattern approach suggests that you issue the action to remove Customer from the entity model graph through a command. However, that immediately gives a problem: what has to happen to the actions which follow immediately after the removal, like the removal of R and the foreign key field? Do we have to add these commands to the command which did the removal of Customer from the graph or not? If we don't, undo-ing the removal of Customer doesn't automatically undo the follow-up actions as well, as these seem to be unrelated. But if we do add these commands to the initial command, it will create a complex piece of code which is also immediately unmaintainable, as it has to know about all things which could happen after we've removed Customer from the graph.

The Command

To undo an action, you can take several approaches. For example, you could use the transactional approach where you make changes to a temporary space and finalize them when you commit the transaction. Another approach is to read the initial state right before the Do action is performed and have Undo simply restore that state. I've taken the second approach as it is more flexible: there's no transaction to commit. A change is a change and it's final; however, it's always undoable. What makes this easy is the introduction of lambdas in .NET 3.5. Do and Undo are simply lambdas. The command has support for lambdas which read the state before the Do lambda is called, and the Undo action simply passes the original state into the Undo lambda and the action is undone. There are various bells and whistles added to that of course, but that's the basic idea.
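The idea of Do and Undo as lambdas, with the original state read right before Do runs, could be sketched roughly as follows. This is a minimal illustration and not Algorithmia's actual Command class: the names SketchCommand and SketchCommandDemo, and the constructor shape, are assumptions made for the example.

```csharp
using System;

// Hypothetical sketch, not Algorithmia's Command class.
public class SketchCommand<TState>
{
    private readonly Action _doAction;
    private readonly Func<TState> _getState;
    private readonly Action<TState> _undoAction;
    private TState _originalState;

    public SketchCommand(Action doAction, Func<TState> getState, Action<TState> undoAction)
    {
        _doAction = doAction;
        _getState = getState;
        _undoAction = undoAction;
    }

    public void Do()
    {
        // Read the state right before the change is made...
        _originalState = _getState();
        // ...then perform the change itself.
        _doAction();
    }

    public void Undo()
    {
        // Undo simply passes the original state into the Undo lambda.
        _undoAction(_originalState);
    }
}

public static class SketchCommandDemo
{
    // Renames a value, undoes the rename, and returns both observed values.
    public static string[] Run()
    {
        string name = "Customer";
        var rename = new SketchCommand<string>(
            () => name = "Client",          // Do
            () => name,                     // state reader
            original => name = original);   // Undo
        rename.Do();
        string afterDo = name;
        rename.Undo();
        return new[] { afterDo, name };
    }
}
```

Because the state is captured inside Do, the caller never has to remember the old value: whoever creates the command only describes how to read, change and restore it.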

The Command Queue

To be able to manage when to undo what, commands are placed in stack-like data-structures: the last command placed into the data-structure is the first to undo. However, that's a 1-dimensional data-structure. In my example above, undo-ing the removal of Customer requires the undo-ing of the removal of R and the removal of the foreign key field. So I created a Command Queue. A Command Queue is internally a Linked List with a simple pointer to the last command. (It uses a more flexible implementation than the BCL's LinkedList class: concatenating two of these Linked Lists takes O(1), as it should.) Commands are placed in a Command Queue, one after the other. This gives the flexibility of undo-ing and redo-ing them by simply moving a pointer along the Linked List inside the Command Queue.

To be able to undo a command which spawned other commands, I placed a Command Queue inside every Command. This gives the advantage that when I undo a command, it first calls Undo on all commands in its own queue and then it performs the Undo of itself. Undo-ing a command inside its queue could mean that that command also will perform an Undo action on several other commands first. And here we have our multi-dimensional structure we needed for the situation of our example. However it of course gives another problem: how do we get all these commands neatly nested into each other without any hassle?
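A rough sketch of that nesting, under stated assumptions: the NestedCommand class below is hypothetical, it uses a plain List<T> instead of the linked-list based Command Queue, and it records the undo order in a log purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a command which owns a queue of the commands it spawned.
public class NestedCommand
{
    private readonly string _name;
    private readonly Action _undoAction;

    // Commands spawned while this command ran. A plain List stands in for the
    // linked-list based Command Queue described in the text.
    public readonly List<NestedCommand> Queue = new List<NestedCommand>();

    public NestedCommand(string name, Action undoAction)
    {
        _name = name;
        _undoAction = undoAction;
    }

    public void Undo(List<string> log)
    {
        // First undo everything this command spawned, last one first...
        for (int i = Queue.Count - 1; i >= 0; i--)
        {
            Queue[i].Undo(log);
        }
        // ...and only then undo this command's own action.
        _undoAction();
        log.Add(_name);
    }
}
```

If a root command holds commands A and B in its queue, and B in turn holds C, calling Undo on the root undoes C, then B, then A and finally the root itself.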

The Command Queue manager

I created a thread-safe singleton class which manages the command queues, the CommandQueueManager. This manager is fairly straightforward: it's the interface for the developer to undo/redo anything, to enqueue and execute commands and to keep everything working in the right order. There are some static helper methods on the Command class to easily enqueue itself, but in general the manager is the one to talk to (ain't that always the case?).

The bare-bones mechanism comes down to this: it has an active stack of Command queues and a command which comes in to be executed is simply placed in the command queue at the top of the stack. If a command is executed its queue is placed on top of this stack and every command that gets created while this command is executed is thus placed inside the queue of the command which originated it. When the command is done with its Do method, its queue is popped and the previous queue is now at the top of the stack, which can be the queue of the previous command or the main queue of the manager.
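The stack-of-queues mechanism could look roughly like the sketch below, applied to the Customer/R/foreign key example. Everything here is an assumption made for illustration: the Cmd, QueueManagerSketch and DeleteCustomerDemo names are hypothetical, and thread-safety, scopes and undo/redo themselves are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; names and shapes are illustrative, not Algorithmia's API.
public class Cmd
{
    public readonly string Name;
    public readonly Action DoAction;
    public readonly List<Cmd> Queue = new List<Cmd>();   // commands spawned by this one

    public Cmd(string name, Action doAction)
    {
        Name = name;
        DoAction = doAction;
    }
}

public class QueueManagerSketch
{
    public readonly List<Cmd> MainQueue = new List<Cmd>();
    private readonly Stack<List<Cmd>> _activeQueues = new Stack<List<Cmd>>();

    public QueueManagerSketch()
    {
        _activeQueues.Push(MainQueue);
    }

    public void EnqueueAndRun(Cmd command)
    {
        // The incoming command lands in whatever queue is on top of the stack.
        _activeQueues.Peek().Add(command);
        // While it runs, its own queue is the active one, so commands created
        // during Do end up nested inside it.
        _activeQueues.Push(command.Queue);
        try { command.DoAction(); }
        finally { _activeQueues.Pop(); }
    }
}

public static class DeleteCustomerDemo
{
    public static QueueManagerSketch Run()
    {
        var manager = new QueueManagerSketch();
        var removeFk = new Cmd("remove FK field", () => { });
        // Removing R triggers the removal of the foreign key field.
        var removeR = new Cmd("remove R", () => manager.EnqueueAndRun(removeFk));
        var removeCustomer = new Cmd("remove Customer", () => { });
        var uiDelete = new Cmd("UI: delete Customer", () =>
        {
            manager.EnqueueAndRun(removeCustomer);
            manager.EnqueueAndRun(removeR);
        });
        manager.EnqueueAndRun(uiDelete);
        return manager;
    }
}
```

None of the commands in the demo knows where it will be stored: the nesting falls out of which queue happens to be on top of the stack at the moment a command is enqueued.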

Scopes and threads

Singletons have the side-effect that there's just one instance at runtime, which is nice because that's the reason they're there. The downside is of course that multi-threaded applications have to deal with a shared resource and that's always a sign trouble is ahead if you're not careful. The manager is thread-safe, which means only one thread can queue and work with commands at any given time. Per thread there's also one stack, so different threads can't add commands to each other's command queues. In a way, per thread there's a unique scope. Such a scope consists of a command queue stack. It might be handy in some cases in single-threaded approaches as well: what if you want to create a boundary in which a user can undo/redo actions but when the user closes the form for example the actions are final? That requires a unique scope for that edit form. The Command Queue manager can deal with that, you simply ask for a scope with a new ID and you get it. If you ask for the scope of an ID which was already known, the scope of that ID becomes active.

Back to our example...

So let's go back to our Customer, Order, R and the foreign key field. The user selected Customer and pressed DEL. The UI controller calls into the main system and asks the Project to remove entity Customer. The Project then starts working, but what exactly does it have to do? Remember that I needed a graph to be undo/redo aware. The Entity Model is implemented using Algorithmia's graph class where entity definitions are vertices and relationships are edges (non-directed edges). The graph is undo-redo aware, it manages itself through commands. So removing the Customer entity definition from the Project is as simple as telling the graph to get rid of the Customer instance it has inside itself as a vertex. The UI controller called the single Project method through a command. That command's Command Queue was placed onto the stack of the current scope and its Do lambda was executed. All commands added to the Command Queue Manager will end up in this queue or in a nested queue, so undo-ing the removal will undo all these commands as well.

The graph removes the vertex Customer from itself through a command, which is placed inside the UI call command's queue. The graph notices a dangling edge, R. It removes it too, also through a command, and this command is also placed in that same queue. And now things start to get interesting: when R was removed from the graph, the graph called a method on the edge which raised the edge's ElementRemoved event. Is anyone listening to that? Yes, the object which is used inside the foreign key field inside Order. As the relationship has been removed (as signaled through the event), the foreign key field has no purpose anymore and has to remove itself as well. As it is placed inside a command-aware list, called CommandifiedList<T>, it simply removes itself from its container, though that container does the actual removal through a command. That command ends up in... the queue of the removal of R, as that was the active command in progress and that command's queue is on top of the stack.

So after all this, we have a nested set of commands which we can undo, in the right order, and which we can also redo, in the right order, without the complexity of requiring command creation everywhere or being aware of which command is spawned from where... none of that at all: it's straightforward .NET code like you and I are used to writing.

Undo-ing this Customer removal starts by calling the Undo method on that command. As that command contains a queue with two commands (removal of Customer from the graph and removal of R from the graph), it starts with the last command, which is the removal of R, and calls Undo on that. The removal of R command also has a command in its queue, namely the removal of the foreign key field, and starts undo-ing that command first. This makes sure everything is played back in the right order.

But what about that simple property setter example we started with? Let's look at the logic behind that simple statement and how things are made transparently undoable.

The little worker class under the hood: CommandifiedMember

The following code snippet shows a simple test class used in some unit-tests for the command functionality:

public class HelperClass
{
    private enum HelperChangeType
    {
        Name
    }

    private readonly CommandifiedMember<string, HelperChangeType> _name;

    public HelperClass()
    {
        // create a new commandifiedmember instance and set the default value to empty string
        _name = new CommandifiedMember<string, HelperChangeType>("Name", HelperChangeType.Name, string.Empty);
    }

    public string Name
    {
        get { return _name.MemberValue; }
        set { _name.MemberValue = value; }
    }
}

To combine a lot of functionality around a single member which was needed in a lot of cases I created a class called CommandifiedMember. CommandifiedMember does a lot of things: it sets the value of the member using commands, so setting the value is undo-able. It checks whether the value to set is equal to the current value of the member, so it doesn't issue unnecessary commands. It raises events when the value changes so observers can subscribe on these changes and act accordingly. It has awareness of interfaces which might be implemented on values set as the member value. This is important in the case of the foreign key field of our example: if the identifying field's type changes, the foreign key field's type also changes. To be aware of that, it needs a signal from the identifying field it relates to. Simply changing the identifying field's type will raise an event which will end up in the foreign key field's member which notices this as it automatically subscribed to the event as it recognized it. The member then simply raises an event so the foreign key field notices this and can act accordingly. Similar to removing the relationship R for example: R is removed, so it raises an event that it's been removed. Observers, like CommandifiedMember instances which refer to it, can now act accordingly and set themselves to null or raise an event for example.
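To make the first three responsibilities concrete (undoable set, equality check, change event), here is a heavily simplified stand-in. It is not the real CommandifiedMember: the UndoableMember name is hypothetical, a plain Stack<Action> is assumed in place of the Command Queue Manager, and the interface awareness described above is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, heavily simplified stand-in for CommandifiedMember.
public class UndoableMember<T>
{
    private T _value;
    private readonly Stack<Action> _undoStack;   // stands in for the command queue

    public event EventHandler ValueChanged;

    public UndoableMember(T initialValue, Stack<Action> undoStack)
    {
        _value = initialValue;
        _undoStack = undoStack;
    }

    public T MemberValue
    {
        get { return _value; }
        set
        {
            // Same value as the current one: no command is issued.
            if (EqualityComparer<T>.Default.Equals(_value, value)) { return; }
            T original = _value;   // state read before the change
            _undoStack.Push(() => Assign(original));
            Assign(value);
        }
    }

    private void Assign(T newValue)
    {
        _value = newValue;
        // Raised on both do and undo, so observers see every change.
        EventHandler handler = ValueChanged;
        if (handler != null) { handler(this, EventArgs.Empty); }
    }
}
```

A class wrapping its fields in such a member keeps an ordinary property setter on the outside, which is what makes the undo support transparent to the calling code.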

The code snippet above doesn't show it, but there's more built in: it is also IDataErrorInfo aware. This is done through an object which is pluggable into a CommandifiedMember and which is also part of Algorithmia, called ErrorContainer. CommandifiedMember is aware of validation and calls a virtual method before it continues to call the Do action. It takes care of logging the error in the ErrorContainer and if a correct value is accepted, it clears the error accordingly. The code snippet above also shows the usage of an enum which is used for the change-type specification. This is useful if you want to use undo/redo to its full potential and implement a lot of logic through events using the Observer pattern: HelperClass could sport an ElementChanged event which propagates the HelperChangeType to its subscribers, which could then easily determine what exactly changed in HelperClass without the need for a lot of events, while also avoiding string-based approaches like INotifyPropertyChanged.

So with the CommandifiedMember in place, I can create the following, undo/redo aware code:

HelperClass h = new HelperClass();
h.Name = "Foo";

By setting the property, I indirectly create a command which sets the actual member, compile-time checked. I can undo this action by simply asking the Command Queue Manager to undo the last command. However, I'm not even aware that setting the property is an undo/redo aware affair, nor do I care. I simply write code like I used to do, without the hassle of creating commands to make sure things are undoable later on. With explicit commands, missing one spot makes some things suddenly not undoable; with the CommandifiedMember, that's not possible. As it's transparent, I can bind the Name property of an instance of HelperClass to a control and have undo/redo awareness without even writing any code: if the control sets the value of Name, it will be undoable. Of course, to make the control become aware of the fact that Name has been rolled back, I have to implement INotifyPropertyChanged on HelperClass, but that's pretty easy to do: I get an event when _name changes, so I can respond to the change with a simple event handler:

public class BindableHelperClass : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private enum HelperChangeType
    {
        Name
    }

    private readonly CommandifiedMember<string, HelperChangeType> _name;

    public BindableHelperClass()
    {
        _name = new CommandifiedMember<string, HelperChangeType>("Name", HelperChangeType.Name, string.Empty);
        _name.ValueChanged += new EventHandler<MemberChangedEventArgs<HelperChangeType, string>>(_name_ValueChanged);
    }

    private void OnPropertyChanged(string propertyName)
    {
        if(this.PropertyChanged!=null)
        {
            this.PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }

    private void _name_ValueChanged(object sender, MemberChangedEventArgs<HelperChangeType, string> e)
    {
        switch(e.TypeOfChange)
        {
            case HelperChangeType.Name:
                OnPropertyChanged("Name");
                break;
        }
    }

    public string Name
    {
        get { return _name.MemberValue; }
        set { _name.MemberValue = value; }
    }
}

I introduced a switch for checking on the change type, which is a little overkill as there's just 1 member, but you get the idea. It's not really more code than one would write in the case of a simple normal class, however you get value checking, event raising, undo/redo etc. all for free. Binding the Name property of an instance of this class to a control, say a TextBox, will make it possible to edit this instance with undo/redo awareness.

So where does this 'Paradox' of the title come into play exactly? Well I think you now know enough information to understand the following example of it.

The Undo/Redo 'Paradox'

The Undo/Redo 'paradox', as I dubbed it (probably a bad name, so forgive me), is the contradiction between what the user thinks is being undone and what the system thinks the user wants undone. I've put 'paradox' in quotes as people sometimes call things a paradox while they clearly aren't one, and I'm not yet sure if this is a true paradox, though I have a feeling it unfortunately is.

I've created a real-life example of the paradox in the following screenshot. It's a screenshot of a part of the LLBLGen Pro v3 GUI (where I moved everything close together so it fits in a tiny area):

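[Screenshot: part of the LLBLGen Pro v3 GUI, showing the Project Explorer, the Customer entity editor and the command queue manager debug panel]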

There's a lot of info in this tiny screenshot and I'll briefly describe what's important to understand the problem. The project shown is a dummy test project with a couple of random entities. At the left you'll see the Project Explorer which shows the groups, the entities, the value types and the typed lists (some elements are still not there, in case you're missing something; it's not done yet). At the right of the Project Explorer you see the editor for the Customer entity, which is a subtype of Person, and below it a debug panel for the command queue manager where I can see which commands are in the queue and inside which other commands they're stored. As you can see, after I've loaded the project, I created a typed list called Test which spawned one command, the addition of a new item to a CommandifiedList. The arrow suggests it's the current command, so pressing cntrl-Z or clicking Undo in the toolbar will undo that command.

So, what's the problem? Well, it's at the top: I typed a space in the entity name and tabbed away from the textbox. The validator plugged into the CommandifiedMember kicked in and denied the value and reported an error: names can't have spaces. So the cursor stays in that textbox.

What will happen if I press cntrl-Z or click Undo? Will that undo the change I made inside the textbox by undoing the insertion of the space, or will it undo the last command it knows, creating the typed list?

The 'paradox' is that the system isn't aware of any command setting the Entity Name to an invalid value (as that would make the project become erroneous: what if I entered a name which is already taken?), whereas the user is. The textbox has its own cntrl-Z mechanism, where pressing cntrl-Z will undo the changes in the textbox, which in the case above would remove the inserted space and everything would be normal. However, what does the user mean when issuing the undo command: local undo or global undo? And when do local undos all of a sudden become global undos?

In general: there's a global undo/redo system with a global access mechanism (cntrl-Z/cntrl-Y) and there are two different scopes in play, the local editor scope and the global model scope. Issuing an Undo action raises the question: do you want to undo a local action which might not be propagated to the global model scope (e.g. the change hasn't been processed yet) or do you want to undo the last change at the global model scope level? This isn't an easy question to answer, as I hope to illustrate in the explanations below.

A perhaps more well-known example of this problem is the issue you run into with the Windows Forms designer and after that when you change code in the form class: after you've made some changes to a form in design view, you switch to the class and add some code, like a member declaration. Then press cntrl-Z a couple of times till you've undone all your changes to the code and you'll likely see a message box pop up which tells you that you can undo one last thing which can't be redone. Why is that?

It's the same issue: suddenly the local scope you were working in (the code editor) has no more commands to undo and pressing cntrl-Z again then raises the question: does the user want to undo more commands in the editor (though there aren't any left) or does the user want to undo things on a model/global scale, like the changes made to the design view of the form? That's unclear and can't be solved by the undo/redo system by itself: perhaps the user only wants to undo/redo the changes in the editor (like the textbox or code editor) and stop undo-ing commands if there aren't any left in that scope. However, perhaps the user wanted to undo things on a global scale after all commands in the local scope are undone, and to do that the user has to leave the editor to signal that the undo action is not for the local scope. This is of course confusing and unclear for a user, as the user isn't aware of the length of command queues or even of local and global scopes.
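The two candidate behaviours can be written down in a few lines, which also shows why code alone doesn't settle the matter. The sketch below is purely illustrative: UndoRouterSketch and its fallThroughToGlobal flag are hypothetical, and the two stacks stand in for the local editor scope and the global model scope.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of routing a single Undo request to one of two scopes.
public class UndoRouterSketch
{
    public readonly Stack<Action> LocalScope = new Stack<Action>();    // e.g. edits inside a textbox
    public readonly Stack<Action> GlobalScope = new Stack<Action>();   // model-level commands

    // Returns which scope handled the request: "local", "global" or "none".
    public string Undo(bool fallThroughToGlobal)
    {
        if (LocalScope.Count > 0)
        {
            LocalScope.Pop()();
            return "local";
        }
        // The local scope is exhausted: either stop here, or silently start
        // undo-ing changes on the model.
        if (fallThroughToGlobal && GlobalScope.Count > 0)
        {
            GlobalScope.Pop()();
            return "global";
        }
        return "none";
    }
}
```

Either value of the flag is trivial to implement. What the code cannot know is which of the two the user had in mind when pressing the key.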

In the specific situation of the screenshot above, there are a couple of obvious things one might try to solve this paradox, like disabling the global undo/redo mechanism when an error occurs. However, that doesn't solve the situation where I don't create an error but simply append a couple of characters to the name and then press cntrl-Z. One could think of introducing a scope used only for the textbox, but it then gets tricky to get rid of that scope once the value is indeed valid, as that action has to be in the global scope to be undoable on a global scale (so I don't have to go back to the textbox to undo the name change). Another solution might be to store the invalid value in the model and simply use the mechanism available, so pressing cntrl-Z will undo the change which caused the error. The downside is that if the user presses cntrl-S after the change, the erroneous value is saved, which could cause a problem. For example, if the file format is XML and elements are referenced by name, what happens if I specify a name which is already in use, which is an error trapped by the validator, and then still save the project?

I can't find a simple solution for this 'paradox', and I fear there isn't one either, but perhaps some solution pops up soon.

LLBLGen Pro v3.0 is slated for release later this summer/autumn, with support for LLBLGen Pro Runtime Framework, Entity Framework, NHibernate and Linq to Sql. Algorithmia is shipped with LLBLGen Pro v3, very likely in source code form and under a flexible license, so you can use it in your own applications as well.