Collaborative Editing in JavaScript: An Intro to Operational Transformation

I've set out to build a robust collaborative code editor for the web. It's called Codr, and it lets developers work together in real time - like Google Docs for code. For web developers, Codr doubles as a shared reactive work surface where every change is instantly rendered for all viewers. Check out Codr's newly launched Kickstarter campaign to learn more.

A collaborative editor allows multiple people to edit the same document simultaneously and to see each other's edits and selection changes as they occur. Concurrent text editing allows for engaging and efficient collaboration that would otherwise be impossible. Building Codr has enabled me to better understand and (I hope) convey how to build a fast and reliable collaborative application.

The Challenge

If you've built a collaborative editor or have talked with someone who has, then you know that gracefully handling concurrent edits in a multi-user environment is challenging. It turns out, however, that a few relatively simple concepts greatly simplify this problem. Below I'll share what I've learned in this regard through building Codr.

The primary challenge associated with collaborative editing is concurrency control. Codr uses a concurrency control mechanism based on Operational Transformation (OT). If you'd like to read up about the history and theory of OT, then check out the wikipedia page. I'll introduce some of the theory below, but this post is intended as an implementor's guide and is hands-on rather than abstract.

Codr is built in JavaScript and code examples are in JavaScript. Significant logic needs to be shared between the server and client to support collaborative editing, so a node/iojs backend is an excellent choice. In the interest of readability, code examples are in ES6.

A Naive Approach to Collaborative Editing

In a zero-latency environment, you might write a collaborative editor like this:

Client

editor.on('edit', (operation) => socket.send('edit', operation)); socket.on('edit', (operation) => editor.applyEdit(operation));

Server

socket.on('edit', (operation) => { document.applyEdit(operation); getOtherSockets(socket).forEach((otherSocket) => otherSocket.emit('edit', operation) ); });

Every action is conceptualized as either an insert or delete operation. Each operation is:

Locally applied in the editing component Sent to the server Applied to a server-side copy of the document Broadcast to other remote editors Locally applied to each remote editor's copy of the document

Latency Breaks Things

When you introduce latency between the client and server, however, you run into problems. As you've likely foreseen, latency in a collaborative editor introduces the possibility of version conflicts. For example:

Starting document state:

bcd

User 1 inserts a at document start. The operation looks like this:

{ type: 'insert', lines: ['a'], range: { start: { row: 0, column: 0} end: {row: 0, column: 1} } }

At the same time, User 2 types e at document end:

{ type: 'insert', lines: ['e'], range: { start: { row: 0, column: 3} end: {row: 0, column: 4} } }

What should happen is that User 1 and User 2 end up with:

abcde

In reality, User 1 sees:

bcd <-- Starting Document State abcd <-- Apply local "insert 'a'" operation at offset 0 abced <-- Apply remote "insert 'e'" operation at offset 3

And User 2 sees:

bcd <-- Starting Document State bcde <-- Apply local "insert 'e'" operation at offset 3 abcde <-- Apply remote "insert 'a'" operation at offset 0

Oops! 'abced' != 'abcde' - the shared document is now in an inconsistent state.

The Easy Fix is Too Slow

The above conflict occurs because each user is "optimistically" applying edits locally without first ensuring that no one else is making edits. Since User 1 changed the document out from under User 2, a conflict occurred. User 2's edit operation presupposes a document state that no longer exists by the time it's applied to User 1's document.

A simple fix is to switch to a pessimistic concurrency control model where each client requests an exclusive write lock from the server before applying updates locally. This avoids conflicts altogether. Unfortunately, the lag resulting from such an approach over an average internet connection would render the editor unusable.

Operational Transformation to the Rescue

Operational Transformation (OT) is a technique to support concurrent editing without compromising performance. Using OT, each client optimistically updates its own document locally and the OT implementation figures out how to automatically resolve conflicts.

OT dictates that when we apply a remote operation we first "transform" the operation to compensate for conflicting edits from other users. The goals are two-fold:

Ensure that all clients end up with consistent document states Ensure that the intent of each edit operation is preserved

In my original example, we'd want to transform User 2's insert operation to insert at character offset 4 instead of offset 3 when we apply it to User 1's document. This way, we respect User 2's intention to insert e after d and ensure that both users end up with the same document state.

Using OT, User 1 will see:

bcd <-- Starting Document State abcd <-- Apply local "insert 'a'" operation at offset 0 abcde <-- Apply TRANSFORMED "insert 'e'" operation at offset 4

And User 2 will see:

bcd <-- Starting Document State bcde <-- Apply local "insert 'e'" operation at offset 3 abcde <-- Apply remote "insert 'a'" operation at offset 0

The Life-Cycle of An Operation

A helpful way to visualize how edits are synchronized using OT is to think of a collaborative document as a git repository:

Edit operations are commits The server is the master branch Each client is a topic branch off of master

Merging Edits Into Master (Server-Side) When you make an edit in Codr, the following occurs:

The Codr client branches from master and locally applies your edit The Codr client makes a merge request to the server

Here's git's lovely (slightly adapted) diagram. Letters reference commits (operations):

Before Merge:

A topic (client) / D---E---F master (server)

After Merge:

A ------ topic / \ D---E---F---G master

To do the merge, the server updates (transforms) operation A so that it still make sense in light of preceding operations E and F , then applies the transformed operation ( G ) to master. The transformed operation is directly analogous to a git merge commit.

Rebasing Onto Master (Client-Side) After an operation is transformed and applied server-side, it is broadcasted to the other clients. When a client receives the change, it does the equivalent of a git rebase:

Reverts all "pending" (non-merged) local operations Applies remote operation Re-applies pending operations, transforming each operation against the new operation from the server

By rebasing the client rather than merging the remote operation as is done server-side, Codr ensure that edits are applied in the same order across all clients.

Establishing a Canonical Order of Edit Operations

The order in which edit operations are applied is important. Imagine that two users type the characters a and b simultaneously at the same document offset. The order in which the operations occur will determine if ab or ba is shown. Since latency is variable, we can't know with certainty what order the events actually occurred in, but it's nonetheless important that all clients agree on the same ordering of events. Codr treats the order in which events arrive at the server as the canonical order.

The server stores a version number for the document which is incremented whenever an operation is applied. When the server receives an operation, it tags the operation with the current version number before broadcasting it to the other clients. The server also sends a message to the client initiating the operation indicating the new version. This way every client knows what its "server version" is.

Whenever a client sends an operation to the server, it also sends the client's current server version. This tells the server where the client "branched", so the server knows what previous operations the new change needs to be transformed against.

Transforming an Operation

The core of Codr's OT logic is this function:

function transformOperation(operation1, operation2) { // Modify operation2 such that its intent is preserved // subsequent to intervening change operation1 }

I won't go into the full logic here, as it gets involved, but here are some examples:

If op1 inserted line(s) before op2 's line, increase op2 's line offset accordingly. If op1 inserted text before op2 on the same line, increase op2 's character offset accordingly. If op1 occurred entirely after op2 , then don't do anything. If op1 inserts text into a range that op2 deletes, then grow op2 's deletion range to include the inserted text and add the inserted text. Note: Another approach would be to split op2 into two deletion actions, one on either side of op1 's insertion, thus preserving the inserted text. If op1 and op2 are both range deletion operations and the ranges overlap, then shrink op2 's deletion range to only include text NOT deleted by op1 .

Syncing Cursor Position and Selection

A user selection is simply a text range. If the start and end points of the range are equal, then the range is a collapsed cursor. When the user selection changes, the client sends the new selection to the server and the server broadcasts the selection to the other clients. As with editing operations, Codr transforms the selection against conflicting operations from other users. The transform logic for a selection is simply a subset of the logic needed to transform an insert or delete operation.

Undo/Redo

Codr gives each user their own undo stack. This is important for a good editing experience: otherwise hitting CMD+Z could undo someone else's edit in a different part of the document.

Giving each user their own undo stack also requires OT. In fact, this is one case where OT would be necessary even in a zero-latency environment. Imagine the following scenario:

abc <-- User 1 types "abc" abcde <-- User 2 types "de" ce <-- User 1 deletes "bcd" ?? <-- User 2 hits CMD+Z

User2's last action was:

{ type: 'insert', lines: ['de'], range: { start: { row: 0, column: 3} end: {row: 0, column: 5} } }

The inverse (undo) action would be:

{ type: 'delete', lines: ['de'], range: { start: { row: 0, column: 3} end: {row: 0, column: 5} } }

But we obviously can't just apply the inverse action. Thanks to User 1's intervening change, there is no longer a character offset 3 in the document!

Once again, we can use OT:

var undoOperation = getInverseOperation(myLastOperation); getOperationsAfterMyLastOperation().forEach((operation) => transformOperation(operation, undoOperation); ); editor.applyEdit(undoOperation); socket.emit('edit', undoOperation);

By transforming the undo operation against subsequent operations from other clients, Codr will instead apply the following operation on undo, achieving the desired behavior.

{ type: 'delete', lines: ['e'], range: { start: { row: 0, column: 1} end: {row: 0, column: 2} } }

Implementing undo/redo correctly is one of the more challenging aspects of building a collaborative editor. The full solution is somewhat more involved than what I've described above because you need to undo contiguous inserts and deletes as a unit. Since operations that were contiguous may become non-contiguous due to edits made by other collaborators, this is non-trivial. What's cool though, is that we can reuse the same OT used for syncing edits to achieve per-user undo histories.

Conclusion

OT is a powerful tool that allows us to build high-performance collaborative apps with support for non-blocking concurrent editing. I hope that this summary of Codr's collaborative implementation provides a helpful starting point for understanding OT. A huge thanks to David for his invitation to let me share this piece on his blog.

Want to learn more about Codr? Check out the KickStarter campaign or tweet to @CodrEditor to request an invite.