Building a real-time collaborative text editor for the web (DraftJS ❤ ShareDB)

Building your own collaborative text editor for the web has become fairly doable. In the rich, though chaotic, JavaScript world there are many different approaches: this article describes one of them. If you’re interested in our setup or like stories about developers in kamikaze mode, keep reading.

A few months ago, I was sitting around the meeting table with Johannes Weiss and Felix Gast on a Wednesday night. This was the weekly Jour Fixe for our startup, Conode, a productivity SaaS that helps teams to organize meetings.

The excitement was high.

Our sales leads and users wanted to edit pages collaboratively (you know, Google Docs style). We initially thought this too big a challenge, since we lacked the budget to outsource it and the internal know-how to implement it ourselves. That night, however, we realized that it was critical for our survival, so Felix and I bravely said:

Challenge accepted!

I say brave because we had just taken over the code base from a competent agency called Thinslices, and we didn’t know the tech stack that well to start with. So it promised to be a bumpy ride… Then again, that’s how we like them.

The Theory

Before jumping into code, we need to talk theory. The complexity of this distributed system is not to be underestimated and therefore a high-level overview will help to understand what's going on.

The behavior and appearance of a text editor can be captured at any point in time, like a snapshot, and stored in a simple JavaScript object. We call this the document state.

DraftJS can serialize the document state of our Conode page into a simple JavaScript object.
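To make this concrete, here is a hand-written example of what such an object can look like. The field names follow the raw content format that Draft.js produces with convertToRaw; the block key, text and styles are made up for illustration.

```javascript
// Hand-written example of a raw Draft.js content state (all values are illustrative).
const documentState = {
  blocks: [
    {
      key: "a1b2c", // hypothetical block key
      text: "Agenda for Monday",
      type: "header-one",
      depth: 0,
      inlineStyleRanges: [{ offset: 0, length: 6, style: "BOLD" }],
      entityRanges: [],
      data: {},
    },
  ],
  entityMap: {},
};

// Because it is plain data, the snapshot survives a JSON round trip unchanged,
// which is what makes it easy to send over the network.
const copy = JSON.parse(JSON.stringify(documentState));
console.log(copy.blocks[0].text); // "Agenda for Monday"
```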

In order to collaborate, this document state must be shared among multiple peers by sending messages between them over an insecure network. A protocol is needed to properly manage this.

Wait, what exactly needs to be managed? Why can’t we just send the state object around as soon as someone edits some text?

It’s good practice to challenge yourself with simple questions along the way. It helps to wrap your head around the problem.

Well, imagine that two users type something at the same time. In such a scenario, both clients will end up with a different state, and one of the two changes will be overwritten.

This scenario leads to different end results and overwritten operations.

That’s a bad UI, so we definitely want to avoid that.
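The problem is easy to reproduce in a few lines. The sketch below is a toy simulation (no network, no editor) of two clients that naively send their whole state to each other; the names alice and bob are just placeholders.

```javascript
// Naive sync: every client sends its whole state and the receiver overwrites its own.
function makeClient(text) {
  return { text };
}

const alice = makeClient("Hello");
const bob = makeClient("Hello");

// Both type at the same time, before any message has arrived.
alice.text = alice.text + " world"; // Alice appends
bob.text = "Oh, " + bob.text;       // Bob prepends

// The two full-state messages cross on the network.
const fromAlice = alice.text;
const fromBob = bob.text;
alice.text = fromBob; // Alice's own edit is overwritten
bob.text = fromAlice; // Bob's own edit is overwritten

console.log(alice.text); // "Oh, Hello"
console.log(bob.text);   // "Hello world"
```

Both clients end up with a different text, and each of them has lost its own edit.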

The two issues described above correspond to two major technical conditions that our protocol needs to fulfill:

Convergence: all editors must converge to the same document state after a finite amount of time.

Concurrency: edits that occur in parallel lead to a correct end result, independent of the order they are executed in.

This is a bit of a simplification. Research papers talk about eventual consistency, commutativity and idempotence conditions, the need for a central server, … The academic literature has proposed a plethora of protocols and algorithms, some more legit than others (see article below). For the sake of conciseness, we will not delve deeper into the matter and simply say that they can be classified into one of the two following categories:

Operational Transformation (OT) represents the document state as a sequence of operations. Every operation is created on top of a local snapshot. Now, imagine that the operation is sent to a peer that made an edit in the meantime. That peer will have a different snapshot, so the operation first needs to be transformed before being applied. This is the essence of how OT works.

Conflict-free replicated data type (CRDT) is a bit more complicated than OT. It uses more memory and bandwidth, but in return guarantees eventual consistency without the need of a central server. So, you could say it is more theoretically complete.
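The transformation step of OT is easier to grasp with a toy example. The sketch below only handles plain-text insertions and is not the algorithm of any particular library; the functions apply and transform and the tie-breaking flag are our own illustrative names.

```javascript
// An operation inserts `text` at character index `pos`.
function apply(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so that it can be applied after the concurrent `other` was applied.
// `opWinsTies` decides who goes first when both insert at the same position.
function transform(op, other, opWinsTies) {
  const shift = other.pos < op.pos || (other.pos === op.pos && !opWinsTies);
  return shift ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = "Hello";
const opA = { pos: 5, text: " world" }; // Alice appends
const opB = { pos: 0, text: "Oh, " };   // Bob prepends

// Each peer applies its own operation first, then the transformed remote one.
const atAlice = apply(apply(base, opA), transform(opB, opA, false));
const atBob = apply(apply(base, opB), transform(opA, opB, true));

console.log(atAlice); // "Oh, Hello world"
console.log(atBob);   // "Oh, Hello world"
```

Both peers converge to the same text and neither edit is lost, which is exactly the pair of conditions listed earlier.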

We chose to start with OT, because (1) it’s the most popular, (2) we found a good JavaScript library called ShareDB offering out-of-the-box functionality, and (3) we didn’t really understand what we were doing.

We were happy in the end to discover that it was the right choice :-)

The Frontend

Conode is a single-page application, which uses React+Redux. The text editor is based on the famous Draft.js framework. It doesn't offer much out of the box, but according to their own words "In Draft.js, everything is customizable."

How it looks — try yourself on conode.io

The problem is that Draft.js isn't made for collaborative editing. This has to do with the fact that its API mostly exposes State and not Operations. The community actually seems divided on the issue. In the end, whether it's doable or not depends on your functional and performance requirements.

There are other JavaScript editors out there, such as Quill, that handle real-time collaboration way better. In our case, we already had a highly customized and code-heavy editor. Rebuilding it would have taken too much time. Since we knew that others in the community had made this work, we decided to take a chance and build that sh*t.

The Research

To interconnect DraftJS editors for collaboration, we need web sockets. This technology allows us to send messages to and from the browser (bidirectionally) with little overhead. Plain request/response HTTP does not offer this, since the server cannot push a message to the browser on its own initiative.

So much for messaging. But what about the application layer that takes care of that fancy OT protocol? After a lot of research, mainly reading countless GitHub issues and, admittedly, inspecting existing apps in the Network tab of the Chrome Developer Tools, ShareDB came out as the winning option.

You’d be surprised how much you can learn by simply observing other solutions. (Ben Affleck, Paycheck)

ShareDB is a library that stores a JavaScript object on a server and shares it among multiple clients over a web socket. So, if any client submits an operation, ShareDB will automatically notify the other subscribed clients.
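The subscribe-and-notify behaviour can be illustrated with a toy in-memory stand-in. To be clear, this is not ShareDB's API: ToyServer, ToyClient and the { key, value } operation format are invented for this sketch, and there is no network or conflict handling involved.

```javascript
// Toy in-memory stand-in for a shared-document server (NOT ShareDB's API).
class ToyServer {
  constructor(initial) {
    this.data = initial;
    this.subscribers = new Set();
  }
  subscribe(client) {
    this.subscribers.add(client);
    client.data = { ...this.data }; // each client starts from the server snapshot
  }
  // An op here is just { key, value }: "set this field to this value".
  submitOp(op, source) {
    this.data = { ...this.data, [op.key]: op.value };
    for (const client of this.subscribers) {
      if (client !== source) client.onOp(op); // notify everyone but the sender
    }
  }
}

class ToyClient {
  constructor(name) {
    this.name = name;
    this.data = {};
    this.received = [];
  }
  onOp(op) {
    this.data = { ...this.data, [op.key]: op.value };
    this.received.push(op);
  }
  edit(server, key, value) {
    this.data = { ...this.data, [key]: value }; // apply locally first
    server.submitOp({ key, value }, this);
  }
}

const server = new ToyServer({ title: "Untitled" });
const alice = new ToyClient("alice");
const bob = new ToyClient("bob");
server.subscribe(alice);
server.subscribe(bob);

alice.edit(server, "title", "Sprint review");
console.log(bob.data.title);        // "Sprint review"
console.log(alice.received.length); // 0, the sender is not notified of its own op
```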

The Prototype

Time to start coding. First, we created a simple prototype that combined Draft.js with ShareDB. This allowed a quick test of our architecture without yet needing to face the complexity of building it into our existing codebase.

Our prototype architecture. We use the ‘onChange’ and ‘value’ props of the Editor component as a controlled React component. This is where we couple incoming and outgoing operations.

Remember that Draft.js does not expose operations, only the EditorState. But OT works with operations… To solve this, we used json0-ot-diff, a library that compares the previous state with the new one (both serialized with convertToRaw). This gives us a JSON-type OT transaction, which we then pass on to ShareDB.
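To give an idea of what such a comparison produces, here is a toy structural diff that emits operations in the json0 format (p for the path, oi/od for object insert/delete, li/ld for list insert/delete). It is a simplified sketch written for this article and not the algorithm used by json0-ot-diff: lists are compared index by index and changed strings are replaced as a whole.

```javascript
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Toy diff between two JSON values, returning a list of json0-style operations.
function diff(prev, next, path = [], inList = false) {
  if (prev === next) return [];
  if (isPlainObject(prev) && isPlainObject(next)) {
    const ops = [];
    for (const key of Object.keys(prev)) {
      if (key in next) ops.push(...diff(prev[key], next[key], [...path, key]));
      else ops.push({ p: [...path, key], od: prev[key] });
    }
    for (const key of Object.keys(next)) {
      if (!(key in prev)) ops.push({ p: [...path, key], oi: next[key] });
    }
    return ops;
  }
  if (Array.isArray(prev) && Array.isArray(next)) {
    const ops = [];
    const shared = Math.min(prev.length, next.length);
    for (let i = 0; i < shared; i++) {
      ops.push(...diff(prev[i], next[i], [...path, i], true));
    }
    // Remove surplus items from the end so that earlier indices stay valid.
    for (let i = prev.length - 1; i >= shared; i--) {
      ops.push({ p: [...path, i], ld: prev[i] });
    }
    for (let i = shared; i < next.length; i++) {
      ops.push({ p: [...path, i], li: next[i] });
    }
    return ops;
  }
  // Leaf value (or a type change): replace the old value with the new one.
  return inList
    ? [{ p: path, ld: prev, li: next }]
    : [{ p: path, od: prev, oi: next }];
}

const before = { blocks: [{ key: "a1", text: "Hello" }], entityMap: {} };
const after = {
  blocks: [{ key: "a1", text: "Hello world" }, { key: "b2", text: "" }],
  entityMap: {},
};
const ops = diff(before, after);
console.log(JSON.stringify(ops));
```

For this input the result contains two operations: a replacement of the first block's text and a list insert for the new block.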

Such a calculation is costly in terms of performance, but the end result worked like a charm. Feel free to get in touch if you wish to receive a copy of that prototype.

The Integration

The next step was to integrate this working solution into our existing codebase. This brought along challenges — more than we expected.

1. Single source of truth

To manage the document state in our frontend we use Redux. So, we needed to manage a single source of truth of the EditorState between Draft.js, Redux, and ShareDB. In the end we built a loop of functions and events, which can be seen in the image below.

Event loop for an outgoing operation. Incoming operations are handled similarly.

2. Troubling race conditions

Our Text Editor React Component, containing Draft.js, had a few race conditions. In single user mode these were not a problem. As soon as users started making changes concurrently, occasional edits got overwritten. It was hard to detect patterns and when we fixed one, new errors were triggered.

3. Microservices backend

ShareDB stores every change as an operation in its database. As we are creating a text editor for real-time collaboration, this amounts to a large number of operations, which is detrimental to storage capacity and computing power. Therefore, we built a collaboration service on top of our REST API workflows that systematically empties itself. This kept the number of stored operations to a minimum and extracted the complexity of collaboration into an independent microservice.

A simplified overview of our backend: the colored part was added for collaboration. In single-user mode the normal RESTful API is used. As soon as a page is shared with multiple users, communication switches to the web socket.

4. Block-level locking

Our editor visually separates each paragraph into blocks. To minimize performance issues due to the EditorState comparison, we opted for block-level locking after selection changes: whenever a user has selected an EditorBlock, we disable it for all other collaborators. Since only one user can edit a block at a time, the diffing stays at the JSON level and we never need to compute character-level operations on the strings inside a block.
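The bookkeeping behind such a lock can be quite small. The sketch below is a hypothetical lock table, not our production code: the class BlockLocks and its method names are invented, and it assumes that a user holds at most one block at a time.

```javascript
// Hypothetical lock table: block key -> id of the user currently editing it.
class BlockLocks {
  constructor() {
    this.owners = new Map();
  }
  // Called on a selection change: release the user's previous block,
  // then try to take the newly selected one.
  select(userId, blockKey) {
    for (const [key, owner] of this.owners) {
      if (owner === userId) this.owners.delete(key);
    }
    if (this.owners.has(blockKey)) return false; // someone else is editing it
    this.owners.set(blockKey, userId);
    return true;
  }
  // A block is read-only for everyone except the user holding its lock.
  isReadOnly(userId, blockKey) {
    const owner = this.owners.get(blockKey);
    return owner !== undefined && owner !== userId;
  }
}

const locks = new BlockLocks();
console.log(locks.select("alice", "block-1"));   // true
console.log(locks.isReadOnly("bob", "block-1")); // true
console.log(locks.select("bob", "block-1"));     // false
locks.select("alice", "block-2");                // Alice moves on and frees block-1
console.log(locks.isReadOnly("bob", "block-1")); // false
```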

5. Detaching from React components

Our Editor was a pretty large React component to start out with. More than 1,000 lines… In order not to lose ourselves in an endless refactoring effort, we first thought about creating a higher-order component that would add a collaboration flavor to the existing editor. In the end, it was much simpler to put our collaboration logic in the Redux action creator that handled updates from our editor.
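As a rough sketch of that idea, the action creator below updates the store and, for local edits only, forwards operations to the collaboration layer. Everything here is a simplified assumption: createToyStore stands in for Redux with thunk support, sharedDoc.submitOp stands for whatever sends operations to the backend, and the per-block text comparison replaces a real diff.

```javascript
// Minimal store with thunk support, standing in for Redux plus a thunk middleware.
function createToyStore(reducer, initialState) {
  let state = initialState;
  const getState = () => state;
  function dispatch(action) {
    if (typeof action === "function") return action(dispatch, getState);
    state = reducer(state, action);
    return action;
  }
  return { getState, dispatch };
}

function reducer(state, action) {
  return action.type === "EDITOR_UPDATED" ? { ...state, raw: action.raw } : state;
}

// Hypothetical action creator factory holding the collaboration logic.
function makeUpdateEditor(sharedDoc) {
  return (nextRaw, source = "local") => (dispatch, getState) => {
    const prevRaw = getState().raw;
    dispatch({ type: "EDITOR_UPDATED", raw: nextRaw });
    if (source !== "local") return; // remote edits must not be echoed back
    // Stand-in for a real diff: one replace operation per changed block text.
    const ops = nextRaw.blocks
      .map((block, i) => ({ block, old: prevRaw.blocks[i], i }))
      .filter(({ block, old }) => old && old.text !== block.text)
      .map(({ block, old, i }) => ({
        p: ["blocks", i, "text"],
        od: old.text,
        oi: block.text,
      }));
    if (ops.length > 0) sharedDoc.submitOp(ops);
  };
}

const sent = [];
const store = createToyStore(reducer, { raw: { blocks: [{ text: "Hi" }] } });
const updateEditor = makeUpdateEditor({ submitOp: (ops) => sent.push(ops) });

store.dispatch(updateEditor({ blocks: [{ text: "Hi all" }] }));            // local edit
store.dispatch(updateEditor({ blocks: [{ text: "Hi all!" }] }, "remote")); // incoming edit
console.log(sent.length);                          // 1
console.log(store.getState().raw.blocks[0].text);  // "Hi all!"
```

The source flag is what keeps an incoming operation from being sent straight back to the server, so the editor component itself does not need to know about collaboration.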

6. Dealing with edge cases

To avoid breakdowns, many edge cases needed to be covered. For example, automatic web socket reconnection when your Wi-Fi drops, detecting dead web socket clients, properly opening and closing ShareDB subscriptions when the user goes to the dashboard and opens another page, etc.
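Two of these cases can be sketched without any networking code. The snippet below shows one common approach, which is an assumption on our part rather than a description of a specific library: an exponential backoff for reconnection attempts and a heartbeat table for spotting dead clients. The function and class names and the default timings are illustrative.

```javascript
// Delay before reconnect attempt number `attempt` (0-based): doubles each time, capped.
function reconnectDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Server-side bookkeeping: remember when each client last answered a ping.
class Heartbeats {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map();
  }
  pong(clientId, now) {
    this.lastSeen.set(clientId, now);
  }
  // Clients that have been silent for longer than the timeout are considered dead.
  deadClients(now) {
    return [...this.lastSeen]
      .filter(([, seen]) => now - seen > this.timeoutMs)
      .map(([id]) => id);
  }
}

console.log(reconnectDelay(0));  // 500
console.log(reconnectDelay(3));  // 4000
console.log(reconnectDelay(10)); // 30000 (capped)

const beats = new Heartbeats(10000);
beats.pong("a", 0);
beats.pong("b", 8000);
console.log(beats.deadClients(12000)); // [ 'a' ]
```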

To Conclude

The end result was working, but had some glitches left due to the race conditions described in point 2. These bugs were very difficult to track down, and we decided not to lose any further time on them because of a client deadline. As a temporary solution we placed a lock on the entire page, which can be requested and passed from one user to the other.

Admittedly, the final solution is not perfect. However, now we know it works and what refactoring is needed in order to make it shine.

Lessons learned

Prototyping really pays off, as it allows you to quickly validate your architecture. Without it we would never have gotten this far.

Plan more time for refactoring code.

A bit of theory goes a long way in distributed systems. Even though ShareDB works out of the box, understanding the model behind it was a necessity.

I hope this blog post gives insight to teams that are developing their first real-time collaborative text editor for the web. If it does, let me know. If it doesn't… thank you, come again.