Realtime collaboration

I hate to write using markup languages.

The problem with markup is that when I see a typo in the rendered output, I have to click through the source text and search for the exact place with the mistake. I have the same feeling about editing Wikipedia, documentation on code.google.com, Trac, Blogger, WordPress and so on.

But I hate writing in WYSIWYG editors even more. Almost all graphical editors generate crappy output: badly closed HTML tags, broken styles, stripped white space. Considering these problems, I usually try to stay with markup.

The next problem is that I’m the only person who can fix mistakes in my texts. My friends tell me about typos, but I have to fix them by hand. I tried to share texts on Google Docs, but the collaboration doesn’t work well enough.

A few months ago I saw an online real-time editor, Etherpad. That’s quite a cool toy. It solves the problem of sharing the text with my friends, but it doesn’t support any markup – it’s just a plaintext editor.

But I know how to create Comet applications easily using EvServer and Django. I realized that I could build a simplified Etherpad clone, which supports a markup language!

The Etherpad clone

I decided to spend a very limited time on this project. Actually, I wanted to do everything in my spare time within a week, that is, about 6 afternoons.

Features I wanted, ordered by importance:

must support editing by many users in real time – like Etherpad

must generate rendered markup with reasonable latency

must support all major browsers (though IE and Konqueror are not a must)

must be dead simple

should be able to scale up (for reasons I’ll describe below)

should show who created what – like Etherpad. In the end I dropped this requirement due to the limited time.

With such hard time constraints I was ready to make some technological decisions:

Python on the server side

EvServer as a server

Django

HAProxy as a load balancer

use EvServer’s Comet transports

RabbitMQ as a message broker

MemcacheDB as a database

Memcache for temporary storage

Support reStructuredText as a markup language

The hardest part of the project is synchronization: merging updates from many clients in real time, resolving collisions, and propagating the changes to all users. Fortunately Neil Fraser from Google solved this problem and published the solution as a very nice project, diff-match-patch.

It seems that the only job I have to do is to glue these parts together.
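To give a feeling for what "merging updates from many clients" means, here is a minimal sketch of the diff-and-patch idea. It is not diff-match-patch itself and not the code of this project: it uses Python's standard difflib, and the names make_patches, apply_patches and CONTEXT are made up for this illustration. The idea is that a client's edit is described as small changes with a bit of surrounding context, and those changes are then applied to the server's copy, which may already contain somebody else's edits.

```python
import difflib

CONTEXT = 4  # characters of context kept around each change (arbitrary value)


def make_patches(old, new):
    """Describe how `old` became `new` as (before, removed, inserted, after) tuples."""
    patches = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        before = old[max(0, i1 - CONTEXT):i1]  # text just before the change
        after = old[i2:i2 + CONTEXT]           # text just after the change
        patches.append((before, old[i1:i2], new[j1:j2], after))
    return patches


def apply_patches(patches, text):
    """Apply patches to `text`, which may differ from the text they were made from.

    Returns (new_text, results); results[i] is False when patch i could not
    be placed, i.e. a collision. This sketch simply takes the first exact
    match of the context and does no fuzzy matching.
    """
    results = []
    for before, removed, inserted, after in patches:
        needle = before + removed + after
        pos = text.find(needle)
        if pos == -1:
            results.append(False)
            continue
        text = text[:pos] + before + inserted + after + text[pos + len(needle):]
        results.append(True)
    return text, results


# The text the client started from, the client's edit, and the server's
# current copy, which another user has changed in the meantime.
base = "The quick brown fox jumps over the lazy dog."
client = "The quick brown fox leaps over the lazy dog."
server = "The very quick brown fox jumps over the lazy dog!"

merged, results = apply_patches(make_patches(base, client), server)
print(merged)
print(results)

# A collision: the place the client edited no longer exists on the server.
unchanged, failed = apply_patches(make_patches(base, client), "The quick brown cat sleeps.")
print(failed)
```

Both edits survive in the merged text, and the collision case leaves the server's copy untouched and reports a failed patch. A serious implementation has to deal with ambiguous context and with text that has drifted too far for an exact match, which is exactly the kind of work I did not want to redo myself.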





Scalability?

A few days after Etherpad was launched I saw this dialog: