A total rewrite: costly, time-consuming, but worth it?

Joel Spolsky published a famous blog post back in 2000 called "Things You Should Never Do, Part I", in which he described "... the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch."

Back in 2005 we found ourselves in a situation where we had to decide the future of our content management system. We had been developing content management systems since the company was founded in 2000. Our CMS was developed using Active Server Pages, and consisted of around 80,000 lines of VBScript code. It was moderately successful, being one of the leading content management systems in Norway, and we had an international partner network with partners located on all continents. The vast majority of our partners were located in Europe, the US, Canada and Australia.

We landed some high-profile clients, for example El-Al, the Israeli national airline, who ran their main website on our CMS for several years, with many millions of page views per month.

Technically, our solution was very advanced in many areas, but there were some fundamental problems that we struggled with:

The whole administration interface only worked in IE, and it used a lot of client-side VBScript. In 2005, browser alternatives started to appear (Firefox 1.0 was released in November 2004). Creating a standards-compliant interface meant rewriting the whole interface.

Since the whole solution had to be easy to run on shared hosting, the whole application was basically a big script, without any external dependencies like COM objects, and far from an n-tier architecture. So it was hard to maintain and extend.

SQL overload - Since there was no central point of entry to the database, caching was very hard to do efficiently. This meant that everything had to access the database on every request. It was common for a single page view to need somewhere between 10 and 60 SQL queries, which killed scalability. This is something Drupal still struggles with today, even though they are moving in the right direction.
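To illustrate the last point, here is a minimal sketch of what a central point of entry to the database makes possible. It is not code from either version of our CMS: the CachedGateway class, the pages table, and the crude "clear everything on write" invalidation are all assumptions made for the example, which uses Python and an in-memory SQLite database. Because every read and write passes through one object, repeated reads can be served from memory and writes know which cache to invalidate.

```python
import sqlite3


class CachedGateway:
    """Hypothetical single entry point to the database (illustration only)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}        # (sql, params) -> list of rows
        self.queries_run = 0   # statements that actually reached the database

    def query(self, sql, params=()):
        # Read-through cache: only hit the database on a cache miss.
        key = (sql, tuple(params))
        if key not in self.cache:
            self.queries_run += 1
            self.cache[key] = self.conn.execute(sql, key[1]).fetchall()
        return self.cache[key]

    def execute(self, sql, params=()):
        # Writes pass through the same object, so the cache can be invalidated.
        self.queries_run += 1
        self.conn.execute(sql, tuple(params))
        self.conn.commit()
        self.cache.clear()     # crude assumption: drop everything on any write


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO pages (title) VALUES ('Home')")
db = CachedGateway(conn)

# Three "page views" asking for the same row reach the database only once.
for _ in range(3):
    rows = db.query("SELECT title FROM pages WHERE id = ?", (1,))
reads_before_write = db.queries_run

# A write clears the cache, so the next read goes to the database again.
db.execute("UPDATE pages SET title = ? WHERE id = ?", ("Start", 1))
rows_after_write = db.query("SELECT title FROM pages WHERE id = ?", (1,))
```

When every script opens its own connection and runs its own SQL, as in the old system, there is no single place to put logic like this, which is why each page view paid for all of its queries every time.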

In addition to the pure technical problems, there were a few other factors that played a role in our eventual decision:

It was becoming a problem to recruit developers to work on ASP-based solutions. And would we be able to keep our existing developers happy if they had to continue using ASP?

We saw a trend that websites were becoming more complex and in many situations more of a web application than a simple informational website. We wanted a solution that would be flexible enough to be used for these applications, and enable regular CMS users to manage the content in these web applications in the same way they manage the content in the rest of the website. While some of the ideas were based on technology we had in the existing solution, we took those ideas a lot further.

So in early 2005 we made the decision to reduce development on the old ASP-based system to a minimum, and to focus on creating a completely new CMS based on Microsoft .Net 2.0, which had just been released as a beta version.

Before the decision, we did consider making the transition gradual, by rewriting the system in .Net, module by module, and improving the architecture as we went along, but in the end we decided against it, mainly due to the messy nature of such a solution, and the fact that it would be harder to make innovative and breaking changes to the core system. Some might argue that it's insane to base such a monumental decision on how elegant a solution is, but remember that we were developing the core of a system that probably would be used for more than a decade. Any technical debt in the core of the system is bound to come back and bite you - hard.

We started development of an early prototype of the core framework back in February/March 2005. Our original timeline was to test parts of the core framework in a project that was due to be launched in September 2005, and release the first version of the full CMS in the summer of 2006.

We delivered the project with the prototype framework as promised, but as we continued to develop the rest of the framework, we saw that it took a lot longer than anticipated. There were several reasons for the delays, but the most important were:

Tight finances.

We're a relatively small company with finite financial resources. From the start, the project was financed by profit from the consulting part of the company, which implemented our CMS in different web projects, and by international license revenue from partners around the world. This limited progress in the project, but luckily we qualified for R&D grants from Innovation Norway several years in a row, which helped quite a bit. It gave us the time we needed to make the base framework very solid and well thought out.

R&D is time consuming and difficult to estimate.

When we started the project, the goal was to create a new system that would fix the shortcomings in the old CMS, and be prepared for the changes in the marketplace that we anticipated. How to do that, and exactly what had to be developed, wasn't clearly specified. We had some ideas on what the system had to be capable of, but it's a long way from an idea to a complete and tested feature. This resulted in a lot of research into various technologies and a lot of quick prototypes written to test various implementations.

Feature creep.

The lack of a clear specification also made it hard to limit feature creep. Since we didn't have a clear picture of how the different features would be implemented, we often discovered that when we implemented one feature, it "forced" us to implement another feature.

We started the pilot project on the new CMS in October 2006. It was due to launch before Christmas 2006. In the end we launched it in March 2007. By that time, the framework was fairly stable, but the CMS user interface was very raw, and it lacked a lot of functionality that we had in the old CMS. Truth be told, it was also quite buggy. But it was mainly the editors who noticed the rough state of the CMS. The site is still online today, using a much improved version of our CMS.

Since the first project launched in 2007, more than two hundred web projects using our new CMS have been completed in Norway. During those projects, the system has gradually matured and grown. The experience we gained from using the system for real projects from such an early stage has been invaluable in the later stages of development. Looking back at all those projects, it has been a very difficult and challenging period, especially in the beginning, but we're finally in a position where we're very happy with our product. During the last couple of years we've seen that the difficult decisions we made back in 2005 and 2006 are starting to pay off.

Conclusion

Do we think Joel was correct that a rewrite is "the single worst strategic mistake that any software company can make"?

Not categorically, but we experienced a lot of the hardship and trouble that he talks about, so we can easily see how many companies could get into trouble during such a process. If the board of directors had known in 2005 that we wouldn't be ready to re-launch in the international market until 2010, I doubt we would have gotten the green light for the project. But now we clearly see that the end result is so much better than if we had done a gradual transition. That might not be true in all cases. We had a clear idea of how we wanted to improve the existing solution. If you're rewriting an application just to change platform, without any fundamental improvements that would be hard to do in your existing codebase, I agree totally with Joel. This fits together with another quote from Joel:

"It's important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time."

If this is true, then a rewrite of an existing application will always be a huge mistake. Joel is basically saying that developers don't benefit from previous experience, a notion that I think is plainly wrong... All the good developers I've ever worked with have been more than capable of seeing what works and what doesn't work in a solution, and would have been able to do a better job on the second try.

In our case, the technological shift from VBScript to .Net was also a major factor in making the end result much, much better than the old version.

There are also other examples of total rewrites that have been successful. Apple's transition to OS X is a very good example. The classic Mac OS was an operating system with some good ideas, but the core architecture had a lot of problems. Did Apple make a bad decision to start from scratch with OS X? I don't think so. They brought over and improved the best bits from OS 9, and merged them with the solid Darwin core (much like we did). During and after the switch, they had a few rough years, but since the launch of OS X, they have been on a more or less constant upwards trend. With the international launch of Webnodes.com behind us, I'm confident we'll follow the same trajectory.

What do you think? Did we make a mistake in rewriting the whole system?

In the coming weeks we'll go into more detail on the different features we've developed in WAF (Webnodes Adaptive Framework).