Version Control: Design for Integration July 30, 2007

Posted by Ian Clatworthy in Bazaar

Can you name a single successful software product where more resources (time and money) were spent on developing it than on integrating it with other products and systems? I don’t believe such software products exist. Is that likely to change? If not, what can we do about it as software developers? And what does that mean for those of us interested in streamlining how developers work in general and delivering better version control tools in particular?

#5 on my list of criteria for evaluating version control tools is Integration. Software exists to get things done and it rarely, if ever, exists in isolation. The more successful software is, the more pressure there is to integrate it with other tools and systems. I believe lack of mature integration with other systems will be the #1 reason for many teams delaying their move from a central VCS tool (like CVS and SVN) to a distributed VCS tool (like Bazaar, Git or Mercurial) in the next 12 months. The good news is that the new breed of VCS tools all do a lot right in terms of enabling integration but we need to do much more. Firstly, we could and ought to be doing more at the core of the new products. Secondly, we need to get behind the really important integration add-ons and help them reach maturity faster.

At the core product level, Design for Integration comes down to four key things …

The software must be open.

The software needs a layered architecture.

The total solution needs to be manageable.

A rich domain model is highly desirable.

Closed systems are extremely difficult to integrate with. To (partially) quote my own blog title …

Open Software – life is too short for anything else.

At a minimum, systems need to support open protocols or open APIs so other products can talk to the product in a stable way. Ideally, the system will be open source so integrators can access, understand and improve the integration points.

The benefits of a layered architecture where the UI is decoupled from back-end services are well understood these days. As well as multiple layers though, multiple communication models are needed, both synchronous and asynchronous. In other words, systems wanting to integrate cleanly with a product usually need some way of hearing about interesting events in the core product (even if that is just via email) in addition to the more common request-response communication model.
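To make the two communication models concrete, here is a minimal sketch. Everything in it (the `Core` class, the `post_commit` event name, the method names) is hypothetical and is not taken from Bazaar or any other tool. For simplicity the callbacks run in-process; a real integration might deliver the same notification through a queue or by email.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Core:
    """Hypothetical core product exposing both communication models."""

    def __init__(self) -> None:
        self._revisions: List[str] = []
        self._listeners: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    # Request-response: the caller asks a question and gets an answer back.
    def revision_count(self) -> int:
        return len(self._revisions)

    # Event notification: an integrator registers interest once.
    def subscribe(self, event: str, callback: Callable[[str], None]) -> None:
        self._listeners[event].append(callback)

    def commit(self, message: str) -> None:
        self._revisions.append(message)
        # Tell every subscriber that something interesting happened.
        for callback in self._listeners["post_commit"]:
            callback(message)


core = Core()
seen: List[str] = []                        # stands in for an external tool
core.subscribe("post_commit", seen.append)
core.commit("Fix typo in README")
```

The external tool never has to poll `revision_count()` to find out that a commit happened; it is told.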

Once the various pieces exist and talk to each other as desired, application management issues become really important. The big issues here are things like security (e.g. sharing authentication credentials) and dependency management between the core product and add-ons. Do all the pieces have to be upgraded at exactly the same time or can the core be upgraded then the other pieces at another time?
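One way to let the core and its add-ons be upgraded at different times is for each add-on to declare the range of core API versions it supports. The sketch below assumes a simple (major, minor) version tuple; the `AddOn` class, the `is_compatible` function and the "svn-bridge" name are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

Version = Tuple[int, int]  # (major, minor), an assumed versioning scheme


@dataclass(frozen=True)
class AddOn:
    name: str
    min_core_api: Version  # oldest core API the add-on works with
    max_core_api: Version  # newest core API the add-on was tested against


def is_compatible(core_api: Version, addon: AddOn) -> bool:
    """Return True if the add-on declares support for this core API version."""
    return addon.min_core_api <= core_api <= addon.max_core_api


svn_bridge = AddOn("svn-bridge", min_core_api=(1, 0), max_core_api=(1, 4))
```

With a check like this, the core could be upgraded from 1.2 to 1.4 without touching the add-on, while a jump to 2.0 would flag it as needing attention first.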

Finally, it greatly helps if the core product is built around a rich semantic model. This makes it easier for external products to communicate in a logical way and minimises the amount of data loss when interchanging data with the core product. Mark Shuttleworth captures this point nicely in "Choose lossless VCS tools if you have that luxury". Truly caring about integration goes even deeper in my opinion: it means explicitly making it easier to carry and manage attributes from external systems that you may neither have nor want in your core model. Round-tripping between systems, particularly central <-> distributed VCS tools, is and will remain a fact of life.
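As a rough illustration of carrying external attributes, the sketch below keeps anything the core model does not recognise in a separate bag and writes it back unchanged on export. The `Revision` class, the two helper functions and the `ext:revnum` key are all hypothetical; this is not how any particular VCS stores its metadata.

```python
from dataclasses import dataclass, field
from typing import Dict

KNOWN_FIELDS = {"author", "message"}  # what the (hypothetical) core model understands


@dataclass
class Revision:
    author: str
    message: str
    # Attributes from the external system, preserved verbatim.
    foreign: Dict[str, str] = field(default_factory=dict)


def import_record(record: Dict[str, str]) -> Revision:
    """Build a Revision, setting aside any fields the core does not know."""
    extras = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
    return Revision(record["author"], record["message"], extras)


def export_record(rev: Revision) -> Dict[str, str]:
    """Write the Revision back out, including the preserved foreign fields."""
    return {"author": rev.author, "message": rev.message, **rev.foreign}


original = {"author": "ian", "message": "Initial import", "ext:revnum": "42"}
round_tripped = export_record(import_record(original))
```

Because the unknown field survives the trip in both directions, the external system gets back exactly what it sent.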

Once again, I think the leading distributed VCS systems are all off to a great start in terms of enabling integration. All are open source. All have a layered architecture. On the ease of management front, I think Bazaar and Mercurial are ahead of Git thanks to their better plug-in architectures. On the rich domain model front, Bazaar leads the pack. As Robert Collins commented in a discussion on Bazaar vs Git, facts plus heuristics is a better direction than just heuristics alone. As part of the Bazaar team, of course I think it’s best! But the things that are common across Bazaar, Mercurial and Git are arguably more important than their differences. Can we share integration designs and code better across the various distributed VCS communities so all of us get to where we need to be faster?

So what are the really important integration projects we need to get behind? In the Bazaar community, I believe these are:

GUI integration projects, particularly bzr-Eclipse and TortoiseBZR

VCS migration/round-tripping projects, particularly bzr-svn.

To some people, projects like these are uninteresting sugar. If that’s you, please help with our performance and documentation drives! To me though, these projects are some of the most important things going on in the Bazaar community today. Many developers, let alone less technical project contributors, want version control to be largely transparent. After all, in great development tools right now, the VCS interface is unified so whether the central repository is hosted in CVS, SVN, Perforce, etc. is largely irrelevant in day-to-day programming. Mass adoption of distributed VCS technology will need that same level of ubiquitous integration. If you’re looking to get more involved in Bazaar development, integration projects are a great place to help. In true open source tradition, a small number of passionate people can make a really big difference.