Source code is a vital asset to any software development team, and over the decades a set of source code management tools have been developed to keep code in shape. These tools allow changes to be tracked, so we can recreate previous versions of the software and see how it develops over time. These tools are also central to the coordination of a team of multiple programmers, all working on a common codebase. By recording the changes each developer makes, these systems can keep track of many lines of work at once, and help developers work out how to merge these lines of work together.

This division of development into lines of work that split and merge is central to the workflow of software development teams, and several patterns have evolved to help us keep a handle on all this activity. Like most software patterns, few of them are gold standards that all teams should follow. Software development workflow is very dependent on context, in particular the social structure of the team and the other practices that the team follows.

My task in this article is to discuss these patterns, and I'm doing so in a single article where I describe the patterns but intersperse the pattern explanations with narrative sections that better explain the context and the interrelationships between them. To help make it easier to distinguish them, I've identified the pattern sections with the "✣" dingbat.

If I keep my personal branch healthy, this also makes it much easier to commit to the mainline - I know that any errors that crop up with Mainline Integration are purely due to integration issues, not errors within my own codebase. This will make it much quicker and easier to find and fix them.

For personal development branches, it's wise to keep them healthy, since that enables Diff Debugging. But that desire runs counter to making frequent commits to checkpoint your current state. I might make a checkpoint even with a failing compile if I'm about to try a different path. The way I resolve this tension is to squash out any unhealthy commits once I'm done with my immediate work. That way only healthy commits remain on my branch beyond a few hours.
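As a concrete sketch, here's one way to squash checkpoints in a throwaway repository. Interactively I'd use `git rebase -i`; this non-interactive approximation with `git reset --soft` performs the same squash (file names and commit messages are invented for illustration).

```shell
#!/bin/sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q
git config user.email "scarlett@example.com"
git config user.name "Scarlett"

echo "stable" > app.txt
git add app.txt
git commit -qm "healthy: starting point"

# Checkpoint commits: quick saves, not necessarily healthy
echo "wip 1" > app.txt && git commit -qam "checkpoint: does not compile"
echo "wip 2" > app.txt && git commit -qam "checkpoint: tests failing"
echo "feature done" > app.txt && git commit -qam "checkpoint: feature works"

# Squash the three checkpoints into one healthy commit
# (interactively: git rebase -i HEAD~3, marking commits as "squash")
git reset -q --soft HEAD~3
git commit -qm "healthy: add feature"

git log --oneline   # only the starting point and the squashed commit remain
```

After the squash, the branch history contains only healthy commits, so tools like Diff Debugging stay usable.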

Critical to having a healthy mainline is Self Testing Code with a commit suite that runs in a few minutes. It can be a significant investment to build this capability, but once we can ensure within a few minutes that a commit hasn't broken anything, it changes our whole development process. We can make changes much more quickly, confidently refactor our code to keep it easy to work with, and drastically reduce the cycle time from a desired capability to code running in production.

A healthy mainline also smooths the path to production. A new production candidate can be built at any time from the head of the mainline. The best teams find they need to do little work to stabilize such a code-base, often able to release directly from mainline to production.

Each team should have clear standards for the health of each branch in their development workflow. There is immense value in keeping the mainline healthy. If the mainline is healthy, then a developer can start a new piece of work by just pulling the current mainline and not be tangled up in defects that get in the way of their work. Too often we hear of people spending days trying to fix, or work around, bugs in the code they pull before they can start a new piece of work.

That the code runs without bugs is not enough to say that the code is good. In order to maintain a steady pace of delivery, we need to keep the internal quality of the code high. A popular way of doing that is to use Reviewed Commits, although as we shall see, there are other alternatives.

Ideally the full range of tests should be run on every commit. However if the tests are slow, for example performance tests that need to soak a server for a couple of hours, that isn't practical. These days teams can usually build a commit suite that can run on every commit, and run later stages of the deployment pipeline as often as they can.

There is a tension around the degree of testing needed to provide sufficient confidence of health. Many more thorough tests require a lot of time to run, delaying feedback on whether the commit is healthy. Teams handle this by separating tests into multiple stages on a Deployment Pipeline. The first stage of these tests should run quickly, usually in no more than ten minutes, but still be reasonably comprehensive. I refer to such a suite as the commit suite (although it's often referred to as "the unit tests", since the commit suite usually is mostly Unit Tests).
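To make the staging concrete, here's a runnable shell sketch of the idea, with trivial stand-in scripts (all names and timings invented): a fast commit suite gates every commit, while slower suites run in a later pipeline stage without blocking it.

```shell
#!/bin/sh
# Sketch: split tests into pipeline stages. The suites here are stand-ins.
set -e
work=$(mktemp -d) && cd "$work"

# Stage 1: the commit suite -- fast, runs on every commit
cat > commit_suite.sh <<'EOF'
#!/bin/sh
echo "commit suite: unit tests passed"
EOF

# Stage 2: slower checks, deferred to a later pipeline stage
cat > soak_suite.sh <<'EOF'
#!/bin/sh
echo "later stage: performance soak passed"
EOF
chmod +x commit_suite.sh soak_suite.sh

./commit_suite.sh                     # must pass before the commit is shared
./soak_suite.sh > later_stage.log &   # runs afterwards, without blocking
wait
cat later_stage.log
```

A real pipeline tool does the orchestration, but the split is the same: quick feedback first, thoroughness later.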

To combat this, we can strive to keep a branch healthy - meaning it builds successfully and the software runs with few, if any, bugs. To ensure this, I've found it critical that we write Self Testing Code. This development practice means that as we write the production code, we also write a comprehensive suite of automated tests, so that we can be confident that if these tests pass, then the code contains no bugs. If we do this, then we can keep a branch healthy by running a build with every commit, where the build includes running this test suite. Should the system fail to compile, or the tests fail, then our number one priority is to fix them before we do anything else on that branch. Often this means we "freeze" the branch - no commits are allowed to it other than fixes to make it healthy again.

Since Mainline has this shared, approved status, it's important that it be kept in a stable state. Again in the early 2000s, I remember talking to a team from another organization that was famous for doing daily builds of each of their products. This was considered quite an advanced practice at the time, and this organization was lauded for doing it. What wasn't mentioned in such write-ups was that these daily builds didn't always succeed. Indeed it wasn't unusual to find teams whose daily builds hadn't compiled for several months.

On each commit, perform automated checks, usually building and running tests, to ensure there are no defects on the branch

In contrast, with a mainline, anyone can quickly start an up-to-date build of the product from the tip of mainline. Furthermore, a mainline doesn't just make it easier to see what the state of the code base is, it's the foundation for many other patterns that I'll be exploring shortly.

I remember going to talk to a client's build engineer in the early 2000s. His job was to assemble a build of the product the team was working on. He'd send an email to every member of the team, and they would reply by sending over various files from their code base that were ready for integration. He'd then copy those files into his integration tree and try to compile the code base. It would usually take him a couple of weeks to create a build that would compile, and be ready for some form of testing.

Similarly, if I want to create a new version of the product for release, I can start with the current mainline. If I need to fix bugs to make the product stable enough for release, I can use a Release Branch.

While I'm working on my feature, I have my own personal development branch which may be my local master, or I may create a separate local branch. If I'm working on this for a while, I can keep up to date with changes in the mainline by pulling mainline's changes at intervals and merging them into my personal development branch.

I must stress here that mainline is a single, shared codeline. When people talk about “master” in git, they can mean several different things, since every repository clone has its own local master. Usually such teams have a central repository - a shared repository that acts as the single point of record for the project and is the origin for most clones. Starting a new piece of work from scratch means cloning that central repository. If I already have a clone, I begin by pulling master from the central repository, so it's up to date with the mainline. In this case mainline is the master branch in the central repository.

Different teams use different names for this special branch, often encouraged by the conventions of the version control systems used. git users will often call it “master”, subversion users usually call it “trunk”.

The mainline is a special codeline that we consider to be the current state of the team's code. Whenever I wish to start a new piece of work, I'll pull code from mainline into my local repository to begin working on. Whenever I want to share my work with the rest of the team, I'll update that mainline with my work, ideally using the Mainline Integration pattern that I'll discuss shortly.

Hence the rest of this article, where I lay out various patterns that support the pleasant isolation and the rush of wind through your hair as you fall, while minimizing the consequences of the inevitable contact with the hard ground.

I start with: "what if there was no branching". Everybody would be editing the live code, half-baked changes would bork the system, people would be stepping all over each other. And so we give individuals the illusion of frozen time, that they are the only ones changing the system and those changes can wait until they are fully baked before risking the system. But this is an illusion and eventually the price for it comes due. Who pays? When? How much? That's what these patterns are discussing: alternatives for paying the piper.

The problem is familiar to anyone who has worked with concurrent or distributed computing. We have some shared state (the code base) with developers making updates in parallel. We need to somehow combine these by serializing the changes into some consensus update. Our task is made more complicated by the fact that getting a system to execute and run correctly implies very complex validity criteria for that shared state. There's no way of creating a deterministic algorithm to find consensus. Humans need to find the consensus, and that consensus may involve mixing choice parts of different updates. Often consensus can only be reached with original updates to resolve the conflicts.

Source control systems that record every change as a commit do make the process of merging easier, but they don't make it trivial. If Scarlett and Violet both change the name of a variable, but to different names, then there's a conflict that the source management system cannot resolve without human intervention. Awkward as this kind of textual conflict is, at least the source code control system can spot it and alert the humans to take a look. But often conflicts appear where the text merges without a problem, yet the system still doesn't work. Imagine Scarlett changes the name of a function, and Violet adds some code to her branch that calls this function under its old name. This is what I call a Semantic Conflict. When these kinds of conflicts happen, the system may fail to build, or it may build but fail at run-time.
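A semantic conflict is easy to reproduce. In this hedged sketch (all file and function names are invented), Scarlett renames a shell function and fixes the caller she knows about, while Violet adds a new caller under the old name in a separate file. Git merges both branches cleanly, yet the program breaks at run time.

```shell
#!/bin/sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q
git config user.email "team@example.com"
git config user.name "Team"
git checkout -qb mainline

# Shared starting point: a library with a billing function and one caller
cat > lib.sh <<'EOF'
bill() { echo "billing $1"; }
EOF
cat > main.sh <<'EOF'
. ./lib.sh
bill customer
EOF
git add . && git commit -qm "initial"

# Scarlett renames the function and updates the caller she can see
git checkout -qb scarlett
cat > lib.sh <<'EOF'
bill_with_tax() { echo "billing $1"; }
EOF
cat > main.sh <<'EOF'
. ./lib.sh
bill_with_tax customer
EOF
git commit -qam "rename bill to bill_with_tax"

# Violet, unaware, adds a new caller using the old name in a new file
git checkout -q mainline
git checkout -qb violet
cat > report.sh <<'EOF'
. ./lib.sh
bill "monthly report"
EOF
git add report.sh && git commit -qm "add report that calls bill"

# Both branches merge into mainline without any textual conflict...
git checkout -q mainline
git merge -q --no-edit scarlett
git merge -q --no-edit violet
echo "merged cleanly"

# ...but one path through the program is now broken at run time
sh main.sh
sh report.sh 2>/dev/null || echo "runtime failure: a semantic conflict"
```

No merge tool can flag this; only running the code (or its tests) reveals the breakage, which is why Self Testing Code matters so much here.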

An old joke says that if you fall off a tall building, the falling isn't going to hurt you, but the landing will. So with source code: branching is easy, merging is harder.

A consequence of this definition is that, whatever version control system you're using, every developer has at least one personal codeline on the working copy on their own machine as soon as they make local changes. If I clone a project's git repo, checkout master, and update some files - that's a new codeline even before I commit anything. Similarly if I make my own working copy of the trunk of a subversion repository, that working copy is its own codeline, even if there's no subversion branch involved.

All of this terminological confusion leads some to avoid the term. A more generic term that's useful here is codeline. I define a codeline as a particular sequence of versions of the code base. It can end in a tag, be a branch, or be lost in git's reflog. You'll notice an intense similarity between my definitions of branch and codeline. Codeline is in many ways the more useful term, and I do use it, but it's not as widely used in practice. So for this article, unless I'm in the particular context of git (or another tool's) terminology, I'll use branch and codeline interchangeably.

This terminological confusion gets worse when we run into different version control systems as they all have their own definitions of what constitutes a branch. A branch in Mercurial is quite different to a branch in git, which is closer to Mercurial's bookmark. Mercurial can also branch with unnamed heads and Mercurial folks often branch by cloning repositories.

With distributed version control systems like git, this means we also get additional branches whenever we further clone a repository. If Scarlett clones her local repository to put on her laptop for her train home, she's created a third master branch. The same effect occurs with forking in GitHub - each forked repository has its own extra set of branches.

According to the definition of branch I gave earlier, Scarlett and Violet are working on separate branches, both separate from each other, and separate from the master branch on the shared repository. When Scarlett puts aside her work with a tag, it's still a branch according to my definition (and she may well think of it as a branch), but in git's parlance it's a tagged line of code.

I can illustrate this with a common situation in a modern development team that's holding their source code in a shared git repository. One developer, Scarlett, needs to make a few changes, so she clones that git repository and checks out the master branch. She makes a couple of changes, committing back into her master. Meanwhile, another developer, let's call her Violet, clones the repository onto her desktop and checks out the master branch. Are Scarlett and Violet working on the same branch or a different one? They are both working on "master", but their commits are independent of each other and will need to be merged when they push their changes back to the shared repository. What happens if Scarlett decides she's not sure about the changes that she's made, so she tags the last commit and resets her master branch to origin/master (the last commit she cloned from the shared repository)?

The definitions I'm using for "branch" correspond to how I observe most developers talking about them. But source code control systems tend to use "branch" in a more particular way.

That's the noun, but there's also the verb, "to branch". By this I mean creating a new branch, which we can also think of as splitting the original branch into two. Branches merge when commits from one branch are applied to another.

This leads me to the definition of branch that I'll use for this article. I define a branch as a particular sequence of commits to the code base. The head, or tip, of a branch is the latest commit in that sequence.

A source code control system makes this process much easier. The key is that it records every change made to each branch as a commit. Not only does this ensure nobody forgets the little change they made to utils.java, recording changes makes it easier to perform the merge, particularly when several people have changed the same file.

The simple answer to this is for each developer to take a copy of the code base. Now we can easily work on our own features, but a new problem arises: how do we merge our two copies back together again when we're done?

If several people work on the same code base, it quickly becomes impossible for them to work on the same files. If I want to run a compile, and my colleague is in the middle of typing an expression, then the compile will fail. We would have to holler at each other: "I'm compiling, don't change anything". Even with two people this would be difficult to sustain; with a larger team it would be unworkable.

In thinking about these patterns, I find it useful to develop two main categories. One group looks at integration , how multiple developers combine their work into a coherent whole. The other looks at the path to production , using branching to help manage the route from an integrated code base to a product running in production. Some patterns underpin both of these, and I'll tackle these now as the base patterns. That leaves a couple of patterns that are neither fundamental, nor fit into the two main groups - so I'll leave those till the end.

Integration Patterns

Branching is about managing the interplay of isolation and integration. Having everyone work on a single shared codebase all the time doesn't work, because I can't compile the program if you're in the middle of typing a variable name. So at least to some degree, we need a notion of a private workspace that I can work in for a while. Modern source code control tools make it easy to branch and to monitor changes to those branches. At some point, however, we need to integrate. Thinking about branching strategies is really all about deciding how and when we integrate.

Mainline Integration

Developers integrate their work by pulling from mainline, merging, and - if healthy - pushing back into mainline.

A mainline gives a clear definition of what the current state of the team's software looks like. One of the biggest benefits of using a mainline is that it simplifies integration. Without a mainline, integration is the complicated task of coordinating with everyone in the team that I described above. With a mainline, however, each developer can integrate on their own.

I'll walk through an example of how this works. A developer, who I'll call Scarlett, starts some work by cloning the mainline into her own repository. With git, if she doesn't already have a clone of the central repository, she clones it and checks out the master branch. If she already has the clone, she pulls from mainline into her local master. She can then work locally, making commits into her local master.

While she's working, her colleague Violet pushes some changes onto mainline. As she's working in her own codeline, Scarlett can be oblivious to those changes while she works on her own task.

At some point, she reaches a point where she wants to integrate. The first part of this is to fetch the current state of mainline into her local repository; this pulls in Violet's changes, which show up on origin/master as a codeline separate from her local master. Now she needs to combine her changes with those of Violet. Some teams like to do this by merging, others by rebasing. In general people use the word "merge" whenever they talk about bringing branches together, whether they actually use a git merge or rebase operation. I'll follow that usage, so unless I'm actually discussing the differences between merging and rebasing, consider "merge" to be the logical task that can be implemented with either. There's a whole other discussion on whether to use vanilla merges, use or avoid fast-forward merges, or use rebasing. That's outside the scope of this article, although if people send me enough Tripel Karmeliet, I might write an article on that issue. After all, quid-pro-quos are all the rage these days.

If Scarlett is fortunate, merging in Violet's code will be a clean merge; if not, she'll have some conflicts to deal with. These may be textual conflicts, most of which the source control system can handle automatically. But semantic conflicts are much harder to deal with, and this is where Self Testing Code is very handy. (Since conflicts can generate a considerable amount of work, and always introduce the risk of a lot of work, I mark them with an alarming lump of yellow.)

At this point, Scarlett needs to verify that the merged code satisfies the health standards of the mainline (assuming mainline is a Healthy Branch). This usually means building the code and running whatever tests form the commit suite for mainline. She needs to do this even if it's a clean merge, because even a clean merge can hide semantic conflicts. Any failures in the commit suite should be purely due to the merge, since both merge parents should be green. Knowing this should help her track down the problem, as she can look at the diffs for clues.

With this build and test, she has successfully pulled mainline into her codeline, but - and this is both important and often overlooked - she hasn't yet finished integrating with mainline. To finish integrating she must push her changes into the mainline. Unless she does this, everyone else on the team will be isolated from her changes - essentially not integrating. Integration is both a pull and a push - only once Scarlett has pushed is her work integrated with the rest of the project.

Many teams these days require a code review step before a commit is added to mainline - a pattern I call Reviewed Commits, which I'll discuss later.

Occasionally someone else will integrate with mainline before Scarlett can do her push, in which case she has to pull and merge again. Usually this is only an occasional issue and can be sorted out without any further coordination. I have seen teams with long builds use an integration baton, so that only the developer holding the baton could integrate. But I haven't heard so much of that in recent years, as build times improve.

When to use it

As the name suggests, I can only use mainline integration if we're also using a mainline on our product.

One alternative to using mainline integration is to just pull from mainline, merging those changes into the personal development branch. This can be useful - pulling can at least alert Scarlett to changes other people have integrated, and detect conflicts between her work and mainline. But until Scarlett pushes, Violet won't be able to detect any conflicts between what she's working on and Scarlett's changes.

When people use the word "integrate", they often miss this important point. It's common to hear someone say they are integrating the mainline into their branch when they are merely pulling. I've learned to be wary of that, and to probe further to check whether they mean just a pull or a proper mainline integration. The consequences of the two are very different, so it's important not to confuse the terms.

Another alternative is when Scarlett is in the middle of doing some work that isn't ready for full integration with the rest of the team, but it overlaps with Violet and she wants to share it with her. In that case they can open a Collaboration Branch.
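The pull-merge-push cycle above can be reproduced in a throwaway setup. This is a hedged sketch (repository layout, file names, and commit messages are all invented): a bare repository stands in for the central repository, and two clones play Scarlett and Violet.

```shell
#!/bin/sh
set -e
work=$(mktemp -d) && cd "$work"

# A shared central repository; its master branch is the mainline
git -c init.defaultBranch=master init -q --bare central.git

# Scarlett clones the central repository and seeds the mainline
git -c init.defaultBranch=master clone -q central.git scarlett
cd scarlett
git config user.email "scarlett@example.com" && git config user.name "Scarlett"
echo "base" > app.txt
git add app.txt && git commit -qm "M0: initial mainline"
git push -q origin master
cd ..

# Violet clones, makes a change, and integrates first
git clone -q central.git violet
cd violet
git config user.email "violet@example.com" && git config user.name "Violet"
echo "violet" > violet.txt
git add violet.txt && git commit -qm "V1: violet's work"
git push -q origin master
cd ..

# Scarlett works locally, then integrates: fetch, merge, verify, push
cd scarlett
echo "scarlett" > scarlett.txt
git add scarlett.txt && git commit -qm "S1: scarlett's work"
git fetch -q origin                    # V1 now appears on origin/master
git merge -q --no-edit origin/master   # combine her work with Violet's
# ...here she would run the commit suite to check the merge is healthy...
git push -q origin master              # only now is her work integrated
git log --oneline origin/master        # M0, V1, S1, plus the merge commit
```

Note that the push is part of the integration: until Scarlett pushes, Violet cannot see any conflict between their work.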

Feature Branching

Put all work for a feature on its own branch; integrate into mainline when the feature is complete.

With feature branching, developers open a branch when they begin work on a feature, continue working on that feature until they are done, and then integrate with mainline.

For example, let's follow Scarlett. She would pick up the feature to add collection of local sales taxes to their website. She begins with the current stable version of the product: she'll pull mainline into her local repository and then create a new branch starting at the tip of the current mainline. She works on the feature for as long as it takes, making a series of commits to that local branch. She might push that branch to the project repo so that others may look at her changes.

While she's working, other commits are landing on mainline. So from time to time she may pull from mainline so she can tell if any changes there are likely to impact her feature. Note this isn't integration as I described above, since she didn't push back to mainline. At this point only she sees her work; others don't.

Some teams like to ensure all code, whether integrated or not, is kept in the central repository. In this case Scarlett would push her feature branch into the central repository. This would also allow other team members to see what she's working on, even if it's not integrated into other people's work yet.

When she's done working on the feature, she'll then perform Mainline Integration to incorporate the feature into the product. If Scarlett works on more than one feature at the same time, she'll open a separate branch for each one.

When to use it

Feature Branching is a popular pattern in the industry today. To talk about when to use it, I need to introduce its principal alternative - Continuous Integration. But first I need to talk about the role of integration frequency.
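Here's a minimal, runnable sketch of that lifecycle in a throwaway local repository (branch and file names are invented); the final merge stands in for the Mainline Integration step.

```shell
#!/bin/sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q
git config user.email "scarlett@example.com" && git config user.name "Scarlett"
git checkout -qb mainline
echo "core" > shop.txt
git add shop.txt && git commit -qm "current stable product"

# Open a branch for the feature, starting at the tip of mainline
git checkout -qb local-sales-tax

# Work on the feature as a series of commits on that branch
echo "tax table" > tax.txt
git add tax.txt && git commit -qm "add local tax table"
echo "tax lookup" >> tax.txt
git commit -qam "add tax lookup"

# Meanwhile mainline moves on; pull it in to spot trouble early
git checkout -q mainline
echo "promo banner" >> shop.txt
git commit -qam "unrelated mainline change"
git checkout -q local-sales-tax
git merge -q --no-edit mainline   # a pull, not yet an integration

# When the feature is complete, integrate it into mainline
git checkout -q mainline
git merge -q --no-edit local-sales-tax
git log --oneline
```

The intermediate merge from mainline only alerts Scarlett to conflicts; nobody else sees her work until the final merge into mainline.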

Integration Frequency How often we do integration has a remarkably powerful effect on how a team operates. Research from the State Of Dev Ops Report indicated that elite development teams integrate notably more often than low performers - an observation that matches my experience and the experiences of so many of my industry peers. I'll illustrate how this plays out by considering two examples of integration frequency starring Scarlett and Violet. Low-Frequency Integration I'll start with the low-frequency case. Here, our two heroes begin an episode of work by cloning the mainline into their branches, then doing a couple of local commits that they don't want to push yet. As they work, someone else puts a commit onto mainline. (I can't quickly come up with another person's name that's a color - maybe grayham?) This team works by keeping a healthy branch and pulling from mainline after each commit. Scarlett didn't have anything to pull with her first two commits as mainline was unchanged, but now needs to pull M1. I've marked the merge with the yellow box. This one merges commits S1..3 with M1. Soon Violet needs to do the same thing. At this point both developers are up to date with mainline, but they haven't integrated since they are both isolated from each other. Scarlett is unaware of any changes Violet has made in V1..3. Scarlett makes a couple more local commits then is ready to do mainline integration. This is an easy push for her, since she pulled M1 earlier. Violet, however has a more complicated exercise. When she does mainline integration she now has to integrate S1..5 with V1..6. I have scientifically calculated the sizes of the merges based on how many commits are involved. But even if you ignore the tongue-shaped bulge in my cheek, you'll appreciate that Violet's merge is the mostly likely to be difficult. High-Frequency Integration In the previous example, our two colorful developers integrated after a handful of local commits. 
Let's see what happens if they do mainline integration after every local commit. The first change is apparent with Violet's very first commit, as she integrates right away. Since mainline hasn't changed, this is a simple push. Scarlett's first commit also has mainline integration, but because Violet got there first, she needs do a merge. But since she's only merging V1 with S1, the merge is small. Scarlett's next integration is a simple push which means Violet's next commit will also require merging with Scarlett's latest two commits. However it's still a pretty small merge, one of Violet's and two of Scarlett's. When the external push to mainline appears, it gets picked up in the usual rhythm of Scarlett and Violet's integrations. While it's similar to what happened before, the integrations are smaller. Scarlett only has to integrate S3 with M1 this time, because S1 and S2 were already on mainline. This means that Grayham would have had to integrate whatever was already on mainline (S1..2, V1..2) before pushing M1. The developers continue with their remaining work, integrating with each commit. Comparing integration frequencies Let's look again at the two overall pictures Low Frequency High Frequency Integration Fear When teams get a couple of bad merge experiences, they tend to be wary of doing integration. This can easily turn into a positive feedback loop - which like many positive feedback loops, has very negative consequences. The most obvious consequence is that the team does integration less frequently, which leads to more nasty merge incidents, which leads to less frequent integrations… and so on. A more subtle problem is that teams stop doing things that they think will make integration harder. In particular this makes them resist refactoring. But reducing refactoring leads to the code base getting increasingly unhealthy, difficult to understand and modify, and thus slowing down the teams feature delivery. 
Since it takes longer to complete features, that further increases integration frequency, contributing to the debilitating positive feedback loop. The counter-intuitive answer to this is captured by the slogan - "if it hurts… do it more often" There are two very obvious differences here. Firstly the high-frequency integration, as the name implies, has a lot more integrations - twice as many just in this toy example. But more importantly these integrations are much smaller than those in the low-frequency case. Smaller integrations mean less work, since there's less code changes that might hold up conflicts. But more importantly than less work, it's also less risk. The problem with big merges is not so much the work involved with them, it's the uncertainty of that work. Most of the time even big merges go smoothly, but occasionally they go very, very, badly. That occasional pain ends up being worse than a regular pain. If I compare spending an extra ten minutes per integration with a 1 out of fifty chance of spending 6 hours fixing an integration - which do I prefer? If I just look at the effort, then the 1-in-50 is better, since it's 6 hours rather 8 hours and twenty minutes. But the uncertainty makes the 1-in-50 case feel much worse, an uncertainly that leads to integration fear. Let's look the difference between these frequencies from another perspective. What happens if Scarlett and Violet develop a conflict in their very first commits? When do they detect the conflict has occurred? In the low-frequency case, they don't detect it until Violet's final merge, because that's the first time S1 and V1 are put together. But in the high-frequency case, they are detected at Scarlett's very first merge. Low Frequency High Frequency Frequent integration increases the frequency of merges but reduces their complexity and risk. Frequent integration also alerts teams to conflicts much more quickly. These two things are connected, of course. 
Nasty merges are usually the result of a conflict that's been latent in the team's work, surfacing only when integration happens. Perhaps Violet was looking at a billing calculation and saw that it included appraising tax where the author had assumed a particular taxation mechanism. Her feature requires different treatments to tax, so the direct route was to take the tax out of the billing calculation and do it as a separate function later. The billing calculation was only called in a couple of places, so it's easy to use Move Statements to Callers - and the result makes more sense for the future evolution of the program. Scarlett, however, didn't know Violet was doing this and wrote her feature assuming the billing function took care of tax. Self Testing Code is our life-saver here. If we have a strong test suite, using it as part of the healthy branch will spot the conflict so there's far less chance of a bug making its way into production. But even with a strong test suite acting as a gatekeeper to mainline, large integrations make life harder. The more code we have to integrate, the harder it is to find the bug. We also have a higher chance of multiple, interfering bugs, that are extra-difficult to understand. Not just do we have less to look at with smaller commits, we can also use Diff Debugging to help narrow down which change introduced the problem. What a lot of people don't realize is that a source control system is a communication tool. It allows Scarlett to see what other people on the team are doing. With frequent integrations, not just is she alerted right away when there are conflicts, she's also more aware of what everyone is up to, and how the codebase is evolving. We're less like individuals hacking away independently and more like a team working together. Increasing the frequency of integration is an important reason to reduce the size of features, but there are other advantages too. 
The smaller the feature, the quicker it is to build, the quicker to get into production, and the quicker to start delivering its value. Furthermore, smaller features reduce the feedback time, allowing a team to make better feature decisions as they learn more about their customers.
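The Diff Debugging mentioned above also deserves a quick sketch. With small, healthy commits we can binary-search the history for the change that introduced a failure, in the style of `git bisect` (the commit names and the failing test predicate here are invented for illustration):

```python
# A sketch of Diff Debugging: binary-search an oldest-first commit history
# for the first commit where a test starts failing, git-bisect style.

def find_first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) holds.

    Assumes commits are ordered oldest-first and that once a commit is bad,
    every later commit is bad too - which is exactly what keeping every
    commit on a branch healthy buys us.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the regression is at mid or earlier
        else:
            lo = mid + 1    # the regression came later
    return commits[lo]

# Hypothetical history where the bug slipped in at commit "c6".
history = [f"c{i}" for i in range(10)]
first_bad = find_first_bad(history, lambda c: int(c[1:]) >= 6)
print(first_bad)  # c6
```

Ten commits need only about four test runs to pin down the culprit - but only if every commit in the history is healthy enough to run the tests against.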

Continuous Integration

Developers do mainline integration as soon as they have a healthy commit they can share, usually less than a day's work

Once a team has experienced that high-frequency integration is both more efficient and less stressful, the natural question to ask is "how frequently can we go?" Feature branching implies a lower bound to the size of a change-set - you can't go smaller than a cohesive feature.

For more details on how to do Continuous Integration effectively, take a look at my detailed article. For even more details, consult the book by Paul Duvall, Steve Matyas, and Andrew Glover. Paul Hammant maintains a website full of techniques for continuous integration at trunkbaseddevelopment.com

Continuous Integration applies a different trigger for integration - you integrate whenever you've made a hunk of progress on the feature and your branch is still healthy. There's no expectation that the feature be complete, just that there's been a worthwhile amount of change to the codebase. The rule of thumb is that "everyone commits to the mainline every day", or more precisely: you should never have more than a day's work sitting unintegrated in your local repository. In practice, most practitioners of Continuous Integration integrate many times a day, happy to integrate an hour's worth of work or less.

Developers using Continuous Integration need to get used to the idea of reaching frequent integration points with a partially built feature. They need to consider how to do this without exposing the partially built feature in the running system. Often this is easy: if I'm implementing a discount algorithm that relies on a coupon code, and that code isn't in the valid list yet, then my code isn't going to get called even if it is in production.
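That coupon example can be sketched in a few lines (the coupon names, rates, and the "AUTUMN24" code are all invented for illustration):

```python
# The new discount algorithm is fully built and integrated into mainline,
# but dormant: "AUTUMN24" isn't in the list of valid coupons yet, so no
# production request can reach it.
VALID_COUPONS = {"WELCOME10"}

def discounted_price(price, coupon):
    if coupon == "AUTUMN24":            # partially shipped feature: written,
        return round(price * 0.75, 2)   # unit-tested, but unreachable so far
    if coupon == "WELCOME10":
        return round(price * 0.90, 2)
    return price

def checkout(price, coupon):
    # Production entry point: unknown coupons are rejected before the
    # discount logic runs, keeping the new branch dark.
    if coupon not in VALID_COUPONS:
        coupon = None
    return discounted_price(price, coupon)

print(checkout(100, "AUTUMN24"))          # 100 - invisible in production
print(discounted_price(100, "AUTUMN24"))  # 75.0 - but directly testable
```

Releasing the feature later is then a one-line change: add "AUTUMN24" to VALID_COUPONS.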
Similarly, if I'm adding a feature that asks an insurance claimant whether they are a smoker, I can build and test the logic behind the code, and ensure it isn't used in production, by leaving the UI that asks the question until the last day of building the feature. Hiding a partially built feature by hooking up a Keystone Interface last is often an effective technique.

Continuous Integration and Trunk-Based Development

When ThoughtWorks started using Continuous Integration in 2000, we wrote CruiseControl, a daemon that automatically built the software product after each commit to mainline. Since then, many such tools (e.g. Jenkins, TeamCity, Travis CI, CircleCI, Bamboo, and many others) have been developed. But most organizations that use these tools use them to automatically build feature branches on commit - which, while useful, means they don't actually practice Continuous Integration. (A better name for them would be Continuous Build tools.) Because of this Semantic Diffusion, some people started to use the term "Trunk-Based Development" instead of "Continuous Integration". (Some people do make subtle distinctions between the two terms, but there isn't a consistent usage.) Although I'm usually a descriptivist when it comes to language, I prefer to use "Continuous Integration". Partly this is because I don't think trying to continually come up with new terms is a viable way to fight semantic diffusion. Perhaps the main reason, however, is that I think changing terminology rudely erases the contribution of the early Extreme Programming pioneers, in particular Kent Beck, who coined and clearly defined the practice of Continuous Integration in the 1990s.

If there's no way to easily hide the partial feature, we can use feature flags. As well as hiding a partially built feature, such flags also allow the feature to be selectively revealed to a subset of users - often handy for a slow roll-out of a new feature.
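A minimal feature-flag sketch of the smoker-question idea (the flag name, users, and form fields are all invented): the unfinished question is wired into mainline but only revealed to chosen users, which also supports a slow roll-out later.

```python
# Feature flags hide the partially built question and allow selective reveal.
FLAGS = {
    # disabled for everyone, except an allow-list used for testing/roll-out
    "smoker-question": {"enabled": False, "allowed_users": {"violet"}},
}

def is_enabled(flag, user):
    config = FLAGS.get(flag)
    if config is None:
        return False
    return config["enabled"] or user in config["allowed_users"]

def claim_form_fields(user):
    fields = ["name", "policy_number"]
    if is_enabled("smoker-question", user):
        fields.append("smoker")         # the new, partially built feature
    return fields

print(claim_form_fields("scarlett"))  # ['name', 'policy_number']
print(claim_form_fields("violet"))    # ['name', 'policy_number', 'smoker']
```

Flipping "enabled" to True rolls the feature out to everyone; widening the allow-list gives the gradual release the text mentions.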
Integrating part-built features particularly concerns those who worry about having buggy code on mainline. Consequently, those who use Continuous Integration also need Self Testing Code, so that there's confidence that having partially built features in mainline doesn't increase the chance of bugs. With this approach, developers write tests for the partially built feature as they write its code, and commit both feature code and tests into mainline together (perhaps using Test Driven Development).

In terms of a local repo, most people who use Continuous Integration don't bother with a separate local branch to work on. It's usually straightforward to commit to the local master and perform mainline integration when done. However, it's perfectly fine to open a feature branch and do the work there, if developers prefer it, integrating back into the local master and mainline at frequent intervals. The difference between feature branching and continuous integration isn't whether or not there's a feature branch, but when developers integrate with mainline.

When to use it

Continuous Integration is an alternative to Feature Branching. The trade-offs between the two are sufficiently involved to deserve their own section of this article, and now is the time to tackle it.

Comparing Feature Branching and Continuous Integration

Feature Branching appears to be the most common branching strategy in the industry at the moment, but there is a vocal group of practitioners who argue that Continuous Integration is usually a superior approach. The key advantage that Continuous Integration provides is that it supports a higher - often a much higher - integration frequency. The difference in integration frequency depends on how small a team is able to make its features. If a team's features can all be done in less than a day, then it can practice both Feature Branching and Continuous Integration. But most teams have longer feature lengths than this - and the greater the feature length, the greater the difference between the two patterns.

As I've indicated already, a higher frequency of integration leads to less involved integrations and less fear of integration. This is often a difficult thing to communicate. If you've lived in a world of integrating every few weeks or months, integration is likely to be a very fraught activity. It can be very hard to believe that it's something that can be done many times a day. But integration is one of those things where Frequency Reduces Difficulty. It's a counter-intuitive notion - "if it hurts - do it more often". But the smaller the integrations are, the less likely they are to turn into an epic merge of misery and despair. With feature branching, this argues for smaller features: days not weeks (and months are right out).

Continuous Integration allows a team to get the benefits of high-frequency integration while decoupling feature length from integration frequency. If a team prefers feature lengths of a week or two, Continuous Integration allows them to do this while still getting all the benefits of the highest integration frequency. Merges are smaller, requiring less work to deal with.
More importantly, as I explained above, doing merges more frequently reduces the risk of a nasty merge, which both cuts out the bad surprises that these bring and reduces the overall fear of merging. If conflicts arise in the code, high-frequency integration discovers them quickly, before they lead to those nasty integration problems. These benefits are strong enough that there are teams with features that only take a couple of days who still do Continuous Integration.

The clear downside of Continuous Integration is that it lacks the closure of that climactic integration to mainline. Not only is this a lost celebration, it's a risk if a team isn't good at keeping a Healthy Branch. Keeping all the commits of a feature together also makes it possible to make a late decision on whether to include a feature in an upcoming release. While feature flags allow features to be switched on or off from the users' perspective, the code for the feature is still in the product. Concerns about this are often overblown - after all, code doesn't weigh anything - but it does mean that teams who want to do Continuous Integration must develop a strong testing regimen, so they can be confident that mainline remains healthy even with many integrations a day. Some teams find this skill difficult to imagine, but others find it both possible and liberating. This prerequisite does mean that Feature Branching is better for teams that don't enforce a Healthy Branch and require release branches to stabilize code before release.

While the size and uncertainty of merges is the most obvious problem with Feature Branching, the biggest problem with it may be that it can deter refactoring. Refactoring is at its most effective when it's done regularly and with little friction. Refactoring will introduce conflicts; if these conflicts aren't spotted and resolved quickly, merging gets fraught.
Refactoring thus works best with a high frequency of integration, so it's no surprise that it became popular as part of Extreme Programming, which also has Continuous Integration as one of its original practices. Feature Branching also discourages developers from making changes that aren't seen as part of the feature being built, which undermines the ability of refactoring to steadily improve a code base.

We found that having branches or forks with very short lifetimes (less than a day) before being merged into trunk, and less than three active branches in total, are important aspects of continuous delivery, and all contribute to higher performance. So does merging code into trunk or master on a daily basis.

-- State of DevOps Report 2016

When I come across scientific studies of software development practices, I usually remain unconvinced due to serious problems with their methodology. One exception is the State of DevOps Report, which has developed a metric of software delivery performance, which they correlated to a wider measure of organizational performance, which in turn correlates to business metrics such as return on investment and profitability. In 2016, they first assessed Continuous Integration and found that it contributed to higher software delivery performance, a finding that has been repeated in every survey since.

Using Continuous Integration doesn't remove the other advantages of keeping features small. Frequently releasing small features provides a rapid feedback cycle which can do wonders for improving a product. Many teams that use Continuous Integration also strive to build thin slices of product and release new features as frequently as they can.

Feature Branching
- All the code in a feature can be assessed for quality as a unit

- Feature code only added to product when feature is complete

- Less frequent merges

Continuous Integration
- Supports higher frequency integration than feature length

- Reduced time to find conflicts

- Smaller merges

- Encourages refactoring

- Requires commitment to healthy branches (and thus self-testing code)

- Scientific evidence that it contributes to higher software delivery performance

Feature Branching and Open Source

Many people ascribe the popularity of Feature Branching to GitHub and the pull-request model that originated in open-source development. Given that, it's worthwhile to understand the very different contexts that exist between open-source work and much commercial software development. Open-source projects are structured in many different ways, but a common structure is that of one person, or a small group, acting as the maintainer and doing most of the programming. The maintainer works with a larger group of programmers who are contributors. The maintainer usually doesn't know the contributors, so has no sense of the quality of the code they contribute. The maintainer also has little certainty about how much time the contributors will actually put into the work, let alone how effective they are at getting things done.

In this context, Feature Branching makes a whole lot of sense. If someone is going to add a feature, small or large, and I have no idea when (or if) it's going to be finished, then it makes sense for me to wait till it's done before integrating. It's also more important to be able to review the code, to ensure it passes whatever quality bar I have for my code base. But many commercial software teams have a very different working context. There's a full-time team of people, all of whom are committed substantially, usually full-time, to the software. The leaders of the project know these people well (except when they've just started) and can have a reliable expectation of code quality and ability to deliver. Since they are paid employees, the leaders also have greater control over the time put into the project and over such things as coding standards and group habits.

Given this very different context, it should be clear that a branching strategy for such commercial teams need not be the same as the one that operates in the open-source world.
Continuous Integration is a near-impossible fit for occasional contributors to open-source work, but is a realistic alternative for commercial work. Teams should not assume that what works for an open-source environment is automatically correct for their different context.

Reviewed Commits

Every commit to mainline is peer-reviewed before the commit is accepted.

Code review has long been encouraged as a way of improving code quality: improving modularity and readability, and removing defects. Despite this, commercial organizations often found it difficult to fit into software development workflows. The open-source world, however, widely adopted the idea that contributions to a project should be reviewed before accepting them onto the project's mainline, and this approach has spread widely through development organizations in recent years, particularly in Silicon Valley. A workflow like this fits particularly well with the GitHub mechanism of pull requests.

Such a workflow begins when Scarlett finishes a hunk of work that she wishes to integrate. She does Mainline Integration (assuming her team practices that), but once she has a successful build, and before she pushes to mainline, she sends her commit out for review. Some other member of the team, say Violet, then does a code review on the commit. If she has problems with the commit, she makes some comments and there's some back-and-forth until both Scarlett and Violet are happy. Only once they are done is the commit placed on mainline.

Reviewed Commits grew in popularity with open source, where they fit very well with the organizational model of committed maintainers and occasional contributors. They allow the maintainer to keep a close eye on any contributions. They also mesh well with Feature Branching, since a completed feature marks a clear point at which to do a code review like this. If you're not certain that a contributor is going to complete a feature, why review their partial work? Better to wait until the feature is complete. The practice also spread widely at the larger internet firms; Google and Facebook both built special tooling to help make it work smoothly. Developing the discipline for timely Reviewed Commits is important.
If a developer finishes some work, and goes on to something else for a couple of days, then that work is no longer at the top of their mind when the review comments come back. This is frustrating with a completed feature, but it's much worse for a partially completed feature, where it may be difficult to make further progress until the review is confirmed. In principle, it's possible to do Continuous Integration with Reviewed Commits, and indeed it's possible in practice too - Google follows this approach. But although this is possible, it's hard, and relatively rare. Reviewed Commits and Feature Branching are the more common combination.

When to use it

Conflating OSS and private software development team needs is like the original sin of current software development rituals

-- Camille Fournier

Although Reviewed Commits have become a popular practice over the last decade, there are downsides and alternatives. Even when done well, Reviewed Commits always introduce some latency into the integration process, encouraging a lower integration frequency. Pair Programming offers a continuous code review process, with a faster feedback cycle than waiting for a code review. (Like Continuous Integration and Refactoring, it's one of the original practices of Extreme Programming.)

Many teams that use Reviewed Commits don't do them quickly enough. The valuable feedback that they can offer then comes too late to be useful. At that point there's an awkward choice between a lot of rework, and accepting something that may work but undermines the quality of the code base. Code review isn't confined to before the code hits the mainline. Many tech leaders find it useful to review code after a commit, catching up with developers when they see concerns. A culture of refactoring is valuable here. Done well, this sets up a community where everyone on the team regularly reviews the code base and fixes the problems that they see.
The trade-offs around Reviewed Commits rest primarily on the social structure of the team. As I've already mentioned, open-source projects commonly have a structure of a few trusted maintainers and many untrusted contributors. Commercial teams are frequently all full-time, but may have a similar structure. The project leader (like a maintainer) trusts a small (perhaps singular) group of maintainers, and is wary of code contributed by the rest of the team. Team members may be allocated to multiple projects at once, making them much more like open-source contributors. If such a social structure exists, then Reviewed Commits and Feature Branching make a great deal of sense. But a team with a higher degree of trust often finds other mechanisms that keep code quality high without adding friction to the integration process.

So, while Reviewed Commits can be a valuable practice, they're by no means a necessary route to a healthy code base, particularly if you're looking to grow a well-balanced team that isn't overly dependent on its initial leader.

Integration Friction

Pull requests add overhead to cope with low-trust situations, e.g. to allow people you don't know to offer contributions to your project. Imposing pull requests on devs in your own team is like making your family go through an airport security checkpoint to enter your home.

-- Kief Morris

One of the problems with Reviewed Commits is that they often make it more of a hassle to integrate. This is an example of integration friction - activities that make integration take time or effort to do. The more integration friction there is, the more developers are inclined to lower their frequency of integration. Imagine some (dysfunctional) organization that insists that all commits to mainline need a form that takes half an hour to fill in. Such a regime discourages people from integrating frequently. Whatever your attitude to Feature Branching and Continuous Integration, it's valuable to examine anything that adds this kind of friction. Unless it clearly adds value, any such friction should be removed.

Manual processes are a common source of friction here, particularly if they involve coordination with separate organizations. This kind of friction can often be reduced by using automated processes, improving developer education (to remove the need), and pushing steps to later stages of a Deployment Pipeline or to QA in production. You can find more ideas for eliminating this kind of friction in material on continuous integration and continuous delivery. This sort of friction also crops up on the path to production, with the same difficulties and treatments.

One of the things that makes people reluctant to consider Continuous Integration is if they've only worked in environments with a high degree of integration friction. If it takes an hour to do an integration, then it's clearly absurd to do it several times a day. Joining a team where integration is a non-event, one that someone can dash off in a few minutes, feels like a different world.
I suspect much of the argument about the merits of Feature Branching and Continuous Integration is muddied because people haven't experienced both of these worlds, and thus can't fully understand both points of view. Cultural factors also influence integration friction - in particular the trust between members of a team. If I'm a team leader, and I don't trust my colleagues to do a decent job, then I'm likely to want to prevent commits that damage the codebase. Naturally, this is one of the drivers for Reviewed Commits. But if I'm on a team where I trust the judgment of my colleagues, I'm likely to be more comfortable with a post-commit review, or with cutting out reviews entirely and relying on communal refactoring to clean up any problems. My gain in this environment is removing the friction that pre-commit reviews introduce, thus encouraging a higher frequency of integration. Often team trust is the most important factor in the Feature Branching versus Continuous Integration argument.

The Importance of Modularity

Most people who care about software architecture stress the importance of modularity to a well-behaved system. If I'm faced with making a small change to a system with poor modularity, I have to understand nearly all of it, since even a small change can ripple through many parts of the codebase. With good modularity, however, I only need to understand the code in one or two modules, plus the interfaces to a few more, and can ignore the rest. This ability to reduce the understanding I need is why it's worth putting so much effort into modularity as a system grows.

Modularity also impacts integration. If a system has good modules, then most of the time Scarlett and Violet will be working in well-separated parts of the code base, and their changes won't cause conflicts. Good modularity also enhances techniques like Keystone Interface and Branch By Abstraction, which avoid the need for the isolation that branches provide. Often teams are forced to use source branching because a lack of modularity starves them of other options.

Feature Branching is a poor man's modular architecture, instead of building systems with the ability to easy swap in and out features at runtime/deploytime they couple themselves to the source control providing this mechanism through manual merging.

-- Dan Bodart

The support goes in both directions. Despite many attempts, it remains extremely difficult to build a good modular architecture before we start programming. To achieve modularity we need to constantly watch our system as it grows and tend it in a more modular direction. Refactoring is the key to achieving this, and refactoring requires high-frequency integration. Modularity and rapid integration thus support each other in a healthy codebase.

Which is all to say that modularity, while hard to achieve, is worth the effort. The effort involves good development practices, learning about design patterns, and learning from experience with the code base.
Messy merges shouldn't just be closed off with an understandable desire to forget about them - instead, ask why the merge was messy. The answers will often be an important clue to how modularity can be improved, improving the health of the code base and thus enhancing the productivity of the team.
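The Branch By Abstraction mentioned above can itself be sketched in a few lines (all the names and tax rates here are invented). Rather than isolating a rewrite on a source branch, we insert an abstraction over the old code and grow the replacement behind it on mainline, switching callers over one at a time:

```python
# Sketch of Branch By Abstraction: both the legacy and the new tax logic
# live on mainline at once, behind a shared abstraction.
from abc import ABC, abstractmethod

class TaxCalculator(ABC):
    @abstractmethod
    def tax(self, amount): ...

class LegacyTaxCalculator(TaxCalculator):
    def tax(self, amount):
        return amount * 0.20          # the old, embedded behaviour

class RegionalTaxCalculator(TaxCalculator):
    def __init__(self, rate):
        self.rate = rate

    def tax(self, amount):
        return amount * self.rate     # the new behaviour, built incrementally

def billing_total(amount, calculator: TaxCalculator):
    return amount + calculator.tax(amount)

# Callers are flipped one at a time from the legacy calculator to the new
# one; when none remain, the legacy class is deleted.
print(billing_total(100, LegacyTaxCalculator()))         # 120.0
print(billing_total(100, RegionalTaxCalculator(0.25)))   # 125.0
```

With decent modularity the abstraction seam is easy to find, which is exactly how modularity substitutes for the isolation a long-lived branch would otherwise provide - and note the echo of Violet's tax refactoring from earlier: with this seam in place, her change and Scarlett's need never collide.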