Everyone knows the sheer simplicity of an effective A/B test. Establish a baseline, make a micro change, test and measure, and the winner becomes the new baseline. Make another small change, test and measure, and again the winner becomes the baseline. And so on and so forth.
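If you think of it in code, the loop is tiny. Here’s a minimal sketch, where make_variant and measure are stand-ins for whatever change process and KPI you actually use (nothing here is specific to any tool):

```python
# A minimal sketch of the A/B loop: each round, pit the current
# baseline against a small variation, and the winner becomes the
# new baseline for the next round.
def ab_optimize(baseline, make_variant, measure, rounds=5):
    for _ in range(rounds):
        variant = make_variant(baseline)
        if measure(variant) > measure(baseline):
            baseline = variant  # the winner is the new baseline
    return baseline
```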

But why aren’t you testing your teams the same way? Let’s walk through one way to test software development teams and find the optimal team.

Start by establishing two different teams, Team A and Team B. The goal is to end up with the optimal Team A and the optimal Team B.

You now have to decide how you will measure their productivity:

- If you’re practicing Scrum, you can use common KPIs such as velocity or burndown rate.
- You can measure success via the project iron triangle of scope, time, and cost (try to be SMART about it).
- If you’re practicing Continuous Integration, you can use a Time to Trunk measurement.

There really are a variety of ways to measure Scrum effectiveness and efficiency. Choose two different KPIs. You’ll want to measure both teams using both of these metrics over the course of three Sprints.

Team A vs. Team B

For this example, we’ll choose velocity and quality as the metrics we’ll use to create our optimized teams.
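As a sketch of what “measuring” could look like in practice, here’s one way to record and compute the two KPIs. The record fields below are hypothetical, not tied to any particular issue tracker:

```python
from dataclasses import dataclass

# Hypothetical per-sprint record; the field names are illustrative.
@dataclass
class SprintResult:
    team: str
    sprint: int
    completed_points: int  # our velocity KPI
    new_defects: int       # our quality KPI (lower is better)

def velocity(r: SprintResult) -> int:
    return r.completed_points

def defects_per_point(r: SprintResult) -> float:
    # Normalize defects by sprint size so quality stays comparable
    # across sprints with different workloads.
    return r.new_defects / max(r.completed_points, 1)
```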

The first step is to establish a baseline for both teams. Do not tell them how you will be measuring them whatsoever.

Why is that so important? The short answer is the Hawthorne Effect: people change their behavior when they know they’re being observed. You want to first establish a normalized baseline for each team without adding bias or triggering subconscious changes in behavior.

After the teams have completed the baseline sprint, start a second sprint and tell each team that it’s being measured by a different metric. After the second sprint is over, tell Team A that it will now be measured by the KPI Team B was measured with in Sprint 2, and similarly reverse the metric for Team B. We then compare the two teams’ results across all of the metrics collected throughout the three-sprint test period.
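The whole design fits in a small lookup table. A sketch, using the velocity and quality metrics we chose earlier:

```python
# The three-sprint crossover design as a lookup table: sprint 1 is
# the blind baseline (no disclosed metric), and sprints 2 and 3 swap
# which KPI each team is told about.
DISCLOSED_KPI = {
    ("Team A", 1): None,        # baseline: say nothing
    ("Team B", 1): None,
    ("Team A", 2): "velocity",
    ("Team B", 2): "quality",
    ("Team A", 3): "quality",   # swapped
    ("Team B", 3): "velocity",  # swapped
}
```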

What we are interested in testing is how the teams react to the different KPI measurements throughout the three sprints, including the baseline sprint, and how optimized they are as measured by the amount of revenue their sprint work generates.

Let’s say that Team A is told it’s being measured by the number of completed story points (velocity) at the end of the sprint, while Team B is told it’s being measured by the number of new defects introduced (quality).

The theory would be that Team B will produce lower velocity but significantly better overall quality, whereas Team A will produce higher velocity at the cost of quality. In other words, Team B may be perceived as “slow” or “less effective”, while Team A seems to “rush” their work and be “sloppy”.

In the third sprint, however, Team A is told that it’s being measured by the number of new defects introduced, while Team B is measured by the number of completed story points.

Now you want to line up all of the KPIs next to one another. What is the significant delta between the two teams?

Now remember, we want to compare all of the metrics across all three sprints so we can ultimately build the most optimal team. But to do this properly, we want to convert our velocity and quality KPIs into more meaningful and actionable data. Let’s say that a single defect costs your organization $1,000 on average, while each story point contributes an average of $800. Let’s look at the results from our three-sprint A/B test:

[Table: Team A vs. Team B Results]
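If the conversion is unclear, here’s the arithmetic as a minimal sketch. The sample inputs are illustrative, not the actual figures from the results above:

```python
# Convert a sprint's raw KPIs into dollars using the assumed averages
# from the text: each story point is worth ~$800, each new defect
# costs ~$1,000.
POINT_VALUE = 800
DEFECT_COST = 1000

def sprint_value(points: int, defects: int) -> int:
    return points * POINT_VALUE - defects * DEFECT_COST

# Illustrative only -- not the actual results above:
print(sprint_value(30, 5))   # 30*800 - 5*1000 = $19,000
print(sprint_value(22, 1))   # 22*800 - 1*1000 = $16,600
```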

Looking at the results, Team A contributed 45% of the total dollars gained throughout the three sprints when they were focused on quality over velocity. However, their velocity in sprint 3 was nearly in line with sprint 1, when we did not tell them how they were being measured.

Team B, on the other hand, did remarkably better when they focused more on velocity and less on quality.

So what can we surmise from this A/B test to build our optimal teams? It seems Team B naturally produces fewer defects in each sprint, while Team A had roughly the same velocity throughout all three sprints but produced much lower quality, gaining only slightly better velocity than when we did not tell them how they were being measured. But when Team A was asked to focus on quality over velocity, the team produced a much more valuable sprint. Team B, on the other hand, produced a much more valuable sprint when they focused on velocity over quality.

Our new baseline, then, is that Team B should be asked to increase their velocity while Team A should be asked to increase their quality.

Now do a second round of A/B testing by diving deeper into each team to see how each individual member contributed to the above metrics. Perhaps you take a member of Team A who produced more defects when asked to focus on velocity and move them to Team B, where teammates may be able to coach them on quality, while swapping a less error-prone Team B member onto Team A, where they can perhaps learn how to increase their velocity. Measure both velocity and quality over the next sprints, this time telling each team that you are measuring both metrics.
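A quick aggregation like the following sketch can surface those swap candidates; the work-item shape is hypothetical:

```python
from collections import defaultdict

# Roll each member's story points and attributed defects up across
# the three sprints, so outliers on either KPI stand out as swap
# candidates.
def member_totals(work_items):
    totals = defaultdict(lambda: {"points": 0, "defects": 0})
    for item in work_items:
        t = totals[item["member"]]
        t["points"] += item["points"]
        t["defects"] += item["defects"]
    return dict(totals)

# e.g. member_totals([{"member": "dev1", "points": 8, "defects": 2}, ...])
```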

Continue tweaking this until you get consistently optimal results.

This is just one example of how you can A/B test your software development teams. Another A/B test you could try is building teams with different skillsets or multiple disciplines to see if that produces better results; continue mixing and matching skillsets and disciplines until you find the most optimal combination. Or you could test team size. Yet another possible test could be varying the seniority of the team members, e.g. a team comprised of 2 junior and 3 senior members versus a team comprised of 3 junior and 2 senior members.

By applying the same tactics we use to A/B test software features, UIs, flows, and pricing to software development teams, companies can find optimizations and savings. Never be afraid to fail, and never be afraid to A/B test.