There’s more and more buzz around estimates and #NoEstimates in software development. People like to make bold statements and take extreme positions in blogs; in person, the conversations are usually much more balanced. Some hate estimates and consider them a useless activity. Others defend them with arguments of questionable validity.

I want to dig into the intrinsic complications of estimation, what people mean by “estimate”, and what future directions we might pursue.

An Estimate is a Distribution

It’s impossible to give a 100% precise estimate for almost anything. Take driving: it’s a very routine activity with no creative (I hope) decisions. It sounds like we should be able to estimate how long it takes to get from point A to point Z with great accuracy.

Recently I was picking up a friend every day. I called him before the drive to confirm the arrival time and pick him up without delays. The distance and the route were exactly the same every time, and indeed I learned to estimate the timing almost perfectly: 13 minutes. Still, sometimes it was 12 minutes and sometimes it was 15. Once it took just 10 minutes (sorry, I drove too fast and all the lights were green).

The point is trivial: you can’t give an exact estimate even for the simplest tasks. An estimate is a distribution. Maybe normal, maybe narrow, but it’s not a single number.

In software development, you don’t have the luxury of a narrow distribution. Hell no. You have a wide, positively skewed distribution with a quite significant probability of being 50% off the expected estimate. Why is that?
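To make “wide and positively skewed” concrete, here is a minimal sketch. Task durations are often modeled as lognormal (a common assumption, not something measured here), and the parameters below are purely illustrative:

```python
import math
import random

random.seed(42)

# A lognormal distribution: most tasks finish near the typical time,
# but a long right tail takes much longer. Parameters are invented.
median = 4.0  # the "estimate" in hours
sigma = 0.6   # spread of the distribution (assumption)

samples = [random.lognormvariate(math.log(median), sigma) for _ in range(10000)]

off_by_50 = sum(1 for d in samples if d < median * 0.5 or d > median * 1.5)
print(f"mean duration: {sum(samples) / len(samples):.2f} h")
print(f"off by more than 50%: {off_by_50 / len(samples):.0%}")
```

Note that the mean comes out above the median: that gap is exactly the positive skew, and it’s why “the estimate” as a single number systematically disappoints.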

Software Development is Complicated

You have a feature to estimate. There are various scenarios from there. You might have dealt with a similar feature before, and you’re pretty confident now. Or maybe you have no clue how to attack this feature at all. In any case, you don’t know 100% of the details about the feature. That is almost impossible.

Let’s take a very simple feature like “As a user, I want to log into the system”. Most of you are already picturing login and password fields, a Sign In button and a Remember Me checkbox. That is fine. Now we’re ready to review some details. What do we need to provide the perfect estimate? We need to know the scope. Here is the checklist:

Finalized graphical design

Fields specification (max length, allowed characters, etc.)

Error handling (with copy written for all possible errors)

List of supported web browsers (Opera mobile maybe?)

List of supported locales (Japanese?)

Password strength requirements

Remember me spec (for how long should we keep this info?)

Transitions (what happens when I click Sign In?)

Security protections (brute force, various injections)

This list relates to the functional specification only. Unfortunately, other things affect scope as well:

Should we write functional automated tests?

Should we update user guide or any other documentation?

Should we test other features that may be affected?

Are there more questions to ask? Oh, yes! But stop there for a moment, because it’s a very interesting one. We defined the scope, so we should estimate the scope. Very, very often people do exactly that. However, there are many, many things (sorry for the repetition) that affect duration as well. Funnily enough, managers ask for a “scope estimate”, but then somehow replace it with a “duration estimate” in their heads. I don’t know what kind of mind trick that is, but it’s very common.

If you hear that a task will take 4 hours and the developer starts working on it right now, you expect it to be completed in 5–6 hours (you are smart enough to expect interruptions, and you’ve gotten used to developers’ optimism). However, you will be quite surprised if it takes 2 days to get the task done. You (and I) unconsciously carry this feeling of surprise through our whole lives. But should we? Maybe 2 days is a usual duration for tasks estimated at 4 hours. You have to collect data to tell usual events from unusual ones, and to know the duration distribution as well.
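Collecting that data doesn’t require anything fancy. A minimal sketch (the history below is invented; in practice you would pull estimate/actual pairs from your task tracker) that turns past “4-hour tasks” into a duration distribution:

```python
from statistics import quantiles

# Hypothetical history of (estimated hours, actual hours) pairs.
history = [
    (4, 5), (4, 6), (4, 4), (4, 16), (4, 7),
    (4, 5), (4, 12), (4, 6), (4, 5), (4, 20),
]

# All actual durations of tasks that were estimated at 4 hours.
actuals = sorted(actual for est, actual in history if est == 4)

# Cut the distribution into deciles; index 4 is the median,
# index 8 is the 90th percentile.
cuts = quantiles(actuals, n=10)
p50, p90 = cuts[4], cuts[8]
print(f"tasks estimated at 4h: median actual {p50}h, 90th percentile {p90}h")
```

With data like this, “it took 2 days” stops being a surprise and becomes a percentile: you can see at a glance whether 16 hours is an outlier or just the fat tail of your own history.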

Well, what affects duration?

Who will implement this feature?

Will the developer work during his productive or unproductive hours?

Are there any refactorings the developer decides to do before the task actually starts?

What is the probability that the designer will change his mind and ask for significant rework?

How many funny pictures will the developer’s friends post on Facebook today?

You may continue the list. Anyway, there are many factors that make duration predictions impossibly hard.

Software Development is a Discovery

With every feature we learn. We learn how to code, how to design, how to test. We expand the system and discover new opportunities, new improvements and new usage patterns.

What if we start implementation and realize that login via Twitter would be great? What if we receive additional data and discover that our audience doesn’t actually use Twitter, but almost everybody has a Google account? Well, this may sound like a new user story, and indeed it is. But recall how many times you’ve made little tweaks here and there: rewrote an error message, added some extra checks, changed the design of an area, and so on. There are many small changes you can’t anticipate from the beginning.

You discover improvements on the go.

These discoveries change scope. We’re very bad at predicting scope changes. We are awfully bad at predicting the accumulation of many small changes. Ironically, these changes are good! Imagine you always followed the original design and the original decisions. It might improve estimates and forecasts, but it would kill creativity and the race to perfection. Everybody would stick to the spec all the time, and in most contexts this leads to mediocre solutions at best.

You should encourage rework that makes things better, but it’s quite hard to find a good balance between rework within the current user story and the creation of a new user story to be implemented later.

There is a chance you will discover whole new dimensions for the product. Maybe people start using it in a totally unexpected way. This opens even more possibilities. But OK, that’s another story.

How to live with that?

One option is to stop estimating. Think carefully: how are you going to use these estimates? To impose a sprint commitment? To discuss the team’s velocity variations at the next retrospective? To measure progress? To reduce scope creep? These are false goals.

An estimate is just one additional metric that helps us make decisions, forecast and model the future.

You can collect this metric and use it wisely (though neither is easy).

One idea I have is to look for similar patterns in features and tasks. We could collect various attributes (technology, developers and their skills, domain knowledge, teams, development practices, process practices, etc.) to set a context.

It may turn out that we can apply statistics and machine learning to find these patterns. Or we can go the hard way and invent a decent model that describes them. Either way, we’d be able to compare a new feature against a library of existing patterns and obtain an estimate distribution for the new, unestimated feature. People suck at estimating; maybe machines eventually won’t.

Together with the estimate distribution, we can have “some other interesting facts” like expected bugs, expected duration, expected liquidity (something David Anderson is digging into), expected rework, etc. This can help us provide aggregated probabilistic forecasts for whole teams and projects.
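Aggregation is where per-task distributions become genuinely useful. A minimal Monte Carlo sketch (all the per-task samples below are invented): draw one duration per task from its distribution, sum them, repeat many times, and read percentiles off the project total:

```python
import random

random.seed(7)

# Hypothetical per-task duration samples (hours), e.g. the actuals of
# similar past tasks. Each list is one task's estimate distribution.
tasks = [
    [4, 5, 6, 12],
    [2, 2, 3, 8],
    [10, 12, 16, 30],
    [1, 1, 2, 2],
]

# Each trial: pick one plausible duration for every task and sum them.
totals = sorted(
    sum(random.choice(samples) for samples in tasks)
    for _ in range(10000)
)

# Percentiles of the total give a probabilistic forecast
# instead of a single-number commitment.
p50 = totals[len(totals) // 2]
p85 = totals[int(len(totals) * 0.85)]
print(f"50% chance to finish within {p50}h, 85% within {p85}h")
```

Note the framing this enables: instead of “the project takes N hours”, you can say “50% chance by X, 85% chance by Y”, which is an honest statement about a distribution.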

Yes, it sounds complex, but I think it’s doable in the long run. Most likely it will apply in the context of stable teams working on similar projects, but who knows, maybe we’ll find some general laws and models.

We can collect information about many projects in various industries and contexts (big data, you know, a hot topic). The initiative is huge, but beneficial to all. I know Noam Chomsky doesn’t like this approach, but probabilistic statistical models can still provide practical results. And our young industry needs at least some practical things to rely on.

The most complex part is how to define these patterns so features can be compared. It seems we would need to split work into quite small chunks (tasks with an expected estimate under a day), provide various information about those tasks, and use hierarchical structures to find similarities. I’m curious to hear any suggestions.

Another, more obvious idea is to narrow down the estimate distribution. It looks tempting: reduce or control all the factors that affect the distribution, and estimate accuracy goes up.

Let’s think about how we could achieve that. We would need 100% detailed specifications up front, a ban on rework, reduced context variation (change the development process rarely)… Stop. WTF? This reminds me of good old waterfall! I hope the idea isn’t appealing to you anymore.

I think we should embrace the estimate distribution and invent new ways to model and use it. We shouldn’t fight it. That would be a war against our own allies: against creativity, perfectionism, learning and team spirit. I’d rather surrender.