Many bugs are implementation errors: there is a mistake in the code that makes it not do what you wanted it to do. For example, you may have accidentally left out the “list is empty” case, or written a nonterminating function. You can identify it as “definitely wrong” for a given input. Most testing, in fact most writing on software correctness, deals primarily with implementation errors.

Above that we have specification errors. The code perfectly matches your design, but your design doesn’t satisfy your requirements. Something like “we didn’t specify what happens if you load the same record twice.” These can be tougher to find than implementation errors. They span the code and the design, not just the code. Most of my writing focuses on specification errors.

Above that we have requirement errors. The design satisfies the requirements, but you have the wrong requirements entirely. Maybe it alerts when it detects an anomaly, but the client really wanted it to log. Requirement errors can be the most difficult to “debug”. Adding the client causes all sorts of logistical problems, just one of which is requirement ambiguity. A client might not even realize they wanted something until they try the product!

This is why so many Agile methods emphasize short sprints and prototyping. It reduces the time between getting requirements and finding issues in them. I wondered if tooling can also help here. Maybe tools can catch particular types of errors earlier in the project, even before you prototype. I started a couple of experiments on this and so far they’re cautiously promising.

This post focuses on identifying emergent ambiguity (EA): where the rules miss a given case. This can happen when there are many rules with overlapping domains: if flags A, B, and C can all potentially apply to a situation, you have 8 possible combinations of 3 flags. It’s easy to miss specifying what should happen in one of them. EA seems like one of the “easier” errors to find. If you have a finite list of rules, you can enumerate every combination and ask the client to fill in any gaps.
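The flag example above can be made concrete. As a sketch (the flag names are hypothetical), enumerating the full case space is mechanical:

```python
from itertools import product

# Three hypothetical boolean flags; 2**3 = 8 combinations in total.
# A spec that only addresses some of these combinations has a gap.
flags = ["A", "B", "C"]
combinations = list(product([False, True], repeat=len(flags)))

print(len(combinations))  # -> 8
for combo in combinations:
    active = [name for name, on in zip(flags, combo) if on]
    print(active if active else "(none)")
```

The enumeration itself is trivial; the useful part is handing the eight rows to the client and asking what should happen in each.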

Let’s see this in action. We’ll use the Gilded Rose kata as an example because that’s what got me thinking about this in the first place.

The Problem

Hi and welcome to team Gilded Rose. As you know, we are a small inn with a prime location in a prominent city ran by a friendly innkeeper named Allison. We also buy and sell only the finest goods. Unfortunately, our goods are constantly degrading in quality as they approach their sell by date. We have a system in place that updates our inventory for us. It was developed by a no-nonsense type named Leeroy, who has moved on to new adventures. Your task is to add the new feature to our system so that we can begin selling a new category of items. First an introduction to our system:

- All items have a SellIn value which denotes the number of days we have to sell the item
- All items have a Quality value which denotes how valuable the item is
- At the end of each day our system lowers both values for every item

Pretty simple, right? Well this is where it gets interesting:

- Once the sell by date has passed, Quality degrades twice as fast
- The Quality of an item is never negative
- “Aged Brie” actually increases in Quality the older it gets
- The Quality of an item is never more than 50
- “Sulfuras”, being a legendary item, never has to be sold or decreases in Quality
- “Backstage passes”, like aged brie, increases in Quality as its SellIn value approaches; Quality increases by 2 when there are 10 days or less and by 3 when there are 5 days or less but Quality drops to 0 after the concert

We have recently signed a supplier of conjured items. This requires an update to our system:

- “Conjured” items degrade in Quality twice as fast as normal items

Feel free to make any changes to the UpdateQuality method and add any new code as long as everything still works correctly. However, do not alter the Item class or Items property as those belong to the goblin in the corner who will insta-rage and one-shot you as he doesn’t believe in shared code ownership (you can make the UpdateQuality method and Items property static if you like, we’ll cover for you). Just for clarification, an item can never have its Quality increase above 50, however “Sulfuras” is a legendary item and as such its Quality is 80 and it never alters.

We’re presented with an existing system that satisfies these requirements, but terribly. Nested if statements and all that fun. In order to add the new feature, we have to refactor the existing code. This pushes the practitioner to write tests that ensure the existing behavior is unchanged.

It’s not enough to just write tests that conform to the requirements, though. This is because the requirements are incomplete. In particular:

- When we say “lowers both values”, how much do we lower by? Does Quality decrease by 1, 2, 1.5?
- Are item types exclusive? Can something be both “backstage passes” and “aged brie”?
- Do we decrement quality and sell-by in an order, or simultaneously? Depending on our choice here, this can affect the value of the item at boundary times, like sell_by = 0.

The provided implementation implicitly answers all three: an item has at most one special type, we lower values 1 at a time, and we do this weird thing where we modify quality twice, both before and after the sell-by calculation. These points are ambiguous in the requirements, but not the particular kind of EA I care about right now.

We’ll look instead at the value calculation logic. We have seven rules for determining the change in quality for an item. For the requirements to be complete, we should know all possible ways they can interact. The easiest way to do that is to construct a decision table.

Base Case

Decision tables map a set of finite enumerations to outputs, where every possible combination of inputs is represented in exactly one row. If there are any missed requirements, it will correspond to a missing row. If there are contradictory requirements, there will be two rows with the same input and different outputs. Here the inputs would be the item properties and the output will be the change in item quality.
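Both checks can be mechanized. As a sketch (the inputs and outputs here are placeholders, not the Gilded Rose rules), finding gaps and contradictions in a decision table is a few lines of set arithmetic:

```python
from collections import defaultdict
from itertools import product

# A decision table as (input row, output) pairs over two hypothetical
# boolean properties. One combination is missing and one is contradictory.
rules = [
    ((False, False), "no change"),
    ((False, True), "decrease"),
    ((True, False), "increase"),
    ((True, False), "decrease"),  # same input, different output
]

domain = set(product([False, True], repeat=2))
outputs = defaultdict(set)
for row, out in rules:
    outputs[row].add(out)

missing = domain - set(outputs)                      # rows with no rule
contradictory = {r for r, o in outputs.items() if len(o) > 1}

print(missing)        # -> {(True, True)}
print(contradictory)  # -> {(True, False)}
```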

Let the initial quality be q. We can decompose the problem into two decision tables. The first determines q', the new value if we don’t restrict quality to the 0-50 range. The second determines final, which is the clamped value of q'. final then becomes the new quality.

The clamp table is pretty easy:

| sulfuras? | q'   | final |
|-----------|------|-------|
| T         | -    | 80    |
| F         | <0   | 0     |
| F         | 0-50 | q'    |
| F         | 51-  | 50    |
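Transcribed into code (a sketch; the function name is mine), the clamp table is a one-liner plus the Sulfuras special case:

```python
def final_quality(q_prime: int, sulfuras: bool) -> int:
    """Clamp table: Sulfuras is pinned at 80; everything else
    is restricted to the 0-50 range."""
    if sulfuras:
        return 80
    return max(0, min(50, q_prime))

print(final_quality(-3, False))   # -> 0
print(final_quality(30, False))   # -> 30
print(final_quality(51, False))   # -> 50
print(final_quality(999, True))   # -> 80
```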

Now for the input table. We have two enumerable inputs:

- type is either (S)ulfuras, (B)rie, (P)ass, or (M)isc.
- days_left is one of four ranges: <0, 0-4, 5-9, 10-. Since the full ranges only matter for the Pass, in the rest of the cases I’ll instead use <0 and >=0. In a real problem I wouldn’t do this, but it makes showing the concept easier here.

| type | days_left | q'  |
|------|-----------|-----|
| S    | -         | q   |
| P    | 10-       | q+1 |
| P    | 5-9       | q+2 |
| P    | 0-4       | q+3 |
| P    | <0        | 0   |
| B    | >=0       | q+1 |
| B    | <0        | ??? |
| M    | >=0       | q-1 |
| M    | <0        | q-2 |
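Transcribing the table directly (a sketch; the type codes follow the table), the gap shows up as a branch we can’t fill in:

```python
def q_prime(item_type: str, days_left: int, q: int) -> int:
    """Unclamped new quality per the input table.
    Types: 'S'ulfuras, 'B'rie, 'P'ass, 'M'isc."""
    if item_type == "S":
        return q
    if item_type == "P":
        if days_left < 0:
            return 0          # worthless after the concert
        if days_left <= 4:
            return q + 3
        if days_left <= 9:
            return q + 2
        return q + 1
    if item_type == "B":
        if days_left >= 0:
            return q + 1
        raise NotImplementedError("??? -- outdated Brie is unspecified")
    if item_type == "M":
        return q - 1 if days_left >= 0 else q - 2
    raise ValueError(f"unknown type: {item_type!r}")

print(q_prime("P", 3, 10))   # -> 13
print(q_prime("M", -1, 10))  # -> 8
```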

We have one emergent ambiguity: what happens when Brie becomes outdated? We have “Brie increases in quality over time” and “outdated items lose quality twice as fast.” How are these supposed to interact? I can see a few different ways the client might want this to go:

1. Brie rule overrides sell-by rule: q'=q+1. This is what the standard implementation does.
2. Sell-by rule overrides brie rule: q'=q-2.
3. When the client said “lose quality twice as fast”, they meant “degrades one step faster”, which is the same result for misc items. We have q'=(q+1)-1, or q'=q.
4. The client meant “changes twice as fast”, regardless of the direction: q'=q+2.
5. We should handle Brie in a special way not currently covered by these rules.

None of these seem particularly unlikely. Some seem more likely than others, but none of them trigger a “that’s stupid” feeling in me.

The New Rule

Then the kata adds a new rule:

“Conjured” items degrade in Quality twice as fast as normal items

Is “conjured” its own type? Maybe, maybe not. If the client ends up saying “conjured” is not its own type, and can be added to any item, we get the new table:

| conjured? | type | days_left | q'        |
|-----------|------|-----------|-----------|
| F         | -    | -         | see above |
| T         | S    | -         | q         |
| T         | P    | 10-       | ???       |
| T         | P    | 5-9       | ???       |
| T         | P    | 0-4       | ???       |
| T         | P    | <0        | 0         |
| T         | B    | >=0       | ???       |
| T         | B    | <0        | ???       |
| T         | M    | >=0       | q-2       |
| T         | M    | <0        | ???       |

Going down the list:

If you conjure a ticket, what’s the new value? Does it still gain value over time? Does it even have value at all? Maybe it’ll be considered counterfeit! No matter what, though, we can safely assume it goes to zero after the due date. There’s no reasonable reading of the rules that implies otherwise.

Does conjured brie gain quality or lose quality? This is the same problem as with overdue unconjured brie.

What happens when conjured brie goes overdue? Now instead of two intersecting rules, you have three.

What happens when a miscellaneous conjured item becomes overdue?

That last one interests me the most because it’s a “common” case. You might argue that “overdue conjured aged brie” is an edge case and it’s normal to be ambiguous about edge cases. But “overdue conjured item” might happen all the time. It even happens if “conjured” is its own type! The answer is not self-evident, as there are at least two meaningful interpretations:

1. The client intended the penalties to multiply, so now it’s losing quality four times as fast: q'=q-4.
2. The client wanted the penalties to each be applied and then aggregated. So “overdue” applies -1 and “conjured” applies -1, giving us q'=q-3.
3. Something else.
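The arithmetic for the first two readings, using an arbitrary starting quality of 10:

```python
q = 10

# Reading 1: penalties multiply. The base decay of -1 is doubled for
# overdue and doubled again for conjured: -4 per day.
multiplied = q - 4

# Reading 2: each rule contributes its own -1 on top of the base -1:
# -1 (base) + -1 (overdue) + -1 (conjured) = -3 per day.
aggregated = q - 3

print(multiplied, aggregated)  # -> 6 7
```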

In my opinion the first choice seems the most reasonable. But the client decides that, not me! If they expected it to be q'=q-3 , then implementing q'=q-4 is a requirements error. And I know of at least one case where “apply-then-aggregate” was what the client decided.

Discussion

I get three impressions from this exercise:

1. There are tangible benefits to modeling requirements. “Gilded Rose” is a very small kata, something that’s not supposed to take more than an hour or two. And even it has emergent ambiguity. It’s also likely the author didn’t intend this ambiguity, as they framed the kata as a “refactoring” problem. There are requirement issues that they may have missed, and we found, via formal modeling.
2. Decision tables are a good way of modeling requirements. Their simplicity shines here. It took me about as much time to write those DTs as it did to write and edit this paragraph. They can be understood by anyone, even without prior exposure. And with a few minutes of training, anybody can write one. You can show them to clients and they’ll know what’s going on.
3. Decision tables aren’t the most powerful way of modeling requirements. I was fairly lucky with this problem: it didn’t involve complex state, input ordering, anything that was out of DT scope. And even then I stretched a lil’ bit with the date ranges. DTs are great because they have such incredibly high strength/weight ratios. More powerful tools are harder to learn and apply than DTs are. But it makes me optimistic that we can push this further.

Thanks to Richard Feldman and Oskar Wickström for feedback.