Triangle Classification Problem

Inspired by Glenford Myers's The Art of Software Testing, I did an experiment in July 2002 of having people supply implementations and/or test suites for the following small problem:

Given a function int f(int a, int b, int c) which takes 3 inputs that represent the lengths of sides of triangles, return an integer where the result 1 means it is equilateral triangle (all sides equal length)

2 means it is isosceles triangle (exactly 2 equal sides)

3 means it is scalene (no equal sides)

4 means it is illegal triangle

Thanks to cow-orker Greg who supplied a test driver I further hacked, as well as several intentionally wrong-but-plausible implementations to help test the test suites. I built a few intentionally wrong-but-plausible implementations as well. One of them (the one using unsigned ints) turned out to be very subtle indeed as we shall see (and I owe cow-orker Moloch thanks for proving that it was indeed a buggy implementation as I intended it to be). The other participants were all RussCon gaming buddies, plus Moloch's friend kaanchan.

Click here for a zip file with the cpp source file (containing test driver and all the submitted functions) and all the supplied test suites.

Results

russtest.txt 33 tests gregtests 46 tests stevetestcases 17 tests stevetestcases2 17 tests pjtests 343 tests claytests 57 tests dudleytests 14 tests dudleytests2 14 tests jimpartialtests.txt 4 tests seabiscuitpartialtests.txt 5 tests jptests.txt 30 tests moloch_tests.txt 46 tests kaanchan_tests.txt 5010 tests russ 0 0 0 0 0 0 1 0 0 0 0 0 0 russ ac ERROR 3 1 1 1 14 3 1 0 1 1 2 1 466 russ ab ERROR 3 6 1 1 14 3 2 2 0 1 3 1 426 russ bc ERROR 3 1 1 1 14 3 1 0 1 1 2 1 466 russ typo lessthan ERROR 3 6 0 1 30 6 4 3 0 0 3 6 1 russ unsigned ERROR 0 0 0 0 0 0 1 0 0 0 0 3 0 greg 0 0 0 0 0 0 1 0 0 0 0 0 0 greg isoceles ERROR 3 8 2 2 18 6 1 1 1 0 3 3 0 greg ignores ERROR 1 2 0 0 1 1 3 2 0 0 0 1 0 greg overflow ERROR 12 2 0 2 0 12 1 0 2 3 2 0 0 steve ERROR 3 5 1 1 8 3 2 1 1 1 1 2 1233 pj ERROR 13 3 0 2 0 13 1 0 3 4 2 1 0 sea biscuit ERROR 3 8 2 2 18 6 1 1 1 0 3 3 0 jim 1 ERROR 13 2 0 1 0 10 1 0 3 4 2 1 0 jim 2 (fixed bug) ERROR 13 2 0 1 0 10 1 0 3 4 2 1 0 dudley ERROR 13 4 0 1 0 10 1 0 3 4 2 2 0 dudley 2 ERROR 13 5 0 2 0 13 1 0 3 4 2 2 0 jeff f ERROR 13 3 0 2 0 13 1 0 3 4 2 1 0 jp 0 0 0 0 0 0 1 0 0 0 0 0 0 moloch ERROR 3 3 0 1 0 2 1 0 0 0 1 0 0 kaanchan ERROR 13 3 0 2 0 13 1 0 3 4 2 1 0 dave 0 0 0 0 0 0 1 0 0 0 0 0 0

Implementation issues

Only 4 people submitted correct implementations (myself, Greg, JP, and Dave). [Note that the first version of this report accidentally omitted Dave; his email to me with an implementation had been misplaced, alas. Sorry Dave!] (Actually I'm not 100% sure about JP's since it does wacky stuff with absolute values which makes it harder to reason about, but the various submitted test suites haven't found obviously wrong stuff with it... Our Most Assiduous Reader is encouraged to prove the correctness of JP's.)

By far the most common implementation error was forgetting about integer overflow. Most of the incorrect implementations included this as a bug. Solutions typically looked something like:

int f(int a, int b, int c) { if (a <= 0 || b <= 0 || c <= 0) { return 4; } if (! (a + b > c && a + c > b && b + c > a)) { return 4; } if (a == b && b == c) { return 1; } if (a == b || b == c || a == c) { return 2; } return 3; }

a + b > c

a > c - b

Several folks took more unnecessarily complex approaches to dealing with overflow, e.g. JP's

if ( a <= abs(b-c) || b <= abs(a-c) || c <= abs(a-b) ) { return 4; }

Sea Biscuit's looks tragically like it correctly (though needlessly complicatedly) handles overflow in testing for legality of side lengths, but sadly does the test too late, after the isosceles test, so that it mistakenly says 1 1 2 is isosceles instead of illegal:

if ((a == b) || (a == c) || (b == c)) return 2; if (((a<c)&&(b<c)&&(-c+a+b<=0)) || ((b<a)&&(c<a)&&(-a+b+c<=0)) || ((a<b)&&(c<b)&&(-b+a+c<=0))) return 4;

Then there was Moloch's resort to floating point and trigonometry:

int f_moloch( int side_a, int side_b, int side_c ) { double a = (double) side_a; double b = (double) side_b; double c = (double) side_c; double cos_gamma = ( pow(c,2) - pow(a,2) - pow(b,2) ) / (-2 * a * b); if ( side_a < 1 || side_b < 1 || side_c < 1 ) return 4; else if ( cos_gamma >= 1 || cos_gamma <= -1 ) return 4;

This solution fails on test cases such as my 2147483647 2147483647 1 2 (claiming this is an illegal triangle). I'm not sure, but I'll guess this might be due to floating point error, which would demonstrate the wisdom of staying with discrete math when possible.

Another over-engineered solution was Dudley's which bubble-sorted the 3 inputs into descending order before further analysis, leading to a much longer function than most. He later got the aha insight and submitted a second simpler implementation. Jim also sorted the inputs before further analysis.

Note there is a tendency to want to forget one of the necessary disjuncts in the isosceles tests (by analogy to only needing 2 conjuncts in the equilateral test), e.g.:

if (a == b && b == c) { return 1; } if (a == b || b == c) { return 2; }

Another interesting mistake was Steve's if (a == b == c) return 1; instead of if (a == b && b == c) return 1; ... this is of course a classic C/C++ gotcha. Too bad, since he dealt with overflow correctly (though with twice as many tests as needed, essentially analogous to JP's abs method):

if (a <= (c-b)) return 4; if (b <= (c-a)) return 4; if (c <= (b-a)) return 4; if (a <= (b-c)) return 4; if (b <= (a-c)) return 4; if (c <= (a-b)) return 4;

Jim's code actually was uncompilable as submitted (missing semicolon, which I corrected). But I'll give him credit for making a template function so one could use other numeric types than int if desired.

I'd say Dave's correct code to classify (once legality was determined) is less clear than it could be:

if(a == b) { if(a == c) return EQUILATERAL_TRIANGLE; else return ISOSCELES_TRIANGLE; } else { if(a == b || a == c || b == c) return ISOSCELES_TRIANGLE; else return SCALENE_TRIANGLE; }

a==b

if(a==b)

if (a == b && b == c) { return 1; } if (a == b || b == c || a == c) { return 2; } return 3;

Another example of redundant/cluttering code was in kanchaan's:

if ( len1 == len2 || len2 == len3 || len1 == len3 ) return 2; else // Scalene -> all 3 lengths are unequal if ( len1 != len2 && len2 != len3 && len1 != len3 ) return 3;

Test suite issues

INT_MIN INT_MAX INT_MAX

int f_russ_unsigned_error(int a, int b, int c) { // intentionally strange attempt to fix arithmetic overflow problem -- is this code correct or not? // a puzzle for the reader! :) // it seems to work, yet only Moloch's test suite found it to be buggy. unsigned int au = a, bu = b, cu = c; if (au <= 0 || bu <= 0 || cu <= 0) { return 4; } if (! (au + bu > cu && au + cu > bu && bu + cu > au)) { return 4; } if (au == bu && bu == cu) { return 1; } if (au == bu || au == cu || bu == cu) { return 2; } return 3; }

Why didn't my, Greg's, or Clay's suites detect the bugs in my unsigned implementation? I myself was sloppy and used -INT_MAX instead of INT_MIN in my input file! Amazingly, this laziness on my part (putting -2147483647 instead of -2147483648) caused my suite to miss the unsigned implementation bug, because -INT_MAX INT_MAX INT_MAX does indeed produce 4 as expected from the buggy unsigned function, whereas INT_MIN INT_MAX INT_MAX erroneously produces 2. Perhaps even more amazingly, Greg also made the same test suite lazy error!

To see why this is so, pretend we're using 4-bit instead of 32-bit integers. Then INT_MAX=0111 and INT_MIN=1000 and -INT_MAX=1001. Then -INT_MAX+INT_MAX=0000, so au+bu is not > cu=0111, and 4 is returned as expected. But INT_MIN+INT_MAX=1111 which is > cu=0111, so 4 is not returned, and the erroneous function goes on to decide the input is isosceles. Cool indeed! Who would have expected that such drastically different behavior could result from using -INT_MAX instead of INT_MIN? I am delighted to see it unexpectedly shown so clearly that testing the minimum value in a domain is crucial, even if this does show my own test suite was sloppy! :)

Had Greg & I used INT_MIN instead of -INT_MAX, we would have had correct test suites. Clay's test suite mysteriously used values that were a few integers away from INT_MIN and INT_MAX, e.g. his original text had 2 MAX_INT-4 MAX_INT-5 3 and MAX_INT-5 2 MAX_INT-4 3 -- more evidence that one should actually use the min and max legal values, rather than just values close to them.

Of course many suites did not detect the integer overflow bug. This confirms the folk wisdom that someone writing tests for their own code is likely to have the same blind spots when writing test as when writing code. It also suggests the usefulness of having a set of formal heuristics to guide test generation rather than doing random test cases or going by intuition.

A very illuminating lesson was that quality rather than quantity is important in a test suite. The 2 largest suites (343 and 5010 cases) did not perform very well. The best suites were mine (33), Greg (46), Clay (57), Steve (17), and Moloch (46). I suspect 30 or so tests are needed to feel good confidence in an implementation of this function.

PJ and kaanchan both did a fascinating tautological fallacy of using their implementation to automatically generate their test suite. Therefore their own implementations vacuously pass their own test suites! E.g. PJ systematically generated all combinations of input values from -1 to 5, making 73 = 343 test cases. I'm not sure what kaanchan did, but it looks like random generation of values, and he had 5010 tests! These are way larger suites than necessary, and what's worse, the suites are erroneous (both generating some false negatives, though interestingly neither generates a false positive because they happen not to have tried input cases that would generate overflow -- had they generated such input, their expected results would be wrong and cause some false positives as well). Note they failed to detect some buggy implementations, and totally "over-detected" some others (e.g. kaanchan's generating hundreds of failures on some of the buggy implementations, where of course only 1 failure is sufficient to point out that a bug exists.)

The lesson there is that tests cannot be created from the implementation. They need to be crafted independently of the implementation, else they'll have the same bugs as the implementation.

Further, I believe strongly that random testing is not nearly as useful as carefully crafted test cases. (I think random test generation can be useful in some circumstances, e.g. just to verify that an implementation doesn't crash. But to verify correct results being returned, you need a separate mechanism for determining the correct expected results, and you need more wisely chosen input values.) E.g. domain analysis should be done and test cases selected to cover the endpoints of domains. In particular, this problem has interesting partitions of the positive (legal) inputs and the nonpositive (illegal) input values for a given parameter, so there should be at least one test case covering each of the 4 interesting boundary conditions a==INT_MIN, a==0, a==1, a==INT_MAX, for example (a lesson I repeat here with egg on my face, given the lesson of the buggy unsigned function!)

There was a false positive test: Dudley sent a suite, then minutes later sent a corrected suite since his first had the bogus test case 7 7 15 2 which he changed to 7 7 13 2 .

I highly recommend Brian Marick's The Craft Of Software Testing for further discussion of test case design.

Thanks to all who participated! This was a fun little experiment, and I hope you found it interesting.

Return to Russ Zone