Which of the following two Java package structures is the less well-designed?

Figure 1: Spaghetti structure (sources here and here).

Oops! Let's try that again.

Figure 2: Two Java package structures, JUnit and Spoiklin Soice.

This blog has banged on and on about how much better the package structure on the right in Figure 2 is than that on the left (circles are packages, straight lines represent down-the-page dependencies, and curved lines represent up-the-page dependencies). The right's structure presents clearer dependencies, making update costs easier to predict and the updates themselves often easier to implement.

Two problems, however, persist.

Firstly, graphical evaluation is subjective. Yes, most would agree that the structure on the right is better, but consider Figure 3.

Figure 3: Two more messy Java package structures, Lucene and Struts.

Which do you think is the messier structure in Figure 3? (The answer is in Table 2.)

Secondly, diagrams such as these offer insight when we evaluate a small number of nodes, as at package level, but fail before the ghastly node-pocalypse of class and method level.

Figure 4: A Java method-level structure. Good luck with that.

What we need is to objectively quantify spaghettiness: its messiness, its disorder. How on earth do we do that? What makes a structure messy?

Fortunately, mathematics has already defined what spaghettiness is by defining its opposite: total order. And we can apply this to computer programs, with just one teensy supposition.

Mathematics says that if a set has a binary relation with just three specific properties, then that set enjoys total ordering. Let's go through it.

Consider the three methods in Figure 5, where method a() calls b() and b() calls c(), forming the single transitive dependency: a → b → c.

Figure 5: Three simple methods.

From this diagram, we must extract a set of numbers. We'll extract our old friend depth, where depth is a method's position in the transitive dependency. Thus, a() is at position 0 (because programmers), b() is at position 1, and c() is at 2.

Figure 6: Three simple methods numbered by their depth in a transitive dependency.

Mathematics tells us that this "program" is totally ordered with respect to depth, if, when you extract these depth values and iterate over them in pairs – a pair of depth values being, say, d1 and d2 – then the following properties hold:

If d1 >= d2, and d2 >= d1, then d1 == d2. (Duh.)

If d1 <= d2, and d2 <= d3, then d1 <= d3. We'll come back to this.

It is always true that either d1 <= d2 or d2 <= d1.

The first and third properties are rather trivial, but that second property says that if we write out our transitive dependencies then depth values should never decrease. And in Figure 6, they do not: a(0) → b(1) → c(2).

As our program's depth satisfies all these properties, it is totally ordered. We have achieved mathematical objectivity.
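To make that second property concrete, here is a minimal Java sketch (not taken from any real tool). It takes the depth values of one transitive dependency, in call order, and checks that they never decrease. The names DepthOrder and isOrdered are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the class and method names are assumptions for illustration.
final class DepthOrder {

    /** Returns true when the depth values never decrease along the dependency. */
    static boolean isOrdered(List<Integer> depths) {
        for (int i = 1; i < depths.size(); i++) {
            if (depths.get(i) < depths.get(i - 1)) {
                return false; // a later method sits at a shallower depth
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Depths of a(0), b(1), c(2) from Figure 6.
        System.out.println(isOrdered(Arrays.asList(0, 1, 2))); // prints "true"
    }
}
```

A single pass over neighbouring pairs is enough here, because on plain integers the first and third properties hold automatically.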

Figure 7 shows a slightly more complicated program of two transitive dependencies, again with methods' depths indicated.

Figure 7: Oooooh! Two transitive dependencies.

Both transitive dependencies separately satisfy the three properties required.

Now let's look at a bad boy. Suppose someone grabs this code and calls e() from c(), that is, creating a dependency from c() back up to e().

Figure 8: Our first messiness.

Recall that curved lines represent dependencies that go up-the-page. With c() now depending on e(), we have the transitive dependency a(0) → b(1) → c(2) → e(1) → f(2), in which the depth value decreases at one node, from c(2) to e(1). This transitive dependency is therefore not totally ordered, so we cry carbohydrate!

Thus we can now define our metric. No, not "spaghettiness." Let us channel our inner squares and call it "structural disorder." A transitive dependency is structurally disordered if it does not satisfy the total order properties above, and a program's overall structural disorder is then the percentage of disordered transitive dependencies.
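As a rough sketch of that definition, the Java below counts the disordered transitive dependencies in a program and reports them as a percentage. Everything named here (StructuralDisorder, disorderPercent) is hypothetical. The example data assumes that the Figure 8 program consists of exactly two transitive dependencies: the one spelled out above, and a second one in which some method at depth 0, here called d(), leads to e() and then f(). How a real tool enumerates transitive dependencies is not covered here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and example data are assumptions, not a real tool's API.
final class StructuralDisorder {

    /** A transitive dependency is ordered when its depth values never decrease. */
    static boolean isOrdered(List<Integer> depths) {
        for (int i = 1; i < depths.size(); i++) {
            if (depths.get(i) < depths.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    /** Percentage of transitive dependencies that are disordered. */
    static double disorderPercent(List<List<Integer>> dependencies) {
        if (dependencies.isEmpty()) {
            return 0.0; // nothing to be disordered
        }
        long disordered = dependencies.stream()
                .filter(depths -> !isOrdered(depths))
                .count();
        return 100.0 * disordered / dependencies.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> figure8 = Arrays.asList(
                Arrays.asList(0, 1, 2, 1, 2), // a(0) b(1) c(2) e(1) f(2)
                Arrays.asList(0, 1, 2));      // assumed: d(0) e(1) f(2)
        System.out.println(disorderPercent(figure8)); // prints "50.0"
    }
}
```

Under that assumption, one disordered dependency out of two gives 50% disorder.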

Let's take this puppy out for a spin.

Looking at the two package structures in Figure 2 once again, we would intuitively expect the structure on the right to be far less disordered than that on the left, and it turns out to be so:

Figure 9: JUnit disorder is 76%, but Spoiklin Soice disorder is 3%.

Although we seek an objective measure, we nevertheless expect that as structures become subjectively messier-looking, their structural disorder values should rise. We can test this by taking two perfectly structured systems, "refactoring" them by applying random dependencies between nodes and checking whether their disorder values generally rise as their structures collapse. See Figure 10.

Figure 10: Two sad, decaying systems.

We can even simplify matters by defining the (admittedly arbitrary) categorization whereby a program suffering from 50% disorder or more is spaghetti. The threshold might have been 40% or 60% - feel free to choose your own. In fact, we'll have four categories, distinguished by garish, child-friendly color coding: red and black=naughty, green and white=nice.

Disorder    Evaluation
0-24%       Good
25-49%      Fair
50-74%      Spaghetti
75-100%     Spaghetti



Table 1: The four categories of structural disorder.

Let's point our disorder-binoculars at 15 Java programs, some quite well-known. Table 2 shows the programs and their structural disorder percentages at method, class and package level. You'd expect most professionally designed programs to be "good" to "fair" on the disorder spectrum, so the table should appear overwhelmingly green and white, yes?

Program            Method (%)   Class (%)   Package (%)
Cassandra          41           82          84
Zookeeper          28           85          93
ActiveMQ Broker    24           80          89
Jenkins            26           72          90
JUnit              34           78          76
Camel              22           90          70
Lucene             33           70          73
FitNesse           33           55          61
Tomcat (Coyote)    22           81          40
Maven              30           30          74
Log4j              25           59          47
Struts             11           42          74
Spring             27           60          35
Netty              22           69          20
Spoiklin Soice     26           25          3
Average            27           65          62



Table 2: The structural disorder percentages of 15 Java programs.
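For anyone who would rather count than squint at colors, here is a small Java sketch that tallies Table 2. It recomputes the Average row and counts how many of the 15 programs reach the 50% spaghetti threshold at each level. The figures are copied from the table. The class and method names (DisorderTally, average, spaghettiCount) are invented for this illustration.

```java
import java.util.Arrays;

// Hypothetical tally of Table 2: only the numbers come from the table itself.
final class DisorderTally {

    static final int SPAGHETTI_THRESHOLD = 50; // the (admittedly arbitrary) cut-off

    // Each row holds {method, class, package} disorder percentages.
    static final int[][] TABLE_2 = {
        {41, 82, 84}, // Cassandra
        {28, 85, 93}, // Zookeeper
        {24, 80, 89}, // ActiveMQ Broker
        {26, 72, 90}, // Jenkins
        {34, 78, 76}, // JUnit
        {22, 90, 70}, // Camel
        {33, 70, 73}, // Lucene
        {33, 55, 61}, // FitNesse
        {22, 81, 40}, // Tomcat (Coyote)
        {30, 30, 74}, // Maven
        {25, 59, 47}, // Log4j
        {11, 42, 74}, // Struts
        {27, 60, 35}, // Spring
        {22, 69, 20}, // Netty
        {26, 25, 3},  // Spoiklin Soice
    };

    /** Mean disorder for one level (0 = method, 1 = class, 2 = package), rounded. */
    static long average(int level) {
        return Math.round(Arrays.stream(TABLE_2)
                .mapToInt(row -> row[level])
                .average()
                .orElse(0));
    }

    /** Number of programs at or above the spaghetti threshold for one level. */
    static long spaghettiCount(int level) {
        return Arrays.stream(TABLE_2)
                .filter(row -> row[level] >= SPAGHETTI_THRESHOLD)
                .count();
    }

    public static void main(String[] args) {
        String[] levels = {"method", "class", "package"};
        for (int level = 0; level < levels.length; level++) {
            System.out.println(levels[level] + ": average " + average(level)
                    + "%, spaghetti in " + spaghettiCount(level) + " of " + TABLE_2.length);
        }
        // prints:
        // method: average 27%, spaghetti in 0 of 15
        // class: average 65%, spaghetti in 12 of 15
        // package: average 62%, spaghetti in 10 of 15
    }
}
```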

Oh.

It seems that we, as professional programmers, can write more or less well-structured methods, but above that...RRRR MRRRR GRRRRD.

Three points are noteworthy.

First, we chest-bump endlessly about refactoring, yet refactoring definitionally involves just one thing: improving software structure. Table 2 suggests that we fail to consider refactoring at class- and package-level.

Second, higher-level structure can provide a model, a simplified view, of the lower levels: a good package structure, for example, can offer a great map of functionality without pushing the programmer's nose into foul code. Yet our higher-level models seem vastly more disordered than that which they model. Table 2 suggests that we fail to maximize the benefits of higher-level structure.

Third, Oracle will release Java 9 any day now (honest!) with its new modules, offering a level of structure above even package level. Yet we apparently lack the desire or competence to manage the levels we already have. Table 2 predicts the rise of spaghetti modules.

Figure 11: Not another code review... (source here).

So, are we still writing spaghetti code?

Hell, yes! Not only are we still writing spaghetti code, we're living in the golden age of spaghetti code, an age in which we professional programmers don't just observe and casually ignore spaghetti, we don't even recognize it in the first place.

The GOTO statement used to be the alarm that forced programmers to manage control flow in their programs. Abandoning the GOTO statement, however, in no way removes this concern but rather migrates control flow to the realm of inter-method, inter-class and inter-package dependency where (in those last two cases, at least) its complexity now thrives, far from the programmer's gaze.

The greatest trick spaghetti code ever pulled was convincing the world that it didn't exist.

Summary

This is my structure. There are many like it, but this one is mine.

My structure is my best friend. It is my life. I must master it as I must master my life.

Without me, my structure is useless. Without my structure, I am useless. I must design my structure true. I must design cleaner than my enemy who is trying to out-structure me. I must embarrass him before he embarrasses me. I will...

My structure and I know that what counts in programming is not the variables we rename, the methods we extract, nor the conditionals we replace with polymorphism. We know that it is reduced disorder that counts. We will reduce disorder...