Exploring the Lower Depths of Terseness

There's a 100+ year old system for recording everything that happens in a baseball game. It uses sheet of paper with a small box for each batter. Whether that batter gets to base or is out--and why--gets coded into that box. It's a scorekeeping method that's still in use at the professional and amateur level, and at major league games you can buy a program which includes a scorecard.

What's surprising is how cryptic the commonly used system is. For starters, each position is identified by a number. The pitcher is 1. The center fielder 8. If the ball is hit to the shortstop who throws it to the first baseman, the sequence is 6-3. See, there isn't even the obvious mnemonic of the first, second, and third basemen being numbered 1 through 3 (they're 3, 4, and 5).

In programming, no one would stand for this. It breaks the rule of not having magic numbers. I expect the center fielder would be represented by something like:

visitingTeam.outfield.center

The difference, though, is that programming isn't done in real-time like scorekeeping. After the initial learning curve, 8 is much more concise, and the terseness is a virtue when recording plays with the ball moving between multiple players. Are we too quick to dismiss extremely terse syntax and justify long-winded notations because they're easier for the uninitiated to read?

Suppose you have a file where each line starts with a number in parentheses, like "(124)", and you want to replace that number with an asterisk. In the vim editor the keystrokes for this are " ^cib* " followed by the escape key. "^" moves to the start of the line. The "c" means you're going to change something, but what? The following "ib" means "inner block" or roughly "whatever is inside parentheses." The asterisk fills in the new character.

Once you get over the dense notation, you may notice a significant win: this manipulation of text in vim can be described and shared with others using only five characters. There's no "now press control+home" narrative.

The ultimate in terse programming languages is J. The boring old "*" symbol not only multiplies two numbers, but it pairwise multiplies two lists together (as if a map operation were built in) and also multiplies a scalar value with each element of a list, depending on the types of its operands.

That's what happens with two operands anyway. Each verb (the J terminology for "operator"), also works in a unary fashion, much like the minus sign in C represents both subtraction and negation. When applied to a lone value "*" is the sign function, returning either -1, 0, or 1 if the operand is negative, zero, or positive.

So now each single-character verb has two meanings, but it goes further than that. To increase the number of symbolic verbs, each can have either a period or a colon as a second character, and then each of these have both one and two operand versions. "*:" squares a single parameter or returns the nand ("not and") of two parameters. Then there's the two operand version of "*." which computes the least common multiple, and I'll give it up now before everyone stops reading.

Here's the reason for this madness: it allows a wide range of built-in verbs that never conflict with user-defined, alphanumeric identifiers. Without referencing a single library you've got access to prime number generation ("p:"), factorial ("!"), random numbers ("?"), and matrix inverse ("%.").

Am I recommending that you switch to vim for text editing and J for coding? No. But when you see an expert working with those tools, getting results with fewer keystrokes than it would take to import a Python module, let alone the equivalent scripting, then...well, there's something to the terseness that's worth remembering. It's too impressive to ignore simply because it doesn't line up with the prevailing aesthetic for readable code.

(If you liked this, you might enjoy Papers from the Lost Culture of Array Languages.)

permalink April 8, 2013

previously