Syntax Matters

(Note: I had been working on this post before this thread showed up on Artima today, so I figured it was an appropriate time to finish it off and publish it.)

One of Carson’s favorite phrases is “syntax matters,” so I’m kind of stealing his idea here. But it’s something I firmly believe in as well, and it’s central enough to how we’ve designed GScript that it warrants a detailed explanation and defense.

The most common response to any sort of language criticism always seems to be, “But you can still do that in my language, here’s how.” But of course that response is pretty much always an option; any Turing-complete language has the same set of fundamental capabilities. The point is that it doesn’t just matter what your language can do; what matters is how you do things using the language. In the end it’s impossible to be truly objective and judgments will depend on individual taste, which is fine: I’m not trying to convince anyone that this or that way is better right now, but rather just that there are such things as better and worse ways to accomplish something and that the difference matters.

Of course, on the other hand you can take things too far; the overhead of learning new syntactic structures can be pretty high, so in my opinion it’s not worth jamming everything possible into the language in the name of greater expressivity. With that said, this entry will focus more on reasons why clean, expressive syntax is important, and the subject of how to keep things from going too far will be a different entry. I’m fairly certain that my analysis is incomplete here, but I’ve broken down my argument into 4 main reasons why the syntax of a language and how you express things matters.

Lines Of Code

All things being equal, less code is almost always better. In real life, of course, things are never really equal, but as a general rule I hope it’s non-controversial to say that being able to do task X with 50 lines of code is preferable to needing 500 lines of code to do task X. Less code takes less time to write, but the real benefits are around maintenance: less code means less of a chance of bugs, less to keep in your head, less for someone else (or yourself 6 months later) to read through and learn, less to test, and less to modify when you change the rest of the system.

There are always exceptions of course; 50 lines of incomprehensible code is probably not preferable to 200 lines of dead-simple, straight-line code, and 50 lines of highly-coupled code might not be preferable to 6 different buckets of 50 lines of independent, decoupled code. But from a language design perspective, reducing the amount of code a programmer needs to write is generally the right thing to do, and the fact that languages like Ruby and Python require so much less code than Java is one main reason why people generally end up being more productive in those languages.

Readability

I think it’s fair to say that code is often harder to read than it is to write, and it’s certainly true that code is read many more times than it’s written. As a result, writing readable code is critically important to any development project. Better syntax makes code more readable by more clearly expressing the intent of the author. For example, something like:

var userNames = users.map(\u -> u.Name)

is clearer to me than

List<String> userNames = new ArrayList<String>();
for (User user : users) {
    userNames.add(user.getName());
}

It’s not just a lines-of-code issue; in the first case you read the word “map” and it immediately conveys a large amount of information: that the developer wanted to extract a list of names, that the operation is non-destructive, etc. The second case requires slightly more work to understand because it looks like every other for loop in Java, so you have to dig into the details to realize what it does (“okay . . . we’re iterating here to do a simple mapping, not to perform an operation on each element, partition the list, or transform the list in place”). Of course, you can write readable code in just about any language, and in the Java case you could always refactor it into a helper method, or use some functional-like library with an anonymous inner class. The point of better syntax is that it makes readable code easier to write; the fact that there’s only one obvious way to do most things in Python, for example, tends to make Python code much more readable than Perl code (at least in my opinion). You can write readable code in Perl, but you have to work at it; the language itself makes it difficult. When the language makes the readable version the easy one, the code written in that language will, on average, be more readable.
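To make the Java alternative concrete, here is a minimal sketch of the helper-method approach with an anonymous inner class. The `Mapper` interface, the `map` helper, and the `User` class are hypothetical names made up for illustration; they are not taken from any particular library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MapSketch {
    // Hypothetical single-method interface standing in for a closure.
    interface Mapper<T, R> {
        R map(T input);
    }

    // Hypothetical domain class used only for this example.
    static final class User {
        private final String name;

        User(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Helper that hides the loop: builds a new list and leaves the input untouched.
    static <T, R> List<R> map(List<T> items, Mapper<T, R> mapper) {
        List<R> result = new ArrayList<R>(items.size());
        for (T item : items) {
            result.add(mapper.map(item));
        }
        return result;
    }

    static List<String> userNames(List<User> users) {
        // The call site now says "map", but the anonymous inner class
        // wraps the one expression that matters in several lines of ceremony.
        return map(users, new Mapper<User, String>() {
            @Override
            public String map(User user) {
                return user.getName();
            }
        });
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("alice"), new User("bob"));
        System.out.println(userNames(users));
    }
}
```

The intent is visible at the call site, which is an improvement over the bare loop, but the interesting part (`user.getName()`) is still buried in boilerplate. That gap is the kind of thing I mean when I say the language should make the readable version the easy one.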

Memorability

One thing that I think people often don’t pay enough attention to is how easy it is to remember how to do things. The key metric is how often you have to use something in order to stop having to look it up or rely on an auto-complete treasure-hunt in the IDE. Are things so obvious that, even though you don’t exactly remember, your first guess is generally right? If so, the designer did a good job. Do you have to run off to the internet if you go more than 2 days without writing code using that syntax? That’s generally a bad sign.

Memorability plays into both reading and writing code; obviously it’s hard to read code if you don’t remember what the function calls mean or how the syntax elements interact, and it’s clearly laborious to write code if you have to constantly look in a reference guide.

Good syntax will cause things to stick better in your head, whereas less-clear syntax might make it nearly impossible to remember things. For example, XPath just refuses to stick in my brain; I don’t use it often enough, and while it’s powerful, the syntax is basically arbitrary as far as I can tell, which means that whenever I try to write XPath expressions or need to read someone else’s, I have to look things up, which slows me down immensely. That’s the primary reason I didn’t add XPath support to GScript’s XML library; I’d much rather use closures with findFirst() or findAll() methods, since the intention is unambiguous and the extra code is more than made up for by the hours I don’t have to spend re-learning XPath every time I need to do something. Is it useful to have a consistent, declarative way to query XML trees? Sure, in cases where you don’t have a full-fledged programming language to use and something declarative and severely constrained is necessary. But otherwise it’s just too many arbitrary bits of information for me to remember on top of what I already have to keep in my head to program in Java/GScript, so I avoid it when I do have a real language to work with.
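As a rough illustration of that trade-off, here is a sketch in Java (not GScript) that answers the same question two ways: once with an XPath expression and once with a predicate-based `findAll`. The XML shape, the `XmlQuerySketch` class, and the `findAll` helper are all assumptions made up for this example; in particular, this `findAll` is a hand-rolled stand-in and not the actual API of GScript’s XML library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlQuerySketch {
    // Made-up sample document for the example.
    static final String SAMPLE =
        "<users>"
        + "<user active=\"true\"><name>alice</name></user>"
        + "<user active=\"false\"><name>bob</name></user>"
        + "<user active=\"true\"><name>carol</name></user>"
        + "</users>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Version 1: compact, but you have to remember the XPath mini-language.
    static List<String> activeNamesWithXPath(Document doc) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//user[@active='true']/name", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    // Hypothetical helper: returns the root and every descendant element
    // that satisfies the predicate, in document order.
    static List<Element> findAll(Element root, Predicate<Element> filter) {
        List<Element> matches = new ArrayList<>();
        collect(root, filter, matches);
        return matches;
    }

    private static void collect(Element element, Predicate<Element> filter, List<Element> matches) {
        if (filter.test(element)) {
            matches.add(element);
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, filter, matches);
            }
        }
    }

    // Version 2: more characters, but only ordinary method calls and closures.
    static List<String> activeNamesWithFindAll(Document doc) {
        List<String> names = new ArrayList<>();
        List<Element> activeUsers = findAll(doc.getDocumentElement(),
            e -> "user".equals(e.getTagName()) && "true".equals(e.getAttribute("active")));
        for (Element user : activeUsers) {
            for (Element name : findAll(user, e -> "name".equals(e.getTagName()))) {
                names.add(name.getTextContent());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(activeNamesWithXPath(doc));
        System.out.println(activeNamesWithFindAll(doc));
    }
}
```

The second version is longer, but every piece of it is something I already know how to read; the first depends on remembering what `//`, `[@...]`, and `/` mean in XPath.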

Note that this plays into both how much syntactic help you should add into a language and what syntax there should be. If you have too many special syntactic elements, it might become difficult to remember them all. If they’re arbitrary, inconsistent, or otherwise unfamiliar to people, it’ll definitely be difficult to remember them.

Discoverability

Related to the idea of memorability is the idea of discoverability: how easy is it to figure out how to do something when you’re first starting out? The gentler the learning curve, the more likely you are to try to learn something new, and the less time you waste doing it. As with memorability, the ability to just guess and have that work out makes it easier to discover the correct path. In addition, well-designed syntax will often play well with auto-complete in editors, making it easier to explore using an IDE and learn an API that way. Lots more goes into that as well, such as proper encapsulation and object relationships, but syntax also helps.

The Upshot

Again, I’m not trying to convince anyone right now that my particular views on language design are right; the important thing to me is that people agree that syntax and language design matter, and that it’s the right debate to have in the first place. How you go about doing things in a language (or in interacting with an API) really does matter, and the details really are important.