One Small Step Toward Reducing Programming Language Complexity

I've taught Python a couple of times. Something that experience made clear to me is just how many concepts and features there are, even in a language designed to be simple. I kept finding myself saying "Oh, and there's one more thing..."

Take something that you'd run into early on, like displaying what's in a dictionary:

for key, value in dictionary.iteritems(): print key, value

Tuples are a bit odd in Python, so I put off talking about them as long as possible, but that's what iteritems returns, so no more dodging that. There's multiple assignment, too. And what the heck is iteritems anyway? Why not just use the keys method instead? Working out a clean path that avoids constant footnotes takes some effort.
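
To make those three footnotes concrete, here's a small sketch. It assumes Python 3, where the method is spelled items() instead of iteritems() and print is a function. The dictionary contents are made up for illustration.

```python
# Hypothetical example data; any dictionary would do.
ages = {"alice": 31, "bob": 27}

# Footnote one: items() hands back (key, value) tuples.
pairs = list(ages.items())

# Footnote two: multiple assignment unpacks each tuple into two names.
lines = []
for name, age in ages.items():
    lines.append(name + " " + str(age))

# Footnote three: the keys-only route avoids tuples,
# but needs a second lookup for every value.
lines_via_keys = [name + " " + str(ages[name]) for name in ages.keys()]

print(pairs)
print(lines)
```

Both loops build the same lines, which is exactly why a beginner asks why there are two ways to do it.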

This isn't specific to Python. Pick any language and it likely contains a larger interconnected set of features than it first appears. Languages tend to continually grow, too, so this just gets worse over time. Opportunities to reverse that trend--backward compatibility be damned!--would be most welcome. Let me propose one.

The humble string constant has a few gotchas. How to print a string containing quotes, for example. In Python that's easy: just use single quotes around the string that has double quotes in it. It's a little more awkward in Erlang and other languages. Now open the file "c:\my_projects\input.txt" under Windows. You need to type "c:\\my_projects\\input.txt", but first you've got to say "Oh, and there's one more thing" and explain how backslashes work in strings.

Which would be fine...except the backslash notation for string constants is, in the twenty-first century, an anachronism.

Who ever uses "\a" (bell)? Or "\b" (backspace)? Who even knows what "\v" (vertical tab) does? The escape sequence that gets used more than all the others combined is "\n" (newline), but it's simpler to have a print function that puts a "return" at the end and one that doesn't. Then there's "\t" (tab), but it has its own set of quirks, and it's almost always better to use spaces instead. The price for supporting a feature that few people use is core confusion about what a string literal is in the first place. "The length of "\n\n" isn't four? What?"

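The gap between what gets typed and what gets stored is easy to check. A small sketch, assuming Python 3; the variable names are only for illustration.

```python
# Four characters typed between the quotes, two stored.
two_newlines = "\n\n"
stored_length = len(two_newlines)

# Each doubled backslash is two characters typed, one stored.
path = "c:\\my_projects\\input.txt"
path_length = len(path)
backslashes = path.count("\\")

print(stored_length)  # 2
print(path_length)
print(backslashes)
```
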
There's an easy solution to all of this. Strings are literal, with no escapes of any kind. Special characters are either predefined constants (e.g., TAB, CR, LF) or created through a few functions (e.g., char(Value), unicode(Name)). Normal string concatenation pastes them all together. In Python:

"Content-type: text/plain" + NL + NL

In Erlang:

"Content-type: text/plain" ++ NL ++ NL

In both cases, the compiler would mash everything together into one string, so no actual concatenation takes place at runtime.
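
Python has no escape-free string mode, so the following is only an approximation of the idea, written in Python 3. The constant names follow the ones used above. The helpers char and unicode_char are hypothetical (the second is renamed to stay clear of Python 2's built-in unicode), built on the standard chr and unicodedata.lookup. Treating NL as a line feed is my assumption, and the concatenation here is ordinary Python, not something the compiler is promised to fold away.

```python
import unicodedata

# Predefined constants for the special characters.
TAB = chr(9)
LF = chr(10)
CR = chr(13)
NL = LF  # assumption: "newline" means a single line feed


def char(value):
    """Return the character with the given code point (hypothetical helper)."""
    return chr(value)


def unicode_char(name):
    """Return the character with the given Unicode name (hypothetical helper)."""
    return unicodedata.lookup(name)


# No escape sequences appear anywhere in these literals.
header = "Content-type: text/plain" + NL + NL
row = "name" + TAB + "value" + CR + LF
bell = char(7)
snowman = unicode_char("SNOWMAN")

print(len(header))
```

With this scheme the length of a literal is always the number of characters between the quotes, and anything unusual is spelled out by name.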

Note that in Python you can get rid of backslash notation by preceding a string with the "r" character (meaning "raw"), like this:

r"c:\my_projects\input.txt"

But that adds another feature to the language, one to patch up the problems caused by the first.
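
A quick check of what the raw prefix does, assuming Python 3. It also shows a footnote the patch brings along with it: a raw string literal can't end in a single backslash, so a trailing separator has to come from somewhere else.

```python
raw_path = r"c:\my_projects\input.txt"
escaped_path = "c:\\my_projects\\input.txt"

# Both spellings produce the same string.
same = raw_path == escaped_path

# Writing r"c:\my_projects\" is a syntax error, so the trailing
# backslash is appended from an ordinary escaped string instead.
folder = r"c:\my_projects" + "\\"

print(same)
print(folder)
```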

(If you liked this, you might like In Praise of Non-Alphanumeric Identifiers.)

July 24, 2010