Feb 26, 2009

Sigils are little symbols attached to a variable name that convey its type or scope, or simply mark it as different from non-variables. There is very little middle ground in programmers’ opinions of variable sigils; you either love them or you hate them. The quintessential language containing sigils is probably Perl, followed by BASIC, and more recently Ruby. I pick these three because they use sigils for different purposes:

- BASIC [1] sigils denote datatypes
  - foo$ denotes a variable holding a string
  - foo% denotes an integer
- Perl sigils denote a datatype category [2]
  - $foo denotes a scalar type
  - @foo denotes an array
  - %foo denotes a hash
  - &foo denotes a subroutine
- Ruby sigils denote a variable’s scope [3]
  - $foo denotes a global variable
  - @foo denotes an instance variable
  - @@foo denotes a class variable

I personally like sigils — very much so. However, I tend to prefer the types of sigils used by Ruby rather than the finer-grained meaning attached to Perl and BASIC sigils (which is also the reason that I dislike Hungarian notation). I like being able to read my source and, at a glance, soak in the maximum amount of information. Sigils, when used sparingly, can provide a tremendous service. However, there is a fine line between sigils providing useful information and those akin to line-noise. My head tends to swim when looking at some Perl code due to the presence of sigils, but maybe that’s just me (probably not). Therefore, when I set out to design my own experimental language, effective sigils were high on my wish list.

First Cut

Since my sandbox programming language Ix is based on the CLIPS source base, I wholly adopted the CLIPS convention. That is, CLIPS denotes variables by prepending ? or $? to a symbol. By convention, the former was meant to denote a scalar while the latter was meant for multifield [4] variables, but they could both be used interchangeably [5]. Therefore, a simple reduce function initially looked like this:

fn( reduce [?f $?lst]
    if(empty?($?lst)
       call(?f)
     else
       call(?f first($?lst) reduce(?f rest($?lst)))))

reduce(_(1 2 3 4 5) +)

Not too bad, but the sigils clutter up what is effectively a succinct function. As an added disadvantage, I decided a long time ago that predicate functions in Ix should have a question mark at the end of them; therefore, in this small function the question mark has two different meanings depending on the context. But even still, I stuck with this syntax for months.
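For readers unfamiliar with Ix, the function above can be read roughly as the following Python sketch. It is only an approximation: the `plus` helper is hypothetical and assumes that Ix’s + accepts any number of arguments and returns 0 when given none, which the post does not state. The arguments are passed in the order of the parameter list, function first.

```python
def plus(*args):
    """Variadic addition; returning 0 for no arguments is an assumption about Ix's +."""
    return sum(args)


def reduce(f, lst):
    # Empty list: call f with no arguments, mirroring call(?f) in the Ix version.
    if not lst:
        return f()
    # Otherwise combine the first element with the reduction of the rest.
    return f(lst[0], reduce(f, lst[1:]))


result = reduce(plus, [1, 2, 3, 4, 5])
print(result)
```

The recursion bottoms out on the empty list, so the combining function must make sense when called with no arguments at all.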

Second Cut

After writing a pile of code in the first version of Ix, I decided to add some syntactic sugar for the (call) function (see its usage above). As a result, the code above became:

fn( reduce [?f $?lst]
    if(empty?($?lst)
       ?f()
     else
       ?f(first($?lst) reduce(?f rest($?lst)))))

reduce(_(1 2 3 4 5) +)

This looked a little better than the original, but there were a couple of issues that stuck with me:

1. The $? sigil was still too noisy
2. The ?f() form is hideous, and $?f() even more so
3. The issue of differing meanings for ? still remained

Third Cut

I initially decided to live with issues #2 and #3 and instead remove the $? form altogether.

fn( reduce [?f ?lst]
    if(empty?(?lst)
       ?f()
     else
       ?f(first(?lst) reduce(?f rest(?lst)))))

reduce(_(1 2 3 4 5) +)

Better? It took me a while to learn to hate this new syntax, but eventually I did. While reducing the $? noise, it introduced a whole new problem. That is, when calling predicate functions, the pattern ?(? tended to cause mass confusion (at least for me). My mind would often fill in the second question mark even in its absence, thus turning something like symbol?(x) into symbol?(?x). Why is this a problem instead of an outright syntax error? The answer is that symbols in Ix are defined as any sequence of characters not starting with a number and not containing any of a small set of delimiters [6]. Therefore, in the first call x is a symbol, and thus the call to symbol?() always evaluates to true. It took me only a few frustrating debugging sessions to see the error of my ways.
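The trap can be sketched in a few lines of Python. This is an illustration, not Ix’s actual reader: the `classify` function and the `DELIMITERS` set are hypothetical, since the post does not list the real delimiters.

```python
# Hypothetical delimiter set; the real set used by Ix is not given in the post.
DELIMITERS = set(' \t\n()[]"')


def classify(token):
    """Classify a token under the third-cut rules: a leading ? marks a variable."""
    if not token or any(ch in DELIMITERS for ch in token):
        return "invalid"
    if token.startswith("?"):
        return "variable"
    if token[0].isdigit():
        return "other"  # symbols may not start with a number
    return "symbol"


for token in ["?x", "x", "symbol?", "42"]:
    print(token, "->", classify(token))
```

Because a bare x is itself a perfectly valid symbol, forgetting the ? does not produce an error; it silently changes what the argument is.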

Today

After much despairing over the seeming disparity between wishing to keep sigils and requiring the presence of symbols as defined above, I hit on a very nice compromise. That is, who’s to say that a sigil must be a non-alphabetical character (or sequence thereof)?

fn( reduce [F Lst]
    if(empty?(Lst)
       F()
     else
       F(first(Lst) reduce(F rest(Lst)))))

reduce(_(1 2 3 4 5) +)

So what happened? Simple. Variables are now identified as starting with a capital letter. Assuredly, this is nothing new in the history of programming language design, but it did solve nicely the issues above:

- Less visible noise
- Variables and symbols are clearly delineated
- F() looks much nicer
- There is now only one meaning of ?
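As a rough illustration of the capital-letter rule, here is a minimal Python sketch. It is not Ix’s real reader: `classify_capitalized` and `DELIMITERS` are hypothetical names, and the delimiter set is assumed because the post does not list it.

```python
# Hypothetical delimiter set; the real set used by Ix is not given in the post.
DELIMITERS = set(' \t\n()[]"')


def classify_capitalized(token):
    """Classify a token when a leading capital letter marks a variable."""
    if not token or any(ch in DELIMITERS for ch in token):
        return "invalid"
    if token[0].isupper():
        return "variable"
    if token[0].isdigit():
        return "other"  # symbols may not start with a number
    return "symbol"


for token in ["F", "Lst", "reduce", "empty?"]:
    print(token, "->", classify_capitalized(token))
```

Under this rule the question mark plays no part in telling variables from symbols, so it is free to mean only "predicate".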

Of course, the symbol?(x) issue has now evolved into a symbol?(X) issue, but I found that occurrences of this mistake disappeared once the confusing ?(? pattern did. I think I’ve hit on the right formula for sigils in Ix. That is, I’ve reduced the granularity of their meaning to be agnostic of type and scope, while at the same time clearly separating symbols from variables.

Sigils are nice, as long as they are not abused.

-m