User-Defined Literals in the D Programming Language



Programming languages define many kinds of literals. The most common ones are:

string literals, like "hello"

integer literals, like 1234

floating point literals, like 1.06e+3

character literals, like 'a'

The C programming language adds some more, like:

hexadecimal integers, like 0xDEADBEEF

octal integers, like 0677

While programming languages allow the user to define his own types, functions, and variable names, it's pretty rare to allow defining one's own literals. It sure would be nice to be able to do so to go with defining types.

So why not? One answer comes from how programming language compilers are designed. The compiler operates as a series of passes:

lexing

parsing

semantic analysis

optimization

code generation



Literals are recognized in the lexing phase, while user-defined things are recognized in the semantic analysis phase. Having the semantic phase feed back into the lexing phase tends to make a mess of both the language and the compilers for it. Most language designers eschew doing that with the fervor of an English professor reviewing one of my essays.

But, darn it, it sure would be nice to have user-defined literals. Hmmmm.

Let's harken back to C's octal integers, i.e. things like 0677. The leading zero makes it octal rather than decimal. Who the heck uses octal, and why is it in C? It turns out that many old machines (before the dawn of man) were programmed in octal rather than hexadecimal. The rise of the microcomputer pretty much killed off octal in favor of hexadecimal. A vestige of octal remains in the file permissions on POSIX operating systems. It's pretty much all that's left of octal.

It's rare enough that having the leading 0 meaning octal often comes as a nasty surprise to modern programmers. Hence there's pressure to remove those literals. The D programming language certainly feels that pressure.[1]

But, I like octal notation. I have a soft spot for it; it feels nice and comfortable. It's like a favorite shirt that unfortunately has too many holes in it to wear in public anymore, and frankly needs to go in the rag bin to wipe oil up from my leaky hotrod. I still like octal, though, and the thought of writing Linux file permissions:

creat("filename", 0677);

as

creat("filename", ((6<<6)|(7<<3)|7));

leaves me cold.
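As a quick check on the arithmetic, the shifted form and the octal digits 6, 7, 7 describe the same value: 447 in decimal, 0x1BF in hexadecimal. A minimal sketch (the name `shifted` is hypothetical, chosen only for this illustration):

```d
// Each octal digit occupies three bits, so octal 677 is
// 6*64 + 7*8 + 7 = 447 (decimal) = 0x1BF (hexadecimal).
enum shifted = (6 << 6) | (7 << 3) | 7;   // hypothetical name

static assert(shifted == 6 * 64 + 7 * 8 + 7);
static assert(shifted == 447);
static assert(shifted == 0x1BF);
```

The shifts are correct, but the reader has to do the digit-to-bit translation by hand, which is exactly what a literal is supposed to spare us.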

What can we do about it?

Let's start with a D function to turn an octal string into a number:

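    import std.exception : enforce;

    auto octal(string s) {
        uint result = 0;
        foreach (octalDigit; s) {
            // Reject non-octal digits, and stop before the shift below
            // would push bits past 32.
            enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
            result = (result << 3) | (octalDigit - '0');
        }
        return result;
    }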

(The enforce is error checking for valid octal digits and overflows.) We can then write file permissions as:

creat("filename", octal("677"));
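To see the runtime behavior, including the error checking, here is a self-contained sketch. The function is the one from the text, repeated so the block stands alone. The `unittest` block and the specific test strings are my own illustration, and it assumes `enforce` and `assertThrown` from `std.exception`:

```d
import std.exception : assertThrown, enforce;

// Same function as in the text, with an explicit return type.
uint octal(string s) {
    uint result = 0;
    foreach (octalDigit; s) {
        enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}

unittest {
    assert(octal("677") == 447);          // 6*64 + 7*8 + 7
    assert(octal("0") == 0);
    assertThrown(octal("678"));           // '8' is not an octal digit
    assertThrown(octal("77777777777"));   // eleven 7s would need 33 bits
}
```

Compiling with the `-unittest` switch runs the checks. Bad input is only caught when the program executes.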

But because the octal value is computed at runtime rather than compile time, this just irks me like a bug in my soup. D has a perfectly marvy feature where functions can be executed at compile time rather than runtime. Let's see if this can be pressed into service.

We could try:

    enum mode = octal("677");
    creat("filename", mode);

and that'll work at compile time. (D enums are manifest constants.) But of course that is hardly a workable user-defined literal.
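One way to convince yourself that the enum initializer really is evaluated during compilation is `static assert`, which accepts only compile-time values. A minimal sketch, reusing the function from the text (the second permission value, 755, is just an extra example of mine):

```d
import std.exception : enforce;

uint octal(string s) {
    uint result = 0;
    foreach (octalDigit; s) {
        enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}

// An enum initializer must be known at compile time, so the compiler
// runs octal("677") itself instead of emitting a call.
enum mode = octal("677");

// static assert only accepts compile-time values; if these lines
// compile, the calls were evaluated during compilation.
static assert(mode == 447);
static assert(octal("755") == 493);   // 7*64 + 5*8 + 5
```

The same function serves both runtime and compile-time callers. What changes is the context it is called from.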

Another way to force a function to be run at compile time is to wrap it in a template:

    auto octalImpl(string s) {
        ... same implementation as above ...
    }

    template octal(string s) {
        enum octal = octalImpl(s);
    }

D templates can use the 'eponymous name trick' where, if there is only one member of the template and it matches the name of the template, the template gets replaced by its member.

It is then used like:

creat("filename", octal!"677");
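Putting the pieces together, a self-contained sketch of the string-based template might look like the following. The body of `octalImpl` is the earlier runtime function, and the `static assert` lines are my own addition to show that the result is a compile-time constant:

```d
import std.exception : enforce;

uint octalImpl(string s) {
    uint result = 0;
    foreach (octalDigit; s) {
        enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}

// Eponymous template: its only member has the same name as the
// template, so octal!"677" stands for the enum member itself.
template octal(string s) {
    enum octal = octalImpl(s);
}

static assert(octal!"677" == 447);
static assert(octal!"644" == 420);   // 6*64 + 4*8 + 4
```

Because the enum member forces compile-time evaluation, an invalid string trips `enforce` during compilation and is reported as a compile error rather than a runtime exception.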

(Templates with only one argument can be called with the name!arg syntax.) This is not looking half bad. But we can make it even better:

creat("filename", octal!677);

Wait, what? Isn't 677 a decimal literal? Yes. The trick is to overload the octal template to take an integer literal, then take the number apart digit by digit and rebuild it as octal:

    auto octalImpl(uint i) {
        uint result = 0;
        int n;
        while (i) {
            auto octalDigit = i % 10;
            i /= 10;
            enforce(octalDigit < 8 && result < (1u << 29));
            result |= octalDigit << n;
            n += 3;
        }
        return result;
    }

    template octal(uint i) {
        enum octal = octalImpl(i);
    }

This all happens at compile time, which can be verified by looking at the output for:

    int main() {
        creat("filename", octal!677);
        return 0;
    }

which is:

    __Dmain:
        push    01BFh                    // octal 677 in hexadecimal
        mov     EAX,offset FLAT:_DATA
        push    EAX
        call    near ptr _creat
        xor     EAX,EAX
        add     ESP,8
        ret
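Another way to check this, without reading a disassembly, is to put both overloads side by side and compare their results with `static assert`. This is a sketch under the assumption that the two templates are declared in the same module. The function bodies are the ones shown earlier, and the assertions are my own:

```d
import std.exception : enforce;

// String form: consume octal digits left to right.
uint octalImpl(string s) {
    uint result = 0;
    foreach (octalDigit; s) {
        enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}

// Integer form: peel off decimal digits right to left and
// reassemble them three bits at a time.
uint octalImpl(uint i) {
    uint result = 0;
    int n;
    while (i) {
        auto octalDigit = i % 10;
        i /= 10;
        enforce(octalDigit < 8 && result < (1u << 29));
        result |= octalDigit << n;
        n += 3;
    }
    return result;
}

template octal(string s) { enum octal = octalImpl(s); }
template octal(uint i)   { enum octal = octalImpl(i); }

static assert(octal!677 == 0x1BF);          // the constant pushed above
static assert(octal!677 == octal!"677");    // both spellings agree
static assert(octal!0 == 0);
```

A decimal digit of 8 or 9 in the integer form fails the `enforce` check during compilation, so something like `octal!678` is rejected by the compiler.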

Here is the complete code conceived and implemented by Adam D. Ruppe.

The implementation is a fair bit more involved than above, but for good reason; the idea stays the same. The complete library implementation detects and minds the usual integral suffixes and automatically switches to 64-bit representation when the input string is too large -- just as you'd expect from a well-behaved literal. In fact, the code is not unlike the code handling C-style octal literals inside the compiler. That this can all be done in 'user space' is, I think, quite remarkable.
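The text above does not name the module. Assuming the library version is the `octal` template that Phobos ships in `std.conv`, using it might look like this sketch:

```d
import std.conv : octal;   // assumption: the library version lives in std.conv

// Both spellings are evaluated at compile time.
enum fromInteger = octal!644;
enum fromString  = octal!"644";

static assert(fromInteger == 420);   // 6*64 + 4*8 + 4
static assert(fromString == 420);
static assert(octal!177 == 127);     // 1*64 + 7*8 + 7
```

The sketch only checks values. The suffix handling and 64-bit promotion described above are left to the library's own documentation.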

Conclusion

While this isn't technically a user-defined literal, it came surprisingly close to one: flexible notation, compile-time evaluation -- all with user-defined code, not code hardwired in the compiler.

The key to user-defined literals is compile-time evaluation of complex code (in this case, code that computes octal values from decimal values or strings). Putting 'octal' in the standard library brings progress -- it allows us to gracefully remove an obsolete and troublesome feature like octal literals from the language, and opens the door to all sorts of user-defined literals customized for user-defined types. The feature is compelling enough that we have recently decided to effectively phase out built-in C-style octal literals from the D reference compiler [2].

References

[1] 0nnn octal notation considered harmful

[2] Deprecate Octal Literals

Acknowledgments

Thanks to Andrei Alexandrescu, David Held, Eric Niebler, and Brad Roberts for reviewing a draft of this.