You might have seen that I have been coding a lot in D lately. A few weeks ago there was the Expressive C++17 Coding Challenge, and now that its winning C++ solution is public, I thought this is an excellent opportunity to show why I like D so much.

The requirements

Let me first recap the requirements of this challenge:

This command line tool should accept the following arguments: the filename of a CSV file,

the name of the column to overwrite in that file,

the string that will be used as a replacement for that column,

the filename where the output will be written.

./program <input.csv> <colum-name> <replacement-string> <output.csv>

Example input

Given this simple CSV as input

name,surname,city,country
Adam,Jones,Manchester,UK
Joe,Doe,Cracow,Poland
Michael,Smith,Paris,France
Alex,McNeil,Gdynia,Poland

the program called with:

./program input.csv city London output.csv

should write:

name,surname,city,country
Adam,Jones,London,UK
Joe,Doe,London,Poland
Michael,Smith,London,France
Alex,McNeil,London,Poland

Sounds fairly trivial, right? Please have a short look first at the “best” C++ and Rust solutions before you look at the D solution.

The solution

Okay, so here’s one way to solve this in D. If you are scared at the moment - don’t worry. I will explain it line by line below.

#!/usr/bin/env rdmd
import std.algorithm, std.exception, std.format, std.range, std.stdio;

void main(string[] args) {
    enforce(args.length == 5, "Invalid args\n" ~
        "./tool <input.csv> <colum-name> <replacement-string> <output.csv>");
    auto inFile = args[1], columName = args[2], replacement = args[3], outFile = args[4];
    auto lines = File(inFile).byLine.map!(a => a.splitter(","));
    auto colIndex = lines.front.countUntil(columName);
    enforce(colIndex >= 0, "Invalid column. Valid columns: %(%s, %)".format(lines.front));
    auto os = File(outFile, "w");
    os.writefln("%-(%s,%)", lines.front);
    foreach (line; lines.dropOne) {
        os.writeln(line
            .enumerate // iterate with an (index, value) tuple
            .map!(a => a.index == colIndex ? replacement : a.value)
            .joiner(","));
    }
}

So how does this compare to C++17 and Rust?

Language              LoC  Words  Characters  Time
C++                   125  365    2724        15s
Rust                  83   231    1703        6s
D                     17   84     710         12s
D (slightly tweaked)  25   98     794         5s (4s with LTO)

CSV parsing is a lot more complicated than splitting by a delimiter, and in the real world you shouldn’t roll your own CSV parser. However, this article aims to analyze the expressive power of D in comparison to C++ and Rust by focusing on a common example. Later in this article I will also present a solution with only 12 lines by using D’s built-in std.csv module.

I used the following D code to generate a simple CSV file with 10 fields and 10m lines:

rdmd --eval='10.iota.map!(a => "field".text(a)).join(",") .repeat(10_000_000).joiner(newline).writeln' > input_big.csv

The resulting input_big.csv has a size of 668M.

LTO stands for link-time optimization and I included it to show that we can always tweak the performance with a few easy tricks. See the benchmarking section for more details.

Aren’t you concerned that Rust is faster in this benchmark?

Not at all. The challenge was to write expressive code. When performance really matters D provides the same tools as C or C++ and D even supports native interoperability with C and most of C++.

In this example, however, I/O is the bottleneck and D provides a few convenience features by default, such as locked file handles (so that accessing files is thread-safe) and support for Unicode input. However, it’s easy to opt out of such productivity features and use other tricks like memory-mapped files. For interested readers I have attached a slightly optimized version at the end.

In addition, if you are interested in performance, Jon Degenhardt (member of eBay’s data science team) has made an excellent performance benchmark between eBay’s tsv-utils and existing CSV/TSV processing tools written in C, Go, and Rust.

1) What is #!/usr/bin/env rdmd ?

One of my favorite aspects of D is that it has a blazingly fast compiler. Period. I can compile the entire D front-end of the compiler (~200 kLoC) in less than two seconds, or the entire standard library (> 300 kLoC, with lots and lots of compile-time function evaluation and templates) in 5 seconds from scratch, without any cache or incremental compilation.

This means that the compiler is almost as fast as an interpreter, and rdmd is the tool that allows handy usage of D as a “pseudo-interpreted” language. You can invoke rdmd with any file and it will automatically figure out all required files based on your dependencies and pass them to the compiler.

It’s very popular in the D community because for small scripts one doesn’t even notice that the program is compiled to real machine code under the hood. Also, if the shebang header is added and the file is executable, D programs can be run as if they were script files:

./main.d input.csv city London output.csv

2) So you import a bunch of libraries. What do they do?

import std.algorithm, std.exception, std.format, std.range, std.stdio;

In short std.stdio is for input and output, std.range is about D’s magic streams called “ranges” and std.algorithm abstracts on top of them and provides generic interfaces for a lot of sophisticated algorithms.

Moreover, std.exception offers methods for working with exceptions like enforce and finally std.format bundles methods for string formatting.

Don’t worry - the functionality imported from these modules will be explained soon.

3) Your program has a main function. What’s so special about it compared to C or C++?

void main(string[] args) { …

For starters, arrays in D have a length. Try:

args[5].writeln;

Compared to C/C++ null-terminated strings and arrays, it won’t segfault. It would just throw a nice Error:

core.exception.RangeError@./main.d(10): Range violation
----------------
??:? _d_arrayboundsp [0x43b622]
prog.d:9 void main.foo(immutable(char)[][]) [0x43ac93]
prog.d:4 _Dmain [0x43ac67]

Oh so D performs automatic bounds-checking before accessing the memory. Isn’t that expensive?

It’s almost negligible compared to the safety it buys, but D is a language for everyone, so the people who want to squeeze out the last cycles of their processor can do so by simply compiling with -boundscheck=off (for obvious reasons this isn’t recommended).

In D, strings are arrays too and there’s another nice property about D’s arrays. They are only a view on the actual memory and you don’t copy the array, but just the view of the memory (in D it’s called a slice).

Consider this example:

int[] arr = [1, 2, 3];
auto bArr = arr[1 .. $];
bArr[] += 2; // this is a vectorized operation
arr.writeln; // [1, 4, 5]

There are many other things D has learned from C and C++. Walter has recently written a great article on how D helps to vanquish forever these bugs that blasted your kingdom, which I highly recommend if you have a C/C++ background.

4) What’s up with this enforce ?

enforce(args.length == 5, "Invalid args.\n" ~
    "./tool <input.csv> <colum-name> <replacement-string> <output.csv>");

I have never seen the ~ operator before!

It’s the string concatenation (or, more generally, array concatenation) operator. How often have you encountered code like a + b and needed to know the types of a and b to know whether it’s an addition or a concatenation?

Why don’t you use an if statement and terminate the program explicitly?

if (args.length != 5) {
    writeln("Invalid args.");
    writeln("./tool <input.csv> <colum-name> <replacement-string> <output.csv>");
    return 1;
}

That’s valid D too. D allows a lot of different programming styles, but this article is intended to highlight a few specific D styles like enforce .

enforce is a function defined in std.exception and throws an exception if its first argument has a falsy value.

Hmm, I looked at the documentation and saw this monster. I thought it simply throws an exception?

auto enforce(E : Throwable = Exception, T)(T value, lazy string msg = null, string file = __FILE__, size_t line = __LINE__)

I don’t have the time to fully dive into D’s syntax, but auto instructs the compiler to infer the return type for you. This leads to the interesting Voldemort return types, which can’t be named by the user, but that’s a good topic for another article.

The next part looks a bit complicated (E : Throwable = Exception, T) , but don’t worry yet. It means that E is a template parameter which needs to inherit from Throwable (the root of all exceptions), and is by default Exception . T is the template type of value .

Wait. I just instantiated a template without specifying its template parameters?

Yes, the D compiler does all the hard work for you. The technical term is Implicit Function-Template Instantiation (IFTI). Of course, we could have instructed enforce to throw a custom exception, but more on template instantiation later.
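To make the difference between inferred and explicit template arguments a bit more concrete, here is a minimal sketch that asks enforce for a custom exception type. ArgsException and checkArgCount are hypothetical names made up for this illustration; only enforce and collectException come from std.exception.

```d
import std.exception : collectException, enforce;
import std.stdio : writeln;

// Hypothetical exception type, used only for this illustration.
class ArgsException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Hypothetical helper: throws ArgsException unless `expected` arguments were given.
void checkArgCount(string[] args, size_t expected)
{
    // E is passed explicitly; T (here bool) is still inferred by the compiler.
    enforce!ArgsException(args.length == expected, "Invalid args.");
}

void main()
{
    checkArgCount(["./tool", "a"], 2); // passes silently

    // collectException returns the thrown exception instead of propagating it.
    auto e = collectException!ArgsException(checkArgCount(["./tool"], 2));
    writeln(e.msg); // Invalid args.
}
```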

Alright. So this function takes a generic value and a msg , but a lazy string msg ?

lazy is a special keyword in D and tells the compiler to defer the evaluation of an argument expression until it is actually needed.

I don’t understand. msg seems to be a string concatenation of two strings. Isn’t this done before the enforce is called?

"Invalid args.\n" ~ "./tool <input.csv> <colum-name> <replacement-string> <output.csv>"

No, lazy is lazy and the string concatenation doesn’t happen at the caller site, but can be requested explicitly by the callee.

It gets a bit clearer if we look at the second enforce as there’s runtime work involved:

enforce(colIndex >= 0, "Invalid column name. Valid are: %(%s, %)".format(lines.front));

format and all the expensive work of formatting the error message is never done on the default path, but only if an exception actually gets thrown. Ignore the %(%s, %) formatting string for a bit, it will be explained soon.
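The deferred evaluation can be observed with a small counter. The following is only a sketch: check and expensiveMessage are hypothetical helpers written for this illustration and merely mimic the shape of enforce.

```d
import std.stdio : writeln;

int evaluations; // counts how often the message expression is actually evaluated

// Hypothetical stand-in for an expensive call such as format(...).
string expensiveMessage()
{
    ++evaluations;
    return "something went wrong";
}

// Hypothetical helper mimicking enforce: `msg` is only evaluated on failure.
bool check(bool condition, lazy string msg)
{
    if (!condition)
        writeln("check failed: ", msg); // the argument expression runs here
    return condition;
}

void main()
{
    check(true, expensiveMessage());
    writeln(evaluations); // 0: the message was never built

    check(false, expensiveMessage());
    writeln(evaluations); // 1: built once, on the failure path
}
```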

Ok, but how does that work?

In short: the compiler does a few smart lowerings for you and creates an anonymous lambda. It’s more complicated in practice, and interested readers can learn more at Walter’s advanced article D’s lazy .

For now I will use a simple trick to show what’s going on under the hood. The AST explorer at run.dlang.io allows us to peek at the internal representation of a D source file in the compiler after all semantic processing has been done. This means we can see that for the first enforce the concatenation is even done at compile-time:

enforce(args.length == 5LU, delegate const(char)[]() => "Invalid args\x0a./tool <input.csv> <colum-name> <replacement-string> <output.csv>", "onlineapp.d", 4LU);

As mentioned, this is a representation of the internal state of the compiler. Hence, aliases like string (an alias for an array of immutable(char) elements) are resolved to their underlying array types and numeric literals are serialized with their inferred type. Similarly, \x0a is the hexadecimal representation of the newline character \n, and delegate const(char)[]() => is a lambda function without arguments that returns a character array, to which a string converts implicitly. Of course, D has a shorthand syntax for lambda functions: () => "hello" , but the compiler internally expands this syntactic sugar.

But there’s more magic here. What’s __FILE__ and __LINE__ ?

string file = __FILE__, size_t line = __LINE__

Remember that D is a compiled language and accessing the stack isn’t as easy as asking the interpreter nicely. These two default arguments are automatically set by the compiler with the file and line number of the caller. This is important for logging or throwing exceptions like we have done here.

So, an API author can simply say “Hey, I would like to know the line number of my caller.” and doesn’t depend on the user hacking the replacements like it’s done in C/C++ with preprocessor macros:

#ifdef SPDLOG_DEBUG_ON
#define SPDLOG_DEBUG(logger, ...) logger->debug(__VA_ARGS__) << " (" << __FILE__ << " #" << __LINE__ <<")";
#else
#define SPDLOG_DEBUG(logger, ...)
#endif

In fact, D doesn’t even have a preprocessor.
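To see the caller-location defaults in action, here is a minimal sketch of a logging-style helper. The name located and its output format are assumptions made for this example and are not part of any library.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: `file` and `line` default to the location of the caller.
string located(string msg, string file = __FILE__, size_t line = __LINE__)
{
    return format("%s(%s): %s", file, line, msg);
}

void main()
{
    writeln(located("starting up"));           // reports this file and this line
    writeln(located("custom", "other.d", 42)); // other.d(42): custom
}
```

The defaults are filled in at each call site, so the helper itself needs no macro and no stack inspection.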

5) auto and a statically typed language

auto inFile = args[1], columName = args[2], replacement = args[3], outFile = args[4];

Hmm, but what’s auto ? I thought D has a static type system?

Yes, D is statically typed, but the compiler is pretty smart, so we can let it do all the hard work for us. auto is a placeholder that tells the compiler “whatever the type of the initializer is, use it as the type of this variable”.
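A small sketch shows that the inferred types are ordinary static types that can be checked at compile time. The variable names are made up for this illustration.

```d
import std.stdio : writeln;

void main()
{
    auto count = 42;                 // inferred as int
    auto name = "city";              // inferred as string
    auto columns = ["name", "city"]; // inferred as string[]

    // The types are fixed at compile time; these checks cost nothing at run time.
    static assert(is(typeof(count) == int));
    static assert(is(typeof(name) == string));
    static assert(is(typeof(columns) == string[]));

    writeln(typeof(columns).stringof); // string[]
}
```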

6) What the heck is UFCS?

auto lines = File(inFile).byLine.map!(a => a.splitter(","));

One of the major features of D is the Unified Function Call Syntax (UFCS). In short, the compiler will look up a function in the current namespace if it’s not found as a member function of a type, but let’s go through this step by step.

I looked at the documentation of File and it has a method byLine . So where’s the magic?

Have another look at map , it’s located in std.algorithm .

Okay, wait. How does this work?

The compiler internally rewrites the expression File.byLine.map to the following:

map(File.byLine());

Missing parentheses are allowed too - after all the compiler knows that the symbol is a function.

Okay, but what’s up with this !(a => a.splitter(",")) ?

! is similar to C++/Java’s <> and instantiates a template. In this case the template argument is the lambda function a => a.splitter(",") . Notice that UFCS is used again for splitter ; your brain might be more used to reading splitter(a, ",") for now.
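The following sketch puts both call styles next to each other. twice is a hypothetical free function invented for this example; splitter, map and equal are the regular std.algorithm functions.

```d
import std.algorithm : equal, map, splitter;
import std.stdio : writeln;

// Hypothetical free function; thanks to UFCS it can be called like a member of int.
int twice(int x)
{
    return 2 * x;
}

void main()
{
    writeln(twice(21)); // 42, classic call syntax
    writeln(21.twice);  // 42, UFCS without parentheses

    // Both spellings build the same lazy range of fields.
    auto ufcs = "name,surname,city".splitter(",");
    auto classic = splitter("name,surname,city", ",");
    writeln(ufcs.equal(classic)); // true

    // The template argument after `!` is the lambda applied to every element.
    writeln([1, 2, 3].map!(a => a.twice)); // [2, 4, 6]
}
```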

7) Ranges

Okay, to recap: we have read the input file line by line and split every line at the commas ( , ).

Wouldn’t this result in a lot of unnecessary allocation?

The short answer is: D uses “iterators on steroids” which are lazy and work is only done when explicitly requested. Usually range algorithms don’t even require any heap allocation as everything is done on the stack.
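The laziness is easy to observe with a counter. This is a minimal sketch; calls and square are hypothetical names used only here.

```d
import std.algorithm : map;
import std.range : take;
import std.stdio : writeln;

int calls; // counts how often the mapping function runs

// Hypothetical mapping function with a visible side effect.
int square(int a)
{
    ++calls;
    return a * a;
}

void main()
{
    auto squares = [1, 2, 3, 4, 5].map!square; // nothing is computed yet
    writeln(calls); // 0

    int sum;
    foreach (s; squares.take(2)) // only now are elements requested
        sum += s;

    writeln(sum);   // 5 (1 + 4)
    writeln(calls); // 2: the elements 3, 4 and 5 were never touched
}
```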

For example, in the next line of the program .front returns the first line, through which countUntil explicitly iterates:

auto colIndex = lines.front.countUntil(columName);

So lines.front looks something like:

["name", "surname", "city", "country"]

countUntil will return the index of the first match or -1 otherwise. It’s a bit similar to the indexOf function known from e.g. JavaScript, but it accepts a template argument. So we could have supplied a custom predicate function:

lines.front.countUntil!(a => a.endsWith("ty"));
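As a quick check of the behaviour just described, this sketch runs countUntil on an in-memory header instead of a file. The header array is an assumption standing in for lines.front.

```d
import std.algorithm : countUntil, endsWith;
import std.stdio : writeln;

void main()
{
    // Stand-in for lines.front from the program above.
    auto header = ["name", "surname", "city", "country"];

    writeln(header.countUntil("city")); // 2
    writeln(header.countUntil("zip"));  // -1: no such column

    // With a predicate: "city" is the first element ending in "ty".
    writeln(header.countUntil!(a => a.endsWith("ty"))); // 2
}
```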

8) std.format and compile-time checking of parameters

The next lines are:

enforce(colIndex >= 0, "Invalid column name. Valid are: %(%s, %)".format(lines.front));
auto os = File(outFile, "w");
os.writefln("%-(%s,%)", lines.front);

I have never seen writefln("%(%s, %)") . What happens here?

writefln is just a handy wrapper around D’s format function. format itself provides a lot of options for serialization, but it’s very similar to printf , although it does provide a few goodies like the special syntax for arrays %(%s, %) .

This syntax opens an array formatting “scope” with %( and closes it with %) . Within this array “scope” the elements are formatted with %s (their string serialization) and , is used as the delimiter between the elements.

"%(%s, %)" will quote the elements by default, which is useful in most cases, but - can be used to avoid quoting. As the Expressive C++17 Coding Challenge expects output without quotes, "%-(%s,%)" is used: it avoids quoting and joins the elements with a plain comma. We can use rdmd to test this:

> head -n1 input.csv | rdmd --loop='writefln("%-(%s|%)", line.splitter(","))'
name|surname|city|country

--loop is a simple wrapper around foreach (line; stdin.byLine) { … } and makes it even easier to use D in command-line pipes.
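The quoting behaviour can also be checked with format directly. This is a small sketch; the header array is an assumption standing in for the first line of the CSV file.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    // Stand-in for the first line of the CSV file.
    auto header = ["name", "surname", "city"];

    writeln(format("%(%s, %)", header)); // "name", "surname", "city"
    writeln(format("%-(%s,%)", header)); // name,surname,city
    writeln(format("%-(%s|%)", header)); // name|surname|city
}
```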

%( … %) is a shorthand syntax that often comes in handy, but if you don’t like it there are many other ways to achieve the same result. For example, joiner :

> head -n1 input.csv | rdmd --loop='line.splitter(",").joiner("|").writeln'
name|surname|city|country

Let’s get back to enforce . What would such an error message look like?

object.Exception@./main.d(9): Invalid column name. Valid are: "name", "surname", "city", "country"
----------------
??:? pure @safe void std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, const(char[])) [0x7a34b57e]
??:? pure @safe bool std.exception.enforce!(Exception, bool).enforce(bool, lazy const(char)[], immutable(char)[], ulong) [0x7a34b4f8]
??:? _Dmain [0x7a34b17f]

Okay, but isn’t printf bad and unsafe? I heard that languages like Python are moving away from C-like formatting.

A Python library can only realize that the arguments and the format string don’t fit when it’s called. In D, the compiler knows the types of the arguments and if you pass the format string at compile-time, guess what, the format can be checked at compile-time. Try to compile a format string that tries to format strings as numbers:

writefln!"%d"("foo");

The compiler will complain:

/dlang/dmd/linux/bin64/../../src/phobos/std/stdio.d(3876): Error: static assert "Incorrect format specifier for range: %d"
onlineapp.d(4): instantiated from here: writefln!("%d", string)

Wow, that’s really cool. How does this work?

D has another unique feature: compile-time function evaluation (CTFE), which allows executing almost any function at compile-time. All that happens is that writefln is instantiated at compile-time with the string as template argument and then it calls the same format function that would normally be called at run-time with the known format string. The coolest part about this is that there’s no special casing in the compiler and everything is just a few lines of library code.
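CTFE is not limited to format strings. As a minimal sketch, the hypothetical factorial function below is an ordinary function that the compiler evaluates itself as soon as a compile-time value is required.

```d
import std.stdio : writeln;

// Ordinary function: nothing marks it as "compile-time only".
ulong factorial(uint n)
{
    ulong result = 1;
    foreach (i; 2 .. n + 1)
        result *= i;
    return result;
}

// `enum` and `static assert` need a compile-time value, so the compiler
// runs factorial during compilation.
enum atCompileTime = factorial(10);
static assert(atCompileTime == 3_628_800);

void main()
{
    writeln(atCompileTime); // 3628800
    writeln(factorial(5));  // 120, the same function called at run time
}
```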

9) Let’s parse the file

Now that we have found the index of the replacement column, have opened the output csv file and have already written the header to it, all that’s left is to go over the input CSV file line by line and replace the specific CSV column with the replacement :

foreach (line; lines.dropOne) // remove the header
{
    os.writeln(line
        .enumerate // iterate with an (index, value) tuple
        .map!(a => a.index == colIndex ? replacement : a.value)
        .joiner(","));
}

One of the cool parts of D ranges is that they are so flexible. You want to do everything in a functional way? D has you covered:

alias csvPipe = pipe!(enumerate,
                      map!(a => a.index == colIndex ? replacement : a.value),
                      partial!(reverseArgs!joiner, ","),
                );
lines.dropOne.map!csvPipe.each!(a => os.writeln(a));

There’s another cool thing about D - std.parallelism . Have you ever been annoyed that a loop takes too long, but didn’t know a quick way to parallelize your code? Again, D has you covered with .parallel :

foreach (line; lines.parallel)
    // expensive operation comes here

No way. I don’t believe this can be so simple.

Just try it yourself.
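If you want something to start from, here is a minimal sketch of a parallel foreach over an array. slowSquare is a hypothetical stand-in for an expensive computation and the array size is arbitrary.

```d
import std.parallelism : parallel;
import std.stdio : writeln;

// Hypothetical stand-in for an expensive, independent computation.
ulong slowSquare(ulong x)
{
    return x * x;
}

void main()
{
    auto results = new ulong[](1_000);

    // Each iteration writes only to its own slot, so no locking is needed.
    foreach (i, ref r; results.parallel)
        r = slowSquare(i);

    writeln(results[0 .. 5]); // [0, 1, 4, 9, 16]
}
```

Note that this only pays off when the loop body is expensive and the iterations are independent of each other.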

The Garbage Collector (GC)

On the internet and especially on reddit and HackerNews there’s a huge criticism of D’s decision to use a GC. Go , Java , Ruby , JavaScript etc. all use a GC, but I can’t phrase it better than Adam D. Ruppe:

D is a pragmatic language aimed toward writing fast code, fast. Garbage collection has proved to be a smashing success in the industry, providing productivity and memory-safety to programmers of all skill levels. D’s GC implementation follows in the footsteps of industry giants without compromising expert’s ability to tweak even further.

So, ask your question:

Okay, “ability to tweak even further” sounds a bit vague, what does this mean? I can tweak the memory usage?

Well, of course you can do that, but that’s something most languages with a GC allow you to do. D allows you to get the benefit of both worlds, profit from the convenience of the GC and use manual allocation methods for the hot paths in your program. This is great, because you can use the same language for prototyping and shipping your application.

A short and simplified summary of allocation patterns in D:

RAII is supported (e.g. the File you saw earlier is reference-counted and automatically deallocates its buffer and closes the file once all references are dead)

std.typecons provides a lot of library goodies like Unique , Scoped , RefCounted for @nogc allocation

there’s std.experimental.allocator for everyone with custom allocation needs

malloc and friends are available in D too (everything from C is) - though if you want to use the C heap allocator I recommend its high-level wrapper
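To give an impression of the manual end of this spectrum, here is a minimal sketch that uses the plain C allocator inside a @nogc function. sumOfFirst is a hypothetical function written for this illustration; it uses malloc and free directly rather than the high-level wrapper mentioned above.

```d
import core.stdc.stdlib : free, malloc;

// @nogc makes the compiler reject any GC allocation inside this function.
int sumOfFirst(size_t n) @nogc nothrow
{
    auto ptr = cast(int*) malloc(n * int.sizeof);
    if (ptr is null)
        return 0;
    scope (exit) free(ptr); // released on every path out of the function

    auto buffer = ptr[0 .. n]; // a slice over manually managed memory
    foreach (i, ref x; buffer)
        x = cast(int) i;

    int total;
    foreach (x; buffer)
        total += x;
    return total;
}

void main()
{
    import std.stdio : writeln;

    writeln(sumOfFirst(5)); // 10 (0 + 1 + 2 + 3 + 4)
}
```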

Mike Parker has recently started an extensive GC Series on the DBlog which I recommend to everyone who prefers performance over convenience.

Other goodies

std.csv

Hey, I saw that there’s std.csv in D, why didn’t you use it?

Apart from the motivation to be comparable to C++ and Rust which don’t have a built-in CSV library, it felt like cheating:

import std.algorithm, std.csv, std.functional, std.file, std.range;

void main(string[] args)
{
    auto inFile = args[1], columnName = args[2], replacement = args[3], outFile = args[4];
    auto records = inFile.readText.csvReader!(string[string])(null);
    outFile.write(records.map!((r) {
        r[columnName] = replacement;
        return r;
    }).pipe!(rows => records.header.join(",") ~ "\n" ~ rows.map!(r =>
        records.header.map!(h => r[h]).join(",")).join("\n")
    ));
}

std.getopt

One of the reasons why this challenge used positional arguments and no flags is that argument parsing is pretty hard in C++. It’s not in D. std.getopt provides convenience for everything out of the box:

import std.getopt;

int main(string[] args)
{
    string input, output, selectedColumn, fill = "FOO";
    auto opts = getopt(args,
        "i|input", &input,
        "o|output", &output,
        "s|select", "Select a column to overwrite", &selectedColumn,
        "f|fill", "Overwrite (default: FOO)", &fill,
    );
    if (opts.helpWanted || input.length == 0) {
        defaultGetoptPrinter("./program", opts.options);
        return 1;
    }
    return 0;
}

DMD, LDC and GDC

One of the things that often confuses newcomers is that D has three compilers. The short summary is:

DMD (DigitalMars D compiler) - latest greatest features + fast compilation (= ideal for development)

LDC (uses the LLVM backend) - battle-tested LLVM backend + sophisticated optimizers + cross-compilation (=ideal for production)

GDC (uses the GCC backend) - similar points as LDC

Benchmark and performance

Benchmarking a language compiler is a bit tricky as very often you end up benchmarking library functions. In general, D code can be as fast as C++ and often is even faster - after all the LDC and GDC compilers have the same backend as clang++ or g++ with all its optimization logic. If you are interested to see how D programs perform against similar programs written in other languages, check out Kostya’s benchmarks.

There’s also an excellent performance benchmark from Jon Degenhardt (member of eBay’s data science team) on how eBay’s tsv-utils compare against existing CSV/TSV processing tools written in C, Go, and Rust.

Apart from the typical -O3 and -release flags, the performance-savvy can use -boundscheck=off . Additionally, LDC also makes it easy to do link-time optimization (LTO) and profile-guided optimization (PGO). According to Jon’s benchmarks LTO brings on average an additional performance gain of 10 %, just by adding the -flto=full flag. If you want to learn more about LTO and PGO in D, check out his superb tutorial or the in-depth technical article about LTO by Johan Engelen (one of the LDC developers).

@safe

Even though D is a systems programming language that allows you to mess with pointers, raw memory and even inline assembly, it provides a sane way to deal with the dirty details. D has a @safe subset of the language in which the compiler will enforce that you don’t do anything stupid and shoot yourself in the foot with e.g. accessing undefined memory.
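As a rough sketch of what this looks like in practice, the hypothetical sum helper below only uses bounds-checked slice access and can therefore be marked @safe. The rejected operation is left as a comment so that the example still compiles.

```d
import std.stdio : writeln;

// Hypothetical helper: only bounds-checked slice access, so it can be @safe.
int sum(const int[] values) @safe
{
    int total;
    foreach (v; values)
        total += v;
    return total;
}

void main() @safe
{
    int[] values = [1, 2, 3];
    writeln(sum(values)); // 6

    // Uncommenting the next line makes compilation fail, because pointer
    // arithmetic is not allowed in @safe code:
    // int* p = &values[0] + 1;
}
```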

Unittest

One strategic advantage of D is that unit-testing is so easy, as it’s built into the language and compiler. This is a valid D program:

unittest { assert(1 == 2); }

With -unittest the compiler can be instructed to emit the unittest blocks into the object files or binary. Here, rdmd is again a friendly tool and you can directly go ahead and test your file with this:

rdmd -main -unittest test.d

No advanced tooling setup required. Of course, this also means that it’s particularly easy to automatically verify all examples that are listed in the documentation, because they’re part of the testsuite. I even went one step further and made it possible to directly edit and run the examples on dlang.org.
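For a slightly more realistic test, here is a sketch that checks the column replacement from this article on a single line. replaceField is a hypothetical helper written for this illustration; it is not part of the program above.

```d
// Hypothetical function under test: replaces the field at `index` in one CSV line.
string replaceField(string line, size_t index, string replacement)
{
    import std.algorithm : map, splitter;
    import std.array : join;
    import std.range : enumerate;

    return line.splitter(",")
        .enumerate // iterate with an (index, value) tuple
        .map!(a => a.index == index ? replacement : a.value)
        .join(",");
}

unittest
{
    assert(replaceField("Adam,Jones,Manchester,UK", 2, "London") == "Adam,Jones,London,UK");
    assert(replaceField("a,b", 5, "x") == "a,b"); // index out of range: unchanged
}
```

Saved as test.d, this can be run with the rdmd -main -unittest command shown above.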

Other cool D features

There are many other cool features that D offers that didn’t make it in this article, but as a teaser for future articles:

Code generation within the language (cut down your boilerplate)

Strong and easy Compile-Time introspection (Meta-programming)

alias this for subtyping

-betterC (using D without a runtime)

mixin for easily generating code

A module system that doesn’t suck

debug attribute to break out of pure code

Built-in documentation

Contracts and invariants

scope(exit) and scope(failure) for structuring creation with its destruction

Native interfacing with C (and most of C++)

with for loading symbols into the current namespace

For a full list, see the Overview of D and don’t forget that the full language specification is readable in one evening.

Downsides

Okay, so you say D is so great, but why hasn’t it taken off?

There’s a lot more to a programming language than just the language and compiler. D has to fight with the problems all young languages have to deal with, e.g. a small ecosystem, few tutorials / sparse documentation and occasional rough edges. Languages like Kotlin, Rust or Go have it a lot easier, because they have a big corporate sponsor which gives these languages a big boost.

Without such a boost, it’s a chicken/egg problem: if nobody is learning D, it also means that no one can write tutorials or better documentation. Also many people have learnt a few languages and use them in production. There’s little incentive for them to redesign their entire stack.

However, things improved greatly over the last years and nowadays even companies like Netflix, eBay, or Remedy Games use D. A few examples:

the fastest parallel file system for High Performance Computing is written in D

if you drive by train in Europe, chances are good that you were guided by D (Funkwerk - the company that manages the transport passenger information system - develops their software in D)

if you don’t use an Adblocker, chances are good that algorithms written in D bid in real-time for showing you advertisement (two of the leading companies in digital advertising (Sociomantic and Adroll) use D)

The organizations using D page lists more of these success stories.

Of course, D - like every other language - has its “ugly” parts, but there’s always work in progress to fix these and compared to all other languages I have worked with, the ugly parts are relatively tiny.

Where to go from here?

Okay that sounds great, but how do I install D on my system?

Use the install script:

curl https://dlang.org/install.sh | bash -s

or use your package manager.

And start hacking!

Acknowledgements

Thanks a lot to Timothee Cour, Juan Miguel Cejuela, Jon Degenhardt, Lio Lunesu, Mike Franklin, Steven Schveighoffer, Simen Kjærås, Walter Bright, Arredondo, Martin Tschierschke, Nicholas Wilson, Arun Chandrasekaran, Per Nordlöw, John Gabriele, jmh530, Dukc, tornchi, and ketmar for their helpful feedback.

A huge thanks also goes to Jonathan Boccara and Bartłomiej Filipek for organizing the Expressive C++17 Coding Challenge and opening the discussion about expressiveness of modern systems programming languages.

Attachments

It’s possible to do three easy tweaks to make I/O faster in D:

disabling auto-decoding with byCodeUnit

non-thread-safe I/O with lockingTextWriter

use of std.mmfile