Programming in D for C Programmers

Et tu, D? Then fall, C! -- William Nerdspeare

Every experienced C programmer accumulates a series of idioms and techniques which become second nature. Sometimes, when learning a new language, those idioms can be so comfortable it's hard to see how to do the equivalent in the new language. So here's a collection of common C techniques, and how to do the corresponding task in D.

Since C does not have object-oriented features, object-oriented issues are covered in a separate section, Programming in D for C++ Programmers.

The C preprocessor is covered in The C Preprocessor vs D.

The C Way

sizeof(int)
sizeof(char *)
sizeof(double)
sizeof(struct Foo)

The D Way

Use the sizeof property:

int.sizeof
(char *).sizeof
double.sizeof
Foo.sizeof

The C Way

#include <limits.h>
#include <float.h>

CHAR_MAX
CHAR_MIN
ULONG_MAX
DBL_MIN

The D Way

char.max
char.min
ulong.max
double.min

C to D types

bool                    =>  bit
char                    =>  char
signed char             =>  byte
unsigned char           =>  ubyte
short                   =>  short
unsigned short          =>  ushort
wchar_t                 =>  wchar
int                     =>  int
unsigned                =>  uint
long                    =>  int
unsigned long           =>  uint
long long               =>  long
unsigned long long      =>  ulong
float                   =>  float
double                  =>  double
long double             =>  real
_Imaginary long double  =>  ireal
_Complex long double    =>  creal

Although char is an unsigned 8 bit type, and wchar is an unsigned 16 bit type, they have their own separate types in order to aid overloading and type safety.

Ints and unsigneds in C are of varying size; not so in D.
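On the C side, the closest thing to D's fixed sizes is the set of exact-width types in <stdint.h>. The sketch below is only an illustration: the d_* alias names and the BITS_OF macro are made up for this example, and the exact-width types are optional in the C standard, although mainstream platforms provide them.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical aliases (not part of C or D): C exact-width types
   named after the D types that have the same fixed size. */
typedef int8_t   d_byte;
typedef uint8_t  d_ubyte;
typedef int16_t  d_short;
typedef uint16_t d_ushort;
typedef int32_t  d_int;
typedef uint32_t d_uint;
typedef int64_t  d_long;
typedef uint64_t d_ulong;

/* Width of a type in bits on the current platform. */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)
```

With these aliases, BITS_OF(d_int) is 32 and BITS_OF(d_long) is 64 wherever the exact-width types exist, whereas BITS_OF(long) differs between platforms.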

The C Way

#include <fp.h>

NAN
INFINITY

#include <float.h>

DBL_DIG
DBL_EPSILON
DBL_MANT_DIG
DBL_MAX_10_EXP
DBL_MAX_EXP
DBL_MIN_10_EXP
DBL_MIN_EXP

The D Way

double.nan
double.infinity
double.dig
double.epsilon
double.mant_dig
double.max_10_exp
double.max_exp
double.min_10_exp
double.min_exp

The C Way

#include <math.h>

float f = fmodf(x,y);
double d = fmod(x,y);
long double r = fmodl(x,y);

The D Way

D supports the remainder ('%') operator on floating point operands:

float f = x % y;
double d = x % y;
real r = x % y;

The C Way

C doesn't define what happens if an operand to a compare is NAN, and few C compilers check for it (the Digital Mars C compiler is an exception; DM's compilers do check for NAN operands).

#include <math.h>

if (isnan(x) || isnan(y))
    result = FALSE;
else
    result = (x < y);

The D Way

D offers a full complement of comparisons and operators that work with NAN arguments.

result = (x < y);

The C Way

C doesn't directly support assert, but does support __FILE__ and __LINE__ from which an assert macro can be built. In fact, there appears to be practically no other use for __FILE__ and __LINE__.

#include <assert.h>

assert(e == 0);
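To make the remark about __FILE__ and __LINE__ concrete, here is a minimal sketch of how an assert-like macro can be built from them. The names MY_ASSERT, my_assert_fail and my_assert_message are invented for this illustration; the real assert in <assert.h> is implementation-specific and may differ in message format and behavior.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "file:line: assertion failed: expr" into buf. */
static int my_assert_message(char *buf, size_t size,
                             const char *expr, const char *file, int line)
{
    return snprintf(buf, size, "%s:%d: assertion failed: %s", file, line, expr);
}

/* Hypothetical failure handler: reports the failed expression and aborts. */
static void my_assert_fail(const char *expr, const char *file, int line)
{
    char buf[256];

    my_assert_message(buf, sizeof buf, expr, file, line);
    fprintf(stderr, "%s\n", buf);
    abort();
}

/* #e turns the tested expression into a string; __FILE__ and __LINE__
   expand at the point where the macro is used. */
#define MY_ASSERT(e) \
    ((e) ? (void)0 : my_assert_fail(#e, __FILE__, __LINE__))
```

Writing MY_ASSERT(e == 0); then behaves much like the library assert: nothing happens when the condition holds, and the source location is reported before aborting when it does not.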

The D Way

D simply builds assert into the language:

assert(e == 0);

The C Way

#define ARRAY_LENGTH 17
int array[ARRAY_LENGTH];
for (i = 0; i < ARRAY_LENGTH; i++)
    array[i] = value;

The D Way

int array[17];
array[] = value;

The C Way

The array length is defined separately, or a clumsy sizeof() expression is used to get the length.

#define ARRAY_LENGTH 17
int array[ARRAY_LENGTH];
for (i = 0; i < ARRAY_LENGTH; i++)
    func(array[i]);

or:

int array[17];
for (i = 0; i < sizeof(array) / sizeof(array[0]); i++)
    func(array[i]);

The D Way

The length of an array is accessible through the property "length".

int array[17];
for (i = 0; i < array.length; i++)
    func(array[i]);

or even better:

int array[17];
foreach (int value; array)
    func(value);

The C Way

C cannot do this with arrays. It is necessary to create a separate variable for the length, and then explicitly manage the size of the array:

#include <stdlib.h>

int array_length;
int *array;
int *newarray;

newarray = (int *) realloc(array, (array_length + 1) * sizeof(int));
if (!newarray)
    error("out of memory");
array = newarray;
array[array_length++] = x;

The D Way

D supports dynamic arrays, which can be easily resized. D supports all the requisite memory management.

int[] array;

array.length = array.length + 1;
array[array.length - 1] = x;

The C Way

There are several difficulties to be resolved, like when can storage be freed, dealing with null pointers, finding the length of the strings, and memory allocation:

#include <stdlib.h>
#include <string.h>

char *s1;
char *s2;
char *s;

// Concatenate s1 and s2, and put result in s
free(s);
s = (char *)malloc((s1 ? strlen(s1) : 0) +
                   (s2 ? strlen(s2) : 0) + 1);
if (!s)
    error("out of memory");
if (s1)
    strcpy(s, s1);
else
    *s = 0;
if (s2)
    strcpy(s + strlen(s), s2);

// Append "hello" to s
char hello[] = "hello";
char *news;
size_t lens = s ? strlen(s) : 0;
news = (char *) realloc(s, (lens + sizeof(hello) + 1) * sizeof(char));
if (!news)
    error("out of memory");
s = news;
memcpy(s + lens, hello, sizeof(hello));

The D Way

D overloads the operators ~ and ~= for char and wchar arrays to mean concatenate and append, respectively:

char[] s1;
char[] s2;
char[] s;

s = s1 ~ s2;
s ~= "hello";

The C Way

printf() is the general purpose formatted print routine:

#include <stdio.h>

printf("Calling all cars %d times!\n", ntimes);

The D Way

What can we say? printf() rules:

printf("Calling all cars %d times!\n", ntimes);

writefln() improves on printf() by being type-aware and type-safe:

import std.stdio;

writefln("Calling all cars %s times!", ntimes);

The C Way

Functions cannot be forward referenced. Hence, to call a function not yet encountered in the source file, it is necessary to insert a function declaration lexically preceding the call.

void forwardfunc();

void myfunc()
{
    forwardfunc();
}

void forwardfunc()
{
    ...
}

The D Way

The program is looked at as a whole, and so not only is it not necessary to code forward declarations, it is not even allowed! D avoids the tedium and errors associated with writing forward referenced function declarations twice. Functions can be defined in any order.

void myfunc()
{
    forwardfunc();
}

void forwardfunc()
{
    ...
}

The C Way

void function(void);

The D Way

D is a strongly typed language, so there is no need to explicitly say a function takes no arguments; just don't declare it as having arguments.

void function()
{
    ...
}

The C Way

Break and continue statements only apply to the innermost nested loop or switch, so a multilevel break must use a goto:

for (i = 0; i < 10; i++)
{
    for (j = 0; j < 10; j++)
    {
        if (j == 3)
            goto Louter;
        if (j == 4)
            goto L2;
    }
  L2:
    ;
}
Louter:
    ;

The D Way

Break and continue statements can be followed by a label. The label is the label for an enclosing loop or switch, and the break applies to that loop.

Louter:
for (i = 0; i < 10; i++)
{
    for (j = 0; j < 10; j++)
    {
        if (j == 3)
            break Louter;
        if (j == 4)
            continue Louter;
    }
}

The C Way

The much maligned goto statement is a staple for professional C coders. It's necessary to make up for sometimes inadequate control flow statements.

The D Way

Many C-way goto statements can be eliminated with the D feature of labeled break and continue statements. But D is a practical language for practical programmers who know when the rules need to be broken. So of course D supports goto statements.

The C Way

It's annoying to have to put the struct keyword every time a type is specified, so a common idiom is to use:

typedef struct ABC { ... } ABC;

The D Way

Struct tag names are not in a separate name space, they are in the same name space as ordinary names. Hence:

struct ABC { ... }

The C Way

Given a string, compare the string against a list of possible values and take action based on which one it is. A typical use for this might be command line argument processing.

#include <string.h>

void dostring(char *s)
{
    enum Strings { Hello, Goodbye, Maybe, Max };
    static char *table[] = { "hello", "goodbye", "maybe" };
    int i;

    for (i = 0; i < Max; i++)
    {
        if (strcmp(s, table[i]) == 0)
            break;
    }
    switch (i)
    {
        case Hello:   ...
        case Goodbye: ...
        case Maybe:   ...
        default:      ...
    }
}

The problem with this is trying to maintain 3 parallel data structures: the enum, the table, and the switch cases. If there are a lot of values, the connection between the 3 may not be so obvious when doing maintenance, and so the situation is ripe for bugs. Additionally, if the number of values becomes large, a binary or hash lookup will yield a considerable performance increase over a simple linear search. But coding these can be time consuming, and they need to be debugged. It's typical that such just never gets done.

The D Way

D extends the concept of switch statements to be able to handle strings as well as numbers. Then, the way to code the string lookup becomes straightforward:

void dostring(char[] s)
{
    switch (s)
    {
        case "hello":   ...
        case "goodbye": ...
        case "maybe":   ...
        default:        ...
    }
}

Adding new cases becomes easy. The compiler can be relied on to generate a fast lookup scheme for it, eliminating the bugs and time required in hand-coding one.

The C Way

It's done through a command line switch which affects the entire program, and woe results if any modules or libraries didn't get recompiled. To address this, #pragmas are used:

#pragma pack(1)
struct ABC
{
    ...
};
#pragma pack()

But #pragmas are nonportable both in theory and in practice from compiler to compiler.

The D Way

D has a syntax for setting the alignment that is common to all D compilers. The actual alignment done is compatible with the companion C compiler's alignment, for ABI compatibility. To match a particular layout across architectures, use align(1) and manually specify it.

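struct ABC
{
    int z;              // z is aligned to the default

    align (1) int x;    // x is byte aligned
    align (4)
    {
        ...             // declarations in {} are dword aligned
    }
    align (2):          // switch to word alignment from here on

    int y;              // y is word aligned
}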

The C Way

Sometimes, it's nice to control the layout of a struct with nested structs and unions. C prior to C11 doesn't allow anonymous structs or unions, which means that dummy tag names and dummy members are necessary:

struct Foo
{
    int i;
    union Bar
    {
        struct Abc { int x; long y; } _abc;
        char *p;
    } _bar;
};

#define x _bar._abc.x
#define y _bar._abc.y
#define p _bar.p

struct Foo f;

f.i;
f.x;
f.y;
f.p;

Not only is it clumsy, but using macros means a symbolic debugger won't understand what is being done, and the macros have global scope instead of struct scope.

The D Way

Anonymous structs and unions are used to control the layout in a more natural manner:

struct Foo
{
    int i;
    union
    {
        struct { int x; long y; }
        char* p;
    }
}

Foo f;

f.i;
f.x;
f.y;
f.p;

The C Way

The C way is to define a struct type and declare a variable of it in one statement ending with a semicolon:

struct Foo { int x; int y; } foo;

Or to separate the two:

struct Foo { int x; int y; };   // note terminating ;
struct Foo foo;

The D Way

Struct definitions and declarations can't be done in the same statement:

struct Foo { int x; int y; }
Foo foo;

which means that the terminating ; can be dispensed with, eliminating the confusing difference between struct {} and function block {} in how semicolons are used.

The C Way

Naturally, another macro is used:

#include <stddef.h>
struct Foo { int x; int y; };

off = offsetof(struct Foo, y);

The D Way

An offset is just another property:

struct Foo { int x; int y; }

off = Foo.y.offsetof;

The C Way

Unions are initialized using the "first member" rule:

union U { int a; long b; };
union U x = { 5 };      // initialize member 'a' to 5

Adding union members or rearranging them can have disastrous consequences for any initializers.

The D Way

In D, which member is being initialized is mentioned explicitly:

union U { int a; long b; }
U x = { a:5 };

avoiding the confusion and maintenance problems.

The C Way

Members are initialized by their position within the { }s:

struct S { int a; int b; };
struct S x = { 5, 3 };

This isn't much of a problem with small structs, but when there are numerous members, it becomes tedious to get the initializers carefully lined up with the field declarations. Then, if members are added or rearranged, all the initializations have to be found and modified appropriately. This is a minefield for bugs.

The D Way

Member initialization can be done explicitly:

struct S { int a; int b; }
S x = { b:3, a:5 };

The meaning is clear, and there no longer is a positional dependence.

The C Way

C initializes arrays by positional dependence:

int a[3] = { 3,2,2 };

Nested arrays may or may not have the { }:

int b[3][2] = { 2,3, {6,5}, 3,4 };

The D Way

D does it by positional dependence too, but an index can be used as well. The following all produce the same result:

int[3] a = [ 3, 2, 0 ];
int[3] a = [ 3, 2 ];            // unsupplied initializers are 0, just like in C
int[3] a = [ 2:0, 0:3, 1:2 ];
int[3] a = [ 2:0, 0:3, 2 ];     // if not supplied, the index is the previous one plus one

This can be handy if the array will be indexed by an enum, and the order of enums may be changed or added to:

enum color { black, red, green }

int[3] c = [ black:3, green:2, red:5 ];

Nested array initializations must be explicit:

int[2][3] b = [ [2,3], [6,5], [3,4] ];

int[2][3] b = [[2,6,3],[3,5,4]];    // error

The C Way

C has problems with the DOS file system because a \ is an escape in a string. To specify file c:\root\file.c:

char file[] = "c:\\root\\file.c";

This gets even more unpleasant with regular expressions. Consider the escape sequence to match a quoted string:

/"[^\\]*(\\.[^\\]*)*"/

In C, this horror is expressed as:

char quoteString[] = "\"[^\\\\]*(\\\\.[^\\\\]*)*\"";
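One way to convince yourself that the doubled escaping is right is to inspect what the compiler actually stores. The sketch below repeats the literal from above; count_backslashes is a helper invented for this illustration.

```c
#include <string.h>

/* The regular expression from the text, escaped for a C string literal. */
static const char quoteString[] = "\"[^\\\\]*(\\\\.[^\\\\]*)*\"";

/* Hypothetical helper: counts the backslashes present in the stored
   string, i.e. the ones a regular expression engine would receive. */
static size_t count_backslashes(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
    {
        if (*s == '\\')
            n++;
    }
    return n;
}
```

The stored string is 20 characters long and contains 6 backslashes, matching the regular expression as written between the slashes; every one of those backslashes had to be typed twice in the C source.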

The D Way

Within strings, it is WYSIWYG (what you see is what you get). Escapes are in separate strings. So:

char[] file = `c:\root\file.c`;
char[] quoteString = \" r"[^\\]*(\\.[^\\]*)*" \";

The famous hello world string becomes:

char[] hello = "hello world" \n;

Modern programming requires that wchar strings be supported in an easy way, for internationalization of the programs.

The C Way

C uses the wchar_t and the L prefix on strings:

#include <wchar.h>
char foo_ascii[] = "hello";
wchar_t foo_wchar[] = L"hello";

Things get worse if code is written to be both ascii and wchar compatible. A macro is used to switch strings from ascii to wchar:

#include <tchar.h>
tchar string[] = TEXT("hello");

The D Way

The type of a string is determined by semantic analysis, so there is no need to wrap strings in a macro call:

char[] foo_ascii = "hello";
wchar[] foo_wchar = "hello";

The C Way

Consider:

enum COLORS { red, blue, green, max };
char *cstring[max] = { "red", "blue", "green" };

This is fairly easy to get right because the number of entries is small. But suppose it gets to be fairly large. Then it can get difficult to maintain correctly when new entries are added.

The D Way

enum COLORS { red, blue, green }

char[][COLORS.max + 1] cstring =
[
    COLORS.red   : "red",
    COLORS.blue  : "blue",
    COLORS.green : "green",
];

Not perfect, but better.

The C Way

Typedefs in C are weak, that is, they really do not introduce a new type. The compiler doesn't distinguish between a typedef and its underlying type.

typedef void *Handle;
void foo(void *);
void bar(Handle);

Handle h;
foo(h);         // coding bug not caught
bar(h);         // ok

The C solution is to create a dummy struct whose sole purpose is to get type checking and overloading on the new type.

struct Handle__ { void *value; };
typedef struct Handle__ Handle;
void foo(void *);
void bar(Handle);

Handle h;
foo(h);         // compile error: a struct is not a void *
bar(h);         // ok

Having a default value for the type involves defining a macro, a naming convention, and then pedantically following that convention:

#define HANDLE_INIT ((Handle)-1)

Handle h = HANDLE_INIT;
h = func();
if (h != HANDLE_INIT)
    ...

For the struct solution, things get even more complex:

struct Handle__ HANDLE_INIT;

void init_handle()      // call this function upon startup
{
    HANDLE_INIT.value = (void *)-1;
}

Handle h = HANDLE_INIT;
h = func();
if (memcmp(&h, &HANDLE_INIT, sizeof(Handle)) != 0)
    ...

There are 4 names to remember: Handle, HANDLE_INIT, struct Handle__, value.

The D Way

No need for idiomatic constructions like the above. Just write:

typedef void* Handle;
void foo(void*);
void bar(Handle);

Handle h;
foo(h);
bar(h);

To handle a default value, add an initializer to the typedef, and refer to it with the .init property:

typedef void* Handle = cast(void*)(-1);
Handle h;
h = func();
if (h != Handle.init)
    ...

There's only one name to remember: Handle.

The C Way

While C defines struct assignment in a simple, convenient manner:

struct A x, y;
...
x = y;

it does not for struct comparisons. Hence, to compare two struct instances for equality:

#include <string.h>

struct A x, y;
...
if (memcmp(&x, &y, sizeof(struct A)) == 0)
    ...

Note the obtuseness of this, coupled with the lack of any kind of help from the language with type checking.

There's a nasty bug lurking in the memcmp(). The layout of a struct, due to alignment, can have 'holes' in it. C does not guarantee those holes are assigned any values, and so two different struct instances can have the same value for each member, but compare different because the holes contain different garbage.
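The hole problem can be sketched as follows. The struct layout, the fill bytes and the helper names a_equal and a_memcmp_after_fill are assumptions made for this illustration; whether padding exists, and what it contains after the members are assigned, depends on the compiler and ABI.

```c
#include <string.h>

/* On most ABIs there are padding bytes (a "hole") between c and i. */
struct A
{
    char c;
    int i;
};

/* Member-by-member comparison: looks only at the declared fields. */
static int a_equal(const struct A *x, const struct A *y)
{
    return x->c == y->c && x->i == y->i;
}

/* Fills two instances with different garbage, then gives them identical
   member values and returns what memcmp() reports. Where the padding
   bytes keep the garbage, the result is nonzero although every member
   compares equal. */
static int a_memcmp_after_fill(void)
{
    struct A x, y;

    memset(&x, 0x00, sizeof x);
    memset(&y, 0xFF, sizeof y);
    x.c = y.c = 'a';
    x.i = y.i = 42;
    return memcmp(&x, &y, sizeof(struct A));
}
```

A member-by-member function such as a_equal avoids the problem, at the price of having to be written and kept up to date for every struct type.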

The D Way

D does it the obvious, straightforward way:

A x, y;
...
if (x == y)
    ...

The C Way

The library function strcmp() is used:

char string[] = "hello";

if (strcmp(string, "betty") == 0)   // do strings match?
    ...

C uses 0 terminated strings, so the C way has an inherent inefficiency in constantly scanning for the terminating 0.

The D Way

Why not use the == operator?

char[] string = "hello";

if (string == "betty")
    ...

D strings have the length stored separately from the string. Thus, the implementation of string compares can be much faster than in C (the difference being equivalent to the difference in speed between the C memcmp() and strcmp()).

D supports comparison operators on strings, too:

char[] string = "hello";

if (string < "betty")
    ...

which is useful for sorting/searching.

The C Way

Although many C programmers tend to reimplement bubble sorts over and over, the right way to sort in C is to use qsort():

#include <stdlib.h>

int compare(const void *p1, const void *p2)
{
    type *t1 = (type *)p1;
    type *t2 = (type *)p2;

    return *t1 - *t2;   // assumes the difference cannot overflow
}

type array[10];
...
qsort(array, sizeof(array)/sizeof(array[0]), sizeof(array[0]), compare);

A compare() must be written for each type, and much careful typo-prone code needs to be written to make it work.

The D Way

Sorting couldn't be easier:

type[] array;
...
array.sort;

The C Way

To access volatile memory, such as shared memory or memory mapped I/O, a pointer to volatile is created:

volatile int *p = address;

i = *p;

The D Way

D has volatile as a statement type, not as a type modifier:

int* p = address;

volatile { i = *p; }

The C Way

String literals in C cannot span multiple lines, so to have a block of text it is necessary to use \ line splicing:

"This text spans\n\
multiple\n\
lines\n"

If there is a lot of text, this can wind up being tedious.

The D Way

String literals can span multiple lines, as in:

"This text spans
multiple
lines
"

So blocks of text can just be cut and pasted into the D source.

The C Way

Consider a function to traverse a recursive data structure. In this example, there's a simple symbol table of strings. The data structure is an array of binary trees. The code needs to do an exhaustive search of it to find a particular string in it, and determine if it is a unique instance.

To make this work, a helper function membersearchx is needed to recursively walk the trees. The helper function needs to read and write some context outside of the trees, so a custom struct Paramblock is created and a pointer to it is used to maximize efficiency.

#include <stddef.h>
#include <string.h>

struct Symbol
{
    char *id;
    struct Symbol *left;
    struct Symbol *right;
};

struct Paramblock
{
    char *id;
    struct Symbol *sm;
};

static void membersearchx(struct Paramblock *p, struct Symbol *s)
{
    while (s)
    {
        if (strcmp(p->id, s->id) == 0)
        {
            if (p->sm)
                error("ambiguous member %s\n", p->id);
            p->sm = s;
        }

        if (s->left)
            membersearchx(p, s->left);
        s = s->right;
    }
}

struct Symbol *symbol_membersearch(struct Symbol *table[], int tablemax, char *id)
{
    struct Paramblock pb;
    int i;

    pb.id = id;
    pb.sm = NULL;
    for (i = 0; i < tablemax; i++)
    {
        membersearchx(&pb, table[i]);   // pass the address of the context block
    }
    return pb.sm;
}

The D Way

This is the same algorithm in D, and it shrinks dramatically. Since nested functions have access to the lexically enclosing function's variables, there's no need for a Paramblock or to deal with its bookkeeping details. The nested helper function is contained wholly within the function that needs it, improving locality and maintainability.

The performance of the two versions is indistinguishable.

class Symbol
{
    char[] id;
    Symbol left;
    Symbol right;
}

Symbol symbol_membersearch(Symbol[] table, char[] id)
{
    Symbol sm;

    void membersearchx(Symbol s)
    {
        while (s)
        {
            if (id == s.id)
            {
                if (sm)
                    error("ambiguous member %s\n", id);
                sm = s;
            }

            if (s.left)
                membersearchx(s.left);
            s = s.right;
        }
    }

    for (int i = 0; i < table.length; i++)
    {
        membersearchx(table[i]);
    }
    return sm;
}

The C Way

The right shift operators >> and >>= are signed shifts if the left operand is a signed integral type, and are unsigned right shifts if the left operand is an unsigned integral type. To produce an unsigned right shift on an int, a cast is necessary:

int i, j;
...
j = (unsigned)i >> 3;

If i is an int, this works fine. But if i is of a type created with typedef,

myint i, j;
...
j = (unsigned)i >> 3;

and myint happens to be a long int, then the cast to unsigned will silently throw away the most significant bits, corrupting the answer.

The D Way

D has the right shift operators >> and >>= which behave as they do in C. But D also has explicitly unsigned right shift operators >>> and >>>= which will do an unsigned right shift regardless of the sign of the left operand. Hence,

myint i, j;
...
j = i >>> 3;

avoids the unsafe cast and will work as expected with any integral type.

The C Way

Consider a reusable container type. In order to be reusable, it must support a way to apply arbitrary code to each element of the container. This is done by creating an apply function that accepts a function pointer to which is passed each element of the container contents.

A generic context pointer is also needed, represented here by void *p . The example here is of a trivial container class that holds an array of ints, and a user of that container that computes the maximum of those ints.

#include <limits.h>

void apply(void *p, int *array, int dim, void (*fp)(void *, int))
{
    for (int i = 0; i < dim; i++)
        fp(p, array[i]);
}

struct Collection
{
    int array[10];
};

void comp_max(void *p, int i)
{
    int *pmax = (int *)p;

    if (i > *pmax)
        *pmax = i;
}

void func(struct Collection *c)
{
    int max = INT_MIN;

    apply(&max, c->array, sizeof(c->array)/sizeof(c->array[0]), comp_max);
}

While this works, it isn't very flexible.

The D Way

The D version makes use of delegates to transmit context information for the apply function, and nested functions both to capture context information and to improve locality.

class Collection
{
    int[10] array;

    void apply(void delegate(int) fp)
    {
        for (int i = 0; i < array.length; i++)
            fp(array[i]);
    }
}

void func(Collection c)
{
    int max = int.min;

    void comp_max(int i)
    {
        if (i > max)
            max = i;
    }

    c.apply(comp_max);
}

Pointers are eliminated, as well as casting and generic pointers. The D version is fully type safe. An alternate method in D makes use of function literals, eliminating the need to create irrelevant function names:

void func(Collection c)
{
    int max = int.min;

    c.apply(delegate(int i) { if (i > max) max = i; } );
}

The C Way

The task is to write a function that takes a varying number of arguments, such as a function that sums its arguments.

#include <stdio.h>
#include <stdarg.h>

int sum(int dim, ...)
{
    int i;
    int s = 0;
    va_list ap;

    va_start(ap, dim);
    for (i = 0; i < dim; i++)
        s += va_arg(ap, int);
    va_end(ap);
    return s;
}

int main()
{
    int i;

    i = sum(3, 8,7,6);
    printf("sum = %d\n", i);

    return 0;
}

There are two problems with this. The first is that the sum function needs to know how many arguments were supplied. It has to be explicitly written, and it can get out of sync with respect to the actual number of arguments written. The second is that there's no way to check that the types of the arguments provided really were ints, and not doubles, strings, structs, etc.

The D Way

The ... following an array parameter declaration means that the trailing arguments are collected together to form an array. The arguments are type checked against the array type, and the number of arguments becomes a property of the array:

import std.stdio;

int sum(int[] values ...)
{
    int s = 0;

    foreach (int x; values)
        s += x;
    return s;
}

int main()
{
    int i;

    i = sum(8,7,6);
    writefln("sum = %d", i);

    return 0;
}
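For code that has to stay in C, the first problem (the hand-written count) can be narrowed with a C99 compound literal, so that the compiler derives the count from the argument list. This is only a sketch of one possible workaround, not something from the original comparison; sum_array and SUM are names invented here. The arguments are converted to int as array initializers, so type checking remains weaker than in the D version.

```c
#include <stddef.h>

/* Sums count ints starting at values. */
static int sum_array(const int *values, size_t count)
{
    int s = 0;

    for (size_t i = 0; i < count; i++)
        s += values[i];
    return s;
}

/* Hypothetical convenience macro: collects the arguments into a temporary
   int array and lets sizeof compute how many there are. The arguments
   inside sizeof are not evaluated. */
#define SUM(...) \
    sum_array((const int[]){ __VA_ARGS__ }, \
              sizeof((const int[]){ __VA_ARGS__ }) / sizeof(int))
```

A call such as SUM(8, 7, 6) then needs no separate count argument that could drift out of sync with the values.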