Today, we will continue with the implementation of a type capable of storing short codes. For the previous post on the subject see here. This time, we will focus on type safety.

Literals

Often, in small toy examples as well as in unit tests and sometimes on other occasions, we need an ability to specify the value of the object at compile time. Typically we do it with literals:

std::string s ("ABCD"); std::string s2 {"ABC"}; // C++11 initialization syntax

We would like to have a similar capability for our type code<N> , but there is one difficulty here. Unlike for string s, not every string literal in double quotes is acceptable, only literals of size N are correct.

One thing we could do is to add a constructor that takes const char * , measures its length, and throws upon an undesired value. But throwing exceptions upon detecting incorrect programs (bad literal value is a bug in the program), is not something we should be proud of doing. While sometimes there may no better way, this is not what exceptions are for. Luckily, in our use case, there exists a way that I consider better.

We start with an observation that the type of literal "ABC" is const char[4] , while the type of literal "ABCD" is const char[5] .

So, the information we need is already encoded in the type, and we should be able to extract it from the type at compile-time. Indeed, instead of taking the argument as a pointer, we should take it as an array. While arrays decay to pointers in C++, array references do not:

template <int N> class code { // ... public: template <int N2> explicit code( const char(&literal)[N2] ) { static_assert(N2 == N + 1, "literal of bad size"); // copy } };

The contrived function argument declaration is how you define references to arrays in C++. It is similar to declaring pointers to functions. The expected size is N + 1 , this is to take care of the trailing null character.

Now, we get what we wanted:

code<2> cc {"CC"}; // ok code<3> ccc {"CC"}; // compile-time error!

We could have implemented our constructor in a different way:

template <int N> class code { // ... public: explicit code( const char(&literal)[N + 1] ) { // copy } };

The other one also accepts only the literals of the desired length. However, the second variant, when rejecting a too short or too long literal at compile-time will give us a less descriptive message, like “No suitable constructor found; the candidates were…”. In contrast, in the first variant, the constructor template will intercept all possible literals, and we can build a more informative message, like “You intended to construct a code from a literal, but the length does not match. It will work when you adjust the length.”

You may also wonder why I have marked the constructor as explicit . It appears we could safely use it for conversions from the literals of appropriate size. Later in the post we will see why it would be a bad idea.

There is a certain security hole in this design: type const char(&)[N] applies not only to string literals, but also to regular automatic character arrays which do not necessarily end with a null character:

auto&& v1 = "AB"; // type: const char[3] const char v2[] = {'A', 'B', 'C'}; // type: const char[3]

While I acknowledge this as a security hole technically, I do not find it a practical issue.

Adding more semantics to a type

Because a code handling 2 characters and a code handling 3 characters are different types, we have achieved additional type safety. If we used std::string to represent country codes (2-letter) and airport codes (3-letter), they are encoded in the same type:

typedef std::string country_code; typedef std::string airport_code; // same type

This can cause subtle bugs, where we inadvertently pass a country code, where airport code is expected:

bool is_port_in_country(airport_code ac, country_code cc); void test() { airport_code port {"LHR"}; country_code country {"GB"}; assert (is_port_in_country(country, port)); // ^ compiles, but I meant something else! }

With our type code , we get a type-system error instead:

typedef code<2> country_code; typedef code<3> airport_code; // different type void test() { airport_code port {"LHR"}; country_code country {"GB"}; assert (is_port_in_country(country, port)); // ^ compile-time error }

This is really something. Parameter N in code<N> is used not only to select the best fitting size, but also to indicate that two types represent different things. Nobody should have any business in mixing (e.g. comparing) a country code with an airport code, so whoever requests it, has likely made a mistake. This is what I call “additional semantics.”

But while it works for different kinds of codes of different length, it does not work in the situations where we have two different kinds of codes which happen to be of same length, but otherwise have nothing in common.

For instance, if I want to use 3-letter country codes and 3-letter airport codes, they result in the same type again:

typedef code<3> country_code; typedef code<3> airport_code; // same type!

We are missing something like an opaque typedef and we could probably invent one, but because our code is already a template, we can easily change it to accommodate our need. We only need to add another, dummy, template parameter:

template <typename /*tag*/, int N> class code { // ... };

The additional (first) template parameter does not even have to have a name. Its only purpose is to be there. What we pass as the first argument is irrelevant, we only need to pass different types for different kinds of codes:

struct country_code_tag {}; // a tag (empty) class struct airport_code_tag {}; // typedef code<country_code_tag, 3> country_code; // different typedef code<airport_code_tag, 3> airport_code; // types!

Thus, the first template parameter represents a kind of the code. Codes of different kinds are encoded in different types. This also explains why we do not want to implicitly convert from a string literal to a code: a string literal is not of any particular kind: it is just a bunch of letters, whereas a code has some semantics: it represents a country or an airport or a location. In a sense we would be changing semantics, and this is where implicit conversion is inappropriate. We wouldn’t like the following (buggy) code to compile:

is_port_in_country("GBR", "LHR"); // bad order of args

Because the constructor is explicit, the above fails to compile. Even the following fails to compile:

is_port_in_country({"GBR"}, {"LHR"}); // compile-time error

Which is desirable. It is not enough to say “construct something from the literal,” you have to say which type exactly you want. This also shows why explicit works as it works with brace initializers, and why changing this behavior would be harmful: it would make the above example compile with unintended incorrect run-time behavior.

The only way to use the literals is this:

is_port_in_country(country_code{"GBR"}, airport_code{"LHR"}); // ^ type system error is_port_in_country(airport_code{"LHR"}, country_code{"GBR"}); // ^ ok

Code bloat prevention

However, with the addition of the second template parameter, we have added an unnecessary compile-time and run-time overhead: a code bloat. Definitions of the member functions of country_code and airport_code are effectively identical, yet, they will be instantiated twice, because formally these are two different types. To counteract this, we will have to separate the member functions where the extra tag parameter is irrelevant to a base class which is no longer parametrized by the tag:

template <int N> class code_base { std::string to_string() const; const char* data() const; unsigned size() const; // ... }; template <typename /*tag*/, int N> class code : public code_base<N> { friend bool operator==(code l, code r); friend bool operator< (code l, code r); // ... };

Comparison is defined in code : no one should have any business in comparing the codes of different kinds. In the rare case where it is desired, the framework provides a dedicated, and a bit verbose (on purpose) function:

template <typename tag1, typename tag2, int N> bool equal(code<tag1, N> l, code<tag2, N> r);

Variable-length codes

Sometimes the codes representing some abstract thing can vary in length. For instance, it is not uncommon that we want to store codes of size either 2 or 3; or that we want to store codes of size between 2 and 4. Our library can be easily extended to accommodate this requirement. Instead of one, we will need two integer template parameters: representing minimum and maximum length:

template <int Lo, int Hi> class code_base { // ... }; template <typename /*tag*/, int Lo, int Hi = Lo> class code : public code_base<Lo, Hi> { // ... };

For the purposes of storage, we will use parameter Hi , and pad it with zeros, so that when we want to compare the values as integers, the unused space should not affect the result. We will need to change the implementation of size() so that it stops at the first zero value (if any). We will also need to change a couple of other places where N was previously used:

template <int Lo, int Hi> class code_base { // ... public: template <int S> explicit code_base( const char(&literal)[S] ) { static_assert(S >= lo + 1, "literal too short"); static_assert(S <= Hi + 1, "literal too long"); // copy } };

And that’s it for today. there are more points to be addressed before we can call our library finished. I wanted to share with you the type-safety-related ones, as I consider them the most interesting.