Units in C++

Writing code is hard. Anything that can help ensuring correctness should be used but this is usually not the case. For library functions which might be used by people other than the original author this is doubly true since here it is also important that the functions are used correctly in additional to being correctly implemented in the first place.

Take the following piece of code as an example:

1 #include <algorithm> 2 #include <iostream> 3 #include <ext/cmath> 4 5 6 inline auto 7 cdnf ( double v ) 8 { 9 return erfc ( v * - __gnu_cxx :: __math_constants < double >:: __one_div_root_2 ) / 2.0 ; 10 } 11 12 auto 13 european_bsm ( double sigma , double T , 14 double S0 , double K , 15 double r , double q ) 16 { 17 auto di = sigma * sqrt ( T ); 18 auto d1 = ( log ( S0 / K ) + ( r - q + sigma * sigma * 0.5 ) * T ) / di ; 19 auto d2 = d1 - di ; 20 auto K0 = K * exp (- r * T ); 21 auto ST = S0 * exp (- q * T ); 22 23 auto N1 = cdnf ( d1 ); 24 auto N2 = cdnf ( d2 ); 25 26 auto v = ST * N1 - K0 * N2 ; 27 return std :: make_pair ( v , v + K0 - ST ); 28 } 29 30 31 int 32 main () 33 { 34 auto ve1 = european_bsm ( 0.4 , 5.0 / 12.0 , 35 50.0 , 50.0 , 36 0.1 , 0.0 ); 37 std :: cout << "ve1 = " << std :: get < 0 >( ve1 ) 38 << " / " << std :: get < 1 >( ve1 ) << std :: endl ; 39 }

For those interested, it’s a very simple-minded implementation of Black-Scholes to price European options. A example which shows a couple of different problems:

the european_bsm function takes many different arguments, several (here all) with the same type;

function takes many different arguments, several (here all) with the same type; the implementation is following a formula by just computing one double floating-point value from another.

This former is handled in some languages where function parameters can be passed by name. Here is the same code and the invocation of the function in R:

1 european_bsm <- function ( sigma , T , S0 , K , r , q ) { 2 di = sigma * sqrt ( T ); 3 d1 = ( log ( S0 / K ) + ( r - q + sigma * sigma * 0.5 ) * T ) / di ; 4 d2 = d1 - di ; 5 K0 = K * exp (- r * T ); 6 ST = S0 * exp (- q * T ); 7 8 N1 = pnorm ( d1 ); 9 N2 = pnorm ( d2 ); 10 11 v = ST * N1 - K0 * N2 ; 12 return ( c ( v , v + K0 - ST )); 13 } 14 15 print ( european_bsm ( sigma = 0.4 , T = 5.0 / 12.0 , S0 = 50.0 , K = 50.0 , r = 0.1 , q = 0.0 ));

This helps somewhat but programmers do not have to use this syntax and can just pass parameters by position. Also, this does not help the implementer.

There is one lesson which can and should be learned from another discipline which can help a lot. In physics, whenever computations have to be performed, but probably at first make the units work. If and when this is the case the correctness will often automatically follow. No wonder that physicists and those in CS who work with them pushed for this functionality in the development of the C++ standard.

The result comes in the form of literal operators. Before we go there let’s make the code above more general.

Generalized BSM Function

The code for european_bsm and cdnf above is already written for reusability by using auto instead of explicit types wherever possible. Note that you need gcc 4.8 or later with the option -std=c++1y to compile the code. To utilize units all we have to do is give the parameters, which take different types of double values, different C++ types. If we do not want to sacrifice current we can use templates:

1 #include <algorithm> 2 #include <iostream> 3 #include <ext/cmath> 4 5 6 template < typename _Tsr > 7 inline auto 8 cdnf ( const _Tsr & v ) 9 { 10 typedef __decltype ( v * v ) _Tr ; 11 return erfc ( v * - __gnu_cxx :: __math_constants < _Tr >:: __one_div_root_2 ) / _Tr ( 2.0 ); 12 } 13 14 template < typename _Tv , typename _Tt , typename _Tc , typename _Ti > 15 auto 16 european_bsm ( const _Tv & sigma , const _Tt & T , 17 const _Tc & S0 , const _Tc & K , 18 const _Ti & r , const _Ti & q ) 19 { 20 typedef __decltype ( _Tv ( 0 ) * _Tv ( 0 ) * _Tt ( 0 )) _Tr ; 21 auto di = sigma * sqrt ( T ); 22 auto d1 = ( log ( S0 / K ) + ( r - q + sigma * sigma * _Tr ( 0.5 )) * T ) / di ; 23 auto d2 = d1 - di ; 24 auto K0 = K * exp (- r * T ); 25 auto ST = S0 * exp (- q * T ); 26 27 auto N1 = cdnf ( d1 ); 28 auto N2 = cdnf ( d2 ); 29 30 auto v = ST * N1 - K0 * N2 ; 31 return std :: make_pair ( v , v + K0 - ST ); 32 } 33 34 35 int 36 main () 37 { 38 auto ve1 = european_bsm ( 0.4 , 5.0 / 12.0 , 39 50.0 , 50.0 , 40 0.1 , 0.0 ); 41 std :: cout << "ve1 = " << std :: get < 0 >( ve1 ) 42 << " / " << std :: get < 1 >( ve1 ) << std :: endl ; 43 }

This is pretty straight-forward. Not much changed and here is the diff output to highlight this:

1 {+template<typename _Tsr>+} 2 inline auto 3 cdnf ( [-double-] {+const _Tsr&+} v) 4 { 5 {+typedef __decltype(v * v) _Tr;+} 6 return erfc(v * -__gnu_cxx::__math_constants< [-double-] {+_Tr+} >::__one_div_root_2) / [-2.0;-] {+_Tr(2.0);+} 7 } 8 9 {+template<typename _Tv, typename _Tt, typename _Tc, typename _Ti>+} 10 auto 11 european_bsm ( [-double-] {+const _Tv&+} sigma, [-double-] {+const _Tt&+} T, 12 [-double-] {+const _Tc&+} S0, [-double-] {+const _Tc&+} K, 13 [-double-] {+const _Ti&+} r, [-double-] {+const _Ti&+} q) 14 { 15 {+typedef __decltype(_Tv(0) * _Tv(0) * _Tt(0)) _Tr;+} 16 auto di = sigma * sqrt (T); 17 auto d1 = ( log (S0 / K) + (r - q + sigma * sigma * [-0.5)-] {+_Tr(0.5))+} * T) / di; 18 auto d2 = d1 - di; 19 auto K0 = K * exp (-r * T); 20 auto ST = S0 * exp (-q * T); 21 22 auto N1 = cdnf (d1); 23 auto N2 = cdnf (d2); 24 25 auto v = ST * N1 - K0 * N2; 26 return std:: make_pair (v, v + K0 - ST); 27 }

As you can see, apart from annotating some constants used no code changes are necessary. The code compiles and runs just as before. The changes introduced different types for numbers which serve different purposes. The template parameter _Tv , for instance, is used for volatility, _Ti for interest rate.

A New Set Of Types

This by itself does not help much but now we can actually introduce real, new types for these purposes. It is not possible to just say

typedef double u_dollar;

or something to that extend because u_dollar would not be a real type, only an alias.

1 struct u_ratio_peryear 2 { 3 typedef double type ; 4 type v ; 5 constexpr u_ratio_peryear ( const type & p ) : v ( p ) {} 6 }; 7 8 struct u_months 9 { 10 typedef double type ; 11 type v ; 12 constexpr u_months ( const type & p ) : v ( p ) {} 13 }; 14 15 struct u_sqrt_ratio_peryear 16 { 17 typedef double type ; 18 type v ; 19 constexpr u_sqrt_ratio_peryear ( const type & p ) : v ( p ) {} 20 }; 21 22 struct u_dollar 23 { 24 typedef double type ; 25 type v ; 26 constexpr u_dollar ( const type & p ) : v ( p ) {} 27 };

The u_ratio_peryear type is used to represent the interest rate, u_months to represent time (more accurately in this case: months), u_sqrt_ratio_peryear is the representation for volatility, and u_dollar is obviously a representation for currency amounts. Neither of the types adds overhead to the numeric value, they are just wrappers around a double value.

Convenience Operators

The only problem is how to make it convenient to define objects of these types. This is where literal operators come into play. We can define a few appropriate operators:

1 inline u_ratio_peryear operator "" _percent_peryear ( long double v ) 2 { 3 return u_ratio_peryear ( double ( v ) / 100.0 ); 4 } 5 6 inline u_ratio_peryear operator "" _percent_peryear ( unsigned long long v ) 7 { 8 return u_ratio_peryear ( double ( v ) / 100.0 ); 9 } 10 11 inline auto operator "" _months ( long double v ) 12 { 13 return u_months ( v ); 14 } 15 16 inline auto operator "" _volatility ( long double v ) 17 { 18 return u_sqrt_ratio_peryear ( v ); 19 } 20 21 inline auto operator "" _dollars ( long double v ) 22 { 23 return u_dollar ( v ); 24 } 25 26 inline auto operator "" _dollars ( unsigned long long v ) 27 { 28 return u_dollar ( v ); 29 }

Some of the operators are duplicated for convenience. If only a long double operator is defined then using a notation without a decimal point will fail and similarly if just the long long operator is defined. With these definitions we can now rewrite the call to european_bsm :

1 auto ve2 = european_bsm ( 0.4 _volatility , 5.0 _months , 2 50 _dollars , 50.0 _dollars , 3 10.0 _percent_peryear , 0 _percent_peryear );

The Result

Having different types for the parameters of the function makes it less likely that they are swapped. This only makes sense if the newly defined objects can be operated on and if we do not want to change the source code of european_bsm . This means we now have to define operators on these new types. This is where the main advantage will come from since we will obviously only define valid operations. This, together with the different types for the parameters makes it unlikely that any invalid code will compile. For completeness, below is the set of class definitions and the operators defined on them.

With these definitions in place and the european_bsm can be used with double values as well as the unit types. Comparing the code generated for either of the calls shows that gcc manages to completely see through all the indirections of the unit classes and all we are left with are the benefits of using units.

It is possible to further refine the units. For instance, the two monetary values are actualy different: present-day and future values. We could introduce u_future_dollar and further eliminate problems with passing values to the function and possible also in the computation which are made. Given that the compiler can eliminate all the indirections there really is no limit.

The Complete Unit Classes

The following code defines all the necessary operations to allow the european_bsm function to compile. To enable this a few more intermediate classes are defined. Whenever possible the functions are defined as inline functions. When global functions are used with double s equivalent functions are defined for the unit types. In addition a specialization for is_floating_point is needed and as an added benefit an output function to print u_dollar values is defined.