Goal of these posts

These posts are designed with the goal of learning more about Modern C++ and revisiting some of the conventional wisdom that I’ve always taken for granted. I’m going to start off by building a matrix library and see where it goes. This first post challenges some of that conventional wisdom by looking at an underlying implementation choice. In the next post, I’ll look at using some Modern C++ and adding Concepts to the library.

Overview of the Idea

One thing a matrix library needs to do is represent matrices of different sizes, so it makes sense to back the matrix object with a dynamically sized container. That container should also have fast access times, which rules out a structure like a linked list. Of the common data structures, arrays and hash tables fit these criteria, both offering O(1) access. Since each item in a matrix can be mapped to a unique index based on its location, an array-like data structure makes sense for the underlying container.

A common piece of advice in programming is to use the standard library. However, C++’s standard library has often come under scrutiny for being too general, and because of that generality it can sometimes be beaten by hand-crafted solutions. Let’s see whether, for a basic matrix implementation, there is any difference between using a pointer to a heap-allocated array and the standard library’s vector type.

The Code

The code for the two matrices can be seen below.

#ifndef MATRIX2D_HPP
#define MATRIX2D_HPP

#include <vector>

template <typename T>
class MatrixVec2D
{
    std::vector<T> mat;
    std::size_t rows;
    std::size_t cols;

    std::size_t rcToIdx(std::size_t r, std::size_t c) const
    {
        return cols * r + c;
    }

public:
    MatrixVec2D(std::size_t r, std::size_t c) :
        mat(r * c, 0),
        rows(r),
        cols(c)
    {
    }

    MatrixVec2D(const MatrixVec2D<T>& m) :
        mat(m.mat),
        rows(m.rows),
        cols(m.cols)
    {
    }

    std::size_t getNumRows() const
    {
        return rows;
    }

    std::size_t getNumCols() const
    {
        return cols;
    }

    const T& at(std::size_t row, std::size_t col) const
    {
        return mat[rcToIdx(row, col)];
    }

    T& at(std::size_t row, std::size_t col)
    {
        return mat[rcToIdx(row, col)];
    }
};

template <typename T>
class MatrixArr2D
{
    T* mat;
    std::size_t rows;
    std::size_t cols;

    std::size_t rcToIdx(std::size_t r, std::size_t c) const
    {
        return cols * r + c;
    }

public:
    MatrixArr2D(std::size_t r, std::size_t c) :
        mat(new T[r * c]),
        rows(r),
        cols(c)
    {
        for (std::size_t i = 0; i < rows * cols; ++i)
        {
            mat[i] = 0;
        }
    }

    MatrixArr2D(const MatrixArr2D<T>& m) :
        mat(new T[m.rows * m.cols]),
        rows(m.rows),
        cols(m.cols)
    {
        for (std::size_t i = 0; i < rows * cols; ++i)
        {
            mat[i] = m.mat[i];
        }
    }

    // Owning a raw pointer means copy assignment must also be handled
    // (rule of three); the benchmark never assigns, so it is deleted
    // here to prevent an accidental shallow copy and double free.
    MatrixArr2D& operator=(const MatrixArr2D<T>&) = delete;

    ~MatrixArr2D()
    {
        delete[] mat;
    }

    std::size_t getNumRows() const
    {
        return rows;
    }

    std::size_t getNumCols() const
    {
        return cols;
    }

    const T& at(std::size_t row, std::size_t col) const
    {
        return mat[rcToIdx(row, col)];
    }

    T& at(std::size_t row, std::size_t col)
    {
        return mat[rcToIdx(row, col)];
    }
};

#endif

The major difference between the two implementations is the underlying container. MatrixVec2D uses a vector from the standard library to store the data, whereas MatrixArr2D uses a dynamically allocated array. Since vectors are typically implemented on top of a dynamically allocated array, the vector implementation adds what looks like another layer of indirection on each element access. By eliminating this layer of indirection and using a dynamically allocated array directly, MatrixArr2D is expected to be faster than MatrixVec2D for accesses.

The Testing Framework

When testing the two implementations, the test should be agnostic to the underlying implementation. Two ways to achieve this are an abstract base class that both matrices derive from, or a templated function. To avoid any cost from virtual method calls, the templated function approach was chosen.

For testing, a multiplication function was written and can be seen below. It performs O(n³) accesses for square matrices, where n is the number of columns. The test fills two matrices with random numbers using rand(), then multiplies them.

#include <cassert>

template <template <class S> class T, typename S>
T<S> mult(const T<S>& A, const T<S>& B)
{
    assert(A.getNumCols() == B.getNumRows());
    T<S> result(A.getNumRows(), B.getNumCols());
    for (std::size_t r = 0; r < result.getNumRows(); ++r)
    {
        for (std::size_t c = 0; c < result.getNumCols(); ++c)
        {
            S sum = 0;
            for (std::size_t m = 0; m < A.getNumCols(); ++m)
            {
                sum += A.at(r, m) * B.at(m, c);
            }
            result.at(r, c) = sum;
        }
    }
    return result;
}

Results

The results are summarized in the graphs at the end, and can be seen in the table below.

[Table summarizing the results of the programs for n = 64]

These results matched the prediction when no optimizations were applied. However, at higher optimization levels (-O1 for GCC and -O2 for Clang) the difference between the two became negligible, and at several points other than the end point the vector implementation was actually faster than the array implementation. This indicates that at higher optimization levels the compiler is able to optimize away the extra level of indirection the vector incurs.

Another interesting thing to note is that for the entire test, Clang’s -Ofast optimization level performed worse than its -O3 level. This reinforces the advice to profile rather than trust that more aggressive optimizations always make code faster.

Conclusion

Since there was no meaningful difference between an array-backed matrix and a vector-backed matrix at higher optimization levels, there is little benefit to using a raw array for this implementation. The vector provided by the standard library is less prone to memory-related errors such as leaks than a raw pointer is. Additionally, the vector class provides access to many of the standard library algorithms that may be useful when expanding the matrix class.

If you see anything you disagree with or feel could be improved, feel free to send me a message!

Appendix: Testing Hardware

These tests were performed on a Microsoft Surface Book using the Windows Subsystem for Linux. GCC version 6.2 for Ubuntu 14.04; Clang version 3.8.0; Intel i5-6300U @ 2.40 GHz (2.50 GHz boost); 8 GB of RAM.

Appendix: Graphs of results