A matrix is a rectangular array of numbers. That's the usual mathematical definition. Something like this: \(A=\begin{bmatrix}1&1&1&2\\2&3&1&3\\1&-1&-2&-6\end{bmatrix}\).

When we have lots of data, or want top performance, we have to be aware of (at least) three major issues:

Are we wasting space by storing unnecessary elements (usually zeros)?

Does the way we store the data optimally match the way we access (read/write) and compute on it?

Are we using the right algorithm for the data structure at hand?

Let's say that the data is dense, so we have to store all entries. Why is the order in which we store the elements important?

Most programming languages have means to construct and manipulate 2-d arrays. Code such as a[i][j] is a straightforward way to access the element (number) at the i-th row and j-th column of a Java (or C) array of arrays. Straightforward, and good enough for small or medium matrices. A good way to waste performance, though!

The catch here is that, although a programming language might provide a fine abstraction for 2-dimensional (or even N-dimensional) arrays, typical computer memory holds data in only one dimension. If we are careful, the data will be stored in consecutive locations; if not, the chunks will be scattered all over. This is a big problem, because memory access times vary by orders of magnitude. When a modern processor needs some data, it is best if the data is already in registers. The next best place is the first level of cache, which is slower. If it's not there, the data comes from the second level, slower still, and so on. Once the data is fetched into a faster (but smaller) level closer to the processor, the data that previously occupied that place is evicted, and the data that was near the fetched data is brought in too, in the hope that it will be asked for next. If we access numbers that are near each other in memory, we will get more cache hits (good) and fewer cache misses (bad).
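To make the cache argument concrete, here is a minimal Java sketch (the class name and sizes are made up for illustration). It computes the same sum over one flat array twice: once walking consecutive memory locations, once jumping with a stride of n elements. On typical hardware the first loop runs noticeably faster for large matrices, purely because of cache behavior; the result is identical.

```java
public class Locality {

    // Sum a flat, row-major m x n matrix walking row by row:
    // consecutive memory locations, cache-friendly.
    public static double sumRowWise(double[] a, int m, int n) {
        double s = 0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                s += a[i * n + j];
        return s;
    }

    // The same sum walking column by column: each step jumps n
    // doubles ahead in memory, causing far more cache misses.
    public static double sumColWise(double[] a, int m, int n) {
        double s = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                s += a[i * n + j];
        return s;
    }

    public static void main(String[] args) {
        int m = 1000, n = 1000;
        double[] a = new double[m * n];
        for (int i = 0; i < a.length; i++) a[i] = i;
        // Both orders compute the same result; only the speed differs.
        System.out.println(sumRowWise(a, m, n) == sumColWise(a, m, n));
    }
}
```

Time the two methods with a benchmark harness (for example, JMH) and the strided version loses badly as n grows, even though both do exactly the same arithmetic.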

Take a look at how Java handles access to 2-d arrays: a[343][4098]. You might think that this gives you direct access to the element at index \(343,4098\), but it does not. Java arrays are always one-dimensional. In the case of 2-d arrays, the first-level array actually holds references to m object arrays, each holding the actual numbers, n of them. a[343][4098] fetches the reference at index 343 from the first-level array, and then reads the element at index 4098 of that second-level array. This is a huge problem, because those second-level arrays will be scattered through memory.
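One way around the reference chasing is to store the whole matrix in a single flat array and compute the offsets ourselves, which is essentially what native libraries (and Neanderthal) do internally. A rough sketch, with a made-up class name, using row-major order:

```java
public class FlatMatrix {
    final int m, n;            // rows, columns
    final double[] data;       // one contiguous block, row-major

    public FlatMatrix(int m, int n) {
        this.m = m;
        this.n = n;
        this.data = new double[m * n];
    }

    // Element (i, j) lives at offset i * n + j: a single access into
    // one contiguous array, instead of chasing a reference into a
    // possibly distant second-level array.
    public double get(int i, int j) {
        return data[i * n + j];
    }

    public void set(int i, int j, double v) {
        data[i * n + j] = v;
    }
}
```

With this layout, a[i][j]-style access becomes get(i, j), and all m*n numbers are guaranteed to sit in one contiguous block of memory.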

What I want is to lay \(A\) out in memory in a way that is efficient, but still easy to work with. Obviously, those two dimensions have to be projected into one. But how? Should it be

\(\begin{bmatrix}1&1&1&2&|&2&3&1&3&|&1&-1&-2&-6\end{bmatrix}\),

or \(\begin{bmatrix}1&2&1&|&1&3&-1&|&1&1&-2&|&2&3&-6\end{bmatrix}\)?

The answer is: it depends on how your algorithm accesses it most often.
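To be concrete: the first option stores the rows one after another (row-major order), the second stores the columns (column-major order). With zero-based indices, element \(a_{ij}\) of an \(m\times{n}\) matrix lands at flat offset \(i\cdot{n}+j\) in row-major order, and at \(i+j\cdot{m}\) in column-major order. For the \(3\times{4}\) matrix \(A\) above, the entry at row 1, column 2 has row-major offset \(1\cdot{4}+2=6\) and column-major offset \(1+2\cdot{3}=7\); check the two layouts above and you will find its value, \(1\), at exactly those positions.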

Neanderthal gives you both options. When you create any kind of matrix, you can specify whether you want it to be column-oriented (:column, which is the default), or row-oriented (:row). In the following example, we will use CPU matrices from the native namespace. The same options also work for functions that create GPU CUDA matrices (cuda namespace), or OpenCL's GPU and CPU matrices (opencl namespace).

(dge 3 2 [1 2 3 4 5 6])

#RealGEMatrix[double, mxn:3x2, layout:column, offset:0]
   ▥       ↓       ↓       ┓
   →       1.00    4.00
   →       2.00    5.00
   →       3.00    6.00
   ┗                       ┛

This created a dense \(3\times{2}\) column-oriented matrix. Notice how the 1-d Clojure sequence that we used as the data source has been read column-by-column.

The other option is row orientation:

(dge 3 2 [1 2 3 4 5 6] {:layout :row})

#RealGEMatrix[double, mxn:3x2, layout:row, offset:0]
   ▥       ↓       ↓       ┓
   →       1.00    2.00
   →       3.00    4.00
   →       5.00    6.00
   ┗                       ┛

In this case, the elements have been laid out row-by-row.

When we print a matrix in the REPL, we also see information about its structure.