Perspective Projections: Beyond 3D

Summary: The maths behind Topologic - https://dee.pe/r - or, how to take the common 3D projections used in OpenGL waaay to the next level. Or dimension.

I've been putting this one off for years now. Not because I find it boring or hard to express, but rather because I spent most of my time actually fiddling with Topologic and its WebGL frontend, which uses the formulae you're about to see.

So what is this article about? Well, you may recall how your computer can do 3D graphics, even though your display is only 2D. The way that works is that your graphics card can do certain types of matrix and vector maths that turn points in 3D space into points in 2D space, and some additional bits that draw triangles. These operations, specifically, are called projections - and in the case of video games, they tend to be perspective projections in particular.

Going further, these projections are not actually limited to 3-space. All they do is remove a dimension from your source vectors. And what OpenGL in particular does is easily extended to projecting from n-d to (n-1)-d. This allows us to, kind of, "see" space as though it was four-dimensional. Or five-dimensional, etc. We simply need to chain these projections and find a way of coming up with projection matrices in arbitrary dimensions.

This article is exactly about that: coming up with the matrices needed for the projections, and how to apply them to vectors. It'll get kind of math-y, so... you've been warned. In case you need to refresh your memory, have a look at my previous article on Homogeneous Coordinates, and the one on Normal Vectors in Higher Dimensional Spaces. I'm also assuming you still remember your standard vector and matrix maths from linear algebra 101.

Notes and Notation

This is by no means a definitive source for this kind of math. In general, do not trust a computer scientist to do "real" maths. Rather, it describes the particular implementation I derived for Topologic, my higher-dimensional geometric primitive and fractal visualiser. There's a live demo of that, if these things interest you.

Some pointers on the notation and conventions used throughout the article:

$\mathbf{vector}$

Vectors are written in bold.

$\mathbf{vector}^c$

A vector with a superscript c on the left hand side of a definition denotes an ordered list of c vectors. On the right hand side the c is used to select a particular vector.

All indices start at 0 and extend to n-1. Vectors are supposed to have a dimension of n, and conversions between the "normal" and homogeneous versions are implicit if they should occur and aren't dealt with specifically. To convert a non-homogeneous n-D vector to the homogeneous equivalent, add a new coordinate of 1. To convert back, divide by the last coordinate and drop it.
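As a minimal Python sketch of those two conversions (the function names are made up for this example and are not taken from Topologic or libefgy):

```python
def to_homogeneous(v):
    """Append a coordinate of 1 to a plain n-D vector."""
    return list(v) + [1.0]


def from_homogeneous(v):
    """Divide by the last coordinate, then drop it."""
    w = v[-1]
    if w == 0:
        raise ValueError("last coordinate is 0, no finite equivalent")
    return [x / w for x in v[:-1]]


print(to_homogeneous([2.0, 3.0, 4.0]))         # [2.0, 3.0, 4.0, 1.0]
print(from_homogeneous([4.0, 6.0, 8.0, 2.0]))  # [2.0, 3.0, 4.0]
```

The check for a zero last coordinate is an addition of this sketch; the article itself doesn't deal with that case.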

The same applies to matrices. To extend an n*n matrix to its homogeneous equivalent, add a new row and a new column at the end and set all cells to 0, except for the very last one which needs to be 1. All matrices are square. The matrices can usually be transposed if you also transpose input vectors, but the article should be self-consistent.
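The matrix extension is just as mechanical. Here's a small sketch using plain nested Python lists; `extend_matrix` is a hypothetical helper name for this example:

```python
def extend_matrix(m):
    """Return the homogeneous (n+1)x(n+1) version of an n x n matrix."""
    n = len(m)
    if any(len(row) != n for row in m):
        raise ValueError("matrix must be square")
    # New column of zeroes at the end of every existing row ...
    extended = [list(row) + [0.0] for row in m]
    # ... and a new last row that is all zeroes except for the final 1.
    extended.append([0.0] * n + [1.0])
    return extended


for row in extend_matrix([[1.0, 2.0], [3.0, 4.0]]):
    print(row)
```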

$\begin{pmatrix} \mathbf{V} & \mathbf{S} & \mathbf{T} \end{pmatrix}$

A set of vectors like this on the right hand side of a definition describes a 3x3, non-homogeneous matrix where the columns are described by the vectors V, S and T, respectively. Columns are also counted starting at 0 to n-1. Higher-dimensional equivalents of this appear as well.

Now, without further ado, let's get revisitin'!

Revision: Getting from 3D to 2D

How do we get from a 3D vector to a 2D vector? There are a few transformations that you want to run on your vector in sequence. All but the last of the lot are affine transformations, so you can combine all the matrices for them into one - a nice property of vector/matrix math that is one of the reasons we're using matrices for just about everything. The other nice thing about matrices is that we can prepare one matrix to use for a lot of vectors in the same scene.

Translations

The first type of matrix we need is a simple translation matrix. The reason for this is simple: when we want to render a 3D scene, we're always looking at some point from some point. Unless the point we're looking from is at the origin, we need to move our "camera". This is also the reason we use affine transformations instead of linear ones, which means the matrices to manipulate things in 3D are 4x4. I've already pointed out how those matrices work in the article on homogeneous coordinates, but it doesn't hurt to repeat it again here. To translate a vector by another vector t, we apply this matrix:

$$\operatorname{translate}_3(\mathbf{t}) := \begin{pmatrix} 1 & 0 & 0 & t_0 \\ 0 & 1 & 0 & t_1 \\ 0 & 0 & 1 & t_2 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

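... so, business as usual with homogeneous coordinates and affine transformations. If your input vector were already homogeneous, you'd replace the last cell in the matrix with the fourth coordinate of t. Nothing unusual here. One caveat: the projection formulas further down multiply a row vector with the matrix from the left, and for that order the coordinates of t need to sit in the last row instead of the last column - i.e. you'd use the transpose of this matrix.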
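Here's a rough Python sketch of a 3D translation. It assumes the row-vector-times-matrix order that the projection formulas later in the article use, so the offset lives in the last row - that's the transpose of the matrix as printed above. The function names are hypothetical.

```python
def translate3(t):
    """4x4 translation matrix for row vectors: t sits in the last row."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [t[0], t[1], t[2], 1.0],
    ]


def row_times_matrix(v, m):
    """Multiply a row vector with a matrix from the left: v x M."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


camera = [1.0, 2.0, 3.0]
move = translate3([-c for c in camera])               # translate by -from
print(row_times_matrix([1.0, 2.0, 3.0, 1.0], move))   # [0.0, 0.0, 0.0, 1.0]
print(row_times_matrix([4.0, 2.0, 3.0, 1.0], move))   # [3.0, 0.0, 0.0, 1.0]
```

Translating by the negated camera position puts the camera at the origin, which is what the merged view matrix does as its first step.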

Looking at Things

Next we need a look-at matrix. This transformation will do the equivalent of rotating the "camera" so that it's looking at the target point. Without this, the point we're looking at could be behind the camera - and that would be boring. This matrix is surprisingly the most complicated to construct, because the formula for it is somewhat fuzzy - at least in the general case. We need three vectors as input: the to point is what we're looking at, the from point is where our camera is, and the up vector is used to orient the camera. up is where the hair on your head is when you tilt it.

The look-at matrix is a rotation matrix. This means that not only is it affine, it is also linear, as rotations are linear transformations. This, in turn, means that we do not need to use homogeneous coordinates, and in 3D we can use a simple 3x3 matrix instead of a 4x4 matrix. It will still be useful to turn it into a 4x4 matrix later, so we can merge it in with the other matrices. In the 3D case, we make use of the cross product - denoted by the ⨯ symbol - to calculate this matrix:

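$$\operatorname{look-at}_3(\mathbf{to}, \mathbf{from}, \mathbf{up}) := \begin{pmatrix} \mathbf{column}_1 \times \mathbf{column}_2 & \mathbf{up} \times \mathbf{column}_2 & \mathbf{to} - \mathbf{from} \end{pmatrix}$$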

The columns in this matrix are self-referencing; you start with the last column and fill in the difference between the to and from vectors. Then you move to the middle column and fill in the cross product of the up vector with the last column you just generated. Then finally you fill in the first column with the cross product of the other two columns. This approach seems odd at first, but it has two advantages: we don't need any trigonometric functions to calculate the matrix, and this approach actually scales to higher dimensions - which we'll see in the next part of the article.

Resolving this into the actual matrix would be pretty hard to write down in mathanese, so I'll skip this - I tried, but the resulting matrix didn't fit on the screen, which defeated the goal of making it more readable by writing it out. The row-vector-of-column-vectors-form should be explicit enough, though.
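In code the back-to-front construction is a lot less scary than on paper. Below is a minimal Python sketch of it. One assumption that is not spelled out in the formula: each column is normalised to unit length, since a rotation matrix needs orthonormal columns. The names are hypothetical.

```python
import math


def cross(a, b):
    """Plain 3D cross product."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def normalise(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def look_at3(to, frm, up):
    """3x3 look-at matrix; the columns are filled from last to first."""
    col2 = normalise([a - b for a, b in zip(to, frm)])   # to - from
    col1 = normalise(cross(up, col2))                    # up x column 2
    col0 = normalise(cross(col1, col2))                  # column 1 x column 2
    # The vectors above are columns, so lay them out column-wise.
    return [[col0[i], col1[i], col2[i]] for i in range(3)]


m = look_at3(to=[0.0, 0.0, 0.0], frm=[0.0, 0.0, -5.0], up=[0.0, 1.0, 0.0])
for row in m:
    print(row)
```

If up happens to be parallel to to - from, the cross product is the zero vector and the normalisation divides by zero, so pick an up vector that points somewhere else.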

The Perspective Projection Matrix

Next - and finally - we need the perspective projection matrix. You'll find this in the documentation for OpenGL and it's kind of become the standard way of doing this. The matrix looks like this:

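$$\operatorname{perspective}_3(\text{eye-angle}, \text{aspect}, \text{near}, \text{far}) := \begin{pmatrix} \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right) \times \text{aspect}} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right)} & 0 & 0 \\ 0 & 0 & \frac{\text{near} + \text{far}}{\text{near} - \text{far}} & -1 \\ 0 & 0 & \frac{-2 \times \text{near} \times \text{far}}{\text{near} - \text{far}} & 0 \end{pmatrix}$$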

This matrix is better described elsewhere - for instance the OpenGL man pages - so I'll only glance over it briefly. In a nutshell this matrix moves the vertices around so that the trapezoid area in front of the camera, between the near and far cutoff distances and widening along the eye angle, ends up as a cubic area in front of the camera. The final transform to move things that are farther away closer to the centre is accomplished by a division with the distance coordinate; more on that in a second. We can still treat this matrix as affine for the purpose of creating a merged matrix with all the transforms we need, which we do like this:

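$$\operatorname{view-matrix}_3(\mathbf{to}, \mathbf{from}, \mathbf{up}, \text{eye-angle}, \text{aspect}, \text{near}, \text{far}) := \operatorname{translate}_3(-\mathbf{from}) \times \operatorname{look-at}_3(\mathbf{to}, \mathbf{from}, \mathbf{up}) \times \operatorname{perspective}_3(\text{eye-angle}, \text{aspect}, \text{near}, \text{far})$$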

The look-at matrix would implicitly have been extended to a 4x4 matrix for this formula. Remember that matrix multiplications are not commutative, so the order is important.

Projecting Vectors

Once you've created your matrix M, you'll want to use it to transform 3D vectors to 2D vectors. To do so, you first need to extend the 3D vector to be homogeneous - by adding a fourth coordinate that is simply set to 1 - then we multiply that vector with the matrix we got, divide the resulting 4-vector by the fourth coordinate - i.e. normalise the homogeneous 3D vector - drop that last coordinate, and then finally divide the first two remaining coordinates by the remaining third. In mathanese:

$$\operatorname{normalise-reduce}(\mathbf{V}) := \begin{pmatrix} \frac{V_0}{V_{n-1}} & \frac{V_1}{V_{n-1}} & \dots & \frac{V_{n-2}}{V_{n-1}} \end{pmatrix}$$

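$$\operatorname{project}_3(\mathbf{V}, M) := \operatorname{normalise-reduce}\left(\operatorname{normalise-reduce}\left(\begin{pmatrix} V_0 & V_1 & V_2 & 1 \end{pmatrix} \times M\right)\right)$$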

normalise-reduce is a helper function that takes a vector of n dimensions, divides every coordinate by the last one and then drops that last one. The result is an n-1 dimensional vector. This is the operation that is performed to turn a homogeneous vector into a "normal" one. Since we're interested in perspective projections, this is also the way we need to "cut off" the third coordinate of our 3D vector.
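The two definitions above translate almost word for word into Python. This is only a sketch: the identity matrix stands in for a real view matrix, and the helper names are invented for the example.

```python
def normalise_reduce(v):
    """Divide every coordinate by the last one, then drop the last one."""
    last = v[-1]
    return [x / last for x in v[:-1]]


def row_times_matrix(v, m):
    """Multiply a row vector with a matrix from the left: v x M."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def project3(v, m):
    """Project a 3D vector to 2D with a prepared 4x4 matrix m."""
    homogeneous = list(v) + [1.0]
    return normalise_reduce(normalise_reduce(row_times_matrix(homogeneous, m)))


# Stand-in for a real view matrix, just to show the two divisions.
identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(normalise_reduce([2.0, 4.0, 8.0, 2.0]))   # [1.0, 2.0, 4.0]
print(project3([1.0, 2.0, 4.0], identity4))     # [0.25, 0.5]
```

With the identity matrix the second division is the only thing that happens: the point at distance 4 gets pulled towards the centre by a factor of 4.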

For the sake of completeness, we could also apply normalise-reduce once and simply drop the last coordinate instead of applying normalise-reduce twice. The result would be closer to a parallel projection. The reason for this is that the matrix we constructed earlier moves everything in front of the camera. The final, third coordinate after the transform represents the distance from the camera. By dividing by this coordinate, things that are farther away from the camera are moved towards the centre.

You can use the same matrix for as many vectors projected by the same camera as you like. To draw triangles and the like, you would usually draw the triangles in 2D after projecting all the component vectors. This is something your graphics card does for you, however, and they've become increasingly efficient at it.

Interestingly, none of the things we did here are particularly specific to 3D. Which means we can easily extend this general concept to higher dimensions...

Getting from 4D (or higher) to 3D

So, how do we extend this? The first thing to realise is that a projection will only "shave off" one dimension. If you have a 4D model, then doing a perspective - or parallel, or similar - projection will simply land you a 3D model. But that's OK. You just take that 3D model, do another projection and you get something in 2D to put on your screen.

A corollary of this is that for your projections you will have separate camera locations for each of your projections. That means you have a separate set of to and from vectors in 3D, 4D, 5D, etc. It would in theory be possible to merge all the transforms into one, but that makes it a lot harder to understand, so we'll only do the easy variant here, with separate sets of cameras.

On the bright side, this is also closer to how a 4D (or higher) eye really would be working. A hypothetical 4D eye would "see" all sides of a 3D object at the same time, and moving it in 4D would create a whole new 3D space - just like moving our 3D eyes creates completely new slices through 2D space whenever we move them. Since our eyes cannot see all the sides of a 3D object at the same time, we would need a way to look at different parts of the created 3D space. Flatland has kind of an olden but golden take on this.

Translations

So, on to creating those projective matrices for a 4D-to-3D projection. Just like last time, we need to be able to have affine transformations, to step away from the scene. These work exactly the same way as in 3D in any kind of dimension. Instead of a 4x4 matrix in 3D, we now have a 5x5 matrix in 4D - or an (n+1)x(n+1) matrix in n-D.

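$$\operatorname{translate}_4(\mathbf{t}) := \begin{pmatrix} 1 & 0 & 0 & 0 & t_0 \\ 0 & 1 & 0 & 0 & t_1 \\ 0 & 0 & 1 & 0 & t_2 \\ 0 & 0 & 0 & 1 & t_3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$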

It's immediately obvious how this translates to even higher dimensions:

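$$\operatorname{translate}_n(\mathbf{t}) := \begin{pmatrix} 1 & 0 & \dots & 0 & t_0 \\ 0 & 1 & \dots & 0 & t_1 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & 1 & t_{n-1} \\ 0 & 0 & \dots & 0 & 1 \end{pmatrix}$$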

In a nutshell, we just create the right size of identity matrix - i.e. all ones in the diagonal - and fill in the last column with the homogeneous vector we want to translate by. Easy as that.
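A dimension-agnostic version is only a few lines of Python. As a hedged sketch: this again assumes row vectors multiplied from the left, so the offset is written into the last row, which is the transpose of the layout shown above. `translate` and `matmul` are hypothetical helper names.

```python
def translate(t):
    """(n+1)x(n+1) translation matrix for row vectors, n = len(t)."""
    n = len(t)
    m = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
    m[n][:n] = [float(x) for x in t]   # last row carries the offset
    return m


def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


t = [1.0, -2.0, 3.0, 0.5, 4.0]          # a 5D offset gives a 6x6 matrix
there = translate(t)
back = translate([-x for x in t])
print(len(there), len(there[0]))                     # 6 6
print(matmul(there, back) == translate([0.0] * 5))   # True
```

Translating there and back again gives the identity matrix, which is a cheap sanity check for whichever layout you settle on.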

Looking at Things - in Space!

On to the hard part. The look-at matrix is, again, the hardest part of the lot. Mostly because of us having to construct it in a somewhat odd way. This is analogous to the way we did it in 3D, but the explanation was also somewhat convoluted in that case.

Before we can construct this rotation matrix, we find there is one problem with the approach above: we used a cross product in the 3D case. There is no cross product in 4D, however. It turns out this is actually the single biggest problem in the whole process. Fortunately, I've previously described a solution to this in the article on normal vectors in higher dimensional spaces. It turns out we only used the cross product in the 3D case, because what we really wanted was a normal to a given set of vectors. A normal - in this case - is any vector that is orthogonal to all of a given set of other vectors.

In 3D, the cross product is the way of computing the normal of two vectors. To the point where the two terms are used completely interchangeably, even in some of the more scientific books on geometry. The reason we don't have a cross product in 4D is the other property of cross products: it's the product of two vectors. It's easy to see why we can't keep this constraint in 4D: if we try to find normals in 4D with only two vectors, the resulting set of normals is actually a whole 2D plane - as opposed to the 1D set of two potential vectors we get in 3D with two vectors. Just like in 2D we only need one vector to find an orthogonal vector. For this reason we need to use three 4D vectors to get our normal - introducing the following notation:

$$\mathbf{V}^0 \times \mathbf{V}^1 \times \dots \times \mathbf{V}^{n-2}$$

We still use the cross product sign, but we use it to get the normal of n-1 vectors, for n being the dimension we care about. The previously mentioned article on normals covers how to calculate that.
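For readers who don't want to switch articles right now, here's one common way of computing such a normal: a cofactor expansion of a formal determinant, the same trick that gives the 3D cross product. This is an assumption of this sketch; the article on normals may well formulate it differently, and the function names are made up.

```python
def determinant(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, cell in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * cell * determinant(minor)
    return total


def normal(vectors):
    """A vector orthogonal to the given n-1 vectors of dimension n."""
    n = len(vectors) + 1
    if any(len(v) != n for v in vectors):
        raise ValueError("need n-1 vectors of dimension n")
    result = []
    for j in range(n):
        # Cofactor of the j-th unit vector: strike column j from the inputs.
        minor = [list(v[:j]) + list(v[j + 1:]) for v in vectors]
        result.append((-1) ** j * determinant(minor))
    return result


vectors = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 0.0], [2.0, 0.0, 1.0, 1.0]]
n = normal(vectors)
print(n)
# Dot products with all three inputs; these should all be zero.
print([sum(a * b for a, b in zip(n, v)) for v in vectors])
```

For two 3D input vectors this reduces to the ordinary cross product. The recursive determinant is fine for the handful of dimensions you'd realistically render, but it gets slow quickly as n grows.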

Now that we covered this, let's see how we can actually calculate the matrix we need. As mentioned before, this is a rotation matrix, so in 4D we only need a 4x4 matrix - which we implicitly convert to a homogeneous 4D matrix at the size of 5x5 by filling the empty cells with 0 - except for the very last one which needs to be one. Same in 5D, where we calculate a 5x5 matrix, and scale it up to 6x6. The procedure in 4D goes like this:

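$$\operatorname{look-at}_4(\mathbf{to}, \mathbf{from}, \mathbf{up}, \mathbf{back}) := \begin{pmatrix} \mathbf{col}_1 \times \mathbf{col}_2 \times \mathbf{col}_3 & \mathbf{back} \times \mathbf{col}_2 \times \mathbf{col}_3 & \mathbf{up} \times \mathbf{back} \times \mathbf{col}_3 & \mathbf{to} - \mathbf{from} \end{pmatrix}$$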

Notice how we needed an additional base vector - back. In order to orient our 4D camera we need two vectors to pinpoint a plane. Think of the up vector in the 3D case as pinning one of the axes. The result in 3D is then obviously a plane. In 4D, if we only pinned one axis then we'd end up with a hyperplane. But we only want a 2-plane. So we use two vectors to pin that. And why do we want this to be a plane, you ask? Well, the reason for that is that we want to have one axis along which there will be the depth of our projection. And in order to fix a single axis, we need to be looking from a 2-plane.

The algorithm for this is pretty much the same as for the 3D case. Fill in the last column with the difference between to and from. Then, starting from the second-to-last column, create the normal of the base vectors and the last column. For each subsequent vector to the left, you add the column you just calculated and "slide out" your set of base vectors, until in the very first column you just create the normal for all of the other columns. In even higher dimensions, a generalised description of this could be:

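$$\operatorname{look-at}_n(\mathbf{to}, \mathbf{from}, \mathbf{b}^{n-2}) := \begin{pmatrix} \mathbf{c}_1 \times \dots \times \mathbf{c}_{n-1} & \dots & \mathbf{b}^1 \times \dots \times \mathbf{b}^{n-3} \times \mathbf{c}_{n-2} \times \mathbf{c}_{n-1} & \mathbf{b}^0 \times \dots \times \mathbf{b}^{n-3} \times \mathbf{c}_{n-1} & \mathbf{to} - \mathbf{from} \end{pmatrix}$$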

... yeah. This really is kind of hard to read. The textual description was probably clearer. Have a look at my generic C++ template implementation in libefgy for something a bit more concrete.
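If C++ templates aren't your thing, the sliding-window idea can also be sketched in a few lines of Python. This is not the libefgy code, just an illustration under two assumptions: the generalised normal is computed by cofactor expansion, and every column is normalised to unit length so the result is a proper rotation matrix. All names are hypothetical.

```python
import math


def determinant(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * cell * determinant([r[:j] + r[j + 1:] for r in m[1:]])
               for j, cell in enumerate(m[0]))


def normal(vectors):
    """Vector orthogonal to n-1 given n-D vectors (cofactor expansion)."""
    n = len(vectors) + 1
    return [(-1) ** j * determinant([v[:j] + v[j + 1:] for v in vectors])
            for j in range(n)]


def normalise(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def look_at(to, frm, base):
    """n x n look-at matrix; base holds the n-2 orientation vectors."""
    n = len(to)
    if len(base) != n - 2:
        raise ValueError("need n-2 base vectors")
    columns = [None] * n
    columns[n - 1] = normalise([a - b for a, b in zip(to, frm)])
    for i in range(n - 2, -1, -1):
        # Column i: the base vectors that are still "in the window",
        # followed by every column to the right that is already known.
        inputs = [list(b) for b in base[n - 2 - i:]] + columns[i + 1:]
        columns[i] = normalise(normal(inputs))
    # Lay the column vectors out column-wise.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


up, back = [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]
for row in look_at([0.0] * 4, [0.0, 0.0, 0.0, -4.0], [up, back]):
    print(row)
```

With three-dimensional inputs and a single base vector this does the same thing as the 3D formula. As in 3D, base vectors that are linearly dependent on to - from (or on each other) produce a zero normal and a division by zero.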

Update (2014-12-30): the previous formula had a minor glitch. Thanks to @langley_va on Twitter for finding this and pointing it out! :)

Aaaanyway, it is what it is and you've now successfully tackled the hardest part. On to the one thing that actually gets easier in higher dimensions.

The Perspective Projection Matrix

Much like in 3D, we need a perspective projection matrix. This is easier in 4D - or higher - because the aspect ratio correction and the near/far cutoff will be handled by the 3D-to-2D projections we'll have to do afterwards, anyway. This means we only need to take the eye angle into consideration, resulting in a much simpler 4D-specific matrix:

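$$\operatorname{perspective}_4(\text{eye-angle}) := \begin{pmatrix} \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right)} & 0 & 0 & 0 & 0 \\ 0 & \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right)} & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right)} & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$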

The basic thing to take away from this is that you want to correct for eye angle in all of the dimensions but the last one. So in 4D you correct for it in the first three, which means only the last two cells on the diagonal are set to 1 - homogeneous 4D matrix and all. In higher dimensions this looks pretty much the same:

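$$\operatorname{perspective}_n(\text{eye-angle}) := \begin{pmatrix} \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right)} & 0 & \dots & 0 & 0 \\ 0 & \frac{1}{\tan\left(\frac{\text{eye-angle}}{2}\right)} & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & 1 & 0 \\ 0 & 0 & \dots & 0 & 1 \end{pmatrix}$$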
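Since this one is just a diagonal matrix, a Python sketch is short. It assumes the eye angle is given in radians, which the article doesn't specify, and the function name is invented for this example.

```python
import math


def perspective(n, eye_angle):
    """(n+1)x(n+1) perspective matrix for n >= 4; eye_angle in radians."""
    f = 1.0 / math.tan(eye_angle / 2.0)
    m = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        # Correct for the eye angle everywhere but the last two diagonal cells.
        m[i][i] = f if i < n - 1 else 1.0
    return m


for row in perspective(4, math.pi / 2):
    print([round(cell, 6) for cell in row])
```

An eye angle of 90 degrees gives a factor of 1 (up to rounding), so the matrix printed here is the 5x5 identity; narrower angles scale the first n-1 coordinates up.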

Assembling the full view matrices is also quite the same as in 3D. In the 4D case we get:

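$$\operatorname{view-matrix}_4(\mathbf{to}, \mathbf{from}, \mathbf{up}, \mathbf{back}, \text{eye-angle}) := \operatorname{translate}_4(-\mathbf{from}) \times \operatorname{look-at}_4(\mathbf{to}, \mathbf{from}, \mathbf{up}, \mathbf{back}) \times \operatorname{perspective}_4(\text{eye-angle})$$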

... and in the general case:

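$$\operatorname{view-matrix}_n(\mathbf{to}, \mathbf{from}, \mathbf{base}^{n-2}, \text{eye-angle}) := \operatorname{translate}_n(-\mathbf{from}) \times \operatorname{look-at}_n(\mathbf{to}, \mathbf{from}, \mathbf{base}^{n-2}) \times \operatorname{perspective}_n(\text{eye-angle})$$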
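To see why the order of the product matters, here's a small Python sketch of the merge step. It assumes row vectors multiplied from the left, so the leftmost matrix acts first. The three tiny 3x3 matrices are just stand-ins for the real translate, look-at and perspective matrices, and all names are hypothetical.

```python
from functools import reduce


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def row_times_matrix(v, m):
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def view_matrix(translation, look_at, perspective):
    """Merge the three matrices; the leftmost one is applied first."""
    return reduce(matmul, [translation, look_at, perspective])


# Stand-ins in 2D: shift by (-1, 0), quarter turn, scale by 2.
shift = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
turn = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
scale = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]

point = [1.0, 1.0, 1.0]
print(row_times_matrix(point, view_matrix(shift, turn, scale)))  # [-2.0, 0.0, 1.0]
print(row_times_matrix(point, view_matrix(scale, turn, shift)))  # [-3.0, 2.0, 1.0]
```

Same three matrices, different order, different result - so the camera has to be moved to the origin before it's rotated, and both have to happen before the perspective step.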

Nothing special here, at all. We could almost stop here, but for completeness - and, since the whole point of writing this is to actually have a complete text on this online...

Projecting Vectors

We're still doing perspective projections, and the note on parallel projections instead of perspective ones from the above 3D case still applies. To project a vector, we first have to multiply it with the right view matrix - which we only need to calculate once for all vectors - then normalise the vector to be non-homogeneous, then divide and drop the last coordinate. Again, normalising and the projection part are the same normalise-reduce function. In the 4D case, we get:

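$$\operatorname{project}_4(\mathbf{V}, M) := \operatorname{normalise-reduce}\left(\operatorname{normalise-reduce}\left(\begin{pmatrix} V_0 & V_1 & V_2 & V_3 & 1 \end{pmatrix} \times M\right)\right)$$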

This is almost identical to the 3D function, except that we need a 4D input vector. In the general case, the function is as follows:

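$$\operatorname{project}_n(\mathbf{V}, M) := \operatorname{normalise-reduce}\left(\operatorname{normalise-reduce}\left(\begin{pmatrix} V_0 & V_1 & V_2 & \dots & V_{n-1} & 1 \end{pmatrix} \times M\right)\right)$$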

And there you have it. That's how you do a perspective projection of vectors in arbitrary dimensions - and remember that you actually draw triangles in 2D, once you're done with all the projecting. So that's all you need to create arbitrary-dimensional perspective projections.
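To round things off, here's a compact Python sketch that chains two projections to get the corners of a tesseract from 4D down to 2D. It takes a few shortcuts that are assumptions of this example rather than part of the method above: the look-at step is left out because both cameras already look along the last axis, the 3D step reuses the simple n-D perspective matrix instead of the OpenGL one (so no aspect ratio or near/far handling), the eye angle is in radians, and translations use the row-vector layout with the offset in the last row. All names are hypothetical.

```python
import math
from itertools import product


def translate(t):
    n = len(t)
    m = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
    m[n][:n] = list(t)
    return m


def perspective(n, eye_angle):
    f = 1.0 / math.tan(eye_angle / 2.0)
    return [[(f if i < n - 1 else 1.0) if i == j else 0.0 for j in range(n + 1)]
            for i in range(n + 1)]


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def normalise_reduce(v):
    return [x / v[-1] for x in v[:-1]]


def project(v, m):
    """project_n: n-D vector in, (n-1)-D vector out."""
    row = list(v) + [1.0]
    moved = [sum(row[k] * m[k][j] for k in range(len(row))) for j in range(len(m))]
    return normalise_reduce(normalise_reduce(moved))


def simple_view(frm, eye_angle):
    """View matrix for a camera at frm that looks along the last axis."""
    n = len(frm)
    return matmul(translate([-x for x in frm]), perspective(n, eye_angle))


view4 = simple_view([0.0, 0.0, 0.0, -3.0], math.pi / 2)   # 4D camera
view3 = simple_view([0.0, 0.0, -3.0], math.pi / 2)        # 3D camera

for vertex in product([-1.0, 1.0], repeat=4):             # tesseract corners
    in_3d = project(vertex, view4)
    in_2d = project(in_3d, view3)
    print(vertex, "->", [round(c, 3) for c in in_2d])
```

Both view matrices are built once and then reused for all sixteen corners. Connecting the projected 2D points with lines gives the familiar cube-within-a-cube picture of a tesseract.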

Sources

It's hard to name sources for this, because the 3D part is basically just describing basic linear algebra and things from the OpenGL manual; so for these parts...

Your favourite linear algebra book - for all the absolute basics

OpenGL: gluPerspective() - for the basic 3D perspective matrix

Steven Richard Hollasch's thesis "Four-Space Visualization of 4D Objects" - an excellent reference for 4D projections

... the generalised n-D projections were not based on others' work, as I could not find a decent reference anywhere on the internet. That said, I'm sure there are some good linear algebra textbooks that cover these parts. I came up with these particular 5D+ projections for these projects:

If you spot any particular issues in this article, please tell me so I can fix them. Thanks!

This article is part of a series on linear algebra.

Last Modified: 2014-12-30T13:00:00Z