Interactive guide to homogeneous coordinates

Why would you care about homogeneous coordinates, whatever they are? Well, if you work with geometry (3D graphics, image processing, physical simulation), the answer is obvious. Knowing the mathematics behind your framework enables you to write more efficient code.

But even if you don’t work with geometry at all, you still might enjoy learning about projective space and its link to linear algebra. It is a great example of mathematical alchemy: you pay with a small complication, you gain an enormous simplification in return.

I think learning this particular piece of mathematics is a valuable experience in its own right. And you know how it works: more experience, higher level, better loot.

The complication

In a Cartesian coordinate system, a point on a plane is set by a pair of numbers (x_c, y_c). Here is a plot you can choose a point on. Just click or tap anywhere you want.

In homogeneous coordinates, a point on a plane is set by a tuple of 3 numbers (x_h, y_h, w_h).

This is a bit unusual and it seems excessive since every Cartesian point can be obtained from the homogeneous tuple just like this:

x_c = x_h / w_h

y_c = y_h / w_h

To translate a point from Cartesian to homogeneous coordinates, you can simply say:

x_h = x_c, y_h = y_c, w_h = 1

Or, if you're feeling adventurous, you can pick (almost) any value for w_h. Then you just multiply your x_c and y_c by w_h and there you go!

Here is a coordinate translator. It will translate your point into homogeneous coordinates for (almost) every w_h you propose.

But it wouldn't work for all the possible numbers. There is one and only one exception.

Exactly! There is no transformation between homogeneous and Cartesian coordinates when w_h is 0. Kudos for finding this by yourself!

Usually, Cartesian coordinates are just the first two of homogeneous coordinates divided by the third. So when the third one is 1, homogeneous coordinates are the same as Cartesian.
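The two conversions above can be sketched in a few lines of Python. This is only an illustration: the function names `to_homogeneous` and `to_cartesian` are made up for this example, and raising an error for w_h = 0 is just one possible way to handle the exception.

```python
def to_homogeneous(x_c, y_c, w_h=1.0):
    """Lift a Cartesian point to homogeneous coordinates with the chosen w_h."""
    if w_h == 0:
        raise ValueError("w_h = 0 does not correspond to any Cartesian point")
    return (x_c * w_h, y_c * w_h, w_h)


def to_cartesian(x_h, y_h, w_h):
    """Divide the first two coordinates by the third to get back to Cartesian."""
    if w_h == 0:
        raise ValueError("w_h = 0 does not correspond to any Cartesian point")
    return (x_h / w_h, y_h / w_h)


# The same Cartesian point has many homogeneous representations.
print(to_homogeneous(3.0, 4.0))        # w_h = 1
print(to_homogeneous(3.0, 4.0, 2.0))   # w_h = 2
print(to_cartesian(6.0, 8.0, 2.0))     # back to the original point
```

Whatever nonzero w_h you pick on the way in, dividing by it on the way out returns the same Cartesian point.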

The smaller w_h gets, the further the point in Cartesian coordinates “travels” from the origin. You can slide the point along its axis on the plot. Note that the first two coordinates remain intact all the time. You make the point slide only by altering w_h.

That’s all rather simple, up to a point. What if the third coordinate is 0 after all?

Intuition tells us that a point with w_h = 0 should be further from the origin than every point with w_h ≠ 0. All the points in Euclidean space have w_h ≠ 0, so this point should be somewhere outside Euclidean space.

And that's when it gets fascinating. Homogeneous coordinates denote points not only in Euclidean (or, more generally, affine) space but in the projective space that includes and expands the affine one. There is more geometry than fits in our cozy Cartesian system. There is the Euclidean space, and there is also an infinite number of points that are infinitely far from it.

You can imagine a point from this projective extension as a direction and not a specific point in space. A ray that starts at the origin and has no length, no end, only a direction.

This representation is often used in 3D graphics. With homogeneous coordinates, we can compose a 3D scene so that every object that can possibly be reached, like a house, a tree, or a cat, remains in the affine space with coordinates like (x, y, z, 1). All the objects that cannot be reached by design, like the moon in a racing simulator, go to the projective extension with coordinates like (x, y, z, 0). Both types of objects share the same space.
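One way to see why such objects are unreachable is to translate the whole scene and watch what moves. The sketch below uses a standard 4×4 translation matrix. The helper names `mat_vec` and `translation` and the sample coordinates are invented for this illustration.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a column vector (tuple)."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


def translation(tx, ty, tz):
    """4x4 homogeneous matrix that shifts 3D points by (tx, ty, tz)."""
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]


move = translation(10, 0, -5)

tree = (2, 0, 3, 1)   # w = 1: an ordinary, reachable point
moon = (0, 1, 0, 0)   # w = 0: a direction, infinitely far away

print(mat_vec(move, tree))  # the tree shifts by (10, 0, -5)
print(mat_vec(move, moon))  # the moon is unchanged
```

The translation column is multiplied by w, so with w = 0 it contributes nothing: no amount of driving brings the moon any closer.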

Living in a projective space gives you the benefit of being unreachable if you desire so. But that’s not all it is good for. In fact, we are only starting to get into the benefits.

1. Central and parallel projections are the same

There are two kinds of projections in Euclidean space: central and parallel. The central projection is what makes the perspective view, so things closer to a viewer seem bigger. That’s what we use in video games to render a 3D scene into a flat picture on a screen. The parallel projection preserves proportions. That’s what we usually use in CAD systems to show bolts and nuts on technical drawings so the engineers would see that the equally large details are indeed equally large regardless of the point of view.

In projective space, these two projections are the same.

In affine space, you can set the center of a central projection very, very far away from the scene you want to render. This will make the disproportion very small. But in projective space, you can hurl the center infinitely far, further away than any point in affine space at all, and the disproportion will disappear completely.

So bear in mind, if you want to make a game about zombies who happen to be CAD engineers, you don’t have to implement both kinds of projections. The central projection should be enough. Just set the central point to (x, y, z, 0), and this will automatically turn it into the parallel projection with no additional programming.
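Here is a minimal sketch of that trick, assuming the projection is written in its textbook homogeneous form: the image of a point p projected from a center c onto a plane π is (π · c) p − (π · p) c. The function names and the sample scene are hypothetical; the plane z = 0 plays the role of the screen.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(point, center, plane):
    """Project `point` from `center` onto `plane`, all in homogeneous coordinates.

    The result is a combination of point and center, so it lies on the line
    through them, and its dot product with the plane is zero by construction.
    """
    pc = dot(plane, center)
    pp = dot(plane, point)
    return tuple(pc * p - pp * c for p, c in zip(point, center))


def to_cartesian(p):
    x, y, z, w = p
    return (x / w, y / w, z / w)


screen = (0, 0, 1, 0)      # the plane z = 0
near = (1, 2, 8, 1)        # closer to the eye
far = (1, 2, 5, 1)         # same x and y, but further from the eye

eye = (0, 0, 10, 1)        # finite center: central projection
print(to_cartesian(project(near, eye, screen)))
print(to_cartesian(project(far, eye, screen)))

infinite_eye = (0, 0, 1, 0)  # center at infinity: parallel projection
print(to_cartesian(project(near, infinite_eye, screen)))
print(to_cartesian(project(far, infinite_eye, screen)))
```

With the finite eye, the nearer point lands further from the center of the screen than the farther one. With the eye at (0, 0, 1, 0), both land on the same spot, and the very same `project` function did both jobs.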

2. All the quadric surfaces are the same

I remember my first year in college. We were studying quadric surfaces and one of the exercises was to make an album with all of them. 17 sheets of paper with different graphics and formulas all drawn by hand. The main purpose of this album was to be briefly examined by the professor and thrown away a day after. What a waste!

Now in projective space, this exercise would have been much more environmentally friendly. In homogeneous coordinates, all the algebraic surfaces are homogeneous too. This means that every term of the polynomial that defines the surface has the same degree. A term may contain different variables with different degrees of their own, but they all magically add up to the very same degree for every term in the sum.

And this means only one drawing with one formula to be drawn and thrown away instead of 17. That should sum up to a couple of dead trees over the years.
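To make this concrete, here is a small worked illustration with plane curves instead of surfaces (the three curves are chosen just for this example). Substituting x = x_h / w_h and y = y_h / w_h and multiplying through by w_h² gives:

```latex
\begin{aligned}
x^2 + y^2 = 1 \quad &\longrightarrow \quad x_h^2 + y_h^2 - w_h^2 = 0 \\
y = x^2 \quad &\longrightarrow \quad y_h w_h - x_h^2 = 0 \\
x y = 1 \quad &\longrightarrow \quad x_h y_h - w_h^2 = 0
\end{aligned}
```

A circle, a parabola, and a hyperbola look quite different on the left. On the right, every term has degree 2, and all three equations have the same shape: a quadratic form in (x_h, y_h, w_h) set to zero.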

3. All projective transformations are matrices

Geometric transformations are something that happens to points. They are functions (x', y') = f(x, y). If you want to apply a transformation to some object, most of the time you would have to represent it with points and then apply a transformation to each and every one of them.

This may get computationally heavy. For instance, transforming a picture of 3 000 × 4 000 pixels requires 12 000 000 transformations. And transforming a 512 × 512 × 1024 3D image requires 268 435 456 transformations. If we want to see holographic television anytime soon, we should learn to do these transformations really, really fast.

Some of the most common transformations are translation, rotation, and scale.

They are generalized by the affine transformation, which can do translations, rotations, and scales simultaneously.

The affine transformation is quite powerful, but it has one noticeable constraint: it preserves parallelism, which is in a way limiting. If, for instance, you want to show some perspective, you need a projective transformation.

Formulas in Cartesian coordinates

The formula for projective transformations in Cartesian coordinates is:

x' = (A x + B y + C) / (a x + b y + c)

y' = (D x + E y + F) / (a x + b y + c)

It is a simple geometric transformation just like all the others we have seen before. It preserves the degree of curves and surfaces, so every straight line is transformed into a straight line and, in 3D, every plane into a plane. Second-degree surfaces stay second-degree too, but since in projective space they are all the same surface, their classification from the affine space is not preserved. An ellipsoid may become a paraboloid or a hyperboloid.

It also generalizes the affine transformations that have a simpler formula:

x' = A x + B y + C

y' = D x + E y + F

It's a special case of projective transformation for a = 0, b = 0, and c = 1.

And the affine transformations, in their turn, generalize translations, rotations, and scales. A translation is:

x' = x + C (A = 1, B = 0)

y' = y + F (D = 0, E = 1)

A rotation is:

x' = cos(r) x - sin(r) y (A = cos(r), B = -sin(r), C = 0)

y' = sin(r) x + cos(r) y (D = sin(r), E = cos(r), F = 0)

And a scale is:

x' = A x (B = 0, C = 0)

y' = E y (D = 0, F = 0)

They are all special cases of projective transformations.
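As a rough sketch, the whole family fits into one Python function with nine coefficients named as in the formulas above. The function names `projective` and `affine` and the sample numbers are made up for this example.

```python
def projective(x, y, A, B, C, D, E, F, a, b, c):
    """Apply the Cartesian projective formula to the point (x, y)."""
    denom = a * x + b * y + c
    if denom == 0:
        raise ZeroDivisionError("the point is sent to infinity")
    return ((A * x + B * y + C) / denom, (D * x + E * y + F) / denom)


def affine(x, y, A, B, C, D, E, F):
    """Affine special case: a = 0, b = 0, c = 1."""
    return projective(x, y, A, B, C, D, E, F, 0, 0, 1)


# Translation by C = 5, F = -1 (A = 1, B = 0, D = 0, E = 1).
print(affine(2, 3, 1, 0, 5, 0, 1, -1))

# Scale by A = 2 along x and E = 3 along y (everything else is 0).
print(affine(2, 3, 2, 0, 0, 0, 3, 0))

# A genuinely projective case: the denominator depends on y.
print(projective(2, 2, 1, 0, 0, 0, 1, 0, 0, 0.5, 1))
```

Each named transformation is nothing more than a particular choice of coefficients for the same function.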

The matrix multiplication in homogeneous coordinates

Let's multiply a square matrix by a point in homogeneous coordinates.

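$$
\begin{pmatrix} A & B & C \\ D & E & F \\ a & b & c \end{pmatrix}
\begin{pmatrix} x \\ y \\ w \end{pmatrix}
=
\begin{pmatrix} Ax + By + Cw \\ Dx + Ey + Fw \\ ax + by + cw \end{pmatrix}
$$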

If our point comes from the Cartesian coordinates then w = 1. Now we see that:

x' = A x + B y + C

y' = D x + E y + F

w' = a x + b y + c

To get back to the Cartesian coordinates, let's make our w' = 1. We can do this by dividing everything by w'.

x' = (A x + B y + C) / (a x + b y + c)

y' = (D x + E y + F) / (a x + b y + c)

w' = 1

Doesn't it look familiar? Well, of course, it does! It's a projective transformation. Or, with the specific coefficients, it could even be an affine one. Or it could be a translation, or a rotation, or a scale. Every one of these transformations can be conducted by mere matrix multiplication.
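The derivation above can be checked numerically. The sketch below lifts a point with w = 1, multiplies it by a 3×3 matrix, divides by w', and compares the result with the Cartesian formula. The coefficient values are arbitrary, and the helper names `mat_vec` and `transform` are made up for the example.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a column vector (tuple)."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


def transform(m, x, y):
    """Apply a 3x3 matrix to the Cartesian point (x, y) via homogeneous coordinates."""
    xh, yh, wh = mat_vec(m, (x, y, 1))   # lift with w = 1 and multiply
    if wh == 0:
        raise ZeroDivisionError("the point is sent to infinity")
    return (xh / wh, yh / wh)            # divide by w' to return to Cartesian


# Arbitrary illustrative coefficients, laid out as in the text.
A, B, C = 1, 2, 3
D, E, F = 4, 5, 6
a, b, c = 0, 1, 2
m = [[A, B, C],
     [D, E, F],
     [a, b, c]]

x, y = 2, 2
by_matrix = transform(m, x, y)
by_formula = ((A * x + B * y + C) / (a * x + b * y + c),
              (D * x + E * y + F) / (a * x + b * y + c))
print(by_matrix, by_formula)
```

Both routes give the same point, which is the whole claim of this section in executable form.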

It gets better. Matrices are composable. You can compose your own transformation, say a translation, then a rotation, another translation, then a scale and a projection, and it will still be a single matrix multiplication!
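A minimal sketch of composition, assuming column vectors (so the first transformation to apply ends up as the rightmost factor) and the standard counterclockwise rotation matrix. All helper names here are hypothetical.

```python
import math


def mat_mul(p, q):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(m, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def rotation(r):
    # Standard counterclockwise rotation by r radians.
    return [[math.cos(r), -math.sin(r), 0],
            [math.sin(r), math.cos(r), 0],
            [0, 0, 1]]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


steps = [translation(1, 0), rotation(math.pi / 2), scale(2, 2)]

# Fold all the steps into one matrix; each new step multiplies from the left.
combined = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for step in steps:
    combined = mat_mul(step, combined)

point = (1, 0, 1)

# Reference: apply the steps one after another.
one_by_one = point
for step in steps:
    one_by_one = mat_vec(step, one_by_one)

print(mat_vec(combined, point))
print(one_by_one)
```

The two printed points agree up to floating-point rounding. The difference is that `combined` is computed once, and after that every point costs a single matrix multiplication no matter how many steps went into it.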

This is particularly important because whatever you do, be it animation, image processing, or physics simulation, you always want to do as little computation as possible. Composability allows you to squeeze a series of transformations into a single matrix multiplication, which is in turn very friendly to superscalar processors. With matrices, not only do you do fewer calculations, you also benefit from vectorization on both CPU and GPU, so you do them faster. Ultra-fast transformations everywhere!

Conclusion

Pragmatically, you can save a lot of processor time transforming points with one single matrix multiplication instead of applying all the transformations separately. And you can also write fewer lines of code by exploiting the common nature of all the transformations. But this is not the whole point yet.

Usually, we lose performance not because of some small computational inefficiencies but because of all the code that shouldn't be there in the first place. One time I made a piece of code run 200 times faster by simply replacing a transformation provided by the framework with a simple matrix multiplication on the spot. The original transformation was designed like this:

Drawing.Drawing2D.Matrix.TransformPoints: Drawing.SafeNativeMethods.Gdip.ConvertPointToMemory, Drawing.SafeNativeMethods.Gdip.ConvertGPPOINTFArrayF: Drawing.UnsafeNativeMethods.PtrToStructure: Drawing.Internal.GPPOINTF..ctor, RuntimeType.CreateInstanceSlow: Runtime.InteropServices.Marshal.PtrToStructure.

Conversions between identical structures, copying data for no good reason, constructors that do no essential work: all this needless pseudo-computation costs time but gives nothing of value in return. The good news is, it's avoidable. It can only occur when programmers don’t understand and don’t trust the beauty of plain mathematics.

I hope this page reveals some of it. I hope this page makes plain mathematics a little more trustworthy.