View matrices: a straight answer

I've been following the (mostly) great modern OpenGL tutorial over at learnopengl.com. It's been very helpful so far in laying out just how to do OpenGL the modern way.

However, it is completely wrong about the view matrix. To save others from tearing their hair out over this (no thanks to the extremely unhelpful Wikipedia pages on the matter), I will explain it all here.

First of all, you need to understand what a view matrix is. It's a change of basis transformation, yes, but more precisely, it's a rotation and a translation. However, it's important to get the order right! Translation followed by rotation is NOT the same thing as rotation followed by translation. For example, consider a translation of (3, 0, 3). What does this do to the point (-3, 0, -3)? It moves it to (0, 0, 0), which means that any following rotation about the origin will have zero effect on that point! However, if you do the rotation first, then (assuming it's a non-trivial rotation) the point will cease to be (-3, 0, -3), so the translation will no longer move it to (0, 0, 0).
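To make the order dependence concrete, here is a small sketch in plain Python. The helper names (translate, rotate_y) and the choice of a quarter turn about the Y axis are my own, picked purely for illustration; any non-trivial rotation would show the same thing.

```python
import math

def translate(p, t):
    """Add the offset t to the point p, component by component."""
    return tuple(a + b for a, b in zip(p, t))

def rotate_y(p, angle):
    """Rotate the point p about the Y axis by angle (in radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

point = (-3.0, 0.0, -3.0)
offset = (3.0, 0.0, 3.0)
quarter_turn = math.pi / 2  # illustrative angle, not from any tutorial

# Translation first: the point lands on the origin, so the rotation leaves it there.
translate_then_rotate = rotate_y(translate(point, offset), quarter_turn)

# Rotation first: the point moves away from (-3, 0, -3), so the same
# translation no longer takes it to the origin.
rotate_then_translate = translate(rotate_y(point, quarter_turn), offset)

print(translate_then_rotate)
print(rotate_then_translate)
```

With these particular numbers, the first result is the origin and the second is approximately (0, 0, 6), up to floating-point rounding.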

So, how does one make a view matrix? Well, according to learnopengl.com, one multiplies a translation matrix by a rotation matrix. In other words, the rotation is applied first, then the translation. This is wrong! You want to reposition your origin first, then rotate. So, the correct multiplication is as follows:

D: camera direction vector (normalized), pointing from the target back toward the camera
R: camera right vector (normalized)
U: camera up vector (normalized)
P: camera position vector

L = | R_x R_y R_z 0 |   | 1 0 0 -P_x |
    | U_x U_y U_z 0 | * | 0 1 0 -P_y |
    | D_x D_y D_z 0 |   | 0 0 1 -P_z |
    | 0   0   0   1 |   | 0 0 0  1   |
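
If you are building this by hand, the product above can be sketched in plain Python as follows. This is only an illustration under my own assumptions: matrices are nested lists written row by row exactly as in the picture above, and mat_mul and view_matrix are hypothetical helper names, not functions from GLM or any other library.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def view_matrix(R, U, D, P):
    """Return rotation * translation, so the translation by -P is applied first."""
    rotation = [
        [R[0], R[1], R[2], 0.0],
        [U[0], U[1], U[2], 0.0],
        [D[0], D[1], D[2], 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    translation = [
        [1.0, 0.0, 0.0, -P[0]],
        [0.0, 1.0, 0.0, -P[1]],
        [0.0, 0.0, 1.0, -P[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
    return mat_mul(rotation, translation)

# Sanity check with an unrotated camera sitting at (1, 2, 3):
# the result should be a pure translation by (-1, -2, -3).
L = view_matrix((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 2.0, 3.0))
for row in L:
    print(row)
```

Multiplying it out, the upper-left 3x3 block is just the rotation, and the last column works out to (-R.P, -U.P, -D.P, 1), where "." is the dot product.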

In other words, learnopengl.com has it backwards. I suspect this error was not caught because they use the GLM library to generate their view matrix, and it takes care of all this for you. If you're trying to do it all yourself (either for educational reasons or because you aren't using C++), doing what they describe will lead to much wailing and gnashing of teeth.

Here's a quick way to check that their matrix is wrong: consider the case where the camera is at (3, 0, 3) and looking at (0, 0, 0). What is the position of (0, 0, 0) in camera space? If you do the rotation first, then you will get an answer of (-3, 0, -3), which is obviously wrong, since that point does not lie on the camera space's Z axis! This is because the rotation maps (0, 0, 0) to itself, so all you get is a translation. If you do the translation first, then you get (0, 0, -sqrt(18)), which is on the Z axis and the correct distance away (Pythagorean theorem; it's negative because camera space has the positive Z axis pointing out of the scene and into the "lens", so everything in front of the camera has a negative Z).
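
The check above is easy to run yourself. The sketch below is plain Python with hypothetical helper names of my own. It also assumes the usual way of deriving the camera axes from a position, a target, and a world up vector of (0, 1, 0): D points from the target back to the camera, R is the normalized cross product of up and D, and U is the cross product of D and R.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def rotate(p, R, U, D):
    """Apply the rotation whose rows are R, U and D to the point p."""
    return (dot(R, p), dot(U, p), dot(D, p))

P = (3.0, 0.0, 3.0)        # camera position
target = (0.0, 0.0, 0.0)   # the point the camera looks at
world_up = (0.0, 1.0, 0.0) # assumed world up vector

D = normalize(sub(P, target))      # from the target back toward the camera
R = normalize(cross(world_up, D))  # camera right
U = cross(D, R)                    # camera up

# Translation first, then rotation (the order argued for here).
translate_first = rotate(sub(target, P), R, U, D)

# Rotation first, then translation (the order being criticized).
rotate_first = sub(rotate(target, R, U, D), P)

print(translate_first)
print(rotate_first)
```

The first result sits on the camera's Z axis at a distance of sqrt(18), which is about 4.24, on the negative side. The second is just (-3, 0, -3), since rotating the origin does nothing.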

I hope this clears things up. If you're using C++, I highly recommend using GLM to do it for you.