Matrices used to be black magic to me. This year, I started working through every page and every problem in Introduction to Linear Algebra by Strang, a book which I highly recommend. It made me see the bigger picture of matrices and linear algebra. Having learned a lot, I am now convinced that Morpheus was right: “Nobody can be told what the matrix is, you have to see it for yourself.” No wonder I had such a hard time following matrices before. There is way more to matrices than their multiplication rules.
Having learned so much, I am also convinced that, especially for tech art, you can use matrices with a sense of direction without needing to spend hundreds of hours mastering linear algebra (although I strongly recommend that). There is something between the two extremes of being lost every time you see MVP in a vertex shader and manipulating n-dimensional spaces like a god. In graphics, we mostly transform three-dimensional vectors using 3x3 matrices (let’s ignore homogeneous coordinates for now). We can think intuitively about three-dimensional spaces and their properties, because they match the world we live in. Hence we can view a matrix in terms of the space spanned by its columns (more on that below). This is not that easy to do if you are dealing with a 100 by 100 matrix, which is not that uncommon in other fields. Another advantage of graphics is that we mostly deal with square matrices with linearly independent columns. That makes it easy to view them just in terms of their column space and not worry about the other subspaces of the matrix, such as its left null space.
If the above reasoning didn’t make sense to you, don’t worry. This is a very practical post with code. I just wanted to say that this way of looking at matrices is only one, partially incomplete way, which works nicely in certain scenarios but becomes incomplete or unintuitive in others.
All matrices shown here are on GitHub. They are implemented with a coordinate system visualization. I also tried to set up the environment so that it is easy to try out new matrices and switch between them. The visualization automatically animates between the matrix you choose and the identity matrix, so you can see what transformation took place. The implementation is in Unity, but the ideas have no technology dependency. Here is the code: https://github.com/IRCSS/Transformation-Matrices-Cheat-Sheet
And here is the video with all transformations. I strongly recommend watching it, since it helps with understanding and the pictures below are static.
Think of Matrices as Spaces
If you have a vector, it spans a space. For a single vector, this space is a line. How do you know that it does? You can reach every point in that space using just the vector and a scalar (a number) you multiply it with. Take the global right vector (1, 0, 0), for example. You can get every point on that line by multiplying the vector with a number. The point (2, 0, 0), for instance, is 2*(1, 0, 0).
In this case the vector (1, 0, 0) is the basis for the space it spans. There is an unlimited number of bases you could choose for your space. What is important is that you stick to the one you chose. Another basis could have been (0.5, 0, 0).
What if you have more than one vector? Two vectors span a plane. Three span the 3D space. But this is only the case if all of the vectors are linearly independent. For two vectors, that is a fancy way of saying that they are not parallel. For three vectors, it additionally means that none of them lies in the plane spanned by the other two. Going back to the (1, 0, 0) example: we add a second vector as a basis, let's say (0.5, 0, 0), which is parallel to (1, 0, 0). Both vectors span the same space, so by having both you have not gained anything new to extend your space with.
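If you want to check this numerically, here is a minimal sketch in plain Python. The helper names (`are_parallel`, `span_3d`) are made up for this example and are not part of the Unity project. It assumes 3D vectors and uses the cross product to detect parallel vectors and the scalar triple product to test whether three vectors span the whole 3D space.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def are_parallel(a, b, eps=1e-9):
    """Two 3D vectors are parallel when their cross product is the zero vector."""
    return all(abs(c) < eps for c in cross(a, b))


def span_3d(a, b, c, eps=1e-9):
    """Three vectors span 3D space when their scalar triple product is non-zero."""
    return abs(dot(a, cross(b, c))) > eps


right = (1.0, 0.0, 0.0)
half_right = (0.5, 0.0, 0.0)
up = (0.0, 1.0, 0.0)
forward = (0.0, 0.0, 1.0)

print(are_parallel(right, half_right))       # True: same line, nothing gained
print(are_parallel(right, up))               # False: together they span a plane
print(span_3d(right, up, forward))           # True: all of 3D space
print(span_3d(right, up, (1.0, 1.0, 0.0)))   # False: third vector lies in the same plane
```

The last case shows why "not parallel" is not quite enough for three vectors: (1, 1, 0) is parallel to neither right nor up, yet it adds nothing new.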
Now back to matrices. Here is a typical matrix:
I am not going to go over the multiplication rules, because you can either look them up yourself or let your engine or library of choice handle it. One thing I want to point out is that this matrix is itself made out of several vectors, which are the columns of the matrix.
These three vectors also span a space. This space is called the column space of the matrix. Here comes the part you definitely need to understand to use matrices in tech art. If you multiply a vector with a matrix, you get a vector. This vector has somehow changed, it has been transformed. Concretely, the result is the first column times the x component of your vector, plus the second column times y, plus the third column times z. The new vector has been transformed to the column space of the matrix you multiplied it with. The column space of the matrix has acted on this vector. Call it what you want, but what does it mean? Well, that depends on your frame of reference.
Frame of reference is important, for us at least. Numerically it is all the same. If our frame of observation stays the same, the multiplication causes the vector to adopt the properties of the column space of the matrix we used. For example, if the space is stretched to twice its size along an axis, the vector will be stretched the same way. But if we also move our frame of reference, so that we are viewing the vector from the column space of the matrix, then the vector doesn’t change from our perspective. It has only been expressed anew, using the columns of that matrix as basis. For example, my global up, right and forward coordinate system is nice and all, but what if I wanted to know where the ball in the room is with regard to your point of view? We could use matrices to get the new coordinates of the ball in relation to you, though the ball itself stays where it is.
In summary, multiplying a vector (or a series of them which make up a mesh) with a matrix causes a transformation to the column space of that matrix. Either the vector changes or your frame of reference (the world) does.
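To make the column space idea concrete, here is a small sketch in plain Python. The matrix values and helper names are arbitrary choices for this illustration. It multiplies a vector with a matrix the usual row-by-row way, and then again as a weighted sum of the matrix columns. Both give the same result, which is why the transformed vector always ends up in the space spanned by the columns.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def column(m, c):
    """Extract column c of a 3x3 matrix as a list."""
    return [m[r][c] for r in range(3)]


# An arbitrary example matrix, written row by row.
M = [[2.0, 0.0, 1.0],
     [0.0, 1.0, 0.0],
     [0.0, 3.0, 1.0]]
v = [1.0, 2.0, 3.0]

# The usual rule: each row of M dotted with v.
by_rows = mat_vec(M, v)

# The column view: x * first column + y * second column + z * third column.
c0, c1, c2 = column(M, 0), column(M, 1), column(M, 2)
by_columns = [v[0] * c0[r] + v[1] * c1[r] + v[2] * c2[r] for r in range(3)]

print(by_rows)      # [5.0, 2.0, 9.0]
print(by_columns)   # [5.0, 2.0, 9.0]
```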
Now let's get down to matrices. I will name the simple matrices and the type of space they span. Using these, you can build the majority of the matrices you need yourself. To visualize, I will transform a mesh with the matrices we name.
The identity matrix is the basis for everything. The space its columns span is simply your world coordinate system, your standard Cartesian coordinates, your forward, up and right. Name it what you want. This matrix is important, because by understanding it and comparing other matrices with it, you know how the space of that matrix differs from your standard space. In the same way, by comparing the number 5 to the number 1, you know 5 is 5 times bigger than 1.
Multiplying a vector with this matrix just returns the vector. It doesn’t do anything.
Uniform Scaled Matrix
If you change the numbers on the main diagonal of the matrix, you are scaling the x, y and z components of the vector you multiply with. In other words, the space the columns span is one where the axes are scaled compared to the identity column space. In the case of a mesh, you are scaling the mesh. For example, let's take 0.5 for all entries of the diagonal. Since we are changing all the diagonal entries by the same amount, the mesh will be scaled uniformly.
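As a quick sanity check, here is a minimal sketch in plain Python (the helper names and the sample point are made up for this example). It applies the identity matrix and a uniform scale of 0.5 to the same point.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def uniform_scale(s):
    """Matrix with s on the whole main diagonal and zero everywhere else."""
    return [[s, 0, 0],
            [0, s, 0],
            [0, 0, s]]


identity = uniform_scale(1)   # the identity is just a uniform scale by 1
p = [2.0, 4.0, -6.0]          # an arbitrary sample point

print(mat_vec(identity, p))            # [2.0, 4.0, -6.0]: unchanged
print(mat_vec(uniform_scale(0.5), p))  # [1.0, 2.0, -3.0]: half the size in every axis
```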
Axis Scaled Matrix
What if you only change one of the entries? Or change them independently? In that case you are not scaling uniformly in all axes, but scaling each axis by its own amount. For example, let's take the following matrix. The column space of this matrix is scaled 3 times along the x axis compared to the identity matrix. You can see that well if you compare the x axis in the picture (red arrow) to the other two.
You can also reflect or mirror your vector around an axis. This matrix’s column space is mirrored in a certain way compared to the identity. To do this, you need to multiply an entry of the diagonal by minus one.
For example, let's flip the mesh along the x axis.
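Both of these are just diagonal matrices, so they are easy to try out. The sketch below is plain Python with made-up helper names and an arbitrary sample point. It scales x by 3 and then mirrors along x.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def diagonal(sx, sy, sz):
    """Matrix that scales each axis by its own factor."""
    return [[sx, 0, 0],
            [0, sy, 0],
            [0, 0, sz]]


scale_x3 = diagonal(3, 1, 1)    # stretch the x axis to three times its length
mirror_x = diagonal(-1, 1, 1)   # flip the sign of x, leave y and z alone

p = [1, 2, 3]
print(mat_vec(scale_x3, p))                     # [3, 2, 3]
print(mat_vec(mirror_x, p))                     # [-1, 2, 3]
print(mat_vec(mirror_x, mat_vec(mirror_x, p)))  # [1, 2, 3]: mirroring twice undoes itself
```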
Matrix with Normalized Basis
So far we have only been modifying the diagonal and leaving everything else zero. Having all other entries as zero means that when calculating the new entries of the transformed vector, the x component does not contribute to the calculation of y or z, and vice versa. If we start changing other entries, all sorts of things will happen in one go. So I am going to break it down further into different scenarios.
In the first scenario, we change other entries too, but each column of the matrix is still a normalized vector.
Let's try a matrix like that. For example:
What we see is a skewing in the y direction. The only entry we are changing besides the diagonal is the second entry of the first column, or in other words, the first entry of the second row. The second row contributes to the calculation of the y value. The column space of that matrix is one where the y value is offset as a linear function of x. Let's assume the first entry of the second row was 1. That means at x=0 the y value matches the identity column space. At x=1 the y value is the y entry of identity plus 1, at x=2 plus 2, and so on. So point (0,1) will be transformed to (0,1), point (1,1) to (1,2), point (2,1) to (2,3) and so on.
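Here is the skew from the paragraph above as a small plain Python sketch. The helper names are made up, and the points get a z of 0 so that they fit a 3x3 matrix. The first entry of the second row is set to 1, so the new y is the old y plus x.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def skew_y_by_x(k):
    """Identity matrix with k as the first entry of the second row: y' = y + k * x."""
    return [[1, 0, 0],
            [k, 1, 0],
            [0, 0, 1]]


skew = skew_y_by_x(1)

for point in ([0, 1, 0], [1, 1, 0], [2, 1, 0]):
    print(point, "->", mat_vec(skew, point))
# [0, 1, 0] -> [0, 1, 0]
# [1, 1, 0] -> [1, 2, 0]
# [2, 1, 0] -> [2, 3, 0]
```

Notice that x never changes. Only y picks up a contribution from x, which is exactly what the off-diagonal entry in the second row is responsible for.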
There is nothing unique about the normalized basis as far as I can tell. You would get a similar effect even if the columns weren’t normalized. Hopefully someone will correct me if I am wrong.
Matrix with Orthogonal Basis
So what if the basis vectors are not normalized but are orthogonal with respect to each other?
Something interesting happens. Let's approach this the same way again. What properties does the column space of this matrix have? Since the new basis vectors are orthogonal, meaning they have a 90 degree angle between each other, one axis cannot possibly contribute to the effect another has in expressing a point in the space they span. This means no skewing can happen. The only difference between them and the identity matrix is that the axes are pointing somewhere else and have different lengths. Since skewing is out, we are left with only scaling, reflection and rotation. These are the possible transformations which a matrix with an orthogonal basis can do.
In the image below, notice how the axes are orthogonal, and how the mesh is not skewed, just rotated and scaled.
The column space of the matrix above was the identity matrix rotated and scaled. If we normalize the basis, we take out the scaling. What we are left with is then just the rotation. This matrix should have been called an orthonormal matrix, but since mathematicians like pain and confusion, they named it the orthogonal matrix.
So let's just normalize the matrix above, and we get our first rotation matrix. The column space of this matrix is rotated with respect to identity’s column space.
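The normalization step can be sketched in a few lines of plain Python. The matrix below is not the one from the picture. It is an example I made up whose columns are orthogonal but have lengths other than 1, and the helper names are also just for this illustration.

```python
import math


def column(m, c):
    """Extract column c of a 3x3 matrix as a list."""
    return [m[r][c] for r in range(3)]


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def columns_are_orthogonal(m, eps=1e-9):
    """True when every pair of distinct columns has a dot product of zero."""
    return all(abs(dot(column(m, i), column(m, j))) < eps
               for i in range(3) for j in range(i + 1, 3))


def normalize_columns(m):
    """Divide every column by its length, which removes the scaling."""
    lengths = [math.sqrt(dot(column(m, c), column(m, c))) for c in range(3)]
    return [[m[r][c] / lengths[c] for c in range(3)] for r in range(3)]


# Columns are (2, 2, 0), (-3, 3, 0) and (0, 0, 0.5): orthogonal, but not unit length.
M = [[2.0, -3.0, 0.0],
     [2.0,  3.0, 0.0],
     [0.0,  0.0, 0.5]]

R = normalize_columns(M)

print(columns_are_orthogonal(M))   # True
print(columns_are_orthogonal(R))   # True: normalizing does not change the angles
for row in R:
    print(row)                     # a pure rotation of 45 degrees around z
```

After normalizing, the first column is (cos 45°, sin 45°, 0), so what remains of this particular example is a 45 degree rotation around the z axis.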
So we already derived a rotation matrix above, but there are also formulas for this, which let you rotate around the x, y or z axis. Here are the formulas:
Let's take an example: we rotate around the x axis by 45 degrees.
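Here is the x axis rotation as a plain Python sketch. It assumes the common textbook convention of column vectors and a counterclockwise rotation in a right-handed coordinate system. Your engine may use a different handedness or sign, so treat the sign of the sine terms as an assumption. The function name is made up for this example.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def rotation_x(angle_degrees):
    """Rotation around the x axis (right-handed, counterclockwise, column vectors)."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0],
            [0.0,   c,  -s],
            [0.0,   s,   c]]


up = [0.0, 1.0, 0.0]
right = [1.0, 0.0, 0.0]

tilted = mat_vec(rotation_x(45), up)
print(tilted)                            # up is tilted halfway towards z
print(mat_vec(rotation_x(45), right))    # the x axis itself does not move
```

The first column of the matrix is still (1, 0, 0), which is why anything lying on the x axis stays where it is. The other two columns are the old y and z axes, rotated by 45 degrees.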
What about Translation
You might have noticed that translation is missing. Well, what we have been doing so far is a so-called linear transformation. This group of transformations has a series of neat properties, but the relevant one is that they keep the zero at zero. None of the above matrices would move the vector (0,0,0) anywhere else. This obviously makes it impossible to have a translation, because every translation would break this rule.
There is another type of transformation called an affine transformation. Every linear transformation is an affine transformation, but not the other way around. Affine transformations also have neat properties, such as keeping parallel lines parallel, but the interesting part is that they allow adding a vector after the linear transformation: v' = Mv + b. Here M is a linear transformation matrix and Mv is the result of that transformation, which is then offset by the vector b. We can have our translation there.
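A minimal plain Python sketch of that idea, with made-up names and values: M is a uniform scale by 2 and b moves everything 6 units along x. The thing to look at is what happens to the origin.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def affine(m, b, v):
    """Affine transformation: apply the linear part m, then offset by b."""
    mv = mat_vec(m, v)
    return [mv[i] + b[i] for i in range(3)]


M = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 2]]
b = [6, 0, 0]

origin = [0, 0, 0]
print(mat_vec(M, origin))              # [0, 0, 0]: a linear map can never move the origin
print(affine(M, b, origin))            # [6, 0, 0]: the affine map does
print(affine(M, b, [1, 1, 1]))         # [8, 2, 2]: scaled first, then offset
```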
There is a big problem though. Matrices have this very useful property: if you multiply matrix A with matrix B to get matrix C, multiplying a vector with C is the same as multiplying it with B and then with A. Why would you want this? Let's say you want to transform a million points (for example a mesh with a million vertices). Instead of doing A*B*VertexPosition a million times, you can do A*B = C once and C*VertexPosition a million times. The extra addition of b in an affine transformation does not fit into this scheme as long as we stay with 3x3 matrices.
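This property is easy to verify. The plain Python sketch below uses two matrices I picked for illustration: A is a 90 degree rotation around z and B scales x by 2. The helper names are made up as well. It also shows that the order of multiplication matters.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def mat_mul(a, b):
    """Multiply two 3x3 matrices: the result applies b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


A = [[0, -1, 0],   # rotate 90 degrees around z
     [1,  0, 0],
     [0,  0, 1]]
B = [[2, 0, 0],    # scale x by 2
     [0, 1, 0],
     [0, 0, 1]]

C = mat_mul(A, B)  # computed once, reused for every vertex
v = [1, 2, 3]

print(mat_vec(A, mat_vec(B, v)))   # [-2, 2, 3]: two multiplications per vertex
print(mat_vec(C, v))               # [-2, 2, 3]: one multiplication per vertex
print(mat_vec(mat_mul(B, A), v))   # [-4, 1, 3]: swapping the order gives something else
```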
As a matter of fact, render engines already do this. Let’s assume for a second that we somehow found a matrix T which does translation. Then you can build a matrix which takes your mesh from object space directly to screen space in one multiplication. (I am simplifying the picture here, ignoring NDC, clipping and perspective divide for the sake of explanation. The order of T, S and R is also made up, I don’t know what the convention is.) This matrix is the famous MVP matrix: Model, View and Projection.
People are smart. They realized you can encode translation in your matrix multiplication if you use a higher-dimensional matrix than you need. Instead of a 3x3 matrix, you use a 4x4 matrix to transform a three-dimensional point. You can’t multiply a 4x4 matrix with a three-dimensional vector though. To fix this, you convert your Cartesian coordinates to four-dimensional homogeneous coordinates. You can always switch between the two coordinate systems by following simple rules. For example, the new component you add to your vector, the w component, should be 1 if you wish to transform a point, and 0 if you wish to transform a direction (this practically turns off translation, since directions are unbound vectors and don’t have a fixed origin). Another rule is that once you are done with your matrix multiplication, you divide the xyz components of the vector by the w component to bring the coordinates back to Cartesian. In rendering this is typically referred to as the perspective divide, since this is where the effect of perspective is applied. The frustum is skewed from a homogeneous unit cube, which is convenient for clipping, back to a pyramid.
Enough explanation, what does this matrix look like? Your translation goes in the fourth column. The inner 3x3 matrix, with entries marked as a, can be any of the transformations mentioned above.
Let's try translating along the x axis by 6 units, moving the mesh 6 units to the right.
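Here is that translation as a plain Python sketch with made-up helper names. It assumes column vectors, so the translation sits in the fourth column, and the inner 3x3 part is left as identity. The same matrix is applied to a point (w = 1) and to a direction (w = 0).

```python
def mat_vec4(m, v):
    """Multiply a 4x4 matrix (stored as a list of rows) with a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """4x4 translation matrix: identity with the offset in the fourth column."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


T = translation(6, 0, 0)

point = [1, 2, 3, 1]       # w = 1: a position in space
direction = [1, 2, 3, 0]   # w = 0: a direction, which has no origin to move

print(mat_vec4(T, point))      # [7, 2, 3, 1]: moved 6 units along x
print(mat_vec4(T, direction))  # [1, 2, 3, 0]: translation is switched off
```

The w component is multiplied with the fourth column, so w = 1 adds the translation exactly once and w = 0 removes it entirely.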
Projection matrices feel like magic, though they are not. They do the exact same thing as what we have discussed so far. They transform the point into a new space with a very specific set of criteria. Together with the perspective divide, this is a non-affine transformation, since the xyz components are scaled as a function of their depth in front of the camera.
Going in depth is beyond this article though, since projection matrices are deeply interconnected with rendering and how the camera space is set up, as well as with clipping and the z buffer. I tried visualizing it, but the best I got was projecting a 3D shape onto a 2D plane. You can find this in the code under Projection.
I highly recommend reading Scratchapixel's entry on the projection matrix. One thing I would say though is that if you have been following along with this article, you will already notice some of the things the matrix does. The example below is the standard OpenGL projection matrix. Notice how the diagonal entries are used to scale the objects as a function of the left, right, top and bottom planes. This is, for example, to simulate the field of view of a real camera, by scaling the world to fit more into the frustum.
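To see the scaling by depth in numbers, here is a plain Python sketch of the classic OpenGL frustum matrix (the one documented for `glFrustum`), followed by the perspective divide. It assumes OpenGL conventions: column vectors, a camera looking down the negative z axis, and NDC ranging from -1 to 1. The function names and the frustum values are made up for this example.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 matrix (stored as a list of rows) with a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def frustum(l, r, b, t, n, f):
    """OpenGL style perspective matrix from left, right, bottom, top, near, far."""
    return [[2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
            [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
            [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
            [0.0, 0.0, -1.0, 0.0]]


def project(m, point):
    """Transform a 3D point and apply the perspective divide to get NDC."""
    x, y, z, w = mat_vec4(m, [point[0], point[1], point[2], 1.0])
    return [x / w, y / w, z / w]


P = frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)

near_corner = project(P, [1.0, 1.0, -1.0])   # top right corner of the near plane
further_away = project(P, [1.0, 1.0, -2.0])  # same x and y, twice as far away
far_center = project(P, [0.0, 0.0, -10.0])   # center of the far plane

print(near_corner)    # lands on the edge of the NDC cube
print(further_away)   # x and y shrink to half: that is perspective
print(far_center)     # z reaches the far end of the NDC range
```

The last row of the matrix copies the negated z into w, so the divide by w is what makes distant things smaller. The multiplication on its own is still an ordinary 4x4 matrix product.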
Some Extra Details
Here is some random stuff I would like to add on top.
Arbitrary axis: if you want to do any of the above transformations, but not with respect to the origin and around the standard axes, you first need to transform the point so that your desired point is on the origin and your desired axis is aligned with one of the standard ones. I will do another blog post on this later with code and examples.
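Until then, here is a rough plain Python sketch of the simplest case: rotating around a pivot point that is not the origin. It only moves the pivot to the origin, rotates, and moves back. It does not cover aligning an arbitrary axis. The pivot, the 90 degree rotation around z and the helper names are all assumptions made for this illustration.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 matrix (stored as a list of rows) with a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def mat_mul4(a, b):
    """Multiply two 4x4 matrices: the result applies b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    """4x4 translation matrix with the offset in the fourth column."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


# 90 degree rotation around z in homogeneous form (exact integer entries).
rotate_z_90 = [[0, -1, 0, 0],
               [1,  0, 0, 0],
               [0,  0, 1, 0],
               [0,  0, 0, 1]]

pivot = (5, 0, 0)

# Read right to left: move the pivot to the origin, rotate, move back.
to_origin = translation(-pivot[0], -pivot[1], -pivot[2])
back = translation(pivot[0], pivot[1], pivot[2])
around_pivot = mat_mul4(back, mat_mul4(rotate_z_90, to_origin))

print(mat_vec4(around_pivot, [6, 0, 0, 1]))  # [5, 1, 0, 1]: swung around the pivot
print(mat_vec4(around_pivot, [5, 0, 0, 1]))  # [5, 0, 0, 1]: the pivot itself stays put
```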
Row or Column Major: Remember how I said people like pain? There are two conventions in the matrix world. As far as math is concerned, it doesn’t really matter which one you take, people go with the one they learned. You can easily switch between them. Some libraries and APIs let you choose which one you want. As far as computer science is concerned, whichever you take should make sense with regard to the memory layout of your data, its access patterns and cache coherency.
Transpose: You might have seen this often too. It flips the matrix, columns become rows. Of course I could write a book on this, but I would just be rewriting what I read. So if you want to know more, go and read Introduction to Linear Algebra.
Inverse: It lets you undo whatever your matrix transformation did. It basically does the opposite.
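For one special case you can see transpose and inverse working together without any heavy machinery: for a pure rotation matrix (orthonormal columns), the transpose is the inverse. The plain Python sketch below uses a 30 degree rotation around z as an arbitrary example, with made-up helper names. A general matrix needs a proper inversion routine from your engine or math library, which this sketch does not attempt.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (stored as a list of rows) with a 3D column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m):
    """Flip a 3x3 matrix so that its columns become its rows."""
    return [[m[c][r] for c in range(3)] for r in range(3)]


def rotation_z(angle_degrees):
    """Rotation around the z axis (right-handed, counterclockwise, column vectors)."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c,  -s, 0.0],
            [s,   c, 0.0],
            [0.0, 0.0, 1.0]]


R = rotation_z(30)
R_inverse = transpose(R)   # only valid because R is a pure rotation

v = [1.0, 2.0, 3.0]
rotated = mat_vec(R, v)
restored = mat_vec(R_inverse, rotated)

print(rotated)    # v turned by 30 degrees around z
print(restored)   # back to v, up to floating point rounding
```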
Determinant: … This is probably the best time to just go and buy a linear algebra book. I am going to stop here.
Thanks for reading. You can follow me on Twitter: IRCSS
- Link to my favourite linear algebra book: http://math.mit.edu/~gs/linearalgebra/
- A fantastic series on YouTube which explains the matrix transformations: https://youtu.be/XkY2DOUCWMU
- Affine Transformations: https://en.wikipedia.org/wiki/Affine_transformation
- An amazing article on projection matrices from Scratchapixel: https://www.scratchapixel.com/lessons/3d-basic-rendering/perspective-and-orthographic-projection-matrix