People must get taught math terribly if they think "I don't need to worry about piles of abstract math to understand a rotation, all I have to do is think about what happens to the XYZ axes under the matrix rotation". That is what you should learn in the math class!
Anyone who has taken linear algebra should know that (1) a rotation is a linear operation, (2) the result of a linear operation is calculated with matrix multiplication, (3) the result of a matrix multiplication is determined by what it does to the standard basis vectors, the results of which form the columns of the matrix.
This guy makes it sound like he had to come up with these concepts from scratch, and it's some sort of pure visual genius rather than math. But... it's just math.
A lot of people who find themselves having to deal with matrices when programming have never taken that class or learned those things (or did so such a long time ago that they've completely forgotten). I assume this is aimed at such people, and he's just reassuring them that he's not going to talk about the abstract aspects of linear algebra, which certainly exist.
I'd take issue with his "most programmers are visual thinkers", though. Maybe most graphics programmers are, but I doubt it's an overwhelming majority even there.
Math achievement correlates strongly with visuospatial reasoning. Programmers may not be as proficient in math as economists, but they are better at it than biologists or lawyers.
I would distinguish between visual imagination and visuospatial reasoning.
For people like myself with aphantasia, there are often problem-solving strategies that can help when you can't visualize. Like drawing a picture.
And lots of problems don’t really require as much visual imagination as you would think. I’m pretty good at math, programming, and economics. Not top tier, but pretty good.
If there are problems out there that you struggle with compared to others, then that’s the universe telling you that you don’t have a comparative advantage in it. Do something else and hire the people who can more easily solve them if you need it.
Do you have anything I can read about that? I'm definitely on the spectrum and have whatever the opposite of aphantasia is, I can see things very clearly in my head
This is interesting because, to me, programming is a deeply visual activity. It feels like wandering around in a world of forms until I find the structures I need, and actually writing out the code is mostly a formality.
I have taken several linear algebra courses, one in high school and two at universities. The thing is, not all linear algebra courses will discuss rotations the way you describe. One reason is that a high school linear algebra course sometimes cannot assume students have learned trigonometry; I've seen teachers teach the subject just to solve larger linear systems of equations. Another reason is that a course will sometimes focus purely on properties of vector spaces without relating them to geometry; after all, who can visualize things when the course routinely deals with 10-dimensional vectors, or N-dimensional ones where N isn't a constant?
When I was studying and made the mistake of choosing 3D computer graphics as a lecture, I remember some 4x4 matrix that was used for rotation, with all kinds of weird terms in it, derived only once, in a way I was not able to understand and that didn't relate to any visual idea or imagination, which makes it extra hard for me to understand it, because I rely a lot on visualization of everything. So basically, there was a "magical formula" to rotate things and I didn't memorize it. Exam came and demanded having memorized this shitty rotation matrix. Failed the exam, changed lectures. High quality lecturing.
Later, in another lecture at another university, I had to rotate points around a center point again. This time I found three 3x3 matrices on Wikipedia, one for each axis. They maybe made at least seemingly a little more sense, but I think I never got to the basis of that stuff. I've never seen a good visual explanation of it. I ended up implementing the three matrix multiplications and checking the 3D coordinates coming out of them in my head, by visualizing and thinking hard about whether the coordinates could be correct.
I think visualization is the least of my problems. Most math teaching sucks though, and sometimes it is just the wrong format or not visualized at all, which makes it very hard to understand.
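For what it's worth, the three axis-rotation matrices described above are easy to write down and sanity-check numerically. A minimal numpy sketch (the angles and the test point are my choice, not from the original comment):

```python
import numpy as np

def rot_x(a):
    # Rotation about the x axis by angle a (radians): x stays fixed,
    # y and z rotate within the y-z plane.
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0,  0],
                     [0, c, -s],
                     [0, s,  c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

# Rotating (1, 0, 0) by 90 degrees about z should give (0, 1, 0),
# which you can confirm in your head the same way as the comment describes.
p = rot_z(np.pi / 2) @ np.array([1.0, 0.0, 0.0])
```

To rotate around a center point c rather than the origin, subtract c, apply the matrix, then add c back.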
The first lecture was using a 4x4 matrix because you can use it for a more general set of transformations, including affine transforms (think: translating an object by moving it in a particular direction).
Since you can combine a series of matrix multiplications by just pre-multiplying the matrix, this sets you up for doing a very efficient "move, scale, rotate" of an object using a single matrix multiplication of that pre-calculated 4x4 matrix.
If you just want to, e.g., scale and rotate the object, a 3x3 matrix suffices. Sounds like your first lecture jumped way too fast to the "here's the fully general version of this", which is much harder to build intuition for.
Sorry you had a bad intro to this stuff. It's actually kinda cool when explained well. I think they probably should have started by showing how you can use a matrix for scaling:
[[2, 0, 0],
[0, 1.5, 0],
[0, 0, 1]]
for example, will grow an object by 2x in the x dimension, 1.5x in the y dimension, and keep it unchanged in the z dimension. (You'll note that it follows the pattern of the identity matrix.) The rotation matrix is probably best first derived in 2D; the Wikipedia article has a decentish explanation: https://en.wikipedia.org/wiki/Rotation_matrix
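A quick numpy check of that scaling matrix, plus the 2D rotation matrix the article derives (the test values are mine):

```python
import numpy as np

S = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.5, 0.0],
              [0.0, 0.0, 1.0]])

# Each column is where the corresponding standard basis vector lands,
# so (1, 1, 1) is stretched to (2, 1.5, 1):
p = S @ np.array([1.0, 1.0, 1.0])

def rot2(a):
    # The 2D rotation matrix: columns are the images of the x and y
    # basis vectors after rotating by angle a.
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s],
                     [s,  c]])
```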
The first time I learned it was from a book by LaMothe in the 90s and it starts with your demonstration of 3D matrix transforms, then goes "ha! gimbal lock" then shows 4D transforms and the extension to projection transforms, and from there you just have an abstraction of your world coordinate transform and your camera transform(s) and most everything else becomes vectors. I think it's probably the best way to teach it, with some 2D work leading into it as you suggest. It also sets up well for how most modern game dev platforms deal with coordinates.
You can do a rotation or some rotations but SO(3) is not simply connected.
It mostly works for rigid bodies centered on the origin, but gimbal lock and Dirac's plate trick are good counterexample lenses. Twirling a baton or a lasso will show that 720 degrees is the invariant rotation in SO(3).
The point at infinity with a 4x4 matrix is one solution; SU(2), quaternions, or more recently the geometric product are other options with benefits, at the cost of complexity.
I think you are confused about what 'simply connected' means. A 3x3 matrix can represent any rotation. Also from a given rotation there is a path through the space of rotations to any other rotation. It's just that some paths can't be smoothly mapped to some other paths.
In computer graphics, 4x4 matrices let you do a rotation and a translation together (among other things). There's the 3x3 rotation block you found later as well as a translation vector embedded in it. Multiplying a sequence of 4x4 matrices together accumulates the rotations and translations appropriately as if they were just a bunch of function applications. i.e. rotate(translate(point)) is just rotation_matrix * translation_matrix * point_vector if you construct your matrices properly. Multiplying a 4x4 matrix with another 4x4 matrix yields a 4x4 matrix result, which means that you can store an arbitrary chain of rotations and translations accumulated together into a single matrix...
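A minimal sketch of that accumulation, assuming column vectors and numpy (the helper names are my choice):

```python
import numpy as np

def translation(tx, ty, tz):
    # 4x4 homogeneous matrix: identity 3x3 block plus a translation
    # vector embedded in the last column.
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(a):
    # 4x4 homogeneous matrix with a 3x3 rotation block and no translation.
    R = np.eye(4)
    c, s = np.cos(a), np.sin(a)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

point = np.array([1.0, 0.0, 0.0, 1.0])  # w = 1 marks a point, not a direction

# rotate(translate(point)) collapses into a single pre-multiplied matrix:
M = rotation_z(np.pi / 2) @ translation(2.0, 0.0, 0.0)
p1 = M @ point
p2 = rotation_z(np.pi / 2) @ (translation(2.0, 0.0, 0.0) @ point)
# p1 == p2: the whole chain is stored in the one 4x4 matrix M.
```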
Yeah you need to build up the understanding so that you can re-derive those matrices as needed (it's mostly just basic trigonometry). If you can't, that means a failure of your lecturer or a failure in your studying.
The mathematical term for the four by four matrices you were looking at is "quaternion" (i.e. you were looking at a set of four by four matrices isomorphic to the unit quaternions).
Why use quaternions at all, when three by three matrices can also represent rotations? Three by three matrices contain lots of redundant information beyond rotation, and multiplying quaternions requires fewer scalar additions and multiplications than multiplying three by three matrices. So it is cheaper to compose rotations. It also avoids singularities (gimbal lock).
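A hedged sketch of quaternion composition via the Hamilton product, using the (w, x, y, z) convention; the helper names and the test rotation are my choice:

```python
import numpy as np

def qmul(q1, q2):
    # Hamilton product: 16 scalar multiplies and 12 adds, versus
    # 27 multiplies for composing two 3x3 rotation matrices.
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qrotate(q, v):
    # Rotate vector v by unit quaternion q via q * (0, v) * conj(q).
    w, x, y, z = q
    conj = np.array([w, -x, -y, -z])
    p = np.concatenate(([0.0], v))
    return qmul(qmul(q, p), conj)[1:]

def axis_angle(axis, a):
    # Unit quaternion for rotation by angle a about a unit axis.
    axis = np.asarray(axis, dtype=float)
    return np.concatenate(([np.cos(a / 2)], np.sin(a / 2) * axis))

# Composing two quarter-turns about z is a half-turn: (1,0,0) -> (-1,0,0).
q = qmul(axis_angle([0, 0, 1], np.pi / 2), axis_angle([0, 0, 1], np.pi / 2))
v = qrotate(q, np.array([1.0, 0.0, 0.0]))
```

Note the half-angle in axis_angle: it is the reason a quaternion only returns to itself after 720 degrees, the double-cover behavior mentioned elsewhere in the thread.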
Honestly, many math teachers are kinda bad at conveying all that.
When everything clicked a few years down the line it all became so simple.
Like you mention with "linear operation": the word "linear" doesn't always make intuitive sense in terms of rotations or scaling if you have only encountered simple 1- or 2-dimensional linear transformations when doing more basic graphics programming.
As a teacher, I think the biggest lesson I had to learn was to always have at least 3 different ways of explaining everything to give different kinds of people different entrypoints into understanding concepts.
For someone uninitiated, a term like "basis vector" can be pure gibberish if it doesn't follow an example of a transform as a viewport change, and it needs to be repeated after your other explanations (for example, how vector components in the source view are just scalars on the basis vectors when multiplied with a matrix, rather than a heavy, unintuitive concept).
Math is just a standardized way to communicate those concepts though, it's a model of the world like any other. I get what you mean, but these intuitive or visualising approaches help many people with different thinking processes.
Just imagine that everyone has equal math ability, except that the standard model of math and the representations of mathematical concepts and notation are made more for a certain type of brain than for others. These kinds of explanations allow bringing those people in as well.
I don't think there's any mathematical reason to lay out the elements in memory that way. Sure, given no context I would probably use i = row + n * col as the index, but it doesn't really matter much to me.
If I had to pick between a matrix being a row of vectors or a column of covectors, I'd pick the latter. And M[i][j] should be the element in row i column j, which is nonnegotiable.
> Mathematicians like to see their matrices laid out on paper this way (with the array indices increasing down the columns instead of across the rows as a programmer would usually write them).
Could a mathematician please confirm or disconfirm this?
I think that different branches of mathematics have different rules about this, which is why careful writers make it explicit.
Not a mathematician, just an engineer who has used matrices a lot (and even worked for MathWorks at one point), but I would say that most mathematicians don't care. Matrices are 2D; there is no good way to lay them out in 1D (which is what is done here, by giving them linear indices). They should not be represented in 1D.
The only types of mathematicians who actually care are:
- the ones who use software where picking one or the other "incorrectly" may impact performance significantly. Or worse, the ones who use software packages that don't all make the same arbitrary choice (column-major vs row-major). And when I say they care, it's probably a pain for them to think about it.
- the ones who write that kind of software (they may describe themselves as software engineers, but some may still call themselves mathematicians, applied mathematicians, or other things like that).
Now maybe what the author wanted to say is that the languages favored by mathematicians (Fortran, MATLAB, Julia, R) are column-major, while the languages favored by computer scientists (C, C++) are row-major.
What I suspect he really means is that FORTRAN lays out its arrays column-major, whilst C chose row-major. Historically most math software was written in the former, including the de facto standard BLAS and LAPACK APIs used for most linear algebra. Mixing and matching memory layouts is a recipe for confusion and bugs, so "mathematicians" (which I'll read as people writing a lot of non-ML matrix-related code) tend to prefer to stick with column-major.
Of course things have moved on since then and a lot of software these days is written in languages that inherited their array ordering from C, leading to much fun and confusion.
The other gotcha with a lot of these APIs is of course 0 vs 1-based array numbering.
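The two layouts only differ in which index moves fastest through memory; a small sketch of the two index formulas (names are mine):

```python
def row_major_index(i, j, n_cols):
    # C-style: the elements of a row are contiguous, so the column
    # index varies fastest as you walk through memory.
    return i * n_cols + j

def col_major_index(i, j, n_rows):
    # Fortran/MATLAB/BLAS-style: the elements of a column are
    # contiguous, so the row index varies fastest.
    return j * n_rows + i

# The same 2x3 matrix flattens two different ways:
# [[a, b, c],      row-major:    [a, b, c, d, e, f]
#  [d, e, f]]      column-major: [a, d, b, e, c, f]
```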
I'm a mathematician. It's kind of a strange statement since, if we are talking about a matrix, it has two indices not one. Even if we do flatten the matrix to a vector, rows then columns are an almost universal ordering of those two indices and the natural lexicographic ordering would stride down the rows.
At this point might as well make them match the x/y convention, with first index increasing to the right, and second index increasing from bottom to top.
(In many branches the idea is that you care about the abstract linear transformation and its properties instead of the dirty coefficients that depend on the specific basis. I don't expect a mathematician to have a strong opinion on the order. All are equivalent via isomorphism.)
Most fields of math that use matrices don't number each element of the matrix separately, and if they do there will usually be two subscripts (one for the row number and one for the column number).
Generally, matrices would be thought of in terms of the vectors that make up each row or column.
There are a lot more ways to look at and understand these mysterious beasts called matrices. They seem to represent a more fundamental, primordial truth; I'm not sure what it is. The determinant of a matrix indicates the area or volume spanned by its component vectors. The complex matrices used in the Fourier transform are beautiful. Quantum mechanics and AI seem to be built on matrices. There is hardly any area of mathematics that doesn't utilize matrices as tools. What exactly is a matrix? Just a grid of numbers? I don't think so.
>[Matrices] seem to represent a more fundamental primordial truth.
No, matrices (or more specifically matrix multiplication) are a useful result picked out of a huge search space defined as "all the ways to combine piles of numbers with arithmetic operators". The utility of the discovery is determined by humans looking for compact ways to represent ideas (abstraction). One of the most interesting anecdotes in the history of linear algebra was how Hamilton finally "discovered" a way to multiply them. "...he was out walking along the Royal Canal in Dublin with his wife when the solution in the form of the equation i² = j² = k² = ijk = −1 occurred to him; Hamilton then carved this equation using his penknife into the side of the nearby Broom Bridge" [0]
The "primordial truth" is found in the selection criteria of the human minds performing the search.
[0] https://en.wikipedia.org/wiki/William_Rowan_Hamilton
The fundamental truth is that matrices represent linear transformations, and all of linear algebra is developed in terms of linear transformations rather than just grid of numbers. It all becomes much clearer when you let go of the tabular representation and study the original intentions that motivated the operations you do on matrices.
My appreciation for the subject grew considerably after working through the book "Linear Algebra Done Right" by Axler: https://linear.axler.net
Spatial transformations? Take a look at the complex matrices in Fourier transforms with nth roots of unity as its elements. The values are cyclic, and do not represent points in an n-D space of Euclidean coordinates.
Yes; I wrote linear transformation on purpose not to remain constrained on spatial or geometric interpretations.
The (discrete) Fourier transform is also a linear transformation, which is why the initial effort of thinking abstractly in terms of vector spaces and transformations between them pays lots of dividends when it's time to understand more advanced topics such as the DFT, which is "just" a change of basis.
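That view is easy to make concrete: the DFT matrix is literally a grid of nth roots of unity, and applying it as a plain matrix-vector product matches a library FFT. A numpy sketch (my construction, using the usual sign convention):

```python
import numpy as np

def dft_matrix(n):
    # W[j, k] = exp(-2*pi*i*j*k / n): every entry is an nth root of unity.
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

x = np.array([1.0, 2.0, 3.0, 4.0])
X = dft_matrix(4) @ x  # the DFT as a plain matrix-vector product
# The FFT is just a fast algorithm for applying this particular
# linear transformation: X matches np.fft.fft(x).
```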
A lot of areas use grids of numbers. And matrix theory actually incorporates every area that uses grids of numbers, and every rule in those areas.
Matrix multiplication, perhaps the simplest difficult thing in matrix theory, is an example of this IMO. It looks really weird in the context of grids of numbers: its properties seem incidental, and the proofs are complicated. But matrix multiplication is really simple and natural in the context of linear transformations between vector spaces.
"...linear transformations between vector spaces."
When you understand what that implies you can start reasoning about it visually.
The three simplest transformations (you can find them in Blender or any other 3D program, and partly even in 2D programs):
Translation (moving something left, right, up, down, in, out)
Rotation (turning something 2 degrees, 90 degrees, 180 degrees, 360 degrees back to the same heading)
Scaling (making something larger, smaller, etc.)
(And a few more that don't help right now.)
The first two can be visualized simply in 2D: just take a paper/book/etc. and move it left-right or up-down, or rotate it. The book in its original position and rotation, compared to its new position and rotation, can be described as a vector space transformation. Why?
Because you can look at it in two ways: either the book moved from your vantage point, or you follow the book, looking at it the same way, and the world around the book moved.
In both cases, something moved from one space (point of reference) to another "space".
The things that define the space are "basis vectors": basically, they say what is "up", what is "left", and what is "in" as we move from one space to another.
Think of it this way: you have a card on a piece of paper. Draw a line/axis along the bottom edge as the X count, then draw one up the left side as the Y count. In the X,Y space (the "from" space) you count the X and Y steps to various feature points.
Now draw the "to" space as another X axis and another Y axis (they could be rotated, scaled, or just moved), take those counts in steps, and plot them inside the "to" space, measured in the same units as in the "from" space.
Once the feature points are replicated in the "to" space you should have the same image as before, just within the new space.
This is the essence of a so-called linear (equal number of steps) transform (moved somewhere else), and also exactly what multiplying a set of vectors by a matrix achieves (simplified: in this context, the matrix really is mostly a representation of the basis vectors mentioned above, which define the X, Y, etc. of the movement).
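Counting steps along the new basis vectors is literally what the matrix-vector product computes; here is a 2D numpy sketch of the idea (the particular "to" space is my choice):

```python
import numpy as np

# The "to" space: where the old x and y axes end up. A 90-degree
# rotation sends x to (0, 1) and y to (-1, 0).
x_basis = np.array([0.0, 1.0])
y_basis = np.array([-1.0, 0.0])

# The matrix is just those basis vectors written as columns.
M = np.column_stack([x_basis, y_basis])

# A feature point at "2 steps along x, 1 step along y" in the "from"
# space lands at 2*x_basis + 1*y_basis in the "to" space:
p = M @ np.array([2.0, 1.0])
# p equals 2*x_basis + 1*y_basis = (-1, 2).
```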
The set of all matrices of a fixed size is a vector space, because matrix addition and scalar multiplication are well-defined and satisfy all the vector space axioms.
But be careful of the map–territory relation.
If you can find a model that is a vector space, you can extend that to an inner product space, and extend that to a Hilbert space; nice things happen.
Really the amazing part is finding a map (model) that works within the superpowers of algorithms, which often depends upon finding many to one reductions.
Get stuck with a hay in the haystack problem and math as we know it now can be intractable.
Vector spaces are nice, and you can map them to abstract algebra, categories, or topoi and see why. I encourage you to dig into the above.
A matrix is just a list of where a linear map sends each basis element (the nth column of a matrix is the output vector for the nth input basis vector). Lots of things are linear (e.g. scaling, rotating, differentiating, integrating, projecting, and any weighted sums of these things). Lots of other things are approximately linear locally (the derivative if it exists is the best linear approximation. i.e. the best matrix to approximate a more general function), and e.g. knowing the linear behavior near a fixed point can tell you a lot about even nonlinear systems.
Yes, I think of them as saying "and this is what the coordinates in our coordinate system [basis] shall mean from now on". Systems of nonlinear equations, on the other hand, are some kind of sea monsters.
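The "where does each basis vector go" recipe above works even for non-geometric linear maps. As a sketch, here is differentiation of quadratics built column by column (the basis 1, x, x² and the coefficient layout are my choice):

```python
import numpy as np

# Represent a polynomial a + b*x + c*x**2 by its coefficients (a, b, c).
# Differentiation is linear, so it has a matrix: column n is the
# derivative of the nth basis element (1 -> 0, x -> 1, x**2 -> 2x).
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

# d/dx (3 + 5x + 7x^2) = 5 + 14x:
coeffs = D @ np.array([3.0, 5.0, 7.0])
```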
The age doesn't affect this matrix part, but just FYI that any specific APIs discussed will probably be out of date compared to modern GPU programming.
I remember reading that there's a link between aphantasia (inability to visualize) and being on the spectrum.
Being an armchair psychologist expert with decades of experience, I can say with absolute certainty that a lot of programmers are NOT visual thinkers.