So we can invert a matrix using Cayley-Hamilton as long as it is invertible. Great video!
Can a non-invertible matrix be converted to invertible? (so that my disappointment can be diverted) I still am grieving over the incompatibility of determinants and non- square matrices 😢 It hurts that linear algebra exerts these conditions on our brains.
@@9WEAVER9 what does "converted" mean here
@@9WEAVER9 Statisticians use a concept called the “generalized inverse”.
@@charlesbrowne9590 Like the Moore Penrose pseudoinverse?
@@MichaelMaths_ Yes, exactly. I had in mind ‘Generalized Inverse … ‘ by Rao.
One-line proofs:
N = M - 3I satisfies N² = 0, hence M^10 = (N+3I)^10 = 3^10 I + 10*3^9 N (binomial theorem; all higher powers of N vanish).
Also, in the spirit of 1/(1-x) = sum x^n (the series terminates here because N² = 0), we have: M^{-1} = 1/3 (I+N/3)^{-1} = 1/3 (I-N/3) = 1/9 (3I-N).
More generally, the Dunford decomposition states that for every matrix A (over ℂ) there is a pair (D, N) such that A = D+N, DN=ND, N nilpotent and D diagonalizable. This is very convenient if you need to calculate A^k.
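A quick numeric check of these identities (the 2x2 matrix below is only a stand-in with characteristic polynomial x² - 6x + 9, since the video's entries aren't quoted in this thread):

```python
import numpy as np

# Hypothetical stand-in for the video's matrix: any 2x2 M with trace 6 and
# determinant 9 (so its characteristic polynomial is x^2 - 6x + 9) will do.
M = np.array([[4.0, 1.0],
              [-1.0, 2.0]])
I = np.eye(2)

N = M - 3 * I                       # nilpotent part: N @ N is the zero matrix
assert np.allclose(N @ N, 0)

# Binomial theorem with N^2 = 0: M^10 = 3^10 I + 10 * 3^9 N
M10_formula = 3**10 * I + 10 * 3**9 * N
assert np.allclose(M10_formula, np.linalg.matrix_power(M, 10))

# Terminating geometric series: M^-1 = (1/9)(3I - N)
Minv_formula = (3 * I - N) / 9
assert np.allclose(Minv_formula @ M, I)
```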
In this case you could prove by induction that
M^n = 3^(n-1) (nM - 3(n-1)I)
What amazes me is, when you’ve come that far, you’re not even interested anymore as to what the underlying object M is. This is just algebra
Nice problem and solution.
Whilst watching the video, I had the feeling that there must be a simple formula for Mⁿ. Then I realised that if we convert the characteristic polynomial equation into a recursively defined sequence (in the form Mⁿ⁺²=6Mⁿ⁺¹-9Mⁿ with initial terms I and M), then solve it as if it were a recursively defined sequence of real numbers (uₙ₊₂=6uₙ₊₁-9uₙ, u₀=1, u₁=m), then the resulting formula should work for the matrix M as well (as the matrix algebra here works just like real algebra), giving Mⁿ=3ⁿ⁻¹(nM-3(n-1)I), which we can certainly prove rigorously by induction.
Alas, I see from other comments that writing M=3I+N where N²=0 and applying the binomial theorem reaches the same formula much more quickly.
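A quick sanity check of that closed form against direct powers, using the same kind of stand-in matrix (any 2x2 with characteristic polynomial x² - 6x + 9):

```python
import numpy as np

# Stand-in matrix with characteristic polynomial x^2 - 6x + 9 (trace 6, det 9).
M = np.array([[4.0, 1.0],
              [-1.0, 2.0]])
I = np.eye(2)

# Closed form from the recurrence: M^n = 3^(n-1) (n M - 3(n-1) I)
for n in range(1, 12):
    closed_form = 3**(n - 1) * (n * M - 3 * (n - 1) * I)
    assert np.allclose(closed_form, np.linalg.matrix_power(M, n))
```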
Very interesting point of view
At first I thought it wouldn't work for larger matrices, but then I realised there will just be more powers in the sum (for a 3x3 matrix every integer power can be represented as a combination of M^2, M and I).
Very useful video. Well presented.
This is usable only for tiny matrices when you need to calculate it by hand. If you want to implement it as an algorithm, it's much easier to perform binary exponentiation on the matrix directly rather than calculating the coefficients of the characteristic polynomial and using it repeatedly to reduce the power of the matrix.
If we are working with N×N matrices, we need at least O(N·Mult(N)) work (N matrix multiplications) to get the powers of the matrix from 1 to N-1 for the polynomial, while binary exponentiation needs only O(Mult(N) log P).
(Here Mult(N) is the cost of multiplying two N×N matrices and P is the power of the matrix we are calculating.)
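For concreteness, a minimal square-and-multiply sketch; the matrix here is just an illustrative stand-in:

```python
import numpy as np

def mat_pow(M: np.ndarray, p: int) -> np.ndarray:
    """Compute M^p with O(log p) matrix multiplications (square-and-multiply)."""
    result = np.eye(M.shape[0], dtype=M.dtype)
    base = M.copy()
    while p > 0:
        if p & 1:                  # current bit of the exponent is set
            result = result @ base
        base = base @ base         # square for the next bit
        p >>= 1
    return result

M = np.array([[4.0, 1.0], [-1.0, 2.0]])
assert np.allclose(mat_pow(M, 10), np.linalg.matrix_power(M, 10))
```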
To square a polynomial in M we need O(N^2) naively, while squaring the matrix itself is O(N^3) naively. Faster matrix multiplication runs in O(N^d) with d roughly 2.37 (Strassen alone gives about 2.81), while faster polynomial multiplication is O(N log N). But we also need to reduce the product to lower powers using the characteristic polynomial, which looks like it is also O(N log N). So, if p is the power, then given the characteristic polynomial and the powers of M up to N-1 inclusive, we can compute M^p in O(log p · N log N + N^3), where the N^3 comes from adding up the powers of M below N multiplied by the corresponding coefficients.
The only problem is getting the characteristic polynomial efficiently: the naive route sums N! products, each a polynomial of degree up to N. We also need all powers of M below N, which is naively O(N^4). Anyway, all of this is tricky and probably only pays off when the exponent is in the millions or more, I guess. So binary exponentiation is the much easier way to go.
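A small sketch of that polynomial route for the 2x2 running example, assuming its characteristic polynomial x² - 6x + 9 and a stand-in matrix with those invariants; x^p is reduced modulo the characteristic polynomial by square-and-multiply, and the reduced polynomial is then evaluated at M:

```python
import numpy as np

CHAR_POLY = np.array([1.0, -6.0, 9.0])      # x^2 - 6x + 9, highest degree first

def x_pow_mod(p, m):
    """x^p mod m(x), using square-and-multiply on coefficient arrays."""
    result = np.array([1.0])                 # the constant polynomial 1
    base = np.array([1.0, 0.0])              # the polynomial x
    while p > 0:
        if p & 1:
            result = np.polydiv(np.polymul(result, base), m)[1]
        base = np.polydiv(np.polymul(base, base), m)[1]
        p >>= 1
    return result

M = np.array([[4.0, 1.0], [-1.0, 2.0]])      # stand-in matrix with that char. poly
I = np.eye(2)

coeffs = x_pow_mod(10, CHAR_POLY)            # x^10 reduced mod the char. polynomial
# Evaluate the reduced (degree < 2) polynomial at M: only I and M are needed.
powers = [I, M]
M10 = sum(c * powers[d] for d, c in enumerate(reversed(coeffs)))
assert np.allclose(M10, np.linalg.matrix_power(M, 10))
```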
Sounds like, for P sufficiently larger than exp(N), binary exponentiation is no longer going to be optimal, no?
@@MrRyanroberson1 Maybe. I'm too lazy to figure out whether there is an efficient way to get the characteristic polynomial. N! is way too big starting from N = 15, but binary exponentiation of a 1000x1000 matrix even to the power 10^18 is just about 60 times 10^9 operations (60 is the log of the power, and 10^9 is just 1000^3).
In the realm of pseudo-random number generators, the xoshiro256++ reference implementation includes a "jump" function that advances the underlying generator 2^128 iterations. This sounds like a strange thing to provide, but it lets you easily create multiple generator instances with non-overlapping sequences for distributing across threads.
The generator itself can be thought of as a 256x256 matrix over GF(2), as the iteration function is a linear transformation on its 256-bit state. The jump function actually works by using a pre-calculated polynomial derived from the Cayley-Hamilton theorem, namely the reduced polynomial representing M^(2^128), and applying that polynomial in a distributive fashion, advancing the generator using the normal step function rather than a matrix multiplication for each term.
This works well in this particular application, with a fixed matrix *and* exponent, because you only have to store the 32 bytes of coefficients rather than an entire 8KB matrix. Running the iterator a single step is also much faster than a general matrix multiplication, so doing 255 iterations is no big deal.
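A toy analogue of that jump mechanism (this is not the actual xoshiro256++ code): the "generator" is the 2x2 stand-in matrix from earlier comments, the jump polynomial is x^K reduced mod its characteristic polynomial, and the jump is applied to the state using only single steps via Horner's rule:

```python
import numpy as np

M = np.array([[4.0, 1.0], [-1.0, 2.0]])    # hypothetical 2x2 "generator" matrix
CHAR_POLY = np.array([1.0, -6.0, 9.0])     # its characteristic polynomial x^2 - 6x + 9

def step(state):
    """One ordinary iteration of the 'generator': state -> M @ state."""
    return M @ state

K = 20                                     # jump distance (2^128 for xoshiro)
x_to_the_K = np.zeros(K + 1)
x_to_the_K[0] = 1.0                        # coefficients of x^K, highest degree first
jump_poly = np.polydiv(x_to_the_K, CHAR_POLY)[1]   # precomputed once, only 2 coefficients

def jump(state):
    """Advance the state K steps by evaluating jump_poly at M via Horner's rule."""
    acc = np.zeros_like(state)
    for c in jump_poly:                    # coefficients highest degree first
        acc = step(acc) + c * state        # acc <- M @ acc + c * state
    return acc

v = np.array([1.0, 2.0])
assert np.allclose(jump(v), np.linalg.matrix_power(M, K) @ v)
```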
Another way is to use its Dunford decomposition. M = 3I + N where 3I is the diagonal part and N is nilpotent (N² = 0).
Thus M¹⁰ = (3I+N)¹⁰ = 3¹⁰I + 3⁹·10N = 3⁹ (3I+10N) = 3⁹ (-27I+10M)
This Dunford decomposition would be worth a video of its own, I guess...
Cayley-Hamilton is my favorite theorem.
De moivre's and Pythagoras do it for me (lights cigarette)
That's pretty useful information!
For a non-diagonalizable matrix, is e^M still defined? Are sin(M) and cos(M) defined?
Yes this would be well defined. We could write out a series expansion like I + M + M^2 / 2! + M^3 / 3! + ..., but we would still need to show that the series converges for a given matrix.
@@DrBarker As you probably know, the series always converges in a complete normed space (which real matrices form), because ||M^k / k!|| ≤ ||M||^k / k! and those bounds sum to exp(||M||), so the series converges absolutely.
The series definitely always converges: whatever finite factor C the matrix picks up with each extra power, the integers k in the factorial eventually exceed it, so the ratio of successive terms drops below 1 and the factorial wins.
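A small sketch of the partial sums, using the same stand-in matrix; since M = 3I + N with N² = 0, the exact value e³(I + N) makes a convenient cross-check:

```python
import numpy as np

M = np.array([[4.0, 1.0], [-1.0, 2.0]])   # stand-in with M = 3I + N, N^2 = 0

def exp_series(M, terms=60):
    """Partial sum of I + M + M^2/2! + ... up to the given number of terms."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k               # term = M^k / k!
        result = result + term
    return result

# Because M = 3I + N with N^2 = 0 and 3I, N commute, exp(M) = e^3 (I + N) exactly.
N = M - 3 * np.eye(2)
assert np.allclose(exp_series(M), np.exp(3) * (np.eye(2) + N))
```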
Nice solution
Can we do anything similar to the diagonalisation trick using the Jordan form?
How do you diagonalise a matrix and how do you know when it's not possible?
For any 2x2 with a characteristic polynomial of the form l^2 - Al + B, this method gives a general closed form: if l1 and l2 are the (distinct) roots, then M^n = ((l1^n - l2^n)/(l1 - l2))*M - l1*l2*((l1^(n-1) - l2^(n-1))/(l1 - l2))*I, and in the repeated-root case l1 = l2 = l this becomes M^n = n*l^(n-1)*M - (n-1)*l^n*I (for l = 3 that is exactly the 3^(n-1)*(nM - 3(n-1)I) formula from other comments). Either way, the expression lends itself to fractional and negative n.
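A quick numeric check of that closed form on a made-up 2x2 with distinct eigenvalues (trace A = 5, determinant B = 6, so roots 3 and 2):

```python
import numpy as np

# Example matrix with characteristic polynomial x^2 - 5x + 6, eigenvalues 3 and 2.
M = np.array([[1.0, 1.0], [-2.0, 4.0]])
I = np.eye(2)
l1, l2 = 3.0, 2.0

for n in range(1, 10):
    a = (l1**n - l2**n) / (l1 - l2)
    b = -l1 * l2 * (l1**(n - 1) - l2**(n - 1)) / (l1 - l2)
    assert np.allclose(a * M + b * I, np.linalg.matrix_power(M, n))
```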
If you really, really like the diagonalisation instead, you could use generalized eigenvalues / -vectors and Jordan normal form. In this case, the Jordan normal form of this matrix is
3 1
0 3
And calculating arbitrary powers of this matrix isn't very difficult. There is a little more work finding the relevant generalized eigenbasis, but once you have it, calculating many different powers or very large powers becomes a lot easier than it does for the Cayley-Hamilton approach.
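A short sketch of that route (in sympy, with the same stand-in matrix; the closed form for powers of the Jordan block is hard-coded):

```python
import sympy as sp

# Stand-in 2x2 with characteristic polynomial (x - 3)^2, not diagonalizable.
M = sp.Matrix([[4, 1], [-1, 2]])
P, J = M.jordan_form()                 # sympy's convention: M == P * J * P**-1
assert J == sp.Matrix([[3, 1], [0, 3]])

# Powers of the Jordan block: J^n = [[3^n, n*3^(n-1)], [0, 3^n]], so M^n = P J^n P^-1.
n = 10
Jn = sp.Matrix([[3**n, n * 3**(n - 1)], [0, 3**n]])
assert P * Jn * P.inv() == M**n
```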
What about for square root?
@benniepieters Square roots are much easier this way than with Cayley-Hamilton. The diagonal elements are √3, and the above-diagonal element is given by the equation 2√3x=1, with the solution x=√3/6.
Obviously this gets harder for larger matrices, but that's the way it always is. If you want a square root for a general size Jordan block, you can use the (generalized) binomial theorem. Write the block as a sum of the diagonal and the superdiagonal, then raise that sum to the power of 1/2. The fact that the superdiagonal is nilpotent means you only get finitely many terms, and it works out perfectly.
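A one-assert check of that square root:

```python
import numpy as np

# Square root of the Jordan block [[3, 1], [0, 3]]: diagonal sqrt(3),
# superdiagonal x solving 2*sqrt(3)*x = 1, i.e. x = sqrt(3)/6.
J = np.array([[3.0, 1.0], [0.0, 3.0]])
R = np.array([[np.sqrt(3), np.sqrt(3) / 6],
              [0.0,        np.sqrt(3)]])
assert np.allclose(R @ R, J)
```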
If I'm understanding correctly... the Cayley-Hamilton theorem ends up implying that any integer power of any invertible square matrix (or any positive power of any square matrix, invertible or not) is a LINEAR function of the original matrix... which seems like an oddly powerful statement on its own. Why would anyone care about polynomials on square matrices, if they all reduce to a polynomial of degree 1? As all positive integers can be expressed as sums of powers of 2, I don't see at this point where this general statement would break.
Looks like it’s true for 2x2 matrices. 3x3 would leave a quadratic, 4x4 a cubic, etc.
@@gavintillman1884 Ah, right! Thanks. I suspected it was too pretty to be true.
P.S.: Still... the notion of some family of objects (f.i. 2x2 matrices) never growing beyond linearity seems... an intriguing concept to explore.
P.P.S: The one obvious counterexample of a polynomial that does not linearize is the characteristic polynomial itself, or multiples of it. I smell a vector space somewhere.
@@gavintillman1884 Complex numbers (as linear polynomials on i) could be another example. Maybe the adequate framework is just a ring or field extension.
It’s not quite a field, but an appropriate analogue would be algebraic numbers. For example, any integer power of a+b*sqrt(2) for a,b rational turns out to be also of the form c+d*sqrt(2), since the characteristic polynomial of such a number is quadratic.
In general in a finite-dimensional algebra (= a vector space with a multiplication), the finite dimension means you expect the powers of an element x (1, x, x^2, x^3, …) to become linearly dependent.
Perhaps the surprise of the Cayley-Hamilton theorem is that just by counting dimensions, for 2x2 matrices you’d expect there to be a linear dependence between I, M, M^2, M^3 and M^4 (as they are five vectors in four dimensions) but Cayley-Hamilton gives you a linear dependence between just I, M and M^2.
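A tiny hand-rolled illustration of that closure property (pairs (a, b) stand for a + b·sqrt(2); the same works with rational a, b):

```python
# Powers of a + b*sqrt(2) stay of the form c + d*sqrt(2), since
# (a + b*sqrt(2))(c + d*sqrt(2)) = (ac + 2bd) + (ad + bc)*sqrt(2).
def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c + 2 * b * d, a * d + b * c)

power = (1, 0)                   # start from 1
for _ in range(10):
    power = mul(power, (1, 1))   # multiply repeatedly by 1 + sqrt(2)
print(power)                     # (3363, 2378): (1 + sqrt(2))^10 = 3363 + 2378*sqrt(2)
```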
Only powers at least as large as the degree of the minimal polynomial do.
You may think of it this way: for some vector v, the vectors {v, Av, A²v} might be linearly independent, and hence A² is not a linear combination of A and I.
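A concrete illustration with a made-up 3x3 (a cyclic permutation matrix, whose minimal polynomial has degree 3):

```python
import numpy as np

# {v, Av, A^2 v} are linearly independent here, so A^2 is not c1*A + c0*I.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])           # cyclic permutation matrix
v = np.array([1.0, 0.0, 0.0])
krylov = np.column_stack([v, A @ v, A @ A @ v])
print(np.linalg.matrix_rank(krylov))      # 3 -> linearly independent
```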
Is it always the case, for this specific example, that M^N = 3^k (N M - ... I)?
I would probably have squared the matrix to get M^2, squared it again to get M^4 and again to get M^8, then multiplied that by M^2.
I thought we were gonna do Jordan normal form
I always found it weird that mathematicians use lots of different notations everywhere, but when it comes to solving matrix equations I always see them writing "I" (to indicate that this is indeed a matrix equation). I'm curious why mathematicians won't invent and use a simpler notation.
Is it weird that every mathematician uses 1 for the multiplicative identity (a*1=a)?
i kinda like this one