How AI Discovered a Faster Matrix Multiplication Algorithm

  • Published Nov 21, 2024

COMMENTS • 1.1K

  • @alexanderkvenvolden4067 · 1 year ago · +6989

    So to make matrix multiplication more efficient, they used machine learning, which relies heavily on matrix multiplication. Very cool!

    • @aritroshome8480 · 1 year ago · +162

      The earth is flat.

    • @cyancoyote7366 · 1 year ago · +5

      @@aritroshome8480 Your brain is flat, my friend.

    • @SuperMaDBrothers · 1 year ago · +44

      don't read into it, chief

    • @pyropulseIXXI · 1 year ago · +1

      This video is pure hyperbole. They said seasoned mathematicians haven't mastered matrix multiplication. So fcking stupid.

    • @w花b · 1 year ago · +583

      To launch a rocket stage filled with fuel even further you indeed need another stage that also contains fuel in the first place.

  • @sheevys · 1 year ago · +2767

    One cool thing from the paper which didn't make it into the video: instead of using the number of multiplications as your cost function, you can use the actual timings from running each algorithm. This allows finding a fast multiplication algorithm tailored to the device it's running on. Interestingly, they showed that different algorithms did indeed perform better on different hardware platforms.
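
A minimal sketch of that timing-based cost function (plain Python for illustration; the helper names and the two candidate algorithms are invented for this example, not from the paper):

```python
import time

def matmul_naive(A, B):
    # Schoolbook triple loop: C[i][j] = sum_k A[i][k] * B[k][j]
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matmul_transposed(A, B):
    # Same arithmetic, but pre-transposes B so both operands are read row-wise.
    BT = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in BT] for row in A]

def cost(algo, A, B, repeats=3):
    # Cost function = best wall-clock time on *this* device,
    # instead of counting multiplications.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        algo(A, B)
        best = min(best, time.perf_counter() - t0)
    return best

A = [[float(i + j) for j in range(48)] for i in range(48)]
B = [[float(i * j % 7) for j in range(48)] for i in range(48)]
candidates = {"naive": matmul_naive, "transposed": matmul_transposed}
ranking = sorted(candidates, key=lambda name: cost(candidates[name], A, B))
```

Which variant wins the ranking depends on the machine it runs on, which is exactly the comment's point.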

    • @AleXander-eo3iz · 1 year ago · +71

      Naturally, one might ask what platform would provide the fastest algorithm and why.

    • @beargiles4062 · 1 year ago · +99

      I don't know the details, but there are now some logic designs where addition is significantly slower than multiplication!
      If I had to guess, it's because multiplication of 2 n-bit numbers will always fit within 2n bits. In many common scenarios you'll use all 2n bits of the result.
      In contrast, with addition of 2 n-bit numbers plus a carry bit you'll get, at most, an n-bit number plus a carry bit. (Ignoring 2's complement and that it would be a 'borrow' bit.) That's a tremendous waste if you're working with 64-bit words, much less 128- or even 256-bit words.
      I know - dedicated hardware can use 65-bit words internally. Or you could write a library that quietly converts the data to 63-bit words, plus a carry, for calculations and then converts it back to the standard notation at the end. You'll lose a little bit of range but it might be an acceptable tradeoff. (With floating point numbers you would perform your shift so you would just lose a little bit of precision.) It's some work upfront but it might be worth it if you can keep everything in the innermost cache until you've finished your calculations.
      But that's all just guesswork based on some classes that are all decades out of date....

    • @alphalunamare · 1 year ago · +7

      I think that is just incidental dust. Consider, for example, that the elements of the matrices are objects defined by a C++ class, say. Then you have no access to architectural bias. The complexity of the object itself overrides any platform consideration.

    • @Henrix1998 · 1 year ago · +4

      ​@@AleXander-eo3iz ASIC

    • @gubgubbubbub · 1 year ago · +40

      More likely, it's because platforms have different performance profiles for addition, multiplication, and memory accesses, and in the case of different ISAs maybe even different numbers of registers to accumulate dot products into. For example, if computer A has memory access that's four times faster than computer B's, it's probably worth spending more time on memory-intensive algorithms which use fewer multiplication/addition operations.

  • @ABaumstumpf · 1 year ago · +344

    For matrix multiplication, Strassen is the best overall - it works for any size and any form of matrix. But there are of course faster methods for specific cases: algorithms that are faster for large matrices, better for diagonalized matrices, etc.
    They found an algorithm for matrices containing only 1 and 0 that is slightly faster - single-digit percentage faster. It is an achievement and does help, but the bigger benefit might just be that with specialised hardware it can use fewer multiplication units - thus less power.

    • @razorblade413 · 1 year ago · +13

      So based on the video, the discovery from AlphaTensor doesn't work for general matrix multiplication because it's only for 0 and 1. But in 3D games, for rotating or translating an object, you need numbers other than zeros or ones when updating in a game loop.

    • @ryanfeng · 11 months ago · +4

      Multiplication by the Fourier matrix only needs N·log N, so for some specific matrices it is definitely possible to reduce the complexity. But maybe it is not generic.

    • @Last_Resort991 · 11 months ago · +10

      It kind of is pretty dumb. Multiplication of binary numbers is very fast; it's not like that algorithm is really needed there anyway. The general case is way more interesting.

    • @webgpu · 9 months ago

      best overall? is there anything "particularly best" ? 🤣

    • @ABaumstumpf · 9 months ago · +3

      @@webgpu "best overall? is there anything "particularly best" ? 🤣"
      I mean - no - why would there be? Why would any sane person assume that a solution that is the best in general would be the best for any specific scenario?
      Modern desktop CPUs are the best for general computation. Are they the best at matrix multiplication? No. The best at encryption? No. Best at image recognition? No. Best at network traffic? No. There are specialisations when you need those.
      Hopefully that was simple enough for you to comprehend.

  • @alphalunamare · 1 year ago · +854

    I devised an algorithm in 1977 that utilised 24 multiplications and 69 additions for the 3x3 case but didn't get much further as it was equivalent to Strassen's in the 2x2 case. I did it with pencil and paper. After seeing this video I don't feel so bad now about my failure :-)

    • @evamkaushik5392 · 1 year ago · +30

      I'd like to learn from you, sir.

    • @alphalunamare · 1 year ago · +158

      I'll be totally honest and say that I took Strassen's lead and tried it out in the 3x3 case ... hence it was inevitable that it would reduce to his case. There was no spark of a new idea, just a plodding through the algebra; it took me about 12 hours and a few hundred feet of line printer paper. When I opened my door the next morning, Mike's (Atkinson) room opposite seemed open, so I walked in and asked him to look at 'this' and he was overjoyed. He smiled so much I felt that I had done something useful. He asked if he could use it in his class and I said 'Yes! Of course' and then I left Cardiff. Me and Henry had shared our office ... I am afraid that he was probably quite unhappy that I had left the paper all over the place .. He was a good mate though :-)

    • @shoam2103 · 1 year ago · +13

      You 'just' had to continue a 'little' bit 😬

    • @justingolden21 · 1 year ago · +3

      That's AWESOME

    • @taranjotsingh3981 · 1 year ago · +2

      That's amazing nonetheless!

  • @RepChris · 1 year ago · +703

    I mean _technically_ we've had "faster" algorithms than the Strassen one for quite a bit. We're down from an exponent of ~2.81 to ~2.37 for very large matrices. Although an actually usable algorithm that's faster even in the relatively special cases is always nice to see.

    • @NoNameAtAll2 · 1 year ago · +57

      Even recursive Strassen-like algorithms had better versions than 2x2,
      with rectangular ones (3x5, iirc) still scaling better than DeepMind's 4x4.

    • @FerrisMcLaren · 1 year ago · +4

      technically, your comment is useless and provides nothing to the conversation.

    • @tunafllsh · 1 year ago · +34

      They are not talking about algorithmic complexity for large matrices. This is about the fastest way to compute a 2x2, 3x3, 4x4 and 5x5.
      Algorithms with "faster" complexity end up having larger overhead and perform worse than Strassen on smaller matrices.
      With this new algorithm for 4x4, you can use your favorite low-complexity algorithm to break the multiplication down to 4x4 matrices, and then use the decomposition found by AlphaTensor.

    • @RepChris · 1 year ago · +58

      @@tunafllsh Yes, I'm aware. Due to the ambiguity of their wording, they said that Strassen's algorithm was unbeaten, which is technically not true, since there are algorithms with much better asymptotic time, albeit not practical ones. A small nitpick for my entertainment and an opportunity to share some trivia for those who might not know.

    • @lukealadeen7836 · 1 year ago · +2

      ​@@RepChris did you get a kick out of that? How fast is your algorithm?

  • @Wooksley · 1 year ago · +1555

    I finished university 4 years ago and I still get nightmares about my diploma being revoked because they’ve discovered that I failed linear algebra exam. Doing matrix multiplication by hand is a literal nightmare for me.

    • @francogonz · 1 year ago · +39

      If the mnemonic is hard enough for you, try it by definition 🤗
      The sigma-sum method is a lot easier to memorize.

    • @shoam2103 · 1 year ago · +31

      For a lot of cases there are arguably better alternatives. But numerical computations are better left for computers anyway..

    • @Wooksley · 1 year ago · +23

      @@francogonz I’m just really bad at mental arithmetic. And even so, I didn’t actually fail it…

    • @Penrose707 · 1 year ago · +10

      @@Wooksley I personally find matrix multiplication by cofactor expansion to be the easiest method to compute by hand

    • @Wooksley · 1 year ago · +60

      @@Penrose707 I find the best way to do matrix multiplication is to write some python code. Why would I ever do it by hand outside of that dreadful exam? It’s not the 17th century…
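
For anyone tempted by that route, the Python version really is short; a plain-Python sketch with no libraries assumed (with NumPy it collapses further, to `A @ B`):

```python
def matmul(A, B):
    """Textbook matrix product: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```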

  • @QuantenMagier · 1 year ago · +375

    I read the corresponding papers and this is a pretty good recap video on the topic. Kudos for visualizing the algorithms in an understandable way and even giving some background information from the researchers! 🧑‍🎓

    • @missionaryrav628 · 1 year ago · +6

      Tbh I didn't understand shit from the visuals... yes, of course it seemed legitimate and got the vibe going... but it wasn't understandable as a high-schooler

    • @foncbi · 9 months ago

      ​@@missionaryrav628 git gud kid

  • @KnakuanaRka · 1 year ago · +42

    The way you described the tensor decomposition problem as a game of removing tensors from the original to reduce it to all zeros as fast as possible, and how you showed it with the Strassen example, reminds me a lot of the Lights Out game where you press a light to flip it and all adjacent to make all the lights dark. The moves in tensor decomposition are a lot more complex of course, but seeing it as a game makes a lot of sense.

    • @vincent_hall · 26 days ago

      Life is a game.
      Those who see their work as games do very well!

  • @GrimJerr · 9 months ago · +20

    The power of AI isn't intelligence; it is a feature of iteration.

    • @AtomickPixel · 5 months ago

      Linear Algebra mob want you to think it's intelligence.

    • @LewisBavin · 1 month ago · +4

      What is biological intelligence other than millions and millions of years of evolutionary iteration?

  • @Alexandre-ur1oi · 1 year ago · +38

    For the sake of completeness: in practical applications one should keep in mind that reducing the number of basic operations is not the main bottleneck for matrix multiplication efficiency; moving data is (bus ~500 MHz, CPU 1-2 GHz). However, absent new ideas on this (last not-understood) basic operation (polynomial and integer multiplication are mastered), brute computer force (cleverly used in the present case) is welcome.

    • @btf_flotsam478 · 1 year ago · +4

      Surely, fewer calculations means fewer movements of data?

    • @pmnt_ · 1 year ago

      @@btf_flotsam478 Not necessarily. On hardware, you don't access single numbers; you always access multiple adjacent bytes of memory at once (e.g. if the data bus is 128 bits wide, you access 2 or 4 floating point numbers at once, depending on precision). A naive algorithm that uses matrix elements in order can basically halve or even quarter the number of memory accesses. On the other hand, if the optimized algorithm makes heavy use of out-of-order access to the matrix (the technical term is random access), it might "waste" some of the transferred numbers, just because the adjacent numbers are needed at a later step of the algorithm.
      An asymptotically fastest algorithm might not be the fastest algorithm in practice. That's especially true for applications that are already ridiculously optimized to specific hardware, such as matrix multiplication.

    • @tappetmanifolds7024 · 1 year ago · +2

      @@btf_flotsam478
      Not always.
      Bott periodicity in 8 dimensions has values 16 times as big.
      Even though lines of code can be n-strings short, the amount of complexity embedded in a single character can be infinitely large.

    • @JaxVideos · 9 months ago · +2

      The caching behavior of nearly every processor will determine the absolute speed limit for MM. One reason Strassen-style partitioning works is that it continuously tessellates the large matrices, which brings down the working-set size and reduces cache overflow losses.
      Similar considerations apply to parallel implementations.

  • @silentblackhole · 1 year ago · +51

    I didn't follow along with some of the more detailed parts but overall found it really interesting. Nice work to everyone involved in this, directly or otherwise.

    • @heyhoe168 · 1 year ago · +1

      They managed to put AI to work juggling complex algebraic compositions/decompositions and applied the results to optimizing computational tasks. Nothing mind-blowing, but an important step towards self-improving algorithms.

  • @OtakuSanel · 1 year ago · +215

    Matrix math is just one of those things where, while more efficient equations might be good on the theory side, there are far greater gains from specialized hardware for that kind of math. Whether something takes 8 vs 7 multiplications might be a factor if you are forced to do them consecutively; when you have hardware that does it all at once, one more "lane" of computation for that 8th figure is a moot point. Back in the day there used to be math co-processors, and we've come back full swing: now the GPU IS the math co-processor once more, with things like CUDA and tensor cores that are specifically designed for that kind of math. The big question is what other kind of specialized hardware would be of high value to implement, rather than pushing the limits of math theory to find efficiency.

    • @ronald3836 · 1 year ago · +120

      If you have to deal with NxN matrices where N grows arbitrarily large and a multiplication takes N^k operations, then faster hardware can't compete with an algorithmic improvement that lowers the exponent k. The point of the 8->7 trick in the Strassen algorithm is that it lowers the exponent k from 3 to 2.808 for larger matrices.
      However, if N is fixed and the improved algorithm is very complicated, then it may well be faster to use a simpler algorithm.
      According to a 2016 paper, on then modern hardware the Strassen algorithm was 20% faster for N=512 (and for larger N the gain will be even larger).
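
For reference, Strassen's seven products for the 2x2 case can be transcribed directly (a sketch for illustration; applying it recursively to blocks is what yields the lower exponent):

```python
import math

def strassen_2x2(A, B):
    """Strassen's 7-multiplication formula for 2x2 matrices."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

# Recursing on block matrices does 7 subproblems per halving instead of 8,
# giving exponent log2(7) instead of 3.
print(round(math.log2(7), 3))  # 2.807
```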

    • @MrMctastics · 1 year ago · +18

      @@ronald3836 my man here with that DATA. Dayum

    • @geekswithfeet9137 · 1 year ago

      Wow, you're short-sighted; you have no idea what you're talking about, it's so wrong

    • @xghqhj7910 · 1 year ago · +2

      @@ronald3836 Can you mention a real-life application where we'd need a 512x512 matrix?
      Edit: I just remembered photos are basically matrices of pixels. But I would still love to hear about any other applications.

    • @promethuser · 1 year ago · +11

      @@ronald3836 Can you cite the paper? Not calling you out or anything, but no modern BLAS implementation makes use of Strassen's. I'm sure OP understands algorithmic complexity, and they're still right - hardware is the deciding factor for all practical implementations of matrix mult.

  • @Dismythed · 1 year ago · +34

    The video never explained how AlphaTensor achieved the matrix calculations with fewer steps than previous methods (49 down to 47, or 98 down to 96).

  • @daviddlamini4290 · 1 month ago · +1

    After doing electrical engineering and being traumatized by Matrices, it blows my mind how easily you explained it.

  • @Techmagus76 · 1 year ago · +15

    To explain the AlphaTensor result it would be better to show and explain the Karatsuba algorithm, as what AlphaTensor found for 4x4 is similar to what Karatsuba did for 2x2. Sure, Strassen beats Karatsuba on 2x2 so it should be mentioned, but for the didactics of the solution, Karatsuba gives more insight into the result.

    • @tappetmanifolds7024 · 1 year ago · +1

      @Techmagus76
      You are on the right track.
      Karatsuba discovered n^1.58.
      The later improvement was n·log n.

    • @akisuihkonen9056 · 1 year ago · +4

      These are completely different problems. Karatsuba and its extensions like Toom-3 are for polynomial (or big-int) multiplication, whereas this is about matrix multiplication. One source of the mistake might be that after Karatsuba's discovery there was also an algorithm by Strassen (and Schönhage) performing polynomial multiplication (or convolution) using the Number Theoretic Transform recursively.

  • @user-francescob · 1 year ago · +11

    Have any of these researchers looked at the impact of this new algorithm on specialized matrix libraries such as TensorFlow, and on GPUs? It's possible that the 50-year-old algorithm will execute faster than the new DM algorithm if, for example, there's a significant advantage to using specialized functions tuned for GPU instruction sets. It would be important to know: (1) the answer; (2) where the cross-over occurs; (3) which existing libraries will adopt this new algorithm, and when.

    • @KabelkowyJoe · 1 year ago · +5

      I guess none, because it can only do 1-bit computations :> 10:45

    • @velipekkanousiainen2408 · 3 months ago

      I have a better algorithm requiring new tensor operations on the register set of a CPU (or GPU).
      I have something else to do right now, but the key is: new tensor operations on the register set.
      Those who have resources for hardware algorithms will easily solve it as well, and also produce the CPU.

  • @cbunix23 · 1 year ago · +19

    For very large matrices you can get an easy 40% speed-up over the naive algorithm if you order the multiplication and addition operations in a cache-coherent order. I'm skeptical it would be faster than Strassen's, though.
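
One illustration of "same multiplications, different order": swapping the j and k loops makes the inner loop sweep rows of B and C contiguously instead of striding down B's columns. (The cache effect shows up in compiled languages; in pure Python the interpreter overhead dominates, so treat this as a sketch of the access pattern only.)

```python
def matmul_ijk(A, B):
    # Inner loop walks B down a column: strided, cache-unfriendly access.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B):
    # Same n^3 multiply-adds, reordered so the inner loop sweeps
    # rows of B and C contiguously (cache-friendly in row-major layouts).
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = A[i][k]
            for j in range(n):
                C[i][j] += aik * B[k][j]
    return C
```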

    • @jimwinchester339 · 1 year ago · +12

      Agreed, but these are considered implementation optimizations, not a different and superior algorithm. The criterion for a better type of algorithm is a reduction in the overall power to which the matrix dimension is raised; that is, the "big O" [order of complexity] of the algorithm.

    • @cbunix23 · 1 year ago

      @@jimwinchester339 Yup, I took that class decades ago. ;)

    • @YouRich92 · 1 year ago · +1

      Algorithms that reorder operations to achieve better computational locality are not simple implementation optimizations.
      They are genuinely different algorithms. For instance, one can use them to decompose matrices into sub-blocks.
      This allows, for instance, better parallelism. Or it can enable you to break your original problem down in a different way, which can give you speedups orders of magnitude better than a purely better multiplication algorithm.
      Another thing is that algorithms with better big-O may be much more difficult to implement in a "computer friendly" way.
      For instance, they make it difficult to take advantage of long cache-line memory fetches, or poor locality leads to cache thrashing.
      So, OK, they are more satisfying on paper but worse in real life.

    • @cbunix23 · 1 year ago

      @@YouRich92 A small follow-up. Suppose we're multiplying two large square matrices (A x B = C). First flip B about the main diagonal; then both A and B can easily be traversed in a cache-coherent order, instead of only A. I'll have to experiment with this to see how well it works.

    • @shannonbarber6161 · 7 months ago

      @@jimwinchester339 Many theoretically superior algorithms run much slower than optimized "inferior" implementations. Any linked-list case being the poster-child.
      Cache-coherency is King.

  • @asicdathens · 1 year ago · +3

    The first multimedia extensions Intel made for CPUs, the MMX instruction set, were basically matrix operations, mostly multiplications.

  • @LordofDorkness94 · 1 year ago · +130

    Minor error at 2:53. Shouldn't it just say 67% rather than 67% less?

    • @QuantaScienceChannel · 1 year ago · +83

      Good catch! We'll add a correction in the UA-cam description.

    • @kamel3d · 1 year ago · +4

      I was confused

    • @viliml2763 · 1 year ago · +13

      heh they blurred the word "less"

    • @jerecakes1 · 1 year ago · +7

      ​@@viliml2763 good enough :D

    •  6 months ago

      @@kamel3d haha

  • @unkn0vvnmystery · 7 months ago · +4

    2:23
    Quanta Magazine: Going from eight down to seven multiplications may seem trivial.
    me thinking: for 2x2 sure but for something like 8x8 it is significant.
    2:29 *shows 8x8*
    me: lol

  • @rashid8646 · 1 year ago · +71

    Math just goes over my head, but it is cool that we can automate the trial-and-error aspect of mathematical models.

    • @dekippiesip · 1 year ago · +7

      It's better than trial and error, but still kind of mindless. It's like evolution's survival of the fittest.

    • @ilikegeorgiabutiveonlybeen6705 · 1 year ago · +3

      yeah, it's greedy with extra steps

  • @alexsere3061 · 4 months ago · +1

    I am extremely impressed that they managed to turn this abstract task into an ML problem. I am also surprised at how computationally expensive brute force would be. A 4x4 matrix multiplication corresponds to a 16x16x16 tensor, i.e. 4096 cells (of which 4^3 = 64 are ones). Since these multiplications are in F_2, there are 2^4096 possible tensors, and the number of candidate rank-47 decompositions is astronomically larger still. Insane.
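
For concreteness, the multiplication tensor for n x n matrices has (n^2)^3 entries, of which exactly n^3 are ones; for 4x4 that is 4096 cells and 64 ones. A short check:

```python
n = 4
size = n * n  # each tensor axis is indexed by the n^2 entries of a matrix

# T[a][b][c] = 1 exactly when entry a of A times entry b of B
# contributes to entry c of C = A @ B, i.e. A[i][k] * B[k][j] -> C[i][j].
T = [[[0] * size for _ in range(size)] for _ in range(size)]
for i in range(n):
    for k in range(n):
        for j in range(n):
            T[i * n + k][k * n + j][i * n + j] = 1

cells = size ** 3
ones = sum(x for plane in T for row in plane for x in row)
print(cells, ones)  # 4096 64
```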

  • @niks660097 · 1 year ago · +6

    You should make a video on constraint-based solvers; it's just amazing that you can bypass all the physics and still do real physical simulation with just a few linear equations, which in turn just use matrix multiplication.

    • @ictogon · 8 months ago · +2

      It's not really physics, it's more geometry

  • @qafmbr · 7 months ago · +2

    I so respect guys like Fawzi. Amazing. Imagine what is going to happen within the next 30 years?

  • @miloweising9781 · 1 year ago · +86

    Okay but do we know yet if this type of multiplication algorithm will work in any context other than Z/2Z? It is not surprising to think that there are extra tricks to multiplying matrices in Z/2Z that one could use which are useless over the reals.

    • @davidcotham1939 · 1 year ago · +9

      Good point. Seems like there are some P vs NP implications. If matrices with only binary elements trouble the Google DeepMind team, then doing this with matrices over non-binary element sets would probably be too convoluted.

    • @caspermadlener4191 · 1 year ago

      The records of ℤ/2ℤ are of course better than ℂ, where ab=ba, although this is not as important as the records where ab≠ba, like matrices.
      But the records are really close to each other.

    • @SterileNeutrino · 1 year ago · +14

      @@davidcotham1939 My dear colleague, I fail to see how this has anything to do with P vs NP - this is not search nor is the complexity exponential.

    • @SterileNeutrino · 1 year ago

      No it seems one can apply improvs to any floating point number (or complex number for that matter). The article says that for ℤ/2ℤ however: "In a few cases, AlphaTensor even beat existing records. Its most surprising discoveries happened in modulo 2 arithmetic, where it found a new algorithm for multiplying 4-by-4 matrices in 47 multiplication steps, an improvement over the 49 steps required for two iterations of Strassen’s algorithm. It also beat the best-known algorithm for 5-by-5 modulo 2 matrices, reducing the number of required multiplications from the previous record of 98 to 96. (But this new record still lags behind the 91 steps that would be required to beat Strassen’s algorithm using 5-by-5 matrices.)"

    • @jakobm.4183 · 1 year ago · +1

      It seems very likely that 49 is optimal over the reals. One way a 47-multiplication algorithm could arise is if two rank-one tensors which are distinct over the reals become equal over Z/2Z and cancel (since addition and subtraction are the same in Z/2Z). Also, for some other matrix formats where better algorithms are known over Z/2Z, those beat the best known algorithms for the real case by exactly 2 multiplication steps.

  • @blankspace178 · 11 months ago · +15

    *FYI: THIS ISN'T "NEW," IT'S LITERALLY REPLICATING A STACK OF WINDOWED PAPER....WHICH IS HOW MATHEMATICIANS ORIGINALLY SOLVED MATRIX PROBLEMS MANUALLY. THE AI JUST RE-DISCOVERED AN ORIGINAL CONCEPT BUT IT DIDN'T FIND ANYTHING "NEW". THIS VIDEO IS INCREDIBLY MISLEADING.*

  • @Tony-dp1rl · 1 year ago · +7

    Um... 4x4 of ones and zeros... I would just pre-store a cache of combinations of the row results: 4 bits per row, 16 bits for the second matrix, 4-bit result row = 512KB of nicely cached memory. 4 lookups, done. ;)
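
A sketch of that lookup-table idea over GF(2). The encoding is invented for this example (bit j of a row is column j; the four 4-bit rows of B are packed into 16 bits), and the table works out to 2^20 entries, matching the comment's ~512 KB estimate:

```python
def row_times_b(arow, brows):
    # Over GF(2), a row vector times B is the XOR of the rows of B
    # selected by the set bits of the row vector.
    out = 0
    for i in range(4):
        if (arow >> i) & 1:
            out ^= brows[i]
    return out

# Precompute every (4-bit row, 16-bit matrix) pair: 2^20 entries.
TABLE = [0] * (1 << 20)
for b in range(1 << 16):
    brows = [(b >> (4 * i)) & 0xF for i in range(4)]
    for arow in range(16):
        TABLE[(arow << 16) | b] = row_times_b(arow, brows)

def matmul_gf2_4x4(a_rows, b):
    # Four table lookups replace all multiplications.
    return [TABLE[(r << 16) | b] for r in a_rows]
```

With B the 4x4 identity (encoded as 0x8421 in this scheme), the product returns A unchanged.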

  • @Iaotle · 1 year ago · +20

    Already knew about the AI, but didn't know that we managed another reduction! Cool stuff!

  • @vivekdabholkar5965 · 1 year ago · +2

    Truly amazing solution to a centuries-old problem

  • @tehdii · 6 months ago · +1

    Error codes, multiplications - everything is done faster by splitting it up. One thing I noticed: back in the day, when I used everyday D6 dice for a few moments a day just to sharpen my addition in a trivial way, my arithmetical tasks went smoother. I split everything I could into smaller chunks in my mind. The universe is a weird fractal :)

  • @gg.3812 · 1 year ago · +15

    As long as people do not start calling computer programs 'proofs' with no formal error analysis, I'm fine with machines helping out. What I do not like is calling something a theorem or a proof when it is clearly not. And it is already happening in many areas.

  • @arifmammadov736 · 1 year ago · +21

    Holy crap, I just understood matrices at a much deeper level than what I was taught in university.

  • @VoiceTotheEndsOfTheEarth · 10 months ago · +2

    The caveat is that it works only with binary elements. What's interesting, though, is that there's a loss of the rigor that comes from mathematical proof whenever you use an experimental approach (try and see if it works; if not, try something else, but make sure you don't try anything close to what didn't work before). That is why the mathematicians were able to find a better solution that the machine learning algorithm missed. These two caveats suggest that the method (of the machine learning algorithm) is not a "proof" in the purest sense, but rather a "proof" in the pragmatic sense ("the proof's in the pudding"). It's typically what engineers do. I remember when I studied digital signal processing in the early days, prior to the advancement of discrete mathematics. Mathematicians claimed that the sampling theorem was wrong, and they were right in the purist sense, but that doesn't mean it can't be used in practice, because the ear cannot perceive the errors if you sample beyond the Nyquist rate. At some point, the mathematics journals will come to accept machine learning experiments as "proof", and the meaning of the word and of mathematics itself (as a language) will lose the precision it's known for, which comes from the rigor it has traditionally enforced.

  • @xxytxx · 1 year ago · +2

    The narration says (2:55) that Strassen's method requires "a third less" computation, while the on-screen text contradicts this by saying "67% less". 343 is 67% of 512, so it is 33% (a third) less than 512.

    • @kossboss · 8 months ago

      They meant 67% of the original value, but yes, if one is being pedantic then you are correct - and in this world one can never be too pedantic or descriptive, in my opinion.

  • @Rocksite1 · 1 year ago · +4

    The day will probably come when AI mathematicians can go over the possibilities so much faster and so much more creatively that it'd be like asking a five-year-old to advance the state of mathematics now. Wolfram has an AI that has been coming up with new theorems for some time. I'm not fooled when people say "It won't replace ...". We'll have significant, useful input to add for a while. The time when we won't, I think, isn't nearly as far off as we think. Now, imagine Malthusian elites directing AI, while they still can ...

  • @moereese5254 · 1 month ago

    They started in Characteristic 2 and extrapolated the algorithm to all numbers. Brilliant strategy!

  • @PS3PCDJ · 1 year ago · +5

    Love the recursive nature of computer science

    • @chrisf1600 · 11 months ago · +3

      Love the recursive nature of the recursive nature of computer science

  • @peterlindner3283 · 1 year ago · +1

    Thank you Dr Simon for funding this

  • @MarsCorporations · 1 year ago · +7

    If the matrices only contain 0 or 1, you can build the whole naive algorithm in hardware, where you multiply 2 matrices in a few ticks (I don't know how many layers that algo has... should be around 1 to 4; multiplying and adding mod 2 are trivial bit operations which can be executed in parallel). There are probably ASICs for exactly that situation, where you have "zero" multiplications per 4x4 matmul mod 2.
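
As a software analogue of that hardware argument, here is a mod-2 multiply that uses only AND, XOR and a popcount parity, with no multiplier at all (rows encoded as n-bit integers; a sketch of the bit-level trick, not the ASIC):

```python
def gf2_matmul(a_rows, b_rows, n=4):
    # C[i][j] = parity of popcount(row_i(A) AND col_j(B)):
    # multiplication mod 2 is AND, addition mod 2 is XOR/parity.
    b_cols = [sum(((b_rows[i] >> j) & 1) << i for i in range(n))
              for j in range(n)]
    out = []
    for arow in a_rows:
        row = 0
        for j, bcol in enumerate(b_cols):
            row |= (bin(arow & bcol).count("1") & 1) << j
        out.append(row)
    return out
```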

    • @robosergTV · 1 year ago

      Do you really think no one thought of that? Of course there have been companies building such tensor accelerators for deep learning for years. Google "TPU".

  • @noelwass4738 · 8 months ago

    Strassen's algorithm was the work of a genius because he paved the way and showed it was possible. Without that breakthrough nobody would have tried. I have no idea how he came up with his method, in particular because the additions needed before the multiplications are done are not at all obvious. The graphics here showed nicely how the 8 3d tensors could be used to represent the operations needed for the matrix multiplication of two 2 x 2 matrices in the standard algorithm and similarly for Strassen's algorithm.

  • @cjburian1
    @cjburian1 Рік тому +14

    It's mind-boggling that Strassen's method wasn't discovered a century ago or more. Mathematicians seem to have been expanding and reducing and playing with equations since antiquity. I would never have assumed that multiplying two 2x2 matrices cannot be done with fewer than 8 multiplications, but would simply have assumed that the standard way had been found by consensus to be objectively or subjectively the 'best' way.

    • @voidwalker9939
      @voidwalker9939 Рік тому +10

      It's always a case of what's more intuitive to teach. The "classic" method is neat and understandable, and efficient for teaching purposes (no exam will ask a 5x5 * 5x5 matrix multiplication). Machines don't have an instinct, they execute code so what matters is only the code itself, not how understandable it is to a human.
      The strength of this method of finding things is that they can just tune the parameters to whatever specific case they need and the model will, hopefully, eventually converge to a solution or to a path to the solution, which is incredibly fascinating.

    • @janekschleicher9661
      @janekschleicher9661 Рік тому +1

      Also, most of the mathematicians of past centuries were often extremely good at calculating things in their head, and probably more often than not savant-like. They had to do all kinds of calculations either in their head or by hand, with close to no support from machines. And in this environment, there is not much to gain from reducing multiplications and having more additions instead. For smaller numbers, it's close to the same effort for (very skilled) humans, and for bigger numbers, they'd usually just approximate the calculation with pre-calculated logarithmic tables, and then it's just an addition again.
      It's just computers that can do additions much faster than multiplications or lookups in (big) precomputed logarithmic tables.
      What I miss here in the video is why the results for matrices \in [-1, 0, 1]^2n matter. Shouldn't a multiplication of these 3 numbers take the same computation time as an addition? I guess we can use the algorithm to adapt for R^n spaces, but that's somehow unmentioned. Or maybe I don't understand it.

    • @niallrussell7184
      @niallrussell7184 Рік тому

      AlphaGo found new strategies in the game, which has been played by humans for 4500 years. The human brain has its limits.

    • @gehirndoper
      @gehirndoper Рік тому +5

      There was never a use for Strassen's method before the use of digital computing. If you have to multiply matrices by hand (or abacus) it's more important to not make mistakes and to not get ridiculous intermediate values. Strassen is also not as numerically stable.

    • @alphalunamare
      @alphalunamare Рік тому +2

      @@gehirndoper It is as numerically stable as the normal algorithm. You are right about the digital computer and Abacus though :-)
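
The logarithm-table trick mentioned in one of the replies above reduces a multiplication to an addition plus table lookups, since a*b = 10^(log10 a + log10 b). A toy Python version (real printed tables had only a few digits of precision):

```python
import math

def log_table_multiply(a, b):
    """Approximate a*b the old-fashioned way: look up the logs,
    add them, then take the antilog ("look up backwards")."""
    return 10 ** (math.log10(a) + math.log10(b))
```

With exact logs this recovers the product up to floating-point error; with a four-digit table you'd get roughly four significant figures.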

  • @davidwilkie9551
    @davidwilkie9551 Рік тому +1

    Not a criticism of what looks like a great systematic approach to a necessarily exhaustive methodology, to which we just add that the Universe of Logarithmic Time Duration Timing is obviously self-defining probability, and will evolve such solutions as demonstrated by Euler's pure-math function, ie it's QM-TIME Completeness cause-effect holography in the rough.
    (But a proper Mathemagical analysis will deduce the best smoothness choice of processes, fun to imagine for Mathematicians)

  • @Omena0MC
    @Omena0MC 9 місяців тому +31

    Bruh ai literally made itself faster 💀💀💀

    • @raheelrehmt
      @raheelrehmt 7 місяців тому

      14:56

    • @justinliu7788
      @justinliu7788 3 місяці тому +1

      @@raheelrehmtbro really linked to a time after the ending

    • @paramdeepsingh2123
      @paramdeepsingh2123 2 місяці тому +1

      thats how it all starts!! terminator...

    • @Joy-zz8wz
      @Joy-zz8wz 27 днів тому

      AI doesn't operate with modulo 2 matrices though! It's not actually relevant to AI computational speeds.

  • @wc447
    @wc447 11 місяців тому +1

    "The right way to look at this ..." seems like a very short sighted / unimaginative statement

  • @macronencer
    @macronencer Рік тому +5

    This is amazing stuff. Just one point of curiosity: why are modulo 2 matrices useful? Is there some real world application that uses just 0 and 1 in a matrix, which can benefit from a speed increase?

    • @Robbe1912
      @Robbe1912 11 місяців тому +2

      everything computer related can pretty much break down into binary computations as far as I know. complex boolean logic algorithms. I hope somebody comments a real world application, I am interested too.

  • @arpitgaur4310
    @arpitgaur4310 9 місяців тому

    Now I realize how important it is for researchers to create YouTube videos about their research. Research papers feel very daunting to read, and videos give the work more reach and readability, which will increase its citations.

  • @backwashjoe7864
    @backwashjoe7864 Рік тому +2

    “Don’t praise the machine.”
    The Simpsons, S05E17
    I hear a giant “for now” after each reassurance that we are not being replaced by these software.

  • @vardhmansingh3463
    @vardhmansingh3463 Рік тому +6

    Quanta 👌🏻👌🏻 Always make these awesome and informative videos. Some enlightenment everytime!

  • @johnchessant3012
    @johnchessant3012 Рік тому +13

    Awesome! But do these results extend to real number matrices? I believe most applications use real number matrices rather than mod 2

    • @nickfaire
      @nickfaire Рік тому +4

      No computer works with real numbers. Computers work with finite fields, such as F2 (which is the same as working mod 2)

    • @myxail0
      @myxail0 Рік тому +1

      ​@@nickfaire floating point numbers if you wanna be that picky

    • @nickfaire
      @nickfaire Рік тому +2

      @myxail0 First, I wasn't picky. I just explained in a few words that real number matrices are not used by computers, but rather matrices over finite fields, and hence these results are helpful. And second, floating-point arithmetic allows you to represent real numbers while working with finite fields. What I said is as "picky" as you can be. What you said adds extra info, but does not correct me, because there is nothing to be corrected in what I said.

    • @myxail0
      @myxail0 Рік тому +3

      @nickfaire what you said wasn't helpful at all. There's a high chance the commenter used "real numbers" meaning floating point numbers. It's not something outlandish; that's what they are for, modeling real numbers. On the other hand, it's a huge step algorithmically from binary matrices to floating point matrices, which is not trivial at all. Hence why the question you answered is interesting, and your answer isn't. In my opinion.

    • @shoam2103
      @shoam2103 Рік тому +4

      @@myxail0 I think you're both correct, but from different perspectives.
      Being based on binary (even floating points!), actual efficient and fast algorithms used are binary or a power of 2.
      I realize you wanted to know from a practical perspective in terms of real world applications? The situation is, it can wildly vary between processor architectures. But discounting that, the real bottleneck is memory and bandwidth. Maybe this could help in theory, but I don't think at all, as long as the input and output sizes of the multiplication don't change.

  • @ip-dj7hb
    @ip-dj7hb Рік тому +2

    So nice that google invests money in such fundamental research. Still, I think the video title is kind of click-bait because the method is only applicable to very specific matrices.

  • @VideoNOLA
    @VideoNOLA Рік тому +2

    Reminds me of the famous Eight Queens problem in Chess.

  • @conquererme
    @conquererme Рік тому +2

    I think I've found a channel to binge watch.

  • @johanlarsson9805
    @johanlarsson9805 Рік тому +13

    Yeah, I learned about this a few days after it happened. This, along with AI speeding up the design of the lenses that print silicon chips, as well as the improvements it makes to transistor layout... I'm thinking we can already say it is improving itself rapidly. These improvements will then lead to other improvements shortly. Here we go, singularity!

  • @Soupie62
    @Soupie62 Рік тому

    Circuit analysis using SPICE involves converting nodes in electronic circuits into values in a matrix. The node values change with frequency.
    While this sounds good, limiting a matrix to values of zero and one would impact the ability to model frequency response.

  • @warpdrive9229
    @warpdrive9229 Рік тому +10

    It blew my mind when Deepmind discovered Strassen's Algorithm in just minutes.

  • @icekoolkr5046
    @icekoolkr5046 Рік тому +3

    Love this channel

  • @cosmicallyderived
    @cosmicallyderived Рік тому

    12:41 I don’t think people can be made irrelevant by any of this kind of work it just empowers people to do more.
    That’s the right spirit, amen.

  • @foo0815
    @foo0815 Рік тому +121

    Remember that matrix multiplication is the basis of neural nets, so AI is actually improving itself already!

    • @tolkienfan1972
      @tolkienfan1972 Рік тому +8

      Not yet. The multiply it improved is over GF2. The multiplies used in AI are finite precision "reals". Aka floating point. 😁

    • @AndreiFierbinteanu
      @AndreiFierbinteanu Рік тому +2

      Thanos: I used matrix multiplication to improve matrix multiplication

    • @vorpal22
      @vorpal22 Рік тому +2

      I was going to say exactly what @tolkienfan1972 did... This only functions with entries in F_2 or GF(2) (same thing, different convention: F_2 means the unique finite field of two elements up to isomorphism, whereas GF(2) means the Galois field of two elements, and they are the same). In neural networks, the matrices used don't have entries from F_2: they have entries that are floating point numbers, and thus this improvement isn't applicable to neural nets.

    • @dieSpinnt
      @dieSpinnt Рік тому

      @@AndreiFierbinteanu
      Theranos: We used only one drop of your blood ...
      But here, look at this nice and colorful presentation about matrix multiplication.
      ... oh and btw. we also burned all of your investment capital for that. Good bye!

    • @xl000
      @xl000 Рік тому

      so what
      you should have a look at how compilers compile earlier versions of themselves, and the tricks they have to use in terms of the feature set of the first compilers.
      This is too long to explain in a comment and it's well documented in the Dragon book (a book on compilers)

  • @shrdshld
    @shrdshld 11 місяців тому

    Quaternions are used for 3D rotation: 16 multiplies versus 27. The problem with matrices is floating point; composing matrices increases the level of imprecision quickly.
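
For context, the count above refers to composing rotations: the schoolbook Hamilton product of two quaternions takes 16 scalar multiplications, versus 27 for multiplying two 3x3 rotation matrices. A minimal sketch:

```python
def quat_mul(q, r):
    """Hamilton product of quaternions (w, x, y, z): 16 scalar multiplications."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)
```

For example, the unit quaternions satisfy i*j = k under this product.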

  • @Leon_George
    @Leon_George Рік тому +5

    Correct me if I'm wrong, but I've always considered society to be pushed on by a few brilliant people, and then a lot of very intelligent people having their minds 'opened' let's say by a breakthrough that the likes of Da Vinci or Einstein developed. For example, as a very basic example, many secondary school pupils can get a good understanding of integration, but without Newton, none of them could ever have been able to perform such math.
    in this way, a 'normal' or 'bright' person is accelerated by the one-in-a-century geniuses that pop-up every now and then.
    Could AI act as a faster replacement for these geniuses, providing these new ideas and breakthroughs that accelerates what a 'bright' person could do?
    Are we replacing our geniuses? (not that this is necessarily bad, just are we)?

    • @okhsunrog
      @okhsunrog 11 місяців тому

      AI is still just a tool and it can't be compared to human's mind. Not yet. A lot of work is done by humans that stand behind this discovery

    • @Enju-Aihara
      @Enju-Aihara 8 місяців тому

      dude without farmers and food producers nobody could make any ai

  • @samopalvampirenvonbutlegin8603

    As long as this operation is numerically stable. Lots of algorithms lead to huge accumulation of operation-to-operation numerical errors.

  • @homerp.hendelbergenheinzel6649
    @homerp.hendelbergenheinzel6649 Рік тому +19

    "The biggest threat that arises from AI is that some people will conclude too early that they have fully understood it."

    • @jondo7680
      @jondo7680 Рік тому +3

      You are probably talking about general ai, this thing can just do one thing. Find better algorithms to do the multiplication.

    • @homerp.hendelbergenheinzel6649
      @homerp.hendelbergenheinzel6649 Рік тому

      @jondo7680 really, oh thanks for the explanation, I didn't know that, because I thought it was a Skynet prototype. What would we have done without your crystal clear and precise analysis. Please delete your stupid comment and I will delete this one here, which I would appreciate. Maybe you can stay away from the Internet for a couple of days as well.

    • @beepboop6212
      @beepboop6212 Рік тому +7

      @@homerp.hendelbergenheinzel6649 what the hell did he do to you, damn, you really have nothing better to do than diss internet strangers

    • @jondo7680
      @jondo7680 Рік тому

      @@homerp.hendelbergenheinzel6649 the only thing stupid here is your 🧠 my friend.

  • @troyfrei2962
    @troyfrei2962 Рік тому +1

    The problem is that almost ALL matrices are NOT JUST 0 or 1! So you're stuck with doing the FULL matrix multiplications. But good news: AFTER you have done the FULL matrix multiplications, those can be used as a lookup table to solve the answer you are working on.

  • @cheesySamar
    @cheesySamar 10 місяців тому +6

    Me who hasn't even graduated high school, watching this video and reading all the comments/arguments like I know better

  • @MathematicFanatic
    @MathematicFanatic Рік тому

    Both this result and this presentation of it are absolutely spectacular

  • @joaowiciuk
    @joaowiciuk Рік тому +13

    The final comment on mathematicians being replaced by computers was very cliché

    • @myxail0
      @myxail0 Рік тому +2

      it was a cliche question

    • @shannonbarber6161
      @shannonbarber6161 7 місяців тому +1

      There once was a job called "Computer".

  • @sspacegghost
    @sspacegghost Рік тому +1

    I was a maths savant in school - can't remember a single bit of it at 52. Would have no idea how to sigma an equation. I might be able to do some basic algebra and work stuff out using common sense, but it's all gone... like the sands of time.

    • @ictogon
      @ictogon 8 місяців тому +1

      Ain't no way a 52 year old listening to sgp

  • @sahilgoel4777
    @sahilgoel4777 Рік тому +6

    One can only imagine the success of Srinivasa Ramanujan if he had had access to these types of computers. I wish he were still alive to contribute to our society. 😢

  • @DrRiq
    @DrRiq Рік тому +2

    02:56 a third fewer is 33% less, not 67% less

  • @minuspi8372
    @minuspi8372 Рік тому +4

    This specific kind of technique isn't going to replace mathematicians, but other techniques, some that already work and just need to be scaled up, almost definitely will.

  • @ushi-l7z
    @ushi-l7z 3 місяці тому

    I remember that this technique was taught to us when we did our math major in mid 90's

  • @user-francescob
    @user-francescob Рік тому +3

    Nice work DeepMind, and the humans who collaborated with he/her. While matrices of 1s and 0s are common in many situations, did you ask DM to work on a general solution for all matrices? I'm hoping you'll put DM on solving Kronecker matrix multiplication next. Its resultant tensor grows much faster than the ordinary matrix product's - linearly vs. quadratically. Hence, a [4 x 4] x [4 x 4] ordinary product only results in a [4 x 4] matrix, whereas a [4 x 4] x [4 x 4] Kronecker product results in a [16 x 16] matrix! We use this a lot in physics.

    • @piyh3962
      @piyh3962 Рік тому

      Deep mind is a company and isn't gendered

    • @Anonymous-df8it
      @Anonymous-df8it Рік тому +3

      It's either "them" (the epicene) or "it" (the neuter)
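
On the Kronecker product mentioned above: the Kronecker product of an m x n matrix and a p x q matrix is mp x nq, so the dimensions multiply rather than stay fixed. A small pure-Python sketch:

```python
def kron(a, b):
    """Kronecker product of a (m x n) and b (p x q), giving (m*p) x (n*q):
    each entry a[i][j] is replaced by the whole block a[i][j] * b."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q]
             for j in range(n * q)]
            for i in range(m * p)]
```

So two 4x4 inputs give a 16x16 result, as the comment says, which is why the output sizes blow up quadratically with each product.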

  • @naysay02
    @naysay02 Рік тому +1

    thanks for the accessible yet detailed explanation. really cool

  • @alt3241
    @alt3241 Рік тому +3

    Since I do not use these methods because of the obvious bottlenecks , this presentation shows me a root aspect of the tunnel vision plaguing industry . This motivates me to work out an explanation for a presentation of my methods .

  • @HarshitaSingh-l7t
    @HarshitaSingh-l7t Місяць тому +1

    You can tell right away that you're professionals

  • @artieschmidt3039
    @artieschmidt3039 Рік тому +4

    Thank you for your work! Best math and tech channel ❤

  • @debblez
    @debblez Рік тому +1

    this video finally helped me understand tensors. also super cool

  • @MathOrient
    @MathOrient Рік тому +10

    Wow, this video is truly mind-blowing! Training the AI system AlphaTensor to explore faster algorithms for matrix multiplication is a remarkable feat with far-reaching implications. The potential impact on various disciplines is truly exciting to envision. Thank you for sharing this fascinating glimpse into the future of mathematical research and its potential real-world applications!

    • @novakonstant
      @novakonstant Рік тому

      get out of here with this goofy ass chatgpt comment

  • @valiile
    @valiile 11 місяців тому

    The algorithm for quaternion multiplication has a compactness that would be available to matrix multiplication if it used the same path of projecting the results onto the hypercube and back to the 3D cube in DeepMind's AlphaTensor. I bet a 6-pack of beer that it will improve on the actual best result.

  • @p.j.882
    @p.j.882 11 місяців тому

    MMULT function is matrix multiplication.... it's discussed in LINEST multiple linear regression completely explained in Google sheets.

  • @NataliaBazj
    @NataliaBazj 11 місяців тому +3

    «Wow! This algorithm uses fewer multiplications!» - «Aha, and it works only when multipliers are zeros or ones.» 😉

    • @Last_Resort991
      @Last_Resort991 11 місяців тому +1

      It's funny cause it for sure adds more additions than the amount of multiplications it saves. This should make it slower than the regular algorithm?!?!

    • @tone618
      @tone618 Місяць тому

      ​@@Last_Resort991addition is less computationally expensive than multiplication

    • @Last_Resort991
      @Last_Resort991 Місяць тому

      @@tone618 Not if you multiply by zero or one.

  • @holyshit922
    @holyshit922 Місяць тому

    O(n^3) is still polynomial complexity; imagine if you had exponential or factorial.
    About the pronunciation: in German, V is read as F, and their W is read as your V.
    "St" is read as "Sht".

  • @pyropulseIXXI
    @pyropulseIXXI Рік тому +5

    This is pure hyperbole; seasoned mathematicians haven't mastered matrix multiplication? Pure absurdity

    • @SmileTribeNetwork
      @SmileTribeNetwork Рік тому +2

      Maybe what the author means to say is that they have yet to find the most effective method of multiplying matrices.

  • @StephenMcGregor1986
    @StephenMcGregor1986 Місяць тому

    This could potentially do wonders for video codecs

  • @saliherenyuzbasoglu5819
    @saliherenyuzbasoglu5819 Рік тому +7

    Your videos are great! A lot of science videos on YouTube are bullshit filled with stock photos that don't even explain the concept the title promises

  • @argandzero0
    @argandzero0 Рік тому

    I'd say what Strassen did is more impressive, but kudos to the team over at Google. ;)

  • @QuantaScienceChannel
    @QuantaScienceChannel  Рік тому +169

    You can read more about this work on Quanta's website: www.quantamagazine.org/ai-reveals-new-possibilities-in-matrix-multiplication-20221123/
    Correction: At 2:53 in the video, the text previously read "67% less" but has been changed to "67%" for accuracy.

    • @simplicitas5113
      @simplicitas5113 Рік тому +1

      Error in percentage of reduction at ca 3 minutes. You meant 33 % reduction for 8 rows.

    • @LucenProject
      @LucenProject Рік тому +1

      11:50 What is the second word on the screen?

    • @Flovus
      @Flovus Рік тому +1

      @@LucenProject It's the initials of the team that published the paper "Discovering faster matrix multiplication algorithms with reinforcement learning".

    • @Flovus
      @Flovus Рік тому

      3:47 Why is the matrix broken into four 4x4 matrices instead of breaking it into 16 2x2 matrices?
      When I perform the new algorithm recursively on an arbitrarily large (N x N) matrix, I'd break it into 4*4=16 matrices of dimension (N/4)x(N/4). These "blocks" are then combined exactly the way the scalars in a 4x4 matrix are combined with 47 multiplications, only here the multiplications are performed recursively. This means that the input size N is divided by four in each step of the recursion and the runtime is N^{log_4(47)} ~ N^{2.777} by the Master theorem.
      Doing it the other way around as shown in the video would be performing Strassen on the topmost call (7 multiplications of problem size N/2), am I wrong?

    • @LucenProject
      @LucenProject Рік тому

      @@Flovus thanks!
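
The exponents discussed in this thread are easy to check numerically: Strassen's recursion gives N^(log_2 7) and a 47-multiplication scheme on 4x4 blocks gives N^(log_4 47):

```python
import math

# Master theorem: k recursive multiplications of (N/b)-sized blocks,
# plus O(N^2) additions, gives a runtime of N^(log_b k).
strassen_exp = math.log(7, 2)       # 7 products of N/2 blocks
four_by_four_exp = math.log(47, 4)  # 47 products of N/4 blocks
print(f"Strassen: N^{strassen_exp:.3f}, 47-mult 4x4 scheme: N^{four_by_four_exp:.3f}")
```

Both beat the naive N^3, and the 4x4 scheme's ~2.777 slightly beats Strassen's ~2.807.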

  • @themultiverse5447
    @themultiverse5447 5 місяців тому +1

    (Title of) video starts @ 3:55

  • @achatinaslak742
    @achatinaslak742 Рік тому +4

    Very interesting ! It went from 2-D to 3-D. Is it possible to use a 4-D space or higher, to reduce the number of steps even more?

    • @shoam2103
      @shoam2103 Рік тому +3

      A 3d tensor is used because it's multiplication of 2 2d tensors. So for higher dimensions you need to multiply higher order objects, or do something else. 4d space is naturally suited for 3d rotations for example..

    • @shoam2103
      @shoam2103 Рік тому +1

      You might be interested in geometric algebra and projective geometry too

    • @theepicturtleman
      @theepicturtleman Рік тому +1

      @shoam2103 I'm a bit confused. Are you saying when you multiply 2 matrices in R^2 you get a matrix in R^3? I don't believe that's correct

  • @Atomchild
    @Atomchild Рік тому +1

    It stands to reason that this discovery should be applied to the study of file compression.

  • @4Gehe2
    @4Gehe2 9 місяців тому +3

    I hate the whole "There are more options than atoms in the universe..." thing, because whenever that is brought up, it tends to be that basically everything but a few options gets eliminated very quickly once basic rules are introduced into the system. If you've got a huge floor and need to make a mosaic on it, your possibilities are absurdly huge if you just fill it with random tiles, but as soon as you set any kind of rules or boundaries, the possibilities start to reduce quickly, until a limit between "there is one solution" and "there is no solution" is reached.

  • @StephenC555
    @StephenC555 Рік тому

    If it only works for some cases, it may not be really useful in application. It is just like adding an "if" condition to check whether a case should use the old approach or the new approach to get a relatively faster result; that checking logic is also time-consuming, and the time consumed could instead simply be spent solving the calculation directly. It also only adds complexity to the straightforward calculation solution.

  • @tomkhxde3495
    @tomkhxde3495 Рік тому +5

    Correction: the Strassen algorithm has 33% fewer operations, not 67%, at 2:57. If 512 is 100%, then 343 is 67%, so it's 33% less.
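
The arithmetic behind the correction, for an 8x8 product done fully recursively (three levels of Strassen):

```python
# Naive 8x8 matrix multiply: 8^3 scalar multiplications.
naive = 8 ** 3            # 512
# Fully recursive Strassen on 8x8: 7 products per level, 3 levels.
strassen = 7 ** 3         # 343
ratio = strassen / naive
print(f"{strassen}/{naive} = {ratio:.0%} of the multiplications, i.e. {1 - ratio:.0%} fewer")
```

So 343 is about 67% *of* 512, which is roughly 33% *fewer* multiplications.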

  • @purringraven3648
    @purringraven3648 Місяць тому +1

    Strassen's algorithm is still far from obsolete. This method is only good for limited matrices with 1s and 0s

  • @profdc9501
    @profdc9501 Рік тому +6

    If one is just multiplying two 2x2 matrices with ones or zeros, there are only 8 input bits between them, so you can just use a 256-entry, 4-bit lookup table and get the answer in O(n^2) time. Then you just divide into blocks. It doesn't seem like the case of elements being zero or one is that interesting.

    • @Last_Resort991
      @Last_Resort991 11 місяців тому

      You don't need lookup tables; this can be implemented directly in hardware, and most ALUs (CPUs) should have it.
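
A sketch of the 256-entry table idea above: each 2x2 bit matrix packs into 4 bits, so a pair of them makes an 8-bit key and the product mod 2 fits in the 4-bit table value:

```python
def pack(m):
    """Pack a 2x2 bit matrix into a 4-bit integer, row-major."""
    return m[0][0] | m[0][1] << 1 | m[1][0] << 2 | m[1][1] << 3

def unpack(v):
    """Inverse of pack: 4-bit integer back to a 2x2 bit matrix."""
    return [[v & 1, (v >> 1) & 1], [(v >> 2) & 1, (v >> 3) & 1]]

def mul_mod2(a, b):
    """Direct 2x2 matrix product mod 2 (AND for multiply, XOR for add)."""
    return [[(a[0][0] & b[0][0]) ^ (a[0][1] & b[1][0]),
             (a[0][0] & b[0][1]) ^ (a[0][1] & b[1][1])],
            [(a[1][0] & b[0][0]) ^ (a[1][1] & b[1][0]),
             (a[1][0] & b[0][1]) ^ (a[1][1] & b[1][1])]]

# 256-entry table: key = (packed a) << 4 | (packed b), value = packed product.
TABLE = [pack(mul_mod2(unpack(i >> 4), unpack(i & 15))) for i in range(256)]

def mul_via_table(a, b):
    """Multiply two 2x2 bit matrices mod 2 with a single table lookup."""
    return unpack(TABLE[pack(a) << 4 | pack(b)])
```

After the one-time table build, each 2x2 block product is a single indexed load.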

  • @moestietabarnak
    @moestietabarnak Рік тому +2

    Ah, I'm a bit confused... doing matrix multiplication when you have only 0 or 1 can be done without any multiplication; you just replace the multiplication operation with an 'AND' gate, you know: 0x0=0, 0x1=0, 1x0=0, 1x1=1...

    • @Last_Resort991
      @Last_Resort991 11 місяців тому +1

      Yep, can even be implemented directly into hardware.

  • @FractalBob
    @FractalBob Рік тому +7

    Will a computer make mathematicians obsolete? I'll believe it when an AI resolves the Riemann Hypothesis.

    • @NeovanGoth
      @NeovanGoth Рік тому +5

      Then you'd still need mathematicians who are able to understand both the algorithmically generated proofs and the algorithms themselves. AI will always automate some tasks, while at the same time creating new tasks. The outcome won't be less human work, but higher efficiency for the same or even higher amount of human work.

  • @justkem3834
    @justkem3834 Рік тому

    The smile at 11:20 is devious.

  • @Blackwhite2277
    @Blackwhite2277 Рік тому +4

    It empowers top-notch people to do more. Everyone else is kind of screwed

    • @ariadame102
      @ariadame102 Рік тому +1

      Nothing new. Only top-notch people bother to study enough of the cutting-edge to be empowered by it, the rest don't. Tale as old as time. Nothing has changed, despite easy access to massive amount of information.

  • @nathanderhake839
    @nathanderhake839 Рік тому +2

    I don't see why we want to reduce the number of multiplications for binary numbers. Computationally, multiplying 2 bits should be the same difficulty as, if not easier than, adding 2 bits.

    • @Last_Resort991
      @Last_Resort991 11 місяців тому +1

      That's probably the reason why nobody found this genius algorithm in 50 years; no one in their right mind would search for an algorithm to replace binary multiplication with several binary additions to slow down their program.