Indeed. To not only admit, but actually welcome, verifiable new information that unseats old hypotheses is the hallmark of good science. I have no doubt that Kolmogorov carefully analyzed Karatsuba's proofs before fully accepting them (as any wise researcher would), but once he had confirmed their validity, he had the intellectual courage and integrity to embrace them. A scientist is not diminished when their hypotheses are disproved, because that is how we evolve the body of human knowledge, but some will diminish themselves by refusing to accept this with grace.
Props to Kolmogorov, he could have sent the paper in his own name without giving credit to an unknown student and take all the merit. The academic world is sometimes ruthless
@@marcnye9221 great that you've got that impression, but the reality is that more and more university professors are favouring producing quantity over quality of papers so they can earn "prestige", and the students are used as free labour to support that.
Sure B Gates gave credits to computer scientist, sure, B Gates is well known for it. In reality however Gates stole almost everything and forced many people over the edge to afterlife. Dr Gary Kildall his Wikipedia can enlighten you how B Gates haircut fooled him when B Gates stole his 10 year of work developing an operating system and the BIOS all computers once had.
Read the comment section on any WW2 obscure event which has an insignificant effect on the war. "Why didnt I learn about this in school? Clearly it's a conspiracy against America!" I know youre joking, but that attitude is so common.
@@jakewalklate6226 Spoken like a true mathematician. "Clearly Stalin invaded at the point due to numerical superiority over Finland. What about mathematical history? I'm still never entirely sure why we use 360 degrees apart from ease of use and something about Babylonians.
I love how you bring nearly-unreachable knowledge to the community through interesting and easy-to-understand videos. I would never know this bit of theoretical CS otherwise. Keep up the good work!!!
& I also £♡✌️€ , how I'm studying Russian language, ,& this Shows me RUSSIAN letters, names, historical events & People! Spasibo. Спасибо. (Thank You).
For anyone curious at 13:38 N^1.6 is used as an approximation. It's really N ^ log base 2 of 3. If you want to enter it into a calculator use the change of base formula. Log(3) / Log(2)
Well, every O(n^log2(3)) algorithm is also an O(n^1.6) algorithm, so the video is fully correct in approximating that number while not labeling the whole thing as approximated. Though I personally do like to state bounds like that exactly. Θ notation is a good way to do that (it just means both O and Ω).
taking log2 of 3 to be 1.58 (it's not, it's much closer to 1.6, I've added about 30% difference here) this difference doesn't break 10% until N=194, in base 10 that's; 10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 it doesn't break 5% until N=13, or one trillion in base 10, so the discrepancy grows at a painfully slow logarithmic curve; 1,000,000,000,000 For all human use cases, N^log2(3) = N^1.6
Between Karatsuba and FFT there is a Toom-Cook algorithm, from 1963-66. As FFT, it treats both numbers as polynomials, evaluate the values naivly in some points (for small numbers! Like 0,+-1, -2,+inf), multiply them and then interpolate it back to polynomial form. "2 way" toom-cook recreates Karatsuba. The original "3 way" and "for way" have the complexity O(N^1.465) and O(N^1.404). The GMP library (a hefty library for big numbers) uses naive, Karatsuba, "3","4","6.5" and "8.5-way" toom-cook, and fft, using each algorithm for numbers of different lengths.
It's amazing how a simple problem like multiplication can devolve into such complex mathematical discoveries. Who would have thought that multiplying optimally is insanely more difficult than adding.
I set out, one afternoon, to write a large number library, just for my own edification. When I got to division, I realized I didn't actually know how to program a computer to divide, other than using built in divide. Sometimes simple things are not as simple as they seem :)
@@AntoineViallonDevelloper Euclid's algorithm is an efficient method for computing the greatest common divisor (GCD) of two integers (numbers) but it uses division itself, so isn't useful to Michael.
Even if the lower bound is Ω(N × log N), there is still mathematical progress to be made (or disproven) in finding an algorithm which is that efficient with smaller and smaller inputs.
@@Ruhrpottpatriot although memory is cheap it's not available in everything. for example if we wanted to use this method we would run out of memory in an Arduino quickly, but i do agree that if we have the memory to spare then this would be the fastest solution.
14:47 That was honest of Kolmogorov. I have met a few people in my career who would pretend to have done work which was actually done by someone else. They would then take the credit for the other person's work.
17:20 note that also loglogN is practically constant like the k^log*(n) since loglog(N) where N is the numbers of atoms in the observable universe is around 8. If N is the number of atoms in the observable universe then loglogN is actually smaller than 4^log*(N).
@@daldi5211 in practice in IT we would use base 2 as we can approximate it by counting bits from the ones bit (which we count as zero) up to the largest bit with a value of 1. However there is a theorem that states that to change a log from one base to another we can multiply by a constant that depends only on the two bases. And we know from earlier that we can ignore constant multipliers. So you can apply this rule in any base you like and it still works.
Fun fact: When computers are multiplying whole numbers, the compiler will often optimize the code, so it doubles (or halves) the number one or more times (which is a single operation in the computer, known as bit shifting) and then add or subtract a konstant to achieve the result. So a code of x * 9 (which is (x * 8) + 1) would be compiled as an equivalent to (x
very interesting, do all compilers do this? is there a way to force this specific method if i notice the compiler isnt doing it? Also, i think you meant to write (x * (8+1)) or even more descriptive, (x * (2^3+1))
@@hoane6777 In general, no, you can't force a compiler to do something not specified by the language. You have to write the code the way you want it, and that's almost always a bad idea for maintainability. Also, if you're working with numbers big enough for calculation speed to matter, the compiler won't know to optimize anything, because the calculations will span multiple variables of the largest builtin type (e.g. 64 bits). If you're working on cryptographic code, you need to worry about how the calculations are performed for more than just speed. If a calculation takes a different number of steps depending on the value of a key, for example, that weakness can be used by attackers to retrieve the value of the key.
I was astonished to learn that my C64 didn‘t have a Mult opcode, and to multiply any number by for example ten, it had to multiply by eight (shift three times) and then add the input value two times. What a hassle!
@@DasHemdchen After the shift left 3 bits, it didn't shift the original left 1 bit to get a double to add to the eight to arrive at ten? Weird... or shift left 1 for double, then shift that result 2 to get 8x. If you've come across the binary exponentiation algorithm (square and multiply), you could use successive bit shifting and addition to arrive at multiplication for arbitrary numbers. As a bonus, if you want the answer mod some n, you can just subtract n after each single bit shift left that yields n or greater, instead of programming division.
Wonderful video :) I am writing an Algorithms exam next week and wanted to take a break from learning but ended up learning about the algorithm more than in my lecture and in a more exciting and relaxing way. Thank you for this masterpiece and wonderful editing!
I have to say, I can't even remember how many times I have learned Big O notation already, but it's the first time in my life I heard about Linear speedup theorem. Like, it suddenly explained everything. I suddenly understand why the linear magnitude does not matter.
I have just discovered this channel and the animations and the gradients are so beautiful, the content, so mesmerising, that I instantly subscribed. Thank you.
Content like this is why I still pay my internet bill. Thoughtfully presented, beautifully explained, and utterly fascinating even to a cynical math-o-phobe like me. Eighteen minutes well spent. I look forward to future content as a new subscriber. Bravo!
what an incredible video, you have a real talent for getting at the core of these ideas and showcasing the clear arguments which easily get muddled by technicalities
I have seen fast multiplication on Commodore 64 (6502 processor without a built-in multiplier) based on a similar idea. a*b = ( (a+b)/2 )^2 - ( (a-b)/2 )^2. For all possible values of a+b and a-b, the square of a half is precalculated in a table; so for 8-bit numbers, 512 precalculated table entries are needed. This is easily a few times faster than trivial multiplication.
@@Nemean I just knew from the previous one (Quake Inverse Sqrt Algorithm), and damn this video was really great. It had some really unexplainable feel at the end (all the multiplication-algorithms and their runtime). It was super informative and very interesting in fact. 👍👍👍👍👍👍👍👍👍👍👍👍👍
Fascinating algorithm and historical context. Thanks for sharing this and for explaining it so lucidly. For those who aren't old enough to remember the old days of computing, one of the reasons multiplication was of such interest is that early CPUs did not have a multiply instruction in hardware. They relied on repeated addition, so if you wanted 58 * 37 it was computed as 58 + 58 + 58 ..... (37 times total), or vice-versa. I'm not sure if the first computers even had the hardware smarts to swap the numbers so they added the larger number a smaller number of times. Repeated addition is often even slower than the O(N**2) elementary school algorithm, so computer scientists were eager for anything that could improve upon that. Also for the non-computer folks, Nemean makes the comment that subtraction is essentially the same problem as addition. You know from grade school that subtracting N is the same as adding -N, of course, but it might occur to you that -N is defined as -1 * N, which seems to imply a hidden multiplication step. Fortunately, since computers work in binary, we avoid that by using the "twos complement". In binary, this means flip every bit of the original number, which gives you the "ones complement", then add one. Adding the twos complement of N to another number, say M, is the same as computing M - N. Here's an example using 8-bit integers, a common size for early CPUs, to compute 100 - 35. 100 is 64 + 32 + 4, or 01100100 binary. 35 is 32 + 3, or 00100011 binary. Take the ones complement of 00100011 to get 11011100, then add one for the twos complement of 11011101. Adding 01100100 to 11011101 gives (1)01000001. The parentheses are around the carry bit, which in this situation we ignore (see note below). 01000001 is 64 + 1, or 65 decimal, the answer we expect. Even in very early computers, the operations to invert every bit (ones complement) and to add one (increment) were single hardware instructions, so the twos complement took at most two steps (and some CPUs had a single instruction to combine them). So subtraction, even on an early CPU with no subtract instruction, was not significantly more difficult than addition. The use of twos complement binary arithmetic does imply a need to keep track of that leftmost bit and being aware of whether it is being used as a sign (1 for negative, 0 for positive) or simply as another binary digit. Programmers can define "signed integers" which cut the value's range in half but allow negative numbers, or "unsigned integers" which allow the full range but cannot be less than zero. For instance, a 16-bit unsigned integer can be 0 to 65535, inclusive, while a 16-bit signed integer can instead be -32768 to +32767, inclusive. The CPU hardware, generally, handles the raw bits the same, but the programming language and compiler help the programmer avoid misinterpreting the data. I hope this side-trip into computer history and binary math is useful to readers who aren't computer specialists.
To give an example, the Mostek / Rockwell 6502 (of Apple II fame) had add and subtract but no multiply or divide instructions, but the Motorola 68000 (Macintosh) had them. These chips hit the market only 4 years apart.
9:30 and 16:00 I think it would've been better if you used actual numbers and showed a practical example of the calculation instead of empty digit boxes/partially filled circle shapes, it would be easier to keep track on and follow what you're talking about. Since the video started with practical examples for the easier algorithms I also was expecting practical examples for the more complicated algorithms. Having to follow where you put which blank box or which abstract circle is filled by how much and trying to find out why you gave the circles these fill values while at the same time also trying to listen to what you are saying is rather irritating.
I love this subject, mainly because it is both quite recent and revolutionary, in a way, as well as rather easily understood by a teenager. Every now and again I talk about it to my students, and it is usually well received.
Thanks for explaining this. I vaguely remember running into this algorithm, but discarded it because it recursed without really reducing complexity. After watching your explanation, I realized that if I restrict the recursion depth, I might get something usable.
@@AffidavidDonda ?? As if Lenin is at all praiseworthy?? I’m sure his black charcoal of a heart is still providing fuel for the fires of “oh hell” tho…. It’s for the despicable evil, deliberately propagated like deadly contagions still infecting the minds the of the vulnerable, mentally weak, and those victims with “compromised intellectual immunity” who had their natural defenses of logic, reason, and objective observation castrated by atrophy, shriveled and withered like undesirable testicles on the proverbial farm hog, resulting from the constrictive rubber bands of indoctrination posing as education by Marxist operatives posing as teachers, all susceptible and succumbing to the mental viruses created and propagated by Marx, Lenin, and the rest of the monsters of yesterday and today, that cause lapses in my Agnosticism to pray that there is a heaven for some and a well deserved hell for others.
This is such an excellent lecture and presentation, you make a great professor! I'm saying this as a computer scientist. It was educating and entertaining at the same time. Kolmogorov was a genius, albeit not infallible. We are standing on the shoulders of giants. Thanks for educating us! I learned new stuff from this video.
The cadence and tone of your voice is very pleasant to listen to. It reminds me of the JCS channel. Thank you very much for teaching me with such detailed and illustrative information.
14:13 You say "It's ONLY application is cryptography" as if that's a weakness or flaw. The fact that it can offer real optimization on modern hardware for a certain subset of applications is awesome and amazing and valuable. Also, even if it had NO practical applications, it's still awesome because it disproved a prevailing theory. Even if it didn't do that, it demonstrated a new kind of algorithm. It did THREE amazing things. And you say, "I dunnno" :D
Just a heads up - there's a terminology mistake at 1:00 . "Addition" should be translated as "Сложение" in Russian, while "Дополнение" in English would be "Complement" term from Set Theory.
I feel like when you were talking about big O, there were some big aspects you missed. In particular, one of the big reasons big O is important is that it better measures how an algorithm scales to extremely large inputs. While the big O might not be able to tell you an exact runtime, it can tell you how that runtime changes when you change the input. For example, for an O(n) algorithm, doubling the size of the input make it take twice as long as before, while an O(n^2) algorithm will take 4 times as long, and an O(log n) algorithm will only take a constant amount of time more. The ways that different algorithms scale tends to be more important than any constant factors when n is extremely large. For example, the runtime of an O(n) algorithm might be like 10n, while an O(n^2) algorithm might be n^2/10; with small n, the O(n) algorithm is slower due to the high overhead (for n=4, the first algorithm is 40 while the second is 1.6), but as n increases, the difference in powers rapidly overcomes the constant factors (for n=10,000, it’s 100,000 versus 10,000,000, so the first is a hundred times faster). That’s why we talk about big O in algorithms; when the input is big enough that runtime is a concern, that’s what gives you a real idea of the runtime.
An O(log n) algorithm isn't constant, but would be proportional to natural logarithm, O (n log n) is more feasible as just processing the input is O(n).
@@RobBCactive I wasn’t saying log n was constant time; I was saying for a log n algorithm, doubling the input length would increase the time by a constant amount, since log 2n = log n + log 2.
@@KnakuanaRka No you specifically said, "and an O(log n) algorithm will only take a constant amount of time more", read your post adding log 2 or log 3 to log n for factors of N is NOT a constant
@@RobBCactive Well, if log n is the runtime for input of length N, adding log 2 for the runtime of 2N is effectively a constant amount more (since it doesn’t depend on N); what’s the problem?
i kind of understood this. thank you for this video. I often come back to your first video when i need inspiration how to change way of thinking when i search for answer.
In the time when Kolmogorov was at the age of Karatsuba (when they met) there was no Fast Fourier Transform, but on the other hand Parseval theorem was already stated in the 18th century - kids read the books and study them ! :)
Can we all agree that this person should make more videos? Explaining complex computer science and mathematics concepts in simpler terms is something we all need in our lives.
If you do the FFT in _modular_ arithmetic (which uses only integers) you get multiplication with no rounding error because you don't need to worry about floating-point arithmetic. The algorithm is the same.
I'm glad I clicked on this video, the thumbnail made it look like it was going to be a dumb math method that still works but overcomplicates things/does the exact same thing the normal method does but displayed slightly differently but clickbaited as "a new faster way to do math" but it turned out to actually be more efficient (as the length of the numbers increases)
This whole video is incredibly interesting and explains lots of things very well, but I am laughing so hard at 17:00 . The deadpan delivery of that line “log star of the number of atoms in the universe... is five.”
Hmmm…reminds me of another algorithm dealing with linear programming (LP). LP is theoretically a N^x steps where x is the number variables. There is a Russian algorithm that has O(N) steps, but the slope of T (time) is so steep it might as well be a quadratic equation. I wrote a paper on this 40 years ago for one of CS Master’s classes after reading about it in a programming journal. The math was so obscure (or maybe the Russian was so obscure) that I had to go back to the original paper to get the algorithm correct. It was a fun project, as I was really interested in linear programming at the time. Seems I fell in love with CAD, though.
I looked at this video on a random whim and I'm glad i did! Very well explained video on a topic that can be difficult to follow. As others have mentioned, practical examples may have worked better than the colored blocks used, as this would allow the audience to follow along in an easier fashion. Thanks!
Another reason computer scientists only examine order of algorithm time is because of asymptotic behavior. If an algorithm has a lower order than another one, then there is a sufficiently large problem for which it will run faster. So no matter how good your linear constants, eventually a slow (high order) algorithm would lose. Also, as we progress socially usually we seek to solve larger problems, so it merits investigating orders. On the other hand, precise run times are not useless. But you always assume a machine model. As you noted, if your machine model is not super precise (like the 64-bit vs 1-bit addition), then your results may be useless in practice. The specificity can be so large that it's not worth it to tune something that no one will ever use -- it makes more sense to just do that when you have a concrete machine and important problem to solve. That's essentially software engineering/computer programming -- it's important, but it's specific. In mathematics and academia the idea is usually to generalize and provide lasting value, versus engineering is solving specific problems (which you can publish as well in the hope it will be useful to similar machines). One interesting case of very useful precise run times is in analyzing digital circuits. Almost every real world implementation of digital circuits uses similar implementation (binary gates with minor differences in area). So I believe there is detailed study of how many digital gates you need to achieve some operation (very interesting and important field).
Great video! And as a Russian-speaking person I want to notice that mathematical operation "addition" is called "сложение" not "дополнение". The term's you used meaning is "a minor member of a sentence, usually expressed as a noun". Best Regards!
дополнение means "addition" in the sense of supplement or expansion (i.e., it would be used in sentences like "the addition of a new terminal to the airport", or "with added vitamins").
At first I was like "Meh, just show me the algorithm, I don't want to watch such a long video," but it is so well-made I can only say thank you for your work!
I'll be honest, I usually hate long videos too. Luckily there was this contest and I made a 1 minute version of my main video for it: ua-cam.com/video/LCY4dnm88oI/v-deo.html . If you want to share Karatsuba's algorithm with a friend or whatever, that video is a good teaser.
You said that this algorith is not "really recomended" for "popular" use, but coincidentally, last week I saw I probability class on youtube where the teacher tells an "easy way" to multiply 2 digits number by 11 without previously memorizing its results: whatever 2 digits number multiplied by 11 results in a 3 digit number which first and last digits are the same original first and last digit, and the digit in-between is obtain by adding them, really simple (and if the addition has a carrier you just add it to the first one - here reading from left to right)... seen your video I have realized why works and also that can be extended to every 2x2 digit multiplication.... I recomend you edit the video and incorporate this "eleven case example" because of its simplicity (since you always multiply digits by "one", and maybe another case as counterexample), but this "11 rule" is just amazing. Try it by yourself.
Excellent video. I was always fascinated by these "dynamic programming" style algorithms. They appear everywhere. Of course I'm just a rando on the internet, but I think such a video on the evolution of matrix multiplication would be equally if not more interesting. Strassen (the same one you mentioned) played a role there too, and a similar (sort of) trick is used in that case. However such a video would probably be extremely long since it's still an open question. Then again, even this video covers an impressive scope.
You're not the only one who recommended matrix multiplication, but I'm not so convinced. Strassen's algorithm is a similar idea to Karatsuba's (instead of 3 digits expressed as a combination of 3 products, it's 8 matrices expressed as 7 products) and the more sophisticated optimisations use tensors heavily and I don't know how to easily explain that (yet). I'll have to think about it.
@@Nemean I already read that in your soothing voice, so please think some more about it ;). Subbed your channel btw, eagerly awaiting more content. You rock!
I love the super subtle humor here, hahaha. The explanatory visuals are also great even though I feel it still quickly becomes a lot to digest for us lower peasants. YT needs more of this. Thank you.
i always knew that constants are avoided during calculation of time complexity, and now you have explained the reason pretty well. Thanks 😊🙏 My teachers: They are avoided Period 😑😑
The moment i saw 1729 i screamed Ramanujan!!.... That's the smallest positive integer that can be written as the sum of 2 cubes in 2 different ways!! 12³ + 1³ = 1729 = 10³ + 9³ Now I don't know about why it showed up here but i was just excited by the observation :D
Kolmogorov is one of the coolest men I've heard of. Admitting defeat and then anonymously supporting the kid. wild
Really curious people wants their ideas to be scrutinized. They seek knowledge.
Soviets and their communism.
nowadays you will get "researchers" sponsored by pharma/oil/any companies.
I know of Kolmogorov mainly from my work in statistical analysis. There he is, basically, a god.
Unthinkable nowadays
Nowadays he would have canceled him and his career for the „crime“ of being right
Indeed. To not only admit, but actually welcome, verifiable new information that unseats old hypotheses is the hallmark of good science. I have no doubt that Kolmogorov carefully analyzed Karatsuba's proofs before fully accepting them (as any wise researcher would), but once he had confirmed their validity, he had the intellectual courage and integrity to embrace them. A scientist is not diminished when their hypotheses are disproved, because that is how we evolve the body of human knowledge, but some will diminish themselves by refusing to accept this with grace.
Props to Kolmogorov, he could have sent the paper in his own name without giving credit to an unknown student and take all the merit. The academic world is sometimes ruthless
@@marcnye9221 Corporate is eroding that, now quicker than ever.
@@marcnye9221 while in engineering... :/
@@marcnye9221 great that you've got that impression, but the reality is that more and more university professors are favouring producing quantity over quality of papers so they can earn "prestige", and the students are used as free labour to support that.
Sure B Gates gave credits to computer scientist, sure, B Gates is well known for it. In reality however Gates stole almost everything and forced many people over the edge to afterlife. Dr Gary Kildall his Wikipedia can enlighten you how B Gates haircut fooled him when B Gates stole his 10 year of work developing an operating system and the BIOS all computers once had.
Incorrect. I can name atleast one Ruth in the academic field (Ruth Aaronson Bari)
I think the fact we don't teach fast fourier transform in elementary school says a lot about society.
We should replace the early education curriculum with theoretical computer science and graph theory
Read the comment section on any WW2 obscure event which has an insignificant effect on the war. "Why didnt I learn about this in school? Clearly it's a conspiracy against America!"
I know youre joking, but that attitude is so common.
@@letsburn00 well there will be no history at all once I’m done with it, mathematics only
@@jakewalklate6226 Spoken like a true mathematician. "Clearly Stalin invaded at the point due to numerical superiority over Finland.
What about mathematical history? I'm still never entirely sure why we use 360 degrees apart from ease of use and something about Babylonians.
@@letsburn00 the use of 60 and 360 is because 60 is divisible by a lot of numbers. 1,2,3,4,5,6,10,12,15,20 and 30. Easy for calculator-less times.
I love how you bring nearly-unreachable knowledge to the community through interesting and easy-to-understand videos. I would never know this bit of theoretical CS otherwise. Keep up the good work!!!
& I also £♡✌️€ , how I'm studying Russian language, ,& this Shows me RUSSIAN letters, names, historical events & People!
Spasibo. Спасибо. (Thank You).
For anyone curious at 13:38 N^1.6 is used as an approximation. It's really N ^ log base 2 of 3. If you want to enter it into a calculator use the change of base formula. Log(3) / Log(2)
Well, every O(n^log2(3)) algorithm is also an O(n^1.6) algorithm, so the video is fully correct in approximating that number while not labeling the whole thing as approximated.
Though I personally do like to state bounds like that exactly. Θ notation is a good way to do that (it just means both O and Ω).
taking log2 of 3 to be 1.58 (it's not, it's much closer to 1.6, I've added about 30% difference here) this difference doesn't break 10% until N=194, in base 10 that's;
10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
it doesn't break 5% until N=13, or one trillion in base 10, so the discrepancy grows at a painfully slow logarithmic curve;
1,000,000,000,000
For all human use cases, N^log2(3) = N^1.6
This is what I studied in my 200-level, 300-level, and 400-level computer science algorithms class. Good explanation!
Between Karatsuba and FFT there is a Toom-Cook algorithm, from 1963-66. As FFT, it treats both numbers as polynomials, evaluate the values naivly in some points (for small numbers! Like 0,+-1, -2,+inf), multiply them and then interpolate it back to polynomial form.
"2 way" toom-cook recreates Karatsuba. The original "3 way" and "for way" have the complexity O(N^1.465) and O(N^1.404). The GMP library (a hefty library for big numbers) uses naive, Karatsuba, "3","4","6.5" and "8.5-way" toom-cook, and fft, using each algorithm for numbers of different lengths.
uhm waht? :sweat_smile:
It's amazing how a simple problem like multiplication can devolve into such complex mathematical discoveries. Who would have thought that multiplying optimally is insanely more difficult than adding.
Would you really not expect multiplication, which is basically an extension of addition, to be harder to optimize, than its more basic "counterpart"?
@@SnakeTwix Yes, but not that much harder.
I set out, one afternoon, to write a large number library, just for my own edification. When I got to division, I realized I didn't actually know how to program a computer to divide, other than using built in divide. Sometimes simple things are not as simple as they seem :)
@@michaelbauers8800 just use Euclid's algorithm for integers.
@@AntoineViallonDevelloper Euclid's algorithm is an efficient method for computing the greatest common divisor (GCD) of two integers (numbers) but it uses division itself, so isn't useful to Michael.
Kolmogorov also advanced the study of fluid flow turbulence so much that they named a constant after him and still refer to his work to this day!
Kolmogorov have done a lot for statistics and random processes
Even if the lower bound is Ω(N × log N), there is still mathematical progress to be made (or disproven) in finding an algorithm which is that efficient with smaller and smaller inputs.
Look-up table!😜
@@icollectstories5702 O(1) multiplication?
@@icollectstories5702 it's fast but not memory efficient.
@@diamondcreeper0982 You always have the trade-off between speed and memory and as it currently goes, memory is cheap.
@@Ruhrpottpatriot although memory is cheap it's not available in everything.
for example if we wanted to use this method we would run out of memory in an Arduino quickly, but i do agree that if we have the memory to spare then this would be the fastest solution.
The legend is back
14:47 That was honest of Kolmogorov. I have met a few people in my career who would pretend to have done work which was actually done by someone else. They would then take the credit for the other person's work.
yeah you know, people learned the lesson after all the Leibniz/Newton kerfuffle over calculus
17:20 note that also loglogN is practically constant like the k^log*(n) since loglog(N) where N is the numbers of atoms in the observable universe is around 8. If N is the number of atoms in the observable universe then loglogN is actually smaller than 4^log*(N).
What base do you use for the log?
@@daldi5211 2
@@daldi5211 in practice in IT we would use base 2 as we can approximate it by counting bits from the ones bit (which we count as zero) up to the largest bit with a value of 1.
However there is a theorem that states that to change a log from one base to another we can multiply by a constant that depends only on the two bases. And we know from earlier that we can ignore constant multipliers.
So you can apply this rule in any base you like and it still works.
it's not a theorem, it's just a simple property deduced from the definition of the logarithm :)
Fuck, I actually checked it for log(log(10^82)) and it truly does round to 8. Granted I did it in my mind, but it does check out.
I loved fast inverse square root and finally you've released some more videos! Makes my day. Take however long you want, they're worth it.
Fun fact: When computers are multiplying whole numbers, the compiler will often optimize the code, so it doubles (or halves) the number one or more times (which is a single operation in the computer, known as bit shifting) and then add or subtract a konstant to achieve the result.
So a code of x * 9 (which is (x * 8) + 1) would be compiled as an equivalent to (x
very interesting, do all compilers do this? is there a way to force this specific method if i notice the compiler isnt doing it? Also, i think you meant to write (x * (8+1)) or even more descriptive, (x * (2^3+1))
(x * 8) + x
@@hoane6777 In general, no, you can't force a compiler to do something not specified by the language. You have to write the code the way you want it, and that's almost always a bad idea for maintainability. Also, if you're working with numbers big enough for calculation speed to matter, the compiler won't know to optimize anything, because the calculations will span multiple variables of the largest builtin type (e.g. 64 bits). If you're working on cryptographic code, you need to worry about how the calculations are performed for more than just speed. If a calculation takes a different number of steps depending on the value of a key, for example, that weakness can be used by attackers to retrieve the value of the key.
I was astonished to learn that my C64 didn‘t have a Mult opcode, and to multiply any number by for example ten, it had to multiply by eight (shift three times) and then add the input value two times. What a hassle!
@@DasHemdchen After the shift left 3 bits, it didn't shift the original left 1 bit to get a double to add to the eight to arrive at ten? Weird... or shift left 1 for double, then shift that result 2 to get 8x.
If you've come across the binary exponentiation algorithm (square and multiply), you could use successive bit shifting and addition to arrive at multiplication for arbitrary numbers. As a bonus, if you want the answer mod some n, you can just subtract n after each single bit shift left that yields n or greater, instead of programming division.
Wonderful video :) I am writing an Algorithms exam next week and wanted to take a break from learning but ended up learning about the algorithm more than in my lecture and in a more exciting and relaxing way. Thank you for this masterpiece and wonderful editing!
I have to say, I can't even remember how many times I have learned Big O notation already, but it's the first time in my life I heard about Linear speedup theorem.
Like, it suddenly explained everything. I suddenly understand why the linear magnitude does not matter.
I have just discovered this channel and the animations and the gradients are so beautiful, the content, so mesmerising, that I instantly subscribed.
Thank you.
Content like this is why I still pay my internet bill. Thoughtfully presented, beautifully explained, and utterly fascinating even to a cynical math-o-phobe like me. Eighteen minutes well spent. I look forward to future content as a new subscriber. Bravo!
what an incredible video, you have a real talent for getting at the core of these ideas and showcasing the clear arguments which easily get muddled by technicalities
I'm not into computer science or even math, yet still here to watch video until finished.
Hahaha I loved your inverse FFT notation. Well played, well played.
I have seen fast multiplication on Commodore 64 (6502 processor without a built-in multiplier) based on a similar idea. a*b = ( (a+b)/2 )^2 - ( (a-b)/2 )^2. For all possible values of a+b and a-b, the square of a half is precalculated in a table; so for 8-bit numbers, 512 precalculated table entries are needed. This is easily a few times faster than trivial multiplication.
Wow. Interesting.
Omg he published again!!!! The god returned yeeeesss
Your content is so high quality, can't emphasize this enough.
How do you know? You commented 2 minutes after the video got published, there's no way you have watched it all yet. Maybe my video sucks.
@@Nemean it slapped bro
@@Nemean I just knew from the previous one (Quake Inverse Sqrt Algorithm), and damn this video was really great. It had some really unexplainable feel at the end (all the multiplication-algorithms and their runtime).
It was super informative and very interesting in fact.
👍👍👍👍👍👍👍👍👍👍👍👍👍
@@notbob9865 Thanks :)
@@Nemean 10x playback speed
I really enjoyed this, good to see more coming from this channel. Excitedly looking forward for more!
Fascinating algorithm and historical context. Thanks for sharing this and for explaining it so lucidly.
For those who aren't old enough to remember the old days of computing, one of the reasons multiplication was of such interest is that early CPUs did not have a multiply instruction in hardware. They relied on repeated addition, so if you wanted 58 * 37 it was computed as 58 + 58 + 58 ..... (37 times total), or vice-versa. I'm not sure if the first computers even had the hardware smarts to swap the numbers so they added the larger number a smaller number of times. Repeated addition is often even slower than the O(N**2) elementary school algorithm, so computer scientists were eager for anything that could improve upon that.
Also for the non-computer folks, Nemean makes the comment that subtraction is essentially the same problem as addition. You know from grade school that subtracting N is the same as adding -N, of course, but it might occur to you that -N is defined as -1 * N, which seems to imply a hidden multiplication step. Fortunately, since computers work in binary, we avoid that by using the "twos complement". In binary, this means flip every bit of the original number, which gives you the "ones complement", then add one. Adding the twos complement of N to another number, say M, is the same as computing M - N.
Here's an example using 8-bit integers, a common size for early CPUs, to compute 100 - 35. 100 is 64 + 32 + 4, or 01100100 binary. 35 is 32 + 3, or 00100011 binary. Take the ones complement of 00100011 to get 11011100, then add one for the twos complement of 11011101. Adding 01100100 to 11011101 gives (1)01000001. The parentheses are around the carry bit, which in this situation we ignore (see note below). 01000001 is 64 + 1, or 65 decimal, the answer we expect.
Even in very early computers, the operations to invert every bit (ones complement) and to add one (increment) were single hardware instructions, so the twos complement took at most two steps (and some CPUs had a single instruction to combine them). So subtraction, even on an early CPU with no subtract instruction, was not significantly more difficult than addition.
The use of twos complement binary arithmetic does imply a need to keep track of that leftmost bit and being aware of whether it is being used as a sign (1 for negative, 0 for positive) or simply as another binary digit. Programmers can define "signed integers" which cut the value's range in half but allow negative numbers, or "unsigned integers" which allow the full range but cannot be less than zero. For instance, a 16-bit unsigned integer can be 0 to 65535, inclusive, while a 16-bit signed integer can instead be -32768 to +32767, inclusive. The CPU hardware, generally, handles the raw bits the same, but the programming language and compiler help the programmer avoid misinterpreting the data.
I hope this side-trip into computer history and binary math is useful to readers who aren't computer specialists.
Great comment
Very helpful 👍🙂
wait, didn't booths multiplication algorithm exist back then? I'm surprised they used repeated addition
To give an example, the Mostek / Rockwell 6502 (of Apple II fame) had add and subtract but no multiply or divide instructions, but the Motorola 68000 (Macintosh) had them. These chips hit the market only 4 years apart.
So subtracting, is adding a negative number: 8 - 2 = 8 + (-2)
And to add a negative number you subtract its absolute value? 8 + (-2) = 8 - |-2| ...
... *= 8 - 2 = 8 + (-2) = 8 - |-2| = 8 - 2 =* ... forever..
9:30 and 16:00 I think it would've been better if you used actual numbers and showed a practical example of the calculation instead of empty digit boxes/partially filled circle shapes, it would be easier to keep track on and follow what you're talking about. Since the video started with practical examples for the easier algorithms I also was expecting practical examples for the more complicated algorithms. Having to follow where you put which blank box or which abstract circle is filled by how much and trying to find out why you gave the circles these fill values while at the same time also trying to listen to what you are saying is rather irritating.
I had the same thoughts
I'm glad someone else pointed this out. I got lost and felt that if I just had a real example to go off of it'd be much easier to follow
Click on his channel. The second video does exactly that
@@DrDeuteron If you're _not_ bothered by getting lost, I'd say that's a sign of complacency.
@@DrDeuteron just admit you don't know wtf is going on and move on lol
The Quality and The Content is top notch! Thanks for sharing!
This story about Kolmogorov and Karatsuba should be made into a film so that more people know it
I'm super excited to watch this later when I'm done with work. Thank you for the amazing content!
I believe Karatsuba's algorithm is used in quantum computing as the current fastest/most efficient method of multiplication.
I really like that the best multiplication algorithm uses the Ramanujan-Hardy number.
My guess is there would be a square root symbol in it somewhere…
Hi YellowBunny!
Hi Estepario
I love this subject, mainly because it is both quite recent and revolutionary, in a way, as well as rather easily understood by a teenager. Every now and again I talk about it to my students, and it is usually well received.
He was so impressed with this young kid's genius that he did all the work for him as a gift.
Glad I had my notifications on, Welcome back! Thanks for yet another informative video that is surprisingly easy to understand :DD
Thanks for explaining this. I vaguely remember running into this algorithm, but discarded it because it recursed without really reducing complexity. After watching your explanation, I realized that if I restrict the recursion depth, I might get something usable.
This video is truly amazing, it mixes the beauty of computer science and math
Thanks for making computational mathematics accessible. The last summary is pure gold.
Amazingly beautiful review and info! Thanks for making this! All the best
i've never seen a youtube account with 3 videos that makes such high quality videos. seriously well done man
1:01 In Russian it is better to use the word "сложение" instead of "дополнение" to denote addition.
"дополнение" in Russian means complement (set theory)
Oh Jesus... thanks for the input though
@@Nemean*Oh Lenin...
@@AffidavidDonda ?? As if Lenin is at all praiseworthy?? I’m sure his black charcoal of a heart is still providing fuel for the fires of “oh hell” tho…. It’s for the despicable evil, deliberately propagated like deadly contagions still infecting the minds the of the vulnerable, mentally weak, and those victims with “compromised intellectual immunity” who had their natural defenses of logic, reason, and objective observation castrated by atrophy, shriveled and withered like undesirable testicles on the proverbial farm hog, resulting from the constrictive rubber bands of indoctrination posing as education by Marxist operatives posing as teachers, all susceptible and succumbing to the mental viruses created and propagated by Marx, Lenin, and the rest of the monsters of yesterday and today, that cause lapses in my Agnosticism to pray that there is a heaven for some and a well deserved hell for others.
@@muchhustle4982 Thanks for that copypasta my dude! Haven't seen that one before
This is such an excellent lecture and presentation, you make a great professor! I'm saying this as a computer scientist. It was educating and entertaining at the same time.
Kolmogorov was a genius, albeit not infallible. We are standing on the shoulders of giants.
Thanks for educating us! I learned new stuff from this video.
Omg, Kolmogorov is the real MVP! So impressed that he actually went so far for a student rather than claiming all the credit for himself!
The cadence and tone of your voice is very pleasant to listen to. It reminds me of the JCS channel.
Thank you very much for teaching me with such detailed and illustrative information.
14:13 You say "It's ONLY application is cryptography" as if that's a weakness or flaw. The fact that it can offer real optimization on modern hardware for a certain subset of applications is awesome and amazing and valuable. Also, even if it had NO practical applications, it's still awesome because it disproved a prevailing theory. Even if it didn't do that, it demonstrated a new kind of algorithm. It did THREE amazing things. And you say, "I dunnno" :D
Glad to have you back! Some of the highest quality content on the platform
This is FIRE! What a spectacular journey. This is how it should be taught. Can't wait for more videos from you!
Not for everyone. I found the presentation confusing.
Imagine teaching this to 9 year olds.
I'm so happy that you've made another video, this makes my hyped to learn again. It's great motivation!
Just a heads up - there's a terminology mistake at 1:00 . "Addition" should be translated as "Сложение" in Russian, while "Дополнение" in English would be "Complement" term from Set Theory.
My goodness the explanation and visuals are amazing! Glad I came across this channel.
I feel like when you were talking about big O, there were some big aspects you missed. In particular, one of the big reasons big O is important is that it better measures how an algorithm scales to extremely large inputs.
While the big O might not be able to tell you an exact runtime, it can tell you how that runtime changes when you change the input. For example, for an O(n) algorithm, doubling the size of the input make it take twice as long as before, while an O(n^2) algorithm will take 4 times as long, and an O(log n) algorithm will only take a constant amount of time more. The ways that different algorithms scale tends to be more important than any constant factors when n is extremely large.
For example, the runtime of an O(n) algorithm might be like 10n, while an O(n^2) algorithm might be n^2/10; with small n, the O(n) algorithm is slower due to the high overhead (for n=4, the first algorithm is 40 while the second is 1.6), but as n increases, the difference in powers rapidly overcomes the constant factors (for n=10,000, it’s 100,000 versus 10,000,000, so the first is a hundred times faster). That’s why we talk about big O in algorithms; when the input is big enough that runtime is a concern, that’s what gives you a real idea of the runtime.
Wow this makes waaaay more sense
An O(log n) algorithm isn't constant, but would be proportional to natural logarithm, O (n log n) is more feasible as just processing the input is O(n).
@@RobBCactive I wasn’t saying log n was constant time; I was saying for a log n algorithm, doubling the input length would increase the time by a constant amount, since log 2n = log n + log 2.
@@KnakuanaRka No you specifically said, "and an O(log n) algorithm will only take a constant amount of time more", read your post adding log 2 or log 3 to log n for factors of N is NOT a constant
@@RobBCactive Well, if log n is the runtime for input of length N, adding log 2 for the runtime of 2N is effectively a constant amount more (since it doesn’t depend on N); what’s the problem?
Excellent discussion, thank you. Also what a mensch Kolmogorov is. Good to see, and thanks for telling us about it.
Since your fast SQRT video I was waiting until your next lauch. This video gave me a thousand goosebumps, incredible! Good job
i kind of understood this. thank you for this video. I often come back to your first video when i need inspiration how to change way of thinking when i search for answer.
Thank you for educating us with this beautiful video of yours. The way you put them together is just perfect, thank you again.
Thank you for showing me how the computer works using algorithms.
In the time when Kolmogorov was at the age of Karatsuba (when they met) there was no Fast Fourier Transform, but on the other hand Parseval theorem was already stated in the 18th century - kids read the books and study them ! :)
i thought this account exists just to post one vid and nothing else, nice to see a new upload!
Absolutely fascinating, please keep them coming.
"I claim", kudos for knowing what's right and saying what's obvious, and moving all of us along the path of learning.
That was a joy to watch. Thank you!
Really glad you're back! Can't wait to see more of your content.
Can we all agree that this person should make more videos? Explaining complex computer science and mathematics concepts in simpler terms is something we all need in our lives.
We learned the Karatsuba Method last week in Numerics, so this was an interesting new take on the way we learned it
If you do the FFT in _modular_ arithmetic (which uses only integers) you get multiplication with no rounding error because you don't need to worry about floating-point arithmetic. The algorithm is the same.
I think Bunimovs optimization of Montgomerys algorithm is another beautiful algorithm to compute products using modular arithmetic in finite sets.
I'm glad I clicked on this video, the thumbnail made it look like it was going to be a dumb math method that still works but overcomplicates things/does the exact same thing the normal method does but displayed slightly differently but clickbaited as "a new faster way to do math" but it turned out to actually be more efficient (as the length of the numbers increases)
This whole video is incredibly interesting and explains lots of things very well, but I am laughing so hard at 17:00 . The deadpan delivery of that line “log star of the number of atoms in the universe... is five.”
Nice video! Really liked the smooth animation
Hmmm…reminds me of another algorithm dealing with linear programming (LP). LP is theoretically a N^x steps where x is the number variables. There is a Russian algorithm that has O(N) steps, but the slope of T (time) is so steep it might as well be a quadratic equation. I wrote a paper on this 40 years ago for one of CS Master’s classes after reading about it in a programming journal. The math was so obscure (or maybe the Russian was so obscure) that I had to go back to the original paper to get the algorithm correct. It was a fun project, as I was really interested in linear programming at the time. Seems I fell in love with CAD, though.
Even if this video doesn't blow up it is still amazing content, thanks!
Really cool seeing that discoveries in mathematics are still being done to this day :)
discoveries in math are being done every day - mathematics is so much more than just arithmetic!
I looked at this video on a random whim and I'm glad i did! Very well explained video on a topic that can be difficult to follow.
As others have mentioned, practical examples may have worked better than the colored blocks used, as this would allow the audience to follow along in an easier fashion.
Thanks!
This algorithm is used in Java's implementation of BigDecimals (or BigIntegers ?) for very big numbers.
BigInt yes
Also python and possibly javascript
I'm just discovering this channel now. Nice stuff!
Another reason computer scientists only examine order of algorithm time is because of asymptotic behavior. If an algorithm has a lower order than another one, then there is a sufficiently large problem for which it will run faster. So no matter how good your linear constants, eventually a slow (high order) algorithm would lose. Also, as we progress socially usually we seek to solve larger problems, so it merits investigating orders.
On the other hand, precise run times are not useless. But you always assume a machine model. As you noted, if your machine model is not super precise (like the 64-bit vs 1-bit addition), then your results may be useless in practice. The specificity can be so large that it's not worth it to tune something that no one will ever use -- it makes more sense to just do that when you have a concrete machine and important problem to solve. That's essentially software engineering/computer programming -- it's important, but it's specific. In mathematics and academia the idea is usually to generalize and provide lasting value, versus engineering is solving specific problems (which you can publish as well in the hope it will be useful to similar machines).
One interesting case of very useful precise run times is in analyzing digital circuits. Almost every real world implementation of digital circuits uses similar implementation (binary gates with minor differences in area). So I believe there is detailed study of how many digital gates you need to achieve some operation (very interesting and important field).
Also, Andrej Kolmogorov's kindness is just lovely and legendary.
Awesome video so far. 48 seconds in and started with a verse, had some code in there... Kool stuff!
Great video! And as a Russian-speaking person I want to notice that mathematical operation "addition" is called "сложение" not "дополнение". The term's you used meaning is "a minor member of a sentence, usually expressed as a noun". Best Regards!
дополнение means "addition" in the sense of supplement or expansion (i.e., it would be used in sentences like "the addition of a new terminal to the airport", or "with added vitamins").
This was super interesting thanks for uploading, I will be watching you form now on
Excellent video!!
Just loved the video man! awesome it was.
At first I was like "Meh, just show me the algorithm, I don't want to watch such a long video," but it is so well-made I can only say thank you for your work!
I'll be honest, I usually hate long videos too. Luckily there was this contest and I made a 1 minute version of my main video for it: ua-cam.com/video/LCY4dnm88oI/v-deo.html . If you want to share Karatsuba's algorithm with a friend or whatever, that video is a good teaser.
Dude dropped two of the best CS videos on UA-cam and then dipped
That was a fun video! (said by a Theoretical Computer Scientist / Mathematician)
You said that this algorith is not "really recomended" for "popular" use, but coincidentally, last week I saw I probability class on youtube where the teacher tells an "easy way" to multiply 2 digits number by 11 without previously memorizing its results: whatever 2 digits number multiplied by 11 results in a 3 digit number which first and last digits are the same original first and last digit, and the digit in-between is obtain by adding them, really simple (and if the addition has a carrier you just add it to the first one - here reading from left to right)... seen your video I have realized why works and also that can be extended to every 2x2 digit multiplication.... I recomend you edit the video and incorporate this "eleven case example" because of its simplicity (since you always multiply digits by "one", and maybe another case as counterexample), but this "11 rule" is just amazing. Try it by yourself.
Excellent video. I was always fascinated by these "dynamic programming" style algorithms. They appear everywhere.
Of course I'm just a rando on the internet, but I think such a video on the evolution of matrix multiplication would be equally if not more interesting. Strassen (the same one you mentioned) played a role there too, and a similar (sort of) trick is used in that case. However such a video would probably be extremely long since it's still an open question. Then again, even this video covers an impressive scope.
You're not the only one who recommended matrix multiplication, but I'm not so convinced. Strassen's algorithm is a similar idea to Karatsuba's (instead of 3 digits expressed as a combination of 3 products, it's 8 matrices expressed as 7 products) and the more sophisticated optimisations use tensors heavily and I don't know how to easily explain that (yet). I'll have to think about it.
@@Nemean I already read that in your soothing voice, so please think some more about it ;).
Subbed your channel btw, eagerly awaiting more content. You rock!
we learn more in this video ...thnx 4 posting .
I love the super subtle humor here, hahaha. The explanatory visuals are also great even though I feel it still quickly becomes a lot to digest for us lower peasants. YT needs more of this. Thank you.
I really love your videos so far, clear and somewhat concise
Really instructive, thanks. I hope you make more
i always knew that constants are avoided during calculation of time complexity, and now you have explained the reason pretty well.
Thanks 😊🙏
My teachers: They are avoided Period
😑😑
This is one of the best explainers on fast multiplication algorithms out there. Much excellent!
Epic video, товарищ! :)
spatsiba or whatever
hey wait a minute, this is the same guy on quake fast sqrt. glad you're back with some bangers
That was brilliant. And actually taught me new maths too.
18:00 - For smaller numbers...
Lololllol 🤣
I love your delivery. Pure gold.
That was fascinating, thank you
Awesome video! I just found this channel now thanks to the algorithm. I'm subscribing right away!
The moment i saw 1729 i screamed Ramanujan!!.... That's the smallest positive integer that can be written as the sum of 2 cubes in 2 different ways!!
12³ + 1³ = 1729 = 10³ + 9³
Now I don't know about why it showed up here but i was just excited by the observation :D
Thank you for a fantastically well put together video.
very interesting and informative video
I really enjoyed your video, I can only encourage you to publish more content like this.