I used to play around with numbers when I was bored, writing them down on lined paper. I found an easy way to multiply 2-digit numbers in my head. I'm not sure what it's called, I haven't really looked it up, so I'll just put down a 2-digit example and try to explain the operation.
34 x 26
Step 1. 3 x 2 = 600
Step 2. 2 x 4 = 80
Step 3. 6 x 3 = 180
Step 4. 6 x 4 = 24
Add the sums = 884
So in step 1 I multiply the first digits of both factors; the first step is always counted as hundreds, so 3 x 2 counts as 600. In steps 2-3 I count in tens, really I just add a zero: for step 2 (2 x 4 = 80), another example would be 3 x 4 = 12 → 120; for step 3 (6 x 3 = 180), another would be 3 x 3 = 9 → 90. In step 4 I keep it as it should be: 6 x 4 = 24. In this example I added everything at the end; if I were actually calculating in my head I would add as I go: 600 → 680 → 860 → 884. This is the order I do it in, going through steps 1-4: 3 x 2, 2 x 4, 6 x 3, then 6 x 4. Let me know what this is called, I'd like to look it up.
I'm not sure what the real name is, but I just call it the algebraic method. It's essentially the same as the "elementary school multiplication" algorithm in the video. It's just way easier to do with 2-digit numbers than 3-digits (you need to do 4 1-digit multiplications instead of 9... N² steps, see?). Fun fact, the American Common Core curriculum that everybody loved to hate back in 2014 teaches this method.
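For anyone who wants to see that "algebraic method" spelled out, here is a minimal Python sketch of the four partial products described above (the function name and the digit-splitting via divmod are just illustrative choices):

def two_digit_multiply(x, y):
    # the four partial products from the comment above, e.g. for 34 x 26
    a, b = divmod(x, 10)   # tens and ones of x
    c, d = divmod(y, 10)   # tens and ones of y
    return a * c * 100 + (a * d + b * c) * 10 + b * d

print(two_digit_multiply(34, 26))   # 884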
If you do the FFT in _modular_ arithmetic (which uses only integers) you get multiplication with no rounding error because you don't need to worry about floating-point arithmetic. The algorithm is the same.
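To make that concrete, here is a rough Python sketch of integer multiplication via a number-theoretic transform, i.e. an FFT over the integers modulo a prime, so every step is exact. It assumes the commonly used NTT-friendly prime 998244353 (= 119 * 2^23 + 1) with primitive root 3; the function names are made up for this example, and it is written recursively for clarity rather than speed:

MOD = 998244353   # prime with 2^23 dividing MOD - 1, so power-of-two roots of unity exist
ROOT = 3          # a primitive root modulo MOD

def ntt(a, invert=False):
    n = len(a)                                # n must be a power of two
    if n == 1:
        return list(a)
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    w_n = pow(ROOT, (MOD - 1) // n, MOD)      # principal n-th root of unity
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)          # use its inverse for the inverse transform
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return out

def multiply(x, y):
    a = [int(d) for d in str(x)[::-1]]        # little-endian decimal digits
    b = [int(d) for d in str(y)[::-1]]
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    conv = ntt([u * v % MOD for u, v in zip(fa, fb)], invert=True)
    n_inv = pow(n, MOD - 2, MOD)
    coeffs = [c * n_inv % MOD for c in conv]  # exact digit-by-digit convolution
    digits, carry = [], 0
    for c in coeffs:                          # propagate carries to get base-10 digits
        carry += c
        digits.append(carry % 10)
        carry //= 10
    while carry:
        digits.append(carry % 10)
        carry //= 10
    while len(digits) > 1 and digits[-1] == 0:
        digits.pop()
    return int("".join(map(str, digits[::-1])))

print(multiply(5678, 1234))   # 7006652

For base-10 digits the convolution values stay far below the modulus, which is why no rounding or overflow issues appear.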
Hmmm…reminds me of another algorithm dealing with linear programming (LP). LP is theoretically N^x steps where x is the number of variables. There is a Russian algorithm that runs in O(N) steps, but the slope of T (time) is so steep it might as well be a quadratic equation. I wrote a paper on this 40 years ago for one of my CS Master's classes after reading about it in a programming journal. The math was so obscure (or maybe the Russian was so obscure) that I had to go back to the original paper to get the algorithm correct. It was a fun project, as I was really interested in linear programming at the time. Seems I fell in love with CAD, though.
I feel like I am missing something here: shouldn't FFT REQUIRE multiplication to exist in order to actually do FFT? How can you do FFT without a previously defined multiplication? Pretty sure you can't bootstrap an algorithm like this, unless you recurse or something until you get down to single-bit numbers?
Because it sounds like you know what you're talking about, you get the long answer. It has to do with the log log N factor. Schönhage and Strassen found out that, from the complex numbers that you get out of the FFT, you only need to keep the first log N bits. That means that multiplying complex numbers needs (log N)^2 steps, if you multiply naively, (log N)^1.6 steps, if you use Karatsuba, or, as you and Schönhage and Strassen found out, you just use FFT again. Since the numbers now have log N bits instead of N, the FFT needs log N * log(log N) steps. Furthermore, it in turn needs to multiply numbers with log log N bits, so you use FFT again. Doing this recursively gives you a runtime of N * log N * log log N * log log log N * ... and so on. What Schönhage and Strassen published in their paper is how to remove all the factors with 3 or more logs. That's how they got N * log N * log log N.
It's also good to mention that, when we get practical, if your input is smaller than k, then k*N is slower than N². So if you have an algorithm that could be 20*N or N² but your input is *always* smaller than 20, then N² is actually better.
Your result is right, but the reasoning has a mistake: when the video says that the "constant" does not matter, it still means you have to compare them "equally" weighted, so in your example you have to compare how N grows against N², or how 20*N grows against 20*N² (same result in the abstract "order" of calculation). But it is also true that one algorithm can work better than another depending on the "size N" of digits involved, and also that some algorithms which give an approximate result can be useful if you need speed instead of precision, because no engineer will really care whether the result ends in a 6 or a 7 when there are 14 much more significant digits before it in his budget. As an example, for calculating big factorials N! with N large, Stirling's approximation is used (easily found on Wikipedia if you are interested).
@@whatitmeans I understand, but I meant in terms of implementation. Like the amount of instructions per iteration, for example, and the time cost of each instruction. Sometimes you can even consider the probability of the input being small or big, so when you have to compute several inputs the average is better.
@@victorlaurent37 Completely valid argument. That is exactly why people study it: can it be improved? How? Approximations? Decompositions? Generally speaking, the effort of implementing such things is made because you know beforehand that you are going to need these kinds of algorithms (scientists and mathematicians mainly, or simply out of scientific curiosity). Since standard binary multiplication algorithms are already implemented and optimized in the libraries of every calculation package, you don't really need to think about it; but when you do need it, having these tools already designed in your libraries is a big relief (ask any big-data analyst or programmer how many times they've been saved because something was done before by someone and left on StackExchange; maybe 99% of the time, hahaha). As an example, astronomy images of the sky have such high resolution that individual pictures sometimes have to be saved across more than one drive: imagine how much time processing them requires. These kinds of "heavy algorithms" make a lot of sense in those situations.
That's right, but the video talks about O(n log n). The O(...) of an algorithm denotes the worst case; it's called big O. Since we generally do not know the input, comparing the worst case of algorithms makes sense.
When some individuals multiply 6-digit numbers fast mentally, do they use any of those algorithms intuitively? Is anything known nowadays about the phenomenon from that point of view? I know (from discussions with my friend, a mathematician) that those individuals are usually lacking in logic to the extent that they can't do anything in the area of mathematics. That is not always the case, though, and probably the brightest example of the combination of both skills (lightning mental computation and logic) would be the famous mathematician John von Neumann. His mental computational abilities were such that people who had the lucky opportunity to communicate with him had the impression that they were dealing with an extraterrestrial and not a human (he could sum series mentally and everything alike). Thank you very much for this great film.
Karatsuba's way is similar to the Vedic system of multiplying... and another way to break down multiplication is Strassen's way of multiplying matrices.. but I really like the FFT part..
But Indian "intellectuals" mock vedic maths as "a bag of tricks" and not "actual" maths. They borderline call it quackery. This notion has bothered me for a long time, especially as a software engineer. The entire field of computer science excelled because of various "bags of tricks" we know as algorithms. Even before that in mathematical analysis, there have been various bags of tricks too, be it newton raphson methid for approximating roots, or chinese remainder theorem for solving congruent modulus equations. I wonder why one kind of bag of tricks is mocked while the other is regarded as actual scientific knowledge. I presume It is mostly mocked because a Hindu saint of the Shanakarcharya lineage came up with it and he attributed it to the vedas.
The moment i saw 1729 i screamed Ramanujan!!.... That's the smallest positive integer that can be written as the sum of 2 cubes in 2 different ways!! 12³ + 1³ = 1729 = 10³ + 9³ Now I don't know about why it showed up here but i was just excited by the observation :D
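A quick sanity check of that taxicab fact in Python; the brute-force range only covers cubes up to 12³, so it illustrates the claim rather than proving it for all integers:

from itertools import combinations

print(12**3 + 1**3, 10**3 + 9**3)   # both 1729

# smallest number with two representations as a sum of two positive cubes (within this range)
sums = {}
for a, b in combinations(range(1, 13), 2):
    sums.setdefault(a**3 + b**3, []).append((a, b))
print(min(n for n, ways in sums.items() if len(ways) > 1))   # 1729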
Can we all agree that this person should make more videos? Explaining complex computer science and mathematics concepts in simpler terms is something we all need in our lives.
Considering multiplication requires each digit of one number to operate with each digit of the other, I wonder if a theoretical sub- n log n algorithm would be able to break the n log n limit of comparison sorts
I looked at this video on a random whim and I'm glad i did! Very well explained video on a topic that can be difficult to follow. As others have mentioned, practical examples may have worked better than the colored blocks used, as this would allow the audience to follow along in an easier fashion. Thanks!
Props to Kolmogorov, he could have sent the paper in his own name without giving credit to an unknown student and taken all the merit. The academic world is sometimes ruthless.
@@marcnye9221 Corporate is eroding that, now quicker than ever.
@@marcnye9221 while in engineering... :/
@@marcnye9221 great that you've got that impression, but the reality is that more and more university professors are favouring producing quantity over quality of papers so they can earn "prestige", and the students are used as free labour to support that.
Sure, Bill Gates gave credit to computer scientists; sure, Bill Gates is well known for that. In reality, however, Gates stole almost everything and pushed many people over the edge. Dr Gary Kildall's Wikipedia page can enlighten you on how Gates fooled him when Gates stole his ten years of work developing an operating system and the BIOS all computers once had.
Incorrect. I can name at least one Ruth in the academic field (Ruth Aaronson Bari).
Kolmogorov is one of the coolest men I've heard of. Admitting defeat and then anonymously supporting the kid. wild
Really curious people want their ideas to be scrutinized. They seek knowledge.
Soviets and their communism.
nowadays you will get "researchers" sponsored by pharma/oil/any companies.
I know of Kolmogorov mainly from my work in statistical analysis. There he is, basically, a god.
Unthinkable nowadays
Nowadays he would have canceled him and his career for the „crime“ of being right
Indeed. To not only admit, but actually welcome, verifiable new information that unseats old hypotheses is the hallmark of good science. I have no doubt that Kolmogorov carefully analyzed Karatsuba's proofs before fully accepting them (as any wise researcher would), but once he had confirmed their validity, he had the intellectual courage and integrity to embrace them. A scientist is not diminished when their hypotheses are disproved, because that is how we evolve the body of human knowledge, but some will diminish themselves by refusing to accept this with grace.
I think the fact we don't teach fast fourier transform in elementary school says a lot about society.
We should replace the early education curriculum with theoretical computer science and graph theory
Read the comment section on any obscure WW2 event that had an insignificant effect on the war. "Why didn't I learn about this in school? Clearly it's a conspiracy against America!"
I know you're joking, but that attitude is so common.
@@letsburn00 well there will be no history at all once I’m done with it, mathematics only
@@jakewalklate6226 Spoken like a true mathematician. "Clearly Stalin invaded at that point due to numerical superiority over Finland."
What about mathematical history? I'm still never entirely sure why we use 360 degrees apart from ease of use and something about Babylonians.
@@letsburn00 the use of 60 and 360 is because 60 is divisible by a lot of numbers. 1,2,3,4,5,6,10,12,15,20 and 30. Easy for calculator-less times.
For anyone curious: at 13:38, N^1.6 is used as an approximation. It's really N^(log base 2 of 3). If you want to enter it into a calculator, use the change of base formula: log(3) / log(2).
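A quick way to check that in Python (math.log and math.log2 are from the standard library):

import math

print(math.log(3) / math.log(2))   # change of base: ~1.5849625007211562
print(math.log2(3))                # same value, using the base-2 log directly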
Well, every O(n^log2(3)) algorithm is also an O(n^1.6) algorithm, so the video is fully correct in approximating that number while not labeling the whole thing as approximated.
Though I personally do like to state bounds like that exactly. Θ notation is a good way to do that (it just means both O and Ω).
Taking log2 of 3 to be 1.58 (it's really about 1.585, so rounding down like this inflates the gap to 1.6 by roughly 30%), the difference between N^1.58 and N^1.6 doesn't break 10% until N = 194 digits, which in base 10 is a 194-digit number; it doesn't break 5% until N = 13 digits, i.e. around one trillion. So the discrepancy grows painfully slowly, and for all human use cases N^log2(3) ≈ N^1.6.
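Those figures are easy to reproduce with a small Python check; the first column uses the 1.58 rounding from the comment above, the second the exact exponent, and the exact N where each threshold is crossed shifts a little depending on which you pick:

import math

exact = math.log2(3)                             # ~1.585
for n_digits in (13, 194):                       # the two N values quoted above
    rough_gap = n_digits ** 1.6 / n_digits ** 1.58
    exact_gap = n_digits ** 1.6 / n_digits ** exact
    print(n_digits, f"{(rough_gap - 1) * 100:.1f}%", f"{(exact_gap - 1) * 100:.1f}%")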
I love how you bring nearly-unreachable knowledge to the community through interesting and easy-to-understand videos. I would never know this bit of theoretical CS otherwise. Keep up the good work!!!
& I also £♡✌️€ , how I'm studying Russian language, ,& this Shows me RUSSIAN letters, names, historical events & People!
Spasibo. Спасибо. (Thank You).
This is what I studied in my 200-level, 300-level, and 400-level computer science algorithms class. Good explanation!
Between Karatsuba and FFT there is the Toom-Cook algorithm, from 1963-66. Like FFT, it treats both numbers as polynomials, evaluates them naively at some points (small numbers! like 0, ±1, -2, +inf), multiplies the values and then interpolates back to polynomial form.
"2-way" Toom-Cook recreates Karatsuba. The original "3-way" and "4-way" versions have complexity O(N^1.465) and O(N^1.404). The GMP library (a hefty library for big numbers) uses naive, Karatsuba, "3-", "4-", "6.5-" and "8.5-way" Toom-Cook, and FFT, with each algorithm used for numbers of different lengths.
uhm waht? :sweat_smile:
It's amazing how a simple problem like multiplication can devolve into such complex mathematical discoveries. Who would have thought that multiplying optimally is insanely more difficult than adding.
Would you really not expect multiplication, which is basically an extension of addition, to be harder to optimize, than its more basic "counterpart"?
@@SnakeTwix Yes, but not that much harder.
I set out, one afternoon, to write a large number library, just for my own edification. When I got to division, I realized I didn't actually know how to program a computer to divide, other than using built in divide. Sometimes simple things are not as simple as they seem :)
@@michaelbauers8800 just use Euclid's algorithm for integers.
@@AntoineViallonDevelloper Euclid's algorithm is an efficient method for computing the greatest common divisor (GCD) of two integers (numbers) but it uses division itself, so isn't useful to Michael.
Even if the lower bound is Ω(N × log N), there is still mathematical progress to be made (or disproven) in finding an algorithm which is that efficient with smaller and smaller inputs.
Look-up table!😜
@@icollectstories5702 O(1) multiplication?
@@icollectstories5702 it's fast but not memory efficient.
@@diamondcreeper0982 You always have the trade-off between speed and memory and as it currently goes, memory is cheap.
@@Ruhrpottpatriot although memory is cheap it's not available in everything.
for example if we wanted to use this method we would run out of memory in an Arduino quickly, but i do agree that if we have the memory to spare then this would be the fastest solution.
The legend is back
Kolmogorov also advanced the study of fluid flow turbulence so much that they named a constant after him and still refer to his work to this day!
Kolmogorov has done a lot for statistics and random processes
Fun fact: when computers multiply whole numbers, the compiler will often optimize the code so that it doubles (or halves) the number one or more times (each a single operation in the computer, known as bit shifting) and then adds or subtracts values to reach the result.
So code like x * 9 (which is (x * 8) + x) would be compiled into the equivalent of (x << 3) + x.
Very interesting, do all compilers do this? Is there a way to force this specific method if I notice the compiler isn't doing it? Also, I think you meant to write (x * (8+1)) or, even more descriptively, (x * (2^3+1))
(x * 8) + x
@@hoane6777 In general, no, you can't force a compiler to do something not specified by the language. You have to write the code the way you want it, and that's almost always a bad idea for maintainability. Also, if you're working with numbers big enough for calculation speed to matter, the compiler won't know to optimize anything, because the calculations will span multiple variables of the largest builtin type (e.g. 64 bits). If you're working on cryptographic code, you need to worry about how the calculations are performed for more than just speed. If a calculation takes a different number of steps depending on the value of a key, for example, that weakness can be used by attackers to retrieve the value of the key.
I was astonished to learn that my C64 didn't have a Mult opcode, and to multiply any number by, for example, ten, it had to multiply by eight (shift three times) and then add the input value two times. What a hassle!
@@DasHemdchen After the shift left 3 bits, it didn't shift the original left 1 bit to get a double to add to the eight to arrive at ten? Weird... or shift left 1 for double, then shift that result 2 to get 8x.
If you've come across the binary exponentiation algorithm (square and multiply), you could use successive bit shifting and addition to arrive at multiplication for arbitrary numbers. As a bonus, if you want the answer mod some n, you can just subtract n after each single bit shift left that yields n or greater, instead of programming division.
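A small Python sketch of that double-and-add idea, with the optional "subtract n after each doubling" trick for modular results; it assumes a and b are already reduced mod n when n is given, and the names are made up for this example:

def shift_add_mul(a, b, n=None):
    # multiply a * b using only shifts and adds, optionally reducing mod n along the way
    result = 0
    while b:
        if b & 1:                        # lowest bit of b set: add the current a
            result += a
            if n is not None and result >= n:
                result -= n
        a <<= 1                          # a doubles each round
        if n is not None and a >= n:
            a -= n                       # keep a below n instead of ever dividing
        b >>= 1                          # move on to the next bit of b
    return result

print(shift_add_mul(58, 37))        # 2146
print((58 << 3) + (58 << 1))        # 580, i.e. x*10 done as (x*8) + (x*2) with shifts
print(shift_add_mul(58, 37, 97))    # 12, i.e. 2146 mod 97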
17:20 Note that loglog N is also practically constant, like the k^log*(n), since loglog(N) where N is the number of atoms in the observable universe is around 8. If N is the number of atoms in the observable universe then loglog N is actually smaller than 4^log*(N).
What base do you use for the log?
@@daldi5211 2
@@daldi5211 in practice in IT we would use base 2 as we can approximate it by counting bits from the ones bit (which we count as zero) up to the largest bit with a value of 1.
However there is a theorem that states that to change a log from one base to another we can multiply by a constant that depends only on the two bases. And we know from earlier that we can ignore constant multipliers.
So you can apply this rule in any base you like and it still works.
it's not a theorem, it's just a simple property deduced from the definition of the logarithm :)
Fuck, I actually checked it for log(log(10^82)) and it truly does round to 8. Granted I did it in my mind, but it does check out.
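For anyone who wants to check that without doing it in their head, a quick Python sketch using base-2 logs (10^80 is just the usual rough figure for the atom count):

import math

ATOMS = 10 ** 80                      # rough number of atoms in the observable universe

print(math.log2(math.log2(ATOMS)))   # ~8.05

def log_star(x):
    # iterated logarithm: how many times you can take log2 before the value drops to <= 1
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

print(log_star(ATOMS))                # 5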
14:47 That was honest of Kolmogorov. I have met a few people in my career who would pretend to have done work which was actually done by someone else. They would then take the credit for the other person's work.
yeah you know, people learned the lesson after all the Leibniz/Newton kerfuffle over calculus
I loved fast inverse square root and finally you've released some more videos! Makes my day. Take however long you want, they're worth it.
Hahaha I loved your inverse FFT notation. Well played, well played.
Wonderful video :) I am writing an Algorithms exam next week and wanted to take a break from learning but ended up learning about the algorithm more than in my lecture and in a more exciting and relaxing way. Thank you for this masterpiece and wonderful editing!
9:30 and 16:00 I think it would've been better if you used actual numbers and showed a practical example of the calculation instead of empty digit boxes / partially filled circle shapes; it would be easier to keep track of and follow what you're talking about. Since the video started with practical examples for the easier algorithms, I was also expecting practical examples for the more complicated algorithms. Having to follow where you put which blank box, or which abstract circle is filled by how much, and trying to work out why you gave the circles those fill values, while at the same time trying to listen to what you are saying, is rather irritating.
I had the same thoughts
I'm glad someone else pointed this out. I got lost and felt that if I just had a real example to go off of it'd be much easier to follow
Click on his channel. The second video does exactly that
@@DrDeuteron If you're _not_ bothered by getting lost, I'd say that's a sign of complacency.
@@DrDeuteron just admit you don't know wtf is going on and move on lol
I have to say, I can't even remember how many times I have learned Big O notation already, but it's the first time in my life I heard about Linear speedup theorem.
Like, it suddenly explained everything. I suddenly understand why the linear magnitude does not matter.
The Quality and The Content is top notch! Thanks for sharing!
Content like this is why I still pay my internet bill. Thoughtfully presented, beautifully explained, and utterly fascinating even to a cynical math-o-phobe like me. Eighteen minutes well spent. I look forward to future content as a new subscriber. Bravo!
I'm not into computer science or even math, yet I still stayed and watched the video to the end.
I have just discovered this channel and the animations and the gradients are so beautiful, the content, so mesmerising, that I instantly subscribed.
Thank you.
I have seen fast multiplication on Commodore 64 (6502 processor without a built-in multiplier) based on a similar idea. a*b = ( (a+b)/2 )^2 - ( (a-b)/2 )^2. For all possible values of a+b and a-b, the square of a half is precalculated in a table; so for 8-bit numbers, 512 precalculated table entries are needed. This is easily a few times faster than trivial multiplication.
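A small Python sketch of that table-based trick, written with the equivalent floor(x²/4) ("quarter square") table so odd sums work out too; the 511-entry table covers every possible a+b for 8-bit inputs, and this is only an illustration of the idea, not the actual C64 code:

# a*b == (a+b)^2 // 4 - (a-b)^2 // 4, because a+b and a-b always have the same parity
QUARTER_SQUARES = [x * x // 4 for x in range(511)]   # indices 0..510

def table_multiply(a, b):
    # multiply two 8-bit values with one add, one subtract and two table lookups
    if a < b:
        a, b = b, a
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[a - b]

print(table_multiply(200, 123))   # 24600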
Wow. Interesting.
This story about Kolmogorov and Karatsuba should be made into a film so that more people know it
I'm glad I clicked on this video. The thumbnail made it look like it was going to be a dumb math method that still works but overcomplicates things (i.e. does the exact same thing the normal method does, just displayed slightly differently), clickbaited as "a new faster way to do math", but it turned out to actually be more efficient (as the length of the numbers increases).
what an incredible video, you have a real talent for getting at the core of these ideas and showcasing the clear arguments which easily get muddled by technicalities
This is such an excellent lecture and presentation, you make a great professor! I'm saying this as a computer scientist. It was educating and entertaining at the same time.
Kolmogorov was a genius, albeit not infallible. We are standing on the shoulders of giants.
Thanks for educating us! I learned new stuff from this video.
I really enjoyed this, good to see more coming from this channel. Excitedly looking forward for more!
i've never seen a youtube account with 3 videos that makes such high quality videos. seriously well done man
I'm super excited to watch this later when I'm done with work. Thank you for the amazing content!
He was so impressed with this young kid's genius that he did all the work for him as a gift.
1:01 In Russian it is better to use the word "сложение" instead of "дополнение" to denote addition.
"дополнение" in Russian means complement (set theory)
Oh Jesus... thanks for the input though
@@Nemean*Oh Lenin...
@@AffidavidDonda ?? As if Lenin is at all praiseworthy?? I’m sure his black charcoal of a heart is still providing fuel for the fires of “oh hell” tho…. It’s for the despicable evil, deliberately propagated like deadly contagions still infecting the minds of the vulnerable, mentally weak, and those victims with “compromised intellectual immunity” who had their natural defenses of logic, reason, and objective observation castrated by atrophy, shriveled and withered like undesirable testicles on the proverbial farm hog, resulting from the constrictive rubber bands of indoctrination posing as education by Marxist operatives posing as teachers, all susceptible and succumbing to the mental viruses created and propagated by Marx, Lenin, and the rest of the monsters of yesterday and today, that cause lapses in my Agnosticism to pray that there is a heaven for some and a well deserved hell for others.
@@muchhustle4982 Thanks for that copypasta my dude! Haven't seen that one before
Glad to have you back! Some of the highest quality content on the platform
I really like that the best multiplication algorithm uses the Ramanujan-Hardy number.
My guess is there would be a square root symbol in it somewhere…
Hi YellowBunny!
Hi Estepario
I love this subject, mainly because it is both quite recent and revolutionary, in a way, as well as rather easily understood by a teenager. Every now and again I talk about it to my students, and it is usually well received.
Glad I had my notifications on, Welcome back! Thanks for yet another informative video that is surprisingly easy to understand :DD
i kind of understood this. thank you for this video. I often come back to your first video when i need inspiration how to change way of thinking when i search for answer.
Thanks for explaining this. I vaguely remember running into this algorithm, but discarded it because it recursed without really reducing complexity. After watching your explanation, I realized that if I restrict the recursion depth, I might get something usable.
Excellent discussion, thank you. Also what a mensch Kolmogorov is. Good to see, and thanks for telling us about it.
This is FIRE! What a spectacular journey. This is how it should be taught. Can't wait for more videos from you!
Not for everyone. I found the presentation confusing.
Imagine teaching this to 9 year olds.
I'm so happy that you've made another video, this makes my hyped to learn again. It's great motivation!
Amazingly beautiful review and info! Thanks for making this! All the best
Thanks for making computational mathematics accessible. The last summary is pure gold.
14:13 You say "It's ONLY application is cryptography" as if that's a weakness or flaw. The fact that it can offer real optimization on modern hardware for a certain subset of applications is awesome and amazing and valuable. Also, even if it had NO practical applications, it's still awesome because it disproved a prevailing theory. Even if it didn't do that, it demonstrated a new kind of algorithm. It did THREE amazing things. And you say, "I dunnno" :D
The cadence and tone of your voice is very pleasant to listen to. It reminds me of the JCS channel.
Thank you very much for teaching me with such detailed and illustrative information.
I feel like when you were talking about big O, there were some big aspects you missed. In particular, one of the big reasons big O is important is that it better measures how an algorithm scales to extremely large inputs.
While the big O might not be able to tell you an exact runtime, it can tell you how that runtime changes when you change the input. For example, for an O(n) algorithm, doubling the size of the input makes it take twice as long as before, while an O(n^2) algorithm will take 4 times as long, and an O(log n) algorithm will only take a constant amount of time more. The ways that different algorithms scale tend to be more important than any constant factors when n is extremely large.
For example, the runtime of an O(n) algorithm might be like 10n, while an O(n^2) algorithm might be n^2/10; with small n, the O(n) algorithm is slower due to the high overhead (for n=4, the first algorithm is 40 while the second is 1.6), but as n increases, the difference in powers rapidly overcomes the constant factors (for n=10,000, it’s 100,000 versus 10,000,000, so the first is a hundred times faster). That’s why we talk about big O in algorithms; when the input is big enough that runtime is a concern, that’s what gives you a real idea of the runtime.
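The 10n vs n²/10 comparison above is easy to tabulate; a short Python check (with these particular constants the crossover happens at n = 100):

for n in (4, 100, 10_000):
    print(n, 10 * n, n * n / 10)   # linear with a big constant vs quadratic with a small one
# 4      40        1.6
# 100    1000      1000.0
# 10000  100000    10000000.0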
Wow this makes waaaay more sense
An O(log n) algorithm isn't constant, but would be proportional to the logarithm; O(n log n) is more feasible, as just processing the input is already O(n).
@@RobBCactive I wasn’t saying log n was constant time; I was saying for a log n algorithm, doubling the input length would increase the time by a constant amount, since log 2n = log n + log 2.
@@KnakuanaRka No you specifically said, "and an O(log n) algorithm will only take a constant amount of time more", read your post adding log 2 or log 3 to log n for factors of N is NOT a constant
@@RobBCactive Well, if log n is the runtime for input of length N, adding log 2 for the runtime of 2N is effectively a constant amount more (since it doesn’t depend on N); what’s the problem?
You said that this algorithm is not "really recommended" for "popular" use, but coincidentally, last week I saw a probability class on YouTube where the teacher shows an "easy way" to multiply a 2-digit number by 11 without memorizing any results: any 2-digit number multiplied by 11 gives a 3-digit number whose first and last digits are the original first and last digits, and the digit in between is obtained by adding them; really simple (and if the addition produces a carry, you just add it to the first digit, reading here from left to right)... Having seen your video I have realized why it works, and also that it can be extended to every 2-digit by 2-digit multiplication.... I recommend you edit the video and incorporate this "eleven case example" because of its simplicity (since you always multiply digits by "one", and maybe add another case as a counterexample); this "11 rule" is just amazing. Try it yourself.
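A tiny Python sketch of that "11 rule", including the carry case, checked against ordinary multiplication (the function name is just for this example):

def times_eleven(n):
    # two-digit n: keep the outer digits, put their sum (plus any carry) in the middle
    tens, ones = divmod(n, 10)
    carry, middle = divmod(tens + ones, 10)
    return (tens + carry) * 100 + middle * 10 + ones

assert all(times_eleven(n) == 11 * n for n in range(10, 100))
print(times_eleven(57))   # 627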
i thought this account exists just to post one vid and nothing else, nice to see a new upload!
That was a joy to watch. Thank you!
I believe Karatsuba's algorithm is used in quantum computing as the current fastest/most efficient method of multiplication.
"I claim", kudos for knowing what's right and saying what's obvious, and moving all of us along the path of learning.
This algorithm is used in Java's implementation of BigDecimals (or BigIntegers ?) for very big numbers.
BigInt yes
Also python and possibly javascript
Nice video! Really liked the smooth animation
When Kolmogorov was the age Karatsuba was at the time they met, there was no Fast Fourier Transform, but on the other hand Parseval's theorem had already been stated in the 18th century. Kids, read the books and study them! :)
My goodness the explanation and visuals are amazing! Glad I came across this channel.
Another reason computer scientists only examine the order of an algorithm's running time is asymptotic behavior. If an algorithm has a lower order than another one, then there is a sufficiently large problem for which it will run faster. So no matter how good your linear constants, eventually a slow (high-order) algorithm will lose. Also, as we progress, we usually seek to solve larger problems, so it merits investigating orders.
On the other hand, precise run times are not useless. But you always assume a machine model. As you noted, if your machine model is not super precise (like the 64-bit vs 1-bit addition), then your results may be useless in practice. The specificity can be so large that it's not worth it to tune something that no one will ever use -- it makes more sense to just do that when you have a concrete machine and important problem to solve. That's essentially software engineering/computer programming -- it's important, but it's specific. In mathematics and academia the idea is usually to generalize and provide lasting value, versus engineering is solving specific problems (which you can publish as well in the hope it will be useful to similar machines).
One interesting case of very useful precise run times is in analyzing digital circuits. Almost every real world implementation of digital circuits uses similar implementation (binary gates with minor differences in area). So I believe there is detailed study of how many digital gates you need to achieve some operation (very interesting and important field).
Also, Andrej Kolmogorov's kindness is just lovely and legendary.
Just loved the video man! awesome it was.
Omg he published again!!!! The god returned yeeeesss
Your content is so high quality, can't emphasize this enough.
How do you know? You commented 2 minutes after the video got published, there's no way you have watched it all yet. Maybe my video sucks.
@@Nemean it slapped bro
@@Nemean I just knew from the previous one (Quake Inverse Sqrt Algorithm), and damn this video was really great. It had some really unexplainable feel at the end (all the multiplication-algorithms and their runtime).
It was super informative and very interesting in fact.
👍👍👍👍👍👍👍👍👍👍👍👍👍
@@notbob9865 Thanks :)
@@Nemean 10x playback speed
Really glad you're back! Can't wait to see more of your content.
Just a heads up - there's a terminology mistake at 1:00 . "Addition" should be translated as "Сложение" in Russian, while "Дополнение" in English would be "Complement" term from Set Theory.
1:01 the correct word you need is not "дополнение", it's "сложение". The first translation would rather work outside of mathematics.
Absolutely fascinating, please keep them coming.
Thank you for showing me how the computer works using algorithms.
Great video! And as a Russian-speaking person I want to note that the mathematical operation "addition" is called "сложение", not "дополнение". The term you used means "a minor member of a sentence, usually expressed as a noun". Best regards!
дополнение means "addition" in the sense of supplement or expansion (i.e., it would be used in sentences like "the addition of a new terminal to the airport", or "with added vitamins").
We learned the Karatsuba Method last week in Numerics, so this was an interesting new take on the way we learned it
Fascinating algorithm and historical context. Thanks for sharing this and for explaining it so lucidly.
For those who aren't old enough to remember the old days of computing, one of the reasons multiplication was of such interest is that early CPUs did not have a multiply instruction in hardware. They relied on repeated addition, so if you wanted 58 * 37 it was computed as 58 + 58 + 58 ..... (37 times total), or vice-versa. I'm not sure if the first computers even had the hardware smarts to swap the numbers so they added the larger number a smaller number of times. Repeated addition is often even slower than the O(N**2) elementary school algorithm, so computer scientists were eager for anything that could improve upon that.
Also for the non-computer folks, Nemean makes the comment that subtraction is essentially the same problem as addition. You know from grade school that subtracting N is the same as adding -N, of course, but it might occur to you that -N is defined as -1 * N, which seems to imply a hidden multiplication step. Fortunately, since computers work in binary, we avoid that by using the "twos complement". In binary, this means flip every bit of the original number, which gives you the "ones complement", then add one. Adding the twos complement of N to another number, say M, is the same as computing M - N.
Here's an example using 8-bit integers, a common size for early CPUs, to compute 100 - 35. 100 is 64 + 32 + 4, or 01100100 binary. 35 is 32 + 3, or 00100011 binary. Take the ones complement of 00100011 to get 11011100, then add one for the twos complement of 11011101. Adding 01100100 to 11011101 gives (1)01000001. The parentheses are around the carry bit, which in this situation we ignore (see note below). 01000001 is 64 + 1, or 65 decimal, the answer we expect.
Even in very early computers, the operations to invert every bit (ones complement) and to add one (increment) were single hardware instructions, so the twos complement took at most two steps (and some CPUs had a single instruction to combine them). So subtraction, even on an early CPU with no subtract instruction, was not significantly more difficult than addition.
The use of twos complement binary arithmetic does imply a need to keep track of that leftmost bit and being aware of whether it is being used as a sign (1 for negative, 0 for positive) or simply as another binary digit. Programmers can define "signed integers" which cut the value's range in half but allow negative numbers, or "unsigned integers" which allow the full range but cannot be less than zero. For instance, a 16-bit unsigned integer can be 0 to 65535, inclusive, while a 16-bit signed integer can instead be -32768 to +32767, inclusive. The CPU hardware, generally, handles the raw bits the same, but the programming language and compiler help the programmer avoid misinterpreting the data.
I hope this side-trip into computer history and binary math is useful to readers who aren't computer specialists.
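For anyone who wants to try the two's-complement trick, here's a minimal C sketch of the 100 - 35 example above (the uint8_t width and the variable names are just illustrative assumptions):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t m    = 100;             /* 01100100 */
    uint8_t n    = 35;              /* 00100011 */
    uint8_t ones = (uint8_t)~n;     /* ones complement: 11011100 */
    uint8_t twos = ones + 1;        /* twos complement: 11011101 */
    uint8_t diff = m + twos;        /* the carry out of bit 7 is dropped by the 8-bit type */
    printf("%u\n", (unsigned)diff); /* prints 65, i.e. 100 - 35 */
    return 0;
}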
Great comment
Very helpful 👍🙂
wait, didn't Booth's multiplication algorithm exist back then? I'm surprised they used repeated addition
To give an example, the MOS Technology / Rockwell 6502 (of Apple II fame) had add and subtract but no multiply or divide instructions, while the Motorola 68000 (Macintosh) had them. These chips hit the market only 4 years apart.
So subtracting, is adding a negative number: 8 - 2 = 8 + (-2)
And to add a negative number you subtract its absolute value? 8 + (-2) = 8 - |-2| ...
... *= 8 - 2 = 8 + (-2) = 8 - |-2| = 8 - 2 =* ... forever..
Awesome video so far. 48 seconds in and started with a verse, had some code in there... Kool stuff!
Since your fast SQRT video I've been waiting for your next launch. This video gave me a thousand goosebumps, incredible! Good job
Multiplying binary numbers is great.
When bitshifting one number, it becomes ×2 or /2 depending on the direction.
It goes like this:
loop this until number2=0
{
number2 bitshift right (/2)
if there's a carry (i.e. the bit shifted out was 1) → add number1 to the sum
number1 bitshift left (×2)
}
for example: (n1) 1001 × (n2) 101
iteration 0: (n2) 101 >> 10 with carry 1 → add (n1) 1001 to the sum → sum = 1001; then (n1) 1001 << 10010
iteration 1: (n2) 10 >> 1 with carry 0 → no add; then (n1) 10010 << 100100
iteration 2: (n2) 1 >> 0 with carry 1 → add (n1) 100100 to the sum → sum = 100100 + 1001 = 101101; n2 is now 0, so we stop
result: 101101 (i.e. 9 × 5 = 45)
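Here's a minimal C sketch of that shift-and-add loop (the function name, the 32-bit width and the test values are just illustrative assumptions):

#include <stdio.h>
#include <stdint.h>

/* Multiply two numbers using only shifts and additions. */
static uint32_t shift_add_multiply(uint32_t n1, uint32_t n2) {
    uint32_t sum = 0;
    while (n2 != 0) {
        if (n2 & 1)        /* the bit about to be shifted out is 1: the "carry" above */
            sum += n1;
        n2 >>= 1;          /* number2 / 2 */
        n1 <<= 1;          /* number1 * 2 */
    }
    return sum;
}

int main(void) {
    /* 0b1001 * 0b101 = 9 * 5 = 45 = 0b101101, matching the trace above */
    printf("%u\n", (unsigned)shift_add_multiply(9, 5));
    return 0;
}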
This whole video is incredibly interesting and explains lots of things very well, but I am laughing so hard at 17:00 . The deadpan delivery of that line “log star of the number of atoms in the universe... is five.”
I used to play around with numbers when I was bored, writing them down on lined paper.
I found an easy way to multiply 2-digit numbers in my head. I'm not sure what it's called, I haven't really looked it up. So I'll just put down a 2-digit example and try to explain the operation.
34
x 26
--------
Step 1. 3 x 2 = 600
Step 2. 2 x 4 = 80
Step 3. 6 x 3 = 180
Step 4. 6 x 4 = 24
Add sums = 884
So step 1: I multiply the first digit of each factor, and the first step is always counted as hundreds: 3 x 2 counts as 600.
In steps 2-3 I count in tens; really I just add a zero to the product.
e.g. (2 x 4 = 80), or 3 x 4 = 12 → 12(0) = 120
e.g. (6 x 3 = 180), or 3 x 3 = 9 → 9(0) = 90
In step 4 I keep it as it is: 6 x 4 = 24. Now in this example I added everything at the end. If I were actually calculating in my head I would add as I go.
e.g. running total: 600 → 680 → 860 → 884
This is the order I do it, going through steps 1-4
3 x 2, 2 x 4, 6 x 3, then 6 x 4
Let me know what this is called, I'd like to look it up.
I'm not sure what the real name is, but I just call it the algebraic method. It's essentially the same as the "elementary school multiplication" algorithm in the video. It's just way easier to do with 2-digit numbers than 3-digits (you need to do 4 1-digit multiplications instead of 9... N² steps, see?).
Fun fact, the American Common Core curriculum that everybody loved to hate back in 2014 teaches this method.
That's just using the distributive property of multiplication.
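Spelling that out with the numbers from the comment above, the four steps are exactly the distributive expansion of the two-digit factors:

34 × 26 = (30 + 4)(20 + 6)
        = 30·20 + 30·6 + 4·20 + 4·6
        = 600 + 180 + 80 + 24
        = 884

That's also why it takes exactly 4 one-digit products for 2-digit numbers and 9 for 3-digit numbers, as the reply above notes.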
This video is truly amazing, it mixes the beauty of computer science and math
Even if this video doesn't blow up it is still amazing content, thanks!
If you do the FFT in _modular_ arithmetic (which uses only integers) you get multiplication with no rounding error because you don't need to worry about floating-point arithmetic. The algorithm is the same.
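For anyone curious what that looks like in practice, here is a rough C sketch of an FFT done in modular arithmetic (a "number-theoretic transform"). The modulus 998244353 and primitive root 3 are just common illustrative choices, not what any particular bignum library actually uses:

#include <stdio.h>
#include <stdint.h>

#define MOD 998244353ULL   /* prime of the form 119 * 2^23 + 1, so roots of unity of order up to 2^23 exist */

static uint64_t pow_mod(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    for (b %= MOD; e; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

/* In-place FFT over the integers mod MOD; n must be a power of two (here at most 2^23). */
static void ntt(uint64_t *a, int n, int invert) {
    for (int i = 1, j = 0; i < n; i++) {             /* bit-reversal permutation */
        int bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) { uint64_t t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    for (int len = 2; len <= n; len <<= 1) {
        uint64_t wlen = pow_mod(3, (MOD - 1) / len); /* primitive len-th root of unity mod MOD */
        if (invert) wlen = pow_mod(wlen, MOD - 2);   /* its inverse, for the inverse transform */
        for (int i = 0; i < n; i += len) {
            uint64_t w = 1;
            for (int j = 0; j < len / 2; j++) {
                uint64_t u = a[i + j], v = a[i + j + len / 2] * w % MOD;
                a[i + j] = (u + v) % MOD;
                a[i + j + len / 2] = (u + MOD - v) % MOD;
                w = w * wlen % MOD;
            }
        }
    }
    if (invert) {                                    /* divide by n to undo the forward transform */
        uint64_t n_inv = pow_mod((uint64_t)n, MOD - 2);
        for (int i = 0; i < n; i++) a[i] = a[i] * n_inv % MOD;
    }
}

int main(void) {
    /* Multiply 1234 * 5678: digits stored least significant first, padded to a power of two. */
    uint64_t a[8] = {4, 3, 2, 1}, b[8] = {8, 7, 6, 5};
    ntt(a, 8, 0);
    ntt(b, 8, 0);
    for (int i = 0; i < 8; i++) a[i] = a[i] * b[i] % MOD;  /* pointwise products */
    ntt(a, 8, 1);                                          /* exact digit convolution, no rounding error */
    uint64_t carry = 0;
    for (int i = 0; i < 8; i++) { a[i] += carry; carry = a[i] / 10; a[i] %= 10; }
    for (int i = 7; i >= 0; i--) printf("%llu", (unsigned long long)a[i]);
    printf("\n");                                          /* prints 07006652, i.e. 1234 * 5678 */
    return 0;
}

Because everything stays an integer mod a prime, the digit convolution comes out exact, which is the point of the comment above; real libraries choose the modulus (or several moduli) much more carefully.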
I think Bunimov's optimization of Montgomery's algorithm is another beautiful algorithm for computing products using modular arithmetic in finite sets.
We learn more in this video... thanks for posting.
Hmmm… reminds me of another algorithm, dealing with linear programming (LP). LP is theoretically N^x steps, where x is the number of variables. There is a Russian algorithm that has O(N) steps, but the slope of T (time) is so steep it might as well be quadratic. I wrote a paper on this 40 years ago for one of my CS Master's classes after reading about it in a programming journal. The math was so obscure (or maybe the Russian was so obscure) that I had to go back to the original paper to get the algorithm right. It was a fun project, as I was really interested in linear programming at the time. Seems I fell in love with CAD, though.
18:00 - For smaller numbers...
Lololllol 🤣
I love your delivery. Pure gold.
Really cool seeing that discoveries in mathematics are still being done to this day :)
discoveries in math are being done every day - mathematics is so much more than just arithmetic!
I feel like I am missing something here: shouldn't FFT REQUIRE multiplication to exist in order to actually do the FFT? How can you do an FFT without a previously defined multiplication? Pretty sure you can't bootstrap an algorithm like this, unless you do recursion or something until you get down to single-bit numbers?
Because it sounds like you know what you're talking about, you get the long answer. It has to do with the log log N factor. Schönhage and Strassen found out that, from the complex numbers that you get out of the FFT, you only need to keep the first log N bits. That means that multiplying complex numbers needs (log N)^2 steps, if you multiply naively, (log N)^1.6 steps, if you use Karatsuba, or, as you and Schönhage and Strassen found out, you just use FFT again. Since the numbers now have log N bits instead of N, the FFT needs log N * log(log N) steps. Furthermore, it in turn needs to multiply numbers with log log N bits, so you use FFT again. Doing this recursively gives you a runtime of N * log N * log log N * log log log N * ... and so on. What Schönhage and Strassen published in their paper is how to remove all the factors with 3 or more logs. That's how they got N * log N * log log N.
@@Nemean Wow thanks really informative
Omg, Kolmogorov is the real MVP! So impressed that he actually went so far for a student rather than claiming all the credit for himself!
I once read that IBM used Karatsuba's algorithm to speed up multiplication in, and exclusively in, the IBM 360 Model 91.
That was a fun video! (said by a Theoretical Computer Scientist / Mathematician)
Legend ,
Please do more of this type.
Excellent video!!
This was super interesting, thanks for uploading. I will be watching you from now on
Epic video, товарищ (comrade)! :)
spatsiba or whatever
It's also good to mention that, when we get practical, if your input is smaller than k, then k*N is slower than N².
So if you have an algorithm that could be 20*N or N² but your input is *always* smaller than 20, then N² is actually better.
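Just to spell out the arithmetic behind that claim: for positive N,

N² < k·N  exactly when  N < k
e.g. with k = 20: 15² = 225 < 20·15 = 300, but 100² = 10,000 > 20·100 = 2,000.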
Your result is right, but the reasoning has a mistake: when the video says the "constant" does not matter, it still means you have to compare them "equally" weighted, so in your example you have to compare how N grows against N², or how 20·N grows against 20·N² (same result at the level of the abstract "order" of the calculation). But anyway, it is also true that one algorithm can work better than another depending on the "size N" of digits involved, and some algorithms that give an approximate result can be useful if you need speed instead of precision, because no engineer will really care whether the result ends in a 6 or a 7 when the 14 digits before it matter much more to his budget. As an example, to calculate big factorials N! with N large, one uses Stirling's approximation (easily found on Wikipedia if you are interested).
@@whatitmeans I understand, but I meant in terms of implementation - like the number of instructions per iteration, for example, and the time cost of each instruction. Sometimes you can even consider the probability of the input being small or big, so when you have to compute several inputs the average is better.
@@victorlaurent37 Completely valid argument. That is exactly why people study it: can it be improved? How? Approximations? Decompositions? Generally speaking, the effort of implementing such things is made because you know beforehand that you are going to need these kinds of algorithms (mainly scientists and mathematicians do - or simply out of scientific curiosity). Since standard binary multiplication algorithms are already implemented and optimized in the libraries of every calculation software, you don't really need to think about that, but when you do need them, having these tools already designed in your libraries is a big relief (ask any big-data analyst or programmer how many times they have been saved because someone did something before and left it on StackExchange - maybe 99% of the time, hahaha). As an example, astronomy images of the sky have such high resolution that sometimes individual pictures have to be saved across more than one drive: imagine how much time processing them requires. These kinds of "heavy algorithms" make a lot of sense in those situations.
That's right, but the video talks about O(n log n). For any algorithm, O(...) gives an upper bound on the running time (it's called big O). Since we generally do not know the input, comparing the worst case of algorithms makes sense.
When some individuals multiply 6-digit numbers fast mentally, do they use any of these algorithms intuitively? Is anything known about the phenomenon from that point of view nowadays? I know (from discussions with my mathematician friend) that usually those individuals lack logic to the extent that they can't do anything in the area of mathematics. That is not always the case, though, and probably the brightest example of the combination of both skills (lightning mental computation and logic) would be the famous mathematician John von Neumann. His mental computational abilities were such that people who had the lucky opportunity to communicate with him had the impression that they were dealing with an extraterrestrial (he could sum series mentally and the like) and not a human. Thank you very much for this great film.
I'm just discovering this channel now. Nice stuff!
Karatsuba's way is similar to the Vedic system of multiplying... and another way to break down a multiplication is Strassen's way of multiplying matrices... but I really like the FFT part..
But Indian "intellectuals" mock Vedic maths as "a bag of tricks" and not "actual" maths. They borderline call it quackery.
This notion has bothered me for a long time, especially as a software engineer. The entire field of computer science excelled because of various "bags of tricks" we know as algorithms.
Even before that, in mathematical analysis, there have been various bags of tricks too, be it the Newton-Raphson method for approximating roots, or the Chinese remainder theorem for solving modular congruence equations.
I wonder why one kind of bag of tricks is mocked while the other is regarded as actual scientific knowledge.
I presume it is mostly mocked because a Hindu saint of the Shankaracharya lineage came up with it and attributed it to the Vedas.
I loved this video! Immediately subscribed!
The moment i saw 1729 i screamed Ramanujan!!.... That's the smallest positive integer that can be written as the sum of 2 cubes in 2 different ways!!
12³ + 1³ = 1729 = 10³ + 9³
Now I don't know why it showed up here, but I was just excited by the observation :D
hey wait a minute, this is the same guy on quake fast sqrt. glad you're back with some bangers
Can we all agree that this person should make more videos? Explaining complex computer science and mathematics concepts in simpler terms is something we all need in our lives.
I really love your videos so far, clear and somewhat concise
Really instructive, thanks. I hope you make more
Thank you ... great explanation ... interesting history ...
Considering that multiplication requires each digit of one number to interact with each digit of the other, I wonder if a theoretical sub-n log n algorithm would be able to break the n log n limit of comparison sorts
I looked at this video on a random whim and I'm glad i did! Very well explained video on a topic that can be difficult to follow.
As others have mentioned, practical examples might have worked better than the colored blocks used, as they would allow the audience to follow along more easily.
Thanks!
You're a Good Maths Teacher! Thank You So Much Teacher!